In my last post I discussed how semiotics and observing how discourse communities interact had influenced the design of our machine learning algorithms. I also emphasized the importance of discovering jargon words as part of our process of understanding user commands and intents.
In this post, we describe in more depth how this “theory” behind our algorithms actually works. We also discussed what constitute good jargon words. “Computer” is a poor example of a jargon word because it is too broad in meaning, whereas a term relating to a computer chip, e.g. “Threadripper” (a gaming processor from AMD) would be a better example as it is more specific in meaning and is used in fewer contexts.
Jargon terms and Entropy
So – how do we identify good jargon terms and what do we do with them in order to understand user commands?
To do this we use entropy. In general entropy is a measure of chaos or disorder and, in an information theory context, it can be used to determine how much information is conveyed by a term. Because jargon words have a very narrow and specific meaning within specific discourse communities, they have lower entropy (more information value) than broader more general terms.
To determine entropy we take each term in our synthetic documents (see this post for more information of how we create this data set) and build a probability profile of co-occurring terms. The diagram below shows an example (partial) probability distribution for the term ‘computer’.
Figure 1: Entropy – probability distributions for jargon terms
These co-occurring terms can be thought of as the context for each potential jargon word. We then use this probability profile to determine the entropy of the word. If that entropy is low then we consider it to be a candidate jargon word.
Having identified the low entropy jargon words in our synthetic command documents, we then use their probability distributions as attractors for these documents themselves. In this way (as seen in the diagram below) we create a set of document clusters where each cluster relates semantically to a jargon term. (Note: in the interest of clarity, clusters are described using high level topic as opposed to the jargon words themselves in the figure below).
Figure 2: Using jargon words as attractors to form clusters
We then build a graph within each cluster that connects documents based on how similar they are in terms of meaning. We identify ‘neighborhoods’ within these graphs that relate to areas of intense similarity. For example a cluster may be about “cardiovascular fitness” whereas a neighborhood may be more specifically about “High Intensity Training”, or “rowing” or “cycling”, etc.
Figure 3: Neighborhoods for the cluster “cardiovascular fitness”
These neighborhoods can be thought of as sub-topics within the overall cluster topic. Within each sub-topic we can then extract important meaning-based phrases that precisely describe what that neighborhood is about. e.g. “HIIT”, “anaerobic high-intensity period”, “cardio session”, etc.
Figure 4: Meaning based phrases for the “high intensity training” sub-topic
In this way we create meaning-based structure from completely unstructured content. Documents from the same cluster relate to the same discourse community. Documents from the same cluster that share similar important terms or phrases can be regarded as relating to the same sub-topic. If two clusters share a large number of important phrases then this represents a dialog between two discourse communities. If multiple important phrases are shared among many clusters, then this represents a dialogue among multiple communities.
So having described a little bit about the algorithms themselves, how do they help us understand the correct meaning behind a user’s command? Given this contextual partitioning of the data into discourses based on jargon terms, we can disambiguate among the many different meanings a term can have. For example, if the user were to say ‘open the window’ – we will be able to understand that there is a meaning (discourse) relating to both buildings and to software but if the user were to say ‘minimize the window’, we would understand that this could only have a software meaning and context. Fully understanding the nuances behind a user’s command is, of course, much more complicated than what I have just described, but the goal here is to give a high level overview of the approach.
In subsequent posts, we will discuss how we extract parameters from commands, accurately determine which app action to execute, and how we pass the correct parameters to that action.
David Patterson and Vladimir Dobrynin