Category: Natural Language

Voice for Databases

The rise of Voice, the fall of SQL


Imagine if the only language you needed to talk to your database were English. No SQL. No NoSQL. No tables. Just the information you care about at the tip of your tongue, as natural as asking someone else…

At Aiqudo, we’re building a knowledge retrieval system to do exactly that, personalized to your own domain or industry (e.g., healthcare, finance). Take a nurse trying to look up a patient’s medications. Instead of manually searching through a database, all it takes is a voice command, “Show me medication for Milo”, and the appropriate information is returned, as shown in Figure 1. It can be displayed on screen or spoken back, hands-free, even while the nurse is on the go during a busy day.

 


Figure 1: Structured Healthcare Query

Voice is IN, SQL is OUT!

See for yourself …

 

 

Tradeoffs: Full power vs Privacy?

The idea of a natural language interface for a database may not be new, but it’s far from being a fully solved problem. Some solutions are built on the assumption that they can interact directly with the database, and therefore have full access to your data. The reality is that many companies don’t trust third parties with their most sensitive information. We believe organizations shouldn’t have to make these compromises, which is why designing a privacy-conscious solution was one of our top priorities.

With that in mind, tradeoffs are bound to occur depending on whether a partner is willing to share the data or wants to keep some of it private. Our approach provides solutions for both options. Not having access to the data can limit the complexity of commands that can be answered with reasonable accuracy. An example of a complex command is “Show me who the insurers are for Tomas Sauer’s patients,” which requires multiple logical jumps: from Dr. Tomas Sauer, to his patients, then to their insurers. This is a multi-hop query that navigates indirect connections within your data. On the other hand, not having access to your data can still produce a highly performant product. Using a custom schema of your database, we can still build a system that works smoothly with basic commands such as aggregations and entity-attribute lookups.

Using Schema to understand structured queries

The core of the system is a semantic parser personalized to your database using a schema-based approach. The job of a semantic parser is to transform natural language commands into a machine-interpretable representation, which in our case is a fully formed database query. Unfortunately, databases aren’t very friendly when you don’t speak their language. By language, I’m talking about the structure or schema of your database. What types of entities exist? What are the properties (or attributes) that relate to these entities? What are synonyms for these entities and attributes? Let’s say we have an entity called “patient”. We want to know if it has data attributes like “medications”, “doctors”, and “birthdate” associated with it. Finally, it is very useful to support synonyms you want linked to the attributes, like “age” for “birthdate”. This is the type of information that’s required to interact with your database using natural voice commands, without ever having to modify it. We don’t really care about any specific patient named Milo, or how he’s prescribed 325mg of Acetaminophen. All we care about is the schema, i.e., the knowledge that patients exist in your database and that medications are a property of a patient.


Figure 2: Example Schema – Entities, Attributes and Synonyms
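
To make Figure 2 concrete, here is one way such a schema could be written down. This is a minimal sketch in Python; the structure and field names are hypothetical and are not Aiqudo’s actual schema format.

```python
# Hypothetical schema for the healthcare example in Figure 2.
# Each entity maps its attributes to a list of synonyms.
# Note that it contains only structure; no actual patient data is ever shared.
healthcare_schema = {
    "patient": {
        "medications": ["medication", "meds", "prescriptions"],
        "doctor": ["provider", "physician"],
        "birthdate": ["age", "date of birth"],
        "insurer": ["insurance company"],
    },
    "doctor": {
        "patients": [],
        "phone number": ["contact number"],
    },
}
```

An empty synonym list simply means no synonyms have been provided for that attribute.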

Processing a command: Voice to Action


Figure 3: Knowledge Retrieval System – Structured Query Flow

So where does the schema fit into our knowledge retrieval system? Going back to the semantic parser mentioned earlier, we can break the processing phase down into four parts (a toy sketch of the full flow follows the list):

  1. Domain Classification: The first step is to classify the domain of a user query. If you’re working with multiple schemas, this narrows down which domain a given query belongs to (e.g., healthcare, banking).
  2. Keyword and Policy Extraction: Keyword extraction is where we identify potential references to an entity or attribute in the text you input. Policy extraction refers to the creation of entity-attribute mappings that define the order in which these terms are processed. For example, with “What’s the phone number of Milo’s doctor?”, we’re working with two different attributes, “phone number” and “doctor”. Are we looking for Milo’s phone number or Milo’s doctor’s phone number? Getting this order right is critical. The name “policy” is borrowed from the state-action policies of reinforcement learning, which this process loosely resembles.
  3. Schema Mapping: At this point we have some idea of what knowledge you want returned, but most likely your database won’t have any idea how to process that request. If you ask for a “doctor” when the database labels it “provider”, we’re at a dead end. Lucky for us, we have that schema you gave us earlier, so this won’t be an issue. We’ll know that “doctor” can be mapped to “provider” based on the similarity in meaning. The synonyms you provide help with this similarity computation, but similarity is not limited to the synonyms provided. Think of them as helpful hints that clarify what a given entity or attribute means: optional, but extremely useful. Depending on the domain we’re working with, synonyms may not need to be provided manually and could be generated automatically.
  4. Structured Query Translation: Finally, we’ve accumulated all the information we need, and we move to the last phase: translating the command into a form your database understands, whether it’s SQL, Cypher, or some other structured query language. If you prefer to keep your data private, this structured query is what we hand back for you to execute on your own database (using your own credentials, thus maintaining data privacy). In the event you share your data, you get full access to our Actionable Knowledge answers service, which includes the target data as well as downstream actions that connect your data to your personalized apps and services. For more information on Actionable Knowledge, check out this blog post.
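
To tie the four stages together, here is a toy end-to-end sketch for the command “Show me medication for Milo”. The tiny schema, the naive keyword matching and the SQL template are purely illustrative (and policy extraction for multi-attribute ordering is skipped); the production semantic parser is considerably more sophisticated.

```python
# Toy walk-through of the four stages; not the production semantic parser.
SCHEMAS = {
    "healthcare": {
        "patient": {"medications": ["medication", "meds", "prescriptions"],
                    "doctor": ["provider", "physician"]},
    },
}

def vocabulary(schema):
    # All attribute names and synonyms that a schema knows about.
    vocab = {}
    for entity, attrs in schema.items():
        for attr, synonyms in attrs.items():
            for term in [attr, *synonyms]:
                vocab[term] = (entity, attr)
    return vocab

def classify_domain(command, schemas):
    # 1. Domain classification: pick the schema with the most vocabulary overlap.
    words = command.lower().split()
    return max(schemas, key=lambda d: sum(w in vocabulary(schemas[d]) for w in words))

def extract_and_map(command, schema):
    # 2 + 3. Keyword extraction and schema mapping: resolve mentions to canonical
    # (entity, attribute) pairs; naively treat the trailing term as the entity value.
    words = command.lower().rstrip("?").split()
    vocab = vocabulary(schema)
    return [vocab[w] for w in words if w in vocab], words[-1].capitalize()

def to_sql(mapped, value):
    # 4. Structured query translation into something the database understands.
    entity, attr = mapped[0]
    return f"SELECT {attr} FROM {entity} WHERE name = '{value}';"

command = "Show me medication for Milo"
domain = classify_domain(command, SCHEMAS)
mapped, value = extract_and_map(command, SCHEMAS[domain])
print(to_sql(mapped, value))   # SELECT medications FROM patient WHERE name = 'Milo';
```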

Getting the information you need from your database shouldn’t be a difficult task just because you or your coworkers don’t know how to write structured queries. Let us help make your lives easier by making your data accessible in a language everyone knows.

Kenny Kang, Sunil Patil and Mark Maagdenberg

Voice for the Connected Worker

Natural Voice Recognition for Safety and Productivity in Industrial IoT


Voice for AssetCare


Jim Christian, Chief Technology and Product Officer, mCloud

It is estimated that there are 20 million field technicians operating worldwide. A sizable percentage of those technicians can’t always get to the information they need to do their jobs.

Why is that? After all, we train technicians, provide modern mobile devices and specialized apps, and send them to the field with stacks of information and modern communications. Smartphone sales in the United States grew from $3.8 billion in 2005 to nearly $80 billion in 2020. So why isn’t that enough?

One problem is that tools that work fine in offices don’t necessarily work for field workers. A 25-year-old mobile app designer forgets that a 55-year-old field worker cannot read small text on a small screen or see through the glare of natural light. In industrial and outdoor settings, field workers frequently wear gloves and other protective gear. Consider a technician who needs to enter data on a mobile device while outside in freezing weather. This worker could easily choose to wait until he’s back in his truck and can take off his gloves, and as a result not enter the data exactly right. Or a gloved technician may simply find it difficult to type on a mobile device at all.

A voice-based interface can be a great help in these situations. Wearable devices that respond to voice are becoming more common. For instance, RealWear makes a headset that is designed to be worn with a hardhat, and one model is intrinsically safe and can be used in hazardous areas. But voice interfaces have not become popular in industrial settings. Why is that?

We could look to the OODA loop – short for Observe, Orient, Decide, and Act – for insights. The OODA concept was developed by the U.S. Air Force as a mental model for fighter pilots, who need to act quickly. Understanding the OODA loop that applies in a particular situation helps you improve: to act more quickly and decisively. Field technicians don’t have life-and-death situations to evaluate, but the OODA loop still applies. The speed and accuracy of their work depends on their OODA loop for the task at hand.

Consider two technicians who observe an unexpected situation, perhaps a failed asset. John orients himself by taking off his gloves to call his office, then searches for drawings in his company’s document management system, and then calls his office again to confirm his diagnosis. Meanwhile, Jane orients herself by doing the same search, but talking instead of typing, keeping her eyes on the asset the whole time. Assuming that the voice system is robust, Jane is able to use her eyes and her voice at the same time, accelerating her Observe and Orient phases. Jane will do a faster, better job. A system where the Observe and Orient phases are difficult – John’s experience – will be seen as inferior and rejected by users, whereas Jane’s experience, with a short, easy OODA loop, will be acceptable.

A downside of speaking to a device is that traditional voice recognition systems can be painfully slow and limited. These systems recognize the same commands that a user would type or click with a mouse, but most people type and click much faster than they talk. Consider the sequence of actions required to take a picture and send it to someone on a smartphone using your fingers: open the camera app, take the picture, close the camera app, open the photo gallery, select the picture, select the share option, select a recipient, type a note, and hit send. That could be nine or ten distinct operations. Many people can do this rapidly with their fingers, even if it is a lot of steps. Executing that same sequence with an old-style, separate voice command for each step would be slow and painful, and most people would find it worse than useless.

The solution is natural voice recognition, where the voice system recognizes what the speaker intends and understands what “call Dave” means. Humans naturally understand that a phrase such as “call Dave” is shorthand for a long sequence (“pick up the phone”, “open the contact list”, “search for ‘Dave'”, etc.).  Natural voice recognition has come a long way in recent years and systems like Siri and Alexa have become familiar for personal use. Field workers often have their own shorthand for their industry, like “drop the transmission” or “flush the drum”, which their peers understand but Siri or Alexa don’t.

At mCloud, we see great potential in applying natural voice recognition to field work in industries such as oil & gas. Consider a field operator who is given a wearable device with a camera and voice control, and who is able to say things like, “take a picture and send it to John” or “take a picture, add a note ‘new corrosion under insulation at the north pipe rack’ and send to Jane” or “give me a piping diagram of the north pipe rack.”  This worker will have no trouble accessing useful information, and in using that information to orient himself to make good decisions. An informed field operator will get work done faster, with less trouble, and greater accuracy.

The U.S. Chemical Safety Board analyzes major safety incidents at oil & gas and chemical facilities. In a fair number of incidents, a contributing factor is field workers not knowing something or not having the right information. For instance, an isobutane release at a Louisiana refinery in 2016 occurred in part because field operators used the wrong procedure to remove the gearbox on a plug valve. There was a standard procedure, but about 3% of the plug valves in the refinery were an older design that required different steps to remove the gearbox. The field workers were wearing protective gear and followed a procedure that was correct for a different type of valve but wrong for the valve in front of them. Field workers like these generally have written procedures, but occasionally the work planner misses something or reality in the field is different than what was expected. This means that field workers need to adapt, perhaps by calling for help or looking up information such as alternate procedures.

Examples where natural voice recognition can help include finding information, calling other people for advice, recording measurements and observations, inspecting assets, stepping through repair procedures, describing the state of an asset along with recommendations and questions, writing a report about the work done, and working with other people to accomplish tasks. Some of these examples are ad hoc tasks, like taking a picture or deciding to call someone. Other examples are part of larger, structured jobs. An isolation procedure in a chemical plant or replacing a transmission are examples of complex procedures with multiple steps that can require specialized information or where unexpected results from one step may require the field worker to get re-oriented, find new information, or get help.

Aiqudo has powerful tech for natural voice recognition and mCloud is pleased to be working with Aiqudo to apply this technology. Working together, we can help field workers get what they need by simply asking for it, talk to the right people by simply asking for help, confirm their status in a natural way, and in general get the right job done, effectively and without mistakes.


This post is authored by Jim Christian, Chief Technology and Product Officer, mCloud.

Aiqudo and mCloud recently announced a strategic partnership that brings natural Voice technology into emerging devices, such as smart glasses, to support AR/VR and connected worker applications, and also into new domains such as industrial IoT, remote support and healthcare.

AI Neural Networks

Enhancing Aiqudo’s Voice AI Natural Language Understanding with Deep Learning


Aiqudo provides the most extensive library of voice-triggered actions for mobile apps and other IoT devices. At this difficult time of Covid-19, voice is becoming mandatory as more organizations see the need for contactless interactions. To further improve the performance of Aiqudo voice, we enhanced our unique Intent Matching using Semiotics with Deep Learning (DL) for custom Named Entity Recognition (NER) and Part of Speech (POS) tagging.

The task in question was to recognize the relevant named entities in users’ commands. This task is known as Named Entity Recognition (NER) in the Natural Language Processing (NLP) community. For example, ‘play Adele on Youtube’ involves two named entities, ‘Adele’ and ‘Youtube’. Extracting both entities correctly is critical for understanding the user’s intent, retrieving the right app and executing the correct action. Publicly available NER tools, such as NLTK, spaCy and Stanford NLP, proved unsuitable for our purposes for the following reasons:

  1. they often made mistakes, especially when processing the short sentences typically seen in user commands
  2. they assigned generic labels, such as ‘Organization’ for ‘Youtube’ and ‘Person’ for ‘Adele’, as opposed to the entity types we need within this command context, which are ‘App’ and ‘Artist’
  3. they don’t provide the granularity we need. As we support a very broad set of verticals or domains, our granularity needs for parameter types are very high: we need to identify almost 70 different parameter types in total (and this continues to grow). It’s not enough for us to identify a parameter as an “Organization”; we need to know if it is a “Restaurant”, a “Business” or a “Stock ticker”

Part of Speech (POS) tagging is another essential input for both NER and action retrieval, but, again, public POS taggers such as NLTK, spaCy and Stanford NLP don’t work well for short commands. The situation gets worse for verbs such as ‘show’, ‘book’, ‘email’ and ‘text’, which are typically mis-tagged as nouns by most existing POS taggers. We therefore needed to develop our own custom NER module that also produces more accurate POS information.

Fortunately, we already had a database of 13K+ commands relating to actions already in our platform, and this provided the training data to build an integrated DL model. Example commands (with parameters extracted) in our database included ‘play $musicQuery on $mobileApp’, ‘Show my $shoppingList’, ‘Navigate from $fromLocation to $toLocation’, etc. (our named entity types start with ‘$’). For each entity, we created a number of realistic values, such as ‘grocery list’ and ‘DIY list’ for ‘$shoppingList’, and ‘New York’ and ‘Atlanta’ for ‘$fromLocation’. We created around 3.7 million instantiated queries, e.g., ‘play Adele on Youtube’, ‘Show my DIY list’, and ‘Navigate from New York to Atlanta’. We then used existing POS tools to label all words, chose the most popular POS pattern for each template, and finally labelled each relevant query accordingly.
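
A simplified sketch of that template-instantiation step is shown below. The templates, value lists and labelling scheme are illustrative only, and the per-template POS labelling described above is omitted.

```python
import itertools

# Illustrative command templates and entity values; the real training corpus is
# built from 13K+ on-boarded commands and far richer value lists.
TEMPLATES = [
    "play $musicQuery on $mobileApp",
    "show my $shoppingList",
    "navigate from $fromLocation to $toLocation",
]
VALUES = {
    "$musicQuery": ["Adele", "jazz"],
    "$mobileApp": ["Youtube", "Spotify"],
    "$shoppingList": ["grocery list", "DIY list"],
    "$fromLocation": ["New York", "Atlanta"],
    "$toLocation": ["Atlanta", "Boston"],
}

def instantiate(template):
    """Yield (query, token_labels) pairs, labelling each token with its slot or 'O'."""
    slots = [t for t in template.split() if t.startswith("$")]
    for combo in itertools.product(*(VALUES[s] for s in slots)):
        chosen = dict(zip(slots, combo))
        filled, labels = [], []
        for token in template.split():
            if token.startswith("$"):
                for word in chosen[token].split():
                    filled.append(word)
                    labels.append(token)   # every word of "New York" -> $fromLocation
            else:
                filled.append(token)
                labels.append("O")         # outside any named entity
        yield " ".join(filled), labels

for query, labels in instantiate(TEMPLATES[0]):
    print(query, labels)
# play Adele on Youtube ['O', '$musicQuery', 'O', '$mobileApp']
# play Adele on Spotify ['O', '$musicQuery', 'O', '$mobileApp']
# ...
```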

To make the data understandable to a neural network, we then needed to represent each word or token digitally, i.e. as vectors of a certain dimension. This is called word embedding. We tried several embedding methods, including a Transformer tokenizer, ELMo, Google 300d, GloVe, and random embeddings of different dimensions. A pre-trained transformer produced the best results but required the most expensive computing resources, such as a GPU. ELMo produced the second-best results but also needed a GPU for efficient computing time. Random embeddings of 64 dimensions worked well on a CPU and produced results comparable to ELMo, while being far less expensive. Such tradeoffs are critical when you go from a theoretical AI approach to rolling AI techniques into production at scale.

Our research and experiments were based on the state-of-the-art DL NER architecture of a residual bidirectional LSTM. We integrated two related tasks: POS tagging and multi-label, multi-class classification of potential entity types. Our present solution is therefore a multi-input, multi-output DL model. The neural architecture and data flow are illustrated in Fig. 1. The input module takes the user’s speech and transforms it into text; the embedding layer represents the text as a sequence of vectors; the two bidirectional layers capture important recurrent patterns in the sequence; the residual connection restores some lost features; these patterns and features are then used for labelling named entities and creating POS tags, or are flattened for global classification of entity (parameter) types.


Fig. 1 Neural architecture for Aiqudo Multitask Flow
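
For readers who prefer code to diagrams, the sketch below approximates the architecture of Fig. 1 in PyTorch. The layer sizes, label counts and the mean-pooling used for the “flattened” classification branch are our own placeholders, not the exact production design.

```python
import torch
import torch.nn as nn

class MultiTaskTagger(nn.Module):
    """Toy approximation of the residual BiLSTM multi-task model in Fig. 1."""

    def __init__(self, vocab_size, num_ner, num_pos, num_param_types,
                 emb_dim=64, hidden=128):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, emb_dim)           # random 64-d embeddings
        self.bilstm1 = nn.LSTM(emb_dim, hidden, batch_first=True, bidirectional=True)
        self.bilstm2 = nn.LSTM(2 * hidden, hidden, batch_first=True, bidirectional=True)
        self.ner_head = nn.Linear(2 * hidden, num_ner)           # per-token entity labels
        self.pos_head = nn.Linear(2 * hidden, num_pos)           # per-token POS tags
        self.type_head = nn.Linear(2 * hidden, num_param_types)  # sentence-level, multi-label

    def forward(self, token_ids):
        x = self.embed(token_ids)             # (batch, seq, emb_dim)
        h1, _ = self.bilstm1(x)               # (batch, seq, 2*hidden)
        h2, _ = self.bilstm2(h1)
        h = h1 + h2                           # residual connection restores earlier features
        ner_logits = self.ner_head(h)         # label each token with an entity tag
        pos_logits = self.pos_head(h)         # label each token with a POS tag
        pooled = h.mean(dim=1)                # "flattened" global view of the command
        type_logits = self.type_head(pooled)  # multi-label parameter-type classification
        return ner_logits, pos_logits, type_logits

# Illustrative sizes: ~70 parameter types as mentioned above; 17 is a common POS tagset size.
model = MultiTaskTagger(vocab_size=20000, num_ner=70, num_pos=17, num_param_types=70)
tokens = torch.randint(0, 20000, (1, 5))  # e.g. ids for "Whatsapp text Rodrigo good morning"
ner_logits, pos_logits, type_logits = model(tokens)
```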

One real-life scenario would be as follows: a user wants to greet his friend Rodrigo on Whatsapp. He issues the following command verbally to his phone: ‘Whatsapp text Rodrigo good morning’ (not a well-formed command, but this is common in practice). Each word in his speech is mapped to a token integer, by which a 64-dimensional vector is indexed; the resulting sequence of vectors goes through the two bidirectional LSTM layers and the residual connection layer; the network outputs parameter-value pairs and POS tags as a sequence, and, on another branch, is flattened to output parameter types. Our platform now has all the information needed to pass on to the next Natural Language Understanding (NLU) component in our system (see Figure 2), to fully understand the user’s intent and execute the correct action for them.


Fig. 2 Aiqudo Online Intent Pipeline

Before we could go live in production, we needed to test the performance of the pipeline thoroughly. We devised 600k test scenarios spanning 114 parameter distributions and covering a range of command lengths, from very short 2-term commands to much longer 15-term commands. We also focused on out-of-vocabulary parameter terms (terms that do not occur in the training data, such as names of cities and movies) to ensure that the model could handle these as well.

Analysis of this approach in conjunction with the Aiqudo platform showed how it improved platform performance: The general entity recall ratio increased by over 10%. This integrated multitask model specifically fits well with Aiqudo’s requirements:

  1. The model was trained on our own corpus and produces entities and POS tags compatible with our on-boarded mobile app commands
  2. The three relevant tasks share most hidden layers and better weight optimization can therefore be achieved very efficiently
  3. The system can be easily adapted to newly on-boarded actions by expanding or adjusting the training corpus and/or annotating tags
  4. The random embedding model runs fast enough even on CPUs and produces much better results than publicly available NLP tools

We plan to continue to use DL where appropriate within our platform to complement and augment our existing Semiotics-based NLU engine. Possible future work includes: 

  1. extending the solution to other languages (our system has commands on-boarded in several languages to use for training)
  2. tagging information and multi-label outputs haven’t been explicitly utilized as yet; we plan to leverage this information to further improve NER performance 
  3. the DL model can be further expanded by integrating it with other subtasks such as predicting relevant mobile apps from commands and/or actions. 

This flexible combination of Semiotics, Deep Learning and grammar-based algorithms will power even more capable Aiqudo voice services in the future.

Xiwu Han, Hudson Mendes and David Patterson – Aiqudo R&D

Q Actions 1.6.2 just released to App Store!


New Q Actions version now in the App Store

This version of Q Actions features contextual downstream actions, integration with your calendar, and under-the-bonnet improvements to our matching engines. Q Actions helps users power through their day by being more useful and thoughtful.

Contextual Awareness

Q Actions understands the context when performing your actions. Let’s say you call a contact in your phonebook with the command “call Tiffany”. You can then follow-up with the command “navigate to her house”. Q Actions is aware of the context based on your previous command and is able to use that information in a downstream action.


  • say “call Tiffany”
    • then “navigate to her house”

Calendar Integration


Stay on top of your schedule and daily events with the recently added Calendar actions. Need to see what’s coming up next? Just ask “when is my next meeting?” and Q Actions will return a card with all the important event information. Need to quickly schedule something on your calendar? Say “create a new event” and after a few questions, your event is booked. On the go and need to join a video conferencing meeting? Simply say “join my next meeting” and Q Actions will take you directly to your meeting in Google Meet. All you have to do from there is confirm your camera/audio settings and join!

  • “when is my next meeting?”
  • “create a new event”
  • “join my next meeting”

Simply do more with voice! Q Actions is now available on the App Store.

Internships: The New Normal



Intern Voice: Kenny Kang

Working at a startup can be described as ‘interesting’, but in the best way possible. As a comparison, my summer roommate interned at a larger corporate company, and we developed two completely different ideas of what a ‘normal’ working environment is. Apparently, it isn’t ‘normal’ for an internship project to become a feature in the company’s main product. It also isn’t ‘normal’ to have the opportunity to present directly to C-suite executives. And it definitely isn’t ‘normal’ to be talking to your CEO about his wild college days during company outings. I could write entire essays about all the reasons I loved working at Aiqudo, but there was one that always made my friends question my sanity: I kept describing the work itself as ‘fun’! Even for a startup, I’m not sure how normal that is.

During my internship, I created a question answering service that dealt with knowledge-based queries, for example, “How old is Tom Brady?” or “Which movies were Laurence Fishburne and Keanu Reeves in together?”. While the problem itself was interesting, it was the freedom I had that made it really engaging. Since there isn’t always a straightforward solution to Natural Language Processing (NLP) problems, I needed to constantly approach the next obstacle in new ways, such as using certain tools unconventionally or reading up on the latest research. Each new day felt like solving a new puzzle, and that’s what made it so consistently enjoyable! Of course, I ran into plenty of issues that seemed impossible to get around, but luckily, I had an amazing mentor, Sunil, who was always there to point me in the right direction.

This past summer has been an incredible experience. I came in thinking I would leave with a few new skills. Not only did I learn several valuable skills, I also developed a newfound confidence in my ability to think through complex problems, and set a higher bar for any company I’d want to work for in the future.

Q Actions 2.0

Do more with Voice! Q Actions 2.0 now available on Google Play


Do more with Voice

Q Actions 2.0 is here. With this release, we wanted to focus on empowering users throughout their day. As voice is playing a more prevalent part in our everyday lives, we’re uncovering more use cases where Q Actions can be of help. In Q Actions 2.0, you’ll find new features and enhancements that are more conversational and useful.

Directed Dialogue™

Aiqudo believes the interaction with a voice assistant should be casual, intuitive, and conversational. Q Actions understands naturally spoken commands and is aware of the apps installed on your phone, so it will only return personalized actions that are relevant to you. When a bit more information is required from you to complete a task, Q Actions will guide the conversation until it fully understands what you want to do. Casually chat with Q Actions and get things done.

Sample commands:

  • “create new event” (Google Calendar)
  • “message Mario” (WhatsApp, Messenger, SMS)
  • “watch a movie/tv show” (Netflix, Hulu)
  • “play some music” (Spotify, Pandora, Google Play Music, Deezer)

Q Cards™

In addition to providing relevant app actions from personal apps that are installed on your phone, Q Actions will now display rich information through Q Cards™. Get up-to-date information from cloud services on many topics: flight status, stock pricing, restaurant info, and more. In addition to presenting the information in a simple and easy-to-read card, Q Cards™ support Talkback and will read aloud relevant information.

Sample commands:

  • “What’s the flight status of United 875?”
  • “What’s the current price of AAPL?”
  • “Find Japanese food”

Voice Talkback™

There are times when you need information but do not have the luxury of looking at a screen. Voice Talkback™ is a feature that reads aloud the critical snippets of information from an action. This enables you to continue to be productive, without the distraction of looking at a screen. Execute your actions safely and hands-free.

Sample commands:

  • “What’s the stock price of Tesla?” (E*Trade)
    • Q: “Tesla is currently trading at $274.96”
  • “Whose birthday is it today?” (Facebook)
    • Q: “Nelson Wynn and J Boss are celebrating birthdays today”
  • “Where is the nearest gas station?”
    • Q: “Nearest gas at Shell on 2029 S Bascom Ave and 370 E Campbell Ave, 0.2 miles away, for $4.35”

Compound Commands

As an enhancement to our existing curated Action Recipes, users can now create Action Recipes on the fly using Compound Commands. Simply join two of your favorite actions into a single command using “and”. This gives users the ability to create millions of Action Recipe combinations from our database of 4,000+ actions.

Sample commands:

  • “Play Migos on Spotify and set volume to max”
  • “Play NPR and navigate to work”
  • “Tell Monica I’m boarding the plane now and view my boarding pass”

Simply do more with voice! Q Actions is now available on Google Play.


Q Actions – Task completion through Directed Dialogue™


When an action or a set of actions requires specific input parameters, Directed Dialogue™ allows the user to supply the required information through a very simple, natural back-and-forth conversation. Enhanced with parameter validation and user confirmation, Directed Dialogue™ allows complex tasks to be performed with confidence. Directed Dialogue™ is not about open-ended conversations; it is about getting things done, simply and efficiently.

With Q Actions, Directed Dialogue™ is automatically enabled for every action in the system, because we know the semantic requirements of each and every action’s parameters. It is not constrained, and applies to all actions across all verticals.

Another application of Directed Dialogue™ is input refinement. Let’s say I want to purchase batteries. If I just say, “add batteries to my shopping cart”, I can get the wrong product added to my cart, as on Alexa, which does the wrong thing for a new product order (the right thing happens on a reorder). With Q Actions, I can provide the brand Duracell and the type 9V 4-pack through very simple Directed Dialogue™, and exactly the right product is added to my cart – in the Amazon or Walmart app.

Get Q Actions today.


AI for Voice to Action – Part 2: Machine Learning Algorithms


My last post discussed the important step of automatically generating vast amounts of relevant content relating to commands to which we apply our machine learning algorithms. Here I want to delve into the design of our algorithms.

Given a command, our algorithms need to:

  1. Understand the meaning and intent behind the command
  2. Identify and extract parameters from it
  3. Determine which app action is most appropriate
  4. Execute the chosen action and pass the relevant parameters to the action

This post and the next one will address point 1. The other points will be covered in subsequent posts.

So how do we understand what a user means based on their command? Commands are typically short (3 or 4 terms), which makes it very difficult to disambiguate among the multiple meanings a term can have. If someone says “search for Boston”, do they want directions to a city or do they want to listen to a rock band on Spotify? In order to disambiguate among all the possibilities we need to know a) whether any of the command terms can have different meanings, b) what those meanings are, and finally c) which is the correct one given the context.

Semiotics

In order to do this we developed a suite of algorithms which feed off the data we generated previously (See post #3). These algorithms are inspired by semiotics, the study of how meaning is communicated. Semiotics originated as a theory of how we interpret the meaning of signs and symbols. Given a sign in one context, for example a flag with a skull and crossbones on it, you would assign a particular meaning to it (i.e. Pirates).

Pirate Symbol

Whereas if you change the context to a bottle, the meaning changes completely:


Poison – do not drink!

Linguists took these ideas and applied them to language: given a term (e.g. ‘window’), its meaning can change depending on the meanings of the words around it in the sentence (a physical window in a room, a software window, a window of opportunity, etc.). By applying these ideas to our data we can understand the different meanings a term can have based on its context.
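
As a toy illustration of context-driven meaning, the sketch below picks a sense of “Boston” (from the “search for Boston” example above) by overlapping the command’s other words with hand-written sense signatures. Aiqudo’s semiotic algorithms learn these relationships automatically from the generated data; the signatures and the simple overlap score here are invented purely to show the idea.

```python
# Toy, Lesk-style sense selection by context overlap; not Aiqudo's algorithm.
SENSES = {
    "boston": {
        "city": {"directions", "drive", "flight", "map", "hotel", "navigate"},
        "band": {"play", "listen", "song", "album", "spotify"},
    },
}

def disambiguate(term, command):
    """Return the sense whose signature overlaps most with the command's other words."""
    context = set(command.lower().split()) - {term}
    return max(SENSES[term], key=lambda sense: len(SENSES[term][sense] & context))

print(disambiguate("boston", "play Boston on Spotify"))  # band
print(disambiguate("boston", "navigate to Boston"))      # city
```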

Discourse Communities

We also drew inspiration from discourse communities. A discourse community is a group of people involved in and communicating about a particular topic. They tend to use the same language for important concepts (sometimes called jargon) within their community, and these terms have a specific, understood and agreed meaning within the community to make communication easier. For example, members of a cycling community have their own set of terms, fairly unique to them, that they all understand and adhere to. If you want to see what I mean, go here and learn the meanings of such terms as an Athena, a Cassette, a Chamois (very important!) and many others. Similarly, motor enthusiasts have their own ‘lingo’. If you want to be able to differentiate your AWS from your ABS and your DDI from your DPF, then get up to speed here.

Our users use apps, so we would expect to discover gaming discourses, financial discourses, music discourses, social media discourses and so on. Our goal was to develop a suite of machine learning algorithms that could automatically identify these communities through their important jargon terms. By identifying the jargon terms we can build a picture of the relationship between these terms and the other terms used by each discourse community within our data. A characteristic of jargon words is that they have a very narrow meaning within a discourse compared to other terms. For example, the term ‘computer’ is a very general term that can have multiple meanings across many discourses: programming, desktop, laptop, tablet, phone, firmware, networks, etc. ‘Computer’ isn’t a very good example of a jargon term, as it is too general and broad in meaning. We want to identify narrow, specific terms that have a very precise meaning within a single discourse, e.g. a specific type of processor, or a motherboard. Our algorithms do a remarkable job of identifying these jargon terms and are foundational to our ability to extract meaning, precisely understand user commands and thereby the real intent that lies behind them.
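
One simple way to see “narrow meaning within a discourse” statistically is to measure how concentrated a term’s usage is across discourse communities, for example with a normalized-entropy score. The counts below are made up and the score is only a toy proxy for the idea, not Aiqudo’s actual algorithm.

```python
import math

# Hypothetical counts of how often each term appears in documents from different
# discourse communities; the numbers are invented purely for illustration.
TERM_COUNTS = {
    "computer": {"programming": 40, "gaming": 35, "finance": 30, "music": 25},
    "chamois":  {"cycling": 60, "programming": 1},
    "cassette": {"cycling": 50, "music": 5},
    "dpf":      {"automotive": 55},
}

def specificity(counts):
    """1 - normalized entropy: near 1.0 means concentrated in one community (jargon-like),
    near 0.0 means spread evenly across communities (too broad)."""
    if len(counts) == 1:
        return 1.0
    total = sum(counts.values())
    probs = [c / total for c in counts.values()]
    entropy = -sum(p * math.log(p) for p in probs if p > 0)
    return 1.0 - entropy / math.log(len(counts))

for term in sorted(TERM_COUNTS, key=lambda t: -specificity(TERM_COUNTS[t])):
    print(f"{term:10s} specificity = {specificity(TERM_COUNTS[term]):.2f}")
# 'dpf' and 'chamois' score near 1 (good jargon candidates for their communities);
# 'computer' scores near 0 (too broad to anchor any single discourse).
```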

In my next post I will go into the details behind the algorithms that enable us to identify these narrow-meaning, community-specific jargon terms and ultimately to build a model that understands the meaning and intent behind user queries.