At Aiqudo two critical problems we solve in voice control are the action discovery problem and the cognitive load problem.
In my first post I discussed how using technology to overcome the challenges of bringing voice control into the mainstream motivated me to get out of bed in the morning. I get a kick out of seeing someone speaking naturally to their device and smiling when it does exactly what they wanted.
In our second post in the series we discussed how Aiqudo has built the the largest (and growing) mobile app action index in the world and our process for on-boarding actions. On-boarding an action only takes minutes – there is no programming involved and we are not reliant on the app developer to set this up or provide an API. This enables enormous scalability of actions compared to the Amazon and Google approaches that rely on a programming solution where developers are required to code to these platforms, add specific intents, and go through a painful approval process.
In this post I wanted to start to elaborate on our overall approach and discuss specifically how we create the large amounts of content for our patented machine learning algorithms to analyze, in order to be able to understand a user’s intent. Ours is a significant achievement since even large teams are facing challenges in solving this problem in a generic fashion – as the following quote from Amazon shows.
“The way we’re solving that is that you’ll just speak, and we will find the most relevant skill that can answer your query … The ambiguity in that language, and the incredible number of actions Alexa can take, that’s a super hard AI problem.” – Amazon
At Aiqudo, we have already solved the challenge that Amazon is working on. Our users don’t have to specify which app to use and we automatically pick the right actions for their command thereby reducing the cognitive load for the user.
The starting point for generating the content we need is the end of the action on-boarding process, when a few sample commands are added to the action. These training commands enable us to start the machine learning processes that enable us to
- extract the correct meaning from the natural language command
- understand the intent; and
- execute the correct action on the best app
The first step in this process is to gather content relating to each command on-boarded (command content). As is typical with machine learning approaches we are data hungry – the more data we have, the better our performance. Therefore we use numerous data repositories specific to on-boarded commands and apps and interrogate them to identify related content that can be used to augment the language used in the command.
Content augmentation removes noise and increases the semantic coverage of terms
Teaching a machine to correctly understand what a user intends from just a few terms in a command is problematic (as it would be for a human) – there isn’t enough context to fully understand the command – e.g. ‘open the window’ – is this a software related command or a command related to a room? Augmenting the command with additional content adds a lot more context for the algorithms to better understand meaning and intent. This augmented content forms the basis of a lexicon of terms relating to each on-boarded command. Later, when we apply our machine learning algorithms this provides the raw data to enable us to build and understand meaning – e.g. we can understand that a movie is similar to a film, rain is related to weather, the term ‘window’ has multiple meanings and so on.
It is equally important that each command’s lexicon is highly relevant to the command and low in noise – for this reason we automatically assess each term within the lexicon to determine its relevance and remove noise. Once we have the low noise lexicon this becomes a final lexicon of terms relating to each command. We then generate multiple command documents from the lexicon for each command. Each command document is generated by selecting terms based on the probability of its occurrence within the command’s lexicon. The more likely a term occurs within the command’s lexicon, the more likely it is to occur in a command document. Note by doing this we are synthetically creating documents which do not make sense to a human, but are a reflection of the probabilities of occurrence of terms in the command’s lexicon. It is these synthetically created command documents which we use to train our machine learning algorithms to understand meaning and intent. Because these are synthetically generated we can also control the number of command documents we create to fine tune the learning process.
Once we have carefully created a relevant command lexicon and built a repository of documents which relate to each command that has been on-boarded, we are ready to analyze the content, identify topics and subtopics, disambiguate among the different meanings words have and understand contextual meaning. Our innovative content augmentation approach allows us to quickly deploy updated machine learned models that can immediately match new command variants, so we don’t have to wait for large numbers of live queries for training as with other approaches.
The really appealing thing about this approach is it is language agnostic – it allows us to facilitate users speaking in any language by interrogating multilingual content repositories. Currently we are live in 12 markets in 7 languages and and are enabling new languages. We’re proud of this major accomplishment in such a short timeframe.
In my next post in this series, I will say a little more about the machine learning algorithms we have developed that have enabled us to build such a scalable, multi-lingual solution.