A while back, a friend bought an Alexa speaker. He was excited about the prospect of speaking to his device and getting cool things done without leaving the comfort of his chair. A few weeks later, when I next saw him, I asked how he was getting on with it, and his reply was insightful and typical of the problems current voice platforms pose.
When he first plugged it in, after asking the typical questions everyone does (‘what is the weather’ and ‘play music by Adele’), he set about seeing what other useful things he could do. He quickly discovered that it wasn’t easy to find out which third-party skills were integrated with Alexa (I call this the action discovery problem). When he found a resource that provided this information, he went about adding skills – local news headlines, a joke teller, Spotify (which required registration), quiz questions and so on. Then he hit his next problem: to use these skills, he had to learn a very specific set of commands. This was fine for two or three skills, but it soon became overwhelming. He found himself forgetting the precise phrasing each skill required and quickly became frustrated (the cognitive load problem).
Last week, when I saw him again, he had given the speaker to his son, who was using it as a music player in his bedroom. Once the initial ‘fun’ of the device wore off, it became apparent that it offered him very little real utility. While some skills had value, it was painful to discover them in the first place, add them to Alexa and then remember the specific commands to execute them…
The reason I found this so interesting is that these are precisely the problems we have solved at Aiqudo. Our goal is to provide consumers with a truly natural voice interface to actions, starting with all the functionality in their phone apps, without their having to remember the specific commands needed to execute them. For example, if I want directions to the SAP Centre in San Jose to watch the Sharks, I might say ‘navigate to the SAP Centre’, ‘I want to drive to the SAP Centre’ or ‘directions to the SAP Centre’. Since a user might say any of these, or other variants, they should all just work. Constraining users to learn the precise form of a command just frustrates them and provides a poor user experience. To get the maximum utility from voice, we need to understand the meaning and intent behind the command, irrespective of what the user says, and be able to execute the right action.
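To make the idea concrete, here is a deliberately tiny sketch of how several surface phrasings can resolve to a single intent. The pattern table, function name and trigger phrases are all hypothetical illustrations, not Aiqudo's actual matching, which is far more sophisticated than substring lookup:

```python
# Hypothetical sketch: different phrasings resolving to one navigation intent.
# The point is that the surface wording should not matter to the user.

INTENT_PATTERNS = {
    "navigate": ["navigate to", "directions to", "drive to", "take me to"],
}

def resolve_intent(command: str):
    """Return (intent, argument) for a spoken command, or (None, None)."""
    text = command.lower()
    for intent, triggers in INTENT_PATTERNS.items():
        for trigger in triggers:
            if trigger in text:
                # Everything after the trigger phrase is the argument.
                argument = text.split(trigger, 1)[1].strip()
                return intent, argument
    return None, None

for cmd in ["navigate to the SAP Centre",
            "I want to drive to the SAP Centre",
            "directions to the SAP Centre"]:
    print(resolve_intent(cmd))  # each prints ('navigate', 'the sap centre')
```

All three variants land on the same intent and argument, which is exactly the behaviour a user intuitively expects; a real system would of course use semantic understanding rather than a fixed trigger list.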
So how do we do it?
There is no simple answer, so we plan to cover the main points in a series of blog posts over the coming weeks. These will focus at a high level on the processes, the technology, the challenges and the rationale behind our approach. Our process has two main steps:
- Understand the functionality available in each app, and on-board these actions into our Action Index.
- Understand the intent of a user’s command, and then automatically execute the correct action.
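The two steps above can be sketched in miniature. Everything here is a hypothetical illustration: the toy ‘Action Index’ is just a dictionary of example commands (standing in for step 1), and the matcher scores a live command by word overlap against those examples (standing in for step 2); the real platform is far richer:

```python
# Step 1 (toy): an "Action Index" mapping actions to example commands.
ACTION_INDEX = {
    "maps.navigate":  ["navigate to destination", "directions to place"],
    "music.play":     ["play music by artist", "play a song"],
    "news.headlines": ["read the local news headlines"],
}

def best_action(command: str):
    """Step 2 (toy): pick the action whose examples best overlap the input."""
    words = set(command.lower().split())
    scored = []
    for action, examples in ACTION_INDEX.items():
        # Score each action by its best word overlap across example commands.
        score = max(len(words & set(example.split())) for example in examples)
        scored.append((score, action))
    score, action = max(scored)
    return action if score > 0 else None

print(best_action("play music by Adele"))  # music.play
```

The important property is that the heavy lifting lives in building the index; the user just says what they want, and the system picks the action.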
In step 1, by doing the ‘heavy lifting’ of understanding the functionality available across the app ecosystem, we overcome the action discovery problem my friend had with his Alexa speaker. Users can simply say what they want to do, and we find the best action to execute automatically; the user doesn’t need to do anything. In fact, if they don’t have an appropriate app on their device for the command they’ve just issued, we recommend one, and they can install it!
Similarly, in step 2, by giving users the freedom to speak naturally and phrase commands however they wish, we overcome the second problem with Alexa, the cognitive load problem: users no longer have to remember very specific commands to execute actions. Voice should be the most intuitive user interface – just say what you want to do. We built the Aiqudo platform to understand the wide variety of ways users might phrase their commands, allowing them to go from voice to action easily and intuitively. And did I mention that the Aiqudo platform is multilingual, supporting natural-language commands in whatever language the user chooses to speak?
So, getting back to my initial question – what motivates me to get out of bed in the morning? – I’m excited to use technology to bring the utility of the entire app ecosystem to users all over the world, so they can speak naturally to their devices and get stuff done without having to think about it!
In the next post in this series, we’ll talk about step 1 – making the functionality in apps available to users.