Voice Enable System Settings with Q Actions 1.3.3!


Somewhere in the Android Settings lies the option to turn on Bluetooth, turn off Wifi, or change sound preferences. These options are usually buried deep under menus and sub-menus. Discoverability is an issue, and reaching an option usually means multiple taps within the Settings app. Yes, there’s a search bar within the Settings app, but it’s clunky, requires typing and only returns exact matches. Some of these options are accessible through the quick settings bar, but the discovery and navigation issues remain.

In the latest release, simply tell Q Actions what System Settings you want to change. Q Actions can now control your Bluetooth, Wifi, music session, and sound settings through voice.

Configure your Settings:

  • “turn on/off bluetooth”
  • “turn wifi on/off”

Control your music:

  • “play next song”
  • “pause music”
  • “resume my music”

Toggle your sound settings:

  • “enable do not disturb”
  • “mute ringer”
  • “increase the volume”
  • “put my phone on vibrate”

In addition to placing calls to your Contacts, Q Actions helps you manage Contacts via voice. Easily add a recent caller as a contact in your phonebook or share a friend’s contact info with simple commands. If you have your contact’s address in your Contacts, you can also get directions to the address using your favorite navigation app.

Place calls to Contacts:

  • “call Jason Chen”
  • “dial Mario on speaker”

Manage and share your Contacts:

  • “save recent number as Mark Johnson”
  • “edit Helen’s contact information”
  • “share contact info of Daniel Phan”
  • “view last incoming call”

Bridge the gap between your Contacts and navigation apps:

  • “take me to Rob’s apartment”
  • “how do I get to Mike’s house?”

Unlock your phone’s potential with voice! Q Actions is now available on Google Play.


AI for Voice to Action – Part 1: Data


At Aiqudo, two critical problems we solve in voice control are the action discovery problem and the cognitive load problem.

In my first post I discussed what motivates me to get out of bed in the morning: using technology to overcome the challenges of bringing voice control into the mainstream. I get a kick out of seeing someone speak naturally to their device and smile when it does exactly what they wanted.

In our second post in the series we discussed how Aiqudo has built the largest (and growing) mobile app action index in the world, and our process for on-boarding actions. On-boarding an action takes only minutes – there is no programming involved, and we are not reliant on the app developer to set this up or provide an API. This enables enormous scalability of actions compared to the Amazon and Google approaches, which rely on a programming solution: developers are required to code to these platforms, add specific intents, and go through a painful approval process.

In this post I want to start to elaborate on our overall approach, and specifically to discuss how we create the large amounts of content that our patented machine learning algorithms analyze in order to understand a user’s intent. This is a significant achievement, since even large teams are facing challenges in solving this problem in a generic fashion – as the following quote from Amazon shows.

“The way we’re solving that is that you’ll just speak, and we will find the most relevant skill that can answer your query … The ambiguity in that language, and the incredible number of actions Alexa can take, that’s a super hard AI problem.” – Amazon

At Aiqudo, we have already solved the challenge that Amazon is working on. Our users don’t have to specify which app to use – we automatically pick the right actions for their command, thereby reducing the cognitive load on the user.

The starting point for generating the content we need is the end of the action on-boarding process, when a few sample commands are added to the action. These training commands kick off the machine learning processes that enable us to:

  1. extract the correct meaning from the natural language command;
  2. understand the intent; and
  3. execute the correct action on the best app.

The first step in this process is to gather content relating to each on-boarded command (command content). As is typical with machine learning approaches, we are data-hungry – the more data we have, the better our performance. We therefore interrogate numerous data repositories specific to the on-boarded commands and apps, identifying related content that can be used to augment the language used in the command.
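As a rough sketch of what this augmentation step can look like (the corpus, stopword list and matching rule below are invented for illustration – the actual repositories and relevance logic are Aiqudo’s own), candidate lexicon terms can be mined from documents that share content words with a command:

```python
from collections import Counter

# Toy stopword list; a real pipeline would use a proper one.
STOPWORDS = {"to", "the", "a", "on", "and", "with", "of", "get", "by"}

def augment_command(command, corpus):
    """Collect candidate lexicon terms for a command by gathering
    terms that co-occur with the command's content words in
    related documents."""
    cmd_terms = {t for t in command.lower().split() if t not in STOPWORDS}
    counts = Counter()
    for doc in corpus:
        doc_terms = [t for t in doc.lower().split() if t not in STOPWORDS]
        if cmd_terms & set(doc_terms):  # doc mentions the command's terms
            counts.update(t for t in doc_terms if t not in cmd_terms)
    return counts

corpus = [
    "get directions to a destination with the navigation app",
    "turn by turn navigation and live traffic on the route",
    "open the window to let some air into the room",
]
candidates = augment_command("directions to work", corpus)
# "navigation" and "destination" surface as related terms, while the
# unrelated "window" document contributes nothing.
```

Even in this toy version, documents unrelated to the command add no terms – a first, crude form of the noise control the pipeline relies on.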

Content Augmentation for Machine Learning

Content augmentation removes noise and increases the semantic coverage of terms

Teaching a machine to correctly understand what a user intends from just a few terms in a command is problematic (as it would be for a human) – there isn’t enough context to fully understand the command. For example, is ‘open the window’ a software-related command or a command related to a room? Augmenting the command with additional content gives the algorithms much more context for understanding meaning and intent. This augmented content forms the basis of a lexicon of terms relating to each on-boarded command. Later, when we apply our machine learning algorithms, this lexicon provides the raw data for building and understanding meaning – e.g. we can understand that a movie is similar to a film, that rain is related to weather, that the term ‘window’ has multiple meanings, and so on.

It is equally important that each command’s lexicon is highly relevant to the command and low in noise – for this reason we automatically assess each term within the lexicon to determine its relevance, removing noise to produce a final lexicon of terms for each command. We then generate multiple command documents from each command’s lexicon. Each command document is generated by selecting terms based on their probability of occurrence within the lexicon: the more likely a term is to occur in the command’s lexicon, the more likely it is to appear in a command document. Note that by doing this we are synthetically creating documents which do not make sense to a human, but which reflect the probabilities of occurrence of terms in the command’s lexicon. It is these synthetic command documents that we use to train our machine learning algorithms to understand meaning and intent. And because they are synthetically generated, we can control the number of command documents we create to fine-tune the learning process.
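A minimal sketch of this sampling idea (the lexicon and probabilities below are made up for illustration; the actual patented generation process is more involved):

```python
import random

def generate_command_documents(lexicon, n_docs, doc_len, seed=42):
    """Synthesize training documents for one command by sampling
    terms in proportion to their probability in the command's
    lexicon, so frequent lexicon terms appear in more documents."""
    rng = random.Random(seed)
    terms = list(lexicon)
    weights = [lexicon[t] for t in terms]
    return [rng.choices(terms, weights=weights, k=doc_len)
            for _ in range(n_docs)]

# Hypothetical lexicon for a navigation command (term -> probability)
lexicon = {"navigate": 0.30, "directions": 0.25, "drive": 0.20,
           "route": 0.15, "map": 0.10}
docs = generate_command_documents(lexicon, n_docs=100, doc_len=8)
```

The resulting documents are word salads to a human reader, but their term statistics mirror the lexicon – which is what the learning algorithms consume.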

Once we have carefully created a relevant command lexicon and built a repository of documents relating to each on-boarded command, we are ready to analyze the content, identify topics and subtopics, disambiguate the different meanings of words, and understand contextual meaning. Our innovative content augmentation approach allows us to quickly deploy updated machine-learned models that can immediately match new command variants, so we don’t have to wait for large numbers of live queries for training, as other approaches must.

The really appealing thing about this approach is that it is language agnostic – it allows us to support users speaking in any language by interrogating multilingual content repositories. We are currently live in 12 markets in 7 languages and are enabling new languages. We’re proud of this major accomplishment in such a short timeframe.

In my next post in this series, I will say a little more about the machine learning algorithms we have developed that have enabled us to build such a scalable, multi-lingual solution.


The largest mobile App Action index in the world!


You often hear the phrase “going from 0 to 1” to describe the accomplishment of reaching a first milestone – an initial product release, the first user, the first partner, the first sale. Here at Aiqudo, I believe our “0 to 1” moment occurred at the end of the summer of 2017, when we reached our aspirational goal of on-boarding a total of 1000 Actions. It was a special milestone for us: we had built an impressive library of actions across a broad range of app categories, using simple software tools, in a relatively short time, with only a handful of devs and interns. For comparison, we were only 5 months in operation and already had one tenth as many actions as that “premier bookseller in the cloud” company. These were not actions for games and trivia – these were high-utility actions in mobile apps that were not available on other voice platforms. On top of that, we did it all without a single app developer’s help – no APIs required. That’s right, no outside help!

So how were we able to accomplish this? Quite simply, we took what we knew about Android and Android apps and built a set of tools and techniques that allowed us to reach specific app states or execute app functions. Our initial approach provided simple record-and-replay mechanics, allowing us to reach virtually any app state that could be reached by the user. Consequently, actions such as showing a boarding pass for an upcoming flight, locating nearby friends through social media or sending a message could be built, tested, and deployed in a matter of minutes with absolutely no programming involved! But we haven’t stopped there. We also incorporate app-specific and system-level intents whenever possible, providing even more flexibility in the action on-boarding process and growing our library of actions, including those that control Alarms, Calendar, Contacts, Email, Camera, Messaging and Phone, to name a few. With the recent addition of system-level actions, we now offer a catalog of very useful actions for controlling various mobile device settings such as audio controls, display orientation and brightness, wifi, bluetooth, flash and speaker volume.

Our actions on-boarding process and global actions library solve the action discovery problem that we described in an earlier post. We do the heavy lifting, so all you need to say is “show my actions”, or “show my actions for Facebook”, and get going! And you don’t need to register your credentials to invoke your personal actions.

Today our action library is ~4000 strong and supports 7 languages across 12 locales. Not bad for a company less than a year and a half old! We haven’t fully opened up the spigot either!

Of course, all of this would not be possible without the hard work of the Aiqudo on-boarding team, whose job, among other things, is to create and maintain Actions for our reference Q Actions app as well as our partner integrations. The team continues to add new and interesting actions to the Aiqudo Action library, and to optimize and re-onboard actions as needed to maintain a high quality of service.

Check back with us for a follow-on post where we’ll discuss how our team maintains actions through automated testing.


What motivates me to get out of bed in the morning?


A while back a friend bought an Alexa speaker. He was so excited about the prospects of speaking to his device and getting cool  things done without leaving the comfort of his chair. A few weeks later when I next saw him I asked how he was getting on with it and his reply was very insightful and typical of the problems current voice platforms pose.

Initially, after plugging it in and asking the typical questions everyone does (‘what is the weather’ and ‘play music by Adele’), he set about seeing what other useful things he could do. He quickly found that it wasn’t easy to find out which 3rd-party skills were integrated with Alexa (I call this the action discovery problem). When he found a resource providing this information, he went about adding skills – local news headlines, a joke teller, Spotify (requiring registration), quiz questions and so on. Then he hit his next problem: to use these skills he had to learn a very specific set of commands to execute their functionality. This was fine for two or three skills, but it very soon became overwhelming. He found himself forgetting the precise language for each specific skill and soon became frustrated (the cognitive load problem).

Last week, when I saw him again, he had given the speaker to his son, who was using it as a music player in his bedroom. Once the initial ‘fun’ of the device wore off, it became apparent that it offered him very little real utility. While some skills had value, it was painful to find out about them in the first place, add them to Alexa and then remember the specific commands to execute them…

The reason I found this so interesting is that these are precisely the problems we have solved at Aiqudo. Our goal is to give consumers a truly natural voice interface to actions, starting with all the functionality in their phone apps, without their having to remember the specific commands needed to execute them. For example, if I want directions to the SAP Center in San Jose to watch the Sharks, I might say ‘navigate to the SAP Center’, ‘I want to drive to the SAP Center’ or ‘directions to the SAP Center’. Since a user can use any of these commands, or other variants, they should all just work. Constraining users to learn the precise form of a command just frustrates them and makes for a poor user experience. To get the maximum utility from voice, we need to understand the meaning and intent behind the command, irrespective of what the user says, and be able to execute the right action.

So how do we do it?

This is not a simple answer, so we plan to cover the main points in a series of blog posts over the coming weeks. These will focus at a high level on the processes, the technology, the challenges and the rationale behind our approach. Our process has 2 main steps.

  • Understand the functionality available in each app and on-board these actions into our Action Index.
  • Understand the intent of a user’s command and, subsequently, automatically execute the correct action.

In step 1, by doing the ‘heavy lifting’ and understanding the functionality available within the app ecosystem, we overcome the action discovery problem my friend had with his Alexa speaker. Users can simply say what they want to do, and we find the best action to execute automatically – the user doesn’t need to do anything. In fact, if they don’t have an appropriate app on their device for the command they have just issued, we recommend one so they can install it!

Similarly, in step 2, by allowing users the freedom to speak naturally and choose whatever linguistic form of command they wish, we overcome the second problem with Alexa – the cognitive load problem: users no longer have to remember very specific commands to execute actions. Voice should be the most intuitive user interface – just say what you want to do. We built the Aiqudo platform to understand the wide variety of ways users might phrase their commands, allowing them to go from voice to action easily and intuitively. And did I mention that the Aiqudo platform is multilingual, enabling natural language commands in whatever language the user chooses to speak?

So getting back to my initial question – what motivates me to get out of bed in the morning? – well, I’m excited to use technology to bring the utility of the entire app ecosystem to users all over the world so they can speak naturally to their devices and get stuff done without having to think about it!

In the next post in this series, we’ll talk about step 1 – making the functionality in apps available to users.


Hacking the World Cup with Voice


“Watch the world cup live”

Can your digital assistant let you *watch* the World Cup live with a single command? The answer is NO!

Not Google Assistant, not Alexa, not Siri. And definitely not when you’re away from your couch. They can show you the scores and upcoming matches, but that’s easy 🙂

But … you don’t want to know old scores. You want to watch the match live! Now!

That’s what we do at Aiqudo: using Moto Voice, powered by Aiqudo, and the Moto Projector Mod, we’ve enabled this incredible experience. This Voice to Action™ experience comes to you from the Fox Sports app.

“Turn on the projector” and get some beers!


Q Actions 1.3 update is now available on Google Play!


Q Actions now enables you to place calls directly with voice commands, regardless of whether your contact is in your phonebook or in a third-party app like WhatsApp.

Keeping track of friends and family across multiple phone books and communication apps is cumbersome. Through voice, you can privately tell Q Actions which contact you want to connect with and which app you want to place the call with, safely and hands-free.

Juggling multiple phone books across your apps can be tedious … We got your back!

Also, try out some of the new and improved actions from familiar apps that you already have on your phone: Netflix, Spotify, Waze, Maps, Facebook, and more.

Just launch Q Actions and say:

    • “dial John”, “call Jason on WhatsApp” – Phone/WhatsApp
    • “Play Stranger Things”, “watch Netflix originals” – Netflix
    • “play songs by Drake”, “play mint playlist” – Spotify
    • “take me to work”, “I want to drive home” – Waze
    • “are any of my friends nearby?”, “view upcoming events” – Facebook


As always, we welcome your feedback.

This Week In Voice Podcast March 15 2018

This Week in Voice – Podcast with Aiqudo


Season 2, Episode 8 of the “This Week In Voice” podcast features Aiqudo’s co-founders discussing the latest developments in the world of voice.

Hear CEO John and CTO Rajat provide their opinions and perspectives on several recent developments: Alexa’s new “follow-up mode”, Google’s recently announced multi-step routines, the availability of Alexa and Assistant on tablets, the current issues with these assistant platforms on phones, the challenges for banking, payments and other private activities using standalone voice assistants, and the potential proliferation of vertical and specialized voice applications.

You can listen to the podcast here.

Other options: