Often, when you have something to do, you start by searching for information about a particular Thing. Sometimes, you know exactly what that Thing is, but often, you find the Thing by using information related to it.
“Who is Taylor Swift?” → Taylor Swift
“Who directed Avatar?” → James Cameron
The “Thing” is what we call a Knowledge Entity and something that you can do with that Thing is what we call a Downstream Action. The bond between that Knowledge Entity and the Downstream Action is what we refer to as Actionable Knowledge.
How do we do this? Our Knowledge database holds information about all kinds of entities such as movies, TV series, athletes, corporations etc. These Entities have rich semantic structure; we have detailed information about the different attributes of these Entities along with the Actions one can perform on those entities. An Action may be generic (watch a show), but can also be explicitly connected to a mobile app or service (watch the show on Disney+). This knowledge allows the user to follow up on an Entity command with an Action.
For example, asking a question such as “How tall is Tom Brady?” allows you to get his height i.e., 6’ 4” or 1.93 metres (based on the Locale of who’s asking) since Knowledge captures these important attributes about Tom Brady. Note that these attributes are different for different types of Entities. That is determined by the Schema of the Entity, which allows validation, normalization and transformation of data.
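As a rough illustration of how a Schema-normalized attribute might be rendered per locale, consider the sketch below. The field names and the `format_height` helper are hypothetical, invented for this example; they are not Aiqudo's actual API.

```python
# Hypothetical sketch: a canonical height stored in centimeters is rendered
# according to the asking user's Locale, as in the Tom Brady example above.

def format_height(height_cm: float, locale: str) -> str:
    """Render a canonical height for the asking user's locale."""
    if locale == "en-US":
        # Convert to feet and inches for US users.
        total_inches = round(height_cm / 2.54)
        feet, inches = divmod(total_inches, 12)
        return f"{feet}' {inches}\""
    # Default to metric elsewhere.
    return f"{height_cm / 100:.2f} metres"

entity = {"name": "Tom Brady", "type": "Athlete", "height_cm": 193}

print(format_height(entity["height_cm"], "en-US"))  # 6' 4"
print(format_height(entity["height_cm"], "en-GB"))  # 1.93 metres
```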
A command like “Who is Tom Brady?” returns a Q Card with information about Tom Brady, as shown below. As there may be multiple entities matching “Tom Brady”, a popularity measure is computed so that the correct Tom Brady is returned, based on popularity, context and your current session. Popularity is a special attribute computed from multiple attributes of the entity. An Entity Card surfaces the various attributes associated with the entity, such as when Tom Brady was born, how tall and heavy he is, and what sport he plays. There are also attributes that define potential Actions that can follow, so “go to his Instagram” will instantly take you to Tom Brady’s account in the Instagram app.
Actions are about getting things done! Here’s another example of being able to instantly go from information to Action using Actionable Knowledge. Asking “Who is Tom Petty?” followed by a command “listen to him on Spotify” will start playing his music. This is a powerful feature that provides a great user experience and rapid Time to Action®.
The three pillars of Aiqudo’s Q Actions Platform allow us to implement downstream Actions:
- Semantically rich Entities in Actionable Knowledge
- AI-based Search
- Powerful Action execution engine for mobile apps and cloud services
We are not limited to just the name of the entity. Our AI-based search allows you to find entities using various attributes of the entity. For example, you can search for stock information by saying “How is Tesla stock doing today?” or “Show me TSLA stock price”. Aiqudo understands both the corporation name and the stock ticker when it needs to find information on a company’s stock price. Some apps, like Yahoo Finance, may only understand the stock ticker and may not be built to accept the name of the company as an input. Our platform fills this gap by decoupling action execution from search intent detection. A middle-tier federation module acts as a bridge between intent extraction and Action execution by mapping the attributes of the Entity returned by the search to those required by the Action execution engine. In the above example, it extracts the stockTicker attribute (TSLA) from the corporation entity retrieved by the search (Tesla) and feeds it to the Action engine.
Voila! Job done!
So, what can you do with that Thing? Well, you can instantly perform a meaningful Action on it using the apps on your mobile phone. In the example above, you can jump to Yahoo News to get the latest finance news about Tesla, or go to the stock quote screen within E*Trade, the app you use and trust, to buy Tesla shares and make some money!
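The federation step described above could be sketched roughly like this. The `federate` function, the data shapes, and every attribute other than `stockTicker` (which comes from the example) are illustrative assumptions, not Aiqudo's real interfaces.

```python
# Illustrative sketch of the middle-tier federation step: pull the attribute an
# app action actually needs (e.g. stockTicker) out of the entity that search
# returned, and hand it to the Action execution engine.

SEARCH_RESULT = {
    "type": "Corporation",
    "name": "Tesla",
    "attributes": {"stockTicker": "TSLA", "industry": "Automotive"},
}

ACTION_SPEC = {
    "app": "Yahoo Finance",
    "action": "show_stock_quote",
    "required_attribute": "stockTicker",
}

def federate(entity: dict, action_spec: dict) -> dict:
    """Bridge search output to the Action execution engine's input."""
    key = action_spec["required_attribute"]
    value = entity["attributes"][key]
    return {"app": action_spec["app"], "action": action_spec["action"], key: value}

print(federate(SEARCH_RESULT, ACTION_SPEC))
# {'app': 'Yahoo Finance', 'action': 'show_stock_quote', 'stockTicker': 'TSLA'}
```

The point of the design is the decoupling: search can match on any attribute of the entity, while each app action declares only the attributes it needs.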
It’s great to see various platforms announce specific accessibility features on this Global Accessibility Awareness Day.
A feature that caught our attention today was Google’s Assistant-powered Action Blocks.
It’s a new app that allows users to create simple shortcuts to Actions they commonly perform. They are powered by Google Assistant, but allow for invocation through a tap.
We built this functionality into Aiqudo’s Q Actions when we launched it in 2017. Our approach is different in several ways:
- The user does not need to do any work – Q Actions does it automatically for the user
- Q Actions builds these dynamically – your most recently used Actions, and your favorite ones are automatically tracked for you – you just need to say “show my actions”
- These handy Action shortcuts are available to you with one swipe to the right in the Q Actions app. One tap to invoke your favorite action.
- There’s no new app just for accessibility – it’s built into your Assistant interface for convenience – you just need to say “Hello Q, show my Actions”
- There are hundreds of unique high-utility Actions you can perform that are not available on any other platform, including Google Assistant. Here are a few examples:
- “whose birthday is it today?” (Facebook)
- “show my orders” (Amazon, Walmart)
- “start meditating” (Headspace)
- “watch the Mandalorian” (Disney+)
- “watch Fierce Queens” (Quibi)
- “show my tasks” (Microsoft To Do, Google Tasks)
- “show my account balances” (Etrade)
- “join my meeting with my camera off” (Google Hangouts, Zoom)
- “call Mark” (Whatsapp, Messenger, Teams, Slack,…)
- “send money to John” (PayPal)
- …and on and on and on
It’s just easier, better and more powerful!
And available to everyone!
We all remember playing the game Monopoly as kids, right? Well, I recently stumbled upon a version of the game that uses voice commands to control the game flow and act as the “bank” – a role most of us avoided so we wouldn’t have to deal with all the annoying transactions such as selling properties and buildings, collecting taxes, exchanging currency, and paying out people as they passed “GO”. Admittedly, it’s a novel use for voice commands in a classic game. But what about using voice to manage apps controlling our real money?
Voice to power more than games
Back in June of last year, we wrote a blog post about the power of voice to perform activities in mobile banking apps. The post specifically referenced Bank of America’s Erica virtual voice assistant as a tool to help users accomplish common, often time-consuming banking activities without the need to memorize complex menus or, worse, speak to the dreaded online customer service representative. The net result: a simple, pleasant user experience that builds brand loyalty and customer retention.
Enter Aiqudo Voice to Action®
Well, that got me thinking. I’ve been an E*Trade banking customer for years, and in all that time I’ve never really used voice to make payments, transfer money or check balances.
So I decided to see if I could recreate, and hopefully improve upon, my previous experience – this time using our very own Q Actions app. The following video highlights some of my efforts.
But can I trust this new way of banking?
Yes. You may have noticed in the video that I am not providing credentials to access my account in E*Trade. That’s because I’m already authenticated. Prior to shooting the video, I had provided credentials, by way of fingerprint biometric, as part of the very first action execution. Note that Aiqudo did not manage this process; it was handled entirely by the mobile app. Because of this, the credential data lives entirely in the app itself and is neither passed to nor processed by Aiqudo systems. This separation of duties maintains the privacy of user data and increases trust in the technology.
A personalized experience
Personalization is a word typically used to describe how an app or other system adjusts to provide an experience tailored specifically to a user. It’s often used in conjunction with AI and machine learning systems as the end result of acquiring, processing and suggesting courses of action or data upon which to act. We enable personalization in the previous actions in a couple of ways. If you have, say, more than one voice-enabled banking app on your mobile device (as I do), our system can be configured to remember the user’s preferred app for an action. For instance, if I say the command “check my balances”, Aiqudo suggests actions from both E*Trade AND Wells Fargo. If I choose the E*Trade action, the next time I say the command it will remember E*Trade and perform the action right away – no need to ask again. Likewise, whenever an action requires the user to provide input such as an account number or payee, the system can store these away for subsequent use. These are simple examples, but they add a nice touch to an already-useful integration.
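A minimal sketch of the preferred-app memory just described, using the "check my balances" example from above. The storage and function names are assumptions for illustration, not the actual system.

```python
# Toy sketch of per-command app preference: offer all candidate apps the first
# time, then remember the user's choice and resolve directly on repeat commands.

preferences = {}  # command text -> chosen app

def resolve(command, candidate_apps):
    """Return the remembered app, or the full candidate list if none is stored."""
    return preferences.get(command, candidate_apps)

def remember(command, chosen_app):
    preferences[command] = chosen_app

# First time: both apps are offered and the user picks one.
print(resolve("check my balances", ["E*Trade", "Wells Fargo"]))
# ['E*Trade', 'Wells Fargo']
remember("check my balances", "E*Trade")

# Next time, the same command goes straight to E*Trade.
print(resolve("check my balances", ["E*Trade", "Wells Fargo"]))
# E*Trade
```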
What if I don’t bank with E*Trade? What can I do with other apps and is it safe?
Aiqudo maintains similar actions for apps like Venmo and PayPal that allow “send money to <username>” type actions. In each of these cases, Aiqudo defers authentication to the app before completing the transaction and also ensures that the data used by the action in the app, e.g., the payee’s phone number or email address, never leaves the device or the app. The following video illustrates this.
With the proper integration of our ActionKit SDK into a banking app such as E*Trade, the end user reaps the benefits of a trusted, highly useful voice-powered interface that enables complex and often multi-step operations with ease and reduces Time to Action® for many activities within the app.
Q Actions 2.4 now available on Google Play
The recent release of Q Actions 2.4 emphasizes Aiqudo’s focus on productivity and utility through voice. As voice assistants are becoming an increasingly ubiquitous part of our daily lives, Aiqudo aims to empower users to get things done. Many of the improvements and enhancements are “under the hood” – we’ve increased personalization and expanded the knowledge that drives our Actions.
Our content-rich Q Cards leverage Actionable Knowledge to extend functionality into popular 3rd-party apps. Start by asking about an artist, music group, athlete, or celebrity: “who is Tom Hanks?”. Aiqudo’s Q Card not only presents information about the actor, but also asks “what next?”. Say “view his Twitter account” or “go to his Instagram”, and Actionable Knowledge will drop you exactly where you want to go!
Sample Actionable Knowledge Flow:
- Ask “who is Taylor Swift?”
- Select one of the supported Actionable Knowledge apps
- “listen to her on Spotify”
- “go to her Facebook profile”
- “check out her Instagram”
Personalization … with privacy
Q Actions is already personalized, showing you Action choices based on the apps you already trust. We can now leverage personal data as signals to personalize your experience, while still protecting your privacy. It’s another iteration of our continued focus and dedication to increase productivity and augment utility using voice. For example, if you checked in to your United Airlines flight, and then, the following day, say “show my boarding pass”, the United Airlines action is promoted to the top – exactly what you’d expect the system to do for you.
Our new Personal Data Manager allows secure optimization for specific apps. If you have a Spotify playlist called “Beach Vibes”, and you say “play Beach Vibes”, we understand what you want and we will promote your personal playlist over a random public channel by that name. Your playlists are not shipped off the device to our servers, but we can still use the relevant information to short-cut your day! If “Casimo Caputo” is a friend in Facebook Messenger, Messenger will trump WhatsApp for “tell Casimo Caputo let’s meet for lunch”. But “message Mark Smith let’s play Fifa tonight” brings up WhatsApp since Mark Smith is your WhatsApp buddy.
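To illustrate how a personal-data signal might promote one app's action over another, here is a toy sketch using the Messenger/WhatsApp example above. All data structures are invented for illustration, and, as noted above, this data stays on the device.

```python
# Toy sketch: rank candidate messaging apps so that the app which actually
# knows the named contact comes first. Nothing here leaves the device.

personal_data = {
    "Messenger": {"contacts": ["Casimo Caputo"]},
    "WhatsApp": {"contacts": ["Mark Smith"]},
}

def rank_messaging_apps(recipient, apps=("Messenger", "WhatsApp")):
    """Put the app that has this contact first (stable sort: False < True)."""
    return sorted(apps, key=lambda app: recipient not in personal_data[app]["contacts"])

print(rank_messaging_apps("Casimo Caputo"))  # ['Messenger', 'WhatsApp']
print(rank_messaging_apps("Mark Smith"))     # ['WhatsApp', 'Messenger']
```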
Simply do more with voice! Q Actions is now available on Google Play.
For over 2 years Aiqudo has been leading the charge of deep app integration with voice assistants on Android phones. Today, our Android platform continues to do many things that no other platform can. Now, we’re incredibly proud to announce the latest release of our Q Actions app for iOS. We’ve been working on the latest iOS release for months, and it represents a full suite of actions functionality driven by the new ActionKit SDK for iOS. This new ActionKit is also what iOS developers can use to easily configure voice into their own apps.
iOS is a more restrictive and closed ecosystem than Android. Many of the platform capabilities that Android provides are not available to third-party developers in Apple’s ecosystem. For instance, apps are not allowed to freely communicate with each other, and it’s difficult to determine what apps are installed. Such restrictions challenge digital assistants like Q Actions, which rely on knowledge of a user’s apps to provide relevant results and the ability to communicate with apps in order to automate and execute actions in other apps.
Q Actions for iOS enables app developers to define their own voice experience for their users rather than being subject to the limitations of SiriKit or Siri Shortcuts. Currently, SiriKit limits developers’ ability to expose functionality in Siri, allowing only broad categories that dilute the differentiated app experiences that developers have built. With Q Actions for iOS, brands and businesses will be able to maintain their differentiating features and brand recognition, rather than conform to a generalized category.
With this release, we took a hard look at what was needed to build a comparable experience to what we have on Android. To make it more powerful for iOS app developers, we pushed most of the functionality into the ActionKit SDK. The result is that ActionKit powers all the actions available in the app, allowing developers to offer an equivalent experience in their iOS app. The ActionKit SDK is available for embedding in any iOS app today.
Let’s take a look at what Q Actions and the Aiqudo platform offers right now:
Easily discover actions for your phone
Q Actions helpfully provides an Action Summary with a categorized list of apps and actions for your device. Browse by category, tap on an app to view sample commands, or tap a command to execute the action.
Go beyond Siri
Q Actions supports hundreds of new actions! Watch Netflix Originals or stream live video on Facebook with simple commands like “watch Narcos” or “stream live video”.
True Natural Language
Q Actions for iOS leverages Aiqudo’s proprietary, semiotics-based language modeling system to power support for natural language commands. Rather than the exact match syntax required by Siri Shortcuts, Aiqudo understands the wide variations in commands that consumers use when interacting naturally with their voice. Plus, Aiqudo is multilingual, currently supporting commands in seven languages worldwide.
Content-rich Cards for informational queries
Get access to web results from Bing, translate phrases or look at stock quotes directly from Q Actions. Get rich audio and visual feedback from cards.
There’s still a lot to come! We’ve already shown how Aiqudo can enable a better voice experience in the car. We’ve also seen how voice can help users engage meaningfully with your app. We’re working hard to build a ubiquitous voice assistant platform and this release on iOS gets us one step closer. Stay tuned as we’ll be talking more about some of the challenges of bringing our voice platform to iOS and iOS app developers, and more importantly, how we’re aligned with Apple’s privacy-centric approach.
Do more with Voice
Q Actions 2.0 is here. With this release, we wanted to focus on empowering users throughout their day. As voice is playing a more prevalent part in our everyday lives, we’re uncovering more use cases where Q Actions can be of help. In Q Actions 2.0, you’ll find new features and enhancements that are more conversational and useful.
Aiqudo believes the interaction with a voice assistant should be casual, intuitive, and conversational. Q Actions understands naturally spoken commands and is aware of the apps installed on your phone, so it will only return personalized actions that are relevant to you. When a bit more information is required from you to complete a task, Q Actions will guide the conversation until it fully understands what you want to do. Casually chat with Q Actions and get things done.
- “create new event” (Google Calendar)
- “message Mario” (WhatsApp, Messenger, SMS)
- “watch a movie/tv show” (Netflix, Hulu)
- “play some music” (Spotify, Pandora, Google Play Music, Deezer)
In addition to providing relevant app actions from personal apps that are installed on your phone, Q Actions will now display rich information through Q Cards™. Get up-to-date information from cloud services on many topics: flight status, stock pricing, restaurant info, and more. In addition to presenting the information in a simple and easy-to-read card, Q Cards™ support Talkback and will read aloud relevant information.
- “What’s the flight status of United 875?”
- “What’s the current price of AAPL?”
- “Find Japanese food”
There are times when you need information but do not have the luxury of looking at a screen. Voice Talkback™ is a feature that reads aloud the critical snippets of information from an action. This enables you to continue to be productive, without the distraction of looking at a screen. Execute your actions safely and hands-free.
- “What’s the stock price of Tesla?” (E*Trade)
- Q: “Tesla is currently trading at $274.96”
- “Whose birthday is it today?” (Facebook)
- Q: “Nelson Wynn and J Boss are celebrating birthdays today”
- “Where is the nearest gas station?”
- Q: “Nearest gas at Shell on 2029 S Bascom Ave and 370 E Campbell Ave, 0.2 miles away, for $4.35”
An enhancement to our existing curated Action Recipes: users can now create Action Recipes on the fly using Compound Commands. Simply join two of your favorite actions into a single command using “and”. This gives users the ability to create millions of Action Recipe combinations from our database of 4000+ actions.
- “Play Migos on Spotify and set volume to max”
- “Play NPR and navigate to work”
- “Tell Monica I’m boarding the plane now and view my boarding pass”
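A naive sketch of how a compound command might be split into its two constituent actions. Real handling has to cope with commands whose own text contains “and”; this illustration simply splits once, which is an assumption, not Aiqudo's actual parser.

```python
# Toy sketch: split a compound command on the first " and " into two
# sub-commands, each of which would then be matched to an action.

def split_compound(command):
    """Split on the first ' and ', returning one or two trimmed sub-commands."""
    return [part.strip() for part in command.split(" and ", 1)]

print(split_compound("Play NPR and navigate to work"))
# ['Play NPR', 'navigate to work']
print(split_compound("pause music"))
# ['pause music']
```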
Simply do more with voice! Q Actions is now available on Google Play.
Somewhere in the Android Settings lies the option for you to turn on Bluetooth, turn off Wifi, and change sound preferences. These options are usually buried deep under menus and sub-menus. Discoverability is an issue, and navigating to the options usually means multiple taps within the Settings app. Yes, there’s a search bar within the Settings app, but it’s clunky, requires typing and only returns exact matches. Some of these options are accessible through the quick settings bar, but the discovery and navigation issues still exist.
In the latest release, simply tell Q Actions what System Settings you want to change. Q Actions can now control your Bluetooth, Wifi, music session, and sound settings through voice.
Configure your Settings:
- “turn on/off bluetooth”
- “turn wifi on/off”
Control your music:
- “play next song”
- “pause music”
- “resume my music”
Toggle your sound settings:
- “enable do not disturb”
- “mute ringer”
- “increase the volume”
- “put my phone on vibrate”
In addition to placing calls to your Contacts, Q Actions helps you manage Contacts via voice. Easily add a recent caller as a contact in your phonebook or share a friend’s contact info with simple commands. If you have your contact’s address in your Contacts, you can also get directions to the address using your favorite navigation app.
Place calls to Contacts:
- “call Jason Chen”
- “dial Mario on speaker”
Manage and share your Contacts:
- “save recent number as Mark Johnson”
- “edit Helen’s contact information”
- “share contact info of Daniel Phan”
- “view last incoming call”
Bridge the gap between your Contacts and navigation apps:
- “take me to Rob’s apartment”
- “how do I get to Mike’s house?”
Unlock your phone’s potential with voice! Q Actions is now available on Google Play.
In my first post I discussed how using technology to overcome the challenges of bringing voice control into the mainstream motivated me to get out of bed in the morning. I get a kick out of seeing someone speaking naturally to their device and smiling when it does exactly what they wanted.
In our second post in the series, we discussed how Aiqudo has built the largest (and growing) mobile app action index in the world and our process for on-boarding actions. On-boarding an action takes only minutes – there is no programming involved, and we are not reliant on the app developer to set this up or provide an API. This enables enormous scalability of actions compared to the Amazon and Google approaches, which rely on a programming solution where developers are required to code to these platforms, add specific intents, and go through a painful approval process.
In this post, I want to elaborate on our overall approach and discuss specifically how we create the large amounts of content that our patented machine learning algorithms analyze in order to understand a user’s intent. Ours is a significant achievement, since even large teams face challenges solving this problem in a generic fashion – as the following quote from Amazon shows.
“The way we’re solving that is that you’ll just speak, and we will find the most relevant skill that can answer your query … The ambiguity in that language, and the incredible number of actions Alexa can take, that’s a super hard AI problem.” – Amazon
At Aiqudo, we have already solved the challenge that Amazon is working on. Our users don’t have to specify which app to use and we automatically pick the right actions for their command thereby reducing the cognitive load for the user.
The starting point for generating the content we need is the end of the action on-boarding process, when a few sample commands are added to the action. These training commands seed the machine learning processes that enable us to
- extract the correct meaning from the natural language command
- understand the intent; and
- execute the correct action on the best app
The first step in this process is to gather content relating to each on-boarded command (command content). As is typical with machine learning approaches, we are data hungry – the more data we have, the better our performance. We therefore interrogate numerous data repositories specific to on-boarded commands and apps to identify related content that can be used to augment the language used in the command.
Teaching a machine to correctly understand what a user intends from just a few terms in a command is problematic (as it would be for a human) – there isn’t enough context to fully understand the command – e.g. ‘open the window’ – is this a software related command or a command related to a room? Augmenting the command with additional content adds a lot more context for the algorithms to better understand meaning and intent. This augmented content forms the basis of a lexicon of terms relating to each on-boarded command. Later, when we apply our machine learning algorithms this provides the raw data to enable us to build and understand meaning – e.g. we can understand that a movie is similar to a film, rain is related to weather, the term ‘window’ has multiple meanings and so on.
It is equally important that each command’s lexicon is highly relevant to the command and low in noise, so we automatically assess each term within the lexicon to determine its relevance and remove noise. This low-noise lexicon becomes the final lexicon of terms for the command. We then generate multiple command documents from each command’s lexicon. Each command document is generated by selecting terms based on the probability of their occurrence within the command’s lexicon: the more likely a term is to occur in the lexicon, the more likely it is to occur in a command document. Note that by doing this we are synthetically creating documents that do not make sense to a human, but that reflect the probabilities of occurrence of terms in the command’s lexicon. It is these synthetically created command documents that we use to train our machine learning algorithms to understand meaning and intent. Because the documents are synthetically generated, we can also control how many we create to fine-tune the learning process.
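The document-synthesis step just described can be sketched as follows. The lexicon and its term probabilities are invented for illustration; only the sampling idea (terms drawn in proportion to their lexicon probability) comes from the text.

```python
# Sketch: generate synthetic "command documents" by sampling terms in
# proportion to their probability in the command's noise-filtered lexicon.
import random

lexicon = {"movie": 0.30, "film": 0.25, "watch": 0.25, "stream": 0.15, "cinema": 0.05}

def synthesize_documents(lexicon, n_docs=3, doc_len=8, seed=42):
    """Build n_docs documents of doc_len terms each, weighted by the lexicon."""
    rng = random.Random(seed)
    terms = list(lexicon)
    weights = list(lexicon.values())
    return [" ".join(rng.choices(terms, weights=weights, k=doc_len))
            for _ in range(n_docs)]

for doc in synthesize_documents(lexicon):
    print(doc)  # synthetic, not human-readable, by design
```

Because generation is synthetic, `n_docs` directly controls how much training material exists per command, which is the tuning knob the paragraph above refers to.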
Once we have carefully created a relevant command lexicon and built a repository of documents which relate to each command that has been on-boarded, we are ready to analyze the content, identify topics and subtopics, disambiguate among the different meanings words have and understand contextual meaning. Our innovative content augmentation approach allows us to quickly deploy updated machine learned models that can immediately match new command variants, so we don’t have to wait for large numbers of live queries for training as with other approaches.
The really appealing thing about this approach is that it is language agnostic – it allows us to support users speaking in any language by interrogating multilingual content repositories. Currently we are live in 12 markets in 7 languages, and we are enabling new languages. We’re proud of this major accomplishment in such a short timeframe.
In my next post in this series, I will say a little more about the machine learning algorithms we have developed that have enabled us to build such a scalable, multi-lingual solution.
You often hear the phrase “going from 0 to 1” when it comes to reaching a first milestone – an initial product release, the first user, the first partner, the first sale. Here at Aiqudo, I believe our “0 to 1” moment occurred at the end of the summer in 2017, when we reached our aspirational goal of on-boarding a total of 1000 Actions. It was a special milestone for us: we had built an impressive library of actions across a broad range of app categories, using simple software tools, in a relatively short time, with only a handful of devs and interns. For comparison, we were only 5 months in operation and already had one-tenth as many actions as that “premier bookseller in the cloud” company. These were not actions for games and trivia – these were high-utility actions in mobile apps that were not available in other voice platforms. On top of that, we did it all without a single app developer’s help – no APIs required. That’s right, no outside help!
So how were we able to accomplish this? Quite simply, we took the information we knew about Android and Android apps and built a set of tools and techniques that allowed us to reach specific app states or execute app functions. Our initial approach provided simple record and replay mechanics allowing us to reach virtually any app state that could be reached by the user. Consequently, actions such as showing a boarding pass for an upcoming flight, locating nearby friends through social media or sending a message could be built, tested, and deployed in a matter of minutes with absolutely no programming involved! But we haven’t stopped there. We also incorporate app-specific and system-level intents whenever possible, providing even more flexibility to the action on-boarding process and our growing library of actions including those that control Alarms, Calendar, Contacts, Email, Camera, Messaging and Phone to name a few. With the recent addition of system level actions, we now offer a catalog of very useful actions for controlling various mobile device settings such as audio controls, display orientation and brightness, wifi, bluetooth, flash and speaker volume.
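The output of on-boarding might be captured in a record shaped roughly like the one below. Every field name here is hypothetical, made up to illustrate the record-and-replay idea plus the intent fallback described above; the actual tooling and formats are Aiqudo's own.

```python
# Hypothetical shape of an on-boarded action: sample training commands plus an
# execution recipe of recorded UI steps, with an optional intent-based fallback.

action = {
    "app": "United Airlines",
    "name": "show_boarding_pass",
    "sample_commands": [
        "show my boarding pass",
        "pull up my boarding pass",
    ],
    "execution": {
        # Recorded taps replayed to reach the target app state.
        "replay_steps": [
            {"tap": "nav_menu"},
            {"tap": "my_trips"},
            {"tap": "boarding_pass"},
        ],
        # App- or system-level intent used instead, when one exists.
        "intent_fallback": None,
    },
}

print(action["name"], "-", len(action["execution"]["replay_steps"]), "steps")
```

No programming is needed to produce such a record: the recorded steps come straight from demonstrating the action once in the app.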
Our actions on-boarding process and global actions library solves the action discovery problem that we described in an earlier post. We do the heavy lifting, so all you need to say is “show my actions”, or “show my actions for Facebook” and get going! And you don’t need to register your credentials to invoke your personal actions.
Today our action library is ~4000 strong and supports 7 languages across 12 locales. Not bad for a company less than a year and a half old! We haven’t fully opened up the spigot either!
Of course, all of this would not be possible without the hard work of the Aiqudo on-boarding team whose job, among other things, is to create and maintain Actions for our reference Q Actions app as well as our partner integrations. The team continues to add new and interesting actions to the Aiqudo Action library and optimize and re-onboard actions as needed to maintain a high quality of service.
Check back with us for a follow-on post where we’ll discuss how our team maintains actions through automated testing.