For over 2 years Aiqudo has been leading the charge of deep app integration with voice assistants on Android phones. Today, our Android platform continues to do many things that no other platform can. Now, we’re incredibly proud to announce the latest release of our Q Actions app for iOS. We’ve been working on the latest iOS release for months, and it represents a full suite of actions functionality driven by the new ActionKit SDK for iOS. This new ActionKit is also what iOS developers can use to easily configure voice into their own apps.
iOS is a more restrictive and closed ecosystem than Android. Many of the platform capabilities that Android provides are not available to third-party developers in Apple’s ecosystem. For instance, apps are not allowed to freely communicate with each other, and it’s difficult to determine what apps are installed. Such restrictions challenge digital assistants like Q Actions, which rely on knowledge of a user’s apps to provide relevant results and the ability to communicate with apps in order to automate and execute actions in other apps.
Q Actions for iOS enables app developers to define their own voice experience for their users rather than being subject to the limitations of SiriKit or Siri Shortcuts. Currently, SiriKit limits developers’ ability to expose functionality in Siri, allowing only broad categories that dilute the differentiated app experiences that developers have built. With Q Actions for iOS, brands and businesses will be able to maintain their differentiating features and brand recognition, rather than conform to a generalized category.
With this release, we took a hard look at what was needed to build a comparable experience to what we have on Android. To make it more powerful for iOS app developers, we pushed most of the functionality into the ActionKit SDK. The result is that ActionKit powers all the actions available in the app, allowing developers to offer an equivalent experience in their iOS app. The ActionKit SDK is available for embedding in any iOS app today.
Let’s take a look at what Q Actions and the Aiqudo platform offer right now:
Easily discover actions for your phone
Q Actions helpfully provides an Action Summary with a categorized list of apps and actions for your device. Browse by category, tap on an app to view sample commands, or tap a command to execute the action.
Go beyond Siri
Q Actions supports hundreds of new actions! Watch Netflix Originals or stream live video on Facebook with simple commands like “watch Narcos” or “stream live video”.
True Natural Language
Q Actions for iOS leverages Aiqudo’s proprietary, semiotics-based language modeling system to power support for natural language commands. Rather than the exact match syntax required by Siri Shortcuts, Aiqudo understands the wide variations in commands that consumers use when interacting naturally with their voice. Plus, Aiqudo is multilingual, currently supporting commands in seven languages worldwide.
Content-rich Cards for informational queries
Get access to web results from Bing, translate phrases or look at stock quotes directly from Q Actions. Get rich audio and visual feedback from cards.
There’s still a lot to come! We’ve already shown how Aiqudo can enable a better voice experience in the car. We’ve also seen how voice can help users engage meaningfully with your app. We’re working hard to build a ubiquitous voice assistant platform and this release on iOS gets us one step closer. Stay tuned as we’ll be talking more about some of the challenges of bringing our voice platform to iOS and iOS app developers, and more importantly, how we’re aligned with Apple’s privacy-centric approach.
It’s no secret that a growing number of companies are recognizing the opportunities for new, branded experiences presented by voice interfaces powered by AI. In fact, Gartner predicts that 25 percent of digital workers will use virtual assistants daily by 2021, and brands already using chatbots have seen the number of leads they collect increase by as much as 600 percent over traditional lead generation methods.
These AI-driven voice assistants and chatbots have also become useful cost-cutting tools for companies with large subscriber bases – banks, insurance companies, and mobile phone operators, to name a few. A 2017 Juniper Research report calculates that, for every inquiry handled by a chatbot, banks save four minutes of an agent’s time, which translates to a cost saving of $0.70 per query. These platforms are expected to save banks an estimated $7.3 billion in operational costs by 2023.
The real opportunity presented by voice assistants is in delighting the customer and strengthening brand loyalty, which inevitably drives revenue. We’re entering an exciting time where voice has the ability to redefine the relationship that consumers have with their technology and open up aspects or functionality that users didn’t previously know about, or even know they cared about.
A 2017 PwC report described chatbots as adding “a new dimension to the power of ‘personal touch’ and massively [enhancing] customer delight and loyalty.”
In my own life, I can’t think of a better example of this than Erica, Bank of America’s AI-driven virtual financial assistant. Working in and following the space for a few years, I am really impressed with what Bank of America has built for its customers in Erica.
Erica caters to the bank’s customer service requirements in a number of ways: sending notifications to customers, providing balance information, sharing money-saving tips, providing credit report updates, facilitating bill payments, and helping customers with simple transactions. Recently, BofA expanded Erica’s capabilities to help clients make smarter financial decisions by providing them with personalized, proactive insight.
For me, instead of calling the BofA customer service 800 number and spending 20 to 30 minutes navigating menus, waiting on hold, or being transferred and repeating the process all over again, I can talk to Erica and quickly complete transactions. Erica averages a mere three minutes time-to-resolution via voice within the app. Think about all the things you could get done in those saved minutes instead, not to mention a break on your blood-pressure medicine.
Another aspect where Erica shines for me is in exposing capabilities within the app that aren’t obvious or are buried deep in the menu structure. One feature I use all the time is the ability to put an international travel notice on my card before I leave the country (so my credit card works overseas) — sometimes I even use it standing in the TSA security line. Another feature I love is being able to find my routing and account numbers quickly and easily by simply asking Erica. Who hasn’t spent valuable time on a fishing expedition in their banking app while hoping the webpage (waiting for automatic payment information) doesn’t time-out first?
The proof of the value of Erica’s voice interface is in the user adoption numbers: just over a year after introduction, Erica has surpassed 7 million users and has handled more than 50 million client requests. And since launching Erica’s proactive insights in late 2018, daily client engagement with Erica has more than doubled. In an interview with American Banker, BofA’s head of digital banking attributes Erica’s strong adoption to its easy-to-use transaction-search functions and financial advice, two areas where the bank continues to focus on harnessing the power of voice to delight its customers.
Thing is, for all of Erica’s benefits for both consumers and BofA, building this kind of voice-activated assistance in-house, from scratch, isn’t fast, easy, or cheap. The Erica development team boasted 100 people in 2017, before introduction, and has surely grown by now, given her success. And it took those 100 people nearly two years to get Erica ready for prime time, at an estimated cost of $30 million. Why so expensive? As one BofA VP noted, during development, the bank “learned [that] there are over 2,000 different ways to ask us to move money.”
At Aiqudo, we’ve figured out — and operationalized — the technical heavy lifting needed to create a voice assistant: NLU, intent detection, action execution, multiple languages, the analytics platform; there’s no reason for partners to reinvent the wheel. We provide partners with a turnkey voice capability in their app. Developers retain control of this critical new Voice UI (and all of their users’ data) rather than surrendering the direct relationship with their users to voice platforms. Until now, developers have been required to create skills for each voice platform, which risks commoditizing the app and losing the brand they have worked so hard to develop. In contrast, Aiqudo offers a cost-effective solution that allows developers to focus on adding value to their app rather than on customizing for voice.
Disclaimer: Bank of America developed their voice technology without the assistance or use of Aiqudo technology.
The following transcript was taken from a casual conversation with my son.
Son: Dad, what are you working on?
Me: It’s a new feature in our product called “Auto Mode”. We just released it in version 2.1 of our Q Actions App for Android. We even made a video of it. We can watch it after dinner if you’re interested.
Son: The feature sounds cool. What’s it look like?
Me: Well, here. We have this special setting that switches our software to look like the screen in a car. See how the screen is wider than it is tall? Yeah, that’s because most car screens are like that too.
Son: Wait. How do you get your software into cars? Can’t you just stick the tablet on the dashboard?
Me: Humm, not quite. We develop the software so that car makers can combine it with their own software inside the car’s console. We’ll even make it look like they developed it by using their own colors and buttons. I’m showing you how this works on a tablet because it’s easier to demonstrate to other people – we just tell them to pretend it’s the car console. Until cars put our software into their consoles, we’ll make it easy for users to use “Auto Mode” directly on their phones. Just mount the phone on the car’s dash and say “turn on auto mode” – done!
Son: So how do you use it? And what does that blue button with a microphone in it do?
Me: Well, we want anyone in the car to be able to say a command like “navigate to Great America” or “what’s the weather like in San Jose?” or “who are Twenty One Pilots?”. The button is simply a way to tell the car to listen. When we hear a command, our software figures out what to do and what to show on the console in the car. Sometimes it even speaks back the answer. Now we don’t always want people to have to press the button on the screen so we’ll work with the car makers to add a button on the steering wheel or even a microphone that is always listening for a special phrase such as “Ok, Q” to start.
Son: How does it do that? I mean, the command part.
Me: Good question. Since you’re smart and know a little about software, I’ll keep it short. Our software takes a command and tries to figure out what app or service can best provide the answer. For example, if the command is about showing the route to say, an amusement park like Great America, we’ll ask Google Maps to handle it, which it does really well. Lots of cars come installed with mapping software like Google Maps so it’s best to let them handle those. For other types of commands that ask for information, like “what’s the weather like in San Jose” or “who are Twenty One Pilots”, we’ll send it off to servers in the cloud. They then send us back answers and we format it and display it on the screen – in a pretty looking card like this one.
Me: Sometimes, apps running on our phones can best answer these commands and we use them to handle it.
Son: Wait. Phones? How are phones involved? I only see you using a tablet.
Me: Ahhh. You’ve discovered our coolest feature. We use Apps already installed on your phone. Do you see those rectangle-looking things in the upper right corner of the tablet? The ones with the pictures and names of people? Well, those are phone profiles. They appear when a person connects their phone, running our Q Actions app, to the car’s console through Bluetooth, sort of like you do with wireless earbuds. When connected, our software in the console sends the phone your commands and the phone in turn attempts to execute the command using one of the installed apps. Let me explain with an example. Let’s pretend you track your daily homework assignments using the Google Tasks app on your phone. Now you hop into the car and your phone automatically pairs with the console. Now I asked you to show me your homework assignments. You then press the mic button and say “show my homework tasks”. The software in the console would intelligently route the command to your phone (because Google Tasks is not on the console), open Google Tasks on your phone, grab all your homework assignments and send them back to the console to be displayed in a nice card. Oh, and it would also speak back your homework assignments as well. Let’s see what happens when I tell it to view my tasks.
Son: Big deal. I can just pick up my phone and do that. Why do I need to use voice for that?
Me: Because if you’re the driver, you don’t want to be fumbling around with your phone, possibly getting into an accident! Remember, this is supposed to help drivers with safe, “hands-free” operation. You put your phone in a safe place and our software figures out how to use it to get the answers.
Son: Why can’t the car makers put all these apps in the console so you don’t have to use your phone?
Me: Great question. Most people carry their phones on them at all times, especially when they drive. And these phones have all their favorite apps with all their important personal information stored in them. There’s no way the car makers could figure out which apps to include when you buy the car. And even if you could download these apps onto the console, all your personal information that’s on your phone would have to be transferred over to the console, app by app. Clumsy if you ask me. I prefer to keep my information on my phone and private, thank you very much!
Son: Oh. Now I get it. So what else does the software do?
Me: The console can call a family member. If you say “call Dad”, the software looks for ‘dad’ in your phone’s address book and dials the number associated with it. But wait. You’re probably thinking, “What’s so special about that? All the cool cars do it.” Well, we know that a bunch of apps can make phone calls so we show you which ones and let you decide. Also, if you have two numbers for ‘dad’, say a home and mobile number, the software will ask you to choose one to call. Let’s see how this works when I say “call Dad”.
Me: It asks you to pick an app. I say ‘phone’ and then it asks me to pick a number since my dad has both a home and mobile number. I say ‘mobile’ and it dials the number through my phone.
Son: Cool. But what if I have two people with the same name, like Julie?
Me: It will ask you to pick a ‘Julie’ when it finds more than one. And it will remember that choice next time you ask it to call Julie. See what happens when I want to call Jason. It shows me all the people in my address book who are named Jason along with their phone numbers. If a person has more than one number, it will say ‘Multiple’.
Son: Wow. What else?
Me: How about sending a message on WhatsApp? Or setting up a team meeting in the calendar. Or joining a meeting from the car if you are running late. Or even checking which of your friends have birthdays today. All these actions are performed on your phone using the apps you are familiar with and use.
Son: Which app shows you your friends’ birthdays? That’s kind of neat.
Son: I don’t use Facebook. I use Instagram. It’s way better. Plus all the cool kids use it now.
Me: You get the picture though, right?
Son: So what if all of my friends are in the car with you and we connect to the console? How does the software know where to send the command?
Me: We use the person’s voice to identify who they are and route the command to the right person’s phone automatically.
Son: Really? That seems way too hard.
Me: Not really. Although we haven’t implemented it yet, the technology exists to do this sort of thing today.
Son: Going back to main screen, why does the list of actions under ‘Recent’ and ‘Favorites’ change when you change people?
Me: Oh, you noticed that! Whenever the software switches to a new profile, we grab the ‘Recent’ and ‘Favorites’ sections from that person’s phone and display them on the tablet, er, console. This is our way of making the experience more personalized and closer to the way the app appears on your phone. In fact, the ‘Favorites’ are like handy shortcuts for frequently used actions, like “call Mom”.
Me: One more thing. Remember the other buttons on the home screen? One looked like a music note, the other a picture for messaging and so on. Well, when you press those, a series of icons appear across the screen, each showing an action that belongs to that group. If your phone had Spotify installed, we would show you a few Spotify actions. If Pandora was installed, we would show you Pandora actions and so on. Check out what happens when I activate my profile. Notice how Pandora appears? That’s because Pandora is on my phone and not on the tablet like Google Play Music and YouTube Music.
Me: Same is true for messaging and calling. Actions from apps installed on your phone would appear. You would simply tap on the icon to run the action. In fact, if you look carefully, you’ll notice that all the actions that show up on the console are also in the ‘My Actions’ screen in the Q Actions app on your Android Phone. Check out what’s on the tablet vs. my phone.
Me: Oh and before I forget, there’s one last item I’d like to tell you about.
Son: What’s that?
Me: Notifications. If you send me a message on WhatsApp, Messenger or WeChat, a screen will pop up letting me know I have a message from you. I can listen to the message by pressing a button or respond to the message – by voice, of course – all while keeping my focus on the road. You’ll get the response just as if I had sent it while holding the phone.
Son: Cool. I’ll have fun sending you messages on your way home from work.
Son: Hey, can I try this out on my phone?
Me: Sure. Just download our latest app from the Google Play Store. After you get it installed, go to the Preferences section under Settings and check the box that says ‘Auto Mode’ (BETA). You’ll automatically be switched into Auto Mode on your phone. Now this becomes your console in the car.
Of course, things appear a bit smaller on your phone than what I’ve shown you on the tablet. Oh, and since you’re not connected to another phone, all the commands you give it will be performed by apps on your phone. Try it out and let me know what you think.
Son: Ok. I’ll play around with it this week.
Me: Great. Now let’s go see what your mom’s made us for dinner.
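The routing described in the conversation above (apps on the console first, cloud services for informational questions, and the connected phone for everything else) can be sketched roughly as follows. This is only an illustration under stated assumptions: the intent names, the `INTENT_APPS` table, and the `Phone` type are hypothetical and are not taken from the Q Actions software.

```python
from dataclasses import dataclass
from typing import Optional, Set

@dataclass
class Phone:
    owner: str
    apps: Set[str]  # apps installed on the connected phone

# Hypothetical tables: which app serves an intent, and which intents go to the cloud.
INTENT_APPS = {"navigate": "Google Maps", "tasks": "Google Tasks"}
CLOUD_INTENTS = {"weather", "knowledge"}

def route(intent: str, console_apps: Set[str], phone: Optional[Phone]) -> str:
    """Decide where a command should be executed."""
    if intent in CLOUD_INTENTS:
        return "cloud"                      # informational query, answered by a card
    app = INTENT_APPS.get(intent)
    if app in console_apps:
        return "console"                    # e.g. mapping software built into the car
    if phone is not None and app in phone.apps:
        return f"phone:{phone.owner}"       # run the action on the paired phone
    return "unhandled"

console = {"Google Maps"}
son = Phone(owner="son", apps={"Google Tasks"})
print(route("navigate", console, son))
print(route("tasks", console, son))
print(route("weather", console, son))
```

In this sketch, “show my homework tasks” ends up on the phone only because Google Tasks is missing from the console, which mirrors the example in the conversation.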
Q Actions 2.0 is here. With this release, we wanted to focus on empowering users throughout their day. As voice is playing a more prevalent part in our everyday lives, we’re uncovering more use cases where Q Actions can be of help. In Q Actions 2.0, you’ll find new features and enhancements that are more conversational and useful.
Aiqudo believes the interaction with a voice assistant should be casual, intuitive, and conversational. Q Actions understands naturally spoken commands and is aware of the apps installed on your phone, so it will only return personalized actions that are relevant to you. When a bit more information is required from you to complete a task, Q Actions will guide the conversation until it fully understands what you want to do. Casually chat with Q Actions and get things done.
“create new event” (Google Calendar)
“message Mario” (WhatsApp, Messenger, SMS)
“watch a movie/tv show” (Netflix, Hulu)
“play some music” (Spotify, Pandora, Google Play Music, Deezer)
In addition to providing relevant app actions from personal apps that are installed on your phone, Q Actions will now display rich information through Q Cards™. Get up-to-date information from cloud services on many topics: flight status, stock pricing, restaurant info, and more. In addition to presenting the information in a simple and easy-to-read card, Q Cards™ support Talkback and will read aloud relevant information.
“What’s the flight status of United 875?”
“What’s the current price of AAPL?”
“Find Japanese food”
There are times when you need information but do not have the luxury of looking at a screen. Voice Talkback™ is a feature that reads aloud the critical snippets of information from an action. This enables you to continue to be productive, without the distraction of looking at a screen. Execute your actions safely and hands-free.
“What’s the stock price of Tesla?” (E*Trade)
Q: “Tesla is currently trading at $274.96”
“Whose birthday is it today?” (Facebook)
Q: “Nelson Wynn and J Boss are celebrating birthdays today”
“Where is the nearest gas station?”
Q: “Nearest gas at Shell on 2029 S Bascom Ave and 370 E Campbell Ave, 0.2 miles away, for $4.35”
As an enhancement to our existing curated Action Recipes, users can now create Action Recipes on the fly using Compound Command. Simply join two of your favorite actions using “and” into a single command. This lets users create millions of Action Recipe combinations from our database of 4000+ actions.
“Play Migos on Spotify and set volume to max”
“Play NPR and navigate to work”
“Tell Monica I’m boarding the plane now and view my boarding pass”
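To make the idea concrete, here is a minimal sketch of how a compound command could be split into two actions, together with the arithmetic behind the “millions of combinations” figure. It assumes the split happens at the first standalone “and”; the function name `split_compound` is hypothetical. A real system would presumably match each half against known actions, since a naive split breaks on phrases that themselves contain “and”.

```python
import re
from typing import List

def split_compound(command: str) -> List[str]:
    """Split a compound command into at most two actions at the first standalone "and"."""
    parts = re.split(r"\s+and\s+", command.strip(), maxsplit=1, flags=re.IGNORECASE)
    return [part for part in parts if part]

# Ordered pairs that can be drawn from a catalog of 4,000 actions (repeats allowed).
ACTION_COUNT = 4000
ORDERED_PAIRS = ACTION_COUNT ** 2

print(split_compound("Play NPR and navigate to work"))
print(ORDERED_PAIRS)
```

With 4,000 actions there are 4,000 × 4,000 = 16 million ordered pairs, which is consistent with the claim of millions of possible recipes.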
Simply do more with voice! Q Actions is now available on Google Play.
Wonder why you can’t talk to your apps, and why your apps can’t talk back to you? Stop wondering, as Talkback™ in Q Actions does exactly that. Ask “show my tasks” and the system executes the right action (Google Tasks) and, better yet, tells you what your tasks are – safely and hands-free, as you drive your car.
Driving to work and stuck in traffic? Ask “whose birthday is it today?” and hear the short list of your friends celebrating their birthdays (Facebook). You can then say “tell michael happy birthday” to wish Mike (WhatsApp or Messenger). And if you are running low on gas, just say “find me a gas station nearby” and Talkback™ will tell you where the nearest gas station is and how much you’ll pay for a gallon of unleaded fuel.
When an action or a set of actions requires specific input parameters, Directed Dialogue™ allows the user to submit the required information through very simple, natural back-and-forth conversation. Enhanced with parameter validation and user confirmation, Directed Dialogue™ allows complex tasks to be performed with confidence. Directed Dialogue™ is not about open-ended conversations, but about getting things done, simply and efficiently.
With Q Actions, Directed Dialogue™ is automatically enabled for every action in the system because we know the semantic requirements of each and every action’s parameters. It is not constrained to particular actions, and applies to all actions across all verticals.
Another application of Directed Dialogue™ is input refinement. Let’s say I want to purchase batteries. If I just say, “add batteries to my shopping cart” I can get the wrong product added to my cart, as on Alexa, which does the wrong thing for a new product order (the right thing happens on a reorder). In the case of Q Actions, I can provide the brand “Duracell” and the type “9V 4 pack” with very simple Directed Dialogue™, and exactly the right product is added to my cart – in the Amazon or Walmart app.
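The back-and-forth described above resembles what is often called slot filling: the system knows which parameters an action needs, asks only for the missing ones, and validates each answer. The sketch below illustrates that general pattern under stated assumptions; `Parameter`, `Action`, `run_dialogue`, and the `ADD_TO_CART` definition are hypothetical names, not Aiqudo’s implementation, and the user-confirmation step is left out for brevity.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List

@dataclass
class Parameter:
    name: str
    prompt: str                      # question asked when the value is missing
    validate: Callable[[str], bool]  # returns True when the value is acceptable

@dataclass
class Action:
    name: str
    parameters: List[Parameter]

def run_dialogue(action: Action, known: Dict[str, str],
                 ask: Callable[[str], str], max_attempts: int = 3) -> Dict[str, str]:
    """Collect every parameter, prompting only for missing or invalid values."""
    values: Dict[str, str] = {}
    for param in action.parameters:
        value = known.get(param.name)
        attempts = 0
        while value is None or not param.validate(value):
            if attempts >= max_attempts:
                raise ValueError(f"no valid value for {param.name!r}")
            value = ask(param.prompt).strip()
            attempts += 1
        values[param.name] = value
    return values

def non_empty(value: str) -> bool:
    return bool(value.strip())

# Hypothetical definition of the shopping-cart action from the example above.
ADD_TO_CART = Action("add_to_cart", [
    Parameter("product", "What would you like to add?", non_empty),
    Parameter("brand", "Which brand?", non_empty),
    Parameter("pack", "Which type?", non_empty),
])

# Scripted answers stand in for the user's spoken replies.
replies = iter(["Duracell", "9V 4 pack"])
print(run_dialogue(ADD_TO_CART, {"product": "batteries"}, lambda prompt: next(replies)))
```

Because the product is already known from the original command, only the brand and the type are requested, which matches the two follow-up answers in the battery example.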
As the usage model for cars continues to shift away from traditional ownership and leasing to on-demand, ridesharing, and in the future, autonomous vehicle (AV) scenarios, how we think about our personal, in-car experience will need to shift as well.
Unimaginable just a few short years ago, today, we think nothing of jumping into our car and streaming our favorite music through the built-in audio system using our Spotify or Pandora subscription. We also expect the factory-installed navigation system to instantly pull up our favorite or most-commonly used locations (after we’ve entered them) and present us with the best route to or from our current one. And once we pair our smartphone with the media system, we can have our text and email messages not only appear on the onboard screen but also read to us using built-in text-to-speech capabilities. It’s a highly personalized experience in our car.
When we use a pay-as-you-go service, such as Zipcar, we know we’re unlikely to have access to all of the tech comforts of our own vehicle, but we can usually find a way to get our smartphone paired for handsfree calling and streaming music using Bluetooth. If not, we end up using the navigation app on our phone and awkwardly holding it while driving, trying to multitask. It’s not pretty. And when we hail a rideshare, we don’t expect to have access to any of the creature comforts of our own car.
But what if we could?
Just as our relationship to media shifted from an ownership model–CDs or MP3 files on iPods–to subscription-based experiences that are untethered to a specific device but can be accessed anywhere at any time, it’s time to shift our thinking about in-car experiences in the same way.
It’s analogous to accessing your Amazon account and continuing to watch the new season of “True Detective” on the TV at your Airbnb–at the exact episode where you left off last week. Or listening to your favorite Spotify channel at your friend’s house through her speakers.
All your familiar apps (not just the limited Android Auto or Apple CarPlay versions) and your personalized in-car experience–music, navigation, messaging, even video (if you’re a passenger, of course)–will be transportable to any vehicle you happen to jump into, whether it’s a Zipcar, rental car or some version of a rideshare that’s yet to be developed. What’s more, you’ll be able to easily and safely access these apps using voice commands. Whereas today our personal driving environment is tied to our own vehicle, it will become something that’s portable, evolving as our relationship to cars changes over time.
Just on the horizon of this evolution in our relationship with automobiles? Autonomous vehicles, or AVs, in which we become strictly a passenger, perhaps one of several people sharing a ride. Automobile manufacturers today are thinking deeply about what this changing relationship means to them and to their brands. Will BMW become “The Ultimate Riding Machine?” (As a car guy, I personally hope not!) And if so, what will be the differentiators?
Many car companies see the automobile as a new digital platform, for which each manufacturer creates its own, branded, in-car digital experience. In time, when we hail a rideshare or an autonomous vehicle, we could request a Mercedes because we know that we love the Mercedes in-car digital experience, as well as the leather seats and the smooth ride.
What happens if we share the ride in the AV, because, well, they are rideshare applications after all? The challenge for the car companies becomes creating a common denominator of services that define that branded experience while still enabling a high degree of personalization. Clearly, automobile manufacturers don’t want to become dumb pipes on wheels, but if we all just plug in our headphones and live on our phones, comfy seats alone aren’t going to drive brand loyalty for Mercedes. On the other hand, we don’t all want to listen to that one guy’s death metal playlist all the way to the city.
The car manufacturers cannot create direct integrations to all services to accommodate infinite personalization. In the music app market alone there are at least 15 widely used apps, but what if you’re visiting from China? Does your rideshare support China’s favorite music app, QQ? We’ve already made our choices in the apps we have on our phones, so transporting that personalized experience into the shared in-car experience is the elegant way to solve that piece of the puzzle.
This vision of the car providing a unique digital experience is not that far-fetched, nor is it that far away from becoming reality. It’s not only going to change our personal ridesharing experience, but it’s also going to be a change-agent for differentiation in the automobile industry.
Somewhere in the Android Settings lies the option for you to turn on Bluetooth, turn off Wifi, and change sound preferences. These options are usually buried deep under menus and sub-menus. Discoverability is an issue and navigating to the options usually means multiple taps within the Settings app. Yes, there’s a search bar within the Settings app, but it’s clunky, requires typing and only returns exact matches. Some of these options are accessible through the quick settings bar, but discovery and navigation issues still exist.
In the latest release, simply tell Q Actions what System Settings you want to change. Q Actions can now control your Bluetooth, Wifi, music session, and sound settings through voice.
Configure your Settings:
“turn on/off bluetooth”
“turn wifi on/off”
Control your music:
“play next song”
“resume my music”
Toggle your sound settings:
“enable do not disturb”
“increase the volume”
“put my phone on vibrate”
In addition to placing calls to your Contacts, Q Actions helps you manage Contacts via voice. Easily add a recent caller as a contact in your phonebook or share a friend’s contact info with simple commands. If you have your contact’s address in your Contacts, you can also get directions to the address using your favorite navigation app.
Place calls to Contacts:
“call Jason Chen”
“dial Mario on speaker”
Manage and share your Contacts:
“save recent number as Mark Johnson”
“edit Helen’s contact information”
“share contact info of Daniel Phan”
“view last incoming call”
Bridge the gap between your Contacts and navigation apps:
“take me to Rob’s apartment”
“how do I get to Mike’s house?”
Unlock your phone’s potential with voice! Q Actions is now available on Google Play.
You often hear the phrase “Going from 0 to 1” when it comes to the accomplishment of reaching a first milestone – an initial product release, the first user, the first partner, the first sale. Here at Aiqudo, I believe our “0 to 1” moment occurred at the end of the summer in 2017 when we reached our aspirational goal of on-boarding a total of 1000 Actions. It was a special milestone for us as we had built an impressive library of actions across a broad category of apps, using simple software tools, in a relatively short time, with only a handful of devs and interns. For comparison, we were only 5 months in operation and already had one tenth the number of actions as that “premier bookseller in the cloud” company. These were not actions for games and trivia – these were high utility actions in mobile apps that were not available in other voice platforms. On top of that, we did it all without a single app developer’s help – no APIs required. That’s right, no outside help!
So how were we able to accomplish this? Quite simply, we took the information we knew about Android and Android apps and built a set of tools and techniques that allowed us to reach specific app states or execute app functions. Our initial approach provided simple record and replay mechanics allowing us to reach virtually any app state that could be reached by the user. Consequently, actions such as showing a boarding pass for an upcoming flight, locating nearby friends through social media or sending a message could be built, tested, and deployed in a matter of minutes with absolutely no programming involved! But we haven’t stopped there. We also incorporate app-specific and system-level intents whenever possible, providing even more flexibility to the action on-boarding process and our growing library of actions including those that control Alarms, Calendar, Contacts, Email, Camera, Messaging and Phone to name a few. With the recent addition of system level actions, we now offer a catalog of very useful actions for controlling various mobile device settings such as audio controls, display orientation and brightness, wifi, bluetooth, flash and speaker volume.
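As a rough illustration of the record-and-replay idea, an action can be thought of as a recorded list of UI steps that is later replayed, with the user’s spoken parameters substituted in. The sketch below is an assumption-laden simplification: `Step`, `Recorder`, `replay`, the package name, and the element identifiers are all hypothetical, and `perform` stands in for whatever mechanism actually drives the app.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List

@dataclass(frozen=True)
class Step:
    kind: str        # "launch", "tap", or "type"
    target: str      # hypothetical identifier of an app or UI element
    text: str = ""   # text to enter; may contain {placeholders}

class Recorder:
    """Collects steps while someone walks through the app once."""
    def __init__(self) -> None:
        self.steps: List[Step] = []

    def record(self, kind: str, target: str, text: str = "") -> None:
        self.steps.append(Step(kind, target, text))

def replay(steps: List[Step], perform: Callable[[Step], None],
           params: Dict[str, str]) -> None:
    """Replay recorded steps in order, filling placeholders from the command."""
    for step in steps:
        perform(Step(step.kind, step.target, step.text.format(**params)))

# Record a hypothetical "send a message" action once...
recorder = Recorder()
recorder.record("launch", "com.example.messenger")
recorder.record("tap", "new_message")
recorder.record("type", "message_box", "{message}")
recorder.record("tap", "send")

# ...then replay it with the parameter taken from a voice command.
performed: List[Step] = []
replay(recorder.steps, performed.append, {"message": "On my way"})
print(performed[2].text)
```

The point of the sketch is that onboarding an action means recording it once rather than writing app-specific code, which is what makes building an action in minutes plausible.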
Our actions on-boarding process and global actions library solve the action discovery problem that we described in an earlier post. We do the heavy lifting, so all you need to say is “show my actions”, or “show my actions for Facebook” and get going! And you don’t need to register your credentials to invoke your personal actions.
Today our action library is ~4000 strong and supports 7 languages across 12 locales. Not bad for a company less than a year and a half old! We haven’t fully opened up the spigot either!
Of course, all of this would not be possible without the hard work of the Aiqudo on-boarding team whose job, among other things, is to create and maintain Actions for our reference Q Actions app as well as our partner integrations. The team continues to add new and interesting actions to the Aiqudo Action library and optimize and re-onboard actions as needed to maintain a high quality of service.
Check back with us for a follow-on post where we’ll discuss how our team maintains actions through automated testing.
A while back a friend bought an Alexa speaker. He was so excited about the prospect of speaking to his device and getting cool things done without leaving the comfort of his chair. A few weeks later, when I next saw him, I asked how he was getting on with it. His reply was very insightful, and typical of the problems current voice platforms pose.
Initially, when he plugged it in, after asking the typical questions everyone does (‘what is the weather’ and ‘play music by Adele’), he set about seeing what other useful things he could do. He quickly discovered that it wasn’t easy to find out which 3rd party skills were integrated with Alexa (I call this the action discovery problem). When he found a resource that provided this information he went about adding skills – local news headlines, a joke teller, Spotify (requiring registration), quiz questions and so on. Then he hit his next problem – to use these skills he had to learn a very specific set of commands for each one. This was fine for two or three skills but it very soon became overwhelming. He found himself forgetting the precise language to use for each specific skill and soon became frustrated (the cognitive load problem).
Last week when I saw him again he had actually given the speaker to his son, who was using it as a music player in his bedroom. Once the initial ‘fun’ of the device wore off it became apparent that it offered him very little real utility. While some skills had value, it was painful to find out about them in the first place, add them to Alexa and then remember the specific commands to execute them…
The reason I found this so interesting was that these are precisely the problems we have solved at Aiqudo. Our goal is to provide consumers a truly natural voice interface to actions, starting with all the functionality in their phone apps, without having to remember the specific commands needed to execute them. For example, if I want directions to the SAP Centre in San Jose to watch the Sharks I might say, ‘navigate to the SAP Centre’, ‘I want to drive to the SAP Centre’ or ‘directions to the SAP Centre’. Since a user can use any of these commands, or other variants, they should all just work. Constraining users to learn the precise form of a command just frustrates them and provides a poor user experience. To get the maximum utility from voice, we need to understand the meaning and intent behind the command, however the user phrases it, and be able to execute the right action.
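As a rough illustration of why several phrasings should land on the same action, here is a deliberately simple Python sketch. It scores a command by word overlap against a few example phrasings per action. The example phrases, the action names and the 0.5 threshold are all assumptions made for this sketch, and plain word overlap is only a stand-in: it is not how the Aiqudo platform understands meaning and intent.

```python
import re

# Hypothetical example phrasings per action (illustrative only).
ACTION_EXAMPLES = {
    "navigate": ["navigate to", "i want to drive to", "directions to", "take me to"],
    "play_music": ["play music by", "i want to listen to", "put on some"],
}


def tokens(text):
    """Lowercase the text and return its set of words."""
    return set(re.findall(r"[a-z']+", text.lower()))


def best_action(command, threshold=0.5):
    """Return the action whose example phrasing is best covered by the command.

    The score for an example is the fraction of its words that appear in the
    command. Returns None when no action scores above the threshold.
    """
    words = tokens(command)
    scored = []
    for action, examples in ACTION_EXAMPLES.items():
        score = max(len(words & tokens(e)) / len(tokens(e)) for e in examples)
        scored.append((score, action))
    score, action = max(scored)
    return action if score > threshold else None


for command in [
    "navigate to the SAP Centre",
    "I want to drive to the SAP Centre",
    "directions to the SAP Centre",
]:
    print(command, "->", best_action(command))
```

All three phrasings resolve to the same hypothetical navigate action, while an unrelated command such as ‘tell me a joke’ falls below the threshold and matches nothing. A toy like this breaks down quickly on phrasings it has no examples for, which is exactly why understanding intent, rather than matching words, matters.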
So how do we do it?
There is no simple answer, so we plan to cover the main points in a series of blog posts over the coming weeks. These will focus at a high level on the processes, the technology, the challenges and the rationale behind our approach. Our process has 2 main steps:
1. Understand the functionality available in each app and on-board these actions into our Action Index.
2. Understand the intent of a user’s command and then automatically execute the correct action.
In step 1, by doing the ‘heavy lifting’ and understanding the functionality available within the app ecosystem for users, we overcome the action discovery problem my friend had with his Alexa speaker. Users can simply say what they want to do and we find the best action to execute automatically – the user doesn’t need to do anything. In fact if they don’t have an appropriate app on their device for the command they have just issued we actually recommend it to them and they can install it!
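The lookup described above can be sketched in a few lines of Python. Everything here is hypothetical: the index entries, the app names and the resolve function are invented for illustration and say nothing about how our Action Index is actually stored or ranked. The sketch only shows the decision: execute an action in an installed app if one exists, otherwise recommend an app that supports it.

```python
# Hypothetical Action Index: (intent, app) pairs. App names are made up.
ACTION_INDEX = [
    ("navigate", "MapsApp"),
    ("navigate", "RideApp"),
    ("show_boarding_pass", "AirlineApp"),
]


def resolve(intent, installed_apps):
    """Decide what to do for an intent given the apps on the user's device.

    Returns a (decision, app) tuple:
      ("execute", app)            an installed app supports the intent
      ("recommend_install", app)  the intent is known, but no supporting app is installed
      ("unknown", None)           the index has no action for the intent
    """
    candidates = [app for known_intent, app in ACTION_INDEX if known_intent == intent]
    if not candidates:
        return ("unknown", None)
    for app in candidates:
        if app in installed_apps:
            return ("execute", app)
    # In this sketch we simply recommend the first indexed app.
    return ("recommend_install", candidates[0])


print(resolve("navigate", {"RideApp"}))
print(resolve("show_boarding_pass", {"MapsApp"}))
```

The user never has to browse a catalog in this flow: the index is consulted on their behalf, and a missing app turns into a recommendation rather than a dead end.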
Similarly, in step 2, by allowing users the freedom to speak naturally and phrase their commands however they wish, we overcome the second problem with Alexa – the cognitive load problem – users no longer have to remember very specific commands to execute actions. Voice should be the most intuitive user interface – just say what you want to do. We built the Aiqudo platform to understand the wide variety of ways users might phrase their commands, allowing users to go from voice to action easily and intuitively. And did I mention that the Aiqudo platform is multilingual, enabling natural language commands in any language the user chooses to speak in?
So getting back to my initial question – what motivates me to get out of bed in the morning? – well, I’m excited to use technology to bring the utility of the entire app ecosystem to users all over the world so they can speak naturally to their devices and get stuff done without having to think about it!
Intern Voice: Kenny Kang describes the "new normal" for internships.
"As a comparison, my summer roommate interned for a larger corporate company and we developed 2 completely different ideas of what a ‘normal’ working environment is."