
Q Actions for Android: Giving App Notifications a Voice and You More Options


We’re ending a crazy 2020 with something sweet – the release of Q Actions 2.5! In our latest version, we’re proud to announce a couple of unique and useful features: Voice Announcements and new Parameter Options. 

Voice Announcements

Now, app notifications have a Voice! We’ve made it super simple for your apps to talk to you, and for you to follow up … hands-free! The Voice Announcements feature gives you full control: you decide which app notifications are announced, and when. You can select one of our preset time profiles, like Work (8:00 AM – 5:00 PM), or create your own. Specific Voice Announcements for incoming Calls, Texts, WhatsApp, and Twitter let you act on the notification with a follow-on action. We empower users to simply do more with voice. Check out the video below to see hands-free Voice Announcements in action.
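For the technically curious, here is a minimal sketch of how a time-profile filter like this could work; the profile structure and app names below are illustrative, not the actual Q Actions implementation:

```python
from dataclasses import dataclass
from datetime import time
from typing import Set

@dataclass
class AnnouncementProfile:
    """Illustrative time profile: announce only selected apps inside a time window."""
    name: str
    start: time
    end: time
    allowed_apps: Set[str]

    def should_announce(self, app: str, now: time) -> bool:
        # Announce only if the notification's app is enabled and the
        # current time falls inside the profile's window.
        return app in self.allowed_apps and self.start <= now <= self.end

# Example: a "Work" profile that announces Calls and WhatsApp from 8:00 AM to 5:00 PM.
work = AnnouncementProfile("Work", time(8, 0), time(17, 0), {"Calls", "WhatsApp"})
print(work.should_announce("WhatsApp", time(9, 30)))   # True
print(work.should_announce("Twitter", time(9, 30)))    # False
print(work.should_announce("WhatsApp", time(19, 0)))   # False
```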

 

Parameter Options

Q Actions can help users select from among multiple valid options for an action using voice. Our Custom Knowledge knows what actions and content are available on a particular app or service. For example, Q Actions knows an awful lot about popular movies, TV series and music.  So, the next time you feel like watching Star Wars, we can give you a list of titles relevant to the app you’ve selected. Of course, if you know exactly what you want, simply tell Q Actions and it’ll take you straight to that title.
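As a rough sketch of the idea, the snippet below resolves a spoken title against an app’s known catalog, offering options unless there is an exact match; the catalog and matching rule are purely illustrative:

```python
def resolve_title(query, catalog):
    """Illustrative sketch: return candidate titles for a query against an
    app's known catalog. An exact match short-circuits to a single result;
    otherwise every title containing the query is offered as an option."""
    q = query.lower()
    exact = [t for t in catalog if t.lower() == q]
    if exact:
        return exact                                   # go straight to the title
    return [t for t in catalog if q in t.lower()]      # let the user pick

disney_titles = ["Star Wars: A New Hope", "Star Wars: The Empire Strikes Back",
                 "The Mandalorian"]
print(resolve_title("Star Wars", disney_titles))
# ['Star Wars: A New Hope', 'Star Wars: The Empire Strikes Back'] -> presented as options
```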


Do more with voice, hands-free! Q Actions is now available on Google Play.

 

Phillip Lam, Kiran Hemaraj and Sunil Patil


Personalization without Compromising Privacy


If you’ve been following technology news recently, you might have heard that there’s a privacy war brewing.  It should also come as no surprise that the digital assistants you use on a daily basis know a terrifying amount of information about you.  At the same time, there’s no arguing that some of this is ultimately useful to you, as a consumer.  This personal information is used to enable phone calls to your loved ones, or to take you to the right address when you navigate “home”.

As consumers become more privacy-conscious, however, they’re starting to ask if perhaps they’re giving more than they’re getting.  Where do you draw the line?  You might be fine with letting Amazon access your calendar, but what about your Spotify password, or your online banking account?

At Aiqudo, we care deeply about user privacy and providing utility.  Often, this means we need to work that much harder at things that may seem easy or trivial for other digital assistants because they have all this access to your data.  Let’s look at a few of the ways that Aiqudo is able to deliver personal and private Actions for your mobile device.

The first and simplest way that Aiqudo can guarantee privacy is by simply not collecting the data in the first place.  For example, we don’t require you to create an account, or to give us credentials to access any of the apps or services available on our platform – you use your trusted apps as you normally would, e.g., with biometric authentication.  The only information Aiqudo collects is what apps the user has installed on their device, and what the user says, i.e., user commands.  The former is used to personalize and filter our Action results to what is most relevant to that particular user and device.  In addition, Aiqudo uses a randomized identifier to track a user within our system.  This is not tied to any personal information like an email address or phone number.  This identifier is also unique to the Q Actions application, which means that user data from other applications cannot be correlated to Aiqudo user activity either.
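For illustration, here is a minimal sketch of an install-scoped random identifier of this kind (the file name and storage format are hypothetical):

```python
import json
import uuid
from pathlib import Path

ID_FILE = Path("q_actions_state.json")  # illustrative local store

def install_id() -> str:
    """Return a random, app-scoped identifier. It is generated once per
    install and is not derived from any account, email or phone number,
    so it cannot be correlated with activity in other applications."""
    if ID_FILE.exists():
        return json.loads(ID_FILE.read_text())["install_id"]
    new_id = str(uuid.uuid4())
    ID_FILE.write_text(json.dumps({"install_id": new_id}))
    return new_id
```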

What happens to the data that Aiqudo does collect?  Ultimately, only aggregated data is stored for the long term.  This data is valuable for understanding what kinds of queries users ask, or when we may have incorrectly classified an intent.  We do not use this data to track queries made by an individual user or to create a user profile.  Aiqudo is GDPR compliant.

Aiqudo’s Private Data Manager 


However, in some cases we need to know a little bit more about you.  If you’re trying to send your TPS report to Bill, we’d like to be able to identify the right contact to send that critical document to.  So while you may notice that we do ask for access to things like your calendar or your contact list in the Q Actions app, it’s important to know that we never send this information to our servers.  Instead, we annotate user queries with hints to indicate that a certain word or phrase matches a local contact or meeting name.  This improves the accuracy of our intent matching without requiring direct access to personal or private information.
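As a rough illustration, here is a minimal sketch of that client-side annotation step, assuming a toy hint format (the field names and structure are illustrative, not Aiqudo’s actual protocol):

```python
def annotate_with_hints(query: str, local_contacts) -> dict:
    """Illustrative client-side annotation: mark which words of the query
    match a local contact, without ever sending the contact list itself."""
    tokens = query.split()
    hints = []
    for i, token in enumerate(tokens):
        if any(token.lower() == c.lower() for c in local_contacts):
            hints.append({"token_index": i, "hint": "contact"})
    # Only the raw query plus hint positions/types leave the device.
    return {"query": query, "hints": hints}

print(annotate_with_hints("send the TPS report to Bill",
                          ["Bill", "Alice", "Peter"]))
# {'query': 'send the TPS report to Bill',
#  'hints': [{'token_index': 5, 'hint': 'contact'}]}
```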

This approach is simple, but very powerful.  We’ve also added the ability to send hints about previous Actions that a user has run, and their input or output.  For example, if you searched for Chinese restaurants nearby, we might store the resultant list of restaurants on your device.  Then, if you follow up by telling Q Actions “take me to the second one”, we know which restaurant you’re talking about and can start turn-by-turn directions to that address.
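A minimal sketch of how such a follow-up could be resolved against results cached on the device (the data and ordinal handling are illustrative):

```python
# Results from the previous action, cached only on the device.
last_results = [
    {"name": "Golden Wok",    "address": "123 Main St"},
    {"name": "Dragon Palace", "address": "456 Oak Ave"},
    {"name": "Lucky Noodle",  "address": "789 Pine Rd"},
]

ORDINALS = {"first": 0, "second": 1, "third": 2}

def resolve_followup(command: str):
    """Map an ordinal reference ('the second one') to the cached result,
    so a navigation action can be started with its address."""
    for word, index in ORDINALS.items():
        if word in command.lower() and index < len(last_results):
            return last_results[index]
    return None

choice = resolve_followup("take me to the second one")
print(choice)  # {'name': 'Dragon Palace', 'address': '456 Oak Ave'}
```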

That’s not all we can do.  A business listing carries more information than just an address: sometimes we also get review ratings or a phone number.  We can search this information locally when you refer to previous actions that you’ve taken, or when starting a new interaction with Q Actions.  This means we can take that restaurant and send its address to a friend.  Or we can generate options when you say something like, “get me in touch with someone in the Engineering department”.

Another really powerful thing that we can do with our Private Data Manager is understand some of the relevant data in your apps (with your permission, of course), e.g., your Spotify playlists. So if you say “play Calming Acoustic”, which happens to refer to one of your favorite playlists, we kick off this action in Spotify (not Pandora) without you having to explicitly say so; this information stays safe on your device, and within your trusted apps.

Personalized Actions


We’ve talked about how this works with simple, everyday examples, but the functionality we’ve built gives us the unique ability to work with privacy-conscious or sensitive applications in verticals like finance or healthcare.  Partners can also import structured data into the Action Kit (SDK) on the client.  This data is searched whenever a user makes a request, and the user query is annotated with hints, just like contacts or other built-in data types.  Partners have full control over what is stored and when it is updated.

I hope this gives you a better understanding of how we treat private data.  As a company, we firmly believe that users should be able to control the flow of their data, and not feel like it’s being taken hostage because of a handful of useful or maybe even critical features that they have on their phone.  Most users don’t fully understand what data is being collected, or how it can be used in the wrong hands.  It’s our job to educate and put in place sensible safeguards that restrict the flow of private data while still being able to deliver the same level of utility.  We’ve shown that with the right kind of thinking and a little (or a lot) of elbow grease, this is possible, and consumers should demand nothing less.


Natural Voice Recognition for Safety and Productivity in Industrial IoT

Voice for AssetCare



Jim Christian, Chief Technology and Product Officer, mCloud

It is estimated that there are 20 million field technicians operating worldwide. A sizable percentage of those technicians can’t always get to the information they need to do their jobs.

Why is that? After all, we train technicians, provide modern mobile devices and specialized apps, and send them to the field with stacks of information and modern communications. Smartphone sales in the United States grew from $3.8 billion in 2005 to nearly $80 billion in 2020. So why isn’t that enough?

One problem is that tools that work fine in offices don’t necessarily work for field workers. A 25-year-old mobile app designer forgets that a 55-year-old field worker cannot read small text on a small screen or see it in the glare of natural light. In industrial and outdoor settings, field workers frequently wear gloves and other protective gear. Consider a technician who needs to enter data on a mobile device while outside in freezing weather. He could easily choose to wait until he’s back in his truck and can take off his gloves, and as a result not enter the data exactly right. Or a technician wearing gloves may simply find it difficult to type on a mobile device at all.

A voice-based interface can be a great help in these situations. Wearable devices that respond to voice are becoming more common. For instance, RealWear makes a headset that is designed to be worn with a hardhat, and one model is intrinsically safe and can be used in hazardous areas. But voice interfaces have not become popular in industrial settings. Why is that?

We could look to the OODA loop (short for Observe, Orient, Decide, and Act) for insights. The OODA concept was developed by the U.S. Air Force as a mental model for fighter pilots, who need to act quickly. Understanding the OODA loop that applies in a particular situation helps you improve: you can act more quickly and decisively. Field technicians don’t have life-and-death situations to evaluate, but the OODA loop still applies. The speed and accuracy of their work depends on their OODA loop for the task at hand.

Consider two technicians who observe an unexpected situation, perhaps a failed asset. John orients himself by taking off his gloves to call his office, then searching for drawings in his company’s document management system, and then calling his office again to confirm his diagnosis. Meanwhile, Jane orients herself by doing the same search, but talking instead of typing and keeping her eyes on the asset the whole time. Assuming the voice system is robust, Jane can use her eyes and her voice at the same time, accelerating her Observe and Orient phases. Jane will do a faster, better job. A system where the Observe and Orient phases are difficult (John’s experience) is inferior and will be rejected by users, whereas Jane’s experience, with its short, easy OODA loop, will be acceptable.

A downside of speaking to a device is that traditional voice recognition systems can be painfully slow and limited. These systems recognize the same commands that a user would type or click with a mouse, but most people type and click much faster than they talk. Consider the sequence of actions required to take a picture and send it to someone on a smartphone using your fingers: open the photo app, take a picture, close the photo app, open the photo gallery app, select the picture, select the option to share that picture, select a recipient, type a note, and hit send. That could be nine or ten distinct operations. Many people can do this rapidly with their fingers, even if it is a lot of steps. Executing that same sequence with an old-style, separate voice command for each step would be slow and painful, and most people would find it worse than useless.

The solution is natural voice recognition, where the voice system recognizes what the speaker intends and understands what “call Dave” means. Humans naturally understand that a phrase such as “call Dave” is shorthand for a long sequence (“pick up the phone”, “open the contact list”, “search for ‘Dave'”, etc.).  Natural voice recognition has come a long way in recent years and systems like Siri and Alexa have become familiar for personal use. Field workers often have their own shorthand for their industry, like “drop the transmission” or “flush the drum”, which their peers understand but Siri or Alexa don’t.
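As a toy illustration of the idea (the intents and step names below are invented, not any particular product’s), a single natural-language intent can expand into the full sequence of device steps:

```python
# Toy illustration: a single natural-language intent expands into the full
# sequence of device steps that a user would otherwise perform one by one.
INTENT_SEQUENCES = {
    "send a picture to {name}": [
        "open camera", "take picture", "open share sheet",
        "select contact {name}", "attach note", "send",
    ],
    "call {name}": [
        "open contacts", "search for {name}", "start call",
    ],
}

def expand_intent(template: str, **params) -> list:
    """Return the concrete step sequence for a recognized intent."""
    return [step.format(**params) for step in INTENT_SEQUENCES[template]]

print(expand_intent("call {name}", name="Dave"))
# ['open contacts', 'search for Dave', 'start call']
```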

At mCloud, we see great potential in applying natural voice recognition to field work in industries such as oil & gas. Consider a field operator who is given a wearable device with a camera and voice control, and who is able to say things like, “take a picture and send it to John” or “take a picture, add a note ‘new corrosion under insulation at the north pipe rack’ and send to Jane” or “give me a piping diagram of the north pipe rack.”  This worker will have no trouble accessing useful information, and in using that information to orient himself to make good decisions. An informed field operator will get work done faster, with less trouble, and greater accuracy.

The U.S. Chemical Safety Board analyzes major safety incidents at oil & gas and chemical facilities. A fair number of incidents have a contributing factor of field workers not knowing something or not having the right information. For instance, an isobutane release at a Louisiana refinery in 2016 occurred in part when field operators used the wrong procedure to remove the gearbox on a plug valve. There was a standard procedure, but about 3% of the plug valves in the refinery were an older design that required different steps to remove the gearbox. This is an example where the field workers were wearing protective gear and followed the procedure that was correct for a different type of valve but wrong for the valve in front of them. Field workers like this generally have written procedures, but occasionally the work planner misses something or reality in the field is different from what was expected. This means that field workers need to adapt, perhaps by calling for help or looking up information such as alternate procedures.

Examples where natural voice recognition can help include finding information, calling other people for advice, recording measurements and observations, inspecting assets, stepping through repair procedures, describing the state of an asset along with recommendations and questions, writing a report about the work done, and working with other people to accomplish tasks. Some of these examples are ad hoc tasks, like taking a picture or deciding to call someone. Other examples are part of larger, structured jobs. An isolation procedure in a chemical plant or replacing a transmission are examples of complex procedures with multiple steps that can require specialized information or where unexpected results from one step may require the field worker to get re-oriented, find new information, or get help.

Aiqudo has powerful tech for natural voice recognition and mCloud is pleased to be working with Aiqudo to apply this technology. Working together, we can help field workers get what they need by simply asking for it, talk to the right people by simply asking for help, confirm their status in a natural way, and in general get the right job done, effectively and without mistakes.


This post is authored by Jim Christian, Chief Technology and Product Officer, mCloud.

Aiqudo and mCloud recently announced a strategic partnership that brings natural Voice technology into emerging devices, such as smart glasses, to support AR/VR and connected worker applications, and also into new domains such as industrial IoT, remote support and healthcare.


Enhancing Aiqudo’s Voice AI Natural Language Understanding with Deep Learning


Aiqudo provides the most extensive library of voice-triggered actions for mobile apps and other IoT devices. At this difficult time of COVID-19, voice is becoming essential as more organizations see the need for contactless interactions. To further improve the performance of Aiqudo voice, we enhanced our unique Intent Matching using Semiotics with Deep Learning (DL) for custom Named Entity Recognition (NER) and Part of Speech (POS) tagging.

The task in question was to recognize the relevant named entities in users’ commands. This task is known as Named Entity Recognition (NER) in the Natural Language Processing (NLP) community. For example, ‘play Adele on YouTube’ involves two named entities, ‘Adele’ and ‘YouTube’. Extracting both entities correctly is critical for understanding the user’s intent, retrieving the right app and executing the correct action. Publicly available NER tools, such as NLTK, spaCy and Stanford NLP, proved unsuitable for our purposes for the following reasons:

  1. They often make mistakes, especially when processing the short sentences typical of user commands.
  2. They assign generic labels, such as ‘Organization’ for ‘YouTube’ and ‘Person’ for ‘Adele’, rather than the entity types we need in this command context, which are ‘App’ and ‘Artist’.
  3. These tools don’t provide the granularity we need. Because we support a very broad set of verticals and domains, our granularity needs for parameter types are very high: we need to identify almost 70 different parameter types in total (and this continues to grow). It’s not enough for us to identify a parameter as an ‘Organization’; we need to know whether it is a ‘Restaurant’, a ‘Business’ or a ‘Stock ticker’.

Part of Speech (POS) tagging is another essential aspect of both NER detection and action retrieval, but, again, public POS taggers such as NLTK, spaCy and Stanford NLP don’t work well for short commands. The situation gets worse for verbs such as ‘show’, ‘book’, ‘email’ and ‘text’, which are normally tagged as nouns by most existing POS taggers. We therefore needed to develop our own custom NER module that also produces more accurate POS information.

Fortunately, we already had a database of 13K+ commands relating to actions in our platform, and this provided the training data to build an integrated DL model. Example commands (with parameters extracted) in our database included ‘play $musicQuery on $mobileApp’, ‘show my $shoppingList’ and ‘navigate from $fromLocation to $toLocation’ (our named entity types start with ‘$’). For each entity, we created a number of realistic values, such as ‘grocery list’ and ‘DIY list’ for ‘$shoppingList’, and ‘New York’ and ‘Atlanta’ for ‘$fromLocation’. We created around 3.7 million instantiated queries, e.g., ‘play Adele on YouTube’, ‘show my DIY list’ and ‘navigate from New York to Atlanta’. We then used existing POS tools to label all words, chose the most popular POS pattern for each template, and finally labelled each relevant query accordingly.
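As an illustration of this template-expansion step, here is a minimal sketch (templates and sample values are abbreviated, and the labelling scheme shown is a simplification of the one described above):

```python
from itertools import product

# Command templates from the action library; tokens starting with '$' are
# named-entity slots.
templates = [
    "play $musicQuery on $mobileApp",
    "navigate from $fromLocation to $toLocation",
]

# A few realistic sample values per entity type (illustrative).
entity_values = {
    "$musicQuery":   ["Adele", "jazz classics"],
    "$mobileApp":    ["YouTube", "Spotify"],
    "$fromLocation": ["New York", "Atlanta"],
    "$toLocation":   ["Boston", "Miami"],
}

def instantiate(template):
    """Yield (query, labels) pairs where each word is tagged with its slot
    name or 'O' for words outside any entity."""
    slots = [t for t in template.split() if t.startswith("$")]
    for combo in product(*(entity_values[s] for s in slots)):
        fill = dict(zip(slots, combo))
        words, labels = [], []
        for token in template.split():
            if token.startswith("$"):
                for w in fill[token].split():
                    words.append(w)
                    labels.append(token)
            else:
                words.append(token)
                labels.append("O")
        yield " ".join(words), labels

for query, labels in instantiate(templates[0]):
    print(query, labels)
# e.g. "play Adele on YouTube" ['O', '$musicQuery', 'O', '$mobileApp']
```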

To make the data understandable to a neural network, we then needed to represent each word or token digitally, i.e., as vectors of a certain dimension. This is called word embedding. We tried several embedding methods, including the Transformer tokenizer, ELMo, Google 300d, GloVe, and random embeddings of different dimensions. A pre-trained Transformer produced the best results but required the most expensive computing resources, such as a GPU. ELMo produced the second-best results but also needed a GPU for efficient computation. Random embeddings of 64 dimensions work well on a CPU and produce results comparable to ELMo, while being less expensive. Such tradeoffs are critical when you go from a theoretical AI approach to rolling AI techniques into production at scale.

Our research and experiments were based on the state-of-the-art DL NER architecture of a residual bidirectional LSTM. We integrated two related tasks: POS tagging and multi-label, multi-class classification of potential entity types. Our present solution is therefore a multi-input, multi-output DL model. The neural architecture and data flow are illustrated in Fig. 1. The input module takes the user’s speech and transforms it into text; the embedding layer represents the text as a sequence of vectors; the two bidirectional layers capture important recurrent patterns in the sequence; the residual connection restores some lost features; these patterns and features are then used for labelling named entities and creating POS tags, or are flattened for global classification of entity (parameter) types.


Fig. 1 Neural architecture for Aiqudo Multitask Flow
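For readers who want a concrete picture, here is a minimal Keras sketch in the spirit of Fig. 1; the vocabulary size, sequence length, layer widths and tag counts are illustrative placeholders, not the production values:

```python
from tensorflow.keras import layers, Model

VOCAB_SIZE, MAX_LEN = 20000, 20                      # illustrative sizes
N_NER_TAGS, N_POS_TAGS, N_PARAM_TYPES = 70, 20, 70   # illustrative tag inventories

tokens = layers.Input(shape=(MAX_LEN,), name="tokens")
emb = layers.Embedding(VOCAB_SIZE, 64)(tokens)        # random 64-d embeddings

bi1 = layers.Bidirectional(layers.LSTM(64, return_sequences=True))(emb)
bi2 = layers.Bidirectional(layers.LSTM(64, return_sequences=True))(bi1)
seq = layers.Add()([bi1, bi2])                        # residual connection

# Per-token outputs: named-entity labels and POS tags.
ner_out = layers.TimeDistributed(layers.Dense(N_NER_TAGS, activation="softmax"),
                                 name="ner")(seq)
pos_out = layers.TimeDistributed(layers.Dense(N_POS_TAGS, activation="softmax"),
                                 name="pos")(seq)

# Global multi-label output: which parameter types occur in the command.
flat = layers.Flatten()(seq)
type_out = layers.Dense(N_PARAM_TYPES, activation="sigmoid", name="param_types")(flat)

model = Model(inputs=tokens, outputs=[ner_out, pos_out, type_out])
model.compile(optimizer="adam",
              loss={"ner": "sparse_categorical_crossentropy",
                    "pos": "sparse_categorical_crossentropy",
                    "param_types": "binary_crossentropy"})
model.summary()
```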

One real-life scenario would be as follows: a user wants to greet his friend Rodrigo on WhatsApp. He issues the following command verbally to his phone: ‘WhatsApp text Rodrigo good morning’ (not a well-formed command, but this is common in practice). Each word in his speech is mapped to a token integer, by which a 64-dimensional vector is indexed; the digital representation of all vectors goes through the neural network of two bidirectional LSTM layers and one residual connection layer; the network outputs parameter/value pairs and POS tags as a sequence, and, on another branch, is flattened and outputs parameter types. Our platform now has all the information needed to pass on to the next Natural Language Understanding (NLU) component in our system (see Fig. 2), to fully understand the user’s intent and execute the correct action for them.


Fig. 2 Aiqudo Online Intent Pipeline

Before we could go live in production, we needed to test the performance of the pipeline thoroughly. We devised 600k test scenarios that spanned 114 parameter distributions, covering command lengths from very short 2-term commands to much longer 15-term commands. We also focused on out-of-vocabulary parameter terms (terms that do not occur in the training data, such as names of cities and movies) to ensure that the model could handle these as well.

Analysis of this approach in conjunction with the Aiqudo platform showed how it improved platform performance: the general entity recall ratio increased by over 10%. This integrated multitask model fits Aiqudo’s requirements particularly well:

  1. The model was trained on our own corpus and produces entities and POS tags compatible with our on-boarded mobile app commands
  2. The three relevant tasks share most hidden layers and better weight optimization can therefore be achieved very efficiently
  3. The system can be easily adapted to newly on-boarded actions by expanding or adjusting the training corpus and/or annotating tags
  4. The random embedding model runs fast enough even on CPUs and produces much better results than publicly available NLP tools

We plan to continue to use DL where appropriate within our platform to complement and augment our existing Semiotics-based NLU engine. Possible future work includes: 

  1. extending the solution to other languages (our system has commands onboarded in several languages to use for training)
  2. leveraging the POS tagging information and multi-label outputs, which haven’t been explicitly utilized yet, to further improve NER performance
  3. expanding the DL model by integrating it with other subtasks, such as predicting relevant mobile apps from commands and/or actions

This flexible combination of Semiotics, Deep Learning and grammar-based algorithms will power even more capable Aiqudo voice services in the future.

Xiwu Han, Hudson Mendes and David Patterson – Aiqudo R&D


QTime: What I Learned as an Aiqudo Intern



Intern Voice: Mithil Chakraborty

Hi! My name is Mithil Chakraborty and I’m currently a senior at Saratoga High School. During the summer of 2020, I had the privilege of interning at Aiqudo for 6 weeks as a Product Operations intern. Although I had previously coded in Java, HTML/JavaScript, and Python, this was still my first internship at a company. Coming in, I was excited but a bit uncertain, thinking that I would not be able to fully understand the core technology (Q System) or how the app’s actions are created. But even amidst the COVID-19 pandemic, I learned a tremendous amount, not only about onboarding and debugging actions, but also about how startups work; the drive from each of the employees was admirable and really stood out to me. As the internship progressed, I felt like a part of the team. Phillip, Mark, and Steven did a great job making me feel welcome and explaining the Q Tools program, Q App, and onboarding procedures.

As I played around with the app, I realized how cool its capabilities were. During the iOS stage of my internship, I verified and debugged numerous iOS Q App actions and contributed to the latest release of the iOS Q Actions app. From there, I researched new actions to onboard for Android, focusing on relevant information and new apps. As a result, I proposed actions that would display COVID-19 information in Facebook and open Messenger Rooms. Through this process, I also learned how to implement Voice Talkback for the Facebook COVID-19 info action, using Android Device Monitor and Q Tools. The unique actions I finally onboarded included:

  • “show me coronavirus info” >> reads back the first 3 headlines in the COVID-19 Info Center pane on Facebook
  • “open messenger rooms” >> creates and opens a Messenger Room

Covid Information

Users don’t have to say an exact phrase in order for the app to execute the correct action; the smart AI-based intent matching system will only run the relevant actions from Facebook or Messenger based on the user’s query.  The user does not even have to mention the app by name – the system picks the right app automatically.

When these actions finally got implemented, it felt rewarding to see my work easily accessible on smartphones; thereafter, I told my friends and family about the amazing Q Actions app so they could see my work. Throughout my Aiqudo internship, the team was incredibly easy to talk to and they always encouraged questions. It showed me the real-life applications of software engineering and AI, which I hadn’t been exposed to before, and the importance of collaboration and perseverance, especially when I was debugging pesky actions for iOS. This opportunity taught me in a hands-on way the business and technical skills needed for a startup like Aiqudo to be nimble and successful, which I greatly appreciated. Overall, my time at Aiqudo was incredibly memorable and I hope to be back soon.

Thank you Phillip, Mark, Steven, Rajat and the rest of the Aiqudo team for giving me this valuable experience this summer! 


mCloud Brings Natural Language Processing to Connected Workers through Partnership with Aiqudo


CANADA NEWSWIRE, VANCOUVER, OCTOBER 1, 2020

mCloud Technologies Corp. (TSX-V: MCLD) (OTCQB: MCLDF) (“mCloud” or the “Company”), a leading provider of asset management solutions combining IoT, cloud computing, and artificial intelligence (“AI”), today announced it has entered into a strategic partnership with Aiqudo Inc. (“Aiqudo”), leveraging Aiqudo’s Q Actions® Voice AI platform and Action Kit SDK to bring new voice-enabled interactions to the Company’s AssetCare™️ solutions for Connected Workers.

By combining AssetCare with Aiqudo’s powerful Voice to Action® platform, mobile field workers will be able to interact with AssetCare solutions through a custom digital assistant using natural language.


In the field, industrial asset operators and field technicians will be able to communicate with experts, find documentation, and pull up relevant asset data instantly and effortlessly. This will expedite the completion of asset inspections and operator rounds – an industry-first using hands-free, simple, and intuitive natural commands via head mounted smart glasses. Professionals will be able to call up information on-demand with a single natural language request, eliminating the need to search using complex queries or special commands.

Here’s a demonstration of mCloud’s AssetCare capabilities on smart glasses with Aiqudo.

“mCloud’s partnership with Aiqudo provides AssetCare with a distinct competitive edge as we deliver AssetCare to our oil and gas, nuclear, wind, and healthcare customers all around the world,” said Dr. Barry Po, mCloud’s President, Connected Solutions and Chief Marketing Officer. “Connected workers will benefit from reduced training time, ease of use, and support for multiple languages.”

“We are excited to power mCloud solutions with our Voice to Action platform, making it easier for connected workers using AssetCare to get things done safely and quickly,” said Dr. Rajat Mukherjee, Aiqudo’s Co-Founder and CTO. “Our flexible NLU and powerful Action Engine are perfect for creating custom voice experiences for applications on smart glasses and smartphones.”

Aiqudo technology will join the growing set of advanced capabilities mCloud is now delivering by way of its recent acquisition of kanepi Group Pty Ltd. (“kanepi”). The Company announced on September 22 it expected to roll out new Connected Worker capabilities to 1,000 workers in China by the end of the year, targeting over 20,000 in 2021.

BUSINESSWIRE:  mCloud Brings Natural Language Processing to Connected Workers through Partnership with Aiqudo

Official website: www.mcloudcorp.com  Further Information: mCloud Press 

About mCloud Technologies Corp.

mCloud is creating a more efficient future with the use of AI and analytics, curbing energy waste, maximizing energy production, and getting the most out of critical energy infrastructure. Through mCloud’s AI-powered AssetCare™ platform, mCloud offers complete asset management solutions in five distinct segments: commercial buildings, renewable energy, healthcare, heavy industry, and connected workers. IoT sensors bring data from connected assets into the cloud, where AI and analytics are applied to maximize their performance.

Headquartered in Vancouver, Canada with offices in twelve locations worldwide, the mCloud family includes an ecosystem of operating subsidiaries that deliver high-performance IoT, AI, 3D, and mobile capabilities to customers, all integrated into AssetCare. With over 100 blue-chip customers and more than 51,000 assets connected in thousands of locations worldwide, mCloud is changing the way energy assets are managed.

mCloud’s common shares trade on the TSX Venture Exchange under the symbol MCLD and on the OTCQB under the symbol MCLDF. mCloud’s convertible debentures trade on the TSX Venture Exchange under the symbol MCLD.DB. For more information, visit www.mcloudcorp.com.

About Aiqudo

Aiqudo’s Voice to Action® platform voice-enables applications across multiple hardware environments, including mobile phones, IoT and connected-home devices, automobiles, and hands-free augmented reality devices. Aiqudo’s Voice AI comprises a unique natural language command understanding engine, the largest Action Index and action execution platform available, and the company’s Voice Graph analytics platform, which drives personalization based on behavioral insights. Aiqudo powers customizable white-label voice assistants that give our partners control of their voice brand and enable them to define their users’ voice experience. Aiqudo currently powers the Moto Voice digital assistant experience on Motorola smartphones in 7 languages across 12 markets in North and South America, Europe, India and Russia. Aiqudo is based in Campbell, CA, with offices in Belfast, Northern Ireland.

SOURCE mCloud Technologies Corp.

For further information:

Wayne Andrews, RCA Financial Partners Inc., T: 727-268-0113, wayne.andrews@mcloudcorp.com; Barry Po, Chief Marketing Officer, mCloud Technologies Corp., T: 866-420-1781


A Classifier Tuned to Action Commands


One thing we have learned through our journey of building the Q Actions® Voice platform is that there are few things as unpredictable as what users will say to their devices. These range from noise or nonsense queries (utterances with no obvious intent, such as “this is really great”) to genuine queries such as “when does the next Caltrain leave for San Francisco”. We needed a way to filter the noise before passing genuine queries to Q Actions. As we thought about this further, we decided to categorize incoming queries into the following 4 classes:

  • Noise or nonsense commands
  • Action Commands that Apps were best suited to answer (such as the Caltrain query above)
  • Queries that were informational in nature, such as “how tall is Tom Cruise”
  • Mathematical queries – “what is the square root of 2024”.

This classifier would enable us to route each query internally within our platform to provide the best user experience. So we set about building a 4-class classifier for Noise, App, Informational & Math. Since we have the world’s largest mobile Action library, and Action commands are our specialty, it was critical to attain as high a classification accuracy as possible for the App type so we route as many valid user commands as possible to our proprietary Action execution engine.

We initially considered a number of different approaches when deciding on the best technology for this. These included convolutional and recurrent networks, a 3-layer Multilayer Perceptron (MLP), and Transformer models such as BERT and ALBERT, plus one we trained ourselves so we could assess the impact of different hyperparameters (number of heads, depth, etc.). We also experimented with different ways to embed the query information within the networks, such as word embeddings (Word2vec and GloVe) and sentence embeddings such as USE and NNLM.

We created a number of data sets with which to train and test the different models. Our goal was to identify the best classifier to deploy in production, as determined by its ability to accurately classify the commands in each test set. We used existing valid user commands for our App Action training and test data sets. Question datasets were gathered from sources such as Kaggle, Quora and Stanford QA. Mathematical queries were generated using a program written in-house and from https://github.com/deepmind/mathematics_dataset. Noise data was obtained from actual noisy queries in our live traffic from powering Motorola’s Moto Voice Assistant. All this data was split into training and test sets and used to train and test each of our models. The following table shows the size of each data set.

Dataset        Training set size   Test set size
APP            1,794,616           90,598
Noise          71,201              45,778
Informational  128,180             93,900
Math           154,518             22,850

The result of our analysis was that the 3-layer MLP with USE embeddings provided the best overall classification accuracy across all 4 categories.

The architecture of this classifier is shown in the following schematic. It gives a posterior probabilistic classification for an input query.


Figure 1  Overview of the model

In effect, the network consists of two components: an embedding layer followed by a 3-layer feed-forward MLP. The first layer consists of N dense units, the second of M dense units (where M < N), and the output is a softmax function, which is typically used for multi-class classification and assigns a probability to each class. As can be seen from Figure 1, the “APP” class has the highest probability and would be the model’s prediction for the command ‘Call Bill’.

The embedding layer relies on a TensorFlow Hub module, which has two advantages:

  • we don’t have to worry about text preprocessing
  • we can benefit from transfer learning (utilizing a model pre-trained on a large volume of data, often based on Transformer techniques for text classification)

The Hub module used is based on the Universal Sentence Encoder (USE), which gives us a rich semantic representation of queries and can also be fine-tuned for our task. USE is much more powerful than word-embedding approaches, as it can embed not only words but also phrases and sentences. It is trained on a variety of data sources and a variety of tasks, with the aim of dynamically facilitating a wide diversity of natural language understanding tasks. The output from this embedding layer is a 512-dimensional vector.

We expect similar sentences to have similar embeddings, as shown in the following heatmap, where the more similar two sentences are, the darker the color. Similarity is based on the cosine similarity of the vectors. The heatmap demonstrates the strong similarity between two APP commands (‘view my profile’, ‘view my Facebook profile’), two INFORMATIONAL queries (‘What is Barack Obama’s age’, ‘How old is Barack Obama’) and two MATH queries (‘calculate 2+2’, ‘add 2+2’).


Figure 2  Semantic similarity
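A similarity matrix like the one behind Figure 2 can be computed with the publicly available USE module on TensorFlow Hub, as sketched below (the public module may differ from the fine-tuned model used in production):

```python
import numpy as np
import tensorflow_hub as hub

# Public Universal Sentence Encoder module.
embed = hub.load("https://tfhub.dev/google/universal-sentence-encoder/4")

sentences = ["view my profile", "view my Facebook profile",
             "What is Barack Obama's age", "How old is Barack Obama",
             "calculate 2+2", "add 2+2"]

vectors = embed(sentences).numpy()                         # each row is a 512-d vector
vectors = vectors / np.linalg.norm(vectors, axis=1, keepdims=True)
similarity = vectors @ vectors.T                           # cosine similarity matrix

print(np.round(similarity, 2))  # darker heatmap cells correspond to values near 1.0
```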

The MLP’s two hidden layers consist of N=500 and M=100 units.  If a model has more hidden units (a higher-dimensional representation space), and/or more layers, then the network can learn more complex representations. However, it makes the network more computationally expensive and may lead to learning unwanted patterns—patterns that improve performance only in terms of the training data (overfitting) but degrade generalization (poorer performance on the test data). This is why it is important to ensure MLP settings are chosen based on the performance on a range of unseen test sets.
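As a rough sketch, a classifier of this shape can be assembled from the public USE module and Keras; the module URL, optimizer and losses below are illustrative choices rather than the exact production configuration:

```python
import tensorflow as tf
import tensorflow_hub as hub
from tensorflow.keras import layers, Model

CLASSES = ["APP", "NOISE", "INFORMATIONAL", "MATH"]

# Public USE module as the (fine-tunable) embedding layer.
use_layer = hub.KerasLayer("https://tfhub.dev/google/universal-sentence-encoder/4",
                           input_shape=[], dtype=tf.string, trainable=True)

query = layers.Input(shape=(), dtype=tf.string, name="query")
x = use_layer(query)                          # 512-d sentence embedding
x = layers.Dense(500, activation="relu")(x)   # N = 500
x = layers.Dense(100, activation="relu")(x)   # M = 100
out = layers.Dense(len(CLASSES), activation="softmax")(x)

model = Model(query, out)
model.compile(optimizer="adam", loss="sparse_categorical_crossentropy",
              metrics=["accuracy"])

# After training, the model yields a posterior distribution over the four classes:
probs = model(tf.constant(["call Bill"])).numpy()[0]
print(dict(zip(CLASSES, probs.round(3))))   # highest probability expected for 'APP' once trained
```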

In terms of overall performance, our model gives us an accuracy of 98.8% for APP, 86.9% for Informational, 83.5% for Mathematical and 52.3% for Noise. From this it can be seen that we achieved our goal of correctly classifying almost all App Action commands. Informational and Mathematical commands were also classified with a high degree of accuracy, while Noise was the worst-performing class. The reason is that Noise is very difficult to define: it can range from grammatically correct sentences with no relevance to the other 3 categories (such as “the weather is hot today”) to complete random nonsense, which makes it very hard to build a representative training set in advance. We are still working on this aspect of our classifier and plan to improve its performance on this category with better training data.

Niall Rooney and David Patterson

Q Actions 1.6.2 just released to App Store!


New Q Actions version now in the App Store

This version of Q Actions features contextual downstream actions, calendar integration, and under-the-bonnet improvements to our matching engines. Q Actions helps users power through their day by being more useful and thoughtful.

Contextual Awareness

Q Actions understands the context when performing your actions. Let’s say you call a contact in your phonebook with the command “call Tiffany”. You can then follow up with the command “navigate to her house”. Q Actions is aware of the context from your previous command and is able to use that information in a downstream action, as sketched after the example below.


  • say “call Tiffany”
    • then “navigate to her house”
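A minimal sketch of the session-context idea behind this follow-up (the command handling below is a toy, not the Q Actions engine):

```python
# Minimal sketch of session context for downstream actions.
session = {}

def run_command(command: str) -> str:
    cmd = command.lower()
    if cmd.startswith("call "):
        contact = command[5:]
        session["last_contact"] = contact          # remember the entity for follow-ups
        return f"calling {contact}"
    if "her house" in cmd or "his house" in cmd:
        contact = session.get("last_contact")
        if contact:
            return f"starting navigation to {contact}'s home address"
    return "sorry, I didn't get that"

print(run_command("call Tiffany"))            # calling Tiffany
print(run_command("navigate to her house"))   # starting navigation to Tiffany's home address
```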

Calendar Integration


Stay on top of your schedule and daily events with the recently added Calendar actions. Need to see what’s coming up next? Just ask “when is my next meeting?” and Q Actions will return a card with all the important event information. Need to quickly schedule something on your calendar? Say “create a new event” and after a few questions, your event is booked. On the go and need to join a video conferencing meeting? Simply say “join my next meeting” and Q Actions will take you directly to your meeting in Google Meet. All you have to do from there is confirm your camera/audio settings and join!

  • “when is my next meeting?”
  • “create a new event”
  • “join my next meeting”

Simply do more with voice! Q Actions is now available on the App Store.


What can you do with that Thing?


Often, when you have something to do, you start by searching for information about a particular Thing. Sometimes, you know exactly what that Thing is, but often, you find the Thing by using information related to it. 

“Who is Taylor Swift?” → Taylor Swift

“Who directed Avatar?” → James Cameron

The “Thing” is what we call a Knowledge Entity and something that you can do with that Thing is what we call a Downstream Action. The bond between that Knowledge Entity and the Downstream Action is what we refer to as Actionable Knowledge.

Actionable Knowledge

How do we do this? Our Knowledge database holds information about all kinds of entities such as movies, TV series, athletes, corporations etc. These Entities have rich semantic structure; we have detailed information about the different attributes of these Entities along with the Actions one can perform on those entities. An Action may be generic (watch a show), but can also be explicitly connected to a mobile app or service (watch the show on Disney+). This knowledge allows the user to follow up on an Entity command with an Action. 

For example, asking a question such as “How tall is Tom Brady?” gets you his height, i.e., 6’ 4” or 1.93 metres (based on the Locale of the person asking), since Knowledge captures these important attributes about Tom Brady. Note that these attributes are different for different types of Entities. That is determined by the Schema of the Entity, which allows validation, normalization and transformation of data.

A command like “Who is Tom Brady?” returns a Q Card with information about Tom Brady, as shown below. As there may be multiple entities referring to “Tom Brady”, a popularity measure is computed so that the correct Tom Brady is returned, based on popularity, context and your current session. Popularity is a special attribute that is computed from multiple attributes of the entity. An Entity Card surfaces the various attributes associated with the entity, such as when Tom Brady was born, how tall and heavy he is, and what sport he plays. There are also attributes that define potential Actions that can follow, so “go to his Instagram” will instantly take you to Tom Brady’s account in the Instagram app.

Q Card for Tom Brady
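To make the idea concrete, here is a minimal sketch of what an Actionable Knowledge entity could look like, with schema-typed attributes, a popularity score and downstream Actions; all attribute and field names are illustrative:

```python
# Illustrative shape of an Actionable Knowledge entity: typed attributes
# plus the downstream Actions that can follow from it.
tom_brady = {
    "type": "Athlete",
    "name": "Tom Brady",
    "attributes": {
        "height_m": 1.93,        # normalized by the schema; rendered per locale (6' 4")
        "sport": "American football",
        "instagram_handle": "tombrady",
    },
    "popularity": 0.97,          # computed from multiple attributes
    "actions": [
        {"say": "go to his Instagram", "app": "Instagram",
         "uses_attribute": "instagram_handle"},
    ],
}

def height_for_locale(entity: dict, locale: str) -> str:
    """Render the normalized height attribute for the asker's locale."""
    metres = entity["attributes"]["height_m"]
    if locale == "en_US":
        inches = round(metres / 0.0254)
        return f"{inches // 12}' {inches % 12}\""
    return f"{metres} m"

print(height_for_locale(tom_brady, "en_US"))  # 6' 4"
```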

Actions are about getting things done! Here’s another example of being able to instantly go from information to Action using Actionable Knowledge. Asking “Who is Tom Petty?” followed by a command “listen to him on Spotify” will start playing his music. This is a powerful feature that provides a great user experience and rapid Time to Action®.

Q Card for Tom Petty

The three pillars of Aiqudo’s Q Actions Platform allow us to implement downstream Actions:

  1. Semantically rich Entities in Actionable Knowledge
  2. AI-based Search
  3. Powerful Action execution engine for mobile apps and cloud services

AI Search

We are not limited to just the name of the entity. Our AI-based search allows you to find entities using their various attributes. For example, you can search for stock information by saying “How is Tesla stock doing today?” or “Show me TSLA stock price”. Aiqudo understands both the corporation name and the stock ticker when it needs to find information on a company’s stock price. Some apps, like Yahoo Finance, can only understand the stock ticker; they may not be built to accept the name of the company as an input. Our platform fills this gap by decoupling action execution from search intent detection. A middle-tier federation module acts as a bridge between intent extraction and Action execution by mapping the attributes of the Entity returned by the search to those required by the Action execution engine. In the above example, it extracts the stockTicker attribute (TSLA) from the corporation entity retrieved by the search (Tesla) and feeds it to the Action engine.
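A minimal sketch of that federation step, using illustrative entity and action definitions:

```python
# Entity returned by AI-based search for "How is Tesla stock doing today?"
tesla = {"type": "Corporation", "name": "Tesla", "stockTicker": "TSLA"}

# Action definition: which parameters the app's action execution needs.
yahoo_finance_quote = {"app": "Yahoo Finance", "action": "show_quote",
                       "required_params": ["stockTicker"]}

def federate(entity: dict, action: dict) -> dict:
    """Bridge search and execution: extract from the entity exactly the
    attributes the action requires."""
    return {p: entity[p] for p in action["required_params"]}

params = federate(tesla, yahoo_finance_quote)
print(params)  # {'stockTicker': 'TSLA'} -> fed to the Action engine
```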

Q Card for Tesla Stock

Voila! Job done!

So, what can you do with that Thing? Well, you can instantly perform a meaningful Action on it using the apps on your mobile phone. In the example above, you can jump to Yahoo News to get the latest finance news about Tesla, or go to the stock quote screen within E*Trade, the app you use and trust, to buy Tesla shares and make some money!


Accessibility plus utility plus convenience!


It’s great to see various platforms announce specific accessibility features on this Global Accessibility Awareness Day.

A feature that caught our attention today was Google’s Assistant-powered Action Blocks.

It’s a new app that allows users to create simple shortcuts to Actions they commonly perform. The shortcuts are powered by Google Assistant but can be invoked with a tap.

My Actions and Favorites

We built this functionality into Aiqudo’s Q Actions when we launched it in 2017. Our approach is different in several ways:

  • The user does not need to do any work; Q Actions does it automatically for the user
  • Q Actions builds these dynamically – your most recently used Actions and your favorite ones are automatically tracked for you – you just need to say “show my actions”
  • These handy Action shortcuts are available with one swipe to the right in the Q Actions app; one tap invokes your favorite action.
  • There’s no new app just for accessibility – it’s built into your Assistant interface for convenience – you just need to say “Hello Q, show my Actions”
  • There are hundreds of unique high-utility Actions you can perform that are not available on any other platform, including Google Assistant. Here are a few examples:
    • “whose birthday is it today?” (Facebook)
    • “show my orders” (Amazon, Walmart)
    • “start meditating” (Headspace)
    • “watch the Mandalorian” (Disney+)
    • “watch Fierce Queens” (Quibi)
    • “show my tasks” (Microsoft To Do, Google Tasks)
    • “show my account balances” (Etrade)
    • “join my meeting with my camera off” (Google Hangouts, Zoom)
    • “call Mark” (Whatsapp, Messenger, Teams, Slack,…)
    • “send money to John” (PayPal)
    • …and on and on and on

It’s just easier, better and more powerful! 

And available to everyone!