Intern Voice: Kenny Kang
Working at a startup can best be described as ‘interesting’, but in the best way possible. For comparison, my summer roommate interned at a large corporate company, and we developed two completely different ideas of what a ‘normal’ working environment is. Apparently, it isn’t ‘normal’ for an internship project to become a feature in the company’s main product. It also isn’t ‘normal’ to have the opportunity to present directly to C-suite executives. And it definitely isn’t ‘normal’ to be talking to your CEO about his wild college days during company outings. I could write entire essays about all the reasons I loved working at Aiqudo, but there was one that always made my friends question my sanity: I kept describing the work itself as ‘fun’! Even for a startup, I’m not sure how normal that is.
During my internship, I created a question answering service that handled knowledge-based queries — questions like “How old is Tom Brady?” or “Which movies were Laurence Fishburne and Keanu Reeves in together?”. While the problem itself was interesting, it was the freedom I had that made it really engaging. Since there isn’t always a straightforward solution to a Natural Language Processing (NLP) problem, I constantly had to approach the next obstacle in new ways, such as using certain tools unconventionally or reading up on the latest research. Each new day felt like solving a new puzzle, and that’s what made it so consistently enjoyable! Of course, I ran into plenty of issues that seemed impossible to get around, but luckily I had an amazing mentor, Sunil, who was always there to point me in the right direction.
This past summer has been an incredible experience. I came in thinking I would leave with a few new skills. Not only have I learned several valuable skills, I’ve also developed a newfound confidence in my ability to think through complex problems, and I’ve set a higher bar for any company I’d want to work for in the future.
Do more with Voice
Q Actions 2.0 is here. With this release, we wanted to focus on empowering users throughout their day. As voice plays an increasingly prevalent role in our everyday lives, we’re uncovering more use cases where Q Actions can help. In Q Actions 2.0, you’ll find new features and enhancements that are more conversational and more useful.
Aiqudo believes the interaction with a voice assistant should be casual, intuitive, and conversational. Q Actions understands naturally spoken commands and is aware of the apps installed on your phone, so it will only return personalized actions that are relevant to you. When a bit more information is required from you to complete a task, Q Actions will guide the conversation until it fully understands what you want to do. Casually chat with Q Actions and get things done.
- “create new event” (Google Calendar)
- “message Mario” (WhatsApp, Messenger, SMS)
- “watch a movie/tv show” (Netflix, Hulu)
- “play some music” (Spotify, Pandora, Google Play Music, Deezer)
In addition to providing relevant actions from the personal apps installed on your phone, Q Actions will now display rich information through Q Cards™. Get up-to-date information from cloud services on many topics: flight status, stock prices, restaurant info, and more. Beyond presenting the information in a simple, easy-to-read card, Q Cards™ support Talkback and will read relevant information aloud.
- “What’s the flight status of United 875?”
- “What’s the current price of AAPL?”
- “Find Japanese food”
There are times when you need information but do not have the luxury of looking at a screen. Voice Talkback™ is a feature that reads aloud the critical snippets of information from an action. This enables you to continue to be productive, without the distraction of looking at a screen. Execute your actions safely and hands-free.
- “What’s the stock price of Tesla?” (E*Trade)
- Q: “Tesla is currently trading at $274.96”
- “Whose birthday is it today?” (Facebook)
- Q: “Nelson Wynn and J Boss are celebrating birthdays today”
- “Where is the nearest gas station?”
- Q: “Nearest gas at Shell on 2029 S Bascom Ave and 370 E Campbell Ave, 0.2 miles away, for $4.35”
As an enhancement to our existing curated Action Recipes, users can now create Action Recipes on the fly using Compound Commands: simply join two of your favorite actions into a single command using “and”. This gives users the ability to create millions of Action Recipe combinations from our database of 4,000+ actions.
- “Play Migos on Spotify and set volume to max”
- “Play NPR and navigate to work”
- “Tell Monica I’m boarding the plane now and view my boarding pass”
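Under the hood, a compound command can be handled by splitting on the conjunction and resolving each half independently. Here is a minimal sketch of that idea; the action index and action names are hypothetical, not Aiqudo’s actual implementation:

```python
# Hypothetical action index mapping normalized commands to app actions.
ACTIONS = {
    "play npr": "TuneIn.play_station",
    "navigate to work": "Maps.navigate",
}

def split_compound(command: str) -> list[str]:
    # Naive split on " and "; a production parser must avoid splitting
    # phrases that contain a literal "and" (e.g. "play rock and roll").
    return [part.strip() for part in command.lower().split(" and ")]

def resolve(command: str) -> list[str]:
    # Resolve each sub-command against the action index.
    return [ACTIONS.get(sub, "<no match>") for sub in split_compound(command)]
```

Each resolved action would then be executed in sequence, passing along any extracted parameters.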
Simply do more with voice! Q Actions is now available on Google Play.
When an action or a set of actions requires specific input parameters, Directed Dialogue™ allows the user to supply the required information through a very simple, natural back-and-forth conversation. Enhanced with parameter validation and user confirmation, Directed Dialogue™ allows complex tasks to be performed with confidence. Directed Dialogue™ is not about open-ended conversation; it is about getting things done, simply and efficiently.
With Q Actions, Directed Dialogue™ is automatically enabled for every action in the system, because we know the semantic requirements of each and every action’s parameters. It is not constrained to particular use cases; it applies to every action across all verticals.
Another application of Directed Dialogue™ is input refinement. Let’s say I want to purchase batteries. If I just say, “add batteries to my shopping cart”, I can get the wrong product added to my cart, as happens on Alexa, which does the wrong thing for a new product order (the right thing happens on a reorder). With Q Actions, I can provide the brand (“Duracell”) and the type (“9V 4 pack”) through very simple Directed Dialogue™, and exactly the right product is added to my cart – in the Amazon or Walmart app.
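At its core, this kind of parameter gathering can be sketched as a simple slot-filling loop: the action declares its required parameters, and any that are missing or invalid are prompted for before execution. The slot names and the validator below are illustrative, not the production implementation:

```python
from collections import deque

def fill_slots(given, required, ask, validate):
    """Prompt for each missing or invalid required slot until all are filled."""
    slots = dict(given)
    for name in required:
        while name not in slots or not validate(name, slots[name]):
            slots[name] = ask(name)
    return slots

# Simulated dialogue: "add batteries to my cart" is missing brand and type,
# so the assistant asks for each in turn (canned answers stand in for voice).
answers = deque(["Duracell", "9V 4 pack"])
result = fill_slots(
    given={"product": "batteries"},
    required=["product", "brand", "type"],
    ask=lambda name: answers.popleft(),          # stands in for a voice prompt
    validate=lambda name, value: bool(value.strip()),
)
```

With validation attached to each slot, a bad answer simply triggers another prompt rather than a failed task.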
Get Q Actions today.
My last post discussed the important step of automatically generating vast amounts of relevant content relating to commands to which we apply our machine learning algorithms. Here I want to delve into the design of our algorithms.
Given a command, our algorithms need to:
- Understand the meaning and intent behind the command
- Identify and extract parameters from it
- Determine which app action is most appropriate
- Execute the chosen action and pass the relevant parameters to the action
This post and the next one will address the first point. The other points will be covered in subsequent posts.
So how do we understand what a user means from their command? Commands are typically short (three or four terms), which makes it very difficult to disambiguate among the multiple meanings a term can have. If someone says “search for Boston”, do they want directions to a city, or do they want to listen to a rock band on Spotify? In order to disambiguate among all the possibilities, we need to know (a) whether any of the command terms can have different meanings, (b) what those meanings are, and finally (c) which is the correct one based on context.
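As a toy illustration of step (c), one can score each candidate meaning by its vocabulary’s overlap with the other words in the command. The sense inventories below are made up for illustration; this is not our actual algorithm:

```python
# Toy context-overlap disambiguation; the sense vocabularies are
# illustrative, not real data from our pipeline.
SENSES = {
    "boston": {
        "city": {"directions", "map", "flight", "hotel", "drive"},
        "band": {"play", "listen", "spotify", "song", "album"},
    },
}

def disambiguate(term: str, context: set[str]) -> str:
    # Pick the sense whose associated vocabulary overlaps most with
    # the other words in the command.
    scores = {sense: len(context & vocab)
              for sense, vocab in SENSES[term].items()}
    return max(scores, key=scores.get)
```

With richer, automatically generated context data, the same overlap idea scales far beyond a hand-built table like this.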
In order to do this, we developed a suite of algorithms that feed off the data we generated previously (see post #3). These algorithms are inspired by semiotics, the study of how meaning is communicated. Semiotics originated as a theory of how we interpret the meaning of signs and symbols. Given a sign in one context, for example a flag bearing a skull and crossbones, you would assign a particular meaning to it (pirates).
Whereas, if you change the context to a bottle, then the meaning changes completely: poison – do not drink!
Linguists took these ideas and applied them to language: given a term (e.g. ‘window’), its meaning can change depending on the meanings of the words around it in the sentence (a physical window in a room, a software window, a window of opportunity, etc.). By applying these ideas to our data, we can understand the different meanings a term can have based on its context.
We also drew inspiration from discourse communities. A discourse community is a group of people involved in and communicating about a particular topic. They tend to use the same language for important concepts (sometimes called jargon) within their community, and these terms have a specific, understood and agreed meaning within the community that makes communication easier. For example, members of a cycling community have their own set of terms, fairly unique to them, that they all understand and adhere to. If you want to see what I mean, go here and learn the meanings of such terms as an Athena, a Cassette, a Chamois (very important!) and many others. Similarly, motor enthusiasts have their own ‘lingo’. If you want to be able to differentiate your AWS from your ABS and your DDI from your DPF, then get up to speed here.
Our users use apps, so we would also expect to discover gaming discourses, financial discourses, music discourses, social media discourses, and so on. Our goal was to develop a suite of machine learning algorithms that could automatically identify these communities through their important jargon terms. By identifying the jargon terms, we can build a picture of the relationship between these terms and the other terms used by each discourse community within our data. A characteristic of jargon words is that they have a very narrow meaning within a discourse compared to other terms. The term ‘computer’, for example, is a very general term that can have multiple meanings across many discourses – programming, desktop, laptop, tablet, phone, firmware, networks, etc. – so it isn’t a good example of a jargon term: it is too general and broad in meaning. We want to identify narrow, specific terms that have a very precise meaning within a single discourse, e.g. a specific type of processor, or a motherboard. Our algorithms do a remarkable job of identifying these jargon terms, and they are foundational to our ability to extract meaning, precisely understand user commands, and thereby capture the real intent that lies behind them.
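One simple way to make “narrowness” concrete – offered here as a hedged sketch, not our production algorithm – is to measure how concentrated a term’s usage is across discourse communities. A low-entropy distribution marks a jargon-like term; a high-entropy one marks a general term like ‘computer’:

```python
import math

def discourse_entropy(counts: dict[str, int]) -> float:
    """Shannon entropy of a term's usage across discourse communities.
    Low entropy: usage concentrated in one community (jargon-like).
    High entropy: the term is spread across discourses (general)."""
    total = sum(counts.values())
    probs = [c / total for c in counts.values() if c > 0]
    return -sum(p * math.log2(p) for p in probs)

# Illustrative document counts per discourse (made-up numbers).
general = {"programming": 40, "cycling": 30, "finance": 30}   # like "computer"
jargon = {"programming": 1, "cycling": 98, "finance": 1}      # like "chamois"
```

A jargon detector built on this idea would flag terms whose entropy falls well below the corpus average.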
In my next post I will go into the details behind the algorithms that enable us to identify these narrow-meaning, community-specific jargon terms and ultimately to build a model that understands the meaning and intent behind user queries.