Voice for AssetCare


Jim Christian, Chief Technology and Product Officer, mCloud

It is estimated that there are 20 million field technicians operating worldwide. A sizable percentage of those technicians can’t always get to the information they need to do their jobs.

Why is that? After all, we train technicians, provide modern mobile devices and specialized apps, and send them to the field with stacks of information and modern communications. Smartphone sales in the United States grew from $3.8 billion in 2005 to nearly $80 billion in 2020. So why isn’t that enough?

One problem is that tools that work fine in offices don’t necessarily work for field workers. A 25-year-old mobile app designer may forget that a 55-year-old field worker cannot read small text on a small screen or see the display in the glare of natural light. In industrial and outdoor settings, field workers frequently wear gloves and other protective gear, which makes typing on a mobile device difficult. Consider a technician who needs to enter data on a mobile device while outside in freezing weather. This worker could easily choose to wait to enter data until he’s back in his truck and can take off his gloves, and as a result may not enter the data exactly right.

A voice-based interface can be a great help in these situations. Wearable devices that respond to voice are becoming more common. For instance, RealWear makes a headset that is designed to be worn with a hardhat, and one model is intrinsically safe and can be used in hazardous areas. But voice interfaces have not become popular in industrial settings. Why is that?

We could look to the OODA loop, short for Observe, Orient, Decide, and Act, for insights. The OODA concept was developed by U.S. Air Force Colonel John Boyd as a mental model for fighter pilots, who need to act quickly. Understanding the OODA loop that applies in a particular situation helps a person act more quickly and decisively. Field technicians don’t have life-and-death situations to evaluate, but the OODA loop still applies. The speed and accuracy of their work depend on their OODA loop for the task at hand.

Consider two technicians who observe an unexpected situation, perhaps a failed asset. John orients himself by taking off his gloves to call his office, then searches for drawings in his company’s document management system, and then calls his office again to confirm his diagnosis. Meanwhile, Jane orients herself by doing the same search, but talking instead of typing, keeping her eyes on the asset all the time. Assuming that the voice system is robust, Jane is able to use her eyes and her voice at the same time, accelerating her Observe and Orient phases. Jane will do a faster, better job. A system where the Observe and Orient phases are difficult, as in John’s experience, is inferior and is likely to be rejected by users, whereas Jane’s experience with a short, easy OODA loop will be acceptable.

A downside of speaking to a device is that traditional voice recognition systems can be painfully slow and limited. These systems recognize the same commands that a user would type or click with a mouse, but most people type and click much faster than they talk. Consider the sequence of actions required to take a picture and send it to someone on a smartphone using your fingers: open the photo app, take a picture, close the photo app, open the photo gallery app, select the picture, select the option to share that picture, select a recipient, type a note, and hit send. That could be nine or ten distinct operations. Many people can do this rapidly with their fingers, even though it is a lot of steps. Executing that same sequence with an old-style, separate voice command for each step would be slow and painful, and most people would find it worse than useless.

The solution is natural voice recognition, where the voice system recognizes what the speaker intends and understands what “call Dave” means. Humans naturally understand that a phrase such as “call Dave” is shorthand for a long sequence (“pick up the phone”, “open the contact list”, “search for ‘Dave’”, etc.). Natural voice recognition has come a long way in recent years, and systems like Siri and Alexa have become familiar for personal use. Field workers often have their own shorthand for their industry, like “drop the transmission” or “flush the drum”, which their peers understand but Siri and Alexa don’t.
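As a rough illustration of that idea, the sketch below expands a short spoken phrase into the longer sequence of steps it stands for. It is a minimal sketch in Python, not a description of how Aiqudo or mCloud implement voice recognition: the `INTENTS` table, the `expand` function, and the final “start the call” step are assumptions invented for this example.

```python
import re

# Hypothetical intent table: a spoken pattern paired with the steps it implies.
# The first three steps come from the "call Dave" example in the text;
# "start the call" is an invented stand-in for the "etc." in that example.
INTENTS = [
    (
        re.compile(r"^call (?P<name>\w+)$", re.IGNORECASE),
        [
            "pick up the phone",
            "open the contact list",
            "search for '{name}'",
            "start the call",
        ],
    ),
]


def expand(utterance):
    """Return the list of steps implied by an utterance, or None if unknown."""
    text = utterance.strip()
    for pattern, steps in INTENTS:
        match = pattern.match(text)
        if match:
            # Fill the captured words (here, the contact name) into each step.
            return [step.format(**match.groupdict()) for step in steps]
    return None


if __name__ == "__main__":
    print(expand("call Dave"))
    print(expand("flush the drum"))  # no entry for this shorthand, so None
```

A phrase with no entry in the table returns `None`, which mirrors the gap described above: industry shorthand such as “flush the drum” means nothing to a system that has not been taught it.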

At mCloud, we see great potential in applying natural voice recognition to field work in industries such as oil & gas. Consider a field operator who is given a wearable device with a camera and voice control, and who is able to say things like, “take a picture and send it to John” or “take a picture, add a note ‘new corrosion under insulation at the north pipe rack’ and send to Jane” or “give me a piping diagram of the north pipe rack.” This worker will have no trouble accessing useful information or using that information to orient himself and make good decisions. An informed field operator will get work done faster, with less trouble and greater accuracy.
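To make the example commands concrete, here is a minimal sketch of how one spoken sentence could be turned into a single structured request instead of nine or ten separate commands. It assumes the speech has already been transcribed to text. The `PhotoRequest` class, the `parse_photo_request` function, and the pattern itself are hypothetical names written for this post, and a real natural voice system would handle far more variation than one regular expression can.

```python
import re
from dataclasses import dataclass
from typing import Optional


@dataclass
class PhotoRequest:
    """Hypothetical structured form of a 'take a picture and send' command."""
    recipient: str
    note: Optional[str] = None


# Matches "take a picture", an optional "add a note ...", then "and send (it) to NAME".
_PHOTO = re.compile(
    r"^take a picture"
    r"(?:,? add a note (?P<note>.+?))?"
    r",? and send (?:it )?to (?P<recipient>\w+)\.?$",
    re.IGNORECASE,
)


def parse_photo_request(utterance: str) -> Optional[PhotoRequest]:
    """Parse a transcribed utterance; return None if it is not a photo request."""
    match = _PHOTO.match(utterance.strip())
    if match is None:
        return None
    note = match.group("note")
    if note is not None:
        # Drop surrounding spaces and straight quotes from the dictated note.
        note = note.strip(" '\"")
    return PhotoRequest(recipient=match.group("recipient"), note=note)


if __name__ == "__main__":
    print(parse_photo_request("take a picture and send it to John"))
    print(parse_photo_request(
        "take a picture, add a note 'new corrosion under insulation "
        "at the north pipe rack' and send to Jane"
    ))
```

The point of the design is that the worker speaks once and the system fills in the recipient and the note, so the camera, gallery, and sharing steps never have to be spoken individually.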

The U.S. Chemical Safety Board analyzes major safety incidents at oil & gas and chemical facilities. A fair number of incidents have a contributing factor of field workers not knowing something or not having the right information. For instance, an isobutane release at a Louisiana refinery in 2016 occurred in part when field operators used the wrong procedure to remove the gearbox on a plug valve. There was a standard procedure, but about 3% of the plug valves in the refinery were an older design that required different steps to remove the gearbox. This is an example where the field workers were wearing protective gear and followed a procedure that was correct for a different type of valve but wrong for the valve in front of them. Field workers like this generally have written procedures, but occasionally the work planner misses something or reality in the field is different from what was expected. This means that field workers need to adapt, perhaps by calling for help or looking up information such as alternate procedures.

Examples where natural voice recognition can help include finding information, calling other people for advice, recording measurements and observations, inspecting assets, stepping through repair procedures, describing the state of an asset along with recommendations and questions, writing a report about the work done, and working with other people to accomplish tasks. Some of these examples are ad hoc tasks, like taking a picture or deciding to call someone. Other examples are part of larger, structured jobs. An isolation procedure in a chemical plant or replacing a transmission are examples of complex procedures with multiple steps that can require specialized information or where unexpected results from one step may require the field worker to get re-oriented, find new information, or get help.

Aiqudo has powerful technology for natural voice recognition, and mCloud is pleased to be working with Aiqudo to apply it. Working together, we can help field workers get what they need by simply asking for it, talk to the right people by simply asking for help, confirm their status in a natural way, and in general get the right job done, effectively and without mistakes.

This post is authored by Jim Christian, Chief Technology and Product Officer, mCloud.

Aiqudo and mCloud recently announced a strategic partnership that brings natural Voice technology into emerging devices, such as smart glasses, to support AR/VR and connected worker applications, and also into new domains such as industrial IoT, remote support and healthcare.


