Google has confirmed that its voice assistant, powered by Duplex, a fully automated system that places calls on your behalf, is set to roll out across more geographies and devices. This sets Duplex up for mainstream adoption, especially as it goes cross-platform (iOS and Android). Additionally, Google is working hard to get companies to migrate some or all of their compute to its cloud service. Quartz’s coverage of Google’s Cloud Next, the company’s cloud extravaganza held recently in San Francisco, highlights that “new AI algorithms or data-analytics tools might allow businesses to offer new features or streamline parts of these tech giants’ businesses, but they’re also new products that can scale instantly to thousands of enterprise customers.” It is likely not only analytics and AI tools, though: Google’s research suggests the tool set could be extended to language AI.
Conversational AI, the ability to converse with a machine, is driving a revolution in human/machine collaboration. Companies are going to see value in language capabilities, but language is hard for a machine.
Language is unique to humans. We use language to convey information that is rich, nuanced and amplified by empathy, which is our ability to imagine someone else’s experience. Machines handle information with a speed and scale that humans cannot match, but they fail when they are unable to understand context. Context is something that humans understand intuitively: it is a function of how our brains process language, situations and sensory information and derive meaning in the moment.
Voice as the new search
All the major tech companies are placing a big focus on voice and conversational computing, with Alexa, Siri, Cortana and Google Assistant the most prominent assistants. Google’s Duplex, currently only available for restaurant bookings, aims to create a seamless link between the virtual and physical worlds. The assistant can keep us going back to Google for more searches, generating more ad revenue for Google. After all, any call to a business that takes place between humans in the physical world is a click lost to the virtual world. Today Google uses the data Duplex collects to update business information (say, hours of operation), but the ability of Duplex to gather vast amounts of data on the physical world via voice represents a profound shift in how we access information.
Performance and efficiency
Google’s research and development offers insight into its priorities for conversational and language AI. A reading of the research suggests two technical priorities: develop context while keeping the AI computationally efficient. The efficiency constraint matters because context is hard to model. Google’s mobile, cross-platform strategy will only be possible if voice stays computationally efficient.
Google’s research discusses four techniques that serve these priorities:
- AI needs a way to store knowledge from a previous problem and apply it to a different problem. Google’s AI researchers have taken some clever techniques from computer vision and applied them to language. Transfer learning is a method that takes a model trained for one task and reuses it as the starting point for another task. Think of it as a pre-populated template. It makes it easier and faster to train new AI for new settings, say, new languages or new use cases.
- AI needs to keep learning and working effectively on a phone without constantly pinging the central model. This keeps speeds high and bandwidth needs low while not compromising the AI’s learning. Federated learning is a technique where model development is distributed over millions of mobile devices and is able to provide highly personalized models. Theoretically, there is no compromise regarding user privacy.
- Better context understanding for conversational AI starts with understanding how to improve task-oriented dialogue. The goal is to allow Google Assistant to have more turn-taking with a human. This, in turn, increases Google’s ability to extend Duplex beyond today’s narrow use for restaurant bookings. Google’s research links past dialogue with present dialogue in a more efficient way, by adding a separate AI piece (a new recurrent neural network) which connects what was said before (context) with what is being said in the present. It’s a little like bridging time with a “smart memory” capability during a conversation so that the AI knows what happened earlier. Paper here.
- Google’s researchers have also developed a new type of neural network structure, designed to help an AI understand sequences, and applied it to sequences of words. This is a powerful improvement on the previous state-of-the-art and uses an attention mechanism that learns contextual relations between words in a text. This means that the AI is able to learn context to a far greater extent than before. Paper here.
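To make the federated learning idea in the list above more concrete, here is a minimal sketch in Python. It assumes the widely used federated averaging scheme, in which each device trains on its own data and only the resulting model parameters are averaged centrally. The article does not say which aggregation rule Google uses, and the tiny linear model, the device datasets and the function names are all hypothetical, chosen only for illustration.

```python
def local_update(weights, data, lr=0.1, epochs=20):
    """Fit y = w * x + b on one device's private data with gradient descent."""
    w, b = weights
    for _ in range(epochs):
        grad_w = grad_b = 0.0
        for x, y in data:
            err = (w * x + b) - y
            grad_w += 2 * err * x / len(data)
            grad_b += 2 * err / len(data)
        w -= lr * grad_w
        b -= lr * grad_b
    return (w, b)


def federated_average(updates, sizes):
    """Average device models, weighting each by its number of examples."""
    total = sum(sizes)
    w = sum(u[0] * n for u, n in zip(updates, sizes)) / total
    b = sum(u[1] * n for u, n in zip(updates, sizes)) / total
    return (w, b)


# Hypothetical private datasets, one per device, all drawn from y = 2x + 1.
# The raw (x, y) pairs never leave local_update; only (w, b) is shared.
devices = [
    [(0.0, 1.0), (1.0, 3.0)],
    [(2.0, 5.0), (3.0, 7.0), (1.0, 3.0)],
    [(0.5, 2.0), (1.5, 4.0)],
]

global_model = (0.0, 0.0)
for round_number in range(50):
    # Each device starts from the current global model and trains locally.
    updates = [local_update(global_model, data) for data in devices]
    # The server only ever sees model parameters, not the underlying data.
    global_model = federated_average(updates, [len(d) for d in devices])

print(global_model)
```

This sketch produces a single shared model. The personalization the article mentions would need an extra step, such as further fine-tuning on each device, which is not shown here.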
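The attention mechanism in the last item of the list can also be sketched in a few lines. The example below assumes scaled dot-product self-attention, a common formulation; the article does not specify which variant the research uses. It skips the learned query, key and value projections a real model would have, and the words and two-dimensional embeddings are made up for illustration.

```python
import math


def softmax(scores):
    """Turn raw scores into positive weights that sum to 1."""
    peak = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def dot(a, b):
    """Dot product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def self_attention(vectors):
    """Scaled dot-product self-attention without learned projections.

    Each input vector acts as its own query, key and value (a
    simplification). Returns (outputs, weights), where weights[i][j]
    is how strongly word i attends to word j.
    """
    dim = len(vectors[0])
    scale = math.sqrt(dim)
    outputs, weights = [], []
    for query in vectors:
        scores = [dot(query, key) / scale for key in vectors]
        row = softmax(scores)
        weights.append(row)
        # Blend every word vector into this word's new, context-aware vector.
        mixed = [sum(w * v[d] for w, v in zip(row, vectors)) for d in range(dim)]
        outputs.append(mixed)
    return outputs, weights


# Hypothetical embeddings: "book" and "table" are given the same vector,
# so they should attend to each other more than to "a".
words = ["book", "a", "table"]
embeddings = [[1.0, 0.0], [0.0, 1.0], [1.0, 0.0]]

outputs, weights = self_attention(embeddings)
for word, row in zip(words, weights):
    print(word, [round(w, 3) for w in row])
```

Each row of `weights` shows how one word distributes its attention over every word in the sequence, which is the sense in which the network learns contextual relations between words.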
In the voice wars, context will be king. But computational efficiency matters, especially for Google, which wants to link the online and offline worlds but can’t position itself on privacy like Apple’s Siri or count on always being on Wi-Fi like Amazon’s Alexa devices. A true consumer, mobile, cross-platform, ad-driven experience will require both to be given equal priority.