The rise and rise (and future) of voice

Of the many, many, many surprising things to have happened over the last couple of years, the rise of voice-based interfaces is one of the more pleasant. Siri might have been the frontrunner for the modern voice assistant, but Amazon’s Alexa was the ubiquitous star of this year’s CES. VoiceLabs predicts that 24.5 million voice-first devices such as Amazon Echo and Google Home will ship in 2017, more than triple last year’s figure.

“It’s no surprise that Google and Amazon are focusing heavily on voice search and natural language going forward,” says Tim Tuttle, founder and CEO at MindMeld, “when you consider that in 2015 alone, voice search rose from ‘statistical zero’ to make up 10% of all searches globally - that’s an estimated 50 billion searches per month.”

Founded in 2011 and formerly known as Expect Labs, San Francisco-based MindMeld offers companies the ability to embed voice commands within their applications and services. The company was listed on CB Insights’ AI 100 list and described by The Verge as “Siri on Steroids”.

MindMeld and voice services such as Viv and Siri - before they were acquired by Samsung and Apple respectively - could have existed before or even during the first dot-com boom. Tuttle explains how speech recognition has been touted as the future of computing for decades, but has always been out of reach.

“This all began to change in the late 2000s and early 2010s. Fuelled by massive amounts of data from exploding mobile internet usage, supervised learning began to deliver surprisingly promising results. Long-standing AI research challenges such as speech recognition and machine translation began to see leaps in accuracy, which dwarfed all improvements made over the previous decades combined.”

“As a result of these advances in machine learning, virtual assistants, which had a notoriously hit-or-miss track record in their early years, started to see significant widespread adoption for the first time.”

 

The business of voice

Siri was the first of these assistants, but the arrival and success of Amazon’s Alexa have seen the market explode. More than 40 products at 2017’s CES showed off some integration with Amazon’s voice service, and many others were showing off their own versions of voice-enabled assistants. When asked whether this transition to voice is being driven by the companies or by the will of the people, Tuttle is in no doubt that it’s the latter.

“Consumers worldwide increasingly expect that quick assistance and expert advice will be only a simple voice command or chat message away. While changes may not happen overnight, over the next five years, conversational AI will become the primary way that we will interact with many online services.”

And where the people go, businesses have to follow, or be left behind.

“As a result of this dramatic change in user behaviour, businesses have been forced to respond in order to meet the expectations of their users. The forward-thinking organisations also realise that voice and conversational interfaces can become a significant competitive differentiator. They realise that the first movers will gain a huge strategic advantage in this emerging conversational application landscape.”

“Human conversation is truly the lingua franca, and any organisation which masters the ability to understand the natural language requests of their users will gain a huge strategic advantage in this emerging conversational application landscape.”

 

Out of the home and in the office

Voice might be invading your home, but what about work? A survey by Creative Strategies last year found just 1% of people use voice assistants for work. MindMeld’s own public survey found just 3% of people used voice assistants at work.

However, there seems to be much higher adoption within the IT community. A recent survey of IT pros by Spiceworks suggested that 19% of businesses are currently using intelligent assistants/chatbots for work-related tasks on company-owned devices, while another 30% are planning to use them over the next three years.

“In the second half of 2016, enterprises for the first time began making significant investments in voice for enterprise and B2B applications,” says Tuttle. “We expect this trend to continue.”

Some of this investment may already be starting to bear fruit. Storage virtualisation provider Tintri is already working on chatbot integration for its services, and, according to The Reg, will soon be working on voice integration for Alexa.

 

Challenges remain

It’s not all pleasant chats with affable robots, however. Tom Warren at The Verge recently lamented the lack of interoperability between voice assistants, describing how he shouts the same command at different assistants until one does what he asks. He has a point: Alexa can’t do anything on your iPhone and Siri can’t communicate with your Xbox, all of which leads to a frustrating user experience. Tuttle, however, thinks there is a solution coming.

“Users will continue to rely on their favourite digital assistant, like Siri, Alexa or Cortana, as the first choice for finding information,” he says. “They will provide the answer if it is in their knowledge base, and in cases where the user might want to enlist the help of a more specialised service - to book a ride through Uber or check a transaction with Wells Fargo - your smartphone assistant will serve as the broker to connect you with the specialised third-party service.”

“From the user's perspective, they are asking a single question to their favourite virtual assistant. Behind the scenes, however, a wide range of third-party conversational services will emerge in order to field all of the various requests that Siri, Cortana, and Alexa are unable to handle on their own.”
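The broker arrangement Tuttle describes can be pictured with a minimal Python sketch. Everything in it is an assumption made for illustration: the `AssistantBroker` class, the two handler functions and the keyword matching are hypothetical and do not reflect how Siri, Alexa, Cortana or MindMeld actually route requests.

```python
from typing import Callable, Dict


# Hypothetical third-party services; names and replies are illustrative only.
def ride_service(request: str) -> str:
    return "ride service: booking a ride"


def bank_service(request: str) -> str:
    return "bank service: looking up the transaction"


class AssistantBroker:
    """Answers from its own knowledge base, else hands off to a registered service."""

    def __init__(self, knowledge_base: Dict[str, str]) -> None:
        self.knowledge_base = knowledge_base
        self.services: Dict[str, Callable[[str], str]] = {}

    def register(self, keyword: str, handler: Callable[[str], str]) -> None:
        # Assumption: a single keyword is enough to identify a service.
        self.services[keyword.lower()] = handler

    def handle(self, request: str) -> str:
        text = request.lower().strip()
        # 1. The assistant answers directly if it already knows the answer.
        if text in self.knowledge_base:
            return self.knowledge_base[text]
        # 2. Otherwise it brokers the request to the first matching service.
        for keyword, handler in self.services.items():
            if keyword in text:
                return handler(request)
        # 3. Nothing matched: the request falls through.
        return "Sorry, I can't help with that yet."


broker = AssistantBroker({"what is the capital of france": "Paris"})
broker.register("ride", ride_service)
broker.register("transaction", bank_service)
print(broker.handle("Book me a ride to the airport"))
```

The user only ever talks to `broker.handle`, which mirrors the single question to a favourite assistant. Substring matching is a deliberately crude stand-in for real natural language understanding, which is where the hard problems discussed in the next section come in.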

 

Lost in translation

“Conversational applications may seem simple on the surface, but building truly useful conversational experiences represents one of the hardest AI challenges solvable today,” says Tuttle. This, he explains, is down to the inherent complexity of human language.

While it can be relatively quick and easy to build simple applications that support a very narrow vocabulary and set of commands, the user experience will always suffer if it doesn’t work in a way that feels natural.

“Applications to date that have succeeded in delighting users impose few constraints on a user’s vocabulary; they simply let users speak to the application as if they are conversing with another human. Applications which can understand broad-vocabulary natural language are notoriously complex due to the inherent combinatorial complexity of language or what’s also called the ‘curse of dimensionality’.”

“In other words, the number of different ways a human might phrase even a simple question can quickly explode into many thousands of variations. The human brain is remarkable at making sense of many trillions of language variations in a fraction of a second with near-perfect accuracy. This same feat is all but impossible for today’s most advanced AI technology.”
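To get a feel for how quickly phrasings multiply, here is a rough Python sketch that counts the variations of one simple request. The slot lists are made up for illustration and are not taken from MindMeld or any real dataset; the point is only that a handful of interchangeable words per position already yields thousands of distinct sentences.

```python
from itertools import product

# Hypothetical interchangeable wordings for each position in one request.
SLOTS = [
    ["", "please ", "hey, ", "could you "],
    ["book", "get", "order", "call", "find"],
    [" me", " us", ""],
    [" a ride", " a car", " a cab", " a taxi", " a lift"],
    [" to the airport", " to the terminal", " to departures"],
    ["", " now", " right away", " as soon as possible"],
]


def count_variations(slot_options):
    """Multiply the number of choices per slot to get the total phrasings."""
    total = 1
    for options in slot_options:
        total *= len(options)
    return total


def enumerate_variations(slot_options):
    """Build every phrasing by taking one option from each slot."""
    return ["".join(parts).strip() for parts in product(*slot_options)]


print(count_variations(SLOTS))         # 4 * 5 * 3 * 5 * 3 * 4 = 3600
print(enumerate_variations(SLOTS)[0])  # "book me a ride to the airport"
```

Adding one more slot with five options would push the total to 18,000, and real speech adds word order changes, fillers and mishearings on top, none of which this toy template models.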

On a more practical level, there’s still a lot of work to be done on making voice a tool everyone can use. Aside from the fact there’s still a large bias towards English, you’ll be hard-pressed to find voice-recognition technologies that can cope with anything other than West-Coast American or Queen’s English.

“For the most common languages and dialects, this is largely a solved problem,” says Tuttle. “Uncommon accents and dialects have long been the Achilles heel for speech recognition systems.”

“Much progress is being made but there is still much work to do. The reason for this challenge lies in the need to train conversational systems on thousands or millions of examples of voice and language samples. The more data translates into the more accurate the system. For more obscure languages and dialects, collecting data can be challenging.”

One of the other challenges around voice is getting the right skills. Data scientists, machine learning researchers, and computational linguists are all in short supply in the technology industry, which means linguists who have spent years in academia are suddenly a very valuable proposition for both startups and the big companies.

“Only a small number of technical universities have historically offered a curriculum in these disciplines [natural language understanding and computational linguistics] and each university graduates only a handful of experts every year.”

“With the recent explosion of interest in conversational applications, there has been a mismatch between supply and demand. As a result, subject-matter experts will remain a valuable commodity until academia and industry can rebalance the equation.”

 

A post-GUI future?

Wearables and voice are often touted as the future. Some predict AI-driven hearables like the ones seen in the 2013 film Her will render smartphones obsolete. Will we one day live in a post-GUI world?

“It is not hard to envision some applications where a voice-first experience would be ideal,” says Tuttle. “Over the next decade, voice will certainly become a feature in every application and on every device where it might prove useful.”

“Regardless, GUI-based interfaces are unlikely to become obsolete. There will remain many applications and situations where a traditional GUI will still be preferable.”

 

Dan Swinhoe

Dan is Senior Staff Writer at IDG Connect. Writes about all manner of tech, from driverless cars, AI, and Green IT to Cloudy stuff, security, and IoT. Dislikes autoplay ads/videos and garbage written about 'millennials'.
