InfoShot: The history of voice technology

The earliest computers used punch cards for data entry. The existence of typewriters has meant there have been computer keyboards almost as long as there have been computers. The original computer mouse was invented in 1964.

But what about voice technology? Today companies are falling over themselves to push their own voice-enabled digital assistants. But is voice recognition a new input method?


The history of voice technology

The 1950s and 60s saw a number of fictional AI computers appear in sci-fi novels, film, and TV: Mike the HOLMES IV system in Robert A. Heinlein’s novel The Moon Is a Harsh Mistress; Star Trek’s nameless ‘Computer’ on the small screen; and the malevolent HAL 9000 from the classic Stanley Kubrick film 2001: A Space Odyssey. All showed the kind of human-like interaction people wanted from computers, even if the reality was still a long way off.

Though it might be surprising, real-world voice technology also has its roots in the early 50s. The pioneering Bell Labs AUDREY system could recognize digits spoken by a single, pre-designated voice speaking very slowly and deliberately. According to one description, AUDREY ‘occupied a six-foot high relay rack, was expensive, consumed substantial power and exhibited the myriad maintenance problems associated with complex vacuum-tube circuitry.’

Although today its focus seems to be on Watson and quantum computing, IBM was once a pioneer in the field of voice tech. At the 1962 Seattle World’s Fair, IBM’s Shoebox was able to understand 16 words – 10 of which were the digits 0-9 – enabling it to solve basic arithmetic problems. Nine years later, Big Blue’s Automatic Call Identification system enabled engineers to talk to and receive “spoken” answers from a computer via a telephone line.

Carnegie Mellon University, as part of a DARPA project, developed the HARPY speech recognition system in the 1970s, which was capable of understanding over 1,000 words. CMU’s follow-up Sphinx project has seen various iterations since its inception in 1986, and Sphinx-4 is still being worked on.

After HARPY, the adoption of Hidden Markov Models (HMMs) – a probability model used in machine learning – for speech recognition saw machine vocabulary rise from hundreds of words to thousands. Computers had begun to develop language skills on a par with those of small children.

The 80s saw IBM develop a voice-activated typewriter called Tangora, which boasted a 20,000-word vocabulary but had to be trained to individual speakers. IBM eventually released its Speech Server Series in 1992 and Personal Dictation System in 1993.

Released in 1987, Worlds of Wonder's Julie doll promised to understand what children said to it. Apple released a concept video for the Knowledge Navigator, a very Siri-like service it was unable to actually deliver at the time. A recent parody video shows what an actual attempt at a fully-fledged assistant in the 80s would have been like.

In 1990 Dragon – now a part of Nuance – launched Dragon Dictate, the first commercially available dictation service. In the mid-nineties, BellSouth’s Voice Activated Link (VAL) was an early forerunner of the voice assistants we have today. Customers could theoretically find information by telling an ‘electronic attendant’ what they were looking for; options included restaurant guides and classified car ads.

The early 2000s saw something of a plateau in the technology and only incremental improvements: the year 2000 saw Motorola promise a ‘cyberassistant’ called Mya, but despite buying television ads, it was never actually released.

Since the late 2000s, however, there has been an explosion of voice technologies thanks to increases in processing speed and power, driven largely by Cloud computing. Google debuted a voice search iPhone app in 2008, followed by Android in 2010. Google Now was first released in 2012, but has since been phased into Google Assistant.

2011 saw the launch of Siri, the first of the digital voice assistants that are so common today. Microsoft launched Cortana in 2014. Samsung had S-Voice as far back as 2012, but switched focus to Bixby last year. In 2015 Baidu added voice assistant capabilities to its DuerOS, while 2017 saw Yandex launch Alisa, Orange launch Djingo, and Line’s parent company Naver release Clova.

Today, digital assistants are available in fridges, lights, TVs, and thermostats. They have all but conquered continuous speech – though intent can often be lost – and are looking to add an increasing number of languages and accents to their repertoires.


Voice in the workplace

Voice assistants are invading the home at a rate of knots. Juniper Research predicts over 50% of US households will have at least one by 2022. But what does the arrival of voice technology en masse mean for the workplace?

On the whole, it seems voice is yet to take over; a survey by Creative Strategies in 2016 found just 1% of people use voice assistants for work, while a survey by conversational UI provider Mindmeld (now owned by Cisco) said just 3% of people used voice assistants at work. However, many predict this will change.

“Since virtual assistants combine artificial intelligence with voice recognition, they can do things in seconds that used to take minutes, simply by recognizing your voice, doing the task, and telling you when they’ve done it,” says Joe Manuele, VP of Customer Experience and Workplace Productivity at Dimension Data. “Businesses will recognize their time saving benefits – automating menial tasks traditionally carried out by humans, or completing them much more quickly. I expect the virtual assistant trend will accelerate through the coming year.”

A new study of 1,000 UK office workers from co-working space broker Workthere found 23% of office workers believe that voice-activated technology would be the most useful technology to improve the way they work in the next 5 years. Dimension Data’s Digital Workplace reports claim 62% of organizations expect to welcome virtual assistants into the workplace over the next two years.

“If you look at the reasons why consumers love voice assistants, the same reasons apply to the workforce,” says Kees Jacobs, Digital Proposition Lead at Capgemini. “Speaking is natural, so employees don’t need to learn a new skill to use the technology, making it easy to integrate voice assistants into their daily work life.”

“AI powered voice assistants will be adopted very quickly throughout the workplace, particularly in areas where it is difficult for employees to ‘type or swipe’, such as hospitals for hygiene reasons, or in factories where workers wear protective gloves. Employees working in the retail and consumer goods industries will also benefit from voice assistants and we anticipate big implications in everything to do with buying, merchandizing and commercial functions.”

However, not all agree that voice is ready for prime time. Pascal Kaufmann, founder of intelligence startup Starmind, says voice tech still lacks essential competencies to be useful in business:

“Assistants such as Alexa can be fed pre-learnt skills and patterns, but they cannot learn by themselves on how to improve, understand, or adapt to situations.”

“At this state speech recognition is not far enough developed to become a meaningful option for real business adoption, except for when you are a white male living in the US and not having a strange accent and using words that have been fed into the system. It is only when algorithms are able to develop that self-sufficient competence, will it be ready to genuinely understand voices and reach the workplace potential we all know it has.”

Whether they are ready for the rigours of business or not, many companies are looking to embed voice within the business, either through integration with the likes of Alexa or homegrown capabilities.

Cisco now offers a voice assistant called Spark which is designed to aid meetings and conferences with the ability to find and book available rooms, suggest relevant documents ahead of time, enable screen sharing, record discussions and take meeting notes. Ricoh now offers a voice-integrated whiteboard which can take notes and share files on command.

There are also a number of services – for example French startup Snips or the Open Source Jasper project – that allow you to create your own voice capabilities outside of the Google/Amazon/Microsoft ecosystem. Adenin Technologies’ Now Assistant lets companies connect internal data sources and create answers to questions around HR, sales, or workflows. Customers include Cisco and public transport operator Transdev.

In November, Amazon itself announced Alexa for Business, which promises to do everything from notifying IT about a broken printer to surfacing information such as the latest sales or inventory data. Microsoft and Amazon are looking to overcome interoperability issues and integrate Cortana and Alexa – giving the former a wider range of skills and the latter greater access to the likes of Office 365.

“Research from Stanford University has shown that speech recognition software is three times faster than prose typed normally, a clear enabler of time-savings and improved productivity for organizations,” says Nils Lenke, Senior Director of Corporate Research at Nuance Communications. “The next step for the industry is to give them industry-specific ‘PhDs’. This means they are ‘trained’ and coached with their own domain expertise - in retail or financial services, for example.”


Voice in IT

But what about IT functions? A 2016 survey of IT pros by Spiceworks suggested that 19% of businesses were using intelligent assistants/chatbots for work-related tasks on company-owned devices, with another 30% planning to use them in business over the next three years (admittedly, the same company’s 2017 State of IT report said 9% currently used them and another 5% planned to).

Workthere’s study suggests a quarter of employees in IT departments believe voice-activated technology would be the most beneficial way to improve the way they work in the next 5 years. There are an increasing number of third-party voice skills on Alexa for IT-centric tasks such as network diagnostics, IP address lookups, and programming questions.

Companies are also developing their own voice skills for internal use: Capital One, for example, developed a private internal Alexa for Business skill that allows its teams to quickly check systems status or to request specific updates on high severity events.

“By mid-2018, we should expect to see all major industries rolling out voice-based interfaces,” says Alois Reitbauer, VP and chief technical strategist at Dynatrace. “Voice is so intuitive; it makes sense as the next technological evolution. Companies can greatly increase productivity without having to follow set workflows, learn software, sit through demos or trainings; they can just start talking and go.”

“Once voice is done successfully and people grow accustomed to the experience, it will become the new standard and extremely difficult to revert to anything else.”

Companies such as Tintri and Dynatrace are starting to experiment with voice UI for IT use cases; Tintri’s, for example, can automatically spin up VMs, while Dynatrace’s Davis assistant provides voice-based information for application performance issues.

“Voice will become the new CLI across all of IT,” says Donyel Jones-Williams, Director of Product Marketing Management at Juniper Networks. “CLI is an antiquated way to communicate with the machines that underpin every business around the world, this is why we've seen the rise of scripting languages and development environments like SaltStack to automate the antiquated, mundane tasks away with abstraction.

“To date, this abstraction has only been available to a select few, but voice assistance liberates the underlying complexity for the masses. The next step in the voice assistance journey for IT is to simply contextualize our natural language to the desired outcome.”


The end of graphical interfaces?

While there was a steady trail of milestones in voice technology throughout the last 50-odd years, actual adoption out in the real world was close to zero for a long time. Today, however, thanks to the quality of the technology, the brains that Cloud computing offers, and the variety of form factors voice assistants can be embedded into, there has been a serious uptick in adoption. But is this the end of the Graphical User Interface (GUI), and the rise of the Voice User Interface (VUI)?

“I don’t think VUIs will completely replace GUIs any time soon,” says Sanjay Malhotra, CTO of mobile app development company Clearbridge Mobile. “It’s important to consider how a VUI will add value to the company; if the internal team has to regularly navigate through a complicated or overloaded GUI, integrating voice search may be a valuable solution, but depending on the situation, visual context is absolutely necessary.” 


Dan Swinhoe

Dan is Senior Staff Writer at IDG Connect. He writes about all manner of tech from driverless cars, AI, and Green IT to Cloudy stuff, security, and IoT. Dislikes autoplay ads/videos and garbage written about 'millennials'.
