Nuance and the Art of Speech Recognition

The potential of speech recognition in healthcare and security

Even if you don't own an iPhone, you will have heard about Siri, the personal assistant that can cater to almost every need. Over time, Siri's conversational style has improved, and it can even stand up for itself. On one occasion, for instance, I got annoyed with it and told it that it was not as intelligent as it thinks it is. Its response: “I think that's subject to opinion, Ayesha.”

Speech recognition technology has been around for years, but there is one serious player in town that has made it its speciality. The company is Nuance, and I have been invited to spend a few days at its new office in Cambridge, Massachusetts, just outside Boston. The location, close to the train station and to MIT and Harvard, is a great strategic move. It means Nuance can attract new graduates before they are beckoned by the allure of Silicon Valley.

Nuance has firmly established itself in the speech recognition space and the numbers show it. Just last year it processed over eight billion mobile cloud transactions. On average, more than 100 million users access its cloud each month. And this month, for the first time since the company began in 2000, Nuance will break one billion cloud transactions.

A major part of Nuance's work is analysing text-based information and creating intelligent systems. With its virtual cloud assistant, Project Wintermute, Nuance hopes to create an 'ever-present assistant' that is available to you wherever you happen to be. 'Project Nina', on the other hand, has been 're-imagined' to provide customer service for enterprises.

Nuance is keen to emphasise that it builds applications for large customers, and that most people are not even aware Nuance is behind them. Is Apple one of Nuance's large customers? The Nuance executives look at each other with nervous smiles. Finally Mike Thompson, executive vice president, breaks the silence: “Apple is a partner of ours and we do not talk about Apple at all.”

Voice as your password

A key area of investment has been improving voice biometrics, particularly in the customer service space. Gregory Pal, vice president of product strategy, explains: “When [the system] looks at these 150 characteristics of your voice, you can't reverse engineer it back into the original voice.”

“So from a security perspective, if someone were to compromise the repository of all the voice prints, there's really nothing that they could do with them. Whereas if I compromise a set of text-based passwords or numerical PINs, I could use those in any number of ways,” he continues.

“Voice is just as unique as a fingerprint or an iris scan. To the human ear two voices may sound very similar but to a computer when you are looking at all these other characteristics of a voice, you can then differentiate,” Pal explains.

The art of speech in medicine

Healthcare is Nuance's biggest business area, and a key part of its focus is clinical documentation improvement. With the US healthcare system shifting towards the digital age in the form of electronic health records, many physicians are struggling to adjust. Nick van Terheyden, chief medical information officer for Nuance, believes speech recognition technology is the solution.

“Speech recognition has been around for many years and I think it is already providing value in the clinical space by easing some of the challenges of capturing information,” says van Terheyden.

“Clinicians have been turned towards the computer, in part because they have to capture information and they have been asked to capture information in structured data form. The problem for doctors is, they are not very good at data entry,” explains van Terheyden.

“[So] how do we turn medicine back into an art and allow physicians to focus on the patient?” he asks. “What we are trying to do with speech is to allow physicians to capture that narrative, and turn that into useful data. Having to type into a machine is detracting from the interaction [with the patient] and it’s not really capturing useful information.”

Van Terheyden gives a quick demonstration of how the system development toolkit works. As he dictates the patient information into the desktop, it is rapidly added to the system with complete accuracy. I get a bit nervous when he dictates the patient dosage information. What if it makes a mistake?

“One of the things that comes up all the time is, who is responsible for the information? We are very clear about this. We present data back to the clinician who is the ultimate decision-maker and he has all of the options to say ‘no that is not right’ or ‘this is not appropriate’.”

From my time here at the Nuance office, it's clear that the possibilities for speech recognition in our digital world are endless. Nuance might have its competitors, but I reckon it will always be one step ahead. It's time for me to leave, but before I head back to the UK, I try one more thing:

“Siri, do you use Nuance technology?”

Siri: “I can't answer that.”

Go figure.


Ayesha Salim is e-Content Writer at IDG Connect