28.09.14
Big data helping speech recognition become mainstream
Source: National Health Executive September/October 2014
Steve Young, Professor of Information Engineering at the University of Cambridge, and a global expert in speech recognition technologies, gives his thoughts on the advances and challenges facing this ‘growing’ research area. David Stevenson reports.
University of Cambridge’s Professor Steve Young will be the 2015 recipient of the Institute of Electrical and Electronics Engineers (IEEE) James L Flanagan Speech and Audio Processing Award.
The annual prize is given to an individual or small team for “an outstanding contribution to the advancement of speech and/or audio signal processing”. For the last 35 years, Young, who is Professor of Information Engineering at the University, has focused his attention on developing systems that allow humans to interact with machines using voice.
He told NHE that research in this area made steady but not spectacular progress from the mid-1980s to the mid-2000s. “But over the last five to 10 years we’ve seen really quite significant acceleration in progress,” he said. “And that is why we are now seeing speech recognition coming into the mainstream with services like Apple Siri and Google Now, and the new smart watches that do speech recognition.”
Prof Young, who is the senior pro-vice-chancellor responsible for planning and resources at Cambridge, added that modern systems are built around statistical models that represent the data.
“So the way you build a speech recogniser, essentially, is that you get some data, which is people speaking, you transcribe the data and then you try to model the data and find a way to automatically generate the transcriptions yourself – and then you have a speech recogniser,” he said. “The key to all of that is some quite sophisticated statistical modelling algorithms and the availability of the data.”
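The train-then-transcribe loop Prof Young describes can be sketched at toy scale. The code below is purely illustrative, not his method: a hypothetical "recogniser" fits one mean feature vector per word from transcribed examples, then labels new input by its nearest model. Real systems use far richer models, such as hidden Markov models and neural networks, over real acoustic features.

```python
# Toy illustration of statistical speech recognition: model the
# transcribed training data, then generate transcriptions for new
# data. Short feature vectors stand in for processed audio.

def train(transcribed_data):
    """Fit one mean feature vector (a crude acoustic model) per word."""
    grouped = {}
    for features, word in transcribed_data:
        grouped.setdefault(word, []).append(features)
    return {
        word: [sum(dim) / len(vecs) for dim in zip(*vecs)]
        for word, vecs in grouped.items()
    }

def recognise(models, features):
    """Transcribe by choosing the word whose model best fits the input."""
    def distance(mean):
        return sum((f - m) ** 2 for f, m in zip(features, mean))
    return min(models, key=lambda word: distance(models[word]))

# Transcribed "recordings": (feature vector, spoken word).
data = [
    ([1.0, 0.1], "yes"), ([0.9, 0.2], "yes"),
    ([0.1, 1.0], "no"),  ([0.2, 0.8], "no"),
]
models = train(data)
print(recognise(models, [0.95, 0.15]))  # -> yes
print(recognise(models, [0.15, 0.90]))  # -> no
```

The point Prof Young makes about data availability shows up even here: the models are nothing but summaries of the transcribed examples, so more data directly means better models.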
Big data
The expert told us that it is the nature of data, and its wide availability nowadays, that has changed the speech recognition landscape. “When you speak into your phone, the signal is being routed to a server farm somewhere in northern Virginia if you’re Apple or the Arizona desert if you’re Google, and it is being processed there and the result is being fed back to your phone,” said Prof Young.
This allows two things to happen. First, it opens up the possibility of using very powerful computing to recognise people’s voices. Second, and more importantly, the companies are capturing the data.
“When Siri was first launched, for example, it wasn’t that great,” said Prof Young, “but as more people started using it the company was capturing huge amounts of data. And then by collecting the data and upgrading the models, people found the recognition improved, so they used the system more and gave more data. That has happened over a wide range of fields, and it is the ‘big data’ paradigm that we are hearing a lot about.”
He added that the internet is also allowing organisations, be it research or commercial, to collect huge datasets and do things that they could never do before, and that is what has led to a rapid improvement in performance and an “explosion of interest” in the field.
“It is nice, looking back, that we’ve had these bursts of speech recognition research being a very hot topic, then people being disillusioned with it and it going out of fashion, and then coming back,” added Prof Young. “And we are in one of those phases where it is back and people are very interested, especially with the big players investing huge amounts in improving services.”
Dictation and voice recognition in healthcare
Medical dictation has been one of the mainstays of commercial speech applications, he added.
He noted that doctors have persevered, and because they have persevered, “and in some cases had to – particularly in the US where everything has to be recorded – the dictation systems have made progress and been widely used”.
He stated that advances have also been made in transcription and more general conversation systems (where a computer can listen to two humans having a conversation or it can be one of the participants).
In fact, Prof Young feels that the challenges in developing these technologies are now moving more from transcribing the audio into words, which has been the focus for the last 30 years, into actually understanding what the words mean and the semantics behind them, especially with regards to conversational systems.
“I would expect that what has been started by Siri and Google Now is going to expand and we’re going to see a whole plethora of agents being available for having conversations about booking hotels and restaurants,” he said, “but particularly in healthcare, as this is the field which is ripe for providing this type of service.
“I think we’ll start to see these coming in within the next few years in focused application areas and then becoming more and more general and widely acceptable over the next decade.”
Conversational systems
Currently, Prof Young is working on developing conversational systems – not specifically in healthcare yet – to access tourist information.
“For example, finding a restaurant or hotel,” he told us, “and we’ve been working with some automobile companies to develop in-car voice recognition.”
He outlined that with many people now used to satnavs, in the future drivers may be able to talk to their cars and say: ‘I’d like to stop off and have a meal, what is there in the local area?’ The car would then be able to search for and book a suitable venue, after a conversation with the driver about the available options.
“We’re working on that now and many of the algorithms we’re starting to develop are not rule based,” said Prof Young.
“Traditionally these types of things have been developed by a programmer sitting down and writing rules, such as ‘what would the user ask?’ And ‘how should the system respond?’ But this doesn’t scale and the system you deploy doesn’t get any better. What we want to do is deploy systems that learn from their own users and get better and more competent automatically, and that really is the focus of my work now.”
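The contrast Prof Young draws can be sketched in a few lines. This is a heavily simplified, hypothetical illustration, not his group's algorithms: real statistical dialogue systems of the kind he describes are typically trained with reinforcement learning, not a simple score counter. The rule-based agent is frozen at whatever its programmer wrote; the learning agent shifts towards whichever reply its users respond well to.

```python
# Rule-based: fixed behaviour written by a programmer; it never
# improves, however many users it serves.
def rule_based_reply(user_input):
    if "restaurant" in user_input:
        return "Here is a list of restaurants."
    return "Sorry, I didn't understand."

# Learning-based (toy): pick the reply with the best feedback score
# so far, and update the score after every interaction.
class LearningAgent:
    def __init__(self, candidate_replies):
        self.scores = {reply: 0 for reply in candidate_replies}

    def reply(self):
        return max(self.scores, key=self.scores.get)

    def feedback(self, reply, success):
        self.scores[reply] += 1 if success else -1

agent = LearningAgent(["Here is a list.", "Shall I book a table?"])
# Simulated deployment: these users prefer the proactive booking offer.
for _ in range(10):
    r = agent.reply()
    agent.feedback(r, success=(r == "Shall I book a table?"))
print(agent.reply())  # -> Shall I book a table?
```

The rule-based agent would need a programmer to change its behaviour; the learning agent changed its own behaviour from its users' reactions, which is the scaling property Prof Young is after.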
He added that conversational systems are particularly interesting, and believes that the use of automation, if it is done “sensibly and effectively”, could make a big impact in the future care of the elderly and in managing an ageing population.
Despite dedicating 35 years to research in the field of speech recognition, and with his research helping to set global standards for benchmarking systems and being the basis of many commercial systems, Prof Young remains modest about his award, joking that organisations sometimes feel they have to give them out “just because someone has been around long enough”.
Nevertheless, he said he is “humbled” to become the 2015 recipient of the IEEE James L Flanagan Speech and Audio Processing Award.