The mythical 10 years

Speech recognition is one of those technologies that have been around for a while but have never become mature enough to be considered established and part of everyday life, unlike digital cameras, retina displays, and Bluetooth. However, for a few years now speech recognition technology has been “sort-of” working, so that some of us started building applications and products around it; Siri and Google voice search are now the most popular evidence of that. But in fact, although it allows building useful applications, speech recognition by computers is still far from the human ability to deal with highly noisy, highly distorted, or highly accented speech. Think of our ability to understand speech at a cocktail party: speech recognition by computers is light years away from that. Speech recognition is still fragile and brittle. Because of that, speech recognition has always been “almost there … but not quite.” There is always a sense that computers’ speech recognition capabilities will be close to those of humans, well, in 5 to 10 years from now. And that statement has been true every year for the past 50+ years.

Roger K. Moore, a long-timer in speech research, a professor at the University of Sheffield, UK, and a longtime friend, has been conducting surveys of senior and young speech scientists, trying to determine when they think speech recognition will be a solved problem, so to speak. The results from 3 surveys, conducted in 1997, 2003, and 2009, are reported in this paper. As an example of the survey results, when asked when they think “It will be possible to hold a telephone conversation with an automatic chat-line system for more than 10 minutes without realizing it isn’t human,” the median answer in all three surveys was … well … in year 2050 … meaning we are slowly getting close to that date. “Never” is the median answer to the question about when speech recognition experts think “There will be no more need for speech research” (we speech researchers have some job security, indeed). And … when do speech researchers themselves think that “speech recognition will be commonly available at home?” … well, the answer is mostly “… about 10 years from now …” and that answer was the same in 1997, 2003, and 2009. That is proof of the moving 10-year horizon of pervasive speech adoption. One of the funniest questions of the survey is: in which year do you think the following statement will be true: “A leading cause of time away from work is being hoarse from talking all the time, and people buy keyboards as an alternative to speaking.” If you want to know what speech scientists think about that, read the paper.

However, the situation is not that grim. Some interesting applications of speech recognition are out there, many people are trying to make the technology better, and we all still believe in it. Otherwise we wouldn’t be writing and reading blogs like this. More to come in the next posts, to convince you that “speech recognition” is still hot. Stay tuned!

~ by Roberto Pieraccini on June 18, 2012.

