The mythical 10 years

June 18, 2012

Speech recognition is one of those technologies that have been around for a while but have never become mature enough to be considered established and part of everyday life, unlike digital cameras, retina displays, and Bluetooth. However, for a few years now speech recognition technology has been “sort of” working, so that some of us have started building applications and products around it; Siri and Google voice search are now the most popular evidence of that. But in fact, although it allows us to build useful applications, speech recognition by computers is still far from the human ability to deal with highly noisy, highly distorted, or highly accented speech. Think of our ability to understand speech at a cocktail party: speech recognition by computers is light years away from that. Speech recognition is still fragile and brittle. Because of that, speech recognition has always been “almost there … but not quite.” There is always a sense that computers’ speech recognition capabilities will be close to those of humans, well, 5 to 10 years from now. And that statement has been true every year for the past 50+ years.

Roger K. Moore, a veteran of speech research, a professor at the University of Sheffield, UK, and a longtime friend, has been conducting surveys of senior and young speech scientists to determine when they think speech recognition will be a solved problem, so to speak. The results from 3 surveys, conducted in 1997, 2003, and 2009, are reported in this paper. As an example of the survey results, when asked when they think “It will be possible to hold a telephone conversation with an automatic chat-line system for more than 10 minutes without realizing it isn’t human,” the median answer in all three surveys was … well … the year 2050 … meaning we are slowly getting close to that date. “Never” is the median answer to the question of when speech recognition experts think “There will be no more need for speech research” (we speech researchers have some job security, indeed). And … when do speech researchers themselves think that “speech recognition will be commonly available at home?” … well, the answer is mostly “… about 10 years from now …” and that answer was the same in 1997, 2003, and 2009. That is evidence of the moving 10-year horizon of pervasive speech adoption. One of the funniest questions in the survey is: in which year do you think the following statement will be true: “A leading cause of time away from work is being hoarse from talking all the time, and people buy keyboards as an alternative to speaking.” If you want to know what speech scientists think about that, read the paper.

However, the situation is not that grim. Some interesting applications of speech recognition are out there, many people are trying to make the technology better, and we all still believe in it. Otherwise we wouldn’t be writing and reading blogs like this. More to come in the next posts, to convince you that “speech recognition” is still hot. Stay tuned!


Can machines really think?

June 18, 2012

Can machines really think? This question has been haunting machine intelligence experts and philosophers since long before today’s speech understanding in Siri and IBM Watson’s Jeopardy! question-answering challenge. Even though today we no longer think much about whether machines can think or not, the question was quite popular in the heyday of Artificial Intelligence, the Cold War, and the pioneers of computer science. The following video includes clips from a 1950 interview replayed in the 1992 PBS documentary The Machine That Changed the World. In the interview, MIT professor and scientific advisor to the White House Jerome Wiesner says (it is 1950, and people smoke pipes in public) that machines will actually be thinking in a matter of “four or five years.” Around the same time, more than 60 years ago, the “father of machine perception,” Oliver Selfridge from Lincoln Labs, has no doubt that “machines can and will think,” even though he is sure that his daughter will never marry a computer. And in a 1960 Paramount News feature dubbed “Electronic brain translates Russian to English” (electronic brain? eek…) an early machine translation engineer states, without the slightest shadow of doubt in his voice, that if “their experiments” go well, they will be able to translate the whole output from the Soviet Union in just a few hours of computer time a week. In the video there is even a brief appearance by Claude Shannon, the father of “information theory.” Enjoy the video:

The Voice in the Machine is Alive

June 16, 2012

Hello there. A couple of months after the publication of my book, The Voice in the Machine: Building Computers that Understand Speech, I decided to start a new blog on the same theme. I have the good fortune of working at ICSI, the International Computer Science Institute, one of the few independent advanced research places where computer speech, language, and AI research is still vibrant and unfettered (together with many other disciplines, such as networking, security, neurosciences, bio-informatics, and computer architectures). Since computers that understand speech are one of my lifelong loves, I will use this blog to collect my notes and my points of view on the evolution of the science and technology of what I like to call “talking machines.” Having the additional fortune of having worked on both sides of the research-industry chasm for more than three decades, I will do my best to put into words my presumably unbiased point of view on the related science, technology, and business. Talk to you soon… stay tuned.