July 2012

  • If I had to choose one of the areas of human-machine natural communication where we haven’t been able to make any significant strides during the past decades, I would choose “general” language understanding. Don’t get me wrong. Language understanding per se has made huge steps ahead. IBM Watson’s victory over Jeopardy! human champions is a testimony of

    Read more →

  • Apples and Oranges

    There is a lot of talk about the performance of Apple’s Siri. An article that appeared in the New York Times a few days ago brutally tore Siri apart over its performance, and others compare it with Google Voice Search. As a professional in the field, having followed Google Voice Search closely, knowing well the people who work

    Read more →

  • Singing computers

    Building a computer that speaks with the same naturalness and intelligibility as humans is not a much easier task than building a computer that understands speech. In fact, it took decades to reach the quality of modern speech synthesizers, and yet the real human voice remains superior. Still today, whenever possible, automated spoken dialog systems on the phone

    Read more →

  • Put that there!

    One of the first multimodal interaction systems, dubbed Put-that-there, was built at the MIT Architecture Machine lab in the late 1970s by Chris Schmandt, who is now the director of the Speech and Mobility Group at the MIT Media Lab. Here is a demo from 1979, where you see the integration of speech and gesture recognition to

    Read more →

  • Although speech recognition is getting better and better, it keeps making mistakes that often annoy us, far more than humans would make in similar situations. And we have been trying to make it better for decades. What’s wrong with it? Scientists are constantly testing and trying to improve speech recognition in adverse conditions, such as

    Read more →