Computers that understand speech: Where are we? Where are we going?

Popular science fiction of the past century brought us visions that today’s technology has abundantly surpassed, with one notable exception: machines that understand speech. The speech recognition technology used today was invented in the 1970s and refined over the following 40 years to bring us popular applications like Google Voice Search and Siri. Compared with the early systems, speech recognition today is orders of magnitude more capable and accurate, to the point that a multi-billion dollar industry has emerged from it. However, speech recognition is still brittle and far from human-like capability. On the one hand, recognizing speech is a surprisingly difficult problem, since it involves decoding many layers of abstraction combined into a highly variable one-dimensional signal. On the other hand, we have not yet achieved a full understanding of how humans perform speech recognition, and the simplifying assumptions we make in our models impose severe limitations on achieving higher performance. While the industry is placing ever greater emphasis on larger and larger amounts of data, some researchers are going back to the fundamental problems of speech modeling and trying to approach them from a different perspective. In this talk I will present the general problem of speech recognition, the solutions that pushed the envelope of digital signal processing technology toward today’s successes, and the problems that we still need to solve.


Roberto Pieraccini is currently the CEO of the International Computer Science Institute in Berkeley, CA. Prior to that he was the CTO of SpeechCycle, a research manager at IBM T.J. Watson Research and at SpeechWorks International, and a member of technical staff at Bell Labs and AT&T Shannon Laboratories. He started his career in the 1980s as a researcher at CSELT, the research laboratories of the Italian telephone company. His research interests range from speech recognition to spoken language understanding and dialog, multimodal interaction, and machine learning. He is a fellow of the IEEE and of ISCA, a member of the AVIOS board, and a member of the editorial boards of several scientific and technology magazines. He is the author of more than 120 papers and articles in the field, as well as of “The Voice in the Machine,” a general-audience book on the history of “computers that understand speech,” published in 2012 by MIT Press.