IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP 2004), May 17-21, 2004, Montréal, Québec, Canada
A three-stage architecture for speech recognition is presented, comprising pre-processing, phoneme recognition, and natural-language post-processing. Within this context of phoneme-based utterance recognition, this paper focuses on the often problematic speed of the second stage and re-engineers a standard Two-Level Dynamic Programming (TLDP) approach to achieve a 75% increase in speed. Our Fast Two-Level Dynamic Programming Algorithm (FTLDP) uses a phoneme clustering technique to reduce the reference search space and silence detection to shorten the utterance to be recognized. An overview of the FTLDP algorithm is presented, along with some results.
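The two reductions named above can be illustrated with a minimal sketch. It is not the FTLDP implementation, whose details are not given here. It assumes that silence is detected with a simple per-frame energy threshold and that phoneme clusters are represented by centroids compared with Euclidean distance. All function names, the threshold value, and the toy data are hypothetical.

```python
import math


def frame_energy(frame):
    """Sum of squared feature values, used here as a crude loudness measure."""
    return sum(x * x for x in frame)


def drop_silence(frames, threshold):
    """Keep only frames whose energy reaches the (assumed) silence threshold."""
    return [f for f in frames if frame_energy(f) >= threshold]


def euclidean(a, b):
    """Euclidean distance between two equal-length feature vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def candidate_phonemes(frame, clusters, top_k=1):
    """Return the phoneme labels of the top_k clusters nearest to the frame.

    Only these labels would then be passed to the dynamic programming
    matcher, instead of the full reference set.
    """
    ranked = sorted(clusters, key=lambda c: euclidean(frame, c["centroid"]))
    labels = []
    for cluster in ranked[:top_k]:
        labels.extend(cluster["phonemes"])
    return labels


# Hypothetical toy data: two clusters and a four-frame utterance.
clusters = [
    {"centroid": [1.0, 0.0], "phonemes": ["a", "e"]},
    {"centroid": [0.0, 1.0], "phonemes": ["s", "f"]},
]
utterance = [[0.01, 0.0], [0.9, 0.1], [0.0, 0.02], [0.1, 0.8]]

voiced = drop_silence(utterance, threshold=0.05)
candidates = [candidate_phonemes(f, clusters) for f in voiced]
```

In this toy case the two low-energy frames are discarded, and each remaining frame is matched against only the two phonemes of its nearest cluster rather than all four references. Both reductions shrink the work left for the dynamic programming stage.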