Abstract

Browsing through large volumes of spoken audio is known to be a challenging task for end users. One way to facilitate this task is to provide keyphrases extracted from the audio, allowing users to quickly get the gist of an audio document or of sections of it.

Previous methods for extracting keyphrases from spoken audio have applied text-based summarization techniques to automatic speech transcriptions. The method of Désilets et al. (2000) was found to produce accurate keyphrases for transcriptions with Word Error Rates (WER) on the order of 25%, but performance degraded for transcripts with WERs on the order of 60%. With such transcripts, a large proportion of the extracted keyphrases contained serious transcription errors.

In this paper, we extend those previous methods by taking advantage of the fact that mistranscribed keyphrases tend to have low semantic coherence with the correctly transcribed ones. We measure semantic cohesiveness by computing the Pointwise Mutual Information (PMI) of phrases over a terabyte-sized corpus, and use that measure to filter semantic outliers from the list of extracted keyphrases. We evaluated the effectiveness of this technique and found that it removes half of the mistranscribed keyphrases, while removing at most 15% of the correctly transcribed ones.
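The filtering idea described above can be sketched as follows. This is a minimal illustration, not the paper's actual implementation: it assumes we already have unigram and co-occurrence counts from some large corpus, and it uses a simple mean-PMI threshold (the function names, count structures, and threshold value are all hypothetical).

```python
import math

def pmi(count_xy, count_x, count_y, total):
    """Pointwise Mutual Information from raw corpus counts:
    PMI(x, y) = log( p(x, y) / (p(x) * p(y)) )."""
    if count_xy == 0:
        return float("-inf")  # phrases never co-occur
    p_xy = count_xy / total
    p_x = count_x / total
    p_y = count_y / total
    return math.log(p_xy / (p_x * p_y))

def filter_outliers(keyphrases, cooc, counts, total, threshold=0.0):
    """Keep keyphrases whose mean PMI with the other extracted
    keyphrases exceeds the threshold; drop semantic outliers.

    cooc maps frozenset({a, b}) -> co-occurrence count;
    counts maps phrase -> corpus count; total is the corpus size.
    (Toy interfaces assumed for illustration.)"""
    kept = []
    for k in keyphrases:
        scores = [
            pmi(cooc.get(frozenset((k, o)), 0), counts[k], counts[o], total)
            for o in keyphrases if o != k
        ]
        finite = [s for s in scores if s != float("-inf")]
        mean = sum(finite) / len(finite) if finite else float("-inf")
        if mean > threshold:
            kept.append(k)
    return kept
```

For example, a mistranscribed phrase that never co-occurs with the other keyphrases in the corpus gets a mean PMI of negative infinity and is filtered out, while topically related phrases reinforce each other and survive.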