Using Monolingual Source-Language Data to Improve MT Performance.

From National Research Council Canada

Download	View accepted manuscript: Using Monolingual Source-Language Data to Improve MT Performance. (PDF, 274 KiB)
Author	Ueffing, Nicola
Format	Text, Article
Conference	International Workshop on Spoken Language Translation (IWSLT 2006), November 27-28, 2006, Kyoto, Japan
Abstract	Statistical machine translation systems are usually trained on large amounts of bilingual text and of monolingual text in the target language. In this paper, we will present a self-training approach, which additionally explores the use of monolingual source text, namely the documents to be translated, to improve the system performance. An initial version of the translation system is used to translate the source text. Among the generated translations, target sentences of low quality are automatically identified and discarded. The reliable translations together with their sources are then used as a new bilingual corpus for training an additional phrase translation model. Thus, the translation system can be adapted to the new source data even if no bilingual data in this domain is available. Experimental evaluation was performed on a standard Chinese-English translation task. We focus on settings where the domain and/or the style of the test data is different from that of the training material. We will show a significant improvement in translation quality through the use of the adaptive phrase translation model. BLEU score rises up to 1.1 points, and mWER is reduced by up to 3.1% absolute.
Publication date	2006
In	Proceedings of the International Workshop on Spoken Language Translation (IWSLT 2006).
Language	English
NRC number	NRCC 48808
NPARC number	8914333
Record identifier	de5c3ff8-2697-49bf-8470-347d38d6eee8
Record created	2009-04-22
Record modified	2020-10-09
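
The filtering step described in the abstract (translate the source text, discard low-quality output, keep the reliable pairs as a new bilingual corpus) can be illustrated with a minimal sketch. This record does not say how translation quality is estimated or how the additional phrase translation model is trained, so `translate`, `confidence`, `threshold`, and the toy data below are hypothetical stand-ins, not the paper's components.

```python
from typing import Callable, List, Tuple

SentencePair = Tuple[str, str]


def select_reliable_pairs(
    sources: List[str],
    translate: Callable[[str], str],
    confidence: Callable[[str, str], float],
    threshold: float,
) -> List[SentencePair]:
    """Translate each source sentence and keep (source, translation) pairs
    whose confidence score reaches the threshold."""
    pairs: List[SentencePair] = []
    for src in sources:
        hyp = translate(src)
        # Low-scoring translations are discarded rather than reused for training.
        if confidence(src, hyp) >= threshold:
            pairs.append((src, hyp))
    return pairs


# Toy stand-ins (assumptions): a lookup table plays the role of the initial
# translation system and of the quality estimator.
toy_table = {
    "ni hao": ("hello", 0.9),
    "xie xie": ("thanks", 0.4),
}


def toy_translate(src: str) -> str:
    return toy_table[src][0]


def toy_confidence(src: str, hyp: str) -> float:
    return toy_table[src][1]


corpus = select_reliable_pairs(
    ["ni hao", "xie xie"], toy_translate, toy_confidence, threshold=0.5
)
```

In this sketch, `corpus` stands for the new bilingual corpus that would then be passed to whatever procedure trains the additional phrase translation model.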

Date modified: 2025-03-11