Association for Computational Linguistics (ACL) Second Workshop on Statistical Machine Translation (WMT07), June 23, 2007, Prague, Czech Republic
Statistical machine translation systems are usually trained on large amounts of bilingual text and on monolingual text in the target language. In this paper we explore transductive semi-supervised methods that make effective use of monolingual data from the source language in order to improve translation quality. We propose several algorithms with this aim and present the strengths and weaknesses of each one. We present detailed experimental evaluations on the French-English EuroParl data set and on data from the NIST Chinese-English large data track, and show a significant improvement in translation quality on both tasks.