Download | Final version: Lattice desegmentation for statistical machine translation (PDF, 301 KiB) |
---|
DOI | https://doi.org/10.3115/v1/P14-1010 |
---|
Author | Salameh, Mohammad; Cherry, Colin¹; Kondrak, Grzegorz |
---|
Affiliation | ¹ National Research Council of Canada. Information and Communication Technologies |
---|
Format | Text, Article |
---|
Conference | 52nd Annual Meeting of the Association for Computational Linguistics, June 23-25, 2014, Baltimore, Maryland |
---|
Abstract | Morphological segmentation is an effective sparsity reduction strategy for statistical machine translation (SMT) involving morphologically complex languages. When translating into a segmented language, an extra step is required to desegment the output; previous studies have desegmented the 1-best output from the decoder. In this paper, we expand our translation options by desegmenting n-best lists or lattices. Our novel lattice desegmentation algorithm effectively combines both segmented and desegmented views of the target language for a large subspace of possible translation outputs, which allows for inclusion of features related to the desegmentation process, as well as an unsegmented language model (LM). We investigate this technique in the context of English-to-Arabic and English-to-Finnish translation, showing significant improvements in translation quality over desegmentation of 1-best decoder outputs. |
---|
Publication date | 2014-06-25 |
---|
Publisher | Association for Computational Linguistics |
---|
Language | English |
---|
Peer reviewed | Yes |
---|
NPARC number | 21275904 |
---|
Record identifier | 72236846-b40a-4563-94bd-5d6d0bb299aa |
---|
Record created | 2015-07-31 |
---|
Record modified | 2020-06-02 |
---|