DOI | https://doi.org/10.3115/1118905.1118925 |
---|
Author | Martin, Joel; Johnson, Howard; Farley, Benoît; Maclachlan, Anna |
---|
Affiliation | National Research Council Canada. NRC Institute for Information Technology |
---|
Format | Text, Article |
---|
Conference | HLT-NAACL-PARALLEL '03: Human Language Technology and North American Chapter of Association of Computational Linguistics 2003, May 27 - June 1, 2003 |
---|
Abstract | A parallel corpus of texts in English and in Inuktitut, an Inuit language, is presented. These texts are from the Nunavut Hansards. The parallel texts are processed in two phases, the sentence alignment phase and the word correspondence phase. Our sentence alignment technique achieves a precision of 91.4% and a recall of 92.3%. Our word correspondence technique is aimed at providing the broadest coverage collection of reliable pairs of Inuktitut and English morphemes for dictionary expansion. For an agglutinative language like Inuktitut, this entails considering substrings, not simply whole words. We employ a Pointwise Mutual Information method (PMI) and attain a coverage of 72.3% of English words and a precision of 87%. |
---|
Publication date | 2003 |
---|
Language | English |
---|
NRC number | NRCC 47119 |
---|
NPARC number | 5765030 |
---|
Record identifier | bce8df0d-20c8-4b42-a200-223ed4fb92b3 |
---|
Record created | 2009-03-29 |
---|
Record modified | 2020-04-02 |
---|
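
The abstract mentions scoring English words against Inuktitut substrings with Pointwise Mutual Information. The following is a minimal sketch of that general idea, not the authors' implementation. The function name `pmi_pairs`, the minimum substring length of 3, the `min_count` threshold, and the sentence-level co-occurrence counting are all assumptions made for illustration. The score used is the standard definition, PMI(e, s) = log2(p(e, s) / (p(e) p(s))), with probabilities estimated from counts over aligned sentence pairs.

```python
import math
from collections import Counter


def pmi_pairs(aligned, min_count=2, min_len=3):
    """Score (English word, target-language substring) pairs by PMI.

    aligned: list of (english_sentence, other_sentence) string pairs,
             assumed to be already sentence-aligned.
    min_count and min_len are illustrative thresholds, not values from the paper.
    """
    n = len(aligned)
    en_counts, sub_counts, joint = Counter(), Counter(), Counter()

    for en, other in aligned:
        # Each item is counted at most once per sentence pair.
        en_items = set(en.lower().split())
        sub_items = set()
        for word in other.lower().split():
            # Enumerate every substring of length >= min_len, since a
            # morpheme may sit anywhere inside an agglutinated word.
            for i in range(len(word)):
                for j in range(i + min_len, len(word) + 1):
                    sub_items.add(word[i:j])
        en_counts.update(en_items)
        sub_counts.update(sub_items)
        joint.update((e, s) for e in en_items for s in sub_items)

    scores = {}
    for (e, s), c in joint.items():
        if c >= min_count:
            # log2( (c/n) / ((count_e/n) * (count_s/n)) )
            scores[(e, s)] = math.log2(c * n / (en_counts[e] * sub_counts[s]))
    return scores
```

In practice the highest-scoring pairs would then be filtered further, for example by keeping only the best substring per English word. That filtering step is left out here because the record does not describe it.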