Download | View final version: Improving cuneiform language identification with BERT (PDF, 292 KiB) |
---|
DOI | https://doi.org/10.18653/v1/W19-1402 |
---|
Author | Bernier-Colborne, Gabriel; Goutte, Cyril; Léger, Serge |
---|
Affiliation | National Research Council of Canada. Digital Technologies |
---|
Format | Text, Article |
---|
Conference | The Sixth Workshop on NLP for Similar Languages, Varieties and Dialects, 2019-6 - 2019-6, Ann Arbor, MI, USA |
---|
Abstract | We describe the systems developed by the National Research Council Canada for the Cuneiform Language Identification (CLI) shared task at the 2019 VarDial evaluation campaign. We compare a state-of-the-art baseline relying on character n-grams and a traditional statistical classifier, a voting ensemble of classifiers, and a deep learning approach using a Transformer network. We describe how these systems were trained, and analyze the impact of some preprocessing and model estimation decisions. The deep neural network achieved 77% accuracy on the test data, which turned out to be the best performance at the CLI evaluation, establishing a new state-of-the-art for cuneiform language identification. |
---|
Publication date | 2019 |
---|
Publisher | Association for Computational Linguistics |
---|
Language | English |
---|
Peer reviewed | Yes |
---|
Record identifier | e6334b65-1734-4c0e-86da-7fb23d5cf6be |
---|
Record created | 2019-12-19 |
---|
Record modified | 2020-05-30 |
---|