| Download | Improving cuneiform language identification with BERT (PDF, 292 KiB) |
|---|
| DOI | https://doi.org/10.18653/v1/W19-1402 |
|---|
| Author | Bernier-Colborne, Gabriel; Goutte, Cyril; Léger, Serge |
|---|
| Affiliation | National Research Council of Canada, Digital Technologies |
|---|
| Format | Text, Article |
|---|
| Conference | The Sixth Workshop on NLP for Similar Languages, Varieties and Dialects (VarDial), June 2019, Ann Arbor, MI, USA |
|---|
| Abstract | We describe the systems developed by the National Research Council Canada for the Cuneiform Language Identification (CLI) shared task at the 2019 VarDial evaluation campaign. We compare a state-of-the-art baseline relying on character n-grams and a traditional statistical classifier, a voting ensemble of classifiers, and a deep learning approach using a Transformer network. We describe how these systems were trained, and analyze the impact of some preprocessing and model estimation decisions. The deep neural network achieved 77% accuracy on the test data, which turned out to be the best performance at the CLI evaluation, establishing a new state-of-the-art for cuneiform language identification. |
|---|
| Publication date | 2019 |
|---|
| Publisher | Association for Computational Linguistics |
|---|
| Language | English |
|---|
| Peer reviewed | Yes |
|---|
| Record identifier | e6334b65-1734-4c0e-86da-7fb23d5cf6be |
|---|
| Record created | 2019-12-19 |
|---|
| Record modified | 2020-05-30 |
|---|
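The abstract's baseline relies on character n-grams fed to a traditional statistical classifier. As an illustration only, here is a minimal frequency-profile sketch of that general technique; it is not the authors' code, the toy Latin-alphabet strings stand in for real cuneiform Unicode input, and the language labels (`sux` for Sumerian, `akk` for Akkadian) are placeholders.

```python
from collections import Counter

def char_ngrams(text, n=2):
    """Extract overlapping character n-grams from a text."""
    return [text[i:i + n] for i in range(len(text) - n + 1)]

def train_profiles(labeled_texts, n=2):
    """Build one n-gram frequency profile (a Counter) per language label."""
    profiles = {}
    for lang, text in labeled_texts:
        counts = profiles.setdefault(lang, Counter())
        counts.update(char_ngrams(text, n))
    return profiles

def classify(text, profiles, n=2):
    """Pick the language whose profile gives the text's n-grams
    the highest summed relative frequency."""
    grams = char_ngrams(text, n)
    best_lang, best_score = None, -1.0
    for lang, counts in profiles.items():
        total = sum(counts.values())
        score = sum(counts[g] for g in grams) / total
        if score > best_score:
            best_lang, best_score = lang, score
    return best_lang

# Hypothetical training data; real CLI training sets contain
# transliterated or Unicode cuneiform lines labeled by language.
profiles = train_profiles([("sux", "abababab"), ("akk", "cdcdcdcd")])
print(classify("abab", profiles))  # → sux
```

A stronger statistical baseline of the kind described would typically combine several n-gram orders and a discriminative classifier such as an SVM, but the core feature extraction step is the same.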