| Download | View final version: Transfer learning improves French cross-domain dialect identification: NRC @ VarDial 2022 (PDF, 319 KiB) |
|---|
| Author | Bernier-Colborne, Gabriel; Leger, Serge; Goutte, Cyril |
|---|
| Affiliation | National Research Council Canada. Digital Technologies |
|---|
| Format | Text, Article |
|---|
| Conference | Ninth Workshop on NLP for Similar Languages, Varieties and Dialects, October 2022, Gyeongju, Republic of Korea |
|---|
| Abstract | We describe the systems developed by the National Research Council Canada for the French Cross-Domain Dialect Identification shared task at the 2022 VarDial evaluation campaign. We evaluated two different approaches to this task: SVM and probabilistic classifiers exploiting n-grams as features, and trained from scratch on the data provided; and a pre-trained French language model, CamemBERT, that we fine-tuned on the dialect identification task. The latter method turned out to improve the macro-F1 score on the test set from 0.344 to 0.430 (25% increase), which indicates that transfer learning can be helpful for dialect identification. |
|---|
| Publication date | 2022-10-06 |
|---|
| Publisher | Association for Computational Linguistics |
|---|
| Language | English |
|---|
| Peer reviewed | Yes |
|---|
| Record identifier | 7d0c4e22-ed47-4519-a0d3-0f1c1b25b516 |
|---|
| Record created | 2022-10-19 |
|---|
| Record modified | 2022-10-21 |
|---|
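
The abstract reports a macro-F1 gain from 0.344 to 0.430 and describes it as a 25% increase. The sketch below shows how such figures are typically computed: macro-F1 as the unweighted mean of per-class F1 scores, and the gain as a relative increase over the baseline. The function names (`macro_f1`, `relative_increase`) and the toy labels `A`, `B`, `C` are illustrative assumptions, not taken from the paper or the shared-task data; only the two scores come from the abstract.

```python
def macro_f1(gold, pred):
    """Unweighted mean of per-class F1 over every label seen in gold or pred."""
    labels = sorted(set(gold) | set(pred))
    scores = []
    for label in labels:
        tp = sum(1 for g, p in zip(gold, pred) if g == label and p == label)
        fp = sum(1 for g, p in zip(gold, pred) if g != label and p == label)
        fn = sum(1 for g, p in zip(gold, pred) if g == label and p != label)
        denom = 2 * tp + fp + fn
        # F1 = 2TP / (2TP + FP + FN); defined as 0 when the class never occurs.
        scores.append(2 * tp / denom if denom else 0.0)
    return sum(scores) / len(scores)


def relative_increase(old, new):
    """Relative change of `new` with respect to `old`."""
    return (new - old) / old


# Toy labels (hypothetical, not the shared-task classes).
gold = ["A", "A", "B", "C"]
pred = ["A", "B", "B", "C"]
print(f"toy macro-F1: {macro_f1(gold, pred):.3f}")

# Scores reported in the abstract: n-gram baseline vs. fine-tuned CamemBERT.
gain = relative_increase(0.344, 0.430)
print(f"relative increase: {gain:.0%}")
```

Because macro-F1 weights every class equally, a minority dialect counts as much as the majority one, which is why it is a common choice when class sizes differ.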