| Abstract | This chapter explores fundamental ideas and challenges in using statistical and computational methods to analyze vowel harmony in text corpora. These methods center on quantifying and visualizing the degree and type of harmony in a language; the resulting measures can be used for cross-linguistic comparison, for measuring historical change in harmony, and for unsupervised identification of previously unknown harmony patterns. Models using these methods can differ greatly in robustness, interpretability, and generality. A successful and appropriate model requires consideration of many factors, including the types of corpora to be analyzed, the effect of morphological structure and phonological alternations on harmony, and the potential for anomalous patterns in proper names, loanwords, onomatopoeia, and similar types of words. The chapter includes a general overview of data-driven approaches, important considerations in selecting and using data sources, linguistic considerations and how different models handle them, and work on data-driven visualizations of harmony. |
|---|---|