Abstract | Knowing the degree of semantic contrast, or oppositeness, between words has widespread application in natural language processing, including machine translation and information retrieval. Manually created lexicons focus on strict opposites, such as antonyms, and have limited coverage. On the other hand, only a few automatic approaches have been proposed, and none have been comprehensively evaluated. Even though oppositeness may seem to be a simple and fairly intuitive idea at first glance, any deeper analysis quickly reveals that it is in fact a complex and heterogeneous phenomenon. In this paper we present a large crowdsourcing experiment to determine the amount of human agreement on the concept of oppositeness and its different kinds. In the process, we flesh out key features of different kinds of opposites and also determine their relative prevalence. We then present an automatic and empirical measure of lexical contrast that combines corpus statistics with the structure of a published thesaurus. Using four different datasets, we evaluate our approach on two different tasks: solving closest-to-opposite questions and distinguishing synonyms from antonyms. The results are analyzed across four parts of speech and across five different kinds of opposites. We show that our measure of lexical contrast obtains high precision and large coverage, outperforming existing methods. |
---|