Abstract | We present an unsupervised learning algorithm that mines large text corpora for patterns that express implicit semantic relations. For a given input word pair <em>X:Y</em> with some unspecified semantic relations, the corresponding output list of patterns (P<sub><em>1</em></sub>, …, P<sub><em>m</em></sub>) is ranked according to how well each pattern <em>P<sub>i</sub></em> expresses the relations between <em>X</em> and <em>Y</em>. For example, given <em>X</em> = ostrich and <em>Y</em> = bird, the two highest-ranking output patterns are "<em>X</em> is the largest <em>Y</em>" and "<em>Y</em> such as the <em>X</em>". The output patterns are intended to be useful for finding further pairs with the same relations, to support the construction of lexicons, ontologies, and semantic networks. The patterns are sorted by <em>pertinence</em>, where the pertinence of a pattern <em>P<sub>i</sub></em> for a word pair <em>X:Y</em> is the expected relational similarity between the given pair and typical pairs for <em>P<sub>i</sub></em>. The algorithm is empirically evaluated on two tasks: solving multiple-choice SAT word analogy questions and classifying semantic relations in noun-modifier pairs. On both tasks, the algorithm achieves state-of-the-art results, performing significantly better than several alternative pattern ranking algorithms based on tf-idf. |
---|
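
The definition of pertinence as an expected relational similarity can be illustrated with a minimal sketch. It assumes the expectation is a probability-weighted sum over word pairs that typically occur with a pattern; the abstract does not say how those probabilities or the similarity measure are estimated. The names `pertinence`, `rank_patterns`, and `toy_sim`, along with every word pair and number below, are hypothetical and chosen only for illustration.

```python
from typing import Callable, Dict, List, Tuple

Pair = Tuple[str, str]


def pertinence(given: Pair,
               pair_probs: Dict[Pair, float],
               sim: Callable[[Pair, Pair], float]) -> float:
    """Expected relational similarity between `given` and the pairs typical of one pattern.

    `pair_probs` maps each typical pair to an assumed probability of that
    pair given the pattern (the values should sum to 1).
    """
    return sum(p * sim(given, pair) for pair, p in pair_probs.items())


def rank_patterns(given: Pair,
                  pattern_pair_probs: Dict[str, Dict[Pair, float]],
                  sim: Callable[[Pair, Pair], float]) -> List[str]:
    """Sort patterns by decreasing pertinence for the given word pair."""
    scores = {pattern: pertinence(given, probs, sim)
              for pattern, probs in pattern_pair_probs.items()}
    return sorted(scores, key=lambda pattern: scores[pattern], reverse=True)


# Hypothetical relational similarities to the pair ostrich:bird (made-up values).
TOY_SIM: Dict[Pair, float] = {
    ("eagle", "bird"): 0.9,
    ("dog", "animal"): 0.8,
    ("wheel", "car"): 0.1,
}


def toy_sim(given: Pair, other: Pair) -> float:
    """Stand-in similarity: a table lookup that ignores `given`."""
    return TOY_SIM[other]


# Hypothetical distributions of typical pairs for two patterns.
PATTERN_PAIRS: Dict[str, Dict[Pair, float]] = {
    "Y such as the X": {("eagle", "bird"): 0.5, ("dog", "animal"): 0.5},
    "X of the Y": {("wheel", "car"): 0.75, ("eagle", "bird"): 0.25},
}

ranking = rank_patterns(("ostrich", "bird"), PATTERN_PAIRS, toy_sim)
print(ranking)
```

In this toy setup the pattern whose typical pairs resemble ostrich:bird most closely ranks first, which is the behaviour the abstract attributes to sorting by pertinence.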