Paper recognized for lasting impact on natural language processing

AAAI recognized Prof. Rada Mihalcea's 2006 paper, which devised a way to semantically compare short texts.
Rada Mihalcea

Research by Janice M. Jenkins Professor of Computer Science Rada Mihalcea has been recognized by the Association for the Advancement of Artificial Intelligence (AAAI) with a Classic Paper Award Honorable Mention. Her 2006 paper “Corpus-based and Knowledge-based Measures of Text Semantic Similarity” was co-authored with Pacific Northwest National Laboratory researcher Courtney Corley and Istituto per la Ricerca Scientifica e Tecnologica researcher Carlo Strapparava.  

The paper is the most influential of those published at AAAI 2006, having accumulated the highest number of citations of any paper from that year's conference.

The paper presented a method to measure semantic similarity, which scores how related the meanings of two texts are. This can be contrasted with lexical similarity, which more straightforwardly compares the surface form of two words or phrases without accounting for their relationships with other words. The work was novel in combining corpus-based and knowledge-based measures of word similarity to compare whole texts, whereas previous work at the time focused largely on synonym tests between individual words.
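The contrast can be illustrated with a minimal toy sketch (a hypothetical example in the spirit of the idea, not the paper's actual measure): a purely lexical score such as token overlap rates two paraphrases as dissimilar, while adding even a tiny hand-built synonym table, standing in for a corpus- or knowledge-based word similarity, recovers their closeness in meaning.

```python
# Toy contrast between lexical and semantic text similarity.
# Hypothetical illustration only; not the measure from the 2006 paper.

def jaccard(a: str, b: str) -> float:
    """Lexical similarity: overlap of surface word forms (Jaccard)."""
    sa, sb = set(a.lower().split()), set(b.lower().split())
    return len(sa & sb) / len(sa | sb)

# Tiny hand-built synonym table standing in for a knowledge base or corpus.
SYNONYMS = {("car", "automobile"), ("quick", "fast")}

def word_sim(w1: str, w2: str) -> float:
    """Word-to-word similarity: exact match, known synonym, or nothing."""
    if w1 == w2:
        return 1.0
    if (w1, w2) in SYNONYMS or (w2, w1) in SYNONYMS:
        return 0.9
    return 0.0

def semantic_sim(a: str, b: str) -> float:
    """Match each word in one text to its best counterpart in the other,
    then average the two directions (a simple symmetric sketch)."""
    ta, tb = a.lower().split(), b.lower().split()
    def directed(src, dst):
        return sum(max(word_sim(w, v) for v in dst) for w in src) / len(src)
    return (directed(ta, tb) + directed(tb, ta)) / 2

s1, s2 = "the quick car", "the fast automobile"
print(f"lexical:  {jaccard(s1, s2):.2f}")    # low: only "the" matches
print(f"semantic: {semantic_sim(s1, s2):.2f}")  # high: synonyms matched
```

Here the two phrases share only one surface word, so the lexical score is low, while the word-level similarity lets the semantic score approach 1.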

This research was later used as a component in various other tasks and applications, such as finding relations between text spans to construct text graphs; identifying plagiarism or similar text constructs in literature; supporting student answer evaluation in educational applications; constructing knowledge graphs; clustering similar texts; and more.

The method introduced by Mihalcea’s paper has inspired a large body of work on the topic, including several annual evaluations on semantic text similarity organized by ACL SemEval, an active line of research on textual entailment (later reframed as natural language inference), and, more recently, the quickly growing research area of neural models for semantic text similarity, including models such as BERT, RoBERTa, and others.