A Proposal for the Automatic Distinction of Homomorphic Idiomatic and Non-idiomatic Phrases in WordNet

Idiomatic phrases composed of several lexemes pose various problems for NLP. It is often not obvious whether a given sequence of words should be interpreted idiomatically or literally. We propose a solution that detects idioms based on the semantic classes of their constituents. Once the idioms in WordNet have been annotated with this information, they can be compiled into a tree structure that allows the constructions to be identified efficiently.
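
To make the idea of a compiled tree structure concrete, the following is a minimal sketch of one way it might look. It is not the implementation proposed here. It assumes that each annotated idiom is stored as a sequence of slots, where a slot either fixes a lemma or constrains the semantic class of the word filling it, and that input tokens arrive already lemmatized and tagged with a semantic class. The names TrieNode, insert and match, the slot encoding, and the class labels are all hypothetical, and the toy annotation is purely illustrative rather than a linguistic claim.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Tuple

# Hypothetical slot encoding: ("lemma", "kick") fixes a word,
# ("class", "person") constrains the semantic class of a constituent.
Slot = Tuple[str, str]


@dataclass
class TrieNode:
    children: Dict[Slot, "TrieNode"] = field(default_factory=dict)
    idiom: Optional[str] = None  # set on the node that completes an idiom


def insert(root: TrieNode, slots: List[Slot], idiom: str) -> None:
    """Add one annotated idiom to the tree, sharing common prefixes."""
    node = root
    for slot in slots:
        node = node.children.setdefault(slot, TrieNode())
    node.idiom = idiom


def match(root: TrieNode, tokens: List[Tuple[str, str]]) -> Optional[str]:
    """Return the idiom matched by the whole token sequence, or None.

    Each token is a (lemma, semantic_class) pair. At every step a token
    may follow either a lemma edge or a semantic-class edge.
    """
    frontier = [root]
    for lemma, sem_class in tokens:
        next_frontier = []
        for node in frontier:
            for key in (("lemma", lemma), ("class", sem_class)):
                child = node.children.get(key)
                if child is not None:
                    next_frontier.append(child)
        if not next_frontier:
            return None  # no idiom in the tree continues with this token
        frontier = next_frontier
    for node in frontier:
        if node.idiom is not None:
            return node.idiom
    return None


# Toy annotation (assumed for illustration only).
root = TrieNode()
insert(
    root,
    [("class", "person"), ("lemma", "kick"), ("lemma", "the"), ("lemma", "bucket")],
    "kick the bucket",
)

idiomatic = [("grandfather", "person"), ("kick", "act"),
             ("the", "det"), ("bucket", "artifact")]
literal = [("horse", "animal"), ("kick", "act"),
           ("the", "det"), ("bucket", "artifact")]
```

With this toy tree, match(root, idiomatic) returns the stored idiom and match(root, literal) returns None, because the first constituent fails the class constraint. Lookup cost grows with the length of the input phrase rather than with the number of stored idioms, which is the sense in which such a tree supports efficient identification.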