SBA-term: Sparse Bilingual Association for Terms

Bilingual semantic term association is useful in cross-language information retrieval, statistical machine translation, and many other natural language processing applications. In this paper, we present a method, named SBA-term, that applies sparse linear regression (Lasso, i.e., least squares with an l1 penalty), together with L2 rescaling of the design matrix, to the task of bilingual term association. The approach hinges on formulating the task as a feature selection problem within a classification framework. Our experimental results indicate that the proposed method is more efficient than co-occurrence at extracting relevant bilingual semantic term associations. In addition, our approach connects the vibrant area of sparse machine learning to an important problem in natural language processing.
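
To make the feature selection formulation concrete, the following is a minimal sketch, not the implementation evaluated in this paper. It assumes a hypothetical toy corpus of aligned English/French documents (`DOCS`), a binary document-by-term design matrix over the target-language vocabulary, a binary response marking documents that contain the source term, and a plain coordinate-descent Lasso solver. The names `build_design`, `lasso_cd`, `associate`, and the penalty value `lam=0.05` are illustrative assumptions. Each column is divided by its L2 norm before fitting, so that frequent terms are not favored merely because their columns have larger norms.

```python
import math

# Hypothetical toy corpus: (English terms, French terms) for each aligned document.
DOCS = [
    ({"cat"}, {"chat", "le"}),
    ({"cat", "dog"}, {"chat", "chien", "le"}),
    ({"dog"}, {"chien", "le"}),
    ({"house"}, {"maison", "le"}),
    ({"cat", "house"}, {"chat", "maison", "le"}),
    ({"dog", "house"}, {"chien", "maison", "le"}),
]


def soft_threshold(value, threshold):
    """Shrink value toward zero by threshold (proximal operator of the l1 norm)."""
    if value > threshold:
        return value - threshold
    if value < -threshold:
        return value + threshold
    return 0.0


def build_design(docs):
    """Binary document-by-term matrix over target terms, columns scaled to unit L2 norm."""
    vocab = sorted({term for _, target in docs for term in target})
    X = [[1.0 if term in target else 0.0 for term in vocab] for _, target in docs]
    for j in range(len(vocab)):
        norm = math.sqrt(sum(row[j] ** 2 for row in X))
        if norm > 0:
            for row in X:
                row[j] /= norm
    return vocab, X


def lasso_cd(X, y, lam, sweeps=100):
    """Minimize (1/2n)||y - Xw||^2 + lam * ||w||_1 by cyclic coordinate descent."""
    n, p = len(X), len(X[0])
    w = [0.0] * p
    residual = list(y)  # equals y - Xw, kept up to date after every update
    for _ in range(sweeps):
        for j in range(p):
            col = [row[j] for row in X]
            z = sum(c * c for c in col) / n
            if z == 0:
                continue
            # Correlation of column j with the residual that excludes its own contribution.
            rho = sum(c * (r + c * w[j]) for c, r in zip(col, residual)) / n
            new_w = soft_threshold(rho, lam) / z
            delta = new_w - w[j]
            if delta != 0.0:
                residual = [r - c * delta for c, r in zip(col, residual)]
            w[j] = new_w
    return w


def associate(docs, source_term, lam=0.05):
    """Return the target terms that receive a nonzero Lasso weight for source_term."""
    vocab, X = build_design(docs)
    y = [1.0 if source_term in source else 0.0 for source, _ in docs]
    weights = lasso_cd(X, y, lam)
    return {term: wj for term, wj in zip(vocab, weights) if wj != 0.0}


if __name__ == "__main__":
    print(associate(DOCS, "cat"))
```

On this toy corpus, calling `associate(DOCS, "cat")` selects only `chat`: the article `le` appears in every document, including every document that contains `cat`, yet the l1 penalty drives its weight to exactly zero. A raw co-occurrence count would rank `le` alongside `chat`.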