Paraphrase Identification on the Basis of Supervised Machine Learning Techniques

This paper presents a machine learning approach to paraphrase identification that uses lexical and semantic similarity information. In the experimental studies, we examine the limitations of the designed attributes and the behavior of three machine learning classifiers. To improve the system's final performance, we also study the effect of combining lexical and semantic information, as well as techniques for classifier combination.
