Paper information - A hybrid method for clause splitting in unrestricted English texts


Many NLP tasks depend on knowing the structure of a sentence. In this paper we propose a hybrid method for clause splitting in unrestricted English texts that requires less human work than existing approaches. The output of a machine learning algorithm, trained on an annotated corpus, is post-processed by a shallow rule-based module to improve the accuracy of the method. The evaluation showed that the machine learning algorithm is useful for identifying clause boundaries and that the rule-based module improves the results. Using some very simple rules, we report a precision of around 88%.
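The two-stage idea described in the abstract (a learned predictor whose output is corrected by shallow rules) can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: `predict_boundaries` is a hypothetical keyword-based stand-in for the trained classifier, the single rule in `apply_rules` is not taken from the paper, and the tags, example sentence, and gold boundaries are invented. It is not the authors' implementation.

```python
from typing import List, Set

# Hypothetical stand-in for the trained classifier: it proposes a clause
# boundary before every token that looks like a clause introducer.
CLAUSE_INTRODUCERS = {"that", "which", "because", "and", "but", "when"}


def predict_boundaries(tokens: List[str]) -> Set[int]:
    """Return indices of tokens predicted to start a new clause."""
    predicted = {0}  # the first token always opens a clause
    for i, token in enumerate(tokens):
        if i > 0 and token.lower() in CLAUSE_INTRODUCERS:
            predicted.add(i)
    return predicted


def apply_rules(tags: List[str], predicted: Set[int]) -> Set[int]:
    """Illustrative rule (an assumption, not from the paper): drop a
    predicted boundary if the clause it would close contains no verb."""
    kept: List[int] = []
    for start in sorted(predicted):
        if not kept:
            kept.append(start)
            continue
        previous_span = tags[kept[-1]:start]
        if any(tag.startswith("V") for tag in previous_span):
            kept.append(start)
    return set(kept)


def precision(predicted: Set[int], gold: Set[int]) -> float:
    """Fraction of predicted boundaries that are also gold boundaries."""
    if not predicted:
        return 0.0
    return len(predicted & gold) / len(predicted)


# Invented example sentence with Penn-style tags and hand-made gold boundaries.
tokens = ["John", "and", "Mary", "said", "that", "they", "left"]
tags = ["NNP", "CC", "NNP", "VBD", "IN", "PRP", "VBD"]
gold = {0, 4}

raw = predict_boundaries(tokens)
refined = apply_rules(tags, raw)
print("raw precision:", precision(raw, gold))
print("refined precision:", precision(refined, gold))
```

In this toy case the stand-in predictor wrongly places a boundary before the coordinating "and" that joins two noun phrases, and the rule removes it because the span before it contains no verb. This mirrors, in a very simplified way, how a rule-based pass can raise the precision of a learned boundary detector.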

Constantin Orasan
