Order Estimation of Japanese Paragraphs by Supervised Machine Learning and Various Textual Features

Abstract

In this paper, we propose a method for estimating the order of paragraphs by supervised machine learning, using a support vector machine (SVM) as the learner. Estimating paragraph order is useful for sentence generation and sentence correction. The proposed method achieved a high accuracy of 0.84 when estimating the order of the first two paragraphs of an article. In addition, it achieved higher accuracy than the baseline method in experiments using two paragraphs of an article. A feature analysis showed that adnominals, conjunctions, and dates were effective for estimating the order of the first two paragraphs, whereas the ratio of new words and the similarity between the preceding paragraphs and the paragraph being estimated were effective for estimating the order of all pairs of paragraphs.
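The abstract names the ingredients of the method, pairwise order decisions made by an SVM over textual features, without spelling out how the features are computed. The sketch below is a minimal, hypothetical illustration of two of the features it mentions (the ratio of new words and the similarity to preceding paragraphs) and of one way paragraph pairs might be turned into labeled training instances. The exact feature definitions, the use of cosine similarity over bag-of-words counts, the choice of the paragraphs before the pair as context, and all function names (`new_word_ratio`, `cosine_similarity`, `pair_features`, `make_instances`) are assumptions for illustration, not the authors' definitions. Paragraphs are assumed to be already split into tokens; for Japanese text this would require a morphological analyzer.

```python
import math
from collections import Counter

Paragraph = list[str]  # a paragraph as a list of tokens (assumed pre-segmented)


def new_word_ratio(preceding: list[Paragraph], target: Paragraph) -> float:
    """Fraction of target tokens never seen in the preceding paragraphs.

    Assumed definition of the 'ratio of new words' feature.
    """
    if not target:
        return 0.0
    seen = {tok for para in preceding for tok in para}
    new = sum(1 for tok in target if tok not in seen)
    return new / len(target)


def cosine_similarity(a: list[str], b: list[str]) -> float:
    """Cosine similarity of two token lists as bag-of-words count vectors.

    Assumed stand-in for the paper's similarity feature.
    """
    ca, cb = Counter(a), Counter(b)
    dot = sum(count * cb[tok] for tok, count in ca.items())
    norm_a = math.sqrt(sum(v * v for v in ca.values()))
    norm_b = math.sqrt(sum(v * v for v in cb.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def pair_features(context: list[Paragraph], first: Paragraph,
                  second: Paragraph) -> dict[str, float]:
    """Features for the hypothesis 'first comes before second' (hypothetical)."""
    before_second = context + [first]
    flat_context = [tok for para in context for tok in para]
    flat_before = [tok for para in before_second for tok in para]
    return {
        "new_ratio_first": new_word_ratio(context, first),
        "new_ratio_second": new_word_ratio(before_second, second),
        "sim_first": cosine_similarity(flat_context, first),
        "sim_second": cosine_similarity(flat_before, second),
    }


def make_instances(paragraphs: list[Paragraph]) -> list[tuple[dict[str, float], int]]:
    """Build labeled instances from every pair of paragraphs in one article.

    The pair in its original order is labeled +1 and the swapped pair -1.
    Using the paragraphs before the earlier one as context is an assumption.
    """
    instances = []
    for i in range(len(paragraphs)):
        for j in range(i + 1, len(paragraphs)):
            context = paragraphs[:i]
            a, b = paragraphs[i], paragraphs[j]
            instances.append((pair_features(context, a, b), +1))
            instances.append((pair_features(context, b, a), -1))
    return instances
```

For an article with n paragraphs this yields n(n-1) instances, half of them positive. The resulting feature dictionaries could then be vectorized (for example with scikit-learn's `DictVectorizer`) and passed to an SVM classifier such as `sklearn.svm.SVC`; that step is left out here to keep the sketch free of third-party dependencies.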
