Arabic Document Classification Method: A Step towards Efficient Document Summarization

The massive growth of online information has made thorough research on automatic text summarization a necessity for the Natural Language Processing (NLP) community. Reaching this goal requires several approaches to be integrated and to work together. One of these approaches is document classification. The aim of this paper is therefore to propose a framework for classifying agricultural documents as a step towards a language-independent automatic summarization approach. The broader goal of our research series is a complete, novel framework that not only responds to the user's question but also lets the user find additional information related to it. We implemented the proposed method and, as a case study, applied it to Arabic text in the agriculture domain. The implemented approach succeeded in classifying the documents submitted by the user. The results were evaluated using the Recall, Precision and F-score measures.
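To make the evaluation measures concrete, the following is a minimal sketch of how per-class Precision, Recall and F-score can be computed for a document classifier. It is not the evaluation code used in this work: the function name `per_class_scores`, the category labels and the toy predictions are all hypothetical and chosen only for illustration, and the F-score shown is the balanced F1 variant, which is an assumption since the abstract does not say which weighting was used.

```python
from collections import Counter


def per_class_scores(true_labels, predicted_labels):
    """Return {label: (precision, recall, f1)} for every label seen.

    Hypothetical helper for illustration only; F-score is the balanced F1.
    """
    tp, fp, fn = Counter(), Counter(), Counter()
    for truth, pred in zip(true_labels, predicted_labels):
        if truth == pred:
            tp[truth] += 1          # document assigned to its correct class
        else:
            fp[pred] += 1           # wrongly assigned to class `pred`
            fn[truth] += 1          # missed for class `truth`

    scores = {}
    for label in set(true_labels) | set(predicted_labels):
        assigned = tp[label] + fp[label]   # documents the classifier put in this class
        relevant = tp[label] + fn[label]   # documents that truly belong to this class
        precision = tp[label] / assigned if assigned else 0.0
        recall = tp[label] / relevant if relevant else 0.0
        total = precision + recall
        f1 = 2 * precision * recall / total if total else 0.0
        scores[label] = (precision, recall, f1)
    return scores


# Toy data: hypothetical agricultural categories, not taken from the paper.
true_labels = ["crops", "crops", "soil", "soil", "irrigation"]
predicted_labels = ["crops", "soil", "soil", "soil", "irrigation"]

scores = per_class_scores(true_labels, predicted_labels)
for label in sorted(scores):
    p, r, f = scores[label]
    print(f"{label}: precision={p:.2f} recall={r:.2f} f1={f:.2f}")
```

In this toy run one "crops" document is misclassified as "soil", so "crops" keeps perfect precision but loses recall, while "soil" keeps perfect recall but loses precision. This is the trade-off the F-score summarizes in a single number.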
