Indexation automatique des textes arabes: État de l'art (Automatic indexing of Arabic documents: State of the art)

Document indexing is a crucial phase of the text mining process. It represents each document by the descriptors most relevant to its content. Several approaches have been proposed in the literature, notably for English, but they cannot be applied as-is to Arabic documents because of the language's specific characteristics, its morphological and grammatical richness, and its vocabulary. This article presents a state of the art of indexing methods and their contributions to the Arabic language. We propose a categorization of existing work according to the approaches and methods most commonly used in automatic indexing of text documents. We selected articles qualitatively, retaining the works that make significant contributions to indexing and report substantial results.
