TopicRank: topic ranking for automatic keyphrase extraction

ABSTRACT. Keyphrases are the words or multi-word expressions that represent the main content of a document. They are useful for applications such as automatic indexing and automatic summarization, yet they are not available for most documents. Because the number of such documents keeps growing, assigning keyphrases manually is not feasible, and automatic keyphrase extraction has therefore attracted the interest of researchers. In this article we present TopicRank, an unsupervised graph-based method for keyphrase extraction. The method groups keyphrase candidates into topics, ranks the topics, and extracts from each of the top-ranked topics the keyphrase candidate that best represents it. The experiments we carried out show a significant improvement over state-of-the-art graph-based methods for unsupervised keyphrase extraction.
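
The three steps named in the abstract (group candidates into topics, rank the topics, pick one candidate per top topic) can be illustrated with a minimal Python sketch. It is not the authors' implementation: the abstract does not give the details, so every concrete choice here is an assumption made for illustration. Candidates are assumed to be given as a hypothetical mapping from phrase to word offsets; grouping uses word-overlap (Jaccard) with an arbitrary threshold; topics are ranked with a weighted PageRank-style iteration over a complete graph whose edge weights sum reciprocal distances between occurrences; and each selected topic is represented by its earliest-occurring candidate.

```python
from itertools import combinations


def _jaccard(a, b):
    """Jaccard similarity between two sets of words."""
    return len(a & b) / len(a | b)


def cluster_candidates(candidates, threshold=0.25):
    """Greedily group candidates into topics (assumed criterion: word overlap)."""
    topics = []
    for phrase in candidates:
        words = set(phrase.lower().split())
        for topic in topics:
            if any(_jaccard(words, set(p.lower().split())) >= threshold for p in topic):
                topic.append(phrase)
                break
        else:
            topics.append([phrase])  # no similar topic found: start a new one
    return topics


def edge_weight(topic_a, topic_b, positions):
    """Assumed weighting: sum of reciprocal distances between all occurrences."""
    total = 0.0
    for a in topic_a:
        for b in topic_b:
            for pa in positions[a]:
                for pb in positions[b]:
                    if pa != pb:
                        total += 1.0 / abs(pa - pb)
    return total


def rank_topics(topics, positions, damping=0.85, iterations=50):
    """Score topics with a weighted PageRank-style power iteration."""
    n = len(topics)
    weights = [[0.0] * n for _ in range(n)]
    for i, j in combinations(range(n), 2):
        weights[i][j] = weights[j][i] = edge_weight(topics[i], topics[j], positions)
    out_sum = [sum(row) for row in weights]
    scores = [1.0] * n
    for _ in range(iterations):
        scores = [
            (1 - damping) + damping * sum(
                weights[j][i] / out_sum[j] * scores[j]
                for j in range(n) if out_sum[j] > 0
            )
            for i in range(n)
        ]
    return scores


def extract_keyphrases(positions, top_n=2):
    """Return one candidate (assumed: the earliest one) per top-ranked topic."""
    topics = cluster_candidates(list(positions))
    scores = rank_topics(topics, positions)
    ranked = sorted(range(len(topics)), key=lambda i: scores[i], reverse=True)
    return [min(topics[i], key=lambda p: min(positions[p])) for i in ranked[:top_n]]


# Toy input: phrase -> word offsets in a hypothetical document.
positions = {
    "keyphrase extraction": [0, 40],
    "automatic keyphrase extraction": [10],
    "graph": [5, 20],
    "topic ranking": [12],
    "weather": [200],
}
keyphrases = extract_keyphrases(positions, top_n=2)
```

In this toy input the two "keyphrase extraction" variants fall into the same topic, so at most one of them can be returned. That is the practical point of ranking topics rather than individual candidates: redundant variants of the same concept do not compete for separate output slots.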
