Improved Automatic Keyword Extraction Given More Linguistic Knowledge

This paper discusses experiments on the automatic extraction of keywords from abstracts using a supervised machine learning algorithm. Its main point is that adding linguistic knowledge to the representation (such as syntactic features), rather than relying only on statistics (such as term frequency and n-grams), yields better results when evaluated against keywords previously assigned by professional indexers. In more detail, extracting NP-chunks gives better precision than extracting n-grams, and adding the PoS tag(s) assigned to the term as a feature dramatically improves the results, independent of the term selection approach applied.
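To make the contrast between the two term selection approaches concrete, the following is a minimal sketch, not the paper's implementation. It assumes the text has already been PoS-tagged (the toy sentence and its Penn Treebank-style tags are hand-written for illustration), it approximates NP-chunking with a simple "adjectives and nouns ending in a noun" rule, and the function names `ngram_candidates`, `np_chunk_candidates`, and `features` are hypothetical. Each candidate term receives a statistical feature (term frequency) and a linguistic feature (its PoS tag sequence), which a supervised learner could then use.

```python
from collections import Counter

# Hand-tagged toy sentence (illustrative tags, not real tagger output).
TAGGED = [
    ("automatic", "JJ"), ("keyword", "NN"), ("extraction", "NN"),
    ("uses", "VBZ"), ("a", "DT"), ("supervised", "JJ"),
    ("learning", "NN"), ("algorithm", "NN"), ("for", "IN"),
    ("keyword", "NN"), ("extraction", "NN"),
]


def ngram_candidates(tagged, max_n=3):
    """Statistical term selection: every contiguous span of 1..max_n tokens."""
    return [tagged[i:i + n]
            for n in range(1, max_n + 1)
            for i in range(len(tagged) - n + 1)]


def np_chunk_candidates(tagged):
    """Crude NP chunker (an assumption): maximal adjective/noun runs ending in a noun."""
    chunks, current = [], []
    # A sentinel token with a non-matching tag flushes the final run.
    for token in tagged + [("", "END")]:
        if token[1].startswith(("JJ", "NN")):
            current.append(token)
            continue
        # Drop trailing adjectives so that every chunk ends in a noun.
        while current and not current[-1][1].startswith("NN"):
            current.pop()
        if current:
            chunks.append(current)
        current = []
    return chunks


def features(candidates):
    """Map each candidate term to its term frequency and PoS tag sequence."""
    counts = Counter(" ".join(word for word, _ in c) for c in candidates)
    feats = {}
    for c in candidates:
        term = " ".join(word for word, _ in c)
        feats[term] = {"tf": counts[term],
                       "pos": " ".join(tag for _, tag in c)}
    return feats


if __name__ == "__main__":
    for term, feat in features(np_chunk_candidates(TAGGED)).items():
        print(term, feat)
```

On this toy input the n-gram approach proposes 30 candidate spans, including non-terms such as "uses a", while the chunk rule proposes only three noun phrases. The PoS feature can be attached to candidates from either approach, which matches the claim that it helps independent of the term selection method.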
