Combining Supervised Learning Techniques to Key-Phrase Extraction for Biomedical Full-Text

Key-phrase extraction plays a useful role in Information Systems (IS) research areas such as digital libraries. Short metadata such as key phrases help searchers understand the concepts found in documents. This paper evaluates the effectiveness of different supervised learning techniques on biomedical full text: Sequential Minimal Optimization (SMO) and K-Nearest Neighbor, both of which could be embedded in an information system for document search. The authors use these techniques to extract key phrases from PubMed and evaluate the performance of the resulting systems using the holdout validation method. The paper compares the classifier techniques and examines the performance differences between the full text and its abstract. Compared with the authors' previous work, which investigated the performance of Naive Bayes, Linear Regression and SVMreg-1/2, this paper finds that SVMreg-1 performs best in key-phrase extraction for full text, whereas Naive Bayes performs best for abstracts. These techniques should be considered for use in information system search functionality. Additional research issues are also identified.
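
As a rough illustration of the evaluation setup named above, the following Python sketch applies holdout validation to a small K-Nearest Neighbor classifier that labels candidate phrases as key phrases or not. It is a minimal sketch under stated assumptions, not the authors' implementation: the two features (a TF-IDF score and a relative first-occurrence position), the toy data, the 70/30 split, and all function names are hypothetical and are not taken from the paper.

```python
import random

# Hypothetical toy data: each example is ((tfidf, first_occurrence), label),
# where label 1 marks a key phrase. The features are an assumption for
# illustration; the abstract does not state which features were used.
examples = (
    [((0.80 + 0.01 * i, 0.05 + 0.01 * i), 1) for i in range(10)]
    + [((0.10 + 0.01 * i, 0.60 + 0.02 * i), 0) for i in range(10)]
)


def holdout_split(data, test_fraction=0.3, seed=0):
    """Shuffle once and hold out a fixed fraction of the data for testing."""
    rng = random.Random(seed)
    shuffled = list(data)
    rng.shuffle(shuffled)
    cut = int(round(len(shuffled) * (1 - test_fraction)))
    return shuffled[:cut], shuffled[cut:]


def knn_predict(train, features, k=3):
    """Majority vote among the k training examples closest to `features`."""
    def sq_dist(example):
        return sum((a - b) ** 2 for a, b in zip(example[0], features))

    nearest = sorted(train, key=sq_dist)[:k]
    votes = sum(label for _, label in nearest)
    return 1 if 2 * votes > len(nearest) else 0


def precision_recall(train, test, k=3):
    """Precision and recall of the key-phrase class on the held-out set."""
    tp = fp = fn = 0
    for features, label in test:
        predicted = knn_predict(train, features, k)
        if predicted == 1 and label == 1:
            tp += 1
        elif predicted == 1 and label == 0:
            fp += 1
        elif predicted == 0 and label == 1:
            fn += 1
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall


if __name__ == "__main__":
    train, test = holdout_split(examples)
    print(precision_recall(train, test))
```

The same split could be reused to score another classifier on the identical held-out set, which is what makes a comparison between techniques meaningful under holdout validation.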
