Text Mining and Knowledge Discovery from Big Data: Challenges and Promise

With the rapid development of networking, data storage, and data collection capacity, Big Data is expanding quickly across all science and engineering domains, including the physical, biological, and biomedical sciences. This paper presents text mining and the techniques used to categorize document structure in big data. A major challenge is guaranteeing the quality of the features extracted from text documents to describe user interests or preferences, because the documents contain large amounts of noise. Many models and algorithms exist, but they do not yet achieve the best results for users, so this remains an open issue that needs more research.
