A Note on the Unification of Information Extraction and Data Mining using Conditional-Probability, Relational Models

Although information extraction and data mining appear together in many applications, their interface in most current systems is better described as serial juxtaposition than as tight integration. Information extraction populates slots in a database by identifying relevant subsequences of text, but is usually unaware of the emerging patterns and regularities in the database. Data mining methods begin from a populated database, and are often unaware of where the data came from or of its inherent uncertainties. As a result, the accuracy of both suffers, and significant mining of complex text sources remains beyond reach. This position paper proposes the use of unified, relational, undirected graphical models for information extraction and data mining, in which extraction decisions and data-mining decisions are made in the same probabilistic “currency,” with a common inference procedure, so that each component can compensate for the weaknesses of the other and thereby improve the performance of both. For example, data mining run on a partially filled database can find patterns that provide “top-down” accuracy-improving constraints to information extraction. Conversely, information extraction can provide a much richer set of “bottom-up” hypotheses to data mining if the mining is set up to handle additional uncertainty information from extraction. We outline an approach and describe several models, but provide no experimental results.
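The contrast between a serial pipeline and a shared probabilistic "currency" can be illustrated with a toy sketch. It is not a model from the paper: the two slots, the candidate values, and every score below are made-up assumptions, chosen only to show how a mined regularity, expressed as a pairwise factor in a small undirected model, can overturn an extraction decision that looks best in isolation.

```python
import itertools
import math

# Hypothetical local log-potentials from an extractor for two database slots.
# All values are invented for illustration.
title_scores = {"professor": 0.4, "producer": 0.6}
employer_scores = {"UMass": 1.2, "Universal Studios": 0.3}

# Hypothetical pairwise log-potentials standing in for a mined regularity
# (certain titles tend to co-occur with certain employers).
compat = {
    ("professor", "UMass"): 1.0,
    ("producer", "Universal Studios"): 1.0,
}


def pipeline_decision():
    """Serial juxtaposition: each slot is decided alone, before any mining."""
    title = max(title_scores, key=title_scores.get)
    employer = max(employer_scores, key=employer_scores.get)
    return title, employer


def joint_distribution():
    """Normalized distribution over both slots under local and pairwise factors."""
    log_scores = {}
    for title, employer in itertools.product(title_scores, employer_scores):
        log_scores[(title, employer)] = (
            title_scores[title]
            + employer_scores[employer]
            + compat.get((title, employer), 0.0)
        )
    z = sum(math.exp(s) for s in log_scores.values())
    return {pair: math.exp(s) / z for pair, s in log_scores.items()}


def joint_decision():
    """Most probable joint assignment, found by exhaustive enumeration."""
    dist = joint_distribution()
    return max(dist, key=dist.get)


if __name__ == "__main__":
    print("pipeline:", pipeline_decision())
    print("joint:   ", joint_decision())
```

With these assumed scores the pipeline commits to "producer" for the title slot, while the joint assignment prefers "professor" because the pairwise factor supplies a top-down constraint. Exhaustive enumeration is only feasible at this toy scale; a realistic model would need approximate inference.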
