Conjoint Mining of Data and Content with Applications in Business, Bio-medicine, Transport Logistics and Electrical Power Systems

Digital information within an enterprise consists of (1) structured data and (2) unstructured content. The structured data includes enterprise and business data such as sales, customers, products, accounts, inventory and enterprise assets, while the content includes contracts, reports, emails, customer opinions, transcribed calls, online inquiries, compliments and complaints. Further, cutting-edge businesses also use GPS tracking or surveillance monitors as well as sensor technologies for productivity, performance and efficiency measures, and these services are provided by outsourcers, among others. Similarly, in the biomedical area, resources can be structured data, say in Swiss-Prot, or unstructured text information in journal articles stored in content repositories such as PubMed. The structured data and the unstructured content generally reside in entirely separate repositories, with the former being managed by a DBMS and the latter by a content manager frequently provided by an outsourcer or vendor [76]. This separation is undesirable since the information content of these sources is complementary. Further, each outsourcer or vendor keeps the data on its own cloud, the data are not sharable between vendor systems, and most vendor systems are not integrated with the enterprise systems, which leaves the organization to consolidate the data and information manually for data analytics. Effective use of knowledge and information requires seamless access to, and intelligent analysis of, information in its totality so that enterprises can gain enhanced critical insights. This is becoming even more important, as the proportion of structured to unstructured information has shifted from 50-50 in the 1960s to 5-95 today [1]. Unless we can effectively utilize the unstructured content conjointly with the structured data, we will obtain only very limited and shallow knowledge discovery from an increasingly narrow slice of information.
The techniques developed in our research will be used to address significant issues in three application areas, although the potential applications with significant impact are much more extensive.

[1]  Richard M. Schwartz,et al.  An Algorithm that Learns What's in a Name , 1999, Machine Learning.

[2]  James Nga-Kwok Liu,et al.  Inter-transactional association rules for multi-dimensional contexts for prediction and their application to studying meteorological data , 2001, Data Knowl. Eng..

[3]  Soo-Min Kim,et al.  Determining the Sentiment of Opinions , 2004, COLING.

[4]  Bing Liu,et al.  Identifying comparative sentences in text documents , 2006, SIGIR.

[5]  Steffen Staab,et al.  CREAM: creating relational metadata with a component-based, ontology-driven annotation framework , 2001, K-CAP '01.

[6]  Tharam S. Dillon,et al.  UNI3 - efficient algorithm for mining unordered induced subtrees using TMG candidate generation , 2007, 2007 IEEE Symposium on Computational Intelligence and Data Mining.

[7]  Lorraine K. Tanabe,et al.  Tagging gene and protein names in biomedical text , 2002, Bioinform..

[8]  David E. Millard,et al.  Automatic Ontology-Based Knowledge Extraction from Web Documents , 2003, IEEE Intell. Syst..

[9]  Mohammed J. Zaki.  Efficiently mining frequent trees in a forest: algorithms and applications , 2005, IEEE Transactions on Knowledge and Data Engineering.

[10]  Peter Willett,et al.  Protein Structures and Information Extraction from Biological Texts: The PASTA System , 2003, Bioinform..

[11]  John Riedl,et al.  Insert movie reference here: a system to bridge conversation and item-oriented web sites , 2006, CHI.

[12]  Elizabeth Chang,et al.  Searching Services "on the Web": A Public Web Services Discovery Approach , 2007, 2007 Third International IEEE Conference on Signal-Image Technologies and Internet-Based System.

[13]  Wenfei Fan,et al.  Keys with Upward Wildcards for XML , 2001, DEXA.

[14]  Chris Clifton,et al.  Query flocks: a generalization of association-rule mining , 1998, SIGMOD '98.

[15]  Padhraic Smyth,et al.  From Data Mining to Knowledge Discovery: An Overview , 1996, Advances in Knowledge Discovery and Data Mining.

[16]  Fabio Vitali,et al.  Web information systems , 1998, CACM.

[17]  Jianmin Jia,et al.  What You Don't Know About Customer-Perceived Quality: The Role of Customer Expectation Distributions , 1999 .

[18]  Andrew McCallum,et al.  Conditional Random Fields: Probabilistic Models for Segmenting and Labeling Sequence Data , 2001, ICML.

[19]  T. Dillon,et al.  Electricity price short-term forecasting using artificial neural networks , 1999 .

[20]  Tharam S. Dillon,et al.  IMB3-Miner: Mining Induced/Embedded Subtrees by Constraining the Level of Embedding , 2006, PAKDD.

[21]  Alexandre Termier,et al.  TreeFinder: a first step towards XML data mining , 2002, 2002 IEEE International Conference on Data Mining, 2002. Proceedings..

[22]  Richard Schwartz,et al.  An Algorithm that Learns What's in a Name , 1999 .

[23]  Tharam S. Dillon,et al.  Mining Unordered Distance-Constrained Embedded Subtrees , 2008, Discovery Science.

[24]  Tharam S. Dillon,et al.  Keynote 2: Trust and Reputation Relationships in Service-Oriented Environments , 2005, ICITA.

[25]  Tharam S. Dillon,et al.  Protein ontology: vocabulary for protein data , 2005, Third International Conference on Information Technology and Applications (ICITA'05).

[26]  Karl Aberer,et al.  Emergent Semantics Systems , 2007, ESOE.

[27]  Mukesh K. Mohania,et al.  Efficiently linking text documents with relevant structured information , 2006, VLDB.

[28]  Tharam S. Dillon,et al.  Using Competitive Learning between Symbolic Rules as a Knowledge Learning Method , 2008, IFIP AI.

[29]  Fedja Hadzic,et al.  Mining Distance-Constrained Embedded Subtrees , 2011 .

[30]  R. Oliver.  A Cognitive Model of the Antecedents and Consequences of Satisfaction Decisions , 1980 .

[31]  Ramakrishnan Srikant,et al.  Fast algorithms for mining association rules , 1998, VLDB 1998.

[32]  J. Wenny Rahayu,et al.  Ontologies on the MOVE , 2004, DASFAA.

[33]  F. Hadzic,et al.  MB3-Miner: efficiently mining eMBedded subTREEs using Tree Model Guided candidate generation , 2005 .

[34]  Hiroki Arimura,et al.  Discovering Frequent Substructures in Large Unordered Trees , 2003, Discovery Science.

[35]  Laks V. S. Lakshmanan,et al.  Optimization of constrained frequent set queries with 2-variable constraints , 1999, SIGMOD '99.

[36]  Fedja Hadzic,et al.  Tree Mining Application to Matching of Heterogeneous Knowledge Representations , 2007 .

[37]  Thomas R. Gruber,et al.  A translation approach to portable ontology specifications , 1993, Knowl. Acquis..

[38]  Tharam S. Dillon,et al.  Knowledge acquisition of conjunctive rules using multilayered neural networks , 1993, Int. J. Intell. Syst..

[39]  Silvana Castano,et al.  Semantic integration of heterogeneous information sources , 2001, Data Knowl. Eng..

[40]  Elizabeth Chang,et al.  Medical ontologies to support human disease research and control , 2005, Int. J. Web Grid Serv..

[41]  David A. Ferrucci,et al.  UIMA: an architectural approach to unstructured information processing in the corporate research environment , 2004, Natural Language Engineering.

[42]  Michael L. Brodie.  Computer Science 2.0: A New World of Data Management , 2007, VLDB.

[43]  Kevin Chen-Chuan Chang,et al.  EntityRank: Searching Entities Directly and Holistically , 2007, VLDB.

[44]  David W. Embley,et al.  Ontology-based extraction and structuring of information from data-rich unstructured documents , 1998, CIKM '98.

[45]  Rajeev Motwani,et al.  Scalable Techniques for Mining Causal Structures , 1998, Data Mining and Knowledge Discovery.

[46]  Tharam S. Dillon,et al.  Optimal Operations Planning in a Large Hydro-Thermal Power System , 1983 .

[47]  Hans Weigand,et al.  An XML-Enabled Association Rule Framework , 2003, DEXA.

[48]  Michel C. A. Klein,et al.  Ontology Versioning and Change Detection on the Web , 2002, EKAW.

[49]  Tharam S. Dillon,et al.  Stochastic optimization and modelling of large hydrothermal systems for long-term regulation , 1980 .

[50]  Maurice D. Mulvenna,et al.  Discovering Internet marketing intelligence through online analytical web usage mining , 1998, SGMD.

[51]  Tharam S. Dillon,et al.  Towards the Mental Health Ontology , 2008, 2008 IEEE International Conference on Bioinformatics and Biomedicine.

[52]  Ramakrishnan Srikant,et al.  Mining Association Rules with Item Constraints , 1997, KDD.

[53]  Nigel Collier,et al.  Introduction to the Bio-entity Recognition Task at JNLPBA , 2004, NLPBA/BioNLP.

[54]  Tharam S. Dillon,et al.  A Practical Approach to the Derivation of a Materialized Ontology View , 2004 .

[55]  Tharam S. Dillon,et al.  Conjoint Data Mining of Structured and Semi-structured Data , 2008, 2008 Fourth International Conference on Semantics, Knowledge and Grid.

[56]  Jian Su,et al.  Recognizing Names in Biomedical Texts: a Machine Learning Approach , 2004 .

[57]  Kun Deng,et al.  Research of Web Pages Categorization , 2007 .

[58]  Tharam S. Dillon,et al.  SLA-Based Trust Model for Cloud Computing , 2010, 2010 13th International Conference on Network-Based Information Systems.

[59]  Matthew Hurst,et al.  Deriving marketing intelligence from online discussion , 2005, KDD '05.

[60]  Tharam S. Dillon,et al.  Razor: mining distance-constrained embedded subtrees , 2006, Sixth IEEE International Conference on Data Mining - Workshops (ICDMW'06).

[61]  Tharam S. Dillon,et al.  Theoretical and Practical Considerations of Uncertainty and Complexity in Automated Knowledge Acquisition , 1995, IEEE Trans. Knowl. Data Eng..

[62]  Sridhar Ramaswamy,et al.  On the Discovery of Interesting Patterns in Association Rules , 1998, VLDB.

[63]  Rahul Gupta,et al.  LIPTUS: associating structured and unstructured information in a banking environment , 2007, SIGMOD '07.

[64]  Hsin-Hsi Chen,et al.  Mining opinions from the Web: Beyond relevance retrieval , 2007 .

[65]  Ke Wang,et al.  Discovering Structural Association of Semistructured Data , 2000, IEEE Trans. Knowl. Data Eng..

[66]  Dan Roth,et al.  Semantic Integration in Text: From Ambiguous Names to Identifiable Entities , 2005, AI Mag..

[67]  Arthur Stutt,et al.  MnM: Ontology Driven Semi-automatic and Automatic Support for Semantic Markup , 2002, EKAW.

[68]  Tharam S. Dillon,et al.  Tree model guided candidate generation for mining frequent subtrees from XML documents , 2008, TKDD.

[69]  Tharam S. Dillon,et al.  Automated knowledge acquisition , 1994, Prentice Hall International series in computer science and engineering.

[70]  Tharam S. Dillon,et al.  SEQUEST: Mining frequent subsequences using DMA strips , 2006 .

[71]  Peter D. Turney.  Thumbs Up or Thumbs Down? Semantic Orientation Applied to Unsupervised Classification of Reviews , 2002, ACL.

[72]  Charu C. Aggarwal,et al.  XRules: an effective structural classifier for XML data , 2003, KDD '03.

[73]  Fazlollah M. Reza,et al.  Introduction to Information Theory , 2004, Lecture Notes in Electrical Engineering.

[74]  Alexandra Poulovassilis,et al.  A Semantic Approach to Integrating XML and Structured Data Sources , 2001, CAiSE.

[75]  Tharam S. Dillon,et al.  MB3-Miner: mining eMBedded subTREEs using Tree Model Guided candidate generation , 2005 .

[76]  Karen Kukich,et al.  Techniques for automatically correcting words in text , 1992, CSUR.

[77]  Mong-Li Lee,et al.  Efficient Mining of XML Query Patterns for Caching , 2003, VLDB.

[78]  Tharam S. Dillon,et al.  U3 - Mining Unordered Embedded Subtrees Using TMG Candidate Generation , 2008, 2008 IEEE/WIC/ACM International Conference on Web Intelligence and Intelligent Agent Technology.

[79]  Ricardo Baeza-Yates,et al.  Computer Science 2 , 1994 .

[80]  Tharam S. Dillon,et al.  A Statistical-Heuristic Feature Selection Criterion for Decision Tree Induction , 1991, IEEE Trans. Pattern Anal. Mach. Intell..

[81]  Robert Meersman,et al.  Formal Ontology Engineering in the DOGMA Approach , 2002, OTM.