Towards Semantics-Enabled Distributed Infrastructure for Knowledge Acquisition

We summarize progress on algorithms and software for knowledge acquisition from large, distributed, autonomous, and semantically disparate information sources. Key results include: scalable algorithms for constructing predictive models from data, based on a novel decomposition of learning algorithms that interleaves queries for sufficient statistics from the data with computations that use those statistics; provably exact algorithms for learning from distributed data (exact relative to their centralized counterparts); and statistically sound approaches to learning predictive models from partially specified data, which arise when the schema and data semantics, and hence the granularity of the data, differ across sources.
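As an illustration of the sufficient-statistics decomposition, the following minimal sketch uses a naive Bayes classifier, for which class counts and per-class attribute-value counts are sufficient statistics. It is not the authors' implementation: the function names (`local_counts`, `merge_counts`, `predict`), the toy data, and the use of Laplace smoothing are all assumptions made for this example. Each site answers a count query locally, and the learner only combines the returned counts, so the merged statistics equal those computed from the pooled data.

```python
from collections import Counter
import math

def local_counts(rows):
    """Answer a statistics query at one site (hypothetical helper).

    rows is a list of (features, label) pairs. Returns class counts and
    (feature index, value, label) counts, which suffice for naive Bayes.
    """
    class_counts, joint_counts = Counter(), Counter()
    for features, label in rows:
        class_counts[label] += 1
        for i, value in enumerate(features):
            joint_counts[(i, value, label)] += 1
    return class_counts, joint_counts

def merge_counts(site_stats):
    """Combine per-site counts by addition; no raw data leaves a site."""
    total_class, total_joint = Counter(), Counter()
    for class_counts, joint_counts in site_stats:
        total_class.update(class_counts)
        total_joint.update(joint_counts)
    return total_class, total_joint

def predict(class_counts, joint_counts, features, alpha=1.0):
    """Classify using only the counts (Laplace smoothing is an assumption)."""
    n = sum(class_counts.values())
    # Number of distinct values observed for each feature index.
    values = {}
    for (i, value, _label) in joint_counts:
        values.setdefault(i, set()).add(value)
    best_label, best_score = None, -math.inf
    for label in sorted(class_counts):
        n_label = class_counts[label]
        score = math.log(n_label / n)
        for i, value in enumerate(features):
            num = joint_counts[(i, value, label)] + alpha
            den = n_label + alpha * len(values.get(i, {value}))
            score += math.log(num / den)
        if score > best_score:
            best_label, best_score = label, score
    return best_label

# Toy data held at two hypothetical sites.
site_a = [(("sunny", "hot"), "no"), (("rainy", "mild"), "yes")]
site_b = [(("sunny", "mild"), "yes"), (("rainy", "hot"), "no")]

merged = merge_counts([local_counts(site_a), local_counts(site_b)])
centralized = local_counts(site_a + site_b)
prediction = predict(*merged, ("rainy", "mild"))
```

Because the counts are additive across a horizontal partition of the data, `merged` and `centralized` are identical here, which is the sense of exactness relative to a centralized learner in this simple case. Learners whose statistics depend on intermediate results, such as decision trees, would need several rounds of such queries.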
