A Lightweight Approach to Extract Interschema Properties from Structured, Semi-Structured and Unstructured Sources in a Big Data Scenario

Knowledge of interschema properties (e.g., synonymies, homonymies, hyponymies and subschema similarities) plays a key role in enabling decision-making across sources characterized by disparate for...
