Extracting Representative Information to Enhance Flexible Data Queries

Extracting representative information is of great interest in data queries and web applications, where approximate matching between attribute values or records is an important issue in the extraction process. This paper proposes an approach to extracting representative tuples from data classes under an extended possibility-based data model, and introduces a measure, relation compactness, based on information entropy to reflect the degree to which a relation is compact in light of information redundancy. Theoretical analysis and data experiments show that the approach has three desirable properties: 1) the set of representative tuples has high degrees of compactness (less redundancy) and coverage (rich content); 2) it provides a flexible way to obtain data query outcomes of different sizes according to user preference; and 3) it is meaningful and applicable to web search applications.
