Closest Match Based Information Retrieval and Recommendation Engine using Signature-Trees and Fuzzy Relevance Sorting

This paper proposes a recommendation technique that avoids running an exhaustive search over a database of thousands of records before concluding that a recommended item matches a significant percentage of what was initially desired. Such searches often involve not just simple full-match lookups based on indexes, but also partial or near-match searches, in which the percentage of match between entities determines whether an item is relevant enough to recommend. These problems are usually tackled with methods such as fuzzy operations, regular-expression searches, clustering, and similarity analysis, each with its own trade-offs in effectiveness and efficiency. Our goal was to create a search and recommendation system that performs fuzzy search and fuzzy similarity analysis with near-match percentages in an effective, efficient, and user-friendly manner on thousands of records/files/rows with hundreds of attributes/features/columns. Inspired by Google's image search algorithm, which searches on the basis of signatures built by feature extraction from each image, we have created a Match engine that reads the schema of the data or files, compiles encoded signatures, and stores them as an index. That index is then converted into a tree (S-Tree) on the basis of the relevance of each field/column and the observed data frequency. Once compilation is done, the system can search for and recommend the best matches very efficiently. For further optimization we use heuristics such as dividing the feature set into hard filters and soft filters: the former demand a full match and the latter a fuzzy match. Once even one best match is found, other matches can be retrieved without further searching. Our technique is not especially modern and is itself inspired by earlier work, but it builds on ensemble methods to provide fast and efficient results. We have shown it to be quicker than full-scan searches.
In future we plan to extend the signature comparison engine to a variety of advanced feature data types, such as geo-coordinates and synonyms. We also plan to store the compiled signature trees in a distributed database/grid, where queries will run concurrently to match results, or to pass signatures through machine learning techniques. The system is currently used for recipe recommendation, and in future it will be used in applications such as dating systems and film and music recommendation.
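The hard-filter/soft-filter idea and the near-match percentage can be illustrated with a minimal Python sketch. This is not the authors' Match engine: the names `Record`, `MatchIndex`, `min_match`, the one-bit-per-feature-value encoding, and the recipe data are all assumptions made for illustration. The sketch also compares the query against a flat list of signatures rather than an S-Tree; a tree would group signatures under OR-ed parent signatures so that whole subtrees can be skipped.

```python
from dataclasses import dataclass


@dataclass
class Record:
    name: str
    hard: dict   # hypothetical hard-filter fields: must match exactly
    soft: set    # hypothetical soft-filter features: fuzzy matched
    signature: int = 0


class MatchIndex:
    """Flat signature index (assumed design, not the paper's S-Tree)."""

    def __init__(self):
        self.bit_of = {}    # feature value -> bit position in the signature
        self.records = []

    def encode(self, features, grow=False):
        """Compile a set of feature values into a bit signature."""
        sig = 0
        for f in features:
            if f not in self.bit_of:
                if not grow:
                    continue  # unseen query feature cannot match any record
                self.bit_of[f] = len(self.bit_of)
            sig |= 1 << self.bit_of[f]
        return sig

    def add(self, record):
        record.signature = self.encode(record.soft, grow=True)
        self.records.append(record)

    def search(self, hard, soft, min_match=0.5):
        """Return (match_fraction, name) pairs, best match first."""
        wanted = len(set(soft))
        query_sig = self.encode(soft)
        results = []
        for r in self.records:
            # Hard filter: every requested field must be equal.
            if any(r.hard.get(k) != v for k, v in hard.items()):
                continue
            # Soft filter: fraction of requested features present.
            shared = bin(r.signature & query_sig).count("1")
            score = shared / wanted if wanted else 0.0
            if score >= min_match:
                results.append((score, r.name))
        return sorted(results, key=lambda t: (-t[0], t[1]))


# Illustrative recipe data (invented for this example).
index = MatchIndex()
index.add(Record("pasta", {"diet": "veg"}, {"tomato", "basil", "garlic"}))
index.add(Record("curry", {"diet": "veg"}, {"tomato", "onion", "chili"}))
index.add(Record("stew", {"diet": "meat"}, {"tomato", "garlic", "onion"}))

hits = index.search({"diet": "veg"}, {"tomato", "garlic"}, min_match=0.5)
print(hits)
```

Here the hard filter removes "stew" before any signature comparison, and the remaining records are ranked by the fraction of requested soft features their signatures cover.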
