Optimization of information retrieval for cross media contents in a best practice network

Recent challenges in information retrieval concern cross media information in social networks, including rich media and web-based content. In these cases, cross media content comprises classical files and their metadata plus web pages, events, blogs, discussion forums, and comments in multiple languages. This heterogeneity creates large, complex problems in cross media indexing and retrieval for services that integrate qualified documents and user-generated content. Further problems relate to scalability, robustness, and resilience to errors. Moreover, users of best practice network services expect fast and efficient indexing and searching of social media. This paper presents a model and an indexing and searching solution for cross media content that addresses the above issues, developed for the ECLAP Social Network in the domain of Performing Arts. Effectiveness and optimization analyses of the retrieval solution are presented with relevant metrics. The research aimed to cope with the complexity of a heterogeneous indexing semantic model, using stochastic optimization techniques, with tuning and discrimination of relevant metadata terms. The research was conducted in the context of the ECLAP European Commission project and services (http://www.eclap.eu).
