Formal language models for finding groups of experts

We introduce a new information retrieval task: given a topic, find knowledgeable groups that have expertise on that topic. Five probabilistic language models are proposed to tackle the challenge of automatically finding groups of experts in heterogeneous document collections. For evaluation purposes, a data set is created based on a publicly downloadable corpus used in the TREC Enterprise 2005 and 2006 tracks, and three types of ground truth are defined. We provide a detailed analysis of the performance of the proposed group finding models.

The task of finding groups or teams has recently received increased attention as a natural and challenging extension of search tasks aimed at retrieving individual entities. We introduce a new group finding task: given a query topic, we try to find knowledgeable groups that have expertise on that topic. We present five general strategies for this group finding task, given a heterogeneous document repository. The models are formalized using generative language models. Two of the models aggregate the expertise scores of the experts in the same group; one locates documents associated with experts in the group and then determines how closely those documents are associated with the topic; the remaining two directly estimate the degree to which a group is knowledgeable about a given topic. For evaluation purposes, we construct a test collection based on the TREC 2005 and 2006 Enterprise collections and define three types of ground truth for our task. Experimental results show that our five knowledgeable group finding models achieve high absolute scores. We also find significant differences between different ways of estimating the association between a topic and a group.
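
The abstract names the strategy families but not their formulas. The Python below is a minimal sketch, not the paper's models. It contrasts an expert-aggregation score with a document-based score. It assumes unigram query likelihood with Jelinek-Mercer smoothing (lambda = 0.5), uniform document-expert and expert-group weights, and a hypothetical toy corpus. The corpus, people, groups and function names are all illustrative.

```python
from collections import Counter
from math import prod
from typing import Callable, Dict, List, Set

Docs = Dict[str, List[str]]   # doc id -> tokenized text (hypothetical)
Assoc = Dict[str, Set[str]]   # doc id -> people associated with it (hypothetical)


def query_likelihood(query: List[str], doc: List[str], coll_tf: Counter,
                     coll_len: int, lam: float = 0.5) -> float:
    """p(q|d): unigram model with Jelinek-Mercer smoothing (an assumption)."""
    tf = Counter(doc)
    doc_len = max(len(doc), 1)
    return prod((1 - lam) * tf[t] / doc_len + lam * coll_tf[t] / coll_len
                for t in query)


def mean_over_docs(query, doc_ids, docs, coll_tf, coll_len) -> float:
    """Average p(q|d) over doc_ids, i.e. a uniform p(d|.) over those documents."""
    if not doc_ids:
        return 0.0
    return sum(query_likelihood(query, docs[d], coll_tf, coll_len)
               for d in doc_ids) / len(doc_ids)


def aggregate_experts(query, members, docs, assoc, coll_tf, coll_len) -> float:
    """Expert aggregation: mean of per-expert scores p(q|e) over group members."""
    scores = [mean_over_docs(query, [d for d, ppl in assoc.items() if e in ppl],
                             docs, coll_tf, coll_len) for e in members]
    return sum(scores) / len(scores) if scores else 0.0


def group_documents(query, members, docs, assoc, coll_tf, coll_len) -> float:
    """Document-based: score the documents linked to any group member."""
    linked = [d for d, ppl in assoc.items() if ppl & members]
    return mean_over_docs(query, linked, docs, coll_tf, coll_len)


def rank_groups(query: str, groups: Dict[str, Set[str]], docs: Docs,
                assoc: Assoc, scorer: Callable) -> List[str]:
    """Return group names sorted by descending score under the given scorer."""
    coll_tf = Counter(t for doc in docs.values() for t in doc)
    coll_len = sum(coll_tf.values())
    q = query.split()
    return sorted(groups, key=lambda g: scorer(q, groups[g], docs, assoc,
                                                 coll_tf, coll_len),
                  reverse=True)


# Hypothetical toy data, for illustration only.
docs = {
    "d1": "enterprise search expert finding".split(),
    "d2": "expert finding language models".split(),
    "d3": "web accessibility guidelines".split(),
}
assoc = {"d1": {"alice"}, "d2": {"alice", "bob"}, "d3": {"carol"}}
groups = {"search-team": {"alice", "bob"}, "accessibility-team": {"carol"}}

for scorer in (aggregate_experts, group_documents):
    print(scorer.__name__, rank_groups("expert finding", groups, docs, assoc, scorer))
```

Both scorers rank `search-team` first on this toy data. They diverge on real collections, for example when one member has many weakly related documents. Multiplying per-term probabilities also underflows for long queries, so a practical version would work in log space.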
