The voting model for people search

An expert search engine assists users with an expertise need: instead of ranking documents, it suggests candidate experts within an enterprise organisation who have expertise relevant to a query. This thesis investigates people search tasks such as expert search, and how persons can be ranked in response to a query so that those with relevant expertise are ranked first. Each person's areas of expertise are represented by documentary evidence of that expertise, known as a candidate profile. The thesis statement is that people search tasks in general, and expert search in particular, can be modelled successfully and effectively using a voting paradigm.
