Measuring conference quality by mining program committee characteristics

Bibliometrics are important measures of venue quality in digital libraries. A venue's impact is usually the major consideration in subscription decisions and in ranking and recommending high-quality venues and documents. In the Computer Science literature, conferences play a major role as a publication and dissemination outlet. However, with the recent profusion of conferences and rapidly expanding fields, it is increasingly challenging for researchers and librarians to assess conference quality. We propose a set of novel heuristics that automatically discover prestigious (and low-quality) conferences by mining the characteristics of their Program Committee (PC) members. We examine the proposed cues both in isolation and in combination under a classification scheme. Evaluation on a collection of 2,979 conferences and 16,147 PC members shows that our heuristics, when combined, correctly classify about 92% of the conferences, with a low false positive rate of 0.035 and a recall of more than 73% for identifying reputable conferences. Furthermore, we demonstrate empirically that our heuristics can also effectively detect a set of low-quality conferences, with a false positive rate of merely 0.002. We also report our experience of detecting two previously unknown low-quality conferences. Finally, we apply the proposed techniques to the entire quality spectrum by ranking the conferences in the collection.
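The three figures reported above (overall accuracy, false positive rate, and recall) all derive from a binary confusion matrix in which "reputable" is the positive class. The following is a minimal sketch of those definitions only. The `ConfusionCounts` type and the example counts are hypothetical illustrations and are not taken from the evaluation described here.

```python
from dataclasses import dataclass


@dataclass
class ConfusionCounts:
    """Binary confusion-matrix counts; 'reputable' is the positive class."""
    tp: int  # reputable conferences classified as reputable
    fp: int  # other conferences wrongly classified as reputable
    tn: int  # other conferences correctly classified as not reputable
    fn: int  # reputable conferences that were missed


def accuracy(c: ConfusionCounts) -> float:
    """Fraction of all conferences that were classified correctly."""
    return (c.tp + c.tn) / (c.tp + c.fp + c.tn + c.fn)


def false_positive_rate(c: ConfusionCounts) -> float:
    """Fraction of non-reputable conferences wrongly flagged as reputable."""
    return c.fp / (c.fp + c.tn)


def recall(c: ConfusionCounts) -> float:
    """Fraction of reputable conferences that were actually found."""
    return c.tp / (c.tp + c.fn)


# Hypothetical counts, chosen only to make the arithmetic easy to follow.
counts = ConfusionCounts(tp=30, fp=5, tn=95, fn=10)

print(f"accuracy            = {accuracy(counts):.3f}")
print(f"false positive rate = {false_positive_rate(counts):.3f}")
print(f"recall              = {recall(counts):.3f}")
```

A low false positive rate paired with a moderate recall, as in the reported results, means that a conference flagged as reputable is seldom a mislabeled one, while some reputable conferences still go undetected.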
