Science and Ethnicity: How Ethnicities Shape the Evolution of Computer Science Research Community

Globalization and the World Wide Web have made academia and science an international, multicultural community of researchers and scientists of different ethnicities. How ethnicity shapes the evolution of membership, status, and interactions within the scientific community, however, is not well understood, largely because ethnicity is difficult to identify at large scale. We use name ethnicity classification as an indicator of ethnicity. Based on automatic name ethnicity classification of more than 1.7 million authors gathered from the Web, we investigate the name ethnicity of computer science scholars by population size, publication contribution, and collaboration strength. By tracing the evolution of name ethnicity from 1936 to 2010, we find that ethnic diversity has increased significantly over time and that research communities at different publication venues have different ethnicity compositions. We observe a clear rise in the number of Asian name ethnicities in papers: their fraction of publication contribution increases from approximately 10% to nearly 50% between 1970 and 2010. We also find that name ethnicity acts as a homophily factor in coauthor networks, shaping the formation of coauthorship as well as the evolution of research communities.
