Towards Automatic Discovery of Co-authorship Networks in the Brazilian Academic Areas

In Brazil, the individual curricula vitae of academic researchers, which consist mainly of professional information and scientific productions, are managed in a single software platform called Lattes. Currently, the information gathered from this platform is typically used to evaluate, analyze and document the scientific production of Brazilian research groups. Although Lattes curricula contain semi-structured information, analyzing them for medium and large groups is a time-consuming and highly error-prone task. In this paper, we describe an extension of scriptLattes (an open-source knowledge extraction system for the Lattes platform) that analyzes individual Lattes curricula and automatically discovers large-scale co-authorship networks for any academic area. Given a knowledge domain (academic area), the system automatically identifies the researchers associated with that area, extracts each researcher's list of scientific productions, grouped by type and publication year, and, for each paper, identifies the co-authors registered in the Lattes platform. The system also generates different types of networks, which may be used to study the characteristics of academic areas at large scale. In particular, we explore the node degree and Author Rank measures for each identified researcher. Finally, we confirm through experiments that the system offers a simple way to generate different co-authorship networks. To the best of our knowledge, this is the first study to examine large-scale co-authorship networks for any Brazilian academic area.
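
The network construction and the two measures named above can be illustrated with a minimal sketch. It is not the scriptLattes implementation: the function names, the input format (one author list per paper) and the weighting scheme are assumptions made for illustration. Here each paper with n authors adds 1/(n - 1) to every author pair, and the rank is a PageRank-style iteration over those normalized weights, which is one common way to define an Author Rank; the abstract does not state the exact variant used.

```python
from collections import defaultdict
from itertools import combinations


def build_coauthorship_weights(papers):
    """Return symmetric co-authorship weights as weights[a][b].

    Assumption: each paper with n >= 2 distinct authors adds 1 / (n - 1)
    to every author pair, so papers with fewer authors count as stronger
    ties. Authors with no co-authors never enter the graph.
    """
    weights = defaultdict(lambda: defaultdict(float))
    for authors in papers:
        unique = sorted(set(authors))
        if len(unique) < 2:
            continue
        share = 1.0 / (len(unique) - 1)
        for a, b in combinations(unique, 2):
            weights[a][b] += share
            weights[b][a] += share
    return weights


def node_degrees(weights):
    """Node degree: the number of distinct co-authors of each author."""
    return {author: len(neighbors) for author, neighbors in weights.items()}


def author_rank(weights, damping=0.85, iterations=100):
    """PageRank-style score over the weighted co-authorship graph.

    Each author distributes their current rank to co-authors in
    proportion to the normalized edge weight. The damping value and the
    fixed iteration count are illustrative choices.
    """
    authors = list(weights)
    n = len(authors)
    rank = {a: 1.0 / n for a in authors}
    out_total = {a: sum(weights[a].values()) for a in authors}
    for _ in range(iterations):
        new_rank = {}
        for a in authors:
            incoming = sum(
                rank[b] * weights[b][a] / out_total[b] for b in weights[a]
            )
            new_rank[a] = (1 - damping) / n + damping * incoming
        rank = new_rank
    return rank


if __name__ == "__main__":
    # Hypothetical author lists, one per paper.
    papers = [
        ["Ana", "Bruno", "Carla"],
        ["Ana", "Bruno"],
        ["Carla", "Diego"],
    ]
    w = build_coauthorship_weights(papers)
    print(node_degrees(w))
    print(author_rank(w))
```

In a real setting, the author lists would come from the productions extracted from each curriculum, after matching co-author names to registered Lattes researchers.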
