Towards automatic extraction of social networks of organizations in PubMed abstracts

Social Network Analysis (SNA) of organizations is of great interest to government agencies and scientists because it can help boost translational research and accelerate the conversion of research into care. To perform SNA for a particular disease area, we need to identify the key research groups in that area by mining the affiliation information in PubMed. This involves not only recognizing the organization names in the affiliation string, but also resolving ambiguities so that each article is associated with a unique organization. We present a normalization process that involves clustering based on local sequence alignment metrics and local learning based on finding connected components. We demonstrate the method by analyzing organizations involved in angiogenesis treatment, and show the utility of the results for researchers in the pharmaceutical and biotechnology industries and for national funding agencies.
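
As a rough illustration of the normalization idea, the following minimal sketch scores pairs of affiliation strings with a Smith-Waterman local alignment and groups strings whose normalized score exceeds a threshold into connected components (via union-find). The scoring parameters, the normalization by the shorter string's length, the 0.7 threshold, the function names, and the sample strings are all assumptions made for illustration. The abstract does not specify these details, and the local learning step is not modeled here.

```python
from itertools import combinations

MATCH, MISMATCH, GAP = 2, -1, -1  # assumed scoring scheme, not from the paper


def smith_waterman(a, b):
    """Return the best local alignment score between strings a and b."""
    prev = [0] * (len(b) + 1)
    best = 0
    for i in range(1, len(a) + 1):
        curr = [0] * (len(b) + 1)
        for j in range(1, len(b) + 1):
            diag = prev[j - 1] + (MATCH if a[i - 1] == b[j - 1] else MISMATCH)
            # Local alignment: a cell never drops below zero.
            curr[j] = max(0, diag, prev[j] + GAP, curr[j - 1] + GAP)
            best = max(best, curr[j])
        prev = curr
    return best


def similarity(a, b):
    """Local alignment score scaled to [0, 1] by the shorter string's length."""
    a, b = a.lower(), b.lower()
    shorter = min(len(a), len(b))
    if shorter == 0:
        return 0.0
    return smith_waterman(a, b) / (MATCH * shorter)


def cluster_affiliations(names, threshold=0.7):
    """Group names into connected components of the 'similar enough' graph."""
    parent = list(range(len(names)))

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for i, j in combinations(range(len(names)), 2):
        if similarity(names[i], names[j]) >= threshold:
            parent[find(i)] = find(j)  # union the two components

    groups = {}
    for i, name in enumerate(names):
        groups.setdefault(find(i), []).append(name)
    return list(groups.values())


if __name__ == "__main__":
    # Hypothetical affiliation strings, for illustration only.
    sample = [
        "Univ. of Arizona",
        "University of Arizona, Tucson",
        "Harvard Medical School",
        "Harvard Medical School, Boston",
    ]
    for group in cluster_affiliations(sample):
        print(group)
```

Using connected components means that two variants can end up in the same cluster through a chain of intermediate variants even if they are not directly similar. That helps with abbreviations, but a threshold that is too low can merge distinct organizations.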
