A social voting approach for scientific domain vocabularies construction

Scientific domain vocabularies play an important role in academic communication and lean research management. Confronted with a dramatic increase in new keywords, a domain must continuously develop its vocabulary to ensure its long-term survival in the scientific context. Current methods, based on either statistical or linguistic approaches, can automatically generate vocabularies consisting of popular keywords, but they fail to capture high-quality standardized terms because they lack human intervention. Manual methods make use of human knowledge, but they are both time-consuming and expensive. To overcome these deficiencies, this research proposes a novel social voting approach to constructing scientific domain vocabularies. Grounded in the theory of linguistic arbitrariness, the approach integrates an automatic system with human knowledge and selects a widely accepted, standardized set of keywords through social voting. A social voting system has been implemented to aid scientific domain vocabulary construction at the National Natural Science Foundation of China. Two experiments were conducted to demonstrate the effectiveness and validity of the system. The results show that the domain vocabulary constructed with this system covers a wide range of areas within a discipline and facilitates the standardization of scientific terminology.
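The abstract does not specify how votes are aggregated. The Python code below is a minimal sketch under stated assumptions. It assumes that the automatic system proposes several variant terms for each concept and that each voter picks the variant they prefer. It then selects one standardized term per concept by plurality, subject to a quorum (`min_votes`) and a minimum vote share (`min_share`). This reflects the idea that, because linguistic signs are arbitrary, the accepted name is whichever one the community converges on. The function `select_standard_terms`, the `Decision` record, both thresholds, and the example ballots are hypothetical illustrations, not the system described in the paper.

```python
from collections import Counter
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class Decision:
    concept: str
    term: Optional[str]  # chosen standard term, or None if undecided
    support: float       # share of this concept's votes won by the leading variant


def select_standard_terms(
    ballots: Dict[str, List[str]], min_votes: int = 3, min_share: float = 0.5
) -> List[Decision]:
    """Pick one standardized term per concept by plurality vote (hypothetical rule).

    ballots maps a concept id to the list of variant terms chosen by individual voters.
    """
    decisions = []
    for concept, votes in ballots.items():
        if len(votes) < max(min_votes, 1):
            # Quorum not reached (at least one vote is always required):
            # leave the concept open for further voting.
            decisions.append(Decision(concept, None, 0.0))
            continue
        # Merge purely typographic variants (case, surrounding whitespace).
        counts = Counter(v.strip().lower() for v in votes)
        # Ties go to the variant encountered first (Counter.most_common ordering).
        term, count = counts.most_common(1)[0]
        share = count / len(votes)
        decisions.append(Decision(concept, term if share >= min_share else None, share))
    return decisions


# Hypothetical ballots: variant terms proposed by the automatic system, one entry per vote.
example_ballots = {
    "C1": ["social network analysis", "Social Network Analysis", "SNA", "social network analysis"],
    "C2": ["big data", "massive data"],
    "C3": ["text mining", "text data mining", "TDM", "text analytics"],
}

decisions = select_standard_terms(example_ballots)
for d in decisions:
    print(d.concept, d.term, round(d.support, 2))
```

Concepts that miss the quorum, or whose leading variant falls below the share threshold, stay undecided (`term` is `None`). They can then go back to voters rather than being standardized prematurely. Lowercasing and trimming before counting merges purely typographic variants. Genuinely different names, such as an abbreviation, still compete as separate candidates.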
