Using Ontology-based Approaches to Representing Speech Transcripts for Automated Speech Scoring

This paper presents a thesis proposal on approaches to automatically scoring non-native speech from second language tests. Current speech scoring systems assess speech primarily through acoustic features such as fluency and pronunciation, while content features are barely used. Motivated by this limitation, the study investigates the use of content features in speech scoring systems. A central question for content features is how speech content can be represented appropriately to facilitate automated scoring. The study proposes an ontology-based representation that represents speech transcripts at the concept level, on the expectation that content features computed from this representation may facilitate speech scoring. One baseline and two ontology-based representations are compared in experiments. Preliminary results show that ontology-based representation slightly improves the performance of one content feature for automated scoring over the baseline system.

Miao Chen

[1] Christoph Meinel,et al. E-Librarian Service - User-Friendly Semantic Search in Digital Libraries , 2011, X.media.publishing.

[2] Bob Rehder,et al. How Well Can Passage Meaning be Derived without Using Word Order? A Comparison of Latent Semantic Analysis and Humans , 1997 .

[3] M. Banerjee,et al. Beyond kappa: A review of interrater agreement measures , 1999 .

[4] Steffen Staab,et al. Ontologies improve text document clustering , 2003, Third IEEE International Conference on Data Mining.

[5] Thorsten Joachims,et al. A Probabilistic Analysis of the Rocchio Algorithm with TFIDF for Text Categorization , 1997, ICML.

[6] Jill Burstein,et al. AUTOMATED ESSAY SCORING WITH E‐RATER® V.2.0 , 2004 .

[7] Heting Chu. Information Representation and Retrieval in the Digital Age , 2003 .

[8] Xiaoming Xi,et al. Automatic scoring of non-native spontaneous speech in tests of spoken English , 2009, Speech Commun..

[9] Paul M. B. Vitányi,et al. The Google Similarity Distance , 2004, IEEE Transactions on Knowledge and Data Engineering.

[10] Dan Roth,et al. An NLP Curator (or: How I Learned to Stop Worrying and Love NLP Pipelines) , 2012, LREC.

[11] Thomas R. Gruber,et al. A Translation Approach to Portable Ontologies , 1993 .

[12] E. M. Adams. Ontological Investigations: An Inquiry into the Categories of Nature, Man and Society , 1991 .

[13] L. Boves,et al. Quantitative assessment of second language learners' fluency: comparisons between read and spontaneous speech. , 2002, The Journal of the Acoustical Society of America.

[14] Lillian Lee,et al. Opinion Mining and Sentiment Analysis , 2008, Found. Trends Inf. Retr..

[15] Ian H. Witten,et al. The WEKA data mining software: an update , 2009, SIGKDD Explor..

[16] N. F. Noy,et al. Ontology Development 101: A Guide to Creating Your First Ontology , 2001 .

[17] Michael Gruninger,et al. Methodology for the Design and Evaluation of Ontologies , 1995, IJCAI 1995.

[18] Timo Honkela,et al. WEBSOM - Self-organizing maps of document collections , 1998, Neurocomputing.

[19] Salvatore Valenti,et al. An Overview of Current Research on Automated Essay Grading , 2003, J. Inf. Technol. Educ..

[20] Margarita P. Steinel,et al. FACETS OF SPEAKING PROFICIENCY , 2012, Studies in Second Language Acquisition.

[21] Asunción Gómez-Pérez,et al. Ontology Specification Languages for the Semantic Web , 2002, IEEE Intell. Syst..

[22] Gilles Bisson,et al. Designing Clustering Methods for Ontology Building - The Mo'K Workbench , 2000, ECAI Workshop on Ontology Learning.

[23] Steffen Staab,et al. Combining Data-Driven and Semantic Approaches for Text Mining , 2011, Foundations for the Web of Information and Services.

[24] Timothy W. Finin,et al. Enabling Technology for Knowledge Sharing , 1991, AI Mag..

[25] Noam Chomsky. Aspects of the theory of syntax , 1965 .

[26] Liang Chen,et al. A new differential LSI space-based probabilistic document classifier , 2003, Inf. Process. Lett..

[27] Klaus Zechner,et al. Exploring Content Features for Automated Speech Scoring , 2012, HLT-NAACL.

[28] Samuel Kaski,et al. Computationally Efficient Approximation of a Probabilistic Model for Document Representation in the WEBSOM Full-Text Analysis Method , 1997 .

[29] David D. Lewis,et al. Representation Quality in Text Classification: An Introduction and Experiment , 1990, HLT.

[30] Xiaoming Xi,et al. AUTOMATED SCORING OF SPONTANEOUS SPEECH USING SPEECHRATER℠ V1.0 , 2008 .

[31] Jia Zeng,et al. A “stereo” document representation for textual information retrieval , 2006 .

[32] Steffen Staab,et al. What Is an Ontology? , 2009, Handbook on Ontologies.

[33] Steffen Staab,et al. Text Clustering Based on Background Knowledge , 2003 .

[34] Gerard Salton,et al. A vector space model for automatic indexing , 1975, CACM.

[35] Klaus Zechner,et al. Computing and Evaluating Syntactic Complexity Features for Automated Scoring of Spontaneous Non-Native Speech , 2011, ACL.

[36] Jens Lehmann,et al. DBpedia - A crystallization point for the Web of Data , 2009, J. Web Semant..

[37] David M. Blei,et al. Probabilistic topic models , 2012, Commun. ACM.

[38] Stan Matwin,et al. Feature Engineering for Text Classification , 1999, ICML.

[39] P. Schmitz,et al. Inducing Ontology from Flickr Tags , 2006 .

[40] Pedro M. Domingos. A few useful things to know about machine learning , 2012, Commun. ACM.

[41] Jian Qin,et al. Semantic Relation Extraction from Socially-Generated Tags: A Methodology for Metadata Generation , 2008, Dublin Core Conference.

[42] J. J. Rocchio,et al. Relevance feedback in information retrieval , 1971 .

[43] Lyle F. Bachman. Fundamental considerations in language testing , 1990 .

[44] Jill Burstein,et al. The E-rater® scoring engine: Automated essay scoring with natural language processing. , 2003 .

[45] Christiane Fellbaum,et al. Book Reviews: WordNet: An Electronic Lexical Database , 1999, CL.

[46] Semire Dikli,et al. An Overview of Automated Scoring of Essays. , 2006 .

[47] Stephan Bloehdorn,et al. Boosting for Text Classification with Semantic Features , 2004, WebKDD.

[48] M. Canale. From communicative competence to communicative language pedagogy , 2014 .

[49] Evgeniy Gabrilovich,et al. Computing Semantic Relatedness Using Wikipedia-based Explicit Semantic Analysis , 2007, IJCAI.

[50] Haim Levkowitz,et al. Introduction to information retrieval (IR) , 2008 .

[51] Ted Pedersen,et al. WordNet::Similarity - Measuring the Relatedness of Concepts , 2004, NAACL.

[52] Evgeniy Gabrilovich,et al. Wikipedia-based Semantic Interpretation for Natural Language Processing , 2014, J. Artif. Intell. Res..

[53] Lyle F. Bachman,et al. Language testing in practice : designing and developing useful language tests , 1996 .

[54] Wei-Ying Ma,et al. Locality preserving indexing for document representation , 2004, SIGIR '04.

[55] Ewan Klein,et al. Natural Language Processing with Python , 2009 .

[56] Philip Resnik,et al. Semantic Similarity in a Taxonomy: An Information-Based Measure and its Application to Problems of Ambiguity in Natural Language , 1999, J. Artif. Intell. Res..

[57] Doug Downey,et al. Local and Global Algorithms for Disambiguation to Wikipedia , 2011, ACL.

[58] Derrick Higgins,et al. Automatic scoring of non-native spontaneous speech in tests of spoken English , 2009 .

[59] Kevin Knight,et al. Toward Distributed Use of Large-Scale Ontologies , 1997 .

[60] John F. Sowa,et al. Knowledge representation: logical, philosophical, and computational foundations , 2000 .

[61] Lyle F. Bachman. Statistical analyses for language assessment , 2004 .

[62] Jaime G. Carbonell,et al. Document Representation and Query Expansion Models for Blog Recommendation , 2008, ICWSM.

[63] Jill Burstein,et al. Automated Essay Scoring : A Cross-disciplinary Perspective , 2003 .

[64] William M. Pottenger,et al. A framework for understanding Latent Semantic Indexing (LSI) performance , 2006, Inf. Process. Manag..

[65] Anne Cutler,et al. A theory of lexical access in speech production , 1999, Behavioral and Brain Sciences.

[66] W. Levelt. Speaking: From Intention to Articulation , 1990 .

[67] Klaus Zechner,et al. Using an Ontology for Improved Automated Content Scoring of Spontaneous Non-Native Speech , 2012, BEA@NAACL-HLT.

[68] M. Swain,et al. THEORETICAL BASES OF COMMUNICATIVE APPROACHES TO SECOND LANGUAGE TEACHING AND TESTING , 1980 .

[69] Benno Stein,et al. Insights into explicit semantic analysis , 2011, CIKM '11.

[70] Hinrich Schütze,et al. Introduction to information retrieval , 2008 .

[71] Daoud Clarke,et al. A Context-Theoretic Framework for Compositionality in Distributional Semantics , 2011, Computational Linguistics.

[72] Yiming Yang,et al. A Comparative Study on Feature Selection in Text Categorization , 1997, ICML.

[73] David D. Lewis,et al. Representation and Learning in Information Retrieval , 1991 .

[74] Ian H. Witten,et al. An effective, low-cost measure of semantic relatedness obtained from Wikipedia links , 2008 .

[75] Roger B. Bradford,et al. An empirical study of required dimensionality for large-scale latent semantic indexing applications , 2008, CIKM '08.

[76] Gary Marchionini,et al. A self-organizing semantic map for information retrieval , 1991, SIGIR '91.

[77] Peter W. Foltz,et al. Automated Essay Scoring: Applications to Educational Technology , 1999 .

[78] T. McNamara,et al. Assessed Levels of Second Language Speaking Proficiency: How Distinct? , 2007 .

[79] L. Boves,et al. Quantitative assessment of second language learners' fluency by means of automatic speech recognition technology. , 2000, The Journal of the Acoustical Society of America.

[80] Jian Cheng,et al. Validating automated speaking tests , 2010 .

[81] Hans-Michael Müller,et al. Textpresso: An Ontology-Based Information Retrieval and Extraction System for Biological Literature , 2004, PLoS biology.

[82] Carola Eschenbach,et al. Formal Ontology in Information Systems , 2008 .

[83] Michael K. Buckland,et al. Information as Thing , 1991 .

[84] Hussein A. Abbass,et al. A Comparative Study for Domain Ontology Guided Feature Extraction , 2003, ACSC.

[85] Fabrizio Sebastiani,et al. Machine learning in automated text categorization , 2001, CSUR.

[86] David M. Blei,et al. Introduction to Probabilistic Topic Models , 2010 .

[87] Edel Garcia. Latent Semantic Indexing (LSI) A Fast Track Tutorial , 2006 .

[88] Simone Paolo Ponzetto,et al. WikiRelate! Computing Semantic Relatedness Using Wikipedia , 2006, AAAI.

[89] Levent Özgür,et al. Text Categorization with Class-Based and Corpus-Based Keyword Selection , 2005, ISCIS.

[90] W. Bruce Croft,et al. Search Engines - Information Retrieval in Practice , 2009 .

[91] Vesna Bagarić,et al. DEFINING COMMUNICATIVE COMPETENCE , 2007 .

[92] Richard A. Harshman,et al. Indexing by Latent Semantic Analysis , 1990, J. Am. Soc. Inf. Sci..

[93] Dekang Lin,et al. An Information-Theoretic Definition of Similarity , 1998, ICML.

[94] Marina Dodigovic,et al. Speech Processing Technology in Second Language Testing , 2009 .

[95] Petr Sojka,et al. Software Framework for Topic Modelling with Large Corpora , 2010 .