A novel cluster-based approach for keyphrase extraction from MOOC video lectures

Massive open online courses (MOOCs) have emerged as a valuable resource for learners, but numerous challenges must still be addressed to make them more useful and convenient. One such challenge is automatically extracting keyphrases from MOOC video lectures so that students can quickly identify the knowledge they want to learn and thus expedite their learning. In this paper, we propose SemKeyphrase, an unsupervised cluster-based approach for keyphrase extraction from MOOC video lectures. SemKeyphrase incorporates a new semantic relatedness metric and a ranking algorithm, called PhraseRank, that ranks candidates in two phases. We conducted experiments on a real-world dataset of MOOC video lectures, and the results show that our approach outperforms state-of-the-art keyphrase extraction methods.
