Predicting podcast preference: An analysis framework and its application

Finding worthwhile podcasts can be difficult for listeners since podcasts are published in large numbers and vary widely with respect to quality and repute. Independently of their informational content, certain podcasts provide satisfying listening material while other podcasts have little or no appeal. In this paper we present PodCred, a framework for analyzing listener appeal, and we demonstrate its application to the task of automatically predicting the listening preferences of users. First, we describe the PodCred framework, which consists of an inventory of factors contributing to user perceptions of the credibility and quality of podcasts. The framework is designed to support automatic prediction of whether or not a particular podcast will enjoy listener preference. It consists of four categories of indicators related to the Podcast Content, the Podcaster, the Podcast Context, and the Technical Execution of the podcast. Three studies contributed to the development of the PodCred framework: a review of the literature on credibility for other media, a survey of prescriptive guidelines for podcasting, and a detailed data analysis. Next, we report on a validation exercise in which the PodCred framework is applied to a real-world podcast preference prediction task. Our validation focuses on selected framework indicators that show promise of being both discriminative and readily accessible. We translate these indicators into a set of easily extractable “surface” features and use them to implement a basic classification system. The experiments carried out to evaluate the system use popularity levels in iTunes as ground truth and demonstrate that simple surface features derived from the PodCred framework are indeed useful for classifying podcasts. © 2010 Wiley Periodicals, Inc.
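To make the idea of classifying podcasts from "surface" features concrete, the following is a minimal sketch and not the system evaluated in the paper. The `Feed` fields, the three features (description length, presence of a logo, average gap between episodes), the toy training data, and the nearest-centroid classifier are all assumptions chosen for illustration; the abstract names neither the exact features nor the classifier used.

```python
from dataclasses import dataclass
from statistics import mean


@dataclass
class Feed:
    """Hypothetical container for metadata read from a podcast feed."""
    description: str
    has_logo: bool
    episode_gaps_days: list[float]  # days between consecutive episodes


def surface_features(feed):
    """Map a feed to a numeric vector of illustrative surface features."""
    gaps = feed.episode_gaps_days
    # 0.0 when fewer than two episodes exist (a simplification).
    avg_gap = mean(gaps) if gaps else 0.0
    return [
        float(len(feed.description.split())),  # description length in words
        1.0 if feed.has_logo else 0.0,         # feed carries a logo image
        avg_gap,                               # average release interval
    ]


def train_centroids(examples):
    """examples: list of (feature_vector, label). Returns (centroids, scales)."""
    vectors = [v for v, _ in examples]
    n = len(vectors[0])
    # Per-feature range used for scaling; fall back to 1.0 for constant features.
    scales = [
        (max(v[i] for v in vectors) - min(v[i] for v in vectors)) or 1.0
        for i in range(n)
    ]
    centroids = {}
    for label in {lab for _, lab in examples}:
        rows = [v for v, lab in examples if lab == label]
        centroids[label] = [mean(r[i] for r in rows) for i in range(n)]
    return centroids, scales


def predict(vector, centroids, scales):
    """Return the label whose centroid is closest in scaled squared distance."""
    def dist(centroid):
        return sum(((a - b) / s) ** 2 for a, b, s in zip(vector, centroid, scales))
    return min(centroids, key=lambda label: dist(centroids[label]))


# Toy, invented training data; labels stand in for a popularity ground truth.
training = [
    (Feed("A weekly show about science news with interviews and listener questions",
          True, [7, 7, 7]), "popular"),
    (Feed("Daily technology headlines explained in ten minutes for busy people",
          True, [1, 1, 2]), "popular"),
    (Feed("my podcast", False, [30, 45]), "non-popular"),
    (Feed("random talk", False, [60, 20]), "non-popular"),
]
centroids, scales = train_centroids(
    [(surface_features(feed), label) for feed, label in training]
)

new_feed = Feed("An in-depth monthly conversation about history and archives",
                True, [14, 14])
print(predict(surface_features(new_feed), centroids, scales))
```

The point of the sketch is only that each feature is cheap to compute from feed metadata, without analyzing the audio itself; any standard classifier could replace the nearest-centroid rule.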
