Predicting Audio Advertisement Quality

Online audio advertising is a form of advertising used widely in online music streaming services. These platforms tend to host tens of thousands of unique audio advertisements (ads), and providing high-quality ads ensures a better user experience and results in longer user engagement. The automatic assessment of these ads is therefore an important step toward audio ad ranking and better audio ad creation. In this paper we propose one way to measure the quality of audio ads using a proxy metric called Long Click Rate (LCR), defined as the amount of time a user engages with the follow-up display ad (shown while the audio ad is playing) divided by the number of impressions. We then focus on predicting audio ad quality using only acoustic features, such as the harmony, rhythm, and timbre of the audio, extracted from the raw waveform. We discuss how the characteristics of the sound can be connected to concepts such as the clarity of the audio ad message and its trustworthiness. Finally, we propose a new deep learning model for audio ad quality prediction, which outperforms the other discussed models trained on hand-crafted features. To the best of our knowledge, this is the first large-scale audio ad quality prediction study.
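
As a rough illustration of the LCR definition above, the following minimal sketch computes engagement time per impression and ranks ads by it. It follows only the wording of the abstract; the full paper may define LCR more precisely (for example, with a dwell-time threshold). The names `AdStats`, `engagement_seconds`, and `long_click_rate`, along with the sample numbers, are hypothetical and chosen for illustration.

```python
from dataclasses import dataclass


@dataclass
class AdStats:
    """Hypothetical per-ad aggregate; field names are assumptions."""
    ad_id: str
    engagement_seconds: float  # total time users engaged with the follow-up display ad
    impressions: int           # number of times the audio ad was served


def long_click_rate(stats: AdStats) -> float:
    """LCR as worded in the abstract: engagement time divided by impressions."""
    if stats.impressions <= 0:
        raise ValueError("impressions must be positive")
    return stats.engagement_seconds / stats.impressions


# Made-up example data: rank ads from highest to lowest LCR.
ads = [
    AdStats("ad_a", engagement_seconds=1200.0, impressions=400),
    AdStats("ad_b", engagement_seconds=300.0, impressions=600),
]
ranked = sorted(ads, key=long_click_rate, reverse=True)
```

Normalizing by impressions keeps the metric comparable between ads that were served at very different volumes, which is what makes it usable as a ranking signal.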
