CAEMSI: A Cross-Domain Analytic Evaluation Methodology for Style Imitation

We propose CAEMSI, a cross-domain analytic evaluation methodology for Style Imitation (SI) systems, based on a set of statistical significance tests for hypotheses that compare two corpora. SI systems are typically evaluated using human participants, but this approach has several weaknesses. To provide reliable assessments of an SI system, participants must possess a sufficient degree of domain knowledge, which can significantly limit the pool of participants. Furthermore, both human bias against computer-generated artifacts and the variability of participants' assessments call the reliability of the results into question. Most importantly, relying on human participants limits the number of generated artifacts and SI systems that can feasibly be evaluated. Directly motivated by these shortcomings, CAEMSI provides a robust and scalable approach to the evaluation problem. Normalized Compression Distance, a domain-independent distance metric, is used to measure the distance between individual artifacts within a corpus. The difference between corpora is measured using test statistics derived from these inter-artifact distances, and permutation testing is used to determine the significance of that difference. We provide empirical evidence validating the statistical significance tests, using datasets from two distinct domains.
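The pipeline described above (pairwise Normalized Compression Distance, a test statistic built from those distances, and a permutation test) can be illustrated with a minimal Python sketch. This is not the authors' implementation: the abstract does not specify the compressor or the test statistics, so the use of `zlib`, the helper names `ncd`, `distance_matrix`, `statistic`, and `permutation_test`, and the particular statistic (mean between-corpus distance minus mean within-corpus distance) are all assumptions made for illustration.

```python
import random
import zlib
from itertools import combinations


def ncd(x: bytes, y: bytes) -> float:
    """Normalized Compression Distance, with zlib as an assumed compressor."""
    cx = len(zlib.compress(x))
    cy = len(zlib.compress(y))
    cxy = len(zlib.compress(x + y))
    return (cxy - min(cx, cy)) / max(cx, cy)


def distance_matrix(artifacts):
    """Pairwise NCD over all artifacts (one concatenation order per pair)."""
    n = len(artifacts)
    d = [[0.0] * n for _ in range(n)]
    for i, j in combinations(range(n), 2):
        d[i][j] = d[j][i] = ncd(artifacts[i], artifacts[j])
    return d


def statistic(d, idx_a, idx_b):
    """Hypothetical statistic: mean between-corpus distance minus
    mean within-corpus distance. The paper's statistics may differ."""
    between = [d[i][j] for i in idx_a for j in idx_b]
    within = [d[i][j] for i, j in combinations(idx_a, 2)]
    within += [d[i][j] for i, j in combinations(idx_b, 2)]
    return sum(between) / len(between) - sum(within) / len(within)


def permutation_test(corpus_a, corpus_b, n_permutations=200, seed=0):
    """Return (observed statistic, one-sided permutation p-value)."""
    rng = random.Random(seed)
    d = distance_matrix(list(corpus_a) + list(corpus_b))
    n_a = len(corpus_a)
    labels = list(range(len(d)))
    observed = statistic(d, labels[:n_a], labels[n_a:])
    hits = 0
    for _ in range(n_permutations):
        # Reassign artifacts to the two corpora at random.
        rng.shuffle(labels)
        if statistic(d, labels[:n_a], labels[n_a:]) >= observed:
            hits += 1
    # Add-one correction keeps the estimated p-value strictly positive.
    return observed, (hits + 1) / (n_permutations + 1)
```

Each corpus needs at least two artifacts so that within-corpus pairs exist. Because the distance matrix is computed once and only the corpus labels are shuffled, the number of permutations can be raised without recompressing any artifact.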
