Exemplar-based complex features prediction framework

Exemplars are typically defined by a set of features that may have simple or complex structures. Comparing two exemplars requires calculating the distance between their features, which becomes more difficult when some of those features are missing. One possible solution is to predict the missing features from those that are known. Feature prediction is considered a hard task in machine learning, and it becomes harder still when the features have a complex structure and the relationship between them is not clearly defined. This paper presents a framework for predicting complex features based on exemplar theory. The framework consists of two stages. The first is the similarity correlation stage, in which the correlation between the distance matrices of the features is calculated to determine the relationship between missing and existing features. The second stage uses the distance matrices to calculate the conditional membership probability between these features. For a new example that is not in the dataset and for which only some features are known, this value gives the probability that an exemplar with similar known features has values for the missing features that can be adapted to serve as appropriate features for the new example. This paper also presents a case study applying the framework to speech synthesis, where it is used to investigate the relationship between duration information and the syntactic and dependency trees.
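The two stages can be illustrated with a minimal Python sketch. It rests on assumptions the abstract does not confirm. For the first stage it uses distance correlation computed from two precomputed distance matrices, which is one common way to correlate distance matrices; the abstract does not say which measure the framework uses. For the second stage it uses a simple empirical estimate: among exemplar pairs that are close on the known feature, the fraction that are also close on the missing feature. The function names, the thresholds `t_known` and `t_missing`, and this estimate itself are hypothetical and serve only as illustration.

```python
import numpy as np


def double_center(d):
    """Subtract row and column means and add back the grand mean."""
    row = d.mean(axis=1, keepdims=True)
    col = d.mean(axis=0, keepdims=True)
    return d - row - col + d.mean()


def distance_correlation(d_a, d_b):
    """Distance correlation between two features, given their n x n
    pairwise distance matrices over the same n exemplars.
    Assumption: this stands in for the 'similarity correlation' stage."""
    a = double_center(np.asarray(d_a, dtype=float))
    b = double_center(np.asarray(d_b, dtype=float))
    dcov2 = (a * b).mean()          # squared distance covariance
    dvar_a = (a * a).mean()         # squared distance variance of feature A
    dvar_b = (b * b).mean()         # squared distance variance of feature B
    denom = np.sqrt(dvar_a * dvar_b)
    if denom == 0:
        return 0.0
    return float(np.sqrt(max(dcov2, 0.0) / denom))


def conditional_closeness(d_known, d_missing, t_known, t_missing):
    """Hypothetical stand-in for the 'conditional membership probability':
    P(pair is close on the missing feature | pair is close on the known one),
    estimated over all distinct exemplar pairs."""
    d_known = np.asarray(d_known, dtype=float)
    d_missing = np.asarray(d_missing, dtype=float)
    upper = np.triu_indices_from(d_known, k=1)   # each pair counted once
    near_known = d_known[upper] <= t_known
    if not near_known.any():
        return float("nan")                      # no evidence to condition on
    near_missing = d_missing[upper] <= t_missing
    return float((near_known & near_missing).sum() / near_known.sum())


# Toy data: four exemplars on a line; the second feature's distances
# are exactly twice the first's, so the two features are perfectly related.
x = np.array([0.0, 1.0, 2.0, 10.0])
d_known = np.abs(x[:, None] - x[None, :])
d_missing = 2.0 * d_known

dcor = distance_correlation(d_known, d_missing)
p_close = conditional_closeness(d_known, d_missing, t_known=1.0, t_missing=2.0)
```

For complex features such as syntactic or dependency trees, the distance matrices would come from a structure-aware metric (for example a tree edit distance) rather than the absolute differences used in this toy data; the two functions only require that both matrices index the same exemplars in the same order.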
