POLEMAD: A database for the multimodal analysis of Polish pronunciation

Abstract: This paper describes the structure and functionality of the POLEMAD database, which was built from a study using the Electromagnetic Articulograph AG 500, an acoustic camera, and three video cameras. It also describes the data types stored in the database: speaker data, electromagnetic articulography (EMA) data, video and sound recordings, phonetic information, and dynamic Bayesian network (DBN) models. The database supports selective extraction of various types of samples for further analysis by means of SQL queries generated in MATLAB® using Database Toolbox™. The paper also outlines potential future applications of the database in statistical analysis and in the automation of speech inversion experiments using DBNs.
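The idea of selective extraction through generated SQL queries can be illustrated with a minimal sketch. It is not the authors' implementation: the paper generates its queries in MATLAB with Database Toolbox, whereas this sketch uses Python's standard `sqlite3` module as a stand-in. The table names (`speaker`, `sample`), all column names, the file names, and the functions `build_query`, `make_demo_db`, and `select_samples` are hypothetical and do not reflect the actual POLEMAD schema.

```python
import sqlite3


def build_query(phoneme=None, speaker_sex=None):
    """Assemble a parameterised SELECT from optional filters.

    The schema referenced here is hypothetical, not the POLEMAD schema.
    """
    sql = (
        "SELECT s.sample_id, sp.speaker_code, s.phoneme, s.ema_file "
        "FROM sample AS s "
        "JOIN speaker AS sp ON sp.speaker_id = s.speaker_id"
    )
    clauses, params = [], []
    if phoneme is not None:
        clauses.append("s.phoneme = ?")
        params.append(phoneme)
    if speaker_sex is not None:
        clauses.append("sp.sex = ?")
        params.append(speaker_sex)
    if clauses:
        sql += " WHERE " + " AND ".join(clauses)
    sql += " ORDER BY s.sample_id"
    return sql, params


def make_demo_db():
    """Create an in-memory database holding a few invented rows."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE speaker (
            speaker_id   INTEGER PRIMARY KEY,
            speaker_code TEXT,
            sex          TEXT
        );
        CREATE TABLE sample (
            sample_id  INTEGER PRIMARY KEY,
            speaker_id INTEGER REFERENCES speaker(speaker_id),
            phoneme    TEXT,
            ema_file   TEXT
        );
        """
    )
    # Toy data for illustration only; not taken from the real database.
    conn.executemany(
        "INSERT INTO speaker VALUES (?, ?, ?)",
        [(1, "S01", "F"), (2, "S02", "M")],
    )
    conn.executemany(
        "INSERT INTO sample VALUES (?, ?, ?, ?)",
        [
            (1, 1, "a", "s01_a.pos"),
            (2, 1, "s", "s01_s.pos"),
            (3, 2, "a", "s02_a.pos"),
        ],
    )
    return conn


def select_samples(conn, **filters):
    """Run the generated query and return the matching rows as tuples."""
    sql, params = build_query(**filters)
    return conn.execute(sql, params).fetchall()
```

Calling `select_samples(make_demo_db(), phoneme="a", speaker_sex="F")` returns only the samples of that sound from female speakers in the toy data. Generating the query text from optional filters while passing the values as parameters keeps the extraction step scriptable without concatenating user input into SQL.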
