AN ALGORITHM FOR LOCATING FUNDAMENTAL FREQUENCY (F0) MARKERS IN SPEECH
Princy Dikshit
Old Dominion University, December 2004
Director: Dr. Stephen A. Zahorian

Speech has been the principal form of human communication since it began to evolve at least one hundred thousand years ago. Speech is produced by vibrations of the vocal cords. The rate of vibration of the cords is called the fundamental frequency (F0), or pitch. The objective of this thesis is to locate pitch periods on a cycle-by-cycle basis. The difficulty in identifying pitch cycles stems from the highly irregular nature of human speech. Dynamic programming is used to combine two sources of information for pitch period marking. The first is the "local" information: the location and amplitude of peaks in the acoustic speech signal. The second is the "transition" information: how closely the distance between signal peaks matches the expected pitch period. The expected pitch period values are obtained from a pitch tracker (YAPT) or from the reference pitch track.

The Keele speech database was used for testing. In experiments using clean speech signals, over 95% of the identified pitch cycles were within 1 ms of the actual pitch cycles. In experiments with noisy speech signals, an accuracy rate of 92% or higher was observed for SNRs ranging from 30 dB down to 5 dB. In an experiment evaluating the robustness of the algorithm to errors in the pitch track, using clean studio-quality signals, an accuracy rate of 95% was obtained for pitch errors ranging from -10% to +60%. The algorithm generated ≤ 1% extra markers (false positives) for both clean studio-quality signals (pitch track error range of -10% to +60%) and noisy speech signals (SNR range of 30 dB to 5 dB). Using the pitch track generated by the ODU pitch tracker (YAPT) to identify pitch markers gave an accuracy rate of 95%, compared with 93% using the reference pitch track supplied with the Keele database. A preliminary test on telephone-quality signals gave an accuracy rate of 63%.
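The combination of local and transition information described above can be illustrated with a minimal Python sketch. It is not the thesis implementation: the function name `mark_pitch_cycles`, the cost definitions (one minus normalized peak amplitude for the local cost, relative deviation of the peak spacing from the expected period for the transition cost), the `transition_weight` parameter, and the allowed spacing range of 0.5 to 2 expected periods are all assumptions chosen for illustration. Candidate peaks are assumed to be sorted by position, with positive amplitudes, and the expected period at each candidate is assumed to come from a pitch track.

```python
from typing import List, Sequence


def mark_pitch_cycles(
    positions: Sequence[float],
    amplitudes: Sequence[float],
    expected_periods: Sequence[float],
    transition_weight: float = 1.0,  # hypothetical weighting between the two costs
    min_ratio: float = 0.5,          # hypothetical lower bound on spacing / period
    max_ratio: float = 2.0,          # hypothetical upper bound on spacing / period
) -> List[float]:
    """Select a chain of candidate peaks by dynamic programming.

    positions and expected_periods share one unit (for example samples).
    """
    n = len(positions)
    if n == 0:
        return []

    largest = max(amplitudes)
    if largest <= 0:
        raise ValueError("amplitudes are assumed to be positive")

    # Local cost (assumed form): 0 for the largest peak, larger for weaker peaks.
    local = [1.0 - a / largest for a in amplitudes]

    inf = float("inf")
    best = [inf] * n   # lowest accumulated cost of a path ending at candidate j
    back = [-1] * n    # predecessor of candidate j on that path

    for j in range(n):
        period = expected_periods[j]
        # A path may begin at any candidate within one period of the first one.
        if positions[j] - positions[0] < period:
            best[j] = local[j]
        for i in range(j):
            gap = positions[j] - positions[i]
            if best[i] == inf:
                continue
            if not (min_ratio * period <= gap <= max_ratio * period):
                continue
            # Transition cost (assumed form): relative mismatch between the
            # peak spacing and the expected pitch period.
            transition = abs(gap - period) / period
            total = best[i] + transition_weight * transition + local[j]
            if total < best[j]:
                best[j] = total
                back[j] = i

    # A path may end at any reachable candidate within one period of the last one.
    ends = [
        j for j in range(n)
        if best[j] < inf and positions[-1] - positions[j] < expected_periods[j]
    ]
    if not ends:
        return []

    # Trace back from the cheapest admissible end point.
    j = min(ends, key=lambda k: best[k])
    path: List[float] = []
    while j != -1:
        path.append(positions[j])
        j = back[j]
    path.reverse()
    return path
```

For example, given candidates every 50 samples with amplitudes alternating between 1.0 and 0.6 and an expected period of 100 samples, this sketch keeps the strong peaks at 0, 100, 200, 300 and 400 and skips the weaker peaks in between. Skipping a true peak is penalized by the transition cost, and choosing a spurious one is penalized by the local cost.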