Recognition of sign language using neural networks

This thesis details the development of a computer system (labelled the SLARTI system) capable of recognising a subset of signs from Auslan (the sign language of the Australian Deaf community), based on the pattern classification paradigm of artificial neural networks. The research discussed in this work has two main streams. The first is the creation of a practical sign classification system, suitable for use within a sign language training system or other applications based on hand gestures. The second is an exploration of the suitability of neural networks for the creation of a real-time classification system with the ability to process temporal patterns.

Sign languages such as Auslan are the primary form of communication between members of the Deaf community. However, these languages are not widely known outside that community, and hence a communication barrier can exist between Deaf and hearing people. The techniques for recognising signs developed in this research allow the creation of systems which can help to reduce this barrier, either by providing computer tools to assist in the learning of sign language or, potentially, by enabling portable sign-language-to-speech translation systems.

Artificial neural networks have proved to be an extremely useful approach to pattern classification tasks, but much of the research in this field has concentrated on relatively simple problems. Attempting to apply these networks to a complex real-world problem such as sign language recognition exposed a range of issues affecting this classification technique. The development of the SLARTI system inspired the creation of several new techniques related to neural networks, which have general applicability beyond this particular application. This thesis includes discussion of techniques related to issues such as input encoding, improving network generalisation, training recurrent networks, and developing modular, extensible neural systems.
