Motion-capture-based hand gesture recognition for computing and control

This dissertation focuses on the study and development of algorithms for the analysis and recognition of hand gestures in a motion capture environment. Central to this work is the study of unlabeled point sets in a more abstract sense. Our evaluations of the proposed methods focus on how well they generalize to users not encountered during system training. In an initial exploratory study, we compare various classification algorithms built on multiple interpretations and feature transformations of point sets, including aggregate features (e.g., the mean) and a pseudo-rasterization of the capture space. We find that aggregate-feature classifiers perform evenly across users but are relatively limited in maximum achievable accuracy, while certain classifiers based on the pseudo-rasterization perform best among the tested algorithms. We follow this study with targeted examinations of specific subproblems. For the first subproblem, we introduce the a fortiori expectation-maximization (AFEM) algorithm for estimating the parameters of a distribution from which unlabeled, correlated point sets are presumed to be generated. Each unlabeled point is assumed to correspond to a target with an independent probability of appearance but a correlated position. We propose replacing the expectation step of the algorithm with a Kalman filter, modified within a Bayesian framework to account for the unknown point labels, which manifest as uncertain measurement matrices. We also propose a mechanism for reordering the measurements to improve parameter estimates. In addition, we use a state-of-the-art Markov chain Monte Carlo sampler to efficiently sample measurement matrices. In the process, we indirectly propose a constrained k-means clustering algorithm. Simulations demonstrate the utility of AFEM relative to a traditional expectation-maximization algorithm in a variety of scenarios.
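The abstract does not define the two point-set representations in detail. As a rough illustration only, the sketch below assumes that the aggregate features are the point count plus the per-axis mean and standard deviation of a marker set, and that the pseudo-rasterization counts the markers falling in each cell of a fixed grid over a bounding box of the capture space. The function names, the grid size, and the clipping rule are hypothetical choices. Both representations ignore marker order, which is what makes them usable on unlabeled point sets.

```python
import numpy as np

def aggregate_features(points: np.ndarray) -> np.ndarray:
    """Assumed aggregate features of an unlabeled (n, 3) marker set (n >= 1):
    point count, per-axis mean, and per-axis standard deviation."""
    n = points.shape[0]
    return np.concatenate(([n], points.mean(axis=0), points.std(axis=0)))

def pseudo_rasterize(points, lo, hi, bins=8):
    """Hypothetical pseudo-rasterization: count markers in each cell of a
    bins^3 grid spanning the box [lo, hi]; points outside are clipped to the
    border cells. Returns a flattened occupancy-count vector."""
    lo = np.asarray(lo, dtype=float)
    hi = np.asarray(hi, dtype=float)
    # Map each coordinate to an integer cell index in [0, bins - 1].
    idx = np.floor((points - lo) / (hi - lo) * bins).astype(int)
    idx = np.clip(idx, 0, bins - 1)
    grid = np.zeros((bins, bins, bins))
    # np.add.at accumulates correctly when several markers share a cell.
    np.add.at(grid, (idx[:, 0], idx[:, 1], idx[:, 2]), 1)
    return grid.ravel()
```

Either vector can then be fed to any standard classifier; reordering the rows of `points` leaves both outputs unchanged.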
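To make the idea of an uncertain measurement matrix concrete, the following sketch performs one Kalman measurement update when the matrix H is known only to be one of a finite set of candidates (for example, one per assignment of measurements to targets). It weights each candidate by its prior probability times the Gaussian innovation likelihood, runs a standard Kalman update per candidate, and collapses the resulting mixture into a single Gaussian by moment matching. This is a generic Bayesian mixture update in the spirit of probabilistic data association, assumed here for illustration; it is not the AFEM expectation step itself, and `uncertain_h_update` and its arguments are hypothetical names.

```python
import numpy as np

def gaussian_logpdf(r, S):
    """Log density of a zero-mean Gaussian with covariance S evaluated at r."""
    _, logdet = np.linalg.slogdet(2.0 * np.pi * S)
    return -0.5 * (r @ np.linalg.solve(S, r) + logdet)

def uncertain_h_update(m, P, z, Hs, prior, R):
    """Hypothetical Kalman update for z = H x + v, v ~ N(0, R), where H is
    only known to be one of the candidates in Hs with prior probabilities."""
    log_w, means, covs = [], [], []
    for H, p in zip(Hs, prior):
        S = H @ P @ H.T + R              # innovation covariance under this H
        K = P @ H.T @ np.linalg.inv(S)   # Kalman gain under this H
        r = z - H @ m                    # innovation under this H
        log_w.append(np.log(p) + gaussian_logpdf(r, S))
        means.append(m + K @ r)
        covs.append((np.eye(len(m)) - K @ H) @ P)
    log_w = np.array(log_w)
    w = np.exp(log_w - log_w.max())      # subtract max for numerical stability
    w /= w.sum()                         # posterior probability of each H
    # Collapse the Gaussian mixture to one Gaussian by moment matching.
    m_new = sum(wk * mk for wk, mk in zip(w, means))
    P_new = sum(wk * (Pk + np.outer(mk - m_new, mk - m_new))
                for wk, mk, Pk in zip(w, means, covs))
    return m_new, P_new, w
```

With two targets at 0 and 10 and a measurement vector reported in swapped order, nearly all posterior weight goes to the permuted candidate, and the state estimate stays where it was.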
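Enumerating every candidate measurement matrix becomes infeasible as the number of markers grows, which is why sampling is attractive. The sketch below is a deliberately simple stand-in, not the state-of-the-art sampler the dissertation uses: a Metropolis sampler over label permutations that proposes swapping two labels and scores a permutation by an assumed isotropic Gaussian likelihood with standard deviation `sigma` around known target positions `mu`. All names are hypothetical.

```python
import numpy as np

def log_target(perm, z, mu, sigma):
    """Assumed unnormalized log-probability that measurement i came from
    target perm[i], under isotropic Gaussian noise with std sigma."""
    return -((z - mu[perm]) ** 2).sum() / (2.0 * sigma ** 2)

def sample_permutations(z, mu, sigma=1.0, n_samples=2000, seed=0):
    """Simple Metropolis sampler over permutations. Swapping two labels is a
    symmetric proposal, so the acceptance ratio is just the target ratio."""
    rng = np.random.default_rng(seed)
    n = len(z)
    perm = np.arange(n)
    cur = log_target(perm, z, mu, sigma)
    samples = []
    for _ in range(n_samples):
        i, j = rng.choice(n, size=2, replace=False)
        prop = perm.copy()
        prop[i], prop[j] = prop[j], prop[i]   # propose a transposition
        new = log_target(prop, z, mu, sigma)
        if np.log(rng.random()) < new - cur:  # Metropolis acceptance test
            perm, cur = prop, new
        samples.append(perm.copy())
    return np.array(samples)
```

Each sampled permutation corresponds to one measurement matrix; the empirical frequencies of the samples approximate the posterior over labelings.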
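The abstract does not spell out the constraint in the constrained k-means algorithm. One natural reading, assumed here, is that the k points observed in a single frame must be assigned to k distinct clusters, since no two markers in the same frame can share a label. The sketch enforces this with an optimal one-to-one assignment (the Hungarian method via SciPy's `linear_sum_assignment`) in place of the usual nearest-center step, and assumes every frame contains exactly k points; `constrained_kmeans` is a hypothetical name.

```python
import numpy as np
from scipy.optimize import linear_sum_assignment

def constrained_kmeans(frames, iters=20, seed=0):
    """Hypothetical constrained k-means: the k points of each frame must go to
    k distinct clusters. frames has shape (T, k, d); every frame is assumed
    complete (no missing markers)."""
    frames = np.asarray(frames, dtype=float)
    k = frames.shape[1]
    rng = np.random.default_rng(seed)
    centers = frames[rng.integers(len(frames))].copy()  # init from one frame
    labels = np.zeros(frames.shape[:2], dtype=int)
    for _ in range(iters):
        for t, pts in enumerate(frames):
            # Squared distance from every point (row) to every center (column).
            cost = ((pts[:, None, :] - centers[None, :, :]) ** 2).sum(axis=-1)
            rows, cols = linear_sum_assignment(cost)  # one-to-one matching
            labels[t, rows] = cols
        new_centers = np.zeros_like(centers)
        for c in range(k):
            new_centers[c] = frames[labels == c].mean(axis=0)
        if np.allclose(new_centers, centers):
            break
        centers = new_centers
    return centers, labels
```

Because each frame contributes exactly one point to every cluster, no cluster can become empty, which sidesteps a common failure mode of unconstrained k-means.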
In the second subproblem, we consider the application of positive definite kernels and the earth mover’s distance (EMD) to our work. Positive definite kernels are an important tool in machine learning that enable efficient solutions to otherwise difficult or intractable problems by implicitly linearizing the problem geometry. We develop a set-theoretic interpretation of EMD and propose earth mover’s intersection (EMI), a positive definite analog to EMD. We offer a proof of EMD’s negative definiteness and provide necessary and sufficient conditions for EMD to be conditionally negative definite, including approximations that guarantee negative definiteness. In particular, we show that EMD is related to various min-like kernels. We also present a positive-definiteness-preserving transformation that can be applied to any kernel and used to derive positive definite EMD-based kernels, and we show that the Jaccard index is simply the result of this transformation applied to set intersection. We then evaluate kernels based on EMI and the proposed transformation against EMD in various computer vision tasks and show that EMD is generally inferior, even when indefinite kernel techniques are used. Finally, we apply deep learning to our problem. We propose neural network architectures for hand posture and gesture recognition from unlabeled marker sets in a coordinate system local to the hand. As a means of ensuring data integrity, we also
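One elementary case shows how EMD can relate to a min-like kernel. For two histograms with equal total mass on the same unit-spaced one-dimensional bins, EMD equals the L1 distance between their cumulative sums A and B. Since |A_i - B_i| = A_i + B_i - 2 min(A_i, B_i), EMD can be written as k(a, a) + k(b, b) - 2 k(a, b) with k(a, b) = Σ_i min(A_i, B_i). Because min(s, t) is a positive definite kernel on nonnegative numbers, this makes one-dimensional EMD a squared Hilbert-space distance and therefore conditionally negative definite. The sketch below checks this identity numerically; it illustrates only the one-dimensional case, not the dissertation's general result, and the function names are hypothetical.

```python
import numpy as np

def emd_1d(a, b):
    """EMD between two equal-mass histograms on the same unit-spaced 1D bins:
    the L1 distance between their cumulative distributions."""
    return np.abs(np.cumsum(a) - np.cumsum(b)).sum()

def cumulative_min_kernel(a, b):
    """Min-like kernel: sum over bins of min(A_i, B_i) on cumulative sums.
    Positive definite for nonnegative histograms, since min(s, t) is positive
    definite on [0, inf) and sums of positive definite kernels remain so."""
    return np.minimum(np.cumsum(a), np.cumsum(b)).sum()

def emd_via_kernel(a, b):
    """Same EMD, rebuilt from the kernel via |A - B| = A + B - 2 min(A, B)."""
    k = cumulative_min_kernel
    return k(a, a) + k(b, b) - 2.0 * k(a, b)
```

Moving one unit of mass from the first bin to the third gives an EMD of 2 under both formulas.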
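The abstract does not state the transformation explicitly. A form consistent with its Jaccard claim, assumed here, is k'(x, y) = k(x, y) / (k(x, x) + k(y, y) - k(x, y)). With k(A, B) = |A ∩ B| the denominator is |A| + |B| - |A ∩ B| = |A ∪ B|, which yields the Jaccard index; with the histogram intersection kernel Σ_i min(a_i, b_i) it yields the weighted Jaccard ratio Σ_i min(a_i, b_i) / Σ_i max(a_i, b_i). The sketch only checks these identities; it does not verify the positive-definiteness-preserving property claimed for the dissertation's transformation, and all names are hypothetical.

```python
import numpy as np

def jaccard_transform(k):
    """Assumed form of the transformation: k(x, y) / (k(x, x) + k(y, y) - k(x, y)).
    For a positive definite k, k(x, y) <= (k(x, x) + k(y, y)) / 2, so the
    denominator is positive unless k(x, x) = k(y, y) = 0."""
    def k_new(x, y):
        return k(x, y) / (k(x, x) + k(y, y) - k(x, y))
    return k_new

def set_intersection(A, B):
    """Intersection-size kernel on finite Python sets."""
    return len(A & B)

def histogram_intersection(a, b):
    """Histogram intersection (min) kernel on nonnegative vectors."""
    return np.minimum(a, b).sum()
```

For example, `jaccard_transform(set_intersection)` applied to {1, 2, 3} and {2, 3, 4, 5} gives 2 / 5, the Jaccard index of the two sets.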
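The abstract does not describe the proposed architectures. As a hypothetical illustration of the two ingredients it names, unlabeled marker sets and a hand-local coordinate system, the sketch below maps markers into a frame given by an assumed hand orientation R and origin t, then applies an untrained, randomly initialized network that is invariant to marker order: a shared per-marker layer, max pooling over markers, and a softmax output. The class, layer sizes, and pooling choice are assumptions, not the dissertation's design.

```python
import numpy as np

def to_hand_frame(points, R, t):
    """Express (n, 3) world-space markers in a hand-local frame whose
    orientation R (3x3, columns = hand axes) and origin t are assumed known,
    e.g. from a rigid marker cluster on the back of the hand."""
    return (points - t) @ R          # row-wise R^T (x - t)

class SetClassifier:
    """Hypothetical, untrained permutation-invariant classifier: a shared
    per-marker ReLU layer, max pooling over markers, and a softmax output."""
    def __init__(self, n_classes, hidden=32, seed=0):
        rng = np.random.default_rng(seed)
        self.W1 = rng.standard_normal((3, hidden)) * 0.5
        self.b1 = np.zeros(hidden)
        self.W2 = rng.standard_normal((hidden, n_classes)) * 0.5
        self.b2 = np.zeros(n_classes)

    def __call__(self, local_points):
        h = np.maximum(local_points @ self.W1 + self.b1, 0.0)  # per-marker layer
        pooled = h.max(axis=0)        # order-independent pooling over markers
        logits = pooled @ self.W2 + self.b2
        e = np.exp(logits - logits.max())
        return e / e.sum()            # class probabilities
```

Because the local coordinates are computed relative to the hand pose, moving the whole hand rigidly (and updating R and t accordingly) leaves the network input unchanged, and max pooling makes the output independent of marker order.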
