Paper information - Applying vision to intelligent human-computer interaction

Applying vision to intelligent human-computer interaction

As powerful and affordable computers and sensors become virtually omnipresent, constructing highly intelligent and convenient computation systems has never been so promising. Vision holds great promise in building advanced human-computer interaction (HCI) systems. We investigate different techniques to integrate passive vision into various interaction environments. First, we propose a novel approach to integrating visual tracking into a haptic system. Traditional haptic environments require the user to be attached to the haptic device at all times, even though force feedback is not always being rendered. We design and implement an augmented reality system called VisHap that uses visual tracking to seamlessly integrate force feedback with tactile feedback to generate a "complete" haptic experience. The VisHap framework allows the user to interact naturally with combinations of virtual and real objects, thereby combining active and passive haptics. The flexibility and extensibility of our framework are promising: it supports many interaction modes and allows further integration with other augmented reality systems. Second, we propose a new methodology for vision-based human-computer interaction called the Visual Interaction Cues (VICs) paradigm. VICs is based on the concept of sharing perceptual space between the user and the computer. Each interaction component is represented as a localized region in the image(s). We propose to model gestures based on the streams of visual cues extracted in this local space, thus avoiding the problem of globally tracking the user(s). Efficient algorithms are proposed to capture hand shape and motion. We investigate different learning and modeling techniques, including neural networks, Hidden Markov Models and Bayesian classifiers, to recognize postures and dynamic gestures.
Since gestures are in essence a language, in which each low-level gesture is analogous to a word in a conventional language, a high-level gesture language model is essential for robust and efficient recognition of continuous gestures. To that end, we have constructed a high-level language model that integrates a set of low-level gestures into a single, coherent probabilistic framework. In the language model, every low-level gesture is called a gesture word, and a composite gesture is a sequence of gesture words that are contextually and temporally constrained. We train the model via supervised and unsupervised learning techniques. A greedy inference algorithm is proposed to allow efficient online processing of continuous gestures. We have designed a large-scale gesture experiment that involves sixteen subjects and fourteen gestures. The experiment shows the robustness and efficacy of our system in modeling a relatively large gesture vocabulary involving many users. Most of the users also consider our gesture system comparable to, or more natural and comfortable than, traditional mouse-based user interfaces.

Gregory D. Hager | Guangqi Ye
