Expectation-based selective attention

In many real-world tasks, the ability to focus attention on the relevant portions of the input is crucial for good performance. This work shows that, for temporally coherent inputs, a computed expectation of the next time step's inputs provides a basis upon which to focus attention. Expectations are useful in tasks that arise in both visual and non-visual domains, ranging from scene analysis to anomaly detection. When temporally related inputs are available, an expectation of the next input's contents can be computed from the current inputs. A saliency map, based upon the computed expectation and the actual inputs, indicates which inputs will be important for performing the task in the next time step. For example, in many visual object tracking problems, the relevant features are predictable, while the distractions in the scene are either unpredictable or unrelated to the task. The task-specific selective attention methods can be used to create a saliency map that accentuates only the predictable inputs that are useful in solving the task. In a second use of expectation, anomaly detection, the unexpected features are the important ones. Here, the role of expectation is reversed: it is used to emphasize the unpredicted features. The performance of these methods is demonstrated in artificial neural network-based systems on two real-world vision tasks: lane-marker tracking for autonomous vehicle control and driver monitoring, and hand tracking in cluttered scenes. For the hand-tracking task, techniques for incorporating a priori domain knowledge are presented. These methods are also demonstrated on a non-vision task: anomaly detection in the plasma etch step of semiconductor wafer fabrication.
In addition to explicitly creating a saliency map to indicate where a network should pay attention, techniques are developed to reveal a network's implicit saliency map. The implicit saliency map represents the portions of the input to which a network will pay attention in the absence of the explicit focusing mechanisms developed in this thesis. Methods to examine the features a network has encoded in its hidden layers are also presented. These techniques are applied to networks trained to perform face detection in arbitrary visual scenes. The results clearly display the facial features the network determines to be the most important for face detection. These techniques address one of the largest criticisms of artificial neural networks: that it is difficult to understand what they encode.
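
The two roles of expectation described above can be illustrated with a small sketch. This is not the thesis's implementation: the function names (`saliency_map`, `apply_attention`), the exponential mapping from prediction error to saliency, and the `scale` and `background` parameters are all assumptions made for illustration. The sketch assumes the expectation and the actual input are arrays of the same shape, and that saliency depends only on how far each actual input deviates from its expected value.

```python
import numpy as np


def saliency_map(expected, actual, mode="predictable", scale=1.0):
    """Hypothetical saliency map from an expectation and the actual input.

    mode="predictable": high saliency where the input matched the expectation
                        (e.g. tracking a predictable feature among distractions).
    mode="anomaly":     high saliency where the input deviated from the
                        expectation (the role of expectation is reversed).
    """
    expected = np.asarray(expected, dtype=float)
    actual = np.asarray(actual, dtype=float)
    if expected.shape != actual.shape:
        raise ValueError("expected and actual must have the same shape")

    # Per-input prediction error.
    error = np.abs(actual - expected)
    # Assumed mapping: 1.0 for a perfect match, decaying toward 0 as error grows.
    match = np.exp(-error / scale)

    if mode == "predictable":
        return match
    if mode == "anomaly":
        return 1.0 - match
    raise ValueError("mode must be 'predictable' or 'anomaly'")


def apply_attention(actual, saliency, background=0.0):
    """Keep salient inputs and blend the rest toward a background value."""
    actual = np.asarray(actual, dtype=float)
    return saliency * actual + (1.0 - saliency) * background


# Toy example: the third input deviates from what was expected.
expected = np.array([0.2, 0.8, 0.5])
actual = np.array([0.2, 0.8, 0.9])

track = saliency_map(expected, actual, mode="predictable")
anomaly = saliency_map(expected, actual, mode="anomaly")
filtered = apply_attention(actual, track)
```

In the toy example, the `predictable` map de-emphasizes the third input, while the `anomaly` map emphasizes only that input. In practice the expectation would come from a learned predictor, and the mapping from error to saliency is a design choice that this sketch does not settle.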
