A transformational approach to visual perception is presented, in which image structure is encoded in the parameters of those transformations that produce an output maximally symmetric with the current input. Results are presented for a computer program, dubbed SMART (Symmetry Maximizing Array using Random Transformations). SMART consists of a parallel array of independent symmetry detectors. Each detector attempts to find a transformation that maximizes the symmetry between the original and the transformed configuration. The weighted output of the detectors is collated in a connection matrix, which summarizes the image structure and provides a continuously varying measure of relative symmetry. The program is applied to constrained and random arrays, Glass figures, and the detection of hidden symmetric targets. More general implications are briefly discussed.

Most recent attempts to develop a general explanatory framework for phenomena belonging to the intermediate levels of visual perception can be classified into one of two broad categories. Likelihood approaches argue that image elements are organized by unconscious inference processes into the most likely hypothesis concerning their source (e.g., Albert & Hoffman, 1995; Knill & Richards, 1996). Simplicity approaches argue that perception organizes image elements so as to provide the simplest description or encoding of the source of that stimulation (e.g., Hatfield & Epstein, 1985; Van der Helm & Leeuwenberg, 1996). Although both approaches provide plausible accounts of many phenomena, each is vulnerable to criticism. A problem for the likelihood approach is that assumptions required to assign likelihoods to hypothetical situations giving rise to an image tend to be elaborate and tenuous. On the other hand, the simplicity approach requires debatable assumptions concerning the basic elements of a description and the process by which descriptions are constructed.
More importantly, neither approach suggests a specific mechanism by which its optimizing principles might be implemented.

SMART (Symmetry Maximizing Array using Random Transformations)

We propose an alternative, transformational approach, motivated by the obvious importance of symmetries and transformations in the recovery of object information in visual images (Barnsley, 1993; Mundy & Zisserman, 1992; Tyler, 1996). This approach follows previous theorizing (Palmer, 1999), but differs in that it attempts to explore specific mechanisms for maximizing symmetry. As a first step, we have developed a program in Matlab, dubbed SMART (Symmetry Maximizing Array using Random Transformations). Like someone trying to solve an anagram, the program begins by subjecting image elements to multiple random transformations. Symmetries produced by this process then select those transformations that best capture any structure in the image.

Figure 1 illustrates the structure of the program, which consists of S symmetry detectors operating independently of one another. Each detector takes as input the coordinates of the set of points, P, normalized to lie between 0 and 1, and subjects this set to some transformation to yield a transformed set, Pt. Allowable transformations are currently restricted to combinations of rotations and vertical and horizontal translations, each randomly selected from a normal distribution centered on zero. The standard deviations for rotation, vertical translation, and horizontal translation, σr, σv and σh, are typically set to π/16, 0.05, and 0.05, respectively. For each point, i, in Pt, the program finds the point in P that is closest to it, and records the distance, di, between these points. (To prevent null transformations, the restriction is imposed that point i in P cannot be matched to point i in Pt.) The distance, D, between the two point sets is evaluated as the sum of the individual inter-point distances, D = Σdi.
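The transformation and distance steps described above can be sketched as follows. This is a minimal illustration in Python rather than the authors' Matlab code, and several details are assumptions not stated in the text: the function names are hypothetical, the rotation is taken about the centre of the unit square, and rotation is applied before translation.

```python
import math
import random


def random_transformation(rng, sd_rot=math.pi / 16, sd_v=0.05, sd_h=0.05):
    """Draw (rotation, vertical shift, horizontal shift) from zero-centred normals."""
    return rng.gauss(0.0, sd_rot), rng.gauss(0.0, sd_v), rng.gauss(0.0, sd_h)


def apply_transformation(points, rotation, dv, dh, centre=(0.5, 0.5)):
    """Rotate about `centre` (an assumption), then translate by (dh, dv)."""
    cx, cy = centre
    cos_r, sin_r = math.cos(rotation), math.sin(rotation)
    moved = []
    for x, y in points:
        rx = cx + (x - cx) * cos_r - (y - cy) * sin_r
        ry = cy + (x - cx) * sin_r + (y - cy) * cos_r
        moved.append((rx + dh, ry + dv))
    return moved


def set_distance(points, transformed):
    """D: sum over i of the distance from transformed[i] to its nearest
    point in `points`, never pairing index i with itself.
    Requires at least two points."""
    total = 0.0
    for i, (tx, ty) in enumerate(transformed):
        total += min(
            math.hypot(tx - px, ty - py)
            for j, (px, py) in enumerate(points)
            if j != i
        )
    return total


if __name__ == "__main__":
    row = [(0.1, 0.5), (0.2, 0.5), (0.3, 0.5), (0.4, 0.5)]
    shifted = apply_transformation(row, 0.0, 0.0, 0.1)
    print(set_distance(row, row))      # identity transformation
    print(set_distance(row, shifted))  # shift by one inter-point spacing
    print(random_transformation(random.Random(0)))
```

For an evenly spaced row of points, a horizontal shift of one inter-point spacing gives a smaller D than the identity transformation, because the same-index restriction prevents each point from simply matching itself.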
The program then uses a hillclimbing algorithm to find a transformation, t, and hence the point set P*, that corresponds to a local minimum of D. The coordinates of the two point sets, P* and P, are then compared. For each point i in P*, the distance di to its nearest neighbor, j, in P is calculated. If this distance is less than a predefined mapping tolerance, t (distinct from the transformation t above), then a mapping {i, j} of that pair of points is recorded in a connection matrix, C.

Figure 1. Information flow diagram of the processes involved in the SMART program.

The matrix C starts as an N × N matrix of zeroes, where N is the number of points in P. The output of each detector then modifies C as follows. Each mapping {i, j} made by the detector corresponds to two entries in C (i.e., to cij and to cji). Both of these entries are increased by x, where x is the total number of mappings made by that detector. This weighting mechanism allows the best transformations to dominate C. After all detectors have modified C, the matrix is normalized as follows. The maximum value an entry, cij, can have occurs when all S detectors make the maximum of N mappings and the particular mapping {i, j} is made by every detector. Since this value is equal to N × S, we normalize C by dividing each entry by N × S. The resulting normalized matrix constitutes a cumulative record of the major point symmetries discovered by the detectors. This record can be used, together with an S × 3 matrix of the associated transformations (one row of rotation and translation parameters per detector), to identify those transformations that maximize invariance in the image. The program can then be set to display only those mappings that contribute more than a certain threshold value, l, to the matrix.

The SMART program also provides a measure, sr, of the relative symmetry of the array, according to the following rationale. If no detectors map any points, then we want sr to be 0. Conversely, if all S detectors map all N points, then we want sr to be 1.
A detector that maps N points adds N to each cell in C that corresponds to a mapping, and hence adds a total of 2N² (N mappings, each incrementing two entries by N). Thus, the upper bound on ΣijCij is 2N² × S. After normalization, this total becomes 2N² × S/(N × S), or 2N. We define sr as ΣijCij/(2N). This is 0 when no mappings are made and 1 when all detectors find a perfect symmetry.

Some examples of SMART analyses

Despite its simplicity, SMART is effective at identifying structure in point arrays. The following examples illustrate the range of situations in which the program is successful in detecting structure and in simulating human performance.

Figure 2. Illustration of how SMART detects simple translational symmetries and captures the Gestalt principle of organization by proximity as a result of the normally distributed values for initial transformations. For this analysis, the number of detectors, S = 20, the drawing threshold, l = 0.05, the mapping tolerance, t = 0.025, and the standard deviations for rotations, σr, and for vertical and horizontal translations, σv and σh, were π/16, 0.05, and 0.05, respectively.
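The connection-matrix bookkeeping and the relative symmetry measure sr defined earlier can be illustrated with a short sketch. It is written in Python rather than the authors' Matlab, the function names are hypothetical, and it assumes each detector's output is already available as a list of (i, j) index pairs.

```python
def connection_matrix(n_points, detector_mappings):
    """Build the normalized connection matrix C.

    `detector_mappings` holds one list of (i, j) mappings per detector
    (an assumed input format). Each mapping raises c[i][j] and c[j][i]
    by x, the number of mappings that detector made; the result is then
    divided by N * S.
    """
    n_detectors = len(detector_mappings)
    if n_detectors == 0:
        raise ValueError("at least one detector is required")
    c = [[0.0] * n_points for _ in range(n_points)]
    for mappings in detector_mappings:
        weight = len(mappings)  # x: total mappings made by this detector
        for i, j in mappings:
            c[i][j] += weight
            c[j][i] += weight
    scale = n_points * n_detectors
    return [[value / scale for value in row] for row in c]


def relative_symmetry(c):
    """sr: sum of all entries of the normalized matrix divided by 2N."""
    n = len(c)
    return sum(sum(row) for row in c) / (2 * n)


if __name__ == "__main__":
    # Two detectors, four points; both detectors map every point.
    full = [(0, 1), (1, 2), (2, 3), (3, 0)]
    print(relative_symmetry(connection_matrix(4, [full, full])))
    # One detector finds two mappings, the other finds none.
    print(relative_symmetry(connection_matrix(4, [[(0, 1), (2, 3)], []])))
```

Because each detector's contribution is weighted by its own mapping count, a detector that maps only half the points contributes a quarter of the maximum, not half, which is how the best transformations come to dominate C.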
[1] C. Tyler. Human Symmetry Perception and Its Computational Analysis, 2002.
[2] Refractor. Vision, 2000, The Lancet.
[3] H. Barlow. Vision Science: Photons to Phenomenology by Stephen E. Palmer, 2000, Trends in Cognitive Sciences.
[4] D. Vickers, et al. Transformational Analyses of Visual Perception, 2000.
[5] Michael D. Lee, et al. Towards a dynamic connectionist model of memory, 1997, Behavioral and Brain Sciences.
[6] E. Leeuwenberg, et al. Goodness of visual regularities: a nontransformational approach, 1996, Psychological Review.
[7] Donald D. Hoffman, et al. Genericity in spatial vision, 1995.
[8] M. Leyton. Symmetry, Causality, Mind, 1999.
[9] Andrew Zisserman, et al. Geometric invariance in computer vision, 1992.
[10] Michael F. Barnsley, et al. Fractals everywhere, 1988.
[11] J. Freyd. Dynamic mental representations, 1987, Psychological Review.
[12] W. Epstein, et al. The status of the minimum principle in the theoretical analysis of visual perception, 1985, Psychological Bulletin.
[13] L. Glass. Moiré Effect from Random Dots, 1969, Nature.