Brain, Vol. 123, No. 1, 185-187, January 2000
© 2000 Oxford University Press


Book reviews

VISION AND ACTION.

Sophie Wuerger

MacKay Institute of Communication and Neuroscience, Keele University, Keele, Staffordshire ST5 5BG, UK

This book is derived from an international conference on vision and action, held at York University, Canada. The contributors are from three disciplines: psychology, computer science and neuroscience. They share the belief that perceptual senses do not function in isolation: the visual and motor processes involved in driving a car, catching a ball or playing table tennis interact. For instance, moving the head provides useful cues for interpreting the visual scene, due to motion parallax; conversely, the changing visual input can provide important information about the motion of the observer.

The editors, Laurence R. Harris and Michael Jenkin, attempt to take vision research back into the real world. They argue that many laws of psychophysics have been established in the laboratory under simplified conditions and these laws do not transfer to vision in a complex, natural environment. The reason for the failure of generality is the interaction between parameters; reduced laboratory conditions are designed to ignore these interactions. The editors argue that to truly understand vision one needs to take into account that the same visual input is used for different tasks; hence different aspects of the visual world are being extracted under different circumstances.

Chapters 2, 8, 9 and 10 deal with the computation of optical flow fields for motor actions, such as image stabilization, driving, catching a ball, collision avoidance and navigating. They are essential reading for any student or researcher interested in the use of optical flow.

In Chapter 2, F. A. Miles provides an example of the cross-talk between self-motion, depth cues and eye movements. He presents compelling evidence that, depending on the optical flow pattern on the retina, humans use three different ultra-fast visual tracking mechanisms with latencies of less than 85 ms. The first, ocular following, helps to stabilize the retinal images of objects in the plane of fixation. The second, radial flow vergence, comes into play when the observer is heading towards the object of interest: a radial optic flow pattern is produced on the retina, and the corresponding eye movements help to stabilize the gaze against motion in depth. The third, disparity vergence, eliminates residual vergence errors and is driven only by binocularly processed visual signals. Miles' suggestion that MST neurones in the monkey cortex act like templates tuned to specific optical flow patterns is corroborated by recent brain-imaging work on humans by Morrone, Burr and their co-workers.

In Chapter 8, M. F. Land investigates the visual mechanisms involved in driving a car. He identifies two main components: (i) an anticipatory feed-forward signal, derived from the discrepancy between the direction of motion and the edges of the road far ahead (10–20 m); and (ii) a feed-back signal from the lane edge much closer to the vehicle (<10 m). Land argues that the locations of the edges of the road in the field of view are necessary and sufficient cues for steering. This is interesting for two reasons. First, it has obvious practical implications. Second, if the claim is true, then a driver does not need to know the vehicle's heading in absolute terms; only the difference signal is required.

In Chapter 9, D. Regan and his numerous colleagues ask what kind of visual information is exploited to control motor actions, such as catching, hitting and collision avoidance. They report a comprehensive series of experiments on the relative input of monocular and binocular information. All of their experiments are guided by the principle of simulating real-world stimuli (and maybe by their love of cricket). For instance, to avoid collision with an object, the rate of expansion on the retina seems to be an obvious cue. However, Regan and Vincent show that observers ignore the rate of expansion per se and base their judgement of the proximity of collision on the ratio between object size and expansion rate. This is just one example taken from a series of interesting and well-designed experiments.
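The ratio described above can be made concrete with a little arithmetic. The following is a minimal sketch, not the authors' procedure: it assumes a small-angle approximation and a constant approach speed, neither of which is stated in the review, and all function names are hypothetical. Under those assumptions an object of physical size S at distance d subtends an angle of about S/d, and the ratio of that angle to its rate of expansion equals d/v, the time remaining before contact, whatever the object's size.

```python
def angular_size(size_m, distance_m):
    """Small-angle approximation of the visual angle in radians (assumed)."""
    return size_m / distance_m


def time_to_contact(theta, theta_rate):
    """Ratio of angular size to its rate of expansion, in seconds."""
    return theta / theta_rate


def simulate(size_m, distance_m, speed_ms, dt=1e-3):
    """Estimate time to contact from two successive retinal 'snapshots'.

    Assumes the object approaches head-on at constant speed (hypothetical setup).
    """
    theta_now = angular_size(size_m, distance_m)
    theta_next = angular_size(size_m, distance_m - speed_ms * dt)
    theta_rate = (theta_next - theta_now) / dt  # finite-difference expansion rate
    return time_to_contact(theta_now, theta_rate)


# Two objects of different size at the same distance and speed give the same ratio.
small_ball = simulate(size_m=0.07, distance_m=20.0, speed_ms=10.0)
large_ball = simulate(size_m=0.22, distance_m=20.0, speed_ms=10.0)
```

Both calls return a value close to 2 s (distance divided by speed), which illustrates why the ratio is informative about the proximity of a collision while the expansion rate alone, which differs between the two objects, is not.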

Grigo and Lappe (Chapter 10) analyse the visual information employed in navigating. They examine the retinal flow pattern that arises when an observer is heading towards a wall and measure the accuracy of heading judgements when the direction of gaze differs from the heading direction. One of the most surprising results is the dependence of heading accuracy on the duration of the stimulus. For a large field of view and brief presentations (<1 s) the direction of gaze has no influence on the heading judgement and the observer judges the heading direction veridically. For presentations longer than 1 s the heading judgement is biased towards the direction of gaze, which reduces its accuracy. The authors suggest that the time dependence of the heading judgement may reflect the dynamic use of the extra-retinal eye movement signals.

Chapters 6, 7, 15 and 16 concern the kinematics of movement systems. It is interesting to compare the solutions found in different systems, such as the control of eye movements and limbs. The basic question is: why do movements normally follow one particular trajectory and not any of the others that are possible?

Chapters 6 and 7 concern the 3D kinematics of the eye and an ongoing controversy about the oculomotor mechanisms. The controversy is based on the following problem: when a rigid body, such as the eyeball, is rotated around arbitrary axes in three dimensions, the final position of the eyeball depends on the order in which the rotations around the various axes are carried out. The puzzle is how the brain controls eye movements given this complication. Optican and Quaia (Chapter 6) approach this problem with a theoretical analysis of the 3D kinematics of the eyeball. To determine whether the non-commutativity of the eyeball rotations introduces a significant error in the neural control of eye movements, they quantify the degree of non-commutativity in a realistic model with limited muscular slip. They conclude that the brain is able to control ocular rotations without having a precise knowledge of the non-commutative mechanics of rotations. The reason for this simplification is the location of the muscle pulleys attached to the orbit. As far as the neural control of eye movements is concerned, it seems that mother nature has solved the problem rather elegantly by putting all the burden on the peripheral motor control and not on the central nervous system.

In Chapter 7, J. D. Crawford discusses, among other issues, the problem of non-commutativity of ocular rotations within the framework of Listing's Law. In 1848, Donders observed that, when the head is upright and stable, only one unique 3D eye orientation is employed for each direction of visual gaze. This is interesting since any one gaze direction could be obtained using a variety of 3D orientations. Listing's Law states that, if each eye position is characterized by an axis of rotation, then these axes all lie in a single plane. Crawford then discusses several controversies associated with Listing's Law, e.g. whether its origin is neural or mechanical. Finally, he notes that the solution to the `degrees of freedom problem' in the oculomotor system is very similar to the solution generated in the other parts of the motor system.
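The order dependence of 3D rotations that underlies this controversy is easy to demonstrate numerically. The sketch below is only an illustration of the mathematical point, not of the model analysed by Optican and Quaia: the axis convention (x for vertical, y for horizontal eye rotation) and all function names are assumptions made for the example.

```python
import math


def rot_x(deg):
    """Rotation matrix about the x axis (taken here as a vertical eye rotation)."""
    c, s = math.cos(math.radians(deg)), math.sin(math.radians(deg))
    return [[1.0, 0.0, 0.0], [0.0, c, -s], [0.0, s, c]]


def rot_y(deg):
    """Rotation matrix about the y axis (taken here as a horizontal eye rotation)."""
    c, s = math.cos(math.radians(deg)), math.sin(math.radians(deg))
    return [[c, 0.0, s], [0.0, 1.0, 0.0], [-s, 0.0, c]]


def matmul(a, b):
    """Product of two 3x3 matrices stored as nested lists."""
    return [[sum(a[i][k] * b[k][j] for k in range(3)) for j in range(3)]
            for i in range(3)]


def transpose(m):
    return [list(row) for row in zip(*m)]


def order_mismatch_deg(horizontal_deg, vertical_deg):
    """Angle (degrees) between the orientations reached by the two orderings."""
    # Acting on column vectors, the right-hand factor is applied first.
    vertical_first = matmul(rot_y(horizontal_deg), rot_x(vertical_deg))
    horizontal_first = matmul(rot_x(vertical_deg), rot_y(horizontal_deg))
    # The residual rotation carries one final orientation onto the other.
    residual = matmul(vertical_first, transpose(horizontal_first))
    trace = residual[0][0] + residual[1][1] + residual[2][2]
    cos_angle = max(-1.0, min(1.0, (trace - 1.0) / 2.0))
    return math.degrees(math.acos(cos_angle))
```

Calling `order_mismatch_deg(30, 30)` returns a clearly non-zero angle, whereas the mismatch vanishes when either rotation is zero and shrinks rapidly for small rotations.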

Mitra, Riley, Schmidt and Turvey (Chapter 15) introduce the reader to synergies in movement control and how they are modulated by visual input. Similar to the rotations of the eyeball, the movement system of the entire body has to solve the `degrees of freedom problem'. The problem is how a group of relatively independent components (limbs) can be controlled by low-dimensional subsystems. Mitra et al. investigate the effect of vision on elementary inter-limb co-ordination and simple aiming tasks.

In the final chapter (Chapter 16), Soechting and Flanders ask: why do arm movements normally follow one particular trajectory and not any of the others that are possible? (In earlier chapters, we have encountered a similar problem for eye movements and the solution was Listing's Law.) The authors try to answer this question by establishing a cost function and then computing a trajectory that minimizes this cost function. They show that the final posture of the arm can be predicted by a cost function related to energy expenditure (minimum peak work) and that the hand paths during planar arm movement can be at least partly predicted by another criterion (minimum muscle force change) that is also related to energy expenditure.

Chapters 3, 4 and 11 concern the combination of visual and non-visual cues (auditory input and self-motion) for perceptual tasks.

Chapter 3 deals with the combination of visual cues and to what extent the complexity of the visual scene affects the combination rule. Banks and Backus test whether a weak fusion model of visual cues can account for slant perception. The major visual cues for slant estimation are the horizontal disparity, the vertical disparity, the eye position, and monocular perspective cues. The results suggest the determination of perceived slant can be modelled as a weighted combination of two slant estimates derived from these visual cues. These weighting factors seem to depend on the reliability of the particular estimates and hence on the complexity of the visual scene.
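A weighted combination whose weights follow the reliability of each estimate can be sketched as follows. The review does not say which weighting rule Banks and Backus use; inverse-variance weighting is assumed here as one common choice, and the function name and the numbers in the usage note are purely illustrative.

```python
def combine_estimates(estimates, variances):
    """Combine estimates with weights proportional to 1/variance.

    Inverse-variance weighting is an assumption made for illustration;
    it is not taken from the chapter under review.
    """
    reliabilities = [1.0 / v for v in variances]
    total = sum(reliabilities)
    weights = [r / total for r in reliabilities]  # weights sum to one
    combined = sum(w * e for w, e in zip(weights, estimates))
    return combined, weights


# Hypothetical slant estimates (degrees) and their variances.
slant, weights = combine_estimates(estimates=[30.0, 40.0], variances=[4.0, 16.0])
```

With these made-up values the more reliable estimate receives a weight of 0.8 and the less reliable one 0.2, so the combined slant is about 32 degrees. Changing the variances, as a more or less complex scene might, shifts the weights accordingly.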

Harris, Zikovitz and Kopinska (Chapter 4) ask how the brain establishes a frame of reference. They discuss two examples: driving and auditory localization. Their results show a tremendous dominance of visual information over gravito-inertial forces in controlling the orientation of a driver's head. A very different problem is the frame of reference for auditory localization. When a listener is asked to identify the location of a sound source, there are several possible reference frames: the retina, the head, the body, and the external space. Their major finding—which is in conflict with previous reports—is that the frame of reference for auditory localization is neither the retina nor the head, but that the central auditory localization system uses the body as a frame of reference.

Cornilleau-Pérès, Paradis and Droulez (Chapter 11) deal with the cortical basis of 3D shape from motion. They compare fMRI studies with psychophysical investigations and PET scans. They find that, for small fields, self-motion does not enhance the performance in recovering the 3D shape of an object. This result is in agreement with the idea that the coding of 3D variables related to object shape primarily involves the ventral rather than the dorsal stream. They also suggest that the involvement of the ventral and dorsal processing routes might be task dependent.

Chapters 5 and 12 focus on the modelling aspect of vision and action. Ballard et al. (Chapter 5) integrate information and ideas ranging from basic image processing to learning and complex decision making. This chapter will be useful for any scientist with an interest in building a realistic process model of human behaviour. Chapter 12, by Terzopoulos, will be more useful for researchers interested in computer vision and computer animation.

Ballard, Salgian, Rao and McCallum (Chapter 5) discuss the evidence for temporal hierarchies in brain computation and why temporal hierarchies are necessary to account for the complex behaviour of the brain. Ballard and his co-workers provide evidence from physiology and psychophysics that the fundamental time constant in the cortex is in the range of 50–100 ms. Their basic idea comes from classical control theory and has been formulated by other scientists such as Newell, Barlow and MacKay: the cortex encodes memories by trying to predict its input. The feed-forward connections convey the difference between the current input and its prediction. The prediction is carried by feedback signals from higher brain areas. The residuals, carried by the feed-forward connections, provide an error signal that can be used to adjust the synapses that are the anatomical basis for memories. Ballard et al. provide an example of the importance of the 80 ms time scale, namely the efficient encoding of visual scenes that drive saccadic eye movements. They point out that encoding at several spatial scales is crucial and that an efficient scheme is to encode the responses to spatio-chromatic basis functions rather than low-level image features such as grey levels. Ballard et al. propose an elaborate model to predict complex human behaviour. Their major premise is that the different functions of the brain can be categorized in different temporal bands: `memories' lasting up to 80 ms; `routines' with a time scale of 300 ms; and `programs' at a still longer time scale of about 2–3 s. As an example of a complete small system, they develop a model for driving.
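The prediction-and-residual scheme described above can be illustrated with a toy example. This is a generic error-driven (delta-rule) learner written for this purpose, not the model of Ballard et al.; the linear predictor, the learning rate, the data and all names are assumptions.

```python
def train_predictor(inputs, causes, learning_rate=0.1, epochs=200):
    """Learn weights so that a higher-level 'cause' predicts the input.

    Toy linear sketch: the feedback path carries the prediction, the
    feed-forward path carries the residual, and the residual adjusts
    the weights (standing in for synapses).
    """
    n_in, n_cause = len(inputs[0]), len(causes[0])
    weights = [[0.0] * n_cause for _ in range(n_in)]
    errors = []  # summed squared residual per epoch
    for _ in range(epochs):
        total = 0.0
        for x, r in zip(inputs, causes):
            # Feedback: top-down prediction of the current input.
            prediction = [sum(w * rj for w, rj in zip(row, r)) for row in weights]
            # Feed-forward: difference between input and prediction.
            residual = [xi - pi for xi, pi in zip(x, prediction)]
            # Error-driven weight update.
            for i in range(n_in):
                for j in range(n_cause):
                    weights[i][j] += learning_rate * residual[i] * r[j]
            total += sum(e * e for e in residual)
        errors.append(total)
    return weights, errors


# Two hypothetical input patterns, each tagged with a one-hot cause.
patterns = [[1.0, 2.0, 3.0], [0.0, -1.0, 1.0]]
labels = [[1.0, 0.0], [0.0, 1.0]]
weights, errors = train_predictor(patterns, labels)
```

As the patterns are presented repeatedly, the entries of `errors` shrink towards zero: once the stored weights predict the input well, almost nothing is left to send forward.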

Terzopoulos (Chapter 12) discusses vision and action in artificial animals. In his approach, artificial animals are autonomous virtual robots with active perception systems, situated in physics-based virtual worlds. As an example, Terzopoulos develops a model for life-like artificial fish inhabiting a physics-based, virtual marine world. These fish incorporate a sensorimotor control loop for eye movements and body movements. The active vision system is capable of foveating, making saccades, and directing the gaze at objects of interest.

In Chapter 13, M. Goodale and A. Haffenden introduce clinical aspects of vision and action, that is, situations in which vision and action become dissociated. Clinical evidence suggests that the visual system does not construct a single representation of the world for both visual perception and visual control of action. Instead, separate neural mechanisms seem to mediate vision for perception and vision for action. These mechanisms can be differentially affected by neurological damage.

In Chapter 14, Koenderink and van Doorn question our notion of `visual space'. They revisit a problem that has been popular ever since the `discovery' of non-Euclidean geometry. Between 1820 and 1830, Gauss, Lobachevsky and Bolyai independently arrived at the conclusion that a consistent geometry can be derived assuming that the sum of the three angles in a triangle is less than 180 degrees (hyperbolic) or greater than 180 degrees (elliptic). It took 10 years for this work to be published since the mathematicians suspected that the mathematical community would not be able to accept the revolutionary denial of Euclid's geometry. For centuries vision scientists (e.g. Luneburg) tried to decide whether the perceptual visual space can be modelled by a Riemannian space (Euclidean, elliptic or hyperbolic). A Riemannian structure implies that the space has a constant curvature (zero for Euclidean, negative for hyperbolic, positive for elliptic). Based on their experimental results, Koenderink and van Doorn argue that visual space is not Riemannian, hence it does not have a constant curvature. This chapter may be somewhat esoteric, but I enjoyed reading it for its historical perspective. I also think it is useful to remind vision scientists of the assumptions underlying their models.
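The angle-sum criterion can be checked numerically for the elliptic case, using the surface of a sphere as a familiar space of constant positive curvature. The sketch below is a geometric illustration only and has nothing to do with the experiments of Koenderink and van Doorn; the function names are hypothetical.

```python
import math


def _dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def _unit(v):
    norm = math.sqrt(_dot(v, v))
    return [a / norm for a in v]


def vertex_angle(a, b, c):
    """Angle (radians) at vertex a of the spherical triangle abc on a unit sphere."""
    # Directions of the great-circle arcs leaving a, projected into its tangent plane.
    to_b = _unit([bi - _dot(a, b) * ai for ai, bi in zip(a, b)])
    to_c = _unit([ci - _dot(a, c) * ai for ai, ci in zip(a, c)])
    return math.acos(max(-1.0, min(1.0, _dot(to_b, to_c))))


def angle_sum_deg(a, b, c):
    """Sum of the three interior angles, in degrees."""
    a, b, c = _unit(a), _unit(b), _unit(c)
    total = vertex_angle(a, b, c) + vertex_angle(b, c, a) + vertex_angle(c, a, b)
    return math.degrees(total)


# One octant of the sphere: three right angles.
octant = angle_sum_deg([1, 0, 0], [0, 1, 0], [0, 0, 1])
# A very small triangle: nearly flat, so the sum is only just above 180 degrees.
tiny = angle_sum_deg([1, 0, 0], [1, 0.01, 0], [1, 0, 0.01])
```

The octant triangle has an angle sum of 270 degrees, while the tiny triangle comes out only marginally above 180 degrees. On a sphere the excess over 180 degrees grows with the area of the triangle, so departures from Euclidean geometry show up only in sufficiently large figures.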

This book contains a collection of interesting and timely research contributions that should be of interest to anyone working in the field of perception or motor control. The editors provide compelling psychophysical and theoretical evidence supporting their claim that vision cannot be studied without considering its purpose—and they have convinced at least one vision scientist.

Notes

Edited by Laurence R. Harris and Michael Jenkin. 1998. Pp. 360. Cambridge: Cambridge University Press. Price £50.00. ISBN 0-521-63162-9.

