The Attentional Doughnut

Many think of visual attention as something like a spotlight moving across the visual scene: whatever it illuminates enters conscious awareness, and everything else is confined to preattentive limbo. This seemingly reasonable metaphor takes for granted that the areas to which we attend are roughly circular, contiguous, and, well, spotlight-like, just like the receptive fields of neurons in visual cortex. Hence the intense surprise at the recent discovery that attention's "spotlight" may actually deform into a "doughnut" shape under the right conditions.

Müller and Hübner presented a steady stream of small, flickering uppercase letters embedded in the center of a stream of large uppercase letters, and measured an ERP component known as the steady-state visual evoked potential (SSVEP), which is known to be sensitive to changes in flicker frequency. Subjects were first told to monitor the large letters and, after a given amount of time, to monitor the smaller letters in the center of the larger ones. SSVEP amplitude is known to be larger for attended items than for unattended ones, and the waveform itself is nearly sinusoidal, following the flicker rate of the stimulus that drives it.
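To make the measurement logic concrete, here is a minimal sketch of how the amplitude of one flickering stream can be read out of a mixed signal. Everything numeric in it is an assumption for illustration: the flicker rates, the sampling rate, and the toy rule that attending a stream doubles its SSVEP amplitude are not taken from the study, and the noise-free sinusoids are only a stand-in for real EEG.

```python
import math

# All numbers below are hypothetical, chosen only for illustration.
SAMPLE_RATE_HZ = 500.0
DURATION_S = 2.0
F_LARGE_HZ = 8.0    # assumed flicker rate of the large-letter stream
F_SMALL_HZ = 12.0   # assumed flicker rate of the small-letter stream


def simulate_eeg(amp_large, amp_small):
    """Sum of two sinusoids standing in for the two SSVEPs (no noise)."""
    n = int(SAMPLE_RATE_HZ * DURATION_S)
    return [
        amp_large * math.sin(2 * math.pi * F_LARGE_HZ * i / SAMPLE_RATE_HZ)
        + amp_small * math.sin(2 * math.pi * F_SMALL_HZ * i / SAMPLE_RATE_HZ)
        for i in range(n)
    ]


def amplitude_at(signal, freq_hz):
    """Amplitude of the component of `signal` at `freq_hz` (single-bin DFT)."""
    n = len(signal)
    re = sum(x * math.cos(2 * math.pi * freq_hz * i / SAMPLE_RATE_HZ)
             for i, x in enumerate(signal))
    im = sum(x * math.sin(2 * math.pi * freq_hz * i / SAMPLE_RATE_HZ)
             for i, x in enumerate(signal))
    return 2.0 * math.hypot(re, im) / n


# Toy assumption: attending a stream doubles its SSVEP amplitude.
attend_large = simulate_eeg(amp_large=2.0, amp_small=1.0)
attend_small = simulate_eeg(amp_large=1.0, amp_small=2.0)

small_when_ignored = amplitude_at(attend_large, F_SMALL_HZ)
small_when_attended = amplitude_at(attend_small, F_SMALL_HZ)
percent_change = 100.0 * (small_when_attended - small_when_ignored) / small_when_ignored
print(f"small-stream SSVEP: {small_when_ignored:.2f} -> {small_when_attended:.2f} "
      f"({percent_change:.0f}% change)")
```

Because each stream flickers at its own rate, the two responses can be told apart by frequency even though they come from overlapping parts of the display.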

If attention is shaped like a spotlight, viewers will involuntarily be attending to the smaller letters even when they're supposed to be attending to the large letters; therefore, one would not expect SSVEP magnitude to change significantly when subjects switch to attending to the small letters. However, the researchers found SSVEP amplitude increasing by almost 100% when subjects changed the location of their attention, suggesting that they were actually able to ignore the distractor letters in the middle while attending to the regions surrounding them!

Several control conditions ensure that this effect is not due simply to different responses to flicker rate (flicker rate was counterbalanced for large and small items), less attention being paid to the large items (subjects were asked to detect the target letter H in the to-be-attended stream), crosstalk between flicker frequencies (a complex demodulation process was used on the EEG waveforms, along with a low-pass filter at 2 Hz), attentional selection by spatial frequency (a global spatial frequency filter would not have differentially selected each stream, since their spatial frequencies are not mutually exclusive), or gradient allocation of attentional resources within a beamlike area (there was no measurable target P300 for the ignored small letter stream). The researchers also claim that an object-based system of attention cannot account for their results: the SSVEP signal originates in early visual areas, which would not be predicted if subjects were selecting on the basis of object identity. They add that this pattern of results is consistent with several other imaging studies.
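For readers unfamiliar with complex demodulation, the following is a rough sketch of the general technique, not the authors' actual pipeline. The signal is multiplied by a complex exponential at the frequency of interest, which shifts that component down to 0 Hz, and a low-pass filter then removes everything else. The moving-average filter used here is a crude stand-in for a proper 2 Hz low-pass filter, and the flicker rates, sampling rate, and amplitudes are all hypothetical.

```python
import cmath
import math

SAMPLE_RATE_HZ = 500.0   # hypothetical sampling rate
F_TARGET_HZ = 8.0        # hypothetical flicker rate of the stream of interest
F_OTHER_HZ = 12.0        # hypothetical flicker rate of the other stream


def complex_demodulate(signal, freq_hz, cutoff_hz=2.0):
    """Time-varying amplitude of `signal` near `freq_hz`.

    The low-pass step is a moving average whose first spectral null sits
    at `cutoff_hz`; this is a simplification, not a designed filter.
    """
    # Step 1: shift the component at freq_hz down to 0 Hz.
    shifted = [
        x * cmath.exp(-2j * math.pi * freq_hz * i / SAMPLE_RATE_HZ)
        for i, x in enumerate(signal)
    ]
    # Step 2: low-pass with a running mean over 1 / cutoff_hz seconds.
    win = int(round(SAMPLE_RATE_HZ / cutoff_hz))
    acc = sum(shifted[:win])
    envelope = [2.0 * abs(acc) / win]
    for i in range(win, len(shifted)):
        acc += shifted[i] - shifted[i - win]
        envelope.append(2.0 * abs(acc) / win)
    return envelope


# Toy signal: the target stream's amplitude steps from 1.0 to 2.0 halfway
# through (standing in for a shift of attention); the other stream is constant.
n = int(SAMPLE_RATE_HZ * 4.0)
signal = []
for i in range(n):
    t = i / SAMPLE_RATE_HZ
    target_amp = 1.0 if i < n // 2 else 2.0
    signal.append(target_amp * math.sin(2 * math.pi * F_TARGET_HZ * t)
                  + 1.5 * math.sin(2 * math.pi * F_OTHER_HZ * t))

envelope = complex_demodulate(signal, F_TARGET_HZ)
print(f"early amplitude: {envelope[0]:.2f}, late amplitude: {envelope[-1]:.2f}")
```

The point of the low-pass step is that the other stream's response, which ends up several hertz away from 0 Hz after the shift, is suppressed rather than leaking into the estimate.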

Other researchers, however, have criticized some of the methodology in this study. For example, Catena, Castillo, Fuentes & Milliken point out that the original authors did not control whether subjects focused on the display or on a point in front of the display, which could have important consequences for how the image was projected onto the retina. Replications of this study seem to suggest that people are indeed intentionally blurring the image by fixating at a different place in depth, which means that the simplest explanation for the data is that attention is not shaped like a doughnut, but rather that subjects are using flexible strategies which make it appear so. Still, other studies have reported similar findings in cases where intentionally blurring one's vision would not seem to help.

What do these and similar findings mean for models of attention? The results would seem to suggest that attention does not necessarily occupy contiguous regions; but there is no definitive answer to whether the early visual effects (such as SSVEP) are actually caused by annular or ringlike attention, or are merely the result of activation by top-down object identity/feature processing.


Anonymous said...

It can also be split, which is easy to tell phenomenologically when doing something like an MOT experiment.

The spotlight idea is not meant as a model; simply a metaphor that used to be more useful than it is today.

2/09/2006 09:10:00 AM  
Blogger Chris Chatham said...

I'm not familiar with MOT experiments; unpack the acronym and maybe I'll do another post on it!

Thanks as usual for your helpful comments.

(haha, I think I've discovered why there's so much crossover between physics and psychology: look up "MOT" experiment and you get stuff about laser-cooled magneto-optical traps; look up geons and you get stuff about black holes and quantum foam. hahah )

2/09/2006 12:29:00 PM  
Blogger Bob Mottram said...

Perhaps this could have some connection with the shape of receptive fields in the initial stages of vision. The donut effect might just be top down attention preferentially selecting the input from some larger centre/surround fields.

Apparent splits in the spotlight could also be explained in a similar way, as the selection of a group of mutually interfering fields.

2/09/2006 02:29:00 PM  
Anonymous said...

Oh, I'm sorry. I've spent too much time doing vision science to realize what is common and uncommon as far as acronyms.

MOT is multiple object tracking; Brian Scholl has some demos up here. It is clear, not just phenomenologically, but also using cues and testing reaction times, that what happens in an MOT experiment is that you split your attention up and track each object, not that you spread it out thinly over the entire display.

There is an interesting review of how the attention as a spotlight "model" of visual attention fails, written by Marvin Chun & Jeremy Wolfe, available online here. I think that's actually the second time I've mentioned that chapter on your blog, oddly enough, even though it isn't exceptional for any reason.

Oh well.

Your blog is, as always, interesting. Sorry I don't often make more insightful responses and I'm always just pointing out silly addendums!

2/09/2006 02:33:00 PM  
Blogger Chris Chatham said...

Thanks both of you for the comments... Bob, I'll have to think some more about the receptive field idea, but you could certainly be right. And in the meantime I'll check out some of this MOT stuff...

2/10/2006 11:53:00 AM  
Blogger Bob Mottram said...

It just so happens that I've been doing a lot of stuff with centre/surround fields in relation to stereoscopic vision. This arrangement of receptive fields isn't only useful for filtering out noise, but also contains very useful information particularly when multiple field sizes are combined. If the selection of neurons with different fields were under attentional control potentially all kinds of shapes for visual attention, including but not limited to a spotlight, could be produced.
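One hedged way to picture the suggestion above is a toy weighting profile built from two Gaussian fields of different sizes. Subtracting a narrow field from a wide one with the same peak gives a weight that is zero at the centre and highest on a ring around it. This is only an illustrative sketch: the field sizes are hypothetical, and nothing here is claimed to be how the visual system or the commenter's own work combines fields.

```python
import math

# Hypothetical field sizes, in arbitrary units of visual angle.
SIGMA_NARROW = 1.0
SIGMA_WIDE = 3.0


def gaussian(r, sigma):
    """Unnormalised Gaussian sensitivity at distance r from the centre."""
    return math.exp(-(r * r) / (2.0 * sigma * sigma))


def ring_weight(r):
    """Wide field minus narrow field: zero at the centre, peaking on a ring."""
    return gaussian(r, SIGMA_WIDE) - gaussian(r, SIGMA_NARROW)


# Sample the profile from r = 0 to r = 10 in steps of 0.1.
radii = [i * 0.1 for i in range(101)]
profile = [ring_weight(r) for r in radii]
peak_radius = radii[profile.index(max(profile))]

print(f"weight at centre: {ring_weight(0.0):.2f}")
print(f"weight is highest near r = {peak_radius:.1f}")
```

Swapping the sign of the subtraction gives the more familiar on-centre/off-surround profile, so the same two fields can yield either a spot or a ring depending on how they are combined.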

2/11/2006 03:12:00 AM  
Blogger Chris Chatham said...

Very interesting - I can see how a circular arrangement of on-center/off-surround receptive fields could give rise to a doughnut shape of attentional salience. If you want to localize attention all the way back to that low level though, it may be more productive to try to make center/surround fields account for the MOT experiments Tim talks about, since the "doughnut" evidence is actually a subset. Still, I think your idea is reasonable; it just has architectural implications for how flexible this top-down "attentional" control really has to be. I see there are multiple ANN models of MOT; it would be fun to do a review of one of them, if either of you have a favorite among them.

As for the noise reduction idea, I wonder why we don't see more neural net/ICA implementations of noise reduction in visual applications (digital cameras, Photoshop, etc). Or even in audio production; god knows there's lots of wealthy studios ready to spend money on fancy noise-reduction audio plugins.

Nice summer project, maybe :)

2/11/2006 08:23:00 AM  
Anonymous said...

There are other factors contributing to the shape of attention as well, and even talking about its shape may be slightly misleading. For instance, Eriksen & Yeh (1985) and several other papers show that the strength of attention actually decreases the further you get from its focus -- so if I put my attention on an area, you get something akin to an exponential fall off from that area, rather than a shape per se.

There is also the problem of attentional selection rarely being based on spatial location at all -- For instance, all the object-based attention studies which suggest we attend to objects, not locations. This type of attention, which superficially at least seems very similar to the attention we think of when we think of attention as a "spotlight" or "donut", is clearly not based on low-level receptive fields, since it must be operating at a level in the visual system where "object" is a well-defined term.

2/11/2006 11:44:00 PM  
