Neural Codes for Perception and Imagery
Mental imagery may seem too abstract to test scientifically. However, this week's posts have reviewed cognitive neuroscience approaches to mental imagery, and to how it may differ from perception, using information processing analyses as well as detailed case studies of neurological patients. Overall, it appears that we use largely the same neural mechanisms for mental imagery as we do for everyday perception, even though a few patients show selective damage to either perception or imagery (and, as discussed yesterday, there is reason to doubt the validity of many of these cases). Does evidence from brain imaging technology (fMRI, SPECT) support this conclusion?
Kosslyn and Thompson showed that under most conditions, one will observe neural activity in primary visual areas (PVAs) when subjects are asked to imagine a visual object. But according to Kosslyn and Thompson's analysis, relatively simple, non-spatial mental imagery may actually fail to elicit PVA activity. This raises a representational issue: if this activity is truly absent (as opposed to merely too small to measure), then what kinds of representations are activated in these cases?
Both Kosslyn and Pylyshyn (the two primary opponents in the "great imagery debate") posit that structural descriptions are the storage format of long-term memory. That is, long-term memory encodes information in an amodal format consisting of symbols and spatial relations between those symbols (e.g., above, below, to the left of). Where Kosslyn and Pylyshyn differ, at least on this part of the debate, is that Pylyshyn claims such amodal, symbolic codes are sufficient for all mental "imagery" tasks. Pylyshyn argues that many of the behavioral results consistent with a "pictures in the brain" account of mental imagery are actually "faked" by participants, because they believe that is what is expected of them (a problem known in the literature as "demand characteristics"). Pylyshyn has shown that when the tasks are altered so as to remove these demand characteristics, participants produce behavioral results consistent with his view, in which activation of knowledge in long-term memory is sufficient for all "mental imagery" tasks.
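To make the notion of a structural description concrete, here is a toy sketch in Python (a hypothetical illustration, not Kosslyn's or Pylyshyn's actual formalism): a scene is stored as symbols plus amodal spatial relations, and a propositional query can be answered by symbol lookup alone, with no picture-like representation involved.

```python
# A toy "structural description": symbols plus amodal spatial relations.
# (Hypothetical illustration only; not either theorist's actual formalism.)
from dataclasses import dataclass

@dataclass(frozen=True)
class Relation:
    subject: str    # symbol for one part
    relation: str   # amodal relation such as "above" or "left-of"
    object: str     # symbol for another part

# "A lamp above a table, which is to the left of a chair", stored with no pixels:
scene = {
    "symbols": {"lamp", "table", "chair"},
    "relations": [
        Relation("lamp", "above", "table"),
        Relation("table", "left-of", "chair"),
    ],
}

# A question like "is the lamp above the table?" is answered by symbol lookup
# alone; no depictive (picture-like) representation is consulted.
def holds(description, subject, relation, obj):
    return Relation(subject, relation, obj) in description["relations"]

print(holds(scene, "lamp", "above", "table"))  # True
```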
However, one major flaw plagues Pylyshyn's logic when it comes to the aforementioned brain imaging data: if long-term memory really contained directly accessible, highly detailed information about the visual world, why would subjects ever activate primary visual areas, as seen in fMRI studies of mental imagery? Such activation would be unnecessary, because long-term memory would already contain all the tacit knowledge needed to "fake" the reaction times (RTs).
Therefore, the most plausible explanation of PVA recruitment in highly detailed imagery tasks is that subjects cannot directly interpret the highly detailed information residing in long-term memory; they must instead "decompress" it into a format consistent with that used in primary visual areas. (In contrast, long-term memory can provide relatively simple, structural information, which suffices in some low-detail imagery tasks without requiring full recruitment of the PVAs.)
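A minimal sketch of what this "decompression" step might amount to, under the assumption that imagery renders amodal relations into a 2D, retinotopic-like buffer (the grid, coordinates, and rendering rule below are all invented for illustration; relations are written as plain (subject, relation, object) tuples):

```python
# Toy "decompression" of a structural description into a depictive buffer.
# Relations are (subject, relation, object) tuples; the description is
# assumed to be connected, and the grid size/placement rule is arbitrary.

def decompress(relations, size=5):
    offsets = {"above": (-1, 0), "below": (1, 0),
               "left-of": (0, -1), "right-of": (0, 1)}
    inverse = {"above": "below", "below": "above",
               "left-of": "right-of", "right-of": "left-of"}
    # Seed the first-mentioned symbol at the centre of the buffer, then place
    # every other symbol relative to something already placed.
    coords = {relations[0][0]: (size // 2, size // 2)}
    pending = list(relations)
    while pending:
        for subj, rel, obj in list(pending):
            if subj in coords and obj not in coords:
                dr, dc = offsets[inverse[rel]]
                r, c = coords[subj]
                coords[obj] = (r + dr, c + dc)
            elif obj in coords and subj not in coords:
                dr, dc = offsets[rel]
                r, c = coords[obj]
                coords[subj] = (r + dr, c + dc)
            if subj in coords and obj in coords:
                pending.remove((subj, rel, obj))
    # "Render" each symbol into a 2D character array standing in for a
    # retinotopic buffer.
    buffer = [["." for _ in range(size)] for _ in range(size)]
    for symbol, (r, c) in coords.items():
        buffer[r][c] = symbol[0].upper()
    return buffer

relations = [("lamp", "above", "table"), ("table", "left-of", "chair")]
for row in decompress(relations):
    print(" ".join(row))
```

The point of the toy is only that answering questions about fine visual layout requires building something buffer-like, whereas the relational facts were available before any rendering took place.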
Importantly, this viewpoint is consistent with the observation that mental imagery appears to work only with interpreted versions of visual stimuli (Chambers et al., 1985, cited by Logie, 2003). Imagine that you are presented with an ambiguous line drawing (pictured at the beginning of the article; in this case it can be seen as either a duck or a rabbit), which is removed as soon as you have identified one interpretation of it. If you are then asked to mentally image the drawing, you are unable to determine whether a second interpretation is possible. However, as soon as you are shown the original drawing again, finding the second interpretation is easy.
To summarize, subjects are incapable of decompressing visual information from associative memory into a format that is semantically neutral; semantic interpretations are pre-associated with visual information in the representational format used by associative memory. As suggested by Kosslyn, the flow of information is reversed in imagery, relative to visual perception: long-term memories are activated, and the visual data associated with them is projected back to primary visual areas. However, this pattern of activity includes the semantic processing that was originally associated with the image. The image cannot be reinterpreted precisely because the interpretation is driving the imagery!
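To make the claimed asymmetry concrete, here is a purely illustrative toy (not Kosslyn's actual model): bottom-up perception runs every recognizer over the raw input and can settle on either reading, whereas imagery is generated from the stored interpretation and, when queried, simply hands that interpretation back.

```python
# Toy contrast between bottom-up perception and top-down imagery.
# (Illustrative only: the "recognizers" are stand-ins for real pattern
# recognition, and the stored memory format is a made-up simplification.)

RECOGNIZERS = {
    "duck":   lambda features: "beak" in features,
    "rabbit": lambda features: "ears" in features,
}

# The ambiguous drawing: the same strokes support both readings.
drawing = {"beak", "ears", "eye"}

def perceive(features):
    """Bottom-up: the raw input is present, so every recognizer gets a chance."""
    return {label for label, matches in RECOGNIZERS.items() if matches(features)}

def memorize(features):
    """Encoding binds the interpretation reached first to the stored visual data."""
    label = sorted(perceive(features))[0]   # viewer happens to settle on "duck"
    return {"interpretation": label, "visual": frozenset(features)}

def imagine(memory):
    """Top-down: the stored interpretation drives the generated image, so asking
    the image 'what else could this be?' just returns that interpretation."""
    return {memory["interpretation"]}

memory = memorize(drawing)
print(perceive(drawing))   # {'duck', 'rabbit'}: the percept can be reinterpreted
print(imagine(memory))     # {'duck'}: the image is locked to the stored reading
```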
In conclusion, the representations used by pattern recognition and mental imagery differ in that perceptual input can be semantically reinterpreted, while mental imagery cannot; they also differ in that perceptual input always engages the PVAs, while mental imagery elicits PVA activity only when detailed information is required. On the other hand, mental imagery and pattern recognition use many of the same information processing components, as illustrated both by the rarity of double dissociations and by Kosslyn's parsimonious architecture of high-level vision.
4 Comments:
Another good post. I find Pylyshyn to be a most irritating sort of gadfly; at this point I think he is arguing just to argue. He's obviously off the mark, though I don't think Kosslyn's entirely correct either.
I can't help but think (and this is entirely intuitive) that the LGN is somehow implicated in all this, given the enormous amount of efferent fibre going unaccounted for.
Either that, or precise pattern feedback from the rhinal cortex/hippocampus that activates associational networks.
Anyway, good post.
Hi Dan! Thanks for stopping in (and leaving such encouraging comments :)
I agree with you that there's a lot more going on with the LGN than it's given credit for. LGN, rhinal cortex, hippocampus, and lots of subcortical structures interact in ways we barely understand. I'd love to do some posts on that stuff but I don't frequently see new research on it...
I don't either, which makes me think 1 of 2 things.
1. I am a genius and I need to pursue this, become famous, then schizophrenic, wander around an Ivy league campus unbathed for a while until I win a Nobel prize and someone makes a treacly movie about me, or
2. I am a moron and there's an obvious reason that nobody's bothering.
I lean towards number 2, except when I am falling asleep on the bar next to an empty bottle of Jagermeister and a whore named Trixie who takes out her teeth for an extra buck fifty.
Great post and comments.
Just a follow-up on the LGN and its possible role in imagery, dreams, etc. The corticogeniculate projection predates our particular skills in mental imagery; whatever role it has goes far back in time.
I think that the heavy back-projection from layer 6 of primary visual cortex to the lateral geniculate nucleus (LGN) has a much more mundane but critical role: cleaning up temporal noise in raw retina-LGN-cortex spike trains.
A microsecond strobe to the retina will evoke a messy wave of ganglion cell activity across a large surface. These cells have a wide range of conduction velocities and target neurons at different distances. The temporal dispersion and noise of this activity wave will increase in the LGN and in layer 4 of cortex. Noise of this type will make it more difficult to decode complex visual scenes—detecting the predator in the tree or the tasty bug in the grass.
The good news is that the LGN also projects to layer 6, and layer 6 has a massive projection back to principal neurons in LGN. One can imagine a fast feedback loop that would effectively tweak or bias membrane potentials of principal neurons to increase or decrease thalamic latency. This may be a flexible system that would reduce temporal dispersion, "produce" simultaneity, and effectively deconvolve time and space. We have a lot of built-in calibrators: blinking, eye and head movements.
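A rough numerical sketch of that feedback idea, using made-up latencies and a simple proportional correction rather than anything tied to real layer-6-to-LGN physiology:

```python
# Toy model of temporal dispersion in a retina->LGN volley and a corrective
# feedback bias. (Made-up numbers and a simple proportional rule; not a model
# of actual layer-6-to-LGN physiology, just an illustration of the idea.)
import random

random.seed(1)

# Arrival latencies (ms) of one volley: a shared mean plus per-fibre spread
# from differing conduction velocities and path lengths.
latencies = [30.0 + random.gauss(0.0, 5.0) for _ in range(200)]

def dispersion(ts):
    """Standard deviation of arrival times, as a crude measure of temporal smear."""
    mean = sum(ts) / len(ts)
    return (sum((t - mean) ** 2 for t in ts) / len(ts)) ** 0.5

def apply_feedback(ts, gain=0.7):
    """Cortical layer 6 estimates the volley's mean arrival time and biases each
    relay cell to hasten late responses and retard early ones."""
    target = sum(ts) / len(ts)
    return [t + gain * (target - t) for t in ts]

print(f"dispersion before feedback: {dispersion(latencies):.2f} ms")
print(f"dispersion after feedback:  {dispersion(apply_feedback(latencies)):.2f} ms")
```

With a gain of 0.7 the spread shrinks to roughly a third of its original value, which is the whole job the feedback is hypothesized to do in this toy.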
There is now modest experimental support for this idea; Adam Sillito has done most of this work. The idea requires precise topography of the feedback, and even LGN lamina-specific targeting by layer 6 neurons.
The idea that the LGN is a big on-off switch is (perhaps obviously) a very crude simplification. You do not need 1.5 million "extra" neurons for an on-off switch.
Why does the visual system need multiple and perhaps plastic time-base correctors? Probably because the temporal structure of waves of visual activity varies so much as a function of light level, attention level, stimulus eccentricity, and movement. For example, cones are wicked fast compared to rods. Even within retinal cell classes, response latency is a function of brightness.
A striking but crude demonstration of the time-base correction requirement of the primary visual system: place one or more red light-emitting diodes (a clock will work fine) close to the frame of a window in a room that you can darken almost completely. Stay in the dark room for a few minutes so that your rods are comfortably adapted. Then gently rock one open eye back and forth with your index finger while looking at the LED and the window (the window should be slightly brighter than the background). If you have this set up correctly, the LEDs will not remain in the same position relative to the window frame. At a particular oscillation frequency you may even be able to perceptually move the LEDs into the window and back out. This shows that you have exceeded the time-base corrector's design specifications: the system cannot correct for the huge temporal lag between scotopic (rod) and photopic (cone) vision when you add the extra image motion introduced by your finger.
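A back-of-the-envelope version of why the LED appears to break free of the window frame (the latency gap and eye velocity below are illustrative guesses, not measured values): the apparent offset is roughly the rod-versus-cone latency difference multiplied by how fast the eye is being rotated.

```python
# Rough arithmetic for the LED demonstration (all numbers are illustrative
# assumptions, not measured values).
rod_latency_ms = 120.0          # assumed scotopic (rod-driven) latency
cone_latency_ms = 30.0          # assumed photopic (cone-driven) latency
eye_velocity_deg_per_s = 40.0   # assumed angular speed while rocking the eye

latency_gap_s = (rod_latency_ms - cone_latency_ms) / 1000.0
apparent_offset_deg = latency_gap_s * eye_velocity_deg_per_s
print(f"apparent LED/window offset: ~{apparent_offset_deg:.1f} degrees")
# ~3.6 degrees of visual angle: plenty to carry the LED across a nearby frame.
```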