Development of Visual Binding
Mareschal & Bremner's chapter in the new volume of the Attention & Performance series describes how object location and object identity information may come to be bound together as a function of prefrontal development. Specifically, the authors show that human infants are incapable of attending to both object identity and object location information at once, and that this inability likely arises from a lack of attentional capacity rather than from more stimulus-driven causes. They then demonstrate how binding might be accomplished by a simple temporal synchrony mechanism in a neural network model.
The authors first describe how the dorsal and ventral visual processing streams have often been considered independently responsible for location and identity information, respectively, but increasing evidence supports the idea that they are integrated to some extent in human adults. For example, sensitivity to motion - what would generally be considered a "dorsal" task - has been observed in ventral areas. Nonetheless, an abundance of developmental data suggests that the two streams are indeed functionally segregated in the developing brain of infants.
To demonstrate this phenomenon, the authors extended some previous work by Mareschal and Johnson. (Note to the casual reader: you may wish to skip the italicized section below, which contains methodological details - the results are summarized below that).
The authors began by familiarizing infants to a display that contained either two female faces, two monochromatic asterisks, or two toys, one at either side of the display, and two white rectangles in the center. In each of these displays, infants were familiarized with the movement of two objects of the same type moving from the sides of the screen towards the center, and then back to the periphery. While the objects were located in the center of the screen, they were occluded by the two white rectangles.
After the infants had been familiarized with these displays, the authors manipulated what would happen after the objects had become occluded. In a baseline trial, the objects would reappear after occlusion as though nothing had changed. This is the only trial type in which a physically possible event occurred. In "location trials," both objects would appear under the same white rectangle after occlusion - as though one of the objects had mysteriously changed location while occluded. In "identity trials," one of the objects would be replaced by a novel object while occluded. Finally, in "binding trials," the objects switched locations, so that both the identities and the locations of the objects in the display remained the same as before occlusion, but the specific combination of those two forms of information had changed: "object identity #1" now occupied "object location #2," and vice versa.
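The logic of the four trial types can be made concrete with a small sketch. This is only an illustration of the design as summarized above, not the authors' stimulus code: the display representation (a list of identity/location pairs), the labels "A", "B", "C", "left", "right", and all function names are assumptions introduced here.

```python
# Illustrative sketch only: a display is a list of (identity, location) pairs.
# All names and labels below are hypothetical, not taken from the chapter.

def baseline_trial(display):
    """Objects reappear unchanged (the only physically possible event)."""
    return list(display)

def location_trial(display):
    """Both objects reappear under the same rectangle."""
    (id1, loc1), (id2, _loc2) = display
    return [(id1, loc1), (id2, loc1)]

def identity_trial(display, novel="C"):
    """One object is replaced by a novel object while occluded."""
    (id1, loc1), (_id2, loc2) = display
    return [(id1, loc1), (novel, loc2)]

def binding_trial(display):
    """Objects swap places: same identities, same locations, new pairing."""
    (id1, loc1), (id2, loc2) = display
    return [(id1, loc2), (id2, loc1)]

def what_changed(before, after):
    """Report which kind of information differs between two displays."""
    return {
        "identity": sorted(i for i, _ in before) != sorted(i for i, _ in after),
        "location": sorted(l for _, l in before) != sorted(l for _, l in after),
        "binding": sorted(before) != sorted(after),
    }

display = [("A", "left"), ("B", "right")]
print(what_changed(display, binding_trial(display)))
```

The point of the sketch is that a binding trial leaves the set of identities and the set of locations untouched, so it can only be detected by an observer who keeps track of which identity goes with which location.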
If infants could detect a difference between the possible and impossible events, one would expect a large increase in their mean looking time to the display - infants show a robust novelty preference, in which impossible or otherwise surprising events elicit longer looking times than possible or otherwise familiar events.
As in previous work, the authors found that infants demonstrated sensitivity to either object location or object identity, but not both (i.e., there was no change in looking times to binding trials relative to baseline). However, in addition to these findings, the authors showed that infants tended to show sensitivity to object-identity information only for faces, and tended to show sensitivity to object-location information only for toys. Therefore, infants had not built up some kind of an attentional bias to process one or the other type of object information, but were instead flexibly maintaining object-identity or object-location information based on the type of object that had been presented.
The authors then describe how the binding together of identity and location information could be a later-developing capacity than representing information about either identity or location alone. They constructed a six-layer model with 400 visual input units, which projected both to 5 object recognition units (as a kind of "ventral" stream) and to 100 recurrently connected units (as a kind of "dorsal" stream). These recurrent units projected to another layer of 75 hidden units. The hidden units and the 5 object recognition units both projected to an output layer of 100 units; in addition, the 75 hidden units projected to a "predicted location" layer of 100 units. The ventral stream was trained through a self-organizing learning algorithm, while the dorsal stream was trained with backpropagation via the predicted location layer.
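To keep the layer sizes and projections straight, here is a rough bookkeeping sketch of the architecture as described above. The layer names are labels chosen here for convenience, and full connectivity between connected layers (with no bias terms) is an assumption made purely for the weight count; the chapter summary does not specify either.

```python
# Bookkeeping sketch of the described architecture.
# ASSUMPTIONS: layer names are hypothetical labels; every projection is
# treated as fully connected with no bias terms.

LAYERS = {
    "input": 400,               # visual input units
    "object_recognition": 5,    # "ventral" stream
    "recurrent": 100,           # "dorsal" stream, recurrently connected
    "hidden": 75,
    "output": 100,
    "predicted_location": 100,
}

# (source, target) projections as described in the text
PROJECTIONS = [
    ("input", "object_recognition"),
    ("input", "recurrent"),
    ("recurrent", "recurrent"),          # recurrent connections
    ("recurrent", "hidden"),
    ("hidden", "output"),
    ("object_recognition", "output"),
    ("hidden", "predicted_location"),
]

def weight_count(src, dst):
    """Number of weights in one projection, assuming full connectivity."""
    return LAYERS[src] * LAYERS[dst]

def total_weights():
    """Total weights across all projections under the same assumption."""
    return sum(weight_count(s, d) for s, d in PROJECTIONS)

for src, dst in PROJECTIONS:
    print(f"{src:>18} -> {dst:<18} {weight_count(src, dst):>6}")
print("total:", total_weights())
```

Under these assumptions the largest single projection is the one from the input to the recurrent "dorsal" units, while the "ventral" pathway is tiny by comparison.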
Unlike a previous network of identical design, which could only maintain information about a single object at one time, this network was made to accept pulsed firing (what they call "peaks") rather than rate codes. This gave the network the ability to predictively track multiple objects simultaneously. To quote from the chapter itself: "Thus, for example, it is possible to encode the location of object 1 on peak 3 and that of object 2 on peak 1 down the dorsal stream, while encoding the identity information of object 1 on peak 2 and that of object 2 on peak 4 down the ventral stream." Weights were changed on the basis of all peaks, such that the connection weights came to represent general properties of objects. However, when binding was required, the prefrontal layer needed to "align" the proper peaks of activation so as to bind them through temporal synchrony.
The prefrontal layer accomplished this by cycling among the different pulses from the two streams until it arrived at a feature-location pair that produced the least error. Although this may seem like a theoretical weakness, it may help explain why binding tasks are frequently accompanied by a brief burst of gamma-band activity, in which the brain may be realigning representations in the dorsal and ventral streams. Furthermore, it demonstrates how binding information might be more difficult to maintain than object identity or location information alone.
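The alignment idea can be sketched as a search over pairings of ventral and dorsal peaks. This is a toy stand-in, not the chapter's mechanism: it swaps in an exhaustive search over permutations for the model's cycling process, and the error function, the identity and location labels, and the `expected` lookup are all hypothetical. The peak numbers follow the quoted example.

```python
from itertools import permutations

# Toy stand-in for peak alignment. ASSUMPTIONS: exhaustive search replaces
# the model's cycling; the error function and labels are hypothetical.

def align_peaks(ventral, dorsal, error):
    """Pair each ventral peak with a dorsal peak so total error is minimal.

    ventral: dict mapping peak number -> identity
    dorsal:  dict mapping peak number -> location
    error:   function (identity, location) -> non-negative number
    Assumes both streams carry the same number of peaks.
    """
    v_peaks = sorted(ventral)
    best, best_err = None, float("inf")
    for d_order in permutations(sorted(dorsal)):
        pairing = list(zip(v_peaks, d_order))
        err = sum(error(ventral[v], dorsal[d]) for v, d in pairing)
        if err < best_err:
            best, best_err = pairing, err
    return best, best_err

# Peaks as in the quoted example: identities on ventral peaks 2 and 4,
# locations on dorsal peaks 3 and 1.
ventral = {2: "object1", 4: "object2"}
dorsal = {3: "left", 1: "right"}

# Hypothetical "correct" bindings used to score a candidate pairing.
expected = {"object1": "left", "object2": "right"}

def mismatch(identity, location):
    return 0 if expected[identity] == location else 1

pairing, err = align_peaks(ventral, dorsal, mismatch)
print(pairing, err)  # [(2, 3), (4, 1)] 0
```

Even in this toy form, the number of candidate pairings grows factorially with the number of objects, which gives some intuition for why binding could be costlier to establish and maintain than identity or location alone.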
It is important to note that temporal synchrony mechanisms are still very controversial, particularly with regard to their possible role in accomplishing binding. Other mechanisms certainly exist which could support the same role, and which rely on other established properties of neural computation. Nonetheless, spike timing is known to carry information, and so temporal synchrony mechanisms seem to be plausible, at least.
3 Comments:
Chris,
I don't always understand your posts but I always thoroughly enjoy trying.
I recently ran across a paper I thought you might find interesting. It's called "Socioaffective factors modulate working memory in schizophrenia patients."
Just a few good quotes I dug up and saved simply because I liked them so much:
"Indeed, in healthy people, watching social interactions result in increased prefrontal activation, which suggests that socially-relevant information recruits PFC."
"Bleuler's concept of active attention is akin to the modern concept of WM. His hypothesis predicts that diminished affective drive will result in reduced active attention (i.e., WM), but the reverse may also be true; deficits in WM may result in reduced socio-affective functioning."
"The literature concerning the function of the PFC in emotion, social behavior, and WM yield a consistent theme: the PFC is essential when representationally-guided behavior is required whether cognitive, affective or social. [...] WM deficits in schizophrenia may be a core feature of the illness."
Anyway... I found this interesting because a lot of what was said seems to imply what I am hoping for: training the working memory can be useful for developing social awareness, etc. Believe it or not... this was my original intent for training working memory, among other things.
-- Dan (synovexh -AT- gmail.com)
Hi Dan - That definitely sounds interesting. Schizophrenia is obviously a very complex disease, and most of my research focuses on the so-called "normal" aspects of cognitive function. Also, given that social cognitive neuroscience is pretty much a whole field unto itself, I really don't have much to add to the quotes you selected. But from what I can tell it seems like a sound theory, and what you've quoted definitely touches base with established phenomena in the cognitive literature.
I wonder if you could get reaction time differences that reflect differences in social skill by testing adults on classic theory-of-mind tasks. I imagine you would. On the other hand, I'm not sure what other diagnostic measures there would be to which that could be compared... interesting nonetheless...
Some tangential thoughts triggered by your post. Thanks for the stimulation.