Generalization and Symbolic Processing in Neural Networks
Cognitive modeling with neural networks is sometimes criticized for failing to show generalization. That is, neural networks are thought to be extremely dependent on their training (which is particularly true if they are "overtrained" on the input training set). Furthermore, they do not explicitly perform any "symbolic" processing, which some believe to be very important for abstract thinking involved in reasoning, mathematics, and even language.
However, recent advances in neural network modeling have rendered this criticism largely obsolete. In this article from the Proceedings of the National Academy of Sciences, Rougier et al. demonstrate how a specific network architecture - modeled loosely on what is known about dopaminergic projections from the ventral tegmental area and the basal ganglia to prefrontal cortex - can capture both generalization and symbol-like processing, simply by incorporating biologically plausible mechanisms of neural computation. These include distributed representations, self-organizing (Hebbian) learning, error-driven learning (equivalent to contrastive Hebbian learning), reinforcement learning (via the temporal differences algorithm), lateral inhibition (via k-winners-take-all), and a biologically plausible activation function based on known properties of ionic diffusion across the neural cell membrane (via Leabra's point-neuron activation function).
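To give a concrete flavor of one of these mechanisms, here is a minimal Python sketch of k-winners-take-all lateral inhibition. It is only a caricature of the basic idea - keep the k most strongly driven units active and silence the rest - not Leabra's actual kWTA computation, which (roughly speaking) sets a layer-wide inhibitory conductance so that only about k units can cross threshold.

```python
import numpy as np

def kwta(net_input, k):
    """Toy k-winners-take-all: keep the k most strongly driven units,
    silence the rest. A caricature of lateral inhibition, not Leabra's
    actual kWTA function."""
    activation = np.zeros_like(net_input, dtype=float)
    winners = np.argsort(net_input)[-k:]      # indices of the k largest net inputs
    activation[winners] = net_input[winners]  # only the winners stay active
    return activation

# 8 units competing, only 2 allowed to remain active
print(kwta(np.array([0.1, 0.9, 0.3, 0.7, 0.2, 0.05, 0.6, 0.4]), k=2))
```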
The Rougier et al. network consists of 10 layers - 4 input layers, each of which projects to either a standard hidden layer (meant to represent sensory cortex; 83 units), a task hidden layer, or a cue hidden layer (these last two meant to represent posterior cortical areas; 16 units each). The hidden layers are in turn bidirectionally connected to a PFC layer (30 units), which includes recurrent self-excitatory projections. Both the PFC layer and the sensory cortex hidden layer are bidirectionally connected with the output "response" layer. Finally, an "adaptive gating" unit, which learns about reward contingencies across time, projects to the PFC (partially addressing a lurking criticism that many neural network models are "temporally static" in comparison to approaches like liquid state machines).
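For readers who think better in code, here is a schematic of that connectivity. The layer sizes are the ones quoted above; everything else (the layer names, how the input layers map onto the hidden layers, the response layer's size) is my own illustrative reading of the description, not the authors' actual Leabra project.

```python
# Layer sizes as quoted above; None marks sizes the text does not give.
layer_sizes = {
    "inputs": None,            # the 4 input layers
    "hidden_sensory": 83,      # "sensory cortex" hidden layer
    "hidden_task": 16,         # task hidden layer (posterior cortex)
    "hidden_cue": 16,          # cue hidden layer (posterior cortex)
    "pfc": 30,                 # prefrontal layer
    "response": None,          # output layer
    "adaptive_gating": 1,      # reward-trained gating unit
}

# (source, target, kind) triples read off the description above;
# each input layer projects to just one of the three hidden layers.
projections = [
    ("inputs", "hidden_sensory", "feedforward"),
    ("inputs", "hidden_task", "feedforward"),
    ("inputs", "hidden_cue", "feedforward"),
    ("hidden_sensory", "pfc", "bidirectional"),
    ("hidden_task", "pfc", "bidirectional"),
    ("hidden_cue", "pfc", "bidirectional"),
    ("pfc", "pfc", "recurrent self-excitation"),
    ("pfc", "response", "bidirectional"),
    ("hidden_sensory", "response", "bidirectional"),
    ("adaptive_gating", "pfc", "gating"),
]
```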
Simulations within this framework began by training the network on 4 simple tasks. One task required the network to produce output corresponding to only one of the dimensions present in the input (stimuli varied in shape, color, size, location, and texture). Another required it to match two stimuli along a single dimension, or to compare their relative values along that dimension. Rougier et al. note that one critical feature is shared by all these tasks: each requires the network to attend to only a single feature dimension at a time.
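To make those tasks concrete, here is a toy sketch of what the training items might look like. The five dimensions come from the description above; the choice of four discrete values per dimension is purely an assumption for illustration, and the actual stimulus coding in the paper differs.

```python
import random

DIMENSIONS = ["shape", "color", "size", "location", "texture"]
N_VALUES = 4  # assumed number of feature values per dimension (illustrative)

def make_stimulus():
    """A stimulus is one feature value on each of the five dimensions."""
    return {dim: random.randrange(N_VALUES) for dim in DIMENSIONS}

def naming_trial():
    """Naming-style task: the correct output is the stimulus's value on a
    single cued dimension, ignoring all the other dimensions."""
    stimulus = make_stimulus()
    cued_dim = random.choice(DIMENSIONS)
    return stimulus, cued_dim, stimulus[cued_dim]

def matching_trial():
    """Matching-style task: report whether two stimuli share the same value
    on the cued dimension."""
    a, b = make_stimulus(), make_stimulus()
    cued_dim = random.choice(DIMENSIONS)
    return a, b, cued_dim, a[cued_dim] == b[cued_dim]
```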
After this training, the prefrontal layer had developed some striking representational properties. In particular, it had developed abstract representations of feature dimensions, such that each unit in the PFC seemed to code for an entire stimulus dimension, such as "shape" or "color." This is the first time (to my knowledge) that such abstract, symbol-like representations have been observed to self-organize within a neural network.
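To make "coding for an entire dimension" concrete, one simple (and purely illustrative, not the paper's) analysis would be to average a unit's activity over all trials on which a given dimension was task-relevant, regardless of which particular feature value appeared. A dimension-coding unit is one whose activity tracks the cued dimension but not the specific feature.

```python
from collections import defaultdict

def dimension_profile(unit_activity, cued_dims):
    """Mean activity of a single unit, grouped by which dimension was cued.
    A dimension-coding 'shape' unit would show a high mean on shape trials
    regardless of which shape appeared, and low means on all other trials.
    Illustrative analysis only."""
    totals, counts = defaultdict(float), defaultdict(int)
    for act, dim in zip(unit_activity, cued_dims):
        totals[dim] += act
        counts[dim] += 1
    return {dim: totals[dim] / counts[dim] for dim in totals}
```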
Furthermore, this network also showed powerful generalization ability. If the network was provided with novel stimuli after training - i.e., stimuli containing conjunctions of features that had not been part of the training set - it could nonetheless deal with them correctly. This demonstrates clearly that the network had learned sufficiently abstract, rule-like representations of the tasks to behave appropriately in a novel situation. Further explorations with reduced versions of the network confirmed that the "whole enchilada" was necessary for this performance: networks without an adaptive gating unit, or with an additional context layer connected to the PFC layer instead (making them equivalent to a simple recurrent network, or SRN), did not demonstrate equivalent generalization.
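The logic of that generalization test is simply to hold out some feature conjunctions during training and probe them afterwards. Here is a minimal sketch of that logic, assuming five dimensions with four values each and an arbitrary 70/30 split - both assumptions mine, not the paper's exact design.

```python
from itertools import product
import random

dimensions = ["shape", "color", "size", "location", "texture"]
n_values = 4  # assumed values per dimension, as in the sketch above

# Enumerate every possible conjunction of feature values, then hold some out.
all_conjunctions = list(product(range(n_values), repeat=len(dimensions)))
random.shuffle(all_conjunctions)
cut = int(0.7 * len(all_conjunctions))
train_set = set(all_conjunctions[:cut])
novel_set = set(all_conjunctions[cut:])   # conjunctions never seen in training

assert train_set.isdisjoint(novel_set)
print(f"{len(train_set)} training conjunctions, {len(novel_set)} novel test conjunctions")
```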
Rougier et al. also explored the effects of damaging the prefrontal layer and then administering tests like the Stroop task and the Wisconsin Card Sorting Task to the network. In both cases, the network exhibited disproportionate increases in perseverative errors, just as seen in patients with prefrontal damage.
Thus, this computational model demonstrates that neural networks need not be temporally static, nor deficient in generalization, nor stimulus-specific. Instead, they can demonstrate sensitivity to reward contingencies over time, can develop stimulus-general, rule-like representations, and can produce output based on those rules in response to novel stimuli. Interestingly, the development of these more abstract representations does not depend on increasing the recurrent connectivity within PFC, as it does in other simulations of the development of such abstract representations. Instead, these representations develop as a consequence of learning and stability within sensory cortex, providing one possible reason that prefrontal development in humans is so prolonged.
Related Posts:
Reexamining Hebbian Learning
Towards A Mechanistic Account of Critical Periods
Neural Network Models of the Hippocampus
Inhibitory Oscillations and Retrieval Induced Forgetting
Binding through Synchrony: Proof from Developmental Robotics
Task Switching in Prefrontal Cortex
Modeling Neurogenesis
The Mind's Eye: Models of the Attentional Blink
4 Comments:
If you are interested in self-organization, symbolic processing, and generalized abstraction in neural networks I highly recommend you read 'A Functional Approach to Neural Systems: The Recommendation Architecture' by L. Andrew Coward. It has all the same features (no pun intended) as the architecture you're describing, and is based largely on the intersection of his 30 years of work on large scale embedded systems (telephone switching computers) and current neurological research. I think you'll find it very interesting.
Wow! Thanks for the recommendation. I am reading the abstract but am having trouble wrapping my head around a lot of the terminology. It does not seem to be very similar to the terminology used in standard neural net stuff. Also, the only person to cite this work is the author himself...
If there's a peer-reviewed, well cited article that expresses the same ideas, in slightly more conventional language, I'd be more likely to read it.
Symbolic Processing: Jonathan Cohen's Stroop model (1990, Psych Review) develops 'symbolic' representations. I think it does so for the same reason the Rougier model does - localist (i.e. symbolic) input representations. Symbolic representations are easy to develop from symbolic inputs. Or have I got the wrong end of the stick?
Generalisation: I think it is unfair to say that neural nets have been criticised for failing to generalise. Neural networks are great at generalising - it is just, as you point out, that they don't always generalise in the way you want.
That aside, thanks for the summary and the paper looks like being a nice piece of work which is a logical progression of some very current research themes. I look forward to reading it properly ;-)
Hi Anonymous! Yes, you're correct that symbolic outputs are easy to get with symbolic inputs - the old "garbage in garbage out" trick of computational neuroscience.
But what makes Rougier et al. special is that they use distributed representations in the hidden layer. Because the mapping from input->hidden is not required to be symmetric with the mapping from hidden->output, this model truly self-developed symbolic representations, rather than this merely being a feature of the input.
Maybe you could say it developed symbols as a result of the *output* being symbolic, since it's an attractor network. But it is definitely worth reading the paper to see how delicate that type of representation is (they explore 4 other architectures, none of which develop such robust rule-like reps).
I completely agree that it's unfair to accuse neural nets of poor generalization, but it's something I hear from CS people all of the time (maybe because they use too many units [which guarantees problems with generalization], or because they overtrain - I don't know). I guess now they have one less reason to argue that, though! :)