Pictures in the Brain

The new issue of Scientific American Mind has a nice recap on the mental imagery debate, which has been going strong for over 10 years now. The debate proceeds roughly as follows: when we imagine something, are we merely activating its abstract, propositional representation in long term memory, or is imagination actually a process of neurally reinstantiating this distributed information into a visual form available for inspection and spatial manipulation?

Though this latter "depictive" theory may seem far-fetched, a large body of experimental evidence from Kosslyn and others supports the idea that we use the same neural machinery in mental imagery as in direct visual experience of the world. For example, when subjects are asked to imagine a rabbit next to a flea, they are much quicker to answer questions about small visual details on the rabbit (such as its whiskers) than subjects who imagined the rabbit next to an elephant; this result suggests that inspecting mental images requires a scale-sensitive "zooming" process, and is not simply propositional or code-like.

A similar result occurs when subjects are asked to imagine approaching an elephant or a smaller object until the image begins to "overflow" their frame of reference; the imagined distance at which subjects stop is remarkably consistent with the real distance at which such an object would eclipse an observer's field of view. Likewise, when subjects study a figure consisting of multiple features separated in space and then identify specific features on that figure from memory alone, reaction times scale with the distance between features, as though some "attentional spotlight" must physically shift from one location to another in the mind's representation.

Neuroimaging evidence also supports the claim that mental imagery recruits many of the same anatomical areas used for sensory perception. Patients with hemispheric neglect show this same pattern of results: those with right parietal lesions are impaired both at orienting to visible stimuli in the left hemifield and at orienting to imagined stimuli in that hemifield.

Yet the conclusions drawn from this research are hotly debated by researchers such as Zenon Pylyshyn, who believes that the behavioral evidence reflects both "tacit knowledge" on the part of the subjects and experimenter/task bias. Pylyshyn argues that subjects may understand the task as being to actively simulate visual perception, and hence produce reaction times consistent with visual perception, though these are merely artifacts of propositional encodings of features and the relations between them. Pylyshyn also points to experiments, described in the SciAmMind piece, which seem to support this artifactual view of imagery scanning times. Finally, several visual illusions (such as the Necker cube) and other visual phenomena are not seen in mental imagery; this poses a problem for theories that invoke the same neural machinery in imagery as in perception.

Who is right? Get a copy of Pylyshyn and Kosslyn's heated exchange (in which Pylyshyn calls Kosslyn's theory "grotesque," and Kosslyn calls Pylyshyn "nihilistic") in March 2003's Trends in Cognitive Sciences, and see for yourself. Like most theoretical schisms in cognitive psychology, the answer is likely to begin with the phrase "it depends" - but the precise nature of this dependency is both a far deeper and much less explored question.

Related Posts:
An Informal Integration of Object Recognition Models
The Attentional Spotlight
Mind's Eye: Models of the Attentional Blink
False Promise of View Invariance


Under The Rug: Executive Functioning

If you want to build an intelligent and biologically-plausible system, you of course need an actuator or motor subsystem, object recognition capability, several different kinds of memory capacity, and probably several other subsystems corresponding to various regions of the human brain. But what kind of subsystem would be capable of orchestrating these capacities and coordinating them to produce intelligent behavior? To put it another way, does intelligence consist entirely in interactions between various capacities, or is there a cognitively- and anatomically-distinct agent that coordinates them? These questions (albeit in slightly different form) are the same as those confronted in "executive functioning" research.

This field gets its name from Baddeley's proposal of a "central executive" subsystem in working memory, which for years seemed like nothing more than a placeholder. It was a convenient spot in which to hide those things that we couldn't accurately measure or didn't fully understand (such as attention, or visual binding). But recent work by Miyake, Friedman, Emerson, Witzki, and Howerter has begun to tease apart the component features of "executive functioning" and give us a much better idea of what functions may subserve intelligent behavior.

But how do you test intelligent behavior? The authors picked a battery of tasks commonly believed to load one or more of three executive subfunctions: shifting, updating, and inhibition. "Shifting" is the switching of attention back and forth between multiple responses, either in a dual-task paradigm or in a task requiring different responses under different conditions. "Updating" refers to monitoring and coding incoming information for relevancy, and then updating working memory representations with the more relevant information. Finally, "inhibition" refers to the deliberate suppression of dominant responses.

Many executive function tasks are plagued by "task impurity" problems: they have low test-retest or within-subject reliability, reflecting both the fact that executive functions rely on non-executive cognitive abilities (they are, after all, "coordinators") and the possibility that subjects' use of multiple strategies confounds the results. To mitigate these problems, the authors ensured that no participant had encountered any task before (performance is sensitive to repeat encounters) and also adopted a statistical approach known as latent variable analysis, or structural equation modeling. This approach allows one to test whether a small number of hidden variables (in this case, updating, shifting, and inhibition) can account for the variation seen across a number of manifest variables. The correlations between these latent variables allow one to assess whether a three-factor model fits the data significantly better than one involving only a single latent variable (some unitary "executive function") or one involving two of the three proposed subfunctions. Further analyses allow one to determine whether the proposed subfunctions have sufficiently distinct explanatory power to be truly considered different constructs.
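The intuition behind this latent variable approach can be illustrated with a toy simulation (made-up loadings, noise levels, and factor correlations — not the authors' data or their actual SEM analysis): generate three modestly correlated latent factors, derive nine noisy "tasks" from them, and observe that tasks sharing a factor correlate more strongly with each other than with tasks from other factors.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 137  # sample size on the scale of the study

# Three modestly correlated latent factors (shifting, updating, inhibition)
latent_cov = np.array([[1.0, 0.4, 0.4],
                       [0.4, 1.0, 0.4],
                       [0.4, 0.4, 1.0]])
latents = rng.multivariate_normal(np.zeros(3), latent_cov, size=n)

# Nine manifest tasks, three per factor: task = loading * factor + noise
loading, noise_sd = 0.7, 0.7
tasks = np.repeat(latents, 3, axis=1) * loading + rng.normal(0, noise_sd, (n, 9))

r = np.corrcoef(tasks, rowvar=False)
within = np.mean([r[i, j] for f in range(3)
                  for i in range(f * 3, f * 3 + 3)
                  for j in range(i + 1, f * 3 + 3)])
between = np.mean([r[i, j] for i in range(9) for j in range(i + 1, 9)
                   if i // 3 != j // 3])
print(f"mean within-factor r = {within:.2f}, between-factor r = {between:.2f}")
```

A confirmatory factor analysis essentially asks whether this kind of block structure in the correlation matrix is better explained by one, two, or three latent variables.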

Before delving into the results, it's important to review how each subfunction was measured. Shifting was measured by three tasks. In the plus-minus task, participants are given three lists of random numbers and asked to add 3 to each number on the first list, subtract 3 from each number on the second list, and alternate between adding and subtracting 3 on the third list; mean reaction times yield an index of the "switch cost" associated with shifting. In the number-letter task, participants respond one way to a number-letter pair presented in the top two quadrants of a computer display, and the opposite way if it appears in the bottom two quadrants; in the first two blocks of trials, the pairs are presented entirely in the top or bottom of the display, while in the final block they alternate between halves. The difference between mean reaction time in the third block and the mean of the first two blocks gives a measure of switch cost. Finally, in the local-global task, participants respond on the basis of either a global shape on the screen or the tiny shapes organized to create that larger shape (somewhat like ASCII art); the switch cost here is the RT difference between trials requiring a shift in response set and those repeating the previous set.
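The switch-cost measure used in the number-letter task reduces to a simple difference of means; a minimal sketch with hypothetical reaction times (all numbers here are invented for illustration):

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical RTs (ms) from a number-letter-style task:
# two pure blocks (no switching) and one alternating block
pure_block_rts = rng.normal(650, 80, size=(2, 48))   # blocks 1-2: single task
mixed_block_rts = rng.normal(900, 120, size=96)      # block 3: alternating

# Switch cost = mean RT in the alternating block minus
# the mean RT across the two pure blocks
switch_cost = mixed_block_rts.mean() - pure_block_rts.mean()
print(f"switch cost ≈ {switch_cost:.0f} ms")
```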

The three updating tasks were keep-track, tone-monitoring, and letter-memory. In the keep-track task, participants were presented with 15 words, for 1.5 seconds apiece, and had to remember the last word presented in each of six pre-determined categories; proportion of correct responses was the dependent variable (DV). In tone-monitoring, participants were presented with a series of 25 tones randomized as high, medium, or low, and had to respond on the fourth presentation of each tone type, again with proportion of correct responses as the DV. In the letter-memory task, subjects rehearsed letters out loud as they were presented and then recalled the last 4 letters in a given list, with proportion correctly recalled as the DV.
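The updating demand of the letter-memory task amounts to continuously replacing the oldest item in a small buffer as new items arrive; a minimal sketch (with a made-up letter stream):

```python
from collections import deque

def letter_memory(stream, span=4):
    """Continuously update a buffer holding the most recent `span` letters,
    discarding older items as new ones arrive (the updating subfunction)."""
    buffer = deque(maxlen=span)
    for letter in stream:
        buffer.append(letter)  # oldest letter drops out automatically
    return list(buffer)

print(letter_memory("TKBRNDGS"))  # → ['N', 'D', 'G', 'S']
```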

Finally, inhibition was measured with the antisaccade task (subjects must inhibit a saccade toward a visual cue in order to successfully detect a briefly presented target), the stop-signal task (subjects had to categorize stimuli except when presented with a brief tone), and the Stroop task.

The authors also administered five complex executive tasks, in order to assess how well each of their postulated subfunctions could account for the "messier" results provided by these more traditional measures of executive function: Wisconsin Card Sort (WCST), Tower of Hanoi, random number generation, operation span, and dual task. In WCST, participants must match their cards to a series of reference cards according to the dimension indicated by the experimenter (e.g., on the basis of color, shape, or number). The DV is the number of perseverative errors, in which subjects mistakenly sort by a dimension that is no longer relevant. In Tower of Hanoi, participants must move a series of disks from one peg to a third peg so that the pile looks identical at the end of the task as it did at the beginning; however, subjects are only allowed to move one disk at a time and can never place a larger disk on top of a smaller one (try it here). The total number of moves is the DV. Random number generation was measured with a variety of randomness indices. In operation span, participants must read aloud arithmetic equations and then a briefly presented word; after a certain number of equations (2-5), the participants must recall all the previously presented words. Finally, in the dual-task paradigm, participants had to complete as many paper-and-pencil mazes as possible while performing a word generation task out loud.

In what can only be called a mammoth of a study, 137 subjects were tested with only two outlier exclusions. The results of the study are reported below:
  • Each of the three posited executive functions was a separable, distinct construct, as confirmed by factor analysis in which a three-variable model fit the data significantly better than a single-factor model; further, predictions based on the three-factor model did not significantly deviate from the observed data, whereas the single-factor and all two-factor models were significantly worse; finally, none of the three factors was perfectly correlated with any other, reflecting independence (but some overlap, as would be expected of executive functions that coordinate other functions)
  • After determining factor loadings for the basic executive tasks, the factor loadings for the 5 complex executive tasks (WCST, Tower of Hanoi, Random Number Generation, Operation Span and Dual Task) did not significantly differ, suggesting that the empirically derived factor structure was highly reliable even for more complex tasks that involved multiple subfunctions
  • WCST loaded on the shifting function, not updating, and the contribution from inhibition was non-significant
  • Tower of Hanoi was better modeled with a single-path model from inhibition than with the no-path, other one-path, or three-factor models
  • Two components of the random number generation task, derived from the analysis of specific randomness indices and identified in previous literature as a "prepotent associates" component and an "equality of response usage" component, tapped inhibition and updating functions, respectively. This result is consistent with research using transcranial magnetic stimulation of the dorsolateral prefrontal cortex, which shows dissociable capacities for these two component factors ("prepotent associates"/inhibition, and "equality of response usage"/updating)
  • Operation span loaded on updating; other models were significantly worse
  • Finally, dual-task performance was not significantly related to any of the three postulated functions, possibly suggesting that it taps an executive function independent of the three postulated here (though conclusions from null results must be made with caution)

In summary, the authors remark that their results show both the unity and diversity of executive functions: while tasks can be created that load each individually, intelligent behavior on more complex tasks (and indeed in day-to-day functioning) is likely a result of complex interactions between these subfunctions. Still, several questions remain, such as how each of these constructs may map onto neuroanatomy, whether other factor structures explain the data even better, and how these constructs might relate to more traditional measures of intelligence (e.g., Gc, Gf). Thankfully, there is some preliminary evidence bearing on two of these questions; stay tuned for reviews of some relevant evidence in this week's upcoming posts.

Related Posts:
Task Switching in Prefrontal Cortex
Active Maintenance and the Visual Refresh Rate
The Tyranny of Inhibition
Selection Efficiency and Inhibition


Modeling Neurogenesis

It's been years since we learned the falsity of that old claim, "you're born with all the neurons you'll ever have," but the molecular mechanisms of neurogenesis have remained fairly mysterious. The functional role of neurogenesis remains unclear as well: why is adult neurogenesis primarily limited to the subgranular zone of the dentate gyrus in hippocampus? A better understanding of the biological conditions that signal adult neurogenesis could inform computational models of this process, and perhaps clarify the role that neurogenesis serves in these specific brain regions. These are the issues tackled by Lledo, Alonso, and Grubb in the current online edition of Nature Reviews Neuroscience.

Adult hippocampal neurogenesis is known to be increased in cab drivers, in rats placed in enriched environments, and even in certain seed-caching birds. This suggests that some mechanism identifies the need to learn and remember more spatial locations in these populations, and then presumably signals the development of new neurons. Indeed, new neurons have been shown to be more sensitive to novel inputs than other neurons, suggesting that they are actively generated, under some conditions, in response to novelty.

These new neurons are also more excitable than their elders, in part due to the fact that new neurons are excited by a neurotransmitter that is normally inhibitory - GABA. This excitatory effect appears to be crucially linked to chloride ion channels, in that blocking these channels results in GABA becoming inhibitory in new neurons as it is in older neurons. NMDA receptor activation also appears necessary for new cell survival; interestingly, survival is improved not by some absolute magnitude of NMDA activation, but rather an amount that is high with respect to a neuron's elder neighbors. New neurons also show increased long-term potentiation. Functionally speaking, these characteristics all make sense: new neurons must be "immune" to inhibition (so that they can more fairly compete with established neurons for representing new inputs) but must also show that they are actually needed (i.e., they are receiving a large amount of excitatory input relative to others).

On a molecular level, neurogenesis may occur in a process roughly similar to the following: astrocytes actively express growth proteins dependent on the local patterns of neural activity and the local neural density, which then upregulate the proliferation of adult hippocampal stem cells and encourage them to adopt a specific fate as neurons. The expression of WNT proteins also has a role in promoting neurogenesis. Interestingly, neurons appear to promote the differentiation of these stem cells into oligodendrocytes without an increase in neurogenesis, suggesting a possible pathway for negative feedback in the production of new neurons from adult neural stem cells. This view of neurogenesis is consistent with an emerging view of the active role astrocytes may play in neural functioning. For far more details on the molecular mechanisms of neurogenesis, see this excellent in press article from Nature Reviews Neuroscience.

The CA3 region of the hippocampus has been studied as a possible candidate for creating the kinds of distributed representations that likely underlie memory functions. If these representations begin to overlap, and are hence less distinct, the network may become subject to catastrophic interference in memory recall, in which some memories overwrite others. Neurogenesis in the hippocampus could therefore be a way of maintaining capacity for distributed representations and avoiding catastrophic interference. In fact, the neurons generated by astrocytic signalling of progenitor cells (which are of both inhibitory and excitatory types) send axonal projections into CA3.

This view is also compatible with evidence that adult neurogenesis may contribute to the learning and memory functions of hippocampus, as well as with computational models of both olfactory bulb circuitry (in which neurogenesis better orthogonalizes new sensory representations and may relate to improved olfactory discrimination) and the dentate gyrus layer of hippocampal networks (in which neurogenesis reduces interference between stored representations and hence improves recall). Other computational models of neurogenesis, such as the cascade correlation algorithm, are less biologically constrained and work on slightly different principles: new units are added to better orthogonalize existing representations rather than to prepare for better orthogonalization of new ones. See this excellent review for more information on computational models of neurogenesis.
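The shared intuition behind these models — that adding units reduces the overlap (and hence interference) between stored patterns — can be sketched with sparse random codes. This is a toy illustration, not any of the cited models:

```python
import numpy as np

rng = np.random.default_rng(2)

def mean_overlap(n_units, n_active, n_patterns=50):
    """Mean pairwise overlap of sparse random codes with n_active of
    n_units active — a crude proxy for interference between memories."""
    codes = np.zeros((n_patterns, n_units))
    for c in codes:
        c[rng.choice(n_units, n_active, replace=False)] = 1.0
    sims = codes @ codes.T / n_active  # fraction of shared active units
    return sims[np.triu_indices(n_patterns, k=1)].mean()

# Adding units (as in neurogenesis) lowers overlap between stored patterns
small = mean_overlap(n_units=100, n_active=10)
large = mean_overlap(n_units=400, n_active=10)
print(f"overlap with 100 units: {small:.3f}; with 400 units: {large:.3f}")
```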

In summary, the current evidence appears to converge on an understanding of neurogenesis as a response to low neural density or astrocyte-dependent brain activity, with the possible purpose of preparing a network with the ability to recruit new units if necessary. As pointed out by Lledo, Alonso, and Grubb, this might be considered a form of "metaplasticity" - in other words, changes that facilitate further changes.

Related Posts:
Neurogenesis in Kids, Adults, and Silicon
A Role for Protein in Learning and Memory


A Role for Protein in Learning and Memory

Researchers have identified a protein crucial for understanding both dendritic remodeling and dendritic pruning. This protein and associated pathways are likely one of the primary systems underlying changes in synaptic efficacy, one of the most fundamental aspects of learning.

As a result of neural activity, intracellular stores of calcium increase. This process is known to be involved in long term depression and long term potentiation, the two processes often cited in computational models as the biological agents of unit connection "weight changes." This new research helps elucidate exactly how this process may occur, as a result of dephosphorylation (and hence activation) of a protein called MEF2 which is a negative regulator of synapse formation.

When activated, MEF2 promotes the transcription of several genes known to restrict synapse number. This finding is somewhat counterintuitive, since it means that under conditions of learning, new synapses are not generated; instead, the brain remodels the synapses already available to it (although MEF2 has the opposite effect if it is sumoylated rather than phosphorylated, which in turn appears to depend on its location in the brain). Several other calcium-dependent proteins are also generated (including CREST and CREB) that are known to be involved in the formation of new synapses. The precise nature of the interactions between these conflicting activity-dependent proteins is still a mystery.

For more freely-available information, see this press release.

Related Posts:
A Role for MicroRNA in Learning and Memory
Molecular Basis of Memory


Neuroindices of Memory Capacity

Direct relationships between neuroimaging and measures of "intelligence" (broadly speaking) have until now mostly been the stuff of science fiction. Many of the cognitive functions considered integral to our capacity for intelligent behavior are thought to be an emergent property of oscillations between far-flung brain regions; functions like "attention" or "working memory" seem to involve these complex interactions between multiple brain regions on multiple timescales. Not surprisingly, this makes the functions rather difficult to localize, and given the spatiotemporal tradeoffs inherent to neuroimaging methods, links between neuroimaging and specific cognitive indexes are rare.

However, Vogel and Machizawa (2004) have found a startlingly accurate ERP correlate of encoding and maintenance in visual short term memory. By using a simple calculation, they are capable of predicting an individual's visual short term memory capacity (the ability to remember objects with multiple features over a delay in the range of seconds to minutes).

The ERP component reflecting these capacity limits is recorded from the posterior parietal and lateral occipital electrode sites, and consists of a single positive spike which sustains its activity over the delay period. For most people, this wave reaches an asymptotic minimum when around 4 items are being successfully maintained (as measured by subsequent recall). For lower capacity individuals, this wave reaches its minimum more quickly, such that it may be "as low as it will go" when only 2 or 3 items are being maintained.

The authors discovered this component by first presenting visual items to participants in only one visual hemifield. The participants were required to maintain these items over a delay and then respond to whether a test display was the same or different, while an EEG was recorded from their scalp. By subtracting the ipsilateral wave from the wave in the hemisphere contralateral to the to-be-remembered hemifield (so as to isolate the component related only to the remembered items), and then analyzing the amount of change between the encoding of successive items, the researchers could forecast the number of visual items at which this wave would asymptote, accounting for nearly 78% of the variability in an individual's visual short term memory capacity (p<0.0001).
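The subtraction-and-asymptote logic can be sketched with made-up numbers (hypothetical amplitudes, noise levels, and thresholds — not the authors' actual data or fitting procedure):

```python
import numpy as np

rng = np.random.default_rng(3)
set_sizes = np.arange(1, 9)

# Hypothetical delay-period ERP amplitudes (µV) at posterior sites for one
# observer: contralateral activity grows with set size but saturates at
# the observer's capacity (here, 4 items)
capacity = 4
contralateral = -1.0 * np.minimum(set_sizes, capacity) + rng.normal(0, 0.05, 8)
ipsilateral = np.full(8, -0.3) + rng.normal(0, 0.05, 8)

# Difference wave: contralateral minus ipsilateral, isolating activity
# tied to the to-be-remembered hemifield
cda = contralateral - ipsilateral

# Estimate capacity as the first set size where amplitude stops increasing
increments = np.abs(np.diff(cda))
estimated_capacity = int(np.argmax(increments < 0.25)) + 1
print(f"estimated capacity ≈ {estimated_capacity} items")
```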

Other conditions ruled out alternative interpretations of this calculation as a result of simply increasing the number of maintained representations, changes in arousal, more executive processing, or higher overall difficulty. However, the active maintenance of items in visual memory does suggest that the recorded EEG is a product of visual short term memory under the amplifying influence of attention.

What is the mechanism by which this "wave" or attentional modulation might be more thinly spread among representations in low capacity individuals than in high capacity individuals? One possibility is that a second "gating" frequency is responsible for allocating the attentional modulation among items, and that the gating frequency differs among individuals such that some can successfully allocate their available attentional bandwidth to more items than others. Or, the frequency of attentional maintenance itself could be lower in these individuals, reflecting lower bandwidth. Unfortunately, both interpretations are probably naively simplistic, given that changes to one frequency would likely result in concomitant adjustments in others. Until the origins, pathways, and purposes of such oscillations can be untangled, this highly accurate neuroindex of memory capacity will continue to be somewhat perplexing.

Related Posts:
Entangled Oscillations (and possible roles of various frequencies in gating)
Active Maintenance and the Visual Refresh Rate (and gating between modalities in working memory)
Thinking about "Thinking Harder" (and a neuroindex of cognitive workload)
Anticipation and Synchronization (and results consistent with a relatively slow gating of fast "attentional" gamma oscillations)


Interactive Brain Maps

Ever get a little frustrated with the number of adjectives tacked onto the word "cortex" (e.g., dorso-lateral prefrontal, anterior inferotemporal, ventral frontoparietal)? How about with all those weird subcortical structures (e.g., superior cerebellar peduncle, lateral geniculate nucleus)? I do.

Brush up on your neuroanatomy with BrainTutor, a free interactive brain mapping tool in which you can rotate and zoom around an accurate brain (based on MRI data) in three dimensions. It will even label the names of different regions at your preferred level of analysis (lobes, gyri, and sulci).

Now, if only each region was automatically linked to a Google scholar search...

EDIT: Be sure to check out Sylvius too, a web app which may actually be better for quickly finding a specific sulcus or gyrus.

Task Switching in Prefrontal Cortex

During complex tasks, we may have to switch our attention between multiple task demands, and maintain information about what we're currently doing as well as the end goal. Unfortunately, those tasks which require rapid or frequent shifts of attention are often the most difficult, or in other words, the most subject to "switch costs."

Performance in these task switching paradigms is thought to be driven by specific regions of prefrontal cortex, which may subserve the active maintenance of goal- or task-related information. According to Reynolds, Braver, Brown, and Van der Stigchel's article in press at Neurocomputing, analyses of switch-cost distributions suggest two distinct modes of behavior: one stochastic distribution of fast trials with low or no switch costs, thought to result from successful suppression of task-irrelevant representations, and a second stochastic distribution of much slower trials reflecting unsuccessful suppression of pre-switch learning, task set, and the resultant priming. Whether a trial shows low or severe switch costs is thought to be a function of a dopamine-based gating signal sent from the basal ganglia to prefrontal cortex.

To test these hypotheses, the authors developed a computational model using the biologically-plausible LEABRA algorithms of the pdp++ modeling environment. Many implementational details are provided in the paper, but for our purposes the most important aspect is the modeling of a phasic dopamine signal: this gating input serves to activate a hysteresis current in prefrontal cortex, which allows PFC activity to become self-sustaining (and hence maintain the current goal or task). In the absence of this gating signal, PFC activity dies out and therefore more stimulus-specific and posterior regions are not as heavily biased towards the current task. In this latter case, the effects of priming and previous learning are more pronounced, and therefore switch costs are higher.
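The gating idea can be caricatured in a few lines: a toy "PFC unit" whose self-excitatory (hysteresis) current exactly offsets its decay when a dopamine-like gate is open, so a transient input either persists or dies out. This is an illustrative sketch with invented parameters, not the LEABRA implementation:

```python
def pfc_trace(gate_open, steps=50, decay=0.15, self_excite=0.15, inp=1.0):
    """Toy PFC unit: a transient input arrives at step 0; a dopamine-like
    gate decides whether a self-excitatory (hysteresis) current is on."""
    a = 0.0
    trace = []
    for t in range(steps):
        drive = inp if t == 0 else 0.0           # stimulus is transient
        recur = self_excite * a if gate_open else 0.0
        a = max(0.0, a + drive + recur - decay * a)
        trace.append(a)
    return trace

gated = pfc_trace(gate_open=True)    # activity persists across the delay
ungated = pfc_trace(gate_open=False)  # activity decays toward zero
print(f"activity after delay: gated={gated[-1]:.2f}, ungated={ungated[-1]:.4f}")
```

With the gate closed, the unit's activity fades, leaving posterior regions unbiased by the current task — precisely the condition under which the model predicts severe switch costs.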

As often happens with neural network models, this model emergently manifested several other phenomena seen in human studies of task-switching. For example, task switch trials were less accurate, slower, and different from non-switch trials by roughly the same magnitude as in the empirical literature.

Related Posts:
Models of Dopamine in Prefrontal Cortex
Anticipation and Synchronization


The Astrocyte Hypothesis

Hat tip to the Bioethics blog for pointing out an article in the newest issue of Nature Neuroscience in which glial cells are shown to be capable of increasing blood flow in response to recent neural activity. One type of glial cell, the astrocyte, can increase blood flow by 37% in less than 2 seconds; such activity suggests glia may have a more important regulatory role in processes such as learning. This study is one of several beginning to establish a more important role for astrocytes in information processing than previously thought.

Glial cells outnumber neurons in the cortex by almost 10 to 1, but were previously thought to be mere "support cells," uninvolved in the details of learning and memory - a role traditionally assigned only to neurons. However, Takano, Tian, Peng, Lou, Libionka, Han, and Nedergaard have used two-photon microscopy and laser Doppler flowmetry in the live, exposed cortex of adult mice to show that glial cells are the means by which neural activity is tied to local blood flow. This is important because fMRI, probably the dominant form of brain imaging in use today, does not image neural activity directly but instead reflects changes in blood flow. In a sense, fMRI images astrocytic rather than neuronal activity, although as shown by this study the two are tightly coupled.

The mechanism by which astrocytes cause vasodilation is thought to begin with increases in extracellular glutamate, which trigger cytoplasmic calcium release in astrocytes, thereby signalling the production of COX-1 products and arachidonic acid derivatives; the latter are converted into vasoactive epoxyeicosatrienoic acids, which act as vasodilators.

Related Posts:
Complexity and Biologically Accurate Cognitive Modeling



Nice post on clathrin over at the Beta Rhythm blog; it turns out to be a pretty fascinating molecule. Previous posts also cover the geometric structure of ion channels and other very detailed neurobiological phenomena. Make sure to check out the video of clathrin self-assembling.


Gestures and Mathematical Performance

Although I don't normally blog on the weekends, I thought this article from New Scientist was pretty interesting: teachers who "talk with their hands" may actually be better teachers than those who are less animated, new research shows. In Susan Goldin-Meadow's recent study of 160 elementary schoolers, kids performed far better on a series of math questions when their teachers' instructions included specific, meaningful hand gestures, compared with groups who received only abstract gestures or none at all.

Spatial thinking may be beneficial for mathematical skill, and it's likely that these hand gestures would engage the parietal lobe, a region of the brain responsible for our representation of space.


Neural Correlates of Insight

What happens in the brain during an "a-ha" moment? As described by this paper from PLOS Biology, both fMRI and EEG evidence suggests distinct neural patterns of activity accompany the feeling of insight. Right anterior superior temporal gyrus activity increases along with increases in gamma band activity 300 msec prior to insight solutions. Although many of the same brain areas appear to be active during insight and normal problem solving processes, this specific pattern of results suggests that insight is caused by an abstract, holistic processing of information recruited from distant brain regions.

Is insight just another form of problem solving, differing only in emotional intensity or suddenness of onset? Such skepticism is warranted, given that insight often seems to occur with little or no warning, and is certainly accompanied by an intense feeling of accomplishment. But some recent results suggest that people continue to "think" about problems unconsciously - when primed with a potential solution, people are faster to make a decision when these words are presented to the right as opposed to left hemisphere - and similar processes might be at work during insight.

Right-hemispheric regions are particularly important for analyzing distant semantic relations, and indeed fMRI evidence from the PLOS paper implicates a right-hemisphere structure called the anterior superior temporal gyrus. This region is known to be important for integrating information across large cortical distances, and the right hemisphere more generally is known to be involved in relatively coarse-coding of semantic information and more holistic processing of visual information.

The researchers also demonstrated that just prior to the discovery of problem solutions, brief bursts of gamma activity (at anterior right electrodes) are predictive of whether the solution involved "insight" or not (see an animation here). Further, this insight-related activity was directly preceded by alpha band (8–13 Hz) activity in the right posterior parietal electrodes up to 1.4 seconds before the solution response. This observation is consistent with an interpretation of alpha activity as a form of "cortical idling" in which bottom-up activity is essentially suppressed in order to allow more free-form associations to take place, such as those that may take place in the right hemisphere. If alpha activity does serve such a role, it provides a new way of understanding previous reports that alpha band power is involved in search and retrieval processes.

The same authors have several newer papers in Trends in Cognitive Sciences and Psychological Science, describing their approach and ways in which people can become biased towards insight-based problem solving strategies.


Related Posts:
Gamma Synchrony
Entangled Oscillations
Aha! Neural mechanisms immediately preceding the Aha!


Entangled Oscillations

One common way of thinking about brain activity is that networks of brain areas activate in sequences corresponding to different cognitive processes. However, as pointed out by Lawrence Ward in his article "Synchronous Neural Oscillations and Cognitive Processes", this way of thinking actually conceals the importance of oscillatory computation, or as Ward puts it, "reverberations of reentrant activity in complex neural networks."

One way of illustrating the importance of these oscillations is that different frequency bands are correlated with different types of cognitive change. For example, spectral power in alpha bands increases while theta and delta frequencies decrease in maturing children; the opposite trend is observed in the elderly. While alpha frequencies were thought to primarily reflect search and retrieval processes, and theta oscillations were correlated with encoding processes in working memory, some new evidence refines these hypothesized roles.

Global theta oscillations are more prominent when subjects navigate mazes without memory cues, and these frequencies are more prevalent the longer the decision time at each turn. Such evidence suggests that the theta band has a role in retrieval processes. Likewise, gamma (40 Hz) oscillations briefly interface the rhinal cortex with the hippocampus during successful memory formation, are prevalent in successful target detection, and have been proposed more generally to be responsible for the effective transmission of information over long cortical distances. Ward reviews compatible evidence that, during successful recollection, gamma-band activity is actually modulated by theta waves between frontal and parietal cortex.

So, do gamma- and theta waves reverberate through the network responsible for short-term memory? Ward reviews one such hypothesis, in the form of Lisman and Idiart's computational model of working memory, in which synchronous firing through recurrent pyramidal connections is able only to preserve information transiently, and must therefore be briefly "refreshed" at roughly 40 Hz (gamma) once every 100-200 ms (theta). Such waves have indeed been recorded in cortex.

Some quick math provides a tantalizing, if provocative (and admittedly inexact) insight: if working memory span is related to some interaction between theta and gamma waves, is it a coincidence that 40/6 is roughly equal to Miller's number 7, plus or minus two? Further, the range of possible spans based on gamma and theta variability (30 to 70 Hz, and 3.5 to 7 Hz respectively) falls within the range of working memory capacities observed in humans (roughly 3 to 20). This is compatible with earlier suggestions that if oscillations of "perceptual sampling" are responsible for the wagon wheel illusion, they may be closely related to visuo-spatial working memory and/or processing speed.
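The arithmetic behind this conjecture is simple enough to check directly, treating span as the number of gamma cycles nested within one theta cycle:

```python
# Back-of-the-envelope check of the gamma/theta capacity argument:
# if each theta cycle holds as many items as gamma cycles fit inside it,
# predicted working-memory span is just the ratio of the two frequencies.

def wm_span(gamma_hz: float, theta_hz: float) -> float:
    """Predicted span = number of gamma cycles nested in one theta cycle."""
    return gamma_hz / theta_hz

# Nominal values: ~40 Hz gamma nested in ~6 Hz theta.
print(round(wm_span(40, 6), 1))     # ~6.7, close to Miller's 7 +/- 2

# Extremes of the reported ranges (gamma 30-70 Hz, theta 3.5-7 Hz)
# bound the predicted spans:
print(round(wm_span(30, 7), 1))     # lower bound, ~4.3
print(wm_span(70, 3.5))             # upper bound, 20.0
```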

As Ward admits, these numbers are not universally accepted and the hypothesis is still an empirical question. But some converging evidence comes in the form of a kind of gamma "frequency following response," based on experiments in which the rate of auditory clicks (presented at near-gamma frequencies) was seen to influence WM span. According to Ward's analysis, this "confirms the importance of a gamma-clocked process." One wonders whether other frequency bands might be subject to the same clocking or frequency following response process, perhaps as illustrated in popular depictions of hypnosis.

Along those lines, alpha-band power appears to correlate with some aspects of attention, particularly suppression processes and behaviors such as infant habituation. Alpha waves are seen to increase during memory load, external task load, and cued anticipation of an auditory stimulus. Alpha-band oscillations can even be localized to those exact regions of retinotopic visual cortex in which distractors are expected to appear, as though alpha oscillations are somehow responsible for (or the result of?) suppression processes. Finally, alpha-band oscillations are also thought to be phase locked with external stimuli, which may allow peaks in attentional dynamics (such as capacity or switching) to be synchronized with the time course of environmental changes. This suggestion is compatible with new EEG data from attentional blink paradigms, in which the blink-related activity appears as a gamma rhythm modulated by an alpha or theta wave.

Koch, Tononi, and others have even gone so far as to propose that a global, dynamic core of intermixed oscillations may somehow provide a foundation for consciousness. According to this framework, local oscillations only enter conscious awareness when they become integrated with the dynamic core. On its surface, this view of consciousness is compatible with some theories of attention, although clearly it does not specify in detail how consciousness might arise from these oscillations.

Related Posts:
Neural Oscillations and the Mozart Effect
Perceptual Sampling: The Wagon Wheel Illusion
The Mind's Eye: Models of the Attentional Blink
Neural Network Visualization
Anticipation and Synchronization
Gamma Synchrony
Synchrony vs. Polychrony


Mindsight Reconsidered

Prompted by Bob Mottram's justified skepticism of "mindsight", I've found this article (by Dan Simons et al., also from Psychological Science) which suggests this change blindness phenomenon might be merely a change of response criterion.

"Provocative claims merit rigorous scrutiny," state Simons et al., before they go on to show that mindsight rates are highly correlated with false alarm rates in their replication of Rensink's original study (with a greater number of trials, including catch trials). This is precisely what one would predict if "sensing" is actually a liberal response strategy, and "seeing" requires verification of that initial response over several subsequent alternations between pictures.

According to this logic, more liberal individual response criteria for "sensing" should result in both more "sensing" false alarms and longer lags between "sensing" and "seeing." In other words, subjects in the can-sense group should show more sensing false alarms than only-see subjects, a pattern that is inconsistent with "mindsight" being the result of a distinct informative process. Indeed, Simons et al. found a difference of more than 10% in false alarm rates between groups; perhaps the smaller number of catch trials used by Rensink provided insufficient power to find this crucial difference (which in his experiment was less than a 1% difference between groups).
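The criterion account makes this prediction for purely statistical reasons, which a toy signal-detection simulation illustrates (all parameters here are hypothetical, not Simons et al.'s fitted values):

```python
# Toy signal-detection sketch of the criterion argument (hypothetical
# parameters): if "sensing" is just a liberal criterion applied to the
# same underlying evidence, lowering the criterion must raise false
# alarms along with detections.
import random

random.seed(1)

def detect_rates(criterion, d_prime=1.0, n=20000):
    """Hit and false-alarm rates for a Gaussian equal-variance observer."""
    hits = sum(random.gauss(d_prime, 1) > criterion for _ in range(n)) / n
    fas = sum(random.gauss(0.0, 1) > criterion for _ in range(n)) / n
    return hits, fas

strict_hits, strict_fas = detect_rates(criterion=1.5)    # "seeing"
liberal_hits, liberal_fas = detect_rates(criterion=0.5)  # "sensing"

# The liberal criterion yields more detections, but only at the cost of
# a markedly higher false-alarm rate; no second mechanism is required
# to produce the can-sense pattern.
assert liberal_hits > strict_hits and liberal_fas > strict_fas
```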

Rensink had argued against the "liberal response criterion" explanation with the following logic: if mindsight is only a different criterion, such partial detection should lead to a more immediate "saw" response. In contrast, Rensink argued, the "sense-saw" lag on mindsight trials was similar to response latency on nonmindsight trials, indicating that mindsight did not contribute to change localization. Simons et al. claim that this finding is a result of Rensink's arbitrary 1-second cut-offs between can-sense and only-see groups.

Although it is certainly tempting to deride such an outlandish hypothesis, non-attentional pathways to awareness do seem to exist as demonstrated by several neuropsychological findings. For example, when patients with right parietal damage are asked to draw two houses (pictured at the start of this article), they are unable to draw the left side of each house, and will proclaim that the two images are identical despite the fact that the left side of one house is on fire; this is a classic demonstration of hemispatial neglect. However, when asked to point to the house in which they'd rather live, most hemispatial neglect patients will point to the house that's not on fire. Such cases demonstrate that visual information is able to guide decision-making even if it is spatially nonspecific (similar to mindsight) and even if it is below the threshold of verbal awareness (similar to blindsight).

In a wonderful example of academic "adversarial collaboration," Rensink and Simons have since co-authored a review of change blindness research which reflects their continuing disagreement about whether mindsight might reflect a new perceptual mechanism, a non-attentional pathway to awareness (such as that demonstrated by hemispatial neglect patients, or those with optic ataxia), or merely the use of multiple response criteria.

Related Posts:
A New Mode of Perception?


A New Mode of Perception?

Most people implicitly equate attention with awareness: to be aware of something, you must have paid attention to it at some point. The phenomenon of mindsight suggests this may not be a safe assumption; according to Ronald Rensink's experiments, observers can actually become aware of visual objects ("sensing") without a corresponding visual experience ("seeing").

In Rensink's change blindness flicker paradigm, subjects are presented with two different visual scenes which alternate back and forth, separated only by a brief (80 ms) blank screen during the switch. Forty subjects underwent 48 trials each, 42 of which actually contained a change from one scene to the other; the remaining six trials, which served as "catch trials," consisted of two identical pictures switching back and forth.

Subjects were to give one response when they first "sensed" a change, and a second response when they were sufficiently sure of a change that they could verbally identify the changing object and its location. The results were separated into two trial types: alpha trials were those in which the first response occurred within 1 sec of the second response, and hence there was effectively no "sensing;" beta trials were those in which "sensing" responses occurred more than 1 sec before the "seeing" response. Subjects were then divided into three groups: the only-see group were those who had a very low proportion of alpha trials, while the remaining subjects were divided between the can-sense group (if they performed above 50% on the catch trials) and the guess group.
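Rensink's classification scheme can be sketched as a pair of functions (the 1-second lag and 50% catch-trial criteria come from the description above; the threshold on the proportion of beta trials is an illustrative stand-in for his actual figure):

```python
# Sketch of Rensink's trial and subject classification. The 1 s lag and
# 50% catch-trial criteria are as described; beta_floor is a hypothetical
# stand-in for the actual cutoff between only-see and other subjects.

SENSE_LAG_CUTOFF = 1.0  # seconds between "sensed" and "saw" responses

def classify_trial(sense_time, see_time):
    """'beta' trials show genuine sensing; 'alpha' trials effectively none."""
    return "beta" if (see_time - sense_time) > SENSE_LAG_CUTOFF else "alpha"

def classify_subject(beta_proportion, catch_accuracy,
                     beta_floor=0.25, catch_criterion=0.5):
    if beta_proportion < beta_floor:
        return "only-see"       # very low proportion of beta trials
    return "can-sense" if catch_accuracy > catch_criterion else "guess"

assert classify_trial(sense_time=3.0, see_time=5.5) == "beta"
assert classify_subject(beta_proportion=0.4, catch_accuracy=0.8) == "can-sense"
```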

The results showed that the can-sense group was able to "sense" a change more than 2 seconds before they were able to identify that change, and given the hit rate for "sensing" it's clear that these responses were not merely the result of a guessing process. Further analyses suggest that "sensing" responses are not simply the result of a lower change detection threshold; rather, the pattern of results more strongly implicates a distinct mechanism of visual perception. A second experiment was conducted to rule out the possible effects of transients in the display. No one knows the specific mechanism by which this non-specific "sensing" might occur, although one possibility proposed in Rensink's paper is the disturbance of some non-attentional, global representation of scene layout.

Nearly 30% of participants were able to sense changes without having actually seen them - which Rensink calls "mindsight." In contrast to the phenomenon of blindsight (in which people with damage to V1 will declare themselves totally blind, but can nonetheless perform well above chance in identifying visual information), mindsight involves a conscious awareness of information, but no visual experience. Rensink concludes by conjecturing that mindsight may underlie the commonly held belief in a "sixth sense," and while there's no need to posit an "extrasensory" modality, it's likely that similar phenomena would occur for the other senses as well.

EDIT: be sure to see the next article in this series, "Mindsight Reconsidered," for a different perspective on this data, offered by Dan Simons et al.


Video Games, Bilingualism, and Cognition

Hat tip to the Intelligence Testing blog for pointing out a nice article about possible advantages in executive control and visual attention conveyed by bilingualism and video games, respectively.

An Informal Integration of Object Recognition Models

Comprehensive theories of human pattern recognition must confront several fundamental questions, including the nature of visual representations, the nature of object knowledge, the mechanisms that interface the two, and how either or both of these may change with experience (Palmeri & Gauthier, 2004). Below, an integrated model of pattern recognition is proposed which addresses these topics by positing a) multiple view-dependent object representations, as well as b) separate subsystems for feature-based and holistic processing. In this four-part model, incoming visual data first undergoes preprocessing, and is then transformed to a familiar view, ultimately resulting in strengthened pattern activation. The transformed visual information is then routed to two lateralized and parallel subsystems: a right-hemispheric system which processes more specific, exemplar-related characteristics of the visual data (on the basis of holistic forms), and a left-hemispheric system which processes more abstract, category-related information (on the basis of features). These two subsystems connect bidirectionally to associative memory, where object identity is retrieved jointly on the basis of features and holistic forms.

Any complete model of pattern recognition must account for viewpoint-dependent reaction times in object recognition, and yet be able to simulate viewpoint-independence as exposure to a given object increases. Many early theories of object recognition stressed the view-invariant aspect of object recognition, citing the impressive human ability to recognize a new instance of an old object despite the kinds of changes in orientation, size, and lighting that occur in everyday life (Biederman, 1995). Other data showed that naming times for basic-level categories are invariant across changes in viewpoint, suggesting that the primary mechanism of object recognition is view-independent (Biederman & Gerhardstein, 1993). Upon closer inspection, however, human object recognition is not perfectly robust to pattern variability.

For example, subjects can more quickly recognize objects from specific characteristic views than other less characteristic views (for example, a cow may be easier to recognize from the side than from the bottom). Although some adaptations of view-invariant approaches can account for this data, experiments such as those conducted by Tarr (1995) are much more definitive. Tarr clearly showed that the time required to recognize a novel view of some object is linearly dependent on its angular displacement from the closest previously studied view. In other words, even after extensive experience with an object, one can only recognize it from a new perspective by mentally transforming it to match a memorized view.
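Tarr's finding amounts to a linear cost model, which can be sketched as follows (the slope and intercept below are illustrative, not his fitted values):

```python
# Linear cost model of viewpoint-dependent recognition: RT grows with
# the angular distance to the nearest stored view. Slope and intercept
# are illustrative, not Tarr's fitted parameters.

def angular_dist(a, b):
    """Shortest rotation, in degrees, between two views."""
    d = abs(a - b) % 360
    return min(d, 360 - d)

def recognition_time_ms(view_deg, stored_views_deg,
                        base_ms=600.0, ms_per_deg=2.5):
    nearest = min(angular_dist(view_deg, v) for v in stored_views_deg)
    return base_ms + ms_per_deg * nearest

# With one studied view, a 90-degree novel view is slow...
print(recognition_time_ms(90, [0]))           # 600 + 2.5*90 = 825.0
# ...but storing more views makes performance nearly view-independent.
print(recognition_time_ms(90, [0, 60, 120]))  # 600 + 2.5*30 = 675.0
```

This also captures the developmental claim in the paragraph above: as experience adds stored views, the expected transformation cost (and hence the viewpoint effect) shrinks toward zero.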

The nature of this transformation relies critically on the kinds of information available to it. In the first stage of my proposed model, image preprocessing extracts surface-based primitives in parallel from basic color, texture, orientation, and contour information available in earlier visual areas (similar to Wolfe & Horowitz’s 2004 treatment of guiding attributes). This proposal is compatible with behavioral evidence from IT-lesioned monkeys; though unable to recognize objects in most tasks, they could partially succeed by differentiating between objects on the basis of contour (Farah, 2000). This evidence suggests that contour-based information is available prior to IT, and therefore that object-recognition processes rely on representations extracted downstream of contour processing. This notion is also compatible with Kosslyn et al’s (1990) “preprocessing” stage in the ventral pathway.

In the second stage of this model, the incoming surface information is transformed to familiar or characteristic views. This transformation can take the form of mental rotation, view interpolation, or linear combinations of surfaces (Tarr & Bülthoff, 1998). This normalization step allows for incoming visual data to be matched against multiple views of an object, such that if one has enough visual experience with an object, it might be recognized from most novel angles with nearly equal ease. Object transformation is terminated when the surface-based information (in the form of graded activation) sufficiently matches a familiar view (in the form of connection weight patterns) driven by prior experience; in cases where the incoming information is already close to a familiar view, no such transformation is necessary because activations will match connection weights almost immediately [see footnote 1]. The match between activation and connection weights caused by a successful transformation amplifies the representation, thereby projecting it to the third stage of the model [see footnote 2]. The transformation process is compatible with the depth transformations in Tarr’s handedness experiments (1995), as well as the linear deformations involved in the Lades et al. (1993) “lattice” recognition system; in addition, both processes fit nicely into Kosslyn et al.’s “pattern activation” subsystem in the ventral pathway.

The third system of the model consists of two parallel and lateralized subsystems. In the left hemisphere, a feature-based recognition system receives input from specific surface-combinations and will match them against a database of stored parts. By breaking apart these surface combinations into likely parts (on the basis of concavity or other contour-based principles emerging from Hebbian learning) the left-hemispheric feature processor will activate patterns that represent object parts; these candidate features are then projected to associative memory, which stores a comprehensive list of parts and relationships between them.

While the left hemisphere is engaged in part-based decomposition, the right hemisphere performs a more holistic analysis of the image by identifying the n principal components which could be combined to reproduce the normalized view of the image. In the case of face recognition, this process would involve combining the stored eigenvectors of the entire “face-space” in an attempt to match the current face to a certain accuracy criterion. As described in Calder & Young (2005), principal component analysis (PCA) has proven useful in machine vision applications, and FFA may perform an analogous computation. This implementation of holistic processing is also compatible with evidence that processing in the FFA is particularly sensitive to inversion (Gauthier, Curran, Curby, & Collins, 2003) possibly because efficient PCA requires that all images be normalized to a canonical view. Once a certain accuracy criterion is met, the principal components for a given image are then projected to associative memory.
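The holistic subsystem's computation can be sketched in a few lines, in the style of classic eigenface models (a minimal toy with random vectors standing in for normalized face images; this is not a claim about FFA's actual algorithm):

```python
# Minimal eigenface-style sketch of holistic matching. Random vectors
# stand in for normalized face images; in the model, views arrive
# pre-normalized from stage two.
import numpy as np

rng = np.random.default_rng(0)
faces = rng.normal(size=(50, 64))   # 50 stored "faces", 64 pixels each
mean_face = faces.mean(axis=0)

# Principal components ("eigenfaces") of the stored face-space.
_, _, components = np.linalg.svd(faces - mean_face, full_matrices=False)
n = 10                              # keep the n leading components

def project(face):
    """Coordinates of a face in the n-dimensional face-space."""
    return (face - mean_face) @ components[:n].T

def match(probe):
    """Index of the stored face nearest the probe in face-space."""
    stored = project(faces)         # shape (50, n)
    return int(np.argmin(np.linalg.norm(stored - project(probe), axis=1)))

# A noisy version of face 7 still matches face 7.
assert match(faces[7] + rng.normal(scale=0.1, size=64)) == 7
```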

This distinction between left- and right-hemispheric processes could be seen as controversial, given that some researchers believe FFA just happens to be the area of maximal activity for faces even though all visual representations are distributed, while others insist that the FFA is inherently selective for faces. Still others argue that we are all “face-experts” and that FFA is actually selective for expertise (Palmeri & Gauthier, 2004). However, an emerging body of ERP evidence suggests that FFA activity is actually selective for expertise, whether those experts are identifying Greebles, cars, dogs, cats, or faces (Tarr & Cheng, 2003, although see Xu, Liu, & Kanwisher 2005 for a different view). Further, this division is compatible with neuropsychological data in which dissociations in brain-damaged patients support two, but not one or three, distinct subsystems for object recognition (Farah, 1992; see Humphreys & Riddoch, 2006, for possible exceptions, though these might instead be explained as semantic impairments in associative memory). Finally, evidence that viewpoint-dependent priming effects result only when items are presented to the right hemisphere (Burgund & Marsolek 2000) is consistent with the interpretation of hemispheric specialization given here.

The fourth and final step in this model is long-term associative memory. This region consists of feature units; once combinations of those features have become sufficiently activated, an object has been identified. After receiving projections from both left- and right-hemispheric recognition subsystems, this stage finds a constellation of features that both match those features identified in the left-hemispheric process and yet share historical correlations with the eigenvalues activated through the right-hemispheric process. In cases where feature information is ambiguous, associative memory may use holistic information to more strongly amplify one or another interpretation via its bidirectional connectivity with both components of the third stage.

Feature-based information may be sufficient for basic-level categorization, but subordinate-level distinctions may require holistic information to play a larger part. For certain domains, one of the two lateralized processes may be more heavily weighted by associative memory; in this way, domains in which more subordinate level distinctions are required (e.g., face recognition or other areas of expertise) will rely more heavily on right-hemispheric information. Although other “divisions of labor” between visual subsystems might also be capable of explaining the human data, this two-part architecture is both parsimonious and powerful.
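The proposed constraint satisfaction in associative memory might be sketched as a weighted combination of the two evidence streams (the catalog, scoring rule, and weights below are hypothetical toys, not part of any published model):

```python
# Toy sketch of stage four: object identity is the stored entry that
# best satisfies both the left-hemisphere feature evidence and the
# right-hemisphere holistic evidence. All data here are hypothetical.

CATALOG = {
    "cup":  {"features": {"handle", "rim"}, "holistic": (0.9, 0.1)},
    "bowl": {"features": {"rim"},           "holistic": (0.2, 0.8)},
}

def identify(features, holistic, w_feat=1.0, w_hol=1.0):
    """Weighted constraint satisfaction over feature and holistic evidence.
    Raising w_hol models expertise domains that weight holistic input."""
    def score(entry):
        feat = len(features & entry["features"]) / len(entry["features"])
        hol = 1 - sum(abs(a - b) for a, b in zip(holistic, entry["holistic"])) / 2
        return w_feat * feat + w_hol * hol
    return max(CATALOG, key=lambda name: score(CATALOG[name]))

# Ambiguous features ("rim" fits both entries); holistic evidence
# disambiguates, as the bidirectional account above suggests.
assert identify({"rim"}, holistic=(0.2, 0.8)) == "bowl"
```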

In conclusion, this four-part model integrates neural, behavioral, ERP, and neuropsychological data on object recognition. Multiple view-dependent representations of surface-combinations are matched with incoming surface data through a transformation process. The results of this transformation process allow two parallel downstream units to bidirectionally excite associative memory, resulting in object recognition through constraint satisfaction between distinct subsystems. The model also suggests how both the multiple stored views of each object and the relative weighting of right- and left-hemispheric processes in recognizing objects from that domain may be altered with experience.

[Footnote 1] If no matching view is found, unstable activations will presumably oscillate for longer before settling into a lower energy state; this process may itself serve to modify weights enough to actually store a new view of the object.

[Footnote 2] While it may seem that matching surface representations to a stored view would mean that object recognition has already been completed, this is not necessarily so. At this point, visual data is nothing more than surface-combinations; characteristic or familiar views are simply arrangements of surfaces; neither of these is sufficient for object recognition. Bidirectional constraint satisfaction with stage 3 processes may also guide object transformation.


Biederman, I. (1995). Visual object recognition. In SF and DN Osherson (Eds.).An Invitation to Cognitive Science, 2nd edition, Volume 2., Cognition. MIT Press. Chapter 4, pp. 121-165.

Biederman, I., & Gerhardstein, P. C. (1993). Recognizing depth-rotated objects: Evidence and conditions for 3D viewpoint invariance. Journal of Experimental Psychology: Human Perception and Performance, 19, 1162-1182.

Burgund, E. D., & Marsolek, C. J. (2000). Viewpoint-invariant and viewpoint-dependent object recognition in dissociable neural subsystems. Psychonomic Bulletin & Review, 7, 480-489.

Calder AJ, Young AW. Understanding the recognition of facial identity and facial expression. Nat Rev Neurosci. 2005 Aug;6(8):641-51

Farah, MJ (1992). Is an object an object an object? Current Directions in Psychological Science, 1:164-169.

Farah, MJ (2000). The Cognitive Neuroscience of Vision. Oxford: Blackwell Publishers.

Gauthier, I., Curran, T., Curby, K. M., & Collins, D. (2003). Perceptual interference supports a non-modular account of face processing. Nat Neurosci, 6(4), 428-432

Humphreys GW, & Riddoch MJ. Features, Objects, Action: The cognitive neuropsychology of visual object processing, 1984-2004. Cognitive Neuropsychology, 2006, 23 (0), 1–28

Kosslyn SM, Flynn RA, Amsterdam JB, Wang G. Components of high-level vision: a cognitive neuroscience analysis and accounts of neurological syndromes. Cognition. 1990 Mar;34(3):203-77

Lades, M., Vorbruggen, J.C., Buhmann, J., Lange, J., von der Malsburg, C., Wurtz, R.P., Konen, W., 1993. Distortion invariant object recognition in the dynamic link architecture. IEEE Transactions on Computers 42, 300–311.

Palmeri TJ, Gauthier I.(2004) Visual object understanding. Nat Rev Neurosci. 2004 Apr;5(4):291-303

Tarr, M. J. (1995). Rotating objects to recognize them: A case study of the role of viewpoint dependency in the recognition of three-dimensional objects. Psychonomic Bulletin and Review, 2, 55-82.

Tarr MJ, Cheng YD. Learning to see faces and objects. Trends Cogn Sci. 2003 Jan;7(1):23-30

Tarr, M. J., & Bülthoff, H. H. (1998). Image-based object recognition in man, monkey, and machine. Cognition, Special Issue on Image-based Object Recognition (Tarr & Bülthoff, eds.), 67 (1/2), 1-20.

Wolfe, J.M., Horowitz, T.S. (2004). What attributes guide the deployment of visual attention and how do they do it? Nature Reviews Neuroscience, 5 1-7.

Xu, Y., Liu, J., & Kanwisher, N. (2005). The M170 is Selective for faces, not for Expertise. Neuropsychologia, 43, 588-597


Gamma Synchrony

Synchrony dynamics are still a hotly debated topic in neuroscience, and the role of synchrony in cortical processing is unclear. Much of this confusion stems from the fact that EEG activity can be interpreted as the result of phasic bursts of activity or of phase-changes resulting in synchrony. Adding fuel to the fire is some new evidence from yesterday's issue of Nature, showing that increased gamma-band (40-70 Hz) synchrony in area V4 of visual cortex, as measured by intracranial electrodes, is predictive of certain behavioral reaction times.

Two monkeys were trained to detect changes in visual stimuli and ignore distractors while the researchers recorded from multiple electrode sites in V4. Those trials which resulted in the fastest reaction time to detect a change in a target item could be predicted on the basis of increased gamma-band power and spike-field coherence. As in other studies showing the effects of synchrony on vision and attention, the earliest changes could be observed before stimulus onset, such that gamma-band differences occurred in faster RT trials as much as 350ms prior to the change.

These effects were detected only in those neurons whose receptive fields overlap with the target, suggesting that increased gamma-band coherence is not purely a result of globally increased arousal or alertness. In fact, there was a reversal of the trend such that gamma coherence in other receptive fields anti-correlated with change detection RT. Previous work on synchrony dynamics in the attentional blink paradigm has likewise shown that increased visual cortex synchrony appears to result in shorter attentional blink times as well. How synchrony in visual cortex is successfully engaged by "attentional processes" remains to be proven.
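For readers curious about the measure itself, the basic gamma-band power computation can be sketched on synthetic data (a toy illustration only; the actual study also used spike-field coherence, which is omitted here):

```python
# Sketch of extracting gamma-band (40-70 Hz) power from an LFP-like
# trace before comparing fast and slow trials. Synthetic, illustrative
# signals; real analyses also compute spike-field coherence.
import numpy as np

fs = 1000                               # sampling rate, Hz
t = np.arange(0, 1, 1 / fs)             # one second of "recording"
rng = np.random.default_rng(0)

def gamma_power(signal, lo=40, hi=70):
    """Mean spectral power in the gamma band, via the FFT."""
    freqs = np.fft.rfftfreq(len(signal), d=1 / fs)
    power = np.abs(np.fft.rfft(signal)) ** 2
    band = (freqs >= lo) & (freqs <= hi)
    return power[band].mean()

noise = rng.normal(size=t.size)
fast_trial = noise + 2.0 * np.sin(2 * np.pi * 55 * t)  # strong gamma
slow_trial = noise                                      # gamma absent

# The hypothetical fast-RT trial carries far more gamma-band power.
assert gamma_power(fast_trial) > gamma_power(slow_trial)
```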

Related Posts:
Anticipation and Synchronization
Synchrony vs Polychrony


The Attentional Doughnut

Many think of visual attention as something like a spotlight moving across the visual scene: whatever it illuminates enters conscious awareness, and everything else is confined to preattentive limbo. This seemingly reasonable metaphor takes for granted that the areas to which we attend are roughly circular, contiguous, and, well, spotlight-like, just like the receptive fields of neurons in visual cortex. Hence the intense surprise at the recent discovery that attention's "spotlight" may actually deform into a "doughnut" shape under the right conditions.

Muller and Hubner presented a steady stream of small flickering uppercase letters embedded in the center of a stream of large, uppercase letters, and measured an ERP component known as the steady-state visual evoked potential (SSVEP) which is known to be sensitive to changes in flicker frequency. Subjects were first told to monitor the large letters, and after a given amount of time, to monitor the smaller letters in the center of the larger ones. SSVEP amplitude is known to be larger for attended items than for unattended ones, and the waveform itself is nearly sinusoidal in response to different stimulus flicker rates.

If attention is shaped like a spotlight, viewers will inadvertently attend to the smaller letters even when they're supposed to be attending to the large letters; therefore, one would not expect SSVEP magnitude to change significantly when subjects attend to the small letters. However, the researchers found SSVEP amplitude increasing by almost 100% when subjects changed the location of their attention, suggesting that they were actually able to ignore the distractor letters in the middle while attending to the regions surrounding them!

Several control conditions ensure that this effect is not due simply to different responses in flicker rate (flicker rate was counterbalanced for large and small items), less attention being paid to the large items (subjects were asked to detect the target letter H in the to-be-attended stream), crosstalk between flicker frequencies (a complex demodulation process was used on the EEG waveforms, along with a low-pass filter at 2 Hz), attentional selection by spatial frequency (a global spatial frequency filter would not have differentially selected each stream, since their spatial frequencies are not mutually exclusive to one another), or gradient allocation of attentional resources within a beamlike area (there was no measurable target P300 for the ignored small letter stream). The researchers also claim that an object-based system of attention cannot account for their results because of the origin of the SSVEP signal (early visual areas) which would not be predicted if subjects were selecting on the basis of object identity, and that this pattern of results is consistent with several other imaging studies.
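The logic of this SSVEP "frequency tagging" measure can be sketched on synthetic data (the tag frequencies and amplitudes below are illustrative, not those of the actual study):

```python
# Sketch of SSVEP frequency tagging: each letter stream flickers at its
# own rate, so attention to a stream can be read out as the EEG
# amplitude at that stream's frequency. Synthetic, illustrative data.
import numpy as np

fs = 500                                # sampling rate, Hz
t = np.arange(0, 2, 1 / fs)             # two seconds of "EEG"
f_large, f_small = 8.7, 12.1            # hypothetical tag frequencies, Hz

def amplitude_at(signal, freq):
    """Fourier amplitude at the bin nearest one tagged frequency."""
    spectrum = np.abs(np.fft.rfft(signal)) * 2 / len(signal)
    freqs = np.fft.rfftfreq(len(signal), d=1 / fs)
    return spectrum[np.argmin(np.abs(freqs - freq))]

# While attending the surrounding large letters, the large-letter tag
# dominates the response and the central small-letter tag is suppressed.
eeg = (2.0 * np.sin(2 * np.pi * f_large * t)
       + 0.5 * np.sin(2 * np.pi * f_small * t)
       + np.random.default_rng(0).normal(scale=0.3, size=t.size))

assert amplitude_at(eeg, f_large) > amplitude_at(eeg, f_small)
```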

Other researchers, however, have criticized some of the methodology in this study. For example, Catena, Castillo, Fuentes & Milliken point out that they did not control whether subjects focused on the display or at a point before the display in space, which could have important consequences for how the image was displayed on the retina. Replications of this study seem to suggest that people are indeed intentionally blurring the image by fixating at a different place in depth, which means that the simplest explanation for the data is that attention is not shaped like a doughnut, but rather that subjects are using flexible strategies which make it appear so. Still, other studies have reported similar findings in cases where intentionally blurring one's vision would not seem to help.

What do these and similar findings mean for models of attention? The results suggest that attention need not occupy contiguous regions of space; but there is no definitive answer yet as to whether the early visual effects (such as the SSVEP modulation) are actually caused by annular or ring-like attention, or are merely the result of top-down activation based on object identity or features.


Simulating Emotion

I've written previously about the role of emotions as an organizing principle for guiding robotic behavior. Dr. Cynthia Breazeal and colleagues at MIT have taken a slightly different approach, using emotion as a way to interface robots with the human world. One perhaps unexpected benefit of this approach is that these expressed emotions allow the robot to elicit desired human behaviors.

Their "expressive anthropomorphic" robot is named Kismet (pictured above) and has been designed to show a continuum of emotional responses to specific stimuli. For example, when it encounters an object that could be seen as threatening, it may first respond with interest, but will soon turn away from the aversive object, as demonstrated by this video. In situations such as social dialogue, Kismet can respond quite fluidly and naturally to proto-linguistic cues.

The implicit rationale of much of this research is Hull's drive reduction theory, which states that imbalances in homeostasis cause states of psychological arousal. These states of arousal (emotions) then compel us to restore our homeostatic balance, thereby reducing our drives. As Dr. Breazeal put it, "Kismet actually evaluates all the incoming stimuli with respect to: Is this beneficial to me or not? Is it going to satiate a drive that I need to satiate or not?" The answers to these questions evoke responses in different emotional systems.
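This kind of drive-reduction appraisal can be sketched in a few lines. The function below is purely illustrative (the names and numbers are mine, not Kismet's actual implementation): each drive has a homeostatic set point, and a stimulus is valenced by how much it would move the drive back toward that point.

```python
def appraise(drive_level, set_point, stimulus_effect):
    """Return (valence, arousal) for a stimulus.

    drive_level / set_point: current and desired values of a drive
    stimulus_effect: predicted change to drive_level if the stimulus is engaged
    """
    error_before = abs(drive_level - set_point)
    error_after = abs(drive_level + stimulus_effect - set_point)
    arousal = error_before   # bigger homeostatic imbalance, stronger arousal
    valence = "positive" if error_after < error_before else "negative"
    return valence, arousal

# A "social" drive far below its set point: a social stimulus is appraised
# positively, with arousal proportional to the imbalance
print(appraise(drive_level=2.0, set_point=5.0, stimulus_effect=2.0))
# -> ('positive', 3.0)
```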

So how close might Kismet's emotions be to "the real thing" that humans experience every day? From the videos linked above: "The emotions that have been programmed into the robot have been modeled - computationally modeled - after what we know about human emotions. So I would say it's more of a simplification and a subset of what our full-fledged human emotions really are ... By following ideas and theories from psychology, from developmental psychology, and from evolution, from all of these studies of natural systems, and putting these theories into the robot, has the advantage of making the robot's behavior familiar because it's essentially lifelike. It parallels that of natural systems."

Biologically-inspired robotic systems often display eerie, atavistic reenactments of the behavior of those natural systems on which they were modeled. Likewise, it is hard to mistake Kismet's neonatal qualities - not just the size of its eyes, but in its animate qualities as well. These similarities are not merely superficial: in important ways, Kismet is like a child. Many of the traits which Kismet displays are based on, or analogous to, those seen in human children.

Roughly speaking, Kismet is based on a six part architecture: feature extraction, attention, perception, motivation, behavior and motor systems. In feature extraction, Kismet specifically tracks eyes and distinct variations in vocal affect, just as human infants do. The attentional stage tags regions of particular interest, such as those that are changing rapidly or those objects that are brightly colored relative to their background. Both attentionally-tagged items and features are then combined in the perceptual system, which binds this information as a percept that may or may not activate a releaser, each of which has a relationship to particular emotional responses. These interact with motivation and behavior subsystems, which establish homeostasis among a number of different drives as well as the behaviors which can be used to regulate homeostasis. Finally, a motor system carries out the behaviors with the help of motor skills, facial animation, expressive vocalization, or oculo-motor movements. Much more detailed information on these various subsystems is available here.
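The six-stage flow described above can be caricatured as a chain of functions. Everything below is an illustrative placeholder, not Kismet's real code; it only shows how features, attention, percepts, releasers, drives, and motor acts hang together:

```python
def extract_features(image, audio):
    # stage 1: e.g. eye-like regions, saturated colors, vocal-affect cues
    return {"color": image.get("saturation", 0), "motion": image.get("motion", 0)}

def attend(features):
    # stage 2: tag features that are salient (rapidly changing or bright)
    return {k: v for k, v in features.items() if v > 0.5}

def perceive(tagged):
    # stage 3: bind tagged features into a percept that may trip a releaser
    return {"releaser": "toy-present"} if tagged else {"releaser": None}

def motivate(percept, drives):
    # stages 4-5: releasers interact with homeostatic drives to pick a behavior
    if percept["releaser"] and drives.get("stimulation", 0) < 0.5:
        return "engage"
    return "idle"

def act(behavior):
    # stage 6: motor system realizes the behavior (expression, gaze, voice)
    return {"engage": "orient-toward", "idle": "scan"}[behavior]

scene = {"saturation": 0.9, "motion": 0.1}
behavior = motivate(perceive(attend(extract_features(scene, audio=None))),
                    drives={"stimulation": 0.2})
print(act(behavior))   # -> orient-toward
```

An under-stimulated robot facing a bright toy ends up orienting toward it; change the drive level and the same percept yields idle scanning, which is the homeostatic point of the architecture.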

Emotions are not typically considered mechanistic or functional - we tend to think of them as cognitive "byproducts," an evolutionary inheritance from ancestors capable only of feeling, and not of thinking. On the contrary, emotions may actually be an integral part of cognition. Below is a list of Kismet's emotions, and the functions each is thought to subserve; does this list accord with your subjective experience of these emotions?

  • Anger: to mobilize and sustain activity at high levels; low levels of anger (frustration) are useful when progress toward a goal is slow
  • Disgust: to create distance between one and an aversive stimulus
  • Fear: to motivate avoidance or escape responses
  • Joy: to broaden experience by encouraging social behavior and reward completion of a goal
  • Sorrow: to slow responses in cases of poor performance, so as to encourage reflection and behavior change
  • Surprise: to signal the occurrence of an unpredicted event, so as to improve future attempts at event prediction
  • Interest: to motivate exploration and learning, and reinforce selective attention
  • Boredom: similar to interest, except its purpose is to force an encounter with a new stimulus, which might then elicit interest

Certainly one can't ascribe intrinsic "functions" to emotions, but it is clear that emotional deficits cause changes in behavior - emotions must therefore have some behavioral consequence, which we may assume is evolutionarily advantageous. While it may not be possible to describe exactly what these consequences are in humans, it may be possible to test hypotheses about the "functions" of emotions by building autonomous robots, in order to answer precisely those questions that are impossible or unethical to test in humans.


Blogroll updates....

I finally got around to updating my blogroll; I point this out because I've been discovering a lot of really nice blogs lately.

Some of the newest additions are Zero Brane, Multipolarity Memes, and Al Fin. Be sure to check them all out - great stuff!

The Attentional Spotlight

Visual attention is often thought to move across the visual scene much like a flashlight at night; we may become aware of the objects in only the part of the scene that the "attentional spotlight" illuminates, while other parts of the scene remain below the threshold of awareness. Like most metaphors, however, this one highlights some features of attention at the expense of others; as it turns out, it's the qualities of attention that this metaphor obscures which may be most interesting.

The spotlight metaphor of attention accords with our subjective experience: as we move throughout our environment, we can feel our attention "focus" on particular objects, much like an adjustable flashlight might. We know that attention has limited bandwidth, such that you need to focus on relatively small parts of large objects/concepts in order to fully comprehend them. And attention would even seem to move linearly from one place to another - as we look around a room, our eyes generally do not dart from far left to far right but follow a more meandering path. Just as one does not turn off a flashlight and then turn it on again once the direction of its beam has changed, attention feels like a continuous phenomenon in time and space.

As is often the case with introspection, however, these intuitions are not entirely accurate. While it does seem to take longer to shift attention between two distant locations than between two proximate locations, this difference appears to be related to the visual system and not attention per se; in fact, attention moves 10 degrees across the fovea as quickly as it can move 2 degrees. So attention does appear capable of "darting around," from one object to another, without linearly traversing the space between locations.

Also unlike a flashlight, attention can actually be split; through "covert orienting," we can attend to locations other than those we are actually looking at. Nor is attention bound solely to spatial locations - when attending to a location with two superimposed images (for example, reflections in glass at night), subjects appear capable of attending to only one of these objects at a time, and are almost completely ignorant of what happens to the unattended image.

Finally, the flashlight metaphor obscures one of the most dynamic and "intelligent" functions of attention. Dozens of experiments demonstrate that viewers involuntarily process distractor stimuli; even when told to ignore certain images or words, imaging experiments show that we are unable to do so. However, there are conditions under which distractors are not processed, and it is these situations which highlight the kinds of mechanisms that must be driving the elusive concept we call attention.

As reviewed in Lavie's 2005 Trends in Cognitive Sciences paper, distractors are not processed under conditions of high perceptual load. Perceptual load can be varied by increasing the number of distracting stimuli in a visual scene, or by increasing the similarity of distractors to a given target; under high perceptual load, it is as if attentional bandwidth is so limited that it must focus tightly on only the task-relevant stimuli. In contrast, low perceptual load leaves excess bandwidth; it's not hard to imagine an evolutionary advantage for individuals who also monitor objects that may, but need not, turn out to be "distractors," and indeed the processing of task-irrelevant stimuli under these conditions appears to be mandatory.
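The load account can be stated almost as arithmetic: perception has a fixed capacity, targets consume it first, and whatever is left over "spills" onto distractors automatically. A toy version, with assumed numbers (load theory itself makes no quantitative commitment like this):

```python
CAPACITY = 1.0   # total perceptual capacity (arbitrary units)

def distractor_processing(perceptual_load):
    """Spare capacity that spills onto distractors.

    perceptual_load: fraction of capacity consumed by the task (0 to 1+)
    """
    return max(0.0, CAPACITY - perceptual_load)

print(distractor_processing(0.3))   # low load: distractors get processed
print(distractor_processing(1.0))   # high load: no spare capacity, none processed
```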

The opposite effect emerges under high load on working memory or "cognitive control." When working memory is highly loaded with task-irrelevant information, distractors are processed more than under conditions of low load. This suggests that some selection mechanism depends on cognitive control functions that are either identical to, or shared with, working memory functions. Notably, highly loading working memory has minimal effect on visual search efficiency; it appears to specifically impair the ability to selectively process a visual scene by ignoring distractors.

Given the differential effects of these two kinds of load on distractor processing, it appears that attention is not a unitary phenomenon but at least a two-stage process. Low-level, bandwidth-limited mechanisms govern the basic visual processing of incoming information (probably on the basis of top-down guidance) and provide a kind of "buffer" of information available for subsequent processing; higher-level systems (such as those engaged by working memory tasks) then selectively govern what information is relayed to more frontal areas for further processing.
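A toy rendering of this two-stage account (purely illustrative; neither stage corresponds to a specific model in Lavie's paper): stage one fills a limited buffer, with high perceptual load shrinking what gets in; stage two gates the buffer's contents using cognitive control, and is impaired by working memory load.

```python
def stage1_buffer(stimuli, perceptual_load, capacity=4):
    # Under high perceptual load, fewer items make it into the buffer at all
    n = max(1, round(capacity * (1 - perceptual_load)))
    return stimuli[:n]

def stage2_select(buffer, targets, wm_load):
    # With working memory loaded, control is weak and distractors leak through
    if wm_load > 0.5:
        return list(buffer)                     # poor gating: everything relayed
    return [s for s in buffer if s in targets]  # good gating: targets only

buffer = stage1_buffer(["T", "d1", "d2", "d3"], perceptual_load=0.0)
print(stage2_select(buffer, targets={"T"}, wm_load=0.2))  # targets only
print(stage2_select(buffer, targets={"T"}, wm_load=0.9))  # distractors leak
```

The two knobs dissociate exactly as in the data: perceptual load controls what enters the buffer, while working memory load controls how well its contents are filtered.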

Related Posts:
Perceptual Sampling: The Wagon Wheel Illusion
Mind's Eye: Modeling the Attentional Blink
Selection Efficiency and Inhibition


Models of Dopamine in Prefrontal Cortex

George Chadderdon and Olaf Sporns have recently published a large scale neurocomputational model of task-oriented behavior selection, including such disparate brain regions as early visual areas, inferotemporal cortex, prefrontal cortex, basal ganglia, and anterior cingulate cortex. At the heart of this new model is a mechanism that simulates exogenously induced changes in prefrontal dopamine release, which is thought to underlie the updating and maintenance functions of working memory.

The model is meant to simulate the selection of behaviors in the delayed match/nonmatch-to-sample task. In this task, a sample stimulus is displayed and then usually disappears (although there are versions in which it remains visible). After a delay, from one to three stimuli appear, and the subject must identify either the matching stimulus (in the match-to-sample version) or a non-matching stimulus (in the nonmatch-to-sample version).

During a trial, a working memory "task" layer fires to indicate whether the current trial is a match-to-sample (DMS), nonmatch-to-sample (DNMS), or idle task. Given the scale of this model, each unit has been designed to simulate the action of a cortical column, in which excitatory and inhibitory neurons implement feedforward, feedback, and lateral inhibition. Three layers of prefrontal cortex represent different aspects of the task: PFC(s) represents the current stimulus, PFC(d) represents the remembered stimulus, and representations in PFC(m) show the degree of match between PFC(d) and PFC(s). PFC(s) activation is projected to PFC(d) when the task units are firing for a non-idle task. Both DMS and DNMS tasks cause dopamine release within PFC(d) (modeled as a proportional increase in the gain of the excitatory sigmoid activation function), which solidifies a persistent representation of the current stimulus until a response is required; units representing the anterior cingulate cortex suppress responses during the cue period.
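The dopamine mechanism reduces to a one-line change in the activation function: dopamine multiplicatively scales the sigmoid's gain, sharpening a unit's response so a weak trace is pushed toward a stable, all-or-none state. A minimal sketch with illustrative parameter values (the paper's actual equations and constants may differ):

```python
import math

def sigmoid(net_input, gain=1.0, threshold=0.0):
    """Standard logistic activation with adjustable slope (gain)."""
    return 1.0 / (1.0 + math.exp(-gain * (net_input - threshold)))

def pfc_activation(net_input, dopamine, baseline_gain=1.0):
    """Dopamine proportionally increases the gain of the excitatory sigmoid."""
    return sigmoid(net_input, gain=baseline_gain * (1.0 + dopamine))

# The same weak input produces a shallow response at low dopamine, but a
# sharper, near-saturated response at high dopamine, helping the
# representation persist across the delay period
weak_input = 0.5
low_da = pfc_activation(weak_input, dopamine=0.0)
high_da = pfc_activation(weak_input, dopamine=4.0)
```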

The model captures many of the qualitative aspects of dopamine/PFC interaction. First, PFC(d) units become dysfunctional when extracellular dopamine is either too low or too high, consistent with empirical evidence. Second, the model implements one way that PFC may intrinsically regulate dopamine levels for optimal function, and predicts task-relevant fluctuations in tonic DA release. However, several things remain unaddressed, including the precise role of phasic DA release from the VTA in updating or maintaining working memory representations, and its interactions with intrinsic PFC dopamine regulation.

The authors conclude that future work "will involve the implementation of the present model as part of the control architecture of an autonomous robot" which could provide a unique opportunity to assess the role of dopamine concentration on behavior given the difficulty of monitoring realtime dopamine fluctuations in behaving animals.

Related Posts:
Emotional Robotics (and dopamine fluctuations)
Selection Efficiency and Inhibition


Perceptual Sampling: The Wagon Wheel Illusion

Ever noticed how wheels or hubcaps can appear to rotate backwards, either in movies, on TV, or even on the highway? Illusory motion reversal is one of the least understood visual illusions, but an ongoing scientific debate asks whether it may actually reflect the temporal resolution of consciousness. If you haven't experienced this illusion - often known as the "wagon wheel illusion" - try it here before reading further.

Up until recently this illusion was thought to be purely a result of subsampling through stroboscopic illumination. And yet, many report experiencing this effect in broad daylight - under continuous illumination. This led Van Rullen, Reddy & Koch to wonder whether the illusion could actually reflect some kind of "discrete perceptual sampling": if the rate of apparent backwards motion in a movie is related to the movie's frame rate, perhaps the rate of apparent backwards motion under continuous light would reflect the temporal resolution of some aspect of cognition.

Interestingly, the authors found that the rate of perceived motion reversal for alternating sunburst patterns was highest at alternation rates between 10 and 20 Hz. This suggests that some aspect of the visual system samples the perceptual stream around 10-20 times per second. But the story doesn't end there: the frequency and duration of motion reversals depended on focused attention, such that accuracy in reporting the direction of motion at 10 Hz was actually worse under directed attention. This is apparently one of the very few known cases in which attention decreases accuracy in visual performance.
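The stroboscopic subsampling account, the baseline against which the continuous-light result is so surprising, can be simulated directly. If the visual system (or a movie camera) takes discrete snapshots and matches each spoke to its nearest neighbor on the next frame, apparent motion reverses once spokes pass faster than half the sampling rate. The 13 Hz sample rate and 8-spoke wheel below are illustrative assumptions:

```python
def perceived_step(rotation_hz, sample_hz, n_spokes=8):
    """Apparent angular step per frame under nearest-neighbor spoke matching.

    The true rotation per frame is wrapped into plus-or-minus half the
    spoke spacing, which is the aliasing that produces reversed motion.
    """
    spoke_angle = 360.0 / n_spokes
    step = (rotation_hz / sample_hz) * 360.0    # true degrees per frame
    return (step + spoke_angle / 2) % spoke_angle - spoke_angle / 2

# Sampling at 13 Hz: slow rotation looks forward...
print(perceived_step(0.5, 13.0))    # positive step: forward
# ...but once spokes pass faster than half the sample rate (above about
# 0.81 rev/s for 8 spokes), the nearest-neighbor match runs backward
print(perceived_step(1.0, 13.0))    # negative step: backward
```

If perception under continuous light really does sample at 10-20 Hz, the same arithmetic predicts reversals at the rotation speeds where subjects report them.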

It's not clear exactly what neural mechanism would produce this sampling frequency, because the conventional frequency bands of cortical oscillation do not lie within this range. Some propose that a specific type of motion detector could create the effect; others counter that such "Reichardt" detectors would be unlikely to show the same temporal sensitivity, and even if they could, there is no way to confirm that they do, since no Reichardt detectors have been discovered in the mammalian visual system. Directed attention affects alpha, beta, and gamma band frequencies, but the mechanisms by which these could interact with motion perception are unknown.

I've written previously about cortical oscillations in the context of visual attention, but there are still a few more relevant articles to summarize in the next few days before it's time for (public) hypotheses about the possible relationships between measures of perceptual sampling, measures of processing speed, and visual working memory span. So watch this space ...

Related Posts:
Active Maintenance and the Visual Refresh Rate
Neural Oscillations and the Mozart Effect
Anticipation and Synchronization