Disentangling Two Debates: Domain-Specificity vs Nativism

Two fundamental debates dominate the field of language and psycholinguistics: the first concerns whether language processing arises from innate mechanisms or learned processes, and the second concerns whether language relies on domain-general or domain-specific systems. These questions are not completely independent of one another, neither in the way they have been approached in the literature nor in terms of the evidence that bears on them.

However, these questions can be viewed as distinct in at least three ways. First, domain-general processes can also be innately guided, although this possibility is seldom represented in the literature. Second, language-specific mechanisms need not be innately specified, a view that is similarly underrepresented. These two points demonstrate that the debates are theoretically orthogonal, regardless of whether they are commonly presented that way. Finally, evidence interpreted in the context of one debate does not necessarily bear on the other, although in several cases this is either wrongly claimed or strongly implied.

Both nativists and empiricists confound the question of innateness with that of domain-specificity. For example, some adamant empiricists claim to oppose “a broader view that human cognitive mechanisms are symbolic, modular, innate and domain-specific” (McClelland & Patterson, 2002), while those who emphasize the “innate aspect of grammatical knowledge” (Lidz & Gleitman, 2004) contrast their view with those that espouse “domain-general pragmatic constraints.” In both cases, the debate is multifaceted – it combines the issues of domain-generality and those of nativism.

As one might expect based on these excerpts, it is frequently assumed that language-specific mechanisms are what nativists believe is innate, and conversely that domain-general mechanisms are what empiricists believe is developed with experience. But to demonstrate that issues related to nativism and those related to domain-specificity are in fact distinct, it is necessary to substantiate the plausibility of two claims, covered separately in this week's upcoming posts.

Wednesday Nov 1st: Some domain-general mechanisms need not be learned
Thursday Nov 2nd: Some domain-specific mechanisms need not be innate
Friday Nov 3rd: Dissociations from data and conclusions


Lidz, J., & Gleitman, L. R. (2004). Argument structure and the child's contribution to language learning. Trends in Cognitive Sciences, 8(4), 157-161.

McClelland, J. L., & Patterson, K. (2002). Rules or connections in past-tense inflections: What does the evidence rule out? Trends in Cognitive Sciences, 6(11), 465-472.

Submit for the Synapse, Issue 11!

The next edition of The Synapse will be hosted here on November 12th. Submit early (the.synapse.carnival {AT} gmail)! I'll accept submissions up to 9pm the night before.

Check out the latest issue at the Neurocritic, while you're at it.


Shared Mechanisms for Long and Short-term Memory: An Episodic Buffer?

In the new issue of Trends in Cognitive Sciences, Hasselmo & Stern explore the possible mechanisms involved in the short-term maintenance of novel information.

Many such "short-term maintenance" tasks involve familiar stimuli, such as digits in the case of digit span. This has led some to propose that most measurements of working memory are artificially inflated, since these tasks rely on stimuli that may also be represented in long-term memory. The implication is that novel stimuli may eliminate the role of long-term memory, and thus result in a purer measurement of maintenance capacity.

The nearly canonical perspective is that this short-term maintenance is enabled by sustained firing in the prefrontal cortex - regardless of whether that sustained activity actually represents the maintained information, or merely acts as a kind of "pointer" to where that information is actually represented in cortex. In both cases, prefrontal activity is assumed to convey an excitatory bias on more posterior regions.

In their article, Hasselmo & Stern take a radically different view. They suggest that while prefrontal cortex may act as a "pointer" for familiar stimuli (which may actually reside in distributed representations across sensory cortex), the prefrontal cortex is only partly responsible for maintenance of novel information. In this case, they suggest that medial temporal lobe structures (MTL; in particular the entorhinal and perirhinal cortices) may play a large role in working memory for novel information, because of those structures' ability to rapidly encode and almost arbitrarily bind various sensory representations.

According to this perspective, damage to the MTL should cause a larger performance decrement for tasks with novel stimuli than with familiar stimuli - and this is exactly what is found in lesion studies. PFC is sufficient for normal performance on digit span (even in the absence of the MTL), while it is not sufficient for normal performance on delayed match-to-sample and nonmatch-to-sample tasks with novel stimuli (tasks in which one views a stimulus and, after a delay, must pick either an identical or a non-identical stimulus). Likewise, fMRI studies show differential activity in the MTL for WM tasks involving novel and familiar stimuli.
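For readers unfamiliar with these paradigms, the trial logic is simple to sketch (stimulus names and function names here are placeholders of my own, not from the paper):

```python
import random

def dms_trial(stimuli, rule="match"):
    """One delayed (non)match-to-sample trial: view a sample, wait
    through a delay, then choose between the sample and a foil.
    Under the "match" rule the sample is correct; under "nonmatch",
    the foil is correct."""
    sample, foil = random.sample(stimuli, 2)
    choices = [sample, foil]
    random.shuffle(choices)          # choice order is randomized
    correct = sample if rule == "match" else foil
    return sample, choices, correct

sample, choices, correct = dms_trial(
    ["novel_img_1", "novel_img_2", "novel_img_3"], rule="nonmatch")
```

The lesion findings suggest an intact MTL matters most when the `stimuli` list contains items with no pre-existing long-term memory representation.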

The authors point to fascinating aspects of cellular neurophysiology in the MTL to support this hypothesis. For example, cells in the parahippocampal cortices are able to maintain sustained firing for periods of up to 15 minutes in complete isolation - in other words, this activity is not a result of excitation from other neurons or even self-excitatory recurrent connections. This bizarre phenomenon appears to be enabled by something they call the "Alonso current" - also known as afterdepolarization or plateau potential - thought to result from a calcium-sensitive cation current that relies on acetylcholine. Experiments with scopolamine (an acetylcholine antagonist) suggest that acetylcholine activity is necessary for this self-sustained activity.
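The logic of this single-cell persistence can be caricatured in a few lines: a leaky unit with a calcium-sensitive cation current that is only available when cholinergic modulation is "on." Everything here is an illustrative toy of my own, not a biophysical model, and all parameters are invented:

```python
def simulate(ach_on=True, steps=500, dt=1.0):
    """Caricature of ADP-driven persistent firing. A leaky unit gets a
    brief input pulse; each spike admits calcium, and a calcium-sensitive
    cation current (available only when ACh is 'on') re-depolarizes the
    cell. With ACh blocked (as by scopolamine), firing dies with the input."""
    v, ca, spikes = 0.0, 0.0, []
    for t in range(steps):
        inp = 1.5 if t < 50 else 0.0            # brief depolarizing pulse
        i_adp = 0.8 * ca if ach_on else 0.0     # ACh-gated cation current
        v += dt * (-0.1 * v + inp + i_adp)      # leaky integration
        if v > 1.0:                             # threshold crossing = spike
            spikes.append(t)
            ca += 0.5                           # spike admits calcium
            v = 0.2                             # reset
        ca *= 0.99                              # slow calcium decay
    return spikes

with_ach = simulate(ach_on=True)      # firing persists after the pulse
without_ach = simulate(ach_on=False)  # firing stops when the pulse ends
```

The point of the toy is only that persistence requires no recurrent input from other cells: the depolarizing drive after the pulse comes entirely from the cell's own calcium-gated current.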

Interestingly, delay activity in the entorhinal but not the perirhinal cortex is resistant to distractors. Some entorhinal neurons also demonstrate "match enhancement," in which a subsequent presentation of a maintained item increases the rate of firing, while others show "repetition suppression," where the opposite trend occurs.

The authors conclude that MTL delay-period activity could underlie Baddeley's recent proposal for a third component of working memory: the episodic buffer. The episodic buffer is presumed to tie together arbitrary events, such as might be involved in binding tasks. This proposal has been met with skepticism, largely because it seemed very unlike the other components of Baddeley's original working memory model (which included the phonological loop and a visuo-spatial store). However, if Hasselmo & Stern are right, the episodic buffer may soon become a more widely-accepted aspect of working memory.

Related Posts:
Working Memory Capacity: 7 +/- 2, around 4, or ... only 1?
EEG Signatures of Successful Memory Encoding
Working Memory and Convulsions
Valid Dimensions of Memory: Strength, Endurance and Capacity?
Multiple Capacity Limitations for Visual Working Memory


Neural Development of Rule Use and Flexibility

What brain regions support the development of rule use in childhood? Crone et al. attempt to answer this question in this new article from the Journal of Neuroscience.

The article begins with a quick review of the literature suggesting dissociable developmental trajectories for simple rule use and for the ability to switch between those rules. For example, previous research by the lead author showed that neural activity related to children's active maintenance of rules does not show adult patterns until adolescence, whereas activity related to rule switching is adult-like by 12 years of age.

The authors predicted that in a task involving two levels of rule difficulty, ventrolateral prefrontal cortex (VLPFC) would show increased activity for bivalent relative to univalent rules. In contrast, the pre-supplementary motor area (pSMA) was predicted to show activity during rule switches. These hypotheses were tested on 62 subjects from three age groups - 8-12 year-olds, 13-17 year-olds, and 18-25 year-olds - with the following task. Subjects had to respond with a left-arrow key press to a picture of a tree, but a right-arrow key press to a picture of a house, if they had just been cued with a picture of a triangle. If they had been cued with a picture of a circle, these mappings were reversed. Finally, if cued with a bidirectional arrow, the following pictures would be either a flower or a car, requiring left and right key presses respectively. IQ scores were also measured for each participant with Raven's matrices. An ROI analysis including parietal cortex, pSMA, and VLPFC was conducted on the fMRI data.
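The cue-stimulus-response contingencies are easy to misread in prose; here is one way to encode them (a sketch - the dictionary structure and names are mine, not from the paper):

```python
# Cue-dependent stimulus-response mappings from the task description.
RULES = {
    "triangle": {"tree": "left", "house": "right"},   # bivalent mapping
    "circle":   {"tree": "right", "house": "left"},   # bivalent (reversed)
    "arrow":    {"flower": "left", "car": "right"},   # univalent mapping
}

def respond(cue, picture):
    """Return the correct key press given the preceding cue."""
    return RULES[cue][picture]
```

Tree/house pictures are bivalent because the correct response depends on which cue preceded them; flower/car pictures are univalent because each maps to a single response regardless of context.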

Despite the varying ages of the groups, all participants were given the same amount of training on the task prior to scanning. In retrospect, it would have been better to train participants to a certain performance criterion, since you cannot be sure that age differences do not merely represent differences in "practice efficacy" unless you equate participants for performance before the task begins.

The results from a blocked version of the task showed that subjects were less accurate for the bivalent rules than for univalent rules, and that the two youngest groups did not differ from each other in terms of their relative accuracy on bivalent and univalent rules (not surprisingly, the difference in accuracy between bivalent and univalent rules was lower for the oldest age group). Similarly, reaction times were much slower on bivalent trial types in general, and the two lowest age-groups did not significantly differ in their RTs on such trials.

In mixed blocks, results were somewhat different. Although the 13-17 year-olds did not significantly differ in their accuracy switch costs for bivalent trials from either the younger or the older age-group, the authors interpret this result to show that "switching ability" reaches adult levels of performance at the beginning of adolescence. Bizarrely, age differences in switching were only reflected in terms of accuracy - RTs did not significantly differ between the groups. Equally bizarrely, there were no differences between rule-repeat and rule-switch trials.

The fMRI data showed increased bilateral pSMA activity for bivalent relative to univalent rules, but there were no age differences in this result; also, this comparison did not turn up activation differences in parietal or VLPFC ROIs.

The authors interpret these results to suggest that children experience more difficulty on bivalent rules relative to univalent rules than do older age groups, as reflected by widely-spread increases in neural activity. Such increases were prominent among the older age groups only in mixed blocks. Unfortunately, it is difficult to take more away from this study than "people of different ages use their brains differently," which is not surprising, given that brains differ between people of different ages!


Inhibition From Excitation: Reconciling Directed Inhibition with Cortical Structure

A frequent interpretation of Stroop tasks is that successful performance (i.e., ink color naming) requires inhibition of the prepotent response (i.e., word reading). However, this interpretation is frequently criticized given the relative lack of long-range inhibitory projections in the neocortex. Instead, inhibition seems to be mostly local, occurring within cortical microcircuits. How then can we reconcile the intuitively appealing idea that Stroop requires inhibition with known features of cortical structure? This is the question answered in a recent JoCN paper by Herd, Banich, and O'Reilly.

One theoretical perspective on cognitive control (such as that involved in Stroop) suggests that top-down excitatory signals can suffice to override prepotent responses - thus the bias to read the word in a Stroop task is overcome by a strong signal favoring "color perception" and "color naming" processes, which then inhibit competing representations through local lateral inhibition.

However, this proposal requires that "color" processes directly compete with word processing - although these are thought to have distinct neural substrates. fMRI data have shown that incongruent Stroop trials (e.g., the word "red" printed in blue ink) produce more activity in regions processing the to-be-ignored stimulus feature than do neutral trials, which might also be interpreted as supporting "directed inhibition." This is a straightforward but problematic interpretation, given that relatively few long-range inhibitory connections exist in cortex.

The computational model presented by Herd et al. can explain these results without invoking the concept of directed inhibition. In short, it accomplishes this because a "color" task set is constantly active during both color-naming and word-reading trial types. This causes activity throughout the network to be increased on incongruent relative to neutral trials as a result of increased competition. In this case, increased activity reflects increased competition - not "directed inhibition" as might be presumed. The same competition is reflected in the network's slower settling times on incongruent relative to neutral trials (analogous to slower reaction times in humans).

The model architecture consists of 2 input layers (colors and words, with 3 units each), which send activation to color and word hidden layers, respectively (each with 3 units). These are bidirectionally connected with each other (excitatory connections only), as well as to the output layer (consisting of 3 units). Finally, a PFC layer (3 units, representing the "ink color naming," "word reading," and "color" task sets) sends top-down excitatory biasing signals to both hidden layers. Each hidden layer, as well as the output, implements local inhibition to create competition for representation. The network was randomly initialized and then trained with Hebbian learning on word-reading and color-naming tasks, with 1.67 times more word-reading than color-naming trials (to roughly mirror human experience with these tasks and instill a prepotent bias for word-reading). The model was then run on the Stroop task, with both neutral and incongruent trials presented as input, and activity in the prefrontal layer clamped to represent the current task (1.0 for color-naming on incongruent trials, .85 for color-naming on congruent trials or for word-reading, with the general "color" task-set unit set to .5 throughout).
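To make the competition dynamic concrete, here is a deliberately tiny settling loop in the same spirit - not the authors' actual Leabra model, and every parameter is invented - showing how a top-down color bias plus pooled local inhibition yields slower settling (the reaction-time analogue) on incongruent trials:

```python
import numpy as np

def settle(color_in, word_in, max_cycles=200):
    """Toy 3-unit Stroop settling loop (not the authors' model).
    A fixed top-down task bias strengthens the color pathway; the
    word pathway has a stronger weight (prepotent); pooled inhibition
    at the output creates competition among responses."""
    out = np.zeros(3)
    task_bias = 0.3                        # PFC bias on the color pathway
    for cycle in range(1, max_cycles + 1):
        drive = (0.5 + task_bias) * color_in + 0.7 * word_in
        net = drive - 0.6 * out.sum()      # pooled lateral inhibition
        out = np.clip(out + 0.1 * net, 0, 1)
        top_two = np.sort(out)[-2:]        # settled = one clear winner
        if out.max() > 0.9 and (top_two[1] - top_two[0]) > 0.4:
            return cycle, int(out.argmax())
    return max_cycles, int(out.argmax())

red, blue = np.array([1., 0., 0.]), np.array([0., 1., 0.])
neutral_cycles, _ = settle(color_in=red, word_in=np.zeros(3))
incong_cycles, winner = settle(color_in=red, word_in=blue)
```

On incongruent trials the word pathway drives a competing output unit, so pooled inhibition rises and the winning (ink-color) unit takes more cycles to separate from the pack - slower settling without any directed inhibitory projection.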

The network demonstrated a nice fit to human behavioral data, although very few parameters were modified from their standard values. Several further explorations showed that removing the bidirectional excitatory projections between the color and word processing hidden layers resulted in 0% performance on incongruent trials - showing that, in effect, these excitatory connections were paradoxically helping the network to inhibit the prepotent response.

This result has profound implications for the way cognitive control is discussed in the broader sense. As the authors note, it is potentially confusing to refer to "inhibition" as a construct if it is actually accomplished through excitation that is itself supported by task-relevant representations in PFC. This collapses the distinction between supposed "inhibitory" processes and those of active maintenance, and brings "directed inhibition" accounts into harmony with known characteristics of inhibitory interneurons in cortex.


Generalization and Symbolic Processing in Neural Networks

Cognitive modeling with neural networks is sometimes criticized for failing to show generalization. That is, neural networks are thought to be extremely dependent on their training (which is particularly true if they are "overtrained" on the input training set). Furthermore, they do not explicitly perform any "symbolic" processing, which some believe to be very important for abstract thinking involved in reasoning, mathematics, and even language.

However, recent advances in neural network modeling have rendered this criticism largely obsolete. In this article from the Proceedings of the National Academy, Rougier et al. demonstrate how a specific network architecture - modeled loosely on what is known about dopaminergic projections from the ventral tegmental area and the basal ganglia to prefrontal cortex - can capture both generalization and symbol-like processing, simply by incorporating biologically-plausible simulations of neural computation. These include the use of distributed representations, self-organizing and error-driven learning (equivalent to contrastive Hebbian learning), reinforcement learning (via the temporal differences algorithm), lateral inhibition (via k-winners-take-all), and a biologically-plausible activation function based on known properties of ionic diffusion across the neuronal membrane (via Leabra's point-neuron activation function).
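Of these mechanisms, k-winners-take-all is perhaps the easiest to sketch. Below is a crude stand-in of my own (Leabra's actual kWTA computes an inhibitory conductance placed between the k-th and (k+1)-th most excited units; this version simply zeroes the losers):

```python
import numpy as np

def kwta(activations, k):
    """k-winners-take-all: a simple stand-in for lateral inhibition.
    Keeps the k most active units and silences the rest."""
    act = np.asarray(activations, dtype=float)
    thresh = np.sort(act)[-k]                 # activity of the k-th winner
    return np.where(act >= thresh, act, 0.0)  # losers are zeroed

winners = kwta([0.9, 0.1, 0.5, 0.7], k=2)     # only the two strongest survive
```

The effect is sparse, competitive representation: no matter how strong the input, at most k units stay active in a layer.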

The Rougier et al. network consists of 10 layers: 4 task input layers, all of which project to either a standard hidden layer (meant to represent sensory cortex; 83 units), a task-hidden layer, or a cue-hidden layer (again, meant to represent posterior cortical areas; 16 units each). These in turn are bidirectionally connected to a PFC layer (30 units), which includes recurrent self-excitatory projections. Both this layer and the sensory-cortex hidden layer are bidirectionally connected with the output "response" layer. Finally, an "adaptive gating" unit projects to the PFC and learns about reward contingencies across time (which partially addresses a lurking criticism that many neural network models are "temporally static" in comparison to approaches like liquid state machines).

Simulations within this framework began by training the network on 4 simple tasks. One task required the network to produce output corresponding to only one of the dimensions present in the input (which varied in terms of shape, color, size, location, and texture). Another task required the network to match two stimuli along a single dimension, or to compare their relative values along a certain dimension. Rougier et al. note that one critical feature is shared by all of these tasks: each requires the network to attend to only a single feature dimension at a time.

After this training, the prefrontal layer had developed peculiar sensitivities to the output. In particular, it had developed abstract representations of feature dimensions, such that each unit in the PFC seemed to code for an entire set of stimulus dimensions, such as "shape," or "color." This is the first time (to my knowledge) that such abstract, symbol-like representations have been observed to self-organize within a neural network.

Furthermore, this network also showed powerful generalization ability. If the network was provided with novel stimuli after training - i.e., stimuli with particular conjunctions of features that had not been part of the training set - it could nonetheless deal with them correctly. This demonstrates that the network had learned sufficiently abstract, rule-like knowledge about the tasks to behave appropriately in a novel situation. Further explorations with reduced versions of the network confirmed that the "whole enchilada" was necessary for this performance: networks without the adaptive gating unit, or networks in which the PFC layer was converted into a simple context layer (making the model equivalent to a simple recurrent network, or SRN), did not demonstrate equivalent generalization.
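The generalization test can be pictured as a held-out-conjunction design. A sketch with hypothetical feature values of my own (the paper used five stimulus dimensions; two suffice to show the idea):

```python
from itertools import product

shapes = ["circle", "square", "triangle"]
colors = ["red", "green", "blue"]
all_stimuli = set(product(shapes, colors))

# Hold out whole conjunctions from training: the network never sees a
# red triangle, though it sees "red" and "triangle" in other stimuli.
held_out = {("triangle", "red"), ("square", "blue")}
training = all_stimuli - held_out

# Every individual feature value still appears during training...
assert all(any(s == shape for s, c in training) for shape in shapes)
assert all(any(c == color for s, c in training) for color in colors)
# ...so correct behavior on held-out items requires abstract,
# dimension-level knowledge rather than memorized conjunctions.
assert held_out.isdisjoint(training)
```

Success on the `held_out` set is what licenses the claim that the network learned something rule-like about dimensions rather than a lookup table of trained stimuli.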

Rougier et al. also explored the effects of damaging the prefrontal layer and administering tests like the Stroop and Wisconsin Card Sort tasks to the network. In both cases, the network exhibited disproportionate increases in perseverative errors, just as seen in patients with prefrontal damage.

Thus, this computational model demonstrates that neural networks need not be temporally static, nor deficient in generalization, nor stimulus-specific. Instead, they can demonstrate sensitivity to reward contingencies over time, can develop stimulus-general rule-like representations, and produce output based on those rules in response to novel stimuli. Interestingly, the development of these more abstract representations is not dependent on increasing the recurrent connectivity within PFC, as it is in other simulations of the development of such abstract representations. Instead, these representations develop as a consequence of learning and stability within sensory cortex, providing one possible reason that prefrontal development in humans is so prolonged.

Related Posts:
Reexamining Hebbian Learning
Towards A Mechanistic Account of Critical Periods
Neural Network Models of the Hippocampus
Inhibitory Oscillations and Retrieval Induced Forgetting
Binding through Synchrony: Proof from Developmental Robotics
Task Switching in Prefrontal Cortex
Modeling Neurogenesis
The Mind's Eye: Models of the Attentional Blink


Review: I of the Vortex

What is the "self" in neural terms? Few would be bold enough to claim an answer to that question. Yet in "I of the Vortex: From Neurons to Self," Rodolfo Llinas sketches a very compelling picture of how the self, consciousness, and intelligence may arise in the brain.

Essentially, Llinas's argument goes as follows. First, brains are really only found in animals that move (so, obviously, plants do not have brains). In fact, at least one animal - the sea squirt - actually devours its own brain once it no longer needs to move. Although simple movements might be caused by oscillatory pattern generators in the spinal cord, the brain is necessary for more complex, sensory-guided movement. Why should this be so?

The answer Llinas provides is prediction, or in other words, a sensorimotor internal model of the world based on "dt lookahead" functions, interfacing the motor and sensory systems. Synchronized oscillations from the cerebellum (Llinas's area of expertise) carry out the motor side of this computation, giving rise to the characteristic 8-12 Hz periodicity of the neural signals that command voluntary movements. At a higher frequency (40 Hz), other neuronal oscillations throughout the thalamocortical system serve to bind sensory representations together. And the subjective, cognitive correlate of the intersection of these oscillations is no less than the self: "this temporally coherent event that binds, in the time domain, the fractured components of external and internal reality into a single construct is what we call the 'self.'"

But wait, doesn't that mean that all animals have a sense of "self" - even the lowly sea squirt (at least before it eats its brain)? It would seem so. But that's not the end of Llinas's more controversial claims. Llinas also suggests that neural networks explain "very little concerning the actual functioning of the nervous system itself," advocating instead the idea that most of our cognitive abilities are genetically prewired at birth. Along these lines, Llinas endorses Chomsky's idea that genes may to a large extent determine language, and furthermore that language exists in many species besides Homo sapiens.

It is here that "I of the Vortex" starts to seem more like a manifesto than a careful scientific analysis. For example, after introducing the basics of neurophysiology and comparative neurology in the first half of the book, Llinas skips the cognitive level of analysis almost altogether and starts extrapolating directly to issues of consciousness, awareness, and selfhood. This bias against direct investigations of cognition (something arguably very important for understanding consciousness) is nowhere more apparent than when he refers to cognitive neuroscience as "neophrenology." But without this important middle-level of analysis, Llinas is mostly shooting from the hip in the second half of the book - and aiming for concepts that are simply too far removed from Llinas's expertise in cellular neurophysiology.

On the whole, Llinas has done an admirable job of outlining one particular view of how neuronal dynamics may give rise to consciousness in an embodied cognition framework. In this sense, "I of the Vortex" makes an excellent companion to books such as Jeff Hawkins' "On Intelligence," and to Steven Rose's "The Future of the Brain," both of which come to similar conclusions but based on very different assumptions, biases, and areas of expertise.

Other Book Review Posts at Developing Intelligence:
Review: Darwin Among the Machines
Review: The Three Pound Enigma
Review: Everything Bad Is Good For You
Review: The Future of the Brain


Blogging on the Brain: 10/15 - 10/20

Recent highlights from the cog/neuro blogosphere:

Distinct roles for ACC and PFC in goal and task-relevant retrieval, respectively?

A fantastic blog that's new to me

Dual-route models of recollection & familiarity are vindicated by new fMRI evidence

Videos of lectures on everything from "Viruses as Nanomachines" to "Chaos and Fractals: Predicting the Unpredictable"

More on the astrocyte hypothesis, from November's issue of PLOS Biology

Video Game Violence: Can they act as an outlet for violent people?

The Washington Post has an article on mental gymnasiums - with quotes from established academics on the science behind the brain fitness movement. (Also covered here)

Sharp Brains covers the fact and the fiction of brain-based education

Arthur Jensen has a new book: Clocking the Mind

Have a nice weekend!


Interactions of Culture and Linguistic Relativism

The language of the Piraha, an Amazonian tribe of around 200 hunter-gatherers, does not include any words for numbers (in fact, they lack words for many things), and despite intense instruction, no one has been able to teach the adult Piraha to count. This has been interpreted by some as evidence that language profoundly constrains thought - a strong version of the Whorfian hypothesis.

However, there is a theoretical divide in the study of the Piraha. Peter Gordon, author of a great article about the Piraha in Science, believes that the Piraha have numbers for "one" and "two," but not for any numbers greater than that. However, Daniel Everett - who spent seven years with the Piraha - suggests these words actually refer to "very small" and "small," or other very relative indications of size.

The rift goes deeper than that, however. According to a recent interview in Scientific American Mind, Gordon and Everett disagree on the fundamental question of why the Piraha have no (or very few) number words. Whereas Gordon suggests the Piraha may be "cognitively incapable" of counting, Everett believes that the Piraha have a kind of moral objection to the idea of counting.

In the interview, Everett musters several pieces of evidence to support this view. First, Piraha children can learn to count culturally-relevant items - beads - which shows that the Piraha as a whole are not cognitively incapable of counting (although it's also possible that the cognitive capacity for counting is transient unless reinforced by language).

Second, there are several indications that the Piraha may simply lack the desire to count. For example, Piraha refuse to learn the national language of Brazil. Everett notes that a young boy who did learn to count was actually shunned by other tribe members.

Finally, other cultural facts about the Piraha that are non-specific to counting suggest a radically different view of their situation. The tribe does not have currency, has no art, and no more than 10 consonants or vowels. Everett also notes in the SciAm:Mind interview that Piraha "grammar" does not have embedded clauses - thought by some to be a defining feature of human communication systems. Although this is politically incorrect, it's clearly possible that the Piraha are severely inbred, and may manifest cognitive deficits as a result (although Dan Everett has said that "Pirahã women occasionally have children with Brazilian traders passing through, children raised as Pirahãs. These children don't show any difference I can see from other Pirahãs on these cognitive skills or language facts. I don't think genes, retardation, or other such suggestions are useful or appropriate here.") But as long as mental retardation is a possibility, it seems premature to make conclusions about the innateness of number based on Piraha behavior and language.

EDIT: Predictably, I've taken heat for suggesting that the Piraha may be inbred. One person suggested that the critical number of individuals required for avoiding "inbreeding depression" is 12, which if true would suggest that the Piraha have more than enough individuals to not suffer from the bad effects of inbreeding. On the other hand, I have not been able to verify this number in any published article, and I find it hard to imagine how this number could be empirically verified in the first place. Furthermore it seems that any such number would have to be relative both to a certain number of generations of inbreeding, and to the initial genetic diversity of those 12 individuals, neither of which is known in the case of the Piraha.

Related Posts:
Innate Numbers: One, Two, or Many?
Piraha links (Language Log)
A movie of a Piraha man counting?


What Matters for Theory of Mind?

At around 5 years of age, most children are able to demonstrate that they understand that others can have lasting counterfactual beliefs. For example, if 5-year-olds are told that Joey's mom moved a candy that Joey had previously placed on the counter, they can correctly state that Joey thinks the candy is still on the counter. 3-year-olds will tend to say that Joey thinks the candy is wherever his mother moved it, even though Joey has no way of knowing this - in other words, 3-year-olds are unable to correctly represent Joey's counterfactual belief. The capacity to respond correctly in such "false belief" tasks is sometimes called Theory of Mind.

Theory of Mind (ToM) is measured in several ways, including unexpected-location tasks (as described in the example with Joey, above), unexpected-contents tasks (in which a Crayola crayon box may actually contain rubber bands, and children must predict what a naive observer would think is inside the box), and unexpected-identity paradigms (in which a sponge may be painted to look like a rock, and children must predict how a naive observer would classify the object; also known as an appearance/reality task). Sometimes ToM is also measured by deception tasks in which the child must deceive an opponent. In all cases, children must usually pass a control task - in which they state that they know the true state of the object's location/contents/identity - before demonstrating their capacity for understanding other minds.

Wellman, Cross, and Watson published a meta-analysis of 178 different "false belief" tasks, involving thousands of subjects. The authors identified variables that explain over 50% of the variance in false-belief tasks (which is impressive, given all the uncontrolled differences between studies in terms of exact stimuli, procedures, and the kinds of responses required of kids [i.e., verbal vs. nonverbal]). So, what factors are important for ToM, as measured through false-belief tasks?
  1. Age. Not surprisingly, kids get better at false-belief with age (Wellman et al. report that some studies have oddly failed to find this relationship)
  2. Framing the task in terms of trickery. The odds of being correct increase by 1.9 times (for children of all ages) if the task is framed in terms of deceiving an opponent, rather than merely reasoning about a protagonist's beliefs.
  3. Participation. The odds of being correct increase by another 1.9 times (also for children of all ages) if the child participates in the task, for example by moving Joey's candy, or otherwise having a causal role in the task.
  4. Emphasizing the protagonist's mental state. Children who are encouraged to state or picture Joey's "mental state" are more likely to correctly answer questions about Joey's counterfactual beliefs. However, this does not enhance young children's false-belief performance to above-chance levels.
  5. Making the target object not real or not present. More 3-year-olds are able to correctly answer where Joey will look for his candy if they are told that his mother had eaten the candy. This might be explained as simplifying the inhibitory requirements of the task, since the candy is no longer in any place except in Joey's counterfactual belief. However, this does not increase young children's performance to above-chance levels.
  6. Temporal marking. More kids can correctly answer "Where will Joey look first for his candy?" than those who can correctly answer "Where will Joey look for his candy?" However, temporal marking only has an effect among older kids - this manipulation does not help the youngest children, suggesting their problems with false-belief tasks lie elsewhere.
Other factors did not seem to make a difference at any age, such as the nature of the protagonist (puppet vs. real person, etc.), the type of target object (real vs. pictured objects, etc.), the type of question ("where will Joey look/say/believe the candy is?"), and the type of task (location vs. contents vs. identity). The authors also found that the youngest children are just as unlikely to ascribe a false belief to others as they are to ascribe one to themselves. There were no significant differences between the accuracy of self and other judgments at any of the ages tested, although they were numerically different for all but the youngest age groups.
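The odds-ratio effect sizes above are easy to misread as straight multipliers on accuracy. An odds ratio of 1.9 scales the odds p/(1-p), not the probability itself, so its impact depends on the baseline. A quick sketch (the 30% baseline below is a hypothetical value for illustration, not a figure from the meta-analysis):

```python
def apply_odds_ratio(p, ratio):
    """Convert a probability to odds, scale by the odds ratio, convert back."""
    odds = p / (1.0 - p)
    new_odds = odds * ratio
    return new_odds / (1.0 + new_odds)

# Hypothetical baseline: a child with a 30% chance of passing a standard task.
baseline = 0.30
with_trickery = apply_odds_ratio(baseline, 1.9)   # framing the task as deception
with_both = apply_odds_ratio(with_trickery, 1.9)  # plus active participation
print(round(with_trickery, 3), round(with_both, 3))  # → 0.449 0.607
```

Note that two manipulations together do not simply double accuracy; stacking the two 1.9 odds ratios takes this hypothetical child from 30% to about 61% correct.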

Many have argued that success in such tasks might be dually determined by both competence and performance. Competence refers to the conceptual understanding that others can have counterfactual beliefs, whereas performance refers to the ability to correctly reason or respond on the basis of that conceptual understanding. The authors suggest that their results are incompatible with several "performance" accounts of developmental change in ToM (which they refer to as "early competence" theories). Wellman et al. instead advocate the "conceptual change" hypothesis, which holds that improvement on false-belief tasks is driven by changes in competence rather than performance.

In their scathing response, Scholl & Leslie strongly object to this conclusion. In fact, they suggest that Wellman's "conceptual change" hypothesis makes only a single prediction, and that prediction is common to all ToM theories: children should improve with age. And contrary to Wellman's claim that performance accounts are confounded by the manipulation-related improvements across all or some age groups in various conditions of the meta-analysis, Scholl and Leslie argue that these results are not incompatible with "early competence" theories of ToM. For our purposes here, their strongest objection is that "early competence" theories do not require task manipulations to raise the youngest children's performance to above-chance levels, simply because the manipulations might not attenuate task demands enough to reveal above-chance performance. Nor do such theories require, Scholl & Leslie argue, that only the performance of younger children improve as a result of task manipulations, since all children may benefit from eased task demands.

It seems possible that false-belief performance can be explained on the basis of relatively domain-general mechanisms like attention, inhibition, and memory. For example, many of the manipulations identified by Wellman et al. could be effective because they reduce the demand to inhibit the current true state of the world during counterfactual reasoning - such as making the target object not real or not present. Other manipulations could be effective because they direct attention appropriately - such as temporal marking or emphasizing the protagonist's mental state. Yet other factors may have more general effects on arousal and attention, such as participation and framing the task in terms of trickery. And then there are relatively simple memory demands - in other words, none of these manipulations are effective if kids simply cannot remember the previous state of the world. (This interaction is bidirectional - reducing inhibitory demands or redirecting attention may make the previous state of the world easier to remember.)

It will be important for future research on ToM and false-belief tasks to distinguish the contributions of these processes. Of course, no task is ever completely process-pure - so techniques like latent variable analysis are likely to be very important in determining whether ToM performance can be predicted on the basis of these domain-general mechanisms, or whether ToM performance relies on a more unique or specific skill.

10/19/2006: I made a few edits to clarify that ToM is not the same as false-belief performance, but that false-belief performance is just one process-impure way of measuring ToM.


Language Disorders, Modularity, and Domain-General Mechanisms

Yesterday I discussed how domain-general mechanisms can explain several features of language acquisition, including phonology and some aspects of grammar. However, developmental disorders of language pose a slightly stronger challenge to domain-general theories of language.

Perhaps the strongest argument for a specialized grammar mechanism comes from grammatical specific language impairment (G-SLI), a condition in which a selective grammar deficit occurs alongside mutations in a single gene, leaving nonverbal, auditory, and articulation abilities intact (van der Lely, 2005). G-SLI children are specifically impaired at past-tense and passive formation, but unlike typically developing children they do not show a regularity advantage. These problems are stable within individuals across time, as well as between individuals who speak different languages.

Other forms of SLI may result from general auditory processing deficits, but members of the G-SLI subpopulation do not consistently share any deficits except those that define the disorder. This poses a problem for domain-general approaches to language (although see this account of multiple causality in language disorders). But until more tests of auditory and “exception” processing have been performed on this recently-defined (and possibly heterogeneous) subpopulation, it seems likely that G-SLI will also be shown to result from the failure of one or more domain-general mechanisms.

This optimism is partly justified by the success of domain-general approaches in accounting for other language disorders. For example, connectionist networks are capable of simulating surface and deep dyslexia (Christiansen & Chater, 2001) without recourse to specialized computational mechanisms – instead, these models rely on the same basic components and learning algorithms used in simulations of a variety of other domains. Although such models typically include independent layers for semantics and phonology, this should not be taken as a strong theoretical claim: the end-state of learning may come to resemble modularity, but the learning process itself can still rely on homogeneous, domain-general mechanisms (Colunga & Smith, 2005).

One example of such apparent modularity comes from reports that selective deficits in semantic knowledge can occur for living and non-living things (Thompson-Schill, et al., 1999). Yet this evidence does not necessarily support the idea that semantic representations are organized according to the “domains” of living and non-living things. For example, one might observe more loss of knowledge about living things in a patient who sustains damage to visual areas, since visual information is more diagnostic of living than non-living things (i.e., non-living categories tend to be formed more by “purpose” than “appearance,” whereas categories of living things tend to be formed in the opposite way). Therefore, the appearance of modularity may actually reflect organization by modality rather than organization by domain.

Likewise, Broca’s and Wernicke’s aphasia also appear to reflect damage to language-specific regions. However, closer inspection of the neuroimaging and neuropsychological data suggests that a variety of regions are involved in language processing of both semantics and grammar (Martin, 2003). Furthermore, Broca’s and Wernicke’s aphasics each manifest heterogeneous behavioral impairments, as one would expect if the damaged regions were involved in multiple domains of processing. A sensory-distributed view of Broca’s and Wernicke’s aphasia thus seems more compatible with the available data.

Another powerful demonstration of how domain-general mechanisms can explain semantic knowledge is Latent Semantic Analysis (LSA). LSA is a mathematical model of word meaning that actually passed the synonym portion of the TOEFL at a level sufficient for admission to many major universities (Landauer & Dumais, 1997). Although LSA is not a connectionist model, it is closely related in at least two ways: first, LSA is equivalent to a large three-layer connectionist network; second, LSA’s singular value decomposition algorithm is closely related to principal components analysis, and by extension, Hebbian learning (p.122 of O’Reilly & Munakata, 2000). If a domain-general approach, such as LSA, demonstrates human-competitive performance, then why posit a domain-specific mechanism?
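The core computation behind LSA - a truncated singular value decomposition of a word-by-document co-occurrence matrix - can be sketched in a few lines. (The tiny matrix, word labels, and dimension count below are toy assumptions purely for illustration; real LSA operates over tens of thousands of words and keeps on the order of 300 dimensions.)

```python
import numpy as np

# Toy word-by-document count matrix: rows = words, columns = documents.
# (Illustrative counts only; real LSA matrices are vastly larger and sparser.)
counts = np.array([
    [3, 1, 0, 0],   # "doctor"
    [1, 3, 0, 0],   # "nurse"
    [0, 0, 2, 1],   # "wheat"
    [0, 0, 1, 2],   # "barley"
], dtype=float)

# Truncated SVD: keep only k latent dimensions.
U, s, Vt = np.linalg.svd(counts, full_matrices=False)
k = 2
word_vectors = U[:, :k] * s[:k]   # each row: a word's position in latent space

def cosine(a, b):
    return a @ b / (np.linalg.norm(a) * np.linalg.norm(b))

# Words that share contexts end up close together, mimicking synonym judgments.
print(cosine(word_vectors[0], word_vectors[1]) >
      cosine(word_vectors[0], word_vectors[2]))  # prints True
```

The point is that nothing language-specific appears anywhere in the computation: similarity falls out of co-occurrence statistics plus a generic dimensionality-reduction step.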

One possible answer is that LSA is non-referential; according to this logic, LSA’s apparent knowledge of word meaning is a kind of “statistical mirage,” whereas real knowledge of semantics requires being able to identify objects in the environment. Such real-world referential knowledge is sometimes thought to require specialized mechanisms in order to overcome the Gavagai problem – for example, one such mechanism is word learning biases (e.g., Smith et al., 2002).

However, these “biases” may not be specific to word-learning. For example, some have argued that “uncertainty reduction” is a function common to statistical learning processes (Gomez, 2002) which may underlie learning in multiple domains. Biases may appear only because their use maximizes the reduction of uncertainty, and such “maximal error reduction” may be equivalent in some ways to the gradient descent algorithms featured in many connectionist models. Therefore, apparent “word-learning” biases may actually be the result of maximal error-reduction through more general statistical learning processes.
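The claimed link between "maximal error reduction" and gradient descent can be illustrated with a minimal delta-rule learner (a generic sketch under my own assumptions, not a model from the cited papers): each weight update moves in the direction that most steeply reduces squared prediction error, and cues that reliably reduce uncertainty come to dominate without any built-in bias toward them.

```python
def delta_rule(samples, lr=0.1, epochs=200):
    """Learn two weights by gradient descent on squared error (the delta rule)."""
    w = [0.0, 0.0]
    for _ in range(epochs):
        for x, target in samples:
            pred = w[0] * x[0] + w[1] * x[1]
            error = target - pred
            # Gradient of 0.5*error**2 w.r.t. w_i is -error*x_i, so stepping
            # along the negative gradient reduces error fastest.
            w[0] += lr * error * x[0]
            w[1] += lr * error * x[1]
    return w

# Feature 0 perfectly predicts the target; feature 1 is uncorrelated noise.
data = [((1.0, 1.0), 1.0), ((1.0, -1.0), 1.0),
        ((0.0, 1.0), 0.0), ((0.0, -1.0), 0.0)]
w = delta_rule(data)
# w converges to approximately [1.0, 0.0]: the predictive cue wins outright.
```

An apparent "bias" toward the first feature emerges here purely because attending to it maximizes error reduction, which is the shape of the argument in the paragraph above.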

In conclusion, much evidence interpreted to support language-specific mechanisms may actually result from domain-general processes. As reviewed yesterday, characteristics of general-purpose auditory processing explain several aspects of language acquisition, in particular phonology. Likewise, priming effects on grammaticality suggest that grammar is deeply related to a diverse array of other cognitive processes that have also shown priming. There is reason to think that recursive or combinatorial operations are important both for other aspects of cognition and for behavior in non-human species. Disorders of language, both developmental and acquired, may reflect modality- as opposed to domain-specificity. And finally, semantic learning shares remarkable mechanistic similarities with other forms of cognition.

Perhaps the only “problem area” for such an account is the recently defined G-SLI disorder, but more research is needed before G-SLI can be considered strong evidence for either perspective.

Therefore, no unequivocal evidence from any of these domains suggests specialized mechanisms must exist to account for language; instead, language appears to emerge as an interaction of powerful but domain-general mechanisms.


Christiansen MH, & Chater N. (2001). Connectionist psycholinguistics: capturing the empirical data. Trends Cogn Sci. 5(2):82-88.

Colunga, E., & Smith, L. B. (2005). From the Lexicon to Expectations About Kinds: A Role for Associative Learning. Psychological Review, Vol. 112, No. 2.

Gomez RL. (2002). Variability and detection of invariant structure. Psychol Sci. 13(5):431-6.

Hutzler F, Ziegler JC, Perry C, Wimmer H, & Zorzi M. (2004). Do current connectionist learning models account for reading development in different languages? Cognition. 91(3):273-96.

Landauer, T. K. & Dumais, S. T. (1997) A solution to Plato’s problem: The Latent Semantic Analysis theory of acquisition, induction, and representation of knowledge. Psychological Review 104:211–40.

Martin RC. (2003). Language processing: functional organization and neuroanatomical basis. Annu Rev Psychol. 54:55-89.

O'Reilly, R.C. & Munakata, Y. (2000) Computational Explorations in Cognitive Neuroscience: Understanding the Mind by Simulating the Brain, MIT Press.

Premack D. (2004). Psychology. Is language the key to human intelligence? Science. 2004 303(5656):318-20.

Smith LB, Jones SS, Landau B, Gershkoff-Stowe L, & Samuelson L. (2002). Object name learning provides on-the-job training for attention. Psychol Sci. 13(1):13-9.

Thompson-Schill SL, Aguirre GK, D'Esposito M, & Farah MJ. (1999) A neural basis for category and modality specificity of semantic knowledge. Neuropsychologia. 37(6):671-6.

van der Lely HK. (2005). Domain-specific cognitive systems: insight from Grammatical-SLI. Trends Cogn Sci. 9(2):53-9.


Language Acquisition Devices vs. Domain-General Mechanisms

Fluency in a language clearly relies on specialized knowledge about that language’s phonology, syntax, and semantics. Human infants are remarkably skilled at acquiring this knowledge, in stark contrast to non-human neonates and, to a lesser extent, human adults. While it may seem reasonable to suggest that a dedicated, language-specific mechanism underlies these skills, there is surprisingly little evidence that such a mechanism is required. Instead, interactions among general-purpose cognitive mechanisms seem sufficient for explaining language acquisition.

At first glance, it can seem obvious that a specialized “language acquisition device” is at work in language acquisition. For example, infants younger than six months of age are able to discriminate a wide variety of phonemes, but lose the ability to discriminate the phonemes that are not a part of (what will become) their native tongue across the next 6 months. Instead, they develop “categorical perception,” in which discrimination between category boundaries is near perfect, but discrimination within category boundaries is near chance. This phenomenon was initially heralded as a result of “speech mode” processing (Diehl, Lotto & Holt, 2004) and was considered a uniquely human adaptation enabling language. Subsequently, categorical perception was demonstrated among macaques, chinchillas, and even quail – none of which have language in the human sense – suggesting instead that this is merely a characteristic of general-purpose auditory processing (Hauser, Chomsky & Fitch, 2002).

One might also suspect a specialized mechanism for language acquisition in phonetic context effects. Consider the stimulus length effect – in which the distinction between stops and glides (e.g., /b/ vs. /w/) is signaled reliably only by the duration of the following vowel. Closely related is the “compensation for coarticulation” effect, in which perception of a syllable can be affected by a syllable preceding it by up to 50ms. Although previously used as evidence that linguistic perception is uniquely tuned to the peculiarities of human vocalization, more recent evidence suggests these effects also arise from perceptual contrast effects in auditory processing common to non-human animals (Diehl, et al., 2004). Thus, the apparent role of language-specific mechanisms in phonology can instead be explained by more general-purpose mechanisms.

Evidence from grammar learning is often considered less equivocal. For example, although non-human animals (and even some artificial neural networks) have demonstrated human-like phonological processing (Hutzler et al., 2004), only humans seem capable of syntactic constructions as rich as those in language. Yet there are several reasons to suspect that grammar actually arises from domain-general processes. For example, grammaticality judgments can be primed, such that a given construction is considered more grammatical if it has been recently encountered (Luka & Barsalou, 2005) – and priming has been demonstrated in domains as diverse as visual identification and familiarity judgments. This suggests that the mechanisms underlying grammar are fundamentally similar to those used in these other domains.

Grammar’s recursivity and hierarchical rule structures are also frequently argued to support the existence of language-specific mechanisms (e.g., Premack, 2004), but this too may result from domain-general processing. For example, the sensitivity of human infants to non-adjacent dependencies in language-like stimuli can result from general-purpose statistical learning (Gomez, 2002). Other evidence suggests that the representation of hierarchical rules may be an organizing principle of prefrontal cortex (Bunge & Zelazo, 2006) and not specific to language per se. There even appears to be disagreement over whether non-human animals are capable of recursivity – for example, Culicover & Jackendoff (2006) refer to overwhelming evidence “that the behavior of many animals must be governed by combinatorial computation."

In summary, there is no solid evidence to suggest that language-specific mechanisms are at work in language acquisition. Tomorrow's post will cover language disorders, and the extent to which they implicate a language-specific mechanism.


Bunge, S.A., & Zelazo, P. D. (2006). A Brain-Based Account of the Development of Rule Use in Childhood. Current Directions in Psychological Science. 15(3):118 -121.

Christiansen MH, & Chater N.(2001). Connectionist psycholinguistics: capturing the empirical data. Trends Cogn Sci. 5(2):82-88.

Culicover PW, & Jackendoff R. (2006). The simpler syntax hypothesis. Trends Cogn Sci. 10(9):413-8.

Ehri, L. (2004) Development of sight word reading: Phases and findings. In M. Snowling & C. Hulme (Eds.), The science of reading: A handbook. Oxford, UK: Blackwell. 135-154

Diehl RL, Lotto AJ, & Holt LL. (2004). Speech perception. Annu Rev Psychol. 55:149-79.

Gomez RL. (2002). Variability and detection of invariant structure. Psychol Sci. 13(5):431-6.

Hauser MD, Chomsky N, & Fitch WT. (2002). The faculty of language: what is it, who has it, and how did it evolve? Science. 298(5598):1569-79.

Hutzler F, Ziegler JC, Perry C, Wimmer H, & Zorzi M. (2004). Do current connectionist learning models account for reading development in different languages? Cognition. 91(3):273-96.

Luka, B.J., & Barsalou, L.W., (2005). Structural facilitation: Mere exposure effects for grammatical acceptability as evidence for syntactic priming in comprehension. J. Mem. Lang. 52(3):436-459.

Martin RC. (2003). Language processing: functional organization and neuroanatomical basis. Annu Rev Psychol. 54:55-89.

Premack D. (2004). Psychology. Is language the key to human intelligence? Science. 2004 303(5656):318-20.

Smith LB, Jones SS, Landau B, Gershkoff-Stowe L, & Samuelson L. (2002). Object name learning provides on-the-job training for attention. Psychol Sci. 13(1):13-9.


Blogging on the Brain: 10/9 - 10/15

Recent highlights from brain blogging:

More on Linguistic Relativity: Mixing Memory covers the Sapir-Whorf hypothesis, as well as new evidence on how culture affects color perception. (Update! The Mouse Trap has a really nice post on the same topic)

Shared memories: Are your memories really yours?

Wisdom of the Ancients? Mnemonic techniques from classical Greece.

Brain Ethics mentions several interesting-sounding new articles.

The Ultimatum Game, after TMS of dlPFC

Memory Upgrades: Acetylcholine May Enhance Memory Detail (via)

Adrenaline, Glutamate, and Cerebral Blood Flow: The Role of Pericytes (and a video!)

Gehry at SFN: Shelley tells us how Frank Gehry's SFN talk went.

Brain Training Funded by NSF: The "Spatial Intelligence and Learning Center"

Have a nice weekend, and enjoy SFN!


The Difference Between Knowing and Remembering: Consciousness and Theta Synchronization?

In the remember/know procedure, subjects first study a list of words and some time later are presented with a series of probe words. Subjects must indicate whether they have a specific conscious recollection of studying that word ("remember"), whether they know they studied the word but cannot specifically remember the study episode ("know"), or whether they did not encounter the word previously.

This test is standardly used to assess damage in amnesics, where medial temporal lobe damage impairs "remember" judgments more than "know" judgments. These terms can be seen as isomorphic with "recollection" and "familiarity." In single-process models of long-term memory, familiarity involves resonance among non-hippocampal medial temporal lobe structures, whereas recollection involves that resonance in addition to resonance within the hippocampus itself.

However, another way of looking at remember/know judgments is that correct remember judgments involve some aspect of consciousness, while correct know judgments do not. This led Klimesch et al. to investigate the differences between "remember" and "know" judgments using ERP, with an eye to previous findings about the neural correlates of consciousness.

The authors found a maximal peak in scalp voltage at 356.6 ms for items which subjects "knew" but could not specifically "remember." This component of the signal was maximal for known items, minimal for new items, and in-between for remembered items. In contrast, a different peak at 591 ms was maximal for items which subjects "remembered" being presented, minimal for new items, and in-between for "know" items.

When the authors examined dominant frequencies over time, rather than dominant amplitudes over time, they discovered a very different pattern. In this case, there was an increase in theta-band power for correct over incorrect judgments of all three types - new, remembered, and known. However, these three judgment types differed in both the time course and the duration of the theta-band power increase: the increase lasted longest for "remember" items, and came later than the increases for the other judgment types. The increase was shortest for new items and intermediate for "know" items; for "know" items, it occurred before the power increases for "remember" and new judgments.
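The band-limited power measure analyzed here amounts to isolating the 4-8 Hz portion of the signal's power spectrum. A minimal FFT-based sketch (the synthetic signal, sampling rate, and band edges below are my own illustrative assumptions, not the authors' data or pipeline):

```python
import numpy as np

def band_power(signal, fs, low=4.0, high=8.0):
    """Mean power in the [low, high] Hz band, via the FFT power spectrum."""
    freqs = np.fft.rfftfreq(len(signal), d=1.0 / fs)
    spectrum = np.abs(np.fft.rfft(signal)) ** 2 / len(signal)
    mask = (freqs >= low) & (freqs <= high)
    return spectrum[mask].mean()

fs = 250                           # assumed sampling rate in Hz
t = np.arange(0, 2.0, 1.0 / fs)    # 2 seconds of signal
rng = np.random.default_rng(0)
noise = rng.normal(scale=0.5, size=t.size)

# A 6 Hz (theta) oscillation buried in noise, vs. the noise alone:
theta_trial = np.sin(2 * np.pi * 6 * t) + noise
control_trial = noise

print(band_power(theta_trial, fs) > band_power(control_trial, fs))  # prints True
```

In the actual study this kind of measure is tracked in short sliding windows across the trial, which is what makes the onset and duration differences between judgment types visible.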

The authors interpret the late but lengthy increase in theta-band power for "remember" judgments as resulting from reentrant hippocampo-cortical loops. In contrast, the early and relatively short increase in theta-band power for "know" judgments is interpreted to result from resonance within perirhinal and other medial temporal lobe structures.

Broadly speaking, these results are consistent with a single-process model, in which resonance within perirhinal cortex contributes to the theta-band power increases in remember judgments. This power increase is lengthened by an additional process in remember judgments, specifically resonance throughout hippocampo-cortical loops.

What are the implications of this research with regard to consciousness? The authors note there seems to be no simple one-to-one mapping of power increase (which they presume reflects "synchronization," though others have shown this is not necessarily so) to consciousness. However, one somewhat hasty conclusion would be that duration of power increases, but not necessarily the magnitude of power increases, correlates with consciousness.

The authors note previous work that has shown theta-band power increases during REM relative to "non-conscious" sleep, as well as increased power during animal exploratory behavior and orienting relative to "automatic" processes such as eating or drinking. Other research that I have reviewed previously implicates the theta-band in the maintenance of information, the modulation of gamma-band power (which has also been widely implicated in consciousness), as a 'neural pacemaker' for alternating retrieval and encoding processes, and as phase-sensitive both to spatial encoding and to the onset of events in the environment.

Related Posts:
The Argument for Multiplexed Synchrony
Theta Frequency Reset in Memory Scanning
Separate Phases for Encoding and Retrieval in Theta Rhythms
Strength Through Synchrony
Binding Through Synchrony: Proof from Developmental Robotics


Verbal Labeling and Proactive Interference

It is clear that language plays a big role in the performance of many cognitive tasks. For example, in task-switching paradigms subjects may be verbally rehearsing the name of the new task as a way to remember or reinforce the correct response. Accordingly, "articulatory suppression" (even when using something as simple as a tongue depressor) can cause significant performance decrements on many tasks.

In their new article in Psych Science, Kray et al. report on how the act of labeling may influence higher cognition. Specifically, in the first phase of their experiment, they had 96 4-year-old children press a blue key if Ernie appeared but a green key if Bert appeared. Each of these keys was paired with a different sound, and children were asked to verbally "label" their action after pressing a key, as follows: one group of 4-year-olds was asked to say which key they pressed (the blue one or the green one); another group was asked to name the sound they heard after the keypress (bell sound or trumpet sound); a third group was asked to name both the key they pressed and the sound that followed. Finally, a control group used task-irrelevant labels. In the next phase, the children were tested: the experimenters played a sound and asked half the children to press the key that was paired with that sound (the consistent condition), and asked the other half to press the key that was not paired with that sound (the inconsistent condition).

The results showed that verbalization had an effect above and beyond control in only one situation: kids who verbalized both the color of the key press (the action) and the sound that resulted (the effect) were less accurate than those who verbalized either the action or the effect alone, but only in the inconsistent condition. The other conditions did not differ significantly between the consistent and inconsistent conditions, nor between the types of labels used. In other words, children who performed other types of labeling (just the action, just the effect, or unrelated labels) revealed no proactive interference from the consistent task. Why might this be the case?

The authors argued that labeling both the action and its effect served to bind these representations strongly together, in a way that was difficult to reverse in the inconsistent mapping condition. This much seems almost uncontroversial.

However, it's possible that labels had their effect in a way that was not specific to language. In other words, the act of labeling both the previous action and its effect might be considered a form of mental practice. In this case, none of the control conditions were matched for mental practice. Thus it would be important to verify whether it is actually the act of labeling, versus simply "reenacting" the prior trial, that leads to proactive interference further down the road.

Related Posts:
Do Innate Expectations About Causation Reflect "Universal Grammar?"
Labels as An Accelerator of Ontological Development
The Poverty of the Stimulus and the Power of Statistical Learning


Language And Thinking-For-Speaking

Yesterday I reviewed how certain aspects of language may influence thought by transiently accelerating ontological development. But language can also skew perception in a more lasting way, as reviewed by Boroditsky, Schmidt and Phillips in their chapter in Language in Mind: Advances in the Study of Language and Thought (perhaps based in part on this paper by the same name).

Boroditsky et al. describe several experiments that demonstrate linguistic influences on learning. In one such experiment, German and Spanish speakers were taught male or female proper names for objects that have a grammatical gender in those languages. Speakers of both languages show interference when learning these names if the gender of the new object name (e.g., "Patrick") is inconsistent with the grammatical gender of that object in the speaker's language.

In another experiment, the authors gave German and Spanish speakers each a list of 24 words whose grammatical gender differs between the two languages. Each subject was then asked to write down three adjectives describing each word. The entire experiment was conducted in English. Coders blind to the purpose of the study rated these adjectives as distinctly "more female" for words that have feminine gender in German or Spanish, and as distinctly "more male" for words with masculine gender. Here is a particularly amusing paragraph from the chapter, regarding this study:
"There were also observable qualitative differences between the kinds of adjectives Spanish and German speakers produced. For example, the word for "key" is masculine in German and feminine in Spanish. German speakers described keys as hard, heavy, jagged, metal, serrated, and useful, while Spanish speakers said they were golden, intricate, little, lovely, shiny and tiny. The word for "bridge," on the other hand, is feminine in German and masculine in Spanish. German speakers described bridges as beautiful, elegant, fragile, peaceful, pretty and slender, while Spanish speakers said they were big, dangerous, long, strong, sturdy and towering."
Boroditsky et al. point out that the grammatical differences in these words might actually reflect cultural differences in bridge architecture, thus making it difficult to ascribe these effects specifically to language rather than general cultural effects. To address this problem, Boroditsky et al. taught English speakers the "soupative/oosative" distinction in the fictitious Gumbuzi language, in which, for example, girls as well as pens, pans and forks might be "soupative," while boys as well as giants, spoons and pencils might be "oosative." After learning this distinction, the English speakers were simply shown the objects and asked to provide adjectives describing each one. Independent and blind coding again showed that those who had learned to associate a given object with males tended to provide more "male-like" adjectives, and vice versa for "female-like" adjectives.

All in all it is not terribly surprising that some aspects of language can affect other aspects, such as adjective generation. However, Boroditsky et al. also review evidence where language affects superficially non-linguistic behaviors; for example, German and Spanish speakers are more likely to rate certain objects as more similar to males if that is consistent with grammatical gender in their native tongue, and vice versa for females, even if the task is conducted in English.

One might still complain that a task like similarity rating is still linguistic, and expect that if linguistic resources were somehow occupied this effect would disappear. However, Boroditsky et al. report that the effect remained even when subjects were engaged in a secondary but concurrent speech-shadowing task (where subjects must repeat aloud the words said to them in an unrelated stream of speech). The authors conclude that language clearly does influence non-linguistic thought in a profound way, and that the kinds of mental experiences we have may differ significantly depending on what language we speak.

Of course, there are reasons to think this may not be completely right. For example, the speech-shadowing task may have its effects simply by taxing attention or memory - essentially "handicapping" adult subjects into looking like children. And all of the other tasks seem linguistic in nature; in other words, subjects might have been using verbal encoding strategies. In that case, language-specific effects are unsurprising, since they could result from mere priming of "thinking-for-speaking" rather than any effect on "thinking-in-general."

In conclusion, much stronger support for a deep influence of language on "thinking-in-general" comes from the work reviewed on Monday about spatial metaphors of time and linguistic determinism. (The authors found that the kinds of metaphors used in your native language will skew nonverbal time-reproduction judgments of stimuli that relate to those metaphors.)


Labels as an Accelerator of Ontological Development

At a broad level, the strongest versions of linguistic determinism are simply untrue - in other words, language does not constrain or determine the possible range of thought. For example, individuals whose language has fewer color words than English are still capable of distinguishing among the same number of colors as English speakers. Yet, subtler forms of this hypothesis are more difficult to rule out. For example, language may transiently accelerate the developmental acquisition of certain conceptual and perceptual distinctions, although all adults may ultimately become capable of making such distinctions.

Xu's 2002 Cognition article explores this weaker view of linguistic determinism in the context of the sortal/kind distinction - in other words, the capacity of humans to distinguish between items such as "ball" and "cup." Xu reviews previous evidence showing that somewhere between 4 and 10 months, human infants begin to distinguish between objects on the basis of their location in space. However, only infants 12 months old and older show finer-grained distinctions, such as those relying on object feature information. What evidence supports this claim?

If infants are presented with one object that emerges from and subsequently disappears behind a screen, followed by a different object that is also briefly presented and then occluded, only infants 12 months and older are surprised if the screen is removed to reveal a single object. It is as though younger infants represented merely that an object existed, whereas older infants can discriminate the two objects on the basis of their kinds.

But what if younger infants merely do not have the perceptual sensitivity to discriminate between the two objects? If this were true, such conclusions about infants' use of abstract categories would be premature. However, Xu reviews control studies demonstrating that infants are sensitive to the perceptual distinctions between objects at even younger ages. This suggests that the developmental change between 10 and 12 months is at the level of ontological categorization, not perceptual discrimination.

So, what drives this change? Some studies suggest that infants' increasing linguistic knowledge may be at the heart of this ontological development. Xu reviews previous work showing that the more words infants are judged to know, the more likely they are to show sortal/kind discrimination. To further investigate this hypothesis, Xu presents evidence from 4 original studies bearing on the idea that language is a key player in the development of ontological categories. The methodological details are summarized below:

Xu first habituated infants to two objects for 7 trials each, in which each object was brought out from behind the screen, tapped, and labeled. Some infants received a unique label for each object (i.e., the "two word" condition) while other infants received only a single label for each object (i.e., the "one word" condition). Next, one object was brought out from behind the screen, and then hidden again, followed by the other object. Then the screen was rotated to reveal either one object (a possible outcome if objects are distinguished only by spatiotemporal information) or two objects (the expected outcome if objects are distinguished by kind or object features). This testing process was repeated 4 times for each infant, evenly split between the 1-object and 2-object outcomes. Looking times were recorded by video and coded by research assistants blind to the particular condition each infant was in. Finally, in a baseline condition, another group of infants was presented with either one or two objects hidden behind a screen in order to get a baseline measurement of infants' looking times to one or two objects.
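The trial structure described above can be sketched in a few lines. This is a hypothetical illustration of the design (condition names and trial counts come from the text; everything else is invented for clarity), not Xu's actual materials or analysis code:

```python
# Sketch of the test-trial design: two labeling conditions, four test
# trials per infant, evenly counterbalanced between the two outcomes.

CONDITIONS = ["two-word", "one-word"]   # labeling during habituation
OUTCOMES = ["1-object", "2-object"]     # revealed when the screen rotates

def test_trials_for_infant(n_trials=4):
    """Return a counterbalanced list of test outcomes for one infant."""
    per_outcome = n_trials // len(OUTCOMES)
    return [outcome for outcome in OUTCOMES for _ in range(per_outcome)]

# One counterbalanced set of test trials per labeling condition.
design = {condition: test_trials_for_infant() for condition in CONDITIONS}
```

The key comparison is then looking time at the 1-object outcome in each condition against the baseline group, which is what licenses the inference that labels (and not the reveal itself) drive the surprise.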

The results showed that 9-month-old infants in the two-word condition looked longer at the 1-object outcome than infants in the baseline condition. However, 9-month-olds who were given only a single word did not show any differences between the baseline condition and the 1-object outcome. In addition, giving a unique label to each object increased looking time during the habituation trials.

What if this happened merely because using unique auditory stimuli for the two objects increases arousal or attention? Two subsequent experiments were unable to replicate this effect using very distinct tones, suggesting that the benefit of labels is language-specific.

What if this happened merely because the sounds are made by a human, rather than being specific to language? A fourth experiment was unable to replicate the effect using nonlinguistic human utterances such as "ah" and "ew," again suggesting the benefit of labels is language and not source-specific.

Xu concludes that language can accelerate the process of sortal/kind discrimination, such that a skill normally only demonstrated by 12-month-olds was in this case demonstrated by 9-month-olds given the proper linguistic input. Xu next discusses four possible conclusions based on these results, starting with the more conservative and moving to the more speculative:
  1. Labels facilitate sortal/kind distinctions by aiding a domain-general, non-linguistic process, such as memory? According to this view, "labels function as 'summary representations' or mnemonics for the infants."
  2. Labels increase the salience of perceptual feature differences between objects? According to this view, the use of two labels causes infants to pay more attention to the visual differences between objects, which then helps them demonstrate apparent sortal/kind distinctions. Xu argues that this explanation is unlikely given that the use of two labels increased looking times relative to silent trials by the same amount as the use of a single label.
  3. Labels are "essence placeholders" that directly signal the presence of distinct sortal/kinds? Labels indicate that there are two object types present, which necessarily implies that there are two object tokens present. This then leads infants to show surprise if only a single object is revealed behind the screen.
  4. Labels bind disparate representations in cognitive architecture? This view posits that infants interpret language according to two word-learning biases: the whole object bias (that one word refers to the totality of a contiguous object rather than its constituent parts) and the taxonomic constraint (a word for one kind of object can also be used for a different object of the same kind). These two biases require representations of the type "where" (residing in the dorsal visual processing stream) and of the type "what" (residing in the ventral visual processing stream). Therefore, the interpretation of a label accomplishes a binding function, directly analogous to the kinds of visual binding investigated by Wheeler & Treisman in adults.
All of these possibilities are interesting, but some are difficult to distinguish from others. For example, it is possible that labels serve as a memory crutch (as in conclusion #1) by binding disparate representations (conclusion #4). Similarly, it's difficult to know what makes an "essence placeholder" different from a memory crutch (#1) or from something that increases the salience of perceptual dimensions (#2).


Spacetime and Linguistic Relativity

“Are our own concepts of ‘time,’ ‘space,’ and ‘matter’ given in substantially the same form by experience to all men, or are they in part conditioned by the structure of particular languages?”

This quote by Benjamin Whorf begins a fascinating article by Casasanto et al. on the concept of linguistic determinism, which first traces the evolution of this question before presenting new data that bear on it.

In recent years, strong forms of linguistic determinism have fallen out of favor in cognitive circles. Much of this can be traced to fallout from Whorf's often-ridiculed suggestion that Eskimos have seven different words for snow (or as many as 200 if you believe the popular press), and thus must cognitively represent snow in a different way than the rest of us. The authors note Pullum's observation that English may also have seven words for manifestations of snow ("slush, sleet, powder, granular, blizzard, drift, etc"), thus highlighting an inherent source of subjectivity in vocabulary-based arguments. The authors also cite Pinker's criticism of Whorf's circular logic: "[Eskimos] speak differently so they must think differently. How do we know that they think differently? Just listen to the way they speak!"

Casasanto et al. point out that non-linguistic evidence is critical to demonstrate that speakers of different languages "also think differently in corresponding ways." For example, it is perhaps unsurprising that language affects the kinds of thinking that ultimately lead to spoken responses. A task with non-verbal responses would make a much stronger claim about the broader influence of language on thought. (Chris at Mixing Memory came to the same conclusion: "there is still no nonlinguistic (i.e., non-circular) evidence that time is conceptualized metaphorically through mappings to space.")

To that end, the authors devised two experiments to elicit differences in time perception. These experiments were administered to English, Indonesian, Greek, and Spanish speakers. English and Indonesian speakers tend to express time in terms of distance: for example, "long night" or "long relationship." In contrast, these phrases are not usually directly translated into Greek or Spanish, because those languages express the equivalent thoughts along the lines of "big night" or "big relationship." In other words, they preferentially express time in terms of volume. Thus, speakers of each language may show different patterns of performance on time-estimation tasks that utilize the concepts of distance or volume.

In a first experiment, Casasanto et al. presented speakers of each language with pixels on a computer screen that slowly grew into a line of 9 different possible lengths. These lines grew at 9 different speeds, resulting in 81 possible types of line stimuli. On each trial, subjects had to estimate either how far the line grew (spatial judgment), or how long it took to grow (temporal judgment), by clicking on the screen with the mouse. Thus, in this experiment, time was represented at least partially by distance.

In a second version of this experiment, time was represented by volume - in terms of how long it took an animated jug to fill with water. As in the previous experiment, there were 9 different final volumes of water, and 9 different speeds of water accumulation, which were fully crossed to yield 81 possible stimuli.
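The fully crossed design in both experiments is simple to sketch. The placeholder index values 1-9 below are assumptions for illustration; the actual lengths, volumes, and speeds used by Casasanto et al. are not given here:

```python
from itertools import product

# Fully crossed stimulus set: 9 final magnitudes x 9 speeds = 81 stimuli.
# In Experiment 1 the magnitude is a final line length; in Experiment 2
# it is a final water volume. Values 1-9 are hypothetical placeholders.
magnitudes = range(1, 10)  # 9 final line lengths or water volumes
speeds = range(1, 10)      # 9 growth/accumulation speeds

stimuli = list(product(magnitudes, speeds))  # 81 (magnitude, speed) pairs
```

Because magnitude and speed are fully crossed, final length (or volume) carries no information about elapsed time, so any influence of magnitude on time estimates reflects a bias rather than a valid cue.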

The results showed that time estimation was strongly affected by line length, but only for English and Indonesian speakers; it was also strongly affected by volume, but only for Greek and Spanish speakers. The language groups did not differ in their overall time-estimation accuracy on either task; the differences emerged only in how strongly distance and volume influenced time estimation.

Casasanto et al. suggest that these results are compatible with at least one view of linguistic determinism. First, prelinguistic children all experience time in the same way, but as they begin to learn their language's particular conceptual mappings between other physical characteristics (such as distance, or volume) and time, these particular ways of viewing time become strengthened through repeated use and exposure. Notably, such effects are not limited to verbal responses, but may more deeply frame our experience and memory of reality.

Related Posts:
Time Perception I and II (at Mixing Memory)
On Time, Space, Metaphor (also at Mixing Memory)
Time Space Metaphors (at the Mouse Trap)
Language and Time (at Cognitive Daily)


Blogging on the Brain: 9/9-10/9

Highlights from the best in brain blogging (there's a lot here... it's been a while!):

The best psychology articles from the last three years?

Ramachandran on consciousness and "metarepresentation" (A video from Ramachandran's Almaden lecture is here)

A Student's Guide to The Cognitive and Neuroscience Web

Distinguishing Self from Other in the Mirror Neuron System (another article on this topic at Mixing Memory)

BrainEthics covers recent news that perception of a "shadow self" can be induced by stimulation of the temporoparietal junction. Another article here, and another here.

Localizing recollection, familiarity and novelty in the medial temporal lobe

Confabulation and Distortion - Do you think you know a pathological liar?

Music and the Developing Brain - A topic of perennial interest, thanks to the "Mozart Effect"

More Interactive Brain Maps

Beautiful Images from Ernst Haeckel! How many scientists these days can draw like this?

Auditory mirror neurons at Mixing Memory, who's doing a great job of separating the wheat from the chaff with regard to mirror neurons.

Have a nice weekend!


Do Innate Expectations About Causation Reflect "Universal Grammar?"

Many theorists insist that some kind of domain-specific language acquisition device underlies language learning (see, for example, the debate covered on Tuesday). One form of this argument, as laid out by Lidz & Gleitman (2004), suggests that children must have measurable expectations about language that hold across cultures, regardless of their actual language experience, because the action of a domain-specific language device constrains the kinds of languages that exist.

So, what evidence do Lidz & Gleitman use to support this assertion? They claim that the following "one-to-one" principle holds true across all human languages: "Every participant in an event as it is mentally represented shows up as a syntactic phrase in a sentence describing that event." In other words, if two entities are involved in some event, then there will be at least two syntactic phrases used to describe that event.

They also show that this principle is adhered to by home-grown sign language systems developed between deaf children and non-signing parents. Of course, this is not particularly strong evidence, since the non-signing parents could have intentionally incorporated the above principle into the makeshift language.

Lidz & Gleitman present slightly stronger evidence in their study of child speakers of Kannada, an Indian language in which a morpheme roughly equivalent to the English suffix "-ize" indicates causation. For example, one might say "John melt-ized the ice" to indicate that John caused the ice to melt. Critically, one can also just say "John melt-ized" in Kannada - so morphology, and not necessarily "transitive" syntax, is the more reliable marker of causation. Interestingly, this means Lidz & Gleitman interpret Kannada as violating the very principle they claim is common to all languages, in which any event involving two participants is described with two syntactic phrases!

Disregarding that logical disconnect for the moment, the authors demonstrate that Kannada speakers tend to act out sentences with one syntactic phrase as non-causative, and sentences with two syntactic phrases as causative, regardless of the presence of the "-ize" morpheme. This, they claim, shows that even children who speak Kannada maintain an internal grammatical expectation that causation involving two agents must be expressed with at least two syntactic phrases - in other words, the one-to-one principle.

There are such serious problems with this account that it appears somewhat incoherent.

1) First, if the one-to-one principle is innate, and language is a product of this innate mechanism, all languages should manifest it. Yet Kannada doesn't - isn't this prima facie evidence that no such innate mechanism is required?

2) Kannada uses syntactic structure to indicate causation less frequently, and therefore less reliably, than morphology does. However, this does not mean that the relationship between syntax and causation goes unlearned by Kannada children - their behavior could therefore result from learning rather than from the action of an innate syntax-causation mechanism.

3) Causative relationships might be mentally represented with more than one agent because that is "the way the universe works." This need not be innate, nor specific to language: things act upon other things, and it should not be surprising to see human behavior (even from the youngest infants, the most remote cultures, or the most exotic languages) correspond to this simple fact of life.

In summary, the case of Kannada and deaf children of non-signing parents does not unequivocally demonstrate that syntax-causation rules must be innate. First, if they were, one would not expect a language such as Kannada to exist, and second, the appearance of such rules might simply reflect more general processes of learning from the environment.

Related Posts:
Linking the Nativist and Empiricist Views of Grammar
Poverty of the Stimulus and the Power of Statistical Learning
A Presentation on the Self-Organizing Learning of Semantics
Machine and Human Learning of Word Meanings
Watching a Language Evolve Among Robotic Agents
Symbols, Language, and Human Uniqueness