Rethinking Multiple Causality
I recently posted on a fascinating paper about neural network modeling of multiple causality in developmental disorders, and how one might begin to use variability analyses to tease apart homogenously- from heterogenously-disordered groups on the basis of behavioral measures.
However, after spending many more hours analyzing this paper than I care to admit, and despite having decided that the neural network modeling is probably accurate, I have concluded that the behavioral data used as support for these conclusions is completely inadequate.
The data that supposedly "confirms" the model prediction (which, for those who didn't read my earlier summary, is that "deficits originating from multiple underlying causes show less variability in the disorder’s diagnostic measure than on other behavioral measures, while networks manifesting the same deficit as a result of a single underlying cause tend to show equal or greater variability on nondiagnostic measures relative to the measure that defined the disorder"), from Williams Syndrome (WS) and Word-Finding Difficulties (WFD), actually provides only partial support.
Specifically, the models make these predictions in the context of disorders where heterogenous and homogenous groups cannot be dissociated on the basis of means on a behavioral measure. However, the comparisons that can be made between these two *actual* disorders on the basis of behavioral measures show that means probably *would* be sufficient to tell them apart! (Although in the case of the criterial measure, this might be because the groups weren't even age-matched - for WS, adults were included, whereas for WFD only children were analyzed.) In short, it is not clear that an analysis of variability is even necessary here.
Secondly, the criterial measure used to compare WS with WFD is "words produced in time period," but this measure actually comes from different tasks in each disordered group: naming accuracy in response to semantic cues for the WFD subjects, and number of words produced from a given category (starting with a certain sound, rhyming with a target word, or from a specific semantic category) for the WS subjects. The fact that the criterial measure differs between the two groups just underscores a major tautological problem; the skeptic might ask, does this insight from neural nets help us tease apart disorders that we don't *already* know are different?
Thirdly, the "non-criterial" measures used to assess whether there are different patterns of cross-measure variability in WS vs. WFD are not sufficiently different from the criterial measure. For example, the WS criterial measure of naming accuracy requires subjects to generate words from a specific category, whereas the WFD criterial measure is the TWF (which requires picture naming of nouns, verbs, and categories, plus sentence completion). However, the "additional" non-criterial measures are the effect of different semantic categories on the accuracy and latency of picture naming. I fail to see what makes this an "additional" task, since it appears to be tapping the same thing. (To the author's credit, it is surprising that WFD showed greater variability on the additional metric than on the TWF, since they seem almost identical.)
Fourthly, the author never explicitly runs the analysis of actual behavioral data that the models motivated. This is such a big mistake that I didn't believe it at first, but it's true: the models show that cross-measure variability (and its change over time) is different within a homogenous group than within a heterogenous group, and yet the comparison he makes using real data is that WFD shows larger variance on non-criterial metrics than WS does. To test the model predictions, he should have examined whether a) WFD shows larger variance on non-criterial metrics than on the criterial metric and b) WS shows the same variance on non-criterial metrics as on the criterial metric.
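To make the within-group comparison concrete, here is a minimal sketch of what checks a) and b) might look like. Everything in it is my own assumption rather than anything from the paper: the scores are made up, the group names are placeholders, and I use the coefficient of variation (standard deviation divided by the mean) purely as one convenient way to put measures with different units on a comparable footing. It also assumes positive, ratio-scale scores.

```python
import statistics


def coefficient_of_variation(scores):
    """Sample standard deviation divided by the mean (a scale-free spread)."""
    mean = statistics.mean(scores)
    if mean == 0:
        raise ValueError("coefficient of variation is undefined for a zero mean")
    return statistics.stdev(scores) / abs(mean)


def cross_measure_ratio(criterial, noncriterial):
    """Non-criterial spread relative to criterial spread within ONE group.

    A value well above 1 means the group varies more on the non-criterial
    measure than on the measure that defined the disorder; a value near 1
    means the spread is about the same on both.
    """
    return coefficient_of_variation(noncriterial) / coefficient_of_variation(criterial)


# Made-up scores for illustration only; these are NOT data from the paper.
group_a = {"criterial": [10, 11, 9, 10, 10], "noncriterial": [4, 12, 7, 15, 2]}
group_b = {"criterial": [10, 14, 6, 12, 8], "noncriterial": [20, 28, 12, 24, 16]}

for name, group in (("group_a", group_a), ("group_b", group_b)):
    ratio = cross_measure_ratio(group["criterial"], group["noncriterial"])
    print(f"{name}: non-criterial / criterial CV ratio = {ratio:.2f}")
```

The point is that each ratio is computed inside a single group, which is the comparison the models call for, rather than comparing one group's non-criterial variance against the other's. A real analysis would still need a significance test for the difference in spread and some defense of the normalization; this only shows the shape of the comparison.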
Finally, the author doesn't provide any longitudinal data from WS/WFD to support the conclusion that variability among non-criterial metrics would decrease over time in homogenous groups. That conclusion is just "left hanging."
Again, while I still feel that the predictions motivated by the networks are accurate, they have not been adequately tested. This is a great opportunity for anyone sitting on a large dataset of autism or ADHD data to send it my way ;)