The Mathematics of Autism

This article analyzes the soaring rates of “autism spectrum disorders” reported in the United States and other developed nations over the last twenty to thirty years. The United States Centers for Disease Control (CDC) recently announced new data showing that one in eighty-eight (1 in 88) children in the United States is diagnosed with an autism spectrum disorder, often simplified to “autism” in press reports.

Autism may be associated with mathematical skills. Autism researcher Simon Baron-Cohen has published studies reporting that autism is more prevalent in the families of physicists, engineers, and mathematicians. Unusual mathematical skills are reported in a small percentage of cases of “classic autism.” Movies, television, and other popular culture, such as Rain Man (1988), Mercury Rising (1998), and many other works, often play up this rare association by presenting autistic characters with extreme mathematical abilities.

It has frequently been suggested that various scientists and mathematicians, including the Nobel Prize-winning physicist Paul Dirac, the Russian mathematician and Fields Medal refuser Grigori Perelman, and Fields Medal winner Richard Borcherds, have had or have Asperger’s Syndrome, now included in the autism spectrum. Vernon L. Smith, who won the Nobel Prize for Economics in 2002, has stated that he has Asperger’s Syndrome. Wired magazine popularized the notion that Asperger’s syndrome and autism are associated with computer technology and math in the article The Geek Syndrome by Steve Silberman.

What is autism?

Prior to 1980, autism referred to a rare, severely disabling condition that often resulted in lifelong institutionalization. Autism was first described by the psychiatrist Leo Kanner in 1943. In 1956, Kanner and his colleague Dr. Leon Eisenberg defined autism as:

  • a profound lack of affective contact (Author’s Note: language and social skills)
  • repetitive, ritualistic behavior, which must be of an elaborate kind.

In practical terms, Kanner’s definition referred to children who showed an extreme failure to develop normal language and social skills, for example babbling instead of speaking and frequently being unable to carry on even a basic conversation. Similarly, the repetitive behavior referred to extremely abnormal behavior such as spinning around for hours at a time.

Kanner’s autism is now often identified as “classic autism.” Many readers are probably familiar with the 1988 movie Rain Man starring Tom Cruise and Dustin Hoffman. Rain Man presents a somewhat romanticized depiction of a relatively high-functioning person with classic autism, Raymond Babbitt, played by Hoffman. Hoffman’s character also exhibits unusual mathematical skills, something reported in a small percentage of classic autism cases. He has been institutionalized for most of his adult life and ultimately returns to the institution at the end of the movie.

A Dramatic Increase

Figure: Autism Prevalence by Nation

The figure above shows the prevalence of autism at different times and in different nations according to different studies collected and reviewed in the US Centers for Disease Control (CDC) Autism Prevalence Summary Table for 2011.  The appendix below gives the data extracted from the CDC document and the GNU Octave script used to generate the figures in this article.  The red line in the figure above is a polynomial fit to the worldwide autism prevalence data.

The reported prevalence of autism has increased dramatically, especially since the early 1990s. In the 1960s and 1970s, autism as defined by Kanner was extremely rare, and there were few studies of its prevalence. The few studies in the United States and the United Kingdom showed a prevalence of less than one child in one thousand (1 in 1,000). The typical rates were about 3 or 4 children per 10,000 children, less than half of one tenth of one percent. Many people lived their lives without ever encountering a case of autism.

The definition of autism has changed several times over the last forty years. Most significantly, autism was expanded into the “autism spectrum disorders,” which include a range of other, possibly related conditions, notably Asperger’s Syndrome. Like Kanner and Eisenberg’s original definition, the newer definitions are fairly general. They may, therefore, be prone to changing interpretations even if the verbal description does not change. What counts as “profound” in Kanner’s original definition, for example? What counts as “elaborate”?

The studies reported on the CDC’s web site use a wide range of different definitions to diagnose autism. DSM refers to the Diagnostic and Statistical Manual of Mental Disorders from the American Psychiatric Association (APA). The DSM is, in effect, the bible of psychiatric diagnosis. Significantly, autism was first recognized as a separate, distinct diagnosis in DSM-III in 1980. The definition of autism in DSM-III was substantially reworded in DSM-III-R in 1987, although it still appears to refer to “classic autism.” In 1994, DSM-IV expanded autism to a range of disorders, the so-called “autism spectrum disorders,” including Asperger’s Syndrome. This coincides with the start of the dramatic increase in the prevalence of autism in the United States. The World Health Organization (WHO) ICD-10 manual of mental disorders, used in other developed nations, also adopted the “autism spectrum disorders” at about the same time.


Figure: Diagnostic and Statistical Manual of Mental Disorders IV

The major formal definitions of autism and autism spectrum disorders:

  • Kanner
    • Kanner (original paper in 1943)
    • Kanner and Eisenberg (1956)
  • DSM-III (1980)
  • DSM-III-R (1987)
  • ICD-10 (WHO)
  • DSM-IV (1994)
    • Autism Spectrum
    • Asperger’s Syndrome
  • More to come: DSM-V!

The APA is now proposing to change the definition of autism yet again in DSM-V, which is slated to come out in the next few years.

Figure: Autism by Diagnostic Criteria

Asperger’s Chic: Is Bill Gates Autistic?

Asperger’s Syndrome is a generally milder condition characterized by poor or seriously impaired social skills and relatively normal language skills.  Asperger’s syndrome was first described by the Austrian pediatrician Hans Asperger in 1944.  Once obscure, Asperger’s syndrome was added to the so-called autism spectrum disorders.  When people like Bill Gates, Paul Dirac, Grigori Perelman, Richard Borcherds, and Vernon Smith are identified as autistic or having a mild form of autism, this usually refers to an imputed diagnosis of Asperger’s syndrome.  This is often purely speculative.  Of these examples, only the Nobel Prize winning economist Vernon Smith has publicly claimed that he has Asperger’s syndrome.

Figure: Nobel Prize Economist Vernon Smith

Asperger’s syndrome in particular has been closely associated in popular culture with scientific, technical, and mathematical skills.  How accurate these popular associations are is debatable.  Social skills are acquired through practice.  Scientists, engineers, mathematicians, and others who spend large amounts of time with machines or symbols instead of people are likely to have poorer social skills than those who spend large amounts of time interacting with other people.  

It certainly has been the author’s personal experience that there is a high rate of odd, possibly irrational behavior, what might be characterized as psychological problems, among people who engage in heavily mathematical work.  There are folk traditions and some scientific studies like Baron-Cohen’s studies of autism in the families of technical people that support this view.

If real, are these problems Asperger’s syndrome or something else?  Mathematical work often involves very high levels of concentration over extended periods of time, meaning months or even years.  There is some reason to suspect these high levels of extended concentration can be unhealthy and have adverse effects.    

The inclusion of Asperger’s Syndrome (and other syndromes that were once distinct from Kanner’s autism or are wholly new) in the autism spectrum greatly complicates the interpretation of the prevalence of autism and autism spectrum disorders. So does the vague, possibly changing working definition of Asperger’s. The popular press frequently reports these disorders simply as “autism.” Since 1990, the prevalence of autism, or at least of diagnoses of autism spectrum disorders, has increased by a factor of about twenty in the United States. Is there an “autism epidemic”? Or is the increase purely due to changing definitions and greater awareness of the autism spectrum disorders, or to some confusing combination of the two?

An Environmental Cause?

If the increase in the prevalence of autism is real or mostly real, it is probably due to an environmental cause. There is evidence that genes play a role in autism. However, the gene pool changes slowly, so a twenty-fold increase in a genetic disease within one or two generations is implausible unless a mutagenic agent were causing a sharp increase in damaged genes. Autism does not act like a contagious disease. You don’t catch autism from your children. Children don’t catch autism from other children. Thus, a real increase would likely be due to some environmental factor or factors that are either new or have increased in prevalence over the last twenty to thirty years.

There are, of course, many possible candidates for an environmental cause or causes. Many things have increased dramatically in the last twenty years or are entirely new. The one that receives the most attention, and the most bitter controversy, with respect to autism is childhood vaccination. Many parents of children diagnosed as autistic report something like the following:

My kid was developing normally, starting to speak, interact. Then, around eighteen months, we took our kid in for a series of vaccine shots, including the MMR (Measles/Mumps/Rubella) shot. My kid seemed to have a reaction to the shots or got sick. My kid was never the same after that. My kid stopped speaking, began to behave strangely. Then my kid was diagnosed as autistic.

In the United States, both the number of recommended vaccine shots and the number of different diseases vaccinated against has increased substantially over the last twenty to thirty years.

Figure: CDC 2012 Vaccine Schedule Closeup

Under the Vaccines for Children (VFC) program, inaugurated in 1994, the CDC spent $3.9 billion in 2010 purchasing vaccines for distribution to children in the United States. That is over one third of its annual budget and about one half of total vaccine sales revenue in the United States. The CDC is heavily invested in childhood vaccination.

Many readers have probably heard of two developments involving British physician and researcher Andrew Wakefield. The medical journal The Lancet retracted his study of the MMR vaccine, which has since been widely described as fraudulent. The British General Medical Council (GMC) also found him guilty of serious professional misconduct. Bill Gates, for example, publicly pilloried Wakefield in a widely cited interview. Various lawsuits and appeals by Wakefield and his co-authors are ongoing. For example, the GMC’s finding against his co-author Professor John Walker-Smith has, for the moment at least, been overturned.

The relationship between autism and vaccines is unfortunately much more complex than the sound-bite media coverage of Wakefield’s woes suggests. The ambiguous and changing definitions of autism and autism spectrum disorders make it extremely difficult, if not impossible, to evaluate the possible relationship between vaccinations and autism, despite the passionate claims of both sides in the dispute.

There are, of course, many other possible causes that have increased dramatically or are entirely new, and several have also been suggested as causing or contributing to the supposed “autism epidemic”.  Things that are new and/or have increased dramatically over the last twenty to thirty years include cell phones, the Internet, general computer use, Diet Coke, aspartame (the sweetener used in Diet Coke), Starbucks coffee, anti-depressant drugs such as Prozac, and high fructose corn syrup (HFCS).  The reader with a little thought and/or research can probably identify many other possibilities.  

One may note, with respect to Wired’s The Geek Syndrome, that most of these possible causes are associated with the modern “geek” lifestyle.

Correlation does not prove causation.  Even if A and B are perfectly correlated, a rarity in the real world, A may cause B, B may cause A, A and B may share a common unidentified cause C, or the correlation may be pure coincidence.

Conversely, when there is imperfect measurement, changing definitions, or other problems with data collection, as clearly occurs with the autism spectrum disorders, the lack of a correlation does not prove a lack of causation. For example, if half the overall increase is due to a real increase and half is due to changing definitions, then the “noise” from the changing definitions could hide the true causal relationship.

For example, the prevalence of autism spectrum disorders could continue to increase even if the use of a causal agent demonstrably declined after an initial increase. In that case, the initial rise in autism would be real, but the continuing rise would be due to changing definitions and greater awareness of the disorders, hopelessly confusing a statistical analysis.

The Limits of Statistics

The autism enigma is an example of the limitations of mathematical modeling and seemingly sophisticated statistical methods in the real world. Even in so-called hard sciences such as physics, there can be bitter disputes and controversies over the results of data analyses. A recent example is the dispute over the putative faster-than-light neutrino measurements of the OPERA experiment. In the OPERA experiment, a complex analysis yielded a seemingly definitive signal, stronger than the conventional five-sigma discovery threshold, yet many physicists both within the OPERA collaboration and outside were skeptical. The result did not feel right, not a very scientific criterion. The dramatic result turned out to be due to a systematic bias, an incorrect timing measurement.

When scientists, medical researchers, and others try to apply mathematical modeling or statistical methods to data in economics, finance, marketing, medicine, biology, epidemiology, and other “softer” fields, there are almost always substantial and difficult problems. These include the selection of data, the definition of terms, various serious and hard-to-quantify sources of bias (e.g., the CDC would look very bad if it turned out that its vaccination program had anything to do with the rise in autism), and so forth.
 
Widely used present-day mathematical and statistical methods, such as regression analysis or the polynomial fit used in this article, really can’t address these issues. They assume the data is clean or relatively clean, comparable to the results of flipping a coin in a lab or a fair game of chance with no cheating in a casino. These are the sorts of situations that these mathematical and statistical methods were often originally developed to study and model.

In the case of the autism spectrum disorders, the definitions of the many flavors of “autism” are quite general, not specific, and not quantitative.  In general terms, Bill Gates reportedly rocking back and forth in some situations is similar to “autistic” repetitive activity.  But the frequency and magnitude of this behavior differs dramatically from a child with “classic autism” engaged in rocking back and forth or some other repetitive behavior.  The definitions need to be specific, quantitative, and demonstrably repeatable by different psychiatrists in order to fairly compare different people, different times, and different populations.  

Conclusion

Remarkably, despite a truly dramatic twenty-fold increase in diagnoses of autism spectrum disorders in the last twenty years, we don’t know the cause of this increase or even whether it is real. What is needed to resolve this worrisome conundrum is at least:

  • Improved, specific, quantitative definitions of the autism spectrum disorders that do not change over time and are not susceptible to changing interpretations.
  • An improved, specific, quantitative definition of “classic autism” that can be compared reliably to the old data from the 1960s, 1970s, and 1980s.
  • Truly independent, unbiased, disinterested research into the relationship between autism and vaccines, not funded or controlled by the CDC or other interested parties.
  • Truly independent, unbiased, disinterested research into the relationship between autism and other possible environmental causes.

Sadly, this is easy to say and very difficult to achieve in the modern world.

References

Simon Baron-Cohen, Patrick Bolton, Sally Wheelwright, Victoria Scahill, Liz Short, Genevieve Mead, and Alex Smith, “Autism occurs more often in families of physicists, engineers, and mathematicians,” Autism, 2, 296-301 (1998).

Sally Wheelwright and Simon Baron-Cohen, “The link between autism and skills such as engineering, maths, physics, and computing: A reply to Jarrold and Routh,” Autism, 5, 223-227 (2001).

Leo Kanner and Leon Eisenberg, “Early Infantile Autism, 1943-1955,” American Journal of Orthopsychiatry, 26, 55-65 (1956). Also reprinted in Alexander et al., eds., and in Psychiatric Research Reports, American Psychiatric Association, April 1957, pp. 55-65.

Resources/Recommended Reading

The Strangest Man: The Hidden Life of Paul Dirac, Mystic of the Atom by Graham Farmelo, Basic Books (2009)

Perfect Rigor: A Genius and the Mathematical Breakthrough of the Century by Masha Gessen, Houghton Mifflin Harcourt, 2nd edition (November 11, 2009)

Credits

The picture of Vernon Smith is from Wikimedia Commons. It was produced by the US Federal Government and is in the public domain.

The picture of the CDC Recommended Vaccines for 2012 is from the CDC web site and is a work of the US Federal Government and is in the public domain.

© 2012 John F. McGowan

About the Author

John F. McGowan, Ph.D. solves problems using mathematics and mathematical software, including developing video compression and speech recognition technologies. He has extensive experience developing software in C, C++, Visual Basic, Mathematica, MATLAB, and many other programming languages. He is probably best known for his AVI Overview, an Internet FAQ (Frequently Asked Questions) on the Microsoft AVI (Audio Video Interleave) file format. He has worked as a contractor at NASA Ames Research Center involved in the research and development of image and video processing algorithms and technology. He has published articles on the origin and evolution of life, the exploration of Mars (anticipating the discovery of methane on Mars), and cheap access to space. He has a Ph.D. in physics from the University of Illinois at Urbana-Champaign and a B.S. in physics from the California Institute of Technology (Caltech). He can be reached at jmcgowan11@earthlink.net.

Appendix: Data and Analysis

The original data are from Autism_PrevalenceSummaryTable_2011.pdf at the CDC web site. The data were acquired by opening the CDC table in Adobe Acrobat Reader 9.5.0, selecting all of the text, copying it, and pasting it into a buffer in Notepad++ 5.8.7. A few fields in the CDC table were left blank, which caused problems when reformatting the data as a tab-delimited file that could be read by GNU Octave. Consequently, these blank fields were replaced with the text LEFT_BLANK in the data below.

Summary of Autism Spectrum Disorder (ASD) Prevalence Studies
Author
Year published
Country
Time period studied
Age range studied
Number of children in population
Criteria used
Methodology used
ASD prevalence (CI)
IQ<70 (%)
Lotter
1966
England
1964
8 to 10
78,000
Kanner
Case enumeration and direct exam
0.45 (0.31-0.62)
84
Brask
1970
Denmark
1962
2 to 14
46,500
Kanner
Case enumeration
0.43 (0.26-0.66)
NR
Treffert
1970
USA
1962-1967
3 to 12
899,750
Kanner
Case enumeration
0.07-0.31 (0.0-1.0)
NR
Wing & Gould
1979
England
1970
0 to 14
35,000
Kanner
Case enumeration and direct exam
0.49 (0.29-0.78)
70
Hoshino et al. (1)
1982
Japan
1977
0 to 17
234,039
Kanner
Case enumeration and direct exam
0.23 (0.19-0.27)
NR
Ishii & Takahashi
1983
Japan
1981
6 to 12
35,000
Rutter
Case enumeration and direct exam
1.6 (1.2-2.8)
NR
Bohman et al.
1983
Sweden
1979
0 to 20
69,000
Rutter
Case enumeration and direct exam
0.3 (0.2-0.5)
NR
McCarthy et al.
1984
Ireland
1978
8 to 10
65,000
Kanner
Case enumeration and direct exam
0.43 (0.29-0.59)
NR
Gillberg
1984
Sweden
1980
4 to 18
128,584
DSM-III
Case enumeration and direct exam
0.20 (0.13-0.30)
80, 77
Steinhausen et al.
1986
Germany
1982
0 to 14
279,616
Rutter
Case enumeration and direct exam
0.19 (0.14-0.24)
44
Author
Year published
Country
Time period studied
Age range studied
Number of children in population
Criteria used
Methodology used
ASD prevalence (CI)
IQ<70 (%)
Steffenberg & Gillberg
1986
Sweden
1984
<10
78,413
DSM-III
Case enumeration and direct exam
0.45 (0.31-0.62)
NR
Matsuishi et al.
1987
Japan
1983
4 to 12
32,834
DSM-III
Case enumeration and direct exam
1.55 (1.16-1.64)
NR
Burd et al.
1987
USA
1985
2 to 18
180,986
DSM-III
Case enumeration and direct exam
0.12 (0.00-0.20)
NR
Bryson et al.
1988
Canada
1985
6 to 14
20,800
DSM-III
Case enumeration and direct exam
1.01 (0.62-1.54)
76
Tanoue et al.
1988
Japan
1977-1985
3 to 7
95,394
DSM-III
Case enumeration
1.38 (1.16-1.64)
NR
Ciadella & Mamelle
1989
France
1986
3 to 9
135,180
DSM-III
Case enumeration
0.51 (0.39-0.63)
NR
Sugiyama & Abe
1989
Japan
1979-1984
2 to 5
12,263
DSM-III
Population screen and direct exam
1.3 (0.7-2.1)
38
Ritvo et al.
1989
USA
1984-1988
8 to 12
184,822
DSM-III
Case enumeration and direct exam
0.40 (0.31-0.50)
NR
Gillberg et al.
1991
Sweden
1988
4 to 13
78,106
DSM-III-R
Case enumeration and direct exam
0.95 (0.74-1.95)
82, 80
Fombonne & Mazaubrun (1)
1992
France
1985
9 to 13
274,816
ICD-10
Case enumeration and direct exam
0.49 (0.47-0.65)
87
Honda et al.
1996
Japan
1994
1.5 to 6
8,537
ICD-10
Population screen and direct exam
2.11 (1.25-3.33)
50
Author
Year published
Country
Time period studied
Age range studied
Number of children in population
Criteria used
Methodology used
ASD prevalence (CI)
IQ<70 (%)
Fombonne et al.
1997
France
1992-1993
6 to 16
325,347
ICD-10
Case enumeration and direct exam
0.54 (0.46-0.62)
88
Arivdsson et al.
1997
Sweden
1994
3 to 16
1,941
ICD-10
Population screen and direct exam
3.10 (1.14-6.72)
100
Webb et al.
1997
Wales
1992
3 to 15
73,300
DSM-III-R
Case enumeration and direct exam
0.72 (0.54-0.95)
NR
Sponheim & Skjeldae
1998
Norway
1992
3 to 14
65,688
ICD-10
Case enumeration and direct exam
0.38 (0.25-0.56)
64
Kadesjo et al.
1999
Sweden
1992
6.7 to 7.7
826
ICD-10
Case enumeration and direct exam
6.0 (1.97-14.1)
60
Baird et al.
2000
England
1998
1.5 to 8
16,235
ICD-10
Population screen and direct exam
3.1 (2.29-4.06)
40
Powell et al.
2000
England
1995
1 to 4
29,200
DSM-III-R or DSM-IV
Case enumeration
0.96 (0.64-1.39)
NR
Kielinen et al.
2000
Finland
1996
5 to 18
152,732
DSM-IV
Case enumeration
1.22 (1.06-1.41)
50
Magnusson & Saemundsen
2000
Iceland
1997
5 to 14
43,153
ICD-10
Population screen and direct exam
0.86 (0.60-1.18)
49
Chakrabarti & Fombonne
2001
England
1998
2.5 to 6.5
15,500
DSM-IV
Population screen and direct exam
1.68 (1.1-2.46)
24
Fombonne et al. (2)
2001
UK
1999
5 to 15
12,529
DSM-IV
Population screen and direct exam
2.61 (1.81-3.70)
44.4
Author
Year published
Country
Time period studied
Age range studied
Number of children in population
Criteria used
Methodology used
ASD prevalence (CI)
IQ<70 (%)
Bertrand et al.
2001
USA
1998
3 to 10
8,996
DSM-IV
Case enumeration and direct exam
4.0 (2.8-5.5)
49
Croen et al.
2001
USA
1987-1999
0 to 21
4,600,000
DSM-III-R or DSM-IV
Case enumeration
1.1 (1.06-1.14)
NR
Yeargin-Allsopp et al. (2)
2003
USA
1996
3 to 10
290,000
DSM-IV
Case enumeration
3.4 (3.2-3.6)
62
Gurney et al. (2)
2003
USA
1981-1982, 2001-2002
6 to 17
LEFT_BLANK
DSM-IV
Case enumeration
4.4 (4.3-4.5)
NR
Lingam et al.
2003
UK
2000
5 to 14
186,206
ICD-10
Case enumeration
1.5 (1.3-1.7)
NR
Icasiano et al.
2004
Australia
2002
2 to 17
45,153
DSM-IV
Case enumeration
3.9 (3.3-4.5)
47
Lauritsen et al.
2004
Denmark
2001
0 to 9
682,397
ICD-10
Case enumeration
1.2 (1.1-1.3)
NR
Fombonne et al.
2006
Canada
1987-1998
5 to 21
27,749
DSM-IV
Case enumeration
2.16 (1.65-2.78)
NR
Baird et al.
2006
UK
1990-1991
9 to 10
56,946
ICD-10
Case enumeration, screen, and direct exam
3.89 (3.39-4.43)
56
CDC ADDM Network (1)
2007
USA
2000
8
187,761
DSM-IV
Case enumeration and record review
6.7 (6.3-7.0)
36-61
CDC ADDM Network (1)
2007
USA
2002
8
444,050
DSM-IV
Case enumeration and record review
6.6 (6.3-6.8)
45
Author
Year published
Country
Time period studied
Age range studied
Number of children in population
Criteria used
Methodology used
ASD prevalence (CI)
IQ<70 (%)
Oullette-Kuntz et al.
2007
Canada
1996-2004
4 to 9
2,240,537
Special education classification
Case enumeration from special education classification
1.2 (1996), 4.3 (2004)
NR
Wong et al. (1)
2008
Hong Kong
1986-2005
0 to 14
4,247,206
DSM-IV
Case enumeration
1.6
NR
Williams et al.
2008
Australia
2003-2004
6 to 12
5,459
DSM-IV
Questionnaires
1.0 (0.8-1.0) to 4.1 (3.8-4.4)
NR
Montiel-Nava et al.
2008
Venezuela
2005-2006
3 to 9
254,905
DSM-IV
Case enumeration
1.7 (1.3-2.0)
NR
Baron-Cohen et al.
2009
UK
2003-2004
5 to 9
5,484
Special Education Needs register
Case enumeration from survey and direct exam
15.7 (9.9-24.6)
NR
CDC ADDM Network (1)
2009
USA
2004
8
172,335
DSM-IV
Case enumeration and record review
8.0 (7.6-8.4)
44
CDC ADDM Network (1)
2009
USA
2006
8
308,038
DSM-IV
Case enumeration and record review
9.0 (8.6-9.3)
41
Al-Farsi et al.
2010
Oman
2009
0 to 14
798,913
DSM-IV
Case enumeration
0.1 (0.1-0.2)
NR
Parner et al.
2011
Denmark
1994-1999
LEFT_BLANK
404,816
DSM-IV
Case enumeration
6.9 (6.5-7.2)
NR
Parner et al.
2011
Western Australia
1994-1999
LEFT_BLANK
152,060
DSM-IV
Case enumeration
5.1 (4.7-5.5)
NR
Chien et al.
2011
Taiwan
1996-2005
0 to 18
372,642
ICD-9
Case enumeration
2.9
NR
Author
Year published
Country
Time period studied
Age range studied
Number of children in population
Criteria used
Methodology used
ASD prevalence (CI)
IQ<70 (%)
Windham et al.
2011
USA
1994, 1996
0 to 8
82,153 (1994), 80,249 (1996)
DSM-IV
Case enumeration
4.7 (4.2-5.1) (1994); 4.7 (4.2-5.2) (1996)
NR
Kim et al.
2011
South Korea
2005-2009
7 to 12
55,266
DSM-IV
Case enumeration from survey and direct exam
26.4 (19.1-33.7)
59
Zimmerman et al.
2012
USA
2002, 2006, 2008
8
26,213 (2002); 29,494 (2006); 33,757 (2008)
ICD-9 and special education classification
Case enumeration
6.5 (2002), 10.2 (2006), 13.0 (2008)
NR
Kocovska et al.
2012
Faroe Islands
2002, 2009
7-16 (2002), 15-24 (2009)
7122 (2002), 7128 (2009)
DSM-IV, ICD-10
Screening and direct exam
5.6 (2002), 9.4 (2009)
NR
CDC ADDM Network (1)
2012
USA
2008
8
337,093
DSM-IV
Case enumeration and record review
11.3 (11.0-11.7)
38
(1) The prevalence reported represents the average. (2) The prevalence study provided overall rate only

C program reformat.c, which reformats the data from the CDC web site from one field per line into a tab-delimited file with one row per autism prevalence study.

#include <stdio.h>
#include <string.h>

#define FIELDS_PER_ROW 10	/* each study in the CDC table has 10 fields */

/* Map a text field to a number code that Octave can handle.
   Fields not in the table (e.g. "Wales", "Rutter") are returned unchanged. */
static const char *code_for(const char *field)
{
	static const char *codes[][2] = {
		/* countries */
		{"USA", "1"}, {"UK", "2"}, {"England", "3"}, {"Sweden", "4"},
		{"Canada", "5"}, {"Australia", "6"}, {"Japan", "7"}, {"Germany", "8"},
		{"France", "9"}, {"Ireland", "10"}, {"Denmark", "11"}, {"South Korea", "12"},
		/* diagnosis criteria */
		{"Kanner", "1"}, {"DSM-III", "2"}, {"DSM-III-R", "3"},
		{"ICD-10", "4"}, {"DSM-IV", "5"}, {"ICD-9", "6"}
	};
	for (size_t i = 0; i < sizeof codes / sizeof codes[0]; i++)
		if (strcmp(field, codes[i][0]) == 0)
			return codes[i][1];
	return field;
}

int main(void)
{
	char szBuffer[256];
	int nFieldCount = 0;
	printf("reformatting autism data\n");

	FILE *fp = fopen("autism_data_working_copy.txt", "r");
	if (fp == NULL)
	{
		fprintf(stderr, "Unable to open data file for input!\n");
		return 1;
	}
	FILE *fpOut = fopen("reformat_data.txt", "w");
	if (fpOut == NULL)
	{
		fprintf(stderr, "Unable to open output file\n");
		fclose(fp);
		return 1;
	}

	if (fgets(szBuffer, sizeof szBuffer, fp) != NULL)	/* skip first field -- table title */
	{
		while (fgets(szBuffer, sizeof szBuffer, fp) != NULL)	/* one field per line */
		{
			nFieldCount++;
			/* strip the trailing "\n" or "\r\n"; safe even for an empty line */
			szBuffer[strcspn(szBuffer, "\r\n")] = '\0';

			/* replace known strings with number codes */
			const char *field = code_for(szBuffer);

			/* tab after each field, newline after the last field of a row */
			if (nFieldCount % FIELDS_PER_ROW)
				fprintf(fpOut, "%s\t", field);
			else
				fprintf(fpOut, "%s\n", field);
		}
	}
	fclose(fpOut);
	fclose(fp);
	return 0;
}

Reformatted data from the CDC web site (reformat_data.txt)

Author	Year published	Country	Time period studied	Age range studied	Number of children in population	Criteria used	Methodology used	ASD prevalence (CI)	IQ<70 (%)
Lotter	1966	3	1964	8 to 10	78,000	1	Case enumeration and direct exam	0.45 (0.31-0.62)	84
Brask	1970	11	1962	2 to 14	46,500	1	Case enumeration	0.43 (0.26-0.66)	NR
Treffert	1970	1	1962-1967	3 to 12	899,750	1	Case enumeration	0.07-0.31 (0.0-1.0)	NR
Wing & Gould	1979	3	1970	0 to 14	35,000	1	Case enumeration and direct exam	0.49 (0.29-0.78)	70
Hoshino et al. (1)	1982	7	1977	0 to 17	234,039	1	Case enumeration and direct exam	0.23 (0.19-0.27)	NR
Ishii & Takahashi	1983	7	1981	6 to 12	35,000	Rutter	Case enumeration and direct exam	1.6 (1.2-2.8)	NR
Bohman et al.	1983	4	1979	0 to 20	69,000	Rutter	Case enumeration and direct exam	0.3 (0.2-0.5)	NR
McCarthy et al.	1984	10	1978	8 to 10	65,000	1	Case enumeration and direct exam	0.43 (0.29-0.59)	NR
Gillberg	1984	4	1980	4 to 18	128,584	2	Case enumeration and direct exam	0.20 (0.13-0.30)	80, 77
Steinhausen et al.	1986	8	1982	0 to 14	279,616	Rutter	Case enumeration and direct exam	0.19 (0.14-0.24)	44
Author	Year published	Country	Time period studied	Age range studied	Number of children in population	Criteria used	Methodology used	ASD prevalence (CI)	IQ<70 (%)
Steffenberg & Gillberg	1986	4	1984	<10	78,413	2	Case enumeration and direct exam	0.45 (0.31-0.62)	NR
Matsuishi et al.	1987	7	1983	4 to 12	32,834	2	Case enumeration and direct exam	1.55 (1.16-1.64)	NR
Burd et al.	1987	1	1985	2 to 18	180,986	2	Case enumeration and direct exam	0.12 (0.00-0.20)	NR
Bryson et al.	1988	5	1985	6 to 14	20,800	2	Case enumeration and direct exam	1.01 (0.62-1.54)	76
Tanoue et al.	1988	7	1977-1985	3 to 7	95,394	2	Case enumeration	1.38 (1.16-1.64)	NR
Ciadella & Mamelle	1989	9	1986	3 to 9	135,180	2	Case enumeration	0.51 (0.39-0.63)	NR
Sugiyama & Abe	1989	7	1979-1984	2 to 5	12,263	2	Population screen and direct exam	1.3 (0.7-2.1)	38
Ritvo et al.	1989	1	1984-1988	8 to 12	184,822	2	Case enumeration and direct exam	0.40 (0.31-0.50)	NR
Gillberg et al.	1991	4	1988	4 to 13	78,106	3	Case enumeration and direct exam	0.95 (0.74-1.95)	82, 80
Fombonne & Mazaubrun (1)	1992	9	1985	9 to 13	274,816	4	Case enumeration and direct exam	0.49 (0.47-0.65)	87
Honda et al.	1996	7	1994	1.5 to 6	8,537	4	Population screen and direct exam	2.11 (1.25-3.33)	50
Fombonne et al.	1997	9	1992-1993	6 to 16	325,347	4	Case enumeration and direct exam	0.54 (0.46-0.62)	88
Arvidsson et al.	1997	4	1994	3 to 16	1,941	4	Population screen and direct exam	3.10 (1.14-6.72)	100
Webb et al.	1997	Wales	1992	3 to 15	73,300	3	Case enumeration and direct exam	0.72 (0.54-0.95)	NR
Sponheim & Skjeldal	1998	Norway	1992	3 to 14	65,688	4	Case enumeration and direct exam	0.38 (0.25-0.56)	64
Kadesjo et al.	1999	4	1992	6.7 to 7.7	826	4	Case enumeration and direct exam	6.0 (1.97-14.1)	60
Baird et al.	2000	3	1998	1.5 to 8	16,235	4	Population screen and direct exam	3.1 (2.29-4.06)	40
Powell et al.	2000	3	1995	1 to 4	29,200	DSM-III-R or DSM-IV	Case enumeration	0.96 (0.64-1.39)	NR
Kielinen et al.	2000	Finland	1996	5 to 18	152,732	5	Case enumeration	1.22 (1.06-1.41)	50
Magnusson & Saemundsen	2000	Iceland	1997	5 to 14	43,153	4	Population screen and direct exam	0.86 (0.60-1.18)	49
Chakrabarti & Fombonne	2001	3	1998	2.5 to 6.5	15,500	5	Population screen and direct exam	1.68 (1.1-2.46)	24
Fombonne et al. (2)	2001	2	1999	5 to 15	12,529	5	Population screen and direct exam	2.61 (1.81-3.70)	44.4
Bertrand et al.	2001	1	1998	3 to 10	8,996	5	Case enumeration and direct exam	4.0 (2.8-5.5)	49
Croen et al.	2001	1	1987-1999	0 to 21	4,600,000	DSM-III-R or DSM-IV	Case enumeration	1.1 (1.06-1.14)	NR
Yeargin-Allsopp et al. (2)	2003	1	1996	3 to 10	290,000	5	Case enumeration	3.4 (3.2-3.6)	62
Gurney et al. (2)	2003	1	1981-1982, 2001-2002	6 to 17	LEFT_BLANK	5	Case enumeration	4.4 (4.3-4.5)	NR
Lingam et al.	2003	2	2000	5 to 14	186,206	4	Case enumeration	1.5 (1.3-1.7)	NR
Icasiano et al.	2004	6	2002	2 to 17	45,153	5	Case enumeration	3.9 (3.3-4.5)	47
Lauritsen et al.	2004	11	2001	0 to 9	682,397	4	Case enumeration	1.2 (1.1-1.3)	NR
Fombonne et al.	2006	5	1987-1998	5 to 21	27,749	5	Case enumeration	2.16 (1.65-2.78)	NR
Baird et al.	2006	2	1990-1991	9 to 10	56,946	4	Case enumeration, screen, and direct exam	3.89 (3.39-4.43)	56
CDC ADDM Network (1)	2007	1	2000	8	187,761	5	Case enumeration and record review	6.7 (6.3-7.0)	36-61
CDC ADDM Network (1)	2007	1	2002	8	444,050	5	Case enumeration and record review	6.6 (6.3-6.8)	45
Ouellette-Kuntz et al.	2007	5	1996-2004	4 to 9	2,240,537	Special education classification	Case enumeration from special education classification	1.2 (1996), 4.3 (2004)	NR
Wong et al. (1)	2008	Hong Kong	1986-2005	0 to 14	4,247,206	5	Case enumeration	1.6	NR
Williams et al.	2008	6	2003-2004	6 to 12	5,459	5	Questionnaires	1.0 (0.8-1.0) to 4.1 (3.8-4.4)	NR
Montiel-Nava et al.	2008	Venezuela	2005-2006	3 to 9	254,905	5	Case enumeration	1.7 (1.3-2.0)	NR
Baron-Cohen et al.	2009	2	2003-2004	5 to 9	5,484	Special Education Needs register	Case enumeration from survey and direct exam	15.7 (9.9-24.6)	NR
CDC ADDM Network (1)	2009	1	2004	8	172,335	5	Case enumeration and record review	8.0 (7.6-8.4)	44
CDC ADDM Network (1)	2009	1	2006	8	308,038	5	Case enumeration and record review	9.0 (8.6-9.3)	41
Al-Farsi et al.	2010	Oman	2009	0 to 14	798,913	5	Case enumeration	0.1 (0.1-0.2)	NR
Parner et al.	2011	11	1994-1999	LEFT_BLANK	404,816	5	Case enumeration	6.9 (6.5-7.2)	NR
Parner et al.	2011	Western Australia	1994-1999	LEFT_BLANK	152,060	5	Case enumeration	5.1 (4.7-5.5)	NR
Chien et al.	2011	Taiwan	1996-2005	0 to 18	372,642	6	Case enumeration	2.9	NR
Windham et al.	2011	1	1994, 1996	0 to 8	82,153 (1994), 80,249 (1996)	5	Case enumeration	4.7 (4.2-5.1) (1994); 4.7 (4.2-5.2) (1996)	NR
Kim et al.	2011	12	2005-2009	7 to 12	55,266	5	Case enumeration from survey and direct exam	26.4 (19.1-33.7)	59
Zimmerman et al.	2012	1	2002, 2006, 2008	8	26,213 (2002); 29,494 (2006); 33,757 (2008)	ICD-9 and special education classification	Case enumeration	6.5 (2002), 10.2 (2006), 13.0 (2008)	NR
Kocovska et al.	2012	Faroe Islands	2002, 2009	7-16 (2002), 15-24 (2009)	7122 (2002), 7128 (2009)	DSM-IV, ICD-10	Screening and direct exam	5.6 (2002), 9.4 (2009)	NR
CDC ADDM Network (1)	2012	1	2008	8	337,093	5	Case enumeration and record review	11.3 (11.0-11.7)	38
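(1) The prevalence reported represents the average. (2) The study reported an overall rate only. Numeric codes in the Country and Criteria used columns follow the Octave script below: country 1 = USA, 2 = UK, 3 = England, 4 = Sweden, 5 = Canada, 6 = Australia, 7 = Japan, 8 = Germany, 9 = France, 10 = Ireland, 11 = Denmark, 12 = South Korea; criteria 1 = Kanner, 2 = DSM-III, 4 = ICD-10, 5 = DSM-IV.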
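The table gives prevalence in cases per 1,000 children, while press reports quote ratios such as "1 in 88". A minimal Octave sketch of the conversion, using a few values copied from the table; the variable names are illustrative only:

```octave
% Convert prevalence (cases per 1,000 children) into a "1 in N" ratio.
% Values copied from the table: Lotter (1966), Bertrand et al. (2001),
% CDC ADDM (2007, 2000 data), CDC ADDM (2012, 2008 data), Kim et al. (2011).
per_1000 = [0.45 4.0 6.7 11.3 26.4];
one_in_n = 1000 ./ per_1000;   % number of children per diagnosed case
for k = 1:numel (per_1000)
  printf ('%5.2f per 1,000  =  about 1 in %d\n', per_1000(k), round (one_in_n(k)));
end
```

The CDC ADDM Network figure of 11.3 per 1,000 for 2008 works out to roughly 1 in 88, the number the CDC announced.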

GNU Octave script autism_analysis.m, used to process the data and generate the plots:

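autism_data = dlmread('reformat_data.txt', '\t');  % tab-delimited, numerically coded version of the table
dates = real(autism_data(:,4));  % time period studied
country = autism_data(:,3);    % numeric country code (USA is 1)
valid = find(country >= 1);    % rows with a numeric country code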
usa = find(country == 1);
uk = find(country == 2);
eng = find(country == 3);
sweden = find(country == 4);
canada = find(country == 5);
australia = find(country == 6);
japan = find(country == 7);
germany = find(country == 8);
france = find(country == 9);
ireland = find(country == 10);
denmark = find(country == 11);
sk = find(country == 12);  % South Korea (highest reported prevalence in the table)

diagnosis = autism_data(:, 7); % diagnostic criteria
kanner = find(diagnosis == 1);
dsm3 = find(diagnosis == 2);
dsm4 = find(diagnosis == 5);
icd10 = find(diagnosis == 4);

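prevalence = real(autism_data(:,9));  % cases per 1000 children

% least-squares cubic (degree 3) fit over all rows with a numeric country code
[p_autism, s] = polyfit(dates(valid), prevalence(valid), 3);
mydates = 1960:2012;  % years at which the fit is evaluated
simrate_world = polyval(p_autism, mydates);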

figure(1)
h1 = plot(dates, prevalence, 'o', mydates, simrate_world, '-r');
set(h1, 'linewidth', 3);
axis([1960 2012 0.0 30.0]);
ylabel('cases per 1000 children', 'fontsize', 14)
xlabel('Year', 'fontsize', 14)
title('Summary of Autism Spectrum Disorder (ASD) Prevalence Studies', 'fontsize', 14);
legend('DATA', 'FIT', 'location', 'northwest');
print('world_autism.jpg');

p_usa = polyfit(dates(usa), prevalence(usa), 3);
fit_usa = polyval(p_usa, mydates);

figure(2)
h2 = plot(dates(usa), prevalence(usa), 'o', mydates, fit_usa, '-r');
set(h2, 'linewidth', 3);
axis([1960 2012 0.0 10.0]);
ylabel('cases per 1000 children', 'fontsize', 14)
xlabel('Year',  'fontsize', 14)
title('Autism Spectrum Disorder Prevalence (USA)',  'fontsize', 14);
legend('DATA', 'FIT', 'location', 'northwest');
print('usa_autism.jpg');

figure(3)
plot(dates(kanner), prevalence(kanner), 'o');
axis([1960 2012 0.0 10.0]);
ylabel('cases per 1000 children')
xlabel('Year')
title('Autism Spectrum Disorder Prevalence (Kanner)');

figure(4)
plot(dates(dsm3), prevalence(dsm3), 'o');
axis([1960 2012 0.0 10.0]);
ylabel('cases per 1000 children')
xlabel('Year')
title('Autism Spectrum Disorder Prevalence (DSM-III)');

figure(5)
plot(dates(dsm4), prevalence(dsm4), 'o');
axis([1960 2012 0.0 30.0]);
ylabel('cases per 1000 children')
xlabel('Year')
title('Autism Spectrum Disorder Prevalence (DSM-IV)');

figure(6)
plot(dates(icd10), prevalence(icd10), 'o');
axis([1960 2012 0.0 10.0]);
ylabel('cases per 1000 children')
xlabel('Year')
title('Autism Spectrum Disorder Prevalence (ICD-10)');

figure(7)
h7 = plot(dates(usa), prevalence(usa), 'ob', 
dates(uk), prevalence(uk), 'or', 
dates(sweden), prevalence(sweden), 'ok', 
dates(denmark), prevalence(denmark), '*b', 
dates(japan), prevalence(japan), '*r', 
dates(eng), prevalence(eng), '*k', 
dates(france), prevalence(france), '+b',
dates(germany), prevalence(germany), '+r',
dates(canada), prevalence(canada), '+k',
dates(australia), prevalence(australia), 'xb',
dates(ireland), prevalence(ireland), 'xr',
dates(sk), prevalence(sk), 'xk',
mydates, simrate_world, '-r');

set(h7, 'linewidth', 3);
axis([1960 2012 0.0 30.0]);
ylabel('cases per 1000 children',  'fontsize', 14)
xlabel('Year',  'fontsize', 14)
title('Autism Spectrum Disorder Prevalence (By Country)',  'fontsize', 14);
legend('USA', 'UK', 'SWEDEN', 'DENMARK', 'JAPAN', 'ENGLAND', 'FRANCE', 
'GERMANY', 'CANADA', 'AUSTRALIA', 'IRELAND', 'SOUTH KOREA', 'FIT',
'location', 'northwest');
print('autism_by_nation.jpg');

figure(8)
h8 = plot(dates(kanner), prevalence(kanner), 'ob', 
dates(dsm3), prevalence(dsm3), 'or',
dates(icd10), prevalence(icd10), 'ok',
dates(dsm4), prevalence(dsm4), '*b');

set(h8, 'linewidth', 3);
axis([1960 2012 0.0 30.0]);
ylabel('cases per 1000 children',  'fontsize', 14)
xlabel('Year',  'fontsize', 14)
title('Autism Spectrum Disorder Prevalence (By Diagnostic Criteria)',  'fontsize', 14);
legend('KANNER', 'DSM-III', 'ICD-10', 'DSM-IV', 'location', 'northwest');
print('autism_by_criteria.jpg');
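The script fits third-degree polynomials with polyfit. Because adding polynomial terms can never increase the least-squares residual, a small residual alone does not show that a cubic is the right model. A minimal sketch that makes this visible, assuming (for illustration only) the five CDC ADDM Network rows from the table (USA, age 8, 2000 to 2008) as data. The years are centered first because raw years near 2000 raised to the third power make the fit poorly conditioned; the centering is an assumption of this sketch, not part of the original script.

```octave
% Compare linear, quadratic, and cubic least-squares fits by residual norm.
years = [2000 2002 2004 2006 2008];   % CDC ADDM Network surveillance years
rate  = [6.7 6.6 8.0 9.0 11.3];       % cases per 1000 children (from the table)
t = years - mean (years);             % centered years: 2004 maps to 0
resid_norm = zeros (1, 3);
for deg = 1:3
  [~, S] = polyfit (t, rate, deg);    % S.normr is the norm of the residuals
  resid_norm(deg) = S.normr;
  printf ('degree %d: residual norm %.3f (%d degrees of freedom)\n', ...
          deg, S.normr, S.df);
end
```

The cubic can never have a larger residual norm than the linear or quadratic fit, but with five points it leaves only one degree of freedom, so its closer fit says little about the underlying trend.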

