<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:wfw="http://wellformedweb.org/CommentAPI/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	xmlns:slash="http://purl.org/rss/1.0/modules/slash/"
	>

<channel>
	<title>Math-Blog &#187; Probability Theory and Statistics</title>
	<atom:link href="http://math-blog.com/category/probability-theory-and-statistics/feed/" rel="self" type="application/rss+xml" />
	<link>http://math-blog.com</link>
	<description>Mathematics is wonderful!</description>
	<lastBuildDate>Thu, 19 Jan 2012 19:41:36 +0000</lastBuildDate>
	<language>en</language>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
	<generator>http://wordpress.org/?v=</generator>
		<item>
		<title>The Cold Hit Problem</title>
		<link>http://math-blog.com/2011/09/25/the-cold-hit-problem/</link>
		<comments>http://math-blog.com/2011/09/25/the-cold-hit-problem/#comments</comments>
		<pubDate>Sun, 25 Sep 2011 22:31:13 +0000</pubDate>
		<dc:creator>John F. McGowan, Ph.D.</dc:creator>
				<category><![CDATA[Applied Math]]></category>
		<category><![CDATA[Probability Theory and Statistics]]></category>

		<guid isPermaLink="false">http://math-blog.com/?p=1004</guid>
		<description><![CDATA[The previous article Are Fingerprints Unique? discussed the case of Brandon Mayfield, a Muslim American attorney from the Portland, Oregon area who was wrongly identified as one of the Madrid train bombers in 2004 by the FBI based on an erroneous fingerprint identification. The Mayfield case is probably the most famous case of an incorrect [...]
Possibly related articles:<ol>
<li><a href='http://math-blog.com/2010/09/21/bad-mathematics-a-trillion-dollar-problem/' rel='bookmark' title='Bad Mathematics: A Trillion Dollar Problem'>Bad Mathematics: A Trillion Dollar Problem</a></li>
</ol>]]></description>
			<content:encoded><![CDATA[<p>The previous article <a href="http://math-blog.com/2011/09/20/are-fingerprints-unique/" title="Are Fingerprints Unique?" target="_blank">Are Fingerprints Unique?</a> discussed the case of <a href="http://en.wikipedia.org/wiki/Brandon_Mayfield" title="Brandon Mayfield Wikipedia Page (Controversial Topic)" target="_blank">Brandon Mayfield</a>, a Muslim American attorney from the Portland, Oregon area who was wrongly identified as one of the Madrid train bombers in 2004 by the FBI based on an erroneous fingerprint identification.</p>
<p>The Mayfield case is probably the most famous case of an incorrect fingerprint identification. It is an example of a &#8220;cold hit&#8221;, in which a huge biometric database was searched for a possible match to an unknown fingerprint taken from a crime scene. Unlike a suspect with plausible links to the crime, Mayfield had nothing specific connecting him to it other than the database search match.</p>
<p>There are subtle and serious mathematical and statistical problems with cold hits, which occur with both DNA profiling and fingerprint identification. This article explores in detail the mathematics and statistics of the cold hit problem.</p>
<p>The cold hit problem is closely related to a well-known problem in probability and statistics known as the birthday problem. Imagine a room full of people: Bob, Frank, Mary, Estelle, and others. Each person has a birthday: May 1, December 13, March 11, July 17, and so on.</p>
<p>Not knowing the birthdays of the people in the room, what is the probability that at least two people in the room have the same birthday? How many people need to be in the room for there to be an even (50/50) chance that at least two people in the room have the same birthday?</p>
<p>A naive and incorrect answer would be to reason as follows. There are three-hundred and sixty-five (365) days in the year. The probability that two people have the same birthday is 1/365. Therefore, the probability that at least one pair of people in a room with N people have the same birthday is about N/365. Thus the room needs about 183 people for an even chance of a match. The actual answer is twenty-three (23) people, much smaller than 183!</p>
<p>Let us consider the problem in detail. First, what is the probability that Bob and Frank have the same birthday? There is a 1/365 chance that Bob was born on January 1. There is a 1/365 chance that Frank was born on January 1. Thus, there is a 1/(365*365) chance that both Bob and Frank were born on January 1. There are, however, three hundred and sixty-five days in the year, so the probability that Bob and Frank were born on the same day is 365/(365*365) or 1/365.</p>
<p>We need to find the probability that at least one pair of people in the room (Bob and Frank, Bob and Mary, Bob and Estelle, Frank and Mary, Frank and Estelle, Mary and Estelle, and all other possible distinct pairs) have the same birthday. If there are N people in the room, there will be <img src='http://math-blog.com/wp-content/latex/pictures/1f76062b0901740d5c15d6163d5ced1b.png' title='N(N-1)/2' alt='N(N-1)/2' align=absmiddle> distinct possible pairs of people. Each pair will have a probability of 1/365 of having the same birthday.</p>
<p>The probability that at least one pair of people have the same birthday is:</p>
<pre class="mathcode">
P = 1.0 - (Probability that the Pair Does Not Have the Same Birthday)^(Number of Distinct Pairs of People)
</pre>
<p>which is</p>
<pre class="mathcode">
P = 1.0 - (1.0 - Probability that a Pair Has the Same Birthday)^(Number of Distinct Pairs of People)

or 

P = 1.0 - (1.0 - 1/365)^(N(N-1)/2)
</pre>
<p>It turns out that P is 0.50048, almost exactly even, for N = 23. (The formula treats the pairs as independent, which is only approximately true; the exact calculation gives about 0.507 for N = 23, so the conclusion is unchanged.) The number of distinct pairs of people in the room <img src='http://math-blog.com/wp-content/latex/pictures/e9cf3d4484205b8d46be1978e28a2e30.png' title='(N(N-1)/2) ' alt='(N(N-1)/2) ' align=absmiddle> grows roughly as the square of the number of people in the room, <I>not</I> in proportion to the number of people in the room (N). Hence, it takes far fewer people in the room than one would naively expect for there to be an even chance that at least two people in the room have the same birthday.</p>
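<p>As a cross-check on the numbers above, here is a minimal Python sketch (Python rather than the Octave used elsewhere in this article; the function names are illustrative and not taken from the original scripts). It compares the pairwise formula with the exact birthday calculation, which multiplies the chances that each successive person avoids every earlier birthday. Both assume 365 equally likely days.</p>

```python
def p_pairwise(n, days=365):
    """Pairwise formula: treat each of the n(n-1)/2 pairs as independent."""
    pairs = n * (n - 1) // 2
    return 1.0 - (1.0 - 1.0 / days) ** pairs


def p_exact(n, days=365):
    """Exact probability that at least two of n people share a birthday."""
    p_all_distinct = 1.0
    for k in range(n):
        # person k+1 must avoid the k birthdays already taken
        p_all_distinct *= (days - k) / days
    return 1.0 - p_all_distinct


def smallest_even_chance(prob, days=365):
    """Smallest room size for which prob(n, days) reaches 0.5."""
    n = 1
    while prob(n, days) < 0.5:
        n += 1
    return n


for n in (10, 22, 23, 30, 50):
    print(n, round(p_pairwise(n), 5), round(p_exact(n), 5))
```

<p>Under these assumptions both versions cross the even-chance line at 23 people, so the pairwise shortcut is adequate for the argument made here.</p>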
<div id="attachment_1005" class="wp-caption aligncenter" style="width: 310px"><a href="http://math-blog.com/wp-content/uploads/2011/09/prob_bday.jpg"><img src="http://math-blog.com/wp-content/uploads/2011/09/prob_bday-300x225.jpg" alt="Probability At Least Two People in Room Have Same Birthday" title="Probability At Least Two People in Room Have Same Birthday" width="300" height="225" class="size-medium wp-image-1005" /></a>
<p class="wp-caption-text">Probability At Least Two People in Room Have Same Birthday</p>
</div>
<p>The plot of the probability of at least two people in a room having the same birthday was generated using the two Octave scripts below: <I>birthday.m</I> and <I>plot_bday.m</I>.</p>
<p><a href="http://www.gnu.org/software/octave/" title="GNU Octave" target="_blank">Octave</a> is a free open-source numerical programming environment that is mostly compatible with <a href="http://www.mathworks.com/products/matlab/" title="MATLAB Web Site" target="_blank">MATLAB</a>.  </p>
<p><I>birthday.m</I></p>
<pre class="mathcode">

function [p] = birthday(n, m, bTrace)
% p = birthday(n [, m, bTrace])
% probability that at least one pair of members of set of N have same birthday (M days in year)
% n  number of people
% m  number of "days" in year (default value = 365)
% bTrace flag to trace operation of function (default value = false)
%
% (C) 2011 John F. McGowan
% E-Mail: jmcgowan11@earthlink.net
% 

if nargin < 2
	m = 365;
	bTrace = false;
end

if nargin < 3
	bTrace = false;
end

p = 0.0;

p_no_pair = 1.0; % probability no pair of people in the sample have the same birthday

% loop over pairs of people in the sample (room full of people)
% brute force
% for i = 1:n
	% for j = i+1:n
	% p_pair = m*(1/m)*(1/m); % probability i and j have same birthday
	% p_no_pair = p_no_pair*(1.0 - p_pair);
	% end
% end

% fast
number_pairs = n * (n-1)/2;
p_pair = m*(1/m)*(1/m);
p_no = 1.0 - p_pair;
if bTrace
	printf("number_pairs: %d  p_pair: %f p_no: %f\n", number_pairs, p_pair, p_no);
	fflush(stdout);
end % if

p_no_pair = p_no_pair*power( p_no, number_pairs);

p = 1.0 - p_no_pair;

end % function
</pre>
<p><I>plot_bday.m</I></p>
<pre class="mathcode">

% plot probability of at least two people having the same birthday
% in a room full of N people
%
% (C) 2011 John F. McGowan, Ph.D.
% E-Mail: jmcgowan11@earthlink.net
%

p = zeros(1,100);

for i=1:100
	if mod(i, 10) == 0
		printf("processing %d people in the room\n", i);
	end
	p(i) = birthday(i);
end

printf("displaying graph\n");
fflush(stdout);

figure(1);
plot(p);
title('Probability At Least Two People Have Same Birthday');
ylabel('P');
xlabel('Number of People in Room');

printf("writing plot to file prob_bday.jpg\n");
fflush(stdout);

print('prob_bday.jpg');
</pre>
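<p>A quick Monte Carlo run gives an independent check on the curve plotted above. The sketch below is in Python rather than Octave, and the function name, trial count, and seed are illustrative choices, not part of the original scripts. It draws random birthdays (assuming 365 equally likely days) and counts how often a room contains a repeat.</p>

```python
import random


def simulate_shared_birthday(n_people, trials=20000, days=365, seed=0):
    """Estimate the chance that at least two of n_people share a birthday."""
    rng = random.Random(seed)  # fixed seed so repeated runs agree
    hits = 0
    for _ in range(trials):
        birthdays = [rng.randrange(days) for _ in range(n_people)]
        # a repeated birthday makes the set smaller than the list
        if len(set(birthdays)) < n_people:
            hits += 1
    return hits / trials


estimate = simulate_shared_birthday(23)
print("estimated P for 23 people:", estimate)
```

<p>For 23 people the estimate should land near one half, with some scatter from run to run if the seed is changed.</p>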
<p>What does the birthday problem have to do with fingerprint identification, DNA profiling, or other forms of biometric identification? Replace the people in the room with fingerprints or other biometric identifiers (DNA profiles, iris images, faces,...) in a database.</p>
<p>Replace the three-hundred and sixty-five distinct birthdays with thousands, millions or more distinct biometric identification codes derived from the fingerprint, DNA profile, iris, or other form of identification. The pairs of people with the same birthday become pairs of people with the same fingerprint or other biometric identifier: the actual criminal who commits a crime and at least one other innocent person.</p>
<p>What happens if a fingerprint database has 100 million people and the chance of two people having the same fingerprint (we are referring to the same partial prints such as a thumb print lifted from a crime scene) is only one in a trillion (<img src='http://math-blog.com/wp-content/latex/pictures/60d5aa23bf6ec2eb3fc7e9de22535629.png' title='10^{12}' alt='10^{12}' align=absmiddle>)?</p>
<p>Astonishingly, the probability of at least two people in the database having the same fingerprint is almost one (1.0). This is because there are (100,000,000)(99,999,999)/2 possible pairs of people in the database, about five quadrillion (five thousand trillion). At one chance in a trillion per pair, roughly 5,000 coincidentally matching pairs are expected. Even though the probability of any two particular people having the same fingerprint is extremely low, at least one misidentification occurring somewhere in the system is almost certain (probability close to 1.0).</p>
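<p>The figures in the paragraph above can be reproduced with a few lines of Python. This is a sketch under the article's stated assumptions (100 million entries, a one-in-a-trillion chance per pair, pairs treated as independent); the function name is made up for illustration. It uses <code>log1p</code> and <code>expm1</code> because raising a number this close to 1 to a huge power is easy to get wrong in floating point.</p>

```python
import math


def cold_hit_pairs(n_entries, p_match):
    """Return (number of pairs, expected matching pairs, P(at least one match))."""
    pairs = n_entries * (n_entries - 1) // 2
    expected_matches = pairs * p_match
    # 1 - (1 - p)^pairs, computed as -expm1(pairs * log1p(-p)) for accuracy
    p_at_least_one = -math.expm1(pairs * math.log1p(-p_match))
    return pairs, expected_matches, p_at_least_one


pairs, expected, prob = cold_hit_pairs(100_000_000, 1e-12)
print("pairs:", pairs)
print("expected coincidental matches:", expected)
print("P(at least one match):", prob)
```

<p>The expected count of about 5,000 matching pairs is what drives the probability to essentially one.</p>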
<p>The FBI fingerprint database contains about 200 million people, accumulated since the 1920s, and the probability of two people having identical or indistinguishable partial fingerprints (or even all ten fingerprints) is unknown. </p>
<p>DNA profiles are currently claimed to have a probability of about one in ten trillion that two people share the same profile. In a cold-hit search of a large database of DNA profiles, such as those currently being collected, it is actually likely that there will be incorrect matches somewhere in the system.</p>
<p>Brandon Mayfield probably fell victim, in part, to the counter-intuitive statistics of the birthday problem. As the size of biometric databases collected by governments, law enforcement agencies, intelligence agencies, and private companies grows, the cold hit problem will grow &mdash; as the square of the number of entries in the databases.</p>
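<p>The quadratic growth can be made concrete with a short Python sketch. The per-pair match probability of one in a trillion is the article's hypothetical figure, the function names are illustrative, and pairs are again treated as independent. The first function gives the expected number of coincidentally matching pairs; the second solves the pairwise formula for the database size at which a coincidental match somewhere becomes an even bet.</p>

```python
import math


def expected_coincident_pairs(n_entries, p_match):
    """Expected number of pairs in the database that match by chance."""
    return n_entries * (n_entries - 1) / 2 * p_match


def entries_for_even_chance(p_match):
    """Smallest N with 1 - (1 - p)^(N(N-1)/2) >= 0.5."""
    pairs_needed = math.log(0.5) / math.log1p(-p_match)
    # N(N-1)/2 >= pairs_needed  <=>  N >= (1 + sqrt(1 + 8 * pairs_needed)) / 2
    return math.ceil((1 + math.sqrt(1 + 8 * pairs_needed)) / 2)


for n in (10**6, 10**7, 10**8):
    print(n, expected_coincident_pairs(n, 1e-12))

print("even chance at N =", entries_for_even_chance(1e-12))
```

<p>Each tenfold increase in entries multiplies the expected number of coincidental pairs by about one hundred. As a sanity check, calling <code>entries_for_even_chance(1/365)</code> recovers the 23 people of the birthday problem.</p>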
<p>If everyone, all of the nearly seven billion people on Earth, were in the databases, one could produce a list of all possible suspects based on fingerprint or other biometric identification alone. This could easily be hundreds or thousands or even more people.</p>
<p>How does one handle possible suspects who lack an adequate alibi and could have flown to a crime? How many of those possible suspects will have some tenuous seven degrees of separation connection to the crime? Brandon Mayfield was a Muslim American who had represented an alleged Islamic terrorist in a child custody case: a tenuous but possible connection to the terrorists responsible for the Madrid train bombings. This is the crux of the cold hit problem.</p>
<p>© 2011 John F. McGowan</p>
<p><strong>About the Author</strong></p>
<p><em>John F. McGowan, Ph.D.</em> solves problems using mathematics and mathematical software, including developing video compression and speech recognition technologies. He has extensive experience developing software in C, C++, Visual Basic, Mathematica, MATLAB, and many other programming languages. He is probably best known for his <a title="John McGowan's AVI Overview" href="http://www.jmcgowan.com/avi.html" target="_blank">AVI Overview</a>, an Internet FAQ (Frequently Asked Questions) on the Microsoft AVI (Audio Video Interleave) file format. He has worked as a contractor at <a title="NASA Ames Research Center" href="http://www.nasa.gov/centers/ames/home/index.html" target="_blank">NASA Ames Research Center</a> involved in the research and development of image and video processing algorithms and technology. He has published articles on the origin and evolution of life, the exploration of Mars (anticipating the discovery of methane on Mars), and cheap access to space. He has a Ph.D. in physics from the <a title="Department of Physics University of Illinois" href="http://physics.illinois.edu/" target="_blank">University of Illinois at Urbana-Champaign</a> and a B.S. in physics from the <a title="Caltech Homepage" href="http://www.caltech.edu/" target="_blank">California Institute of Technology</a> (Caltech). He can be reached at <a title="send mail to john" href="mailto:jmcgowan11@earthlink.net" target="_blank">jmcgowan11@earthlink.net</a>.</p>
<p>Possibly related articles:<ol>
<li><a href='http://math-blog.com/2010/09/21/bad-mathematics-a-trillion-dollar-problem/' rel='bookmark' title='Bad Mathematics: A Trillion Dollar Problem'>Bad Mathematics: A Trillion Dollar Problem</a></li>
</ol></p>]]></content:encoded>
			<wfw:commentRss>http://math-blog.com/2011/09/25/the-cold-hit-problem/feed/</wfw:commentRss>
		<slash:comments>2</slash:comments>
		</item>
		<item>
		<title>What are the odds? Flipping 10 heads in a row</title>
		<link>http://math-blog.com/2011/08/20/what-are-the-odds-flipping-10-heads-in-a-row/</link>
		<comments>http://math-blog.com/2011/08/20/what-are-the-odds-flipping-10-heads-in-a-row/#comments</comments>
		<pubDate>Sat, 20 Aug 2011 22:50:24 +0000</pubDate>
		<dc:creator>Antonio Cangiano</dc:creator>
				<category><![CDATA[Essential Math]]></category>
		<category><![CDATA[Probability Theory and Statistics]]></category>

		<guid isPermaLink="false">http://math-blog.com/?p=986</guid>
		<description><![CDATA[singingbanana released an interesting video about the odds of flipping 10 heads in a row. It is basic probability and the video is entertaining enough to warrant sharing it with your friends, regardless of their mathematical background.
Possibly related articles:<ol>
<li><a href='http://math-blog.com/2009/08/24/the-cost-of-not-understanding-probability-theory/' rel='bookmark' title='The Cost of Not Understanding Probability Theory'>The Cost of Not Understanding Probability Theory</a></li>
</ol>]]></description>
			<content:encoded><![CDATA[<p><a href="http://singingbanana.tumblr.com/post/9166555322/flipping-10-heads-in-a-row-full-video-by">singingbanana</a> released <a href="http://www.youtube.com/watch?v=rwvIGNXY21Y">an interesting video</a> about the odds of flipping 10 heads in a row. It is basic probability and the video is entertaining enough to warrant sharing it with your friends, regardless of their mathematical background.</p>
<p align="center">
<iframe width="560" height="345" src="http://www.youtube.com/embed/rwvIGNXY21Y?rel=0" frameborder="0" allowfullscreen></iframe></p>
<p>Possibly related articles:<ol>
<li><a href='http://math-blog.com/2009/08/24/the-cost-of-not-understanding-probability-theory/' rel='bookmark' title='The Cost of Not Understanding Probability Theory'>The Cost of Not Understanding Probability Theory</a></li>
</ol></p>]]></content:encoded>
			<wfw:commentRss>http://math-blog.com/2011/08/20/what-are-the-odds-flipping-10-heads-in-a-row/feed/</wfw:commentRss>
		<slash:comments>4</slash:comments>
		</item>
		<item>
		<title>When Science Fails</title>
		<link>http://math-blog.com/2011/01/10/when-science-fails/</link>
		<comments>http://math-blog.com/2011/01/10/when-science-fails/#comments</comments>
		<pubDate>Mon, 10 Jan 2011 23:55:22 +0000</pubDate>
		<dc:creator>John F. McGowan, Ph.D.</dc:creator>
				<category><![CDATA[Applied Math]]></category>
		<category><![CDATA[Probability Theory and Statistics]]></category>

		<guid isPermaLink="false">http://math-blog.com/?p=819</guid>
		<description><![CDATA[The recent New Yorker article The Truth Wears Off: Is there something wrong with the scientific method? by Jonah Lehrer (December 13, 2010) discusses several cases where a new scientific result was initially confirmed by several seemingly independent scientific studies and then subsequently faded away, sometimes to nothing. To quote briefly from the article: But [...]
No related posts.]]></description>
			<content:encoded><![CDATA[<p>The recent New Yorker article <a title="The Truth Wears Off" href="http://www.newyorker.com/reporting/2010/12/13/101213fa_fact_lehrer" target="_blank">The Truth Wears Off: Is there something wrong with the scientific method?</a> by Jonah Lehrer (December 13, 2010) discusses several cases where a new scientific result was initially confirmed by several seemingly independent scientific studies and then subsequently faded away, sometimes to nothing.  To quote briefly from the article:</p>
<blockquote><p>But now all sorts of well-established, multiply confirmed findings have started to look increasingly uncertain. It’s as if our facts were losing their truth: claims that have been enshrined in textbooks are suddenly unprovable. This phenomenon doesn’t yet have an official name, but it’s occurring across a wide range of fields, from psychology to ecology. In the field of medicine, the phenomenon seems extremely widespread, affecting not only antipsychotics but also therapies ranging from cardiac stents to Vitamin E and antidepressants: Davis [professor of psychiatry at the University of Illinois at Chicago] has a forthcoming analysis demonstrating that the efficacy of antidepressants has gone down as much as threefold in recent decades.</p></blockquote>
<p>The article cites a number of possible explanations for these cases ranging from confirmation bias to regression to the mean.  None seem entirely satisfactory, either separately or together.</p>
<p>As a graduate student, the author attended a lecture by a senior particle physicist who expressed distinct skepticism of the validity of standard statistics in particle physics, referring parenthetically to several cases of reported results at high levels of statistical significance, several standard deviations, that subsequently proved invalid.  A recent example, similar to the cases described in the New Yorker article, is the saga of the pentaquark.  Not one, but several research groups, reported evidence of the pentaquark, which then seems to have faded away without a full explanation of the multiple observations.  See, for example, the online article <a title="The rise and fall of the pentaquark" href="http://www.symmetrymagazine.org/cms/?pid=1000377" target="_blank">&#8220;The rise and fall of the pentaquark&#8221;</a> in Symmetry Magazine.  Indeed, from many years of experience, particle physicists tend to view even claims of new effects or new particles at a five standard deviation level cautiously.</p>
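<p>For reference, a significance quoted in standard deviations converts to a probability only under the assumption that the fluctuations are Gaussian, which is exactly the kind of assumption the skepticism described above calls into question. A minimal Python sketch of that conversion follows (one-sided tail; the function name is illustrative).</p>

```python
import math


def one_sided_tail(n_sigma):
    """Probability that a standard normal variable exceeds n_sigma."""
    return 0.5 * math.erfc(n_sigma / math.sqrt(2.0))


for n_sigma in (1, 2, 3, 4, 5):
    print(n_sigma, one_sided_tail(n_sigma))
```

<p>Under the Gaussian assumption a five standard deviation excess corresponds to a chance of roughly three in ten million, so caution at that level reflects doubt about the assumptions rather than about the arithmetic.</p>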
<p>Particle physics is often considered a very &#8220;hard&#8221; science without some of the problems presumably present in &#8220;softer&#8221; sciences such as medicine, biology, psychology, or parapsychology from which most of the examples in the New Yorker article are drawn.  The New Yorker article notes striking parallels between the cases in mainstream science and the work of J.B. Rhine in parapsychology where a so-called &#8220;decline effect&#8221; has repeatedly been noted.</p>
<p>The New Yorker article is well written and well worth reading.  Nonetheless, a few comments seem in order.  The article avoids discussing the possibility of fraud.  One possible explanation for cases of this type is organized scientific fraud where multiple research groups collude to produce confirmatory results: for example, to ensure the approval and adoption of a new drug developed and promoted by a large pharmaceutical company.  Scientific fraud is extremely difficult to prove.  In most cases, all a scientist can legitimately say is that he or she was unable to replicate the results of another researcher.  A suggestion of fraud would be unsupported speculation and quite possibly constitute legally actionable defamation or libel.  Most proven cases of scientific fraud involve an insider, a colleague in the same laboratory or office, who blows the whistle.  In many cases, the whistleblowers have suffered personally and professionally even if they were eventually vindicated.</p>
<p>The notion of replication seems straightforward and is heavily touted in popular science books and textbooks.  Replication will probably weed out statistical flukes and gross errors that a competent researcher should have avoided anyway.  However, what if the error is more subtle?  The independent scientific study may simply replicate the same subtle error.  For example, in particle physics, there are a number of extremely complex simulation programs such as the Lund Monte Carlo, the MINUIT fitting package, and the GEANT detector simulation package that are used by many different groups to simulate particle interactions, particle detectors, and analyze results.  Computer programs, of course, have bugs.  These bugs can be quite arcane and difficult to detect.  Consequently, independent research groups may replicate the same spurious results due to a bug in a widely used software package.</p>
<p>Modern scientific research is often technically quite sophisticated.  It often takes years of study and practice to master the theoretical or laboratory techniques of a field.  Frontier research where ostensibly important new results such as the pentaquark are likely to be encountered often involves sophisticated new techniques.  Consequently if a researcher or research group is unable to replicate a reported new result, they must always ask themselves: <em>am I doing something wrong?</em> The researcher who cannot replicate a result may be accused of lack of skill or even incompetence.  This is particularly a concern where the new result is reported by a high status researcher or group, or embraced as the hot &#8220;new new thing&#8221; of the field.  Hence, researchers concerned about their career may, like Thomas More in <em>A Man For All Seasons,</em> adopt a policy of prudent silence.</p>
<p>Even so, cases like the pentaquark or the several cases in &#8220;The Truth Wears Off&#8221; continue to raise questions about the validity of standard statistical methods in the real world.  The New Yorker article touches repeatedly on this concern, without reaching any firm conclusions.  &#8220;The Truth Wears Off&#8221; indirectly alludes to the enormous power of modern mathematical methods in concert with powerful computers and software to slice and dice data to produce, consciously or unconsciously, desired results or to construct elaborate models that will fit the data as discussed in the author&#8217;s previous posts <a title="Frankenstein Functions" href="http://math-blog.com/2010/10/12/frankenstein-functions/" target="_blank">Frankenstein Functions</a> and <a title="Gold Fever" href="http://math-blog.com/2010/10/30/gold-fever/" target="_blank">Gold Fever</a>.  In conclusion, there is both empirical evidence and theoretical reason to entertain doubts about the validity of seemingly solid, well-established statistical methods in the complex world of modern scientific research.</p>
<p>© 2011 John F. McGowan</p>
<p><strong>About the Author</strong></p>
<p><em>John F. McGowan, Ph.D. </em> is a software developer, research scientist, and consultant. He works primarily in the area of complex algorithms that embody advanced mathematical and logical concepts, including speech recognition and video compression technologies. He has extensive experience developing software in C, C++, Visual Basic, Mathematica, MATLAB, and many other programming languages. He is probably best known for his AVI Overview, an Internet FAQ (Frequently Asked Questions) on the Microsoft AVI (Audio Video Interleave) file format. He has worked as a contractor at NASA Ames Research Center involved in the research and development of image and video processing algorithms and technology. He has published articles on the origin and evolution of life, the exploration of Mars (anticipating the discovery of methane on Mars), and cheap access to space. He has a Ph.D. in physics from the University of Illinois at Urbana-Champaign and a B.S. in physics from the California Institute of Technology (Caltech). He can be reached at <a href="mailto:jmcgowan11@earthlink.net">jmcgowan11@earthlink.net</a>.</p>
<p><strong>Sponsor&#8217;s message</strong>: Check out <a href="https://www.e-junkie.com/ecom/gb.php?cl=61573&amp;c=ib&amp;aff=129997">Math Better Explained</a>, an elegant and insightful ebook that will help you see math in a new light and experience more of those awesome &#8220;aha!&#8221; moments when ideas suddenly click.</p>
<p>No related posts.</p>]]></content:encoded>
			<wfw:commentRss>http://math-blog.com/2011/01/10/when-science-fails/feed/</wfw:commentRss>
		<slash:comments>3</slash:comments>
		</item>
		<item>
		<title>The Cost of Not Understanding Probability Theory</title>
		<link>http://math-blog.com/2009/08/24/the-cost-of-not-understanding-probability-theory/</link>
		<comments>http://math-blog.com/2009/08/24/the-cost-of-not-understanding-probability-theory/#comments</comments>
		<pubDate>Mon, 24 Aug 2009 15:43:56 +0000</pubDate>
		<dc:creator>Antonio Cangiano</dc:creator>
				<category><![CDATA[Essential Math]]></category>
		<category><![CDATA[Probability Theory and Statistics]]></category>

		<guid isPermaLink="false">http://math-blog.com/?p=316</guid>
		<description><![CDATA[Misconceptions about probability theory and statistics have major repercussions on society. These range from seemingly minor things, like the excessive sensationalism of some headlines, all the way to the jailing of innocent people based on &#8220;statistical evidence&#8221;. One of the most common misconceptions is the so-called Gambler&#8217;s fallacy. Wikipedia defines it as follows: The gambler&#8217;s fallacy, [...]
No related posts.]]></description>
			<content:encoded><![CDATA[<p>Misconceptions about probability theory and statistics have major repercussions on society, ranging from seemingly minor things like the excessive sensationalism of some headlines all the way to the jailing of innocent people based on &#8220;statistical evidence&#8221;. One of the most common misconceptions is the so-called <a href="http://en.wikipedia.org/wiki/Gambler%27s_fallacy">Gambler&#8217;s fallacy</a>. Wikipedia defines it as follows:</p>
<blockquote><p>The gambler&#8217;s fallacy, also known as the Monte Carlo fallacy or the fallacy of the maturity of chances, is the belief that if deviations from expected behavior are observed in repeated independent trials of some random process then these deviations are likely to be evened out by opposite deviations in the future.</p></blockquote>
<p>This definition may seem a bit abstract, so let&#8217;s clarify it through a practical example. What&#8217;s the probability of flipping a fair coin 10 times in a row and obtaining heads every time? The answer is:</p>
<p align="center"><img src='http://math-blog.com/wp-content/latex/pictures/2c2f39ce0db73705a5d7613d37897e53.png' title='\displaystyle \mathrm{P(E)} = (\frac{1}{2})^{10} \approx 0.0009766' alt='\displaystyle \mathrm{P(E)} = (\frac{1}{2})^{10} \approx 0.0009766' align=absmiddle>.</p>
<p>This would be very unlikely. How unlikely? One in 1,024 to be exact. So if we&#8217;ve just observed the coin land on heads 9 times in a row, what is the probability that the same coin will land on heads on the 10th toss?</p>
<p>Many people would argue that the chance of this happening is less than one in a thousand, as we just calculated. However, that answer is wrong. The probability that the 10th toss of a fair coin comes up heads is still 0.5, because each trial (toss) is statistically independent of those that preceded it. Tossing 9 heads in a row is very unlikely (1 in 512), but once it has happened, it doesn&#8217;t influence the outcome of the 10th toss in any way.</p>
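<p>For readers who prefer to check this by brute force, here is a minimal Python sketch (the variable names are arbitrary) that enumerates all 1,024 equally likely sequences of ten tosses and compares the probability of ten heads in a row with the probability of heads on the 10th toss once nine heads have already been seen.</p>

```python
from fractions import Fraction
from itertools import product

# Enumerate all 2**10 equally likely outcomes of ten fair coin tosses.
outcomes = list(product("HT", repeat=10))

# Unconditional probability of ten heads in a row.
all_heads = [o for o in outcomes if all(t == "H" for t in o)]
p_ten_heads = Fraction(len(all_heads), len(outcomes))

# Condition on the first nine tosses being heads, then look at the tenth.
nine_heads_first = [o for o in outcomes if all(t == "H" for t in o[:9])]
tenth_is_heads = [o for o in nine_heads_first if o[9] == "H"]
p_tenth_given_nine = Fraction(len(tenth_is_heads), len(nine_heads_first))

print(p_ten_heads)         # 1/1024
print(p_tenth_given_nine)  # 1/2
```

<p>Only two of the 1,024 sequences start with nine heads, and exactly one of them ends in heads, which is where the 0.5 comes from.</p>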
<p>People who fall for this fallacy do so because of a fundamental misunderstanding of how probability works. They combine the probability of past events (irrelevant for independent trials) with that of future events. In the example above, some people would also erroneously conclude that &#8220;tails is long overdue&#8221; and is therefore more likely to occur.</p>
<p>It&#8217;s not a difficult idea to understand, but a lot of people make the mistake of confusing probability with sheer luck. Every independent trial has the same probability of success no matter what came before, whether you&#8217;re rolling dice or playing a lottery. If the chance of winning is 1 in 5,000, then after 4,999 losses you still have only a 0.02% chance of winning on the next try, the same as on the very first one.</p>
<p>This informal fallacy has contributed to the ruin of many gamblers over the years. A tragic example of where this way of looking at odds leads can be found among players of &#8220;Lotto&#8221;, a lottery game that is very popular in Italy.</p>
<p>The idea behind this game is very simple. Five distinct numbers between 1 and 90 are randomly selected in each of 10 different Italian cities, three times a week. Gamblers can place several types of bets, but the one we&#8217;re interested in, for the sake of this article, is called the &#8220;estratto semplice&#8221; (simple draw). This type of bet requires gamblers to correctly predict that a specific number will be drawn in a particular city.</p>
<p>The probability of placing a winning bet is 1 in 18 (i.e., 5/90), while the payout is 11.232 times the amount that you put down (so if you bet 1 Euro and won, you&#8217;d walk away with 11.23 Euros before taxes). A fair payout would be 18 times the stake, so the odds are clearly stacked in favor of the house. Incidentally, Lotto is run by the state and is also known as &#8220;a tax on the stupid&#8221; for rather obvious reasons.</p>
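<p>To put a number on how strongly the game favors the house, the following short Python sketch computes the expected return on a 1 Euro &#8220;estratto semplice&#8221; bet. It uses only the two figures quoted above (a 5/90 chance of winning and a payout of 11.232 times the stake) and ignores taxes; the variable names are illustrative.</p>

```python
from fractions import Fraction

# Figures quoted in the article for the "estratto semplice".
p_win = Fraction(5, 90)                  # 5 numbers drawn out of 90, i.e. 1 in 18
payout_multiple = Fraction(11232, 1000)  # gross return per Euro staked on a win

# Expected gross return on a 1 Euro bet, and the share kept by the house.
expected_return = p_win * payout_multiple
house_edge = 1 - expected_return

# A fair game would pay 1 / p_win times the stake.
fair_multiple = 1 / p_win

print(float(expected_return))  # 0.624
print(float(house_edge))       # 0.376
print(fair_multiple)           # 18
```

<p>In other words, under these assumptions a gambler gets back about 62 cents on average for every Euro wagered.</p>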
<p>There are many &#8220;systems&#8221; and theories used by a large pool of gamblers who want to &#8220;beat the system&#8221;. More often than not, such systems are based on some flawed understanding of how probability really works. A very popular theory is that of the &#8220;numeri ritardatari&#8221; (&#8220;late numbers&#8221;, as we will refer to them throughout this article). The basic principle behind late numbers is this: since it&#8217;s extremely unlikely that a given number will fail to appear in 150, 180 or 200 consecutive draws in a given city, you can identify which numbers are &#8220;due&#8221; to appear and thus bet on them. For example, if a number hasn&#8217;t been drawn in the past 140 draws, the number of bets on it will start to grow very quickly.</p>
<p>Of course, despite the fact that a number hasn&#8217;t come up in a given city 140 times in a row, its probability of occurring on the next draw is still just 1 in 18. So betting on any of the other 89 numbers would yield the same probability of winning.</p>
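<p>The difference between &#8220;a 140-draw absence is rare&#8221; and &#8220;the number is due&#8221; can be made concrete with a small Python sketch. It assumes, as the article does, that draws are independent and that a given number has a 1 in 18 chance of appearing in each draw; the helper name <code>prob_absent_for</code> is made up for this illustration.</p>

```python
from fractions import Fraction

p_drawn = Fraction(1, 18)   # chance a given number is drawn in one city
p_absent = 1 - p_drawn      # chance it is not drawn

def prob_absent_for(n):
    """Probability that the number is missing from n consecutive draws."""
    return p_absent ** n

# A 140-draw absence is rare when judged in advance...
print(float(prob_absent_for(140)))

# ...but given that it has already happened, the chance of a hit on draw 141
# is P(absent for 140 draws, then drawn) / P(absent for 140 draws).
p_hit_after_streak = (prob_absent_for(140) * p_drawn) / prob_absent_for(140)
print(p_hit_after_streak)   # 1/18
```

<p>The first figure is well under one in a thousand, yet the conditional probability collapses back to 1/18 because the streak term cancels out. That cancellation is exactly what independence means.</p>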
<p>The application of this fallacy becomes extremely dangerous when coupled with <a href="http://en.wikipedia.org/wiki/Martingale_%28betting_system%29">Martingale betting systems</a>, which are often adopted by &#8220;late number theorists&#8221;. The theory they use is very simple. Since they assume these late numbers are &#8220;due&#8221; very soon, they expect to be able to afford doubling their previous wager on every bet until the number eventually appears. When it does, the last sum they bet is multiplied by 11.232 (the payout), so they recoup all the money they&#8217;ve spent up until then and net a large additional profit of (last wager &#215; 9.232 + 1) Euros, assuming a starting bet of 1 Euro.</p>
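<p>The net profit figure can be checked with a few lines of Python. This is only a sketch under the assumptions stated in the text: the stake starts at 1 Euro, doubles after every losing draw, and a win pays 11.232 times the last wager. The function name <code>martingale_net_profit</code> is hypothetical.</p>

```python
PAYOUT_MULTIPLE = 11.232  # gross return per Euro on a winning bet (from the article)

def martingale_net_profit(winning_draw, first_bet=1.0):
    """Net profit if the number finally comes up on draw `winning_draw`
    (counted from 1) after doubling the stake on every losing draw."""
    stakes = [first_bet * 2 ** k for k in range(winning_draw)]
    last_wager = stakes[-1]
    return PAYOUT_MULTIPLE * last_wager - sum(stakes)

# Compare the simulated profit with the closed form: last wager x 9.232 + 1.
for n in (1, 5, 10):
    last_wager = 2 ** (n - 1)
    print(n, martingale_net_profit(n), 9.232 * last_wager + 1)
```

<p>The two columns agree up to floating-point rounding, because the total staked over n draws is 2 &#215; (last wager) minus the initial 1 Euro bet.</p>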
<p>Martingale betting systems are guaranteed to work provided that the gambler has an infinite amount of capital and no limits are imposed on the maximum bet that&#8217;s allowed to be placed. In the real world, neither of these requirements can realistically be met. The amount bet grows exponentially, so the Martingale system ends up being a surefire way to bankrupt those who employ it.</p>
<p>In the case of the Italian Lotto, both the fallacy that late numbers are &#8220;due&#8221; and the choice of betting system (Martingale) are responsible for the ruin of many. The gambler&#8217;s fallacy plays an important role in this case because most people realize that they can&#8217;t sustain a Martingale-type system for 200 consecutive draws. It&#8217;s their faith in the idea that late numbers are very likely to pop up soon that tempts them into toying with this risky system.</p>
<p>Suppose these people are convinced that a very late number (say, one that hasn&#8217;t been drawn in the past 180 lottery draws) will be selected at some point during the next 5 weeks or so (15 draws), and that they start with a bet of one Euro. The maximum amount they&#8217;d need to stake (according to their theory) would be 32,767 Euros in total, with a maximum single bet of 16,384 Euros on the 15th draw. This is a sizable sum of money, but something that some people would still be able to put down, especially knowing that a win on the 15th draw would pay out 184,025.088 Euros (before taxes). A tempting prize indeed.</p>
<p>But what is the real probability that the number in question, the one that&#8217;s been eluding the gamblers, will fail to appear in any of the next 15 draws?</p>
<p align="center"><img src='http://math-blog.com/wp-content/latex/pictures/8b2b74967faefb6d4445f15235bdbb6b.png' title='\displaystyle \mathrm{P(\overline{E})} = (\frac{17}{18})^{15} \approx 0.4243' alt='\displaystyle \mathrm{P(\overline{E})} = (\frac{17}{18})^{15} \approx 0.4243' align=absmiddle></p>
<p>So there is a 42.43% risk that the punter will lose the full 32,767 Euros, because they won&#8217;t have sufficient funds to double their wager on the next draw (assuming that sum is the maximum amount they can afford to stake).</p>
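<p>The figures for this 15-draw scenario can be reproduced with a short Python sketch. It assumes a 1 Euro opening bet that doubles on every draw, a payout of 11.232 times the stake, and independent draws with a 17/18 chance of missing each time; the constant and variable names are illustrative. Note that the fifteen doubled stakes 1, 2, 4, ..., 16,384 add up to 2<sup>15</sup> &#8722; 1 Euros.</p>

```python
DRAWS = 15
P_ABSENT = 17 / 18        # chance the number is not drawn in a single draw
PAYOUT_MULTIPLE = 11.232  # gross return per Euro staked

stakes = [2 ** k for k in range(DRAWS)]   # 1, 2, 4, ..., 16384 Euros
largest_bet = stakes[-1]
total_staked = sum(stakes)                # 2**15 - 1
payout_on_last_draw = PAYOUT_MULTIPLE * largest_bet
p_lose_everything = P_ABSENT ** DRAWS

print(largest_bet)                  # 16384
print(total_staked)                 # 32767
print(payout_on_last_draw)          # 184025.088
print(round(p_lose_everything, 4))  # 0.4243
```

<p>Seen this way, the gambler is risking more than 32,000 Euros on what is not far from a coin flip.</p>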
<p>Bear in mind that because the bet grows exponentially, even a huge amount of capital will only afford our late number gamblers a few extra draws, thereby only slightly increasing their probability of making a profit. (With a payout of 11.232 times the wager, they could afford a smaller increase in the amount of money they put down draw by draw, but the overall principle remains the same.)</p>
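<p>To see how little extra capital helps, here is a hedged Python sketch that counts how many consecutive doubled bets a given bankroll can cover and how likely the number is to stay absent for that whole run. It assumes a 1 Euro opening bet, strict doubling and no betting limits; <code>affordable_draws</code> is a made-up helper name and the bankrolls are arbitrary examples.</p>

```python
P_ABSENT = 17 / 18  # chance the number is not drawn in a single draw

def affordable_draws(capital, first_bet=1):
    """Count how many consecutive doubled bets `capital` can cover."""
    draws, stake = 0, first_bet
    while capital >= stake:
        capital -= stake
        stake *= 2
        draws += 1
    return draws

# Bankroll, draws it can fund, and probability of losing all of them.
for capital in (32_767, 1_000_000, 1_000_000_000):
    n = affordable_draws(capital)
    print(capital, n, round(P_ABSENT ** n, 3))
```

<p>Under these assumptions a million Euros funds only 19 draws and a billion Euros only 29, and the chance of losing everything is still roughly one in three and close to one in five, respectively.</p>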
<p>What has the adoption of this faulty theory led to in Italy? What kind of impact has it really had on those who adhere to it? The honest truth is that it&#8217;s gone so far as to contribute directly to things like suicides, people swindling their friends and employers, divorces, people betting their life savings and their homes, families being destroyed, and so on. Do such dire consequences befall everyone who plays this game? No, of course not, but the fact that they have befallen some people, and that these flawed theories are still employed today, is indicative of the misunderstanding of probability (and of the risks of gambling) that is widespread in the general population.</p>
<p>One could &#8211; and should &#8211; argue that such people&#8217;s demise is due to their gambling habits and to good old fashioned greed, yet I can&#8217;t help but feel that a solid understanding of probability theory would go a long way in helping to cut down on the number of people who fall prey to these types of widespread theories.</p>
<p>An increased awareness of probability and statistics can only improve society and its ability to assess situations and make rational decisions. How do we begin to remedy this situation, not only in Italy, but around the world? We can start by devoting far more time in grade, middle and high school math classes to teaching students about this important subject and the implications that it can have on their everyday lives, understanding of society, and ability to make wise financial decisions.</p>
]]></content:encoded>
			<wfw:commentRss>http://math-blog.com/2009/08/24/the-cost-of-not-understanding-probability-theory/feed/</wfw:commentRss>
		<slash:comments>30</slash:comments>
		</item>
	</channel>
</rss>

