A scientific calculator can quickly perform a range of operations and functions beyond basic arithmetic. All scientific calculators perform floating point arithmetic. Most scientific calculators have trigonometric functions, logarithm and power functions, hyperbolic functions, and the factorial. Most scientific calculators support scientific notation. Some scientific calculators can perform various more advanced mathematics such as calculus. In this post, we look at scientific calculator programs for computer users, primarily for performing quick scientific or engineering calculations while working on a computer.
(1) Microsoft Windows Calculator
Microsoft Windows comes with a built-in calculator utility that has a basic and a scientific mode.
To switch to the scientific calculator mode, choose the Scientific menu item in the View menu as shown below. The Windows Calculator from Microsoft Windows 7 Home Premium is shown.
(2) Macintosh OS X Calculator
Like Microsoft Windows, the Macintosh OS X operating system comes with a built-in calculator with a scientific calculator mode.
To switch to the scientific calculator, choose the Scientific menu item in the View pulldown menu as shown below.
(3) Unix/X Window System xcalc Utility
The X Window System comes with a scientific calculator xcalc. xcalc emulates the Texas Instruments TI-30 scientific calculator by default.
xcalc -rpn
launches xcalc emulating the classic Hewlett-Packard HP-10C scientific calculator.
(4) Google
The Google search box has been able to evaluate mathematical expressions for years. It has the capabilities of a scientific calculator. In the last few weeks, Google modified its search interface to display a scientific calculator when a scientific mathematical expression is entered.
(5) Wolfram Alpha
Wolfram Alpha has extensive scientific calculator functionality and much more.
(6) iPhone (iOS)
The iPhone calculator utility morphs into a scientific calculator when you hold the iPhone in landscape (not portrait) mode. Note that when the iPhone is held in portrait mode (long side vertical/short side horizontal) the iPhone Calculator is a basic arithmetic calculator.
(7) GNU Emacs Text Editor
The widely used and widely available GNU Emacs text editor has both a sophisticated calculator mode with a significant learning curve and an easy-to-use quick calculator command. This calculator has extensive scientific calculator functions bordering on a poor man’s MATLAB.
M-x quick-calc a (op) b
In most versions of GNU Emacs, the result of the quick calculation is placed in the Emacs “kill ring” and can be then pasted into the current edit buffer by using Ctrl-y (“yank”).
The picture below shows the GNU Emacs calculator computing sin(2.1*3.1). Note that the GNU Emacs calculator defaults to degrees, whereas Google and Wolfram Alpha default to radians, so the Emacs result differs from the result computed by Google and Wolfram Alpha.
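The degrees versus radians difference can be checked with a few lines of Python. This is only an illustrative sketch and not part of the Emacs calculator; the variable names are my own.

```python
import math

x = 2.1 * 3.1  # the argument used in the example above

# Interpret x as radians (the convention used by Google and Wolfram Alpha).
sin_radians = math.sin(x)

# Interpret x as degrees (the GNU Emacs calculator default).
sin_degrees = math.sin(math.radians(x))

print("radians:", sin_radians)
print("degrees:", sin_degrees)
```

The two results are roughly 0.225 (radians) and 0.113 (degrees), which is why the calculators appear to disagree.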
The GNU Emacs calculator has an extensive user manual.
Conclusion
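This brief post has presented seven scientific calculators for computer users: the Microsoft Windows Calculator, the Macintosh OS X Calculator, the X Window System xcalc utility, Google, Wolfram Alpha, the iPhone calculator, and the GNU Emacs calculator.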
© 2012 John F. McGowan
About the Author
John F. McGowan, Ph.D. solves problems using mathematics and mathematical software, including developing video compression and speech recognition technologies. He has extensive experience developing software in C, C++, Visual Basic, Mathematica, MATLAB, and many other programming languages. He is probably best known for his AVI Overview, an Internet FAQ (Frequently Asked Questions) on the Microsoft AVI (Audio Video Interleave) file format. He has worked as a contractor at NASA Ames Research Center involved in the research and development of image and video processing algorithms and technology. He has published articles on the origin and evolution of life, the exploration of Mars (anticipating the discovery of methane on Mars), and cheap access to space. He has a Ph.D. in physics from the University of Illinois at Urbana-Champaign and a B.S. in physics from the California Institute of Technology (Caltech). He can be reached at jmcgowan11@earthlink.net.
This post shows how to combine the raw data plotted with the Octave plot command with a polynomial model fit to the data with the error bars on the polynomial fit results from the Octave polyfit command. The post uses the data from the US Centers for Disease Control (CDC) Autism Prevalence Summary Table for 2011, a survey of studies of the prevalence of autism spectrum disorders throughout the world.
%
% Script to display CDC Autism Prevalence Summary Table 2011 Data and Model Fit Results with Error Bars
% Author: John F. McGowan, Ph.D. (jmcgowan11@earthlink.net)
%
% (C) 2012 John F. McGowan, Ph.D.
%
autism_data = dlmread('autism_prevalence_2011.txt', '\t');

dates = real(autism_data(:,4)); % time period studied
country = autism_data(:,3); % country code usa is 1

valid = find(country >= 1);
usa = find(country == 1);
uk = find(country == 2);
eng = find(country == 3);
sweden = find(country == 4);
canada = find(country == 5);
australia = find(country == 6);
japan = find(country == 7);
germany = find(country == 8);
france = find(country == 9);
ireland = find(country == 10);
denmark = find(country == 11);
sk = find(country == 12); % South Korea (most extreme autism rate)

diagnosis = autism_data(:, 7); % diagnostic criteria
kanner = find(diagnosis == 1);
dsm3 = find(diagnosis == 2);
dsm4 = find(diagnosis == 5);
icd10 = find(diagnosis == 4);

prevalence = real(autism_data(:,9));

[p_autism, s] = polyfit(dates(valid), prevalence(valid), 3); % variable s contains error parameters from polynomial fit
mydates = 1960:2012;
[simrate_world, dsimrate_world] = polyval(p_autism, mydates, s);

[p_usa, s_usa] = polyfit(dates(usa), prevalence(usa), 3); % variable s_usa contains error parameters from polynomial fit
[fit_usa, dfit] = polyval(p_usa, mydates, s_usa);

figure(1)
h1 = plot(dates(usa), prevalence(usa), 'o');
set(h1, 'linewidth', 3);
axis([1960 2012 0.0 10.0]);
hold on; % hold plot and axes so can overlay errorbar plot on same graphic
ylabel('cases per 1000 children', 'fontsize', 14)
xlabel('Year', 'fontsize', 14)
title('Autism Spectrum Disorder Prevalence (USA)', 'fontsize', 14);
legend('DATA', 'location', 'northwest');
legend('boxon'); % turn on box around legend
errorbar(mydates, fit_usa, dfit); % display fit results with error bars
print('usa_autism_errors.jpg');
hold off; % turn off hold so can create a separate plot of world data

figure(2)
h2 = plot(dates(usa), prevalence(usa), 'ob', ...
          dates(uk), prevalence(uk), 'or', ...
          dates(sweden), prevalence(sweden), 'ok', ...
          dates(denmark), prevalence(denmark), '*b', ...
          dates(japan), prevalence(japan), '*r', ...
          dates(eng), prevalence(eng), '*k', ...
          dates(france), prevalence(france), '+b', ...
          dates(germany), prevalence(germany), '+r', ...
          dates(canada), prevalence(canada), '+k', ...
          dates(australia), prevalence(australia), 'xb', ...
          dates(ireland), prevalence(ireland), 'xr', ...
          dates(sk), prevalence(sk), 'xk');
set(h2, 'linewidth', 3);
axis([1960 2012 0.0 30.0]);
hold on;
ylabel('cases per 1000 children', 'fontsize', 14)
xlabel('Year', 'fontsize', 14)
title('Autism Spectrum Disorder Prevalence (By Country)', 'fontsize', 14);
legend('USA', 'UK', 'SWEDEN', 'DENMARK', 'JAPAN', 'ENGLAND', 'FRANCE', 'GERMANY', 'CANADA', 'AUSTRALIA', 'IRELAND', 'SOUTH KOREA', 'location', 'northwest');
legend('boxon'); % turn on box around legend
errorbar(mydates, simrate_world, dsimrate_world); % add fit results with error bars
print('autism_by_nation_errors.jpg');
hold off; % for future

disp('ALL DONE');
Note, in particular, the use of the commands hold on and hold off to combine the graphical outputs of the plot and errorbar commands in a single graph. Hold on keeps the graphics and axes of the figure, so that the errorbar command does not overwrite/erase the plot. Once the figure is created with both the raw data and error bars, hold off is used to return the figure to normal behavior so an entirely new plot can be created. Also, note the syntax:
[p, s] = polyfit(x, y, n)
which returns an estimate of errors in the s structure from fitting a polynomial of degree n to the data x and y.
The syntax:
[y, dy] = polyval(p, x, s)
evaluates the polynomial with coefficients p and error parameters s for the values x, putting the resulting values in y and the error bars (one standard deviation) in dy.
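To make the meaning of the error bars more concrete, here is a minimal pure-Python sketch for the simplest case, a straight line (degree 1) rather than the cubic used above. It uses the textbook one-standard-deviation prediction error for least squares, which, as far as I understand, is the kind of quantity polyval returns in dy. The function name and the four data points are made up for illustration and are not the CDC data.

```python
import math

def linear_fit_with_errors(xs, ys):
    """Least-squares line y = slope*x + intercept, plus a function giving
    the one-standard-deviation prediction error at a point x0."""
    n = len(xs)
    x_mean = sum(xs) / n
    y_mean = sum(ys) / n
    sxx = sum((x - x_mean) ** 2 for x in xs)
    sxy = sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = y_mean - slope * x_mean

    # Residual standard deviation with n - 2 degrees of freedom
    sse = sum((y - (slope * x + intercept)) ** 2 for x, y in zip(xs, ys))
    sigma = math.sqrt(sse / (n - 2))

    def prediction_error(x0):
        # Grows as x0 moves away from the mean of the fitted x values
        return sigma * math.sqrt(1.0 + 1.0 / n + (x0 - x_mean) ** 2 / sxx)

    return slope, intercept, prediction_error

# Made-up illustrative data (hypothetical, not the CDC table)
xs = [0.0, 1.0, 2.0, 3.0]
ys = [0.0, 1.0, 2.0, 4.0]
slope, intercept, dy = linear_fit_with_errors(xs, ys)
for x0 in xs:
    print(x0, slope * x0 + intercept, dy(x0))
```

The error grows toward the ends of the data range, which is the same qualitative behavior one sees in error bars on a polynomial fit.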
This Octave code makes the following plots with error bars displayed.
The second plot shows the error bars for a fit to the worldwide autism spectrum disorder prevalence data.
Conclusion
Octave can display plots with error bars using the errorbar command or the plot and errorbar commands combined as illustrated above. In particular, the Octave plot and errorbar commands can be combined to display original data and the results of fitting a model including the error bars returned by the model, using the Octave polyfit polynomial fitting command for example.
© 2012 John F. McGowan
Appendix I: Autism Prevalence Summary Table for 2011 Data
Author | Year published | Country | Time period studied | Age range studied | Number of children in population | Criteria used | Methodology used | ASD prevalence (CI) | IQ<70 (%)
Lotter | 1966 | 3 | 1964 | 8 to 10 | 78,000 | 1 | Case enumeration and direct exam | 0.45 (0.31-0.62) | 84
Brask | 1970 | 11 | 1962 | 2 to 14 | 46,500 | 1 | Case enumeration | 0.43 (0.26-0.66) | NR
Treffert | 1970 | 1 | 1962-1967 | 3 to 12 | 899,750 | 1 | Case enumeration | 0.07-0.31 (0.0-1.0) | NR
Wing & Gould | 1979 | 3 | 1970 | 0 to 14 | 35,000 | 1 | Case enumeration and direct exam | 0.49 (0.29-0.78) | 70
Hoshino et al. (1) | 1982 | 7 | 1977 | 0 to 17 | 234,039 | 1 | Case enumeration and direct exam | 0.23 (0.19-0.27) | NR
Ishii & Takahashi | 1983 | 7 | 1981 | 6 to 12 | 35,000 | Rutter | Case enumeration and direct exam | 1.6 (1.2-2.8) | NR
Bohman et al. | 1983 | 4 | 1979 | 0 to 20 | 69,000 | Rutter | Case enumeration and direct exam | 0.3 (0.2-0.5) | NR
McCarthy et al. | 1984 | 10 | 1978 | 8 to 10 | 65,000 | 1 | Case enumeration and direct exam | 0.43 (0.29-0.59) | NR
Gillberg | 1984 | 4 | 1980 | 4 to 18 | 128,584 | 2 | Case enumeration and direct exam | 0.20 (0.13-0.30) | 80, 77
Steinhausen et al. | 1986 | 8 | 1982 | 0 to 14 | 279,616 | Rutter | Case enumeration and direct exam | 0.19 (0.14-0.24) | 44
Steffenberg & Gillberg | 1986 | 4 | 1984 | <10 | 78,413 | 2 | Case enumeration and direct exam | 0.45 (0.31-0.62) | NR
Matsuishi et al. | 1987 | 7 | 1983 | 4 to 12 | 32,834 | 2 | Case enumeration and direct exam | 1.55 (1.16-1.64) | NR
Burd et al. | 1987 | 1 | 1985 | 2 to 18 | 180,986 | 2 | Case enumeration and direct exam | 0.12 (0.00-0.20) | NR
Bryson et al. | 1988 | 5 | 1985 | 6 to 14 | 20,800 | 2 | Case enumeration and direct exam | 1.01 (0.62-1.54) | 76
Tanoue et al. | 1988 | 7 | 1977-1985 | 3 to 7 | 95,394 | 2 | Case enumeration | 1.38 (1.16-1.64) | NR
Ciadella & Mamelle | 1989 | 9 | 1986 | 3 to 9 | 135,180 | 2 | Case enumeration | 0.51 (0.39-0.63) | NR
Sugiyama & Abe | 1989 | 7 | 1979-1984 | 2 to 5 | 12,263 | 2 | Population screen and direct exam | 1.3 (0.7-2.1) | 38
Ritvo et al. | 1989 | 1 | 1984-1988 | 8 to 12 | 184,822 | 2 | Case enumeration and direct exam | 0.40 (0.31-0.50) | NR
Gillberg et al. | 1991 | 4 | 1988 | 4 to 13 | 78,106 | 3 | Case enumeration and direct exam | 0.95 (0.74-1.95) | 82, 80
Fombonne & Mazaubrun (1) | 1992 | 9 | 1985 | 9 to 13 | 274,816 | 4 | Case enumeration and direct exam | 0.49 (0.47-0.65) | 87
Honda et al. | 1996 | 7 | 1994 | 1.5 to 6 | 8,537 | 4 | Population screen and direct exam | 2.11 (1.25-3.33) | 50
Fombonne et al. | 1997 | 9 | 1992-1993 | 6 to 16 | 325,347 | 4 | Case enumeration and direct exam | 0.54 (0.46-0.62) | 88
Arivdsson et al. | 1997 | 4 | 1994 | 3 to 16 | 1,941 | 4 | Population screen and direct exam | 3.10 (1.14-6.72) | 100
Webb et al. | 1997 | Wales | 1992 | 3 to 15 | 73,300 | 3 | Case enumeration and direct exam | 0.72 (0.54-0.95) | NR
Sponheim & Skjeldae | 1998 | Norway | 1992 | 3 to 14 | 65,688 | 4 | Case enumeration and direct exam | 0.38 (0.25-0.56) | 64
Kadesjo et al. | 1999 | 4 | 1992 | 6.7 to 7.7 | 826 | 4 | Case enumeration and direct exam | 6.0 (1.97-14.1) | 60
Baird et al. | 2000 | 3 | 1998 | 1.5 to 8 | 16,235 | 4 | Population screen and direct exam | 3.1 (2.29-4.06) | 40
Powell et al. | 2000 | 3 | 1995 | 1 to 4 | 29,200 | DSM-III-R or DSM-IV | Case enumeration | 0.96 (0.64-1.39) | NR
Kielinen et al. | 2000 | Finland | 1996 | 5 to 18 | 152,732 | 5 | Case enumeration | 1.22 (1.06-1.41) | 50
Magnusson & Saemundsen | 2000 | Iceland | 1997 | 5 to 14 | 43,153 | 4 | Population screen and direct exam | 0.86 (0.60-1.18) | 49
Chakrabarti & Fombonne | 2001 | 3 | 1998 | 2.5 to 6.5 | 15,500 | 5 | Population screen and direct exam | 1.68 (1.1-2.46) | 24
Fombonne et al. (2) | 2001 | 2 | 1999 | 5 to 15 | 12,529 | 5 | Population screen and direct exam | 2.61 (1.81-3.70) | 44.4
Bertrand et al. | 2001 | 1 | 1998 | 3 to 10 | 8,996 | 5 | Case enumeration and direct exam | 4.0 (2.8-5.5) | 49
Croen et al. | 2001 | 1 | 1987-1999 | 0 to 21 | 4,600,000 | DSM-III-R or DSM-IV | Case enumeration | 1.1 (1.06-1.14) | NR
Yeargin-Allsopp et al. (2) | 2003 | 1 | 1996 | 3 to 10 | 290,000 | 5 | Case enumeration | 3.4 (3.2-3.6) | 62
Gurney et al. (2) | 2003 | 1 | 1981-1982, 2001-2002 | 6 to 17 | LEFT_BLANK | 5 | Case enumeration | 4.4 (4.3-4.5) | NR
Lingam et al. | 2003 | 2 | 2000 | 5 to 14 | 186,206 | 4 | Case enumeration | 1.5 (1.3-1.7) | NR
Icasiano et al. | 2004 | 6 | 2002 | 2 to 17 | 45,153 | 5 | Case enumeration | 3.9 (3.3-4.5) | 47
Lauritsen et al. | 2004 | 11 | 2001 | 0 to 9 | 682,397 | 4 | Case enumeration | 1.2 (1.1-1.3) | NR
Fombonne et al. | 2006 | 5 | 1987-1998 | 5 to 21 | 27,749 | 5 | Case enumeration | 2.16 (1.65-2.78) | NR
Baird et al. | 2006 | 2 | 1990-1991 | 9 to 10 | 56,946 | 4 | Case enumeration, screen, and direct exam | 3.89 (3.39-4.43) | 56
CDC ADDM Network (1) | 2007 | 1 | 2000 | 8 | 187,761 | 5 | Case enumeration and record review | 6.7 (6.3-7.0) | 36-61
CDC ADDM Network (1) | 2007 | 1 | 2002 | 8 | 444,050 | 5 | Case enumeration and record review | 6.6 (6.3-6.8) | 45
Oullette-Kuntz et al. | 2007 | 5 | 1996-2004 | 4 to 9 | 2,240,537 | Special education classification | Case enumeration from special education classification | 1.2 (1996), 4.3 (2004) | NR
Wong et al. (1) | 2008 | Hong Kong | 1986-2005 | 0 to 14 | 4,247,206 | 5 | Case enumeration | 1.6 | NR
Williams et al. | 2008 | 6 | 2003-2004 | 6 to 12 | 5,459 | 5 | Questionnaires | 1.0 (0.8-1.0) to 4.1 (3.8-4.4) | NR
Montiel-Nava et al. | 2008 | Venezuela | 2005-2006 | 3 to 9 | 254,905 | 5 | Case enumeration | 1.7 (1.3-2.0) | NR
Baron-Cohen et al. | 2009 | 2 | 2003-2004 | 5 to 9 | 5,484 | Special Education Needs register | Case enumeration from survey and direct exam | 15.7 (9.9-24.6) | NR
CDC ADDM Network (1) | 2009 | 1 | 2004 | 8 | 172,335 | 5 | Case enumeration and record review | 8.0 (7.6-8.4) | 44
CDC ADDM Network (1) | 2009 | 1 | 2006 | 8 | 308,038 | 5 | Case enumeration and record review | 9.0 (8.6-9.3) | 41
Al-Farsi et al. | 2010 | Oman | 2009 | 0 to 14 | 798,913 | 5 | Case enumeration | 0.1 (0.1-0.2) | NR
Parner et al. | 2011 | 11 | 1994-1999 | LEFT_BLANK | 404,816 | 5 | Case enumeration | 6.9 (6.5-7.2) | NR
Parner et al. | 2011 | Western Australia | 1994-1999 | LEFT_BLANK | 152,060 | 5 | Case enumeration | 5.1 (4.7-5.5) | NR
Chien et al. | 2011 | Taiwan | 1996-2005 | 0 to 18 | 372,642 | 6 | Case enumeration | 2.9 | NR
Windham et al. | 2011 | 1 | 1994, 1996 | 0 to 8 | 82,153 (1994), 80,249 (1996) | 5 | Case enumeration | 4.7 (4.2-5.1) (1994); 4.7 (4.2-5.2) (1996) | NR
Kim et al. | 2011 | 12 | 2005-2009 | 7 to 12 | 55,266 | 5 | Case enumeration from survey and direct exam | 26.4 (19.1-33.7) | 59
Zimmerman et al. | 2012 | 1 | 2002, 2006, 2008 | 8 | 26,213 (2002); 29,494 (2006); 33,757 (2008) | ICD-9 and special education classification | Case enumeration | 6.5 (2002), 10.2 (2006), 13.0 (2008) | NR
Kocovska et al. | 2012 | Faroe Islands | 2002, 2009 | 7-16 (2002), 15-24 (2009) | 7122 (2002), 7128 (2009) | DSM-IV, ICD-10 | Screening and direct exam | 5.6 (2002), 9.4 (2009) | NR
CDC ADDM Network (1) | 2012 | 1 | 2008 | 8 | 337,093 | 5 | Case enumeration and record review | 11.3 (11.0-11.7) | 38
(1) The prevalence reported represents the average.
(2) The prevalence study provided overall rate only.
Quick Reference
(1) MS-DOS SET /A Command Line Calculator DOS PROMPT> SET /A x (op) y
(2) Microsoft Windows Calc
(UNIX/Linux/Mac OS X/Cygwin/etc.)
(3) Bourne Shell command line expressions $[a (op) b]
(4) GNU bc utility (Unix)
(5) GNU Emacs Editor
(6) VIM Text Editor Ctrl-R = a (op) b
(7) perl -de 1 perl> print a (op) b
Unix prompt> perl -e 'print a (op) b;'
(8) python Unix prompt>python Python prompt>a (op) b
Unix prompt> python -c 'print a (op) b;'
IDLE launches GUI with python interpreter (IDLE available for MS Windows, Mac OS X, and Unix)
(9) ruby Unix prompt>irb Ruby Prompt>a (op) b
Unix prompt>ruby -e 'print a (op) b;'
(10) Google Calculator Google search box evaluates mathematical expressions!
Bing and Yahoo also have calculators in their search boxes!
where op stands for an arithmetic operation: addition, subtraction, multiplication, division (+, -, *, /)
Some quick calculators support raising a number to a power, often “**” or “^”. The “^” symbol is sometimes used for the bitwise exclusive OR as in the C programming language, instead of raising a number to a power. For example,
$ python -c 'print 2**3;'
8
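The difference between the two symbols is easy to see side by side. The short sketch below uses Python 3 syntax (print is a function there, unlike in the Python 2 command above); the variable names are only for illustration.

```python
power = 2 ** 3  # exponentiation: 2 raised to the power 3
xor = 2 ^ 3     # bitwise exclusive OR: 0b10 ^ 0b11 == 0b01

print(power)  # prints 8
print(xor)    # prints 1
```

A calculator that silently treats "^" as exclusive OR will therefore give 1 where the user expected 8, so it is worth checking which convention a given tool follows.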
The Ten Quick Calculators
Microsoft Windows and MS-DOS
(1) Windows CALC COMMAND/UTILITY
The command calc (the program calc.exe) will launch a simple graphical calculator on Microsoft Windows.
DOS PROMPT>calc
(2) MS-DOS SET /A a (op) b COMMAND
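The SET command of the Windows command prompt (cmd.exe), often loosely called the MS-DOS prompt, functions as a simple command line calculator when used with the /A switch. It can perform signed integer arithmetic.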
Addition
DOS PROMPT> SET /A 1 + 2
3
Multiplication
DOS PROMPT> SET /A 3 * 2
6
Subtraction
DOS PROMPT> SET /A 10 - 8
2
Division
DOS PROMPT> SET /A 10 / 5
2
Unix including Mac OS X, Linux, and Cygwin
(3) Bourne Shell Command Line Calculator
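The Bourne shell family, including bash (the Bourne Again Shell), has a simple built-in command line integer arithmetic calculator somewhat similar to SET /A. The examples here use the older $[ ... ] form accepted by bash; the equivalent $(( ... )) form is the one specified by POSIX.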
Note: in many flavors of Unix, it is necessary to escape the asterisk with a backslash to perform multiplication:
Bourne Shell Prompt> echo $[ 2 \* 3 ]
6
Otherwise, the asterisk is interpreted as a wild card by the shell and the calculation will fail. The Cygwin environment which emulates Unix on MS Windows does not have this problem.
One can assign the results of the calculation to an environment variable by using the equal sign:
a=$[2 + 3]
echo $a
5
Note: there is no space between the variable name (e.g. “a”) and the equal sign. “a = $[2 + 3]” gives an error.
NOTE: This only works for integer arithmetic. Floating point gives an error:
$ echo $[2.1*3.1]
-bash: 2.1*3.1: syntax error: invalid arithmetic operator (error token is ".1*3.1")
(4) GNU bc Utility (Unix)
The GNU bc utility is an arbitrary precision calculator language. It is preinstalled on many Unix systems. Although it is not part of the base installation, it can be installed in the Cygwin environment. In addition to basic arithmetic, it has a small math library with a few common trigonometric and transcendental functions which can be invoked with the -l option: bc -l
bc has a peculiarity that often surprises new users: the built-in variable scale, which sets the number of digits kept after the decimal point in the result of a division. By default, scale is set to zero (0). What this means is that, by default, division in particular gives the result of integer division; there are no digits after the decimal point. (Starting bc with the -l option sets scale to 20.)
10/3 = 3
However, if scale is set to a positive number, the results of the division operation are reported with the requested precision.
scale = 2
10/3 = 3.33
scale = 3
10/3 = 3.333
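For readers who want to see the effect of scale outside bc, the following Python sketch mimics it with the standard decimal module. The helper name bc_divide is hypothetical (it is not part of bc or Python), and the sketch assumes that bc truncates extra digits rather than rounding them.

```python
from decimal import Decimal, ROUND_DOWN

def bc_divide(a, b, scale=0):
    """Divide a by b and truncate to `scale` digits after the decimal point."""
    quotient = Decimal(a) / Decimal(b)
    step = Decimal(1).scaleb(-scale)  # scale=2 gives a step of 0.01
    return quotient.quantize(step, rounding=ROUND_DOWN)

print(bc_divide(10, 3))     # prints 3
print(bc_divide(10, 3, 2))  # prints 3.33
print(bc_divide(10, 3, 3))  # prints 3.333
```

Because the extra digits are dropped rather than rounded, bc_divide(2, 3, 2) gives 0.66 and not 0.67.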
(5) GNU Emacs Text Editor
The widely used and widely available GNU Emacs text editor has both a sophisticated calculator mode with a significant learning curve and an easy-to-use quick calculator command.
M-x quick-calc a (op) b
In most versions of GNU Emacs, the result of the quick calculation is placed in the Emacs “kill ring” and can be then pasted into the current edit buffer by using Ctrl-y (“yank”).
(6) VIM Text Editor
The widely used and widely available vim text editor has a quick calculator feature.
In the VIM INSERT Mode, type Ctrl-R (nothing visible happens) followed by the equal sign “=”. An equal sign will appear at the lower left corner of the VIM window. Then, enter the mathematical expression to evaluate:
= 2 + 3
Press the RETURN or ENTER key and VIM will paste the result of the calculation into the file being edited. VIM can do both integer and floating point calculation. Use simple numbers such as “2 + 3” to get integer results. Use numbers with decimal points such as “2.1 + 3.4” to get floating point results.
PERL, PYTHON, and RUBY
Almost all Unix systems now come with the perl programming language preinstalled. Most Unix systems now come with the python scripting language preinstalled. Many Unix systems come with the ruby scripting language preinstalled. It is easy to install perl, python, and ruby on any Unix system. All are available as native applications for MS-Windows systems as well as through the Cygwin environment which emulates Unix on MS-Windows systems. All of these scripting languages can be run at the command line or interactively as simple quick calculators.
(7) PERL
The perl scripting language can be used as a quick calculator.
at the Unix command line
$ perl -e 'print 2 + 3;'
$ perl -e 'print 2 * 3;'
$ perl -e 'print 2 - 3;'
$ perl -e 'print 2 / 3;'
interactively:
$ perl -de 1
Loading DB routines from perl5db.pl version 1.32
Editor support available.
Enter h or `h h' for help, or `man perldebug' for more help.
main::(-e:1): 1
DB<1> 2+2
DB<2> print 2+2
4
DB<3> print 2/3
0.666666666666667
DB<4>
(8) PYTHON
The python programming language can be used as a quick calculator.
at the Unix command line
John@John-HP ~
$ python -c 'print 2 + 3;'
5
John@John-HP ~
$ python -c 'print 2 * 3;'
6
John@John-HP ~
$ python -c 'print 2 - 3;'
-1
John@John-HP ~
$ python -c 'print 2 / 3;'
0
John@John-HP ~
$ python -c 'print 2.0 / 3.0;'
0.666666666667
interactively
$ python
Python 2.6.8 (unknown, Jun 9 2012, 11:30:32)
[GCC 4.5.3] on cygwin
Type "help", "copyright", "credits" or "license" for more information.
>>> 2 + 3
5
>>> 2 * 3
6
>>> 2 - 3
-1
>>> 2 / 3
0
>>> 2.0 / 3.0
0.66666666666666663
>>> quit()
NOTE that Python 2, the version shown here, treats simple numbers such as 2 and 3 as integers. “2 / 3” is integer division and yields zero (0). Python treats numbers with decimal points such as 2.0 and 3.0 as floating point numbers. “2.0 / 3.0” is floating point division and yields 0.66666666. In Python 3, “2 / 3” is floating point division and “2 // 3” is integer division.
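Python also comes with an interactive GUI environment known as IDLE (after comedian Eric Idle of Monty Python fame).
NOTE that “2 / 3” may give a floating point result under IDLE rather than the integer result seen at the command line. IDLE itself does not change how numbers are treated; the difference most likely means that IDLE is running a different version of Python than the command line (Python 3 performs floating point division for “2 / 3”). Sadly, computer programs often contain these inconsistencies and quirks which can sometimes bite the user, especially in mathematical or numerical projects.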
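Readers using Python 3 will see different division results from the Python 2 sessions shown above. The following short sketch, written for Python 3, shows the two division operators side by side; the variable names are only for illustration.

```python
true_division = 2 / 3    # always floating point division in Python 3
floor_division = 2 // 3  # integer (floor) division

print(true_division)
print(floor_division)  # prints 0
print(2.0 / 3.0)
```

Using "//" when an integer result is wanted, and "/" otherwise, avoids depending on which Python version happens to be installed.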
(9) RUBY
The ruby programming language can be used as a quick calculator.
at the Unix command line:
$ ruby -e 'print 2 + 3;'
5
John@John-HP ~
$ ruby -e 'print 2 * 3;'
6
John@John-HP ~
$ ruby -e 'print 2 - 3;'
-1
John@John-HP ~
$ ruby -e 'print 2 / 3;'
0
John@John-HP ~
$ ruby -e 'print 2.0 / 3.0;'
0.666666666666667
interactively (use the irb command for interactive ruby):
$ irb
irb(main):001:0> 2 + 3
=> 5
irb(main):002:0> 2 * 3
=> 6
irb(main):003:0> 2 - 3
=> -1
irb(main):004:0> 2 / 3
=> 0
irb(main):005:0> 2.0 / 3.0
=> 0.666666666666667
irb(main):006:0> quit()
NOTE that Ruby, like Python, treats simple numbers such as 2 and 3 as integers. “2 / 3” is integer division and yields zero (0). Ruby treats numbers with decimal points such as 2.0 and 3.0 as floating point numbers. “2.0 / 3.0” is floating point division and yields 0.66666666.
(10) GOOGLE/YAHOO/BING
Google has a calculator that evaluates mathematical expressions built into the search box. Yahoo and BING also have calculators built into their search boxes.
Conclusion
In this article, ten quick calculators for computer users were presented and their basic use explained.
The quick calculators are appropriate for occasional quick calculations such as adding or multiplying two large numbers. They will work best if the computer user practices and can use the quick calculator of his/her choice quickly and easily — “second nature”.
Quick calculators are often faster and less cumbersome than sophisticated numerical tools such as spreadsheets like Excel or mathematical scripting languages such as MATLAB or Octave for occasional quick calculations such as adding or multiplying two large numbers. However, sophisticated numerical and mathematical tools are better for large number crunching projects.
© 2012 John F. McGowan
There is relatively little publicly available information on the scope and difficulty level of software projects of any kind. Some information is available in books and papers by various self-styled software engineering experts such as Barry Boehm, Donald Reifer, Capers Jones, and several others. These experts usually have consulting businesses and do not disclose their raw data and make limited disclosures of the results of analyses of their data.
Open source software projects can provide an excellent source of information on some aspects, such as the number of lines of computer code, of various software and mathematical software projects. This information can be independently verified by downloading the source code of an open source project and examining it, using tools like the CLOC utility to count the lines of code if needed.
Unfortunately, it is difficult to get accurate estimates of the actual effort expended on an open source project. It is difficult to verify if a contributor worked part-time, full time, or more than full time on the project. Some contributors may not be credited.
This article examines a data set of ninety-three NASA projects between the years 1971-1987 that was collected by Jairus Hihn of the NASA Jet Propulsion Laboratory (JPL). The data is the NASA 93 data set from the PROMISE Software Engineering Repository at the University of Ottawa.
The data lists the number of source lines of code (SLOC) for each project and the actual effort expended in staff months (SM), and classifies the projects according to software engineering expert Barry Boehm’s COCOMO I (Constructive Cost Model). The data used for Boehm’s COCOMO I model is also available as a data set in the PROMISE repository.
A Note on Lines of Code
Lines of code is a very imperfect measure of the size and scope of a software project. For example, these are both one line of code in the C Programming Language:
a = 1;
and
a = (1.0/sqrt(2.0*M_PI))*exp(-(x - mean)*(x-mean)/(sigma*sigma));
There are several different definitions of lines of code used in the literature on software cost and schedule estimation. In addition, a range of alternatives to lines of code has been proposed, such as function points (currently popular).
Nonetheless, lines of code are somewhat reminiscent of Winston Churchill’s quote about democracy:
It has been said that democracy is the worst form of government except all the others that have been tried.
Function points were developed for business applications and rely heavily on counting the number of inputs and outputs to a program. This often works well for business applications where the applications are often relatively simple and the complexity scales with the number of inputs and outputs. Mathematical software such as video codecs often have few inputs (one compressed file or data stream) and outputs (uncompressed video) but a very complex internal implementation (tens of thousands of lines of code). This has been recognized as a weakness of function points for some time and there are some variations such as so-called “feature points” that attempt to address this problem.
Further, methods like function points require substantial training and study to measure and learn to use. They are not relatively intuitive like lines of code. There is much more data on software projects available in lines of code than function points.
One good way to think about lines of code is that each line of code is like a single moving part in a complex machine like a grandfather clock. Some parts are simple like the first line of code above. Some parts are more complex like the second line of code above. In general, lines of code would correspond to moving parts if one tried to implement a computer program as a mechanical device like Victorian era English mathematician Charles Babbage’s steam driven difference engine.
In mathematical software such as video compression, speech recognition, or other advanced applications, a line of code is usually directly equivalent to a single line of a mathematical formula or equation that a math teacher or professor might write on a blackboard or dry erase board in class. Most examples of mathematics taught in high school or college math courses cover at most a dozen blackboards. These are often building blocks of the mathematical solutions to real-world problems or cutting edge research problems. Most real-world examples of mathematical software, such as the H.264, Flash, or Microsoft Silverlight video compression used by web sites today, are many thousands of lines of code and correspond to hundreds or thousands of blackboards filled with mathematical equations and formulas.
Analysis of the NASA 93 Data
The plots below show various aspects of the NASA 93 data on the scope and effort of these software projects.
The COCOMO model divides software projects into three general categories or “modes”. These are the embedded, semi-detached, and organic modes. Embedded mode projects such as flight avionics software are most similar in difficulty to mathematical software projects. Indeed, due to safety issues, flight avionics software can be more demanding, requiring higher quality, than commercial applications such as video compression for entertainment. The software productivity in lines of code per staff month is now shown for the three kinds of projects.
The next plot compares the NASA 93 data to Barry Boehm’s Basic COCOMO I model for Embedded Projects (red line) and to a linear fit to the NASA 93 data (green line). As can be seen, there is considerable variation between actual and estimated effort, although the models are on average roughly correct and usually within a factor of three of actual effort.
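Both curves in this comparison are power laws of the form effort = A × KSLOC^B. The sketch below, in Python rather than the Octave of the appendix, shows the Basic COCOMO formula with the coefficients used in the appendix code and a plain least-squares straight-line fit in log-log space. The function names are hypothetical, and the fit is a simplified stand-in for Octave's `polyfit`, not the author's exact procedure.

```python
import math

# Basic COCOMO 81 coefficients (a, b) in effort = a * KSLOC**b,
# taken from the appendix code.
COCOMO_BASIC = {
    "organic": (2.4, 1.05),
    "semi-detached": (3.0, 1.12),
    "embedded": (3.6, 1.20),
}


def cocomo_effort(ksloc, mode="embedded"):
    """Estimated effort in staff months for a project of the given size."""
    a, b = COCOMO_BASIC[mode]
    return a * ksloc ** b


def fit_power_law(ksloc_values, staff_month_values):
    """Least-squares line through (log10 KSLOC, log10 staff months).

    Returns (A, B) such that effort is approximately A * KSLOC**B.
    """
    xs = [math.log10(v) for v in ksloc_values]
    ys = [math.log10(v) for v in staff_month_values]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return 10.0 ** intercept, slope


# Sanity check on synthetic data generated from the embedded model itself.
sizes = [10.0, 50.0, 100.0, 300.0]
efforts = [cocomo_effort(s, "embedded") for s in sizes]
print(fit_power_law(sizes, efforts))  # close to (3.6, 1.2)
```

Fitting in log space treats relative errors evenly across small and large projects, which is presumably why the appendix fits `log10(staff_months)` against `log10(ksloc)` rather than the raw values.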
The final plot shows the relative error between the actual effort and the estimated effort using the fitted model.
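The relative error and the two summary statistics that the appendix code derives from it (MMRE and PRED(30)) can be sketched in a few lines of Python. The estimated and actual values below are invented purely for illustration and are not taken from the NASA 93 data; the function names are hypothetical.

```python
def relative_errors(estimated, actual):
    """(estimated - actual) / actual for each project."""
    return [(e - a) / a for e, a in zip(estimated, actual)]


def mmre(errors):
    """Mean Magnitude of Relative Error."""
    return sum(abs(e) for e in errors) / len(errors)


def pred(errors, level=0.30):
    """Fraction of projects whose relative error magnitude is within `level`."""
    return sum(1 for e in errors if abs(e) <= level) / len(errors)


# Invented staff-month figures, for illustration only.
estimated = [100.0, 150.0, 400.0, 90.0]
actual = [120.0, 100.0, 380.0, 300.0]

errors = relative_errors(estimated, actual)
print("MMRE:", round(mmre(errors), 3))
print("PRED(30):", pred(errors))
```

Note that the absolute value is taken before comparing with the 30% level; a large underestimate has a very negative relative error and must not be counted as a hit. Here PRED(30) is expressed as a fraction, whereas the appendix code stores it as a count of projects.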
Conclusion
On average, the software productivity for demanding software applications such as embedded aerospace applications tends to be quite low, in the range of two hundred (200) lines of code per staff month (the “mythical man month”). However, there is wide variation between actual and estimated effort. The highest productivity (defined as lines of code per staff month) among the embedded projects in the NASA 93 data set was about 700 lines of code per month, and the lowest around 50 lines of code per month. Given the difficulties in defining lines of code and measuring the quality of the delivered software, it is impossible to evaluate the significance of these variations without more detailed information on the projects.
It is important to keep in mind that numbers like two hundred lines of code per staff month do not refer to just typing two hundred lines of code, which can take as little as a few minutes. They refer to the entire software development process, usually including requirements analysis, software design, actual coding, and especially debugging to achieve the high levels of quality required for these applications.
There are several cases where a single error in a single line of mathematical software has resulted in the loss of a multi-million dollar mission or human lives. The loss of the Mariner 1 probe to Venus is frequently attributed to a small error in copying a mathematical formula into the probe’s computer software. In 1991 a subtle error in the mathematical software for a PATRIOT missile system resulted in an Iraqi SCUD missile penetrating to a US base in Dhahran, Saudi Arabia and killing 28 soldiers. On June 4, 1996 the European Space Agency’s first launch of the new Ariane 5 rocket ended in an explosion due to an error converting a 64 bit floating point number to a 16 bit integer number in software. The loss of NASA’s Mars Climate Orbiter (MCO) in 1999 has been attributed to an incorrect conversion between English units (pound-force seconds) and metric units (newton-seconds). Aviation and rocketry have especially demanding requirements for the quality of software.
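To make the floating point to 16 bit integer failure mode concrete, here is a small Python sketch. It is not the flight software and does not reproduce how any particular system reacted to the error; it only contrasts an unchecked conversion, which silently produces a meaningless value, with a checked one that reports the problem. Both function names are hypothetical.

```python
import ctypes

INT16_MIN, INT16_MAX = -32768, 32767


def to_int16_unchecked(value):
    """Truncate to an integer and keep only the low 16 bits (no range check)."""
    return ctypes.c_int16(int(value)).value


def to_int16_checked(value):
    """Truncate to an integer, refusing values that do not fit in 16 bits."""
    n = int(value)
    if not INT16_MIN <= n <= INT16_MAX:
        raise OverflowError(f"{value!r} does not fit in a signed 16 bit integer")
    return n


print(to_int16_checked(1234.9))      # fits, truncated to an integer
print(to_int16_unchecked(40000.7))   # out of range: wraps to a negative number
```

A value that is perfectly reasonable as a 64 bit floating point number can be far outside the range of a signed 16 bit integer (-32768 to 32767), so a conversion of this kind is only safe when the range of the input is known and enforced.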
While commercial applications of mathematical software such as video compression for entertainment are not always as demanding as mission-critical aerospace software, they can still be quite demanding. Viewers of compressed video such as Netflix, YouTube, Blu-ray, or DVD video have a pretty limited tolerance for visible artifacts and errors in the video. Almost any error in the implementation of a video codec can introduce visible artifacts or errors, so the codecs must, in general, achieve very high levels of quality, though not necessarily perfection.
Credits
Sayyad Shirabad, J. and Menzies, T.J. (2005) The PROMISE Repository of Software Engineering Databases. School of Information Technology and Engineering, University of Ottawa, Canada. Available: http://promise.site.uottawa.ca/SERepository
© 2012 John F. McGowan
About the Author
John F. McGowan, Ph.D. solves problems using mathematics and mathematical software, including developing video compression and speech recognition technologies. He has extensive experience developing software in C, C++, Visual Basic, Mathematica, MATLAB, and many other programming languages. He is probably best known for his AVI Overview, an Internet FAQ (Frequently Asked Questions) on the Microsoft AVI (Audio Video Interleave) file format. He has worked as a contractor at NASA Ames Research Center involved in the research and development of image and video processing algorithms and technology. He has published articles on the origin and evolution of life, the exploration of Mars (anticipating the discovery of methane on Mars), and cheap access to space. He has a Ph.D. in physics from the University of Illinois at Urbana-Champaign and a B.S. in physics from the California Institute of Technology (Caltech). He can be reached at jmcgowan11@earthlink.net.
Appendix I: Source Code for Analysis
The analysis was performed using a program written in the free open source Octave numerical programming environment, which is mostly compatible with MATLAB. Here is the code. It generates additional plots beyond the ones highlighted in the body of this article. The raw data file nasa93_raw_data.txt, which is extracted from the PROMISE data file, follows the code.
% Analysis of NASA 93 software effort data
%
% (C) 2012 John F. McGowan, Ph.D.
% E-Mail: jmcgowan11@earthlink.net
%
data93 = dlmread('nasa93_raw_data.txt');

% COCOMO (Barry Boehm's Constructive Cost Model) MODE CODES (1=ORGANIC, 2=SEMI-DETACHED, 3=EMBEDDED)
[e_row, e_col] = find(data93(:,7) == 3);
[semi_row, semi_col] = find(data93(:,7) == 2);
[org_row, org_col] = find(data93(:,7) == 1);

actuals = data93(:,end-1:end);
ksloc = actuals(:,1); % thousand (kilo) source lines of code
staff_months = actuals(:,2); % also known as man month, work month, person month

printf('making figure 1\n'); fflush(stdout);
figure(1);
loglog(ksloc, staff_months, 'o');
title('NASA 93 SOFTWARE PROJECT DATA');
xlabel('Thousands of Lines of Code (KSLOC)');
ylabel('Staff Months (SM)');
print('nasa93_raw_data.jpg');

logloc = log10(ksloc);
log_staff_months = log10(staff_months);
[p_nasa93, s_nasa93] = polyfit(logloc, log_staff_months, 1); % fit polynomial model to the data
pred_logloc = polyval(p_nasa93, logloc);
delta = 10.^pred_logloc - staff_months; % difference between predicted staff months and actual staff months
relative_error = delta ./ staff_months; % (Estimated Staff Months - Actual Staff Months)/Actual Staff Months

cocomo_x = 1:10:max(ksloc(:));
y = polyval(p_nasa93, log10(cocomo_x));
cocomo_org = 2.4 * (cocomo_x).^1.05; % Barry Boehm's Basic COCOMO 81 (Organic) model
cocomo_semi = 3.0 * (cocomo_x).^1.12; % Barry Boehm's Basic COCOMO 81 (Semi-detached) model
cocomo_e = 3.6 * (cocomo_x).^1.2; % Barry Boehm's Basic COCOMO 81 (Embedded) model

printf('making figure 2\n'); fflush(stdout);
figure(2);
% loglog(ksloc, staff_months, 'o', ksloc, 10.^pred_logloc, '*');
loglog(ksloc, staff_months, 'o', cocomo_x, 10.^y, '-', "linewidth", 3, cocomo_x, cocomo_e, 'r-', "linewidth", 3);
title('FIT TO NASA 93 SOFTWARE PROJECT DATA');
xlabel('Thousands of Lines of Code (KSLOC)'); % thousand source lines of code
ylabel('Staff Months (SM)'); % staff month
legend("NASA 93 DATA", "FIT 93", "COCOMO 81 (EMBEDDED)");
print('nasa93_fit.jpg');

A = 10.^p_nasa93(2);
B = p_nasa93(1);
x = 1:100:5000;
x = x / 1000.0;
y = A.*(x.^B);

printf('making figure 3\n'); fflush(stdout);
figure(3);
%plot(x,y);
hist(relative_error, 20);
title('Relative Error of Estimates');
xlabel('(Estimated Staff Months - Actual Staff Months)/Actual Staff Months');
ylabel('Number of Projects');
print('nasa93_relative_error.jpg');

max_ksloc = max(ksloc(:));
mean_ksloc = mean(ksloc(:));
min_ksloc = min(ksloc(:));

max_mm = max(staff_months(:));
mean_mm = mean(staff_months(:));
min_mm = min(staff_months(:));

mean_are = mean(abs(relative_error(:))); % known as MMRE Mean Magnitude of Relative Error
max_are = max(abs(relative_error(:)));
min_are = min(abs(relative_error(:)));

prod = 1000.0*ksloc ./ staff_months;
max_prod = max(prod(:));
mean_prod = mean(prod(:));
median_prod = median(prod(:));
min_prod = min(prod(:));
std_prod = std(prod(:)); % standard deviation of software productivity

printf('making figure 4\n'); fflush(stdout);
figure(4);
hist(prod, 20);
title('Software Productivity of NASA 93 Projects');
xlabel('Lines of Code per Staff Month (SLOC/SM)');
ylabel('Number of Projects');
print('nasa93_prod.jpg');

% PRED(30) is number of actuals within 30% of predicted value
ind = find(abs(relative_error(:)) <= 0.3);
pred30 = numel(ind);

% gaussian/normal point of reference
g_data = randn(1,93);
mean_g = mean(g_data(:));
std_g = std(g_data(:));
skewness_g = skewness(g_data(:));
kurtosis_g = kurtosis(g_data(:)); % technically the kurtosis in Octave is the "excess kurtosis" which is defined so the kurtosis of the Normal distribution has an expected value of zero

mean_re = mean(relative_error(:));
std_re = std(relative_error(:));
skewness_re = skewness(relative_error(:));
kurtosis_re = kurtosis(relative_error(:));

printf('making figure 5\n'); fflush(stdout);
figure(5)
hist(g_data*std_re + mean_re, 20);
title('Normal Distribution Data');
ylabel('Number Samples');
xlabel('Scaled Relative Error');
print('nasa93_scaled_normal.jpg'); % figure 5 as JPEG

% display the distribution of the kurtosis of the normal distribution
fflush(stdout);
printf("computing kurtosis of normal distribution\n"); fflush(stdout);
g_data_k = randn(10000, 93); % 10000 test sets of 93 samples
g_kurtosis = kurtosis(g_data_k,2);
figure(6);
hist(g_kurtosis, 20);
title('Excess Kurtosis of Normal Distribution');
xlabel('Kurtosis');
ylabel('Number of Test Sets');
print('normal_kurtosis_distribution.jpg');

g_skewness = skewness(g_data_k, 2);
printf('making figure 7\n'); fflush(stdout);
figure(7)
hist(g_skewness, 20);
title('Skewness of Normal Distribution');
xlabel('Skewness');
ylabel('Number of Test Sets');
print('normal_skewness_distribution.jpg');

% tails
x = -10.0:0.1:10.0;
y = (1.0/sqrt(2*pi))*exp(-x.^2/2.0);
printf('making figure 8\n'); fflush(stdout);
figure(8)
plot(x,y,'-', 'linewidth', 3);
title('Normal Distribution (Thin Tails)');
print('normal.jpg');

y_cauchy = 1.0./(1.0 + x.^2);
norm_cauchy = 0.1*sum(y_cauchy);
y_cauchy = y_cauchy ./ norm_cauchy;
figure(9)
plot(x,y_cauchy,'-', 'linewidth', 3);
title('Cauchy Distribution (Fat Tails)');
print('cauchy.jpg');

printf('making figure 10\n'); fflush(stdout);
figure(10);
plot(x, y, '-', 'linewidth', 3, x, y_cauchy, '-g', 'linewidth', 3);
title('Normal and Cauchy Distributions Together');
legend('Normal', 'Cauchy');
legend('boxon'); % put box around legend
print('normal_cauchy.jpg');

year = data93(:,6); % year of project
printf('making figure 11\n'); fflush(stdout);
figure(11);
years = 1970:1990;
hist(year, years);
title('NASA 93 Software Projects by Year');
xlabel('Year');
ylabel('Number of Projects');
print('project_years.jpg');

printf('making figure 12\n'); fflush(stdout);
figure(12)
hist(ksloc, 50);
title('Size of NASA 93 Software Projects');
xlabel('Thousands of Lines of Code (KSLOC)');
ylabel('Number of Projects');
print('project_size_ksloc.jpg');

printf('making figure 13\n'); fflush(stdout);
figure(13)
hist(staff_months, 50);
title('Size of NASA 93 Software Projects');
xlabel('Staff Months');
ylabel('Number of Projects');
print('project_size_sm.jpg');

printf('making figure 14\n'); fflush(stdout);
figure(14)
staff_years = staff_months / 12.; % convert to mythical man year/staff year
hist(staff_years, 50);
title('Size of NASA 93 Software Projects');
xlabel('Staff Years');
ylabel('Number of Projects');
print('project_size_sy.jpg');

% plots for different COCOMO Modes
printf('making figure 15\n'); fflush(stdout);
figure(15)
loglog(ksloc(e_row), staff_months(e_row), 'o', cocomo_x, cocomo_e, 'r-');
title('NASA 93 DATA (EMBEDDED PROJECTS ONLY)');
xlabel('Thousands of Lines of Code (KSLOC)');
ylabel('Staff Months (SM)');
legend('Embedded Data', 'Embedded Model', 'location', 'northwest');
legend("boxon");
print('nasa93_embedded_data.jpg');

% largest effort project is embedded as might expect
printf('making figure 16\n'); fflush(stdout);
figure(16)
loglog(ksloc(semi_row), staff_months(semi_row), 'o', cocomo_x, cocomo_semi, 'r-');
title('NASA 93 DATA (SEMI-DETACHED PROJECTS ONLY)');
xlabel('Thousands of Lines of Code (KSLOC)');
ylabel('Staff Months (SM)');
legend('Semi Detached Data', 'Semi Detached Model', 'location', 'northwest');
legend("boxon");
print('nasa93_semi_data.jpg');

% largest size (KSLOC) project is semi-detached
printf('making figure 17\n'); fflush(stdout);
figure(17)
loglog(ksloc(org_row), staff_months(org_row), 'o', cocomo_x, cocomo_org, 'r-');
title('NASA 93 DATA (ORGANIC PROJECTS ONLY)');
xlabel('Thousands of Lines of Code (KSLOC)');
ylabel('Staff Months (SM)');
legend('Organic Data', 'Organic Model', 'location', 'northwest');
legend("boxon");
print('nasa93_org_data.jpg');

printf('making figure 18\n'); fflush(stdout);
figure(18)
loglog(ksloc(org_row), staff_months(org_row), '*k', ksloc(semi_row), staff_months(semi_row), 'ob', ksloc(e_row), staff_months(e_row),'or');
title('NASA 93 DATA (ALL PROJECTS)');
xlabel('Thousands of Lines of Code (KSLOC)');
ylabel('Staff Months (SM)');
legend('Organic', 'Semi-Detached', 'Embedded', "location", "northwest");
legend("boxon");
print('nasa93_by_mode_data.jpg');

% productivity by cocomo mode
printf('making figure 19\n'); fflush(stdout);
figure(19);
hist(prod(org_row), 20);
title('Software Productivity Organic Mode');
xlabel('Lines of Code per Staff Month (SLOC/SM)');
ylabel('Number of Projects');
print('nasa93_prod_org.jpg');

printf('making figure 20\n'); fflush(stdout);
figure(20);
hist(prod(semi_row), 20);
title('Software Productivity Semi Detached Mode');
xlabel('Lines of Code per Staff Month (SLOC/SM)');
ylabel('Number of Projects');
print('nasa93_prod_semi.jpg');

printf('making figure 21\n'); fflush(stdout);
figure(21);
hist(prod(e_row), 20);
title('Software Productivity Embedded Mode');
xlabel('Lines of Code per Staff Month (SLOC/SM)');
ylabel('Number of Projects');
print('nasa93_prod_embedded.jpg');

printf("ALL DONE\n"); fflush(stdout);
nasa93_raw_data.txt
1,de,avionicsmonitoring,g,2,1979,2,h,l,h,n,n,l,l,n,n,n,n,h,h,n,l,25.9,117.6
2,de,avionicsmonitoring,g,2,1979,2,h,l,h,n,n,l,l,n,n,n,n,h,h,n,l,24.6,117.6
3,de,avionicsmonitoring,g,2,1979,2,h,l,h,n,n,l,l,n,n,n,n,h,h,n,l,7.7,31.2
4,de,avionicsmonitoring,g,2,1979,2,h,l,h,n,n,l,l,n,n,n,n,h,h,n,l,8.2,36
5,de,avionicsmonitoring,g,2,1979,2,h,l,h,n,n,l,l,n,n,n,n,h,h,n,l,9.7,25.2
6,de,avionicsmonitoring,g,2,1979,2,h,l,h,n,n,l,l,n,n,n,n,h,h,n,l,2.2,8.4
7,de,avionicsmonitoring,g,2,1979,2,h,l,h,n,n,l,l,n,n,n,n,h,h,n,l,3.5,10.8
8,erb,avionicsmonitoring,g,2,1982,2,h,l,h,n,n,l,l,n,n,n,n,h,h,n,l,66.6,352.8
9,gal,missionplanning,g,1,1980,2,h,l,h,xh,xh,l,h,h,h,h,n,h,h,h,n,7.5,72
10,gal,missionplanning,g,1,1980,2,n,l,h,n,n,l,l,h,vh,vh,n,h,n,n,n,20,72
11,gal,missionplanning,g,1,1984,2,n,l,h,n,n,l,l,h,vh,h,n,h,n,n,n,6,24
12,gal,missionplanning,g,1,1980,2,n,l,h,n,n,l,l,h,vh,vh,n,h,n,n,n,100,360
13,gal,missionplanning,g,1,1985,2,n,l,h,n,n,l,l,h,vh,n,n,l,n,n,n,11.3,36
14,gal,missionplanning,g,1,1980,2,n,l,h,n,n,h,l,h,h,h,l,vl,n,n,n,100,215
15,gal,missionplanning,g,1,1983,2,n,l,h,n,n,l,l,h,vh,h,n,h,n,n,n,20,48
16,gal,missionplanning,g,1,1982,2,n,l,h,n,n,l,l,h,n,n,n,vl,n,n,n,100,360
17,gal,missionplanning,g,1,1980,2,n,l,h,n,xh,l,l,h,vh,vh,n,h,n,n,n,150,324
18,gal,missionplanning,g,1,1984,2,n,l,h,n,n,l,l,h,h,h,n,h,n,n,n,31.5,60
19,gal,missionplanning,g,1,1983,2,n,l,h,n,n,l,l,h,vh,h,n,h,n,n,n,15,48
20,gal,missionplanning,g,1,1984,2,n,l,h,n,xh,l,l,h,h,n,n,h,n,n,n,32.5,60
21,X,avionicsmonitoring,g,2,1985,2,h,l,h,n,n,l,l,n,n,n,n,h,h,n,l,19.7,60
22,X,avionicsmonitoring,g,2,1985,2,h,l,h,n,n,l,l,n,n,n,n,h,h,n,l,66.6,300
23,X,simulation,g,2,1985,2,h,l,h,n,n,l,l,n,n,n,n,h,h,n,l,29.5,120
24,X,monitor_control,g,2,1986,2,h,n,n,h,n,n,n,n,h,h,n,n,n,n,n,15,90
25,X,monitor_control,g,2,1986,2,h,n,h,n,n,n,n,n,h,h,n,n,n,n,n,38,210
26,X,monitor_control,g,2,1986,2,n,n,n,n,n,n,n,n,h,h,n,n,n,n,n,10,48
27,X,realdataprocessing,g,2,1982,2,n,vh,h,vh,vh,l,h,vh,h,n,l,h,vh,vh,l,15.4,70
28,X,realdataprocessing,g,2,1982,2,n,vh,h,vh,vh,l,h,vh,h,n,l,h,vh,vh,l,48.5,239
29,X,realdataprocessing,g,2,1982,2,n,vh,h,vh,vh,l,h,vh,h,n,l,h,vh,vh,l,16.3,82
30,X,communications,g,2,1982,2,n,vh,h,vh,vh,l,h,vh,h,n,l,h,vh,vh,l,12.8,62
31,X,batchdataprocessing,g,2,1982,2,n,vh,h,vh,vh,l,h,vh,h,n,l,h,vh,vh,l,32.6,170
32,X,datacapture,g,2,1982,2,n,vh,h,vh,vh,l,h,vh,h,n,l,h,vh,vh,l,35.5,192
33,X,missionplanning,g,2,1985,2,h,l,h,n,n,l,l,n,n,n,n,h,h,n,l,5.5,18
34,X,avionicsmonitoring,g,2,1987,2,h,l,h,n,n,l,l,n,n,n,n,h,h,n,l,10.4,50
35,X,avionicsmonitoring,g,2,1987,2,h,l,h,n,n,l,l,n,n,n,n,h,h,n,l,14,60
36,X,monitor_control,g,2,1986,2,h,n,h,n,n,n,n,n,n,n,n,n,n,n,n,6.5,42
37,X,monitor_control,g,2,1986,2,n,n,h,n,n,n,n,n,n,n,n,n,n,n,n,13,60
38,X,monitor_control,g,2,1986,2,n,n,h,n,n,n,n,n,n,h,n,h,h,h,n,90,444
39,X,monitor_control,g,2,1986,2,n,n,h,n,n,n,n,n,n,n,n,n,n,n,n,8,42
40,X,monitor_control,g,2,1986,2,n,n,h,h,n,n,n,n,n,n,n,n,n,n,n,16,114
41,hst,datacapture,g,2,1980,2,n,h,h,vh,h,l,h,h,n,h,l,h,h,n,l,177.9,1248
42,slp,launchprocessing,g,6,1975,2,h,l,h,n,n,l,l,n,n,h,n,n,h,vl,n,302,2400
43,Y,application_ground,g,5,1982,2,n,h,l,n,n,h,n,h,h,n,n,n,h,h,n,282.1,1368
44,Y,application_ground,g,5,1982,2,h,h,l,n,n,n,h,h,h,n,n,n,h,n,n,284.7,973
45,Y,avionicsmonitoring,g,5,1982,2,h,h,n,n,n,l,l,n,h,h,n,h,n,n,n,79,400
46,Y,avionicsmonitoring,g,5,1977,2,l,n,n,n,n,l,l,h,h,vh,n,h,l,l,h,423,2400
47,Y,missionplanning,g,5,1977,2,n,n,n,n,n,l,n,h,vh,vh,l,h,h,n,n,190,420
48,Y,missionplanning,g,5,1984,2,n,n,h,n,h,n,n,h,h,n,n,h,h,n,h,47.5,252
49,Y,missionplanning,g,5,1980,2,vh,n,xh,h,h,l,l,n,h,n,n,n,l,h,n,21,107
50,Y,simulation,g,5,1983,2,n,h,h,vh,n,n,h,h,h,h,n,h,l,l,h,78,571.4
51,Y,simulation,g,5,1984,2,n,h,h,vh,n,n,h,h,h,h,n,h,l,l,h,11.4,98.8
52,Y,simulation,g,5,1985,2,n,h,h,vh,n,n,h,h,h,h,n,h,l,l,h,19.3,155
53,Y,missionplanning,g,5,1979,2,h,n,vh,h,h,l,h,h,n,n,h,h,l,vh,h,101,750
54,Y,missionplanning,g,5,1979,2,h,n,h,h,h,l,h,n,h,n,n,n,l,vh,n,219,2120
55,Y,utility,g,5,1979,2,h,n,h,h,h,l,h,n,h,n,n,n,l,vh,n,50,370
56,spl,datacapture,g,2,1979,2,vh,h,h,vh,vh,n,n,vh,vh,vh,n,h,h,h,l,227,1181
57,spl,batchdataprocessing,g,2,1977,2,n,h,vh,n,n,l,n,h,n,vh,l,n,h,n,l,70,278
58,de,avionicsmonitoring,g,2,1979,2,h,l,h,n,n,l,l,n,n,n,n,h,h,n,l,0.9,8.4
59,slp,operatingsystem,g,6,1974,2,vh,l,xh,xh,vh,l,l,h,vh,h,vl,h,vl,vl,h,980,4560
60,slp,operatingsystem,g,6,1975,3,n,l,h,n,n,l,l,vh,n,vh,h,h,n,l,n,350,720
61,Y,operatingsystem,g,5,1976,3,h,n,xh,h,h,l,l,h,n,n,h,h,h,h,n,70,458
62,Y,utility,g,5,1979,3,h,n,xh,h,h,l,l,h,n,n,h,h,h,h,n,271,2460
63,Y,avionicsmonitoring,g,5,1971,1,n,n,n,n,n,l,l,h,h,h,n,h,n,l,n,90,162
64,Y,avionicsmonitoring,g,5,1980,1,n,n,n,n,n,l,l,h,h,h,n,h,n,l,n,40,150
65,Y,avionicsmonitoring,g,5,1979,3,h,n,h,h,n,l,l,h,h,h,n,h,n,n,n,137,636
66,Y,avionicsmonitoring,g,5,1977,3,h,n,h,h,n,h,l,h,h,h,n,h,n,vl,n,150,882
67,Y,avionicsmonitoring,g,5,1976,3,vh,n,h,h,n,l,l,h,h,h,n,h,n,n,n,339,444
68,Y,avionicsmonitoring,g,5,1983,1,l,h,l,n,n,h,l,h,h,h,n,h,n,l,n,240,192
69,Y,avionicsmonitoring,g,5,1978,2,h,n,h,n,vh,l,n,h,h,h,h,h,l,l,l,144,576
70,Y,avionicsmonitoring,g,5,1979,2,n,l,n,n,vh,l,n,h,h,h,h,h,l,l,l,151,432
71,Y,avionicsmonitoring,g,5,1979,2,n,l,h,n,vh,l,n,h,h,h,h,h,l,l,l,34,72
72,Y,avionicsmonitoring,g,5,1979,2,n,n,h,n,vh,l,n,h,h,h,h,h,l,l,l,98,300
73,Y,avionicsmonitoring,g,5,1979,2,n,n,h,n,vh,l,n,h,h,h,h,h,l,l,l,85,300
74,Y,avionicsmonitoring,g,5,1982,2,n,l,n,n,vh,l,n,h,h,h,h,h,l,l,l,20,240
75,Y,avionicsmonitoring,g,5,1978,2,n,l,n,n,vh,l,n,h,h,h,h,h,l,l,l,111,600
76,Y,avionicsmonitoring,g,5,1978,2,h,vh,h,n,vh,l,n,h,h,h,h,h,l,l,l,162,756
77,Y,avionicsmonitoring,g,5,1978,2,h,h,vh,n,vh,l,n,h,h,h,h,h,l,l,l,352,1200
78,Y,operatingsystem,g,5,1979,2,h,n,vh,n,vh,l,n,h,h,h,h,h,l,l,l,165,97
79,Y,missionplanning,g,5,1984,3,h,n,vh,h,h,l,vh,h,n,n,h,h,h,vh,h,60,409
80,Y,missionplanning,g,5,1984,3,h,n,vh,h,h,l,vh,h,n,n,h,h,h,vh,h,100,703
81,hst,Avionics,f,2,1980,3,h,vh,vh,xh,xh,h,h,n,n,n,l,l,n,n,h,32,1350
82,hst,Avionics,f,2,1980,3,h,h,h,vh,xh,h,h,h,h,h,h,h,h,n,n,53,480
84,spl,Avionics,f,3,1977,3,h,l,vh,vh,xh,l,n,vh,vh,vh,vl,vl,h,h,n,41,599
89,spl,Avionics,f,3,1977,3,h,l,vh,vh,xh,l,n,vh,vh,vh,vl,vl,h,h,n,24,430
91,Y,Avionics,f,5,1977,3,vh,h,vh,xh,xh,n,n,h,h,h,h,h,h,n,h,165,4178.2
92,Y,science,f,5,1977,3,vh,h,vh,xh,xh,n,n,h,h,h,h,h,h,n,h,65,1772.5
93,Y,Avionics,f,5,1977,3,vh,h,vh,xh,xh,n,l,h,h,h,h,h,h,n,h,70,1645.9
94,Y,Avionics,f,5,1977,3,vh,h,xh,xh,xh,n,n,h,h,h,h,h,h,n,h,50,1924.5
97,gal,Avionics,f,5,1982,3,vh,l,vh,vh,xh,l,l,h,l,n,vl,l,l,h,h,7.25,648
98,Y,Avionics,f,5,1980,3,vh,h,vh,xh,xh,n,n,h,h,h,h,h,h,n,h,233,8211
99,X,Avionics,f,2,1983,3,h,n,vh,vh,vh,h,h,n,n,n,l,l,n,n,h,16.3,480
100,X,Avionics,f,2,1983,3,h,n,vh,vh,vh,h,h,n,n,n,l,l,n,n,h,6.2,12
101,X,science,f,2,1983,3,h,n,vh,vh,vh,h,h,n,n,n,l,l,n,n,h,3,38
The standard audio pitch shifting incorporated in many commonly used audio editors such as the free open-source Audacity editor is presented in detail. The article also shows the results of using a more sophisticated algorithm that produces a more natural sounding pitch-shifted voice similar to the voice of the famous cartoon character Mickey Mouse.
One of the basic concepts and methods of signal and speech processing is the Fourier transform, named after the French mathematician and physicist Joseph Fourier. The basic concept is that any reasonably well-behaved real function can be represented as the sum of the trigonometric sine and cosine functions. For example, a function $f(x)$ defined on the region $-\pi \le x \le \pi$ can be expanded as the sum of sines and cosines:
$$f(x) = \frac{a_0}{2} + \sum_{n=1}^{\infty} \left[ a_n \cos(nx) + b_n \sin(nx) \right]$$
where the coefficients $a_n$ and $b_n$ are known as Fourier coefficients. This is a continuous Fourier Transform.
There is a discrete version of the Fourier Transform, often used in digital signal processing:
$$X_k = \sum_{n=0}^{N-1} x_n \, e^{-2\pi i k n / N}$$
where $n$ is the index of an array of discrete values such as audio samples, $x_n$ is the value of the $n$th audio sample, $k$ is the index of the discrete Fourier coefficients and $N$ is the number of discrete values such as the number of audio samples in an audio “frame”. The index $k$ is essentially the frequency of the Fourier component. This version of the discrete Fourier Transform uses the mathematical identity:
$$e^{i\theta} = \cos\theta + i \sin\theta$$
where
$$i = \sqrt{-1}$$
to combine the cosine and sine function components into complex functions and numbers.
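A direct, unoptimized implementation makes the discrete transform concrete. The Python sketch below assumes the common convention X[k] = Σ x[n]·exp(−2πi·k·n/N), with n running over the N samples; the function name `dft` and the 32-sample test tone are illustrative choices, and real code would use a fast Fourier transform (FFT) library instead of this slow double loop.

```python
import cmath
import math


def dft(samples):
    """Discrete Fourier transform computed directly from its definition."""
    count = len(samples)
    return [
        sum(x * cmath.exp(-2j * math.pi * k * n / count)
            for n, x in enumerate(samples))
        for k in range(count)
    ]


# A cosine that completes exactly 4 cycles in 32 samples.
tone = [math.cos(2 * math.pi * 4 * n / 32) for n in range(32)]
magnitudes = [abs(c) for c in dft(tone)]

# All of the energy lands in coefficient 4 and its mirror image, 32 - 4 = 28.
print([k for k, m in enumerate(magnitudes) if m > 1e-6])
```

Because the tone fits a whole number of cycles into the frame, only two coefficients are non-zero. A tone that does not fit exactly spreads into neighboring coefficients.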
In audio signal processing such as speech or music, the Fourier Transform has a straightforward meaning. The sound is broken up into a combination of frequency components. In most instrumental music, this is very simple. The music is a collection of notes or tones with specific frequencies. Percussion instruments and certain other instruments can produce more complex sounds with many frequency components. A spectrogram of a signal such as speech or music shows time on the horizontal axis and frequency on the vertical axis, with the strength of each frequency component indicated by the color or brightness at that point. This is the spectrogram of a pure 100 Hertz (cycles per second) tone:
The spectrogram is generated using the specgram function in the Octave signal processing package by dividing the signal into a series of overlapping audio frames. Overlapping audio frames are frequently used to achieve better time resolution during signal processing in the Fourier domain. Each audio frame is windowed using the Hanning window to reduce the spectral leakage caused by the abrupt edges of the frame.
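The framing and windowing step can be sketched in Python as follows. This is not the specgram implementation; `hann_window` and `split_into_frames` are hypothetical helpers, and the window formula is one common definition (implementations differ slightly in how they treat the end points).

```python
import math


def hann_window(size):
    """Symmetric Hann (Hanning) window that tapers to zero at both ends."""
    return [0.5 - 0.5 * math.cos(2 * math.pi * n / (size - 1))
            for n in range(size)]


def split_into_frames(samples, frame_size, number_overlaps):
    """Cut the signal into windowed frames that overlap their neighbors.

    Consecutive frames start frame_size / number_overlaps samples apart,
    so each sample is covered by about number_overlaps frames.
    """
    step = frame_size // number_overlaps
    window = hann_window(frame_size)
    frames = []
    for start in range(0, len(samples) - frame_size + 1, step):
        frame = samples[start:start + frame_size]
        frames.append([s * w for s, w in zip(frame, window)])
    return frames


frames = split_into_frames([1.0] * 16, frame_size=8, number_overlaps=4)
print(len(frames), "frames of", len(frames[0]), "samples")
```

With a frame size of 2048 and four overlaps, the arrangement used later in this article, a new frame starts every 512 samples.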
The Fourier transform is applied to each windowed audio frame, giving a series of frequency components, which are displayed on the vertical dimension of the spectrogram. Each frequency component is a bin in frequency covering a frequency range equal to the audio sample rate divided by the number of samples in the audio frame. This frequency bin size or frequency resolution of the Fourier transform is about 20 Hz in the spectrogram above (44100 samples per second/2048 samples in an audio frame = 21.533 cycles per second). Because the 100 Hz tone in the example is not perfectly centered in the frequency bin spanning 100 Hz, the tone spreads out in the spectrogram, contributing to other bins as can be seen above. This is a limitation of the discrete Fourier transform which can lead to problems with signal processing such as pitch shifting.
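The arithmetic in the paragraph above is easy to check. This short Python sketch computes the bin width and shows where a 100 Hz tone falls relative to the bin centers; the function names are illustrative only.

```python
def bin_width_hz(sample_rate, fft_size):
    """Frequency range covered by one Fourier bin."""
    return sample_rate / fft_size


def fractional_bin(frequency_hz, sample_rate, fft_size):
    """Position of a frequency measured in bins (not rounded)."""
    return frequency_hz / bin_width_hz(sample_rate, fft_size)


width = bin_width_hz(44100, 2048)
position = fractional_bin(100.0, 44100, 2048)
nearest_center = round(position) * width

print(f"bin width: {width:.3f} Hz")
print(f"100 Hz falls at bin {position:.2f}")
print(f"nearest bin center: {nearest_center:.2f} Hz")
```

The tone sits at about 4.64 bins, between the bin centered near 86 Hz and the bin centered near 108 Hz, which is why its energy shows up in more than one bin.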
Speech has a much more complex structure than a pure tone. In fact, the structure of speech remains poorly understood which is why current (2011) speech recognition systems perform poorly in realistic field conditions compared to human beings. This spectrogram shows the structure of the introduction to United States President Barack Obama’s April 2, 2011 speech on the energy crisis: “Hello everybody. I’m speaking to you today from a UPS customer center in Landover, Maryland where I came to talk about an issue that is affecting families and businesses just like this one: the rising price of gas and what we can…”.
The spectrogram below shows the region from 0 to 600 cycles per second (Hertz). One can see a series of bands in the spectrogram. These bands are located at integer multiples (1, 2, 3, …) of the lowest frequency band, which is often referred to as F0 in the scholarly speech literature. The bands are known as the harmonics. F0 is known as the fundamental frequency. This is the frequency of vibration of the glottis which provides the driving sound for speech and is located in the throat. The glottis vibrates at frequencies ranging from as low as 80 cycles per second (Hertz) in some men to as high as 400 cycles per second (Hertz) in some women and children. This fundamental frequency appears to be loosely correlated with the height of the speaker, higher for short speakers such as children and lower for taller women and men.
The fundamental frequency F0 fluctuates in a rhythmic pattern that is not well understood as people speak. In some languages such as Mandarin Chinese, the changing pitch conveys meaning; a word with rising pitch has a different meaning from an otherwise identical word with falling pitch. In English, a rising pitch at the end of a phrase or sentence indicates that a question is being asked. “The chair.” is pronounced with falling pitch whereas “The chair?” is pronounced with a rising pitch at the end. It is difficult and even sometimes impossible to understand English if the rhythmic pattern of the fundamental frequency or pitch is abnormal.
This spectrogram shows President Dwight David Eisenhower saying “in the councils of government we must guard against the acquisition of unwarranted influence, whether sought or unsought, by the military industrial complex” from his Farewell Address, January 17, 1961, probably his most famous phrase and his most famous speech today.
This spectrogram shows the same speech in the range 0 to 600 Hertz (cycles per second). Again, one can easily see the repeating bands.
Human beings perceive something which we call “pitch” in English which appears closely related to or identical to the center frequency of the F0 band in the spectrogram. The F0 band will be higher in higher pitched speakers such as many women and most children. President Obama and President Eisenhower have similar pitches, varying between 75 and 200 Hertz with an average of about 150 Hertz. Nonetheless, their voices sound very different. The F0 band can be as low as 70 or 80 Hertz (cycles per second) in a few speakers. Former California governor and actor Arnold Schwarzenegger used an extremely low pitched voice while playing the Terminator, his most famous role.
In general, low pitched voices tend to convey seriousness and sometimes menace whereas high pitched voices tend to convey less seriousness, although there are exceptions. The voice of the genocidal Daleks in the BBC’s Doctor Who series is both high pitched and menacing at the same time. Cartoon style voices can be created by shifting the pitch of normal speakers. This has been done for the Alvin and the Chipmunks characters created by Ross Bagdasarian Sr. It is probable that some form of pitch shifting has been used over the years to create some of the voices of the Daleks on Doctor Who. Some robot voices have probably been created by combining pitch shifting with other audio effects.
Pitch shifting predates the digital era. In the analog audio era, one could shift the pitch of a speaker by playing a record or tape faster or slower than normal. This shifts the pitch but also changes the tempo — speed or rate of speaking — as well. One can achieve a pure pitch shift by, for example, recording a voice performer speaking at half normal speed and then playing the recording back at twice the normal rate. In this case, the pitch will be shifted up by a factor of two and the tempo or rate of speaking will be normal. One can create the Alvin and the Chipmunks high pitched voice in this way using analog tapes or records. One can also create lower pitched voices by appropriately combining the tempo of the original voice and the playback rate of the recording. Although these voices are easily understandable, they have artificial, electronic qualities not found in normal low or high pitched speakers or voice performers intentionally creating a low or high pitched voice. The voice of Walt Disney’s Mickey Mouse was performed by a series of voice artists starting with Walt Disney himself. This high pitched voice sounds much more natural than the Alvin and the Chipmunks voice.
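The analog trick has a simple digital analogue: keeping every second sample of a recording and playing the result at the original sample rate doubles both the pitch and the tempo. The Python sketch below uses a pure tone in place of a voice and a crude zero-crossing counter to estimate its frequency; all names are illustrative, and a real resampler would also low-pass filter the signal before discarding samples.

```python
import math


def sine_tone(frequency_hz, sample_rate, duration_s):
    """Samples of a pure sine tone."""
    count = int(sample_rate * duration_s)
    return [math.sin(2 * math.pi * frequency_hz * n / sample_rate)
            for n in range(count)]


def play_at_double_speed(samples):
    """Keep every second sample: twice the speed at the same sample rate."""
    return samples[::2]


def estimate_frequency(samples, sample_rate):
    """Rough frequency of a clean tone from its upward zero crossings."""
    crossings = sum(1 for a, b in zip(samples, samples[1:]) if a < 0.0 <= b)
    return crossings * sample_rate / len(samples)


tone = sine_tone(100.0, 8000, 1.0)
fast = play_at_double_speed(tone)

print("original:", estimate_frequency(tone, 8000), "Hz,", len(tone) / 8000, "s")
print("speeded up:", estimate_frequency(fast, 8000), "Hz,", len(fast) / 8000, "s")
```

The speeded-up tone is about twice as high and half as long. A performer who speaks at half speed before the speed-up cancels the change in tempo and keeps only the change in pitch.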
In digital audio, it is possible to shift the pitch of the voice without changing the tempo of the speech. This can be done by manipulating the Fourier transform of the speech, the spectrogram, and converting back to the “time domain,” the actual audio samples. One can simply shift the Fourier components from their original frequency bin in the spectrogram to an appropriate higher or lower frequency bin. For example, if a Fourier component is in the 100 Hz bin, one shifts this Fourier component value to the 200 Hz bin to double the pitch. This must be done for each and every non-zero Fourier component. In general, this will produce a recognizable pitch shifted voice. If the Fourier components are not centered in each bin, which is normally the situation, this pitch shifted voice will have an annoying beat or modulation. It is necessary to perform some additional mathematical acrobatics to compensate for these effects to produce a relatively smooth pitch shifted voice similar to the output of the analog processing described above.
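The simple version of the bin shift, without any phase correction, can be sketched in Python as follows. It works on the magnitudes of a single audio frame; `shift_bins` is a hypothetical helper, and rounding to the nearest bin is an assumption about how fractional targets should be handled.

```python
def shift_bins(magnitudes, pitch_shift):
    """Move each component from bin k to the bin nearest k * pitch_shift.

    Components that would land beyond the last bin are dropped, and
    components that land in the same bin are added together.
    """
    shifted = [0.0] * len(magnitudes)
    for k, magnitude in enumerate(magnitudes):
        target = int(k * pitch_shift + 0.5)  # round to the nearest bin
        if target < len(shifted):
            shifted[target] += magnitude
    return shifted


# One frame with 16 bins: a strong component in bin 5, a weaker one in bin 12.
frame = [0.0] * 16
frame[5] = 1.0
frame[12] = 0.5

doubled = shift_bins(frame, 2.0)
print([k for k, m in enumerate(doubled) if m > 0.0])  # bin 5 moved to bin 10
```

With a bin width of about 21.5 Hz, bin 5 is centered near 108 Hz and bin 10 near 215 Hz, so the component's pitch is doubled. The component in bin 12 would land in bin 24, beyond the end of this small frame, and is discarded.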
This video is President Obama’s original introduction from his April 2, 2011 speech on the energy crisis. Click on the images below to download or play the videos.
This video is President Obama speaking with his pitch doubled by shifting the Fourier components but without the mathematical acrobatics to compensate for un-centered frequency components:
This video is President Obama speaking with a chipmunked voice; his pitch has been doubled.
This video is President Obama speaking with a deep voice; his pitch has been reduced to seventy percent of normal.
Octave is a free open-source numerical programming environment that is mostly compatible with MATLAB. The Octave source code below, the Octave function chipmunk, implements the standard pitch shifting algorithm in widespread use. The Octave code requires both Octave and the Octave Forge signal processing package for the specgram function which computes the spectrogram of the signal.
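The core of the correction for un-centered frequency components is that the phase of a bin advances between overlapping frames by an amount that depends on the true frequency of the component, not on the center of the bin. The Python sketch below illustrates this idea for a single bin of a pure tone. It is a simplified stand-in written for this explanation, not a translation of the chipmunk function, and the helper names are hypothetical.

```python
import cmath
import math


def windowed_bin(samples, start, fft_size, k):
    """Fourier coefficient k of one Hann-windowed frame starting at `start`."""
    total = 0j
    for n in range(fft_size):
        w = 0.5 - 0.5 * math.cos(2 * math.pi * n / fft_size)
        total += w * samples[start + n] * cmath.exp(-2j * math.pi * k * n / fft_size)
    return total


def estimate_true_frequency(samples, sample_rate, fft_size, number_overlaps, k):
    """Refine the frequency in bin k from the phase change between two frames."""
    step = fft_size // number_overlaps
    phase_a = cmath.phase(windowed_bin(samples, 0, fft_size, k))
    phase_b = cmath.phase(windowed_bin(samples, step, fft_size, k))
    # Phase advance expected if the component sat exactly on the bin center.
    expected = 2 * math.pi * k * step / fft_size
    deviation = (phase_b - phase_a) - expected
    # Wrap the deviation into the range -pi to pi.
    deviation = (deviation + math.pi) % (2 * math.pi) - math.pi
    bin_offset = deviation * number_overlaps / (2 * math.pi)
    return (k + bin_offset) * sample_rate / fft_size


sample_rate, fft_size, overlaps = 44100, 2048, 4
tone = [math.sin(2 * math.pi * 100.0 * n / sample_rate)
        for n in range(fft_size + fft_size // overlaps)]

estimate = estimate_true_frequency(tone, sample_rate, fft_size, overlaps, k=5)
print(f"bin 5 is centered near 107.7 Hz; estimated tone frequency: {estimate:.2f} Hz")
```

The estimate comes out very close to 100 Hz even though the nearest bin is centered near 107.7 Hz. A pitch shifter can multiply this corrected frequency by the pitch shift factor and rebuild consistent phases for the shifted bins, which removes most of the beat or modulation described earlier.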
The videos in this article were created by downloading the original MPEG-4 videos from the White House web site and splitting the audio and video into a MS WAVE file and a sequence of JPEG still images using the FFMPEG utility. Presidential speeches and video are in the public domain in the United States. The original still images were reduced in size by half using the ImageMagick convert utility. The audio was pitch shifted in Octave using the chipmunk function below. The new audio and video were recombined into the MPEG-4 videos in this article by again using the FFMPEG utility. Variants of this pitch shifting algorithm can be found in many programs including the widely used free open-source Audacity audio editor (the Audacity pitch shifting algorithm may be slightly different from the algorithm implemented below):
function [ofilename, new_phase, output] = chipmunk(filename, pitchShift, fftSize, numberOverlaps, thresholdFactor)
% [ofilename, new_phase, output] = chipmunk(filename [,pitchShift , fftSize, numberOverlaps, thresholdFactor]);
%
% chipmunk audio effect (as in Alvin and the Chipmunks)
%
% ofilename -- name of output file with pitch shifted audio
% new_phase -- the recomputed phases for the pitch shift audio (for debugging)
% output -- the pitch shifted audio samples
%
% arguments:
%
% filename -- input file name (MS Wave audio file)
% pitchShift -- frequency/pitch shift (default=2.0)
% fftSize -- size of FFT (default = 2048)
% numberOverlaps -- number of overlaps (default = 4)
% thresholdFactor -- threshold factor for zeroing silence frames (default = 0.002)
%
% $Id: chipmunk.m 1.44 2011/08/04 01:25:35 default Exp default $
% (C) 2011 John F. McGowan, Ph.D.
% E-Mail: jmcgowan11@earthlink.net
% Web: http://www.jmcgowan.com/
%

if nargin < 2
  pitchShift = 2.0; % frequency shift
end
nPitchShift = uint32(pitchShift*100); % to write output file

if nargin < 3
  fftSize = 2048; % size of audio blocks/FFT size
end

if nargin < 4
  numberOverlaps = 4; % number of overlaps
end

if nargin < 5
  thresholdFactor = 0.002;
end

printf("pitchShift: %f fftSize: %d numberOverlaps: %d thresholdFactor: %f\n", pitchShift, fftSize, numberOverlaps, thresholdFactor);
fflush(stdout);

stepSize = fftSize/numberOverlaps;
phaseShift = 2.0*pi*(stepSize/fftSize);

printf("loading %s\n", filename);
fflush(stdout);

result = char(strsplit(filename, '.'));
filestem = result(1,:);
ext = sprintf("_oct_%d_%d_%d.wav", nPitchShift, fftSize, numberOverlaps);
ofilename = [filestem ext];

[data, sampleRate, bits] = wavread(filename);
freq_resolution = sampleRate / fftSize; % frequency resolution = sample rate / fft size

if columns(data) > 1
  raw_data = data(:,1); % input is stereo with 2 channels in 2 columns of array
else
  raw_data = data; % mono sound input
end
data = [];
clear data; % free memory

mx_input = max(abs(raw_data(:)));

printf("applying fft\n");
fflush(stdout);
%spectrogram = fft(spectrogram);
overlap = fftSize - stepSize;
printf("stepSize: %d overlap is %d\n", stepSize, overlap);
fflush(stdout);

nsamples = length(raw_data);

% hanning window
window = hanning(fftSize); % window the output
window = (numel(window)/sum(window(:)) )*window; % normalize the window

% use Octave signal package specgram function to apply fft to windowed overlapping frames
% [] indicates default window (hanning)
%
[spectrogram, f, t] = specgram(raw_data, fftSize, sampleRate, window, overlap);
printf("spectrogram has dimensions %d %d\n", rows(spectrogram), columns(spectrogram));
fflush(stdout);

% free memory
raw_data = [];
clear raw_data;

intensity = dot(spectrogram, spectrogram, 1); % each column is an audio frame
max_intensity = max(intensity(:));
threshold = thresholdFactor*max_intensity;
speech_frames = intensity > threshold;
printf("speech_frames has dimensions: %d %d \n", rows(speech_frames), columns(speech_frames));
fflush(stdout);

printf("zeroing silence frames...\n");
fflush(stdout);
speech_frames = repmat(speech_frames,rows(spectrogram), 1);
spectrogram = spectrogram .* speech_frames;
printf("dimensions spectrogram are now: %d %d \n", rows(spectrogram), columns(spectrogram));
fflush(stdout);

printf("computing phase...\n");
fflush(stdout);

% spectrogram is half-array without duplicate fft coefficients
% 1:fftSize/2 rows, number time steps columns
% each row is an fft coefficient
%
magn = 2.*abs( spectrogram ); % magnitude of fft coefficients
phase = arg( spectrogram ); % phase of fft coefficients

previous_phase = zeros(size(phase));
previous_phase(:,2:end) = phase(:,1:end-1);

phaseShifts = (0:(fftSize/2)-1)*phaseShift; % expected phase shift if frequency component is centered in bin
phaseShifts = repmat(phaseShifts', 1, columns(phase));

spec_buf = phase - previous_phase; % change in phase from previous time step
spec_buf = spec_buf - phaseShifts; % difference between change in phase and expected phase change
                                   % if frequency component is centered in frequency bin

printf("computing phase adjustment\n");
fflush(stdout);

% wrap the phase deviation into the range -pi to pi of atan2/arg
spec_buf = mod(spec_buf + pi, 2*pi) - pi;

spec_buf = numberOverlaps*spec_buf./(2*pi);

printf("computing corrected frequencies\n");
fflush(stdout);

% compute corrected frequency
frequencies = repmat(f',1,columns(spectrogram)); % f is row vector when returned by specgram
spec_buf = frequencies + spec_buf*freq_resolution;
corrected_freq = spec_buf;

printf("applying frequency shift\n");
fflush(stdout);

shifted_magn = zeros(size(magn));
shifted_freq = zeros(size(corrected_freq));

oldTime = time;
for k = 1:fftSize/2
  ind = uint32((k-1)*pitchShift) + 1;
  if (ind <= fftSize/2)
    shifted_magn(ind,:) += magn(k,:);
    shifted_freq(ind,:) = corrected_freq(k,:) * pitchShift;
  end
  newTime = time;
  deltaTime = newTime - oldTime;
  if (deltaTime > 1)
    pct = (k / (fftSize/2))*100.0; % percent progress
    printf("frequency shift: processed %3.1f%% %d/%d\n", pct, k, fftSize/2);
    fflush(stdout);
    oldTime = time;
  end % end if
end
%shifted_freq = corrected_freq * pitchShift;

% now convert from mag and freq to mag and phase
%
printf("computing new phase\n");
fflush(stdout);
spec_buf = zeros(size(spectrogram)); % make sure start with zeros

printf("new phase: assigning shifted frequencies\n");
fflush(stdout);
spec_buf(2:end,:) = shifted_freq(2:end,:);

printf("new phase: subtracting center frequencies\n");
fflush(stdout);
spec_buf(2:end,:) = spec_buf(2:end,:) - (frequencies(2:end,:) );

printf("new phase: dividing by frequency resolution\n");
fflush(stdout);
spec_buf(2:end,:) /= freq_resolution;

printf("new phase: adjusting for overlap\n");
fflush(stdout);
spec_buf(2:end,:) = 2.*pi*spec_buf(2:end,:)/numberOverlaps;

printf("new phase: computing delta phase\n");
fflush(stdout);
delta_phase = spec_buf + phaseShifts;
%delta_phase = phaseShifts;

printf("new phase: adding delta phase\n");
fflush(stdout);
%new_phase = spec_buf;
new_phase = zeros(size(spec_buf));
% dc coefficient has no phase (always a non-negative real)

% accumulate the phase from frame to frame
% (the loop index is named icol so that it does not shadow the imaginary unit)
oldTime = time;
ncols = columns(spec_buf);
for icol = 2:ncols
  new_phase(2:end,icol) = new_phase(2:end,icol-1) + delta_phase(2:end,icol-1);
  newTime = time;
  deltaTime = newTime - oldTime;
  if (deltaTime > 1)
    pct = (icol / ncols)*100.0; % percent progress
    printf("new phase: processed %3.1f%% %d/%d\n", pct, icol, ncols);
    fflush(stdout);
    oldTime = time;
  end % end if
end

spec_buf = [];
clear spec_buf; % free memory

new_spectrogram = zeros(fftSize, columns(spectrogram)); % allocate full fft array for inverse fft
new_spectrogram(1,:) = shifted_magn(1,:); % dc coefficient
new_spectrogram(2:fftSize/2,:) = shifted_magn(2:end,:).*cos(new_phase(2:end,:)) + 1i*shifted_magn(2:end,:).*sin(new_phase(2:end,:));
new_spectrogram(fftSize/2 + 2:end,:) = conj(flipud(new_spectrogram(2:fftSize/2,:))); % reflect fft coefficients

spectrogram = [];
clear spectrogram;

% INVERSE FFT
%
printf("applying inverse fft\n");
fflush(stdout);
new_data = real(ifft(new_spectrogram))/fftSize;
printf("dimensions new_data are %d %d\n", rows(new_data), columns(new_data));
fflush(stdout);

new_spectrogram = [];
clear new_spectrogram;

% each column is an audio frame which may overlap with previous audio frame by overlap samples
%
iframe = 1; % start at frame 1
it = 1; % start at first sample of output
output = zeros(nsamples,1); % all rows, 1 column

printf("applying overlap and add...\n");
fflush(stdout);
while( (it+fftSize-1) < nsamples)
  update = (new_data(:,iframe).*window)/numberOverlaps; % column of audio data
  output(it:it+fftSize-1) = output(it:it+fftSize-1) + update(1:fftSize);
  it = it + stepSize; % advance to next time
  iframe = iframe + 1; % advance to next audio frame (column of new_data)
end % while

new_data = [];
clear new_data;

mx = max(abs(output(:)));
%mean = sum(abs(output(:)))/numel(output);
if mx > 1.0
  scale_factor = mx / mx_input;
  printf("scaling output by %f\n", 1.0/scale_factor);
  fflush(stdout);
  output = output / scale_factor;
end

printf("writing shifted audio to %s\n", ofilename);
fflush(stdout);
%
wavwrite(output, sampleRate, bits, ofilename);

disp('ALL DONE');
end % function
The screenshot below shows running the chipmunk function in Octave 3.2.4 on a PC under Windows XP Service Pack 2 (Click on the screenshot image to see the full size screenshot). This screenshot shows the function called from the Octave prompt using the default values of the function’s arguments. The argument numberOverlaps controls the mathematics to compensate for the uncentered frequency components. If numberOverlaps is one, there is no compensation. The larger numberOverlaps, the more effective the compensation. The more overlaps, the more computer time and resources required by the pitch shifting. A value of numberOverlaps of thirty-two (32) was used to pitch shift President Obama’s voice in the video above.
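The way overlapping frames compensate for uncentered frequency components can be sketched on a synthetic tone. The example below is only an illustration under stated assumptions, not part of the chipmunk function: the sample rate, the test frequency of 443 Hz, and all variable names are hypothetical choices. It takes the FFT of two frames that overlap by three quarters of their length, measures how far the phase advance in the peak bin deviates from the advance expected for a bin-centered component, and converts that deviation into a corrected frequency estimate.

```octave
% Assumed test setup (hypothetical values chosen for illustration)
sampleRate = 8000;                         % samples per second
fftSize = 1024;                            % frame length
numberOverlaps = 4;                        % frames overlap by 3/4 of their length
stepSize = fftSize / numberOverlaps;       % hop between successive frames
freqResolution = sampleRate / fftSize;     % width of one FFT bin in Hz
trueFreq = 443.0;                          % deliberately not centered in a bin

n = (0:fftSize + stepSize - 1)';
x = cos(2*pi*trueFreq*n/sampleRate);

% Hann window written out explicitly so that no package is required
w = 0.5 - 0.5*cos(2*pi*(0:fftSize-1)'/fftSize);

X1 = fft(x(1:fftSize) .* w);                        % first frame
X2 = fft(x(stepSize+1:stepSize+fftSize) .* w);      % next overlapping frame

[peak, k] = max(abs(X1(1:fftSize/2)));              % k is a 1-based index
bin = k - 1;                                        % zero-based bin number

% phase advance expected if the component sat exactly at the bin center
expected = 2*pi*bin*stepSize/fftSize;

% deviation from the expected advance, wrapped into [-pi, pi)
delta = angle(X2(k)) - angle(X1(k)) - expected;
delta = mod(delta + pi, 2*pi) - pi;

% convert the phase deviation into a fractional bin offset
estimate = (bin + numberOverlaps*delta/(2*pi)) * freqResolution;

printf("bin center: %.2f Hz, corrected estimate: %.2f Hz\n", bin*freqResolution, estimate);
```

In this sketch the bin center is more than 2 Hz away from the tone, while the phase-corrected estimate lands much closer. A larger number of overlaps widens the range of deviations that can be measured before the phase difference wraps around.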
Although easily understandable, these pitch-shifted voices sound somewhat artificial. Indeed, this artificial quality is part of the appeal of the Alvin and the Chipmunks voices.
Pitch shifting algorithms have improved. It is now possible to produce voices that sound much more like natural voices at the desired new pitch, very similar to the voice of Mickey Mouse. This video is President Obama speaking with a voice similar to the voice of Mickey Mouse:
This particular pitch shifting algorithm does better with producing natural sounding high pitched voices than low pitched voices.
There are many ways to manipulate voices using mathematics. One of the most common is pitch shifting, which has been described in detail including working source code above. Traditional pitch shifting algorithms give artificial qualities to the pitch-shifted voice. There are now new, improved algorithms that can create more natural sounding pitch-shifted voices. These voices can be used for humor, entertainment, or emphasis in movies, television, video games, video advertisements for small businesses, personal and home video, and in many other applications.
© 2011 John F. McGowan
About the Author
John F. McGowan, Ph.D. solves problems using mathematics and mathematical software, including developing video compression and speech recognition technologies. He has extensive experience developing software in C, C++, Visual Basic, Mathematica, MATLAB, and many other programming languages. He is probably best known for his AVI Overview, an Internet FAQ (Frequently Asked Questions) on the Microsoft AVI (Audio Video Interleave) file format. He has worked as a contractor at NASA Ames Research Center involved in the research and development of image and video processing algorithms and technology. He has published articles on the origin and evolution of life, the exploration of Mars (anticipating the discovery of methane on Mars), and cheap access to space. He has a Ph.D. in physics from the University of Illinois at Urbana-Champaign and a B.S. in physics from the California Institute of Technology (Caltech). He can be reached at jmcgowan11@earthlink.net.
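The update rule for the Game of Life is:
Any live cell with fewer than two live neighbors dies.
Any live cell with two or three live neighbors lives on.
Any live cell with more than three live neighbors dies (overpopulation).
Any dead cell with exactly three live neighbors becomes a live cell (reproduction).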
The Game of Life has many interesting and entertaining properties. Amongst other features, it can implement a general purpose computer like the much more complex Pentium CPU chip. There is extensive information on the Game of Life on the Web. Interested readers are referred to this online information as well as the many traditional printed articles and books that discuss the Game of Life. This article discusses how to implement the Game of Life in Octave, a free open-source numerical programming environment that is mostly compatible with MATLAB. Octave is available in both source code and pre-compiled binaries for all three major computing platforms: MS Windows, Macintosh, and Unix/Linux. Full source code in Octave is presented and the results of several simulations of the Game of Life are shown.
There are a number of structures in the Game of Life that have interesting properties. These range from simple oscillators that cycle through a fixed set of patterns to complex self-perpetuating patterns that grow and evolve indefinitely. Several are shown below. These were simulated using an implementation of the Game of Life in Octave. The simulation generates a sequence of images showing the time evolution of the Game of Life. These image sequences consist of either static GIF images or static JPEG images. Where possible, the images were combined into the animated GIF sequences shown below using the convert utility in ImageMagick. In some cases this proved difficult and the images were converted to MPEG-4 video (MP4) using the ffmpeg command line utility.
Gosper’s Gun (Click here to see movie)
The main Game of Life simulation function is simulate_life. This implementation uses a series of nested loops to implement the update rule in a way that will be familiar to users of procedural programming languages like the C programming language. Later, a function simulate_life_fast is presented which implements the update rule using the “matrix” operations in Octave with no loops over the rows and columns of the Octave matrix that represents the Game of Life universe.
simulate_life.m
function [] = simulate_life(myseed, niter, name, ext)
% [] = simulate_life(myseed, niter [,name, ext])
% simulate Conway's Game of Life
% myseed is 2d array with values 0 or 1
% 0 is a dead cell
% 1 is a live cell
% niter is number of iterations to simulate
% name (optional) name of simulation
% ext (optional) extension of image file (default gif)
%
% Author: John F. McGowan, Ph.D.
% E-Mail: jmcgowan11@earthlink.net
%
% to make an animated gif of the game of life
%
% Install ImageMagick on your computer
% command prompt>convert -delay 10 -loop 0 life*.gif game.gif
% to make game.gif animated gif
%
% any good web browser (such as FireFox) can display animated GIF video
%

if (nargin < 2)
  printf("ERROR: too few arguments!\n");
  printf("Usage: simulate_life(myseed, number_of_iterations [,simulation_name])\n");
  fflush(stdout);
  return;
end

if nargin < 3
  name = 'life';
end

if nargin < 4
  ext = 'gif';
end

nx = rows(myseed);
ny = columns(myseed);

previous = myseed;
update = zeros(size(previous));

display_array = life_grid(myseed);
imshow(!display_array);
title('SEED');
pause(1);
seed_name = sprintf("%s000.%s", name, ext);
print(seed_name);

total_live = sum(sum(previous));
printf("%d live cells in seed\n", total_live);
fflush(stdout);

for iter = 1:niter
  % simulate one iteration/update of game
  %
  for ix = 1:nx
    for iy = 1:ny
      lowx = max(ix-1,1);
      hix = min(ix+1, nx);
      lowy = max(iy-1,1);
      hiy = min(iy+1, ny);

      nlive = sum(sum(previous(lowx:hix,lowy:hiy))); % add up number of live cells in neighborhood

      % handle four (4) corner cases
      if (ix == 1 && iy == 1) || (ix == 1 && iy == ny) || (ix == nx && iy == ny) || (ix == nx && iy == 1)
        n11 = previous(cycle(ix-1,nx), cycle(iy-1,ny));
        n12 = previous(ix , cycle(iy-1, ny));
        n13 = previous(cycle(ix+1,nx), cycle(iy-1,ny));
        n21 = previous(cycle(ix-1,nx), iy);
        n22 = previous(ix, iy); % center cell (current cell)
        n23 = previous(cycle(ix+1,nx), iy);
        n31 = previous(cycle(ix-1,nx), cycle(iy+1, ny));
        n32 = previous(ix , cycle(iy+1, ny));
        n33 = previous(cycle(ix+1, nx), cycle(iy+1, ny));
        nlive = n11 + n12 + n13 + n21 + n22 + n23 + n31 + n32 + n33;
      else % non corner cases
        % handle cells at edge of universe (treat as closed universe)
        if ix == 1
          nlive = nlive + sum(previous(nx,lowy:hiy));
        end
        if ix == nx
          nlive = nlive + sum(previous(1,lowy:hiy));
        end
        if iy == 1
          nlive = nlive + sum(previous(lowx:hix, ny));
        end
        if iy == ny
          nlive = nlive + sum(previous(lowx:hix, 1));
        end
      end % else non corner cases

      if previous(ix,iy) == 1
        nlive = nlive - 1; % don't count center cell
        printf("live cell %d %d has %d live neighbors\n", ix, iy, nlive);
        fflush(stdout);
      end

      if nlive < 2 % cell with fewer than 2 live neighbors dies
        update(ix, iy) = 0; % cell dies
      elseif (nlive == 2 || nlive == 3) % cell lives on if it has 2 or 3 neighbors
        if (previous(ix,iy) == 0 && nlive == 3)
          update(ix, iy) = 1; % dead cell comes alive if it has exactly 3 live neighbors (reproduction)
        else
          update(ix, iy) = previous(ix,iy);
        end
      elseif nlive > 3 % cell dies due to overpopulation
        update(ix, iy) = 0;
      else
        update(ix, iy) = previous(ix,iy);
        printf("error if got here\n");
        fflush(stdout);
      end
    end % loop over columns
  end % loop over rows

  total_live = sum(sum(update));
  printf("%d live cells at iteration %d\n", total_live, iter);
  fflush(stdout);

  previous = update;
  filename = sprintf("%s%03d.%s", name, iter, ext); % write image sequence to disk
  %imshow(!previous);
  display_array = life_grid(previous);
  imshow(!display_array);
  title(filename);
  pause(1);
  print(filename);
end % loop over iterations

end % function simulate_life
The function simulate_life calls a support function cycle which uses the Octave mod function to wrap the row and column coordinates at the edges of the Game of Life universe. This closes the Game of Life universe, giving it a torus or donut shape. In the glider simulation, the glider travels off one end of the universe and reappears at the other end.
cycle.m
function [result] = cycle(n,m)
% [result] = cycle(n,m)
% wrap the index n into the range 1 to m
%
result = mod(n-1,m)+1;
end
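The wrap-around behavior is easiest to see with a few sample indices. The sketch below restates the cycle formula as an anonymous function so that it can be pasted at the Octave prompt; the universe size of 20 and the variable names are assumptions chosen for illustration.

```octave
% same formula as cycle.m, written as an anonymous function for a quick test
cycle = @(n, m) mod(n - 1, m) + 1;

nx = 20;                             % assumed number of rows in the universe

left_of_first = cycle(0, nx);        % one step before row 1 wraps to the last row
right_of_last = cycle(nx + 1, nx);   % one step past the last row wraps to row 1
interior = cycle(7, nx);             % interior indices are unchanged

printf("%d %d %d\n", left_of_first, right_of_last, interior);
```

Because Octave arrays are indexed from 1, the formula subtracts 1 before taking the remainder and adds 1 afterwards.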
life_grid.m
function [display_array] = life_grid(previous)
% [display_array] = life_grid(previous)
%
% enlarge each cell to a 10 by 10 block of pixels and overlay grid lines
%
display_array = zeros(10*size(previous)); % create a larger display array
display_array(1:10:end,:) = 1; % create grid
display_array(:,1:10:end) = 1;
display_array(end,:) = 1;
display_array(:,end) = 1;

for i = 1:rows(display_array)-1
  for j = 1:columns(display_array)-1
    idata = floor((i-1)/10)+1;
    jdata = floor((j-1)/10)+1;
    if (previous(idata, jdata) == 1)
      display_array(i,j) = 1;
    end
  end
end

end % end function life_grid
life_toad.m
function [result] = life_toad(myseed, toad_x, toad_y)
% [result] = life_toad(myseed, toad_x, toad_y)
% add a "toad" oscillator to Conway's game of life
% myseed -- the universe array (2D)
% toad_x -- x coordinate of toad
% toad_y -- y coordinate of toad
%
% Author: John F. McGowan, Ph.D.
% Web: http://www.jmcgowan.com/
% E-Mail: jmcgowan11@earthlink.net
%
result = myseed;
result(toad_x, toad_y) = 1;
result(toad_x+1, toad_y) = 1;
result(toad_x+2, toad_y) = 1;

result(toad_x + 1, toad_y + 1) = 1;
result(toad_x + 2, toad_y + 1) = 1;
result(toad_x + 3, toad_y + 1) = 1;

end % function life_toad
life_pulsar.m
function [result] = life_pulsar(myseed, pulsar_x, pulsar_y)
% [result] = life_pulsar(myseed, pulsar_x, pulsar_y)
% add a "pulsar" oscillator to Conway's game of life
% myseed -- the universe array (2D)
% pulsar_x -- x coordinate of pulsar
% pulsar_y -- y coordinate of pulsar
%
% Author: John F. McGowan, Ph.D.
% Web: http://www.jmcgowan.com/
% E-Mail: jmcgowan11@earthlink.net
%
pulsar = [
0 0 1 1 1 0 0 0 1 1 1 0 0;
0 0 0 0 0 0 0 0 0 0 0 0 0;
1 0 0 0 0 1 0 1 0 0 0 0 1;
1 0 0 0 0 1 0 1 0 0 0 0 1;
1 0 0 0 0 1 0 1 0 0 0 0 1;
0 0 1 1 1 0 0 0 1 1 1 0 0;
0 0 0 0 0 0 0 0 0 0 0 0 0;
0 0 1 1 1 0 0 0 1 1 1 0 0;
1 0 0 0 0 1 0 1 0 0 0 0 1;
1 0 0 0 0 1 0 1 0 0 0 0 1;
1 0 0 0 0 1 0 1 0 0 0 0 1;
0 0 0 0 0 0 0 0 0 0 0 0 0;
0 0 1 1 1 0 0 0 1 1 1 0 0;
];

result = myseed;
result(pulsar_x:pulsar_x+rows(pulsar)-1, pulsar_y:pulsar_y+columns(pulsar)-1) = pulsar;

end % function life_pulsar
life_glider.m
function [result] = life_glider(myseed, glider_x, glider_y)
% [result] = life_glider(myseed, glider_x, glider_y)
% add a glider to Conway's game of life
% myseed -- the universe array (2D)
% glider_x -- x coordinate of glider
% glider_y -- y coordinate of glider
%
% Author: John F. McGowan, Ph.D.
% Web: http://www.jmcgowan.com/
% E-Mail: jmcgowan11@earthlink.net
%
result = myseed;
result(glider_x, glider_y) = 1;
result(glider_x+1, glider_y+1) = 1;
result(glider_x+2, glider_y+1) = 1;
result(glider_x, glider_y + 2) = 1;
result(glider_x + 1, glider_y + 2) = 1;

end % function life_glider
life_gosper.m
function [result] = life_gosper(myseed, gosper_x, gosper_y)
% [result] = life_gosper(myseed, gosper_x, gosper_y)
% add a Gosper glider gun to Conway's game of life
% myseed -- the universe array (2D)
% gosper_x -- x coordinate of gosper
% gosper_y -- y coordinate of gosper
%
% Author: John F. McGowan, Ph.D.
% Web: http://www.jmcgowan.com/
% E-Mail: jmcgowan11@earthlink.net
%
gosper = [
0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0;
0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0;
0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0;
0 0 0 0 0 0 0 0 0 0 0 0 0 1 1 0 0 0 0 0 0 1 1 0 0 0 0 0 0 0 0 0 0 0 0 1 1 0 0;
0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 1 0 0 0 0 1 1 0 0 0 0 0 0 0 0 0 0 0 0 1 1 0 0;
0 1 1 0 0 0 0 0 0 0 0 1 0 0 0 0 0 1 0 0 0 1 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0;
0 1 1 0 0 0 0 0 0 0 0 1 0 0 0 1 0 1 1 0 0 0 0 1 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0;
0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 1 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0;
0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0;
0 0 0 0 0 0 0 0 0 0 0 0 0 1 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0;
0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0;
];

result = myseed;
result(gosper_x:gosper_x+rows(gosper)-1, gosper_y:gosper_y+columns(gosper)-1) = gosper;

end % function life_gosper
The simulations were generated using the following Octave scripts:
life.m
% John Conway's Game of Life in Octave
%
% Author: John F. McGowan, Ph.D.
% Web: http://www.jmcgowan.com/
% E-Mail: jmcgowan11@earthlink.net
%
nx = 20;
ny = 20;
niter = 10; % number of iterations of game to simulate

% 0 is a dead cell, 1 is a live cell
%
% blinker
% myseed = repmat(0, nx, ny); % create seed for game of life
% myseed(nx/2, ny/2) = 1;
% myseed((nx/2)+1, ny/2) = 1;
% myseed((nx/2)+2, ny/2) = 1;
% simulate_life(myseed, niter);

% glider
myseed = repmat(0, nx, ny);
glider_x = nx/2 + 7;
glider_y = ny/2 + 7;
%myseed = life_glider(myseed, glider_x, glider_y);

% add a toad oscillator
%myseed = life_toad(myseed, nx/2, ny/2);

% blinker at corner
% myseed(1,ny) = 1;
% myseed(nx,ny) = 1;
% myseed(nx-1, ny) = 1;

% add a pulsar oscillator (period 3 iterations)
myseed = life_pulsar(myseed, 5,5);

simulate_life(myseed, 10, "pulsar");
% END OF FILE (Conway's Game of Life in Octave)
gosper_demo.m
% gosper_demo
%
% demo of Gosper Gun in Conway's Game of Life
%
% Author: John F. McGowan, Ph.D.
% E-Mail: jmcgowan11@earthlink.net
%
%
myseed = zeros(60, 60);
myseed = life_gosper(myseed, 10, 10);
simulate_life(myseed, 100, 'gosper', 'jpg');
glider_demo.m
% glider
myseed = repmat(0, 20, 20);
glider_x = 10;
glider_y = 10;
myseed = life_glider(myseed, glider_x, glider_y);
simulate_life(myseed, 100, 'glider', 'jpg');
Octave has a large number of “matrix” or “n-dimensional array” operators and functions that operate on an entire Octave matrix (two-dimensional array) or n-dimensional array without nested loops over the indices of the matrix or array. These are generally faster, more compact, and often the coding is less error prone than using nested loops. This is an implementation of the Game of Life using Octave matrix operators:
simulate_life_fast.m
function [] = simulate_life_fast(myseed, niter, name, ext)
% [] = simulate_life_fast(myseed, niter [,name ,ext])
% simulate Conway's Game of Life
% myseed is 2d array with values 0 or 1
% 0 is a dead cell
% 1 is a live cell
% niter is number of iterations to simulate
% name (optional) name of simulation
% ext (optional) extension of image file (default gif)
%
% Author: John F. McGowan, Ph.D.
% E-Mail: jmcgowan11@earthlink.net
%
% to make an animated gif of the game of life
%
% Install ImageMagick on your computer
% command prompt>convert -delay 10 -loop 0 life*.gif game.gif
% to make game.gif animated gif
%
% any good web browser (such as FireFox) can display animated GIF video
%

if (nargin < 2)
  printf("ERROR: too few arguments!\n");
  printf("Usage: simulate_life_fast(myseed, number_of_iterations [,simulation_name])\n");
  fflush(stdout);
  return;
end

if nargin < 3
  name = 'life';
end

if nargin < 4
  ext = 'gif';
end

nx = rows(myseed);
ny = columns(myseed);

previous = myseed;
update = zeros(size(previous));

imshow(!myseed);
title('SEED');
pause(1);
seed_name = sprintf("%s000.%s", name, ext);

count_neighbors = [1 1 1; 1 0 1; 1 1 1];

% look up tables -- zero based index
lut_live = [ 0 0 1 1 0 0 0 0 0 0 0 0]; % live cells with less than 2 live neighbors die, 2-3 live, more than 3 die (overpopulation)
lut_dead = [ 0 0 0 1 0 0 0 0 0 0 0 0]; % any dead cell with exactly three (3) live neighbors comes alive

print(seed_name);
total_live = sum(sum(previous));

for iter = 1:niter
  % simulate one iteration/update of game
  %
  nneighbors = conv2(previous, count_neighbors, 'same');
  live_change = previous .* lut_live(nneighbors+1); % update the live cells
  dead_change = !previous .* lut_dead(nneighbors+1); % bring dead cells with 3 live neighbors to life
  update = live_change + dead_change;

  total_live = sum(sum(update));

  previous = update;
  filename = sprintf("%s%03d.%s", name, iter, ext); % write image sequence to disk
  imshow(!previous);
  %display_array = life_grid(previous);
  %imshow(!display_array);
  title(filename);
  print(filename);
end % loop over iterations

end % function simulate_life_fast
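The neighbor counting step can be checked on a tiny universe. The sketch below applies one update to a blinker using conv2 and logical matrix operations; the 5 by 5 universe and the variable names are assumptions chosen for illustration, and the update is written with comparisons rather than the look-up tables used in simulate_life_fast.

```octave
% 5 by 5 universe containing a horizontal blinker (assumed test pattern)
universe = zeros(5, 5);
universe(3, 2:4) = 1;

% count the eight neighbors of every cell in one matrix operation
kernel = [1 1 1; 1 0 1; 1 1 1];
neighbors = conv2(universe, kernel, 'same');

% a cell is alive in the next generation if it has exactly 3 live neighbors,
% or if it is alive now and has exactly 2 live neighbors
next = double((neighbors == 3) | (universe & (neighbors == 2)));

disp(next);
```

After one update the blinker is vertical. Note that conv2 with the 'same' option pads the array with zeros, so cells beyond the edge count as dead; this differs from the wrapped, torus-shaped universe in simulate_life.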
The Game of Life is a simple, easy to implement, entertaining cellular automaton. It is easy to implement the Game of Life in Octave (or MATLAB or Scilab). External tools such as ImageMagick convert or ffmpeg can be used to easily convert the image sequences that Octave can generate into animations in commonly used formats such as animated GIF or MPEG-4 video. Even using Octave’s matrix-oriented operators to implement the Game of Life, avoiding the cumbersome and generally slow nested loops over rows and columns, the Octave implementation is still slow compared to a compiled C programming language implementation. This speed issue is probably the primary drawback to using Octave, which otherwise is very quick and convenient and has a much lower development time than low level compiled languages such as C.
© 2011 John F. McGowan
About the Author
John F. McGowan, Ph.D. is a software developer, research scientist, and consultant. He works primarily in the area of complex algorithms that embody advanced mathematical and logical concepts, including speech recognition and video compression technologies. He has extensive experience developing software in C, C++, Visual Basic, Mathematica, MATLAB, and many other programming languages. He is probably best known for his AVI Overview, an Internet FAQ (Frequently Asked Questions) on the Microsoft AVI (Audio Video Interleave) file format. He has worked as a contractor at NASA Ames Research Center involved in the research and development of image and video processing algorithms and technology. He has published articles on the origin and evolution of life, the exploration of Mars (anticipating the discovery of methane on Mars), and cheap access to space. He has a Ph.D. in physics from the University of Illinois at Urbana-Champaign and a B.S. in physics from the California Institute of Technology (Caltech). He can be reached at jmcgowan11@earthlink.net.
In many programming languages the key is a string of characters such as "John Smith" and the value is another string such as "Address:123 Elm Street;Wife:Amanda" and so forth. In some object oriented computer programming languages, the key can be any kind of object and the value can be any kind of object such as the key <John Smith's Face Object> and the value <John Smith's Identifying Information Object>. An associative array is largely equivalent to a single table in a relational database (RDBMS). In principle, a network of associative arrays can represent complex abstract knowledge and reasoning.
Technically, associative arrays are usually implemented using a hash table. A hash table uses modulo arithmetic to map an object such as a string of characters to the numerical index of an array of values. This array of values is not a simple one dimensional array because there can be collisions where two keys, objects such as strings of characters, “hash” to the same array index. If this happens, there are various methods such as hanging a linked list of elements off the array to handle the collision. Using a hash table means that the time to look up a value is almost constant (O(1)) regardless of the size of the hash table.
The hash table may have millions of entries, but it takes the same small number of operations to look up the associated value. First, compute the array index using modulo arithmetic on the numeric value of the “key”. Second, handle any collisions. The hash table should be large enough that collisions are rare. In principle, an associative array could be implemented in other ways, but hash tables of some kind are generally the fastest, most flexible way to implement an associative array. Hence, the terms associative array, dictionary, map, mapping, and hash table are often used interchangeably.
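The two lookup steps described above can be sketched with a toy hash table. This is only an illustration under stated assumptions, not how Octave implements its data structures: the hash function (the sum of the character codes modulo the table size), the table size of 7, and the sample keys are all hypothetical. Collisions are handled by keeping a short list of key and value pairs in each bucket.

```octave
nbuckets = 7;                                                 % assumed table size
hash_index = @(key) mod(sum(double(key)), nbuckets) + 1;      % toy hash function

% each bucket holds a list of {key, value} rows
buckets = repmat({cell(0, 2)}, 1, nbuckets);

% "nhoJ Smith" has the same characters as "John Smith", so the two keys collide
pairs = {"John Smith", "123 Elm Street"; ...
         "Jane Doe", "456 Oak Avenue"; ...
         "nhoJ Smith", "789 Pine Road"};

% insert: compute the bucket index, then append to that bucket's list
for p = 1:rows(pairs)
  idx = hash_index(pairs{p, 1});
  bucket = buckets{idx};
  bucket = [bucket; pairs(p, :)];
  buckets{idx} = bucket;
end

% lookup: compute the bucket index, then resolve collisions by comparing keys
key = "nhoJ Smith";
bucket = buckets{hash_index(key)};
value = "";
for p = 1:rows(bucket)
  if strcmp(bucket{p, 1}, key)
    value = bucket{p, 2};
  end
end

printf("%s -> %s\n", key, value);
```

The cost of a lookup depends on the length of one bucket's list rather than on the total number of entries, which is why the table should be large enough to keep collisions rare.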
Octave is a free, open-source numerical programming environment that is mostly compatible with MATLAB. Octave is largely built around matrices (two dimensional arrays) and N (2,3,etc.) dimensional arrays. By default, matrices and N-D arrays in Octave are of the type double (usually a 64 bit IEEE-754 double precision floating point number).
This is due to the history of MATLAB (short for Matrix Laboratory) which started life as an interactive environment for performing two dimensional matrix algebra and computations. At first glance, Octave lacks associative arrays which is a significant deficiency for some types of programming including some types of mathematical programming. Octave does, in fact, have associative arrays. This article shows how to use associative arrays in Octave and use associative arrays to implement cellular automata, a type of discrete mathematical model.
Associative Arrays in Octave
While Octave lacks an explicitly identified associative array, dictionary, map, mapping, or hash table data type, Octave does have data structures or structs similar to structures in the C family of programming languages. For example, in Octave one can define a structure interactively like this:
octave-3.2.3:2> a.key = 'value'
a =
{
  key = value
}

octave-3.2.3:3> a.key2 = 'value2'
a =
{
  key = value
  key2 = value2
}
The structure a now has two fields “key” and “key2” with values ‘value’ and ‘value2’. In Octave, the data structures are implemented using a hash table and can act as a fully functional associative array. Octave provides several functions to access and manipulate structures, making it easy to use a structure in Octave as an associative array.
Built-in Function: struct ("field", value, "field", value,...)
Built-in Function: isstruct (expr)
Built-in Function: rmfield (s, f)
Function File: [k1,..., v1] = setfield (s, k1, v1,...)
Function File: [t, p] = orderfields (s1, s2)
Built-in Function: fieldnames (struct)
Built-in Function: isfield (expr, name)
Function File: [v1,...] = getfield (s, key,...)
Function File: substruct (type, subs,...)
All of these functions are useful. Most important for the purposes of this article are struct("field", value, "field", value,...) which creates a data structure explicitly, setfield(structure_name, key, value,...) which assigns a value to a key, and getfield(structure_name, key) which retrieves the value associated with key. These are used in the examples in this article to implement cellular automata.
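As a quick illustration of these functions working together, the sketch below treats a structure as a small phone book. The keys, values, and variable names are hypothetical. One caveat is that field names should be valid identifiers, so a key such as "John Smith" with a space in it would need to be encoded first.

```octave
% create the associative array with two initial key/value pairs
phone_book = struct("alice", "555-0100", "bob", "555-0101");

% add a key with setfield; note that setfield returns the updated structure
phone_book = setfield(phone_book, "carol", "555-0102");

% the dynamic field syntax is an alternative when the key is held in a variable
key = "dave";
phone_book.(key) = "555-0103";

% look up a value, test for a missing key, and list all keys
number = getfield(phone_book, "carol");
has_eve = isfield(phone_book, "eve");
keys = fieldnames(phone_book);

% remove a key
phone_book = rmfield(phone_book, "bob");

printf("carol: %s, number of keys before removal: %d\n", number, numel(keys));
```

Checking isfield before calling getfield avoids an error when a key is absent.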
Cellular Automata
A cellular automaton (plural, cellular automata, sometimes abbreviated as CA) is a discrete mathematical model. A typical cellular automaton consists of a grid of cells with one or more dimensions. Often, the cells have two possible values, 0 and 1, which are often displayed as black and white pixels graphically. The cellular automaton evolves over time in discrete time steps. With each time step or update, a cell changes to 0 or 1 based on the values of the cells in its neighborhood. A cellular automaton has a rule that specifies how it updates.
Cellular automata have been used in entertainment (they make pretty pictures), mathematics, physics, and a number of other fields. Probably the best known cellular automaton is the “Game of Life”, a two dimensional cellular automaton with many entertaining and interesting properties, invented by the mathematician John Conway in 1970.
Stephen Wolfram, the creator of the Mathematica computer algebra system and mathematical research tool, has had a long standing interest in cellular automata. His book, A New Kind of Science, speculates that the universe might be a sort of cellular automaton and be “computationally undecidable” (in layman’s terms, math and science don’t have all the answers). Matthew Cook, who assisted Wolfram in the research for A New Kind of Science, proved that a particularly simple cellular automaton known as “rule 110” can function as a general purpose computer just like more complex systems such as the Pentium computer chip. An implementation of the “rule 110” cellular automaton is one of the examples in this article.
The rule for a cellular automaton can be easily represented as an associative array that maps each possible neighborhood to a new value. This is very simple and intuitive. It is easy to implement cellular automata in programming languages with built-in associative array data types. This is illustrated in Octave in the examples in this article.
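A minimal sketch of this mapping follows, assuming the 'I' and 'O' cell symbols used in the examples in this article. It builds the rule structure from the rule number by reading off its eight binary digits, one per neighborhood, and then applies a single update to a short row of cells. The variable names and the seven-cell row are hypothetical choices for illustration.

```octave
rule_number = 30;

% the eight binary digits of the rule number, most significant first,
% give the new value for neighborhoods III, IIO, ..., OOO in that order
bits = bitget(rule_number, 8:-1:1);
neighborhoods = {"III", "IIO", "IOI", "IOO", "OII", "OIO", "OOI", "OOO"};
symbols = "OI";                      % 0 maps to 'O', 1 maps to 'I'

rule = struct();
for n = 1:8
  rule.(neighborhoods{n}) = symbols(bits(n) + 1);
end

% one update of a short row; the two end cells are left unchanged
cells = "OOOIOOO";
next = cells;
for c = 2:length(cells)-1
  next(c) = getfield(rule, cells(c-1:c+1));
end

printf("%s\n%s\n", cells, next);
```

Changing rule_number to 110 produces the rule 110 mapping in the same way.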
automata.m
% Description:
%
% implementation of rule 30 and rule 110 cellular automata using an
% associative array in Octave
%
% Author: John F. McGowan, Ph.D.
% E-Mail: jmcgowan11@earthlink.net
%
universe_size = 600;
seed_start = universe_size/2;

rule30 = struct("III", "O", "IIO", "O", "IOI", "O", "IOO", "I", ...
                "OII", "I", "OIO", "I", "OOI", "I", "OOO", "O");
% use Octave struct as an associative array (also known as dictionary or mapping or hash table)

myseed = repmat('O', 1, universe_size);
myseed(seed_start) = 'I';

myimage = simulate_ca(myseed, rule30, seed_start);
figure(1);
imshow(myimage);
title('Rule 30 Cellular Automaton');
print('rule30.jpg');

% rule 110
% proven to be Turing Complete
%
rule110 = struct("III", "O", "IIO", "I", "IOI", "I", "IOO", "O", ...
                 "OII", "I", "OIO", "I", "OOI", "I", "OOO", "O");

myimage = simulate_ca(myseed, rule110, seed_start);
figure(2)
imshow(myimage);
title('Rule 110 Cellular Automaton');
print('rule110.jpg');

disp('ALL DONE');
which uses the function simulate_ca defined in the following code:
simulate_ca.m
function [result] = simulate_ca(myseed, rule, niter)
% simulate_ca(myseed, rule, niter)
%
% simulate <niter> iterations of a 1D cellular automaton specified by <rule>
% starting with the seed <myseed>
%
% Author: John F. McGowan, Ph.D.
% E-Mail: jmcgowan11@earthlink.net
%
  rule_name = inputname(2);
  update = repmat('O', 1, length(myseed));
  previous = myseed;

  nx = length(myseed);
  ny = niter + 1;
  result = zeros(ny, nx, "uint8");

  lut = ones(1, 26)*255;  % white background
  lut(9) = 0;             % map I (the 9th letter) to min gray scale level

  printf("rule: %s\n", rule_name); fflush(stdout);
  printf("%s\n", previous); fflush(stdout);

  index = uint8(previous) - uint8('A') + 1;
  binary = lut(index);
  result(1,:) = binary;

  for iter = 1:niter  % iterations
    % loop over every three-cell neighborhood; the two edge cells stay 'O'
    for i = 1:length(myseed)-2
      key = previous(i:i+2);
      update(i+1) = getfield(rule, key);
    end  % loop over cells
    printf("%s\n", update); fflush(stdout);
    previous = update;
    index = uint8(previous) - uint8('A') + 1;
    binary = lut(index);
    result(iter+1,:) = binary;
  end  % iterations
end  % function simulate_ca
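The two rule tables above follow the standard Wolfram numbering, in which bit n of the rule number (counting from the least significant bit) gives the new value for the neighborhood whose three cells spell the binary number n. The following is a minimal sketch of building the same kind of struct from a rule number. The variable names rule_number and symbols are introduced here for illustration and are not part of the original scripts.

```octave
% Sketch: build the struct-based rule table from a Wolfram rule number.
rule_number = 110;      % try 30 to get the rule 30 table instead
symbols = 'OI';         % 'O' stands for 0, 'I' stands for 1

rule = struct();
for n = 0:7
  bits = bitget(n, 3:-1:1);        % neighborhood as [left center right] bits
  key = symbols(bits + 1);         % e.g. [1 1 0] -> 'IIO'
  rule.(key) = symbols(bitget(rule_number, n + 1) + 1);
end

disp(rule);
```

A struct built this way can be passed to simulate_ca in place of the hand-written rule30 or rule110 tables.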
The Timing of Associative Arrays in Octave
The code below tests the timing of associative arrays in Octave. As expected if a hash table is used to implement an associative array, the lookup time is largely independent of the size of the associative array in Octave, which is good. As with many features in Octave and other mathematical scripting tools, the lookup time is quite slow.
For example, on a 3 GHz Macintosh running OS X, the lookup time varied between 1 and 10 milliseconds. This is much slower than a compiled implementation of a hash table in the C programming language or another fast compiled language. Although languages such as Octave are slowly closing the execution speed gap with compiled languages such as C, the compiled languages still win handily in some cases.
dictionary_time.m
%
% measure timing of associative arrays in Octave
% demonstrate lookup time is not slowed by number of keys (fields)
% in the associative array/dictionary. Works like a hash table.
%
% Author: John F. McGowan, Ph.D.
% E-Mail: jmcgowan11@earthlink.net
%
%
clear small_dict;
small_dict = struct('key', 'value');  % make sure small_dict is defined
clear big_dict;
big_dict = struct('key', 'value');    % make sure big_dict is defined

offset = uint8('A') - 1;

n = 3;
if(n > 26)  % 26 characters in English alphabet
  n = 26;
end

[total, user, system] = cputime();
for i = 1:n
  for j = 1:n
    for k = 1:n
      key = char([i j k] + offset);
      value = ['value' key];
      % printf("key: %s value: %s\n", key, value);
      small_dict = setfield(small_dict, key, value);
    end
  end
end
[total1, user1, system1] = cputime();
printf("CPU TIME: %f\n", total1 - total);

printf("creating big dictionary\n"); fflush(stdout);

% create big dictionary (associative array)
n = 26;
if(n > 26)  % 26 characters in English alphabet
  n = 26;
end

[total, user, system] = cputime();
for i = 1:n
  printf("creating fields starting with %s\n", char(i + offset)); fflush(stdout);
  for j = 1:n
    for k = 1:n
      key = char([i j k] + offset);
      value = ['value' key];
      % printf("key: %s value: %s\n", key, value);
      big_dict = setfield(big_dict, key, value);
    end
  end
end
[total1, user1, system1] = cputime();
printf("CPU TIME: %f\n", total1 - total);

% measure access time
[total, user, system] = cputime();
blatz = getfield(small_dict, 'AAA');
[total1, user1, system1] = cputime();
printf("%f seconds to retrieve %s for AAA from small dict\n", total1 - total, blatz);

[total, user, system] = cputime();
blatz = getfield(big_dict, 'AAA');
[total1, user1, system1] = cputime();
printf("%f seconds to retrieve %s for AAA from big dict\n", total1 - total, blatz);

disp('ALL DONE');
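A single lookup is close to the resolution of cputime, so a steadier estimate can be had by timing many lookups and dividing. The following is a rough sketch of that idea using tic and toc. The dictionary d, its key names, and the lookup count are made up for illustration, and the time per lookup will vary by machine and Octave version.

```octave
% Sketch: average the cost of many struct-field lookups instead of timing one.
d = struct();
for k = 1:1000
  d = setfield(d, sprintf('key%d', k), k);   % fields key1 ... key1000
end

nlookups = 10000;
tic;
for k = 1:nlookups
  v = getfield(d, 'key500');
end
elapsed = toc;

printf('%g microseconds per lookup\n', 1e6 * elapsed / nlookups);
```

The loop overhead is included in the measurement, so the result is an upper bound on the lookup cost rather than an exact figure.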
Conclusion
There are associative arrays, also known as dictionaries, maps, mappings, or hash tables, in Octave. This is poorly documented both in the official documentation and most online information about Octave. One can perform the same tasks and implement the same algorithms with the associative arrays (data structures or structs) in Octave that one can with the explicitly identified associative array data types in Perl, Python, Ruby, Java, and many other modern programming languages. Associative arrays are very useful for implementing certain kinds of mathematics in Octave such as, but not limited to, cellular automata.
© 2011 John F. McGowan
About the Author
John F. McGowan, Ph.D. is a software developer, research scientist, and consultant. He works primarily in the area of complex algorithms that embody advanced mathematical and logical concepts, including speech recognition and video compression technologies. He has extensive experience developing software in C, C++, Visual Basic, Mathematica, MATLAB, and many other programming languages. He is probably best known for his AVI Overview, an Internet FAQ (Frequently Asked Questions) on the Microsoft AVI (Audio Video Interleave) file format. He has worked as a contractor at NASA Ames Research Center involved in the research and development of image and video processing algorithms and technology. He has published articles on the origin and evolution of life, the exploration of Mars (anticipating the discovery of methane on Mars), and cheap access to space. He has a Ph.D. in physics from the University of Illinois at Urbana-Champaign and a B.S. in physics from the California Institute of Technology (Caltech). He can be reached at jmcgowan11@earthlink.net.
This is the fourth article in an occasional series of articles about Octave, a free open-source numerical programming environment that is mostly compatible with MATLAB. This series began with the article Octave: An Alternative to the High Cost of MATLAB. This article discusses plotting and graphics in Octave. Octave has extensive plotting and graphics features including two-dimensional plots, histograms, three dimensional plots, and a range of specialized plots and charts such as pie charts. This article gives an overview of the key plotting and graphics features of Octave, discusses a few gotchas, and gives several illustrative examples.
Octave can plot functions and data using the built-in plot function. To illustrate these features, this article uses data on the price of gasoline in the United States from the United States Energy Information Administration, part of the US Department of Energy. The data is taken from the Motor Gasoline Retail Prices, U.S. City Average report released March 29, 2011. This data is currently available in Adobe Acrobat PDF file format, a comma separated value (CSV) ASCII text format, and Microsoft Excel (XLS) spreadsheet format. The data from the CSV report was imported into Microsoft Excel, exported as an ASCII tab delimited text file, and then cut and pasted into two tab-delimited ASCII files. The EIA gas price report contains several time series in a single file (leaded gas prices, unleaded gas prices, and several more). The leaded and unleaded gas prices were extracted manually in the Notepad++ text editor. These data files were named leaded.txt and unleaded.txt.
unleaded = dlmread('unleaded.txt');
The plot of unleaded gasoline prices above was generated using the following Octave code:
unleaded = dlmread('unleaded.txt');
unleaded_rawtime = unleaded(:,2);
unleaded_rawprice = unleaded(:,3);
unleaded_good_index = find(unleaded_rawprice < 10);
unleaded_time_x = unleaded_rawtime(unleaded_good_index);
unleaded_price = unleaded_rawprice(unleaded_good_index);
unleaded_time = fix(unleaded_time_x/100) + rem(unleaded_time_x,100)*(1./12.);
plot(unleaded_time, unleaded_price);
title('Unleaded Regular Gas Price');
xlabel('Year');
ylabel('US Dollars Per Gallon');
print('unleaded.jpg');
A few comments may be helpful. dlmread is an Octave function that reads ASCII data files. It is fairly flexible and can often automatically identify the separator used in ASCII data files such as the tab or a comma. If necessary, the user can explicitly specify the separator and other parameters of the data file. Nonetheless it is common to encounter data files with various quirks. For example, the EIA gas price report contains several time series in a single file. There are many months for which either data is not available or not reported; these are indicated by a value of 10000000 for the gas price. The code above uses the Octave find function to select the valid data. Further, the year and month are combined in the format YYYYMM, so January 1970 is “197001”, February 1970 is “197002”, and so forth. Used directly in the Octave plot function, these values produce a misleading plot, because the step from December (197012) to the following January (197101) is far larger than the step between consecutive months within a year. Thus, the example code above uses the Octave fix and rem functions to compute a time in years (month numbers are converted to fractions of a calendar year). Octave has numerous advanced functions such as find, fix, rem, and so forth that can be used to clean up and reformat data as needed.
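The two clean-up steps described above can be tried on a handful of values without the EIA file. In the sketch below the numbers are made up to mimic the layout of the report and are not real EIA data; the variable names are likewise chosen for illustration.

```octave
% Sketch: drop the "missing" marker and convert YYYYMM to fractional years.
% Made-up sample values, not real EIA data.
rawtime  = [197001; 197002; 197003; 197012; 197101];
rawprice = [0.36; 10000000; 0.36; 0.37; 0.37];

good = find(rawprice < 10);      % keep rows without the 10000000 marker
yyyymm = rawtime(good);
price  = rawprice(good);

year  = fix(yyyymm / 100);       % 197012 -> 1970
month = rem(yyyymm, 100);        % 197012 -> 12
time_in_years = year + month / 12;

printf('%d -> %.4f\n', [yyyymm'; time_in_years']);
```

With this convention December maps to the start of the following year (1970 + 12/12), which matches the conversion used in the article's code.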
The Octave function plot handles the actual plotting of the graph. The Octave plot function is a very versatile plotting function for two dimensional data such as time series. Both the Octave user manual and the built-in help (help plot) provide detailed information on the use of plot.
Octave has built in support for histograms. A histogram is a way of displaying the frequency of occurrence of data or events. One might, for example, be interested in how often gas prices change by one percent, two percent, or ten percent in one month.
By default, the Octave hist function creates a histogram with ten bins, which is often not very useful. One can specify more bins easily.
Probability density functions are normalized to unity (1.0). With Octave one can easily generate histograms normalized to 1.0.
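As a small sketch of the normalization, Octave's hist accepts a third argument that rescales the bars so that they sum to the given value. When called with output arguments it returns the bar heights instead of drawing a figure. The synthetic data and variable names below are for illustration only.

```octave
% Sketch: raw counts versus bars normalized to sum to 1.0.
data = randn(1, 1000);                       % synthetic data, illustration only

[counts, centers]  = hist(data, 50);         % 50 bins, raw counts
[weights, centers] = hist(data, 50, 1.0);    % same bins, bars sum to 1.0

printf('sum of counts  = %d\n', sum(counts));
printf('sum of weights = %g\n', sum(weights));
```

Note that bars summing to 1.0 give the fraction of samples in each bin. To compare directly with a probability density, the bar heights would also need to be divided by the bin width.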
Now the values in the histogram of percent changes in gas prices are normalized to 1.0. The histogram is an estimate of the probability density function for gas price changes. One might wonder whether this distribution is the Gaussian probability density, also known as the Normal or Bell Curve distribution. In fact, the histogram looks narrower than a typical Gaussian and also has some outliers, a long tail, which are not typical of a true Gaussian distribution. The next two figures show the Gaussian probability density function and a histogram of synthetic data generated with Gaussian statistics using the same mean and variance as the actual gas price data.
It is easy to see that the Gaussian, or Normal or Bell Curve, distribution differs from the distribution of the gas price data. The actual data has a narrower, sharper peak and long tails. Every now and then, gas prices jump sharply. This is a pattern seen in many financial and other kinds of assets. Many popular financial models such as the Black-Scholes option pricing model assume Gaussian or near-Gaussian distributions, which generally understates the risks when a financial asset or commodity has long non-Gaussian tails as gasoline does.
The histograms and related plots above were generated using the following Octave code:
len = length(unleaded_price);
change = diff(unleaded_price);                     % price(k+1) - price(k)
returns = change ./ unleaded_price(1:len-1);       % relative to the earlier price
returns = 100.0 * returns;                         % convert to percent

plot(unleaded_time(2:len), returns, 'g;return;');
title('Returns on Gas');
xlabel('Year');
ylabel('Return (Percent)');
print('gas_returns.jpg');

printf('making plot 2\n'); fflush(stdout);
figure(2)
hist(returns);
title('hist(returns)');
xlabel('Return (Percent)');
ylabel('Counts');
print('hist_returns.jpg');   % print after the labels so they appear in the file

printf('making plot 3\n'); fflush(stdout);
figure(3)
hist(returns, 100);
title('hist(returns, 100) (100 BINS)');
xlabel('Return (Percent)');
ylabel('Counts');
print('hist100_returns.jpg');

printf('making plot 4\n'); fflush(stdout);
figure(4)
hist(returns, 100, 1.0);
title('Normalized Returns');
xlabel('Return (Percent)');
ylabel('Normalized Counts');
print('hist_norm_returns.jpg');

printf('making plot 5\n'); fflush(stdout);
figure(5)
m = mean(returns);
sigma = std(returns);
x = (-100:100);
plot(x, mygauss(x, m, sigma));
title('Gaussian with Same Mean/Standard Deviation');
xlabel('Return (Percent)');
ylabel('Probability');
print('gauss_model.jpg');

% synthetic Gaussian data with the same mean and standard deviation
k_returns = kurtosis(returns);
check = sigma*randn(1, length(returns)) + m;
k_check = kurtosis(check);
m_check = mean(check);
sigma_check = std(check);

printf('making plot 6\n'); fflush(stdout);
figure(6)
hist(check, 100, 1.0);
title('Histogram of Gaussian Model');
xlabel('Percent Change');
ylabel('Counts');
print('hist_gauss.jpg');

% let user know processing is done
% in case figure does not pop in front
printf('ALL DONE'); fflush(stdout);
beep()
and the mygauss function which implements the Gaussian (Normal/Bell Curve) probability density function:
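function [result] = mygauss(x, mu, sigma)
  % Gaussian (Normal) probability density with mean mu and standard deviation sigma
  norm_const = 1.0/(sqrt(2*pi) * sigma);
  exponent = -0.5*((x - mu) ./ sigma).^2;   % note the factor of 1/2 in the exponent
  result = norm_const * exp(exponent);
end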
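The Gaussian density with mean mu and standard deviation sigma is exp(-(x - mu)^2 / (2 sigma^2)) / (sqrt(2 pi) sigma). A quick sanity check for any implementation is that the area under the curve is 1. The sketch below uses an anonymous function named gauss_pdf, a name introduced here for illustration, and integrates it numerically with trapz.

```octave
% Sketch: check numerically that the Gaussian density integrates to 1.
gauss_pdf = @(x, mu, sigma) exp(-0.5 * ((x - mu) ./ sigma).^2) ./ (sqrt(2*pi) * sigma);

mu = 1.0;
sigma = 3.0;
x = linspace(-50, 50, 20001);           % wide range, fine spacing
area = trapz(x, gauss_pdf(x, mu, sigma));

printf('area under the curve = %.6f\n', area);
```

If the factor of 1/2 is left out of the exponent, the curve is too narrow and the same check gives an area noticeably below 1.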
Octave supports most common special plots such as stairs plots, bar charts, pie charts, and so forth.
This is a stairs plot of the gas price time series data for the first few years (notice the steps or stairs in the plot). This is generated using the Octave stairs function.
This is the so-called stem plot (Octave stem function):
This is a bar chart (Octave bar function). On Windows, the bar function seems to have trouble with large amounts of data, unlike the other plotting functions:
Horizontal Bar Chart
This is a horizontal bar chart (Octave barh function). On Windows, the bar function seems to have trouble with large amounts of data unlike the other plotting functions.
Full Source Code
The plots above were generated using the following Octave code:
% regular plot
figure(1)
plot(unleaded_time(1:36), unleaded_price(1:36));
title('plot(time, price) Regular Plot');
xlabel('Year');
ylabel('Price');
print('regular_plot.jpg');

% stairs plot
figure(2)
stairs(unleaded_time(1:36), unleaded_price(1:36));
title('stairs(time, price) Stairs Plot');
xlabel('Year');
ylabel('Price');
print('stairs_plot.jpg');

% stem plot
figure(3)
stem(unleaded_time(1:36), unleaded_price(1:36));
title('stem(time, price) Stem Plot');
xlabel('Year');
ylabel('Price');
print('stem_plot.jpg');

% bar chart
figure(4)
bar(unleaded_time(1:10), unleaded_price(1:10));
title('bar(time, price) Bar Chart');
xlabel('Year');
ylabel('Price');
print('bar_plot.jpg');

% horizontal bar chart: the price runs along the horizontal axis
figure(5)
barh(unleaded_price(1:10));
title('barh(price) Horizontal Bar Chart');
xlabel('Price');
ylabel('Sample Number');
print('barh_plot.jpg');

beep();
Octave can generate standard pie charts using the pie function:
This is the Octave code for the pie chart above (note the use of the cellstr('country name') syntax, which is needed):
% allegedly proven oil reserves pie chart
% Venezuela     297
% Saudi Arabia  267
% Canada        179
% Iraq          143
% Iran          138
% Kuwait        104
reserves = [ 297, 267, 179, 143, 138, 104 ];
names = [ cellstr('Venezuela (297)'), cellstr('Saudi Arabia (267)'), ...
          cellstr('Canada (179)'), cellstr('Iraq (143)'), ...
          cellstr('Iran (138)'), cellstr('Kuwait (104)') ];
pie(reserves, names, [1 0 0 0 0 0]);
title('Pie Chart of Six Largest Oil Nations');
xlabel('Billions of Barrels of Oil');
print('pie_reserves.jpg');
The third argument to pie (after names) tells Octave to “explode” the first wedge (Venezuela).
Octave has functions to plot and display three dimensional data and functions:
The two 3D plots above were generated using the built-in peaks and sombrero test functions. It is possible to compute and display almost any 3D surface using the meshgrid function and 3D display functions such as plot3, surf, mesh, contour, and quiver.
For people coming from another type of programming such as the C family of languages, the meshgrid concept and function is new and may take a little getting used to. A meshgrid is basically a very simple concept which is also very powerful. Octave represents almost everything as a “matrix” or multi-dimensional array. This is the source of much of the power of Octave. One can often avoid explicitly coding loops over the elements of the Octave matrix. This speeds development and reduces errors.
A meshgrid is a two (or higher) dimensional array in which the elements of the array are the spatial location (x or y coordinate usually) of the associated element of a spatial grid (the mesh grid). Here is some Octave code which explicitly computes and plots the sombrero function:
ticks = [-10.0:0.5:10.0];
[x, y] = meshgrid(ticks, ticks);
r = sqrt(x.^2 + y.^2) + eps;   % eps avoids 0/0 (NaN) at the origin
z = sin(r) ./ r;
surf(x,y,z);
title('sombrero using meshgrid');
print('sombrero_meshgrid.jpg');
In this example, the Octave function meshgrid returns two arrays x and y which contain the x and y coordinates respectively for the mesh grid elements. In this case, the x and y positions of the grid are specified by the one dimensional ticks array. The ticks run from -10.0 to 10.0 in steps of 0.5, a total of 41 ticks. The x array generated by meshgrid is a 41 by 41 array with the x coordinate of each element. The y array is a 41 by 41 element array with the y coordinate of each element.
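The layout of the two arrays is easiest to see on a tiny grid. The following sketch uses three ticks instead of 41; the variable r2 is introduced here only to show an element-by-element computation on the grid.

```octave
% Sketch: what meshgrid returns for a 3-tick grid.
ticks = -1:1;
[x, y] = meshgrid(ticks, ticks);

disp(x);    % every row is [-1 0 1]: the x coordinate varies along columns
disp(y);    % every column is [-1; 0; 1]: the y coordinate varies along rows

r2 = x.^2 + y.^2;   % squared distance from the origin at every grid point
disp(r2);
```

Each element of r2 is computed from the matching elements of x and y, with no explicit loop over the grid.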
The meshgrid enables one to express 3D surfaces or functions in Octave in a simple intuitive compact way:
z = sin (sqrt (x.^2 + y.^2)) ./ (sqrt (x.^2 + y.^2));
In this example, z is a two dimensional array (matrix) with the function value at the x and y coordinates specified by each grid point in the arrays x and y. Then, one can display the surface using display functions such as surf, mesh, and so forth. It usually takes some practice to get used to using meshgrid if one is not familiar with the concept.
Octave can create contour plots using the contour function.
Octave can create vector field plots using the quiver (as in quiver of arrows) function:
These 3D plots were generated with the following Octave code:
% 3d graphics

% tests
figure(1);
sombrero();
print('sombrero.jpg');
pause(1);

figure(2);
peaks();
print('peaks.jpg');

% grid and function values shared by the remaining plots
ticks = [-10.0:0.5:10.0];
[x, y] = meshgrid(ticks, ticks);
r = sqrt(x.^2 + y.^2) + eps;   % eps avoids 0/0 (NaN) at the origin
z = sin(r) ./ r;

% meshgrid
figure(3);
surf(x,y,z);
title('sombrero using meshgrid');
print('sombrero_meshgrid.jpg');

% plot3
figure(4);
plot3(x,y,z);
title('sombrero using plot3');
print('sombrero_plot3.jpg');

% mesh
figure(5);
mesh(x,y,z);
title('sombrero using mesh');
print('sombrero_mesh.jpg');

% contour
figure(6);
contour(x,y,z);
title('sombrero using contour');
print('sombrero_contour.jpg');

% quiver
figure(7);
theta = atan2(y, x);
quiver(x,y, z.*cos(theta), z.*sin(theta));
title('sombrero vector field using quiver');
print('sombrero_quiver.jpg');

% all done
disp('ALL DONE!');
beep();
Octave has cryptic error messages. These messages almost always correctly identify the line of code that is in error. The verbal description of the error is often incomprehensible and may be wrong. The column number reported for the location of an error in the line is often wrong, for example indicating the start of the expression on the right hand side of an assignment statement where the problem is later in the line of code.
If a user cannot spot the error by reading the line of code, a common occurrence, it is usually best to convert the line of code into several lines of code, with each new line of code representing a sub-expression of the original line of code. This approach will usually narrow the error/bug down to a specific symbol and identify the specific error.
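As a small illustration of this approach, the sombrero expression used earlier can be split into one sub-expression per line. The intermediate names r2, r, and s are introduced here for illustration; if one of the steps were wrong, the error message would then point at a line containing a single operation.

```octave
% Sketch: the same computation as one line and as separate sub-expressions.
ticks = 0.5:0.5:2.0;                 % small grid that avoids the origin
[x, y] = meshgrid(ticks, ticks);

% original one-line form
z_one_line = sin(sqrt(x.^2 + y.^2)) ./ sqrt(x.^2 + y.^2);

% one sub-expression per line
r2 = x.^2 + y.^2;
r  = sqrt(r2);
s  = sin(r);
z  = s ./ r;

disp(max(abs(z(:) - z_one_line(:))));   % difference between the two forms
```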
Octave supports both true-matrix operations and element by element (aka element-wise) operations. For example, A*B is true-matrix multiplication if A and B are matrices. A.*B is element by element multiplication in which each element is multiplied by its corresponding element in the other matrix. It is easy to mistakenly use * where one should use .* or .* where one should use * in Octave. Pay close attention to the distinction between the true matrix and element-by-element operators.
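The difference is easy to see with two small matrices. The values of A and B below are arbitrary and chosen only for illustration.

```octave
% Sketch: true matrix product versus element-by-element product.
A = [1 2; 3 4];
B = [5 6; 7 8];

matrix_product  = A * B;    % rows of A times columns of B
element_product = A .* B;   % each element times its counterpart

disp(matrix_product);       % [19 22; 43 50]
disp(element_product);      % [ 5 12; 21 32]
```

For square matrices of the same size both operators run without any error, which is why this mistake can go unnoticed.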
Octave has extensive built-in plotting and graphics functions. There are a few weaknesses, notably some problems with the bar chart functions, at least in the Windows version of Octave 3.2.4. Users coming from a different type of programming background such as the C family of languages may need a little time and practice to adjust to the meshgrid concept. The plotting and graphics functions of Octave are more than adequate for all common scientific, engineering, and general analytical tasks, both two and three dimensional.
© 2011 John F. McGowan
There are strong practical reasons for applied mathematical research and development and programming projects. Many potentially beneficial projects exist. These projects often suffer from the “cure for cancer” problem. With several hundred thousand people each year in the United States alone succumbing to cancer, there is little question that there is a large market for a cure for cancer. The problem is that we do not know how to cure cancer. Similarly, successful mathematical research and development and programming projects offer everything from profitable investment advice to speech recognition for mobile devices and household appliances to working fusion reactors and other new energy sources. Indeed, a cure for cancer is something that mathematical methods may offer in the future through molecular modeling or other quantitative approaches. Given the huge potential markets for successful mathematical projects, it is common to encounter individuals, organizations, and companies with great interest in particular, usually practical mathematical projects. These projects are highly unlikely to succeed without accurate ideas about the scope of the projects.
The Scope of Some Successful Free Open-Source Mathematical Software Projects
Program | Lines of Code | Core Lines of Code | Calendar Duration | Number of Contributors |
FFMPEG 0.6.1 Video Encoder | 373,742 | 368,457 | at least 2004-2011 | 50 |
x264 h.264 Video Encoder (x264-snapshot-20110204-2245) | 67,986 | 62,968 | at least 2004-2011 | 18 |
Independent JPEG Group JPEG encoder/decoder v8c | 61,102 | 52,304 | at least 2000-2011 | 13 |
Open CV 2.2.0 Computer Vision Library | 884,808 | 396,399 | at least 1999-2011 | 80 |
Insight Toolkit 3.20.0 Image Segmentation and Registration Toolkit | 698,143 | 685,466 | at least 1999-2011 | at least 14 |
Pythia/Lund Monte Carlo 8.145 Particle Physics Event Simulation (C++ version) | 141,353 | 46,258 | 1977-2011 | 5 |
Pythia/Lund Monte Carlo 6.327 Particle Physics Event Simulation (last FORTRAN Version) | 60,455 | 60,455 | 1977-1996 | 5 |
EGS (Electron Gamma Shower) | 38,921 | 31,151 | 1950s-2011 | unknown |
LAPACK 3.3.0 Linear Algebra Library | 459,993 | 458,645 | at least 1970s to present | many contributors (probably over 100) |
AESCRYPT Encryption/Decryption Utility | 4,331 | 4,286 | 2001-2009 | at least 2 |
GNU Privacy Guard (GNUpg) v. 1.4.11 | 148,374 | 120,441 | at least 1998 to 2008 | 47 |
Octave 3.2.4 Numerical Programming Tool | 539,233 | 453,160 | at least 1980s to present | many contributors (probably over 100) |
Notes
The free, open-source CLOC (Count Lines of Code) utility was used to count the number of lines of code in each project. CLOC lists the number of lines of code in each programming language in the project such as C, C++, Bourne Shell, HTML, and so forth. CLOC does not count blank lines or comment lines. Some projects include sizable amounts of installation code (in the Unix Bourne Shell for example), HTML documentation, and so forth which is counted in the total number of lines of code reported by CLOC. The actual mathematical code is typically implemented in a few languages such as C, C++, FORTRAN, or MATLAB. The term “Core Lines of Code” refers to the lines of code in these languages, as reported by CLOC, which is presumed to contain the actual mathematical software.
In general, open source projects provide a wealth of detailed information that is difficult or impossible to acquire for many commercial proprietary projects. In particular, one can see the source code, count the lines of code or other measures of size and scope, and often read comments, change logs, logs of version control systems, and so forth. Nearly all open source projects give a list of contributors somewhere in the documentation and provide rough information on the calendar duration of the project. There is usually precise information on releases and release dates. Unfortunately, it is difficult to get a reasonably exact measure of the actual effort expended on the project. Most open source projects do not publish information on exact hours worked or dollars expended, even if records exist. Several of the examples were fully or partially funded either by government funding agencies (e.g. the National Library of Medicine for the Insight Toolkit) or private sources (e.g. Intel for OpenCV), so such detailed information may be available in some cases.
The Examples
The examples were chosen as successful free open-source projects widely used within their field or application with a quality comparable to or superior to good commercial software products. Several such as FFMPEG and x264 are highly applied and used in the everyday world. Several such as the Pythia/Lund Monte Carlo are primarily scientific research tools. Some such as Octave and LAPACK span both worlds.
FFMPEG is a widely used open source audio/video encoding utility and collection of libraries. FFMPEG can encode and decode a wide range of different audio and video formats and compression schemes including h.264. It incorporates a number of other utilities and libraries. x264 is a widely used open source h.264 video encoder. The Independent JPEG Group distributes a widely used open source JPEG image encoder and decoder. Open CV is a widely used computer vision library incorporating many of the current state of the art computer vision algorithms; it is used in research and in a few commercial products. The Insight Toolkit is a toolkit of image segmentation and registration algorithms, somewhat similar to Open CV in practice, geared toward medical imaging.
The Pythia/Lund Monte Carlo is a widely used program for simulating the formation of jets of subatomic particles and other processes in experimental and theoretical particle physics, for example at the Large Hadron Collider (LHC) at CERN. Two versions, the original FORTRAN version and the more recent rewrite in C++, are listed. Electron Gamma Shower or EGS is a widely used program for simulating the interactions of electrons and photons (gamma rays and x-rays) with matter. It was originally developed for nuclear and particle physics at the Stanford Linear Accelerator Center (SLAC), but is now widely used for medical radiation studies. LAPACK is a widely used FORTRAN library of linear algebra and other basic numerical algorithms; it is often found in other programs as well. AESCRYPT is a free open-source implementation of the Advanced Encryption Standard (AES) for data encryption. GNU Privacy Guard (GNUpg) is a free, open-source implementation of the OpenPGP encryption standard. Octave is a free, open-source numerical programming tool that is mostly compatible with MATLAB. Octave has been discussed in previous articles by this author starting with Octave: An Alternative to the High Cost of MATLAB.
Actual Effort Estimation with Basic COCOMO
The Constructive Cost Model (COCOMO) is a software cost estimation model developed by Barry Boehm. Basic COCOMO is the original, very simple cost estimation model published by Boehm in his 1981 book Software Engineering Economics. It gives a simple, crude estimate of the effort in man-months as a function of the number of lines of code in a project. The following table gives the estimated effort in man-months/man-years from applying the “organic” Basic COCOMO model to the number of lines of code in each mathematical open source project in this article:
Program | Basic COCOMO Man-Months | Basic COCOMO Man-Years |
FFMPEG 0.6.1 | 1,204 | 100 |
x264 | 201.5 | 16.75 |
IJG v8c | 179.8 | 15 |
Open CV 2.2.0 | 2,982 | 248.5 |
Insight Toolkit | 2,324 | 193.7 |
Pythia/Lund 8.145 | 443 | 36 |
Pythia/Lund 6.327 | 178 | 14.8 |
EGS | 112 | 9.3 |
LAPACK 3.3.0 | 1,637 | 136.4 |
AESCRYPT | 11 | 0.9 |
GNU Privacy Guard (GNUpg) 1.4.11 | 456 | 38 |
Octave 3.2.4 | 1,771 | 147.6 |
The following Octave/MATLAB function was used to compute the estimated man-months using the Basic COCOMO “organic” model:
function [man_months, dev_time, people_required] = cocomo(kloc, type)
% [man_months, dev_time, people_required] = cocomo(kloc [, type])
%
% kloc (thousands of lines of code)
% type (type of project: 'organic', 'semi' (semi-detached), 'embedded')
%
  if nargin < 2
    type = 'organic';
  end

  c = 2.5;

  if strcmp(type, 'organic')
    a = 2.4;
    b = 1.05;
    d = 0.38;
  elseif strcmp(type, 'semi')  % semi detached
    a = 3.0;
    b = 1.12;
    d = 0.35;
  elseif strcmp(type, 'embedded')
    a = 3.6;
    b = 1.2;
    d = 0.32;
  else
    error('cocomo: unknown project type "%s"', type);
  end

  man_months = a*(kloc)^b;
  dev_time = c*(man_months)^d;
  people_required = man_months / dev_time;
end
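As a worked check of the arithmetic, the “organic” effort formula is man-months = 2.4 × KLOC^1.05. The sketch below applies it to the total line counts of x264 and AESCRYPT from the first table. The anonymous function basic_cocomo_organic is a stand-in introduced here so that the snippet runs on its own; it is not the cocomo function above.

```octave
% Sketch: Basic COCOMO "organic" effort for two of the projects in the table.
basic_cocomo_organic = @(kloc) 2.4 * kloc .^ 1.05;

loc = [67986, 4331];                  % x264 and AESCRYPT total lines of code
man_months = basic_cocomo_organic(loc / 1000);
man_years  = man_months / 12;

printf('%8d lines -> %7.1f man-months (%5.2f man-years)\n', ...
       [loc; man_months; man_years]);
```

The results come out at about 201.5 and 11 man-months respectively, in line with the corresponding rows of the table.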
Conclusion
While this data sample is clearly limited and a larger study is desirable, it should nonetheless be evident that successful mathematical programming projects are usually substantial. Even the smallest project on the list, the AESCRYPT encryption utility, probably took several man-months to fully develop; Basic COCOMO would estimate almost one year. Thus, expectations of a few weeks are generally unrealistic. Indeed, expectations of three calendar months, a fiscal quarter, the current fetish of American business, are usually unrealistic. On the other hand, expectations ranging from six months to several years may be realistic depending on the specific project.
In part because of heavy government funding of mathematical research and development, there are a large number of open-source, free mathematical programming projects available. This provides an excellent database of information on the size and scope of such projects, something often difficult to find for business applications where most products and projects are proprietary. Anyone considering such a mathematical project is well advised to examine comparable open source projects if they exist to determine the size and scope to the extent possible. Unfortunately, open source projects often can give only a rough measure of the actual effort (mythical man-months) used in the project. The Basic COCOMO model can provide a very rough way of estimating the actual effort of the open source project from the lines of code, but clearly a more direct way of measuring the actual effort is needed.
© 2011 John F. McGowan
About the Author
John F. McGowan, Ph.D. is a software developer, research scientist, and consultant. He works primarily in the area of complex algorithms that embody advanced mathematical and logical concepts, including speech recognition and video compression technologies. He has extensive experience developing software in C, C++, Visual Basic, Mathematica, MATLAB, and many other programming languages. He is probably best known for his AVI Overview, an Internet FAQ (Frequently Asked Questions) on the Microsoft AVI (Audio Video Interleave) file format. He has worked as a contractor at NASA Ames Research Center involved in the research and development of image and video processing algorithms and technology. He has published articles on the origin and evolution of life, the exploration of Mars (anticipating the discovery of methane on Mars), and cheap access to space. He has a Ph.D. in physics from the University of Illinois at Urbana-Champaign and a B.S. in physics from the California Institute of Technology (Caltech). He can be reached at jmcgowan11@earthlink.net.
Sponsor’s message: Check out Math Better Explained, an insightful ebook and screencast series that will help you see math in a new light and experience more of those awesome “aha!” moments when ideas suddenly click.
This is the third in a series of articles on Octave starting with Octave, An Alternative to the High Cost of MATLAB. Octave is a free (both free as in beer and free as in speech), MATLAB-compatible numerical programming tool available under the GNU General Public License. In part because MATLAB has become the de facto industry standard for numerical programming, Octave is of particular interest to individuals, companies, and organizations engaged in numerical and mathematical programming and research and development.
Octave has some limitations. The base Octave tool has no symbolic manipulation features. It is not a computer algebra system (CAS) such as Mathematica or Maple. It cannot, for example, perform symbolic integration or symbolic differentiation, factor polynomials, and so forth. Octave does have a symbolic toolbox available through the Octave Forge repository of Octave toolboxes, but the symbolic toolbox is quite limited. A better option for symbolic manipulation tasks is to use the Maxima computer algebra system in combination with Octave. Octave also lacks the ability to generate TeX or LaTeX mathematical output, the de facto standard of mathematical publication. Maxima can generate TeX output for inclusion in papers or WordPress blog posts.
Maxima
Maxima is a computer algebra system descended from MACSYMA, one of the original computer algebra systems. MACSYMA was developed at MIT in part for use in theoretical physics. Maxima is available both as source code and pre-compiled binaries for all three major computer platforms: Unix/Linux, Microsoft Windows, and Mac OS. Maxima is free software, both free as in beer and free as in speech, available under the GNU General Public License (GPL). wxMaxima is a Graphical User Interface (GUI) for Maxima available as both source code and pre-compiled binaries for all three major computer platforms. wxMaxima has human readable menu items and buttons for many common symbolic manipulation and mathematical functions. wxMaxima also has “notebooks” similar to Mathematica notebooks. There is considerable documentation on Maxima; interested readers are referred to the excellent online and published documentation on Maxima. This article is focused on using Maxima as an adjunct to Octave.
Maxima can perform both symbolic differentiation and integration. Symbolic differentiation is illustrated in the screen shot of Maxima above. Some optimization algorithms, used, for example, for model fitting, require the derivative of the function being optimized (usually minimized). If the function is complex, working out by hand its derivatives with respect to the parameters being optimized can be time consuming, tedious, and error prone. The author has used Maxima successfully to perform the differentiation of a model function. One can then convert the derivative produced by Maxima’s symbolic differentiation into an expression that can be used in Octave with the fortran(expression) command in Maxima, which generates FORTRAN code for the Maxima expression. In many cases, the FORTRAN expressions are identical to Octave/MATLAB mathematical expressions. In other cases, the FORTRAN code generated by Maxima must be edited slightly to create valid Octave/MATLAB code.
For example, the Cauchy-Lorentz distribution is a commonly used mathematical model of a peak.
The Cauchy-Lorentz distribution is the frequency response of a forced-damped harmonic oscillator. It is widely used in physics, mathematics, and engineering under a number of different names.
In using the Cauchy-Lorentz distribution to model a peak in data, one typically wants to determine the values of the parameters A, mu, and W, representing the magnitude of the peak (A), the position of the peak (mu), and the width of the peak (W), that best fit the data. To do this, some model fitting algorithms need the derivatives of the Cauchy-Lorentz distribution with respect to each parameter A, mu, and W.
This is the derivative of the Cauchy-Lorentz distribution with respect to the width parameter W from symbolic differentiation in Maxima:
This derivative is moderately complex. Calculating this derivative by hand is time consuming and error prone. Imagine computing the derivative of an extremely complex mathematical model with hundreds of terms by hand. The probability of error even by a highly-skilled mathematician is very high. It was for this reason that tools like Maxima were developed.
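In this simple case the derivative can still be checked by hand. As a sketch, assume the peak model has the form f = A / (1 + (x - mu)^2 / W^2). This form is an assumption inferred from the FORTRAN expression that Maxima generates below; it is not quoted from the text. The chain rule then gives:

```latex
\begin{aligned}
f(x; A, \mu, W) &= \frac{A}{1 + \dfrac{(x-\mu)^2}{W^2}} \\
\frac{\partial f}{\partial W}
  &= -\frac{A}{\left(1 + \dfrac{(x-\mu)^2}{W^2}\right)^{2}}
     \cdot \frac{\partial}{\partial W}\left(\frac{(x-\mu)^2}{W^2}\right) \\
  &= -\frac{A}{\left(1 + \dfrac{(x-\mu)^2}{W^2}\right)^{2}}
     \cdot \left(-\frac{2\,(x-\mu)^2}{W^3}\right) \\
  &= \frac{2\,(x-\mu)^2 A}{\left(\dfrac{(x-\mu)^2}{W^2} + 1\right)^{2} W^3}
\end{aligned}
```

The last line has the same structure as the expression produced by Maxima.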
This is the FORTRAN code generated by applying the Maxima fortran(expression) function to the Maxima expression for the derivative of the Cauchy-Lorentz distribution with respect to the width W:
2*(x-mu)**2*A/(((x-mu)**2/W**2+1)**2*W**3)
This is actually valid Octave code as it stands (MATLAB does not accept the FORTRAN power operator **, which must be replaced by ^). If the variables x, mu, A, and W are scalar variables in Octave, this FORTRAN expression will evaluate correctly. Here is the calculation in Octave when x, mu, A, and W are all scalar variables with the value 1.0; the result is 0 because x - mu is 0.
octave-3.2.4.exe:25> x = 1
x = 1
octave-3.2.4.exe:26> 2*(x-mu)**2*A/(((x-mu)**2/W**2+1)**2*W**3)
ans = 0
octave-3.2.4.exe:27>
However, in Octave the variable x is often a vector. If x is a vector, the expression above will produce an error in Octave:
octave-3.2.4.exe:25> 2*(x-mu)**2*A/(((x-mu)**2/W**2+1)**2*W**3)
error: for A^b, A must be square
octave-3.2.4.exe:25>
The reason for this error is that in Octave and MATLAB some operators, such as *, /, and ^ (written ** in the FORTRAN expression), are not interpreted as element by element operators when applied to vectors and matrices. For example, the operator * is matrix multiplication by default in Octave and MATLAB. An element by element operator is an operator that is applied separately to each element in each vector or matrix that is an operand. In Octave and MATLAB, the element by element operators are .*, ./, and .^ (the Octave session shown here also accepts the FORTRAN-style spelling .**); addition and subtraction are always element by element, so + and - need no dot. For example, if one has two row vectors a and b in Octave, the operator * will give an error:
octave-3.2.4.exe:34> a
a =
   1   2   3   4   5
octave-3.2.4.exe:35> b
b =
   1   2   3   4   5
octave-3.2.4.exe:36> a * b
error: operator *: nonconformant arguments (op1 is 1x5, op2 is 1x5)
octave-3.2.4.exe:36>
However, in Octave and MATLAB, one can multiply each element of each vector by the corresponding element of the other vector using the element by element operator .* thus:
octave-3.2.4.exe:36> a .* b
ans =
    1    4    9   16   25
octave-3.2.4.exe:37>
In the output of the element by element (or elementwise) operator .*, the first element is 1*1, the second element is 2*2, and so forth.
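The difference between the matrix operators and the element by element operators can be sketched with the same two vectors. This is only an illustration; the variable names on the left-hand sides are made up for the example.

```octave
a = [1 2 3 4 5];
b = [1 2 3 4 5];

elementwise = a .* b;   % multiplies matching elements: 1*1, 2*2, ...
inner       = a * b';   % matrix product of a 1x5 row and a 5x1 column (a scalar)
squares     = a .^ 2;   % element by element power, same values as a .* a
sums        = a + b;    % + is already element by element, no dot needed

disp(elementwise);
disp(inner);
```

Transposing b with ' makes the dimensions conformant for matrix multiplication, which is why a * b' succeeds where a * b fails.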
Thus, the FORTRAN expressions generated by Maxima are not valid Octave/MATLAB code for vectors and matrices, only for scalar variables. One can convert the FORTRAN expression to a valid Octave expression for vectors by converting the non-elementwise operators to element by element operators where the operands are vectors, usually by adding a preceding dot. Here is the edited code for the example derivative:
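function [result] = mydiff_edited(x, A, mu, W)
% function [result] = mydiff_edited(x, A, mu, W)
% FORTRAN code from wxMaxima edited to support an array x as input
% (dots added for element by element operations; the FORTRAN power
% operator ** is written as ^ so that the code also runs in MATLAB)
%
  result = 2*(x-mu).^2*A ./ (((x-mu).^2 ./ W.^2+1).^2*W^3);
end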
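One way to gain some confidence in an edited expression like this is to compare it against a numerical derivative. The sketch below does this with a central finite difference. The model function is an assumption (the form A / (1 + (x - mu)^2 / W^2) inferred from the derivative), and the names model, dmodel_dW, and h are hypothetical names introduced for this example.

```octave
% Assumed peak model (inferred from the derivative, not quoted from the article)
model = @(x, A, mu, W) A ./ (1 + (x - mu).^2 ./ W.^2);
% Element by element form of the Maxima-generated derivative with respect to W
dmodel_dW = @(x, A, mu, W) 2*(x - mu).^2*A ./ (((x - mu).^2 ./ W.^2 + 1).^2 * W^3);

x = 0.0:0.1:10.0;            % x values from 0.0 to 10.0 in steps of 0.1
A = 1.0; mu = 1.0; W = 1.0;  % same parameter values used in this article
h = 1e-6;                    % finite difference step (arbitrary small value)

% Central difference approximation of the derivative with respect to W
numeric  = (model(x, A, mu, W + h) - model(x, A, mu, W - h)) / (2*h);
analytic = dmodel_dW(x, A, mu, W);

max_err = max(abs(numeric - analytic));
fprintf('largest difference between the two: %g\n', max_err);
```

A small difference at every element of x suggests that no dot was misplaced while editing the generated code.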
If x is defined as a vector in Octave such as:
x = [0.0:0.1:10.0]; % x values from 0.0 to 10.0 in steps of 0.1
then this function can be used to compute and plot the derivative of the Cauchy-Lorentz distribution with respect to the width parameter W at every element of x:
data = mydiff_edited(x, 1.0, 1.0, 1.0);
plot(data);
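To connect this back to model fitting, here is a minimal sketch of how such derivatives might be used in a Gauss-Newton fit. Everything in it is an assumption made for illustration: the model form, the derivatives with respect to A and mu (worked out by hand under that assumed form), the synthetic noise-free data, and the starting guess. It is not the fitting method used by the author.

```octave
% Assumed peak model with parameter vector p = [A; mu; W]
model = @(x, p) p(1) ./ (1 + (x - p(2)).^2 ./ p(3)^2);
% Partial derivatives under the assumed model form
dA  = @(x, p) 1 ./ (1 + (x - p(2)).^2 ./ p(3)^2);
dmu = @(x, p) 2*p(1)*(x - p(2)) ./ (p(3)^2 * (1 + (x - p(2)).^2 ./ p(3)^2).^2);
dW  = @(x, p) 2*p(1)*(x - p(2)).^2 ./ (p(3)^3 * (1 + (x - p(2)).^2 ./ p(3)^2).^2);

x = (0.0:0.1:10.0)';        % column vector of sample positions
p_true = [1.0; 5.0; 1.0];   % hypothetical true A, mu, W
y = model(x, p_true);       % synthetic, noise-free data
p = [0.9; 4.9; 1.1];        % hypothetical starting guess near the truth

for iter = 1:10
  r = model(x, p) - y;                   % residuals at the current parameters
  J = [dA(x, p), dmu(x, p), dW(x, p)];   % 101 x 3 matrix of partial derivatives
  p = p - (J' * J) \ (J' * r);           % Gauss-Newton update
end
fprintf('A = %g, mu = %g, W = %g\n', p(1), p(2), p(3));
```

Plain Gauss-Newton is used here only because it is short; with noisy data or a poor starting guess, a damped method would be a safer choice.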
Maxima can also generate valid TeX code for mathematical publication through its built-in tex(expression) command. This command can also be invoked through a menu item in the wxMaxima GUI (shown below).
Some TeX generated by Maxima:
tex(1/(1+x^2));
generates the TeX code:
$${{1}\over{x^2+1}}$$
which displays in WordPress, after removing the $$ delimiters that WordPress does not need, as:
All of the mathematical formulas in this article are TeX generated by Maxima in this way.
Conclusion
Octave has extensive numerical analysis and programming features. Octave has the special advantage that it is mostly compatible with MATLAB, which is currently the de facto industry standard for numerical programming. Most Octave scripts will run under MATLAB and many MATLAB scripts will run under Octave with no changes. If a user or software developer has an occasional need for symbolic manipulation features such as symbolic integration and differentiation, one can use Maxima as an adjunct to Octave. Similarly, one can use Maxima to generate TeX code for mathematical publications. If one needs to perform extensive symbolic manipulation, one may need to use Maxima or similar tools as one’s primary tool.
Both Octave and Maxima have the advantage that they are free, both free as in beer and as in speech, and available as source code. There are many cases where a merger, change in corporate strategy, bankruptcy, or even the whim of an executive has resulted in a proprietary development platform being discarded or deprecated to the detriment of end users, developers, and other customers. A well known example is FoxPro, once one of the leading database programs, which Microsoft acquired and has now announced will be discontinued in favor of Microsoft’s other database products. In contrast, open source development tools such as Octave and Maxima can be kept alive and indeed improved by their end users, developers, and customers if needed.
© 2011 John F. McGowan