Subcribe

How to Build a Recommendation Engine


This article shows how to build a simple recommendation engine using GNU Octave, a high-level interpreted language, primarily intended for numerical computations, that is mostly compatible with MATLAB. A recommendation engine is a program that recommends items such as books and movies for customers, typically of a web site such as Amazon or Netflix, to purchase. Recommendation engines frequently use statistical and mathematical methods to estimate what items a customer would like to buy or would benefit from purchasing.

From a purely business point of view, one would like to maximize the profit from a customer, discounted for time (a dollar today is worth more than a dollar next year), over the duration that the customer is a customer of the business. In a long term relationship with a customer, this probably means that the customer needs to be happy with most purchases and most recommendations.

Recommendation engines are “hot” right now. There are many attempts to apply advanced statistics and mathematics to predict what customers will buy, what purchases will make customers happy and buy again, and what purchases deliver the most value to customers. Data scientists are trying to apply a range of methods with fancy technical names such as principal component analysis (PCA), neural networks, and support vector machines (SVM) — amongst others — to predicting successful purchases and personalizing recommendations for individual customers based on their stated preferences, purchasing history, demographics and other factors.

This article presents a simple recommendation engine using Pearson’s product moment correlation coefficient, also known as the linear correlation coefficient. The engine uses the correlation coefficient to identify customers with similar purchasing patterns, and presumably tastes, and recommends items purchased by one customer to the other similar customer who has not purchased those items.

Quick Installation Instructions

This article contains sample code for a simple recommendation engine written in GNU Octave. There are four files: simulate_purchases.m, recommend_purchases.m, csvreadfix.m, and randi.m. The randi.m files implements the randi integer random generation function for earlier versions of GNU Octave.

Download and install GNU Octave 3.6.2 if possible. If you have GNU Octave 3.6 or later, you will not need the randi.m file. The simulation and recommendation software runs much faster under GNU Octave 3.6.2 than GNU Octave 3.2.4 on the author’s Windows 7 laptop.

Download the three files simulate_purchases.m, recommend_purchases.m, and csvreadfix.m to a working directory (folder). It is probably prudent to use a separate directory/folder for running the engine. Download randi.m if needed.

Launch GNU Octave (3.6 if possible). Change current directory to your working directory with the downloaded files.

To run simulation and recommendation with default settings, simply enter:

octave-prompt> simulate_purchases;
...
octave-prompt> recommend_purchases;

The figure below shows the working directory after running the simulation which creates the purchase data files SIM_xxx which are in comma separated values (CSV) format.

Working Directory with Recommendation Engine Files

Working Directory with Recommendation Engine Files

Simulating Purchases

For illustration and development purposes, a simulator was developed that generates simulated purchasing data. The recommendation is for a hypothetical on-line video service that sells one-time views of movies and other video for a small amount, ninety-nine cents for example. This simple recommendation engine does not need to know the price of the items. The online video service has four categories of movies: science-fiction, romantic comedy, action/adventure, and horror. A real online video service would hopefully have a larger selection of categories. The recommendation engine relies on the purchasing history and does not need to know the movie categories.

The simulator and recommendation engine (see below) are GNU Octave functions. These were developed and tested on an HP laptop running Windows 7 using GNU Octave version 3.6.2. Note that the simulation engine uses the randi function which generates random integers uniformly distributed in the range from 1 to N. randi is a new function in Octave not found in earlier versions of Octave. An implementation of a randi function for earlier versions of Octave can be found in the appendix. randi emulates a random number generation function in MATLAB of the same name.

The simulator returns the simulated purchasing data in Octave “matrices” and saves the data to text files — comma separated values or CSV files with a file extension of CSV. The simulated data includes a list of purchases — customer CUSTOMER_ID bought movie MOVIE_ID, a list of simulated customer names, a list of simulated movie names, and the movie category or class — movie MOVIE_ID class CLASS_ID (science fiction, romantic comedy, action, or horror).

simulate_purchases.m

function [purchase_record, cust_names, cust_prefs, movie_name, movie_class] = simulate_purchases(ncust, npurchases, stem, debug)
%  [purchase_record, cust_names, cust_prefs, movie_name, movie_class] = simulate_purchases(ncust, npurchases, stem, debug)
%
%  Inputs:
%     ncust -- number of customers (default 100)
%     npurchases -- number of purchases to simulate (default 100000)
%     stem  --  stem for file names (default 'SIM' for simulated data)
%     debug -- debug trace flag (default=false)
%
%  Outputs:
%     purchase_record -- record of purchases (customer id, movie id)
%     cust_names -- names of customers
%     cust_prefs -- movie class (scifi, romcom, action, horror) preferences of customers, 1-5 rating
%     movie_names -- names of movies
%     movie_class -- movie class (scifi=1, romcom=2, action=3, horror=4)
%
% (C) 2012 by John F. McGowan, Ph.D.
%

%  simulate_purchases.m (A Simulator for a Recommendation Engine for an On Line Movie Service)
%
%  Simulates a series of on-line purchases of movies by a group of
%  customers.  It then infers the preferences of the simulated
%  customers and recommends movies they may like.
%
% This is an Octave script.

% GNU Octave is a high-level interpreted language, primarily intended
% for numerical computations. It provides capabilities for the
% numerical solution of linear and nonlinear problems, and for
% performing other numerical experiments. It also provides extensive
% graphics capabilities for data visualization and
% manipulation. Octave is normally used through its interactive
% command line interface, but it can also be used to write
% non-interactive programs. The Octave language is quite similar to
% Matlab so that most programs are easily portable.
%
% URL: http://www.gnu.org/software/octave/
%

% NOTE: to install io and xlswrite in Octave 3.6.2 on Windows 7 PC
%
%  download and install the io package windows installer from SourceForge
%  Octave> pkg load io
%  Octave> savepath  (defaults to .octaverc
%

% set seed for random number generation
% this way can generate the same simulated data each time if needed

% VALIDATE THE ARGUMENTS TO THE FUNCTION

if nargin < 1
	ncust = 100; % start with 100 customers
end

if ischar(ncust)
	error('FIRST ARGUMENT IS STRING -- SHOULD BE NUMBER OF CUSTOMERS');
	return;
end

if ~isnumeric(ncust)
	error('FIRST ARGUMENT IS NOT A NUMBER -- SHOULD BE NUMBER OF CUSTOMERS');
	return;
end

if nargin < 2
	npurchases = 100000;
end

if ischar(npurchases)
	error('SECOND ARGUMENT IS A STRING -- SHOULD BE NUMBER OF PURCHASES');
	return
end

if ~isnumeric(npurchases)
	error('FIRST ARGUMENT IS NOT A NUMBER -- SHOULD BE NUMBER OF PURCHASES');
	return;
end

if nargin < 3
	stem = 'SIM';
end

if ~ischar(stem)
	error('STEM IS NOT A STRING');
	return;
end

if nargin < 4
	debug = false;
end

if ~islogical(debug)
	error('ARGUMENT FOUR (DEBUG TRACE FLAG) IS NOT LOGICAL TYPE');
	return;
end

	

disp('SIMULATING CUSTOMER PURCHASES');
fflush(stdout);

rand('seed', 1234567);

myseed = rand('seed');

% simulate customers


nclasses = 4;
class = { 'scifi', 'romcom', 'action', 'horror'};

cust_prefs = randi(5, [ncust nclasses]);  % customer preferences by
                                          % type of movie

cust_cdf = cumsum(cust_prefs, 2);   % cumulative sum

first_names = { 'John', 'James', 'Jane', 'Jennifer', 'Scott', 'Steve', ...
                'Sally', 'Serena', 'Tom', 'Tiffany', 'Adrian', ...
                'Ariel', 'Bob', 'Betty', 'Charles', 'Christie', ...
                'Derek', 'Dana', 'Edward', 'Ellen', 'Frank', 'Fay', ...
                'Greg', 'Gayle', 'Henry', 'Helen', 'Izzy', 'Iris', ...
                'Jack', 'Janet', 'Kyle', 'Kay', 'Lincoln', 'Letitia', ...
                'Mark', 'May', 'Neville', 'Nell', 'Oscar', 'Ophelia', ...
                'Peter', 'Petrana', 'Quimby', 'Quelle', 'Roger', ...
                'Rozanne', 'Selden', 'Samantha', 'Terrence', 'Tania', ...
                'Ulrich', 'Uma', 'Victor', 'Violet', 'Warner', ...
                'Wilma', 'Xipe', 'Xyla', 'Yorick', 'Yvette', 'Zeb', 'Zoe' };

last_names = { 'Smith', 'Jones', 'Russell', 'Walker', 'Edwards', ...
               'Allison', 'Barker', 'Coates', 'Denning', 'Essex', ...
               'Foley', 'Granger', 'Hart', 'Ivington', 'Johns', ...
               'King', 'Lawson', 'McGowan', 'Newhall', 'O''Henry', ...
               'Petersen', 'Quinne', 'Rogers', 'Stevens', 'Thomas', ...
               'Unitis', 'Volcker', 'Watts', 'Xibalba', 'Yates', ...
               'Zoller'};

cust_names = {};
for i=1:ncust
    name = [first_names{randi(length(first_names))} ' ' ...
            last_names{randi(length(last_names))}];
    cust_names{i} = name;
end

% generate a library of movies for the customers to purchase

nmovies = nclasses*100;

movie_class = randi(nclasses, [nmovies 1]);  % class of movie

sf_movies = find(movie_class == 1);
rm_movies = find(movie_class == 2);
a_movies = find(movie_class == 3);
h_movies = find(movie_class == 4);

num_sf = length(sf_movies);
num_rm = length(rm_movies);
num_a = length(a_movies);
num_h = length(h_movies);

n = [num_sf num_rm num_a num_h];
maxn = max(n);

idxmat = zeros(4, maxn);
idxmat(1,1:num_sf) = sf_movies;
idxmat(2,1:num_rm) = rm_movies;
idxmat(3,1:num_a) = a_movies;
idxmat(4,1:num_h) = h_movies;

% Make Titles for Movies

% Science Fiction Movie Titles

sf_start = { 'Zardoz', 'Star', 'Space', 'Mutant', 'Atomic', 'Computer', ...
             'Nerd', 'Time', 'Galaxy' };
sf_end = { 'Wars', 'Traveller', 'Genesis', 'Quest', 'Escape', 'Visit', ...
           'Explorer' };

% Romantic Comedy (rom-com) Movie Titles

rm_start = { 'Harry', 'Tom', 'Peter', 'Gregory', 'Alastair', 'Dirk', ...
             'Drake', 'Scott', 'Harrison', 'Berthold' };

rm_middle = { 'Meets', 'Loves', 'Hates', 'and', 'versus' };

rm_end = { 'Julia', 'Sally', 'Aurora', 'Meg', 'Michelle', 'Summer', ...
           'Cameron', 'Jennifer', 'Jenny', 'Heather', 'Wendy', ...
           'Rachel', 'Natalie', 'Portia'};

% Action Adventure Movie Titles

a_start = { 'Ninja', 'Barbarian', 'Warrior', 'Killer', 'Sword', ...
            'Kung Fu', 'Knight' };

a_end = { 'Battle', 'War', 'Raid', 'Apocalypse', 'Armageddon', ...
          'Rescue', 'Assassin' };

% Horror Movie Titles

h_start = { 'The', '', 'Halloween', 'Terror', 'Horror', 'Nightmare', ...
          'Satan''s', 'Devil''s'};
h_end = { 'Dog', 'Cat', 'Car', 'Vampire', 'Werewolf', 'Wolf', 'Mummy', ...
          'Mutant', 'Demon', 'Apocalypse', 'Zombie', 'Walking Dead', ...
          'Armageddon', 'Omen', 'Mother-In-Law', 'Stepfather', ...
          'Witch', 'Warlock', 'Ghoul', 'Ghost', 'Banker' };

movie_name = {};
for i = 1:length(movie_class)
    class = movie_class(i);
    if class == 1
        name = [sf_start{randi(length(sf_start))} ' ' ...
                      sf_end{randi(length(sf_end))}];
    elseif class == 2
        name = [rm_start{randi(length(rm_start))} ' ' ...
                      rm_middle{randi(length(rm_middle))} ' ' ...
                      rm_end{randi(length(rm_end))}];
    elseif class == 3
        name = [a_start{randi(length(a_start))} ' ' ...
                      a_end{randi(length(a_end))}];
    elseif class == 4
        name = [h_start{randi(length(h_start))} ' ' ...
                      h_end{randi(length(h_end))}];
    else
        error('ERROR:INVALID CLASS');
    end
    movie_name{i} = name;
end

% simulate purchases

if nargin < 2
	npurchases = 100000;
end

movies_purchased = [];
movies_purchased_class = [];
purchasers = [];

customer_matrix = zeros(ncust, nmovies);

mark = time();
for purchase = 1:npurchases
	% progress message
	newtime = time();
	delta = newtime - mark;
	if delta > 2
		printf("processsing purchase %d/%d\n", purchase, npurchases);
		fflush(stdout);
		mark = time();
	end
	
    % choose a customer
    purchaser = randi(ncust);
    purchasers = [purchasers purchaser];
    % choose movie class for purchase
    score = cust_cdf(purchaser,end)*rand(1, 1);
    idx = find(cust_cdf(purchaser,:) > score);
    class = idx(1);
    movies_purchased_class = [movies_purchased_class class];

    index = randi(n(class));
    movie_purchased = idxmat(class, index);
    movies_purchased = [movies_purchased movie_purchased];
	
	customer_matrix(purchaser, movie_purchased)++;
end % loop over purchases

% Purchase History Simulated
disp('Writing Movie Names to Files');
fflush(stdout);

%xlswrite('movie_purchases.xls', 'movie_name', 'Name_Sheet');

purchase_record = [purchasers' movies_purchased'];

if debug
	disp(sprintf('stem set to %s\n', stem));
	fflush(stdout);
end


csvwrite([stem '_movie_class.csv'], movie_class);  % movie class
csvwrite([stem '_customer_preferences.csv'], cust_prefs);  % customer prefernces for movie classes (rating 1-5 for each movie class)
csvwrite([stem '_movie_name.csv'], char(movie_name));  % names of the movies
csvwrite([stem '_customer_name.csv'], char(cust_names));  % names of the customers
csvwrite([stem '_purchases.csv'], purchase_record);  %  (customer id, movie id)
csvwrite([stem '_seed.csv'], myseed);  % save the random number generator seed used to generate this purchasing data

disp('ALL DONE');

end % function simulate_purchases

Recommending Purchases

Once there is enough simulated or real purchasing data, the recommendation engine uses Pearson’s product moment correlation coefficient to identify customers with similar buying patterns. For example, John Smith and Phil Jones might purchase only science fiction movies. In this case, their purchases would be highly correlated. The engine recommends movies that Phil Jones has purchased but not John Smith to John Smith, and vice versa.

The recommendation engine works by building a customer matrix with the customer on one axis and the movie on the other axis. The matrix entries are the number of times the customer has bought a particular movie. Customers with similar or identical tastes such as John Smith and Phil Jones are likely to buy overlapping sets of movies. The engine computes the correlation coefficient for each pair of customers from the customer matrix. If the correlation coefficient is greater than 0.5 and has a p-value, a measure of statistical significance, less than 0.05 (statistician Ronald Fisher’s famous arbitrary cutoff), then the engine makes a recommendation.

The p-value is theoretically the probability that the correlation is due to chance alone; there is no true correlation in the results. A p-value less than 0.05 means there is a less than five percent probablity that the correlation is due to chance alone. The p-value is really more complicated than this simple explanation and the problems with using the p-value are discussed in more detail in the previous article How to Hang Yourself with Statistics.

A recommendation engine for movies is not a life or death application like building a bridge or a nuclear power plant. We would almost never except a five percent probability of error in a life or death application. But in an application like recommending movies, some errors are probably acceptable. Amazon and Netflix make many unsuccessful recommendations but customers keep coming back and buying. This is an application where it is probably acceptable to use the p-value despite its known limitations.

NOTE: The colon : is used heavily in Octave (and MATLAB). WordPress displays the sequence colon right-parenthesis as smiley face :). The reader will see a few smiley-faces displayed in the source code below. Simply select, copy, and paste the code into a text or code editor. This will correctly paste the colon right-parenthesis sequence into the text or code editor — not a smiley face.

recommend_purchases.m

function [purchase_recommendations] = recommend_purchases(stem, debug)
% [purchase_recommendations] = recommend_purchases([stem, debug])
%
% Inputs:
%   stem -- stem for file names of files with simulate or actual purchasing data (default='SIM')
%   debug -- debug trace flag (default=FALSE)
%
% Outputs:
%   purchase_recommendations -- recommended purchases for customers with enough data to make prediction
%      format of each row is [customer id  movieid1 movieid2 ....]
%
% (C) 2012 by John F. McGowan, Ph.D.
%

% This is a GNU Octave script.

% GNU Octave is a high-level interpreted language, primarily intended
% for numerical computations. It provides capabilities for the
% numerical solution of linear and nonlinear problems, and for
% performing other numerical experiments. It also provides extensive
% graphics capabilities for data visualization and
% manipulation. Octave is normally used through its interactive
% command line interface, but it can also be used to write
% non-interactive programs. The Octave language is quite similar to
% Matlab so that most programs are easily portable.
%
% URL: http://www.gnu.org/software/octave/
%
%
%

if nargin < 1
	stem = 'SIM';  % default value
end

% validate argument

if ~ischar(stem)
	error('STEM IS NOT A STRING --- SHOULD BE A STRING -- IDENTIFIER XXX in XXX_NNN.CSV FOR FILES WITH PURCHASING DATA');
	return;
end


if nargin < 2
	debug = false;  %debug trace flag
end

if ~islogical(debug)
	error('DEBUG FLAG IS NOT A LOGICAL -- SHOULD BE TRUE OR FALSE');
	return;
end

cust_names = csvreadfix([stem '_customer_name.csv']);

if isempty(cust_names)
	error('Unable to open data file\n');
	return;
end

movie_name = csvreadfix([stem '_movie_name.csv']);

if isempty(movie_name)
	error('Unable to open data file\n');
	return;
end

% numeric data so use regular csvread function call
cust_prefs = csvread([stem '_customer_preferences.csv']);


[ncust blatz] = size(cust_names);
[nmovies blatz] = size(movie_name);

fname = [stem '_purchases.csv'];

purchase_record = csvread(fname);
[npurchases blatz] = size(purchase_record);

purchasers = purchase_record(:,1);  % column one is the purchaser/customer id
movies_purchased = purchase_record(:,2);  % colume two is the movie id

if debug
	disp(sprintf('number of customers: %d\n', ncust) );
	disp(sprintf('number of movies: %d\n', nmovies) );
	fflush(stdout);
end

customer_matrix = zeros(ncust, nmovies);

% populate the customer matrix
for i = 1:npurchases
	customer = purchasers(i);
	movieid = movies_purchased(i);
	customer_matrix(customer, movieid)++;
end

% analyze purchases now and recommend new purchases for customers

%[u s v] = svd(customer_matrix);

% correlations bewteen customers

version_string = version();
version_cell = strsplit(version_string, '.');
version_value = char(version_cell)';
version_number = str2num(version_value);

if version_number >= 360
	mycor = corr(customer_matrix');  % columns are customers
else
	% corrcoef is deprecated
	mycor = corrcoef(customer_matrix');
end

mask = eye(ncust);
mask = ~mask;

mycor = mask .* mycor;

similars = zeros(ncust, 3);

printf('finding most similar customers by correlation\n');
fflush(stdout);

mark = time();

for customer = 1:ncust
	
	% progress message
	newtime = time();
	if ((newtime - mark) > 1)
		printf('processing customer %d / %d\n', customer, ncust);
		fflush(stdout);
		mark = time();
	end
	
	% find a different customer with the most similar buying patterns to the current customer
	%
	mx = max(mycor(customer, :));  % maximum correlation other than self
	customer_index = find(mycor(customer, :) == mx);
	%
	% compute p-values
	test_result = cor_test(customer_matrix(customer, :), customer_matrix(customer_index, :));
	similars(customer, 1) = customer_index;  % the id of the other customer
	similars(customer, 2) = mx;   % maximum correlation coefficient
	similars(customer, 3) = test_result.pval;  % p-value statistical significance for correlation coefficient
	
	if debug
		printf('%d: %d %f %f\n', customer, customer_index, mx, test_result.pval);
		fflush(stdout);
	end
	
end % loop

printf('done with computing correlations between customers\n');
fflush(stdout);

% ids of customers with very similar tastes
%
% default to Fisher's 0.05 p-value cutoff
%
customer_similar = find(similars(:,2) > 0.5 & similars(:,3) < 0.05);
matches = similars(customer_similar, 1);
 
 
% check customer preferences from the purchasing simulation
cust_prefs(customer_similar,:)
cust_prefs(matches,:)

if debug
	disp('PAUSING');
	fflush(stdout);

	pause(10);
end

% make some recommendations
purchase_recommendations = zeros(length(customer_similar), nmovies+1);

for customer_idx = 1:length(customer_similar)

	printf('processing item %d\n', customer_idx);
	fflush(stdout);
	customer = customer_similar(customer_idx);
	printf('processing customer %d (%s)\n', customer, cust_names(customer,:) );
	fflush(stdout);

	customer_bought = customer_matrix(customer,:);
	match_purchases = customer_matrix(matches(customer_idx), :);

	recommendations = find( customer_bought == 0 & match_purchases > 0);
	printf('suggested movies for customer %d %s\n', customer, cust_names(customer,:) );
	fflush(stdout);
	recommendations;
	
	purchase_recommendations(customer_idx, 1) = customer;
	purchase_recommendations(customer_idx, 2:length(recommendations)+1) = recommendations;
	
	for k=1:length(recommendations)
		printf('%s\n', movie_name(recommendations(k),:) );
		fflush(stdout);
	end % loop over recommendations

end % for loop

disp('ALL DONE');



end % function recommend_purchases

The recommendation function calls csvreadfix to read some of the data from the data files generated by the simulator.

csvreadfix.m

function [strings] = csvreadfix(fname, debug)
%  [strings] = csvreadfix(fname, debug)
%
%  Inputs:
%     fname -- file name of file to read
%     debug -- debug trace flag (default=FALSE)
%
%   read text lines from CSV file written using GNU Octave csvwrite
%
%  (C) 2012 by John F. McGowan, Ph.D.
%

if ~ischar(fname)
	error('FIRST ARGUMENT IS NOT A STRING -- SHOULD BE NAME OF A CSV DATA FILE (E.G. MYDATA.CSV) ');
	return;
end

if nargin < 2
	debug = false;
end

if ~islogical(debug)
	error('SECOND ARGUMENT IS NOT A LOGICAL -- SHOULD BE DEBUG TRACE FLAG (TRUE OR FALSE)');
	return;
end

fid = fopen(fname, 'r');

if fid == -1
	strings = [];
	str = sprintf('Unable to open %s\n', fname);
	error(str);
	return;
end


line = fgets(fid);

strings = [];
while (line ~= -1)

	elements = strsplit(line, ',');
	string = char(elements)';
	string = strtrim(string);  % remove leading and trailing white space
	[nrows ncols] = size(string);
	if nrows > 1
		string = string(1,:);
	end
	
	if debug
		disp('debug');
		disp('size(string)');
		disp(size(string));
		disp(string);
		fflush(stdout);
		pause(2);
	end
	
	strings = [strings; string];
	if debug
		disp(strings);
		fflush(stdout);
	end
	
	line = fgets(fid);
end % while loop

fclose(fid);

end % function

Conclusion

This article shows how to build a simple recommendation engine based on Pearson’s product moment correlation coefficient using GNU Octave. It uses correlation coefficients to identify customers with similar buying patterns and presumably tastes in movies. There are many additional ways to apply correlations to purchasing data to make recommendations.

There are many more advanced statistical and mathematical methods that can be applied to recommendation engines. These include principal components analysis (PCA), clustering algorithms, neural networks, support vector machines (SVM), and many other techniques from artificial intelligence, signal processing, and mathematical modeling. Many of these more advanced methods are related to one another and, in some cases, are actually the same method with a different name or slight cosmetic differences.

© 2012 John F. McGowan

About the Author

John F. McGowan, Ph.D. solves problems using mathematics and mathematical software, including developing video compression and speech recognition technologies. He has extensive experience developing software in C, C++, Visual Basic, Mathematica, MATLAB, and many other programming languages. He is probably best known for his AVI Overview, an Internet FAQ (Frequently Asked Questions) on the Microsoft AVI (Audio Video Interleave) file format. He has worked as a contractor at NASA Ames Research Center involved in the research and development of image and video processing algorithms and technology. He has published articles on the origin and evolution of life, the exploration of Mars (anticipating the discovery of methane on Mars), and cheap access to space. He has a Ph.D. in physics from the University of Illinois at Urbana-Champaign and a B.S. in physics from the California Institute of Technology (Caltech). He can be reached at jmcgowan11@earthlink.net.

Appendix I: RANDI FUNCTION

The randi function is new in recent versions of GNU Octave. It is not present in earlier versions such as Octave 3.2. Below is the source code for a GNU Octave randi function for earlier versions of Octave. It is probably better to upgrade to version 3.6 of GNU Octave if possible. randi emulates a random number generation function in MATLAB.

function [result] = randi(imax, n, m, debug)
% [result] = randi(image [, n, m, debug])
%
% (C) 2012 by John F. McGowan, Ph.D.

if ~isnumeric(imax)
	error('FIRST ARGUMENT IS NOT A NUMBER -- SHOULD BE IMAX WHERE RANDI RETURNS INTEGER FROM 1 to IMAX');
	return;
end

if nargin < 4
	debug = false;
end

if debug
	disp(sprintf('randi called with arguments %d\n', imax));
	fflush(stdout);
end


if nargin == 1
	result = floor(imax .* rand(1,1)) + 1;
    return;
end

if nargin == 2
	if debug
		disp('n');
		disp(n);
	end
	
	[nrows ncols] = size(n);
	if debug
		disp('size(n)');
		disp(size(n));
		fflush(stdout);
	end

	if debug
		disp(sprintf('%d\n', n));
		fflush(stdout);
	end
	
	result = floor(imax .* rand(n)) + 1;
	return; 
end

if nargin == 3
	if debug
		disp(sprintf('%d %d\n', n, m));
	end
	
	result = floor(imax .* rand(n,m)) + 1;
	return;
end

end % function randi

If you enjoyed this post, then make sure you subscribe
to our Newsletter and/or RSS Feed.


3 Responses to “How to Build a Recommendation Engine”

  1. […] How to Build a Recommendation Engine by John F. McGowan. […]

  2. Abhishek says:

    The algo takes similarity corelation but I think past purchases should also be considered for same gener. And is there anything we can do with cookies?

    Also, for that smiley poblem in code, you can use ascii character to make bracket or colon and wp wont display it.

  3. Peter Acworth says:

    From my reading, the past purchases are very much taken into account, as that is what provides the correlation coefficient to identify other ‘similar’ customers.

    I.e. if one person’s purchase history focuses on a given ‘genre’, so will that of ‘similar’ customers. Assuming those similar customers also bought other items within the same genre, the engine would thus tend to recommend items within the genre the person’s purchase history indicates.

    However, in the absence of rich purchase histories of a large number of customers, the engine would fail, so one would have to resort to recommending based on some intuitive grouping (‘genre’) as you say, presuambly.

Leave a Reply