Wednesday, July 11, 2012

Performing Linear Regression in Perl


Linear regression is a widely used data analysis technique that is easy to perform in Perl with the Statistics::Regression module. The module can perform multivariate linear regressions, but this code snippet demonstrates a simple regression to the basic equation of a line, y = mx + b, using the following data set:

X    Y
1    1.3
2    2.9
3    4.2
4    5.4

#!/usr/bin/perl

use Statistics::Regression;
use strict;
use warnings;

my $reg=Statistics::Regression->new(
   "Title", ["Intercept", "Slope"]
   );
 
#the use of 1.0 in the includes allows
#for the computation of a y intercept     
$reg->include(1.3, [1.0, 1.0]);
$reg->include(2.9, [1.0, 2.0]);
$reg->include(4.2, [1.0, 3.0]);
$reg->include(5.4, [1.0, 4.0]);

$reg->print;

To understand how the module works, consider the equation in a more multivariate form by rewriting it as y = bx1 + mx2. Viewing the equation this way makes the “include” statements that pass the data points to the module much easier to understand. Each include statement uses the following data entry format:

$reg->include(y, [x1, x2]);

Always putting 1.0 in the x1 position is what allows the module to compute a Y-intercept: because x1 is the same constant for every data point, the term bx1 always equals b and does not vary with the data. The x2 values are the X values from the data set, since we want mx2 to change as X changes.
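Written out for the four data points above, the same idea can be sketched in matrix form. This is only an illustration of the y = bx1 + mx2 layout described here, not a description of how the module computes its result internally; the approximation sign reflects that a least-squares fit need not pass exactly through every point.

```latex
\begin{pmatrix} 1.3 \\ 2.9 \\ 4.2 \\ 5.4 \end{pmatrix}
\approx
\begin{pmatrix} 1 & 1 \\ 1 & 2 \\ 1 & 3 \\ 1 & 4 \end{pmatrix}
\begin{pmatrix} b \\ m \end{pmatrix}
```

Each row of the middle matrix is the [x1, x2] array reference passed to one include call, and the left-hand column holds the corresponding y values.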

The output generated by the module shows the following results:

****************************************************************
Regression 'Title'
****************************************************************
Name                   Theta          StdErr     T-stat
[0='Intercept']       0.0500          0.1775       0.28
[1='Slope']           1.3600          0.0648      20.99

R^2= 0.995, N= 4, K= 2
****************************************************************
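As a sanity check on the Theta column, the intercept and slope can also be computed by hand with the textbook closed-form formulas for a single-predictor least-squares fit. The sketch below is plain Perl and does not use Statistics::Regression; the helper name fit_line is made up for this illustration.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper (not part of Statistics::Regression): closed-form
# least-squares fit of y = m*x + b for a single predictor.
sub fit_line {
    my ($xs, $ys) = @_;
    my $n = @$xs;

    # Means of the x and y values
    my ($sum_x, $sum_y) = (0, 0);
    for my $i (0 .. $n - 1) {
        $sum_x += $xs->[$i];
        $sum_y += $ys->[$i];
    }
    my ($mean_x, $mean_y) = ($sum_x / $n, $sum_y / $n);

    # Sums of cross-products and squared deviations about the means
    my ($sxy, $sxx) = (0, 0);
    for my $i (0 .. $n - 1) {
        my $dx = $xs->[$i] - $mean_x;
        $sxy += $dx * ($ys->[$i] - $mean_y);
        $sxx += $dx * $dx;
    }

    my $slope     = $sxy / $sxx;
    my $intercept = $mean_y - $slope * $mean_x;
    return ($intercept, $slope);
}

my ($intercept, $slope) = fit_line([1, 2, 3, 4], [1.3, 2.9, 4.2, 5.4]);
printf "Intercept: %.4f\nSlope:     %.4f\n", $intercept, $slope;
```

For this data set the two values should agree with the Theta column above (0.0500 and 1.3600), so the fitted line is y = 1.36x + 0.05.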

The plot of the line that would result from the regression is as follows:

[Figure: plot of the regression line]


3 comments:

Joel Berger said...

Always TIMTOWTDI, here is the PDL::Stats way of doing it: PDL::Stats linear regression
