Linear regression is a very commonly used data analysis
technique, and it is easy to perform in Perl with the module
Statistics::Regression. The module can perform multivariate
linear regressions, but this code snippet demonstrates a simple regression
to the basic equation of a line, y = mx + b, using the following data set:
X | Y
1 | 1.3
2 | 2.9
3 | 4.2
4 | 5.4
#!/usr/bin/perl
use strict;
use warnings;
use Statistics::Regression;

# Create the regression object with a title and one name per coefficient
my $reg = Statistics::Regression->new(
    "Title", ["Intercept", "Slope"]
);

# The use of 1.0 in the includes allows
# for the computation of a y intercept
$reg->include(1.3, [1.0, 1.0]);
$reg->include(2.9, [1.0, 2.0]);
$reg->include(4.2, [1.0, 3.0]);
$reg->include(5.4, [1.0, 4.0]);

# Print the coefficient table and summary statistics
$reg->print;
To understand how the module works, let’s consider the
equation in a more multivariate form by rewriting it as y = b*x1 + m*x2.
In this form, the “include” statements that pass the data points to the
module are much easier to understand. Each include statement uses the
following data entry format:
$reg->include(y, [x1, x2]);
Always putting 1.0 in the x1 position allows a Y-intercept to be
computed: because x1 is the same for every data point, the term b*x1
reduces to the constant b. The x2 values correspond to the X values,
since we want m*x2 to change with a change in X.
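As an illustration of that data entry format, here is a minimal sketch that feeds the same data set to the module from a list of [x, y] pairs instead of hand-written include lines. It assumes a reasonably recent version of Statistics::Regression in which theta() returns the coefficients as a list, in the order the names were given to new(). The variable names @points, $intercept and $slope are only illustrative.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use Statistics::Regression;

# Data set from the introduction, stored as [x, y] pairs.
my @points = ([1, 1.3], [2, 2.9], [3, 4.2], [4, 5.4]);

my $reg = Statistics::Regression->new("Title", ["Intercept", "Slope"]);

for my $point (@points) {
    my ($x, $y) = @$point;
    # x1 is the constant 1.0 (the intercept column); x2 is the observed X value.
    $reg->include($y, [1.0, $x]);
}

# Assumption: theta() returns the estimated coefficients in the same
# order as the names passed to new(), i.e. (intercept, slope).
my ($intercept, $slope) = $reg->theta();
printf "y = %.4f * x + %.4f\n", $slope, $intercept;
```

Keeping the data in one array makes it easy to swap in points read from a file, while the constant 1.0 still supplies the intercept column on every call.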
Running the script prints the following output from the module:
****************************************************************
Regression 'Title'
****************************************************************
Name                 Theta      StdErr     T-stat
[0='Intercept']      0.0500     0.1775       0.28
[1='Slope']          1.3600     0.0648      20.99
R^2= 0.995, N= 4, K= 2
****************************************************************
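The slope, intercept and R^2 reported by the module can be cross-checked by hand. The sketch below does not use Statistics::Regression at all; it applies the textbook closed-form formulas for a simple one-variable least-squares fit, using only the core List::Util module. It is a sanity check under that assumption, not a description of how the module computes its results internally.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use List::Util qw(sum);

# Same data set as in the regression example.
my @x = (1, 2, 3, 4);
my @y = (1.3, 2.9, 4.2, 5.4);
my $n = scalar @x;

my $x_mean = sum(@x) / $n;
my $y_mean = sum(@y) / $n;

# Sums of squares and cross-products around the means.
my $sxy = sum(map { ($x[$_] - $x_mean) * ($y[$_] - $y_mean) } 0 .. $n - 1);
my $sxx = sum(map { ($x[$_] - $x_mean) ** 2 } 0 .. $n - 1);
my $syy = sum(map { ($y[$_] - $y_mean) ** 2 } 0 .. $n - 1);

# Closed-form estimates for y = m*x + b.
my $slope     = $sxy / $sxx;
my $intercept = $y_mean - $slope * $x_mean;
my $r_squared = $sxy ** 2 / ($sxx * $syy);

printf "Slope:     %.4f\n", $slope;
printf "Intercept: %.4f\n", $intercept;
printf "R^2:       %.3f\n", $r_squared;
```

For this data set the hand calculation gives a slope of 1.36, an intercept of 0.05 and an R^2 of about 0.995, which agrees with the Theta column and the R^2 line in the module output.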
The plot of the line that would result from the regression
is as follows:
[Figure: plot of the regression line]
Comments:
Always TIMTOWTDI; here is the PDL::Stats way of doing it: PDL::Stats linear regression