Linear regression is a widely used data analysis technique, and it is
easy to perform in Perl with the module Statistics::Regression. The module
can perform multivariate linear regressions, but this snippet demonstrates
a simple regression to the basic equation of a line, y = mx + b, using the
following data set:
X    Y
1    1.3
2    2.9
3    4.2
4    5.4

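#!/usr/bin/perl
use strict;
use warnings;
use Statistics::Regression;

# Create a regression object with a title and one name per coefficient.
my $reg = Statistics::Regression->new(
    "Title", ["Intercept", "Slope"]
);

# Each call is include(y, [x_{1}, x_{2}]). The constant 1.0 in the
# first position allows for the computation of a y-intercept.
$reg->include(1.3, [1.0, 1.0]);
$reg->include(2.9, [1.0, 2.0]);
$reg->include(4.2, [1.0, 3.0]);
$reg->include(5.4, [1.0, 4.0]);

# Print the fitted coefficients and summary statistics.
$reg->print;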
To understand how the module works, let's consider the equation in a
multivariate form by rewriting it as y = bx_{1} + mx_{2}. Viewing the
equation this way makes the "include" statements that provide the data
points to the module much easier to understand. Each include statement
uses the following data entry format:
$reg->include(y, [x_{1}, x_{2}]);
Always putting 1.0 in the x_{1} position is what allows the module to
compute a Y-intercept: with x_{1} fixed at 1.0, the term bx_{1} reduces to
the constant b and does not vary from one data point to the next. The x_{2}
values are the X values themselves, since we want mx_{2} to change with a
change in X.
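With more than a handful of points, the rows can be built in a loop rather than typed out by hand. The following is a minimal sketch of that idea, not part of the original snippet: the arrays @x, @y and @rows are names chosen here for illustration, and the module is loaded only if it is installed, so the script still runs (and just prints the calls it would make) without it.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Data set from the table above.
my @x = (1, 2, 3, 4);
my @y = (1.3, 2.9, 4.2, 5.4);

# Build one [x_{1}, x_{2}] row per observation: x_{1} is always 1.0
# (the intercept column) and x_{2} is the observed X value.
my @rows = map { [1.0, $_] } @x;

if (eval { require Statistics::Regression; 1 }) {
    # Module available: feed it the rows exactly as in the snippet above.
    my $reg = Statistics::Regression->new("Title", ["Intercept", "Slope"]);
    $reg->include($y[$_], $rows[$_]) for 0 .. $#x;
    $reg->print;
}
else {
    # Module missing: show the include() calls that would have been made.
    printf "include(%.1f, [%.1f, %.1f])\n", $y[$_], @{ $rows[$_] }
        for 0 .. $#x;
}
```

Adding a second predictor would only mean adding a third element to each row and a third name to the list passed to new.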
Running the script prints the following results:
****************************************************************
Regression 'Title'
****************************************************************
Name                  Theta      StdErr     T-stat
[0='Intercept']      0.0500      0.1775       0.28
[1='Slope']          1.3600      0.0648      20.99

R^2= 0.995, N= 4, K= 2
****************************************************************
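For a single predictor, these numbers can be cross-checked by hand with the textbook closed-form least-squares formulas. The sketch below is an independent check in plain Perl, not a description of how Statistics::Regression works internally. All variable names are chosen here for illustration.

```perl
#!/usr/bin/perl
use strict;
use warnings;

my @x = (1, 2, 3, 4);
my @y = (1.3, 2.9, 4.2, 5.4);
my $n = @x;

# Arithmetic mean of a list.
sub mean { my $s = 0; $s += $_ for @_; return $s / @_; }

my ($xbar, $ybar) = (mean(@x), mean(@y));

# Sums of squares and cross-products about the means.
my ($sxx, $sxy, $syy) = (0, 0, 0);
for my $i (0 .. $n - 1) {
    my ($dx, $dy) = ($x[$i] - $xbar, $y[$i] - $ybar);
    $sxx += $dx * $dx;
    $sxy += $dx * $dy;
    $syy += $dy * $dy;
}

my $slope     = $sxy / $sxx;               # m
my $intercept = $ybar - $slope * $xbar;    # b
my $rsq       = ($sxy * $sxy) / ($sxx * $syy);

# Residual variance with N - K = N - 2 degrees of freedom, then the
# standard errors of the two coefficients.
my $sigma2       = ($syy - $slope * $sxy) / ($n - 2);
my $se_slope     = sqrt($sigma2 / $sxx);
my $se_intercept = sqrt($sigma2 * (1 / $n + $xbar * $xbar / $sxx));

printf "Intercept %.4f (StdErr %.4f)\n", $intercept, $se_intercept;
printf "Slope     %.4f (StdErr %.4f)\n", $slope,     $se_slope;
printf "R^2       %.3f\n",               $rsq;
```

The printed values should agree with the Theta, StdErr and R^2 figures in the module's output, and dividing each coefficient by its standard error gives the T-stat column.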
The plot of the line that would result from the regression
is as follows:
3 comments:
Always TIMTOWTDI, here is the PDL::Stats way of doing it: PDL::Stats linear regression