Thursday, July 19, 2012

Static Code Analysis with Perl::Critic

It is very important for any group that develops software to adhere to a set of software coding standards.  These coding standards not only help to improve the readability and maintainability of the code base, but can also help to improve the security of the resultant applications.  Any set of coding standards should describe a set of best practices for dealing common types of coding issues. One way to help ensure that code follows the standards dictated by industry and organizational best practices is to make use of a static code analyzer.  For Perl, a popular static code analyzer is the Perl::Critic module.  Before we take a look at what this module does, let’s purposely create some bad Perl code as a test case and save it as  



#assume $filename provided by user
open (STATFILE, "/home/user/$filename");

#assume input provided by $user
system("cat $input");

Now that we have this example of some really bad Perl coding practices lets create a few line of Perl to run our script through Perl::Critic.

use Perl::Critic;
my $file = '';
my $critic = Perl::Critic->new();
my @violations = $critic->critique($file);
print @violations;

If we execute the above we will end up with output as follows:

chris@chris-Aspire-5742:~/Desktop$ perl
Code before strictures are enabled at line 3, column 1. See page 429 of PBP.
Bareword file handle opened at line 6, column 1. See pages 202,204 of PBP.
Two-argument "open" used at line 6, column 1. See page 207 of PBP.

The Perl::Critic module tells of several problems with the first being that “use strict” was not made use of in our script.  Next, it informs us of some problems with the open statement as well.  The form of open used in that sample is a depreciated syntax which makes use of global typeglobs (the Bareword error), rather than a more modern lexically scoped file handle.  Perl Critic also points out that the older "two argument" open syntax is used rather than the newer "three argument" syntax which can prevent some potential issues with file names that contain special characters and thus improve the stability and reliability of the code.  Using a static code analyzer is a strongly recommended practice and in the majority of cases will help ensure an improvement in the overall quality and consistency of the code, even though compliance with policies may not always be 100% possible.  

It is important to note however, that static code analyzers are very useful tools, but that they are not able to catch every possible coding mistake.  In particular design related errors in the code will not be picked up by static code analysis.  For example, in the $filename is never validated and this may provide a potential vector for a directory traversal type attack.  Likewise in the system call, prior validation is also lacking and could allow a user to provide something like “/etc/passwd” as $input.  These issues however were not raised by this analysis, however.  Remember that passing a static code analysis does not necessarily equate with secure code. 

Wednesday, July 11, 2012

Performing Linear Regression in Perl

Linear regression is a very commonly used data analysis technique that can easily be performed in Perl using the module Statistics::Regression. This module is capable of performing multivariate linear regressions, but this code snippet will demonstrate a simple regression to the basic equation of a line y=mx +b, using the following data set:


use Statistics::Regression;
use strict;
use warnings;

my $reg=Statistics::Regression->new(
   "Title", ["Intercept", "Slope"]
#the use of 1.0 in the includes allows
#for the computation of a y intercept     
$reg->include(1.3, [1.0, 1.0]);
$reg->include(2.9, [1.0, 2.0]);
$reg->include(4.2, [1.0, 3.0]);
$reg->include(5.4, [1.0, 4.0]);


To understand how the module works, let’s consider the equation in a more multivariate form by rewriting it as y=bx1 + mx2.  Considering the equation in this form, will make the “include” statements that provide the data points to the module much easier to understand.  The include statements could be considered as using the following data entry format:

$reg->include(y, [x1, x2]);

Thus by always putting 1.0 in the x1 position it allows for the computation of a Y-intercept since it is telling the regression module that the x1 variable does not influence the outcome of bx1, whereas the x2 values correspond to the X values, since we want mx2 to change with a change in X.

A look at the output generated by the Perl module demonstrates the following results:

Regression 'Title'
Name                   Theta          StdErr     T-stat
[0='Intercept']       0.0500          0.1775       0.28
[1='Slope']           1.3600          0.0648      20.99

R^2= 0.995, N= 4, K= 2

The plot of the line that would result from the regression is as follows:


Kobo Wifi eReader