Wednesday, May 30, 2012

Computing Descriptive Statistics with Perl


For anyone who does any type of data analysis work, computing basic descriptive statistics is often essential.  As with most things, Perl has a CPAN module available that makes computing basic statistical values quite straightforward.  In this short script we will take a look at the CPAN module Statistics::Descriptive (http://search.cpan.org/dist/Statistics-Descriptive/lib/Statistics/Descriptive.pm) and use it to perform some basic statistical analysis of a numeric data set.  The data set in the script will consist of 100 randomly generated integers in the range of 50 to 150 (inclusive).  The mean, median, mode, standard deviation, minimum, and maximum values of the data set will then be computed.  

#!/usr/bin/perl

# Copyright 2012- Christopher M. Frenz
# This script is free software - it may be used, copied, redistributed, and/or modified
# under the terms laid forth in the Perl Artistic License

use Statistics::Descriptive;
use strict;
use warnings;

#generate 100 random numbers between 50 and 150
my $range=101;
my $minimum=50;
my @randnums = map { int(rand($range)+$minimum) } ( 1..100 );

#prints the random numbers
#to prove the random number generation worked
foreach my $randnum (@randnums){
    print "$randnum\n";
}

#computes basic statistics on data
my $stat=Statistics::Descriptive::Full->new();
$stat->add_data(@randnums);
my $mean=$stat->mean();
print "The mean is: $mean\n";
my $median=$stat->median();
print "The median is: $median\n";
my $mode=$stat->mode();
#mode() may return undef (e.g. when every value occurs only once)
print "The mode is: ", (defined $mode ? $mode : "none"), "\n";
my $sd=$stat->standard_deviation();
print "The standard deviation is: $sd\n";
my $min=$stat->min();
print "The minimum is: $min\n";
my $max=$stat->max();
print "The maximum is: $max\n";
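
If you want to cross-check what the module reports, the core statistics are easy to compute by hand.  The following is a minimal sketch using only List::Util (a core module).  The sub names and the small fixed data set are hypothetical, chosen for illustration, and the standard deviation uses the sample (n - 1) formula, which is the convention I understand Statistics::Descriptive to use.

```perl
use strict;
use warnings;
use List::Util qw(sum);

# arithmetic mean: sum of the values divided by how many there are
sub mean {
    my @x = @_;
    return sum(@x) / @x;
}

# median: middle value of the sorted data, or the average of the
# two middle values when there is an even number of values
sub median {
    my @s   = sort { $a <=> $b } @_;
    my $mid = int(@s / 2);
    return @s % 2 ? $s[$mid] : ($s[$mid - 1] + $s[$mid]) / 2;
}

# sample standard deviation: squared deviations from the mean,
# summed, divided by n - 1, then square-rooted
sub sample_sd {
    my @x  = @_;
    my $m  = mean(@x);
    my $ss = sum(map { ($_ - $m) ** 2 } @x);
    return sqrt($ss / (@x - 1));
}

my @data = (2, 4, 4, 4, 5, 5, 7, 9);
printf "mean=%.3f median=%.3f sd=%.3f\n", mean(@data), median(@data), sample_sd(@data);
# prints mean=5.000 median=4.500 sd=2.138
```

Feeding the same @randnums array from the script above into these subs should give values matching the module's mean, median, and standard deviation.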

Monday, May 28, 2012

Creative Commons Licensed Perl Tutorials


While this site is still relatively new and, as a result, still light on content, I intend to develop it into an increasingly useful resource for people learning about various topics pertaining to software development, with a particular emphasis on Perl, of course.  This aim is already somewhat evident in some of the posts I have made available on the site, such as those on Input Validation and Escaping Data, topics that experienced programmers should already know in much more depth than my posts cover.  As time progresses, in addition to posting interesting Perl scripts, I intend to keep posting educational content as well.  To facilitate the educational value of this site, all Perl code that I have authored and posted on the site is available under the Perl Artistic License, and all accompanying text that I have authored is made available under the Creative Commons Attribution-NonCommercial-ShareAlike 3.0 Unported License.  These terms apply to all currently posted work as well as all future work posted to this site, which means that anyone can feel free to use the content of this site, or derivatives of it, for any non-commercial purpose as long as they credit the source of the content (a link would be even better) and share any derivatives under the same license.  So for any professors or technical trainers out there, feel free to use any of the content you find beneficial to further the education of your students. 


Friday, May 25, 2012

Password Storage with Salted Hashes


Password storage is a hugely important issue for any application that uses passwords as an authentication mechanism.  One of the primary rules of password storage is that passwords should never be stored in plain text, but should instead be stored in hashed form.  Hashes are one-way cryptographic functions that map an input of any length to a fixed-length output, and for a good hash function it is computationally infeasible to find two different inputs that produce the same output.  As such, as long as the user types in the correct password, hashing it will always produce the same value, while any difference in the supplied password will, in practice, produce a different hash value.  Thus, as long as the hash of the typed-in password matches the stored hash value, it can be concluded that the proper password was entered and the user can be given the appropriate access to the system.  The one-way nature of hash functions improves security, because it should not be feasible to determine the password used to create the hash (i.e. without resorting to techniques like brute forcing, rainbow tables, etc.). 
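
To make the basic (unsalted) check concrete, here is a minimal sketch using Digest::SHA, a core Perl module.  The variable and sub names are hypothetical, and a plain unsalted hash like this is exactly what the salting discussed next improves on.

```perl
use strict;
use warnings;
use Digest::SHA qw(sha512_hex);

# store only the SHA-512 hash of the password, never the password itself
my $stored = sha512_hex('password');

# hash whatever the user typed and compare it with the stored value
sub matches_stored {
    my ($typed) = @_;
    return sha512_hex($typed) eq $stored;
}

print matches_stored('password') ? "access granted\n" : "access denied\n";
print matches_stored('passw0rd') ? "access granted\n" : "access denied\n";
# prints "access granted" then "access denied"
```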

The security of stored passwords can be improved even further by using a strong hash function such as SHA-512 rather than older hash functions like MD5 or SHA-1.  Moreover, salting hashes provides a further means of improving the security of stored passwords, as salts work to nullify the usefulness of rainbow table based attacks.  A salt is a set of random bits that is provided as an additional input to the hash function, so the same password hashed with two different salts yields two different hash values.  Ideally, each user of your application should have a unique salt applied to their password hash.  In Perl, this is quite easy to achieve with the Crypt::SaltedHash module (http://search.cpan.org/~esskar/Crypt-SaltedHash-0.06/lib/Crypt/SaltedHash.pm).  Let’s consider the following snippet of Perl code, which uses the module to create the salted SHA-512 hash of the supplied password.  

#!/usr/bin/perl

use Crypt::SaltedHash;
use strict;
use warnings;

my $password='password';

#creating the salted hash
my $crypt=Crypt::SaltedHash->new(algorithm=>'SHA-512');
$crypt->add($password);
my $shash=$crypt->generate();
my $salt=$crypt->salt_hex();

print "Salted Hash= $shash\n";
print "Salt= $salt\n";

 
The module automatically generates a random salt value when it generates the hash, and this can be verified by running the same code multiple times and seeing the different salts and salted hash values generated. 

The same module can also be used to verify that the proper password was entered, by comparing the supplied password with the stored salted hash as seen below.  If the password is found to be the same as the one used to create the salted hash, the validate method will return a value of “1”.

#verifying the salted hash
my $crypt2=Crypt::SaltedHash->new(algorithm=>'SHA-512');
my $verified=$crypt2->validate($shash, $password);
#validate returns 1 on a match and a false value (not necessarily 0) otherwise
if($verified){
    print "This is the correct password\n\n";
}
else{print "This is the wrong password\n\n";}
 
To show what would happen if an incorrect password was supplied, consider the following code snippet.  Note how the “This is the wrong password” message is printed.  

#showing what would happen if the password was wrong
$password='passw0rd';
$verified=$crypt2->validate($shash, $password);   #reuses the $verified declared above
print "$verified\n";
if($verified){
    print "This is the correct password\n\n";
}
else{print "This is the wrong password\n\n";}
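
To see what the module is doing conceptually, here is a minimal sketch of salting with the core Digest::SHA module.  It is not how Crypt::SaltedHash formats its output; the sub names and the choice to prepend the salt are assumptions made for illustration.  Note also that Perl's rand() is not a cryptographically secure generator, so a real system should draw its salts from a proper crypto or OS random source.

```perl
use strict;
use warnings;
use Digest::SHA qw(sha512_hex);

# hash the salt and password together (prepending the salt is an
# illustrative choice, not Crypt::SaltedHash's actual format)
sub salted_hash {
    my ($password, $salt) = @_;
    return sha512_hex($salt . $password);
}

# make a 16-byte random salt; rand() is NOT cryptographically secure
# and is used here only to keep the sketch self-contained
sub make_salt {
    return join '', map { chr int rand 256 } 1 .. 16;
}

# recompute the hash with the stored salt and compare
sub check_password {
    my ($password, $salt, $stored_hash) = @_;
    return salted_hash($password, $salt) eq $stored_hash;
}

my $salt   = make_salt();
my $stored = salted_hash('password', $salt);   # store both $salt and $stored

print check_password('password', $salt, $stored) ? "correct\n" : "wrong\n";
print check_password('passw0rd', $salt, $stored) ? "correct\n" : "wrong\n";
```

Because each user gets a different salt, two users with the same password end up with different stored hashes, which is what defeats a precomputed rainbow table.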


Wednesday, May 23, 2012

Copy Windows Event Logs into an SQLite Database


For anyone who has to administer, troubleshoot, or audit Windows systems, the records stored in the Windows Event Log can be a treasure trove of information.  The one potential problem is that such logs often contain many thousands of entries, which can make finding information of interest a challenge.  The Perl script below uses the Win32::EventLog module to offload the records stored in the Event Log to an SQLite database, which can later be searched for pertinent information.  The script is as follows:

#!/usr/bin/perl

# Copyright 2012- Christopher M. Frenz
# This script is free software - it may be used, copied, redistributed, and/or modified
# under the terms laid forth in the Perl Artistic License

use DBI;
use Win32::EventLog; #v0.076 used for development
use strict;
use warnings;

my $server="\\\\192.168.1.2"; #put UNC name or IP address here

#creates/opens SQLite DB; RaiseError makes any database failure fatal
my $dbh = DBI->connect("dbi:SQLite:dbname=EventLogData.sql","","",
                       { RaiseError => 1, AutoCommit => 1 });

#Removes Table Data if it already exists and creates new Table Data
$dbh->do("DROP TABLE IF EXISTS Data");
$dbh->do("CREATE TABLE Data (Record INTEGER PRIMARY KEY, EventLog,Server,Time,Source,Message,EventID)");

#maps the numeric EventType to a readable name (stored in the Source column)
my %type = (1  => "ERROR",
            2  => "WARNING",
            4  => "INFORMATION",
            8  => "AUDIT_SUCCESS",
            16 => "AUDIT_FAILURE");

#prepare the insert statement once and reuse it for every record
my $insert = $dbh->prepare('INSERT INTO Data(Record,EventLog,Server,Time,Source,Message,EventID) VALUES (?,?,?,?,?,?,?)');

$Win32::EventLog::GetMessageText = 1;
my $i=0;
#processes System, Application, and Security Logs
#inserts each event log record into SQLite DB
$dbh->begin_work;   #a single transaction makes the bulk insert much faster
for my $eventlog ("System", "Application", "Security") {
    my $handle = Win32::EventLog->new($eventlog, $server)
        or die "Unable to open $eventlog log: $^E\n";
    $handle->GetNumber(my $recs)
        or die "Can't get number of EventLog records\n";
    $handle->GetOldest(my $base)
        or die "Can't get number of oldest EventLog record\n";
    my $j=0;
    while ($j < $recs) {
        $handle->Read(EVENTLOG_FORWARDS_READ|EVENTLOG_SEEK_READ,
                      $base+$j,
                      my $hashRef)
            or die "Can't read EventLog entry #$j\n";
        Win32::EventLog::GetMessageText($hashRef);
        my $time=scalar localtime($hashRef->{TimeGenerated});
        my $source=$type{$hashRef->{EventType}};
        my $message=$hashRef->{Message};
        my $eventID=($hashRef->{EventID} & 0xffff);
        $insert->execute($i,$eventlog,$server,$time,$source,$message,$eventID);
        $j++;
        $i++;
    }
}
$dbh->commit;

$dbh->disconnect();
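
The script stores $hashRef->{EventID} & 0xffff rather than the raw EventID.  As I understand it, the raw value follows the standard Windows message-identifier layout, in which the upper 16 bits carry severity, customer, and facility fields, while Event Viewer shows only the low 16 bits as the Event ID.  The following sketch decodes those fields for a hypothetical raw value; decode_event_id is an illustrative name, not part of Win32::EventLog.

```perl
use strict;
use warnings;

# split a raw 32-bit message/event identifier into its fields:
#   bits 31-30 severity, bit 29 customer flag,
#   bits 27-16 facility, bits 15-0 code (the displayed Event ID)
sub decode_event_id {
    my ($raw) = @_;
    return (
        severity => ($raw >> 30) & 0x3,
        customer => ($raw >> 29) & 0x1,
        facility => ($raw >> 16) & 0xFFF,
        code     => $raw & 0xFFFF,
    );
}

my %f = decode_event_id(0xC0001B61);   # hypothetical raw EventID
printf "%s=%d\n", $_, $f{$_} for qw(severity customer facility code);
# prints severity=3, customer=0, facility=0, code=7009 (one per line)
```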

To examine the output of the script, I would recommend the SQLite Database Browser (http://sqlitebrowser.sourceforge.net/), which provides a nice GUI for visually inspecting the contents of an SQLite database.  A screenshot showing a sample database created by the script can be seen below:

Tuesday, May 22, 2012

First Impressions - Padre the Perl IDE


I recently decided to give the Perl IDE Padre (http://padre.perlide.org/) a try and must admit that I was pleasantly surprised by its feature set and how intuitive it was to use.  While many experienced Perl programmers may not be likely to switch away from their longstanding preferences in development environments, I think that Padre is a great IDE to introduce a novice Perl programmer to (particularly since many newer programmers are trained to expect an IDE for programming).  At a basic level, the Padre interface is quite straightforward and any user would likely not find its basic features any more difficult to use than a text editor like gedit. 

Like most IDEs, Padre includes an integrated debugger with full support for Step In, Step Out, and Step Over functionality, as well as the ability to set breakpoints in the code.  Moreover, it includes a Display Value window that allows you to monitor the values of designated variables as you step through your code.  Padre also includes a Regex Editor that, I believe, would be particularly helpful to newcomers to Perl regular expressions, since it clearly spells out what various predefined sub-patterns (e.g. \w), POSIX character classes, and quantifiers correspond to.  The IDE also includes lots of smaller but still useful features, like the ability to comment out or uncomment an entire code block.  Additionally, while they are features I have not yet experimented with personally, Padre’s proposed integration with CPAN and the ability to extend Padre with plug-ins look quite promising. 

It is also interesting to note that Padre is written in Perl, and as such is truly a Perl IDE written by Perl programmers for Perl programmers.  For anyone interested in using an IDE to develop Perl applications, I would recommend that they give Padre a look.  I know I will be experimenting with it some more on some upcoming projects. 

For anyone interested in Perl IDEs other than (or in addition to) Padre, there are also Komodo IDE (http://www.activestate.com/komodo-ide) and EPIC (http://www.epic-ide.org/).  


A Brief Introduction to Input Validation


The importance of input validation should never be overlooked as a means of enhancing both the stability and security of your application, since input validation helps to ensure that your application only processes inputs that it was designed to process.  Ideally, any input supplied to your application should be treated as untrusted and passed through a validation routine to ensure that the data is the proper type and format required by your application.  This short tutorial will look at two basic approaches to input validation, one involving the concept of whitelisting and the other involving the concept of blacklisting. 

First we will consider a whitelist approach, whereby the whitelist consists of all of the inputs that are allowed to pass through to the application.  In other words, if the supplied input matches the criteria set forth in the whitelist, the input will be treated as valid and will be processed by the application.  If the input does not meet the criteria set forth in the whitelist, it will be considered invalid and will result in an error message rather than further processing.  As an example of this, let’s consider the following code snippet, which uses a regular expression to whitelist valid U.S. phone numbers:

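use strict;
use warnings;

my @inputs =('(555) 555-5555', '555-555-5555', '555 555-5555', '(xyz) abc-defg');
foreach my $input (@inputs){
  #^ and \z anchor the pattern so the entire input must be a phone number,
  #not merely contain one somewhere inside it
  if($input=~/^(\(?\d{3}\)?[-\s.]?\d{3}[-.]\d{4})\z/){
    print "$1 is a valid phone number\n";
    #execute code requiring phone number
  }
  else{print "invalid input\n";}
}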

If we were to execute this code snippet, we would see that the first 3 values of @inputs pass the whitelist filter and could thus be used for further processing by the application, whereas the 4th value of @inputs does not match the whitelist criteria and instead results in an “invalid input” error message.  This approach of whitelisting valid inputs is actually the preferred way to perform input validation and should be used wherever it is feasible to define the allowable inputs according to a precise set of values or a precise pattern, since whitelisting gives the application author very fine-grained control over which inputs will and will not be considered valid. 
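
Whitelisting against a precise set of values is often even simpler than matching a pattern.  Here is a minimal sketch that uses a hash as the whitelist; the "sort by" field and its allowed values are hypothetical.

```perl
use strict;
use warnings;

# hypothetical whitelist: the only values a "sort by" field may take
my %ALLOWED_SORT = (name => 1, date => 1, size => 1);

# true only if the input is exactly one of the allowed values
sub is_allowed_sort {
    my ($input) = @_;
    return defined $input && exists $ALLOWED_SORT{$input};
}

foreach my $input ('name', 'size', 'NAME', 'date; DROP TABLE files') {
    if (is_allowed_sort($input)) {
        print "$input is a valid sort field\n";
    }
    else { print "invalid input\n"; }
}
```

Anything not explicitly listed, including a differently capitalized value or an injection attempt, is rejected by default.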

While whitelisting is always the more secure approach, it is not always feasible, since possible inputs may be too varied to be readily packaged into a predefined set of inputs or a predefined pattern.  In cases where whitelisting is not practical, blacklisting can be used as an alternative.  Blacklisting does not focus on listing valid (allowable) inputs, but rather on listing inputs that should be considered invalid.  In a blacklisting approach, any input that matches the criteria set forth in the blacklist is considered invalid and will result in an error message, while any input that does not match the blacklist criteria is treated as valid and passed through for further processing.  Let’s take the hypothetical example of writing a piece of software that serves as the backend for a Web-based forum.  Due to the large variance in the types of content we may want to allow users to post, it may be difficult to whitelist valid forum posts.  We may, however, want to blacklist certain types of content to prevent our forum from turning into a potential XSS attack vector.  In this case, we may consider employing something like the following code snippet, which would blacklist any tagged content (including URL-encoded tags such as %3Cscript%3E):

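use strict;
use warnings;

my @inputs=('123','abc def ghi','<script>123</script>');
foreach my $input (@inputs){
  #matches < or %3C followed later by > or %3E; /i also catches %3c/%3e,
  #and /s lets .*? span line breaks inside a tag
  if($input=~/((%3C)|<).*?((%3E)|>)/is){
    print "invalid input\n";
  }
  else{
    print "$input is valid\n";
    #allow use of input
  }
}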

Note how the untagged content is considered valid, but the tagged content is considered invalid. 
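
The weakness of blacklisting is that it only stops what it lists.  As a minimal sketch (the sub name and inputs are hypothetical), the same tag blacklist happily passes an input that contains no tags at all, yet could still run script if it were placed into a link's href attribute.

```perl
use strict;
use warnings;

# the same tag blacklist as above, wrapped in a sub
sub is_blacklisted {
    my ($input) = @_;
    return $input =~ /(?:%3C|<).*?(?:%3E|>)/is ? 1 : 0;
}

# neither input contains a tag, so both pass the blacklist
for my $input ('hello world', 'javascript:alert(1)') {
    print is_blacklisted($input) ? "invalid input\n" : "$input is valid\n";
}
```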


Saturday, May 19, 2012

An Interesting Dichotomy - Closed Source Software Security


There have been many debates over the years as to the advantages and disadvantages of open source software products with regards to computer security, and this post is not meant to rehash the pros and cons of any of those debates.  In the interest of full disclosure, though, I am a proponent of the open source way of doing things and have always agreed with Eric S. Raymond’s conjecture that many eyes make all bugs shallow (even security bugs).  Personal beliefs aside, however, I have always felt that an interesting dichotomy exists in the belief system of those who favor closed source software in terms of security. 

If you ask any security professional about using a proprietary encryption algorithm, they will instead recommend that you use an established and vetted algorithm like AES.  Why?  In the field of cryptography, algorithms are only considered secure by the cryptographic community after public disclosure of the algorithm and extensive peer review.  Algorithms such as AES, Twofish, RSA, etc., are all public knowledge, and that has not lessened their security but has actually served to provide evidence of it.  No security professional would trust an algorithm that has not been through such a vetting process. 

Yet there are many security professionals who don’t seem to hold software to the same standards they would apply to a cryptographic algorithm, and instead consider security through obscurity a benefit in this case.  Why do some deem vetting and peer review so important for one and not for the other?  While I can understand the desire to keep things proprietary for business reasons, given the longer experience of the cryptographic community it is much harder to understand a security-related case for source code secrecy (outside of certain niche cases, like the algorithms used to generate the numbers for secure tokens, etc.). 

Friday, May 18, 2012

Be Sure to Escape Untrusted Data


Escaping is a method of rendering untrusted data non-executable by ensuring that the characters comprising the data are treated as data, and not as characters of significance, by the parser that will process the data.  As such, escaping is a common defense against cross-site scripting (XSS) attacks, whereby an attacker attempts to inject malicious JavaScript into a Web page.  For example, JavaScript is typically enclosed in a set of <script></script> tags, which identify the elements of the HTML page that need to be passed to the browser's JavaScript engine.  Escaping characters like <, >, ", ', /, and & into HTML entity encoded form (e.g. &lt; for <) allows the untrusted data to display as written, but prevents its execution.  To get a feel for how escaping works, consider the following Perl code snippet:

#!/usr/bin/perl

# Copyright 2012- Christopher M. Frenz
# This script is free software - it may be used, copied, redistributed, and/or modified
# under the terms laid forth in the Perl Artistic License

use HTML::EscapeEvil;
use strict;
use warnings;

#simulated input containing JavaScript
my $input=q{<script type="text/javascript">

   var d=new Date();
   document.write(d);

   </script>
};

#code to escape the tags from JavaScript input   
my $escape = HTML::EscapeEvil->new;
$escape->parse($input);
my $html = $escape->filtered_html;
$escape->clear;

#prints out escaped html output
print "<p>$html</p>";

In its original form, the script stored in the $input variable would be able to execute inside a browser and result in an output such as the following:

 Fri May 18 2012 15:44:24 GMT-0400 (Eastern Daylight Time) 

The escaped version is non-executable and would be output from the Perl script as follows:

<p>&lt;script type=&quot;text/javascript&quot;&gt;

   var d=new Date();
   document.write(d);

   &lt;/script&gt;</p>

If the resultant HTML was displayed in a browser, it would not be executed and would yield the following:

<script type="text/javascript"> var d=new Date(); document.write(d); </script>
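
For comparison, here is a minimal pure-Perl sketch of the entity escaping itself, covering the six characters listed at the start of this post.  The escape_html name is hypothetical; in real code, HTML::Entities' encode_entities or the module used above is likely a better choice.

```perl
use strict;
use warnings;

# map each significant character to its HTML entity
my %ENTITY = (
    '&' => '&amp;',
    '<' => '&lt;',
    '>' => '&gt;',
    '"' => '&quot;',
    "'" => '&#x27;',
    '/' => '&#x2F;',
);

# replace every significant character in a single pass, so the '&' in an
# entity that was just produced is never escaped a second time
sub escape_html {
    my ($text) = @_;
    $text =~ s{([&<>"'/])}{$ENTITY{$1}}g;
    return $text;
}

print escape_html('<script>alert("x")</script>'), "\n";
# prints &lt;script&gt;alert(&quot;x&quot;)&lt;&#x2F;script&gt;
```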




Wednesday, May 16, 2012

Are We Teaching Programming Wrong?


An issue that I have been wondering a lot about lately is whether people currently studying to be programmers, and the people who have to use the programs they write, are being done a huge disservice by those responsible for training them.  It seems that many colleges and vocational schools that teach computer programming focus heavily on teaching students to write code that is operational but not necessarily robust.  By that I mean that the resulting code will work in the sense that it properly executes the algorithm of interest, perhaps even efficiently, but little to no attention is often paid to other real-world essentials like error handling, input validation, sanitizing inputs, proper session handling, etc.  Producing code that achieves the desired function is important, but it is not the only thing that matters for the creation of a quality product.  It really leaves me to wonder whether security problems would be as widespread as they are now if developers were taught to deal with such issues as they learned to program.  While many schools now offer classes on secure application development, these classes are usually electives rather than a standardized part of the curriculum and, as such, many students may view the techniques taught as “add-ons” rather than essentials. 

In many cases, the importance and utility of such techniques could be emphasized without the need for more than a basic understanding of a programming language.  For example, if an application required a number between 1 and 10 as input, basic input validation could be illustrated with the addition of an if statement, and basic error handling with an else statement.  Sure, these are rudimentary ways of doing things, but the point is that they instill in would-be programmers the need for such techniques from the start of their education.  Approaching these topics early on would, I believe, contribute to greater awareness of such issues and better habits for dealing with them.  Of course, as the students' knowledge grows, so too could the sophistication of the techniques.  I think it is essential for all programmers to understand that a functional routine is an important milestone, but for any code that will be put into a production environment, proper functionality when given proper inputs is not enough.
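
As a sketch of the kind of beginner exercise described above, here is a minimal Perl version of the 1 to 10 check, using an if statement for validation and an else statement for error handling.  The validate_choice name and the sample inputs are hypothetical.

```perl
use strict;
use warnings;

# returns the input as a number if it is a whole number from 1 to 10,
# otherwise returns undef so the caller can report an error
sub validate_choice {
    my ($input) = @_;
    return unless defined $input;
    # only one or two digits: no signs, decimals, spaces, or words
    return unless $input =~ /\A\d{1,2}\z/;
    return unless $input >= 1 && $input <= 10;
    return $input + 0;
}

for my $input ('7', '10', '0', '11', 'seven', '3.5') {
    my $n = validate_choice($input);
    if (defined $n) {
        print "$input accepted\n";
    }
    else {
        print "$input rejected: please enter a whole number from 1 to 10\n";
    }
}
```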