Input validation should never be overlooked as a means of improving both the stability and the security of your application, since it helps ensure that the application only processes the inputs it was designed to process. Ideally, any input supplied to your application should be treated as untrusted and passed through a validation routine that checks that the data has the type and format the application requires. This short tutorial looks at two basic approaches to input validation: whitelisting and blacklisting.
First we will consider the whitelist approach, in which the whitelist describes all of the inputs that are allowed to pass through to the application. In other words, if the supplied input matches the criteria laid out in the whitelist, it is treated as valid and processed by the application. If the input does not meet those criteria, it is considered invalid and results in an error message rather than further processing. As an example, let’s consider the following code snippet, which uses a regular expression to whitelist valid U.S. phone numbers:
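my @inputs = ('(555) 555-5555', '555-555-5555', '555 555-5555', '(xyz) abc-defg');
foreach my $input (@inputs) {
    # Whitelist: optional space, optional parentheses around the area code,
    # then a 3-digit prefix and a 4-digit line number.
    if ($input =~ /(\s?\(?\d{3}\)?[-\s.]?\d{3}[-.]\d{4})/) {
        print "$1 is a valid phone number\n";
        # execute code requiring phone number
    }
    else {
        print "invalid input\n";
    }
}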
If we were to execute this code snippet, we would see that the first three values of @inputs pass the whitelist filter and could thus be used for further processing by the application, whereas the fourth value does not match the whitelist criteria and instead results in an “invalid input” error message. Whitelisting valid inputs is the preferred way to perform input validation and should be used wherever it is feasible to define the allowable inputs according to a precise set of values or a precise pattern, since it gives the application author fine-grained control over which inputs are considered valid and which are not.
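The two whitelist forms mentioned here, a precise set of values and a precise pattern, could be sketched roughly as follows. This is only an illustration: the helper names is_allowed_format and is_us_phone and the list of allowed formats are made up for this example. The pattern is a variant of the one in the snippet above, anchored with \A and \z (and without the optional leading space) so that the entire input has to match rather than just part of it.

```perl
use strict;
use warnings;

# Hypothetical whitelist of exact values (invented for illustration).
my %ALLOWED_FORMAT = map { $_ => 1 } qw(text html json);

# Whitelist by exact value: only inputs present in the set are accepted.
sub is_allowed_format {
    my ($input) = @_;
    return exists($ALLOWED_FORMAT{$input}) ? 1 : 0;
}

# Whitelist by pattern: \A and \z anchor the match to the whole string,
# so leading or trailing text around a phone number is rejected.
sub is_us_phone {
    my ($input) = @_;
    return ($input =~ /\A\(?\d{3}\)?[-\s.]?\d{3}[-.]\d{4}\z/) ? 1 : 0;
}

for my $input ('(555) 555-5555', '555-555-5555', 'garbage (555) 555-5555') {
    printf "%-24s %s\n", $input, is_us_phone($input) ? 'valid' : 'invalid';
}
```

With the anchors in place, the first two inputs are reported as valid and the third as invalid. \z is used instead of $ because $ also matches just before a trailing newline.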
While whitelisting is generally the more secure approach, it is not always feasible, since the possible inputs may be too varied to be captured by a predefined set of values or a predefined pattern. In cases where whitelisting is not practical, blacklisting can be used as an alternative. Blacklisting does not list the valid (allowable) inputs, but instead lists the inputs that should be considered invalid. In a blacklisting approach, any input that matches the criteria laid out in the blacklist is considered invalid and results in an error message, while any input that does not match the blacklist criteria is treated as valid and passed through for further processing.
Let’s take the hypothetical example of writing the backend for a Web-based forum. Because the content we may want to allow users to post varies so widely, it may be difficult to whitelist valid forum posts. We may, however, want to blacklist certain types of content to prevent our forum from becoming an XSS attack vector. In this case, we might employ something like the following code snippet, which blacklists any tagged content:
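my @inputs = ('123', 'abc def ghi', '<script>123</script>');
foreach my $input (@inputs) {
    # Blacklist: a literal or URL-encoded "<" followed later by a literal or URL-encoded ">"
    if ($input =~ /((\%3C)|<).*?((\%3E)|>)/) {
        print "invalid input\n";
    }
    else {
        print "$input is valid\n";
        # allow use of input
    }
}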
Note how the untagged content is considered valid, while the tagged content is considered invalid.
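To see how such a blacklist behaves on less typical inputs, the same pattern can be wrapped in a small function and run against a few extra cases. This is only a sketch: the function name is_blacklisted and the URL-encoded test strings are additions made for illustration and are not part of the original snippet.

```perl
use strict;
use warnings;

# Hypothetical helper wrapping the blacklist pattern from the snippet above.
# Returns 1 when the input matches the blacklist (and should be rejected).
sub is_blacklisted {
    my ($input) = @_;
    return ($input =~ /((\%3C)|<).*?((\%3E)|>)/) ? 1 : 0;
}

my @cases = (
    'abc def ghi',            # ordinary text
    '<script>123</script>',   # literal tag
    '%3Cscript%3E123',        # URL-encoded tag, uppercase hex digits
    '%3cscript%3e123',        # URL-encoded tag, lowercase hex digits
);

for my $case (@cases) {
    printf "%-24s %s\n", $case, is_blacklisted($case) ? 'rejected' : 'accepted';
}
```

The literal tag and the uppercase-encoded tag are rejected, but the lowercase-encoded variant is accepted, because the pattern is case-sensitive and only looks for %3C and %3E. Adding the /i modifier would close this particular gap, but the example shows the general weakness of blacklisting: it only blocks the inputs its author thought to list.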
3 comments:
In general, I think that one should anchor the regex to both the beginning and the end of the string using ^ and $, especially when whitelisting. As written, the input string 'garbage (555) 555-5555' is accepted.
Although in this case $1 will only hold ' (555) 555-5555', I think it would be better to emphasize this.
Definitely a good point and a practice that I agree with. I was just trying to demonstrate the difference between a blacklisting and a whitelisting approach more so than creating the “best possible” filter, but it really always pays to be rigorous. Your post also demonstrates why it is important to test any security feature not just with typical use cases, but also with some “abuse” cases that attempt to bypass those security features.
This is a great point to understand and execute easily.