Below is a simple Perl script for counting how many times each
word of interest appears in a text file (Input.txt). The words of interest are specified in the
regular expression stored in the $bucket variable, and the number of occurrences
of each word is written to a file called WordFreqs.txt. To illustrate its use, let’s consider the
following as the contents of the Input.txt file:
red green yellow blue
red
red red yellow
yellow green blue blue green
green red
Now we can run that file through the following Perl script:
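#!/usr/bin/perl
# Copyright 2012- Christopher M. Frenz
# This script is free software - it may be used, copied,
# redistributed, and/or modified
# under the terms laid forth in the Perl Artistic License

# Sort comparator: highest count first
sub by_count {
    $count{$b} <=> $count{$a};
}

open(INPUT, "<Input.txt") or die "Cannot open Input.txt: $!";
open(OUTPUT, ">WordFreqs.txt") or die "Cannot open WordFreqs.txt: $!";

# Words of interest, written as a regular expression alternation
$bucket='red|blue|green';

while(<INPUT>){
    @words = split(/\s+/);
    foreach $word (@words){
        # /i ignores case; /o compiles the pattern only once
        if($word=~/($bucket)/io){
            $count{$1}++;
        }
    }
}

# Write the counts, most frequent word first
foreach $word (sort by_count keys %count) {
    print OUTPUT "$word occurs $count{$word} times\n";
}

close INPUT;
close OUTPUT;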
which will yield a WordFreqs.txt file with the following
contents:
red occurs 5 times
green occurs 4 times
blue occurs 3 times
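One caveat with the pattern as written: it is not anchored, so a token such as "reddish" would be counted under "red", and because of /i the hash key keeps whatever capitalization appeared in the input, so "Red" and "red" would be tallied separately. A minimal sketch of a stricter variant follows, assuming whole-word, case-insensitive counts are what is wanted. The count_words subroutine and the sample lines are hypothetical names and data used only for illustration; tokens with attached punctuation (for example "red,") would not match here.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: count whole-word, case-insensitive matches
# of the words in $bucket across the given lines of text.
sub count_words {
    my ($bucket, @lines) = @_;
    my %count;
    foreach my $line (@lines) {
        foreach my $word (split ' ', $line) {
            # ^ and $ anchor the match so "reddish" is not counted as "red";
            # lc() folds "Red" and "red" into a single key
            if ($word =~ /^($bucket)$/i) {
                $count{lc $1}++;
            }
        }
    }
    return %count;
}

# Sample input (made up for this sketch)
my %count = count_words('red|blue|green', 'Red reddish red', 'blue GREEN');

# Highest count first; ties broken alphabetically
foreach my $word (sort { $count{$b} <=> $count{$a} || $a cmp $b } keys %count) {
    print "$word occurs $count{$word} times\n";
}
```

With the sample lines above, "red" is counted twice, "blue" and "green" once each, and "reddish" is ignored.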
6 comments:
Thank you so much, this was really useful. Is there any way I can use wildcards with this? I need to count .xml tags in a file and need to know how many times each one appears, so the $bucket variable would be something like "<* *>", but it's only counting how many times the "<" appears. Is there any way to make it count the expressions between <>?
You could theoretically use a modification such as the following to perform this:
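#!/usr/bin/perl
my $XML='<tag1><tag2></tag2><tag2></tag2></tag1>';
# Capture whatever sits between each '<' and the next '>' (non-greedy);
# closing tags are counted under their own keys, e.g. "/tag2"
while($XML=~/<(.*?)>/g){
    $count{$1}++;
}
# Print each captured tag with its count (hash order is not sorted)
while( my ($key,$value)=each(%count)){
    print "$key => $value\n";
}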
However, this is the type of situation in which you would probably be better served by an XML parsing module than by regular expressions. One example of an XML parser is XML::LibXML.
I'm new to Perl and programming in general, so I have no idea how to use that. But anyway, I'll try your code. In case it doesn't work, I'll try an XML parser. Thank you very much again.
No luck
Could you give me an example of how to do this with an XML parser?
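A parser-based version of the tag count might look like the sketch below. It assumes the XML::LibXML module is installed from CPAN (it is not part of core Perl) and reuses the sample string from the regex example; to read a file instead, load_xml also accepts location => 'yourfile.xml', where the file name is a placeholder.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use XML::LibXML;   # CPAN module, not part of core Perl

my $xml = '<tag1><tag2></tag2><tag2></tag2></tag1>';

# Parse the string into a document tree
my $doc = XML::LibXML->load_xml(string => $xml);

# '//*' is an XPath expression that selects every element in the document
my %count;
foreach my $node ($doc->findnodes('//*')) {
    $count{ $node->nodeName }++;
}

# Print the element names alphabetically with their counts
foreach my $tag (sort keys %count) {
    print "$tag => $count{$tag}\n";
}
```

Because the parser counts elements rather than pieces of text, closing tags are not tallied as separate entries the way they are with the regular expression.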
Solved it, thanks for your help. I got it parsing everything between '<' and '>'.
Any idea on how to make it recursive?
http://stackoverflow.com/questions/16689082/recursive-open-files-in-perl
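Assuming "recursive" here means running the tag count over every file beneath a directory, as in the linked question, a minimal sketch using the core File::Find module might look like the following. The count_tags_under subroutine name, the .xml suffix filter, and the command-line directory argument are all assumptions made for illustration. Tags that span more than one line would be missed, since the file is read line by line.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use File::Find;

# Hypothetical helper: walk $dir recursively and count everything found
# between '<' and '>' in each file whose name ends in .xml.
sub count_tags_under {
    my ($dir) = @_;
    my %count;
    find(sub {
        # find() sets $_ to the current file name inside its directory
        return unless -f $_ && /\.xml$/i;   # assumed filter: only *.xml files
        open(my $fh, '<', $_) or do {
            warn "Cannot open $File::Find::name: $!";
            return;
        };
        while (my $line = <$fh>) {
            # Regex approach: capture the text between '<' and '>'
            while ($line =~ /<(.*?)>/g) {
                $count{$1}++;
            }
        }
        close $fh;
    }, $dir);
    return %count;
}

# Start directory from the command line, defaulting to the current one
my %count = count_tags_under(shift(@ARGV) // '.');

# Highest count first; ties broken alphabetically
foreach my $tag (sort { $count{$b} <=> $count{$a} || $a cmp $b } keys %count) {
    print "$tag => $count{$tag}\n";
}
```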