LINUX GAZETTE

"Linux Gazette...making Linux just a little more fun!"


Learning Perl, part 5

By Ben Okopnik



What is the sound of Perl? Is it not the sound of a wall that people have stopped banging their heads against?
 -- Larry Wall

Overview

This month, we're going to cover some general Perl issues, look at a way to use it in Real Life(tm), and take a quick look at a mechanism that lets you leverage the power of O.P.C. - Other People's Code :). Modules let you plug chunks of pre-written code into your own scripts, saving you hours - or even days - of programming. That will wrap up this introductory series, hopefully leaving you with enough of an idea of what Perl is to write some basic scripts, and perhaps a desire to explore further.
 

A Quick Correction

One of our readers, A. N. Onymous (he didn't want to give me his name; I guess he decided that fame wasn't for him...), wrote in regarding a statement that I made in last month's article - that "close" without any parameters closes all filehandles. After cranking out a bit of sample code and reading the docs a bit more closely, I found that he was right: it only closes the currently selected filehandle (STDOUT by default). Thanks very much - well spotted!
 

Exercises

In the last article, I suggested a couple of script ideas that would give you some practice in using what you'd learned previously. One of the people who sent in a script was Tjabo Kloppenburg; a brave man. :) No worries, Tjabo; either you did a good job, or you get to learn a few things... it's a win-win situation.

The idea was to write a script that read "/etc/services", counted UDP and TCP ports, and wrote them out to separate files. Here was Tjabo's solution (my comments are preceded with a '###'):


#!/usr/bin/perl -w
### Well done; let the computer debug your script!

$udp = $tcp = 0;
### Unnecessary: Perl does not require variable declaration.

# open target files:
open (TCP, ">>tcp.txt") or die "Arghh #1 !";
open (UDP, ">>udp.txt") or die "Arghh #2 !";

### My fault here: in a previous article, I showed a quick hack in which
### I used similar wording for the "die" string. Here is the proper way
### to use it:
###
### open TCP, ">tcp.txt" or die "Can't open tcp.txt: $!\n";
###
### The '$!' variable gives the error returned by the system, and should
### definitely be used; the "\n" at the end of the "die" string prevents
### the error line number from being printed. Also, the ">>" (append)
### modifier is inappropriate: this will cause anything more than one
### execution of the script to append (rather than overwrite) the
### contents of those files.

# open data source:
open (SERV, "</etc/services") or die "Arghh #3 !";

while( <SERV> ) {
  if (/^ *([^# ]+) +(\d+)\/([tcpud]+)/) {

### The above regex has several problems, some of them minor
### (unnecessary elements) and one of them critical: it actually misses
### out on most of the lines in "/etc/services". The killer is the ' +'
### that follows the first capture: "/etc/services" uses a mix of spaces
### and *tabs* to separate its elements.

    $name   = $1;
    $port   = $2;
    $tcpudp = $3;
    $tmp = "$name ($port)\n";

### The above assignments are unnecessary; $1, $2, etc. will keep their
### values until the next successful match. Removing all of the above
### and rewriting the "if" statement below as
###
### if ( $3 eq "udp" ) { print UDP "$1 ($2)\n"; $udp++; }
###
### would work just fine.

    if ($tcpudp eq "udp") {
      print UDP $tmp;
      $udp++;
    }

    if ($tcpudp eq "tcp") {
      print TCP $tmp;
      $tcp++;
    }
  }
}

# just learned :-) :
for ( qw/SERV TCP UDP/ ) { close $_ or die "can't close $_: $!\n"; }

print "TCP: $tcp, UDP: $udp\n";


The above script counted 14 TCPs and 11 UDPs in my "/etc/services" (which actually contains 185 of one and 134 of the other). Let's see if we can improve it a bit:



#!/usr/bin/perl -w

open SRV, "</etc/services" or die "Can't read /etc/services: $!\n";
open TCP, ">tcp.txt"       or die "Can't write tcp.txt: $!\n";
open UDP, ">udp.txt"       or die "Can't write udp.txt: $!\n";

for ( <SRV> ) {
    if ( s=^([^# ]+)(\s+\d+)/tcp.*$=$1$2= ) { print TCP; $tcp++; }
    if ( s=^([^# ]+)(\s+\d+)/udp.*$=$1$2= ) { print UDP; $udp++; }
}

close $_ or die "Failed to close $_: $!\n" for qw/SRV TCP UDP/;

print "TCP: $tcp\t\tUDP: $udp\n";



In the "for" loop, where all the 'real' work is done, I perform the following matches/substitutions:

Starting at the beginning of the line, (begin capture into $1) match any character that is not a '#' or a space and occurs one or more times (end capture). (Begin capture into $2) Match any whitespace character that occurs one or more times that is followed by one or more digits (end capture), a forward slash, and the string 'tcp' followed by any number of any character to the end of the line. Replace the matched string (i.e., the entire line) with $1$2 (which contain the name of the service, whitespace, and the port number.) Write the result to the TCP filehandle, and increment the "$tcp" variable.

Repeat for 'udp'.

Note that I used the '=' symbol for the delimiter in the 's///' function. '=' has no particular magic about it; it's just that I was trying to avoid conflict with the '/' and the '#' characters which appear as part of the regex (those being two commonly used delimiters), and there was a sale on '=' at the neighborhood market. :) Any other character or symbol would have done as well.
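To see the difference between the two patterns without touching the real file, here is a minimal sketch that runs both against two made-up lines in the style of "/etc/services", one separated by tabs and one by spaces. The sample lines and the sub names "service_for" and "spaces_only" are assumptions for illustration only; the sketch also uses "m{...}" as the delimiter instead of '=', which is just another choice of delimiter.

```perl
#!/usr/bin/perl -w
use strict;

# Hypothetical sample lines in the style of /etc/services:
# the first is separated by tabs, the second by spaces.
my @lines = ( "ftp\t\t21/tcp\n", "domain  53/udp  nameserver\n" );

# Return the "name<whitespace>port" part of a line for the given
# protocol, or undef if the line does not match.
sub service_for {
    my ( $line, $proto ) = @_;
    return $line =~ m{^([^# ]+)(\s+\d+)/\Q$proto\E} ? "$1$2" : undef;
}

# The spaces-only pattern from the first script, for comparison.
sub spaces_only {
    my ($line) = @_;
    return $line =~ /^ *([^# ]+) +(\d+)\/([tcpud]+)/ ? "$1 ($2)" : undef;
}

for my $line (@lines) {
    my $new = service_for( $line, "tcp" ) || service_for( $line, "udp" );
    my $old = spaces_only($line);
    print defined $old ? "old: $old" : "old: no match", " | new: $new\n";
}
```

The spaces-only pattern finds nothing in the tab-separated line, while the '\s+' version handles both.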
 

Here are a couple of simple solutions for the other two problems:

1. Open two files and exchange their contents.



#!/usr/bin/perl -w
# The files whose contents are to be exchanged are named "a" and "b".

for ( A, B ) { open $_, "<\l$_" or die "Can't open \l$_: $!\n"; }
@a = <A>; @b = <B>;

for ( A, B ) { open $_, ">\l$_" or die "Can't open \l$_: $!\n"; }
print A @b; print B @a;



Pretty conservative, basic stuff. A minor hack: I used the '\l' modifier to set the filename to lowercase. Note that re-opening a filehandle closes it automatically - you don't have to close a handle between different "open"s. Also, explicitly closing a file isn't always necessary: Perl will close the handles for you on script exit (but be aware that some OSs have been reported as leaving them open.) By the way, the current version of Perl (5.6.1) has a neat mechanism that helps you do what I did above, but far more gracefully:


...
$FN = "/usr/X11R6/include/X11/Composite.h";
open FN or die "I choked on $FN: $!\n";
# "FN" is now open as a filehandle to "Composite.h".
...


All the distros with which I'm familiar currently come with Perl version 5.005_03 installed. I suggest getting 5.6.1 from CPAN (see below) and installing it; different versions of Perl coexist quite happily on the same machine. (Note that replacing an installed version by anything other than a distro package can be rather tricky, given how much system stuff depends on Perl.)

I'm sure that a number of folks figured out that renaming the files would produce the same result. That wasn't the point of the exercise... but here's a fun way to do that:



#!/usr/bin/perl -w
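# Ordered (from, to) pairs; "$$.$$" is a temporary name built from the process ID.
@moves = ( "a", "$$.$$", "b", "a", "$$.$$", "b" );
rename $x, $y while ($x, $y) = splice @moves, 0, 2;


Here, I created a list of the filenames and a temporary name - "$$" in Perl, just as in the shell, is the current process ID, and "$$.$$" is almost certainly a unique filename - and cycled through it two items at a time with "splice", which removes and returns elements from an array. Two details matter: the temporary name has to be in a double-quoted string, since "qw//" does not interpolate variables, and the pairs have to be kept in an ordered list, since a hash walked with "each" does not guarantee that the three renames happen in the right order. I suppose you could call it "round-robin renaming"...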
 

2. Read "/var/log/messages" and print out any line that contains the words "fail", "terminated/terminating", or " no " in it. Make it case-insensitive.

This one is an easy one-liner:



perl -wne 'print if /(fail|terminat(ed|ing)| no )/i' /var/log/messages


The interesting part there is the "alternation" mechanism in the match: a pattern like "(abc|def|ghi)" matches any line that contains at least one of the listed alternatives.
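The same match can be tried out on a handful of lines without reading the real log. This is only a sketch: the log lines are invented for the example, and the sub name "interesting" is an assumption, not something from the one-liner.

```perl
#!/usr/bin/perl -w
use strict;

# True if a line mentions a failure, a termination, or the word " no ".
sub interesting {
    my ($line) = @_;
    return $line =~ /(fail|terminat(ed|ing)| no )/i ? 1 : 0;
}

# Hypothetical log lines, made up for this example.
my @log = (
    "kernel: Mount FAILED on /dev/hdb1",
    "pppd: Connection terminated.",
    "sshd: session opened for user ben",
    "named: there is no route to host",
);

# Print only the lines that match one of the alternatives.
print "$_\n" for grep { interesting($_) } @log;
```

Note that the nested group "terminat(ed|ing)" is itself an alternation, so one branch of the outer pattern covers both word endings.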
 

Building Quick Tools

A few days ago, I needed to convert a text file into its equivalent in phonetic alphabet - a somewhat odd requirement. There may or may not have been a program to do this, but I figured I could write my own in less time than it would take me to find one:

1) I grabbed a copy of the phonetic alphabet from the Web and saved it to a file. I called the file "phon", and it looked like this:

Alpha
Bravo
Charlie
Delta
Echo
Foxtrot
Golf
...

2) Then, I issued the following command:



perl -i -wple's/^(.)(.*)$/\t"\l$1" => "$1$2",/' phon


Ta-daa! Magic. (See below for a breakdown of the substitute operation.) The file now looked like this:

        "a" => "Alpha",
        "b" => "Bravo",
        "c" => "Charlie",
        "d" => "Delta",
        "e" => "Echo",
        "f" => "Foxtrot",
        "g" => "Golf",
        ...

3) A few seconds later, I had the tool that I needed - a script with exactly one function and one data structure in it:



#!/usr/bin/perl -wlp
# Created by Benjamin Okopnik on Sun May 27 13:07:49 2001

s/([a-zA-Z])/$ph{"\l$1"} /g;

BEGIN {
    %ph = (
        "a" => "Alpha",
        "b" => "Bravo",
        "c" => "Charlie",
        "d" => "Delta",
        "e" => "Echo",
        "f" => "Foxtrot",
        "g" => "Golf",
        "h" => "Hotel",
        "i" => "India",
        "j" => "Juliet",
        "k" => "Kilo",
        "l" => "Lima",
        "m" => "Mike",
        "n" => "November",
        "o" => "Oscar",
        "p" => "Papa",
        "q" => "Quebec",
        "r" => "Romeo",
        "s" => "Sierra",
        "t" => "Tango",
        "u" => "Uniform",
        "v" => "Victor",
        "w" => "Whisky",
        "x" => "X-ray",
        "y" => "Yankee",
        "z" => "Zulu",
    );
}



The above script will accept either keyboard input or a file as a command-line argument, and return the phonetic alphabet equivalent of the text.

This is one of the most common ways I use Perl - building quick tools that I need to do a specific job. Other people may have other uses for it - after all, TMTOWTDI [1] - but for me, a computer without Perl is only half-useable. To drive the point even further home, a group of Perl Wizards have rewritten most of the system utilities in Perl - take a look at <http://language.perl.com/ppt/> - and have fixed a number of annoying quirks in the process. As I understand it, they were motivated by the three chief virtues of the programmer: Laziness, Impatience, and Hubris (if that confuses you, see the Camel Book ["Programming Perl, Third Edition"] for the explanation). If you want to see well-written Perl code, there are very few better places. Do note that the project is not yet complete, but a number of Unices are already catching on: Solaris 8 has a large number of Perl scripts as part of the system executables, and doing a

file /sbin/* /usr/bin/* /usr/sbin/*|grep -c perl

shows at least the Debian "potato" distro as having 82 Perl scripts in the above directories.
 

OK, now for the explanation of the two s///'s above. First, the "magic" converter:

perl -i -wple's/^(.)(.*)$/\t"\l$1" => "$1$2",/' phon

The "-i", "-w", "-p", and "-e" switches were described in the second part of this series; as a quick overview, this will edit the contents of the file in place by looping through it and acting on each line. The Perl "warn" mechanism is enabled, and the script to be executed is given on the command line. The "-l" enables automatic line-end processing: it strips the newline from each line as it is read, and adds it back when the line is printed. The substitution regex goes like this:

Starting at the beginning of the line, (begin capture into $1) match one character (end capture, begin capture into $2). Capture any number of any character (end capture) to the end of the line.

The replacement string goes like this:

Print a tab, followed by the contents of $1 in lowercase* and surrounded by double quotes. Print a space, the '=>' digraph, another space, $1$2 surrounded by double quotes and followed by a comma.

* This is done by the "\l" 'lowercase next character' operator (see 'Quote and Quote-like Operators' in the "perlop" page.)
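For anyone who would rather experiment inside a script than edit a file in place, the same substitution can be wrapped in a small sub. This is a sketch only; the sub name "to_entry" is made up for the example, and the word list is the first three entries of the "phon" file shown earlier.

```perl
#!/usr/bin/perl -w
use strict;

# Turn a word such as "Alpha" into a hash-entry line:
# a tab, the lowercased first letter in quotes, ' => ', the word in quotes, a comma.
sub to_entry {
    my ($word) = @_;
    ( my $entry = $word ) =~ s/^(.)(.*)$/\t"\l$1" => "$1$2",/;
    return $entry;
}

print to_entry($_), "\n" for qw/Alpha Bravo Charlie/;
```

The "( my $entry = $word ) =~ s///" idiom copies the word first and then modifies the copy, so the original value is left alone.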
 

The second one is also worth studying, since it points up an interesting feature - that of using a hash value (including modifying the key "on the fly") in a substitution, a very useful method:

s/([a-zA-Z])/$ph{"\l$1"} /g;

First, the regex:

(Begin capture into $1) Match any character in the 'a-zA-Z' range (end capture).

Second, the replacement string:

Return a value from the "%ph" hash by using the lowercase version of the contents of $1 as the key, followed by a space.
 

The BEGIN { ... } block makes populating the hash a one-time event, despite the fact that the script may loop thousands of times. The mechanism here is the same as in Awk, and was mentioned in the previous article. So, all we do is use every character as a key in the "%ph" hash, and print out the value associated with that key.

Hashes are very useful structures in Perl, and are well worth studying and understanding.
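As a variation on the hash-lookup substitution, the sketch below uses the "/e" modifier, which evaluates the replacement as Perl code, so that characters without an entry in the hash are passed through unchanged. The three-entry table and the sub names "spell" and "word_for" are assumptions for illustration; this is not the script shown above, just one way the same idea could be extended.

```perl
#!/usr/bin/perl -w
use strict;

# A three-entry stand-in for the full %ph table.
my %ph = ( a => "Alpha", b => "Bravo", c => "Charlie" );

# Look up one letter; return its word plus a space, or the
# character itself if the table has no entry for it.
sub word_for {
    my ($char) = @_;
    my $key = lc $char;
    return exists $ph{$key} ? "$ph{$key} " : $char;
}

# Replace every letter in the text using the lookup above.
sub spell {
    my ($text) = @_;
    $text =~ s/([a-zA-Z])/ word_for($1) /ge;
    return $text;
}

print spell("Cab"), "\n";
```

Checking with "exists" avoids "uninitialized value" warnings under "-w" when the table is incomplete.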
 

Modular Construction

One of the wonderful things about Perl - really, the thing that makes it a living, growing language - is the community that has grown up around it. A number of these folks have contributed useful chunks of code that are made to be re-used; that, in fact, make Perl one of the most powerful languages on the planet.

Imagine a program that goes out on the Web, connects to a server, retrieves the weather data - either the current or the forecast - for your city, and prints the result to your screen. Now, imagine this entire Perl script taking just one line.

perl -MGeo::WeatherNOAA -we 'print print_forecast( "Denver", "CO" )'

That's it. The whole thing. How is it possible?

(Note that this will not work unless you have the 'Geo::WeatherNOAA' module installed on your system.)

The CPAN (Comprehensive Perl Archive Network) is your friend. :) If you go to <http://cpan.org/> and explore, you'll find lots and lots (and LOTS) of modules designed to do almost every programming task you could imagine. Do you want your Perl script converted to Klingon (or Morse code)? Sure. Would you like to pull up your stock's performance from Deutsche Bank Gruppe funds? Easy as pie. Care to send some SMS text messages? No problem! With modules, these are short, easy tasks that can be coded in literally seconds.

The standard Perl distribution comes with a number of useful modules (for short descriptions of what they do, see "Standard Modules" in 'perldoc perlmodlib'); one of them is the CPAN module, which automates the module downloading, unpacking, building, and installation process. To use it, simply type

perl -MCPAN -eshell

and follow the prompts. The manual process, which you should know about just in case there's some complication, is described on the "How to install" page at CPAN, <http://cpan.org/modules/INSTALL.html>. I highly recommend reading it. The difference between the two processes, by the way, is exactly like that of using "apt" (Debian) or "rpm" (RedHat) and trying to install a tarball by hand: 'CPAN' will get all the prerequisite modules to support the one you've requested, and do all the tests and the installation, while doing it manually can be rather painful. For specifics of using the CPAN module - although the above syntax is the way you'll use it 99.9% of the time - just type

perldoc CPAN

The complete information for any module installed on your system can be accessed the same way.

As you've probably guessed by now, the "-M" command line switch tells Perl to use the specified module. If we want to have that module in a script, here's the syntax:



#!/usr/bin/perl -w

use Finance::Quote;

$q = Finance::Quote->new;
my %stocks = $q->fetch("nyse","LNUX");
print "$k: $v\n" while ($k, $v) = each %stocks;



The above program (you'll need to install the "Finance::Quote" module for it to work) tells me all about VA Linux on the New York Stock Exchange. Not bad for five lines of code.

The above is an example of the object-oriented style of module, the type that's becoming very common. After telling Perl to use the module, we create a new instance of an object from the "Finance::Quote" class and assign it to $q. We then call the "fetch" method (the methods are listed in the module's documentation) with "nyse" and "LNUX" as arguments, and print the results stored in the returned hash.
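To make the "create an object, then call a method on it" pattern concrete without needing a network connection or an installed module, here is a minimal sketch with a made-up class defined in the same file. "Greeter", its "name" argument, and its "greet" method are all hypothetical; they have nothing to do with "Finance::Quote" beyond following the same calling convention.

```perl
#!/usr/bin/perl -w
use strict;

# A made-up class, defined inline so the example needs no installed module.
package Greeter;

# Constructor: store the given name (or a default) in a blessed hash.
sub new {
    my ( $class, %args ) = @_;
    return bless { name => $args{name} || "world" }, $class;
}

# Method: build a greeting from the stored name.
sub greet {
    my ($self) = @_;
    return "Hello, $self->{name}!";
}

package main;

my $g = Greeter->new( name => "Tjabo" );    # create an instance, like Finance::Quote->new
print $g->greet, "\n";                      # call a method on it, like $q->fetch(...)
```

A real module keeps its package in a separate ".pm" file and is pulled in with "use", but the arrow syntax for the constructor and the methods is the same.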

A lot of modules are of the so-called exporting style; these simply provide additional functions when "plugged in" to your program.



#!/usr/bin/perl -w
use LWP::Simple;

$code = mirror( "http://slashdot.org", "slashdot.html" );
print "Slashdot returned a code of $code.\n";



In this case, "mirror" is a new function that came from the LWP::Simple module. Somewhat obviously, it will copy ("mirror") a given web page to a specified file, and return the code (e.g., '404' for 'RC_NOT_FOUND').
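The exporting style can also be tried offline. The sketch below assumes a Perl recent enough to bundle the "List::Util" module (it ships with current releases); "sum" and "max" are functions that module exports on request, and the port numbers are arbitrary sample data.

```perl
#!/usr/bin/perl -w
use strict;
use List::Util qw(sum max);    # ask the module to export just these two functions

# Arbitrary sample data for the example.
my @ports = ( 21, 22, 25, 80 );

# "sum" and "max" can now be called as if they were built in.
print "Sum: ", sum(@ports), "  Largest: ", max(@ports), "\n";
```

Listing the wanted names after the module name keeps everything else the module offers out of your program's namespace.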
 

Wrapping It Up

Well, that was a quick tour through a few interesting parts of Perl. Hopefully, this has whetted a few folks' tastebuds for more, and has shown some of its capabilities. If you're interested in extending your Perl knowledge, here are some recommendations for reading material:

Learning Perl, 3rd Edition (coming out in July)
Randal Schwartz and Tom Phoenix

Programming Perl, 3rd Edition
Larry Wall, Tom Christiansen & Jon Orwant

Perl Cookbook
By Tom Christiansen & Nathan Torkington

Data Munging with Perl
By David Cross

Mastering Algorithms with Perl
By Jon Orwant, Jarkko Hietaniemi & John Macdonald

Mastering Regular Expressions
By Jeffrey E. F. Friedl

Elements of Programming with Perl
By Andrew Johnson
 

Good luck with your Perl programming - and happy Linuxing!
 

Ben Okopnik
perl -we'print reverse split//,"rekcah lreP rehtona tsuJ"'



1. "There's More Than One Way To Do It" - the motto of Perl. I find it applicable to all of Unix, as well.

References:

Relevant Perl man pages (available on any pro-Perl-y configured system):

perl      - overview              perlfaq   - Perl FAQ
perltoc   - doc TOC               perldata  - data structures
perlsyn   - syntax                perlop    - operators/precedence
perlrun   - execution             perlfunc  - builtin functions
perltrap  - traps for the unwary  perlstyle - style guide

"perldoc", "perldoc -q" and "perldoc -f"

Ben Okopnik

A cyberjack-of-all-trades, Ben wanders the world in his 38' sailboat, building networks and hacking on hardware and software whenever he runs out of cruising money. He's been playing and working with computers since the Elder Days (anybody remember the Elf II?), and isn't about to stop any time soon.


Copyright © 2001, Ben Okopnik.
Copying license http://www.linuxgazette.net/copying.html
Published in Issue 69 of Linux Gazette, August 2001
