LINUX GAZETTE
...making Linux just a little more fun!
Perl One-Liner of the Month: The Adventure of the Arbitrary Archives
By Ben Okopnik

Spring was in full bloom, and Woomert Foonly was enjoying another perfect day. It had featured a trivially easy configuration of a 1,000-node Linux cluster, and had been brought to an acme by lunching on Moroccan b'stila with just a touch of ras el hanout curry and a fruited couscous on the side, complemented by a dessert of sweet rice with cinnamon. All was at peace... until Frink Ooblick burst in, supporting - almost carrying - a man who seemed to be in the last extremity of shock. Frink helped him to the couch, then dropped into the easy chair, clearly exhausted by his effort.

 -- "Woomert, it's simply scandalous. This is Resolv Dot Conf, a humble... erm, well, a sysadmin, anyway. Recently, he was cruelly forced to install some kind of a legacy OS on his manager's computer - can you imagine? - and now, he's being asked to do something that sounds nearly impossible, although I could only get scant details. He had heard of your reputation (who hasn't, these days?), and was coming to see if you could help him, but collapsed in the street just outside your door due to the residual shock and a severe Jolt Cola deficiency. As to the problem... well, I'll let him tell you."

Woomert had been tending to their guest while listening, with the result that the latter now looked almost normal. Indeed, Woomert's "sysadmin-grade coffee" was well-known among the cognoscenti for its restorative powers, although the exact recipe (it was thought to have Espresso Alexander and coffee ice cream somewhere in its ancestry, but the various theories diverged widely after that point) remained a deep secret.

Now, though, the famous detective's eyes sharpened to that look of concentration he habitually wore while working.

 -- "Please state your problem clearly and concisely."

The quickly recovering sysadmin shook his head mournfully.

 -- "Well, Mr. Foonly... you see, what I have is a script that processes the data submitted to us by our satellite offices. The thing is, it all comes in various forms: we're a health insurance data processor, and every company's format is different. Not only that, but the way everyone submits the data is different: some just send us a plain data file, others use 'gzip', or 'compress', or 'bzip', or 'rar', or even 'tar' and 'compress' (or 'gzip'), and others - fortunately, all of those are just plain data - hand us a live data stream out of their proprietary applications. Our programmers handled the various format conversions as soon as they got the specs, but this arbitrary compression problem was left up to me, and it's got me up a tree!"

He stopped to take a deep breath and another gulp of Woomert's coffee, which seemed to revive him further, although he still sat hunched over, his forehead resting on his hand.

"Anyway, at this point, making it all work still requires human intervention; we've got two people doing nothing but sorting and decompressing the files, all day long. If it wasn't for that, the whole system could be completely automated... and of course, management keeps at me: 'Why isn't it fixed yet? Aren't you computer people supposed to...' and so on."

When he finally sat up and looked at Woomert, his jaw was firmly set. He was a man clearly resigned to his fate, no matter how horrible.

"Be honest with me, Mr. Foonly. Is there a possibility of a solution, or am I finished? I know The Mantra [1], of course, but I'd like to go on if possible; my users need me, and I know of The Dark Powers that slaver to descend upon their innocent souls without a sysadmin to protect them."

Woomert nodded, recognizing the weary old warrior's words as completely true; he, too, had encountered and battled The Dark Ones, creatures that would completely unhinge the minds of the users if they were freed for even a moment, and knew of the valiant SysAdmin's Guild (http://sage.org) which had sworn to protect the innocent (even though it was often protection from themselves, via the application of the mystic and holy LART [2]).

 -- "Resolv, I'm very happy to say that there is indeed a solution to the problem. I'm sure that you've done your research on the available tools, and have heard of 'atool', an archive manager by Oskar Liljeblad..."

At Resolv's nod, he went on.

"All right; then you also know that it will handle all of the above archive formats and more. Despite the fact that it's written in Perl, we're not going to use any of its code in your script - that would be a wasteful duplication of effort. Instead, we're simply going to use 'acat', one of 'atool's utilities, as an external filter - a conditional one. All we have to do is insert it right at the beginning of your script, like so:


#!/usr/bin/perl -w
# Created by Resolv Dot Conf on Pungenday, Chaos 43, 3166 YOLD

@ARGV = map { /\.(gz|tgz|zip|bz2|rar|Z)$/ ? "acat $_ '*' 2>/dev/null|" : $_ } @ARGV;

# Rest of script follows
...


"Perl will take care of the appropriate magic - and that will take care of the problem."

The sysadmin was on his feet in a moment, fervently shaking Woomert's hand.

 -- "Mr. Foonly, I don't know how to thank you. You've saved... well, I won't speak of that, but I want you to know that you've always got a friend wherever I happen to be. Wait until they see this!... Uh, just to make sure I understand - what is it? How does it work?"

Woomert glanced over at Frink, who also seemed to be on the edge of his seat, eager for the explanation.

 -- "What do you think, Frink - can you handle this one? I've only used one function and one operator; the rest of it happened automagically, simply because of the way that Perl deals with files on the command line."

Frink turned a little pink, and chewed his thumb as he always did when he was nervous.

 -- "Well, Woomert... I know you told me to study the 'map' function, but it was pretty deep; I got lost early on, and then there was this new movie out..."

Woomert smiled and shook his head.

 -- "All right, then. 'map', as per the info from 'perldoc -f map', evaluates the specified expression or block of expressions for each element of a list - sort of like a 'for' loop, but much shorter and more convenient in many cases. I also used the ternary conditional operator ('?:') which works somewhat like an "if-then-else" construct:


# Ternary conditional op - sets $a to 5 if $b is true, to 10 otherwise
$a = $b ? 5 : 10;

# "if-then-else" construct - same action
if ( $b ){
    $a = 5;
}
else {
    $a = 10;
}

"Both of the above do the same thing, but again, the first method is shorter and often more convenient. Examining the script one step at a time, what I have done is test each of the elements in @ARGV, which initially contains everything on the command line that follows the script name, against the following regular expression:

/\.(gz|tgz|zip|bz2|rar|Z)$/

This will match any filename that ends in a period (a literal dot) followed by any of the specified extensions.
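
(For readers who would like to check the regex for themselves, here is a minimal sketch. The sample filenames are made up for illustration and are not from Resolv's data; 'grep' is used in place of 'map' simply to list the names that match.)

```perl
#!/usr/bin/perl -w
use strict;

# Hypothetical sample names - assumptions for illustration only
my @names = qw( claims.gz batch.tgz jan.tar.Z plain.dat notes.gzip REPORT.GZ );

# Same regex as in the script; note that it is case-sensitive
# and anchored to the end of the name
my @archives = grep { /\.(gz|tgz|zip|bz2|rar|Z)$/ } @names;

print "$_\n" for @archives;
```

With these names, only 'claims.gz', 'batch.tgz', and 'jan.tar.Z' are selected; an upper-case '.GZ' or an unlisted extension such as '.gzip' would be passed through as a plain file.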

Now, if the filename doesn't match the regex, the ternary operator returns the part after the colon, '$_' - which simply contains the original filename. Perl then processes the filename as it normally does the ones contained in @ARGV: it opens a filehandle to that file and makes its contents available within the script. In fact, there are a number of ways to access the data once that's done; read up on the diamond operator ('<>'), the STDIN filehandle, and the ARGV filehandle (note the similarity and the difference, Frink!) for information on some of the many available methods of doing file I/O in Perl."
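
(A minimal sketch of the plain-file case, assuming a throwaway temporary file with invented contents: the script sets @ARGV by hand, as if the name had been typed on the command line, and lets the diamond operator do the opening and reading.)

```perl
#!/usr/bin/perl -w
use strict;
use File::Temp qw( tempfile );

# Create a small throwaway data file (hypothetical content)
my ( $fh, $filename ) = tempfile( UNLINK => 1 );
print $fh "first record\nsecond record\n";
close $fh;

# Pretend the file was named on the command line
@ARGV = ( $filename );

my @records;
while ( my $line = <> ){        # '<>' opens each @ARGV entry in turn
    chomp $line;
    push @records, $line;
}

print scalar( @records ), " records read\n";
```

The script never calls 'open' on the data file itself; '<>' handles that behind the scenes, which is what makes the @ARGV rewrite possible.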

"On the other hand, if the current element does match, the ternary operator will return the code before the colon, in this case

"acat $_ '*' 2>/dev/null|"

When Perl comes to open that element, it will execute the above command instead of opening a file. The syntax may seem a little odd, but it's what 'acat' (or, more to the point, the archive utilities that it uses) requires to process the files and ignore the error messages. Note that the command ends in '|', the pipe symbol; what happens here is much like doing a pipe within the shell. The command will be executed, and its output will become available on the filehandle that Perl would normally have opened for that file - presto, pure magic! [3]"
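
(The pipe behavior can be tried without installing 'atool' at all. In this sketch, which assumes a Unix-like system, the shell's 'echo' stands in for 'acat'; the text it prints is invented.)

```perl
#!/usr/bin/perl -w
use strict;

# A trailing '|' tells the two-argument 'open' that '<>' uses
# to run the command and read its output.
# 'echo' is only a stand-in for 'acat' here.
@ARGV = ( "echo decompressed data would appear here |" );

my @lines = <>;     # List context: read everything the command prints

print @lines;
```

As far as the rest of the script can tell, it has simply read one line from a file.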

"So, to break it all out in long form, here's what I did:


@ARGV = map {                                   # Use the BLOCK syntax of 'map'
    if ( /\.(gz|tgz|zip|bz2|rar|Z)$/ ){         # Look for archive extensions
        "acat $_ '*' 2>/dev/null|";             # Uncompress/pipe out the contents
    }
    else {
        $_;                                     # Otherwise, return original name
    }
} @ARGV;                                        # This is the list to "walk" over

"Perl handles it from that point on. Once you pass it something useful on the command line or standard input, it knows just what to do. In fact," he glanced sternly over at Frink, who once again looked abashed, "studying 'perldoc perlopentut' is something I recommend to anyone who wants to understand how Perl does I/O. This includes files, pipes, forking child processes, building filters, dealing with binary files, duplicating file handles, the single-argument version of 'open', and many other things. In some ways, this could be called the most important document that comes with Perl. Taking a look at 'perldoc perlipc' as a follow-up would be a good idea as well - it deals with a number of related issues, including opening safe (low privilege) pipes to possibly insecure processes, something that can become very important in a hurry."

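(Since the filename is pasted into a shell command, a name containing spaces or shell metacharacters could misbehave. The sketch below is one possible precaution, not part of Woomert's solution: 'shell_quote' and 'archive_filter' are hypothetical helper names introduced here, and the filenames are made up.)

```perl
#!/usr/bin/perl -w
use strict;

# Hypothetical helper: wrap a string in single quotes for the shell,
# escaping any single quotes it already contains
sub shell_quote {
    my $text = shift;
    $text =~ s/'/'\\''/g;
    return "'$text'";
}

# Build the same 'acat' pipe as in the article, but with the filename quoted
sub archive_filter {
    return map {
        /\.(gz|tgz|zip|bz2|rar|Z)$/
            ? "acat " . shell_quote( $_ ) . " '*' 2>/dev/null|"
            : $_
    } @_;
}

# Made-up names, including one with a space and a semicolon in it
my @converted = archive_filter( "plain.dat", "march data; final.gz" );
print "$_\n" for @converted;
```

Here the awkward name is handed to the shell as a single quoted argument. Names that do not match the regex are still passed to Perl's "magic" open unchanged, so 'perldoc perlopentut' and 'perldoc perlipc' remain the places to look when the input cannot be trusted.
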
 -- "Now, Resolv, I believe that you have a bright new future stretching out ahead of you; your problem will be solved, your management will be pleased, and your users will remain safe from Those Outside The Pale. If you would care to join us in a little celebration, I've just finished boiling a Spotted Dog, and - oh. Where did he go?... It's a very fine English pudding with currants, after all. Well, I suppose he wanted to implement that change as soon as possible..."


Footnotes

[1] "Down, Not Across." For those who need additional clues on the grim meaning of The Sysadmin Mantra, search the archives of alt.sysadmin.recovery at <http://groups.google.com>, and all will become clear. If it does not, then you weren't meant to know. :)

[2] From The Jargon File:

  Luser Attitude Readjustment Tool. ... The LART classic is a 2x4 or
  other large billet of wood usable as a club, to be applied upside the
  head of spammers and other people who cause sysadmins more grief than
  just naturally goes with the job. Perennial debates rage on
  alt.sysadmin.recovery over what constitutes the truly effective LART;
  knobkerries, semiautomatic weapons, flamethrowers, and tactical nukes
  all have their partisans. Compare {clue-by-four}.


[3] See "perldoc perlopentut" for a tutorial on opening files, the 'magic' in @ARGV, and even "Dispelling the Dweomer" for those who have seen too much magic already. :)

 

Ben is a Contributing Editor for Linux Gazette and a member of The Answer Gang.

Ben was born in Moscow, Russia in 1962. He became interested in electricity at age six--promptly demonstrating it by sticking a fork into a socket and starting a fire--and has been falling down technological mineshafts ever since. He has been working with computers since the Elder Days, when they had to be built by soldering parts onto printed circuit boards and programs had to fit into 4k of memory. He would gladly pay good money to any psychologist who can cure him of the resulting nightmares.

Ben's subsequent experiences include creating software in nearly a dozen languages, network and database maintenance during the approach of a hurricane, and writing articles for publications ranging from sailing magazines to technological journals. Having recently completed a seven-year Atlantic/Caribbean cruise under sail, he is currently docked in Baltimore, MD, where he works as a technical instructor for Sun Microsystems.

Ben has been working with Linux since 1997, and credits it with his complete loss of interest in waging nuclear warfare on parts of the Pacific Northwest.


Copyright © 2003, Ben Okopnik. Copying license http://www.linuxgazette.net/copying.html
Published in Issue 87 of Linux Gazette, February 2003
