
"Linux Gazette...making Linux just a little more fun!"


perlpp: cpp on Steroids

By Dr. Warren MacEvoy


The point of this article is to introduce a tool I call perlpp, the Perl preprocessor. Because I wrote it myself, perlpp is not included in any Linux distribution. See Resources for information on obtaining perlpp and the examples described here.

perlpp is a beefy version of cpp, the C preprocessor; it can do what cpp can do and much more. For example, introducing the idea of code templates in any programming language is easy with perlpp.

Using perlpp, the Perl preprocessor, requires at least a rudimentary knowledge of programming in Perl. Perl 5 or later must be installed on your system.

Since Perl is such a useful language, almost every programmer should know a little about it. I will start by covering some of the rudiments of Perl used in the examples. If you are already fairly comfortable with Perl, move on to the next section.

Variables. Scalar variables, which can take on values of strings, integers, or doubles, always have a $ as the first character. List variables, which are simple lists of scalars, always have a @ as the first character. All variables are global, unless preceded by my when first used within a block.

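String quoting. Strings can be quoted three ways in Perl. They can be quoted almost exactly using single forward quotes ('), quoted with interpolation using double quotes ("), or quoted as system commands using single back quotes (`). We will present more detail on this later, but basically: single quotes leave the text as written, double quotes substitute the values of variables into the text, and back quotes substitute variables, run the result as a shell command and yield that command's output.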
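
As a small illustration of the three quoting styles (a sketch added for this discussion, not one of the article's examples; the variable names are arbitrary, and it assumes a shell that provides echo):

```perl
#!/usr/bin/perl
use strict;
use warnings;

my $who = 'World';

my $exact  = 'Hello $who\n';      # single quotes: $who and \n are left as typed
my $interp = "Hello $who\n";      # double quotes: $who and \n are expanded
my $system = `echo Hello $who`;   # back quotes: expand $who, then run the command

print $exact, "\n";   # prints: Hello $who\n
print $interp;        # prints: Hello World
print $system;        # prints: Hello World
```

The back-quoted form returns whatever the command writes to its standard output, including the trailing newline.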

Loops. Perl supports the csh-style loop of the form

foreach $index (@LIST) { 
   statement1;
   statement2; 
   .... 
}
as well as the C-style loop:

for (do-once; check-first-each-time; do-last-each-time) { 
   statement1;
   statement2; 
   .... 
}
Both types are used in the examples.
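
A short sketch of both loop forms in action; the list contents and variable names are only for illustration. It declares its variables with my so that it runs under use strict, which the article's own examples do not use.

```perl
#!/usr/bin/perl
use strict;
use warnings;

my @LIST = ('Hello World!', 'Hola Mundo!', 'Ciao!');

# csh-style loop: $index takes each value of @LIST in turn.
my @lengths;
foreach my $index (@LIST) {
    push @lengths, length($index);
}

# C-style loop: initialise once, test before each pass, increment after it.
my $total = 0;
for (my $i = 0; $i < @lengths; ++$i) {
    $total += $lengths[$i];
}

print "lengths: @lengths, total: $total\n";   # prints: lengths: 12 11 5, total: 28
```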

In fact, the basic syntax of Perl mimics C in many respects, so C programmers can read Perl scripts fairly easily. No, that is too bold: a C programmer can write C-looking Perl, and it will mostly work as expected. A Perl programmer would solve the same problem in a completely different manner. In doing so, he may accomplish something difficult to imagine: a program more obscure than what can readily be written in C. If you don't believe me, look at the perlpp source, which is a Perl script.

Perl is a great deal more than this tiny view, but these ideas should be enough to understand the examples. See Resources for more information about Perl.

Introduction

Let's begin by talking about cpp. C programmers don't get far before learning that C programs, at least logically, pass through two stages of translation. The first stage, the preprocessing stage, uses commands such as

#include <stdio.h>
and

#define FOO(x) bar(x)
to translate the hybrid C/cpp input file into a pure C input file, which is then input to the pure C compiler. Pictorially,

input_file -> cpp -> cc1 -> object_file
While the intended job of cpp is to preprocess input files for a C (or C++) compiler, it can be used to preprocess other files. For example, xrdb uses cpp to preprocess X11 resource files before loading them. cpp is a very useful tool, but a programmer can quickly run into limitations, essentially because cpp is a macro-processor with limited facilities for computation and the manipulation of text.

The reason I wrote perlpp was to overcome these limitations for a scientific computation problem at Pacific Northwest National Laboratories, where I wrote the chemical equilibrium portion of a ground water transport model. For the sake of compatibility with the rest of the model, it had to be programmed in FORTRAN. For the sake of compatibility with Linux, Sun and SGI development environments, it had to be FORTRAN 77. The problem statement was roughly this: given the chemical equilibrium equations for a given set of species, automatically generate an efficient reliable solver for these equations.

This meant starting from chemical equilibrium equations in symbolic form, generating a Maple V batch file from a template (Maple V is a symbolic mathematics package), and then including the results of that batch run in a template-generated FORTRAN subroutine library that satisfied the requirements of the project.

This environment required the automatic generation of several kinds of programs from templates and was a natural breeding ground for thoughts about useful preprocessors. Although it took me most of a week to come up with the alpha version of perlpp, it easily saved that amount of time just for that one project. Solving the same problem without it may have taken four or five weeks longer. Furthermore, without perlpp, the project would be much harder to maintain.

What Perlpp Does

perlpp takes input files and generates perl scripts which, when run, create similar but better output files.

Example 1: Hello World!

Create a file called hello.c.ppp containing the lines

#include <stdio.h>
int main()
{
  printf("Hello World!\n");
  return 0;
}
Now run the perlpp command by typing:

perlpp -pl hello.c.ppp
The -pl option is discussed later. If you check, perlpp created the file hello.c.pl, which contains the following Perl script:

#!/usr/bin/perl
print '#include <stdio.h>
';
print 'int main()
';
print '{
';
print '  printf("Hello World!\\n");
';
print '  return 0;
';
print '}
';
Your mileage may vary on the exact contents of the first line. See "Troubleshooting" if you have problems generating this script.

Running hello.c.pl generates the same text as the original input file, hello.c.ppp. In this way, perlpp can be viewed as an obscure and computationally expensive way to copy text files.

The -pl option means "create a Perl program". If you leave it off, perlpp simply runs the generated program and saves the output in hello.c. This means

perlpp hello.c.ppp
is equivalent to

perlpp -pl hello.c.ppp
./hello.c.pl > hello.c
rm hello.c.pl
except that the file hello.c.pl is never explicitly created.

So our first example, hello.c.ppp, when normally processed by perlpp, creates a copy of itself, hello.c. While this should not excite you, it should not surprise you either. After all, if you processed a text file using cpp, containing no cpp directives, you would get back exactly what you put in.

cpp is interesting only when the input file contains cpp directives. Perlpp is only slightly interesting when the input file contains no perlpp directives, because it generates a Perl script that regenerates the input file using print statements. To get any further, the perlpp directives must be used.

Directives

Only four directives are available for perlpp, along with a default directive. Each describes how a given line of input will be translated into the perl script.

  1. ! Perl source rule: if the first character of a line is a ! (bang), copy the remaining part of the line to the generated perl script verbatim.
  2. ' print exact: If the first character of a line is a ' (single quote), then generate a single-quoted (uninterpolated) print statement. Executing this print statement will produce the remaining part of the input line exactly.
  3. " print interpolated: if the first character of a line is a " (double quote), generate a double-quoted (interpolating) print statement. For more on interpolating strings, see the perlop man page. If use locale is in effect, the case map used by \l, \L, \u and \U is taken from the current locale. See the perllocale man page. [It should be noted that \\ (two backslashes) in an interpolated string translates into a single backslash, so \\n interpolates to \n in the output. This will show up in our next example.]
  4. ` print system: if the first character of a line is a ` (back quote), then generate a back-quoted (system) print statement. Executing this print statement will produce the output of, first, interpolating the remainder of the line as in rule 3 above, then running the interpolated text as a shell command.
If none of the characters bang (!), single quote ('), double quote (") or back quote (`) begins a line, a default translation occurs: the whole line is treated as in rule 2, so the generated script contains a single-quoted print statement that reproduces the line exactly, as seen in Example 1.
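
The rules can be sketched in a few lines of Perl. This is only an illustration of the idea, not the perlpp source: the helper name translate_line is made up, lines without a directive are assumed to be printed exactly (as the generated script in Example 1 suggests), and the escaping is deliberately simplistic.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper, not taken from the perlpp source: turn one input
# line (with its trailing newline) into the Perl text that reproduces it.
sub translate_line {
    my ($line) = @_;
    if ($line =~ /^!(.*)$/s) {         # ! Perl source: copy the rest verbatim
        return $1;
    }
    if ($line =~ /^"(.*)$/s) {         # " print interpolated
        my $text = $1;
        $text =~ s/"/\\"/g;            # keep embedded " from ending the string
        return "print \"$text\";\n";
    }
    if ($line =~ /^`(.*)$/s) {         # ` print system
        my $text = $1;
        $text =~ s/`/\\`/g;            # keep embedded ` from ending the command
        return "print `$text`;\n";
    }
    my $text = $line;                  # ' print exact, and the default rule
    $text =~ s/^'//;                   # drop the directive character, if any
    $text =~ s/([\\'])/\\$1/g;         # protect backslashes and single quotes
    return "print '$text';\n";
}

# Translate a tiny template and show the resulting Perl script.
print translate_line($_) for ("!foreach \$s ('Hello', 'Ciao') {\n",
                              "\"  puts(\"\$s\");\n",
                              "!}\n",
                              "  return 0;\n");
```

Because each input line keeps its trailing newline, every generated print statement spans two lines, just like the scripts shown in the examples.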

Example 2: Salutations

Create a file called salutations.c.ppp containing the lines:

#include <stdio.h>
int main()
{
!foreach $s ('Hello World!','Hola Mundo!', 'Ciao!') {
"  printf("$s\\n");
!}
  return 0;
}
Let's first look at the generated Perl script by typing:

perlpp -pl salutations.c.ppp
In salutations.c.pl, you will find

print '#include <stdio.h>
';
print 'int main()
';
print '{
';
foreach $s ('Hello World!','Hola Mundo!', 'Ciao!') {
print "  printf(\"$s\\n\");
";
}
print '  return 0;
';
print '}
';
Look carefully at the print statement generated by the printf statement in salutations.c.ppp:

print "  printf(\"$s\\n\");
";
Perlpp goes to the trouble of adding backslashes where appropriate so that double quotes do not prematurely terminate the string. The same idea applies to the other forms of quoted print statements perlpp generates.

Let perlpp run this script for us with

perlpp salutations.c.ppp
This generates the file salutations.c,

#include <stdio.h>
int main()
{
  printf("Hello World!\n");
  printf("Hola Mundo!\n");
  printf("Ciao!\n");
  return 0;
}

Example 3: Fast Point Template

This last example uses perlpp to generate a template for fixed-length vector classes in C++, where loops are unwound. Unwinding a loop means, for example, replacing the code

for (int i=0; i<3; ++i) a[i]=i;
with

a[0]=0; a[1]=1; a[2]=2;
Unwinding the loop does not change the effect of the code, but it can make the code faster, because the index variable no longer has to be incremented and compared between assignments.

Such a fixed-length template class would be useful, for example, in a graphics library where two-dimensional and three-dimensional vectors of fixed types (float, int, double) would be used by the package. All of these would be essentially the same--and thus a candidate for a template class--except that the performance overhead for the looping may not be acceptable in such a high-end application.

perlpp can help here. perlpp is first used to generate a Perl program (using the -pl option) from a template file, Point.Template.ppp. The Point.Template.pl script is designed to create different fixed-length vector classes, depending on what arguments are passed to it. Using the back-quote print system directive, this script is then used in the primary source file, testPoint.cpp.ppp, to generate the specific desired class.

The file Point.Template.ppp is fairly long, and available by anonymous FTP as noted in Resources. Consequently, I will consider only the portions of this file which illustrate something interesting about how to use perlpp.

The first interesting line of Point.Template.ppp is

! eval join(";",@ARGV);
This, of course, will translate into the Perl statement

eval join(";",@ARGV);
Only the leading bang is deleted. Executing this line joins all the command-line arguments of the script, separated by semicolons, and evaluates that as a sequence of Perl statements. This is an extremely crude form of command-line argument processing, but it serves our purposes.
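
A minimal sketch of what this line does, using the three settings that appear later in this article. The @args list is a stand-in for @ARGV so that the snippet runs without a command line, and the variables are declared with our only to satisfy use strict; neither is part of the template itself.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Package variables, so that the evaluated text can assign to them.
our ($name, $dim, $type);

# Stand-in for @ARGV (an assumption made so the sketch is self-contained).
my @args = ('$name="FixVect"', '$dim=2', '$type="float"');

# The joined text is: $name="FixVect";$dim=2;$type="float"
my $code = join(";", @args);
eval $code;
die "could not evaluate '$code': $@" if $@;

print "name=$name dim=$dim type=$type\n";   # prints: name=FixVect dim=2 type=float
```

Anything passed on the command line is executed as Perl, so this approach is only suitable when the caller is trusted.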

The next few lines check that the previous command-line evaluation actually defined three crucial variables: $name, $type and $dim.

If they were not defined, the script writes to STDERR about it and exits with an exit code of 1.
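
One way such a check could be written is sketched below. It is not the code from Point.Template.ppp; the helper names missing_settings and require_settings and the wording of the message are invented for illustration.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: names of the settings that are still undefined.
sub missing_settings {
    my (%setting) = @_;
    return grep { !defined $setting{$_} } sort keys %setting;
}

# Hypothetical helper: complain on STDERR and exit with code 1 if needed.
sub require_settings {
    my @missing = missing_settings(@_);
    return unless @missing;
    print STDERR "$0: not defined: @missing\n";
    exit 1;
}

our ($name, $type, $dim);

# Stand-in for the command-line evaluation, so the sketch runs on its own.
eval join(";", '$name="FixVect"', '$dim=2', '$type="float"');

require_settings(name => $name, type => $type, dim => $dim);
print "all three variables are defined\n";
```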

After this, the template goes about the business of generating the desired class. This begins with

"class $name {
"public:
!#
!# Declare internal array of desired type and size
!#
"  $type a[$dim];
"  static const int dim=$dim;
Here $name, $type and $dim are used to create specific text in the class definition. In Perl, # denotes a comment, so !# is effectively a comment in perlpp.

The first instance of loop unwinding is seen in the default constructor for the class. The lines

!  for ($i=0; $i<$dim; ++$i) {
"    a[$i]=0;
!  }
translate into the Perl segment

for ($i=0; $i<$dim; ++$i) {
     print("    a[$i]=0;
");
}
This loop is executed by the Perl script, which acts as the preprocessor, so the single assignment in the template is expanded into a sequence of assignments in the C++ class source. Loops are unwound in a similar fashion in other parts of the class definition.
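
To see the effect in isolation, the same loop can be wrapped in a small function and run for a three-element vector. The function name is made up for this sketch, and it collects the text in a string instead of printing each line directly.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: the unwound assignments for a vector of $dim entries.
sub unwound_zero_fill {
    my ($dim) = @_;
    my $text = '';
    for (my $i = 0; $i < $dim; ++$i) {
        $text .= "    a[$i]=0;\n";    # one C++ assignment per pass
    }
    return $text;
}

print unwound_zero_fill(3);
# prints:
#     a[0]=0;
#     a[1]=0;
#     a[2]=0;
```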

Efficiency aside, the next block of the perlpp source provides a class constructor that would be impossible to declare using standard template facilities: one with as many arguments as the dimension of the vector class to be constructed.

!  @arg=(); for ($i=0; $i<$dim; ++$i) { $arg[$i]="$type a$i"; } 
!  $args=join(',',@arg);
!
"  $name($args)
If you are new to Perl, the first line may be difficult to understand. It begins by setting the @arg list to an empty list, then loops to build $dim entries in @arg: "$type a0", "$type a1", etc. The reason elements of @arg are denoted by $arg[$i] in the for loop is that @arg, once subscripted, refers to the scalar variable available as the ith entry of @arg. Remember, scalar variables always begin with a $ character--even those tucked inside a list.
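
The following sketch runs the same steps for a two-dimensional float vector. The function wrapper and its name are additions for illustration; the loop and the join are the ones from the template.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: build the constructor's first line for a given class.
sub constructor_signature {
    my ($name, $type, $dim) = @_;
    my @arg = ();
    for (my $i = 0; $i < $dim; ++$i) {
        $arg[$i] = "$type a$i";       # "float a0", "float a1", ...
    }
    my $args = join(',', @arg);
    return "  $name($args)\n";
}

print constructor_signature('FixVect', 'float', 2);
# prints:   FixVect(float a0,float a1)
```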

Following this declaration, the constructor is defined to initialize the vector with its arguments:

"  {
!    for ($i=0; $i<$dim; ++$i) {
"      a[$i]=a$i;
!    }
"  }
This is followed by the definition of subscript operators, which are perfectly standard. After this, another feature of perlpp is illustrated: the code for defining all the assignment operators is generated using a loop structure:

!  foreach $op ("=","+=","-=","*=","/=") {
    .
    . # define the $op assignment operator
    .
!  }
Since all the assignment operators are defined in essentially the same way, this loop allows the template to be written more compactly than with the standard template facilities. This makes the template faster to write, maintain and debug.

A similar loop follows this to define the various binary operators for the class: addition, subtraction, etc. These loops reduce the redundancy of effort in defining the template, which, amusingly, is itself a tool to reduce redundancy of effort. Okay, I admit I am easily amused.

The rest of the template declares and defines three operators, I/O functions and a scalar multiply. They do what they are supposed to do, and nothing new about perlpp is learned by going over them.

Let's move on to using Point.Template.ppp. First, convert it to a Perl script with the command:

perlpp -pl Point.Template.ppp
Now look in the test program source file, testPoint.cpp.ppp. The only interesting line is

` ./Point.Template.pl '\$name="FixVect"' '\$dim=2' '\$type="float"'
This runs the Point.Template.pl script just generated with the arguments:

$name="FixVect"  $dim=2 $type="float"
With these arguments, the template script prints out a FixVect class, which represents two-dimensional vectors of floats. The back-quote perlpp directive includes this in the testPoint.cpp source file.

Generating template classes in this way is not completely satisfying, because the idea of declaring and defining the class must usually be separated. However, this can be corrected by modifications of the template file. Essentially, a fourth variable could be set on calling the script, $use, which has a value of either "declare" or "define". Using if clauses, the script would then provide either the definition or declaration portion of the class. This is yet another way in which the redundancy of a template can be reduced using perlpp.
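
A rough sketch of the $use idea follows. It is not taken from Point.Template.ppp: the function name class_part and the tiny class it emits are placeholders, chosen only to show how an if clause could select between the declaration and the definition.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: emit either the declaration or the definition part.
sub class_part {
    my ($use, $name) = @_;
    my $text = '';
    if ($use eq 'declare') {
        $text .= "class $name {\n";
        $text .= "public:\n";
        $text .= "  $name();\n";
        $text .= "};\n";
    } elsif ($use eq 'define') {
        $text .= "${name}::${name}()\n";
        $text .= "{\n";
        $text .= "}\n";
    } else {
        die "unknown value for \$use: $use\n";
    }
    return $text;
}

print class_part('declare', 'FixVect');   # text for a header file
print class_part('define',  'FixVect');   # text for a source file
```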

Conclusions

I don't want to leave you thinking of perlpp as sort of a "compression algorithm." Keeping ideas together in a project simplifies maintaining them. The goal of perlpp is to prevent "concept leakage," where several parts of source files redundantly represent an idea, and those source files have to be maintained separately.

Essentially, perlpp replaces the rather rigid (but simple!) text-processing language available as cpp with the expressive (but complex) text-processing language available as Perl. Many programmers use Perl in any case, so knowing the syntax of Perl pays twice: once as a language in itself, and once as a powerful macro language for any programming language.

If you don't know Perl, then perlpp is just another good reason to learn it.

Resources

perlpp is available as a tar file by anonymous FTP at ftp://zot.mesastate.edu/pub/wmacevoy/perlpp/perlpp-0.5.tar.gz. The distribution includes installation instructions for perlpp.

The examples from this article are at ftp://zot.mesastate.edu/pub/wmacevoy/perlpp/lj-article.tar.gz.

You must have Perl 5, or later, installed to use perlpp. All Linux distributions have Perl available as a package in some form. The web page http://www.perl.org/ is a great place to begin if you want to learn more about Perl.

Troubleshooting

perlpp is a Perl script that generates Perl scripts. To use it, you must have Perl installed, and perlpp must be able to find it. If perlpp does not work, check that the first two lines of perlpp reflect the actual location of your Perl executable.

If these are correct, make sure that execute permissions are set for the script (chmod 755 perlpp), and that perlpp is visible from your PATH.

If you just installed perlpp, you may have to refresh your shell PATH directory cache with hash -r (if you use bash) or rehash (if you use csh).

Acknowledgements

Thanks to the Linux community for providing such a wonderful environment for reliable scientific computations. I try very hard not to taunt every time a colleague of mine tries to accomplish something useful on a machine which crashes so often they have come to expect it.

I also want to thank Mike Littlejohn for test-driving perlpp and this article, as well as Karl Castleton, Steve Yabusaki and Ashok Chilakapati for getting me on the groundwater modeling project.

Finally, thanks to Pacific Northwest National Laboratories, the Associated Western Universities fellowship program, and Mesa State College for allowing me the time, resources and opportunity to develop perlpp.

A Story

Over the summer, I left my Red Hat 5.0 machine running in my Mesa State College, Grand Junction, Colorado office. I then went to Pacific Northwest National Laboratories in Richland, Washington, where I dreamed up perlpp.

I used my Linux box remotely for the whole summer: web-browsing, e-mail, obtaining old source files, using Emacs, Maple, TeX, Perl or the FORTRAN compiler. It's true that I used these tools on the PNNL machines as well, but sometimes a license was not available, or the Linux tool was better for my purposes than what I could obtain at the lab.

For six weeks I used that machine remotely at least once each day. Only once did I have a problem with connecting to it. After the summer, I learned that my Colorado office, which is in a building that is being remodeled, had experienced several power failures. Apparently, my machine had restarted each time without a hitch, and I had only noticed the single time I requested something during an outage.

That is far more reliability--and accessibility--than many of my colleagues experience with other operating systems.


Copyright © 1999, Dr. Warren MacEvoy
Published in Issue 44 of Linux Gazette, August 1999

