
"Linux Gazette...making Linux just a little more fun!"


Word Processing and Text Processing

by Larry Ayers


One of the most common questions posted in the various Linux newsgroups is "Where can I find a good word-processor for Linux?" This question has several interesting ramifications:


Vital For Some...

A notion has become prevalent in the minds of many computer users these days: the idea that a complex word processor is the only tool suitable for creating text on a computer. I've talked with several people who think of an editor as a primitive relic of the bad old DOS days, a type of software which has been superseded by the modern word-processor. There is an element of truth to this, especially in a business environment in which even the simplest memos are distributed in one of several proprietary word-processor formats. But when it is unnecessary to use one of these formats, a good text editor has more power to manipulate text and is faster and more responsive.

The ASCII format, intended to be a universal means of representing and transferring text, does have limitations. The font is determined by the terminal's type and capabilities rather than by the application, and is normally a fixed, monospaced one. These limitations in one sense are virtues, though, as this least-common-denominator approach to representing text assures readability by everyone on all platforms. This is why ASCII is still the core format of e-mail and Usenet messages, though there is a tendency in the large software firms to promote HTML as a replacement. Unfortunately, HTML can now be written so that it is essentially unreadable by anything other than a modern graphical browser. Of course, HTML is ASCII-based as well, but is meant to be interpreted or parsed rather than read directly.

Working with ASCII text directly has many advantages. The output is compact and easily stored, and separating the final formatting from the actual writing allows the writer to focus on content rather than appearance. An ASCII document is not dependent on one application; the simplest of editors or even cat can access its content. There is an interesting parallel, perhaps coincidental, between the way an operating system stores its configuration and the document formats its users prefer. Nearly all configuration files in Linux, or any Unix system, are in plain ASCII format: compact, editable, and easily backed up or transferred. Many programmers use Linux; source code is written in ASCII format, so perhaps using the format for other forms of text is a natural progression. The main configuration files for Win95, NT and OS/2 are in binary format, easily corruptible and not easily edited. Perhaps this is one reason users of these systems tend towards proprietary word-processing formats which, while not necessarily in binary format, aren't readable by ASCII-based editors or even other word-processors. But I digress...

There are several methods of producing professional-looking printable documents from ASCII input, the most popular being LaTeX, Lout, and Groff.


Text Formatting with Mark-Up Languages

LaTeX

LaTeX, Leslie Lamport's macro package for the TeX low-level formatting system, is widely used in the academic world. It has become a standard, and has been refined to the point that bugs are rare. Its ability to represent mathematical equations is unparalleled, but this very fact has deterred some potential users. Mentioning LaTeX to people will often elicit a response such as: "Isn't that mainly used by scientists and mathematicians? I have no need to include equations in my writing, so why should I use it?" A full-featured word-processor (such as WordPerfect) also includes an equation editor, but (as with LaTeX) just because a feature exists doesn't mean you have to use it. LaTeX is well-suited to creating a wide variety of documents, from a simple business letter to articles, reports or full-length books. A wealth of documentation is available, including documents bundled with the distribution as well as those available on the internet. A good source is this ftp site, which is a mirror of CTAN, the largest on-line repository of TeX and LaTeX material.

LaTeX is easily installed from any Linux distribution, and in my experience works well "out of the box". Hardened LaTeX users type the formatting tags directly, but there are several alternative approaches which can expedite the process, especially for novices. Learning LaTeX from scratch takes considerable time and effort, but an intermediary interface allows a beginner to create usable documents immediately.
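
For readers who have never seen the tagging typed by hand, a minimal sketch of a complete LaTeX file follows. The file name, title, author name, and text are placeholders invented for illustration, and the sketch assumes the LaTeX2e release with its standard article class.

```latex
% minimal.tex: a small but complete LaTeX document (illustrative sketch)
\documentclass{article}

\title{A Short Memo}        % placeholder title
\author{A. N. Author}       % placeholder name
\date{October 1997}

\begin{document}
\maketitle

\section{Introduction}
Paragraphs are plain ASCII text separated by blank lines.
A few tags cover most needs: \emph{emphasis}, \textbf{bold type},
and sectioning commands such as the one above.

\section{A List}
\begin{itemize}
  \item Formatting is described, not drawn.
  \item The same source can be formatted again at any time.
\end{itemize}

\end{document}
```

Running latex minimal.tex should leave a file named minimal.dvi in the same directory, which can then be viewed with xdvi.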

AucTeX is a package for either GNU Emacs or XEmacs which has a multitude of useful features helpful in writing LaTeX documents. Not only does the package provide hot-keys and menu-items for tags and environments, but it also allows easy movement through the document. You can run LaTeX or TeX interactively from Emacs, and even view the resulting output DVI file with xdvi. Emacs provides excellent syntax highlighting for LaTeX files, which greatly improves their readability. In effect AucTeX turns Emacs into a "front-end" for LaTeX. If you don't like the overhead incurred when running Emacs or especially XEmacs, John Davis' Jed and Xjed editors have a very functional LaTeX/TeX mode which is patterned after AucTeX. The console-mode Jed editor does syntax-highlighting of TeX files well without extensive fiddling with config files, which is rare in a console editor.

If you don't use Emacs or its variants there is a Tcl/Tk based front-end for LaTeX available called xtem. It can be set up to use any editor; the September 1996 issue of Linux Journal has a good introductory article on the package. Xtem has one feature which is useful for LaTeX beginners: on-line syntax help-files for the various LaTeX commands. The homepage for the package can be visited if you're interested.

It is fairly easy to produce documents if the default formats included with a TeX installation are used; more knowledge is needed to produce customized formats. Luckily TeX has a large base of users, many of whom have contributed a variety of style-formatting packages, some of which are included in the distribution, while others are freely available from TeX archive sites such as CTAN.
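
As an illustration of how far the default formats go, here is a sketch of a business letter using the letter class that ships with standard LaTeX. All names and addresses are made up, and the layout is simply whatever the stock class provides; a contributed style could be substituted for it.

```latex
% letter.tex: a business letter with the standard letter class (sketch)
\documentclass{letter}

\address{12 Example Lane \\ Anytown}    % sender's address (placeholder)
\signature{A. N. Author}                % printed under the closing

\begin{document}

\begin{letter}{Ms. J. Reader \\ 34 Sample Street \\ Othertown}  % recipient (placeholder)

\opening{Dear Ms. Reader,}

Thank you for your inquiry. The body of the letter is ordinary
text; the class takes care of the date, margins, and spacing.

\closing{Yours sincerely,}

\end{letter}

\end{document}
```

Nothing in the file says where the address or signature should sit on the page; that decision belongs to the class, which is the separation of content from appearance described earlier.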

At a further remove from raw LaTeX is the LyX document processor. This program (still under development, but very usable) at first seems to be a WYSIWYG interface for LaTeX, but this isn't quite true. The text you type doesn't have visible LaTeX tagging, but it is formatted to fit the window on your screen, which doesn't necessarily reflect the document's appearance when printed or viewed with GV or Ghostscript. In other words, the appearance of the text you type is just a user convenience. There are several things which can be done with a document typed in LyX. You can let LyX handle the entire LaTeX conversion process, with a DVI or PostScript file as the result, which is similar to using a word-processor. I don't like to do it this way; one of the reasons I use Linux is that I'm interested in the underlying processes and how they work, and Linux is transparent. If I'm curious as to how something is happening in a Linux session I can satisfy that curiosity to whatever depth I like. Another option LyX offers is more to my taste: LyX can convert the document from its LaTeX-derived internal format to standard LaTeX, which is readable and can be loaded into an editor.

Load a LyX-created LaTeX file into an Emacs/AucTeX session (if you have AucTeX set up right it will be called whenever a file with the .tex suffix is loaded), and your document will be displayed with the new LaTeX tags interspersed throughout the text. The syntax-highlighting can make the text easier to read if you have font-locking set up to give a subdued color to the tagging (backslashes (\) and $ signs). This is an effective way to learn something about how LaTeX documents are written. Changes can be made from within the editor and you can let AucTeX call the LaTeX program to format the document, or you can continue with LyX. In effect this is using LyX as a preprocessor for AucTeX. This expands the user's options; if you are having trouble convincing LyX to do what you want, perhaps AucTeX can do it more easily.
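
The tagging in question looks roughly like the hand-written fragment below. This is only a sketch: the file name and wording are invented, and a file actually exported by LyX will differ in detail and will probably carry a longer preamble.

```latex
% tags.tex: what backslash commands and $ signs look like in the source
\documentclass{article}
\begin{document}

\section{Results}                % a backslash introduces every command

The second trial was \emph{much} faster than the first.
Inline mathematics sits between dollar signs, as in $a^2 + b^2 = c^2$,
and a literal dollar or percent sign must be escaped: \$5, 10\%.

\end{document}
```

With the backslash commands and dollar signs dimmed by font-locking, the sentences themselves stand out and the file reads almost like plain text.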

Like many Linux software projects LyX is still in a state of flux. The release of beta version 0.12 is imminent; after that release the developers are planning to switch to another GUI toolkit (the current versions use the XForms toolkit). The 0.11.38 version I've been using has been working dependably for me (hint: if it won't compile, give the configure script the switch --disable-nls. This disables the internationalization support).


YODL

YODL (Yet One-Other Document Language) is another way of interacting with LaTeX. This system has a simplified tagging format which isn't hard to learn. The advantage of YODL is that from one set of marked-up source documents, output can be generated in LaTeX, HTML, and Groff man and ms formats. The package is well-documented. I wrote a short introduction to YODL in issue #9 of the Gazette. The current source for the package is this ftp site.


Lout

About thirteen years ago Jeffrey Kingston (of the University of Sydney, Australia) began to develop a document formatting system which became known as Lout. This system bears quite a bit of resemblance to LaTeX: it uses formatting tags (introduced by the @ symbol rather than \) and its output is PostScript. Mr. Kingston calls Lout a high-level language with some similarities to Algol, and claims that user extensions and modifications are much easier to implement than in LaTeX. The package comes with hundreds of pages of PostScript documentation along with the Lout source files which were used to generate those book-length documents.

The Lout system is still maintained and developed, and in my trials seemed to work well, but there are some drawbacks. I'm sure Lout has nowhere near as many users as LaTeX. LaTeX is installed on enough machines that if you should want to e-mail a TeX file to someone (especially someone in academia) chances are that the recipient will have access to a machine with TeX installed and will be able to format and print or view it. LaTeX's large user-base has also resulted in a multitude of contributed formatting packages.

Another drawback (for me, at least) is the lack of available front-ends or editor-macro packages for Lout. I don't mind using markup languages if I can use, say, an Emacs mode with key-bindings and highlighting set up for the language. There may be such packages out there for Lout, but I haven't run across them.

Lout does have the advantage of being much more compact than a typical TeX installation. If you have little use for some of the more esoteric aspects of LaTeX, Lout might be just the thing. It can include tables, various types of lists, graphics, foot- and marginal notes, and equations in a document, and the PostScript output is the equal of what LaTeX generates.

Both RedHat and Debian have Lout packages available, and the source/documentation package is available from the Lout home FTP site.


Groff

Groff comes from an older family of formatters than TeX/LaTeX, one dating back to the early days of Unix. Often a first-time Linux user will neglect to install the Groff package, only to find that the man command won't work and that the man-pages are therefore inaccessible. As well as in day-to-day invocation by the man command, Groff is used in the publishing industry to produce books, though other formatting systems such as SGML are more common.

Groff is the epitome of the non-user-friendly and cryptic Unix command-line tool. There are several man-pages covering Groff's various components, but they seem to assume a level of prior knowledge without any hint as to where that knowledge might be acquired. I found them to be nearly incomprehensible. A search on the internet didn't turn up any introductory documents or tutorials, though there may be some out there. I suspect more complete documentation might be supplied with some of the commercial Unix implementations. The original and now-proprietary typesetting program is called troff, and its companion nroff formats the same input for terminals and line printers; Groff is short for GNU roff.

Groff can generate PostScript, DVI, HP LaserJet 4, and ASCII text output.

Learning to use Groff on a Linux system might be an uphill battle, though Linux software developers must have learned enough of it at one time or other, as most programs come with Groff-tagged man-page files. Groff's apparent opacity and difficulty make LaTeX look easy in contrast!


A Change in Mind-Set

Processing text with a mark-up language requires a different mode of thought concerning documents. On the one hand, writing blocks of ASCII is convenient and no thought needs to be given to the marking-up process until the end. A good editor provides so many features to deal with text that using any word-processor afterwards can feel constrictive. Many users, though, are attracted by the integration of functions in a word processor, using one application to produce a document without intermediary steps.

Though there are projects underway (such as Wurd) which may eventually result in a native Linux word-processor, there may be a reason why this type of application is still rare in the Linux world. Adapting oneself to Linux, or any Unix variant, is an adaptation to what has been called "the Unix philosophy", the practice of using several highly-refined and specific tools to accomplish a task, rather than one tool which tries to do it all. I get the impression that programmers attracted to free software projects prefer working on smaller specialized programs. As an example look at the plethora of mail- and news-readers available compared to the dearth of all-in-one internet applications. Linux itself is really just the kernel, which has attracted to itself all of the GNU and other software commonly packaged with it as a distribution.

Christopher B. Browne has written an essay titled An Opinionated Rant About Word-Processors which deals with some of the issues discussed in this article; it's available at this site.

The StarOffice suite is an interesting case, one of the few instances of a large software firm (StarDivision) releasing a Linux version of an office productivity suite. The package has been available for some time now, first in several time-limited beta versions and now in a freely available release. It's a large download but it's also available on CDROM from Caldera. You would think that users would be flocking to it if the demand is really that high for such an application suite for Linux. Judging by the relatively sparse Usenet postings I've seen, StarOffice hasn't exactly taken the Linux world by storm. I can think of a few possible reasons:


I remember the first time I started up the StarOffice word-processor. It was slow to load on a Pentium 120 with 32 MB of RAM (and I thought XEmacs was slow!), and once the main window appeared it occurred to me that it just didn't look "at home" on a Linux desktop. All those icons and button-bars! It seemed to work well, but with the lack of English documentation (and not being able to convince it to print anything!) I eventually lost interest in using it. I realized that I prefer my familiar editors, and learning a little LaTeX seemed to be easier than trying to puzzle out the workings of an undocumented suite of programs. This may sound pretty negative, and I don't wish to denigrate the efforts of the StarDivision team responsible for the Linux porting project. If you're a StarOffice user happy with the suite (especially if you speak German and therefore can read the docs) and would like to present a dissenting view, write a piece on it for the Gazette!

Two other commercial word-processors for Linux are Applix and WordPerfect. Applix, available from RedHat, has received favorable reviews from many Linux users.

A company called SDCorp in Utah has ported Corel's WordPerfect 7 to Linux, and a (huge!) demo is available now from both the SDCorp ftp site and Corel's. Unfortunately both FTP servers are unable to resume interrupted downloads (usually indicating an NT server) so the CDROM version, available from the SDCorp website, is probably the way to go, if you'd like to try it out. The demo can be transformed into a registered program by paying for it, in which case a key is e-mailed to you which registers the program, but only for the machine it is installed on.

Addendum: I recently had an exchange of e-mail with Brad Caldwell, product manager for the SDCorp WordPerfect port. I complained about the difficulty of downloading the 36 MB demo, and a couple of days later I was informed that the file had been split into nine parts, and that they were investigating the possibility of changing to an FTP server which supports resuming interrupted downloads. The smaller files are available from this web page.


There exists a curious dichotomous attitude these days in the Linux community. I assume most people involved with Linux would like the operating system to gain more users and perhaps move a little closer to the mainstream. Linux advocates bemoan the relative lack of "productivity apps" for Linux, which would make the OS more acceptable in corporate or business environments. But how many of these advocates would use the applications if they were more common? Often the change of mindset discussed above militates against acceptance of Windows-like programs, with no source code available and limited access to the developers. Linux has strong roots in the GNU and free software movements (not always synonymous) and this background might be a barrier to the development of a thriving commercial software market.


Copyright © 1997, Larry Ayers
Published in Issue 22 of the Linux Gazette, October 1997

