25 Years of Programming
An open source source for C, C++, OWL, BASIC, MDB, XLS, DOT, and more...
Use Perl one-line scripts to transform text data files

In the early 1980s, Kim Carnes's "Bette Davis Eyes" and Kool and The Gang's "Celebration" played on the radio, popular TV shows included Magnum, P.I. and Hill Street Blues, and if you bought a "microcomputer" (the term PC didn't exist yet), you were almost certainly the first person on your block to own one. There wasn't much to do with your new microcomputer unless you knew a bit about programming. The popular language was BASIC. There were books with programs for you to type into your microcomputer yourself, and you couldn't help but learn something about programming while you did it. Writing a short utility program to do a simple transformation of text in a file was a common part of daily computer use.

Computer users today still don't have to become very advanced before they encounter a file that isn't in the format they need, but today's Graphical User Interface (GUI) operating systems and programs don't always make the needed transformation easy. Either the file is too big for the GUI program, or the transformation takes an absurd number of selections, mouse clicks, drags, and dialog boxes that are not only cumbersome but also hard to automate if you want to do the task more than once. A short custom program that you write yourself is still usually the fastest, easiest, and most flexible solution.

As an example of something done much more easily by a small program than by a GUI, assume we have a data file containing 5 columns of comma-delimited floating point numbers:

1.1,2.2,3.3,4.4,5.5
1.1,2.2,3.3,4.4,5.5
1.1,2.2,3.3,4.4,5.5
1.1,2.2,3.3,4.4,5.5
1.1,2.2,3.3,4.4,5.5

For some purpose, we want the data in a different format, with Column B omitted and Columns C and D exchanged.

BASIC

It was very easy to do this in BASIC.
Anyone with even a small amount of BASIC knowledge would easily be able to write and run this program:

10 OPEN "I", 1, "INFILE"
20 OPEN "O", 2, "OUTFILE"
30 WHILE NOT EOF(1)
40 INPUT #1, A, B, C, D, E
50 WRITE #2, A, D, C, E
60 WEND
70 CLOSE

C

Later, the C language gained popularity. The equivalent program in C is still short but not nearly so simple. In addition, the user must have a C compiler and know how to use it. For someone who works with C regularly, this is a trivial program, but it's beyond the ability of most average or even above-average non-programmer computer users. If this is the only method available to them, they'll probably resort to a GUI program instead, no matter how many steps it takes in the GUI. Lots of people were willing to learn BASIC to do a task like this, but not many are willing to learn C:

#include <stdio.h>
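int main(int argc, char **argv)
{
    double a, b, c, d, e;
    FILE *infile = fopen("INFILE", "r");
    FILE *outfile = fopen("OUTFILE", "w");

    /* stop early if either file could not be opened */
    if (infile == NULL || outfile == NULL) {
        perror("fopen");
        return 1;
    }

    /* read five comma-separated numbers per line; write columns A, D, C, E */
    while (fscanf(infile, "%lf,%lf,%lf,%lf,%lf\n", &a, &b, &c, &d, &e) == 5)
        fprintf(outfile, "%lf,%lf,%lf,%lf\n", a, d, c, e);

    fclose(infile);
    fclose(outfile);
    return 0;
}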

C++

In C++, the stream extraction and insertion operators appear to make this code simpler, but really they don't, or at least not much, and this code is not as easy to customize later to do a similar but not identical task. For this job, even in C++, I'd be more inclined to use the C-style fscanf() method from above. In any event, C++ has an intimidating reputation, and the average computer user confronted by this task and a suggestion that they could use C++ for it is likely to run away screaming:

#include "stdafx.h"  // Visual C++ precompiled header; omit with other compilers
#include <fstream>
using namespace std;

int main(int argc, char **argv)
{
    double a, b, c, d, e;
    char comma;  // receives each separator character between the numbers
    ifstream infile("INFILE");
    ofstream outfile("OUTFILE");

    // read number, separator, number, ...; the loop ends at the first failed read
    while (infile >> a >> comma >> b >> comma >> c >> comma >> d >> comma >> e)
        outfile << a << "," << d << "," << c << "," << e << endl;

    return 0;
}

Perl

Perl's reputation is probably just as intimidating as C++'s, but for different reasons. While C++ is thought of as a giant sledgehammer of a language, the domain of professional programmers creating big applications for big software companies, Perl suffers from its mystique as the inscrutably arcane language in which ultra-geeks and even Linux users script magical incantations to perform their miracles. With every Perl feature seeming to have an associated quirky idiosyncrasy known only to the fully initiated, and with its sometimes weird syntax and optionally weird formatting, it is easy for a newcomer overwhelmed by strangeness to overlook the significance of one of Perl's slogans: "Easy things should be easy and hard things should be possible".

Perl does an exceedingly good job of making easy things easy. That's what some of its quirks are especially for. This is the Perl code for our text transformation task. It consists of one line typed at the console, making it even easier in Perl than it was in BASIC:

perl -WTne "chomp; my($a,$b,$c,$d,$e)=split(/,/); print(\"$a,$d,$c,$e\n\");" < INFILE > OUTFILE

One of the Perl quirks that makes this so simple is the -n switch, which automatically puts our program inside a loop. It is executed as though it were this:

while(<>) # for each line from the input file or from STDIN
{
    # remove the trailing \n so it doesn't become part of $e
    chomp;
    # split the items into a list (split at each comma, discarding the commas)
    my($a,$b,$c,$d,$e) = split(/,/);
    # print in new order with commas between
    print("$a,$d,$c,$e\n");
}

This code is more easily modified than any of the others for situations where the separator is something other than just a comma.

Conclusion

Although today's typical computer user doesn't have the same inclination toward doing their own programming that was so common years ago, there are still situations where a small amount of simple programming can get a job done faster and more easily than it can be done any other way. That used to be common knowledge, but it seems to be less so today. The reason is not so much the death of BASIC, because it didn't die and you can still get it. Rather, it's that there is no simple language, BASIC or anything else, provided as standard equipment on all computers for people to absorb, have frequent contact with, and gradually get used to. To do any programming today, you must choose a language. That can be a daunting challenge even for someone who's already a programmer, let alone someone who just wants to learn and do a bit of programming but doesn't know where to start.

In spite of its complexities, Perl has some qualities that make it a workable choice as someone's first language for doing simple utility tasks:
When you need to do something beyond the abilities of your GUI programs, you have more available options than abandoning your project in dismay or submitting a "feature request" to the GUI program's maker. Just a small amount of programming might get the job done, and Perl is a language that can probably do it.
Copyright ©2011 Steven Whitney. Last modified Sun 04/24/2011 11:17:43 -0700.