Command-line processing doesn't seem like a terribly important subject
for an article, but option handling was often done in an ad-hoc manner
back then. Several people contacted me by phone to request a copy of
getopt.c on floppy disk, and the function was used in a COBOL-to-C
translator (Wayback Machine).
The article was probably written in 1989 or 1990. I remember finding a bookstore that had the AT&T System V manuals and studying up, in the store, on the AT&T Command Syntax Standard—an impressive title for such a minimal standard!
I had a companion function, sgetopt(), that parsed the command-line
options from a string. Our software ran on both UNIX and VxWorks. From the
VxWorks command line, it was simplest to spawn a program and pass it the
command-line arguments in a string, 'sp program, "option(s)"', and then use
sgetopt() to parse the options. In 1992, all this was superseded by my
opt_util package, which handles full-word (possibly abbreviated)
command-line options as well as single-letter options, and can also break a
string into an argc/argv[] array of arguments.
UNIX command lines, with all their dots and dashes, sometimes approach Morse code in their unreadability; nevertheless, UNIX's concise method of specifying and passing parameters to programs has proven very user-friendly to frequent UNIX users. Unfortunately, the proliferation of UNIX programmers produced a multitude of command line processing styles.
The AT&T Command Syntax Standard and the Standard C Library function,
getopt(3)*, attempted to bring some order into this chaos by providing a
consistent, easily-programmed, command line structure. Even more
generalized and powerful command line processing can be achieved by
broadening the command syntax standard and implementing an enhanced,
portable version of getopt(3).
(Note: The "3" in getopt(3) indicates the UNIX manual, volume 3 in this
case, that documents the getopt function. In this article, getopt(3)
refers to the standard UNIX implementation of getopt; getopt() with no
number refers to my enhanced version of the same function.)
The new getopt() should be usable on any system, UNIX or non-UNIX, that
supports C and the argc/argv program interface. It has been successfully
compiled and tested on a Sun workstation (UNIX), an IBM PC RT (AIX), and an
IBM PC XT (MS-DOS and Turbo-C).
The AT&T Command Syntax Standard specifies that commands will adhere to the following basic format:
command_name [options] [other_arguments]
Command_name identifies the program to be executed to the operating system; options and other_arguments are both optional.
Options are introduced by hyphens ("-") and can be one of two types:
single-letter options and argument options. Single-letter options are
simply that; for example, our C compiler has options "-g" for debug, "-c"
for object module generation, etc.
An argument option is a single-letter option followed by exactly one
argument; the argument that follows the option letter is not
optional! Our C compiler also uses argument options, e.g., a
"-o filename" option to specify an output file name and a "-Ddefinition"
option that defines symbols for the C Pre-Processor. The argument may be
flush up against the option letter or separated from it by white space
(blanks and tabs); the latter is preferred (and mandated by the standard).
Options may be grouped after a single hyphen and their order is not important, so
-a -l -e -x -o filename
can be rewritten as "-alex -o filename", "-xeo filename -la", and so forth.
Other_arguments are non-option arguments, i.e., those arguments not
prefixed with a hyphen and not associated with an argument option. (If you
need to specify arguments beginning with hyphens, the special
end-of-options indicator, "--", can be used to separate options from
other_arguments.) Examples of non-option arguments include filenames, such
as the list of ".o" object modules passed to the C compiler:
cc -g -DCPU=4004 -o prog main.o func1.o func2.o func3.o ...
and positional arguments, such as the source and destination files in a copy command:
cp source destination
getopt(3)
The Standard C Library function, getopt(3), provides a simple mechanism
for processing command line options. The inputs to getopt(3) are the argc
count of command line arguments, the argv array of arguments, and an
options string.
The options string is composed of all the legal options recognized by a
program; a colon (":") following an option letter indicates an argument
option that expects an argument. The grouping/ordering example given
earlier, "-alex -o filename", would be coded as "aelo:x"; "o" expects a
filename and the others are single-letter options.
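The colon convention can be illustrated with a small lookup routine. This is only a sketch: classify_option() and the opt_kind enumeration are hypothetical names invented for the example and are not part of getopt(3) or of the code in this article.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper (not part of getopt(3)): classify one option letter
   against an options string such as "aelo:x". */
enum opt_kind { OPT_ILLEGAL, OPT_FLAG, OPT_WITH_ARG };

enum opt_kind classify_option(const char *optstring, char letter)
{
    const char *s;

    assert(optstring != NULL);

    /* strchr() would match the terminating NUL, and ':' is a marker rather
       than an option letter, so reject both up front. */
    if (letter == '\0' || letter == ':')
        return OPT_ILLEGAL;

    s = strchr(optstring, letter);
    if (s == NULL)
        return OPT_ILLEGAL;             /* Letter is not in the string. */

    /* A colon immediately after the letter marks an argument option. */
    return (s[1] == ':') ? OPT_WITH_ARG : OPT_FLAG;
}
```

With the options string "aelo:x", the letter 'o' is classified as an argument option, 'a' as a single-letter option, and 'Q' as illegal.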
Each call to getopt(3) returns the next option letter from the command
line, the index of the current argument (in global variable optind), and,
for argument options, the option's argument (in global variable optarg).
'?' is the option letter returned in the case of an illegal option; -1 is
returned when there are no more options.
Programs that follow the AT&T Command Syntax Standard typically scan the
command line options using getopt(3) and then manually increment optind
through the non-option arguments that remain:
while ((option = getopt (argc, argv, optstring)) != -1) {
    switch (option) {
    case 'a':  ... ; break ;            /* Options */
    case 'b':  ... ; break ;
    ...
    case '?':  ... error ...
    default:  break ;
    }
}

while (optind < argc) {                 /* Non-option arguments */
    ... process argv[optind++] ...
}
While the manual pages for getopt(3) don't explicitly reference the AT&T
Command Syntax Standard, the usage notes constrain the programmer to
complying with the standard and limit the utility of the function. The
restrictions imposed by getopt(3)'s major shortcoming, the inability to
manipulate global variable optind, point up the need for relaxing the
command standard and writing a new, enhanced getopt().
The effect of modifying optind in between calls to getopt(3) is
implementation-dependent and such changes are strongly discouraged.
Consequently, the programmer has little freedom in altering getopt(3)'s
argument scan. In particular, the user cannot alternate options and
non-option arguments in a command and a program cannot make multiple
passes over its arguments when processing a command line.
Allowing the user to mix options and arguments on the command line adds to the user-friendliness of a program. In the simplest case, it lets you easily recall a previous command and append a forgotten option. I frequently type in an option-laden command line to compile a program,
cc -g -I/usr/alex/prog/include -o prog prog.c
only to realize that I forgot a "-I" option for the project's include
directory. Fortunately, our C compiler never heard of AT&T's Command
Syntax Standard, so I don't need to retype the whole line. In UNIX,
!! -I/usr/proj/include
will recall the previous command, append the missing include option, and resubmit the command to the operating system.
A more complex situation occurs with the UNIX ipcrm(1) command. ipcrm(1)
has three options for deleting interprocess communication (IPC) resources:
"-m id" for shared memories, "-q id" for message queues, and "-s id" for
semaphores. Thanks to its strict adherence to the Command Syntax Standard,
ipcrm(1) requires a lot of awkward typing to clean up a trail of IPC
objects:
ipcrm -m 200 -m 205 -m 307 -q 201 -q 11 -s 430 -s 431 -s 432
An improved ipcrm(1) would also have three options: "-m", "-q", and "-s".
Each option specifies the type of the zero or more IPC identifiers that
follow, resulting in a concise, easier-to-type command line:
ipcrm -m 200 205 307 -q 201 11 -s 430 431 432
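The "option sets the type of what follows" idea can be sketched without any getopt at all, using a plain walk over argv. The code below is an illustration under that assumption, not the real ipcrm(1) and not the enhanced getopt() described later; the ipc_ids structure, the MAX_IDS limit, and collect_ipc_ids() are hypothetical names.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

#define MAX_IDS 16              /* Arbitrary limit chosen for the sketch. */

/* Hypothetical result structure: identifiers grouped by resource type. */
struct ipc_ids {
    int shm[MAX_IDS], msq[MAX_IDS], sem[MAX_IDS];
    int num_shm, num_msq, num_sem;
    int errors;
};

void collect_ipc_ids(int argc, char *argv[], struct ipc_ids *out)
{
    char type = '\0';           /* Type selected by the most recent option. */
    int i, id;

    assert(argv != NULL && out != NULL);
    memset(out, 0, sizeof *out);

    for (i = 1; i < argc; i++) {
        const char *arg = argv[i];

        if (arg[0] == '-') {    /* An option: remember the new type. */
            if ((arg[1] == 'm' || arg[1] == 'q' || arg[1] == 's') &&
                arg[2] == '\0')
                type = arg[1];
            else
                out->errors++;
            continue;
        }

        id = atoi(arg);         /* A non-option: an identifier of that type. */
        switch (type) {
        case 'm':
            if (out->num_shm < MAX_IDS) out->shm[out->num_shm++] = id;
            break;
        case 'q':
            if (out->num_msq < MAX_IDS) out->msq[out->num_msq++] = id;
            break;
        case 's':
            if (out->num_sem < MAX_IDS) out->sem[out->num_sem++] = id;
            break;
        default:                /* Identifier seen before any type option. */
            out->errors++;
            break;
        }
    }
}
```

Fed the command line shown above, the routine collects three shared memory identifiers, two message queue identifiers, and three semaphore identifiers.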
Applications that find it necessary to scan their command lines more than once are probably few and far between. One example that comes to mind is that of a print job spooler. A novelist, faced with the task of printing out 10 copies of his/her 30-chapter, 30-file book, would waste little time choosing between: (i) a print command that scanned the command line once, making 10 copies of each individual file it encountered, and (ii) a print command that scanned its command line 10 times to print out 10 collated copies of the entire set of files.
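The difference between the two print commands comes down to loop nesting: one pass over the files with an inner copy count, or repeated passes over the whole list. The sketch below models only the resulting print order; print_order() is a hypothetical helper and does no command line parsing of its own.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: fill order[] with file indices in the sequence they
   would be printed.  Returns the number of entries written (at most max). */
size_t print_order(size_t num_files, size_t copies, int collated,
                   size_t *order, size_t max)
{
    size_t n = 0, pass, file;

    assert(order != NULL);

    if (collated) {
        /* Scan the file list once per copy: complete sets come out. */
        for (pass = 0; pass < copies; pass++)
            for (file = 0; file < num_files; file++)
                if (n < max) order[n++] = file;
    } else {
        /* Scan the file list once: all copies of each file together. */
        for (file = 0; file < num_files; file++)
            for (pass = 0; pass < copies; pass++)
                if (n < max) order[n++] = file;
    }
    return n;
}
```

For three files and two copies, the collated order is 0 1 2 0 1 2, while the single-pass order is 0 0 1 1 2 2.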
getopt()
Writing a new and improved version of getopt(3) reinforced two basic
lessons of programming. First, a given function is usually less trivial
than it first appears to be. Coding up the enhanced getopt() was not
difficult, but close attention had to be paid to detail. Second, hindsight
is an unflagging source of "better" ideas. Suggestions for improvement
kept cropping up while programming getopt(), but things always look easier
the second time around. Furthermore, compatibility considerations limited
the extent of any changes.
The new getopt() is fully compatible with the original getopt(3), so no
changes to existing software are required. New applications can take
advantage of the getopt.h header file, which provides the external
definitions for getopt() and its global variables. Also defined is a
constant, NONOPT, for the non-option or end-of-options flag; this value is
hardcoded in getopt(3) as -1.
The inputs to getopt() are identical to those of getopt(3); changes to
global variable optind, however, will advance or reset the argument scan.
The outputs of getopt() are functionally equivalent to those of getopt(3),
although global variable optarg has acquired some additional meanings in
certain cases. Table 1 compares the outputs of getopt(3) and getopt().
Optind, not shown, indexes the current command line argument in both
versions of getopt.
Table 1: Outputs of getopt(3) and getopt()

getopt(3)           getopt()
option   optarg     option   optarg   Interpretation
------   ------     ------   ------   -------------------------------
letter              letter   NULL     Single-letter option
letter   string     letter   string   Option plus its argument
'?'                 '?'      error    Illegal option/missing argument
-1                  NONOPT   string   Non-option argument
-1                  NONOPT   NULL     Command line scan completed
In the case of the question mark ('?') option, getopt()'s optarg returns
the trailing portion of the command line argument that contains the
offending option. For example, if illegal option 'Q' is detected in
"prog -abQcde", getopt() returns '?' with optarg set to "Qcde".
The traditional getopt(3) approach to command line processing handled
options and non-options in separate sections of code. The new method of
command line processing uses the enhanced getopt() to scan the entire
command line:
while (((option = getopt (argc, argv, optstring)) != NONOPT) ||
       (optarg != NULL)) {
    switch (option) {
    case 'a':  ...
    case 'b':  ...
    ...
    case '?':  ... error ...
    case NONOPT:  ... process optarg ...
    default:  break ;
    }
}
Thanks to the expanded role assumed by optarg, options and non-options
alike can be processed within a single loop. A new NONOPT case in the
switch statement picks up the non-option arguments returned in optarg. As
an added bonus, optind is now only a vestige of the former getopt(3);
unless you're doing multi-pass command line processing, optind can be
dispensed with.
Using getopt() is fairly simple the first time and extremely simple
afterwards - just cut and paste your original "template" and delete or add
the appropriate options. To start you off, Listing 5 contains the command
line processing code from ffc (Format File in Columns), a program that
outputs one or more files in multiple columns. ffc is invoked as follows:
ffc [-c num] [-d] [-h num] [-l num] [-o output_file] [-p] [input_file(s)]
The meanings of the options are explained in the prolog in Listing 5.
Options "-c", "-h", and "-l" each take a numeric argument; library
function atoi(3) performs the text-to-integer conversion on the option
argument returned in optarg. The "-o" option expects a file name; the
character string pointer returned in optarg is simply saved in a local
variable. "-d" and "-p" are switches that set boolean flags in the
program. The non-option arguments are the input files; each file name
encountered is added to the list of files to be processed. Note that the
various page parameters, the flags, and the file table have to be properly
initialized (e.g., in the variable declarations) before the command line
is scanned.
Computer users and programmers alike should be indebted to the UNIX
designers for developing an easy-to-use and easy-to-program command line
interface. The enhancements suggested in this article detract in no way
from the simplicity and power of the original Command Syntax Standard and
getopt(3). They do, however, provide a portable, well-defined, command
line processing function and, if you choose to use it, a more
user-oriented command line syntax.
getopt()
/**************************************************************************

getopt ()

    Function GETOPT gets the next option letter from the command line.
    GETOPT is an enhanced version of the C Library function, GETOPT(3).

    Invocation:

        option = getopt (argc, argv, optstring) ;

    where

        <argc>
            is the number of arguments in the argument value array.
        <argv>
            is the argument value array, i.e., an array of pointers to
            the "words" extracted from the command line.
        <optstring>
            is the set of recognized options.  Each character in the
            string is a legal option; any other character encountered as
            an option in the command line is an illegal option and an
            error message is displayed.  If a character is followed by a
            colon in OPTSTRING, the option expects an argument.
        <option>
            returns the next option letter from the command line.  If
            the option expects an argument, OPTARG is set to point to
            the argument.  '?' is returned in the cases of an illegal
            option letter or a missing option argument.  Constant NONOPT
            is returned if a non-option argument is encountered or the
            command line scan is completed (also see OPTARG below for
            both cases).

    Public Variables:

        OPTARG - returns the text of an option's argument or of a
            non-option argument.  NULL is returned if an option has no
            argument or if the command line scan is complete.  For
            illegal options or missing option arguments, OPTARG returns
            a pointer to the trailing portion of the defective ARGV.

        OPTERR - controls whether or not GETOPT prints out an error
            message upon detecting an illegal option or a missing option
            argument.  A non-zero value enables error messages; zero
            disables them.

        OPTIND - is the index in ARGV of the command line argument that
            GETOPT will examine next.  GETOPT recognizes changes to this
            variable.  Arguments can be skipped by incrementing OPTIND
            outside of GETOPT and the command line scan can be restarted
            by resetting OPTIND to either 0 or 1.

**************************************************************************/

#include <stdio.h>                      /* Standard I/O definitions. */
#define USE_INDEX 0                     /* Set to 1 if your C Library uses
                                           "index" instead of "strchr". */
#if USE_INDEX
#    include <strings.h>                /* C Library string functions. */
#    define strchr index
#else
#    include <string.h>                 /* C Library string functions. */
#endif
#include "getopt.h"                     /* GETOPT(3) definitions. */

                                        /* Public variables. */
char *optarg = NULL ;
int opterr = -1 ;
int optind = 0 ;
                                        /* Private variables. */
static int end_optind = 0 ;
static int last_optind = 0 ;
static int offset_in_group = 1 ;

int getopt (argc, argv, optstring)

    int argc ;
    char **argv ;
    char *optstring ;

{   /* Local variables. */
    char *group, option, *s ;

    /* Did the caller restart or advance the scan by modifying OPTIND? */

    if (optind <= 0) {
        end_optind = 0 ;
        last_optind = 0 ;
        optind = 1 ;
    }
    if (optind != last_optind) offset_in_group = 1 ;

/**************************************************************************
    Scan the command line and return the next option or, if none, the next
    non-option argument.  At the start of each loop iteration, OPTIND is
    the index of the command line argument currently under examination and
    OFFSET_IN_GROUP is the offset within the current ARGV string of the
    next option (i.e., to be examined in this iteration).
**************************************************************************/

    for (option = ' ', optarg = NULL ;
         optind < argc ;
         optind++, offset_in_group = 1, option = ' ') {

        group = argv[optind] ;

        /* Is this a non-option argument?  If it is and it's the same one
           GETOPT returned on the last call, then loop and try the next
           command line argument.  If it's a new, non-option argument,
           then return the argument to the calling routine. */

        if ((group[0] != '-') ||
            ((end_optind > 0) && (optind > end_optind))) {
            if (optind == last_optind) continue ;
            optarg = group ;            /* Return NONOPT and argument. */
            break ;
        }

        /* Are we at the end of the current options group?  If so, loop
           and try the next command line argument. */

        if (offset_in_group >= strlen (group)) continue ;

        /* If the current option is the end-of-options indicator, remember
           its position and move on to the next command line argument. */

        option = group[offset_in_group++] ;
        if (option == '-') {
            end_optind = optind ;       /* Mark end-of-options position. */
            continue ;
        }

        /* If the current option is an illegal option, print an error
           message and return '?' to the calling routine. */

        s = strchr (optstring, option) ;
        if (s == NULL) {
            if (opterr)
                (void) fprintf (stderr, "%s: illegal option -- %c\n",
                                argv[0], option) ;
            option = '?' ;
            optarg = &group[offset_in_group-1] ;
            break ;
        }

        /* Does the option expect an argument?  If yes, return the option
           and its argument to the calling routine.  The option's argument
           may be flush up against the option (i.e., the argument is the
           remainder of the current ARGV) or it may be separated from the
           option by white space (i.e., the argument is the whole of the
           next ARGV). */

        if (*++s == ':') {
            if (offset_in_group < strlen (group)) {
                optarg = &group[offset_in_group] ;
                offset_in_group = strlen (group) ;
            } else {
                if ((++optind < argc) && (*argv[optind] != '-')) {
                    optarg = argv[optind] ;
                } else {
                    if (opterr)
                        (void) fprintf (stderr,
                                        "%s: option requires an argument -- %c\n",
                                        argv[0], option) ;
                    option = '?' ;
                    optarg = &group[offset_in_group-1] ;
                    offset_in_group = 1 ;
                }
            }
            break ;
        }

        /* It must be a single-letter option without an argument. */

        break ;

    }

    /* Return the option and (optionally) its argument. */

    last_optind = optind ;
    return ((option == ' ') ? NONOPT : (int) option) ;

}
getopt.h
#ifndef getopt_h_DEFINED
#define getopt_h_DEFINED

/**************************************************************************
    This INCLUDE file contains the external definitions for the GETOPT(3)
    function and its global variables.
**************************************************************************/

extern int getopt () ;                  /* Function to get command line
                                           options. */
extern char *optarg ;                   /* Set by GETOPT for options
                                           expecting arguments. */
extern int optind ;                     /* Set by GETOPT: index of next
                                           ARGV to be processed. */
extern int opterr ;                     /* Disable (== 0) or enable (!= 0)
                                           error messages written to
                                           standard error. */

#define NONOPT (-1)                     /* Non-Option - returned by GETOPT
                                           when it encounters a non-option
                                           argument. */

#endif /* getopt_h_DEFINED */
/**************************************************************************

ffc.c

    Format File in Columns.

    Invocation:

        % ffc [-c num] [-d] [-h num] [-l num] [-o output_file] [-p]
              [input_file(s)]

    where

        "-c num"
            specifies the number of columns (default = 2).
        "-d"
            turns debug output on.
        "-h num"
            specifies the number of blank lines at the top of each page
            (default = 0).
        "-l num"
            specifies the number of lines per page (default = 66).
        "-o output_file"
            specifies the name of the output file (default = standard
            output).
        "-p"
            invokes page numbering on output.
        input_file(s)
            are the files to be read and formatted in columns on output
            (default = standard input).

    Compilation:

        The program should be compiled and linked with the GETOPT
        function:

            % cc ffc.c getopt.c -o ffc

**************************************************************************/

#include <stdio.h>                      /* Standard I/O definitions. */
#include <stdlib.h>                     /* atoi(), exit(). */
#include "getopt.h"                     /* GETOPT(3) definitions. */

                                        /* List of input file names. */
#define MAX_FILES 1024
static char *file_table[MAX_FILES] ;
static int num_input_files = 0 ;
                                        /* Page dimensions, etc. */
static int debug = 0 ;                  /* 0 = no, -1 = yes. */
static int num_columns = 2 ;
static int num_header_lines = 0 ;
static int page_length = 66 ;
static int page_numbering = 0 ;         /* 0 = no, -1 = yes. */

main (argc, argv)

    int argc ;
    char *argv[] ;

{   /* Local variables. */
    char *output_file = NULL ;
    int errflg, option ;

    /* Scan the command line arguments. */

    errflg = 0 ;

    while (((option = getopt (argc, argv, "c:dh:l:o:p")) != NONOPT) ||
           (optarg != NULL)) {

        switch (option) {
        case 'c':  num_columns = atoi (optarg) ;  break ;
        case 'd':  debug = -1 ;  break ;
        case 'h':  num_header_lines = atoi (optarg) ;  break ;
        case 'l':  page_length = atoi (optarg) ;  break ;
        case 'o':  output_file = optarg ;  break ;
        case 'p':  page_numbering = -1 ;  break ;
        case '?':  errflg++ ;  break ;
        case NONOPT:
            if (num_input_files < MAX_FILES) {
                file_table[num_input_files++] = optarg ;
            }
            break ;
        default :  break ;
        }

    }

    /* If an invalid option was detected, print out a command usage
       message. */

    if (errflg) {
        fprintf (stderr, "Usage: ffc [-c num] [-d] [-h num]\n") ;
        fprintf (stderr, " [-l num] [-o output_file]\n") ;
        fprintf (stderr, " [-p] [input_file(s)]\n") ;
        exit (-1) ;
    }

    /* Print out the files in multiple columns. */

    ... the remainder of the program ...

}