This program was written under Unix back around 1990, so the C language being parsed is out-of-date with the present-day C Standard. I ported the program to VMS in 1992 and, in 1996, I attempted to tweak it to handle some problematic C code, but without much success. The program always worked well on my code, but I ran into serious problems with code that was rife with declarations such as the following:
    struct abcdef { ... } ;
    typedef struct abcdef abcdef ;
    struct ghijkl { ... abcdef abcdef ; ... } ;
What should the lexer return abcdef as? When you get to the declaration of struct ghijkl, the first instance of abcdef is a type name and the second instance is an identifier (a field name). And this dilemma isn't even remotely as difficult as attempting to parse typedef'ed function signatures. To correctly parse C, the lexer really needs access to a full symbol table that includes lexical scoping for symbols.
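One common workaround in yacc-based C parsers is the "lexer feedback" hack: the parser records typedef names as it sees their declarations, and the lexer looks each identifier up before deciding which token to return. A minimal sketch follows (hypothetical names, not npath's actual code):

    /* Sketch of the "lexer feedback" workaround.  The parser's
       typedef-declaration actions call typedef_declare(), and the
       lexer calls classify_name() for each identifier it scans. */

    #include <stdlib.h>
    #include <string.h>

    enum token { IDENTIFIER, TYPE_NAME };

    struct typedef_name {
        char *name;
        struct typedef_name *next;
    };

    static struct typedef_name *typedefs = NULL;

    void typedef_declare (const char *name)
    {
        struct typedef_name *t = malloc (sizeof *t);
        t->name = strdup (name);
        t->next = typedefs;
        typedefs = t;
    }

    enum token classify_name (const char *name)
    {
        struct typedef_name *t;
        for (t = typedefs; t != NULL; t = t->next)
            if (strcmp (t->name, name) == 0)
                return TYPE_NAME;
        return IDENTIFIER;
    }

Even so, a flat list like this still returns TYPE_NAME for the second abcdef in the struct ghijkl example above; getting that right requires tracking scopes and declarator positions, which is exactly the symbol-table work described here.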
In short, I haven't used this program in a long time.
npath computes various measures of software complexity for C Language source files. For each of the input C source files, npath pipes the source file through the C Preprocessor, cpp(1). The output from cpp(1) is input by npath, parsed, and the complexity statistics computed. For example, the statistics for one source file look as follows:
                       NCSL   Volume   V(G)   NPATH    CLOC
                       ----   ------   ----   -----   -----
    LIBALEX.C;1:
        date_and_time     3      186      1       1     0.3
        str_detab        21      991      9     120     5.7
        str_dupl         21      883      7      54     2.6
        str_etoa          5     5008      3       4     0.8
        str_free         12      395      5       6     0.5
        str_index         9      404      5      16     1.8
        str_insert       26     1158     11     864    33.2
        str_lcat          6      264      3       4     0.7
        str_lcopy        11      460      4       8     0.7
        str_lowcase       6      279      4       8     1.3
        str_upcase        6      279      4       8     1.3
        str_match         6      208      3       3     0.5
        str_trim          9      362      4      20     2.2

    Summary - # of files: 1, # of modules: 13, # of NCSL: 141
Underneath the file name (a VMS file name, in this case) is a list of all the functions defined in the file. Following each function name are five complexity figures: NCSL, Halstead's volume, McCabe's V(G), NPATH, and CLOC; each is described below.
After all the input source files have been processed, npath outputs a summary line totaling the number of files processed, the modules, and the lines of code.
NOTE that measurements are only made inside function bodies. Declarations outside the body of a function are not included in the counts.
"NCSL" is actually a misnomer for the first column of figures. npath only counts the number of statements (excluding declarations) in the body of the function. This is true for all the metrics measured by npath.
Halstead's Software Science metric is based on the number of operators and operands in a function. The length of a function is
total number of operators + total number of operands
The vocabulary of the function is
number of unique operators + number of unique operands
And, lastly, the volume of the function is defined as
length * log2 (vocabulary)
The sticky thing about the Software Science metric is deciding which things in a language are operators and which things are operands. npath uses the following conventions for the C language:
    Operators
    ---------
        break case continue default do else for goto if
        return sizeof switch while

        function call        (Counts as one operator.)
        {}  ()  []           (Each pair counts as one operator.)

        >>=  <<=  +=  -=  *=  /=  %=  &=  ^=  |=
        >>  <<  ++  --  ->  &&  ||  <=  >=  ==  !=
        ;  ,  :  =  .  &  !  ~  -  +  *  /  %  <  >  ^  |  ?

    Operands
    --------
        Identifiers
        Numbers
        Characters ('x')
        Strings ("...")
Note that function calls get counted twice, both as an operator and as an operand (because of the identifier). It looked too difficult to do one and not the other. By the time the parser knows it's dealing with a function call, it's not completely clear what the function name is; remember, the stuff preceding the left parenthesis might be a complicated expression that produces a function pointer.
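As a small worked illustration of the formulas above (made-up counts, not npath's code), once the operator and operand tallies for a function are known, the Software Science figures combine like this:

    /* Combine hypothetical Halstead counts into length, vocabulary,
       and volume, per the formulas above.  Link with -lm. */

    #include <math.h>
    #include <stdio.h>

    int main (void)
    {
        long total_operators = 60, total_operands = 45;    /* N1, N2 */
        long unique_operators = 15, unique_operands = 12;  /* n1, n2 */

        long length = total_operators + total_operands;
        long vocabulary = unique_operators + unique_operands;
        double volume = length * log2 ((double) vocabulary);

        printf ("length = %ld, vocabulary = %ld, volume = %.0f\n",
                length, vocabulary, volume);
        return 0;
    }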
McCabe's cyclomatic complexity metric, V(G), is basically the number of conditional statements (plus one) in a function. npath counts the following statements when calculating V(G):
    case   default   if   while   do   for
Renaud suggests steering clear of routines with a V(G) greater than 10.
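For illustration, here is a made-up function with the counted statements marked; following the rule above, its V(G) is the number of marked statements plus one:

    int classify (int c)
    {
        if (c == ' ')                    /* if       -> +1 */
            return 0;

        switch (c) {
          case '\t':                     /* case     -> +1 */
          case '\n':                     /* case     -> +1 */
            return 1;
          default:                       /* default  -> +1 */
            break;
        }

        while (c > 9)                    /* while    -> +1 */
            c -= 10;

        return c;                        /* V(G) = 5 + 1 = 6 */
    }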
The NPATH metric computes the number of possible execution paths through a
function. It takes into account the nesting of conditional statements and
multi-part boolean expressions (e.g., A && B, C || D, etc.). Nejmeh says that
his group had an informal NPATH limit of 200 on individual routines;
functions that exceeded this value were candidates for further
decomposition—or at least merited a closer look.
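As a rough worked example, using the counting rules as I understand them from Nejmeh's article (npath's own counting may differ in the details): sequential statements multiply, a plain if adds one path for the implicit else, and each && or || in a controlling expression adds one more path.

    /* Made-up function illustrating the NPATH arithmetic. */

    int example (int a, int b, int c)
    {
        if (a && b)          /* 1 (body) + 1 (implicit else) + 1 (one &&) = 3 */
            c++;

        if (c > 0)           /* 1 (then) + 1 (else) + 0 (no && or ||)    = 2 */
            c--;
        else
            c = 0;

        return c;            /* Sequential statements multiply: NPATH = 3 * 2 = 6 */
    }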
The CLOC metric is simply a function's NPATH number divided by the number of executable statements (NCSL) in the function; i.e., it measures the complexity per line of code in the function. Lower values of CLOC can mean one of two things: you write very clear code or your code doesn't do much of anything. Higher values of CLOC can also mean one of two things: your code has lots of important things to do or you program in a very obtuse manner!
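For example, str_insert in the listing above has an NPATH of 864 and an NCSL of 26, so its CLOC works out to 864 / 26, or roughly 33.2; str_match, at 3 / 6 = 0.5, sits at the quiet end of the scale.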
... and various other articles I can't recall. Nejmeh's NPATH article provided the inspiration for this program; see the article for more information about how the NPATH metric measures up to the others. Salt's Software Science article provides details on measuring the complexity of Pascal programs using Halstead's metric. I haven't read any of the original literature by Halstead or McCabe; my understanding of their metrics is derived from the different articles on metrics that I've read (mostly in SIGPLAN and CACM).
The parser expects the lexer to return TYPE_NAME tokens for type names and not IDENTIFIER tokens. npath appears to handle these situations correctly. The "comp.compilers" USENET group had a discussion about how hard it is to parse C without writing a full-blown compiler - they were right!
I came across Renaud's metrics tools (posted to "comp.sources.unix") after writing my program. This collection of tools measures delivered source instructions (like my NCSL), McCabe's metric, and Halstead's software science metric. McCabe's metric is measured by an awk(1) script, Halstead's by a lex(1) program. While my C parser-based npath program might seem more sophisticated than Renaud's tools, Renaud, unlike me, knows what he's talking about when it comes to metrics! Using code from a large project, Renaud studied the relationships between the different metrics and the maintenance history of the code.
Under VMS, piping the source file through CC/PREPROCESS is done by a version of popen(3) that I wrote for VMS.
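The UNIX side of that pipeline needs nothing special; a minimal sketch (not npath's actual code) using the stock popen(3) and the /lib/cpp preprocessor mentioned below might look like this:

    /* Run a source file through /lib/cpp and read the preprocessed
       text back one line at a time, as npath's parser would.
       Error handling is minimal. */

    #include <stdio.h>
    #include <stdlib.h>

    int main (int argc, char *argv[])
    {
        char command[1024], line[1024];
        FILE *pipe;

        if (argc < 2) {
            fprintf (stderr, "usage: %s <file.c>\n", argv[0]);
            return 1;
        }

        snprintf (command, sizeof command, "/lib/cpp %s", argv[1]);
        pipe = popen (command, "r");
        if (pipe == NULL) {
            perror ("popen");
            return 1;
        }

        while (fgets (line, sizeof line, pipe) != NULL)
            fputs (line, stdout);   /* Here npath would hand the line to its parser. */

        pclose (pipe);
        return 0;
    }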
The "-cflow" command line option generates a cflow(1)-style, textual structure chart.
    % npath [-D...] [-I...] [-U...] [-nostdinc]
            [-cflow] [-cpp] [-c++] [-debug] [-echo]
            [-exclude file] [-full] [-long] [-longer]
            [-nocpp] [-noheading] [-npath_debug]
            [-verbose] [-verify level] [-vperror]
            [-yacc_debug]
            source_file(s)
where

    -D...
    -I...
    -U...
    -nostdinc
    -cflow
    -cpp
        The preprocessor invoked is /lib/cpp under UNIX and
        CC/PREPROCESS_ONLY=SYS$OUTPUT under VMS.
    -c++
    -debug
    -echo
    -exclude file
    -full
    -long
    -longer
        Like the -long option, except that the columns of
        numbers are shifted two tab stops to the right.
    -nocpp
    -noheading
    -npath_debug
    -verbose
    -vperror
        Turns vperror() message output on.  vperror() messages are
        low-level error messages generated by libgpl functions;
        normally, they are disabled.  If enabled, the messages are
        output to stderr.
npath's source distribution contains no README file (yet!) and only a single Makefile, written for SunOS 4.1.3.
npath.tgz