This program was written under Unix back around 1990, so the C language being parsed is out-of-date with the present-day C Standard. I ported the program to VMS in 1992 and, in 1996, I attempted to tweak it to handle some problematic C code, but without much success. The program always worked well on my code, but I ran into serious problems with code that was rife with declarations such as the following:
    struct abcdef { ... } ;
    typedef struct abcdef abcdef ;
    struct ghijkl { ... abcdef abcdef ; ... } ;
What should the lexer return abcdef as? When you get to the declaration of struct ghijkl, the first instance of abcdef is a type name and the second instance is an identifier (a field name). And this dilemma isn't even remotely as difficult as attempting to parse typedef'ed function signatures. To correctly parse C, the lexer really needs access to a full symbol table that includes lexical scoping for symbols.
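One common workaround in yacc-based C parsers is the "lexer feedback" hack: the parser records typedef names as it sees their declarations, and the lexer looks each identifier up before deciding which token to return. A minimal sketch follows (hypothetical names, not npath's actual code):

    /* Sketch of the "lexer feedback" workaround.  The parser's
       typedef-declaration actions call typedef_declare(), and the
       lexer calls classify_name() for each identifier it scans. */

    #include <stdlib.h>
    #include <string.h>

    enum token { IDENTIFIER, TYPE_NAME };

    struct typedef_name {
        char *name;
        struct typedef_name *next;
    };

    static struct typedef_name *typedefs = NULL;

    void typedef_declare (const char *name)
    {
        struct typedef_name *t = malloc (sizeof *t);
        t->name = strdup (name);
        t->next = typedefs;
        typedefs = t;
    }

    enum token classify_name (const char *name)
    {
        struct typedef_name *t;
        for (t = typedefs; t != NULL; t = t->next)
            if (strcmp (t->name, name) == 0)
                return TYPE_NAME;
        return IDENTIFIER;
    }

Even so, a flat list like this still returns TYPE_NAME for the second abcdef in the struct ghijkl example above; getting that right requires tracking scopes and declarator positions, which is exactly the symbol-table work described here.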
In short, I haven't used this program in a long time.
npath computes various measures of software complexity for C Language source files. For each of the input C source files, npath pipes the source file through the C Preprocessor, cpp(1). The output from cpp(1) is input by npath, parsed, and the complexity statistics computed. For example, the statistics for one source file look as follows:
                       NCSL   Volume   V(G)   NPATH    CLOC
                       ----   ------   ----   -----   -----
    LIBALEX.C;1:
        date_and_time     3      186      1       1     0.3
        str_detab        21      991      9     120     5.7
        str_dupl         21      883      7      54     2.6
        str_etoa          5     5008      3       4     0.8
        str_free         12      395      5       6     0.5
        str_index         9      404      5      16     1.8
        str_insert       26     1158     11     864    33.2
        str_lcat          6      264      3       4     0.7
        str_lcopy        11      460      4       8     0.7
        str_lowcase       6      279      4       8     1.3
        str_upcase        6      279      4       8     1.3
        str_match         6      208      3       3     0.5
        str_trim          9      362      4      20     2.2

    Summary - # of files: 1, # of modules: 13, # of NCSL: 141
Underneath the file name (a VMS file name, in this case) is a list of all the functions defined in the file. Following each function name are five complexity figures: NCSL, Halstead's volume, McCabe's V(G), NPATH, and CLOC; each is described below.
After all the input source files have been processed, npath outputs a summary line totaling the number of files processed, the modules, and the lines of code.
NOTE that measurements are only made inside function bodies. Declarations outside the body of a function are not included in the counts.
"NCSL" is actually a misnomer for the first column of figures. npath only counts the number of statements (excluding declarations) in the body of the function. This is true for all the metrics measured by npath.
Halstead's Software Science metric is based on the number of operators and operands in a function. The length of a function is
total number of operators + total number of operands
The vocabulary of the function is
number of unique operators + number of unique operands
And, lastly, the volume of the function is defined as
length * log2 (vocabulary)
The sticky thing about the Software Science metric is deciding which things in a language are operators and which things are operands. npath uses the following conventions for the C language:
    Operators
    ---------
        break case continue default do else for goto if
        return sizeof switch while

        function call        (Counts as one operator.)
        {}  ()  []           (Each pair counts as one operator.)

        >>=  <<=  +=  -=  *=  /=  %=  &=  ^=  |=
        >>  <<  ++  --  ->  &&  ||  <=  >=  ==  !=
        ;  ,  :  =  .  &  !  ~  -  +  *  /  %  <  >  ^  |  ?

    Operands
    --------
        Identifiers
        Numbers
        Characters ('x')
        Strings ("...")
Note that function calls get counted twice, both as an operator and as an operand (because of the identifier). It looked too difficult to do one and not the other. By the time the parser knows it's dealing with a function call, it's not completely clear what the function name is; remember, the stuff preceding the left parenthesis might be a complicated expression that produces a function pointer.
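As a small worked illustration of the formulas above (made-up counts, not npath's code), once the operator and operand tallies for a function are known, the Software Science figures combine like this:

    /* Combine hypothetical Halstead counts into length, vocabulary,
       and volume, per the formulas above.  Link with -lm. */

    #include <math.h>
    #include <stdio.h>

    int main (void)
    {
        long total_operators = 60, total_operands = 45;    /* N1, N2 */
        long unique_operators = 15, unique_operands = 12;  /* n1, n2 */

        long length = total_operators + total_operands;
        long vocabulary = unique_operators + unique_operands;
        double volume = length * log2 ((double) vocabulary);

        printf ("length = %ld, vocabulary = %ld, volume = %.0f\n",
                length, vocabulary, volume);
        return 0;
    }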
McCabe's cyclomatic complexity metric, V(G), is basically the number of conditional statements (plus one) in a function. npath counts the following statements when calculating V(G):
    case   default   if   while   do   for
Renaud suggests steering clear of routines with a V(G) greater than 10.
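For illustration, here is a made-up function with the counted statements marked; following the rule above, its V(G) is the number of marked statements plus one:

    int classify (int c)
    {
        if (c == ' ')                    /* if       -> +1 */
            return 0;

        switch (c) {
          case '\t':                     /* case     -> +1 */
          case '\n':                     /* case     -> +1 */
            return 1;
          default:                       /* default  -> +1 */
            break;
        }

        while (c > 9)                    /* while    -> +1 */
            c -= 10;

        return c;                        /* V(G) = 5 + 1 = 6 */
    }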
The NPATH metric computes the number of possible execution paths through a
function. It takes into account the nesting of conditional statements and
multi-part boolean expressions (e.g., A && B, C || D, etc.). Nejmeh says that
his group had an informal NPATH limit of 200 on individual routines;
functions that exceeded this value were candidates for further
decomposition—or at least merited a closer look.
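As a rough worked example, using the counting rules as I understand them from Nejmeh's article (npath's own counting may differ in the details): sequential statements multiply, a plain if adds one path for the implicit else, and each && or || in a controlling expression adds one more path.

    /* Made-up function illustrating the NPATH arithmetic. */

    int example (int a, int b, int c)
    {
        if (a && b)          /* 1 (body) + 1 (implicit else) + 1 (one &&) = 3 */
            c++;

        if (c > 0)           /* 1 (then) + 1 (else) + 0 (no && or ||)    = 2 */
            c--;
        else
            c = 0;

        return c;            /* Sequential statements multiply: NPATH = 3 * 2 = 6 */
    }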
The CLOC metric is simply a function's NPATH number divided by the number of executable statements (NCSL) in the function; i.e., it measures the complexity per line of code in the function. Lower values of CLOC can mean one of two things: you write very clear code or your code doesn't do much of anything. Higher values of CLOC can also mean one of two things: your code has lots of important things to do or you program in a very obtuse manner!
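For example, str_insert in the listing above has an NPATH of 864 and an NCSL of 26, so its CLOC works out to 864 / 26, or roughly 33.2; str_match, at 3 / 6 = 0.5, sits at the quiet end of the scale.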
... and various other articles I can't recall. Nejmeh's NPATH article provided the inspiration for this program; see the article for more information about how the NPATH metric measures up to the others. Salt's Software Science article provides details on measuring the complexity of Pascal programs using Halstead's metric. I haven't read any of the original literature by Halstead or McCabe; my understanding of their metrics is derived from the different articles on metrics that I've read (mostly in SIGPLAN and CACM).
The parser expects the lexer to return TYPE_NAME tokens for type names and not IDENTIFIER tokens. npath appears to handle these situations correctly. The "comp.compilers" USENET group had a discussion about how hard it is to parse C without writing a full-blown compiler - they were right!
I came across Renaud's metrics tools (posted to "comp.sources.unix") after writing my program. This collection of tools measures delivered source instructions (like my NCSL), McCabe's metric, and Halstead's software science metric. McCabe's metric is measured by an awk(1) script, Halstead's by a lex(1) program. While my C parser-based npath program might seem more sophisticated than Renaud's tools, Renaud, unlike me, knows what he's talking about when it comes to metrics! Using code from a large project, Renaud studied the relationships between the different metrics and the maintenance history of the code.
Under VMS, piping the source file through CC/PREPROCESS is done by a version of popen(3) that I wrote for VMS.
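The UNIX side of that pipeline needs nothing special; a minimal sketch (not npath's actual code) using the stock popen(3) and the /lib/cpp preprocessor mentioned below might look like this:

    /* Run a source file through /lib/cpp and read the preprocessed
       text back one line at a time, as npath's parser would.
       Error handling is minimal. */

    #include <stdio.h>
    #include <stdlib.h>

    int main (int argc, char *argv[])
    {
        char command[1024], line[1024];
        FILE *pipe;

        if (argc < 2) {
            fprintf (stderr, "usage: %s <file.c>\n", argv[0]);
            return 1;
        }

        snprintf (command, sizeof command, "/lib/cpp %s", argv[1]);
        pipe = popen (command, "r");
        if (pipe == NULL) {
            perror ("popen");
            return 1;
        }

        while (fgets (line, sizeof line, pipe) != NULL)
            fputs (line, stdout);   /* Here npath would hand the line to its parser. */

        pclose (pipe);
        return 0;
    }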
The "-cflow" command line option generates a cflow(1)-style, textual structure chart.
    % npath [-D...] [-I...] [-U...] [-nostdinc]
            [-cflow] [-cpp] [-c++] [-debug] [-echo]
            [-exclude file] [-full] [-long] [-longer]
            [-nocpp] [-noheading] [-npath_debug]
            [-verbose] [-verify level] [-vperror]
            [-yacc_debug]
            source_file(s)
where

    -D...
    -I...
    -U...
    -nostdinc
    -cflow
    -cpp
        The preprocessor invoked is /lib/cpp under UNIX and
        CC/PREPROCESS_ONLY=SYS$OUTPUT under VMS.
    -c++
    -debug
    -echo
    -exclude file
    -full
    -long
    -longer
        Like the -long option, except that the columns of
        numbers are shifted two tab stops to the right.
    -nocpp
    -noheading
    -npath_debug
    -verbose
    -vperror
        Turns vperror() message output on.  vperror() messages are
        low-level error messages generated by libgpl functions;
        normally, they are disabled.  If enabled, the messages are
        output to stderr.
npath's source distribution contains no README file (yet!) and only a single Makefile, written for SunOS 4.1.3.
npath.tgz