Faster than Run-Time

An Optimized ASCII-to-Binary
Floating Point Conversion Routine

Published in The DEC Professional, March 1985

The Thematic Mapper image processing subsystem (TIPS) of the LANDSAT-4/5 remote sensing satellite ground station is responsible for the radiometric and geometric correction of the satellite imagery.

The performance of these tasks requires large amounts of supplementary data extracted from the databases of the mission management facility (MMF) of the ground station. The data, resident on the MMF's DEC-2060 computer, is transmitted to one of two TIPS VAX-11/780 computers via a DECnet link or by magnetic tape (see Figure 1). Because the DEC-2060 is a 36-bit machine and the VAX-11/780 is a 32-bit machine, the binary representations of numbers on the two types of computers are not compatible. All numerical data, therefore, is transferred over the network in ASCII format: MMF converts the binary numbers to ASCII text prior to the transfer; TIPS converts the ASCII text to binary numbers after the transfer.

Figure 1: MMF-TIPS System

             .----------.
             !   MMF    !
             ! DEC-2060 !
             `----------'
                 /  \
                /    \
              DECnet/Tape
              /        \
             /          \
.------------.          .------------.
!   TIPS-1   !          !   TIPS-2   !
! VAX-11/780 !          ! VAX-11/780 !
`------------'          `------------'

The quantity of data (see Table I) being moved made the file conversion process (incorporation) on the TIPS computers extremely slow, requiring 1.2 minutes of CPU time per scene (the wall-clock time could be considerably worse, depending on other system activity). While only an annoyance with our present throughput of 12 scenes a day, this time factor would be unacceptable when we reach our 100 scenes per day capability by 1985, at which time incorporation would require 2 hours of CPU time per day simply to prepare the data for use. Speeding up the incorporation program was clearly a necessity.

TABLE I: Data Types, Scopes, and Quantity

A work order is a request for processing of one or more intervals of imagery. An interval consists of one or more scenes. The numbers below are for the incorporation of a single work order containing one interval of 4 scenes.

                            .---------------.--------------.------------.
                            !    Scope      !    Type      !  Quantity  !
                            !---------------+--------------+------------!
                            !               !              !            !
Miscellaneous Parameters    !   Work Order  !   INTEGER    !     3,993  !
 (Primarily radiometric     !               !   REAL*4     !     3,298  !
  correction parameters)    !               !   REAL*8     !         9  !
                            !               !              !            !
Telemetry Data              !   Interval    !   INTEGER    !        96  !
                            !               !   REAL*4     !     3,026  !
                            !               !   REAL*8     !       336  !
                            !               !              !            !
Geometric Correction Data   !   Scene       !   INTEGER    !     4,610  !
                            !               !   REAL*4     !   107,888  !
                            !               !   REAL*8     !     1,516  !
                            !               !              !            !
                            `---------------`--------------+------------!
                                                Total      !   124,772  !
                                                           `------------'

The PROFILE program mentioned in the next paragraph was written by Bob Wilson. I briefly described this incredible program in a 1993 comp.os.vms posting. My understanding from Bob back in the 1980s was that PROFILE attached itself to the VAX's clock interrupt and sampled the program counter. Arthur McClinton's response points out that it could be attached to any interrupt. His company licensed PROFILE way back then and he maintained it up through several newer VMS versions until it finally stopped working.

I guess PROFILE had to be seen to be believed. What the comp.os.vms thread I initiated doesn't capture is that PROFILE wasn't something you only used when you were debugging and testing your program. The first time Bob demonstrated PROFILE to me was at NASA's LANDSAT computer facility (GSFC, Building 28); he attached PROFILE to a CPU-bound program running in the customer's production environment. We had no control over the target program, which was one of a number of subprocesses spawned by a master program, and we couldn't interrupt the production run that was in progress.

PROFILE, a GE-developed high resolution CPU sampling monitor, was used to determine whether the incorporation program was I/O-bound by file processing (as was first thought) or CPU-bound by the actual ASCII-to-binary number conversions. We found that the incorporation program was saturating the CPU and that most of the processing time was spent in converting ASCII text to binary REAL*4 numbers. By concentrating our speed-tuning efforts in this area, we were able to achieve significant improvements in overall incorporation time.

The incorporation program, as originally written, used FORTRAN formatted internal READ statements (i.e., DECODE) to convert ASCII numbers to their binary representations. Two alternatives were apparent for converting ASCII text to binary REAL*4 numbers: the VAX/VMS Run-Time Library conversion routine OTS$CVT_T_D (text to D_floating) or a user-written assembly-language conversion routine. Benchmark tests showed that Run-Time Library conversions are about 1.4 times as fast as equivalent FORTRAN conversions (see Table II). Replacement of the FORTRAN conversions by Run-Time Library conversions coupled with a design change in the calling routines produced an incorporation program that was twice as fast as the original program.

TABLE II: ASCII-Binary Conversion Times

The times below are averages of CPU times reported by the VAX/VMS Run-Time Library routine LIB$SHOW_TIMER during the execution of a benchmark program.

                   CPU Time in Seconds for 50,000 Conversions
                 .--------------------------------------------.
      Input      !    MACRO      Run-Time Library   FORTRAN   !
   ASCII Number  ! CONV_STR_REAL   OTS$CVT_T_D       E14.7    !
.----------------+--------------+--------------+--------------!
!                !              !              !              !
! +0.0000000E+00 !     4.91     !     37.66    !     63.56    !
!                !              !              !              !
! +0.1111111E+00 !     5.16     !     60.17    !     90.29    !
!                !              !              !              !
! +0.9999999E+00 !     5.18     !     57.07    !     85.31    !
!                !              !              !              !
! +0.1234567E+23 !     5.30     !     57.84    !     86.77    !
!                !              !              !              !
! +0.9876543E+12 !     5.28     !     54.32    !     83.81    !
!                !              !              !              !
! +0.9876543E-12 !     5.39     !     75.24    !    103.87    !
!                !              !              !              !
! -0.9876543E+12 !     5.34     !     54.76    !     85.35    !
!                !              !              !              !
! -0.9876543E-12 !     5.45     !     74.74    !    102.97    !
!                !              !              !              !
`----------------`--------------'--------------'--------------'

Although an increase in speed by a factor of 2 was desirable, we looked further into the possibility of writing a MACRO conversion routine that was even faster than the Run-Time Library. Two factors were in our favor. First, the ASCII numbers transferred across the MMF-TIPS interface adhere to strict formats; the majority of the real numbers are represented in FORTRAN E14.7 format (i.e., "s0.nnnnnnnEsmm"). Second, the VAX-11/780 has a very useful set of decimal string instructions that convert ASCII strings to binary integers. The MACRO routine we wrote, CONV_STR_REAL, takes advantage of these two factors, the fixed-format input and the hardware-implemented conversions. As a result, benchmark tests showed that conversions made using our assembly-language routine are ten times as fast as equivalent Run-Time Library conversions (see Table II).

Our assembly-language conversion routine, CONV_STR_REAL, differs from the VAX/VMS Run-Time Library conversion routine, OTS$CVT_T_D, in three ways: speed, versatility, and accuracy. Conversion speed is dependent on the versatility and accuracy of the routine making the conversion. By making acceptable tradeoffs in versatility and accuracy, it is possible to increase the speed with which numbers are converted. The need for versatility is dependent on the application. OTS$CVT_T_D is a highly versatile routine that accepts ASCII real numbers in many different formats; this versatility and its associated overhead (a VAX/VMS DEBUGGER call/branch trace showed over 350 transfers of control during a single call to OTS$CVT_T_D) are not required for the majority of real numbers handled by the incorporation program.

Conversion accuracy is relative. By analyzing roundoff errors, we can determine the maximum relative error that can occur in converting an ASCII number to a single or double precision binary number. It is assumed that the result of an operation involving two real numbers of the same precision introduces a maximum error of 1/2 least significant bit (LSB) of the mantissa; n operations introduce a maximum cumulative error of n * 1/2 LSB of the mantissa. Binary single precision (F_floating or REAL*4) numbers are represented on the VAX-11/780 using a sign bit; an 8-bit, excess-128 exponent; and a 24-bit, normalized mantissa. Binary double precision (D_floating or REAL*8) numbers are represented using a sign bit; an 8-bit, excess-128 exponent; and a 56-bit, normalized mantissa. The range of magnitudes representable by double precision numbers are almost identical to those of single precision numbers; only the precision differs.

OTS$CVT_T_D uses 128-bit arithmetic internally and produces a REAL*8 result. Because of the extremely high precision of 128-bit arithmetic, it is safe to consider the Run-Time Library conversion as a single operation with a maximum relative error of 1/2 LSB of the 56-bit REAL*8 mantissa, or one part in 2**57 (1.4 * 10**17). CONV_STR_REAL uses REAL*8 arithmetic internally, produces a REAL*8 result, and converts the result to REAL*4. Two operations that may introduce roundoff errors in CONV_STR_REAL are representing powers of ten in the exponent table and multiplying the mantissa by its exponent. The maximum relative error in the ASCII-REAL*8 conversion is therefore 2 * 1/2 LSB of the 56-bit REAL*8 mantissa, or one part in 2**56 (7.2 * 10**16). (Numbers whose exponents are less than 10**-31 require an additional operation to scale the mantissa; the maximum relative error is then 3 * 1/2 LSB, or 1.5 parts in 2**56.) For both OTS$CVT_T_D and CONV_STR_REAL, converting the result to single precision may add an additional error of 1/2 LSB of the 24-bit REAL*4 mantissa, or one part in 2**25 (3.4 * 10**7). The TIPS system engineers decided that a REAL*8 accuracy of approximately one part in ten thousand million million was acceptable. Extensive, but not exhaustive, benchmark tests showed no differences between the REAL*4 numbers produced by the two conversion routines and relative errors no larger than the maximums determined above for the REAL*8 numbers produced by the routines.

The incorporation program was modified to utilize the new MACRO ASCII-to-real conversion routine. Comparison runs between the old and new programs for several scenes' worth of data yielded a sevenfold increase in speed and no differences in the contents of the data files generated. Incorporation of the data for one scene now only requires 10 seconds of CPU time. Incorporation of the data for 100 scenes will require less than 20 minutes of CPU time, a considerable reduction from the 2 hours required by the old program. Significant performance improvements were realized by exploiting the restrictions of the application and the richness of the hardware's architecture.

Daniel Roy (formerly of General Electric) assisted in the development of the algorithm. Robert Wilson (General Electric), the author of PROFILE, provided assistance during testing.

Notes

The purpose of the LANDSAT project is the collection of imagery for use in applications such as land classification, agricultural and geological resource management, etc. The satellite carries two "cameras": a 4-band Multi-Spectral Scanner (MSS) and an experimental, high-resolution, 7-band Thematic Mapper (TM). The satellite ground station, designed and built by General Electric, consists of the various subsystems needed to control the satellite, to receive and process telemetry and video data, and to manage the project's mission.

CONV_STR_REAL Conversion Function

;*******************************************************************************
;
;	CONV_STR_REAL
;
;	This routine converts an ASCII representation of a real number to its
;	corresponding VAX binary REAL*4 representation. It is optimized for
;	the particular input format chosen; it does not accept different real
;	number formats as the Run-Time Library routine OTS$CVT_T_D does.
;
;
;	INPUTS  -
;
;		ASCII_NUMBER:	ASCII representation of real number in the
;				format +/-0.NNNNNNNE+/-MM (E14.7). The mantissa
;				need not be normalized.
;
;	OUTPUTS  -
;
;		R4_NUMBER:	REAL*4 representation of input number.
;				Set to +0.0E+00 if error in conversion.
;
;		R0:		Function value =
;				SS$_NORMAL: No error in conversion.
;				SS$_ROPRND: Invalid number format or invalid
;					    characters in number.
;				SS$_FLTUND: Magnitude of number is too small
;					    for VAX REAL*4 representation;
;					    i.e., magnitude < +0.2900000E-38.
;				SS$_FLTOVF: Magnitude of number is too large
;					    for VAX REAL*4 representation;
;					    i.e., magnitude > +0.1700000E+39.
;
;
;	To call from a FORTRAN program:
;
;		INTEGER*4	CONV_STR_REAL	! Declare function.
;		CHARACTER*14	ASCII_number	! Arguments.
;		REAL*4		R4_number
;		...
;		...
;		status = CONV_STR_REAL (ASCII_number, R4_number)
;
;
;*******************************************************************************


;*******************************************************************************
;
;	SYMBOLS - Global Symbols.
;
;*******************************************************************************

	$DSCDEF				; String descriptor parameters.
	$SSDEF				; System status codes.

	.EXTRN		CONV_COND_HAND	; External routine - Condition Handler.


;*******************************************************************************
;
;	SYMBOLS - Local Symbols.
;
;*******************************************************************************

	ASCII_NUMBER =	4		; Byte offsets for argument list.
	R4_NUMBER =	8

	DIGITS_TOTAL	=	14	; Total number of characters required
					; in the input string.

	DIGITS_IN_FRACTION =	7	; Number of digits to the right of the
					; decimal point in the mantissa.

	DIGITS_IN_EXPONENT =	2	; Number of digits (excluding sign) in
					; the exponent.

	MAX_DIGITS	=	8	; Maximum number of digits (including
					; sign) in the stack string storage.

	MAX_EXPONENT	=	38	; Range of exponents (10**-m .. 10**+m)
					; in the exponent table.

					; Temporary storage on the stack.
					; Space is allocated for the input
					; leading separate numeric string and
					; the translated packed numeric string.
					; The symbols define the amount of
					; storage to be allocated and the
					; offsets of the strings from the
					; modified stack pointer.
	STACK_STORAGE	=	<MAX_DIGITS+1>/2 + MAX_DIGITS
	SEPARATE_STRING	=	0	; Number of bytes = max_digits.
	PACKED_STRING =	MAX_DIGITS	; Number of bytes = (max_digits+1)/2.


;*******************************************************************************
;
;	POWERS_OF_TEN
;
;	This module defines the exponent lookup table and the mantissa scale
;	factor. The table contains the floating point representation for
;
;		   m
;		10	for m = -38, -37, ..., +37, +38.
;
;	For a given exponent i, the corresponding power of ten is found at
;	entry i+38 in the table (entry 0 is 10**(-38)).
;
;	The digits to the right of the decimal point in the mantissa are
;	converted to a large integer (as if there were no decimal point).
;	This number must then be divided by 10**(DIGITS_IN_FRACTION) to
;	produce the correct fraction. An equivalent operation is to multiply
;	the "mantissa" by 10**(-DIGITS_IN_FRACTION); this latter number is
;	called the mantissa scale factor.
;
;	Note that the maximum and minimum powers of ten in the exponent table
;	are the maximum and minimum powers of ten representable on the VAX.
;	The exponents of input ASCII numbers can exceed the maximum exponent
;	in the table because of the scaling of the mantissa and also if the
;	mantissa is not normalized.
;
;*******************************************************************************

	.PSECT		POWERS_OF_TEN, NOEXE, NOWRT, PIC, RD, SHR

; Macro CREATE_POWER_OF_TEN defines and initializes storage for a power of ten.

	.MACRO		CREATE_POWER_OF_TEN	VALUE
	.DOUBLE		+1.0E'VALUE
	.ENDM

; Macro GEN_EXP_TABLE defines and initializes storage for a table of powers
; of ten. The numbers in the table range from 10**(-n) .. 10**(+n), with the
; exponent increasing in steps of 1.

	.MACRO		GEN_EXP_TABLE	EXPONENT
	VAL	=	-EXPONENT
	.REPEAT		2*EXPONENT + 1
		CREATE_POWER_OF_TEN	\VAL
		VAL	=	VAL + 1
	.ENDR
	.ENDM

; Macro GEN_INVERSE_POWER_OF_TEN defines and initializes storage for the
; inverse of a power of ten; i.e., the inverse of 10**i is 10**(-i).

	.MACRO		GEN_INVERSE_POWER_OF_TEN	EXPONENT
	VAL	=	-EXPONENT
	CREATE_POWER_OF_TEN	\VAL
	.ENDM

EXPONENT_TABLE:
	GEN_EXP_TABLE	\MAX_EXPONENT

MANTISSA_SCALE_FACTOR:
	GEN_INVERSE_POWER_OF_TEN	\DIGITS_IN_FRACTION


;*******************************************************************************
;
;	CONV_STR_REAL - Start of Executable Code.
;
;	REGISTER USAGE -
;
;		   R0:	Function value returned.
;		R0-R3:	Used by CVTSP and CVTPL instructions for converting
;			ASCII to packed and packed to longword (exponent
;			conversion below follows copies).
;		R2-R3:	Exponent conversion; final REAL*8 result.
;		R4-R5:	Mantissa conversion.
;		   R6:	Address of first character in input string.
;		   SP:	Modified to allocate temporary string storage on the
;			stack.
;
;*******************************************************************************

	.PSECT		CONV_STR_REAL, EXE, NOWRT, SHR

CONV_STR_REAL::
	.WORD		^M<R2,R3,R4,R5,R6>

	CLRF		@R4_NUMBER(AP)		; Zero result, in case of error.
	MOVAL		CONV_COND_HAND, (FP)	; Establish condition handler.
	SUBL2		#STACK_STORAGE, SP	; Allocate temporary stack storage.

	MOVL		ASCII_NUMBER(AP), R6	; Address of ASCII_NUMBER descriptor.
	CMPW		DSC$W_LENGTH(R6), #DIGITS_TOTAL
	BLSSU		INVALID_STRING		; Enough characters in input string?
	MOVL		DSC$A_POINTER(R6), R6	; Extract string address from descriptor.

						; Validate non-essential
	CMPB		1(R6), #^A'0'		; characters in the input
	BNEQ		INVALID_STRING		; string. Return reserved
	CMPB		2(R6), #^A'.'		; operand error (like CVTxx
	BNEQ		INVALID_STRING		; instructions) if invalid.
	CMPB		<3+DIGITS_IN_FRACTION>(R6), #^A'E'
	BEQL		CONVERT_MANTISSA

INVALID_STRING:
	BRW		INVALID_STRING_ERROR	; Invalid input string "long branch".

CONVERT_MANTISSA:				; Construct ASCII string
						; "SNNNNNNN" from input string
						; "S0.NNNNNNNESMM".
	MOVQ		2(R6), (SP)		; ".NNNNNNN" to separate string.
	MOVB		(R6), (SP)		; "S" to separate string.

						; Convert ASCII string to
						; packed string (BCD) to
						; INTEGER*4 to REAL*8,
						; giving the mantissa.
	CVTSP		#DIGITS_IN_FRACTION, (SP), -
			#DIGITS_IN_FRACTION, PACKED_STRING(SP)
	CVTPL		#DIGITS_IN_FRACTION, PACKED_STRING(SP), R4
	CVTLD		R4, R4			; Mantissa * 10**DIGITS_IN_FRACTION.

						; Construct ASCII string
						; "SMM" from input string
						; "S0.NNNNNNNESMM".
	MOVL		<3+DIGITS_IN_FRACTION>(R6), (SP)
						; "ESMM" to separate string.

						; Convert ASCII string to
						; packed string (BCD) to
						; INTEGER*4, giving exponent.
	CVTSP		#2, 1(SP), #2, PACKED_STRING(SP)
	CVTPL		#2, PACKED_STRING(SP), R2

	CMPL		R2, #-MAX_EXPONENT	; Is the exponent too small?
	BLSS		EXPONENT_TOO_SMALL_ERROR
	CMPL		R2, -			; Is the exponent too large?
			#MAX_EXPONENT+DIGITS_IN_FRACTION
	BGTR		EXPONENT_TOO_LARGE_ERROR

						; Subtract mantissa scaling factor
						; from exponent and compute index
						; into exponent lookup table.
	ADDL2		#MAX_EXPONENT-DIGITS_IN_FRACTION, R2
	BGEQ		MULT_MANTISSA_EXPONENT	; Resulting exponent in table?

						; Resultant exponent too small.
						; Scale mantissa and use original
						; exponent.
	MULD2		MANTISSA_SCALE_FACTOR, R4
	ADDL2		#DIGITS_IN_FRACTION, R2

MULT_MANTISSA_EXPONENT:				; Index into the lookup table,
						; obtain 10**exponent, and
	MULD2		EXPONENT_TABLE[R2], R4	; multiply mantissa by exponent.

	CVTDF		R4, @R4_NUMBER(AP)	; Round and return converted
						; REAL*4 value.

	MOVL		#SS$_NORMAL, R0		; Return normal status.
	RET

; Error Returns.

INVALID_STRING_ERROR:				; Invalid characters in input
	MOVL		#SS$_ROPRAND, R0	; string. Return reserved
	RET					; operand error status.

EXPONENT_TOO_SMALL_ERROR:			; The exponent is too small.
	MOVL		#SS$_FLTUND, R0		; Return floating point
	RET					; underflow error status.

EXPONENT_TOO_LARGE_ERROR:			; The exponent is too large.
	MOVL		#SS$_FLTOVF, R0		; Return floating point
	RET					; overflow error status.


	.END

CONV_COND_HAND Condition Handler

      INTEGER*4  FUNCTION  CONV_COND_HAND (SIGARGS, MCHARGS)

C*******************************************************************************
C
C	CONV_COND_HAND
C
C	This routine is a VAX/VMS condition handler designed for use with
C	number conversion functions (i.e., converting ASCII text to binary
C	number representations). The error condition causing the condition
C	handler to be invoked is tested to determine if it is one of several
C	arithmetic errors. If so, the condition handler unwinds the stack to
C	the caller of the routine which established the condition handler and
C	returns control to that routine; the error condition is the function
C	value returned. If the error condition is not an arithmetic-type error,
C	the condition handler resignals the condition to higher-level condition
C	handlers. For example, let routine A call number conversion function B:
C
C		A :	status = B (in_number, out_number)
C
C		!
C		!
C		V
C
C		B <-----> condition_handler
C
C	B establishes this condition handler to handle errors. If the number is
C	successfully converted, B returns a function value of SS$_NORMAL. If an
C	error occurs, the condition handler is invoked, the stack is unwound to
C	A, and the error code is returned as the function value. Routine A is
C	not aware of the processing taking place in B or in the condition handler;
C	all A sees is the function value returned, either normal or error.
C
C
C	INPUTS  -
C
C		SIGARGS:	INTEGER*4 array which defines the condition
C				and the environment in which it occurred.
C
C		MCHARGS:	INTEGER*4 array which contains the values
C				of R0 and R1 at the time of the condition.
C
C	OUTPUTS  -
C
C		MCHARGS:	Saved value of R0 replaced by error code if
C				arithmetic condition caused handler to be
C				invoked. VAX/VMS will return this new R0 as
C				a function value to the caller of the routine
C				that established the condition handler.
C
C		CONV_COND_HAND:	SS$_CONTINUE if arithmetic error occurred and
C				program is to continue.
C				SS$_RESIGNAL if other error occurred and is to
C				be processed by higher-level condition handlers.
C
C*******************************************************************************

C...      Function arguments.
      INTEGER*4  SIGARGS(*), MCHARGS (*)
C...      Parameters and external routines.
      INCLUDE 'SYS$LIBRARY:SIGDEF.FOR'
      INTEGER*4  LIB$MATCH_COND
C...      Local variables.
      INTEGER*4  CONDITION, INDEX




C...      Determine if the condition handler was invoked during a stack
C      unwind, or for an arithmetic error, or for some other condition.

      CONDITION = SIGARGS(2)			! Condition that invoked handler.

      INDEX = LIB$MATCH_COND (CONDITION,
     +                        SS$_UNWIND,	! Unwinding stack.
     +                        SS$_DECOVF,	! Decimal overflow.
     +                        SS$_FLTDIV,	! Floating divide.
     +                        SS$_FLTDIV_F,	! Floating divide fault.
     +                        SS$_FLTOVF,	! Floating overflow.
     +                        SS$_FLTOVF_F,	! Floating overflow fault.
     +                        SS$_FLTUND,	! Floating underflow.
     +                        SS$_FLTUND_F,	! Floating underflow fault.
     +                        SS$_INTDIV,	! Integer divide.
     +                        SS$_INTOVF,	! Integer overflow.
     +                        SS$_ROPRAND)	! Reserved operand.

C...      If the condition handler was invoked during a stack unwind,
C      then let the system continue unwinding.

      IF (INDEX .EQ. 1) THEN
          CONV_COND_HAND = SS$_CONTINUE

C...      For an arithmetic error, unwind the stack and return the
C      status code as the function value to the calling routine.

      ELSEIF (INDEX .GE. 2) THEN
          MCHARGS(4) = CONDITION
          CALL SYS$UNWIND ( , )
          CONV_COND_HAND = SS$_CONTINUE

C...      For all other errors, resignal the condition, allowing it to
C      be handled by other condition handlers (i.e., the traceback
C      handler).

      ELSE
          CONV_COND_HAND = SS$_RESIGNAL
      ENDIF

      RETURN

      END