


nbrIsPrime()
and nbrNextPrime()
.
"We don't do this song very much, but I feel like singing so, I guess, that's what we're gonna do."
 Duane Allman, "Dimples", Ludlow Garage, 1970 (Wikipedia)
I feel like writing so, I guess, that's what I'm going to do!
WARNING: All over the internet, you'll find example code that determines if a number is prime. Nearly all the examples I've seen loop through divisors while "i×i ≤ n" (i.e., while the square of the next divisor to be tested is less than or equal to the number). Squaring the divisor is fine if you're using arbitraryprecision arithmetic, but it can fail at the high end if you're using fixedsize integers. Take unsigned 32bit numbers for example:
~65535.9^2 = 4,294,967,295 (maximum unsigned 32bit number) 65535.0^2 = 4,294,836,225
There are 5,853 prime numbers between 65535.0^{2} and ~65535.9^{2}. When checking the next divisor after 65,535 for any of these primes, the i×i computation will overflow, leading to meaningless, possibly infinite loops. This is discussed in more detail in the "Limit Checking" section below.
NOTE: This prime number package is GROSSLY
OVERENGINEERED. I began with an optimization that skipped
testing potential divisors divisible by 2 or by 3 in a small, tight
loop. I then got to wondering about skipping potential divisors
divisible by an even wider range of prime divisors; e.g., skip
divisors divisible by 2, 3, 5, 7, or 11. I generalized the code
to handle an arbitrary range, [2..
NOTE: I was REINVENTING THE WHEEL. I basically had and have a gradeschool knowledge of prime numbers. After writing the code, I looked up on the internet fast ways of determining primes and I learned that "my" optimization was widely known (see the mathematics section below). So perhaps this was all a waste of time, but the openended exploration of the subject on my own was very intellectually satisfying for me.
NOTE: If a SMALL MEMORY FOOTPRINT is important, this package can be compiled to produce compact object code that uses the original hardcoded loop and omits the sequenceofdifferences functions and data related to the generalized version of the code. To accomplish this, add
DNBR_SKIP_2_3_HARDCODED=1
to the compilation command for "nbr_util.c". If an application only
calls nbrIsPrime()
and/or nbrNextPrime()
,
its code does not need to be compiled with the CPP definition.
If, however, the application calls any of the sequenceofdifferences
functions, the application should be compiled with the CPP definition.
Macros in "nbr_util.h" substitute meaningful return values for the
omitted functions.
NOTE: To jog your memory, if necessary, a NUMBER IS PRIME if it's positive and if its only divisors are 1 and itself; zero and one by definition are not prime. To determine if a number n is prime, you basically test if the number is evenly divisible (remainder 0) by any divisor in the range 2..√n. If the number is evenly divisible by some divisor, then the number is not prime. If none of the divisors evenly divide the number, then the number is prime.
I wrote my HASH_UTIL
hash table package in the late 1980s, as I recall.
It has long been a nuisance that a program using the package has had
to be explicitly linked to the C math library ("lm"). So, about 30
years later, I decided to fix that! Thinking that the math library
was required because of the use of sqrt(3)
in (i)
determining the next prime number larger than the expected table size and
(ii) computing the standard deviation of the bucket lengths,
I dove into the code. My memory hadn't served me well, because I
discovered that the prime number function was not using sqrt()
,
but was instead iterating until:
divisor > (number / divisor)
I don't know if I was considering the possibility of overflow at the time I wrote the hash table package, but it was a fortuitous choice for the loop test.
So far, so good, but I was redfaced to see that for 3 decades my code had been checking each and every possible divisor from 2 on up: 2, 3, 4, 5, 6, 7, ... Skipping all the even numbers after 2 was a painfully obvious optimization. And then I began thinking about also skipping odd numbers divisible by 3: check divisors 5, 7, ... skip 9 ..., 11, 13, ... skip 15 ..., 17, 19, etc. I noticed the differences between divisors to be tested was 2, 4, 2, 4, etc. In other words, a repeating cycle of two divisors. I implemented a hardcoded loop ("divisible" means evenly divisible with a remainder of 0):
if (number equals 0 or 1) return (notprime) ; if (number equals 2 or 3) return (isprime) ; if (number divisible by 2 or by 3) return (notprime) ; for (divisor = 5 to "square root" of number) { if (number divisible by divisor) return (notprime) ; divisor += 2 ; if (number divisible by divisor) return (notprime) ; divisor += 4 ; } return (isprime) ;
which checks the following divisors: [2, 3,] 5, 7, 11, 13, 17, 19, 23, 25, 29, 31, 35, 37, ... Skipping odd divisors divisible by 3 is almost twice as fast as checking all of the odd numbers.
I was pleased by my independent "discovery" of this heretofore unthoughtof optimization (/snark) and my thoughts naturally turned to skipping even more divisors.
First, however, some terminology I use in this package:
SkipThrough Prime (STP)  is the maximum prime number for
which divisors divisible by one or more of primes 2 through STP
are skipped. In the pseudocode above, the skipthrough prime
is 3. As another example, if the skipthrough prime is 11,
nbrIsPrime()
will skip divisors divisible by any
of the primes 2 through 11: 2, 3, 5, 7, 11.
Start Divisor  is the divisor at which to begin the first cycle of testing. This is always the next prime after the skipthrough prime. Using the examples above, STP 3's start divisor is 5 and STP 11's start divisor is 13.
Sequence of Differences  is an array of the differences between successive divisors to be tested. The differences skip over divisors that do not need to be tested; i.e., divisors evenly divisible by primes 2 through the skipthrough prime.
Beginning with the start divisor, nbrIsPrime()
tests if
the target number is evenly divisible by the start divisor. If so,
the number is not prime and nbrIsPrime()
returns. If
not, the first difference in the sequence is added to the start
divisor to get the next divisor to test. The function continues
testing successive divisors; when the end of the sequence is reached,
nbrIsPrime()
jumps back to the beginning of the sequence
to continue with the next pass through the sequence.
For example, in the pseudocode above, the sequence of differences for STP 3 is { 2, 4 }. Beginning with start divisor 5, the loop iterations unfold as follows:
(1) Test 5, Add 2, Test 7, Add 4 (2) Test 11, Add 2, Test 13, Add 4 (3) Test 17, Add 2, Test 19, Add 4 (4) Test 23, Add 2, Test 25, Add 4 (5) Test 29, ...
Limit Check  refers to checking if a divisor about to be tested is less than or equal to the square root of the target number. When testing all odd divisors (skipthrough prime 2), there is only one test in the loop and the limit check is performed as part of the loop control. For skipthrough primes 3 and above, there are multiple tests per loop iteration, corresponding to each difference in the prime's sequence. STP 5 has a sequence of 8 differences, for example, but adding 7 limit checks inside the loop incurs a performance hit. (The pseudocode above doesn't limit check the second test's divisor in the loop.) I discuss this issue in more detail below, "Limit Checking".
The pseudocode for the generalized implementation inserts an inner loop that steps through the sequence of differences (again, "divisible" means evenly divisible with a remainder of 0):
if (number equals 0 or 1) return (notprime) ; if (number equals any prime in 2..STP) return (isprime) ; if (number divisible by any prime in 2..STP) return (notprime) ; divisor = start divisor ; while (divisor ≤ "square root" of number) { for (each difference in sequence) { if (number divisible by divisor) return (notprime) ; divisor += difference ; } } return (isprime) ;
Figuring out the sequence of differences for skipthrough prime 3 was easy enough to do in my head. I wrote a throwaway program to generate differences for skipthrough primes 5, 7, and 11; examined the listings to determine when the sequences began repeating (8, 48, and 480, respectively); and noticed the first of a few relations I would find:
seqlength (nextprime) = seqlength (currentprime) × (nextprime  1)
For example:
Skipthrough prime 2's sequence of differences, { 2 }, is of length 1. STP 3 has a sequence of length 2 = 1 × (3  1). STP 5 has a sequence of length 8 = 2 × (5  1). STP 7 has a sequence of length 48 = 8 × (7  1). STP 11 has a sequence of length 480 = 48 × (11  1). And so on.
Sequence length, for the skipthrough primes under consideration (2 through 53 for 64bit unsigned long integers), increases by one or two orders of magnitude for each new prime. STP 59's sequence length exceeds the maximum representable, unsigned 64bit value.
range (nextprime) = range (currentprime) × nextprime
The range of a sequence of differences is the range of divisors covered by each pass over the sequence, found by summing all the differences in the sequence. STP 3's sequence is { 2, 4 }, whose range is 2 + 4 = 6; as an example above shows, the first divisor tested in each loop iteration is 5, 11, 17, 23, 29, ... STP 5's 8element sequence has a range of 30, so the divisors at the beginning of each loop iteration are 7, 37, 67, 97, ... I wasn't clever enough to realize that the formula above is more simply expressed as the product of the preceding primes and the skipthrough prime; e.g., the range of STP 7 is 2 × 3 × 5 × 7 = 210. Although STP 53's sequence length fits in an unsigned 64bit integer, its range doesn't.
density (prime) = seqlength (prime) / range (prime)
Density, in this case, is a measure of how many numbers in a range are actually tested as divisors. The lower the density the better. If even numbers are skipped and odd numbers are tested (STP 2), the density is 50%. Higher skipthrough primes lower the density even further at the expense of increasingly lengthy sequences of differences.
maxdifference (nextprime) = currentprime × 2
Sometimes! The maximum differences in the sequences for skipthrough primes
[2,] 3, 5, 7, 11, 13, 17, 19, and 29, 31 obey this relation. For example,
the maximum difference in STP 31's sequence is 29 × 2 = 58. However,
skipthrough prime 23's maximum difference is 40 (not the expected
19×2 = 38) and STP 37's is 66 (not the expected 31×2 = 62).
Using nbrPrimeSeqDiffsMax()
to generate the one trillion
(10^{12}) differences in STP 37's sequence and determine the
maximum difference originally consumed 99%100% of the CPU for 8.5 days.
I subsequently optimized the function by skipping both even divisors and
odd divisors divisible by 3, reducing the time to 6.4 days. I'm not
a glutton for punishment, so I did not give serious consideration to
optimizing nbrPrimeSeqDiffs()
and
nbrPrimeSeqDiffsMax()
further by using higher skipthrough
primes than 3. In any case, finding out STP 41's maximum among its 44
trillion differences would take threequarters of a year or more. Maybe
there's a mathematical relation that gives the correct maximum difference
for all skipthrough primes, but nothing jumped out at me.
Knowing a sequence's maximum differenceand using differences in the first placecan significantly reduce memory usage. For unsigned 64bit integers, the largest skipthrough prime, 47, has an estimated maximum difference of 43 × 2 = 86, which can be represented in an unsigned 8bit integer. STP 47's sequence of differences requires only oneeighth of the memory that the sequence of offsets in the math section below requires. (Actually, STP 53 is the largest skipthrough prime in the differences form of the algorithm. It can't be used in the math section's offset form of the algorithm. The last offset for any STP is always one less than the STP's range and STP 53's range exceeds the maximum representable, unsigned 64bit integer by 14×10^{18}.) Looking at unsigned 128bit integers, the largest skipthrough prime is 101 with an expected maximum difference of 202, which, if correct, can still be represented in an unsigned 8bit integer. (This is all theoretical for the most part; the largest skipthrough prime I've actually tested was 29 with its onebillionelement sequence.)
Time for some actual numbers to get a sense of what we are dealing with. The following numbers are for unsigned 64bit integers (L/R is the length divided by the range; i.e., density):
Skip Estimated Sequence Range Through Start Maximum (Sum of Prime Divisor Sequence Length Difference Differences) L/R       2 3 1 2 2 50% 3 5 2 4 6 33% 5 7 8 6 30 26% 7 11 48 10 210 22% 11 13 480 14 2310 20% 13 17 5760 22 30030 19% 17 19 92160 26 510510 18% 19 23 1658880 34 9699690 17% 23 29 36495360 38 223092870 16% 29 31 1021870080 46 6469693230 15% 31 37 30656102400 58 200560490130 15% 37 41 1103619686400 62 7420738134810 14% 41 43 44144787456000 74 304250263527210 14% 43 47 1854081073152000 82 13082761331670030 14% 47 53 85287729364992000 86 614889782588491410 13% 53 59 4434961926979584000 94 overflow 13% SIZE_MAX: 18446744073709551615 
(The primate program computes the range as both an integer and a floatingpoint number. If the integer overflows, the floatingpoint number is substituted in the density calculation. So the density for skipthrough prime 53 is correctly listed as 13%.)
For unsigned 128bit integers, the following figures pick up with skipthrough prime 53 (whose range overflowed 64 bits) and run through skipthrough prime 101. The density at this point has only dropped down to 11.9%. (The sequence length and range both overflow for STP 103.)
53 (SkipThrough Prime) 59 (Start Divisor) 94 (Est. Max. Diff.) Length: 4434961926979584000 (4.43496e+18) Range: 32589158477190044730 (3.25892e+19) ... through ... 101 (SkipThrough Prime) 103 (Start Divisor) 202 (Est. Max. Diff.) Length: 27739969042773783995307880611840000000 (2.774e+37) Range: 232862364358497360900063316880507363070 (2.32862e+38) SIZE_MAX: 340282366920938463463374607431768211455 (3.40282e+38) 103 (SkipThrough Prime) 107 (Start Divisor) Length: 
Knowing the exploding lengths and ranges for the higher skipthrough prime, whether for 64bit or 128bit numbers, was of intellectual interest to me, but these higher skipthrough primes are obviously not of any practical value. The Prime Page's Glossary entry for wheel factorization says, in different terminology than mine, that skipthrough prime 251 reduces the density to 10% and 75,037 reduces it to 5%. Skipthrough prime 134,253,593, with a sequence length of approximately 6.4×10^{99}, drives the density down to 3%!
"Limit checking" refers to not testing divisors greater than the square root of the number when determining if the number is prime.
First of all, heed my warning at the very beginning about the danger of overflow using the widely seen i×i ≤ n check with fixedprecision integers. The following explanation uses 64bit unsigned longs, but the warning applies to any bit width.
The integer square root of the maximum representable, 64bit value, ULONG_MAX, is 0x0FFFFFFFF — all of the lower 32 bits set. (The real square root of ULONG_MAX is 0x0FFFFFFFF plus about 0.9 decimal.) If you square the integer square root, the result is a number that is 8.6 billion less than ULONG_MAX. In other words, the 8.6 billion numbers in the range [root^{2}..ULONG_MAX] all have the same integer square root: 0x0FFFFFFFF. There are 193 million primes in that range. Consequently, for each of the 193 million primes tested with divisor i:
So 193 million primes are excluded from being tested because of the i×i multiplication. The steps above will occur when testing every possible divisor (i++) or only odd divisors (i += 2).
Additional optimizations in the algorithm make the situation even worse. If the algorithm is extended to also skip divisors divisible by 3 (my first optimization), the divisor is incremented alternately by 2 and by 4. Since 0x0FFFFFFFF is divisible by 3 and will be skipped, i will only reach the preceding odd number, 0x0FFFFFFFD, and then be incremented by 4 to 0x100000001, causing i×i to overflow. The square of the preceding odd number is 25.8 billion less than ULONG_MAX. Testing the primality of the 579 million primes in that range — those whose integer square root is 0x0FFFFFFFD, 0x0FFFFFFFE, or 0x0FFFFFFFF — is impossible.
To avoid overflow with fixedprecision integers, divide both sides of the relational operator by i:
if (i×i ≤ n) ... continue checking divisors ... (INCORRECT) if (i ≤ n/i) ... continue checking divisors ... (CORRECT)
Now that there is no danger of overflow, there are some other aspects of limit checking in relation to my skipthrough prime optimization that need to be examined. Remember that the algorithm is basically two nested loops, where the inner loop runs through the elements in the sequence of differences and the outer loop repeats the inner loop as often as necessary to reach and pass the square root of the number (if the number is prime):
div = STP's start divisor while (div ≤ n/div) { foreach difference in sequence { ... test if n is evenly divisible by div ... div = div + difference } }
In the Wikipedia pseudocode for the related c#k+i algorithm (see the mathematics section below), the "inner loop" corresponding to my inner loop tests the same number of divisors as mine does, but without limit checking the divisors. (I put "inner loop" in quotes because the pseudocode is hardcoded for the STP 3 optimization and the "loop" is a single IFstatement that checks both the 6(k1)+5 and "6k+1" divisors; again, the c#k+i version of my STP 3 optimization is explained in the mathematics section.) It is a good idea, however, to limit check each divisor inside the inner loop as well as the outer loop:
div = STP's start divisor while (div ≤ n/div) { foreach difference in sequence { ... test if n is evenly divisible by div ... div = div + difference if (div > n/div) break } }
(Whether the check is at the beginning or end of the inner loop, there will always be one check that duplicates the check in the outer loop on each pass through the sequence.)
Because the length of the sequence and range increase rapidly with higher skipthrough primes, there are two cases that are of concern when there is no limit check in the inner loop:
The primality test must run through the entire sequence of differences. If the divisor exceeds the square root early in a pass through the sequence, doing all the remaining tests in the sequence is a waste of time. Assume the function uses STP 11 with its 480difference sequence. If on some cycle the divisor surpasses the square root at the 5th difference, the inner loop will uselessly continue checking the other 475 divisors, all of which are greater than the square root.
If the range of a sequence is greater than a prime number, continued testing of divisors past the square root might reach a point where the prime number itself is tested as a divisor. "n%n == 0", so the number will be rejected as composite even though it's really prime!
The first case above is a matter of efficiency; the second case requires flagging a number as composite if "n%div == 0" and if "n != div".
Adding the limit check, "div > n/div", to the inner loop caused a big performance hit. My first attempt at ameliorating the slowdown was to add a global boolean variable that the calling program could set to enable or disable limit checking in the inner loop:
if (gCheckLimits && (div > n/div)) ...
Testing a boolean flag takes no time at all and, when limit checking is disabled, the algorithm is as fast as the original nolimitchecking algorithm. However, I was not satisfied with burdening the calling program with the responsibility of deciding whether to turn limit checking on or off.
My second attempt took advantage of a sequence's range to put off limit checking until the tested divisor got "close" to the square root of the input number. First, I computed a rough estimate of the square root by repeatedly dividing the number by 2 until "root ≤ n/root". Then, in the outer loop, before beginning the inner loop each time, figure out if the rough root estimate falls within the range of numbers covered by the coming pass through the sequence of differences. If not, clear a local boolean variable to disable limit checking in the inner loop. If the estimated root does fall in the range, set the boolean variable to enable checking.
... compute rough estimate of square root ... checkLimits = false div = STP's start divisor while (div ≤ n/div) { if (div ≤ root ≤ (div+range1)) checkLimits = true foreach difference in sequence { ... test if n is evenly divisible by div ... div = div + difference if (checkLimits && (div > n/div)) break } }
A big improvement without intervention by the calling program, but ...
The estimate of the square root could be off by as much as half of the actual square root, so limit checking might begin multiple range blocks before the range containing the actual square root. Obviously, it would be faster to only check limits in the last block.
The large range of a higher skipthrough prime's sequence may be greater than the estimated or actual square root and limit checking would then begin with the very first divisor of the very first pass through the sequence; i.e., the skipthrough prime's start divisor. Thus the algorithm reverts to the slow checkeverydivisor mode.
A more accurate estimate of the square root was needed, so I searched for "fast integer square root" on the web and, not unsurprisingly, up popped numerous references to a technical bulletin, "Fast Integer Square Root", by Ross M. Fosler. (I added an "s" to the title for my own description of the very fast algorithm and how I implemented it, "Fast Integer Square Roots".) My isprime function now knows exactly when the tested divisors pass the actual square root of the number and the function no longer needs to perform the timeconsuming "div > n/div" check in the inner loop:
root = nbrUnsignedSQRT (n) div = STP's start divisor while (div ≤ root) { foreach difference in sequence { ... test if n is evenly divisible by div ... div = div + difference if (div > root) break } }
Performance tests were conducted using my primate test program. The command lines are shown below. The "output" following the command lines are not the output of the program, but me tabulating the results of multiple runs. When primate is in "time" mode, most output is suppressed so that I/O is not a significant factor in the times. Also, the generation of sequences of differences is done before the timer starts counting.
My original prime number function was in my HASH_UTIL package and the largest hash table I ever needed was for about 80,000 telemetry points. (Most of those points were variable names tied to memory locations in the onboard computer and would only be downloaded, selectively, when trying to diagnose a problem.) So my first suite of tests listed all the prime numbers in the range 80,000200,000. Since listing those numbers is very quick, I had each run repeat the test 1000 times to get a large enough timing figure for comparison's sake.
Determine the prime numbers in the range 80,000200,000; repeat 1000 times:
% primate time repeat 1000 list 80000200000 alternate odds STP 2* 209.9 seconds * handcoded loop to check odd divisors % primate time repeat 1000 list 80000200000 qprime stp STP 2 221.3 seconds STP 3 127.5 seconds STP 5 87.0 seconds STP 7 71.0 seconds STP 11 65.5 seconds STP 13 62.2 seconds STP 17 60.3 seconds STP 19 59.1 seconds STP 23 58.5 seconds STP 29 58.0 secondsThe first test ("alternate odds") used a runofthemill, handcoded loop to check all odd divisors. The remaining tests ("qprime stp") used my sequence of differences, hence the slightly higher time for STP 2. As expected, the times decrease as the skipthrough prime increases.
The second suite of tests determines the 1,000 largest prime numbers representable in unsigned, 64bit integers. These numbers are verified by matching them against a precomputed list of the primes generated by a different program, Achim Flammenkamp's prime_sieve, found at his web page:
"The Sieve of Eratosthenes"
http://wwwhomes.unibielefeld.de/achim/prime_sieve.html
[Modifications to his program: (i) define LONG as "unsigned long", (ii) undefine _ANSI_EXTENSION, (iii) uncomment the commentedout code in the definition of "use(x)", and (iv) hardcode the initial sieve size.]
Determine and verify the largest 1,000 prime numbers representable in unsigned, 64bit integers against a precomputed (by a different program) list of those primes:
% primate time verify last64 qprime stp STP 3 90,842 seconds (25.2 hours) STP 5 55,797 seconds (15.5 hours) STP 7 43,093 seconds (12.0 hours) STP 11 38,338 seconds (10.6 hours) STP 13 35,310 seconds ( 9.8 hours) STP 17 33,234 seconds ( 9.2 hours) STP 19 31,667 seconds ( 8.8 hours) STP 23 30,238 seconds ( 8.4 hours) STP 29 29,430 seconds ( 8.2 hours)
The most timeconsuming aspect of my algorithm is generating the sequence
of differences for the higer skipthrough primes. My computer was able
to generate a physical, 1billionbyte sequence of differences for STP 29,
but malloc(3)
failed trying to allocate the 31billionbyte
sequence for STP 31. The nbrPrimeSeqDiffsMax()
function
generates the same sequences of differences, but without requiring any
memory for the sequence: each difference is generated, compared to the
running maximum value, and then discarded. So I was able to time sequence
generation for STPs 31 and 37; the last took 6.4 days and I didn't have the
patience to wait 280+ days for the generation of STP 41's 44trillionelement
sequence to complete.
Generate sequences of differences:
% primate time qmax stp STP 17 0.04 seconds 92 thousand elements STP 19 0.4 seconds 1.7 million elements STP 23 11 seconds 36 million elements STP 29 407 seconds (6.8 minutes) 1 billion elements STP 31 12,602 seconds (3.5 hours) 31 billion elements STP 37 550,670 seconds (6.4 days) 1 trillion elements
When the code was pretty much in its final form, bar tweaking of the limit checking, I researched ways to speed up generating sequences of differences (e.g., originally 8.5 days for skipthrough prime 37's one trillion differences, a 10^{12}length sequence). This research was unsuccessful, but I came upon many queries on programming forums about determining if a number is prime and many responses about an optimized method called "6k +/ 1". I then searched for information about 6k +/ 1 and landed on the Wikipedia page about primality testing using "trial division", i.e., looping through all possible divisors to find a divisor other than 1 for the number. From Wikipedia (Wikipedia: Primality test  Simple methods):
The algorithm can be improved further by observing that all primes greater than 6 are of the form 6k ± 1. This is because all integers can be expressed as (6k + i) for some integer k and for i = 1, 0, 1, 2, 3, or 4; 2 divides (6k + 0), (6k + 2), (6k + 4); and 3 divides (6k + 3). So, a more efficient method is to test if n is divisible by 2 or 3, then to check through all the numbers of the form 6k ± 1 ≤ √n. This is 3 times as fast as testing all m.
Generalising further, it can be seen that all primes greater than c# are of the form c#k + i for i < c# where c and k are integers, c# represents c primorial, and i represents the numbers that are coprime to c#. For example, let c = 6. Then c# = 2 ⋅ 3 ⋅ 5 = 30. All integers are of the form 30k + i for i = 0, 1, 2, …, 29 and k an integer. However, 2 divides 0, 2, 4, …, 28 and 3 divides 0, 3, 6, …, 27 and 5 divides 0, 5, 10, …, 25. So all prime numbers greater than 30 are of the form 30k + i for i = 1, 7, 11, 13, 17, 19, 23, 29 (i.e. for i < 30 such that gcd(i,30) = 1). Note that if i and 30 were not coprime, then 30k + i would be divisible by a prime divisor of 30, namely 2, 3 or 5, and would therefore not be prime. [Last sentence: ... for ex: 437 ... I learn something new every day!]
So all primes are of the form c#k+i for i meeting the conditions above. (Note that the reverse is not true; numbers of the form c#k+i are not necessarily prime.) Relating this to my code:
c is my skipthrough prime. (Strictly speaking, c can be any number in the range STP ... nextprime  1; in other words, STP is the largest prime less than or equal to c.)
c# is "c primorial", where primorial is simply the product of all the primes less than or equal to c; i.e., 2x3x5x7x... This value is identical to my "range" of a sequence of differences. (Wikipedia: Primorial)
k indexes the blocks of range c# of divisors to be checked. Incrementing k is like reaching the end of a sequence of differences in my code and looping back to start at the beginning of the sequence for the next higher block of divisors.
i is drawn from the set of numbers in the effective range 1..c#1 such that each i is coprime with c#. Two numbers are coprime with respect to each other if they share no divisors other than 1. Note that "coprime" is a relation between numbers, not an attribute (like "prime") of a single number. (Wikipedia: Coprime integers)
So the c#k+i optimization is the foundation for my code, unbeknownst to me when I wrote the code! The difference between the c#k+i method and my code is that the former method works with explicit i offsets from the base of a c#k block and my code works with the sequence of differences between successive i offsets.
Now let's consider the set of numbers coprime with c# from which i is drawn. The numbers are in the range 0..c#1. Zero (0) drops out because the first multiplier in c# is 2, so c#k+0 is always even (divisible by 2). The very first number in the set is always 1 since the only divisor it shares with c# is 1.
The second number in the set is always the next prime after c: the numbers between 1 and the next prime, exclusive, are either equal to or evenly divisible by one of the primes less than or equal to c (that's why the next prime is prime!), so these numbers drop through the "sieve". NOTE that the next prime, the second number in the set, is the start divisor in my algorithm. My sequence of differences begins with the difference between the second and third elements of the set and ends with the difference between the first and second elements.
The remaining numbers in the set are determined by checking the divisors of the numbers after the next prime. If a number is evenly divisible by any of the primes less than or equal to c, the number drops through the "sieve" and out of the set. Since the number of elements in a set rises rapidly with high c's, this is a timeconsuming process. This process can itself be optimized by using the c#k+i method. My comparable generation of a sequence of differences uses a hardcoded skipthrough prime of 3; as I mentioned above, it didn't seem worth the effort to generalize that bit of code for higher STPs.
Here are some example skipthrough primes in their c#k+i form. (In the first example, 1 is not prime, but the algorithm will work with the given parameters to check every possible divisor, odd and even.)
c = 1 c# = 1 i in { 1 } c = 2 c# = 2 i in { 1 } c = 3 c# = 6 i in { 1, 5 } c = 5 c# = 30 i in { 1, 7, 11, 13, 17, 19, 23, 29 } c = 7 c# = 210 48 elements in set i in { 1, 11, 13, 17, 19, 23, 29, 31, ..., 199, 209 } c = 11 c# = 2310 480 elements in set i in { 1, 13, 17, 19, 23, 29, 31, 37, ..., 2297, 2309 } ... c = 23 c# = 223,092,870 36,495,360 elements in set i in { 1, 29, 31, 37, 41, ..., 223092841, 223092869 } c = 29 c# = 6,469,693,230 1,021,870,080 elements in set i in { 1, 31, 37, 41, 43, ..., 6469693199, 6469693229 }
Note that the algorithm has to ignore the very first potential divisor: when k is 0 and i is 1, c#k+i = 1 and any number being divisible by 1 does not mean the number is composite. Effectively, the first divisor checked is the next prime after c, my start divisor. (Of course, this is after the initial check if the target number is equal to or divisible by the primes less than or equal to c).
I just noticed that the difference between the first two elements in a set equals the difference between the last two elements, at least up to c=29. I don't know if there's a mathematical reason for this or if it holds true for all possible sets.
You might be wondering how the "6k +/ 1" form widely seen on the internet relates to the c#k+i scheme of 6k+i for i in { 1, 5 }. Swap the arithmetic operations and "6k /+ 1" becomes shorthand for the +5 term of the previous k block and the +1 term of the current k block:
c = 3 c# = 2×3 = 6 i in { 1, 5 } Previous k block: 6(k1)+5 = 6k6+5 = 6k1 Current k block: 6k+1
A graphical representation, I guess you would call it, of the c#k+i optimization is called wheel factorization (Wikipedia: Wheel factorization). Like my algorithm, the algorithm presented on the Wikipedia page (i) explicitly begins checking divisors with the next prime after c and (ii) uses the differences between divisors rather than the i offsets.
Lastly, a remark upon my formula for the length of a sequence of differences:
seqlength (nextprime) = seqlength (currentprime) ⋅ (nextprime  1)
As mentioned before, the sequence length is the count (or totient) of the numbers in 1..c#1 that are coprime with c#. Euler's product formula (Wikipedia: Euler's product formula) can be used to compute the totient, phi(n), for a given number n:
φ(n) = n ⋅ ∏_{(pn)} (1  1/p)
which means the product of successive (1^{1}/_{p})s for all primes p that are divisors of n:
phi = 1 for each prime number p that is a divisor of n { phi = phi * (1  1/p) }
(Note that Euler's formula can be used for arbitrary positive integers, but what is of interest in the primality testing context is specifically when n is an instance of a c#.)
With p[k] being a prime number and p[k+1] the next larger prime number, my sequencelength formula can be derived from Euler's formula as follows:
φ(p[k+1]#) = φ(p[k]#) ⋅ p[k+1] ⋅ (1  1/p[k+1]) = φ(p[k]#) ⋅ p[k+1] ⋅ ((p[k+1]  1)/p[k+1]) = φ(p[k]#) ⋅ (p[k+1]  1) = seqlength (currentprime) ⋅ (nextprime  1)
I am not a mathematician and my heart sank after reading the first two paragraphs on totient's Wikipedia page. The rest of the page is not for the mathematically faint of heart. Fortunately, I really only needed to understand Euler's product formula; some ancillary Wikipedia pages on the notation helped me figure out how to apply the formula. I tried it on three skipthrough primes:
5 (5# = 30, φ(30) = 8); 7 (7# = 210, φ(210) = 48); and 11 (11# = 2310, φ(2310) = 480).
The pattern of the calculations became obvious and I could then work out the derivation above of my formula after the fact.