It is a known fact that all primes > 5 are of the form 6*k+1
and 6*k+5
(i.e. their remainder after dividing by 6 is 1 or 5). Justifying this fact is
trivial: given any other remainder, the number can be factored as follows:
6*k+2 = 2*(3*k+1)
6*k+3 = 3*(2*k+1)
6*k+4 = 2*(3*k+2)
so it cannot possibly be prime.
Sieve of Eratosthenes
is an ancient way to generate primes. Its naive
implementation uses one bit per integer to encode whether it is prime
or not. Thus, storage space required to find all primes up to n
equals floor(n/8)+1
bytes. Using the above observation,
the storage requirements can be cut down by a factor of 3 as shown
below.
The first prime after 5 is 7. Consider the following sequence of numbers, all of which are 1 or 5 modulo 6: 7, 11, 13, 17, 19, 23, 25, 29, 31, 35, 37, 41, 43, 47, 49, 53, 55,... Each of these numbers corresponds to one bit in the sieve. Note that the largest among first 8 numbers (encoded in the 1st byte of the sieve) is 29. The naive implementation can encode only numbers 1-8 in the 1st byte.
The sieve is organized as a bit-vector encoding only numbers which 1 or 5
modulo 6. The first task is to map a number n
to its bit index in the bit
vector. This is accomplished by this formula: b = floor(n/3)-1
. Thus, 7 is
mapped to 1, 11 to 2, etc. Once b
is found, it is trivial to access the b
'th
bit in the bit-vector.
To check whether an arbitrary n is prime, proceed as follows
int is_prime(int n)
{
int r;
if(n < 7)
return B[n];
r = n % 6;
if((r != 1) && (r != 5))
return 0;
return B[n/3-1];
}
This is pseudo-code where B
is a bit-vector indexed by bit. B[n]
is 1 if
n
is prime. The slowest operations in this process are division and modulus
operations. They can be suitably optimized (e.g. substituted with
multiplication). Optimizations will be the topic of a possible future article.
This article has shown the basic idea of compressing space needed for the sieve of Eartosthenes. Note that there are more advanced sieves (also more complicated to code), such as Sieve of Atkin.