Primes to One Trillion
The Gutenberg text "The First 100,000 Prime Numbers", EBook #65,
lists the primes up to 1,318,699. This somewhat more ambitious
version lists the primes up to one trillion (1,000,000,000,000 or
1E12).
Introduction
I became interested in prime numbers after hearing about
Goldbach's Conjecture,
"Every even integer greater than 2 can be expressed as the sum of
two primes".
Verifying this requires a source of primes. Short lists (or
programs to generate them) are widely available. Really long
lists are scarce, except for
primos.mat.br.
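As a small illustration of what such a verification involves,
here is a minimal C++ sketch. It is not the Goldbach program from
this collection; the names buildSieve and goldbachPair are
hypothetical, and the in-memory sieve is only practical for small
limits.

```cpp
#include <cassert>
#include <cstdint>
#include <utility>
#include <vector>

// Sieve of Eratosthenes: sieve[i] is true when i is prime, 0 <= i <= limit.
std::vector<bool> buildSieve(std::uint32_t limit) {
    std::vector<bool> sieve(limit + 1, true);
    sieve[0] = false;
    if (limit >= 1) sieve[1] = false;
    for (std::uint64_t i = 2; i * i <= limit; ++i) {
        if (!sieve[i]) continue;
        for (std::uint64_t j = i * i; j <= limit; j += i) sieve[j] = false;
    }
    return sieve;
}

// Return one pair (p, q) of primes with p + q == n and p <= q,
// or (0, 0) if no such pair exists. n must be even, greater than 2,
// and covered by the sieve.
std::pair<std::uint32_t, std::uint32_t>
goldbachPair(std::uint32_t n, const std::vector<bool>& sieve) {
    assert(n > 2 && n % 2 == 0 && n < sieve.size());
    for (std::uint32_t p = 2; p <= n / 2; ++p) {
        if (sieve[p] && sieve[n - p]) return {p, n - p};
    }
    return {0, 0};
}
```

For example, with a sieve built to 1000, goldbachPair(100, sieve)
yields the pair 3 and 97.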
To make these lists more accessible, I have reformatted them to a
size easily manageable by ordinary text editors and
viewers--about 55MB. The file names correspond to the range of
primes the file contains:
00000000000_to_00100000000.txt Zero to 100 Million
00100000000_to_00200000000.txt 100 Million to 200 Million
etc.
The leading zero digits cause the file names to collate in order
of their content. Longer lists can be composed with the DOS copy
command. Move the required prime txt files to a temporary
directory and use:
copy *.txt longList.txt
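The naming scheme can be sketched in C++ as follows. This is an
illustration only: the function name txtFileNameFor is
hypothetical, and the 11-digit width is inferred from the two
example names above. Bounds of 1E11 and larger no longer fit in
11 digits, so the actual files in that range may use a different
width.

```cpp
#include <cassert>
#include <cstdint>
#include <cstdio>
#include <string>

// Width of one txt file's range: 100 million, as in the examples above.
constexpr std::uint64_t kFileSpan = 100000000ULL;

// Build the name of the txt file whose range contains `value`.
// `width` is the zero-padded digit count (assumed to be 11 here).
std::string txtFileNameFor(std::uint64_t value, int width = 11) {
    assert(value < 1000000000000ULL);  // the collection stops at 1E12
    const std::uint64_t low = value / kFileSpan * kFileSpan;
    const std::uint64_t high = low + kFileSpan;
    char buffer[64];
    std::snprintf(buffer, sizeof buffer, "%0*llu_to_%0*llu.txt",
                  width, static_cast<unsigned long long>(low),
                  width, static_cast<unsigned long long>(high));
    return buffer;
}
```

Because the bounds are zero padded to a fixed width, sorting the
names as text also sorts them by the primes they contain.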
Prime Text Files
This is a collection of 10,000 files, occupying about 486GB of
disk space in their unzipped native txt format. Since adjacent
primes have about 90% identical leading digits, the compressed
(zip) versions total 61GB. Each zip file contains 100 txt files.
Do not use Windows Explorer to copy or move large numbers of
files at a time. Use DOS copy or xcopy for large copies. I find
Beyond Compare (Scooter Software) handy for keeping track of
large numbers of files.
0000.zip 000G to 010G (0 to 1E10)
0010.zip 010G to 020G
0020.zip 020G to 030G
0030.zip 030G to 040G
0040.zip 040G to 050G
0050.zip 050G to 060G
0060.zip 060G to 070G
0070.zip 070G to 080G
0080.zip 080G to 090G
0090.zip 090G to 100G
0100.zip 100G to 110G
0110.zip 110G to 120G
0120.zip 120G to 130G
0130.zip 130G to 140G
0140.zip 140G to 150G
0150.zip 150G to 160G
0160.zip 160G to 170G
0170.zip 170G to 180G
0180.zip 180G to 190G
0190.zip 190G to 200G
0200.zip 200G to 210G
0210.zip 210G to 220G
0220.zip 220G to 230G
0230.zip 230G to 240G
0240.zip 240G to 250G
0250.zip 250G to 260G
0260.zip 260G to 270G
0270.zip 270G to 280G
0280.zip 280G to 290G
0290.zip 290G to 300G
0300.zip 300G to 310G
0310.zip 310G to 320G
0320.zip 320G to 330G
0330.zip 330G to 340G
0340.zip 340G to 350G
0350.zip 350G to 360G
0360.zip 360G to 370G
0370.zip 370G to 380G
0380.zip 380G to 390G
0390.zip 390G to 400G
0400.zip 400G to 410G
0410.zip 410G to 420G
0420.zip 420G to 430G
0430.zip 430G to 440G
0440.zip 440G to 450G
0450.zip 450G to 460G
0460.zip 460G to 470G
0470.zip 470G to 480G
0480.zip 480G to 490G
0490.zip 490G to 500G
0500.zip 500G to 510G
0510.zip 510G to 520G
0520.zip 520G to 530G
0530.zip 530G to 540G
0540.zip 540G to 550G
0550.zip 550G to 560G
0560.zip 560G to 570G
0570.zip 570G to 580G
0580.zip 580G to 590G
0590.zip 590G to 600G
0600.zip 600G to 610G
0610.zip 610G to 620G
0620.zip 620G to 630G
0630.zip 630G to 640G
0640.zip 640G to 650G
0650.zip 650G to 660G
0660.zip 660G to 670G
0670.zip 670G to 680G
0680.zip 680G to 690G
0690.zip 690G to 700G
0700.zip 700G to 710G
0710.zip 710G to 720G
0720.zip 720G to 730G
0730.zip 730G to 740G
0740.zip 740G to 750G
0750.zip 750G to 760G
0760.zip 760G to 770G
0770.zip 770G to 780G
0780.zip 780G to 790G
0790.zip 790G to 800G
0800.zip 800G to 810G
0810.zip 810G to 820G
0820.zip 820G to 830G
0830.zip 830G to 840G
0840.zip 840G to 850G
0850.zip 850G to 860G
0860.zip 860G to 870G
0870.zip 870G to 880G
0880.zip 880G to 890G
0890.zip 890G to 900G
0900.zip 900G to 910G
0910.zip 910G to 920G
0920.zip 920G to 930G
0930.zip 930G to 940G
0940.zip 940G to 950G
0950.zip 950G to 960G
0960.zip 960G to 970G
0970.zip 970G to 980G
0980.zip 980G to 990G
0990.zip 990G to 1000G
Additional prime files will be posted here.
PrimeC File Format and Miscellaneous C++ Programs
While working with primes, I developed the primec format, a file
or array representation for primes that is roughly the same size
as the compressed (zipped) txt representation and supports fast
access, both sequential and direct. The exact location of the
primality specification of any number in the file (or memory
array) is computed with a few instructions and no search.
If you wish to examine and experiment with the C++ programs used
to reformat these prime lists and test the Goldbach Conjecture,
download the "programs.zip" package. It contains Generating and
Analyzing Prime Numbers, a description of the content and use of
these files, including the primec file format.
Primec Format
The primec format exploits the fact that all primes greater than
5 end in the decimal digits 1, 3, 7, or 9. Of every 20 successive
numbers, only 8 have such an ending, so the primality of 20
successive numbers can be specified in one 8-bit byte. The file
begins with the complete binary representation of:
The beginning of the sequence
The end of the sequence
A check sum of all data bytes
(All three are 8 bytes for this implementation).
The first and last values are multiples of twenty, and thus are
never prime. Because of this, successive files share no primes
when the upper boundary of the first file is also the lower
boundary of the second file.
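The packing of 20 numbers into one byte can be sketched in C++ as
follows. This is a minimal illustration, not the PrimeCFileWriter
class from the package: the names are hypothetical, the 24-byte
header is omitted, trial division stands in for a real prime
source, and the bit order (bit 0 as the most significant bit) is
an assumption taken from the Hex row of the layout table later in
this section.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Trial division; adequate for a small demonstration only.
bool isPrimeByTrialDivision(std::uint64_t n) {
    if (n < 2) return false;
    for (std::uint64_t d = 2; d * d <= n; ++d)
        if (n % d == 0) return false;
    return true;
}

// Pack the primality of every number in [start, end) into one byte per
// block of 20. Only values ending in 1, 3, 7 or 9 get a bit.
// Assumption: bit 0 is the most significant bit of the byte.
// Values below 20 need special handling (2 and 5 have no bit), so this
// sketch requires start >= 20.
std::vector<std::uint8_t> packPrimeC(std::uint64_t start, std::uint64_t end) {
    assert(start >= 20 && start % 20 == 0 && end % 20 == 0 && start < end);
    static const int kCandidateOffsets[8] = {1, 3, 7, 9, 11, 13, 17, 19};
    std::vector<std::uint8_t> data;
    for (std::uint64_t base = start; base < end; base += 20) {
        std::uint8_t byte = 0;
        for (int bit = 0; bit < 8; ++bit) {
            if (isPrimeByTrialDivision(base + kCandidateOffsets[bit]))
                byte |= static_cast<std::uint8_t>(0x80u >> bit);
        }
        data.push_back(byte);
    }
    return data;
}
```

Calling packPrimeC(20, 60) produces the two bytes 0x5A and 0xE5,
matching the layout table for a file containing 20 to 60.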
The primality of any number in the range of the file is
determined as follows:
If the number ends in 0, 2, 4, 5, 6, or 8, it is not prime.
Otherwise, the specifying byte is at this offset (integer
division, counted from the first data byte after the 24-byte
header):
( value - start ) / 20
Within that byte, the primality of the value is specified by the
bit shown in the table below.
The only tedious programming tasks were:
Special case code for values less than 20, which include 2 and 5,
and exclude 1 and 9. All larger values follow the same simple
pattern.
The increment and decrement operators for the corresponding
iterators must search forward (or backward) for the next true
bit, specifying the next prime number.
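The forward search an increment operator has to perform might
look like the following C++ sketch. It is not the iterator from
the package: nextPrime is a hypothetical free function, the data
bytes are held in a vector without the 24-byte header, the
special case for values below 20 is left out, and bit 0 is
assumed to be the most significant bit, as the Hex row of the
layout table suggests.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Return the smallest prime greater than `value` recorded in `data`,
// or 0 if there is no later true bit. `start` is the first value
// covered by data[0] and must be a multiple of 20, at least 20.
std::uint64_t nextPrime(const std::vector<std::uint8_t>& data,
                        std::uint64_t start, std::uint64_t value) {
    assert(start % 20 == 0 && start >= 20 && value >= start);
    static const int kCandidateOffsets[8] = {1, 3, 7, 9, 11, 13, 17, 19};
    for (std::uint64_t index = (value - start) / 20;
         index < data.size(); ++index) {
        if (data[index] == 0) continue;  // block of 20 with no primes
        const std::uint64_t base = start + index * 20;
        for (int bit = 0; bit < 8; ++bit) {
            const std::uint64_t candidate = base + kCandidateOffsets[bit];
            if (candidate > value && (data[index] & (0x80u >> bit)))
                return candidate;
        }
    }
    return 0;
}
```

A decrement operator would run the same search in the opposite
direction, from the last bit of a byte to the first.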
This table shows the layout and content for a file containing 20
to 60. The first 24 bytes (start value, end value, check sum)
are not shown. Bit 0 is the most significant bit of each byte, as
the Hex row shows.
Byte   0                          1
Bit    0  1  2  3  4  5  6  7     0  1  2  3  4  5  6  7
Value  21 23 27 29 31 33 37 39    41 43 47 49 51 53 57 59
Prime  F  T  F  T  T  F  T  F     T  T  T  F  F  T  F  T
Hex    5A                         E5
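The lookup described above can be sketched in C++ as follows.
This is an illustration under stated assumptions, not the
PrimeCVector code from the package: isPrimeInPrimeC is a
hypothetical name, the data bytes are passed without the 24-byte
header, values below 20 are not handled, and bit 0 is taken to be
the most significant bit, which is what the Hex row implies.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Report whether `value` is marked prime in `data`, where data[0]
// covers the 20 numbers beginning at `start` (a multiple of 20).
bool isPrimeInPrimeC(const std::vector<std::uint8_t>& data,
                     std::uint64_t start, std::uint64_t value) {
    assert(start % 20 == 0 && start >= 20);
    assert(value >= start && (value - start) / 20 < data.size());
    // Map value % 20 to a bit index, or -1 for endings 0, 2, 4, 5, 6, 8.
    static const int kBitForResidue[20] = {-1, 0, -1, 1, -1, -1, -1, 2, -1, 3,
                                           -1, 4, -1, 5, -1, -1, -1, 6, -1, 7};
    const int bit = kBitForResidue[value % 20];
    if (bit < 0) return false;
    const std::uint8_t byte = data[(value - start) / 20];
    return (byte & (0x80u >> bit)) != 0;
}
```

With the two data bytes from the table (0x5A, 0xE5) and a start
value of 20, the function reports 23 and 59 as prime and 21 and
49 as not prime.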
As primes become larger, their density falls off as 1/ln(n), so
the density of true bits falls off as well. At the same time, the
number of digits (binary or decimal) needed to represent each
prime grows as ln(n). Thus, a sequence of primes represented as
primec is always competitive in size with the corresponding
sequence in ASCII text or binary, besides providing fast direct
access by value:
bool isPrime(value).
The results for the sequence of the largest 64 bit primes
(18446744073707000000 to 18446744073709551558) are:
Format          Size (KB)
64 Bit Binary         445
Txt                  1150
Zip Txt               114
PrimeC                125
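The uncompressed sizes can be roughly cross-checked from the
1/ln(n) density. The C++ sketch below is an estimate, not a
measurement: the names are hypothetical, the prime count is
approximated as span/ln(n), a KB is taken as 1024 bytes, and each
text line is assumed to end with a single terminator byte. The
zipped size cannot be estimated this way and is left out.

```cpp
#include <cassert>
#include <cmath>
#include <cstdint>

struct SizeEstimateKB {
    double primec;    // one byte per block of 20 numbers
    double binary64;  // 8 bytes per prime
    double text;      // decimal digits plus one terminator per prime
};

// Estimate file sizes for the primes in [low, high].
SizeEstimateKB estimateSizes(std::uint64_t low, std::uint64_t high) {
    assert(low > 1 && low < high);
    const double span = static_cast<double>(high - low);
    const double top = static_cast<double>(high);
    const double primeCount = span / std::log(top);  // density ~ 1/ln(n)
    const double digits = std::floor(std::log10(top)) + 1.0;
    SizeEstimateKB kb;
    kb.primec = span / 20.0 / 1024.0;
    kb.binary64 = primeCount * 8.0 / 1024.0;
    kb.text = primeCount * (digits + 1.0) / 1024.0;
    return kb;
}
```

For the range above, the primec estimate comes to about 125 KB,
and the binary and txt estimates should land within a few percent
of the measured figures.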
Programs
Among the programs in the "programs.zip" package are:
BuildTxtPrime   Create a file of primes in txt format.
TxtToPrimeC     Convert txt to primec format.
Goldbach        Verify Goldbach's conjecture for zero to 1E12.
Among the more than 15 classes and utilities are:
PrimeGenerator     Create prime numbers in a given range.
Directory          A vector of strings containing the names of
                   files in a file directory.
Progress           A class to manage the periodic reporting
                   of program activity.
PrimeCVector       Abstract class providing the algorithms
                   to access primec data.
PrimeCFileWriter   Create a primec file.
PrimeCFileReader   Read a primec file.
I hope you find them useful.
If you have any questions, observations or bug reports concerning
the C++ programming or the content of the prime files, send an
email (after changing "at" to "@") to:
primes1e12 at earthlink.net
I embarked on this project as a programming challenge. I am not a
mathematician. I have no deep insight into prime number theory.
Please confine messages to programming issues. Here are some
references:
Prime Numbers: http://www.primos.mat.br/indexen.html
Wikipedia: List of Prime Numbers (with numerous references): https://en.wikipedia.org/wiki/List_of_prime_numbers
The Math Forum: http://mathforum.org/dr.math/faq/faq.prime.num.html
The Prime Pages: https://primes.utm.edu
The program files are also posted on
http://home.earthlink.net/~primes1E12.
Corrections and additions will be posted there as they occur.
Don Kostuch
October, 2018.