**Statistical Primes Part 2**

Of course, the 2 helpful “prerequisites” for this article are Twin Primes and Statistical Primes (part 1).

In Statistical Primes (part 1) we discussed how the TwinPrimes program originally just counted Primes and Twin Primes. Then, for the “Statistical Primes” article, it was enhanced to (closely?) estimate the counts of Primes and Twins by extrapolating from statistical samples. The technique used was a simple variation of “Stratified Sampling.” The method was to specify

- How many strata to use and

- How many elements (samples) within each stratum to process.

In particular, the elements/samples in each stratum were simply the first N numbers of each stratum. It was a very simple and easily implemented method which yielded surprisingly accurate estimates of the number of Primes and Twins.
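To make the "first N numbers of each stratum" idea concrete, here is a minimal sketch in Python (the TwinPrimes program itself is C#, and its source is not shown here). The function names, the trial-division primality test, and the convention of counting a twin pair by its smaller member are all assumptions made for illustration.

```python
from math import isqrt

def is_prime(n: int) -> bool:
    """Trial division; slow, but fine for a small demonstration."""
    if n < 2:
        return False
    if n % 2 == 0:
        return n == 2
    for d in range(3, isqrt(n) + 1, 2):
        if n % d == 0:
            return False
    return True

def estimate_first_n(start: int, count: int, strata: int, per_stratum: int):
    """Estimate primes and twin pairs in [start, start + count) by testing
    only the first `per_stratum` numbers of each stratum."""
    size = count // strata              # assumes strata divides count evenly
    primes = twins = 0
    for s in range(strata):
        lo = start + s * size           # first number of this stratum
        for n in range(lo, lo + per_stratum):
            if is_prime(n):
                primes += 1
                if is_prime(n + 2):     # assumption: count a pair by its smaller member
                    twins += 1
    scale = size / per_stratum          # each tested number stands for this many
    return primes * scale, twins * scale

# 100 strata of 10,000 numbers each, testing the first 1,000 of each stratum
print(estimate_first_n(1, 1_000_000, 100, 1_000))
```

When `per_stratum` equals the stratum size, every number is tested and the "estimate" is simply the exact count, which is a handy sanity check.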

In the prior article I noted that investigating other techniques would be left to the reader. That was kind of true… after a few days I got the bug to see if the results would be any better (or worse) if the samples from within each of the strata were **chosen randomly.** And that is what this article covers.

A button was added to the TwinPrimes program giving the user a second way to estimate the count of Primes and Twins. With that button, the samples within the strata would be chosen at random using the C# Random Class.

https://msdn.microsoft.com/en-us/library/system.random(v=vs.110).aspx

It may interest you to know that the Random class's Next methods only accept int parameters and only return an int for the random number chosen. Initially this presented a small problem, since the TwinPrimes program deals with really BIG numbers, but the limitation was overcome by limiting the size of any single stratum to 2×10^{9} (2 Billion or less… good enough for me for now).
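The random variant can be sketched the same way. This is a Python illustration, not the TwinPrimes code: the names are hypothetical, samples are drawn with replacement (the actual program may handle duplicates differently), and the explicit size check only mimics the 2×10^{9} stratum cap, since Python integers have no such limit. The idea is that the random offset stays small while the stratum's starting number can be as large as you like.

```python
import random
from math import isqrt

MAX_STRATUM = 2_000_000_000   # stratum size cap described in the article

def is_prime(n: int) -> bool:
    """Trial division; slow, but fine for a small demonstration."""
    if n < 2:
        return False
    if n % 2 == 0:
        return n == 2
    for d in range(3, isqrt(n) + 1, 2):
        if n % d == 0:
            return False
    return True

def estimate_random(start, count, strata, per_stratum, rng=None):
    """Estimate primes and twin pairs in [start, start + count) by testing
    `per_stratum` randomly chosen numbers from each stratum."""
    rng = rng or random.Random()        # unseeded: results vary per run
    size = count // strata              # assumes strata divides count evenly
    if size > MAX_STRATUM:
        raise ValueError("stratum too large; use more strata")
    primes = twins = 0
    for s in range(strata):
        lo = start + s * size           # may be huge
        for _ in range(per_stratum):
            n = lo + rng.randrange(size)    # offset is in [0, size)
            if is_prime(n):
                primes += 1
                if is_prime(n + 2):     # assumption: count a pair by its smaller member
                    twins += 1
    scale = size / per_stratum          # each tested number stands for this many
    return primes * scale, twins * scale

print(estimate_random(1, 1_000_000, 100, 200))
```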

**Sample Results**

In general, I suspected that randomly sampling N numbers within each stratum would yield more accurate estimates than using the first N numbers within each stratum. This was indeed the case, but the accuracy differences were minimal at the low end and became much more pronounced as we moved to larger numbers. For example:

**For the range of 1 → 1 Billion**

When the first N numbers in each stratum are used for the sample then

The estimated count of Primes differs from the actual count of Primes by 0.24 of 1% (by 0.00236 of the actual count).

The estimated count of Twins differs from the actual count by 1.3% (by 0.0131 of the actual count).

When the numbers in each stratum are **randomly chosen for the sample** then

The estimated count of Primes differs from the actual count of Primes by 0.12 of 1% (by 0.0012 of the actual count).

The estimated count of Twins differs from the actual count by 0.2% (by 0.00195 of the actual count).

**For the range of 1 → 1 Trillion**

When the first N numbers in each stratum are used for the sample then

The estimated count of Primes differs from the actual count of Primes by 0.07 of 1% (by 0.00068 of the actual count).

The estimated count of Twins differs from the actual count by 0.92 of 1% (by 0.0092 of the actual count).

When the numbers in each stratum are **randomly chosen for the sample** then

The estimated count of Primes differs from the actual count of Primes by 0.04 of 1% (by 0.00041 of the actual count).

The estimated count of Twins differs from the actual count by 0.07 of 1% (by 0.00073 of the actual count).
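Each figure above is quoted twice: once as a percentage and once as a fraction of the actual count. A small Python sketch of that arithmetic follows. The actual count of primes below 1 Billion (50,847,534) is a well-known value, but the estimate used here is a made-up placeholder chosen to be off by 120,000; it is not a result from the TwinPrimes program.

```python
def relative_error(estimate: float, actual: float) -> float:
    """Absolute difference expressed as a fraction of the actual count."""
    return abs(estimate - actual) / actual

ACTUAL_PRIMES_TO_1E9 = 50_847_534      # known count of primes below 10^9
hypothetical_estimate = 50_967_534     # placeholder, NOT a TwinPrimes result

frac = relative_error(hypothetical_estimate, ACTUAL_PRIMES_TO_1E9)
print(f"{frac:.5f} of the actual count = {frac * 100:.2f} of 1%")
```

Put another way, an error of 0.00236 of the actual count over this range corresponds to an estimate that is off by roughly 120,000 primes out of about 50.8 million.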

An interesting note here: in the TwinPrimes program I chose to have the Random object “randomly” seeded from the system clock. As such, the estimates it gives will vary with each execution. Of course, I could have seeded the object with the same number every time, and the results would then always be the same, but that would present its own issues. I could also make it an option to do it either way but…
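The seeding trade-off is easy to see in a few lines. This sketch uses Python's `random.Random` in place of the C# Random class; the helper name and the seed value 12345 are arbitrary choices for illustration.

```python
import random

def sample_offsets(size: int, n: int, seed=None):
    """Draw n random offsets in [0, size).
    seed=None lets Python seed from the OS or the clock, so runs differ."""
    rng = random.Random(seed)
    return [rng.randrange(size) for _ in range(n)]

# Same seed: the same "random" samples every time, so estimates are repeatable
fixed_a = sample_offsets(1_000_000, 5, seed=12345)
fixed_b = sample_offsets(1_000_000, 5, seed=12345)
print(fixed_a == fixed_b)

# No seed: samples (and therefore estimates) change from run to run
varying = sample_offsets(1_000_000, 5)
print(varying)
```

A fixed seed is convenient for comparing runs, but it also means every run repeats the same luck, good or bad, which is presumably one of the "issues" a fixed seed would bring.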

When I started I didn’t know whether one method would be significantly superior to the other but it appears that both methods of sampling give (I think) very good estimates. And of course, there are many other ways of “statistically” estimating the count of primes and prime related “stuff.”

Below are 2 things you may be interested in:

- Screen print of the latest TwinPrimes program. The “*GO*” button simply uses brute force to calculate the number of Primes and Twins from “Start Number” thru “How many numbers to test…” The 2 “*GO WITH SAMPLING*” buttons estimate the counts using the techniques talked of above.

- There’s a signpost ahead… it’s a copy of a spreadsheet I used to track results.

The End