Working through Google's billboard brainteaser from 2004

Jonathan E. Landrum cde70eb020 Converted to KeTeX in README.md		7 years ago
README.md	cde70eb020 Converted to KeTeX in README.md	7 years ago
e.hs	9ff7a81693 Fixing errors in `e.hs` and `README.md`	7 years ago
google-billboard.c	e132fa6577 Moved program to its own project	10 years ago
google-billboard.png	12704df61d Add files via upload	7 years ago

Solving Google's Billboard Brainteaser

Since I don't live on either the east coast or the west coast, I never encountered Google's near-infamous "first 10-digit prime" advertisements when they were posted in 2004. I instead happened upon it just the other day while reading Mathematics.SE^[1]. The gist of the problem is to create a program that chunks the fractional component of the number $e$^[2] into 10-digit integers, and then finds the first number in that set that is prime. Considering I've done a lot of tinkering with primes in the past^[3], this sounded like a problem that was right down my alley.

Analysis

The first step in solving this problem is a cognitive one, not a programmatic one. Before we can set out to find this prime number, we have to know about where to expect it. Do we need the first 100 digits of $e$, or the first 1,000,000? Or more? The good news is Chebyshev, along with the help of a lot of other very smart mathematicians who worked on this problem before him, already proved that the distribution of prime numbers in the Natural numbers is bounded^{[4; 5]}. This bound is described in his proof, and is what we now call the Prime Number Theorem^[6]. The theorem states that if we define $\pi(x)$ to be a function that returns the number of primes less than or equal to $x$^[7] for all $x \in \mathbb{R}$, then $\pi(x) \sim \frac{x}{\ln(x)}$^{[4; 5; 8]}. In other words, as the value of $x$ increases without bound, the error between the approximation $\frac{x}{\ln(x)}$ and the true value $\pi(x)$ approaches 0. For example, if we let $p(x) = \frac{x}{\ln(x)}$, then $\pi(10) = 4$, and $p(10) \sim 4.34$, which is an error of about 8.57%. Not bad for an estimate. But that estimate is for the total number of primes less than or equal to $x$, not the number of digits we would have to examine in order to expect a prime of $x$ digits in length. So how do we go about figuring out how many digits we need?

First, consider this: if $x = 10^10 - 1$ (which is the largest possible ten-digit number: 9,999,999,999), how many 10-digit numbers, $n$, are there in $x$? It is simply $x$ minus the total number of nine-digit numbers: $n = x - (10^9 - 1) = 10^10 - 10^9 = 9,000,000,000$, which is the quantity of all of the numbers between 1,000,000,000 and 9,999,999,999. So how many primes does $p$ estimate are in this range? We can't say $p(n)$, because that would give us the number of primes less than or equal to $n$, which isn't what we are looking for. So instead we have to start with the number of primes at or below $x$, which is about $4.34 \times 10^8$. Then we have to estimate the total number of primes less than or equal to the largest nine-digit number, $10^9 - 1$, which is about $4.83 \times 10^7$, and subtract that from our previous result to give us that estimated number of ten-digit primes, which is about $3.86 \times 10^8$. That is the number of primes $p$ estimates to be ten digits long, which we've already shown to be quite a close estimate of reality. Furthermore, three conditions of the problem help our cause: the set of all numbers on the order of $10^10$ is quite small, we have a 10-digit constraint for length, and our numbers will be arriving in random order. Because of these three conditions, we can safely assume a somewhat even distribution of these primes among the ten-digit numbers in $e$, even though the Prime Number Theorem states that primes become more sparse as magnitude increases^[4], and we can assume it's a problem that can be easily tackled in a single sitting.

Where does this leave us? If we constrain the value of $x$ to a ten-digit number, namely the first ten digits of $e$, how many ten-digit primes can we expect to find in $x$? At most, one, because it is the only ten-digit number able to be constructed from $x$. In fact, because there are $10^10 - 10^9$ ten-digit numbers, and because there are about $3.86 \times 10^8$ primes in this set, then the probability that a random ten-digit number is prime is approximately $(3.86 \times 10^8) / (10^10 - 10^9)$, which is $4.28999...\%$^[8], which is also the probability that the first ten digits of $e$ are prime, which isn't much. In general, by the function $p(x)$ above, the chance that a randomly chosen integer is prime is approximately $1 / \ln(x)$, which is about $4.34\%$ in this case. But what about the chance of findinding a ten-digit prime in, say, the first 100 digits of $e$? We can rearrange $p(x)$ to show that, for a ten-digit prime, we can expect to test approximately $\ln(10^10)$, or approximately 23, ten-digit numbers before we find one that is prime. So how many distinct ten-digit numbers are there in a given set of digits of $e$? Let's consider the first 100 digits. We can visualize a unique ten-digit number to be like a set of digits of fixed length:

[n, n, n, n, n, n, n, n, n, n], n, n, n, ...

This set counts as one ten-digit number, and then we shift the set down one digit and count the next number:

n, [n, n, n, n, n, n, n, n, n, n], n, n, ...

Doing this, we can see that in a 100-digit number we have almost $100 - 10$ distinct 10-digit numbers. It will be off by one, because by the time we get to the end of the set, the remaining nine digits never were the beginning digit of their own 10-digit number:

... n, n, n, [n, n, n, n, n, n, n, n, n, n]

This means we have 91 numbers to choose from in the first 100 digits of $e$ (actually, we have at most 91; if a group of ten digits begins with a 0, then that group doesn't represent a ten-digit number, and has to be thrown out), which is about four times more than $p(x)$ estimated would be necessary. However, for numbers this small, the estimations of $\pi(x)$ are not that good of an approximation. Thus, I chose to calculate 1,000 digits of $e$ so that the first hundred digits would be correct.

Calculating digits of $e$ is a slow process, and most people (that I'm aware of) use the Taylor series^[9] to perform the calculation. The Taylor series is a way of approximating the value of a function, and is especially useful for computational mathematics, where the result of a transcendental function is to be represented with a subset of the Rational numbers, which is a constraint on current computers.

The Taylor series of a function $f(x)$ is an infinite summation of the $n$^th derivative of $f$ evaluated at a point $a$ divided by $n!$ and multiplied by $(x - a)^n$:

or using the more compact sigma notation:

where $n$ is iterated infinitely through all of the Natural numbers, $n!$ is the factorial of $n$, $f^(n)(a)$ is the $n$^th derivative of $f$ evaluated at the point $a$, and $(x - a)^n$ is a correction for the offset of $a$ from 0.

But because we are calculating the value $e^x$ where $x = 0$, we are actually calculating a much simpler version of the Taylor series known as the Maclaurin series. So we can leave off the term $(x - a)^n$ at the end of the summation since $a = x$ in our case, and because the term $f^(n)(a)$ always evaluates to 1 for $f = e^0$^[2], we can write it as simply 1. This leaves a much more simplified infinite series to calculate:

Now the only question that remains is for how many iterations do we have to calculate this summation before we arrive at a close enough answer? For that, we turn to the Lagrange error bound for a Taylor polynomial. For our function, each term in the Taylor series is denoted by $1/n!$, and because the error bound of the Taylor series is asymptotic in nature^[14], then for the $n$^th term in the series given by $P_n(x)$, the term $P_{n + 1}(x)$ must be more accurate than the previous term, thus the error bound for a given term $P_n(x)$ is at most the value of the $(n + 1)$^th term. The value for $1/100!$ is about $1.0715 \times 10^-158$, or a decimal point followed by 158 zeroes before starting the significant digits. That is well beyond our initial guess of needing 100 digits of precision, and will thus work perfectly for this exercise.

Program Design

I initially wrote a program in a variant of Lisp to do the calculation of the first 1,000 digits of $e$, but I discovered that it doesn't understand floating point numbers. So since I had to change languages anyway, I opted for Haskell.

The program relies on the numbers package^[10], and on a desktop computer, you can expect to wait only a few minutes for the result. The reason for the delay is because getting the necessary precision requires calculating the summation 100 times as per our previous calculation. Here is the listing of the program:

import Data.Number.CReal

factorial n = if n < 2 then 1 else n * factorial (n - 1)

taylor n = if n < 2 then 1 else (1 / factorial n) + taylor (n - 1)

main::IO()

main = putStrLn(showCReal 1000 (taylor 100))

However, since it is now more than a decade later, it's significantly easier to just search for the digits of $e$ rather than calculate them. In fact, it is trivial to find a listing of $e$ into the millions of decimal places^[11]. And considering the first result when searching for "1000 digits of e"^[12] is a sparse page that looks like it just got off the boat from 1998, is hosted by the School of Mathematics and Statistics at the University of St. Andrews in Scotland, and is appropriately titled "10000 digits of $e$"^[13], I picked that one. It was actually quite fortuitous that the source of that page is so sparse, because there's little to dig through with our program.

I originally wrote this program in C, but I may eventually go back and do the whole thing in Haskell as a single main function. The program gets the digits by first using wget to pull the source of the page into a file pointer with popen, and then parses through the stream until it finds a . in the output. It does this because, upon examination of the source, the decimal point in the number $e$ is the only full-stop symbol used on the entire page, so score one for ease.

There are three dependency headers and five helper functions for the main method. The required headers are not uncommon and are simply added with a respective include statement:

#include <inttypes.h>
#include <math.h>
#include <stdio.h>

The helper functions fairly straightforward, as well:

int isPrime(uint64_t);
int isNumeric(char);
int isFullLength(char[]);
uint64_t arrToNum(char[], int);
void modBuff(char*, int);

The first three int type functions are obviously boolean, with isPrime telling if its input is prime or not, isNumeric telling if its input is numeric or not, and isFullLength telling if the buffer holds a ten-digit number or not. The next function, arrToNum, takes an array and returns an integer with the values from the array, and modBuff modifies the buffer by sliding the 2^nd element to the 1^st slot, and continues in like fashion until the end of the array. This preserves the previous values and makes room for one more digit. The purpose of doing this is because a file pointer opened with popen can't rewind (which makes sense, given we are reading directly from the output of wget).

With the preliminaries out of the way, now all that's left is to run over the characters ten at a time and check to see if the resulting number is prime or not, and then iterate one character ahead to get the next ten-digit number. The program finds the number 7,427,466,391 almost instantaneously, which starts at the 98^th digit. So the initial guess of 100 digits would have been a bit too short. But hey, still not bad for a seemingly rough approximation.

The original "Google Billboard Problem" suggested to visit the url 7427466391.com (dead link), which revealed a second brain teaser. Solving that one supposedly allowed you to submit your resume to Google, but the site has been offline for quite a while. Nevertheless, it was a fun problem to tackle. Here's the full source of the code:

#include <inttypes.h>
#include <math.h>
#include <stdio.h>

/*
 * Function Prototypes
 */
int isPrime(uint64_t);
int isNumeric(char);
int isFullLength(char[]);
uint64_t arrToNum(char[], int);
void modBuff(char*, int);

/*
 * Main Method
 */
int main(int argc, char **argv) {
    /* Open the site as a file for reading the source */
    FILE *fptr = popen("wget --quiet -O - http://www-history.mcs.st-andrews.ac.uk/HistTopics/e_10000.html", "r");
    
    /* Variables --------------------------------------------------------- */
    int ctr = 1, itr; /* Two counter variables for the for loop and count  */
    char buf[10];     /* A character buffer to store the individual digits */
    char cptr;        /* A character to hold the next item in the stream   */
    uint64_t num;     /* An unsigned 64-bit integer to hold the number     */
    /* ------------------------------------------------------------------- */
    
    /* Move the pointer to the numbers past the decimal point */
    while ((cptr = fgetc(fptr)) != '.');
    
    /* Put the first 10-digit number in the buffer */
    for (itr = 0; itr < 10; itr++) {
        /* Grab the next character */
        cptr = fgetc(fptr);
        
        /* Check to see if the character is numeric */
        if (isNumeric(cptr)) {
            /* Store it in the buffer if it is */
            buf[itr] = cptr;
        } else {
            /* Peel off any non-numeric characters */
            while (!isNumeric(cptr)) cptr = fgetc(fptr);
            
            /* Assign the character to the buffer if it's numeric */
            if (isNumeric(cptr)) buf[itr] = cptr;
        }
    }
    
    /* Iterate across the data and test the numbers */
    do {
        /* Check that the 10-digit number doesn't actually start with a 0,
         * and skip this iteration if it does. */
        if (isFullLength(buf)) {
            /* Convert the buffer into an integer */
            num = arrToNum(buf, 10);
            
            /* Print the number and increment the counter */
            printf("%02d.) %" PRIu64 "", ctr++, num);
            
            /* Check for primality of the number to exit the loop */
            if (isPrime(num)) {
                printf (" (1st 10-digit prime)\n");
                break;
            }
            
            /* Separate the numbers */
            printf("\n");
        }
        
        /* Grab the next character */
        cptr = fgetc(fptr);
        
        /* Peel off any non-numeric characters due to pre-formatted text */
        while (!isNumeric(cptr) && cptr != '<') cptr = fgetc(fptr);
        
        /* Check to make sure we're not at the end of the list */
        if (cptr == '<') break;
        
        /* Slide the unused numbers in the buffer down one step */
        modBuff(buf, 10);
        
        /* Put the next character at the end of the buffer */
        buf[9] = cptr;
    } while (isNumeric(cptr));
    
    /* Close the website pointer */
    pclose(fptr);
    
    /* Exit */
    return 0;
}

/*
 * isPrime(uint64_t)
 * Checks for primality of a number
 */
int isPrime(uint64_t num) {
    if (num < 2) return 0;
    if (num == 2 || num == 3) return 1;
    if (num % 2 == 0) return 0;
    uint64_t sr = ceil(pow(num, 0.5));
    uint64_t ctr;
    for (ctr = 3; ctr <= sr; ctr += 2) if (num % ctr == 0) return 0;
    return 1;
}

/*
 * isNumeric(char)
 * Checks to make sure the character passed is a number
 */
int isNumeric(char num) {
    return (num - '0' >= 0 && num - '0' <= 9);
}

/*
 * isFullLength(char[])
 * Simply checks to see that an n-digit number is, in fact, n-digits long
 */
int isFullLength(char arr[]) {
    return (arr[0] - '0');
}

/*
 * arrToNum(char[], int)
 * Converts a character array to an integer
 */
uint64_t arrToNum(char arr[], int len) {
    uint64_t res = 0;
    int ctr;
    for (ctr = 0; ctr < len; ctr++) res = (res * 10) + (arr[ctr] - '0');
    return res;
}

/*
 * modBuff(*char, int)
 * Slides the elements in the buffer down one slot
 */
void modBuff(char *arr, int len) {
    int ctr;
    for (ctr = 0; ctr < (len - 1); ctr++) arr[ctr] = arr[ctr + 1];
    return;
}

References

First 10-digit prime in consecutive digits of e. Mathematics.SE user “$kptlronyttcn$”. Accessed 21 May 2015.
$e$ (mathematical constant). Wikipedia. Accessed 21 May 2015.
Prime Numbers tree on GitLab.com. $Jonathan E. Landrum$. GitLab.com. Accessed 12 June 2018.
Definition of the Prime Number Theorem. $Chris K. Caldwell$. UTM. Accessed 21 May 2015.
Lectures #5 and 6: The Prime Number Theorem. $Noah Snyder$. Columbia. Accessed 21 May 2015.
Prime number theorem. Wikipedia. Accessed 21 May 2015.
Prime-counting function. Wikipedia. Accessed 21 May 2015.
How Many Primes Are There? $Chris K. Caldwell$. UTM. Accessed 21 May 2015.
Taylor series. Wikipedia. Accessed 11 June 2018.
Haskell numbers package. Hackage. Accessed 11 June 2018.
First 2 million digits of e. $Robert Nemiroff$ and $Jerry Bonnell$. Nasa. Accessed 21 May 2015.
Search engine result page for query “1000 digits of e”. Google. Accessed 21 May 2015.
10000 digits of $e$. $MacTutor History of Mathematics archive$, School of Mathematics and Statistics, University of St. Andrews. Accessed 21 May 2015.
Taylor Series - Error Bounds. $July Thomas$. Brilliant.org. Accessed 12 July 2018.

README.md

Solving Google's Billboard Brainteaser

In This Article

Analysis

Program Design

References