The Hints to Speed Things Up

At a first glance an overview of the memory organization

  • Low memory: >2800++ but well below >3000 The main program
  • Low memory: >3000 - >3FFF The scroll routine with the screen buffer
  • 16 Bit memory: >8300 - >83FF The prime kernel, the display, the workspace, and some space for the interrupts
  • High memory: >A000 - >FFFF A table for the online calculated primes and in sequel the offset table

    To speed things up the time critical stuff is put in the 16 Bit area.

    The scroll routine
    All the VDP addresses are held in registers and the scroll code is completely unfold to yield no loop overhead. The scroll don't blank the last line because this line is overwritten with numbers of same or greater length.

    The number display
    The most used divider is held in a register. The number to display is pre divided in two parts, so standard display algorithms will work, yielding 655359999 as the maximal displayable number which is far away to be calculated. The number chars are redefined, so I need no add to justify the fraction of the number to display the right char. The row is divided in fixed columns to display four numbers in one row.

    The watchdog timer
    I don't use the VDP interrupt because it comes too often. Therefore I make use of the TMS 9901 timer and choose the maximal time delay which is ~0.35s. The interrupt routine was analyzed carefully to achieve the shortest possible break. The program will not respond on any key press or something else till one hour is over. The time limit is checked only on completed lines before a scroll and therefore no garbage is left in the last line caused by a unblanked line from the scroll before.

    Calculating the primes
    The algorithm is split in two parts. The first part is responsible for the primes up to 65536, the second part is responsible for the primes over 65536 up to ~4000000000. In the first part the primes are additionally stored in a table, so I can use them later for calculating the higher values. This technique is called dynamic programming . To see if any natural number is dividable, it is sufficient to check if the divider reach the square root of the number. I do it the other way. I check if the squared divider is greater than the number. The second part make use of number theorie to check for a fraction in a 32 bit word. I therefore need only two DIV's to extract the fraction of the 32 bit number and I can therefore also see if a number is dividable. The following algorithm accelerator comes from number theorie too. It is possible to predict which numbers have potential to be prime or not. One well known fact is that only odd numbers can be prime. The mathematical stuff behind this common knowledge follows and is only used straightforward...

        A given number mk+a with m in N, k in No and 0 <  a <  m is *no* prime,
        if m and a is dividable by d >  1, except for k = 0 and a is a prime.
        (Harald Scheid 1994, Zahlentheorie)
        If m is dividable by d and a is dividable by d, so is mk+a. That should be
        clear. So a good idea is to build m and a with no common divisors, because
        otherwise the number mk+a is definitely no prime with the stated exception
        For instance we choose:
           m = 2, a = {1}                   This is   2k + 1, the increment by two.
           m = 2 x 3, a = {1,5}             This is   6k + a.
           m = 2 x 3 x 5, a = {1,7,11,13,17,19,23,29} ...
        I do it with m = 2 x 3 x 5 x 7 x 11 x 13. Thus occupying the available
        memory which is left. The achievable speed ups are:
           m = 2           -> 2/(2-1)                       = 2
           m = 2 x 3       -> 2/(2-1) x 3/(3-1)             = 3
           m = 2 x 3 x 5   -> 2/(2-1) x 3/(3-1) x 5/(5-1)   = 3.75
        BTW there is a trade off. This kind of technique is NP-Space complete. 
        The memory it uses is   mem = (2-1) x (3-1) x (5-1) x (7-1) x ...
    With these actions a lot of primes are generated. Possibly enough to win...

    © Dr. Henry Koplien