Note that the UNROLL option makes the 'inner' DES loop unroll all 16 rounds
instead of the default 4.
RISC1 and RISC2 are two alternatives for the inner loop, and
PTR means to use pointer arithmetic instead of arrays.
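
As a rough illustration of what UNROLL and PTR change, the C sketch below
runs a 16-round toy cipher two ways: once with array indexing and 4 rounds
per loop iteration, and once with pointer arithmetic and all 16 rounds
written out.  The round function and the names toy_round, toy_encrypt_index4
and toy_encrypt_ptr16 are invented for this example; this is not the real
DES inner loop, and the RISC1/RISC2 variants are not shown.

```c
#include <stddef.h>
#include <stdint.h>

#define ROUNDS 16

/* Hypothetical toy round (NOT real DES): mix in a subkey, rotate left by 3. */
static uint32_t toy_round(uint32_t state, uint32_t subkey)
{
    state ^= subkey;
    return (state << 3) | (state >> 29);
}

/* Array indexing, 4 rounds unrolled per loop iteration. */
uint32_t toy_encrypt_index4(uint32_t state, const uint32_t ks[ROUNDS])
{
    size_t i;

    for (i = 0; i < ROUNDS; i += 4) {
        state = toy_round(state, ks[i]);
        state = toy_round(state, ks[i + 1]);
        state = toy_round(state, ks[i + 2]);
        state = toy_round(state, ks[i + 3]);
    }
    return state;
}

/* Pointer arithmetic, all 16 rounds unrolled: no loop counter at all. */
#define TOY_R(p) (state = toy_round(state, *(p)++))

uint32_t toy_encrypt_ptr16(uint32_t state, const uint32_t *ks)
{
    TOY_R(ks); TOY_R(ks); TOY_R(ks); TOY_R(ks);
    TOY_R(ks); TOY_R(ks); TOY_R(ks); TOY_R(ks);
    TOY_R(ks); TOY_R(ks); TOY_R(ks); TOY_R(ks);
    TOY_R(ks); TOY_R(ks); TOY_R(ks); TOY_R(ks);
    return state;
}
```

Both functions compute the same result; which form is faster depends on the
compiler and CPU, which is why the table lists the options used per platform.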

System - CPU - compiler - options                               des/s      k/s
IRIX 6.2 - R10000 195MHz - cc (-O3 -n32) - UNROLL RISC2 PTR   496,000  3968k/s
solaris 2.5.1 usparc 167MHz?? - SC4.0 - UNROLL RISC1 PTR [1]  475,400  3804k/s
solaris 2.5.1 usparc 167MHz?? - gcc 2.7.2 - UNROLL RISC1 PTR  306,000  2448k/s
linux - pentium 100MHz - gcc 2.7.0 - assembler                281,000  2250k/s
NT 4.0 - pentium 100MHz - VC 4.2 - assembler                  281,000  2250k/s
IRIX 5.3 - R4400 200MHz - gcc 2.6.3 - UNROLL RISC2 PTR        235,300  1882k/s
IRIX 5.3 - R4400 200MHz - cc - UNROLL RISC2 PTR               233,700  1869k/s
NT 4.0 - pentium 100MHz - VC 4.2 - UNROLL RISC1 PTR           191,000  1528k/s
DEC Alpha 165MHz?? - cc - RISC2 PTR [2]                       181,000  1448k/s
linux - pentium 100MHz - gcc 2.7.0 - UNROLL RISC1 PTR         158,500  1268k/s
HPUX 10 - 9000/887 - cc - UNROLL [3]                          148,000  1190k/s
solaris 2.5.1 - sparc 10 50MHz - gcc 2.7.2 - UNROLL           123,600   989k/s
IRIX 5.3 - R4000 100MHz - cc - UNROLL RISC2 PTR               101,000   808k/s
DGUX - 88100 50MHz(?) - gcc 2.6.3 - UNROLL                     81,000   648k/s
solaris 2.4 486 50MHz - gcc 2.6.3 - assembler                  65,000   522k/s
HPUX 10 - 9000/887 - K&R cc (default compiler) - UNROLL PTR    76,000   608k/s
solaris 2.4 486 50MHz - gcc 2.6.3 - UNROLL RISC2               43,500   344k/s
AIX - old slow one :-) - cc -                                  39,000   312k/s
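
The two numeric columns seem to be related by the 8-byte DES block size:
496,000 des/s x 8 bytes = 3,968,000 bytes/s, i.e. 3968k/s with k taken as
1000.  A few rows differ slightly (281,000 x 8 = 2248 against the listed
2250, for instance), presumably because the des/s figures were rounded.
A minimal C sketch of that conversion follows; the function name
des_rate_to_kbytes is made up for this example.

```c
/* One DES operation processes one 64-bit block, i.e. 8 bytes. */
#define DES_BLOCK_BYTES 8UL

/*
 * Convert DES operations per second into thousands of bytes per second.
 * Assumption: 'k' in the table means 1000 bytes, not 1024.
 */
unsigned long des_rate_to_kbytes(unsigned long des_per_sec)
{
    return des_per_sec * DES_BLOCK_BYTES / 1000UL;
}
```

For example, des_rate_to_kbytes(191000UL) gives 1528, matching the
NT 4.0 - VC 4.2 - UNROLL RISC1 PTR row.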

Notes.
[1] For the ultra sparc, SunC 4.0 cc -fast -Xa -xO5, running 'des_opts'
    gives a speed of 475,000 des/s while 'speed' gives 417,000 des/s.
    I believe the difference comes from optimisations that the compiler
    is able to perform when the code is 'inlined'.  For 'speed', the DES
    routines are being linked from a library.  I'll record the higher
    speed since, if performance is everything, you can always inline
    'des_enc.c'.
[2] Similar to the ultra sparc ([1]), 181,000 for 'des_opts' vs 175,000.
[3] I was unable to get access to this machine when it was not heavily loaded.
    As such, my timing program was never able to get more than 30% of the CPU.
    This would cause the program to give much lower speed numbers because
    it would be 'fighting' to stay in the cache with the other CPU-burning
    processes.
