README
Copyright 2002, 2005 Free Software Foundation, Inc.

This file is part of the GNU MP Library.

The GNU MP Library is free software; you can redistribute it and/or modify
it under the terms of either:

  * the GNU Lesser General Public License as published by the Free
    Software Foundation; either version 3 of the License, or (at your
    option) any later version.

or

  * the GNU General Public License as published by the Free Software
    Foundation; either version 2 of the License, or (at your option) any
    later version.

or both in parallel, as here.

The GNU MP Library is distributed in the hope that it will be useful, but
WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY
or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU General Public License
for more details.

You should have received copies of the GNU General Public License and the
GNU Lesser General Public License along with the GNU MP Library.  If not,
see https://www.gnu.org/licenses/.


This directory contains assembly code for nails-enabled 21264.  The code is not
very well optimized.
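
As a point of reference for what "nails-enabled" means, the following Python
model sketches the addmul_1 operation on limbs whose top bits are reserved as
nails.  It is not the assembly in this directory.  The 64-bit limb and the 4
nail bits are assumptions chosen for illustration, and the names NUMB_BITS,
NUMB_MASK, addmul_1 and to_int are local to the sketch.

```python
LIMB_BITS = 64                       # assumed limb width
NAIL_BITS = 4                        # assumed nail count, for illustration only
NUMB_BITS = LIMB_BITS - NAIL_BITS    # bits of each limb that carry data
NUMB_MASK = (1 << NUMB_BITS) - 1


def addmul_1(rp, up, v):
    """Model of rp[] += up[] * v on nails limbs; returns the carry-out limb.

    rp and up are lists of limbs, least significant first, each at most
    NUMB_MASK.  v is a single limb, also at most NUMB_MASK.
    """
    carry = 0
    for i, u in enumerate(up):
        t = rp[i] + u * v + carry    # fits in 2*NUMB_BITS bits
        rp[i] = t & NUMB_MASK        # keep the nail bits zero
        carry = t >> NUMB_BITS       # high part goes to the next limb
    return carry


def to_int(limbs):
    """Value of a limb list when each limb contributes NUMB_BITS bits."""
    return sum(limb << (NUMB_BITS * i) for i, limb in enumerate(limbs))
```

With these definitions, the value of rp after the call plus the returned carry
shifted up by len(up) limbs equals the old value of rp plus to_int(up) * v.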

For addmul_N, as N grows larger, we could issue several loads together and then
run at about 3.3 i/c (instructions per cycle).  From 10 cycles after the last
load, we could increase to 4 i/c.  This would surely allow addmul_4 to run at
2 c/l (cycles per limb), and the same should also be possible for addmul_3 and
perhaps even addmul_2.


                current         fair            best
Routine         c/l   unroll    c/l    unroll   c/l   i/c
mul_1           3.25            2.75            2.75  3.273
addmul_1        4.0   4         3.5    4 14     3.25  3.385
addmul_2        4.0   1         2.5    2 10     2.25  3.333
addmul_3        3.0   1         2.33   2 14     2     3.333
addmul_4        2.5   1         2.125  2 17     2     3.135

addmul_5                        2      1 10
addmul_6                        2      1 12
addmul_7                        2      1 14

(The "best" column doesn't account for bookkeeping instructions and
thereby assumes infinite unrolling.)
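
The two figures in the "best" column can be multiplied to get the instruction
count per limb that each loop body implies.  The sketch below does that
arithmetic, assuming c/l means cycles per limb and i/c means instructions per
cycle.  The dictionary BEST and the function name are illustrative only; the
numbers are copied from the table.

```python
# (c/l, i/c) pairs copied from the "best" column of the table.
BEST = {
    "mul_1":    (2.75, 3.273),
    "addmul_1": (3.25, 3.385),
    "addmul_2": (2.25, 3.333),
    "addmul_3": (2.0,  3.333),
    "addmul_4": (2.0,  3.135),
}


def instructions_per_limb(cycles_per_limb, instr_per_cycle):
    """(cycles / limb) * (instructions / cycle) = instructions / limb."""
    return cycles_per_limb * instr_per_cycle


for name, (cl, ic) in BEST.items():
    print(f"{name:9s} {instructions_per_limb(cl, ic):6.3f} instructions/limb")
```

Under that reading, mul_1 comes out at about 9 instructions per limb and
addmul_1 at about 11.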

Basecase usages:

1    addmul_1
2    addmul_2
3    addmul_3
4    addmul_4
5    addmul_3 + addmul_2    2.3998
6    addmul_4 + addmul_2
7    addmul_4 + addmul_3
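
The 2.3998 figure next to the addmul_3 + addmul_2 combination is consistent
with a weighted average of the "fair" c/l values, with 2.33 taken as 2.333.
That reading is an inference from the numbers, not something the table states.
The sketch below reproduces it and applies the same assumed formula to the
other combinations; FAIR_CL and combined_cl are names made up for the example.

```python
# "fair" c/l figures from the table; 2.33 is taken as 2.333 (an assumption).
FAIR_CL = {1: 3.5, 2: 2.5, 3: 2.333, 4: 2.125}


def combined_cl(parts):
    """Average c/l when addmul_N passes of the given widths are combined.

    parts lists the N of each pass, e.g. [3, 2] for addmul_3 + addmul_2.
    Each pass contributes N * c/l cycles, and the total is divided by the
    sum of the widths.
    """
    total_cycles = sum(n * FAIR_CL[n] for n in parts)
    return total_cycles / sum(parts)


for parts in ([3, 2], [4, 2], [4, 3]):
    label = " + ".join(f"addmul_{n}" for n in parts)
    print(f"{sum(parts)}  {label:20s} {combined_cl(parts):.4f}")
```

For [3, 2] this gives 2.3998, matching the figure in the list.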