Copyright 2000, 2001, 2002 Free Software Foundation, Inc.

This file is part of the GNU MP Library.

The GNU MP Library is free software; you can redistribute it and/or modify
it under the terms of the GNU Lesser General Public License as published by
the Free Software Foundation; either version 2.1 of the License, or (at your
option) any later version.

The GNU MP Library is distributed in the hope that it will be useful, but
WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY
or FITNESS FOR A PARTICULAR PURPOSE. See the GNU Lesser General Public
License for more details.

You should have received a copy of the GNU Lesser General Public License
along with the GNU MP Library; see the file COPYING.LIB. If not, write to
the Free Software Foundation, Inc., 59 Temple Place - Suite 330, Boston, MA
02111-1307, USA.



GMP SPEED MEASURING AND PARAMETER TUNING


The programs in this directory are for knowledgeable users who want to
measure GMP routines on their machine, and perhaps tweak some settings or
identify things that can be improved.

The programs here are tools, not ready-to-run solutions. Nothing is built
in a normal "make all", but various Makefile targets described below exist.

Relatively few systems and CPUs have been tested, so be sure to verify that
results are sensible before relying on them.



MISCELLANEOUS NOTES

--enable-assert

    Don't configure with --enable-assert, since the extra code added by
    assertion checking may influence measurements.

Direct mapped caches

    Some effort has been made to accommodate CPUs with direct-mapped
    caches, by putting data blocks more or less contiguously on the stack.
    But this depends on TMP_ALLOC using alloca, and even then it may or may
    not be enough.

FreeBSD 4.2 i486 getrusage

    This getrusage seems to be a bit doubtful: it looks like it's
    microsecond accurate, but sometimes ru_utime remains unchanged after
    many microseconds have elapsed. It'd be good to detect this in the
    time.c initializations, but for now the suggestion is to pretend it
    doesn't exist.

        ./configure ac_cv_func_getrusage=no

NetBSD 1.4.1 m68k macintosh time base

    On this system getrusage has been found to often go backwards, making
    it unusable (configure is set up to ignore it). gettimeofday sometimes
    doesn't update atomically when it crosses a 1 second boundary. It's not
    clear what to do about this. Expect intermittent failures.

SCO OpenUNIX 8 /etc/hw

    /etc/hw takes about a second to return the CPU frequency, which
    suggests it may be measuring each time it runs. If this is annoying
    when running the speed program repeatedly, set a GMP_CPU_FREQUENCY
    environment variable (see TIME BASE section below).

Low resolution timebase

    Parameter tuning can be very time-consuming if the only timebase
    available is a 10 millisecond clock tick, to the point of being
    unusable. This is currently the case on VAX and ARM systems.



PARAMETER TUNING

The "tuneup" program runs some tests designed to find the best settings for
various thresholds, like MUL_KARATSUBA_THRESHOLD. Its output can be put
into gmp-mparam.h. The program is built and run with

        make tune

If the thresholds indicated are grossly different from the values in the
selected gmp-mparam.h, then there may be a performance boost in applicable
size ranges by changing gmp-mparam.h accordingly.

Be sure to do a full reconfigure and rebuild to get any newly set thresholds
to take effect. A partial rebuild is sometimes enough, but a fresh
configure and make is certain to be correct.

If a CPU has specific tuned parameters coming from a gmp-mparam.h in one of
the mpn subdirectories, then the values from "make tune" should be similar.
But check that the configured CPU is right and that there are no
machine-specific effects causing a difference.

It's hoped the compiler and options used won't have too much effect on
thresholds, since for most CPUs they ultimately come down to comparisons
between assembler subroutines. Missing out on the longlong.h macros by not
using gcc will probably have an effect.

Some thresholds produced by the tune program are merely single values chosen
from a range of sizes where two algorithms run at much the same speed. When
this happens the program is likely to give somewhat different values on
successive runs. This is noticeable on the toom3 thresholds, for instance.
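
To picture how a single threshold gets picked out of a range where two
methods are nearly equal, here is a minimal sketch in C. It is not
tuneup's actual algorithm; pick_threshold() and its inputs are
hypothetical, and it assumes times for an "old" method (say basecase) and
a "new" method (say Karatsuba) have already been measured at increasing
sizes.

```c
#include <stddef.h>

/* Hypothetical helper (not tuneup's code).  sizes[] holds the measured
   sizes in increasing order, t_old[] and t_new[] the times for the old
   and new methods at those sizes.  Returns the smallest measured size
   from which the new method is faster at that size and every larger
   one, or -1 if it isn't faster even at the largest size. */
long pick_threshold(const long *sizes, const double *t_old,
                    const double *t_new, size_t count)
{
    long threshold = -1;

    /* Scan from the largest size down, stopping at the first size where
       the old method is at least as fast. */
    for (size_t i = count; i-- > 0; ) {
        if (t_new[i] < t_old[i])
            threshold = sizes[i];
        else
            break;
    }
    return threshold;
}
```

For example, with sizes {10, 20, 30, 40, 50}, old times {1, 2, 3, 4, 5}
and new times {1.5, 2.5, 2.9, 3.5, 4.0} (made-up numbers) the result is
30, but nudging the 2.9 reading to 3.05 moves it to 40. Noise of that size
is enough to explain thresholds that differ between runs.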



SPEED PROGRAM

The "speed" program can be used for measuring and comparing various
routines, and producing tables of data or gnuplot graphs. Compile it with

        make speed

(Or on DOS systems "make speed.exe".)

Here are some examples of how to use it. Check the code for all the
options.

Draw a graph of mpn_mul_n, stepping through sizes by 10 or a factor of 1.05
(whichever is greater).

        ./speed -s 10-5000 -t 10 -f 1.05 -P foo mpn_mul_n
        gnuplot foo.gnuplot
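
The stepping rule in the example above can be written out as a small C
function. This is only a sketch of the rule as stated ("by 10 or a factor
of 1.05, whichever is greater"); next_size() is a hypothetical name, and
rounding the product down is an assumption, not necessarily what speed.c
does.

```c
/* Hypothetical sketch of the size-stepping rule: from size s the next
   size is the larger of s + step and s * factor, with the product
   rounded down to an integer (an assumption of this sketch). */
long next_size(long s, long step, double factor)
{
    long by_step = s + step;
    long by_factor = (long) (s * factor);
    return by_factor > by_step ? by_factor : by_step;
}
```

With -t 10 -f 1.05, next_size(100, 10, 1.05) is 110 and
next_size(1000, 10, 1.05) is 1050, so the fixed step governs below 200
limbs and the factor above that.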

Compare mpn_add_n and an mpn_lshift by 1, showing times in cycles and
showing under mpn_lshift the difference between it and mpn_add_n.

        ./speed -s 1-40 -c -d mpn_add_n mpn_lshift.1

Using option -c for times in cycles is interesting, but normally only
necessary when looking carefully at assembler subroutines. You might think
it would always give an integer value, but this doesn't happen in practice,
probably due to overheads in the time measurements.

In the free-form output the "#" symbol against a measurement means the
corresponding routine is fastest at that size. This is a convenient visual
cue when comparing different routines. The graph data files <name>.data
don't get this, since it would upset gnuplot or other data viewers.



TIME BASE

The time measuring method is determined in time.c, based on what the
configured host has available. A cycle counter is preferred, possibly
supplemented by another method if the counter has a limited range. A
microsecond-accurate getrusage() or gettimeofday() will work quite well
too.

The cycle counters (except possibly on Alpha) and gettimeofday() depend on
the machine being otherwise idle, or rather on other jobs not stealing CPU
time from the measuring program. Short routines (those that complete
within a timeslice) should work even on a busy machine.

Some trouble is taken by speed_measure() in common.c to avoid ill effects
from sporadic interrupts, or other intermittent things (like cron waking up
every minute). But generally an idle machine is necessary to be certain of
consistent results.
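
One way to get that kind of robustness is sketched below: take several
readings, sort them, and accept the smallest reading that enough others
agree with. This is an illustration under that assumption, not the code in
common.c; robust_time() and its need and tol parameters are hypothetical.

```c
#include <stdlib.h>

/* qsort comparison for doubles, ascending. */
static int cmp_double(const void *a, const void *b)
{
    double x = *(const double *) a;
    double y = *(const double *) b;
    return (x > y) - (x < y);
}

/* Hypothetical sketch, not speed_measure() itself.  Sorts the n readings
   in t[] in place, then returns the smallest reading that has at least
   `need` readings (itself included) no more than `tol` (relative) above
   it.  Readings inflated by an interrupt end up at the top of the sorted
   list, and a lone suspiciously fast reading has nothing agreeing with
   it.  Returns -1.0 if no reading gathers enough agreement. */
double robust_time(double *t, size_t n, size_t need, double tol)
{
    qsort(t, n, sizeof t[0], cmp_double);
    for (size_t i = 0; i < n; i++) {
        size_t agree = 0;
        for (size_t j = i; j < n && t[j] <= t[i] * (1.0 + tol); j++)
            agree++;
        if (agree >= need)
            return t[i];
    }
    return -1.0;
}
```

With readings {5.0, 1.00, 1.01, 9.0, 1.005}, need 3 and tol 0.02 the
result is 1.00; a lone 0.5 among 1.0, 1.005 and 1.01 would be passed over,
since nothing agrees with it.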

The CPU frequency is needed to convert between cycles and seconds, or for
when a cycle counter is supplemented by getrusage() etc. The speed program
will convert as necessary according to the output format requested. The
tune program will work with either cycles or seconds.

freq.c knows how to get the frequency on some systems, or can measure a
cycle counter against gettimeofday() or getrusage(), but when that fails or
needs to be overridden, an environment variable GMP_CPU_FREQUENCY can be
used (in hertz). For example, in "bash" on a 650 MHz machine,

        export GMP_CPU_FREQUENCY=650e6
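
Since the value is a number in hertz, exponent notation like "650e6" works
if it's parsed as a C floating-point number. A minimal sketch assuming
strtod() parsing; parse_frequency() and the other helpers are
hypothetical, not taken from freq.c.

```c
#include <stdlib.h>

/* Hypothetical helper: parse a frequency in hertz.  strtod() accepts
   "650e6" as well as "650000000".  Returns 0.0 for anything that isn't a
   complete, positive number. */
double parse_frequency(const char *s)
{
    char *end;
    double hz;

    if (s == NULL)
        return 0.0;
    hz = strtod(s, &end);
    if (end == s || *end != '\0' || !(hz > 0.0))
        return 0.0;
    return hz;
}

/* Hypothetical: the override taken from the environment, 0.0 if unset. */
double frequency_from_env(void)
{
    return parse_frequency(getenv("GMP_CPU_FREQUENCY"));
}

/* The conversions the frequency makes possible. */
double seconds_to_cycles(double seconds, double hz) { return seconds * hz; }
double cycles_to_seconds(double cycles, double hz) { return cycles / hz; }
```

At 650e6 Hz, 2 seconds is 1.3e9 cycles. Note that a value like "650MHz"
would be rejected by this sketch, since strtod() stops at the "M".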

A high precision time base makes it possible to get accurate measurements in
a shorter time.



EXAMPLE COMPARISONS - VARIOUS

Here are some ideas for things that can be done with the speed program.

There's always going to be a certain amount of overhead in the time
measurements, due to reading the time base, and in the loop that runs a
routine enough times to get a reading of the desired precision. Noop
functions taking various arguments are available to measure this. The
"overhead" printed by the speed program each time in its intro is the "noop"
routine, but note that this is just for information; it isn't deducted from
the times printed.

        ./speed -s 1 noop noop_wxs noop_wxys

To see how many cycles per limb a routine is taking, look at the time
increase when the size increments, using option -D. This avoids fixed
overheads in the measuring. Also, remember many of the assembler routines
have unrolled loops, so it might be necessary to compare times at, say, 16,
32, 48, 64 etc. to see what the unrolled part is taking, as opposed to any
finishing off.

        ./speed -s 16-64 -t 16 -C -D mpn_add_n
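
The -D idea in arithmetic form: the fixed part of each measurement cancels
when the times at two sizes are subtracted. A sketch with a hypothetical
helper; the times in the usage note are invented for illustration.

```c
#include <assert.h>

/* Hypothetical helper showing the idea behind -D: the per-limb cost is
   taken from the increase in time between two sizes, so fixed per-call
   overheads cancel.  Times may be cycles or seconds. */
double per_limb_delta(long n1, double t1, long n2, double t2)
{
    assert(n2 != n1);
    return (t2 - t1) / (double) (n2 - n1);
}
```

If a routine took 100 cycles at 16 limbs and 148 at 32 (invented figures),
per_limb_delta(16, 100.0, 32, 148.0) gives 3.0 cycles/limb, whereas 148/32
would suggest about 4.6, because it still includes the fixed overhead.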

The -C option on its own gives cycles per limb, but is really only useful at
big sizes, where fixed overheads are small compared to the code doing the
real work. Remember, of course, that memory caching and/or page swapping
will affect results at large sizes.

        ./speed -s 500000 -C mpn_add_n

Once a calculation stops fitting in the CPU data cache, it's going to start
taking longer. Exactly where this happens depends on the cache priming in
the measuring routines, and on what sort of "least recently used"
replacement the hardware does. Here's an example for a CPU with a 16 kbyte
L1 data cache and 32-bit limbs, showing a suddenly steeper curve for
mpn_add_n at about 2000 limbs.

        ./speed -s 1-4000 -t 5 -f 1.02 -P foo mpn_add_n
        gnuplot foo.gnuplot
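
A back-of-the-envelope estimate of where that knee should fall, assuming
the cache footprint is simply the number of operands in use times their
size (associativity and other cache contents are ignored).
limbs_fitting_cache() is a hypothetical helper.

```c
/* Hypothetical estimate: how many limbs per operand fit in a data cache
   of cache_bytes when `operands` arrays of that many limbs are in use at
   once.  Ignores associativity and anything else living in the cache. */
long limbs_fitting_cache(long cache_bytes, int limb_bytes, int operands)
{
    return cache_bytes / ((long) limb_bytes * operands);
}
```

For a 16 kbyte cache and 4-byte limbs, counting only the two sources gives
limbs_fitting_cache(16384, 4, 2) = 2048, close to the 2000 or so seen
above, while counting the destination too gives 1365. Which figure applies
depends on the priming and replacement details mentioned above.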

When a routine has an unrolled loop for, say, multiples of 8 limbs and then
an ordinary loop for the remainder, it can happen that it's actually faster
to do an operation on, say, 8 limbs than it is on 7 limbs. The following
draws a graph of mpn_sub_n, to see whether times increase smoothly with
size.

        ./speed -s 1-100 -c -P foo mpn_sub_n
        gnuplot foo.gnuplot
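
For reference, here is the loop shape being described, written in portable
C with 32-bit limbs. The real routines are mostly assembler; this is only a
sketch of an mpn_add_n-style addition unrolled by 8 with a finishing-off
loop, and add_n_unrolled() is a hypothetical name.

```c
#include <stddef.h>
#include <stdint.h>

/* One limb of addition: *rp = a + b + cy (mod 2^32), returns carry out. */
static uint32_t add_limb(uint32_t *rp, uint32_t a, uint32_t b, uint32_t cy)
{
    uint64_t s = (uint64_t) a + b + cy;
    *rp = (uint32_t) s;
    return (uint32_t) (s >> 32);
}

/* Hypothetical mpn_add_n-style routine with 32-bit limbs:
   {rp,n} = {ap,n} + {bp,n}, returning the carry out of the top limb.
   The first loop handles 8 limbs per iteration; the second finishes off
   the remaining n % 8 limbs one at a time. */
uint32_t add_n_unrolled(uint32_t *rp, const uint32_t *ap,
                        const uint32_t *bp, size_t n)
{
    uint32_t cy = 0;
    size_t i = 0;

    for (; i + 8 <= n; i += 8) {            /* unrolled part */
        cy = add_limb(&rp[i + 0], ap[i + 0], bp[i + 0], cy);
        cy = add_limb(&rp[i + 1], ap[i + 1], bp[i + 1], cy);
        cy = add_limb(&rp[i + 2], ap[i + 2], bp[i + 2], cy);
        cy = add_limb(&rp[i + 3], ap[i + 3], bp[i + 3], cy);
        cy = add_limb(&rp[i + 4], ap[i + 4], bp[i + 4], cy);
        cy = add_limb(&rp[i + 5], ap[i + 5], bp[i + 5], cy);
        cy = add_limb(&rp[i + 6], ap[i + 6], bp[i + 6], cy);
        cy = add_limb(&rp[i + 7], ap[i + 7], bp[i + 7], cy);
    }
    for (; i < n; i++)                      /* finishing-off loop */
        cy = add_limb(&rp[i], ap[i], bp[i], cy);
    return cy;
}
```

With n = 8 everything goes through one pass of the unrolled body, while
with n = 7 all seven limbs take the per-limb loop, which is how 7 limbs can
end up costing more than 8.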

If mpn_lshift and mpn_rshift have special case code for shifts by 1, then
a shift by 1 ought to be faster (or at least not slower) than a shift by,
say, 2 bits.

        ./speed -s 1-200 -c mpn_rshift.1 mpn_rshift.2

An mpn_lshift by 1 can be done by mpn_add_n adding a number to itself, so
if the lshift isn't faster there's an obvious improvement possible.

        ./speed -s 1-200 -c mpn_lshift.1 mpn_add_n_self
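
Why an add of a number to itself can stand in for a shift by 1: both
compute 2*u and produce the same carry out of the top limb. A sketch in C
with 32-bit limbs; lshift1() and add_self() are hypothetical stand-ins, not
GMP's mpn_lshift or mpn_add_n.

```c
#include <stddef.h>
#include <stdint.h>

/* Shift {up,n} left by one bit into {rp,n}; return the bit shifted out. */
uint32_t lshift1(uint32_t *rp, const uint32_t *up, size_t n)
{
    uint32_t out = 0;
    for (size_t i = 0; i < n; i++) {
        uint32_t x = up[i];
        rp[i] = (x << 1) | out;   /* bring in the top bit of the limb below */
        out = x >> 31;
    }
    return out;
}

/* Add {up,n} to itself into {rp,n}; return the carry out. */
uint32_t add_self(uint32_t *rp, const uint32_t *up, size_t n)
{
    uint32_t cy = 0;
    for (size_t i = 0; i < n; i++) {
        uint64_t s = (uint64_t) up[i] + up[i] + cy;
        rp[i] = (uint32_t) s;
        cy = (uint32_t) (s >> 32);
    }
    return cy;
}
```

The two loops do the same job, so whichever the CPU runs faster is the
better way to implement a shift by 1.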

On some CPUs (AMD K6 for example) an "in-place" mpn_add_n, where the
destination is one of the sources, is faster than one with a separate
destination. Here's an example to see this. ".1" selects dst==src1 for
mpn_add_n (and mpn_sub_n); for other values see
SPEED_ROUTINE_MPN_BINARY_N_CALL in speed.h.

        ./speed -s 1-200 -c mpn_add_n mpn_add_n.1

The GMP manual points out that divisions by powers of two should be done
using a right shift, because it'll be significantly faster than an actual
division. The following shows by what factor mpn_rshift is faster than
mpn_divrem_1, using division by 32 as an example.

        ./speed -s 10-20 -r mpn_rshift.5 mpn_divrem_1.32
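
What that comparison measures, in miniature: shifting right by 5 bits gives
the same quotient as dividing by 32, and the bits shifted out are the
remainder, but the division does a hardware divide per limb. A sketch with
32-bit limbs; rshift_bits() and divrem_1_by() are hypothetical helpers.
Note that rshift_bits() returns the shifted-out bits right-aligned, whereas
mpn_rshift returns them in the most significant bits of its return value.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Shift {up,n} right by cnt bits (1 <= cnt <= 31) into {rp,n}; return the
   bits shifted out at the bottom, right-aligned (i.e. U mod 2^cnt). */
uint32_t rshift_bits(uint32_t *rp, const uint32_t *up, size_t n,
                     unsigned cnt)
{
    assert(n > 0 && cnt >= 1 && cnt <= 31);
    uint32_t low = up[0] & ((UINT32_C(1) << cnt) - 1);
    for (size_t i = 0; i < n; i++) {
        uint32_t from_above = (i + 1 < n) ? up[i + 1] << (32 - cnt) : 0;
        rp[i] = (up[i] >> cnt) | from_above;
    }
    return low;
}

/* Schoolbook division of {up,n} by a single limb d, most significant limb
   first: quotient to {qp,n}, remainder returned.  One divide per limb. */
uint32_t divrem_1_by(uint32_t *qp, const uint32_t *up, size_t n, uint32_t d)
{
    assert(d != 0);
    uint64_t r = 0;
    for (size_t i = n; i-- > 0; ) {
        uint64_t cur = (r << 32) | up[i];   /* r < d, so this fits */
        qp[i] = (uint32_t) (cur / d);
        r = cur % d;
    }
    return (uint32_t) r;
}
```

Both give the same quotient limbs and the same remainder for a divisor of
32; only the cost differs.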



EXAMPLE COMPARISONS - MULTIPLICATION

mul_basecase takes a ".<r>" parameter which is the first (larger) size
parameter. For example, to show speeds for 20x1 up to 20x15 in cycles,

        ./speed -s 1-15 -c mpn_mul_basecase.20

mul_basecase with no parameter does an NxN multiply, so for example to show
speeds in cycles for 1x1, 2x2, 3x3, etc., up to 20x20,

        ./speed -s 1-20 -c mpn_mul_basecase

sqr_basecase is implemented by a "triangular" method on most CPUs, making it
up to twice as fast as mul_basecase. In practice loop overheads and the
products on the diagonal mean it falls short of this. Here's an example
running the two and showing by what factor an NxN mul_basecase is slower
than an NxN sqr_basecase. (Some versions of sqr_basecase only allow sizes
below SQR_KARATSUBA_THRESHOLD, so if it crashes at that point don't worry.)

        ./speed -s 1-20 -r mpn_sqr_basecase mpn_mul_basecase
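
To make the "triangular" idea concrete, here is a sketch in C with 32-bit
limbs. A plain NxN schoolbook multiply forms N*N limb products, while the
triangular square forms each cross product up[i]*up[j] (i < j) once,
doubles the total, and adds the N diagonal squares, for N*(N+1)/2
products. Both functions are hypothetical illustrations, not GMP's
mpn_mul_basecase or mpn_sqr_basecase.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* {rp, un+vn} = {up,un} * {vp,vn}, schoolbook: un*vn limb products. */
void mul_basecase(uint32_t *rp, const uint32_t *up, size_t un,
                  const uint32_t *vp, size_t vn)
{
    memset(rp, 0, (un + vn) * sizeof *rp);
    for (size_t j = 0; j < vn; j++) {
        uint64_t cy = 0;
        for (size_t i = 0; i < un; i++) {
            uint64_t t = (uint64_t) up[i] * vp[j] + rp[i + j] + cy;
            rp[i + j] = (uint32_t) t;
            cy = t >> 32;
        }
        rp[un + j] = (uint32_t) cy;   /* this limb was still zero */
    }
}

/* {rp, 2n} = {up,n}^2 by the triangular method. */
void sqr_basecase(uint32_t *rp, const uint32_t *up, size_t n)
{
    /* 1. Cross products up[i]*up[j], i < j: n(n-1)/2 limb products. */
    memset(rp, 0, 2 * n * sizeof *rp);
    for (size_t i = 0; i + 1 < n; i++) {
        uint64_t cy = 0;
        for (size_t j = i + 1; j < n; j++) {
            uint64_t t = (uint64_t) up[i] * up[j] + rp[i + j] + cy;
            rp[i + j] = (uint32_t) t;
            cy = t >> 32;
        }
        rp[i + n] = (uint32_t) cy;
    }

    /* 2. Double the cross products with a one-bit left shift. */
    uint32_t bit = 0;
    for (size_t k = 0; k < 2 * n; k++) {
        uint32_t x = rp[k];
        rp[k] = (x << 1) | bit;
        bit = x >> 31;
    }

    /* 3. Add the n diagonal squares up[i]^2 at limb position 2i. */
    uint64_t cy = 0;
    for (size_t i = 0; i < n; i++) {
        uint64_t sq = (uint64_t) up[i] * up[i];
        uint64_t lo = (uint64_t) rp[2 * i] + (uint32_t) sq + cy;
        rp[2 * i] = (uint32_t) lo;
        uint64_t hi = (uint64_t) rp[2 * i + 1] + (sq >> 32) + (lo >> 32);
        rp[2 * i + 1] = (uint32_t) hi;
        cy = hi >> 32;
    }
}
```

The doubling pass and the diagonal pass are the kind of extra work that
keeps the real speedup below a factor of 2.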

The technique described above with -CD for showing the time difference in
cycles per limb between two operation sizes can be applied to an NxN
mul_basecase, using -E to change the basis for the size increment to N*N.
For instance a 20x20 operation is taken to be doing 400 limbs, and a 16x16
doing 256 limbs. The following therefore shows the per-crossproduct speed
of mul_basecase and sqr_basecase at around 20x20 limbs.

        ./speed -s 16-20 -t 4 -CDE mpn_mul_basecase mpn_sqr_basecase

Of course sqr_basecase isn't really doing NxN crossproducts, but it can be
interesting to compare it to mul_basecase as if it were. For sqr_basecase
the -F option can be used to base the deltas on N*(N+1)/2 operations, which
is the number of triangular products sqr_basecase does. For example,

        ./speed -s 16-20 -t 4 -CDF mpn_sqr_basecase
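
The bases that -E and -F use, as described above, and the resulting delta,
in a few lines of C. The helpers are hypothetical and only do the
arithmetic; the times in the usage note are invented.

```c
/* Hypothetical helpers for the -E and -F size bases. */
long basis_e(long n) { return n * n; }             /* NxN crossproducts */
long basis_f(long n) { return n * (n + 1) / 2; }   /* triangular products */

/* Time increase per unit of the chosen basis between sizes n1 and n2. */
double delta_per_unit(long (*basis)(long), long n1, double t1,
                      long n2, double t2)
{
    return (t2 - t1) / (double) (basis(n2) - basis(n1));
}
```

Between 16x16 and 20x20 the -E basis grows by 400 - 256 = 144 and the -F
basis by 210 - 136 = 74, so an increase of 576 cycles, say, works out to
4 cycles per crossproduct under -E.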

Both -E and -F are preliminary and might change. A consistent approach to
using them when claiming certain per-crossproduct or per-triangular-product
speeds hasn't really been established, but the increment between speeds in
the range karatsuba will call seems sensible, that being k/2 to k. For
instance, if the karatsuba threshold was 20 for the multiply and 30 for the
square,

        ./speed -s 10-20 -t 10 -CDE mpn_mul_basecase
        ./speed -s 15-30 -t 15 -CDF mpn_sqr_basecase

Two versions of toom3 interpolation and evaluation are available in
mpn/generic/mul_n.c, using either a one-pass open-coded style or simple mpn
subroutine calls. The former is used on RISCs with lots of registers, the
latter on other CPUs. The two can be compared directly to check which is
best. Naturally it's the sizes where toom3 is faster than karatsuba that
are of interest.

        ./speed -s 80-120 -c mpn_toom3_mul_n_mpn mpn_toom3_mul_n_open
        ./speed -s 80-120 -c mpn_toom3_sqr_n_mpn mpn_toom3_sqr_n_open



EXAMPLE COMPARISONS - MALLOC

The GMP manual recommends that application programs avoid excessive
initializing and clearing of mpz_t variables (and mpq_t and mpf_t too).
Every new variable will at a minimum go through an init, a realloc for its
first store, and finally a clear. Quite how long that takes depends on the
C library. The following compares an mpz_init/realloc/clear to a 10-limb
mpz_add. Don't be surprised if the mallocing is quite slow.

        ./speed -s 10 -c mpz_init_realloc_clear mpz_add

On some systems malloc and free are much slower when dynamically linked.
The speed-dynamic program can be used to see this. For example, the
following measures malloc/free, first static then dynamic.

        ./speed -s 10 -c malloc_free
        ./speed-dynamic -s 10 -c malloc_free

Of course a real-world program has big problems if it's doing so many
mallocs and frees that it gets slowed down by a dynamically linked malloc.



EXAMPLE COMPARISONS - STRING CONVERSIONS

mpn_get_str does a binary to string conversion. The base is specified with
a ".<r>" parameter, or decimal by default. Power-of-2 bases are much faster
than general bases. The following compares decimal and hex, for instance.

        ./speed -s 1-20 -c mpn_get_str mpn_get_str.16

Smaller bases need more divisions to split a given size number, and so are
slower. The following compares base 3 and base 9. On small operands base 9
will be nearly twice as fast, though at bigger sizes this advantage
shrinks, since in the current implementation both divide repeatedly by
3^20 (or 3^40 for 64-bit limbs) and those divisions come to dominate.

        ./speed -s 1-20 -cr mpn_get_str.3 mpn_get_str.9
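
The 3^20 and 3^40 figures are the largest powers of 3 that fit in a 32-bit
or 64-bit limb, and 9^10 equals 3^20, so base 9 reaches the same divisor
with half as many digits. A sketch that computes this exponent for any
base; digits_per_limb() is a hypothetical helper.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: the largest e such that base^e fits in a limb of
   `bits` bits, i.e. how many base-`base` digits one division by base^e
   peels off. */
int digits_per_limb(unsigned base, int bits)
{
    assert(base >= 2 && bits >= 1 && bits <= 64);
    uint64_t max = (bits == 64) ? UINT64_MAX : ((UINT64_C(1) << bits) - 1);
    uint64_t p = 1;
    int e = 0;

    while (p <= max / base) {   /* p * base still fits in the limb */
        p *= base;
        e++;
    }
    return e;
}
```

This gives 20 and 40 for base 3, 10 for base 9 with 32-bit limbs, and 9 or
19 for decimal. For bases 3 and 9 the big divisions are therefore the same;
only the work of splitting each chunk into digits differs.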

mpn_set_str does a string to binary conversion. The base is specified with
a ".<r>" parameter, or decimal by default. Power-of-2 bases are faster than
general bases on large conversions.

        ./speed -s 1-512 -f 2 -c mpn_set_str.8 mpn_set_str.10

mpn_set_str also has some special case code for decimal which is a bit
faster than the general case, basically by giving the compiler a chance to
optimize some multiplications by 10.

        ./speed -s 20-40 -c mpn_set_str.9 mpn_set_str.10 mpn_set_str.11
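
A single-limb sketch of the approaches: a general base needs a multiply per
digit, while a power-of-2 base 2^k only shifts k more bits in, and a
constant base of 10 is the case a compiler can strength-reduce. All three
functions are hypothetical and ignore overflow and input validation;
presumably it's at multi-limb sizes, where the general case turns into
multi-limb multiplications, that power-of-2 bases pull clearly ahead.

```c
#include <stdint.h>

/* Value of a digit character '0'-'9' or 'a'-'z' (no validation). */
static unsigned digit_value(char c)
{
    return (c >= '0' && c <= '9') ? (unsigned) (c - '0')
                                  : (unsigned) (c - 'a' + 10);
}

/* General base: multiply-accumulate, v = v*base + digit. */
uint64_t set_str_general(const char *s, unsigned base)
{
    uint64_t v = 0;
    for (; *s; s++)
        v = v * base + digit_value(*s);
    return v;
}

/* Base 2^k: each digit is just k more bits, so shift and OR. */
uint64_t set_str_pow2(const char *s, unsigned k)
{
    uint64_t v = 0;
    for (; *s; s++)
        v = (v << k) | digit_value(*s);
    return v;
}

/* Decimal with the base as a constant: the compiler is free to turn
   v * 10 into shifts and adds. */
uint64_t set_str_decimal(const char *s)
{
    uint64_t v = 0;
    for (; *s; s++)
        v = v * 10 + (uint64_t) (*s - '0');
    return v;
}
```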



EXAMPLE COMPARISONS - GCDs

mpn_gcd_1 has a threshold for when to reduce using an initial x%y when both
x and y are single limbs. This isn't tuned currently, but a value can be
established by a measurement like

        ./speed -s 10-32 mpn_gcd_1.10

This runs x (src[0]) from 10 to 32 bits, with y fixed at 10 bits. If the
division threshold is high, say 31, so that it's effectively disabled, then
a 32x10 bit gcd is done by nibbling away at the 32-bit operands bit by bit.
When the threshold is small, say 1 bit, an initial x%y is done to reduce it
to a 10x10 bit operation.
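
A sketch of that trade-off for single 32-bit limbs, assuming the threshold
is compared against the difference in bit lengths: above it, one x%y brings
the operands to similar sizes; otherwise the binary (Stein) algorithm
nibbles away at low bits. gcd_1_sketch() and its div_threshold parameter
are hypothetical, and this is not the code in gcd_1.c.

```c
#include <stdint.h>

/* Number of significant bits in x (0 for x == 0). */
static int bit_length(uint32_t x)
{
    int n = 0;
    while (x) { n++; x >>= 1; }
    return n;
}

/* Hypothetical sketch: gcd of two nonzero 32-bit values.  If x has more
   than div_threshold bits more than y, reduce x to x % y with one
   division first; then finish with the binary algorithm. */
uint32_t gcd_1_sketch(uint32_t x, uint32_t y, int div_threshold)
{
    if (x < y) { uint32_t t = x; x = y; y = t; }
    if (bit_length(x) - bit_length(y) > div_threshold) {
        x %= y;
        if (x == 0)
            return y;
    }

    /* Binary gcd: remove common factors of 2, then subtract odd values. */
    int shift = 0;
    while (((x | y) & 1) == 0) { x >>= 1; y >>= 1; shift++; }
    while ((x & 1) == 0)
        x >>= 1;
    while (y != 0) {
        while ((y & 1) == 0)
            y >>= 1;
        if (x > y) { uint32_t t = x; x = y; y = t; }
        y -= x;   /* both odd, so the difference is even */
    }
    return x << shift;
}
```

Timing this with a threshold of 1 and of 31 on a 32x10 bit pair shows the
same kind of comparison the speed measurement above makes.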

The threshold in mpn/generic/gcd_1.c or the various assembler
implementations can be tweaked up or down until there are no more speedups
on interesting combinations of sizes. Note that this affects only a 1x1
limb operation and so isn't very important. (An Nx1 limb operation always
does an initial modular reduction, using mpn_mod_1 or mpn_modexact_1_odd.)



SPEED PROGRAM EXTENSIONS

Potentially lots of things could be made available in the program, but it's
been left at only the things that have actually been wanted and are likely
to be reasonably useful in the future.

Extensions should be fairly easy to make though. speed-ext.c is an example,
in a style that should suit one-off tests, or new code fragments under
development.

many.pl is a script for generating a new speed program supplemented with
alternate versions of the standard routines. It can be used for measuring
experimental code, or for comparing different implementations that exist
within a CPU family.



THRESHOLD EXAMINING

The speed program can be used to examine the speeds of different algorithms
to check that the tune program has done the right thing. For example, to
examine the karatsuba multiply threshold,

        ./speed -s 5-40 mpn_mul_basecase mpn_kara_mul_n

When examining the toom3 threshold, remember it depends on the karatsuba
threshold, so the right karatsuba threshold needs to be compiled into the
library first. The tune program uses specially recompiled versions of
mpn/mul_n.c etc. for this reason, but the speed program simply uses the
normal libgmp.la.

Note further that the various routines may recurse into themselves on sizes
far enough above applicable thresholds. For example, mpn_kara_mul_n will
recurse into itself on sizes greater than twice the compiled-in
MUL_KARATSUBA_THRESHOLD.

When doing the above comparison between mul_basecase and kara_mul_n, what's
probably of interest is mul_basecase versus a kara_mul_n that does one level
of Karatsuba and then calls mul_basecase, but this only happens on sizes
less than twice the compiled-in MUL_KARATSUBA_THRESHOLD. A larger value for
that setting can be compiled in to avoid the problem if necessary. The same
applies to toom3 and DC, though in a trickier fashion.
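
A simplified model of that recursion, assuming each level halves the size
(rounded down) and anything below the threshold goes to basecase.
kara_levels() is hypothetical, and real code handles odd sizes and the
exact boundary somewhat differently.

```c
/* Hypothetical model: number of Karatsuba levels applied to an n-limb
   multiply before the pieces drop below `threshold` and go to basecase. */
int kara_levels(long n, long threshold)
{
    int levels = 0;
    while (n >= threshold) {
        levels++;
        n /= 2;   /* simplification: pieces of n/2 limbs, rounded down */
    }
    return levels;
}
```

With a threshold of 20, sizes 20 to 39 give one level, 40 gives two and 80
gives three, so only the range up to just under twice the threshold pits a
single level of Karatsuba directly against mul_basecase.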

There are some upper limits on some of the thresholds, arising from arrays
dimensioned according to a threshold (mpn_mul_n), or asm code with certain
sized displacements (some x86 versions of sqr_basecase). So putting huge
values for the thresholds, even just for testing, may fail.



FUTURE

Make a program to check the time base is working properly, for small and
large measurements. Make it able to test each available method, including
perhaps the apparent resolution of each.

Make a general mechanism for specifying operand overlap, and a syntax like
maybe "mpn_add_n.dst=src2" to select it. Some measuring routines do this
sort of thing with the "r" parameter currently.



----------------
Local variables:
mode: text
fill-column: 76
End: