source: trunk/third/gcc/PROJECTS @ 8834

Revision 8834, 18.1 KB checked in by ghudson, 28 years ago (diff)
This commit was generated by cvs2svn to compensate for changes in r8833, which included commits to RCS files with non-trunk default branches.
Line 
10. Improved efficiency.
2
3* Parse and output array initializers an element at a time, freeing
4storage after each, instead of parsing the whole initializer first and
5then outputting.  This would reduce memory usage for large
6initializers.
7
8* See if the techniques describe in Oct 1991 SIGPLAN Notices
9(Frazer and Hanson) are applicable to GCC.
10
111. Better optimization.
12
13* Constants in unused inline functions
14
15It would be nice to delay output of string constants so that string
16constants mentioned in unused inline functions are never generated.
17Perhaps this would also take care of string constants in dead code.
18
19The difficulty is in finding a clean way for the RTL which refers
20to the constant (currently, only by an assembler symbol name)
21to point to the constant and cause it to be output.
22
23* More cse
24
25The techniques for doing full global cse are described in the red
26dragon book, or (a different version) in Frederick Chow's thesis from
27Stanford.  It is likely to be slow and use a lot of memory, but it
28might be worth offering as an additional option.
29
30It is probably possible to extend cse to a few very frequent cases
31without so much expense.
32
33For example, it is not very hard to handle cse through if-then
34statements with no else clauses.  Here's how to do it.  On reaching a
35label, notice that the label's use-count is 1 and that the last
36preceding jump jumps conditionally to this label.  Now you know it
37is a simple if-then statement.  Remove from the hash table
38all the expressions that were entered since that jump insn
39and you can continue with cse.
40
41It is probably not hard to handle cse from the end of a loop
42around to the beginning, and a few loops would be greatly sped
43up by this.
44
45* Optimize a sequence of if statements whose conditions are exclusive.
46
47It is possible to optimize
48
49    if (x == 1) ...;
50    if (x == 2) ...;
51    if (x == 3) ...;
52
53into
54
55    if (x == 1) ...;
56    else if (x == 2) ...;
57    else if (x == 3) ...;
58
59provided that x is not altered by the contents of the if statements.
60
61It's not certain whether this is worth doing.  Perhaps programmers
62nearly always write the else's themselves, leaving few opportunities
63to improve anything.
64
65* Un-cse.
66
67Perhaps we should have an un-cse step right after cse, which tries to
68replace a reg with its value if the value can be substituted for the
69reg everywhere, if that looks like an improvement.  Which is if the
70reg is used only a few times.  Use rtx_cost to determine if the
71change is really an improvement.
72
73* Clean up how cse works.
74
75The scheme is that each value has just one hash entry.  The
76first_same_value and next_same_value chains are no longer needed.
77
78For arithmetic, each hash table elt has the following slots:
79
80* Operation.  This is an rtx code.
81* Mode.
82* Operands 0, 1 and 2.  These point to other hash table elements.
83
84So, if we want to enter (PLUS:SI (REG:SI 30) (CONST_INT 104)), we
85first enter (CONST_INT 104) and find the entry that (REG:SI 30) now
86points to.  Then we put these elts into operands 0 and 1 of a new elt.
87We put PLUS and SI into the new elt.
88
89Registers and mem refs would never be entered into the table as such.
90However, the values they contain would be entered.  There would be a
91table indexed by regno which points at the hash entry for the value in
92that reg.
93
94The hash entry index now plays the role of a qty number.
95We still need qty_first_reg, reg_next_eqv, etc. to record which regs
96share a particular qty.
97
98When a reg is used whose contents are unknown, we need to create a
99hash table entry whose contents say "unknown", as a place holder for
100whatever the reg contains.  If that reg is added to something, then
101the hash entry for the sum will refer to the "unknown" entry.  Use
102UNKNOWN for the rtx code in this entry.  This replaces make_new_qty.
103
104For a constant, a unique hash entry would be made based on the
105value of the constant.
106
107What about MEM?  Each time a memory address is referenced, we need a
108qty (a hash table elt) to represent what is in it.  (Just as for a
109register.)  If this isn't known, create one, just as for a reg whose
110contents are unknown.
111
112We need a way to find all mem refs that still contain a certain value.
113Do this with a chain of hash elts (for memory addresses) that point to
114locations that hold the value.  The hash elt for the value itself should
115point to the start of the chain.  It would be good for the hash elt
116for an address to point to the hash elt for the contents of that address
117(but this ptr can be null if the contents have never been entered).
118
119With this data structure, nothing need ever be invalidated except
120the lists of which regs or mems hold a particular value.  It is easy
121to see if there is a reg or mem that is equiv to a particular value.
122If the value is constant, it is always explicitly constant.
123
124* Support more general tail-recursion among different functions.
125
126This might be possible under certain circumstances, such as when
127the argument lists of the functions have the same lengths.
128Perhaps it could be done with a special declaration.
129
130You would need to verify in the calling function that it does not
131use the addresses of any local variables and does not use setjmp.
132
133* Put short statics vars at low addresses and use short addressing mode?
134
135Useful on the 68000/68020 and perhaps on the 32000 series,
136provided one has a linker that works with the feature.
137This is said to make a 15% speedup on the 68000.
138
139* Keep global variables in registers.
140
141Here is a scheme for doing this.  A global variable, or a local variable
142whose address is taken, can be kept in a register for an entire function
143if it does not use non-constant memory addresses and (for globals only)
144does not call other functions.  If the entire function does not meet
145this criterion, a loop may.
146
147The VAR_DECL for such a variable would have to have two RTL expressions:
148the true home in memory, and the pseudo-register used temporarily.
149It is necessary to emit insns to copy the memory location into the
150pseudo-register at the beginning of the function or loop, and perhaps
151back out at the end.  These insns should have REG_EQUIV notes so that,
152if the pseudo-register does not get a hard register, it is spilled into
153the memory location which exists in any case.
154
155The easiest way to set up these insns is to modify the routine
156put_var_into_stack so that it does not apply to the entire function
157(sparing any loops which contain nothing dangerous) and to call it at
158the end of the function regardless of where in the function the
159address of a local variable is taken.  It would be called
160unconditionally at the end of the function for all relevant global
161variables.
162
163For debugger output, the thing to do is to invent a new binding level
164around the appropriate loop and define the variable name as a register
165variable with that scope.
166
167* Live-range splitting.
168
169Currently a variable is allocated a hard register either for the full
170extent of its use or not at all.  Sometimes it would be good to
171allocate a variable a hard register for just part of a function; for
172example, through a particular loop where the variable is mostly used,
173or outside of a particular loop where the variable is not used.  (The
174latter is nice because it might let the variable be in a register most
175of the time even though the loop needs all the registers.)
176
177It might not be very hard to do this in global.c when a variable
178fails to get a hard register for its entire life span.
179
180The first step is to find a loop in which the variable is live, but
181which is not the whole life span or nearly so.  It's probably best to
182use a loop in which the variable is heavily used.
183
184Then create a new pseudo-register to represent the variable in that loop.
185Substitute this for the old pseudo-register there, and insert move insns
186to copy between the two at the loop entry and all exits.  (When several
187such moves are inserted at the same place, some new feature should be
188added to say that none of those registers conflict merely because of
189overlap between the new moves.  And the reload pass should reorder them
190so that a store precedes a load, for any given hard register.)
191
192After doing this for all the reasonable candidates, run global-alloc
193over again.  With luck, one of the two pseudo-registers will be fit
194somewhere.  It may even have a much higher priority due to its reduced
195life span.
196
197There will be no room in general for the new pseudo-registers in
198basic_block_live_at_start, so there will need to be a second such
199matrix exclusively for the new ones.  Various other vectors indexed by
200register number will have to be made bigger, or there will have to be
201secondary extender vectors just for global-alloc.
202
203A simple new feature could arrange that both pseudo-registers get the
204same stack slot if they both fail to get hard registers.
205
206Other compilers split live ranges when they are not connected, or
207try to split off pieces `at the edge'.  I think splitting around loops
208will provide more speedup.
209
210Creating a fake binding block and a new like-named variable with
211shorter life span and different address might succeed in describing
212this technique for the debugger.
213
214* Detect dead stores into memory?
215
216A store into memory is dead if it is followed by another store into
217the same location; and, in between, there is no reference to anything
218that might be that location (including no reference to a variable
219address).
220
221* Loop optimization.
222
223Strength reduction and iteration variable elimination could be
224smarter.  They should know how to decide which iteration variables are
225not worth making explicit because they can be computed as part of an
226address calculation.  Based on this information, they should decide
227when it is desirable to eliminate one iteration variable and create
228another in its place.
229
230It should be possible to compute what the value of an iteration
231variable will be at the end of the loop, and eliminate the variable
232within the loop by computing that value at the loop end.
233
234When a loop has a simple increment that adds 1,
235instead of jumping in after the increment,
236decrement the loop count and jump to the increment.
237This allows aob insns to be used.
238
239* Using constraints on values.
240
241Many operations could be simplified based on knowledge of the
242minimum and maximum possible values of a register at any particular time.
243These limits could come from the data types in the tree, via rtl generation,
244or they can be deduced from operations that are performed.  For example,
245the result of an `and' operation one of whose operands is 7 must be in
246the range 0 to 7.  Compare instructions also tell something about the
247possible values of the operand, in the code beyond the test.
248
249Value constraints can be used to determine the results of a further
250comparison.  They can also indicate that certain `and' operations are
251redundant.  Constraints might permit a decrement and branch
252instruction that checks zeroness to be used when the user has
253specified to exit if negative.
254
255* Smarter reload pass.
256
257The reload pass as currently written can reload values only into registers
258that are reserved for reloading.  This means that in order to use a
259register for reloading it must spill everything out of that register.
260
261It would be straightforward, though complicated, for reload1.c to keep
262track, during its scan, of which hard registers were available at each
263point in the function, and use for reloading even registers that were
264free only at the point they were needed.  This would avoid much spilling
265and make better code.
266
267* Change the type of a variable.
268
269Sometimes a variable is declared as `int', it is assigned only once
270from a value of type `char', and then it is used only by comparison
271against constants.  On many machines, better code would result if
272the variable had type `char'.  If the compiler could detect this
273case, it could change the declaration of the variable and change
274all the places that use it.
275
276* Better handling for very sparse switches.
277
278There may be cases where it would be better to compile a switch
279statement to use a fixed hash table rather than the current
280combination of jump tables and binary search.
281
282* Order of subexpressions.
283
284It might be possible to make better code by paying attention
285to the order in which to generate code for subexpressions of an expression.
286
287* More code motion.
288
289Consider hoisting common code up past conditional branches or
290tablejumps.
291
292* Trace scheduling.
293
294This technique is said to be able to figure out which way a jump
295will usually go, and rearrange the code to make that path the
296faster one.
297
298* Distributive law.
299
300The C expression *(X + 4 * (Y + C)) compiles better on certain
301machines if rewritten as *(X + 4*C + 4*Y) because of known addressing
302modes.  It may be tricky to determine when, and for which machines, to
303use each alternative.
304
305Some work has been done on this, in combine.c.
306
307* Can optimize by changing if (x) y; else z; into z; if (x) y;
308if z and x do not interfere and z has no effects not undone by y.
309This is desirable if z is faster than jumping.
310
311* For a two-insn loop on the 68020, such as
312  foo:  movb    a2@+,a3@+
313        jne     foo
314it is better to insert dbeq d0,foo before the jne.
315d0 can be a junk register.  The challenge is to fit this into
316a portable framework: when can you detect this situation and
317still be able to allocate a junk register?
318
3192. Simpler porting.
320
321Right now, describing the target machine's instructions is done
322cleanly, but describing its addressing mode is done with several
323ad-hoc macro definitions.  Porting would be much easier if there were
324an RTL description for addressing modes like that for instructions.
325Tools analogous to genflags and genrecog would generate macros from
326this description.
327
328There would be one pattern in the address-description file for each
329kind of addressing, and this pattern would have:
330
331  * the RTL expression for the address
332  * C code to verify its validity (since that may depend on
333    the exact data).
334  * C code to print the address in assembler language.
335  * C code to convert the address into a valid one, if it is not valid.
336    (This would replace LEGITIMIZE_ADDRESS).
337  * Register constraints for all indeterminates that appear
338    in the RTL expression.
339
3403. Other languages.
341
342Front ends for Pascal, Fortran, Algol, Cobol, Modula-2 and Ada are
343desirable.
344
345Pascal, Modula-2 and Ada require the implementation of functions
346within functions.  Some of the mechanisms for this already exist.
347
3484. More extensions.
349
350* Generated unique labels.  Have some way of generating distinct labels
351for use in extended asm statements.  I don't know what a good syntax would
352be.
353
354* A way of defining a structure containing a union, in which the choice of
355union alternative is controlled by a previous structure component.
356
357Here is a possible syntax for this.
358
359struct foo {
360  enum { INT, DOUBLE } code;
361  auto union { case INT: int i; case DOUBLE: double d;} value : code;
362};
363
364* Allow constructor expressions as lvalues, like this:
365
366        (struct foo) {a, b, c} = foo();
367
368This would call foo, which returns a structure, and then store the
369several components of the structure into the variables a, b, and c.
370
3715. Generalize the machine model.
372
373* Some new compiler features may be needed to do a good job on machines
374where static data needs to be addressed using base registers.
375
376* Some machines have two stacks in different areas of memory, one used
377for scalars and another for large objects.  The compiler does not
378now have a way to understand this.
379
3806. Useful warnings.
381
382* Warn about statements that are undefined because the order of
383evaluation of increment operators makes a big difference.  Here is an
384example:
385
386    *foo++ = hack (*foo);
387
3887. Better documentation of how GCC works and how to port it.
389
390Here is an outline proposed by Allan Adler.
391
392I.    Overview of this document
393II.   The machines on which GCC is implemented
394    A. Prose description of those characteristics of target machines and
395       their operating systems which are pertinent to the implementation
396       of GCC.
397        i. target machine characteristics
398        ii. comparison of this system of machine characteristics with
399            other systems of machine specification currently in use
400    B. Tables of the characteristics of the target machines on which
401       GCC is implemented.
402    C. A priori restrictions on the values of characteristics of target
403       machines, with special reference to those parts of the source code
404       which entail those restrictions
405        i. restrictions on individual characteristics
406        ii. restrictions involving relations between various characteristics
407    D. The use of GCC as a cross-compiler
408        i. cross-compilation to existing machines
409        ii. cross-compilation to non-existent machines
410    E. Assumptions which are made regarding the target machine
411        i.  assumptions regarding the architecture of the target machine
412        ii. assumptions regarding the operating system of the target machine
413        iii. assumptions regarding software resident on the target machine
414        iv. where in the source code these assumptions are in effect made
415III.   A systematic approach to writing the files tm.h and xm.h
416    A. Macros which require special care or skill
417    B. Examples, with special reference to the underlying reasoning
418IV.    A systematic approach to writing the machine description file md
419    A. Minimal viable sets of insn descriptions
420    B. Examples, with special reference to the underlying reasoning
421V.     Uses of the file aux-output.c
422VI.    Specification of what constitutes correct performance of an
423       implementation of GCC
424    A. The components of GCC
425    B. The itinerary of a C program through GCC
426    C. A system of benchmark programs
427    D. What your RTL and assembler should look like with these benchmarks
428    E. Fine tuning for speed and size of compiled code
429VII.   A systematic procedure for debugging an implementation of GCC
430    A. Use of GDB
431        i. the macros in the file .gdbinit for GCC
432        ii. obstacles to the use of GDB
433            a. functions implemented as macros can't be called in GDB
434    B. Debugging without GDB
435        i. How to turn off the normal operation of GCC and access specific
436           parts of GCC
437    C. Debugging tools
438    D. Debugging the parser
439        i. how machine macros and insn definitions affect the parser
440    E. Debugging the recognizer
441        i. how machine macros and insn definitions affect the recognizer
442
443ditto for other components
444
445VIII. Data types used by GCC, with special reference to restrictions not
446      specified in the formal definition of the data type
447IX.   References to the literature for the algorithms used in GCC
448
Note: See TracBrowser for help on using the repository browser.