PROJECTS @ 8834

Revision 8834, 18.1 KB checked in by ghudson, 28 years ago
This commit was generated by cvs2svn to compensate for changes in r8833, which included commits to RCS files with non-trunk default branches.

Rev	Line
[8833]	1	0. Improved efficiency.
	2
	3	* Parse and output array initializers an element at a time, freeing
	4	storage after each, instead of parsing the whole initializer first and
	5	then outputting. This would reduce memory usage for large
	6	initializers.
	7
	8	* See if the techniques described in Oct 1991 SIGPLAN Notices
	9	(Fraser and Hanson) are applicable to GCC.
	10
	11	1. Better optimization.
	12
	13	* Constants in unused inline functions
	14
	15	It would be nice to delay output of string constants so that string
	16	constants mentioned in unused inline functions are never generated.
	17	Perhaps this would also take care of string constants in dead code.
	18
	19	The difficulty is in finding a clean way for the RTL which refers
	20	to the constant (currently, only by an assembler symbol name)
	21	to point to the constant and cause it to be output.
	22
	23	* More cse
	24
	25	The techniques for doing full global cse are described in the red
	26	dragon book, or (a different version) in Frederick Chow's thesis from
	27	Stanford. It is likely to be slow and use a lot of memory, but it
	28	might be worth offering as an additional option.
	29
	30	It is probably possible to extend cse to a few very frequent cases
	31	without so much expense.
	32
	33	For example, it is not very hard to handle cse through if-then
	34	statements with no else clauses. Here's how to do it. On reaching a
	35	label, notice that the label's use-count is 1 and that the last
	36	preceding jump jumps conditionally to this label. Now you know it
	37	is a simple if-then statement. Remove from the hash table
	38	all the expressions that were entered since that jump insn
	39	and you can continue with cse.
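
A minimal sketch of the if-then case just described, assuming a toy
table that stands in for the real cse hash table.  The names
avail_enter, avail_mark and avail_rollback are hypothetical, and the
sketch ignores the invalidation that assignments inside the body
would cause.

```c
#include <stddef.h>
#include <string.h>

#define MAX_EXPRS 64

/* Toy stand-in for the cse hash table: a stack of available expressions. */
struct avail_table {
    const char *exprs[MAX_EXPRS];
    size_t count;
};

/* Record an expression as available; returns 0 if the table is full. */
static int avail_enter(struct avail_table *t, const char *expr)
{
    if (t->count == MAX_EXPRS)
        return 0;
    t->exprs[t->count++] = expr;
    return 1;
}

/* Is the expression currently available? */
static int avail_lookup(const struct avail_table *t, const char *expr)
{
    for (size_t i = 0; i < t->count; i++)
        if (strcmp(t->exprs[i], expr) == 0)
            return 1;
    return 0;
}

/* At the conditional jump: remember how many entries exist right now. */
static size_t avail_mark(const struct avail_table *t)
{
    return t->count;
}

/* At the label that ends a simple if-then: drop everything entered
   since the jump, so only expressions valid on both paths remain. */
static void avail_rollback(struct avail_table *t, size_t mark)
{
    if (mark < t->count)
        t->count = mark;
}
```

Expressions entered before the jump stay available after the label;
those computed only inside the if-body do not.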
	40
	41	It is probably not hard to handle cse from the end of a loop
	42	around to the beginning, and a few loops would be greatly sped
	43	up by this.
	44
	45	* Optimize a sequence of if statements whose conditions are exclusive.
	46
	47	It is possible to optimize
	48
	49	if (x == 1) ...;
	50	if (x == 2) ...;
	51	if (x == 3) ...;
	52
	53	into
	54
	55	if (x == 1) ...;
	56	else if (x == 2) ...;
	57	else if (x == 3) ...;
	58
	59	provided that x is not altered by the contents of the if statements.
	60
	61	It's not certain whether this is worth doing. Perhaps programmers
	62	nearly always write the else's themselves, leaving few opportunities
	63	to improve anything.
	64
	65	* Un-cse.
	66
	67	Perhaps we should have an un-cse step right after cse, which tries to
	68	replace a reg with its value if the value can be substituted for the
	69	reg everywhere and that looks like an improvement.  This is likely
	70	when the reg is used only a few times.  Use rtx_cost to determine if
	71	the change is really an improvement.
	72
	73	* Clean up how cse works.
	74
	75	The scheme is that each value has just one hash entry. The
	76	first_same_value and next_same_value chains are no longer needed.
	77
	78	For arithmetic, each hash table elt has the following slots:
	79
	80	* Operation. This is an rtx code.
	81	* Mode.
	82	* Operands 0, 1 and 2. These point to other hash table elements.
	83
	84	So, if we want to enter (PLUS:SI (REG:SI 30) (CONST_INT 104)), we
	85	first enter (CONST_INT 104) and find the entry that (REG:SI 30) now
	86	points to. Then we put these elts into operands 0 and 1 of a new elt.
	87	We put PLUS and SI into the new elt.
	88
	89	Registers and mem refs would never be entered into the table as such.
	90	However, the values they contain would be entered. There would be a
	91	table indexed by regno which points at the hash entry for the value in
	92	that reg.
	93
	94	The hash entry index now plays the role of a qty number.
	95	We still need qty_first_reg, reg_next_eqv, etc. to record which regs
	96	share a particular qty.
	97
	98	When a reg is used whose contents are unknown, we need to create a
	99	hash table entry whose contents say "unknown", as a place holder for
	100	whatever the reg contains. If that reg is added to something, then
	101	the hash entry for the sum will refer to the "unknown" entry. Use
	102	UNKNOWN for the rtx code in this entry. This replaces make_new_qty.
	103
	104	For a constant, a unique hash entry would be made based on the
	105	value of the constant.
	106
	107	What about MEM? Each time a memory address is referenced, we need a
	108	qty (a hash table elt) to represent what is in it. (Just as for a
	109	register.) If this isn't known, create one, just as for a reg whose
	110	contents are unknown.
	111
	112	We need a way to find all mem refs that still contain a certain value.
	113	Do this with a chain of hash elts (for memory addresses) that point to
	114	locations that hold the value. The hash elt for the value itself should
	115	point to the start of the chain. It would be good for the hash elt
	116	for an address to point to the hash elt for the contents of that address
	117	(but this ptr can be null if the contents have never been entered).
	118
	119	With this data structure, nothing need ever be invalidated except
	120	the lists of which regs or mems hold a particular value. It is easy
	121	to see if there is a reg or mem that is equiv to a particular value.
	122	If the value is constant, it is always explicitly constant.
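
The scheme above might be sketched as follows.  This is only an
illustration under stated assumptions: the enum names mimic rtx codes
and modes but are defined locally, the table uses a linear search
instead of real hashing, and intern, value_of_reg and
enter_plus_reg_const are hypothetical names.  MEM handling and the
chains of regs sharing a qty are left out.

```c
#include <stddef.h>

enum code { UNKNOWN, CONST_INT, PLUS };   /* tiny subset of rtx codes */
enum mode { VOIDmode, SImode };

#define MAX_ELTS 128
#define MAX_REGS 64

/* One entry per distinct value; its index plays the role of a qty number. */
struct elt {
    enum code code;
    enum mode mode;
    int ops[3];        /* indices of operand elts, -1 if unused */
    long value;        /* CONST_INT: the constant; UNKNOWN: a unique serial */
};

struct value_table {
    struct elt elts[MAX_ELTS];
    int n_elts;
    int reg_value[MAX_REGS];   /* regno -> elt index, -1 if never seen */
};

static void table_init(struct value_table *t)
{
    t->n_elts = 0;
    for (int r = 0; r < MAX_REGS; r++)
        t->reg_value[r] = -1;
}

/* Find or create the unique entry with these fields.
   Returns -1 if the table is full. */
static int intern(struct value_table *t, enum code code, enum mode mode,
                  int op0, int op1, int op2, long value)
{
    for (int i = 0; i < t->n_elts; i++) {
        const struct elt *e = &t->elts[i];
        if (e->code == code && e->mode == mode && e->value == value
            && e->ops[0] == op0 && e->ops[1] == op1 && e->ops[2] == op2)
            return i;
    }
    if (t->n_elts == MAX_ELTS)
        return -1;
    struct elt *e = &t->elts[t->n_elts];
    e->code = code;
    e->mode = mode;
    e->value = value;
    e->ops[0] = op0;
    e->ops[1] = op1;
    e->ops[2] = op2;
    return t->n_elts++;
}

/* Value currently held by a register; an UNKNOWN placeholder on first use. */
static int value_of_reg(struct value_table *t, int regno)
{
    if (t->reg_value[regno] < 0)
        t->reg_value[regno] =
            intern(t, UNKNOWN, SImode, -1, -1, -1, (long) t->n_elts);
    return t->reg_value[regno];
}

/* Enter (PLUS:SI (REG:SI regno) (CONST_INT c)) and return its entry. */
static int enter_plus_reg_const(struct value_table *t, int regno, long c)
{
    int op1 = intern(t, CONST_INT, VOIDmode, -1, -1, -1, c);
    int op0 = value_of_reg(t, regno);
    return intern(t, PLUS, SImode, op0, op1, -1, 0);
}
```

Entering the same sum twice yields the same entry, while the same sum
on a different register of unknown contents yields a different one.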
	123
	124	* Support more general tail-recursion among different functions.
	125
	126	This might be possible under certain circumstances, such as when
	127	the argument lists of the functions have the same lengths.
	128	Perhaps it could be done with a special declaration.
	129
	130	You would need to verify in the calling function that it does not
	131	use the addresses of any local variables and does not use setjmp.
	132
	133	* Put short static vars at low addresses and use short addressing mode?
	134
	135	Useful on the 68000/68020 and perhaps on the 32000 series,
	136	provided one has a linker that works with the feature.
	137	This is said to make a 15% speedup on the 68000.
	138
	139	* Keep global variables in registers.
	140
	141	Here is a scheme for doing this. A global variable, or a local variable
	142	whose address is taken, can be kept in a register for an entire function
	143	if the function does not use non-constant memory addresses and (for
	144	globals only) does not call other functions. If the entire function
	145	does not meet this criterion, a loop may.
	146
	147	The VAR_DECL for such a variable would have to have two RTL expressions:
	148	the true home in memory, and the pseudo-register used temporarily.
	149	It is necessary to emit insns to copy the memory location into the
	150	pseudo-register at the beginning of the function or loop, and perhaps
	151	back out at the end. These insns should have REG_EQUIV notes so that,
	152	if the pseudo-register does not get a hard register, it is spilled into
	153	the memory location which exists in any case.
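
At the source level, the effect is roughly what a programmer gets by
caching the global in a local by hand.  The sketch below is only an
analogy, not the RTL mechanism; the variable total and both function
names are hypothetical.  The loop is chosen so that it calls no
functions and uses no non-constant memory addresses.

```c
int total;   /* hypothetical global; its true home is in memory */

/* As written: each iteration refers to the memory location of `total'. */
void sum_squares_direct(int n)
{
    for (int i = 0; i < n; i++)
        total += i * i;
}

/* What the scheme amounts to: copy in at loop entry, work on the
   temporary (a pseudo-register in the compiler), copy back at exit. */
void sum_squares_cached(int n)
{
    int t = total;
    for (int i = 0; i < n; i++)
        t += i * i;
    total = t;
}
```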
	154
	155	The easiest way to set up these insns is to modify the routine
	156	put_var_into_stack so that it does not apply to the entire function
	157	(sparing any loops which contain nothing dangerous) and to call it at
	158	the end of the function regardless of where in the function the
	159	address of a local variable is taken. It would be called
	160	unconditionally at the end of the function for all relevant global
	161	variables.
	162
	163	For debugger output, the thing to do is to invent a new binding level
	164	around the appropriate loop and define the variable name as a register
	165	variable with that scope.
	166
	167	* Live-range splitting.
	168
	169	Currently a variable is allocated a hard register either for the full
	170	extent of its use or not at all. Sometimes it would be good to
	171	allocate a variable a hard register for just part of a function; for
	172	example, through a particular loop where the variable is mostly used,
	173	or outside of a particular loop where the variable is not used. (The
	174	latter is nice because it might let the variable be in a register most
	175	of the time even though the loop needs all the registers.)
	176
	177	It might not be very hard to do this in global.c when a variable
	178	fails to get a hard register for its entire life span.
	179
	180	The first step is to find a loop in which the variable is live, but
	181	which is not the whole life span or nearly so. It's probably best to
	182	use a loop in which the variable is heavily used.
	183
	184	Then create a new pseudo-register to represent the variable in that loop.
	185	Substitute this for the old pseudo-register there, and insert move insns
	186	to copy between the two at the loop entry and all exits. (When several
	187	such moves are inserted at the same place, some new feature should be
	188	added to say that none of those registers conflict merely because of
	189	overlap between the new moves. And the reload pass should reorder them
	190	so that a store precedes a load, for any given hard register.)
	191
	192	After doing this for all the reasonable candidates, run global-alloc
	193	over again. With luck, one of the two pseudo-registers will fit
	194	somewhere. It may even have a much higher priority due to its reduced
	195	life span.
	196
	197	There will be no room in general for the new pseudo-registers in
	198	basic_block_live_at_start, so there will need to be a second such
	199	matrix exclusively for the new ones. Various other vectors indexed by
	200	register number will have to be made bigger, or there will have to be
	201	secondary extender vectors just for global-alloc.
	202
	203	A simple new feature could arrange that both pseudo-registers get the
	204	same stack slot if they both fail to get hard registers.
	205
	206	Other compilers split live ranges when they are not connected, or
	207	try to split off pieces `at the edge'. I think splitting around loops
	208	will provide more speedup.
	209
	210	Creating a fake binding block and a new like-named variable with
	211	shorter life span and different address might succeed in describing
	212	this technique for the debugger.
	213
	214	* Detect dead stores into memory?
	215
	216	A store into memory is dead if it is followed by another store into
	217	the same location; and, in between, there is no reference to anything
	218	that might be that location (including no reference to a variable
	219	address).
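
A minimal sketch of this test over a straight-line sequence, assuming
a toy representation in which each operation is a store, a load, or a
reference that might touch any location (such as a use through a
variable address).  The types and the name store_is_dead are
hypothetical.

```c
#include <stddef.h>

enum kind { STORE, LOAD, ANY_REF };   /* ANY_REF: might be any location */

struct op {
    enum kind kind;
    int loc;      /* location id for STORE and LOAD; ignored for ANY_REF */
};

/* Returns 1 if ops[i] is a store that is overwritten by a later store
   to the same location with no possible reference in between. */
static int store_is_dead(const struct op *ops, size_t n, size_t i)
{
    if (ops[i].kind != STORE)
        return 0;
    for (size_t j = i + 1; j < n; j++) {
        if (ops[j].kind == ANY_REF)
            return 0;                       /* might read the location */
        if (ops[j].loc == ops[i].loc)
            return ops[j].kind == STORE;    /* overwritten, or read first */
    }
    return 0;   /* end of sequence: the value may be live afterwards */
}
```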
	220
	221	* Loop optimization.
	222
	223	Strength reduction and iteration variable elimination could be
	224	smarter. They should know how to decide which iteration variables are
	225	not worth making explicit because they can be computed as part of an
	226	address calculation. Based on this information, they should decide
	227	when it is desirable to eliminate one iteration variable and create
	228	another in its place.
	229
	230	It should be possible to compute what the value of an iteration
	231	variable will be at the end of the loop, and eliminate the variable
	232	within the loop by computing that value at the loop end.
	233
	234	When a loop has a simple increment that adds 1,
	235	instead of jumping in after the increment,
	236	decrement the loop count and jump to the increment.
	237	This allows aob insns to be used.
	238
	239	* Using constraints on values.
	240
	241	Many operations could be simplified based on knowledge of the
	242	minimum and maximum possible values of a register at any particular time.
	243	These limits could come from the data types in the tree, via rtl generation,
	244	or they can be deduced from operations that are performed. For example,
	245	the result of an `and' operation one of whose operands is 7 must be in
	246	the range 0 to 7. Compare instructions also tell something about the
	247	possible values of the operand, in the code beyond the test.
	248
	249	Value constraints can be used to determine the results of a further
	250	comparison. They can also indicate that certain `and' operations are
	251	redundant. Constraints might permit a decrement and branch
	252	instruction that checks zeroness to be used when the user has
	253	specified to exit if negative.
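
A small sketch of such range reasoning, assuming an inclusive
[lo, hi] range per register and a non-negative constant mask.  The
struct and function names are hypothetical.

```c
/* Inclusive range of values a register may hold. */
struct range { long lo, hi; };

/* Range of (x & mask) for a non-negative constant mask: the result
   always lies in 0..mask, whatever x is. */
static struct range range_and_const(struct range x, long mask)
{
    struct range r = { 0, mask };
    /* A non-negative x that is already below the mask bounds the result. */
    if (x.lo >= 0 && x.hi < mask)
        r.hi = x.hi;
    return r;
}

/* Outcome of (x < c) given the range:
   1 always true, 0 always false, -1 not known. */
static int known_less_than(struct range x, long c)
{
    if (x.hi < c)
        return 1;
    if (x.lo >= c)
        return 0;
    return -1;
}

/* An `and' with mask is redundant if mask has the form 2^k - 1 and
   x is already known to lie in 0..mask. */
static int and_is_redundant(struct range x, long mask)
{
    unsigned long m = (unsigned long) mask;
    int low_bits_mask = mask >= 0 && ((m + 1) & m) == 0;
    return low_bits_mask && x.lo >= 0 && x.hi <= mask;
}
```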
	254
	255	* Smarter reload pass.
	256
	257	The reload pass as currently written can reload values only into registers
	258	that are reserved for reloading. This means that in order to use a
	259	register for reloading it must spill everything out of that register.
	260
	261	It would be straightforward, though complicated, for reload1.c to keep
	262	track, during its scan, of which hard registers were available at each
	263	point in the function, and use for reloading even registers that were
	264	free only at the point they were needed. This would avoid much spilling
	265	and make better code.
	266
	267	* Change the type of a variable.
	268
	269	Sometimes a variable is declared as `int', it is assigned only once
	270	from a value of type `char', and then it is used only by comparison
	271	against constants. On many machines, better code would result if
	272	the variable had type `char'. If the compiler could detect this
	273	case, it could change the declaration of the variable and change
	274	all the places that use it.
	275
	276	* Better handling for very sparse switches.
	277
	278	There may be cases where it would be better to compile a switch
	279	statement to use a fixed hash table rather than the current
	280	combination of jump tables and binary search.
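
For illustration, here is what a fixed hash table for a sparse switch
might look like when written by hand.  The case values, the results,
and the choice of x % 7 as the hash are all assumptions picked so
that no two cases collide; a compiler would have to search for such a
function.

```c
/* One slot per hash value; `used' is 0 for slots no case maps to. */
struct slot { long key; int result; int used; };

/* Fixed table for the cases 3, 1000, 70000 and 1234567.
   key % 7 gives 3, 6, 0 and 5 respectively, so there are no collisions. */
static const struct slot table[7] = {
    [0] = { 70000,   30, 1 },
    [3] = { 3,       10, 1 },
    [5] = { 1234567, 40, 1 },
    [6] = { 1000,    20, 1 },
};

/* Stands for: switch (x) { case 3: ... case 1000: ... default: ... },
   with each case body reduced to returning a number and the default
   returning -1. */
static int dispatch(long x)
{
    if (x < 0)
        return -1;              /* no negative cases; avoids negative % */
    const struct slot *s = &table[x % 7];
    return (s->used && s->key == x) ? s->result : -1;
}
```

One hash, one compare and one indexed load replace the chain of
comparisons that a binary search would need.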
	281
	282	* Order of subexpressions.
	283
	284	It might be possible to make better code by paying attention
	285	to the order in which to generate code for subexpressions of an expression.
	286
	287	* More code motion.
	288
	289	Consider hoisting common code up past conditional branches or
	290	tablejumps.
	291
	292	* Trace scheduling.
	293
	294	This technique is said to be able to figure out which way a jump
	295	will usually go, and rearrange the code to make that path the
	296	faster one.
	297
	298	* Distributive law.
	299
	300	The C expression (X + 4 (Y + C)) compiles better on certain
	301	machines if rewritten as (X + 4C + 4*Y) because of known addressing
	302	modes. It may be tricky to determine when, and for which machines, to
	303	use each alternative.
	304
	305	Some work has been done on this, in combine.c.
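
The two forms can be written out in C as below, assuming X is a byte
address, Y an index and C a small constant (3 here, chosen
arbitrarily).  Both functions load the same 4-byte object; which one
maps better onto an addressing mode depends on the machine.

```c
#include <stdint.h>
#include <string.h>

enum { C = 3 };   /* hypothetical constant offset, in elements */

/* Nested form: the sum Y + C is scaled as a whole. */
static int32_t load_nested(const unsigned char *x, size_t y)
{
    int32_t v;
    memcpy(&v, x + 4 * (y + C), sizeof v);
    return v;
}

/* Distributed form: 4*C is a constant displacement and 4*Y a scaled
   index, which some addressing modes can express directly. */
static int32_t load_distributed(const unsigned char *x, size_t y)
{
    int32_t v;
    memcpy(&v, x + 4 * C + 4 * y, sizeof v);
    return v;
}
```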
	306
	307	* Can optimize by changing if (x) y; else z; into z; if (x) y;
	308	if z and x do not interfere and z has no effects not undone by y.
	309	This is desirable if z is faster than jumping.
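
A tiny source-level instance of this rewrite, with y and z reduced to
assignments to the same variable so that y undoes the only effect of
z.  The function names are hypothetical.

```c
/* Original form: if (x) y; else z; */
static int pick_branchy(int x, int a, int b)
{
    int r;
    if (x)
        r = a;      /* y */
    else
        r = b;      /* z */
    return r;
}

/* Rewritten form: z; if (x) y; */
static int pick_straight(int x, int a, int b)
{
    int r = b;      /* z runs unconditionally */
    if (x)
        r = a;      /* y overwrites what z did */
    return r;
}
```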
	310
	311	* For a two-insn loop on the 68020, such as
	312	foo: movb a2@+,a3@+
	313	jne foo
	314	it is better to insert dbeq d0,foo before the jne.
	315	d0 can be a junk register. The challenge is to fit this into
	316	a portable framework: when can you detect this situation and
	317	still be able to allocate a junk register?
	318
	319	2. Simpler porting.
	320
	321	Right now, describing the target machine's instructions is done
	322	cleanly, but describing its addressing mode is done with several
	323	ad-hoc macro definitions. Porting would be much easier if there were
	324	an RTL description for addressing modes like that for instructions.
	325	Tools analogous to genflags and genrecog would generate macros from
	326	this description.
	327
	328	There would be one pattern in the address-description file for each
	329	kind of addressing, and this pattern would have:
	330
	331	* the RTL expression for the address
	332	* C code to verify its validity (since that may depend on
	333	the exact data).
	334	* C code to print the address in assembler language.
	335	* C code to convert the address into a valid one, if it is not valid.
	336	(This would replace LEGITIMIZE_ADDRESS).
	337	* Register constraints for all indeterminates that appear
	338	in the RTL expression.
	339
	340	3. Other languages.
	341
	342	Front ends for Pascal, Fortran, Algol, Cobol, Modula-2 and Ada are
	343	desirable.
	344
	345	Pascal, Modula-2 and Ada require the implementation of functions
	346	within functions. Some of the mechanisms for this already exist.
	347
	348	4. More extensions.
	349
	350	* Generated unique labels. Have some way of generating distinct labels
	351	for use in extended asm statements. I don't know what a good syntax would
	352	be.
	353
	354	* A way of defining a structure containing a union, in which the choice of
	355	union alternative is controlled by a previous structure component.
	356
	357	Here is a possible syntax for this.
	358
	359	struct foo {
	360	  enum { INT, DOUBLE } code;
	361	  auto union { case INT: int i; case DOUBLE: double d;} value : code;
	362	};
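
For comparison, this is how the same data is usually written in
standard C today, where nothing ties the union to its tag and the
programmer must check the tag by hand.  The enumerator names and
foo_as_double are hypothetical.

```c
enum foo_code { FOO_INT, FOO_DOUBLE };

struct foo {
    enum foo_code code;                  /* says which member is valid */
    union { int i; double d; } value;
};

/* Read the value as a double, selecting the member by the tag. */
static double foo_as_double(const struct foo *f)
{
    switch (f->code) {
    case FOO_INT:
        return (double) f->value.i;
    case FOO_DOUBLE:
        return f->value.d;
    }
    return 0.0;   /* not reached for valid tags */
}
```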
	363
	364	* Allow constructor expressions as lvalues, like this:
	365
	366	(struct foo) {a, b, c} = foo();
	367
	368	This would call foo, which returns a structure, and then store the
	369	several components of the structure into the variables a, b, and c.
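
Without the extension, the same effect needs a temporary.  A sketch
in standard C, assuming struct foo has three int members (the member
names and unpack_foo are hypothetical):

```c
struct foo { int x; int y; int z; };

/* Stand-in for the function foo of the example above. */
static struct foo foo(void)
{
    struct foo f = { 1, 2, 3 };
    return f;
}

/* What (struct foo) {a, b, c} = foo(); would do. */
static void unpack_foo(int *a, int *b, int *c)
{
    struct foo tmp = foo();
    *a = tmp.x;
    *b = tmp.y;
    *c = tmp.z;
}
```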
	370
	371	5. Generalize the machine model.
	372
	373	* Some new compiler features may be needed to do a good job on machines
	374	where static data needs to be addressed using base registers.
	375
	376	* Some machines have two stacks in different areas of memory, one used
	377	for scalars and another for large objects. The compiler does not
	378	now have a way to understand this.
	379
	380	6. Useful warnings.
	381
	382	* Warn about statements that are undefined because the order of
	383	evaluation of increment operators makes a big difference. Here is an
	384	example:
	385
	386	*foo++ = hack (*foo);
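
Such a warning could suggest splitting the statement so that the
order is explicit.  A sketch of the two possible intentions, assuming
foo is a pointer to int and hack is some function of one value (both
defined here only for illustration):

```c
/* Hypothetical stand-in for hack. */
static int hack(int v)
{
    return v * 2;
}

/* Intention 1: transform the current element in place, then advance. */
static int *hack_current(int *foo)
{
    int old = *foo;
    *foo = hack(old);
    return foo + 1;
}

/* Intention 2: store into the current element the result of hack
   applied to the next one, then advance. */
static int *hack_next(int *foo)
{
    int next = foo[1];
    *foo = hack(next);
    return foo + 1;
}
```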
	387
	388	7. Better documentation of how GCC works and how to port it.
	389
	390	Here is an outline proposed by Allan Adler.
	391
	392	I. Overview of this document
	393	II. The machines on which GCC is implemented
	394	   A. Prose description of those characteristics of target machines and
	395	      their operating systems which are pertinent to the implementation
	396	      of GCC.
	397	      i. target machine characteristics
	398	      ii. comparison of this system of machine characteristics with
	399	          other systems of machine specification currently in use
	400	   B. Tables of the characteristics of the target machines on which
	401	      GCC is implemented.
	402	   C. A priori restrictions on the values of characteristics of target
	403	      machines, with special reference to those parts of the source code
	404	      which entail those restrictions
	405	      i. restrictions on individual characteristics
	406	      ii. restrictions involving relations between various characteristics
	407	   D. The use of GCC as a cross-compiler
	408	      i. cross-compilation to existing machines
	409	      ii. cross-compilation to non-existent machines
	410	   E. Assumptions which are made regarding the target machine
	411	      i. assumptions regarding the architecture of the target machine
	412	      ii. assumptions regarding the operating system of the target machine
	413	      iii. assumptions regarding software resident on the target machine
	414	      iv. where in the source code these assumptions are in effect made
	415	III. A systematic approach to writing the files tm.h and xm.h
	416	   A. Macros which require special care or skill
	417	   B. Examples, with special reference to the underlying reasoning
	418	IV. A systematic approach to writing the machine description file md
	419	   A. Minimal viable sets of insn descriptions
	420	   B. Examples, with special reference to the underlying reasoning
	421	V. Uses of the file aux-output.c
	422	VI. Specification of what constitutes correct performance of an
	423	    implementation of GCC
	424	   A. The components of GCC
	425	   B. The itinerary of a C program through GCC
	426	   C. A system of benchmark programs
	427	   D. What your RTL and assembler should look like with these benchmarks
	428	   E. Fine tuning for speed and size of compiled code
	429	VII. A systematic procedure for debugging an implementation of GCC
	430	   A. Use of GDB
	431	      i. the macros in the file .gdbinit for GCC
	432	      ii. obstacles to the use of GDB
	433	         a. functions implemented as macros can't be called in GDB
	434	   B. Debugging without GDB
	435	      i. How to turn off the normal operation of GCC and access specific
	436	         parts of GCC
	437	   C. Debugging tools
	438	   D. Debugging the parser
	439	      i. how machine macros and insn definitions affect the parser
	440	   E. Debugging the recognizer
	441	      i. how machine macros and insn definitions affect the recognizer
	442
	443	ditto for other components
	444
	445	VIII. Data types used by GCC, with special reference to restrictions not
	446	      specified in the formal definition of the data type
	447	IX. References to the literature for the algorithms used in GCC
	448
