1 | This is Info file gcc.info, produced by Makeinfo-1.55 from the input |
---|
2 | file gcc.texi. |
---|
3 | |
---|
4 | This file documents the use and the internals of the GNU compiler. |
---|
5 | |
---|
6 | Published by the Free Software Foundation 59 Temple Place - Suite 330 |
---|
7 | Boston, MA 02111-1307 USA |
---|
8 | |
---|
9 | Copyright (C) 1988, 1989, 1992, 1993, 1994, 1995 Free Software |
---|
10 | Foundation, Inc. |
---|
11 | |
---|
12 | Permission is granted to make and distribute verbatim copies of this |
---|
13 | manual provided the copyright notice and this permission notice are |
---|
14 | preserved on all copies. |
---|
15 | |
---|
16 | Permission is granted to copy and distribute modified versions of |
---|
17 | this manual under the conditions for verbatim copying, provided also |
---|
18 | that the sections entitled "GNU General Public License," "Funding for |
---|
19 | Free Software," and "Protect Your Freedom--Fight `Look And Feel'" are |
---|
20 | included exactly as in the original, and provided that the entire |
---|
21 | resulting derived work is distributed under the terms of a permission |
---|
22 | notice identical to this one. |
---|
23 | |
---|
24 | Permission is granted to copy and distribute translations of this |
---|
25 | manual into another language, under the above conditions for modified |
---|
26 | versions, except that the sections entitled "GNU General Public |
---|
27 | License," "Funding for Free Software," and "Protect Your Freedom--Fight |
---|
28 | `Look And Feel'", and this permission notice, may be included in |
---|
29 | translations approved by the Free Software Foundation instead of in the |
---|
30 | original English. |
---|
31 | |
---|
32 | |
---|
33 | File: gcc.info, Node: Standard Names, Next: Pattern Ordering, Prev: Constraints, Up: Machine Desc |
---|
34 | |
---|
35 | Standard Pattern Names For Generation |
---|
36 | ===================================== |
---|
37 | |
---|
38 | Here is a table of the instruction names that are meaningful in the |
---|
39 | RTL generation pass of the compiler. Giving one of these names to an |
---|
40 | instruction pattern tells the RTL generation pass that it can use the |
---|
41 | pattern to accomplish a certain task. |
---|
42 | |
---|
43 | `movM' |
---|
44 | Here M stands for a two-letter machine mode name, in lower case. |
---|
45 | This instruction pattern moves data with that machine mode from |
---|
46 | operand 1 to operand 0. For example, `movsi' moves full-word data. |
---|
47 | |
---|
48 | If operand 0 is a `subreg' with mode M of a register whose own |
---|
49 | mode is wider than M, the effect of this instruction is to store |
---|
50 | the specified value in the part of the register that corresponds |
---|
51 | to mode M. The effect on the rest of the register is undefined. |
---|
52 | |
---|
53 | This class of patterns is special in several ways. First of all, |
---|
54 | each of these names *must* be defined, because there is no other |
---|
55 | way to copy a datum from one place to another. |
---|
56 | |
---|
57 | Second, these patterns are not used solely in the RTL generation |
---|
58 | pass. Even the reload pass can generate move insns to copy values |
---|
59 | from stack slots into temporary registers. When it does so, one |
---|
60 | of the operands is a hard register and the other is an operand |
---|
61 | that may need to be reloaded into a register. |
---|
62 | |
---|
63 | Therefore, when given such a pair of operands, the pattern must |
---|
64 | generate RTL which needs no reloading and needs no temporary |
---|
65 | registers--no registers other than the operands. For example, if |
---|
66 | you support the pattern with a `define_expand', then in such a |
---|
67 | case the `define_expand' mustn't call `force_reg' or any other such |
---|
68 | function which might generate new pseudo registers. |
---|
69 | |
---|
70 | This requirement exists even for subword modes on a RISC machine |
---|
71 | where fetching those modes from memory normally requires several |
---|
72 | insns and some temporary registers. Look in `spur.md' to see how |
---|
73 | the requirement can be satisfied. |
---|
74 | |
---|
75 | During reload a memory reference with an invalid address may be |
---|
76 | passed as an operand. Such an address will be replaced with a |
---|
77 | valid address later in the reload pass. In this case, nothing may |
---|
78 | be done with the address except to use it as it stands. If it is |
---|
79 | copied, it will not be replaced with a valid address. No attempt |
---|
80 | should be made to make such an address into a valid address and no |
---|
81 | routine (such as `change_address') that will do so may be called. |
---|
82 | Note that `general_operand' will fail when applied to such an |
---|
83 | address. |
---|
84 | |
---|
85 | The global variable `reload_in_progress' (which must be explicitly |
---|
86 | declared if required) can be used to determine whether such special |
---|
87 | handling is required. |
---|
88 | |
---|
89 | The variety of operands that have reloads depends on the rest of |
---|
90 | the machine description, but typically on a RISC machine these can |
---|
91 | only be pseudo registers that did not get hard registers, while on |
---|
92 | other machines explicit memory references will get optional |
---|
93 | reloads. |
---|
94 | |
---|
95 | If a scratch register is required to move an object to or from |
---|
96 | memory, it can be allocated using `gen_reg_rtx' prior to reload. |
---|
97 | But this is impossible during and after reload. If there are |
---|
98 | cases needing scratch registers after reload, you must define |
---|
99 | `SECONDARY_INPUT_RELOAD_CLASS' and perhaps also |
---|
100 | `SECONDARY_OUTPUT_RELOAD_CLASS' to detect them, and provide |
---|
101 | patterns `reload_inM' or `reload_outM' to handle them. *Note |
---|
102 | Register Classes::. |
---|
103 | |
---|
104 | The constraints on a `movM' must permit moving any hard register |
---|
105 | to any other hard register provided that `HARD_REGNO_MODE_OK' |
---|
106 | permits mode M in both registers and `REGISTER_MOVE_COST' applied |
---|
107 | to their classes returns a value of 2. |
---|
108 | |
---|
109 | It is obligatory to support floating point `movM' instructions |
---|
110 | into and out of any registers that can hold fixed point values, |
---|
111 | because unions and structures (which have modes `SImode' or |
---|
112 | `DImode') can be in those registers and they may have floating |
---|
113 | point members. |
---|
114 | |
---|
115 | There may also be a need to support fixed point `movM' |
---|
116 | instructions in and out of floating point registers. |
---|
117 | Unfortunately, I have forgotten why this was so, and I don't know |
---|
118 | whether it is still true. If `HARD_REGNO_MODE_OK' rejects fixed |
---|
119 | point values in floating point registers, then the constraints of |
---|
120 | the fixed point `movM' instructions must be designed to avoid |
---|
121 | ever trying to reload into a floating point register. |
---|
122 | |
---|
123 | `reload_inM' |
---|
124 | `reload_outM' |
---|
125 | Like `movM', but used when a scratch register is required to move |
---|
126 | between operand 0 and operand 1. Operand 2 describes the scratch |
---|
127 | register. See the discussion of the `SECONDARY_RELOAD_CLASS' |
---|
128 | macro in *note Register Classes::.. |
---|
129 | |
---|
130 | `movstrictM' |
---|
131 | Like `movM' except that if operand 0 is a `subreg' with mode M of |
---|
132 | a register whose natural mode is wider, the `movstrictM' |
---|
133 | instruction is guaranteed not to alter any of the register except |
---|
134 | the part which belongs to mode M. |
---|
135 | |
---|
136 | `load_multiple' |
---|
137 | Load several consecutive memory locations into consecutive |
---|
138 | registers. Operand 0 is the first of the consecutive registers, |
---|
139 | operand 1 is the first memory location, and operand 2 is a |
---|
140 | constant: the number of consecutive registers. |
---|
141 | |
---|
142 | Define this only if the target machine really has such an |
---|
143 | instruction; do not define this if the most efficient way of |
---|
144 | loading consecutive registers from memory is to do them one at a |
---|
145 | time. |
---|
146 | |
---|
147 | On some machines, there are restrictions as to which consecutive |
---|
148 | registers can be loaded from memory, such as particular starting or |
---|
149 | ending register numbers or only a range of valid counts. For those |
---|
150 | machines, use a `define_expand' (*note Expander Definitions::.) |
---|
151 | and make the pattern fail if the restrictions are not met. |
---|
152 | |
---|
153 | Write the generated insn as a `parallel' with elements being a |
---|
154 | `set' of one register from the appropriate memory location (you may |
---|
155 | also need `use' or `clobber' elements). Use a `match_parallel' |
---|
156 | (*note RTL Template::.) to recognize the insn. See `a29k.md' and |
---|
157 | `rs6000.md' for examples of the use of this insn pattern. |
---|
158 | |
---|
159 | `store_multiple' |
---|
160 | Similar to `load_multiple', but store several consecutive registers |
---|
161 | into consecutive memory locations. Operand 0 is the first of the |
---|
162 | consecutive memory locations, operand 1 is the first register, and |
---|
163 | operand 2 is a constant: the number of consecutive registers. |
---|
164 | |
---|
165 | `addM3' |
---|
166 | Add operand 2 and operand 1, storing the result in operand 0. All |
---|
167 | operands must have mode M. This can be used even on two-address |
---|
168 | machines, by means of constraints requiring operands 1 and 0 to be |
---|
169 | the same location. |
---|
170 | |
---|
171 | `subM3', `mulM3' |
---|
172 | `divM3', `udivM3', `modM3', `umodM3' |
---|
173 | `sminM3', `smaxM3', `uminM3', `umaxM3' |
---|
174 | `andM3', `iorM3', `xorM3' |
---|
175 | Similar, for other arithmetic operations. |
---|
176 | |
---|
177 | `mulhisi3' |
---|
178 | Multiply operands 1 and 2, which have mode `HImode', and store a |
---|
179 | `SImode' product in operand 0. |
---|
180 | |
---|
181 | `mulqihi3', `mulsidi3' |
---|
182 | Similar widening-multiplication instructions of other widths. |
---|
183 | |
---|
184 | `umulqihi3', `umulhisi3', `umulsidi3' |
---|
185 | Similar widening-multiplication instructions that do unsigned |
---|
186 | multiplication. |
---|
187 | |
---|
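As an illustration only, the widening multiplications described above can be modelled in C for the `HImode' to `SImode' case. The function names are hypothetical, and the sketch assumes `HImode' is 16 bits wide and `SImode' is 32 bits wide.

```c
#include <assert.h>
#include <stdint.h>

/* Model of `mulhisi3': signed 16-bit operands, 32-bit product.
   Both operands are widened before multiplying, so no bits are lost. */
int32_t mulhisi3_model(int16_t op1, int16_t op2)
{
    return (int32_t)op1 * (int32_t)op2;
}

/* Model of `umulhisi3': the unsigned counterpart.  The explicit casts
   matter in C, because uint16_t operands would otherwise be promoted
   to signed int and the product could overflow. */
uint32_t umulhisi3_model(uint16_t op1, uint16_t op2)
{
    return (uint32_t)op1 * (uint32_t)op2;
}
```

For example, mulhisi3_model(-300, 300) yields -90000, a value that does not fit in 16 bits.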
188 | `mulM3_highpart' |
---|
189 | Perform a signed multiplication of operands 1 and 2, which have |
---|
190 | mode M, and store the most significant half of the product in |
---|
191 | operand 0. The least significant half of the product is discarded. |
---|
192 | |
---|
193 | `umulM3_highpart' |
---|
194 | Similar, but the multiplication is unsigned. |
---|
195 | |
---|
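The high-part multiplications can be sketched in C as follows. This is a model of the arithmetic only, assuming M is a 32-bit mode; the function names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Signed case: keep the upper 32 bits of the 64-bit product and
   discard the lower 32 bits. */
int32_t smul_highpart_model(int32_t op1, int32_t op2)
{
    int64_t product = (int64_t)op1 * (int64_t)op2;
    uint32_t high_bits = (uint32_t)((uint64_t)product >> 32);
    int32_t result;
    /* Reinterpret the bits instead of converting the value, so the
       result does not rely on implementation-defined narrowing. */
    memcpy(&result, &high_bits, sizeof result);
    return result;
}

/* Unsigned case: same idea with an unsigned 64-bit product. */
uint32_t umul_highpart_model(uint32_t op1, uint32_t op2)
{
    uint64_t product = (uint64_t)op1 * (uint64_t)op2;
    return (uint32_t)(product >> 32);
}
```

The two functions differ for the same bit patterns: with both operands equal to 0xFFFFFFFF, the unsigned high part is 0xFFFFFFFE, while the signed operands are both -1 and the signed high part is 0.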
196 | `divmodM4' |
---|
197 | Signed division that produces both a quotient and a remainder. |
---|
198 | Operand 1 is divided by operand 2 to produce a quotient stored in |
---|
199 | operand 0 and a remainder stored in operand 3. |
---|
200 | |
---|
201 | For machines with an instruction that produces both a quotient and |
---|
202 | a remainder, provide a pattern for `divmodM4' but do not provide |
---|
203 | patterns for `divM3' and `modM3'. This allows optimization in the |
---|
204 | relatively common case when both the quotient and remainder are |
---|
205 | computed. |
---|
206 | |
---|
207 | If an instruction that just produces a quotient or just a remainder |
---|
208 | exists and is more efficient than the instruction that produces |
---|
209 | both, write the output routine of `divmodM4' to call |
---|
210 | `find_reg_note' and look for a `REG_UNUSED' note on the quotient |
---|
211 | or remainder and generate the appropriate instruction. |
---|
212 | |
---|
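A rough C model of what `divmodM4' computes is given below. It assumes the quotient truncates toward zero, as the standard C `div' function does; the function name is hypothetical. The `quot' member plays the role of operand 0 and the `rem' member the role of operand 3.

```c
#include <assert.h>
#include <limits.h>
#include <stdlib.h>

/* Divide op1 by op2 and return quotient and remainder together,
   the way a single divide-with-remainder instruction would. */
div_t divmod_model(int op1, int op2)
{
    assert(op2 != 0);                        /* division by zero */
    assert(!(op1 == INT_MIN && op2 == -1));  /* quotient would overflow */
    return div(op1, op2);
}
```

For example, divmod_model(-7, 2) gives a quotient of -3 and a remainder of -1.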
213 | `udivmodM4' |
---|
214 | Similar, but does unsigned division. |
---|
215 | |
---|
216 | `ashlM3' |
---|
217 | Arithmetic-shift operand 1 left by a number of bits specified by |
---|
218 | operand 2, and store the result in operand 0. Here M is the mode |
---|
219 | of operand 0 and operand 1; operand 2's mode is specified by the |
---|
220 | instruction pattern, and the compiler will convert the operand to |
---|
221 | that mode before generating the instruction. |
---|
222 | |
---|
223 | `ashrM3', `lshrM3', `rotlM3', `rotrM3' |
---|
224 | Other shift and rotate instructions, analogous to the `ashlM3' |
---|
225 | instructions. |
---|
226 | |
---|
227 | `negM2' |
---|
228 | Negate operand 1 and store the result in operand 0. |
---|
229 | |
---|
230 | `absM2' |
---|
231 | Store the absolute value of operand 1 into operand 0. |
---|
232 | |
---|
233 | `sqrtM2' |
---|
234 | Store the square root of operand 1 into operand 0. |
---|
235 | |
---|
236 | The `sqrt' built-in function of C always uses the mode which |
---|
237 | corresponds to the C data type `double'. |
---|
238 | |
---|
239 | `ffsM2' |
---|
240 | Store into operand 0 one plus the index of the least significant |
---|
241 | 1-bit of operand 1. If operand 1 is zero, store zero. M is the |
---|
242 | mode of operand 0; operand 1's mode is specified by the instruction |
---|
243 | pattern, and the compiler will convert the operand to that mode |
---|
244 | before generating the instruction. |
---|
245 | |
---|
246 | The `ffs' built-in function of C always uses the mode which |
---|
247 | corresponds to the C data type `int'. |
---|
248 | |
---|
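The value computed by `ffsM2' can be modelled in C like this. It is a sketch of the arithmetic, not of any machine instruction, and the function name is hypothetical.

```c
#include <assert.h>

/* Return one plus the index of the least significant 1-bit of x,
   or zero when x is zero. */
int ffs_model(unsigned int x)
{
    if (x == 0)
        return 0;

    int position = 1;
    while ((x & 1u) == 0) {   /* skip trailing zero bits */
        x >>= 1;
        ++position;
    }
    return position;
}
```

For example, ffs_model(8) is 4 and ffs_model(12) is 3, since the lowest set bit of 12 is bit 2.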
249 | `one_cmplM2' |
---|
250 | Store the bitwise-complement of operand 1 into operand 0. |
---|
251 | |
---|
252 | `cmpM' |
---|
253 | Compare operand 0 and operand 1, and set the condition codes. The |
---|
254 | RTL pattern should look like this: |
---|
255 | |
---|
256 | (set (cc0) (compare (match_operand:M 0 ...) |
---|
257 | (match_operand:M 1 ...))) |
---|
258 | |
---|
259 | `tstM' |
---|
260 | Compare operand 0 against zero, and set the condition codes. The |
---|
261 | RTL pattern should look like this: |
---|
262 | |
---|
263 | (set (cc0) (match_operand:M 0 ...)) |
---|
264 | |
---|
265 | `tstM' patterns should not be defined for machines that do not use |
---|
266 | `(cc0)'. Doing so would confuse the optimizer since it would no |
---|
267 | longer be clear which `set' operations were comparisons. The |
---|
268 | `cmpM' patterns should be used instead. |
---|
269 | |
---|
270 | `movstrM' |
---|
271 | Block move instruction. The addresses of the destination and |
---|
272 | source strings are the first two operands, and both are in mode |
---|
273 | `Pmode'. The number of bytes to move is the third operand, in |
---|
274 | mode M. |
---|
275 | |
---|
276 | The fourth operand is the known shared alignment of the source and |
---|
277 | destination, in the form of a `const_int' rtx. Thus, if the |
---|
278 | compiler knows that both source and destination are word-aligned, |
---|
279 | it may provide the value 4 for this operand. |
---|
280 | |
---|
281 | These patterns need not give special consideration to the |
---|
282 | possibility that the source and destination strings might overlap. |
---|
283 | |
---|
284 | `cmpstrM' |
---|
285 | Block compare instruction, with five operands. Operand 0 is the |
---|
286 | output; it has mode M. The remaining four operands are like the |
---|
287 | operands of `movstrM'. The two memory blocks specified are |
---|
288 | compared byte by byte in lexicographic order. The effect of the |
---|
289 | instruction is to store a value in operand 0 whose sign indicates |
---|
290 | the result of the comparison. |
---|
291 | |
---|
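The result of the block compare can be modelled in C as below. The sketch assumes bytes are compared as unsigned values, as the C library function `memcmp' does; the text above does not state this. The function name is hypothetical, and the alignment operand is ignored because it affects only how the comparison may be implemented.

```c
#include <assert.h>
#include <stddef.h>

/* Compare two blocks byte by byte in lexicographic order.  Only the
   sign of the result is meaningful: negative, zero or positive. */
int cmpstr_model(const unsigned char *block1,
                 const unsigned char *block2, size_t length)
{
    assert(block1 != NULL && block2 != NULL);
    for (size_t i = 0; i < length; ++i) {
        if (block1[i] != block2[i])
            return block1[i] < block2[i] ? -1 : 1;
    }
    return 0;   /* all bytes equal */
}
```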
| `strlenM' |
---|
292 | Compute the length of a string, with four operands. Operand 0 is |
---|
293 | the result (of mode M), operand 1 is a `mem' referring to the |
---|
294 | first character of the string, operand 2 is the character to |
---|
295 | search for (normally zero), and operand 3 is a constant describing |
---|
296 | the known alignment of the beginning of the string. |
---|
297 | |
---|
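The string-length operation described above can be sketched in C. The function name is hypothetical, and the alignment operand is left out since it does not change the result.

```c
#include <assert.h>
#include <stddef.h>

/* Count the characters before the first occurrence of `search',
   which is normally zero.  The caller must guarantee that `search'
   occurs in the string; otherwise the loop runs past the end. */
size_t strlen_model(const char *string, char search)
{
    assert(string != NULL);
    size_t length = 0;
    while (string[length] != search)
        ++length;
    return length;
}
```

With a search character of zero this matches the C library `strlen': strlen_model("hello", '\0') is 5.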
298 | `floatMN2' |
---|
299 | Convert signed integer operand 1 (valid for fixed point mode M) to |
---|
300 | floating point mode N and store in operand 0 (which has mode N). |
---|
301 | |
---|
302 | `floatunsMN2' |
---|
303 | Convert unsigned integer operand 1 (valid for fixed point mode M) |
---|
304 | to floating point mode N and store in operand 0 (which has mode N). |
---|
305 | |
---|
306 | `fixMN2' |
---|
307 | Convert operand 1 (valid for floating point mode M) to fixed point |
---|
308 | mode N as a signed number and store in operand 0 (which has mode |
---|
309 | N). This instruction's result is defined only when the value of |
---|
310 | operand 1 is an integer. |
---|
311 | |
---|
312 | `fixunsMN2' |
---|
313 | Convert operand 1 (valid for floating point mode M) to fixed point |
---|
314 | mode N as an unsigned number and store in operand 0 (which has |
---|
315 | mode N). This instruction's result is defined only when the value |
---|
316 | of operand 1 is an integer. |
---|
317 | |
---|
318 | `ftruncM2' |
---|
319 | Convert operand 1 (valid for floating point mode M) to an integer |
---|
320 | value, still represented in floating point mode M, and store it in |
---|
321 | operand 0 (valid for floating point mode M). |
---|
322 | |
---|
323 | `fix_truncMN2' |
---|
324 | Like `fixMN2' but works for any floating point value of mode M by |
---|
325 | converting the value to an integer. |
---|
326 | |
---|
327 | `fixuns_truncMN2' |
---|
328 | Like `fixunsMN2' but works for any floating point value of mode M |
---|
329 | by converting the value to an integer. |
---|
330 | |
---|
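The truncating conversions can be illustrated in C, under the assumption that "converting the value to an integer" means rounding toward zero, which is what a C cast does. The function names and the choice of `double' and 32-bit integers are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Model of `fix_truncMN2': floating point to signed integer, valid
   for any in-range value, not only integral ones. */
int32_t fix_trunc_model(double op1)
{
    assert(op1 > -2147483649.0 && op1 < 2147483648.0);
    return (int32_t)op1;          /* C casts discard the fraction */
}

/* Model of `ftruncM2': the result stays in floating point.  The
   round trip through int64_t limits this sketch to values of
   moderate magnitude. */
double ftrunc_model(double op1)
{
    assert(op1 > -9.0e18 && op1 < 9.0e18);
    return (double)(int64_t)op1;
}
```

For example, fix_trunc_model(-2.75) is -2 and ftrunc_model(2.75) is 2.0.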
331 | `truncMN' |
---|
332 | Truncate operand 1 (valid for mode M) to mode N and store in |
---|
333 | operand 0 (which has mode N). Both modes must be fixed point or |
---|
334 | both floating point. |
---|
335 | |
---|
336 | `extendMN' |
---|
337 | Sign-extend operand 1 (valid for mode M) to mode N and store in |
---|
338 | operand 0 (which has mode N). Both modes must be fixed point or |
---|
339 | both floating point. |
---|
340 | |
---|
341 | `zero_extendMN' |
---|
342 | Zero-extend operand 1 (valid for mode M) to mode N and store in |
---|
343 | operand 0 (which has mode N). Both modes must be fixed point. |
---|
344 | |
---|
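For fixed point modes, the three conversions above correspond to ordinary C integer conversions. The sketch below assumes M and N are 8-bit and 32-bit modes; the function names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Sign extension: the sign bit of the narrow value fills the new
   high-order bits. */
int32_t extend_model(int8_t op1)
{
    return (int32_t)op1;
}

/* Zero extension: the new high-order bits are always zero. */
uint32_t zero_extend_model(uint8_t op1)
{
    return (uint32_t)op1;
}

/* Truncation: only the low-order bits are kept. */
uint8_t trunc_model(uint32_t op1)
{
    return (uint8_t)(op1 & 0xFFu);
}
```

The bit pattern 0xFF shows the difference: sign extension yields -1, while zero extension yields 255.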
345 | `extv' |
---|
346 | Extract a bit field from operand 1 (a register or memory operand), |
---|
347 | where operand 2 specifies the width in bits and operand 3 the |
---|
348 | starting bit, and store it in operand 0. Operand 0 must have mode |
---|
349 | `word_mode'. Operand 1 may have mode `byte_mode' or `word_mode'; |
---|
350 | often `word_mode' is allowed only for registers. Operands 2 and 3 |
---|
351 | must be valid for `word_mode'. |
---|
352 | |
---|
353 | The RTL generation pass generates this instruction only with |
---|
354 | constants for operands 2 and 3. |
---|
355 | |
---|
356 | The bit-field value is sign-extended to a full word integer before |
---|
357 | it is stored in operand 0. |
---|
358 | |
---|
359 | `extzv' |
---|
360 | Like `extv' except that the bit-field value is zero-extended. |
---|
361 | |
---|
362 | `insv' |
---|
363 | Store operand 3 (which must be valid for `word_mode') into a bit |
---|
364 | field in operand 0, where operand 1 specifies the width in bits and |
---|
365 | operand 2 the starting bit. Operand 0 may have mode `byte_mode' or |
---|
366 | `word_mode'; often `word_mode' is allowed only for registers. |
---|
367 | Operands 1 and 2 must be valid for `word_mode'. |
---|
368 | |
---|
369 | The RTL generation pass generates this instruction only with |
---|
370 | constants for operands 1 and 2. |
---|
371 | |
---|
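The three bit-field operations can be modelled in C as below. The sketch assumes a 32-bit word and assumes that bit 0 is the least significant bit; how bits are numbered on a real target depends on the machine description. All function names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Mask with the low `width' bits set. */
static uint32_t field_mask(unsigned width)
{
    assert(width >= 1 && width <= 32);
    return width == 32 ? UINT32_MAX : (UINT32_C(1) << width) - 1;
}

/* Model of `extzv': extract a field and zero-extend it. */
uint32_t extzv_model(uint32_t source, unsigned width, unsigned start)
{
    assert(start < 32 && start + width <= 32);
    return (source >> start) & field_mask(width);
}

/* Model of `extv': extract a field and sign-extend it. */
int32_t extv_model(uint32_t source, unsigned width, unsigned start)
{
    uint32_t field = extzv_model(source, width, start);
    uint32_t sign_bit = UINT32_C(1) << (width - 1);
    /* Flipping the sign bit and subtracting it back sign-extends;
       64-bit arithmetic keeps the subtraction free of overflow. */
    return (int32_t)((int64_t)(field ^ sign_bit) - (int64_t)sign_bit);
}

/* Model of `insv': store the low bits of `value' into a field of
   `dest' and return the updated word. */
uint32_t insv_model(uint32_t dest, unsigned width, unsigned start,
                    uint32_t value)
{
    assert(start < 32 && start + width <= 32);
    uint32_t mask = field_mask(width) << start;
    return (dest & ~mask) | ((value << start) & mask);
}
```

For the word 0xF0 and a 4-bit field starting at bit 4, extzv_model gives 15 and extv_model gives -1.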
372 | `movMODEcc' |
---|
373 | Conditionally move operand 2 or operand 3 into operand 0 according |
---|
374 | to the comparison in operand 1. If the comparison is true, |
---|
375 | operand 2 is moved into operand 0, otherwise operand 3 is moved. |
---|
376 | |
---|
377 | The mode of the operands being compared need not be the same as |
---|
378 | the operands being moved. Some machines, sparc64 for example, |
---|
379 | have instructions that conditionally move an integer value based |
---|
380 | on the floating point condition codes and vice versa. |
---|
381 | |
---|
382 | If the machine does not have conditional move instructions, do not |
---|
383 | define these patterns. |
---|
384 | |
---|
385 | `sCOND' |
---|
386 | Store zero or nonzero in the operand according to the condition |
---|
387 | codes. Value stored is nonzero iff the condition COND is true. |
---|
388 | COND is the name of a comparison operation expression code, such |
---|
389 | as `eq', `lt' or `leu'. |
---|
390 | |
---|
391 | You specify the mode that the operand must have when you write the |
---|
392 | `match_operand' expression. The compiler automatically sees which |
---|
393 | mode you have used and supplies an operand of that mode. |
---|
394 | |
---|
395 | The value stored for a true condition must have 1 as its low bit, |
---|
396 | or else must be negative. Otherwise the instruction is not |
---|
397 | suitable and you should omit it from the machine description. You |
---|
398 | describe to the compiler exactly which value is stored by defining |
---|
399 | the macro `STORE_FLAG_VALUE' (*note Misc::.). If a description |
---|
400 | cannot be found that can be used for all the `sCOND' patterns, you |
---|
401 | should omit those operations from the machine description. |
---|
402 | |
---|
403 | These operations may fail, but should do so only in relatively |
---|
404 | uncommon cases; if they would fail for common cases involving |
---|
405 | integer comparisons, it is best to omit these patterns. |
---|
406 | |
---|
407 | If these operations are omitted, the compiler will usually |
---|
408 | generate code that copies the constant one to the target and |
---|
409 | branches around an assignment of zero to the target. If this code |
---|
410 | is more efficient than the potential instructions used for the |
---|
411 | `sCOND' pattern followed by those required to convert the result |
---|
412 | into a 1 or a zero in `SImode', you should omit the `sCOND' |
---|
413 | operations from the machine description. |
---|
414 | |
---|
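The fallback sequence used when the `sCOND' patterns are omitted has roughly the following shape, written here in C for the `lt' condition. This is a sketch of the control flow only; the function name is hypothetical, and the actual insns depend on the target.

```c
#include <assert.h>

/* Store 1 in the target if op1 < op2, otherwise 0, using a copy of
   the constant one and a branch around an assignment of zero. */
int slt_fallback_model(int op1, int op2)
{
    int target = 1;     /* copy the constant one to the target */
    if (op1 < op2)
        goto done;      /* condition true: skip the assignment */
    target = 0;         /* condition false: assign zero */
done:
    return target;
}
```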
415 | `bCOND' |
---|
416 | Conditional branch instruction. Operand 0 is a `label_ref' that |
---|
417 | refers to the label to jump to. Jump if the condition codes meet |
---|
418 | condition COND. |
---|
419 | |
---|
420 | Some machines do not follow the model assumed here where a |
---|
421 | comparison instruction is followed by a conditional branch |
---|
422 | instruction. In that case, the `cmpM' (and `tstM') patterns should |
---|
423 | simply store the operands away and generate all the required insns |
---|
424 | in a `define_expand' (*note Expander Definitions::.) for the |
---|
425 | conditional branch operations. All calls to expand `bCOND' |
---|
426 | patterns are immediately preceded by calls to expand either a |
---|
427 | `cmpM' pattern or a `tstM' pattern. |
---|
428 | |
---|
429 | Machines that use a pseudo register for the condition code value, |
---|
430 | or where the mode used for the comparison depends on the condition |
---|
431 | being tested, should also use the above mechanism. *Note Jump |
---|
432 | Patterns:: |
---|
433 | |
---|
434 | The above discussion also applies to the `movMODEcc' and `sCOND' |
---|
435 | patterns. |
---|
436 | |
---|
437 | `call' |
---|
438 | Subroutine call instruction returning no value. Operand 0 is the |
---|
439 | function to call; operand 1 is the number of bytes of arguments |
---|
440 | pushed (in mode `SImode', except it is normally a `const_int'); |
---|
441 | operand 2 is the number of registers used as operands. |
---|
442 | |
---|
443 | On most machines, operand 2 is not actually stored into the RTL |
---|
444 | pattern. It is supplied for the sake of some RISC machines which |
---|
445 | need to put this information into the assembler code; they can put |
---|
446 | it in the RTL instead of operand 1. |
---|
447 | |
---|
448 | Operand 0 should be a `mem' RTX whose address is the address of the |
---|
449 | function. Note, however, that this address can be a `symbol_ref' |
---|
450 | expression even if it would not be a legitimate memory address on |
---|
451 | the target machine. If it is also not a valid argument for a call |
---|
452 | instruction, the pattern for this operation should be a |
---|
453 | `define_expand' (*note Expander Definitions::.) that places the |
---|
454 | address into a register and uses that register in the call |
---|
455 | instruction. |
---|
456 | |
---|
457 | `call_value' |
---|
458 | Subroutine call instruction returning a value. Operand 0 is the |
---|
459 | hard register in which the value is returned. There are three more |
---|
460 | operands, the same as the three operands of the `call' instruction |
---|
461 | (but with numbers increased by one). |
---|
462 | |
---|
463 | Subroutines that return `BLKmode' objects use the `call' insn. |
---|
464 | |
---|
465 | `call_pop', `call_value_pop' |
---|
466 | Similar to `call' and `call_value', except used if defined and if |
---|
467 | `RETURN_POPS_ARGS' is non-zero. They should emit a `parallel' |
---|
468 | that contains both the function call and a `set' to indicate the |
---|
469 | adjustment made to the frame pointer. |
---|
470 | |
---|
471 | For machines where `RETURN_POPS_ARGS' can be non-zero, the use of |
---|
472 | these patterns increases the number of functions for which the |
---|
473 | frame pointer can be eliminated, if desired. |
---|
474 | |
---|
475 | `untyped_call' |
---|
476 | Subroutine call instruction returning a value of any type. |
---|
477 | Operand 0 is the function to call; operand 1 is a memory location |
---|
478 | where the result of calling the function is to be stored; operand |
---|
479 | 2 is a `parallel' expression where each element is a `set' |
---|
480 | expression that indicates the saving of a function return value |
---|
481 | into the result block. |
---|
482 | |
---|
483 | This instruction pattern should be defined to support |
---|
484 | `__builtin_apply' on machines where special instructions are needed |
---|
485 | to call a subroutine with arbitrary arguments or to save the value |
---|
486 | returned. This instruction pattern is required on machines that |
---|
487 | have multiple registers that can hold a return value (i.e. |
---|
488 | `FUNCTION_VALUE_REGNO_P' is true for more than one register). |
---|
489 | |
---|
490 | `return' |
---|
491 | Subroutine return instruction. This instruction pattern name |
---|
492 | should be defined only if a single instruction can do all the work |
---|
493 | of returning from a function. |
---|
494 | |
---|
495 | Like the `movM' patterns, this pattern is also used after the RTL |
---|
496 | generation phase. In this case it is to support machines where |
---|
497 | multiple instructions are usually needed to return from a |
---|
498 | function, but some class of functions only requires one |
---|
499 | instruction to implement a return. Normally, the applicable |
---|
500 | functions are those which do not need to save any registers or |
---|
501 | allocate stack space. |
---|
502 | |
---|
503 | For such machines, the condition specified in this pattern should |
---|
504 | only be true when `reload_completed' is non-zero and the function's |
---|
505 | epilogue would only be a single instruction. For machines with |
---|
506 | register windows, the routine `leaf_function_p' may be used to |
---|
507 | determine if a register window push is required. |
---|
508 | |
---|
509 | Machines that have conditional return instructions should define |
---|
510 | patterns such as |
---|
511 | |
---|
512 | (define_insn "" |
---|
513 | [(set (pc) |
---|
514 | (if_then_else (match_operator |
---|
515 | 0 "comparison_operator" |
---|
516 | [(cc0) (const_int 0)]) |
---|
517 | (return) |
---|
518 | (pc)))] |
---|
519 | "CONDITION" |
---|
520 | "...") |
---|
521 | |
---|
522 | where CONDITION would normally be the same condition specified on |
---|
523 | the named `return' pattern. |
---|
524 | |
---|
525 | `untyped_return' |
---|
526 | Untyped subroutine return instruction. This instruction pattern |
---|
527 | should be defined to support `__builtin_return' on machines where |
---|
528 | special instructions are needed to return a value of any type. |
---|
529 | |
---|
530 | Operand 0 is a memory location where the result of calling a |
---|
531 | function with `__builtin_apply' is stored; operand 1 is a |
---|
532 | `parallel' expression where each element is a `set' expression |
---|
533 | that indicates the restoring of a function return value from the |
---|
534 | result block. |
---|
535 | |
---|
536 | `nop' |
---|
537 | No-op instruction. This instruction pattern name should always be |
---|
538 | defined to output a no-op in assembler code. `(const_int 0)' will |
---|
539 | do as an RTL pattern. |
---|
540 | |
---|
541 | `indirect_jump' |
---|
542 | An instruction to jump to an address which is operand zero. This |
---|
543 | pattern name is mandatory on all machines. |
---|
544 | |
---|
545 | `casesi' |
---|
546 | Instruction to jump through a dispatch table, including bounds |
---|
547 | checking. This instruction takes five operands: |
---|
548 | |
---|
549 | 1. The index to dispatch on, which has mode `SImode'. |
---|
550 | |
---|
551 | 2. The lower bound for indices in the table, an integer constant. |
---|
552 | |
---|
553 | 3. The total range of indices in the table--the largest index |
---|
554 | minus the smallest one (both inclusive). |
---|
555 | |
---|
556 | 4. A label that precedes the table itself. |
---|
557 | |
---|
558 | 5. A label to jump to if the index has a value outside the |
---|
559 | bounds. (If the machine-description macro |
---|
560 | `CASE_DROPS_THROUGH' is defined, then an out-of-bounds index |
---|
561 | drops through to the code following the jump table instead of |
---|
562 | jumping to this label. In that case, this label is not |
---|
563 | actually used by the `casesi' instruction, but it is always |
---|
564 | provided as an operand.) |
---|
565 | |
---|
566 | The table is an `addr_vec' or `addr_diff_vec' inside of a |
---|
567 | `jump_insn'. The number of elements in the table is one plus the |
---|
568 | difference between the upper bound and the lower bound. |
---|
569 | |
---|
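The bounds check and table indexing performed by `casesi' can be sketched in C as follows. The function name is hypothetical, and a return value of -1 stands in for the jump to the out-of-bounds label (operand 5).

```c
#include <assert.h>
#include <limits.h>

/* Return the position in the dispatch table selected by `index',
   or -1 when the index is outside the bounds.  `range' is the
   largest index minus the smallest, so the table has range + 1
   elements. */
int casesi_slot_model(int index, int lower_bound, unsigned int range)
{
    assert(range < (unsigned int)INT_MAX);

    /* Subtracting in unsigned arithmetic makes an index below the
       lower bound wrap to a large value, so one comparison rejects
       indices on both sides of the table. */
    unsigned int offset = (unsigned int)index - (unsigned int)lower_bound;
    if (offset > range)
        return -1;
    return (int)offset;
}
```

With a lower bound of 3 and a range of 4, the table covers indices 3 through 7: index 5 selects element 2, while indices 2 and 8 are out of bounds.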
570 | `tablejump' |
---|
571 | Instruction to jump to a variable address. This is a low-level |
---|
572 | capability which can be used to implement a dispatch table when |
---|
573 | there is no `casesi' pattern. |
---|
574 | |
---|
575 | This pattern requires two operands: the address or offset, and a |
---|
576 | label which should immediately precede the jump table. If the |
---|
577 | macro `CASE_VECTOR_PC_RELATIVE' is defined then the first operand |
---|
578 | is an offset which counts from the address of the table; |
---|
579 | otherwise, it is an absolute address to jump to. In either case, |
---|
580 | the first operand has mode `Pmode'. |
---|
581 | |
---|
582 | The `tablejump' insn is always the last insn before the jump table |
---|
583 | it uses. Its assembler code normally has no need to use the |
---|
584 | second operand, but you should incorporate it in the RTL pattern so |
---|
585 | that the jump optimizer will not delete the table as unreachable |
---|
586 | code. |
---|
587 | |
---|
588 | `save_stack_block' |
---|
589 | `save_stack_function' |
---|
590 | `save_stack_nonlocal' |
---|
591 | `restore_stack_block' |
---|
592 | `restore_stack_function' |
---|
593 | `restore_stack_nonlocal' |
---|
594 | Most machines save and restore the stack pointer by copying it to |
---|
595 | or from an object of mode `Pmode'. Do not define these patterns on |
---|
596 | such machines. |
---|
597 | |
---|
598 | Some machines require special handling for stack pointer saves and |
---|
599 | restores. On those machines, define the patterns corresponding to |
---|
600 | the non-standard cases by using a `define_expand' (*note Expander |
---|
601 | Definitions::.) that produces the required insns. The three types |
---|
602 | of saves and restores are: |
---|
603 | |
---|
604 | 1. `save_stack_block' saves the stack pointer at the start of a |
---|
605 | block that allocates a variable-sized object, and |
---|
606 | `restore_stack_block' restores the stack pointer when the |
---|
607 | block is exited. |
---|
608 | |
---|
609 | 2. `save_stack_function' and `restore_stack_function' do a |
---|
610 | similar job for the outermost block of a function and are |
---|
611 | used when the function allocates variable-sized objects or |
---|
612 | calls `alloca'. Only the epilogue uses the restored stack |
---|
613 | pointer, allowing a simpler save or restore sequence on some |
---|
614 | machines. |
---|
615 | |
---|
616 | 3. `save_stack_nonlocal' is used in functions that contain labels |
---|
617 | branched to by nested functions. It saves the stack pointer |
---|
618 | in such a way that the inner function can use |
---|
619 | `restore_stack_nonlocal' to restore the stack pointer. The |
---|
620 | compiler generates code to restore the frame and argument |
---|
621 | pointer registers, but some machines require saving and |
---|
622 | restoring additional data such as register window information |
---|
623 | or stack backchains. Place insns in these patterns to save |
---|
624 | and restore any such required data. |
---|
625 | |
---|
626 | When saving the stack pointer, operand 0 is the save area and |
---|
627 | operand 1 is the stack pointer. The mode used to allocate the |
---|
628 | save area is the mode of operand 0. You must specify an integral |
---|
629 | mode, or `VOIDmode' if no save area is needed for a particular |
---|
630 | type of save (either because no save is needed or because a |
---|
631 | machine-specific save area can be used). For restore operations, |
---|
632 | operand 0 is the stack pointer and operand 1 is the save area. If |
---|
633 | `save_stack_block' is defined, operand 0 must not be `VOIDmode' |
---|
634 | since these saves can be arbitrarily nested. |
---|
635 | |
---|
636 | A save area is a `mem' that is at a constant offset from |
---|
637 | `virtual_stack_vars_rtx' when the stack pointer is saved for use by |
---|
638 | nonlocal gotos and a `reg' in the other two cases. |
---|
639 | |
---|
640 | `allocate_stack' |
---|
641 | Subtract (or add if `STACK_GROWS_DOWNWARD' is undefined) operand 0 |
---|
642 | from the stack pointer to create space for dynamically allocated |
---|
643 | data. |
---|
644 | |
---|
645 | Do not define this pattern if all that must be done is the |
---|
646 | subtraction. Some machines require other operations such as stack |
---|
647 | probes or maintaining the back chain. Define this pattern to emit |
---|
648 | those operations in addition to updating the stack pointer. |
---|
649 | |
---|
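As a rough illustration of the `casesi' and `tablejump' entries in the table above, the following C sketch models a dispatch table. It is a toy model under stated assumptions, not GNU CC code: the names `toy_table_length', `toy_table_index' and `toy_jump_target' are hypothetical, addresses are plain integers, and the `pc_relative' argument stands in for whether `CASE_VECTOR_PC_RELATIVE' is defined.

```c
#include <assert.h>
#include <stdint.h>

/* Number of table elements: one plus the difference between the upper
   bound and the lower bound.  Assumes the range has fewer than 2^32
   elements.  */
uint32_t toy_table_length(int32_t lower, int32_t upper)
{
    assert(lower <= upper);
    return (uint32_t) upper - (uint32_t) lower + 1u;
}

/* Index into the table for VALUE, or -1 when VALUE lies outside the
   bounds and control should go to the default label instead.  */
int64_t toy_table_index(int32_t value, int32_t lower, int32_t upper)
{
    assert(lower <= upper);
    if (value < lower || value > upper)
        return -1;
    return (int64_t) value - (int64_t) lower;
}

/* Address the jump goes to, given the entry read from the table.
   With PC_RELATIVE nonzero the entry is an offset counted from the
   address of the table; otherwise it is an absolute address.  */
uint32_t toy_jump_target(uint32_t table_address, uint32_t entry,
                         int pc_relative)
{
    return pc_relative ? table_address + entry : entry;
}
```

For bounds 3 and 7 the table holds five entries, and the value 5 selects the entry at index 2.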
650 | |
---|
651 | File: gcc.info, Node: Pattern Ordering, Next: Dependent Patterns, Prev: Standard Names, Up: Machine Desc |
---|
652 | |
---|
653 | When the Order of Patterns Matters |
---|
654 | ================================== |
---|
655 | |
---|
656 | Sometimes an insn can match more than one instruction pattern. Then |
---|
657 | the pattern that appears first in the machine description is the one |
---|
658 | used. Therefore, more specific patterns (patterns that will match |
---|
659 | fewer things) and faster instructions (those that will produce better |
---|
660 | code when they do match) should usually go first in the description. |
---|
661 | |
---|
662 | In some cases the effect of ordering the patterns can be used to hide |
---|
663 | a pattern when it is not valid. For example, the 68000 has an |
---|
664 | instruction for converting a fullword to floating point and another for |
---|
665 | converting a byte to floating point. An instruction converting an |
---|
666 | integer to floating point could match either one. We put the pattern |
---|
667 | to convert the fullword first to make sure that one will be used rather |
---|
668 | than the other. (Otherwise a large integer might be generated as a |
---|
669 | single-byte immediate quantity, which would not work.) Instead of using |
---|
670 | this pattern ordering it would be possible to make the pattern for |
---|
671 | convert-a-byte smart enough to deal properly with any constant value. |
---|
672 | |
---|
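The first-match rule can be sketched in C as follows. This is only a toy model: a "pattern" here is a hypothetical name paired with a predicate on a constant, whereas real patterns are RTL templates in the machine description. The names `toy_pattern', `toy_first_match', `float_fullword' and `float_byte' are invented for the illustration.

```c
#include <assert.h>
#include <stddef.h>

/* Toy stand-in for an instruction pattern.  */
struct toy_pattern {
    const char *name;
    int (*matches)(long value);   /* nonzero if the pattern accepts VALUE */
};

/* A constant integer carries no machine mode, so in this model both
   conversion patterns accept any constant.  */
static int toy_accepts_any_constant(long value)
{
    (void) value;
    return 1;
}

/* Two possible orderings of the same pair of patterns.  */
const struct toy_pattern toy_fullword_first[2] = {
    { "float_fullword", toy_accepts_any_constant },
    { "float_byte",     toy_accepts_any_constant },
};

const struct toy_pattern toy_byte_first[2] = {
    { "float_byte",     toy_accepts_any_constant },
    { "float_fullword", toy_accepts_any_constant },
};

/* Return the name of the first pattern in PATTERNS that matches VALUE,
   or NULL if none matches.  Later patterns are never consulted once an
   earlier one has matched.  */
const char *toy_first_match(const struct toy_pattern *patterns,
                            size_t count, long value)
{
    assert(patterns != NULL);
    for (size_t i = 0; i < count; i++)
        if (patterns[i].matches(value))
            return patterns[i].name;
    return NULL;
}
```

With `toy_fullword_first' a large constant selects the fullword pattern; with `toy_byte_first' the same constant selects the byte pattern, which is the situation the ordering is meant to prevent.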
673 | |
---|
674 | File: gcc.info, Node: Dependent Patterns, Next: Jump Patterns, Prev: Pattern Ordering, Up: Machine Desc |
---|
675 | |
---|
676 | Interdependence of Patterns |
---|
677 | =========================== |
---|
678 | |
---|
679 | Every machine description must have a named pattern for each of the |
---|
680 | conditional branch names `bCOND'. The recognition template must always |
---|
681 | have the form |
---|
682 | |
---|
683 | (set (pc) |
---|
684 | (if_then_else (COND (cc0) (const_int 0)) |
---|
685 | (label_ref (match_operand 0 "" "")) |
---|
686 | (pc))) |
---|
687 | |
---|
688 | In addition, every machine description must have an anonymous pattern |
---|
689 | for each of the possible reverse-conditional branches. Their templates |
---|
690 | look like |
---|
691 | |
---|
692 | (set (pc) |
---|
693 | (if_then_else (COND (cc0) (const_int 0)) |
---|
694 | (pc) |
---|
695 | (label_ref (match_operand 0 "" "")))) |
---|
696 | |
---|
697 | They are necessary because jump optimization can turn direct-conditional |
---|
698 | branches into reverse-conditional branches. |
---|
699 | |
---|
700 | It is often convenient to use the `match_operator' construct to |
---|
701 | reduce the number of patterns that must be specified for branches. For |
---|
702 | example, |
---|
703 | |
---|
704 | (define_insn "" |
---|
705 | [(set (pc) |
---|
706 | (if_then_else (match_operator 0 "comparison_operator" |
---|
707 | [(cc0) (const_int 0)]) |
---|
708 | (pc) |
---|
709 | (label_ref (match_operand 1 "" ""))))] |
---|
710 | "CONDITION" |
---|
711 | "...") |
---|
712 | |
---|
713 | In some cases machines support instructions identical except for the |
---|
714 | machine mode of one or more operands. For example, there may be |
---|
715 | "sign-extend halfword" and "sign-extend byte" instructions whose |
---|
716 | patterns are |
---|
717 | |
---|
718 | (set (match_operand:SI 0 ...) |
---|
719 | (extend:SI (match_operand:HI 1 ...))) |
---|
720 | |
---|
721 | (set (match_operand:SI 0 ...) |
---|
722 | (extend:SI (match_operand:QI 1 ...))) |
---|
723 | |
---|
724 | Constant integers do not specify a machine mode, so an instruction to |
---|
725 | extend a constant value could match either pattern. The pattern it |
---|
726 | actually will match is the one that appears first in the file. For |
---|
727 | correct results, this must be the one for the widest possible mode |
---|
728 | (`HImode', here). If the pattern matches the `QImode' instruction, the |
---|
729 | results will be incorrect if the constant value does not actually fit |
---|
730 | that mode. |
---|
731 | |
---|
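To see why the widest mode must come first, consider what each extension does to a constant that does not fit in a byte. The C sketch below is a minimal illustration, not compiler code; `toy_sign_extend' is a hypothetical helper that sign-extends the low BITS bits of a value, with 8 standing in for `QImode' and 16 for `HImode'.

```c
#include <assert.h>
#include <stdint.h>

/* Sign-extend the low BITS bits of VALUE to 32 bits.
   Only widths from 1 to 16 are handled in this sketch.  */
int32_t toy_sign_extend(uint32_t value, unsigned bits)
{
    assert(bits >= 1 && bits <= 16);
    uint32_t sign = (uint32_t) 1 << (bits - 1);   /* sign bit of the narrow mode */
    uint32_t low = value & ((sign << 1) - 1u);    /* keep only the low BITS bits */
    /* Flipping the sign bit and subtracting it back out propagates the
       sign without relying on implementation-defined conversions.  */
    return (int32_t) (low ^ sign) - (int32_t) sign;
}
```

The constant 300 survives a 16-bit extension unchanged but becomes 44 under an 8-bit extension, so matching the byte pattern would silently change the value. A constant such as 100, which fits in either mode, gives the same result both ways.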
732 | Such instructions to extend constants are rarely generated because |
---|
733 | they are optimized away, but they do occasionally happen in nonoptimized |
---|
734 | compilations. |
---|
735 | |
---|
736 | If a constraint in a pattern allows a constant, the reload pass may |
---|
737 | replace a register with a constant permitted by the constraint in some |
---|
738 | cases. Similarly for memory references. Because of this substitution, |
---|
739 | you should not provide separate patterns for increment and decrement |
---|
740 | instructions. Instead, they should be generated from the same pattern |
---|
741 | that supports register-register add insns by examining the operands and |
---|
742 | generating the appropriate machine instruction. |
---|
743 | |
---|
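The idea of choosing the machine instruction by examining the operands of a single add pattern can be sketched in C. This is a simplified model under stated assumptions: `toy_operand', `toy_add_insn' and `toy_choose_add' are hypothetical names, and a real output routine would inspect RTL operands rather than this small structure.

```c
#include <assert.h>
#include <stddef.h>

/* Which machine instruction to emit for an add.  */
enum toy_add_insn { TOY_ADD, TOY_INC, TOY_DEC };

/* Toy stand-in for the second source operand of an add insn.  */
struct toy_operand {
    int is_constant;   /* nonzero for a constant, zero for a register */
    long value;        /* meaningful only when is_constant is nonzero */
};

/* Pick the instruction for "destination = destination + ADDEND".
   A constant 1 or -1 selects increment or decrement; everything else,
   including a register addend, uses the general add.  */
enum toy_add_insn toy_choose_add(const struct toy_operand *addend)
{
    assert(addend != NULL);
    if (addend->is_constant && addend->value == 1)
        return TOY_INC;
    if (addend->is_constant && addend->value == -1)
        return TOY_DEC;
    return TOY_ADD;
}
```

Because the decision is made when the instruction is output, it still works if reload replaces a register addend with the constant 1 after the pattern has been matched.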
744 | |
---|
745 | File: gcc.info, Node: Jump Patterns, Next: Insn Canonicalizations, Prev: Dependent Patterns, Up: Machine Desc |
---|
746 | |
---|
747 | Defining Jump Instruction Patterns |
---|
748 | ================================== |
---|
749 | |
---|
750 | For most machines, GNU CC assumes that the machine has a condition |
---|
751 | code. A comparison insn sets the condition code, recording the results |
---|
752 | of both signed and unsigned comparison of the given operands. A |
---|
753 | separate branch insn tests the condition code and branches or not |
---|
754 | according to its value. The branch insns come in distinct signed and |
---|
755 | unsigned flavors. Many common machines, such as the Vax, the 68000 and |
---|
756 | the 32000, work this way. |
---|
757 | |
---|
758 | Some machines have distinct signed and unsigned compare |
---|
759 | instructions, and only one set of conditional branch instructions. The |
---|
760 | easiest way to handle these machines is to treat them just like the |
---|
761 | others until the final stage where assembly code is written. At this |
---|
762 | time, when outputting code for the compare instruction, peek ahead at |
---|
763 | the following branch using `next_cc0_user (insn)'. (The variable |
---|
764 | `insn' refers to the insn being output, in the output-writing code in |
---|
765 | an instruction pattern.) If the RTL says that it is an unsigned branch, |
---|
766 | output an unsigned compare; otherwise output a signed compare. When |
---|
767 | the branch itself is output, you can treat signed and unsigned branches |
---|
768 | identically. |
---|
769 | |
---|
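The peek-ahead decision can be modeled in a few lines of C. This is a sketch under stated assumptions rather than code from any machine description: the `toy_cond' enumeration mirrors the usual signed and unsigned comparison codes, and the mnemonics `cmp' and `cmpu' are hypothetical. In a real port the condition would be read from the branch insn found with `next_cc0_user (insn)'.

```c
#include <assert.h>

/* Toy comparison codes; the ...U codes are the unsigned variants.  */
enum toy_cond {
    TOY_EQ, TOY_NE,
    TOY_GT, TOY_GE, TOY_LT, TOY_LE,
    TOY_GTU, TOY_GEU, TOY_LTU, TOY_LEU
};

/* Nonzero if COND is an unsigned ordering test.  */
int toy_is_unsigned_cond(enum toy_cond cond)
{
    assert(cond <= TOY_LEU);
    switch (cond) {
    case TOY_GTU:
    case TOY_GEU:
    case TOY_LTU:
    case TOY_LEU:
        return 1;
    default:
        return 0;
    }
}

/* Choose the compare mnemonic from the condition tested by the branch
   that follows the compare.  */
const char *toy_compare_mnemonic(enum toy_cond following_branch)
{
    return toy_is_unsigned_cond(following_branch) ? "cmpu" : "cmp";
}
```

Equality tests fall through to the signed compare here, which is harmless because equality does not depend on signedness.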
770 | The reason you can do this is that GNU CC always generates a pair of |
---|
771 | consecutive RTL insns, possibly separated by `note' insns, one to set |
---|
772 | the condition code and one to test it, and keeps the pair inviolate |
---|
773 | until the end. |
---|
774 | |
---|
775 | To go with this technique, you must define the machine-description |
---|
776 | macro `NOTICE_UPDATE_CC' to do `CC_STATUS_INIT'; in other words, no |
---|
777 | compare instruction is superfluous. |
---|
778 | |
---|
779 | Some machines have compare-and-branch instructions and no condition |
---|
780 | code. A similar technique works for them. When it is time to "output" |
---|
781 | a compare instruction, record its operands in two static variables. |
---|
782 | When outputting the branch-on-condition-code instruction that follows, |
---|
783 | actually output a compare-and-branch instruction that uses the |
---|
784 | remembered operands. |
---|
785 | |
---|
786 | It also works to define patterns for compare-and-branch instructions. |
---|
787 | In optimizing compilation, the pair of compare and branch instructions |
---|
788 | will be combined according to these patterns. But this does not happen |
---|
789 | if optimization is not requested. So you must use one of the solutions |
---|
790 | above in addition to any special patterns you define. |
---|
791 | |
---|
792 | In many RISC machines, most instructions do not affect the condition |
---|
793 | code and there may not even be a separate condition code register. On |
---|
794 | these machines, the restriction that the definition and use of the |
---|
795 | condition code be adjacent insns is not necessary and can prevent |
---|
796 | important optimizations. For example, on the IBM RS/6000, there is a |
---|
797 | delay for taken branches unless the condition code register is set three |
---|
798 | instructions earlier than the conditional branch. The instruction |
---|
799 | scheduler cannot perform this optimization if it is not permitted to |
---|
800 | separate the definition and use of the condition code register. |
---|
801 | |
---|
802 | On these machines, do not use `(cc0)', but instead use a register to |
---|
803 | represent the condition code. If there is a specific condition code |
---|
804 | register in the machine, use a hard register. If the condition code or |
---|
805 | comparison result can be placed in any general register, or if there are |
---|
806 | multiple condition registers, use a pseudo register. |
---|
807 | |
---|
808 | On some machines, the type of branch instruction generated may |
---|
809 | depend on the way the condition code was produced; for example, on the |
---|
810 | 68k and Sparc, setting the condition code directly from an add or |
---|
811 | subtract instruction does not clear the overflow bit the way that a test |
---|
812 | instruction does, so a different branch instruction must be used for |
---|
813 | some conditional branches. For machines that use `(cc0)', the set and |
---|
814 | use of the condition code must be adjacent (separated only by `note' |
---|
815 | insns) allowing flags in `cc_status' to be used. (*Note Condition |
---|
816 | Code::.) Also, the comparison and branch insns can be located from |
---|
817 | each other by using the functions `prev_cc0_setter' and `next_cc0_user'. |
---|
818 | |
---|
819 | However, this is not true on machines that do not use `(cc0)'. On |
---|
820 | those machines, no assumptions can be made about the adjacency of the |
---|
821 | compare and branch insns and the above methods cannot be used. Instead, |
---|
822 | we use the machine mode of the condition code register to record |
---|
823 | different formats of the condition code register. |
---|
824 | |
---|
825 | Registers used to store the condition code value should have a mode |
---|
826 | that is in class `MODE_CC'. Normally, it will be `CCmode'. If |
---|
827 | additional modes are required (as for the add example mentioned above in |
---|
828 | the Sparc), define the macro `EXTRA_CC_MODES' to list the additional |
---|
829 | modes required (*note Condition Code::.). Also define `EXTRA_CC_NAMES' |
---|
830 | to list the names of those modes and `SELECT_CC_MODE' to choose a mode |
---|
831 | given an operand of a compare. |
---|
832 | |
---|
833 | If it is known during RTL generation that a different mode will be |
---|
834 | required (for example, if the machine has separate compare instructions |
---|
835 | for signed and unsigned quantities, like most IBM processors), it can |
---|
836 | be specified at that time. |
---|
837 | |
---|
838 | If the cases that require different modes would be made by |
---|
839 | instruction combination, the macro `SELECT_CC_MODE' determines which |
---|
840 | machine mode should be used for the comparison result. The patterns |
---|
841 | should be written using that mode. To support the case of the add on |
---|
842 | the Sparc discussed above, we have the pattern |
---|
843 | |
---|
844 | (define_insn "" |
---|
845 | [(set (reg:CC_NOOV 0) |
---|
846 | (compare:CC_NOOV |
---|
847 | (plus:SI (match_operand:SI 0 "register_operand" "%r") |
---|
848 | (match_operand:SI 1 "arith_operand" "rI")) |
---|
849 | (const_int 0)))] |
---|
850 | "" |
---|
851 | "...") |
---|
852 | |
---|
853 | The `SELECT_CC_MODE' macro on the Sparc returns `CC_NOOVmode' for |
---|
854 | comparisons whose argument is a `plus'. |
---|
855 | |
---|
856 | |
---|
857 | File: gcc.info, Node: Insn Canonicalizations, Next: Peephole Definitions, Prev: Jump Patterns, Up: Machine Desc |
---|
858 | |
---|
859 | Canonicalization of Instructions |
---|
860 | ================================ |
---|
861 | |
---|
862 | There are often cases where multiple RTL expressions could represent |
---|
863 | an operation performed by a single machine instruction. This situation |
---|
864 | is most commonly encountered with logical, branch, and |
---|
865 | multiply-accumulate instructions. In such cases, the compiler attempts |
---|
866 | to convert these multiple RTL expressions into a single canonical form |
---|
867 | to reduce the number of insn patterns required. |
---|
868 | |
---|
869 | In addition to algebraic simplifications, the following canonicalizations |
---|
870 | are performed: |
---|
871 | |
---|
872 | * For commutative and comparison operators, a constant is always |
---|
873 | made the second operand. If a machine only supports a constant as |
---|
874 | the second operand, only patterns that match a constant in the |
---|
875 | second operand need be supplied. |
---|
876 | |
---|
877 | For these operators, if only one operand is a `neg', `not', |
---|
878 | `mult', `plus', or `minus' expression, it will be the first |
---|
879 | operand. |
---|
880 | |
---|
881 | * For the `compare' operator, a constant is always the second operand |
---|
882 | on machines where `cc0' is used (*note Jump Patterns::.). On other |
---|
883 | machines, there are rare cases where the compiler might want to |
---|
884 | construct a `compare' with a constant as the first operand. |
---|
885 | However, these cases are not common enough for it to be worthwhile |
---|
886 | to provide a pattern matching a constant as the first operand |
---|
887 | unless the machine actually has such an instruction. |
---|
888 | |
---|
889 | An operand of `neg', `not', `mult', `plus', or `minus' is made the |
---|
890 | first operand under the same conditions as above. |
---|
891 | |
---|
892 | * `(minus X (const_int N))' is converted to `(plus X (const_int |
---|
893 | -N))'. |
---|
894 | |
---|
895 | * Within address computations (i.e., inside `mem'), a left shift is |
---|
896 | converted into the appropriate multiplication by a power of two. |
---|
897 | |
---|
898 | * De Morgan's Law is used to move bitwise negation inside a bitwise |
---|
899 | logical-and or logical-or operation. If this results in only one |
---|
900 | operand being a `not' expression, it will be the first one. |
---|
901 | |
---|
902 | A machine that has an instruction that performs a bitwise |
---|
903 | logical-and of one operand with the bitwise negation of the other |
---|
904 | should specify the pattern for that instruction as |
---|
905 | |
---|
906 | (define_insn "" |
---|
907 | [(set (match_operand:M 0 ...) |
---|
908 | (and:M (not:M (match_operand:M 1 ...)) |
---|
909 | (match_operand:M 2 ...)))] |
---|
910 | "..." |
---|
911 | "...") |
---|
912 | |
---|
913 | Similarly, a pattern for a "NAND" instruction should be written |
---|
914 | |
---|
915 | (define_insn "" |
---|
916 | [(set (match_operand:M 0 ...) |
---|
917 | (ior:M (not:M (match_operand:M 1 ...)) |
---|
918 | (not:M (match_operand:M 2 ...))))] |
---|
919 | "..." |
---|
920 | "...") |
---|
921 | |
---|
922 | In both cases, it is not necessary to include patterns for the many |
---|
923 | logically equivalent RTL expressions. |
---|
924 | |
---|
925 | * The only possible RTL expressions involving both bitwise |
---|
926 | exclusive-or and bitwise negation are `(xor:M X Y)' and `(not:M |
---|
927 | (xor:M X Y))'. |
---|
928 | |
---|
929 | * The sum of three items, one of which is a constant, will only |
---|
930 | appear in the form |
---|
931 | |
---|
932 | (plus:M (plus:M X Y) CONSTANT) |
---|
933 | |
---|
934 | * On machines that do not use `cc0', `(compare X (const_int 0))' |
---|
935 | will be converted to X. |
---|
936 | |
---|
937 | * Equality comparisons of a group of bits (usually a single bit) |
---|
938 | with zero will be written using `zero_extract' rather than the |
---|
939 | equivalent `and' or `sign_extract' operations. |
---|
940 | |
---|
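Several of the canonicalizations listed above rest on simple arithmetic identities, which the following C sketch spells out on 32-bit unsigned values. It is an illustration only: the function names are hypothetical, each pair computes the same value in the non-canonical and the canonical form, and nothing here is taken from the compiler's own simplification code.

```c
#include <assert.h>
#include <stdint.h>

/* (minus X (const_int N)) and its canonical form
   (plus X (const_int -N)).  Unsigned arithmetic wraps, so the two
   agree for every X and N.  */
uint32_t toy_minus_form(uint32_t x, uint32_t n) { return x - n; }
uint32_t toy_plus_form(uint32_t x, uint32_t n)  { return x + (0u - n); }

/* A left shift by K and the multiplication by a power of two that
   replaces it inside an address computation.  */
uint32_t toy_shift_form(uint32_t x, unsigned k)
{
    assert(k < 32);
    return x << k;
}

uint32_t toy_mult_form(uint32_t x, unsigned k)
{
    assert(k < 32);
    return x * ((uint32_t) 1 << k);
}

/* De Morgan's Law: negating an AND equals the OR of the negations,
   which is the form a "NAND" pattern is written in.  */
uint32_t toy_not_and(uint32_t a, uint32_t b)     { return ~(a & b); }
uint32_t toy_ior_of_nots(uint32_t a, uint32_t b) { return ~a | ~b; }
```

Because each pair always agrees, a machine description needs a pattern only for the canonical member of the pair.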
941 | |
---|
942 | File: gcc.info, Node: Peephole Definitions, Next: Expander Definitions, Prev: Insn Canonicalizations, Up: Machine Desc |
---|
943 | |
---|
944 | Machine-Specific Peephole Optimizers |
---|
945 | ==================================== |
---|
946 | |
---|
947 | In addition to instruction patterns the `md' file may contain |
---|
948 | definitions of machine-specific peephole optimizations. |
---|
949 | |
---|
950 | The combiner does not notice certain peephole optimizations when the |
---|
951 | data flow in the program does not suggest that it should try them. For |
---|
952 | example, sometimes two consecutive insns related in purpose can be |
---|
953 | combined even though the second one does not appear to use a register |
---|
954 | computed in the first one. A machine-specific peephole optimizer can |
---|
955 | detect such opportunities. |
---|
956 | |
---|
957 | A definition looks like this: |
---|
958 | |
---|
959 | (define_peephole |
---|
960 | [INSN-PATTERN-1 |
---|
961 | INSN-PATTERN-2 |
---|
962 | ...] |
---|
963 | "CONDITION" |
---|
964 | "TEMPLATE" |
---|
965 | "OPTIONAL INSN-ATTRIBUTES") |
---|
966 | |
---|
967 | The last string operand may be omitted if you are not using any |
---|
968 | machine-specific information in this machine description. If present, |
---|
969 | it must obey the same rules as in a `define_insn'. |
---|
970 | |
---|
971 | In this skeleton, INSN-PATTERN-1 and so on are patterns to match |
---|
972 | consecutive insns. The optimization applies to a sequence of insns when |
---|
973 | INSN-PATTERN-1 matches the first one, INSN-PATTERN-2 matches the next, |
---|
974 | and so on. |
---|
975 | |
---|
976 | Each of the insns matched by a peephole must also match a |
---|
977 | `define_insn'. Peepholes are checked only at the last stage just |
---|
978 | before code generation, and only optionally. Therefore, any insn which |
---|
979 | would match a peephole but no `define_insn' will cause a crash in code |
---|
980 | generation in an unoptimized compilation, or at various optimization |
---|
981 | stages. |
---|
982 | |
---|
983 | The operands of the insns are matched with `match_operand', |
---|
984 | `match_operator', and `match_dup', as usual. What is not usual is that |
---|
985 | the operand numbers apply to all the insn patterns in the definition. |
---|
986 | So, you can check for identical operands in two insns by using |
---|
987 | `match_operand' in one insn and `match_dup' in the other. |
---|
988 | |
---|
989 | The operand constraints used in `match_operand' patterns do not have |
---|
990 | any direct effect on the applicability of the peephole, but they will |
---|
991 | be validated afterward, so make sure your constraints are general enough |
---|
992 | to apply whenever the peephole matches. If the peephole matches but |
---|
993 | the constraints are not satisfied, the compiler will crash. |
---|
994 | |
---|
995 | It is safe to omit constraints in all the operands of the peephole; |
---|
996 | or you can write constraints which serve as a double-check on the |
---|
997 | criteria previously tested. |
---|
998 | |
---|
999 | Once a sequence of insns matches the patterns, the CONDITION is |
---|
1000 | checked. This is a C expression which makes the final decision whether |
---|
1001 | to perform the optimization (we do so if the expression is nonzero). If |
---|
1002 | CONDITION is omitted (in other words, the string is empty) then the |
---|
1003 | optimization is applied to every sequence of insns that matches the |
---|
1004 | patterns. |
---|
1005 | |
---|
1006 | The defined peephole optimizations are applied after register |
---|
1007 | allocation is complete. Therefore, the peephole definition can check |
---|
1008 | which operands have ended up in which kinds of registers, just by |
---|
1009 | looking at the operands. |
---|
1010 | |
---|
1011 | The way to refer to the operands in CONDITION is to write |
---|
1012 | `operands[I]' for operand number I (as matched by `(match_operand I |
---|
1013 | ...)'). Use the variable `insn' to refer to the last of the insns |
---|
1014 | being matched; use `prev_active_insn' to find the preceding insns. |
---|
1015 | |
---|
1016 | When optimizing computations with intermediate results, you can use |
---|
1017 | CONDITION to match only when the intermediate results are not used |
---|
1018 | elsewhere. Use the C expression `dead_or_set_p (INSN, OP)', where INSN |
---|
1019 | is the insn in which you expect the value to be used for the last time |
---|
1020 | (from the value of `insn', together with use of `prev_nonnote_insn'), |
---|
1021 | and OP is the intermediate value (from `operands[I]'). |
---|
1022 | |
---|
1023 | Applying the optimization means replacing the sequence of insns with |
---|
1024 | one new insn. The TEMPLATE controls ultimate output of assembler code |
---|
1025 | for this combined insn. It works exactly like the template of a |
---|
1026 | `define_insn'. Operand numbers in this template are the same ones used |
---|
1027 | in matching the original sequence of insns. |
---|
1028 | |
---|
1029 | The result of a defined peephole optimizer does not need to match |
---|
1030 | any of the insn patterns in the machine description; it does not even |
---|
1031 | have an opportunity to match them. The peephole optimizer definition |
---|
1032 | itself serves as the insn pattern to control how the insn is output. |
---|
1033 | |
---|
1034 | Defined peephole optimizers are run as assembler code is being |
---|
1035 | output, so the insns they produce are never combined or rearranged in |
---|
1036 | any way. |
---|
1037 | |
---|
1038 | Here is an example, taken from the 68000 machine description: |
---|
1039 | |
---|
1040 | (define_peephole |
---|
1041 | [(set (reg:SI 15) (plus:SI (reg:SI 15) (const_int 4))) |
---|
1042 | (set (match_operand:DF 0 "register_operand" "=f") |
---|
1043 | (match_operand:DF 1 "register_operand" "ad"))] |
---|
1044 | "FP_REG_P (operands[0]) && ! FP_REG_P (operands[1])" |
---|
1045 | "* |
---|
1046 | { |
---|
1047 | rtx xoperands[2]; |
---|
1048 | xoperands[1] = gen_rtx (REG, SImode, REGNO (operands[1]) + 1); |
---|
1049 | #ifdef MOTOROLA |
---|
1050 | output_asm_insn (\"move.l %1,(sp)\", xoperands); |
---|
1051 | output_asm_insn (\"move.l %1,-(sp)\", operands); |
---|
1052 | return \"fmove.d (sp)+,%0\"; |
---|
1053 | #else |
---|
1054 | output_asm_insn (\"movel %1,sp@\", xoperands); |
---|
1055 | output_asm_insn (\"movel %1,sp@-\", operands); |
---|
1056 | return \"fmoved sp@+,%0\"; |
---|
1057 | #endif |
---|
1058 | } |
---|
1059 | ") |
---|
1060 | |
---|
1061 | The effect of this optimization is to change |
---|
1062 | |
---|
1063 | jbsr _foobar |
---|
1064 | addql #4,sp |
---|
1065 | movel d1,sp@- |
---|
1066 | movel d0,sp@- |
---|
1067 | fmoved sp@+,fp0 |
---|
1068 | |
---|
1069 | into |
---|
1070 | |
---|
1071 | jbsr _foobar |
---|
1072 | movel d1,sp@ |
---|
1073 | movel d0,sp@- |
---|
1074 | fmoved sp@+,fp0 |
---|
1075 | |
---|
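The before-and-after listings can be reproduced with a toy peephole pass written in C. This is only an analogue under loud assumptions: it works on lines of assembler text, while a real `define_peephole' matches RTL insns and emits its replacement through the output template. The names `toy_peephole', `toy_is_push' and `TOY_INSN_LEN' are hypothetical. The pass replaces `addql #4,sp' followed by a push of the form `movel SRC,sp@-' with the single store `movel SRC,sp@'.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define TOY_INSN_LEN 32   /* room for one line of assembler text */

/* Nonzero if INSN looks like "movel SRC,sp@-" with a nonempty SRC.  */
static int toy_is_push(const char *insn)
{
    size_t len = strlen(insn);
    return len > 11
           && strncmp(insn, "movel ", 6) == 0
           && strcmp(insn + len - 5, ",sp@-") == 0;
}

/* Rewrite INSNS in place and return the new number of lines.  */
size_t toy_peephole(char insns[][TOY_INSN_LEN], size_t count)
{
    size_t out = 0;
    assert(insns != NULL);
    for (size_t i = 0; i < count; i++) {
        if (i + 1 < count
            && strcmp(insns[i], "addql #4,sp") == 0
            && toy_is_push(insns[i + 1])) {
            /* Keep the push but drop its trailing '-', so it stores to
               the current top of stack instead of pre-decrementing.  */
            size_t len = strlen(insns[i + 1]);
            memmove(insns[out], insns[i + 1], len - 1);
            insns[out][len - 1] = '\0';
            out++;
            i++;   /* the push has been consumed as well */
        } else {
            if (out != i)
                memmove(insns[out], insns[i], strlen(insns[i]) + 1);
            out++;
        }
    }
    return out;
}
```

Applied to the five-line sequence shown first, the pass returns four lines matching the second listing. Only the push that immediately follows the stack adjustment is changed; the later `movel d0,sp@-' is left alone.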
1076 | INSN-PATTERN-1 and so on look *almost* like the second operand of |
---|
1077 | `define_insn'. There is one important difference: the second operand |
---|
1078 | of `define_insn' consists of one or more RTX's enclosed in square |
---|
1079 | brackets. Usually, there is only one: then the same action can be |
---|
1080 | written as an element of a `define_peephole'. But when there are |
---|
1081 | multiple actions in a `define_insn', they are implicitly enclosed in a |
---|
1082 | `parallel'. Then you must explicitly write the `parallel', and the |
---|
1083 | square brackets within it, in the `define_peephole'. Thus, if an insn |
---|
1084 | pattern looks like this, |
---|
1085 | |
---|
1086 | (define_insn "divmodsi4" |
---|
1087 | [(set (match_operand:SI 0 "general_operand" "=d") |
---|
1088 | (div:SI (match_operand:SI 1 "general_operand" "0") |
---|
1089 | (match_operand:SI 2 "general_operand" "dmsK"))) |
---|
1090 | (set (match_operand:SI 3 "general_operand" "=d") |
---|
1091 | (mod:SI (match_dup 1) (match_dup 2)))] |
---|
1092 | "TARGET_68020" |
---|
1093 | "divsl%.l %2,%3:%0") |
---|
1094 | |
---|
1095 | then the way to mention this insn in a peephole is as follows: |
---|
1096 | |
---|
1097 | (define_peephole |
---|
1098 | [... |
---|
1099 | (parallel |
---|
1100 | [(set (match_operand:SI 0 "general_operand" "=d") |
---|
1101 | (div:SI (match_operand:SI 1 "general_operand" "0") |
---|
1102 | (match_operand:SI 2 "general_operand" "dmsK"))) |
---|
1103 | (set (match_operand:SI 3 "general_operand" "=d") |
---|
1104 | (mod:SI (match_dup 1) (match_dup 2)))]) |
---|
1105 | ...] |
---|
1106 | ...) |
---|
1107 | |
---|