1 | =head1 NAME |
---|
2 | |
---|
3 | perlguts - Introduction to the Perl API |
---|
4 | |
---|
5 | =head1 DESCRIPTION |
---|
6 | |
---|
7 | This document attempts to describe how to use the Perl API, as well as |
---|
8 | containing some info on the basic workings of the Perl core. It is far |
---|
9 | from complete and probably contains many errors. Please refer any |
---|
10 | questions or comments to the author below. |
---|
11 | |
---|
12 | =head1 Variables |
---|
13 | |
---|
14 | =head2 Datatypes |
---|
15 | |
---|
16 | Perl has three typedefs that handle Perl's three main data types: |
---|
17 | |
---|
18 | SV Scalar Value |
---|
19 | AV Array Value |
---|
20 | HV Hash Value |
---|
21 | |
---|
22 | Each typedef has specific routines that manipulate the various data types. |
---|
23 | |
---|
24 | =head2 What is an "IV"? |
---|
25 | |
---|
26 | Perl uses a special typedef IV which is a simple signed integer type that is |
---|
27 | guaranteed to be large enough to hold a pointer (as well as an integer). |
---|
28 | Additionally, there is the UV, which is simply an unsigned IV. |
---|
29 | |
---|
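The pointer guarantee can be illustrated in plain C. The sketch below does not use Perl's headers: C<toy_iv>, C<ptr_to_iv> and C<iv_to_ptr> are hypothetical names, and C<intptr_t> merely stands in for IV on the assumption that the platform provides it.

```c
#include <assert.h>
#include <stdint.h>

/* Stand-in for Perl's IV: a signed integer wide enough for a pointer.
   (Assumption: intptr_t exists on this platform.) */
typedef intptr_t toy_iv;

/* Store a pointer in the integer type. */
toy_iv ptr_to_iv(void *p)
{
    return (toy_iv)p;
}

/* Recover the pointer from the integer type. */
void *iv_to_ptr(toy_iv iv)
{
    assert(sizeof(toy_iv) >= sizeof(void *));
    return (void *)iv;
}
```

A pointer that makes the round trip through C<toy_iv> compares equal to the original, which is the property extension code relies on when it keeps a C pointer in the integer slot of a scalar.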
30 | Perl also uses two special typedefs, I32 and I16, which will always be at |
---|
31 | least 32 bits and 16 bits long, respectively. (Again, there are U32 and U16, |
---|
32 | as well.) |
---|
33 | |
---|
34 | =head2 Working with SVs |
---|
35 | |
---|
36 | An SV can be created and loaded with one command. There are four types of |
---|
37 | values that can be loaded: an integer value (IV), a double (NV), |
---|
38 | a string (PV), and another scalar (SV). |
---|
39 | |
---|
40 | The six routines are: |
---|
41 | |
---|
42 | SV* newSViv(IV); |
---|
43 | SV* newSVnv(double); |
---|
44 | SV* newSVpv(const char*, int); |
---|
45 | SV* newSVpvn(const char*, int); |
---|
46 | SV* newSVpvf(const char*, ...); |
---|
47 | SV* newSVsv(SV*); |
---|
48 | |
---|
49 | To change the value of an *already-existing* SV, there are eight routines: |
---|
50 | |
---|
51 | void sv_setiv(SV*, IV); |
---|
52 | void sv_setuv(SV*, UV); |
---|
53 | void sv_setnv(SV*, double); |
---|
54 | void sv_setpv(SV*, const char*); |
---|
55 | void sv_setpvn(SV*, const char*, int); |
---|
56 | void sv_setpvf(SV*, const char*, ...); |
---|
57 | void sv_setpvfn(SV*, const char*, STRLEN, va_list *, SV **, I32, bool *); |
---|
58 | void sv_setsv(SV*, SV*); |
---|
59 | |
---|
60 | Notice that you can choose to specify the length of the string to be |
---|
61 | assigned by using C<sv_setpvn>, C<newSVpvn>, or C<newSVpv>, or you may |
---|
62 | allow Perl to calculate the length by using C<sv_setpv> or by specifying |
---|
63 | 0 as the second argument to C<newSVpv>. Be warned, though, that Perl will |
---|
64 | determine the string's length by using C<strlen>, which depends on the |
---|
65 | string terminating with a NUL character. |
---|
66 | |
---|
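The difference between a measured length and a supplied length matters as soon as the data contains an embedded NUL. The following plain C sketch (no Perl headers; C<measured_length> is a hypothetical helper) shows what any C<strlen>-based routine would see.

```c
#include <assert.h>
#include <string.h>

/* Length as a strlen-based routine sees it: counting stops at the
   first NUL byte, wherever that happens to be. */
size_t measured_length(const char *buf)
{
    assert(buf != NULL);
    return strlen(buf);
}
```

For the five-byte buffer C<"ab\0cd">, C<measured_length> reports 2, so only a routine that takes an explicit length can carry all five bytes.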
67 | The arguments of C<sv_setpvf> are processed like C<sprintf>, and the |
---|
68 | formatted output becomes the value. |
---|
69 | |
---|
70 | C<sv_setpvfn> is an analogue of C<vsprintf>, but it allows you to specify |
---|
71 | either a pointer to a variable argument list or the address and length of |
---|
72 | an array of SVs. The last argument points to a boolean; on return, if that |
---|
73 | boolean is true, then locale-specific information has been used to format |
---|
74 | the string, and the string's contents are therefore untrustworthy (see |
---|
75 | L<perlsec>). This pointer may be NULL if that information is not |
---|
76 | important. Note that this function requires you to specify the length of |
---|
77 | the format. |
---|
78 | |
---|
79 | STRLEN is an integer type (Size_t, usually defined as size_t in |
---|
80 | config.h) guaranteed to be large enough to represent the size of |
---|
81 | any string that perl can handle. |
---|
82 | |
---|
83 | The C<sv_set*()> functions are not generic enough to operate on values |
---|
84 | that have "magic". See L<Magic Virtual Tables> later in this document. |
---|
85 | |
---|
86 | All SVs that contain strings should be terminated with a NUL character. |
---|
87 | If a string is not NUL-terminated, there is a risk of |
---|
88 | core dumps and corruptions from code which passes the string to C |
---|
89 | functions or system calls which expect a NUL-terminated string. |
---|
90 | Perl's own functions typically add a trailing NUL for this reason. |
---|
91 | Nevertheless, you should be very careful when you pass a string stored |
---|
92 | in an SV to a C function or system call. |
---|
93 | |
---|
94 | To access the actual value that an SV points to, you can use the macros: |
---|
95 | |
---|
96 | SvIV(SV*) |
---|
97 | SvUV(SV*) |
---|
98 | SvNV(SV*) |
---|
99 | SvPV(SV*, STRLEN len) |
---|
100 | SvPV_nolen(SV*) |
---|
101 | |
---|
102 | which will automatically coerce the actual scalar type into an IV, UV, double, |
---|
103 | or string. |
---|
104 | |
---|
105 | In the C<SvPV> macro, the length of the string returned is placed into the |
---|
106 | variable C<len> (this is a macro, so you do I<not> use C<&len>). If you do |
---|
107 | not care what the length of the data is, use the C<SvPV_nolen> macro. |
---|
108 | Historically the C<SvPV> macro with the global variable C<PL_na> has been |
---|
109 | used in this case. But that can be quite inefficient because C<PL_na> must |
---|
110 | be accessed in thread-local storage in threaded Perl. In any case, remember |
---|
111 | that Perl allows arbitrary strings of data that may both contain NULs and |
---|
112 | might not be terminated by a NUL. |
---|
113 | |
---|
114 | Also remember that C doesn't allow you to safely say C<foo(SvPV(s, len), |
---|
115 | len);>. It might work with your compiler, but it won't work for everyone. |
---|
116 | Break this sort of statement up into separate assignments: |
---|
117 | |
---|
118 | SV *s; |
---|
119 | STRLEN len; |
---|
120 | char * ptr; |
---|
121 | ptr = SvPV(s, len); |
---|
122 | foo(ptr, len); |
---|
123 | |
---|
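The reason is that C does not specify the order in which function arguments are evaluated, so C<len> may be read before the macro has stored into it. The sketch below reproduces the pattern without Perl's headers; C<TOY_PV>, C<count_bytes> and C<safe_call> are hypothetical stand-ins, not part of the Perl API.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for SvPV: yields the string and, as a side
   effect, stores its length into the variable named by `len`.
   Both arguments should be simple variables. */
#define TOY_PV(str, len) ((len) = strlen(str), (str))

/* Hypothetical consumer that trusts the length it is given. */
size_t count_bytes(const char *p, size_t len)
{
    assert(p != NULL);
    return len;
}

size_t safe_call(const char *s)
{
    size_t len = 0;
    const char *ptr;

    /* Unsafe: count_bytes(TOY_PV(s, len), len) writes len in one
       argument and reads it in another, with no defined order. */
    ptr = TOY_PV(s, len);          /* the store to len completes here */
    return count_bytes(ptr, len);  /* len is now safe to read */
}
```

Splitting the call into two statements puts a sequence point between the store and the read, which is what the separate assignments above achieve.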
124 | If you want to know if the scalar value is TRUE, you can use: |
---|
125 | |
---|
126 | SvTRUE(SV*) |
---|
127 | |
---|
128 | Although Perl will automatically grow strings for you, if you need to force |
---|
129 | Perl to allocate more memory for your SV, you can use the macro |
---|
130 | |
---|
131 | SvGROW(SV*, STRLEN newlen) |
---|
132 | |
---|
133 | which will determine if more memory needs to be allocated. If so, it will |
---|
134 | call the function C<sv_grow>. Note that C<SvGROW> can only increase, not |
---|
135 | decrease, the allocated memory of an SV and that it does not automatically |
---|
136 | add a byte for a trailing NUL (perl's own string functions typically do |
---|
137 | C<SvGROW(sv, len + 1)>). |
---|
138 | |
---|
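The grow-only behaviour can be pictured with a toy model in plain C. This is a sketch under stated assumptions, not Perl's implementation: C<toy_sv> and C<toy_grow> are hypothetical, and the real macro works on Perl's own SV structures and allocator.

```c
#include <assert.h>
#include <stdlib.h>

typedef struct {
    char  *pv;   /* string buffer */
    size_t cur;  /* bytes in use, excluding any trailing NUL */
    size_t len;  /* bytes allocated */
} toy_sv;

/* Ensure at least newlen bytes are allocated; never shrinks.
   Returns the (possibly moved) buffer, or NULL if realloc fails. */
char *toy_grow(toy_sv *sv, size_t newlen)
{
    assert(sv != NULL);
    if (newlen > sv->len) {
        char *p = realloc(sv->pv, newlen);
        if (p == NULL)
            return NULL;
        sv->pv  = p;
        sv->len = newlen;
    }
    return sv->pv;
}
```

A caller that wants room for C<n> bytes plus a terminator would ask for C<n + 1>, mirroring the C<len + 1> idiom mentioned above; asking for less than the current allocation leaves the buffer untouched.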
139 | If you have an SV and want to know what kind of data Perl thinks is stored |
---|
140 | in it, you can use the following macros to check the type of SV you have. |
---|
141 | |
---|
142 | SvIOK(SV*) |
---|
143 | SvNOK(SV*) |
---|
144 | SvPOK(SV*) |
---|
145 | |
---|
146 | You can get and set the current length of the string stored in an SV with |
---|
147 | the following macros: |
---|
148 | |
---|
149 | SvCUR(SV*) |
---|
150 | SvCUR_set(SV*, I32 val) |
---|
151 | |
---|
152 | You can also get a pointer to the end of the string stored in the SV |
---|
153 | with the macro: |
---|
154 | |
---|
155 | SvEND(SV*) |
---|
156 | |
---|
157 | But note that these last three macros are valid only if C<SvPOK()> is true. |
---|
158 | |
---|
159 | If you want to append something to the end of the string stored in an C<SV*>, |
---|
160 | you can use the following functions: |
---|
161 | |
---|
162 | void sv_catpv(SV*, const char*); |
---|
163 | void sv_catpvn(SV*, const char*, STRLEN); |
---|
164 | void sv_catpvf(SV*, const char*, ...); |
---|
165 | void sv_catpvfn(SV*, const char*, STRLEN, va_list *, SV **, I32, bool *); |
---|
166 | void sv_catsv(SV*, SV*); |
---|
167 | |
---|
168 | The first function calculates the length of the string to be appended by |
---|
169 | using C<strlen>. In the second, you specify the length of the string |
---|
170 | yourself. The third function processes its arguments like C<sprintf> and |
---|
171 | appends the formatted output. The fourth function works like C<vsprintf>. |
---|
172 | You can specify the address and length of an array of SVs instead of the |
---|
173 | va_list argument. The fifth function extends the string stored in the first |
---|
174 | SV with the string stored in the second SV. It also forces the second SV |
---|
175 | to be interpreted as a string. |
---|
176 | |
---|
177 | The C<sv_cat*()> functions are not generic enough to operate on values that |
---|
178 | have "magic". See L<Magic Virtual Tables> later in this document. |
---|
179 | |
---|
180 | If you know the name of a scalar variable, you can get a pointer to its SV |
---|
181 | by using the following: |
---|
182 | |
---|
183 | SV* get_sv("package::varname", FALSE); |
---|
184 | |
---|
185 | This returns NULL if the variable does not exist. |
---|
186 | |
---|
187 | If you want to know if this variable (or any other SV) is actually C<defined>, |
---|
188 | you can call: |
---|
189 | |
---|
190 | SvOK(SV*) |
---|
191 | |
---|
192 | The scalar C<undef> value is stored in an SV instance called C<PL_sv_undef>. Its |
---|
193 | address can be used whenever an C<SV*> is needed. |
---|
194 | |
---|
195 | There are also the two values C<PL_sv_yes> and C<PL_sv_no>, which contain Boolean |
---|
196 | TRUE and FALSE values, respectively. Like C<PL_sv_undef>, their addresses can |
---|
197 | be used whenever an C<SV*> is needed. |
---|
198 | |
---|
199 | Do not be fooled into thinking that C<(SV *) 0> is the same as C<&PL_sv_undef>. |
---|
200 | Take this code: |
---|
201 | |
---|
202 | SV* sv = (SV*) 0; |
---|
203 | if (I-am-to-return-a-real-value) { |
---|
204 | sv = sv_2mortal(newSViv(42)); |
---|
205 | } |
---|
206 | sv_setsv(ST(0), sv); |
---|
207 | |
---|
208 | This code tries to return a new SV (which contains the value 42) if it should |
---|
209 | return a real value, or undef otherwise. Instead it has returned a NULL |
---|
210 | pointer which, somewhere down the line, will cause a segmentation violation, |
---|
211 | bus error, or just weird results. Change the zero to C<&PL_sv_undef> in the first |
---|
212 | line and all will be well. |
---|
213 | |
---|
214 | To free an SV that you've created, call C<SvREFCNT_dec(SV*)>. Normally this |
---|
215 | call is not necessary (see L<Reference Counts and Mortality>). |
---|
216 | |
---|
217 | =head2 Offsets |
---|
218 | |
---|
219 | Perl provides the function C<sv_chop> to efficiently remove characters |
---|
220 | from the beginning of a string; you give it an SV and a pointer to |
---|
221 | somewhere inside the PV, and it discards everything before the |
---|
222 | pointer. The efficiency comes by means of a little hack: instead of |
---|
223 | actually removing the characters, C<sv_chop> sets the flag C<OOK> |
---|
224 | (offset OK) to signal to other functions that the offset hack is in |
---|
225 | effect, and it puts the number of bytes chopped off into the IV field |
---|
226 | of the SV. It then moves the PV pointer (called C<SvPVX>) forward that |
---|
227 | many bytes, and adjusts C<SvCUR> and C<SvLEN>. |
---|
228 | |
---|
229 | Hence, at this point, the start of the buffer that we allocated lives |
---|
230 | at C<SvPVX(sv) - SvIV(sv)> in memory and the PV pointer is pointing |
---|
231 | into the middle of this allocated storage. |
---|
232 | |
---|
233 | This is best demonstrated by example: |
---|
234 | |
---|
235 | % ./perl -Ilib -MDevel::Peek -le '$a="12345"; $a=~s/.//; Dump($a)' |
---|
236 | SV = PVIV(0x8128450) at 0x81340f0 |
---|
237 | REFCNT = 1 |
---|
238 | FLAGS = (POK,OOK,pPOK) |
---|
239 | IV = 1 (OFFSET) |
---|
240 | PV = 0x8135781 ( "1" . ) "2345"\0 |
---|
241 | CUR = 4 |
---|
242 | LEN = 5 |
---|
243 | |
---|
244 | Here the number of bytes chopped off (1) is put into IV, and |
---|
245 | C<Devel::Peek::Dump> helpfully reminds us that this is an offset. The |
---|
246 | portion of the string between the "real" and the "fake" beginnings is |
---|
247 | shown in parentheses, and the values of C<SvCUR> and C<SvLEN> reflect |
---|
248 | the fake beginning, not the real one. |
---|
249 | |
---|
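The bookkeeping behind the offset hack can be mimicked in a few lines of plain C. The sketch is a toy model only: C<toy_sv>, C<toy_chop> and C<toy_buffer_start> are hypothetical, and the fields are named after the macros they imitate rather than taken from Perl's real structures.

```c
#include <assert.h>
#include <stddef.h>

typedef struct {
    char  *pv;      /* visible start of the string (like SvPVX) */
    size_t cur;     /* visible length (like SvCUR) */
    size_t len;     /* allocated bytes from pv onward (like SvLEN) */
    size_t offset;  /* bytes chopped so far (Perl keeps this in the IV) */
    int    ook;     /* nonzero once the offset hack is in effect */
} toy_sv;

/* Discard everything before ptr, which must point inside the string.
   No bytes are moved; only the bookkeeping changes. */
void toy_chop(toy_sv *sv, char *ptr)
{
    size_t delta;

    assert(ptr >= sv->pv && ptr <= sv->pv + sv->cur);
    delta = (size_t)(ptr - sv->pv);
    if (delta == 0)
        return;
    sv->ook     = 1;
    sv->offset += delta;
    sv->pv      = ptr;
    sv->cur    -= delta;
    sv->len    -= delta;
}

/* Real start of the allocation, needed when the buffer is freed. */
char *toy_buffer_start(const toy_sv *sv)
{
    return sv->pv - sv->offset;
}
```

Starting from a six-byte buffer holding C<"12345"> and chopping one byte leaves C<cur> at 4, C<len> at 5 and C<offset> at 1, the same figures as in the dump above.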
250 | Something similar to the offset hack is performed on AVs to enable |
---|
251 | efficient shifting and splicing off the beginning of the array; while |
---|
252 | C<AvARRAY> points to the first element in the array that is visible from |
---|
253 | Perl, C<AvALLOC> points to the real start of the C array. These are |
---|
254 | usually the same, but a C<shift> operation can be carried out by |
---|
255 | increasing C<AvARRAY> by one and decreasing C<AvFILL> and C<AvLEN>. |
---|
256 | Again, the location of the real start of the C array only comes into |
---|
257 | play when freeing the array. See C<av_shift> in F<av.c>. |
---|
258 | |
---|
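The same idea for arrays can be sketched in plain C. The model below is an assumption-laden illustration, not the code in F<av.c>: C<toy_av> and C<toy_shift> are hypothetical, and the elements are plain C<int>s rather than C<SV*>s.

```c
#include <assert.h>
#include <stddef.h>

typedef struct {
    int  *alloc;  /* real start of the C array (like AvALLOC) */
    int  *array;  /* first element visible to the caller (like AvARRAY) */
    long  fill;   /* highest visible index, -1 when empty (like AvFILL) */
    long  len;    /* visible capacity counted from array (like AvLEN) */
} toy_av;

/* Remove and return the first element without moving any memory. */
int toy_shift(toy_av *av)
{
    int first;

    assert(av->fill >= 0);   /* caller must not shift an empty array */
    first = av->array[0];
    av->array += 1;          /* the visible start slides forward */
    av->fill  -= 1;
    av->len   -= 1;
    return first;
}
```

After one shift, C<array> sits one element past C<alloc>, and only C<alloc> may be handed back to the allocator when the array is freed.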
259 | =head2 What's Really Stored in an SV? |
---|
260 | |
---|
261 | Recall that the usual method of determining the type of scalar you have is |
---|
262 | to use C<Sv*OK> macros. Because a scalar can be both a number and a string, |
---|
263 | these macros will usually return TRUE, and calling the C<Sv*V> |
---|
264 | macros will do the appropriate conversion of string to integer/double or |
---|
265 | integer/double to string. |
---|
266 | |
---|
267 | If you I<really> need to know if you have an integer, double, or string |
---|
268 | pointer in an SV, you can use the following three macros instead: |
---|
269 | |
---|
270 | SvIOKp(SV*) |
---|
271 | SvNOKp(SV*) |
---|
272 | SvPOKp(SV*) |
---|
273 | |
---|
274 | These will tell you if you truly have an integer, double, or string pointer |
---|
275 | stored in your SV. The "p" stands for private. |
---|
276 | |
---|
277 | In general, though, it's best to use the C<Sv*V> macros. |
---|
278 | |
---|
279 | =head2 Working with AVs |
---|
280 | |
---|
281 | There are two ways to create and load an AV. The first method creates an |
---|
282 | empty AV: |
---|
283 | |
---|
284 | AV* newAV(); |
---|
285 | |
---|
286 | The second method both creates the AV and initially populates it with SVs: |
---|
287 | |
---|
288 | AV* av_make(I32 num, SV **ptr); |
---|
289 | |
---|
290 | The second argument points to an array containing C<num> C<SV*>'s. Once the |
---|
291 | AV has been created, the SVs can be destroyed, if so desired. |
---|
292 | |
---|
293 | Once the AV has been created, the following operations are possible on AVs: |
---|
294 | |
---|
295 | void av_push(AV*, SV*); |
---|
296 | SV* av_pop(AV*); |
---|
297 | SV* av_shift(AV*); |
---|
298 | void av_unshift(AV*, I32 num); |
---|
299 | |
---|
300 | These should be familiar operations, with the exception of C<av_unshift>. |
---|
301 | This routine adds C<num> elements at the front of the array with the C<undef> |
---|
302 | value. You must then use C<av_store> (described below) to assign values |
---|
303 | to these new elements. |
---|
304 | |
---|
305 | Here are some other functions: |
---|
306 | |
---|
307 | I32 av_len(AV*); |
---|
308 | SV** av_fetch(AV*, I32 key, I32 lval); |
---|
309 | SV** av_store(AV*, I32 key, SV* val); |
---|
310 | |
---|
311 | The C<av_len> function returns the highest index value in the array (just |
---|
312 | like $#array in Perl). If the array is empty, -1 is returned. The |
---|
313 | C<av_fetch> function returns the value at index C<key>, but if C<lval> |
---|
314 | is non-zero, then C<av_fetch> will store an undef value at that index. |
---|
315 | The C<av_store> function stores the value C<val> at index C<key>, and does |
---|
316 | not increment the reference count of C<val>. Thus the caller is responsible |
---|
317 | for taking care of that, and if C<av_store> returns NULL, the caller will |
---|
318 | have to decrement the reference count to avoid a memory leak. Note that |
---|
319 | C<av_fetch> and C<av_store> both return C<SV**>'s, not C<SV*>'s as their |
---|
320 | return value. |
---|
321 | |
---|
322 | void av_clear(AV*); |
---|
323 | void av_undef(AV*); |
---|
324 | void av_extend(AV*, I32 key); |
---|
325 | |
---|
326 | The C<av_clear> function deletes all the elements in the AV* array, but |
---|
327 | does not actually delete the array itself. The C<av_undef> function will |
---|
328 | delete all the elements in the array plus the array itself. The |
---|
329 | C<av_extend> function extends the array so that it contains at least C<key+1> |
---|
330 | elements. If C<key+1> is less than the currently allocated length of the array, |
---|
331 | then nothing is done. |
---|
332 | |
---|
333 | If you know the name of an array variable, you can get a pointer to its AV |
---|
334 | by using the following: |
---|
335 | |
---|
336 | AV* get_av("package::varname", FALSE); |
---|
337 | |
---|
338 | This returns NULL if the variable does not exist. |
---|
339 | |
---|
340 | See L<Understanding the Magic of Tied Hashes and Arrays> for more |
---|
341 | information on how to use the array access functions on tied arrays. |
---|
342 | |
---|
343 | =head2 Working with HVs |
---|
344 | |
---|
345 | To create an HV, you use the following routine: |
---|
346 | |
---|
347 | HV* newHV(); |
---|
348 | |
---|
349 | Once the HV has been created, the following operations are possible on HVs: |
---|
350 | |
---|
351 | SV** hv_store(HV*, const char* key, U32 klen, SV* val, U32 hash); |
---|
352 | SV** hv_fetch(HV*, const char* key, U32 klen, I32 lval); |
---|
353 | |
---|
354 | The C<klen> parameter is the length of the key being passed in (Note that |
---|
355 | you cannot pass 0 in as a value of C<klen> to tell Perl to measure the |
---|
356 | length of the key). The C<val> argument contains the SV pointer to the |
---|
357 | scalar being stored, and C<hash> is the precomputed hash value (zero if |
---|
358 | you want C<hv_store> to calculate it for you). The C<lval> parameter |
---|
359 | indicates whether this fetch is actually a part of a store operation, in |
---|
360 | which case a new undefined value will be added to the HV with the supplied |
---|
361 | key and C<hv_fetch> will return as if the value had already existed. |
---|
362 | |
---|
363 | Remember that C<hv_store> and C<hv_fetch> return C<SV**>'s and not just |
---|
364 | C<SV*>. To access the scalar value, you must first dereference the return |
---|
365 | value. However, you should check to make sure that the return value is |
---|
366 | not NULL before dereferencing it. |
---|
367 | |
---|
368 | These two functions check whether a hash table entry exists and delete it, respectively. |
---|
369 | |
---|
370 | bool hv_exists(HV*, const char* key, U32 klen); |
---|
371 | SV* hv_delete(HV*, const char* key, U32 klen, I32 flags); |
---|
372 | |
---|
373 | If C<flags> does not include the C<G_DISCARD> flag then C<hv_delete> will |
---|
374 | create and return a mortal copy of the deleted value. |
---|
375 | |
---|
376 | And more miscellaneous functions: |
---|
377 | |
---|
378 | void hv_clear(HV*); |
---|
379 | void hv_undef(HV*); |
---|
380 | |
---|
381 | Like their AV counterparts, C<hv_clear> deletes all the entries in the hash |
---|
382 | table but does not actually delete the hash table. The C<hv_undef> deletes |
---|
383 | both the entries and the hash table itself. |
---|
384 | |
---|
385 | Perl keeps the actual data in a linked list of structures with a typedef of HE. |
---|
386 | These contain the actual key and value pointers (plus extra administrative |
---|
387 | overhead). The key is a string pointer; the value is an C<SV*>. However, |
---|
388 | once you have an C<HE*>, to get the actual key and value, use the routines |
---|
389 | specified below. |
---|
390 | |
---|
391 | I32 hv_iterinit(HV*); |
---|
392 | /* Prepares starting point to traverse hash table */ |
---|
393 | HE* hv_iternext(HV*); |
---|
394 | /* Get the next entry, and return a pointer to a |
---|
395 | structure that has both the key and value */ |
---|
396 | char* hv_iterkey(HE* entry, I32* retlen); |
---|
397 | /* Get the key from an HE structure and also return |
---|
398 | the length of the key string */ |
---|
399 | SV* hv_iterval(HV*, HE* entry); |
---|
400 | /* Return a SV pointer to the value of the HE |
---|
401 | structure */ |
---|
402 | SV* hv_iternextsv(HV*, char** key, I32* retlen); |
---|
403 | /* This convenience routine combines hv_iternext, |
---|
404 | hv_iterkey, and hv_iterval. The key and retlen |
---|
405 | arguments are return values for the key and its |
---|
406 | length. The value is the SV* that the function returns */ |
---|
407 | |
---|
408 | If you know the name of a hash variable, you can get a pointer to its HV |
---|
409 | by using the following: |
---|
410 | |
---|
411 | HV* get_hv("package::varname", FALSE); |
---|
412 | |
---|
413 | This returns NULL if the variable does not exist. |
---|
414 | |
---|
415 | The hash algorithm is defined in the C<PERL_HASH(hash, key, klen)> macro: |
---|
416 | |
---|
417 | hash = 0; |
---|
418 | while (klen--) |
---|
419 | hash = (hash * 33) + *key++; |
---|
420 | hash = hash + (hash >> 5); /* after 5.6 */ |
---|
421 | |
---|
422 | The last step was added in version 5.6 to improve distribution of |
---|
423 | lower bits in the resulting hash value. |
---|
424 | |
---|
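Written out as a standalone function, the algorithm above might look like the following sketch. It uses no Perl headers: C<toy_perl_hash> is a hypothetical name, C<uint32_t> stands in for U32, and reading the key as unsigned bytes is an assumption that only matters for bytes above 0x7F.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Multiply-by-33 hash with the extra mixing step described above.
   Bytes are read as unsigned char; that is an assumption of this
   sketch, not something the macro listing states. */
uint32_t toy_perl_hash(const char *key, size_t klen)
{
    const unsigned char *p = (const unsigned char *)key;
    uint32_t hash = 0;

    assert(key != NULL || klen == 0);
    while (klen--)
        hash = (hash * 33) + *p++;
    hash = hash + (hash >> 5);   /* the step added in 5.6 */
    return hash;
}
```

For the one-byte key C<"a"> (byte value 97) the loop yields 97, and the final step adds C<97 E<gt>E<gt> 5>, which is 3, giving 100.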
425 | See L<Understanding the Magic of Tied Hashes and Arrays> for more |
---|
426 | information on how to use the hash access functions on tied hashes. |
---|
427 | |
---|
428 | =head2 Hash API Extensions |
---|
429 | |
---|
430 | Beginning with version 5.004, the following functions are also supported: |
---|
431 | |
---|
432 | HE* hv_fetch_ent (HV* tb, SV* key, I32 lval, U32 hash); |
---|
433 | HE* hv_store_ent (HV* tb, SV* key, SV* val, U32 hash); |
---|
434 | |
---|
435 | bool hv_exists_ent (HV* tb, SV* key, U32 hash); |
---|
436 | SV* hv_delete_ent (HV* tb, SV* key, I32 flags, U32 hash); |
---|
437 | |
---|
438 | SV* hv_iterkeysv (HE* entry); |
---|
439 | |
---|
440 | Note that these functions take C<SV*> keys, which simplifies writing |
---|
441 | of extension code that deals with hash structures. These functions |
---|
442 | also allow passing of C<SV*> keys to C<tie> functions without forcing |
---|
443 | you to stringify the keys (unlike the previous set of functions). |
---|
444 | |
---|
445 | They also return and accept whole hash entries (C<HE*>), making their |
---|
446 | use more efficient (since the hash number for a particular string |
---|
447 | doesn't have to be recomputed every time). See L<perlapi> for detailed |
---|
448 | descriptions. |
---|
449 | |
---|
450 | The following macros must always be used to access the contents of hash |
---|
451 | entries. Note that the arguments to these macros must be simple |
---|
452 | variables, since they may get evaluated more than once. See |
---|
453 | L<perlapi> for detailed descriptions of these macros. |
---|
454 | |
---|
455 | HePV(HE* he, STRLEN len) |
---|
456 | HeVAL(HE* he) |
---|
457 | HeHASH(HE* he) |
---|
458 | HeSVKEY(HE* he) |
---|
459 | HeSVKEY_force(HE* he) |
---|
460 | HeSVKEY_set(HE* he, SV* sv) |
---|
461 | |
---|
462 | These two lower level macros are defined, but must only be used when |
---|
463 | dealing with keys that are not C<SV*>s: |
---|
464 | |
---|
465 | HeKEY(HE* he) |
---|
466 | HeKLEN(HE* he) |
---|
467 | |
---|
468 | Note that both C<hv_store> and C<hv_store_ent> do not increment the |
---|
469 | reference count of the stored C<val>, which is the caller's responsibility. |
---|
470 | If these functions return a NULL value, the caller will usually have to |
---|
471 | decrement the reference count of C<val> to avoid a memory leak. |
---|
472 | |
---|
473 | =head2 References |
---|
474 | |
---|
475 | References are a special type of scalar that point to other data types |
---|
476 | (including references). |
---|
477 | |
---|
478 | To create a reference, use either of the following functions: |
---|
479 | |
---|
480 | SV* newRV_inc((SV*) thing); |
---|
481 | SV* newRV_noinc((SV*) thing); |
---|
482 | |
---|
483 | The C<thing> argument can be any of an C<SV*>, C<AV*>, or C<HV*>. The |
---|
484 | functions are identical except that C<newRV_inc> increments the reference |
---|
485 | count of the C<thing>, while C<newRV_noinc> does not. For historical |
---|
486 | reasons, C<newRV> is a synonym for C<newRV_inc>. |
---|
487 | |
---|
488 | Once you have a reference, you can use the following macro to dereference |
---|
489 | the reference: |
---|
490 | |
---|
491 | SvRV(SV*) |
---|
492 | |
---|
493 | then call the appropriate routines, casting the returned C<SV*> to either an |
---|
494 | C<AV*> or C<HV*>, if required. |
---|
495 | |
---|
496 | To determine if an SV is a reference, you can use the following macro: |
---|
497 | |
---|
498 | SvROK(SV*) |
---|
499 | |
---|
500 | To discover what type of value the reference refers to, use the following |
---|
501 | macro and then check the return value. |
---|
502 | |
---|
503 | SvTYPE(SvRV(SV*)) |
---|
504 | |
---|
505 | The most useful types that will be returned are: |
---|
506 | |
---|
507 | SVt_IV Scalar |
---|
508 | SVt_NV Scalar |
---|
509 | SVt_PV Scalar |
---|
510 | SVt_RV Scalar |
---|
511 | SVt_PVAV Array |
---|
512 | SVt_PVHV Hash |
---|
513 | SVt_PVCV Code |
---|
514 | SVt_PVGV Glob (possibly a file handle) |
---|
515 | SVt_PVMG Blessed or Magical Scalar |
---|
516 | |
---|
517 | See the sv.h header file for more details. |
---|
518 | |
---|
519 | =head2 Blessed References and Class Objects |
---|
520 | |
---|
521 | References are also used to support object-oriented programming. In the |
---|
522 | OO lexicon, an object is simply a reference that has been blessed into a |
---|
523 | package (or class). Once blessed, the programmer may now use the reference |
---|
524 | to access the various methods in the class. |
---|
525 | |
---|
526 | A reference can be blessed into a package with the following function: |
---|
527 | |
---|
528 | SV* sv_bless(SV* sv, HV* stash); |
---|
529 | |
---|
530 | The C<sv> argument must be a reference. The C<stash> argument specifies |
---|
531 | which class the reference will belong to. See |
---|
532 | L<Stashes and Globs> for information on converting class names into stashes. |
---|
533 | |
---|
534 | /* Still under construction */ |
---|
535 | |
---|
536 | Upgrades rv to reference if not already one. Creates new SV for rv to |
---|
537 | point to. If C<classname> is non-null, the SV is blessed into the specified |
---|
538 | class. SV is returned. |
---|
539 | |
---|
540 | SV* newSVrv(SV* rv, const char* classname); |
---|
541 | |
---|
542 | Copies integer or double into an SV whose reference is C<rv>. SV is blessed |
---|
543 | if C<classname> is non-null. |
---|
544 | |
---|
545 | SV* sv_setref_iv(SV* rv, const char* classname, IV iv); |
---|
546 | SV* sv_setref_nv(SV* rv, const char* classname, NV iv); |
---|
547 | |
---|
548 | Copies the pointer value (I<the address, not the string!>) into an SV whose |
---|
549 | reference is rv. SV is blessed if C<classname> is non-null. |
---|
550 | |
---|
551 | SV* sv_setref_pv(SV* rv, const char* classname, PV iv); |
---|
552 | |
---|
553 | Copies string into an SV whose reference is C<rv>. Set length to 0 to let |
---|
554 | Perl calculate the string length. SV is blessed if C<classname> is non-null. |
---|
555 | |
---|
556 | SV* sv_setref_pvn(SV* rv, const char* classname, PV iv, STRLEN length); |
---|
557 | |
---|
558 | Tests whether the SV is blessed into the specified class. It does not |
---|
559 | check inheritance relationships. |
---|
560 | |
---|
561 | int sv_isa(SV* sv, const char* name); |
---|
562 | |
---|
563 | Tests whether the SV is a reference to a blessed object. |
---|
564 | |
---|
565 | int sv_isobject(SV* sv); |
---|
566 | |
---|
567 | Tests whether the SV is derived from the specified class. SV can be either |
---|
568 | a reference to a blessed object or a string containing a class name. This |
---|
569 | is the function implementing the C<UNIVERSAL::isa> functionality. |
---|
570 | |
---|
571 | bool sv_derived_from(SV* sv, const char* name); |
---|
572 | |
---|
573 | To check if you've got an object derived from a specific class you have |
---|
574 | to write: |
---|
575 | |
---|
576 | if (sv_isobject(sv) && sv_derived_from(sv, class)) { ... } |
---|
577 | |
---|
578 | =head2 Creating New Variables |
---|
579 | |
---|
580 | To create a new Perl variable with an undef value which can be accessed from |
---|
581 | your Perl script, use the following routines, depending on the variable type. |
---|
582 | |
---|
583 | SV* get_sv("package::varname", TRUE); |
---|
584 | AV* get_av("package::varname", TRUE); |
---|
585 | HV* get_hv("package::varname", TRUE); |
---|
586 | |
---|
587 | Notice the use of TRUE as the second parameter. The new variable can now |
---|
588 | be set, using the routines appropriate to the data type. |
---|
589 | |
---|
590 | There are additional macros whose values may be bitwise OR'ed with the |
---|
591 | C<TRUE> argument to enable certain extra features. Those bits are: |
---|
592 | |
---|
593 | GV_ADDMULTI Marks the variable as multiply defined, thus preventing the |
---|
594 | "Name <varname> used only once: possible typo" warning. |
---|
595 | GV_ADDWARN Issues the warning "Had to create <varname> unexpectedly" if |
---|
596 | the variable did not exist before the function was called. |
---|
597 | |
---|
598 | If you do not specify a package name, the variable is created in the current |
---|
599 | package. |
---|
600 | |
---|
601 | =head2 Reference Counts and Mortality |
---|
602 | |
---|
603 | Perl uses a reference count-driven garbage collection mechanism. SVs, |
---|
604 | AVs, or HVs (xV for short in the following) start their life with a |
---|
605 | reference count of 1. If the reference count of an xV ever drops to 0, |
---|
606 | then it will be destroyed and its memory made available for reuse. |
---|
607 | |
---|
608 | This normally doesn't happen at the Perl level unless a variable is |
---|
609 | undef'ed or the last variable holding a reference to it is changed or |
---|
610 | overwritten. At the internal level, however, reference counts can be |
---|
611 | manipulated with the following macros: |
---|
612 | |
---|
613 | int SvREFCNT(SV* sv); |
---|
614 | SV* SvREFCNT_inc(SV* sv); |
---|
615 | void SvREFCNT_dec(SV* sv); |
---|
616 | |
---|
617 | However, there is one other function which manipulates the reference |
---|
618 | count of its argument. The C<newRV_inc> function, you will recall, |
---|
619 | creates a reference to the specified argument. As a side effect, |
---|
620 | it increments the argument's reference count. If this is not what |
---|
621 | you want, use C<newRV_noinc> instead. |
---|
622 | |
---|
623 | For example, imagine you want to return a reference from an XSUB function. |
---|
624 | Inside the XSUB routine, you create an SV which initially has a reference |
---|
625 | count of one. Then you call C<newRV_inc>, passing it the just-created SV. |
---|
626 | This returns the reference as a new SV, but the reference count of the |
---|
627 | SV you passed to C<newRV_inc> has been incremented to two. Now you |
---|
628 | return the reference from the XSUB routine and forget about the SV. |
---|
629 | But Perl hasn't! Whenever the returned reference is destroyed, the |
---|
630 | reference count of the original SV is decreased to one and nothing happens. |
---|
631 | The SV will hang around without any way to access it until Perl itself |
---|
632 | terminates. This is a memory leak. |
---|
633 | |
---|
634 | The correct procedure, then, is to use C<newRV_noinc> instead of |
---|
635 | C<newRV_inc>. Then, if and when the last reference is destroyed, |
---|
636 | the reference count of the SV will go to zero and it will be destroyed, |
---|
637 | stopping any memory leak. |
---|
638 | |
---|
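The leak and its fix can be traced with a toy reference counter in plain C. Everything here is hypothetical scaffolding (C<toy_sv>, C<toy_new>, C<toy_dec>, C<toy_newRV_inc>, C<toy_newRV_noinc>, C<live_svs>); it imitates the counting rules described above and is not how Perl allocates or frees SVs.

```c
#include <assert.h>
#include <stdlib.h>

typedef struct toy_sv {
    int            refcnt;
    struct toy_sv *target;   /* non-NULL when this SV is a reference */
} toy_sv;

int live_svs = 0;            /* toy SVs currently allocated */

/* Every new SV starts life with a reference count of 1. */
toy_sv *toy_new(void)
{
    toy_sv *sv = calloc(1, sizeof *sv);
    assert(sv != NULL);
    sv->refcnt = 1;
    live_svs++;
    return sv;
}

/* Drop one count; destroy the SV when the count reaches zero. */
void toy_dec(toy_sv *sv)
{
    if (sv == NULL || --sv->refcnt > 0)
        return;
    toy_dec(sv->target);     /* a dying reference releases its referent */
    free(sv);
    live_svs--;
}

/* Create a reference that takes over the caller's count on thing. */
toy_sv *toy_newRV_noinc(toy_sv *thing)
{
    toy_sv *rv = toy_new();
    rv->target = thing;
    return rv;
}

/* Create a reference and add a count of its own on thing. */
toy_sv *toy_newRV_inc(toy_sv *thing)
{
    thing->refcnt++;
    return toy_newRV_noinc(thing);
}
```

With C<toy_newRV_inc>, destroying the reference leaves the referent alive with a count of 1 and no owner. With C<toy_newRV_noinc>, destroying the reference takes the referent with it.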
639 | There are some convenience functions available that can help with the |
---|
640 | destruction of xVs. These functions introduce the concept of "mortality". |
---|
641 | An xV that is mortal has had its reference count marked to be decremented, |
---|
642 | but not actually decremented, until "a short time later". Generally the |
---|
643 | term "short time later" means a single Perl statement, such as a call to |
---|
644 | an XSUB function. The actual determinant for when mortal xVs have their |
---|
645 | reference count decremented depends on two macros, SAVETMPS and FREETMPS. |
---|
646 | See L<perlcall> and L<perlxs> for more details on these macros. |
---|
647 | |
---|
648 | "Mortalization" then is at its simplest a deferred C<SvREFCNT_dec>. |
---|
649 | However, if you mortalize a variable twice, the reference count will |
---|
650 | later be decremented twice. |
---|
651 | |
---|
652 | You should be careful about creating mortal variables. Strange things |
---|
653 | can happen if you make the same value mortal within multiple contexts, |
---|
654 | or if you make a variable mortal multiple times. |
---|
655 | |
---|
656 | To create a mortal variable, use the functions: |
---|
657 | |
---|
658 | SV* sv_newmortal() |
---|
659 | SV* sv_2mortal(SV*) |
---|
660 | SV* sv_mortalcopy(SV*) |
---|
661 | |
---|
662 | The first call creates a mortal SV, the second converts an existing |
---|
663 | SV to a mortal SV (and thus defers a call to C<SvREFCNT_dec>), and the |
---|
664 | third creates a mortal copy of an existing SV. |
---|
665 | |
---|
666 | The mortal routines are not just for SVs -- AVs and HVs can be |
---|
667 | made mortal by passing their address (type-cast to C<SV*>) to the |
---|
668 | C<sv_2mortal> or C<sv_mortalcopy> routines. |
---|
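One way to picture mortality is as a stack of pending decrements. The sketch below is a simplified model, not Perl's implementation: every C<toy_> name is hypothetical, and where the real C<SAVETMPS> records its floor in interpreter state, this version simply returns it to the caller.

```c
#include <assert.h>

#define TOY_TMPS_MAX 16

typedef struct { int refcnt; } toy_sv;

static toy_sv *toy_tmps[TOY_TMPS_MAX];  /* models the temporaries stack */
static int toy_tmps_ix = 0;             /* number of pending entries */

/* Models sv_2mortal: schedule one deferred decrement, return the same value. */
static toy_sv *toy_2mortal(toy_sv *sv) {
    assert(toy_tmps_ix < TOY_TMPS_MAX);
    toy_tmps[toy_tmps_ix++] = sv;
    return sv;
}

/* Models SAVETMPS: remember where the current batch of temporaries starts. */
static int toy_savetmps(void) {
    return toy_tmps_ix;
}

/* Models FREETMPS: apply every decrement scheduled since the given floor. */
static void toy_freetmps(int floor) {
    while (toy_tmps_ix > floor)
        toy_tmps[--toy_tmps_ix]->refcnt--;
}
```

Calling C<toy_2mortal> twice on the same value queues two entries, so the count drops by two when C<toy_freetmps> runs. That is the double-decrement hazard the text warns about.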
669 | |
---|
670 | =head2 Stashes and Globs |
---|
671 | |
---|
672 | A "stash" is a hash that contains all of the different objects that |
---|
673 | are contained within a package. Each key of the stash is a symbol |
---|
674 | name (shared by all the different types of objects that have the same |
---|
675 | name), and each value in the hash table is a GV (Glob Value). This GV |
---|
676 | in turn contains references to the various objects of that name, |
---|
677 | including (but not limited to) the following: |
---|
678 | |
---|
679 | Scalar Value |
---|
680 | Array Value |
---|
681 | Hash Value |
---|
682 | I/O Handle |
---|
683 | Format |
---|
684 | Subroutine |
---|
685 | |
---|
686 | There is a single stash called "PL_defstash" that holds the items that exist |
---|
687 | in the "main" package. To get at the items in other packages, append the |
---|
688 | string "::" to the package name. The items in the "Foo" package are in |
---|
689 | the stash "Foo::" in PL_defstash. The items in the "Bar::Baz" package are |
---|
690 | in the stash "Baz::" in "Bar::"'s stash. |
---|
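The nesting rule can be made concrete by splitting a package name into the key looked up at each stash level. The helper below is a hypothetical illustration (C<toy_stash_keys> is not a Perl API function); it assumes each component fits in 31 characters and handles only the C<::> separator.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/*
 * Split a package name such as "Bar::Baz" into the keys that would be
 * looked up at each stash level: "Bar::" and then "Baz::".
 * Fills at most max entries and returns how many were written.
 */
static int toy_stash_keys(const char *pkg, char keys[][32], int max) {
    int n = 0;
    assert(pkg != NULL);
    while (*pkg && n < max) {
        const char *sep = strstr(pkg, "::");
        size_t len = sep ? (size_t)(sep - pkg) : strlen(pkg);
        if (len > 0) {
            /* keep the component and append the "::" suffix */
            snprintf(keys[n], 32, "%.*s::", (int)len, pkg);
            n++;
        }
        pkg += len;
        if (sep)
            pkg += 2;   /* step over the separator */
    }
    return n;
}
```

For C<"Bar::Baz"> this yields C<"Bar::"> followed by C<"Baz::">: the first key is looked up in PL_defstash and the second in the stash found there.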
691 | |
---|
692 | To get the stash pointer for a particular package, use one of the functions: |
---|
693 | |
---|
694 | HV* gv_stashpv(const char* name, I32 create) |
---|
695 | HV* gv_stashsv(SV*, I32 create) |
---|
696 | |
---|
697 | The first function takes a literal string, the second uses the string stored |
---|
698 | in the SV. Remember that a stash is just a hash table, so you get back an |
---|
699 | C<HV*>. The C<create> flag will create a new package if it is set. |
---|
700 | |
---|
701 | The name that C<gv_stash*v> wants is the name of the package whose symbol table |
---|
702 | you want. The default package is called C<main>. If you have multiply nested |
---|
703 | packages, pass their names to C<gv_stash*v>, separated by C<::> as in the Perl |
---|
704 | language itself. |
---|
705 | |
---|
706 | Alternatively, if you have an SV that is a blessed reference, you can find |
---|
707 | out the stash pointer by using: |
---|
708 | |
---|
709 | HV* SvSTASH(SvRV(SV*)); |
---|
710 | |
---|
711 | then use the following to get the package name itself: |
---|
712 | |
---|
713 | char* HvNAME(HV* stash); |
---|
714 | |
---|
715 | If you need to bless or re-bless an object you can use the following |
---|
716 | function: |
---|
717 | |
---|
718 | SV* sv_bless(SV*, HV* stash) |
---|
719 | |
---|
720 | where the first argument, an C<SV*>, must be a reference, and the second |
---|
721 | argument is a stash. The returned C<SV*> can now be used in the same way |
---|
722 | as any other SV. |
---|
723 | |
---|
724 | For more information on references and blessings, consult L<perlref>. |
---|
725 | |
---|
726 | =head2 Double-Typed SVs |
---|
727 | |
---|
728 | Scalar variables normally contain only one type of value, an integer, |
---|
729 | double, pointer, or reference. Perl will automatically convert the |
---|
730 | actual scalar data from the stored type into the requested type. |
---|
731 | |
---|
732 | Some scalar variables contain more than one type of scalar data. For |
---|
733 | example, the variable C<$!> contains either the numeric value of C<errno> |
---|
734 | or its string equivalent from either C<strerror> or C<sys_errlist[]>. |
---|
735 | |
---|
736 | To force multiple data values into an SV, you must do two things: use the |
---|
737 | C<sv_set*v> routines to add the additional scalar type, then set a flag |
---|
738 | so that Perl will believe it contains more than one type of data. The |
---|
739 | four macros to set the flags are: |
---|
740 | |
---|
741 | SvIOK_on |
---|
742 | SvNOK_on |
---|
743 | SvPOK_on |
---|
744 | SvROK_on |
---|
745 | |
---|
746 | The particular macro you must use depends on which C<sv_set*v> routine |
---|
747 | you called first. This is because every C<sv_set*v> routine turns on |
---|
748 | only the bit for the particular type of data being set, and turns off |
---|
749 | all the rest. |
---|
750 | |
---|
751 | For example, to create a new Perl variable called "dberror" that contains |
---|
752 | both the numeric and descriptive string error values, you could use the |
---|
753 | following code: |
---|
754 | |
---|
755 | extern int dberror; |
---|
756 | extern char *dberror_list[]; |
---|
757 | |
---|
758 | SV* sv = get_sv("dberror", TRUE); |
---|
759 | sv_setiv(sv, (IV) dberror); |
---|
760 | sv_setpv(sv, dberror_list[dberror]); |
---|
761 | SvIOK_on(sv); |
---|
762 | |
---|
763 | If the order of C<sv_setiv> and C<sv_setpv> had been reversed, then the |
---|
764 | macro C<SvPOK_on> would need to be called instead of C<SvIOK_on>. |
---|
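The flag handling can be modelled in a few lines. This is a sketch under stated assumptions: C<toy_sv>, the C<TOY_IOK>/C<TOY_POK> bits and the C<toy_> functions are hypothetical stand-ins, and a real SV stores its slots quite differently.

```c
#include <assert.h>
#include <string.h>

enum { TOY_IOK = 1, TOY_POK = 2 };   /* hypothetical flag bits */

typedef struct {
    unsigned flags;      /* which slots are considered valid */
    long     iv;         /* integer slot */
    char     pv[64];     /* string slot */
} toy_sv;

/* Models sv_setiv: store the integer, mark only the integer slot valid. */
static void toy_setiv(toy_sv *sv, long iv) {
    sv->iv = iv;
    sv->flags = TOY_IOK;
}

/* Models sv_setpv: store the string, mark only the string slot valid. */
static void toy_setpv(toy_sv *sv, const char *s) {
    assert(strlen(s) < sizeof sv->pv);
    strcpy(sv->pv, s);
    sv->flags = TOY_POK;
}

/* Models SvIOK_on: declare the integer slot valid again. */
static void toy_iok_on(toy_sv *sv) {
    sv->flags |= TOY_IOK;
}
```

After C<toy_setiv> followed by C<toy_setpv>, only the string bit is set even though the integer is still sitting in its slot; C<toy_iok_on> is what makes both values visible.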
765 | |
---|
766 | =head2 Magic Variables |
---|
767 | |
---|
768 | [This section still under construction. Ignore everything here. Post no |
---|
769 | bills. Everything not permitted is forbidden.] |
---|
770 | |
---|
771 | Any SV may be magical, that is, it has special features that a normal |
---|
772 | SV does not have. These features are stored in the SV structure in a |
---|
773 | linked list of C<struct magic>'s, typedef'ed to C<MAGIC>. |
---|
774 | |
---|
775 | struct magic { |
---|
776 | MAGIC* mg_moremagic; |
---|
777 | MGVTBL* mg_virtual; |
---|
778 | U16 mg_private; |
---|
779 | char mg_type; |
---|
780 | U8 mg_flags; |
---|
781 | SV* mg_obj; |
---|
782 | char* mg_ptr; |
---|
783 | I32 mg_len; |
---|
784 | }; |
---|
785 | |
---|
786 | Note this is current as of patchlevel 0, and could change at any time. |
---|
787 | |
---|
788 | =head2 Assigning Magic |
---|
789 | |
---|
790 | Perl adds magic to an SV using the sv_magic function: |
---|
791 | |
---|
792 | void sv_magic(SV* sv, SV* obj, int how, const char* name, I32 namlen); |
---|
793 | |
---|
794 | The C<sv> argument is a pointer to the SV that is to acquire a new magical |
---|
795 | feature. |
---|
796 | |
---|
797 | If C<sv> is not already magical, Perl uses the C<SvUPGRADE> macro to |
---|
798 | set the C<SVt_PVMG> flag for the C<sv>. Perl then continues by adding |
---|
799 | the new magic to the beginning of the linked list of magical features. Any prior |
---|
800 | entry of the same type of magic is deleted. Note that this can be |
---|
801 | overridden, and multiple instances of the same type of magic can be |
---|
802 | associated with an SV. |
---|
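The list handling described here (drop any earlier entry of the same type, then prepend) can be sketched as follows. The code follows the description in this section rather than perl's source; all C<toy_> names and the C<payload> field are hypothetical.

```c
#include <assert.h>
#include <stdlib.h>

typedef struct toy_magic {
    struct toy_magic *mg_moremagic;  /* next entry in the chain */
    char mg_type;                    /* the "how" value */
    int  payload;                    /* hypothetical per-entry data */
} toy_magic;

typedef struct { toy_magic *magic; } toy_sv;

/* Walk the chain and return the first entry of the given type, or NULL. */
static toy_magic *toy_mg_find(const toy_sv *sv, char type) {
    toy_magic *mg;
    for (mg = sv->magic; mg; mg = mg->mg_moremagic)
        if (mg->mg_type == type)
            return mg;
    return NULL;
}

/* Unlink and free every entry of the given type. */
static void toy_unmagic(toy_sv *sv, char type) {
    toy_magic **link = &sv->magic;
    while (*link) {
        toy_magic *mg = *link;
        if (mg->mg_type == type) {
            *link = mg->mg_moremagic;
            free(mg);
        } else {
            link = &mg->mg_moremagic;
        }
    }
}

/* Drop any prior entry of this type, then prepend the new one. */
static void toy_sv_magic(toy_sv *sv, char how, int payload) {
    toy_magic *mg;
    toy_unmagic(sv, how);
    mg = malloc(sizeof *mg);
    assert(mg != NULL);
    mg->mg_moremagic = sv->magic;
    mg->mg_type = how;
    mg->payload = payload;
    sv->magic = mg;
}
```

Adding C<'~'>, then C<'U'>, then C<'~'> again leaves a two-entry chain with the newer C<'~'> entry at the head.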
803 | |
---|
804 | The C<name> and C<namlen> arguments are used to associate a string with |
---|
805 | the magic, typically the name of a variable. C<namlen> is stored in the |
---|
806 | C<mg_len> field and if C<name> is non-null and C<namlen> >= 0 a malloc'd |
---|
807 | copy of the name is stored in the C<mg_ptr> field. |
---|
808 | |
---|
809 | The sv_magic function uses C<how> to determine which, if any, predefined |
---|
810 | "Magic Virtual Table" should be assigned to the C<mg_virtual> field. |
---|
811 | See the "Magic Virtual Table" section below. The C<how> argument is also |
---|
812 | stored in the C<mg_type> field. |
---|
813 | |
---|
814 | The C<obj> argument is stored in the C<mg_obj> field of the C<MAGIC> |
---|
815 | structure. If it is not the same as the C<sv> argument, the reference |
---|
816 | count of the C<obj> object is incremented. If it is the same, or if |
---|
817 | the C<how> argument is "#", or if it is a NULL pointer, then C<obj> is |
---|
818 | merely stored, without the reference count being incremented. |
---|
819 | |
---|
820 | There is also a function to add magic to an C<HV>: |
---|
821 | |
---|
822 | void hv_magic(HV *hv, GV *gv, int how); |
---|
823 | |
---|
824 | This simply calls C<sv_magic> and coerces the C<gv> argument into an C<SV>. |
---|
825 | |
---|
826 | To remove the magic from an SV, call the function sv_unmagic: |
---|
827 | |
---|
828 | void sv_unmagic(SV *sv, int type); |
---|
829 | |
---|
830 | The C<type> argument should be equal to the C<how> value when the C<SV> |
---|
831 | was initially made magical. |
---|
832 | |
---|
833 | =head2 Magic Virtual Tables |
---|
834 | |
---|
835 | The C<mg_virtual> field in the C<MAGIC> structure is a pointer to a |
---|
836 | C<MGVTBL>, which is a structure of function pointers and stands for |
---|
837 | "Magic Virtual Table" to handle the various operations that might be |
---|
838 | applied to that variable. |
---|
839 | |
---|
840 | The C<MGVTBL> has five pointers to the following routine types: |
---|
841 | |
---|
842 | int (*svt_get)(SV* sv, MAGIC* mg); |
---|
843 | int (*svt_set)(SV* sv, MAGIC* mg); |
---|
844 | U32 (*svt_len)(SV* sv, MAGIC* mg); |
---|
845 | int (*svt_clear)(SV* sv, MAGIC* mg); |
---|
846 | int (*svt_free)(SV* sv, MAGIC* mg); |
---|
847 | |
---|
848 | This MGVTBL structure is set at compile-time in C<perl.h> and there are |
---|
849 | currently 19 types (or 21 with overloading turned on). These different |
---|
850 | structures contain pointers to various routines that perform additional |
---|
851 | actions depending on which function is being called. |
---|
852 | |
---|
853 | Function pointer Action taken |
---|
854 | ---------------- ------------ |
---|
855 | svt_get Do something before the value of the SV is retrieved. |
---|
856 | svt_set Do something after the SV is assigned a value. |
---|
857 | svt_len Report on the SV's length. |
---|
858 | svt_clear Clear something the SV represents. |
---|
859 | svt_free Free any extra storage associated with the SV. |
---|
860 | |
---|
861 | For instance, the MGVTBL structure called C<vtbl_sv> (which corresponds |
---|
862 | to an C<mg_type> of '\0') contains: |
---|
863 | |
---|
864 | { magic_get, magic_set, magic_len, 0, 0 } |
---|
865 | |
---|
866 | Thus, when an SV is determined to be magical and of type '\0', if a get |
---|
867 | operation is being performed, the routine C<magic_get> is called. All |
---|
868 | the various routines for the various magical types begin with C<magic_>. |
---|
869 | NOTE: the magic routines are not considered part of the Perl API, and may |
---|
870 | not be exported by the Perl library. |
---|
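Dispatch through a table whose unused slots are null can be sketched like this. The signatures are simplified (no C<MAGIC*> argument), and C<toy_vtbl>, C<toy_magic_get>, C<toy_read> and C<toy_clear> are hypothetical names, not the routines perl uses.

```c
#include <assert.h>
#include <stddef.h>

typedef struct toy_sv toy_sv;

/* Same five slots as MGVTBL, with simplified signatures. */
typedef struct {
    int      (*svt_get)(toy_sv *sv);
    int      (*svt_set)(toy_sv *sv);
    unsigned (*svt_len)(toy_sv *sv);
    int      (*svt_clear)(toy_sv *sv);
    int      (*svt_free)(toy_sv *sv);
} toy_vtbl;

struct toy_sv {
    const toy_vtbl *vtbl;  /* models mg_virtual */
    int value;
    int gets;              /* how many times the get hook ran */
};

/* Hypothetical get hook: refresh the stored value. */
static int toy_magic_get(toy_sv *sv) {
    sv->gets++;
    sv->value = 42;
    return 0;
}

/* Only the get slot is filled; the rest stay NULL. */
static const toy_vtbl toy_vtbl_example = {
    toy_magic_get, NULL, NULL, NULL, NULL
};

/* Models a read: run the get hook if the table has one. */
static int toy_read(toy_sv *sv) {
    assert(sv != NULL);
    if (sv->vtbl && sv->vtbl->svt_get)
        sv->vtbl->svt_get(sv);
    return sv->value;
}

/* Models a clear: a NULL slot means there is nothing extra to do. */
static int toy_clear(toy_sv *sv) {
    assert(sv != NULL);
    if (sv->vtbl && sv->vtbl->svt_clear)
        return sv->vtbl->svt_clear(sv);
    return 0;
}
```

A value with no table behaves like a plain variable, and a table entry of 0 simply means that operation needs no special handling.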
871 | |
---|
872 | The current kinds of Magic Virtual Tables are: |
---|
873 | |
---|
874 | mg_type MGVTBL Type of magic |
---|
875 | ------- ------ ---------------------------- |
---|
876 | \0 vtbl_sv Special scalar variable |
---|
877 | A vtbl_amagic %OVERLOAD hash |
---|
878 | a vtbl_amagicelem %OVERLOAD hash element |
---|
879 | c (none) Holds overload table (AMT) on stash |
---|
880 | B vtbl_bm Boyer-Moore (fast string search) |
---|
881 | D vtbl_regdata Regex match position data (@+ and @- vars) |
---|
882 | d vtbl_regdatum Regex match position data element |
---|
883 | E vtbl_env %ENV hash |
---|
884 | e vtbl_envelem %ENV hash element |
---|
885 | f vtbl_fm Formline ('compiled' format) |
---|
886 | g vtbl_mglob m//g target / study()ed string |
---|
887 | I vtbl_isa @ISA array |
---|
888 | i vtbl_isaelem @ISA array element |
---|
889 | k vtbl_nkeys scalar(keys()) lvalue |
---|
890 | L (none) Debugger %_<filename |
---|
891 | l vtbl_dbline Debugger %_<filename element |
---|
892 | o vtbl_collxfrm Locale transformation |
---|
893 | P vtbl_pack Tied array or hash |
---|
894 | p vtbl_packelem Tied array or hash element |
---|
895 | q vtbl_packelem Tied scalar or handle |
---|
896 | S vtbl_sig %SIG hash |
---|
897 | s vtbl_sigelem %SIG hash element |
---|
898 | t vtbl_taint Taintedness |
---|
899 | U vtbl_uvar Available for use by extensions |
---|
900 | v vtbl_vec vec() lvalue |
---|
901 | x vtbl_substr substr() lvalue |
---|
902 | y vtbl_defelem Shadow "foreach" iterator variable / |
---|
903 | smart parameter vivification |
---|
904 | * vtbl_glob GV (typeglob) |
---|
905 | # vtbl_arylen Array length ($#ary) |
---|
906 | . vtbl_pos pos() lvalue |
---|
907 | ~ (none) Available for use by extensions |
---|
908 | |
---|
909 | When an uppercase and lowercase letter both exist in the table, then the |
---|
910 | uppercase letter is used to represent some kind of composite type (a list |
---|
911 | or a hash), and the lowercase letter is used to represent an element of |
---|
912 | that composite type. |
---|
913 | |
---|
914 | The '~' and 'U' magic types are defined specifically for use by |
---|
915 | extensions and will not be used by perl itself. Extensions can use |
---|
916 | '~' magic to 'attach' private information to variables (typically |
---|
917 | objects). This is especially useful because there is no way for |
---|
918 | normal perl code to corrupt this private information (unlike using |
---|
919 | extra elements of a hash object). |
---|
920 | |
---|
921 | Similarly, 'U' magic can be used much like tie() to call a C function |
---|
922 | any time a scalar's value is used or changed. The C<MAGIC>'s |
---|
923 | C<mg_ptr> field points to a C<ufuncs> structure: |
---|
924 | |
---|
925 | struct ufuncs { |
---|
926 | I32 (*uf_val)(IV, SV*); |
---|
927 | I32 (*uf_set)(IV, SV*); |
---|
928 | IV uf_index; |
---|
929 | }; |
---|
930 | |
---|
931 | When the SV is read from or written to, the C<uf_val> or C<uf_set> |
---|
932 | function will be called with C<uf_index> as the first arg and a |
---|
933 | pointer to the SV as the second. A simple example of how to add 'U' |
---|
934 | magic is shown below. Note that the ufuncs structure is copied by |
---|
935 | sv_magic, so you can safely allocate it on the stack. |
---|
936 | |
---|
937 | void |
---|
938 | Umagic(sv) |
---|
939 | SV *sv; |
---|
940 | PREINIT: |
---|
941 | struct ufuncs uf; |
---|
942 | CODE: |
---|
943 | uf.uf_val = &my_get_fn; |
---|
944 | uf.uf_set = &my_set_fn; |
---|
945 | uf.uf_index = 0; |
---|
946 | sv_magic(sv, 0, 'U', (char*)&uf, sizeof(uf)); |
---|
947 | |
---|
948 | Note that because multiple extensions may be using '~' or 'U' magic, |
---|
949 | it is important for extensions to take extra care to avoid conflict. |
---|
950 | Typically only using the magic on objects blessed into the same class |
---|
951 | as the extension is sufficient. For '~' magic, it may also be |
---|
952 | appropriate to add an I32 'signature' at the top of the private data |
---|
953 | area and check that. |
---|
954 | |
---|
955 | Also note that the C<sv_set*()> and C<sv_cat*()> functions described |
---|
956 | earlier do B<not> invoke 'set' magic on their targets. This must |
---|
957 | be done by the user either by calling the C<SvSETMAGIC()> macro after |
---|
958 | calling these functions, or by using one of the C<sv_set*_mg()> or |
---|
959 | C<sv_cat*_mg()> functions. Similarly, generic C code must call the |
---|
960 | C<SvGETMAGIC()> macro to invoke any 'get' magic if they use an SV |
---|
961 | obtained from external sources in functions that don't handle magic. |
---|
962 | See L<perlapi> for a description of these functions. |
---|
963 | For example, calls to the C<sv_cat*()> functions typically need to be |
---|
964 | followed by C<SvSETMAGIC()>, but they don't need a prior C<SvGETMAGIC()> |
---|
965 | since their implementation handles 'get' magic. |
---|
966 | |
---|
967 | =head2 Finding Magic |
---|
968 | |
---|
969 | MAGIC* mg_find(SV*, int type); /* Finds the magic pointer of that type */ |
---|
970 | |
---|
971 | This routine returns a pointer to the C<MAGIC> structure stored in the SV. |
---|
972 | If the SV does not have that magical feature, C<NULL> is returned. Also, |
---|
973 | if the SV is not of type SVt_PVMG, Perl may core dump. |
---|
974 | |
---|
975 | int mg_copy(SV* sv, SV* nsv, const char* key, STRLEN klen); |
---|
976 | |
---|
977 | This routine checks to see what types of magic C<sv> has. If the mg_type |
---|
978 | field is an uppercase letter, then the mg_obj is copied to C<nsv>, but |
---|
979 | the mg_type field is changed to be the lowercase letter. |
---|
980 | |
---|
981 | =head2 Understanding the Magic of Tied Hashes and Arrays |
---|
982 | |
---|
983 | Tied hashes and arrays are magical beasts of the 'P' magic type. |
---|
984 | |
---|
985 | WARNING: As of the 5.004 release, proper usage of the array and hash |
---|
986 | access functions requires understanding a few caveats. Some |
---|
987 | of these caveats are actually considered bugs in the API, to be fixed |
---|
988 | in later releases, and are bracketed with [MAYCHANGE] below. If |
---|
989 | you find yourself actually applying such information in this section, be |
---|
990 | aware that the behavior may change in the future, umm, without warning. |
---|
991 | |
---|
992 | The perl tie function associates a variable with an object that implements |
---|
993 | the various GET, SET, etc. methods. To perform the equivalent of the perl |
---|
994 | tie function from an XSUB, you must mimic this behaviour. The code below |
---|
995 | carries out the necessary steps - firstly it creates a new hash, and then |
---|
996 | creates a second hash which it blesses into the class which will implement |
---|
997 | the tie methods. Lastly it ties the two hashes together, and returns a |
---|
998 | reference to the new tied hash. Note that the code below does NOT call the |
---|
999 | TIEHASH method in the MyTie class - |
---|
1000 | see L<Calling Perl Routines from within C Programs> for details on how |
---|
1001 | to do this. |
---|
1002 | |
---|
1003 | SV* |
---|
1004 | mytie() |
---|
1005 | PREINIT: |
---|
1006 | HV *hash; |
---|
1007 | HV *stash; |
---|
1008 | SV *tie; |
---|
1009 | CODE: |
---|
1010 | hash = newHV(); |
---|
1011 | tie = newRV_noinc((SV*)newHV()); |
---|
1012 | stash = gv_stashpv("MyTie", TRUE); |
---|
1013 | sv_bless(tie, stash); |
---|
1014 | hv_magic(hash, (GV*)tie, 'P'); |
---|
1015 | RETVAL = newRV_noinc((SV*)hash); |
---|
1016 | OUTPUT: |
---|
1017 | RETVAL |
---|
1018 | |
---|
1019 | The C<av_store> function, when given a tied array argument, merely |
---|
1020 | copies the magic of the array onto the value to be "stored", using |
---|
1021 | C<mg_copy>. It may also return NULL, indicating that the value did not |
---|
1022 | actually need to be stored in the array. [MAYCHANGE] After a call to |
---|
1023 | C<av_store> on a tied array, the caller will usually need to call |
---|
1024 | C<mg_set(val)> to actually invoke the perl level "STORE" method on the |
---|
1025 | TIEARRAY object. If C<av_store> did return NULL, a call to |
---|
1026 | C<SvREFCNT_dec(val)> will usually also be necessary to avoid a memory |
---|
1027 | leak. [/MAYCHANGE] |
---|
1028 | |
---|
1029 | The previous paragraph is applicable verbatim to tied hash access using the |
---|
1030 | C<hv_store> and C<hv_store_ent> functions as well. |
---|
1031 | |
---|
1032 | C<av_fetch> and the corresponding hash functions C<hv_fetch> and |
---|
1033 | C<hv_fetch_ent> actually return an undefined mortal value whose magic |
---|
1034 | has been initialized using C<mg_copy>. Note the value so returned does not |
---|
1035 | need to be deallocated, as it is already mortal. [MAYCHANGE] But you will |
---|
1036 | need to call C<mg_get()> on the returned value in order to actually invoke |
---|
1037 | the perl level "FETCH" method on the underlying TIE object. Similarly, |
---|
1038 | you may also call C<mg_set()> on the return value after possibly assigning |
---|
1039 | a suitable value to it using C<sv_setsv>, which will invoke the "STORE" |
---|
1040 | method on the TIE object. [/MAYCHANGE] |
---|
1041 | |
---|
1042 | [MAYCHANGE] |
---|
1043 | In other words, the array or hash fetch/store functions don't really |
---|
1044 | fetch and store actual values in the case of tied arrays and hashes. They |
---|
1045 | merely call C<mg_copy> to attach magic to the values that were meant to be |
---|
1046 | "stored" or "fetched". Later calls to C<mg_get> and C<mg_set> actually |
---|
1047 | do the job of invoking the TIE methods on the underlying objects. Thus |
---|
1048 | the magic mechanism currently implements a kind of lazy access to arrays |
---|
1049 | and hashes. |
---|
1050 | |
---|
1051 | Currently (as of perl version 5.004), use of the hash and array access |
---|
1052 | functions requires the user to be aware of whether they are operating on |
---|
1053 | "normal" hashes and arrays, or on their tied variants. The API may be |
---|
1054 | changed to provide more transparent access to both tied and normal data |
---|
1055 | types in future versions. |
---|
1056 | [/MAYCHANGE] |
---|
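The lazy access described above can be modelled with a fetch step that only attaches a back-pointer and a separate step that does the work. This is a toy sketch: C<toy_tied_array>, C<toy_elem>, C<toy_av_fetch>, C<toy_mg_get> and C<toy_times_ten> are hypothetical, and a plain C callback stands in for the Perl-level FETCH method.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical FETCH callback: takes an index, returns the element value. */
typedef int (*toy_fetch_fn)(int index);

typedef struct {
    toy_fetch_fn fetch;          /* models the tie object's FETCH method */
} toy_tied_array;

typedef struct {
    const toy_tied_array *owner; /* copied "magic": who to ask for the value */
    int index;
    int defined;                 /* 0 until toy_mg_get has run */
    int value;
} toy_elem;

/* Models av_fetch on a tied array: attach magic, fetch nothing yet. */
static toy_elem toy_av_fetch(const toy_tied_array *av, int index) {
    toy_elem e = { av, index, 0, 0 };
    assert(av != NULL);
    return e;
}

/* Models mg_get: this is the step that actually invokes FETCH. */
static void toy_mg_get(toy_elem *e) {
    e->value = e->owner->fetch(e->index);
    e->defined = 1;
}

static int toy_fetch_calls = 0;  /* counts FETCH invocations */

/* Example FETCH implementation: element i holds 10 * i. */
static int toy_times_ten(int index) {
    toy_fetch_calls++;
    return 10 * index;
}
```

The element returned by C<toy_av_fetch> is still undefined and the callback has not run; only C<toy_mg_get> triggers it.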
1057 | |
---|
1058 | You would do well to understand that the TIEARRAY and TIEHASH interfaces |
---|
1059 | are mere sugar to invoke some perl method calls while using the uniform hash |
---|
1060 | and array syntax. The use of this sugar imposes some overhead (typically |
---|
1061 | about two to four extra opcodes per FETCH/STORE operation, in addition to |
---|
1062 | the creation of all the mortal variables required to invoke the methods). |
---|
1063 | This overhead will be comparatively small if the TIE methods are themselves |
---|
1064 | substantial, but if they are only a few statements long, the overhead |
---|
1065 | will not be insignificant. |
---|
1066 | |
---|
1067 | =head2 Localizing changes |
---|
1068 | |
---|
1069 | Perl has a very handy construction: |
---|
1070 | |
---|
1071 | { |
---|
1072 | local $var = 2; |
---|
1073 | ... |
---|
1074 | } |
---|
1075 | |
---|
1076 | This construction is I<approximately> equivalent to |
---|
1077 | |
---|
1078 | { |
---|
1079 | my $oldvar = $var; |
---|
1080 | $var = 2; |
---|
1081 | ... |
---|
1082 | $var = $oldvar; |
---|
1083 | } |
---|
1084 | |
---|
1085 | The biggest difference is that the first construction would |
---|
1086 | reinstate the initial value of $var, irrespective of how control exits |
---|
1087 | the block: C<goto>, C<return>, C<die>/C<eval> etc. It is a little bit |
---|
1088 | more efficient as well. |
---|
1089 | |
---|
1090 | There is a way to achieve a similar task from C via the Perl API: create a |
---|
1091 | I<pseudo-block>, and arrange for some changes to be automatically |
---|
1092 | undone at the end of it, either explicitly, or via a non-local exit (via |
---|
1093 | die()). A I<block>-like construct is created by a pair of |
---|
1094 | C<ENTER>/C<LEAVE> macros (see L<perlcall/"Returning a Scalar">). |
---|
1095 | Such a construct may be created specially for some important localized |
---|
1096 | task, or an existing one (like boundaries of enclosing Perl |
---|
1097 | subroutine/block, or an existing pair for freeing TMPs) may be |
---|
1098 | used. (In the second case the overhead of additional localization must |
---|
1099 | be almost negligible.) Note that any XSUB is automatically enclosed in |
---|
1100 | an C<ENTER>/C<LEAVE> pair. |
---|
1101 | |
---|
1102 | Inside such a I<pseudo-block> the following services are available: |
---|
1103 | |
---|
1104 | =over 4 |
---|
1105 | |
---|
1106 | =item C<SAVEINT(int i)> |
---|
1107 | |
---|
1108 | =item C<SAVEIV(IV i)> |
---|
1109 | |
---|
1110 | =item C<SAVEI32(I32 i)> |
---|
1111 | |
---|
1112 | =item C<SAVELONG(long i)> |
---|
1113 | |
---|
1114 | These macros arrange things to restore the value of integer variable |
---|
1115 | C<i> at the end of the enclosing I<pseudo-block>. |
---|
1116 | |
---|
1117 | =item C<SAVESPTR(s)> |
---|
1118 | |
---|
1119 | =item C<SAVEPPTR(p)> |
---|
1120 | |
---|
1121 | These macros arrange things to restore the value of pointers C<s> and |
---|
1122 | C<p>. C<s> must be a pointer of a type which survives conversion to |
---|
1123 | C<SV*> and back, C<p> should be able to survive conversion to C<char*> |
---|
1124 | and back. |
---|
1125 | |
---|
1126 | =item C<SAVEFREESV(SV *sv)> |
---|
1127 | |
---|
1128 | The refcount of C<sv> will be decremented at the end of |
---|
1129 | I<pseudo-block>. This is similar to C<sv_2mortal> in that it is also a |
---|
1130 | mechanism for doing a delayed C<SvREFCNT_dec>. However, while C<sv_2mortal> |
---|
1131 | extends the lifetime of C<sv> until the beginning of the next statement, |
---|
1132 | C<SAVEFREESV> extends it until the end of the enclosing scope. These |
---|
1133 | lifetimes can be wildly different. |
---|
1134 | |
---|
1135 | Also compare C<SAVEMORTALIZESV>. |
---|
1136 | |
---|
1137 | =item C<SAVEMORTALIZESV(SV *sv)> |
---|
1138 | |
---|
1139 | Just like C<SAVEFREESV>, but mortalizes C<sv> at the end of the current |
---|
1140 | scope instead of decrementing its reference count. This usually has the |
---|
1141 | effect of keeping C<sv> alive until the statement that called the currently |
---|
1142 | live scope has finished executing. |
---|
1143 | |
---|
1144 | =item C<SAVEFREEOP(OP *op)> |
---|
1145 | |
---|
1146 | The C<OP *> is op_free()ed at the end of I<pseudo-block>. |
---|
1147 | |
---|
1148 | =item C<SAVEFREEPV(p)> |
---|
1149 | |
---|
1150 | The chunk of memory which is pointed to by C<p> is Safefree()ed at the |
---|
1151 | end of I<pseudo-block>. |
---|
1152 | |
---|
1153 | =item C<SAVECLEARSV(SV *sv)> |
---|
1154 | |
---|
1155 | Clears a slot in the current scratchpad which corresponds to C<sv> at |
---|
1156 | the end of I<pseudo-block>. |
---|
1157 | |
---|
1158 | =item C<SAVEDELETE(HV *hv, char *key, I32 length)> |
---|
1159 | |
---|
1160 | The key C<key> of C<hv> is deleted at the end of I<pseudo-block>. The |
---|
1161 | string pointed to by C<key> is Safefree()ed. If one has a I<key> in |
---|
1162 | short-lived storage, the corresponding string may be reallocated like |
---|
1163 | this: |
---|
1164 | |
---|
1165 | SAVEDELETE(PL_defstash, savepv(tmpbuf), strlen(tmpbuf)); |
---|
1166 | |
---|
1167 | =item C<SAVEDESTRUCTOR(DESTRUCTORFUNC_NOCONTEXT_t f, void *p)> |
---|
1168 | |
---|
1169 | At the end of I<pseudo-block> the function C<f> is called with the |
---|
1170 | only argument C<p>. |
---|
1171 | |
---|
1172 | =item C<SAVEDESTRUCTOR_X(DESTRUCTORFUNC_t f, void *p)> |
---|
1173 | |
---|
1174 | At the end of I<pseudo-block> the function C<f> is called with the |
---|
1175 | implicit context argument (if any), and C<p>. |
---|
1176 | |
---|
1177 | =item C<SAVESTACK_POS()> |
---|
1178 | |
---|
1179 | The current offset on the Perl internal stack (cf. C<SP>) is restored |
---|
1180 | at the end of I<pseudo-block>. |
---|
1181 | |
---|
1182 | =back |
---|
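The save macros listed above can be thought of as pushing undo records that are replayed when the pseudo-block ends. The following is a minimal model, assuming a fixed-size stack and covering only an integer save and a destructor; every C<toy_> name is hypothetical and the real savestack holds many more record kinds.

```c
#include <assert.h>
#include <stddef.h>

#define TOY_SAVE_MAX 32

typedef void (*toy_destructor)(void *);

/* One savestack entry: either "restore this int" or "call this function". */
typedef struct {
    int *int_ptr;        /* non-NULL for a saved int */
    int  old_value;
    toy_destructor fn;   /* non-NULL for a destructor entry */
    void *arg;
} toy_save_entry;

static toy_save_entry toy_savestack[TOY_SAVE_MAX];
static int toy_save_ix = 0;

/* Models ENTER: remember how deep the savestack is right now. */
static int toy_enter(void) {
    return toy_save_ix;
}

/* Models SAVEINT: record the current value so it can be put back later. */
static void toy_save_int(int *p) {
    toy_save_entry e = { p, *p, NULL, NULL };
    assert(toy_save_ix < TOY_SAVE_MAX);
    toy_savestack[toy_save_ix++] = e;
}

/* Models SAVEDESTRUCTOR: arrange for fn(arg) to run at scope exit. */
static void toy_save_destructor(toy_destructor fn, void *arg) {
    toy_save_entry e = { NULL, 0, fn, arg };
    assert(toy_save_ix < TOY_SAVE_MAX);
    toy_savestack[toy_save_ix++] = e;
}

/* Models LEAVE: undo everything recorded since the mark, newest first. */
static void toy_leave(int mark) {
    while (toy_save_ix > mark) {
        toy_save_entry *e = &toy_savestack[--toy_save_ix];
        if (e->int_ptr)
            *e->int_ptr = e->old_value;
        if (e->fn)
            e->fn(e->arg);
    }
}

/* Example destructor: bump the counter it is handed. */
static void toy_count_call(void *arg) {
    ++*(int *)arg;
}
```

Because the undo records live on a stack rather than in local variables, the same replay can run on a normal exit or on a non-local one, which is what distinguishes this from a manual save and restore.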
1183 | |
---|
1184 | The following API list contains functions, thus one needs to |
---|
1185 | provide pointers to the modifiable data explicitly (either C pointers, |
---|
1186 | or Perlish C<GV *>s). Where the above macros take C<int>, a similar |
---|
1187 | function takes C<int *>. |
---|
1188 | |
---|
1189 | =over 4 |
---|
1190 | |
---|
1191 | =item C<SV* save_scalar(GV *gv)> |
---|
1192 | |
---|
1193 | Equivalent to Perl code C<local $gv>. |
---|
1194 | |
---|
1195 | =item C<AV* save_ary(GV *gv)> |
---|
1196 | |
---|
1197 | =item C<HV* save_hash(GV *gv)> |
---|
1198 | |
---|
1199 | Similar to C<save_scalar>, but localize C<@gv> and C<%gv>. |
---|
1200 | |
---|
1201 | =item C<void save_item(SV *item)> |
---|
1202 | |
---|
1203 | Duplicates the current value of C<item>. On exit from the current |
---|
1204 | C<ENTER>/C<LEAVE> I<pseudo-block>, the value of C<item> is restored |
---|
1205 | from the stored copy. |
---|
1206 | |
---|
1207 | =item C<void save_list(SV **sarg, I32 maxsarg)> |
---|
1208 | |
---|
1209 | A variant of C<save_item> which takes multiple arguments via an array |
---|
1210 | C<sarg> of C<SV*> of length C<maxsarg>. |
---|
1211 | |
---|
1212 | =item C<SV* save_svref(SV **sptr)> |
---|
1213 | |
---|
1214 | Similar to C<save_scalar>, but will reinstate a C<SV *>. |
---|
1215 | |
---|
1216 | =item C<void save_aptr(AV **aptr)> |
---|
1217 | |
---|
1218 | =item C<void save_hptr(HV **hptr)> |
---|
1219 | |
---|
1220 | Similar to C<save_svref>, but localize C<AV *> and C<HV *>. |
---|
1221 | |
---|
1222 | =back |
---|
1223 | |
---|
1224 | The C<Alias> module implements localization of the basic types within the |
---|
1225 | I<caller's scope>. People who are interested in how to localize things in |
---|
1226 | the containing scope should take a look there too. |
---|
1227 | |
---|
1228 | =head1 Subroutines |
---|
1229 | |
---|
1230 | =head2 XSUBs and the Argument Stack |
---|
1231 | |
---|
1232 | The XSUB mechanism is a simple way for Perl programs to access C subroutines. |
---|
1233 | An XSUB routine will have a stack that contains the arguments from the Perl |
---|
1234 | program, and a way to map from the Perl data structures to a C equivalent. |
---|
1235 | |
---|
1236 | The stack arguments are accessible through the C<ST(n)> macro, which returns |
---|
1237 | the C<n>'th stack argument. Argument 0 is the first argument passed in the |
---|
1238 | Perl subroutine call. These arguments are C<SV*>, and can be used anywhere |
---|
1239 | an C<SV*> is used. |
---|
1240 | |
---|
1241 | Most of the time, output from the C routine can be handled through use of |
---|
1242 | the RETVAL and OUTPUT directives. However, there are some cases where the |
---|
1243 | argument stack is not already long enough to handle all the return values. |
---|
1244 | An example is the POSIX tzname() call, which takes no arguments, but returns |
---|
1245 | two, the local time zone's standard and summer time abbreviations. |
---|
1246 | |
---|
1247 | To handle this situation, the PPCODE directive is used and the stack is |
---|
1248 | extended using the macro: |
---|
1249 | |
---|
1250 | EXTEND(SP, num); |
---|
1251 | |
---|
1252 | where C<SP> is the macro that represents the local copy of the stack pointer, |
---|
1253 | and C<num> is the number of elements the stack should be extended by. |
---|
1254 | |
---|
1255 | Now that there is room on the stack, values can be pushed on it using the |
---|
1256 | macros to push IVs, doubles, strings, and SV pointers respectively: |
---|
1257 | |
---|
1258 | PUSHi(IV) |
---|
1259 | PUSHn(double) |
---|
1260 | PUSHp(char*, I32) |
---|
1261 | PUSHs(SV*) |
---|
1262 | |
---|
1263 | Now, in the Perl program calling C<tzname>, the two values will be assigned |
---|
1264 | as in: |
---|
1265 | |
---|
1266 | ($standard_abbrev, $summer_abbrev) = POSIX::tzname; |
---|
1267 | |
---|
1268 | An alternative (and possibly simpler) method of pushing values on the stack is |
---|
1269 | to use the macros: |
---|
1270 | |
---|
1271 | XPUSHi(IV) |
---|
1272 | XPUSHn(double) |
---|
1273 | XPUSHp(char*, I32) |
---|
1274 | XPUSHs(SV*) |
---|
1275 | |
---|
1276 | These macros automatically adjust the stack for you, if needed. Thus, you |
---|
1277 | do not need to call C<EXTEND> to extend the stack. |
---|
1278 | However, see L</Putting a C value on Perl stack>. |
---|
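The difference between the two families of macros can be shown with a toy growable stack of integers. This is an illustrative model only: C<toy_stack>, C<toy_extend>, C<toy_push> and C<toy_xpush> are hypothetical, and the real macros operate on C<SV*> slots through the local stack pointer.

```c
#include <assert.h>
#include <stdlib.h>

typedef struct {
    int *base;      /* start of the stack storage */
    size_t len;     /* number of values currently on the stack */
    size_t cap;     /* number of slots allocated */
} toy_stack;

/* Models EXTEND(SP, n): make sure n more values will fit. */
static void toy_extend(toy_stack *st, size_t n) {
    if (st->len + n > st->cap) {
        size_t cap = st->len + n;
        int *p = realloc(st->base, cap * sizeof *p);
        assert(p != NULL);
        st->base = p;
        st->cap = cap;
    }
}

/* Models PUSHi: store a value, trusting the caller to have made room. */
static void toy_push(toy_stack *st, int v) {
    assert(st->len < st->cap);
    st->base[st->len++] = v;
}

/* Models XPUSHi: make room for one value if needed, then push. */
static void toy_xpush(toy_stack *st, int v) {
    toy_extend(st, 1);
    toy_push(st, v);
}
```

With a known number of return values, one C<toy_extend> followed by plain pushes performs a single capacity check; C<toy_xpush> checks on every call, which is simpler to write at a small cost.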
1279 | |
---|
1280 | For more information, consult L<perlxs> and L<perlxstut>. |
---|
1281 | |
---|
1282 | =head2 Calling Perl Routines from within C Programs |
---|
1283 | |
---|
1284 | There are four routines that can be used to call a Perl subroutine from |
---|
1285 | within a C program. These four are: |
---|
1286 | |
---|
1287 | I32 call_sv(SV*, I32); |
---|
1288 | I32 call_pv(const char*, I32); |
---|
1289 | I32 call_method(const char*, I32); |
---|
1290 | I32 call_argv(const char*, I32, register char**); |
---|
1291 | |
---|
1292 | The routine most often used is C<call_sv>. The C<SV*> argument |
---|
1293 | contains either the name of the Perl subroutine to be called, or a |
---|
1294 | reference to the subroutine. The second argument consists of flags |
---|
1295 | that control the context in which the subroutine is called, whether |
---|
1296 | or not the subroutine is being passed arguments, how errors should be |
---|
1297 | trapped, and how to treat return values. |
---|
1298 | |
---|
1299 | All four routines return the number of values that the subroutine returned |
---|
1300 | on the Perl stack. |
---|
1301 | |
---|
1302 | These routines used to be called C<perl_call_sv> etc., before Perl v5.6.0, |
---|
1303 | but those names are now deprecated; macros of the same name are provided for |
---|
1304 | compatibility. |
---|
1305 | |
---|
1306 | When using any of these routines (except C<call_argv>), the programmer |
---|
1307 | must manipulate the Perl stack. The tools for this include the following macros and |
---|
1308 | functions: |
---|
1309 | |
---|
1310 | dSP |
---|
1311 | SP |
---|
1312 | PUSHMARK() |
---|
1313 | PUTBACK |
---|
1314 | SPAGAIN |
---|
1315 | ENTER |
---|
1316 | SAVETMPS |
---|
1317 | FREETMPS |
---|
1318 | LEAVE |
---|
1319 | XPUSH*() |
---|
1320 | POP*() |
---|
1321 | |
---|
1322 | For a detailed description of calling conventions from C to Perl, |
---|
1323 | consult L<perlcall>. |
---|
1324 | |
---|
1325 | =head2 Memory Allocation |
---|
1326 | |
---|
1327 | All memory meant to be used with the Perl API functions should be manipulated |
---|
1328 | using the macros described in this section. The macros provide the necessary |
---|
1329 | transparency between differences in the actual malloc implementation that is |
---|
1330 | used within perl. |
---|
1331 | |
---|
1332 | It is suggested that you enable the version of malloc that is distributed |
---|
1333 | with Perl. It keeps pools of various sizes of unallocated memory in |
---|
1334 | order to satisfy allocation requests more quickly. However, on some |
---|
1335 | platforms, it may cause spurious malloc or free errors. |
---|
1336 | |
---|
1337 | New(x, pointer, number, type); |
---|
1338 | Newc(x, pointer, number, type, cast); |
---|
1339 | Newz(x, pointer, number, type); |
---|
1340 | |
---|
1341 | These three macros are used to initially allocate memory. |
---|
1342 | |
---|
1343 | The first argument C<x> was a "magic cookie" that was used to keep track |
---|
1344 | of who called the macro, to help when debugging memory problems. However, |
---|
1345 | the current code makes no use of this feature (most Perl developers now |
---|
1346 | use run-time memory checkers), so this argument can be any number. |
---|
1347 | |
---|
1348 | The second argument C<pointer> should be the name of a variable that will |
---|
1349 | point to the newly allocated memory. |
---|
1350 | |
---|
1351 | The third and fourth arguments C<number> and C<type> specify how many of |
---|
1352 | the specified type of data structure should be allocated. The argument |
---|
1353 | C<type> is passed to C<sizeof>. The final argument to C<Newc>, C<cast>, |
---|
1354 | should be used if the C<pointer> argument is different from the C<type> |
---|
1355 | argument. |
---|
1356 | |
---|
1357 | Unlike the C<New> and C<Newc> macros, the C<Newz> macro calls C<memzero> |
---|
1358 | to zero out all the newly allocated memory. |
---|
1359 | |
---|
1360 | Renew(pointer, number, type); |
---|
1361 | Renewc(pointer, number, type, cast); |
---|
1362 | Safefree(pointer); |
---|
1363 | |
---|
1364 | These three macros are used to change a memory buffer size or to free a |
---|
1365 | piece of memory no longer needed. The arguments to C<Renew> and C<Renewc> |
---|
1366 | match those of C<New> and C<Newc> with the exception of not needing the |
---|
1367 | "magic cookie" argument. |
---|
1368 | |
---|
1369 | Move(source, dest, number, type); |
---|
1370 | Copy(source, dest, number, type); |
---|
1371 | Zero(dest, number, type); |
---|
1372 | |
---|
1373 | These three macros are used to move, copy, or zero out previously allocated |
---|
1374 | memory. The C<source> and C<dest> arguments point to the source and |
---|
1375 | destination starting points. Perl will move, copy, or zero out C<number> |
---|
1376 | instances of the size of the C<type> data structure (using the C<sizeof> |
---|
1377 | function). |
---|
1378 | |
---|
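As a rough illustration of the argument order described above, the following sketch defines stand-in macros on top of the C library. The C<TOY_*> names and the C<toy_dup_padded> helper are invented for this example and are not part of the Perl API. Perl's real macros go through perl's own allocator, so this only shows the shape of the calls: a pointer variable, a count, and a type that is handed to C<sizeof>.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Toy stand-ins only; none of these names exist in perl. */
#define TOY_NEW(ptr, n, type)    ((ptr) = (type *)malloc((n) * sizeof(type)))
#define TOY_NEWZ(ptr, n, type)   ((ptr) = (type *)calloc((n), sizeof(type)))
#define TOY_RENEW(ptr, n, type)  ((ptr) = (type *)realloc((ptr), (n) * sizeof(type)))
#define TOY_FREE(ptr)            free(ptr)

/* Argument order follows the text: source first, then destination. */
#define TOY_MOVE(src, dst, n, type) memmove((dst), (src), (n) * sizeof(type)) /* regions may overlap */
#define TOY_COPY(src, dst, n, type) memcpy((dst), (src), (n) * sizeof(type))  /* regions must not overlap */
#define TOY_ZERO(dst, n, type)      memset((dst), 0, (n) * sizeof(type))

/* Returns a zero-filled buffer of `total` ints whose first `n` entries
 * are copied from src, or NULL if total < n or allocation fails. */
static int *toy_dup_padded(const int *src, size_t n, size_t total)
{
    int *buf;

    assert(src != NULL);
    if (total < n)
        return NULL;
    TOY_NEWZ(buf, total, int);      /* zeroed, like the Newz flavor */
    if (!buf)
        return NULL;
    TOY_COPY(src, buf, n, int);
    return buf;
}
```

The caller releases the buffer with C<TOY_FREE>. Note that this simplified C<TOY_RENEW> overwrites the pointer even when C<realloc> fails, which a real program would need to guard against.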
1379 | =head2 PerlIO |
---|
1380 | |
---|
1381 | The most recent development releases of Perl have been experimenting with |
---|
1382 | removing Perl's dependency on the "normal" standard I/O suite and allowing |
---|
1383 | other stdio implementations to be used. This involves creating a new |
---|
1384 | abstraction layer that then calls whichever implementation of stdio Perl |
---|
1385 | was compiled with. All XSUBs should now use the functions in the PerlIO |
---|
1386 | abstraction layer and not make any assumptions about what kind of stdio |
---|
1387 | is being used. |
---|
1388 | |
---|
1389 | For a complete description of the PerlIO abstraction, consult L<perlapio>. |
---|
1390 | |
---|
1391 | =head2 Putting a C value on Perl stack |
---|
1392 | |
---|
1393 | A lot of opcodes (an opcode is an elementary operation in the internal perl |
---|
1394 | stack machine) put an SV* on the stack. However, as an optimization |
---|
1395 | the corresponding SV is (usually) not recreated each time. The opcodes |
---|
1396 | reuse specially assigned SVs (I<target>s) which are (as a corollary) |
---|
1397 | not constantly freed/created. |
---|
1398 | |
---|
1399 | Each of the targets is created only once (but see |
---|
1400 | L</Scratchpads and recursion> below), and when an opcode needs to put |
---|
1401 | an integer, a double, or a string on the stack, it just sets the |
---|
1402 | corresponding parts of its I<target> and puts the I<target> on the stack. |
---|
1403 | |
---|
1404 | The macro to put this target on the stack is C<PUSHTARG>, and it is |
---|
1405 | directly used in some opcodes, as well as indirectly in zillions of |
---|
1406 | others, which use it via C<(X)PUSH[pni]>. |
---|
1407 | |
---|
1408 | Because the target is reused, you must be careful when pushing multiple |
---|
1409 | values on the stack. The following code will not do what you think: |
---|
1410 | |
---|
1411 | XPUSHi(10); |
---|
1412 | XPUSHi(20); |
---|
1413 | |
---|
1414 | This translates as "set C<TARG> to 10, push a pointer to C<TARG> onto |
---|
1415 | the stack; set C<TARG> to 20, push a pointer to C<TARG> onto the stack". |
---|
1416 | At the end of the operation, the stack does not contain the values 10 |
---|
1417 | and 20, but actually contains two pointers to C<TARG>, which we have set |
---|
1418 | to 20. If you need to push multiple different values, use C<XPUSHs>, |
---|
1419 | which bypasses C<TARG>. |
---|
1420 | |
---|
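The aliasing problem just described can be reproduced without perl at all. The sketch below is a toy model under stated assumptions: C<toy_sv>, C<toy_stack> and the C<toy_push_*> functions are invented names, with a single reusable slot playing the role of C<TARG>. It only demonstrates why two pushes through a shared target leave two pointers to the same value.

```c
#include <assert.h>
#include <stddef.h>

/* Toy model only: none of these names exist in the Perl API. */
#define TOY_STACK_MAX 8

typedef struct { long iv; } toy_sv;        /* stand-in for an SV holding an integer */

typedef struct {
    toy_sv *slots[TOY_STACK_MAX];          /* the "stack": an array of pointers */
    size_t  top;                           /* number of slots in use */
    toy_sv  targ;                          /* the one reusable target */
} toy_stack;

/* Models the XPUSHi pattern: store the value in the shared target,
 * then push the target's address. */
static void toy_push_via_targ(toy_stack *st, long value)
{
    assert(st->top < TOY_STACK_MAX);
    st->targ.iv = value;
    st->slots[st->top++] = &st->targ;
}

/* Models the XPUSHs pattern: push a pointer to a separate SV. */
static void toy_push_sv(toy_stack *st, toy_sv *sv)
{
    assert(st->top < TOY_STACK_MAX);
    st->slots[st->top++] = sv;
}

/* Returns the integer visible through stack slot i. */
static long toy_peek(const toy_stack *st, size_t i)
{
    assert(i < st->top);
    return st->slots[i]->iv;
}
```

After C<toy_push_via_targ(&st, 10)> followed by C<toy_push_via_targ(&st, 20)>, both slots hold the same address and both read back as 20. Pushing two distinct C<toy_sv> objects with C<toy_push_sv> keeps 10 and 20 separate.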
1421 | On a related note, if you do use C<(X)PUSH[npi]>, then you're going to |
---|
1422 | need a C<dTARG> in your variable declarations so that the C<*PUSH*> |
---|
1423 | macros can make use of the local variable C<TARG>. |
---|
1424 | |
---|
1425 | =head2 Scratchpads |
---|
1426 | |
---|
1427 | The question remains of when the SVs which are I<target>s for opcodes |
---|
1428 | are created. The answer is that they are created when the current unit -- |
---|
1429 | a subroutine or a file (for opcodes for statements outside of |
---|
1430 | subroutines) -- is compiled. During this time a special anonymous Perl |
---|
1431 | array is created, which is called a scratchpad for the current |
---|
1432 | unit. |
---|
1433 | |
---|
1434 | A scratchpad keeps SVs which are lexicals for the current unit and are |
---|
1435 | targets for opcodes. One can deduce that an SV lives on a scratchpad |
---|
1436 | by looking at its flags: lexicals have C<SVs_PADMY> set, and |
---|
1437 | I<target>s have C<SVs_PADTMP> set. |
---|
1438 | |
---|
1439 | The correspondence between OPs and I<target>s is not 1-to-1. Different |
---|
1440 | OPs in the compile tree of the unit can use the same target, if this |
---|
1441 | would not conflict with the expected life of the temporary. |
---|
1442 | |
---|
1443 | =head2 Scratchpads and recursion |
---|
1444 | |
---|
1445 | Strictly speaking, it is not true that a compiled unit contains a pointer to |
---|
1446 | the scratchpad AV. In fact it contains a pointer to an AV of |
---|
1447 | (initially) one element, and this element is the scratchpad AV. Why do |
---|
1448 | we need an extra level of indirection? |
---|
1449 | |
---|
1450 | The answer is B<recursion>, and maybe (sometime soon) B<threads>. Both |
---|
1451 | these can create several execution pointers going into the same |
---|
1452 | subroutine. For the subroutine-child not to write over the temporaries |
---|
1453 | for the subroutine-parent (lifespan of which covers the call to the |
---|
1454 | child), the parent and the child should have different |
---|
1455 | scratchpads. (I<And> the lexicals should be separate anyway!) |
---|
1456 | |
---|
1457 | So each subroutine is born with an array of scratchpads (of length 1). |
---|
1458 | On each entry to the subroutine it is checked that the current |
---|
1459 | depth of the recursion is not more than the length of this array, and |
---|
1460 | if it is, a new scratchpad is created and pushed into the array. |
---|
1461 | |
---|
1462 | The I<target>s on this scratchpad are C<undef>s, but they are already |
---|
1463 | marked with correct flags. |
---|
1464 | |
---|
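The depth check described in this section can be sketched as follows. This is a toy model, assuming a fixed pad size and plain C<int> slots. The names C<toy_pad>, C<toy_sub> and the C<toy_sub_*> functions are invented for illustration and do not correspond to perl's actual pad code.

```c
#include <assert.h>
#include <stdlib.h>

/* Toy model only: these names are illustrative, not perl internals. */
#define TOY_PAD_SLOTS 4

typedef struct { int slots[TOY_PAD_SLOTS]; } toy_pad;   /* one scratchpad */

typedef struct {
    toy_pad **pads;     /* array of scratchpads, one per recursion depth */
    size_t    npads;    /* current length of that array */
    size_t    depth;    /* current recursion depth (0 = not running) */
} toy_sub;

/* A sub is "born" with an array holding exactly one scratchpad. */
static int toy_sub_init(toy_sub *sub)
{
    sub->pads = malloc(sizeof *sub->pads);
    if (!sub->pads)
        return -1;
    sub->pads[0] = calloc(1, sizeof **sub->pads);
    if (!sub->pads[0]) {
        free(sub->pads);
        return -1;
    }
    sub->npads = 1;
    sub->depth = 0;
    return 0;
}

/* On entry: if the depth exceeds the array length, create and push a pad. */
static toy_pad *toy_sub_enter(toy_sub *sub)
{
    sub->depth++;
    if (sub->depth > sub->npads) {
        toy_pad **grown = realloc(sub->pads, sub->depth * sizeof *grown);
        if (!grown) { sub->depth--; return NULL; }
        sub->pads = grown;
        sub->pads[sub->npads] = calloc(1, sizeof **sub->pads);
        if (!sub->pads[sub->npads]) { sub->depth--; return NULL; }
        sub->npads++;
    }
    return sub->pads[sub->depth - 1];
}

/* On exit the pad is kept, so the next call at this depth reuses it. */
static void toy_sub_leave(toy_sub *sub)
{
    assert(sub->depth > 0);
    sub->depth--;
}

static void toy_sub_free(toy_sub *sub)
{
    for (size_t i = 0; i < sub->npads; i++)
        free(sub->pads[i]);
    free(sub->pads);
    sub->pads = NULL;
    sub->npads = sub->depth = 0;
}
```

Two nested calls to C<toy_sub_enter> return different pads, so the inner call cannot overwrite the outer call's temporaries. Once both calls have left, a fresh call gets the first pad back rather than allocating a new one.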
1465 | =head1 Compiled code |
---|
1466 | |
---|
1467 | =head2 Code tree |
---|
1468 | |
---|
1469 | Here we describe the internal form your code is converted to by |
---|
1470 | Perl. Start with a simple example: |
---|
1471 | |
---|
1472 | $a = $b + $c; |
---|
1473 | |
---|
1474 | This is converted to a tree similar to this one: |
---|
1475 | |
---|
1476 | assign-to |
---|
1477 | / \ |
---|
1478 | + $a |
---|
1479 | / \ |
---|
1480 | $b $c |
---|
1481 | |
---|
1482 | (but slightly more complicated). This tree reflects the way Perl |
---|
1483 | parsed your code, but has nothing to do with the execution order. |
---|
1484 | There is an additional "thread" going through the nodes of the tree |
---|
1485 | which shows the order of execution of the nodes. In our simplified |
---|
1486 | example above it looks like: |
---|
1487 | |
---|
1488 | $b ---> $c ---> + ---> $a ---> assign-to |
---|
1489 | |
---|
1490 | But with the actual compile tree for C<$a = $b + $c> it is different: |
---|
1491 | some nodes are I<optimized away>. As a corollary, though the actual tree |
---|
1492 | contains more nodes than our simplified example, the execution order |
---|
1493 | is the same as in our example. |
---|
1494 | |
---|
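The difference between tree structure and execution order can be made concrete with a small model of the simplified example. The C<toy_op> structure and both functions below are assumptions made for this sketch. The field names only loosely echo the child, sibling and next-op links. The real structures are more involved.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Toy model only: not perl's op structures. */
typedef struct toy_op {
    const char    *name;
    struct toy_op *first;    /* first child in the parse tree */
    struct toy_op *sibling;  /* next child of the same parent */
    struct toy_op *next;     /* next op in execution order (the "thread") */
} toy_op;

/* Follows the execution thread from start, writing the names into out. */
static void toy_exec_order(const toy_op *start, char *out, size_t outlen)
{
    assert(outlen > 0);
    out[0] = '\0';
    for (const toy_op *o = start; o; o = o->next) {
        size_t used = strlen(out);
        snprintf(out + used, outlen - used, "%s%s", used ? " " : "", o->name);
    }
}

/* Builds the simplified tree for $a = $b + $c in ops[0..4] and
 * returns the op that runs first. */
static toy_op *toy_build_example(toy_op ops[5])
{
    toy_op *b = &ops[0], *c = &ops[1], *add = &ops[2];
    toy_op *a = &ops[3], *assign = &ops[4];

    /*                 name         first  sibling  next   */
    *b      = (toy_op){ "$b",        NULL,  c,       c      };
    *c      = (toy_op){ "$c",        NULL,  NULL,    add    };
    *add    = (toy_op){ "+",         b,     a,       a      };
    *a      = (toy_op){ "$a",        NULL,  NULL,    assign };
    *assign = (toy_op){ "assign-to", add,   NULL,    NULL   };
    return b;
}
```

Walking C<first> and C<sibling> from C<assign-to> reproduces the tree drawn above, while walking C<next> from the returned op visits the nodes in the order they would run.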
1495 | =head2 Examining the tree |
---|
1496 | |
---|
1497 | If you have your perl compiled for debugging (usually done with C<-D |
---|
1498 | optimize=-g> on C<Configure> command line), you may examine the |
---|
1499 | compiled tree by specifying C<-Dx> on the Perl command line. The |
---|
1500 | output takes several lines per node, and for C<$b+$c> it looks like |
---|
1501 | this: |
---|
1502 | |
---|
1503 | 5 TYPE = add ===> 6 |
---|
1504 | TARG = 1 |
---|
1505 | FLAGS = (SCALAR,KIDS) |
---|
1506 | { |
---|
1507 | TYPE = null ===> (4) |
---|
1508 | (was rv2sv) |
---|
1509 | FLAGS = (SCALAR,KIDS) |
---|
1510 | { |
---|
1511 | 3 TYPE = gvsv ===> 4 |
---|
1512 | FLAGS = (SCALAR) |
---|
1513 | GV = main::b |
---|
1514 | } |
---|
1515 | } |
---|
1516 | { |
---|
1517 | TYPE = null ===> (5) |
---|
1518 | (was rv2sv) |
---|
1519 | FLAGS = (SCALAR,KIDS) |
---|
1520 | { |
---|
1521 | 4 TYPE = gvsv ===> 5 |
---|
1522 | FLAGS = (SCALAR) |
---|
1523 | GV = main::c |
---|
1524 | } |
---|
1525 | } |
---|
1526 | |
---|
1527 | This tree has 5 nodes (one per C<TYPE> specifier), but only 3 of them are |
---|
1528 | not optimized away (one per number in the left column). The immediate |
---|
1529 | children of the given node correspond to C<{}> pairs on the same level |
---|
1530 | of indentation, thus this listing corresponds to the tree: |
---|
1531 | |
---|
1532 | add |
---|
1533 | / \ |
---|
1534 | null null |
---|
1535 | | | |
---|
1536 | gvsv gvsv |
---|
1537 | |
---|
1538 | The execution order is indicated by C<===E<gt>> marks, thus it is C<3 |
---|
1539 | 4 5 6> (node C<6> is not included in the above listing), i.e., |
---|
1540 | C<gvsv gvsv add whatever>. |
---|
1541 | |
---|
1542 | Each of these nodes represents an op, a fundamental operation inside the |
---|
1543 | Perl core. The code which implements each operation can be found in the |
---|
1544 | F<pp*.c> files; the function which implements the op with type C<gvsv> |
---|
1545 | is C<pp_gvsv>, and so on. As the tree above shows, different ops have |
---|
1546 | different numbers of children: C<add> is a binary operator, as one would |
---|
1547 | expect, and so has two children. To accommodate the various different |
---|
1548 | numbers of children, there are various types of op data structure, and |
---|
1549 | they link together in different ways. |
---|
1550 | |
---|
1551 | The simplest type of op structure is C<OP>: this has no children. Unary |
---|
1552 | operators, C<UNOP>s, have one child, and this is pointed to by the |
---|
1553 | C<op_first> field. Binary operators (C<BINOP>s) have not only an |
---|
1554 | C<op_first> field but also an C<op_last> field. The most complex type of |
---|
1555 | op is a C<LISTOP>, which has any number of children. In this case, the |
---|
1556 | first child is pointed to by C<op_first> and the last child by |
---|
1557 | C<op_last>. The children in between can be found by iteratively |
---|
1558 | following the C<op_sibling> pointer from the first child to the last. |
---|
1559 | |
---|
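The first/last/sibling arrangement for a list-like op can be sketched in a few lines. The C<toy_kid> and C<toy_listop> types below are invented for this example and are far simpler than perl's own structures. They only show how children are reached by starting at the first child and following sibling links until the last one.

```c
#include <assert.h>
#include <stddef.h>

/* Toy model only: mirrors the first/last/sibling idea, not perl's structs. */
typedef struct toy_kid {
    int             id;
    struct toy_kid *sibling;   /* next child, or NULL after the last one */
} toy_kid;

typedef struct {
    toy_kid *first;            /* first child */
    toy_kid *last;             /* last child */
} toy_listop;

/* Appends a child, keeping first and last up to date. */
static void toy_append_kid(toy_listop *op, toy_kid *kid)
{
    assert(kid != NULL);
    kid->sibling = NULL;
    if (op->last)
        op->last->sibling = kid;
    else
        op->first = kid;
    op->last = kid;
}

/* Counts children by following sibling links from first through last. */
static size_t toy_count_kids(const toy_listop *op)
{
    size_t n = 0;
    for (const toy_kid *k = op->first; k; k = k->sibling) {
        n++;
        if (k == op->last)
            break;
    }
    return n;
}
```

A node with one child has C<first> and C<last> pointing at the same child, and a node with two children has them pointing at different children.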
1560 | There are also two other op types: a C<PMOP> holds a regular expression, |
---|
1561 | and has no children, and a C<LOOP> may or may not have children. If the |
---|
1562 | C<op_children> field is non-zero, it behaves like a C<LISTOP>. To |
---|
1563 | complicate matters, if a C<UNOP> is actually a C<null> op after |
---|
1564 | optimization (see L</Compile pass 2: context propagation>) it will still |
---|
1565 | have children in accordance with its former type. |
---|
1566 | |
---|
1567 | =head2 Compile pass 1: check routines |
---|
1568 | |
---|
1569 | The tree is created by the compiler while I<yacc> code feeds it |
---|
1570 | the constructions it recognizes. Since I<yacc> works bottom-up, so does |
---|
1571 | the first pass of perl compilation. |
---|
1572 | |
---|
1573 | What makes this pass interesting for perl developers is that some |
---|
1574 | optimization may be performed on this pass. This is optimization by |
---|
1575 | so-called "check routines". The correspondence between node names |
---|
1576 | and corresponding check routines is described in F<opcode.pl> (do not |
---|
1577 | forget to run C<make regen_headers> if you modify this file). |
---|
1578 | |
---|
1579 | A check routine is called when the node is fully constructed except |
---|
1580 | for the execution-order thread. Since at this time there are no |
---|
1581 | back-links to the currently constructed node, one can do almost any |
---|
1582 | operation to the top-level node, including freeing it and/or creating |
---|
1583 | new nodes above/below it. |
---|
1584 | |
---|
1585 | The check routine returns the node which should be inserted into the |
---|
1586 | tree (if the top-level node was not modified, the check routine returns |
---|
1587 | its argument). |
---|
1588 | |
---|
1589 | By convention, check routines have names C<ck_*>. They are usually |
---|
1590 | called from C<new*OP> subroutines (or C<convert>) (which in turn are |
---|
1591 | called from F<perly.y>). |
---|
1592 | |
---|
1593 | =head2 Compile pass 1a: constant folding |
---|
1594 | |
---|
1595 | Immediately after the check routine is called the returned node is |
---|
1596 | checked for being compile-time executable. If it is (the value is |
---|
1597 | judged to be constant) it is immediately executed, and a I<constant> |
---|
1598 | node with the "return value" of the corresponding subtree is |
---|
1599 | substituted instead. The subtree is deleted. |
---|
1600 | |
---|
1601 | If constant folding was not performed, the execution-order thread is |
---|
1602 | created. |
---|
1603 | |
---|
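Constant folding as described here can be illustrated on a two-kind expression tree. The sketch below is a toy under stated assumptions: C<toy_node>, C<toy_new> and C<toy_fold> are invented names, only addition is handled, and the "execution" is a plain C addition rather than running an op.

```c
#include <assert.h>
#include <stdlib.h>

/* Toy model only: a two-kind expression tree, not perl's op structures. */
typedef enum { TOY_CONST, TOY_ADD } toy_kind;

typedef struct toy_node {
    toy_kind         kind;
    long             value;          /* meaningful for TOY_CONST only */
    struct toy_node *left, *right;   /* meaningful for TOY_ADD only */
} toy_node;

static toy_node *toy_new(toy_kind kind, long value, toy_node *l, toy_node *r)
{
    toy_node *n = malloc(sizeof *n);
    if (!n)
        return NULL;
    n->kind = kind;
    n->value = value;
    n->left = l;
    n->right = r;
    return n;
}

/* Called on a freshly built node whose children were already folded
 * (bottom-up). If both operands are constants, the subtree is evaluated
 * now, deleted, and replaced by a single constant node. Otherwise the
 * node is returned unchanged. */
static toy_node *toy_fold(toy_node *n)
{
    assert(n != NULL);
    if (n->kind == TOY_ADD &&
        n->left  && n->left->kind  == TOY_CONST &&
        n->right && n->right->kind == TOY_CONST) {
        toy_node *c = toy_new(TOY_CONST,
                              n->left->value + n->right->value, NULL, NULL);
        if (!c)
            return n;                /* out of memory: keep the unfolded node */
        free(n->left);
        free(n->right);
        free(n);
        return c;
    }
    return n;
}
```

Because folding happens as each node is built, an expression such as C<2 + 3 + 4> collapses in two steps: the inner addition becomes the constant 5, and the outer addition then sees two constants and becomes 9.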
1604 | =head2 Compile pass 2: context propagation |
---|
1605 | |
---|
1606 | When a context for a part of the compile tree is known, it is propagated |
---|
1607 | down through the tree. At this time the context can have 5 values |
---|
1608 | (instead of 2 for runtime context): void, boolean, scalar, list, and |
---|
1609 | lvalue. In contrast with pass 1, this pass is processed from top |
---|
1610 | to bottom: a node's context determines the context for its children. |
---|
1611 | |
---|
1612 | Additional context-dependent optimizations are performed at this time. |
---|
1613 | Since at this moment the compile tree contains back-references (via |
---|
1614 | "thread" pointers), nodes cannot be free()d now. To allow |
---|
1615 | optimized-away nodes at this stage, such nodes are null()ified instead |
---|
1616 | of being free()d (i.e. their type is changed to OP_NULL). |
---|
1617 | |
---|
1618 | =head2 Compile pass 3: peephole optimization |
---|
1619 | |
---|
1620 | After the compile tree for a subroutine (or for an C<eval> or a file) |
---|
1621 | is created, an additional pass over the code is performed. This pass |
---|
1622 | is neither top-down nor bottom-up, but in the execution order (with |
---|
1623 | additional complications for conditionals). These optimizations are |
---|
1624 | done in the subroutine peep(). Optimizations performed at this stage |
---|
1625 | are subject to the same restrictions as in pass 2. |
---|
1626 | |
---|
1627 | =head1 Examining internal data structures with the C<dump> functions |
---|
1628 | |
---|
1629 | To aid debugging, the source file F<dump.c> contains a number of |
---|
1630 | functions which produce formatted output of internal data structures. |
---|
1631 | |
---|
1632 | The most commonly used of these functions is C<Perl_sv_dump>; it's used |
---|
1633 | for dumping SVs, AVs, HVs, and CVs. The C<Devel::Peek> module calls |
---|
1634 | C<sv_dump> to produce debugging output from Perl-space, so users of that |
---|
1635 | module should already be familiar with its format. |
---|
1636 | |
---|
1637 | C<Perl_op_dump> can be used to dump an C<OP> structure or any of its |
---|
1638 | derivatives, and produces output similar to C<perl -Dx>; in fact, |
---|
1639 | C<Perl_dump_eval> will dump the main root of the code being evaluated, |
---|
1640 | exactly like C<-Dx>. |
---|
1641 | |
---|
1642 | Other useful functions are C<Perl_dump_sub>, which turns a C<GV> into an |
---|
1643 | op tree, C<Perl_dump_packsubs> which calls C<Perl_dump_sub> on all the |
---|
1644 | subroutines in a package like so: (Thankfully, these are all xsubs, so |
---|
1645 | there is no op tree) |
---|
1646 | |
---|
1647 | (gdb) print Perl_dump_packsubs(PL_defstash) |
---|
1648 | |
---|
1649 | SUB attributes::bootstrap = (xsub 0x811fedc 0) |
---|
1650 | |
---|
1651 | SUB UNIVERSAL::can = (xsub 0x811f50c 0) |
---|
1652 | |
---|
1653 | SUB UNIVERSAL::isa = (xsub 0x811f304 0) |
---|
1654 | |
---|
1655 | SUB UNIVERSAL::VERSION = (xsub 0x811f7ac 0) |
---|
1656 | |
---|
1657 | SUB DynaLoader::boot_DynaLoader = (xsub 0x805b188 0) |
---|
1658 | |
---|
1659 | and C<Perl_dump_all>, which dumps all the subroutines in the stash and |
---|
1660 | the op tree of the main root. |
---|
1661 | |
---|
1662 | =head1 How multiple interpreters and concurrency are supported |
---|
1663 | |
---|
1664 | =head2 Background and PERL_IMPLICIT_CONTEXT |
---|
1665 | |
---|
1666 | The Perl interpreter can be regarded as a closed box: it has an API |
---|
1667 | for feeding it code or otherwise making it do things, but it also has |
---|
1668 | functions for its own use. This smells a lot like an object, and |
---|
1669 | there are ways for you to build Perl so that you can have multiple |
---|
1670 | interpreters, with one interpreter represented either as a C++ object, |
---|
1671 | a C structure, or inside a thread. The thread, the C structure, or |
---|
1672 | the C++ object will contain all the context, the state of that |
---|
1673 | interpreter. |
---|
1674 | |
---|
1675 | Three macros control the major Perl build flavors: MULTIPLICITY, |
---|
1676 | USE_THREADS and PERL_OBJECT. The MULTIPLICITY build has a C structure |
---|
1677 | that packages all the interpreter state, there is a similar thread-specific |
---|
1678 | data structure under USE_THREADS, and the (now deprecated) PERL_OBJECT |
---|
1679 | build has a C++ class to maintain interpreter state. In all three cases, |
---|
1680 | PERL_IMPLICIT_CONTEXT is also normally defined, and enables the |
---|
1681 | support for passing in a "hidden" first argument that represents all three |
---|
1682 | data structures. |
---|
1683 | |
---|
1684 | All this obviously requires a way for the Perl internal functions to be |
---|
1685 | C++ methods, subroutines taking some kind of structure as the first |
---|
1686 | argument, or subroutines taking nothing as the first argument. To |
---|
1687 | enable these three very different ways of building the interpreter, |
---|
1688 | the Perl source (as it does in so many other situations) makes heavy |
---|
1689 | use of macros and subroutine naming conventions. |
---|
1690 | |
---|
1691 | First problem: deciding which functions will be public API functions and |
---|
1692 | which will be private. All functions whose names begin with C<S_> are private |
---|
1693 | (think "S" for "secret" or "static"). All other functions begin with |
---|
1694 | "Perl_", but just because a function begins with "Perl_" does not mean it is |
---|
1695 | part of the API. (See L</Internal Functions>.) The easiest way to be B<sure> a |
---|
1696 | function is part of the API is to find its entry in L<perlapi>. |
---|
1697 | If it exists in L<perlapi>, it's part of the API. If it doesn't, and you |
---|
1698 | think it should be (i.e., you need it for your extension), send mail via |
---|
1699 | L<perlbug> explaining why you think it should be. |
---|
1700 | |
---|
1701 | Second problem: there must be a syntax so that the same subroutine |
---|
1702 | declarations and calls can pass a structure as their first argument, |
---|
1703 | or pass nothing. To solve this, the subroutines are named and |
---|
1704 | declared in a particular way. Here's a typical start of a static |
---|
1705 | function used within the Perl guts: |
---|
1706 | |
---|
1707 | STATIC void |
---|
1708 | S_incline(pTHX_ char *s) |
---|
1709 | |
---|
1710 | STATIC becomes "static" in C, and is #define'd to nothing in C++. |
---|
1711 | |
---|
1712 | A public function (i.e. part of the internal API, but not necessarily |
---|
1713 | sanctioned for use in extensions) begins like this: |
---|
1714 | |
---|
1715 | void |
---|
1716 | Perl_sv_setsv(pTHX_ SV* dsv, SV* ssv) |
---|
1717 | |
---|
1718 | C<pTHX_> is one of a number of macros (in perl.h) that hide the |
---|
1719 | details of the interpreter's context. THX stands for "thread", "this", |
---|
1720 | or "thingy", as the case may be. (And no, George Lucas is not involved. :-) |
---|
1721 | The first character could be 'p' for a B<p>rototype, 'a' for B<a>rgument, |
---|
1722 | or 'd' for B<d>eclaration, so we have C<pTHX>, C<aTHX> and C<dTHX>, and |
---|
1723 | their variants. |
---|
1724 | |
---|
1725 | When Perl is built without options that set PERL_IMPLICIT_CONTEXT, there is no |
---|
1726 | first argument containing the interpreter's context. The trailing underscore |
---|
1727 | in the pTHX_ macro indicates that the macro expansion needs a comma |
---|
1728 | after the context argument because other arguments follow it. If |
---|
1729 | PERL_IMPLICIT_CONTEXT is not defined, pTHX_ will be ignored, and the |
---|
1730 | subroutine is not prototyped to take the extra argument. The form of the |
---|
1731 | macro without the trailing underscore is used when there are no additional |
---|
1732 | explicit arguments. |
---|
1733 | |
---|
1734 | When a core function calls another, it must pass the context. This |
---|
1735 | is normally hidden via macros. Consider C<sv_setsv>. It expands into |
---|
1736 | something like this: |
---|
1737 | |
---|
1738 | #ifdef PERL_IMPLICIT_CONTEXT |
---|
1739 | #define sv_setsv(a,b) Perl_sv_setsv(aTHX_ a, b) |
---|
1740 | /* can't do this for vararg functions, see below */ |
---|
1741 | #else |
---|
1742 | #define sv_setsv Perl_sv_setsv |
---|
1743 | #endif |
---|
1744 | |
---|
1745 | This works well, and means that XS authors can gleefully write: |
---|
1746 | |
---|
1747 | sv_setsv(foo, bar); |
---|
1748 | |
---|
1749 | and still have it work under all the modes Perl could have been |
---|
1750 | compiled with. |
---|
1751 | |
---|
1752 | Under PERL_OBJECT in the core, that will translate to either: |
---|
1753 | |
---|
1754 | CPerlObj::Perl_sv_setsv(foo,bar); # in CPerlObj functions, |
---|
1755 | # C++ takes care of 'this' |
---|
1756 | or |
---|
1757 | |
---|
1758 | pPerl->Perl_sv_setsv(foo,bar); # in truly static functions, |
---|
1759 | # see objXSUB.h |
---|
1760 | |
---|
1761 | Under PERL_OBJECT in extensions (aka PERL_CAPI), or under |
---|
1762 | MULTIPLICITY/USE_THREADS with PERL_IMPLICIT_CONTEXT in both core |
---|
1763 | and extensions, it will become: |
---|
1764 | |
---|
1765 | Perl_sv_setsv(aTHX_ foo, bar); # the canonical Perl "API" |
---|
1766 | # for all build flavors |
---|
1767 | |
---|
1768 | This doesn't work so cleanly for varargs functions, though, as macros |
---|
1769 | imply that the number of arguments is known in advance. Instead we |
---|
1770 | either need to spell them out fully, passing C<aTHX_> as the first |
---|
1771 | argument (the Perl core tends to do this with functions like |
---|
1772 | Perl_warner), or use a context-free version. |
---|
1773 | |
---|
1774 | The context-free version of Perl_warner is called |
---|
1775 | Perl_warner_nocontext, and does not take the extra argument. Instead |
---|
1776 | it does dTHX; to get the context from thread-local storage. We |
---|
1777 | C<#define warner Perl_warner_nocontext> so that extensions get source |
---|
1778 | compatibility at the expense of performance. (Passing an arg is |
---|
1779 | cheaper than grabbing it from thread-local storage.) |
---|
1780 | |
---|
1781 | You can ignore [pad]THX[xo] when browsing the Perl headers/sources. |
---|
1782 | Those are strictly for use within the core. Extensions and embedders |
---|
1783 | need only be aware of [pad]THX. |
---|
1784 | |
---|
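The prototype-form and argument-form macros described in this section can be imitated in a self-contained file. Everything prefixed C<TOY_> or C<toy_> below is invented for this sketch, including the fixed context variable name C<my_toy>. It is not how F<perl.h> defines its macros, only a minimal model of the trailing-underscore convention and of a call macro that hides the context.

```c
#include <assert.h>
#include <stddef.h>

/* Toy model only: the TOY_* names are invented and are not perl's macros.
 * Compile with or without -DTOY_IMPLICIT_CONTEXT; callers do not change. */
typedef struct { int calls; } toy_interp;    /* stand-in for interpreter state */

#ifdef TOY_IMPLICIT_CONTEXT
#  define TOY_pTHX_  toy_interp *my_toy,     /* prototype form, trailing comma */
#  define TOY_aTHX_  my_toy,                 /* argument form, trailing comma */
#  define TOY_CTX    my_toy
#  define toy_setval(dst, v)  Toy_setval(TOY_aTHX_ dst, v)
#else
static toy_interp toy_global;                /* the single global interpreter */
#  define TOY_pTHX_                          /* expands to nothing */
#  define TOY_aTHX_                          /* expands to nothing */
#  define TOY_CTX    (&toy_global)
#  define toy_setval  Toy_setval
#endif

/* "Public" function: takes the context first when one is passed around. */
static void Toy_setval(TOY_pTHX_ int *dst, int v)
{
    assert(dst != NULL);
    TOY_CTX->calls++;
    *dst = v;
}

/* A caller that compiles unchanged in both builds. */
static int toy_caller(TOY_pTHX_ int v)
{
    int out = 0;
    /* Becomes Toy_setval(my_toy, &out, v) or Toy_setval(&out, v). */
    toy_setval(&out, v);
    return out;
}
```

The short name C<toy_setval> reads the same in both builds. Only the macro definitions know whether a context pointer is threaded through, which is why the comma has to live inside the macro and not in the source that uses it.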
1785 | =head2 So what happened to dTHR? |
---|
1786 | |
---|
1787 | C<dTHR> was introduced in perl 5.005 to support the older thread model. |
---|
1788 | The older thread model now uses the C<THX> mechanism to pass context |
---|
1789 | pointers around, so C<dTHR> is not useful any more. Perl 5.6.0 and |
---|
1790 | later still have it for backward source compatibility, but it is defined |
---|
1791 | to be a no-op. |
---|
1792 | |
---|
1793 | =head2 How do I use all this in extensions? |
---|
1794 | |
---|
1795 | When Perl is built with PERL_IMPLICIT_CONTEXT, extensions that call |
---|
1796 | any functions in the Perl API will need to pass the initial context |
---|
1797 | argument somehow. The kicker is that you will need to write it in |
---|
1798 | such a way that the extension still compiles when Perl hasn't been |
---|
1799 | built with PERL_IMPLICIT_CONTEXT enabled. |
---|
1800 | |
---|
1801 | There are three ways to do this. First, the easy but inefficient way, |
---|
1802 | which is also the default, in order to maintain source compatibility |
---|
1803 | with extensions: whenever XSUB.h is #included, it redefines the aTHX |
---|
1804 | and aTHX_ macros to call a function that will return the context. |
---|
1805 | Thus, something like: |
---|
1806 | |
---|
1807 | sv_setsv(asv, bsv); |
---|
1808 | |
---|
1809 | in your extension will translate to this when PERL_IMPLICIT_CONTEXT is |
---|
1810 | in effect: |
---|
1811 | |
---|
1812 | Perl_sv_setsv(Perl_get_context(), asv, bsv); |
---|
1813 | |
---|
1814 | or to this otherwise: |
---|
1815 | |
---|
1816 | Perl_sv_setsv(asv, bsv); |
---|
1817 | |
---|
1818 | You have to do nothing new in your extension to get this; since |
---|
1819 | the Perl library provides Perl_get_context(), it will all just |
---|
1820 | work. |
---|
1821 | |
---|
1822 | The second, more efficient way is to use the following template for |
---|
1823 | your Foo.xs: |
---|
1824 | |
---|
1825 | #define PERL_NO_GET_CONTEXT /* we want efficiency */ |
---|
1826 | #include "EXTERN.h" |
---|
1827 | #include "perl.h" |
---|
1828 | #include "XSUB.h" |
---|
1829 | |
---|
1830 | static SV *my_private_function(int arg1, int arg2); |
---|
1831 | |
---|
1832 | static SV * |
---|
1833 | my_private_function(int arg1, int arg2) |
---|
1834 | { |
---|
1835 | dTHX; /* fetch context */ |
---|
1836 | ... call many Perl API functions ... |
---|
1837 | } |
---|
1838 | |
---|
1839 | [... etc ...] |
---|
1840 | |
---|
1841 | MODULE = Foo PACKAGE = Foo |
---|
1842 | |
---|
1843 | /* typical XSUB */ |
---|
1844 | |
---|
1845 | void |
---|
1846 | my_xsub(arg) |
---|
1847 | int arg |
---|
1848 | CODE: |
---|
1849 | my_private_function(arg, 10); |
---|
1850 | |
---|
1851 | Note that the only two changes from the normal way of writing an |
---|
1852 | extension are the addition of a C<#define PERL_NO_GET_CONTEXT> before |
---|
1853 | including the Perl headers, followed by a C<dTHX;> declaration at |
---|
1854 | the start of every function that will call the Perl API. (You'll |
---|
1855 | know which functions need this, because the C compiler will complain |
---|
1856 | that there's an undeclared identifier in those functions.) No changes |
---|
1857 | are needed for the XSUBs themselves, because the XS() macro is |
---|
1858 | correctly defined to pass in the implicit context if needed. |
---|
1859 | |
---|
1860 | The third, even more efficient way is to ape how it is done within |
---|
1861 | the Perl guts: |
---|
1862 | |
---|
1863 | |
---|
1864 | #define PERL_NO_GET_CONTEXT /* we want efficiency */ |
---|
1865 | #include "EXTERN.h" |
---|
1866 | #include "perl.h" |
---|
1867 | #include "XSUB.h" |
---|
1868 | |
---|
1869 | /* pTHX_ only needed for functions that call Perl API */ |
---|
1870 | static SV *my_private_function(pTHX_ int arg1, int arg2); |
---|
1871 | |
---|
1872 | static SV * |
---|
1873 | my_private_function(pTHX_ int arg1, int arg2) |
---|
1874 | { |
---|
1875 | /* dTHX; not needed here, because THX is an argument */ |
---|
1876 | ... call Perl API functions ... |
---|
1877 | } |
---|
1878 | |
---|
1879 | [... etc ...] |
---|
1880 | |
---|
1881 | MODULE = Foo PACKAGE = Foo |
---|
1882 | |
---|
1883 | /* typical XSUB */ |
---|
1884 | |
---|
1885 | void |
---|
1886 | my_xsub(arg) |
---|
1887 | int arg |
---|
1888 | CODE: |
---|
1889 | my_private_function(aTHX_ arg, 10); |
---|
1890 | |
---|
1891 | This implementation never has to fetch the context using a function |
---|
1892 | call, since it is always passed as an extra argument. Depending on |
---|
1893 | your needs for simplicity or efficiency, you may mix the previous |
---|
1894 | two approaches freely. |
---|
1895 | |
---|
1896 | Never add a comma after C<pTHX> yourself--always use the form of the |
---|
1897 | macro with the underscore for functions that take explicit arguments, |
---|
1898 | or the form without the underscore for functions with no explicit arguments. |
---|
1899 | |
---|
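To see why the comma must come from the macro, here is a minimal stand-alone sketch of the same pattern. It does not include the Perl headers: C<toy_interp> and the C<TOY_> macros are hypothetical stand-ins for the interpreter structure and for C<pTHX>, C<pTHX_>, C<aTHX> and C<aTHX_>, and only mimic their shape.

```c
#include <assert.h>

/* Toy stand-in for the interpreter structure; NOT Perl's own type. */
typedef struct toy_interp {
    int call_count;   /* some per-interpreter state */
} toy_interp;

/* Hypothetical macros mirroring the shape of pTHX/pTHX_/aTHX/aTHX_.
 * The underscore forms carry the comma themselves. */
#define TOY_pTHX   toy_interp *my_toy
#define TOY_pTHX_  TOY_pTHX,
#define TOY_aTHX   my_toy
#define TOY_aTHX_  TOY_aTHX,

/* No explicit arguments: use the form without the underscore. */
static int toy_count(TOY_pTHX)
{
    assert(my_toy != 0);
    return my_toy->call_count;
}

/* Explicit arguments follow: use the form with the underscore. */
static int toy_add(TOY_pTHX_ int a, int b)
{
    my_toy->call_count++;
    return a + b;
}

/* A caller that already holds the context passes it straight on. */
static int toy_caller(TOY_pTHX_ int x)
{
    int sum = toy_add(TOY_aTHX_ x, 10);
    return sum + toy_count(TOY_aTHX);
}
```

If all four macros were defined as empty, both argument lists above would still be valid C, whereas a hand-written comma after the macro would leave a stray comma at the start of the list.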
1900 | =head2 Should I do anything special if I call perl from multiple threads? |
---|
1901 | |
---|
1902 | If you create interpreters in one thread and then proceed to call them in |
---|
1903 | another, you need to make sure perl's own Thread Local Storage (TLS) slot is |
---|
1904 | initialized correctly in each of those threads. |
---|
1905 | |
---|
1906 | The C<perl_alloc> and C<perl_clone> API functions will automatically set |
---|
1907 | the TLS slot to the interpreter they created, so that there is no need to do |
---|
1908 | anything special if the interpreter is always accessed in the same thread that |
---|
1909 | created it, and that thread did not create or call any other interpreters |
---|
1910 | afterwards. If that is not the case, you have to set the TLS slot of the |
---|
1911 | thread before calling any functions in the Perl API on that particular |
---|
1912 | interpreter. This is done by calling the C<PERL_SET_CONTEXT> macro in that |
---|
1913 | thread as the first thing you do: |
---|
1914 | |
---|
1915 | /* do this before doing anything else with some_perl */ |
---|
1916 | PERL_SET_CONTEXT(some_perl); |
---|
1917 | |
---|
1918 | ... other Perl API calls on some_perl go here ... |
---|
1919 | |
---|
1920 | =head2 Future Plans and PERL_IMPLICIT_SYS |
---|
1921 | |
---|
1922 | Just as PERL_IMPLICIT_CONTEXT provides a way to bundle up everything |
---|
1923 | that the interpreter knows about itself and pass it around, so too are |
---|
1924 | there plans to allow the interpreter to bundle up everything it knows |
---|
1925 | about the environment it's running on. This is enabled with the |
---|
1926 | PERL_IMPLICIT_SYS macro. Currently it only works with PERL_OBJECT |
---|
1927 | and USE_THREADS on Windows (see inside iperlsys.h). |
---|
1928 | |
---|
1929 | This makes it possible to provide an extra pointer (called the "host" |
---|
1930 | environment) for all the system calls, so that the system-level code |
---|
1931 | can maintain its own state, broken down into seven C structures. |
---|
1932 | For the default perl executable these are thin wrappers around the |
---|
1933 | usual system calls (see win32/perllib.c), but for a more ambitious |
---|
1934 | host (like the one that would do fork() emulation) all the extra work |
---|
1935 | needed to pretend that different interpreters are actually different |
---|
1936 | "processes" would be done here. |
---|
1937 | |
---|
1938 | The Perl engine/interpreter and the host are orthogonal entities. |
---|
1939 | There could be one or more interpreters in a process, and one or |
---|
1940 | more "hosts", with free association between them. |
---|
1941 | |
---|
1942 | =head1 Internal Functions |
---|
1943 | |
---|
1944 | All of Perl's internal functions which will be exposed to the outside |
---|
1945 | world are prefixed by C<Perl_> so that they will not conflict with XS |
---|
1946 | functions or functions used in a program in which Perl is embedded. |
---|
1947 | Similarly, all global variables begin with C<PL_>. (By convention, |
---|
1948 | static functions start with C<S_>.) |
---|
1949 | |
---|
1950 | Inside the Perl core, you can get at the functions either with or |
---|
1951 | without the C<Perl_> prefix, thanks to a bunch of defines that live in |
---|
1952 | F<embed.h>. This header file is generated automatically from |
---|
1953 | F<embed.pl>. F<embed.pl> also creates the prototyping header files for |
---|
1954 | the internal functions, generates the documentation and a lot of other |
---|
1955 | bits and pieces. It's important that when you add a new function to the |
---|
1956 | core or change an existing one, you change the data in the table at the |
---|
1957 | end of F<embed.pl> as well. Here's a sample entry from that table: |
---|
1958 | |
---|
1959 | Apd |SV** |av_fetch |AV* ar|I32 key|I32 lval |
---|
1960 | |
---|
1961 | The second column is the return type, the third column the name. Columns |
---|
1962 | after that are the arguments. The first column is a set of flags: |
---|
1963 | |
---|
1964 | =over 3 |
---|
1965 | |
---|
1966 | =item A |
---|
1967 | |
---|
1968 | This function is a part of the public API. |
---|
1969 | |
---|
1970 | =item p |
---|
1971 | |
---|
1972 | This function has a C<Perl_> prefix; i.e., it is defined as C<Perl_av_fetch>. |
---|
1973 | |
---|
1974 | =item d |
---|
1975 | |
---|
1976 | This function has documentation using the C<apidoc> feature which we'll |
---|
1977 | look at in a second. |
---|
1978 | |
---|
1979 | =back |
---|
1980 | |
---|
1981 | Other available flags are: |
---|
1982 | |
---|
1983 | =over 3 |
---|
1984 | |
---|
1985 | =item s |
---|
1986 | |
---|
1987 | This is a static function and is defined as C<S_whatever>, and usually |
---|
1988 | called within the sources as C<whatever(...)>. |
---|
1989 | |
---|
1990 | =item n |
---|
1991 | |
---|
1992 | This does not use C<aTHX_> and C<pTHX> to pass interpreter context. (See |
---|
1993 | L<perlguts/Background and PERL_IMPLICIT_CONTEXT>.) |
---|
1994 | |
---|
1995 | =item r |
---|
1996 | |
---|
1997 | This function never returns; C<croak>, C<exit> and friends. |
---|
1998 | |
---|
1999 | =item f |
---|
2000 | |
---|
2001 | This function takes a variable number of arguments, C<printf> style. |
---|
2002 | The argument list should end with C<...>, like this: |
---|
2003 | |
---|
2004 | Afprd |void |croak |const char* pat|... |
---|
2005 | |
---|
2006 | =item M |
---|
2007 | |
---|
2008 | This function is part of the experimental development API, and may change |
---|
2009 | or disappear without notice. |
---|
2010 | |
---|
2011 | =item o |
---|
2012 | |
---|
2013 | This function should not have a compatibility macro to define, say, |
---|
2014 | C<Perl_parse> to C<parse>. It must be called as C<Perl_parse>. |
---|
2015 | |
---|
2016 | =item j |
---|
2017 | |
---|
2018 | This function is not a member of C<CPerlObj>. If you don't know |
---|
2019 | what this means, don't use it. |
---|
2020 | |
---|
2021 | =item x |
---|
2022 | |
---|
2023 | This function isn't exported out of the Perl core. |
---|
2024 | |
---|
2025 | =back |
---|
2026 | |
---|
2027 | If you edit F<embed.pl>, you will need to run C<make regen_headers> to |
---|
2028 | force a rebuild of F<embed.h> and other auto-generated files. |
---|
2029 | |
---|
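As a rough illustration of how the flags column relates to the name a function ends up with, here is a small stand-alone sketch. The helpers C<toy_symbol_name> and C<toy_is_public> are hypothetical and are not part of F<embed.pl>; they cover only the C<A>, C<p> and C<s> flags as described above.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Build the C-level symbol name implied by an entry's flags column:
 * 'p' gives a Perl_ prefix, 's' gives an S_ prefix, otherwise the
 * bare name is used.  Hypothetical helper for illustration only. */
static void toy_symbol_name(const char *flags, const char *name,
                            char *out, size_t outlen)
{
    const char *prefix = "";

    assert(flags != NULL && name != NULL && out != NULL);
    if (strchr(flags, 'p'))
        prefix = "Perl_";
    else if (strchr(flags, 's'))
        prefix = "S_";
    snprintf(out, outlen, "%s%s", prefix, name);
}

/* 'A' marks an entry as part of the public API. */
static int toy_is_public(const char *flags)
{
    return strchr(flags, 'A') != NULL;
}
```

Given the sample entry for C<av_fetch>, whose flags are C<Apd>, the sketch reports a public function named C<Perl_av_fetch>.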
2030 | =head2 Formatted Printing of IVs, UVs, and NVs |
---|
2031 | |
---|
2032 | If you are printing IVs, UVs, or NVs, then instead of the stdio(3) style |
---|
2033 | formatting codes like C<%d>, C<%ld>, C<%f>, you should use the |
---|
2034 | following macros for portability: |
---|
2035 | |
---|
2036 | IVdf IV in decimal |
---|
2037 | UVuf UV in decimal |
---|
2038 | UVof UV in octal |
---|
2039 | UVxf UV in hexadecimal |
---|
2040 | NVef NV %e-like |
---|
2041 | NVff NV %f-like |
---|
2042 | NVgf NV %g-like |
---|
2043 | |
---|
2044 | These will take care of 64-bit integers and long doubles. |
---|
2045 | For example: |
---|
2046 | |
---|
2047 | printf("IV is %" IVdf "\n", iv); |
---|
2048 | |
---|
2049 | The IVdf will expand to whatever is the correct format for the IVs. |
---|
2050 | |
---|
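The mechanism is the same one that the C99 C<PRId64> family from F<inttypes.h> relies on: the macro expands to a string literal, and the compiler joins adjacent literals into a single format string. The sketch below shows it without the Perl headers, with C<int64_t> and C<PRId64> standing in for C<IV> and C<IVdf>. The helper name C<toy_format_i64> is hypothetical.

```c
#include <assert.h>
#include <inttypes.h>
#include <stdio.h>

/* Format a 64-bit value portably.  PRId64 plays the role that IVdf
 * plays for an IV: it expands to a string literal holding the right
 * length modifier and conversion, which is joined to the literals on
 * either side of it. */
static int toy_format_i64(char *out, size_t outlen, int64_t value)
{
    assert(out != NULL && outlen > 0);
    return snprintf(out, outlen, "IV is %" PRId64 "\n", value);
}
```

Because the correct conversion is chosen when the program is compiled, the same source line works whether the underlying type is C<long> or C<long long>.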
2051 | If you are printing addresses of pointers, use UVxf combined |
---|
2052 | with PTR2UV(); do not use %lx or %p. |
---|
2053 | |
---|
2054 | =head2 Pointer-To-Integer and Integer-To-Pointer |
---|
2055 | |
---|
2056 | Because pointer size does not necessarily equal integer size, |
---|
2057 | use the following macros to do it right. |
---|
2058 | |
---|
2059 | PTR2UV(pointer) |
---|
2060 | PTR2IV(pointer) |
---|
2061 | PTR2NV(pointer) |
---|
2062 | INT2PTR(pointertotype, integer) |
---|
2063 | |
---|
2064 | For example: |
---|
2065 | |
---|
2066 | IV iv = ...; |
---|
2067 | SV *sv = INT2PTR(SV*, iv); |
---|
2068 | |
---|
2069 | and |
---|
2070 | |
---|
2071 | AV *av = ...; |
---|
2072 | UV uv = PTR2UV(av); |
---|
2073 | |
---|
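The idea behind these macros can be sketched in plain C with C<uintptr_t>, the standard unsigned integer type wide enough to hold a pointer. The C<TOY_> macros below are hypothetical stand-ins, not Perl's definitions, which are built on the integer types chosen when perl is configured.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-ins for PTR2UV and INT2PTR.  Going through an
 * integer type of pointer width avoids truncating the pointer. */
#define TOY_PTR2UV(p)        ((uintptr_t)(p))
#define TOY_INT2PTR(type, i) ((type)(uintptr_t)(i))

/* Store a pointer in an integer slot and recover it again. */
static int *toy_round_trip(int *p)
{
    uintptr_t as_int;

    assert(sizeof(uintptr_t) >= sizeof(int *));
    as_int = TOY_PTR2UV(p);
    return TOY_INT2PTR(int *, as_int);
}
```

A direct cast from a pointer to C<int> or C<long> may lose bits on platforms where pointers are wider than those types, which is the case these macros guard against.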
2074 | =head2 Source Documentation |
---|
2075 | |
---|
2076 | There's an effort going on to document the internal functions and |
---|
2077 | automatically produce reference manuals from them - L<perlapi> is one |
---|
2078 | such manual which details all the functions which are available to XS |
---|
2079 | writers. L<perlintern> is the autogenerated manual for the functions |
---|
2080 | which are not part of the API and are supposedly for internal use only. |
---|
2081 | |
---|
2082 | Source documentation is created by putting POD comments into the C |
---|
2083 | source, like this: |
---|
2084 | |
---|
2085 | /* |
---|
2086 | =for apidoc sv_setiv |
---|
2087 | |
---|
2088 | Copies an integer into the given SV. Does not handle 'set' magic. See |
---|
2089 | C<sv_setiv_mg>. |
---|
2090 | |
---|
2091 | =cut |
---|
2092 | */ |
---|
2093 | |
---|
2094 | Please try and supply some documentation if you add functions to the |
---|
2095 | Perl core. |
---|
2096 | |
---|
2097 | =head1 Unicode Support |
---|
2098 | |
---|
2099 | Perl 5.6.0 introduced Unicode support. It's important for porters and XS |
---|
2100 | writers to understand this support and make sure that the code they |
---|
2101 | write does not corrupt Unicode data. |
---|
2102 | |
---|
2103 | =head2 What B<is> Unicode, anyway? |
---|
2104 | |
---|
2105 | In the olden, less enlightened times, we all used to use ASCII. Most of |
---|
2106 | us did, anyway. The big problem with ASCII is that it's American. Well, |
---|
2107 | no, that's not actually the problem; the problem is that it's not |
---|
2108 | particularly useful for people who don't use the Roman alphabet. What |
---|
2109 | used to happen was that particular languages would stick their own |
---|
2110 | alphabet in the upper range of the sequence, between 128 and 255. Of |
---|
2111 | course, we then ended up with plenty of variants that weren't quite |
---|
2112 | ASCII, and the whole point of it being a standard was lost. |
---|
2113 | |
---|
2114 | Worse still, if you've got a language like Chinese or |
---|
2115 | Japanese that has hundreds or thousands of characters, then you really |
---|
2116 | can't fit them into a mere 256, so they had to forget about ASCII |
---|
2117 | altogether, and build their own systems using pairs of numbers to refer |
---|
2118 | to one character. |
---|
2119 | |
---|
2120 | To fix this, some people formed Unicode, Inc. and |
---|
2121 | produced a new character set containing all the characters you can |
---|
2122 | possibly think of and more. There are several ways of representing these |
---|
2123 | characters, and the one Perl uses is called UTF8. UTF8 uses |
---|
2124 | a variable number of bytes to represent a character, instead of just |
---|
2125 | one. You can learn more about Unicode at http://www.unicode.org/ |
---|
2126 | |
---|
2127 | =head2 How can I recognise a UTF8 string? |
---|
2128 | |
---|
2129 | You can't. This is because UTF8 data is stored in bytes just like |
---|
2130 | non-UTF8 data. The Unicode character 200 (C<0xC8> for you hex types), |
---|
2131 | capital E with a grave accent, is represented by the two bytes |
---|
2132 | C<v195.136>. Unfortunately, the non-Unicode string C<chr(195).chr(136)> |
---|
2133 | has that byte sequence as well. So you can't tell just by looking - this |
---|
2134 | is what makes Unicode input an interesting problem. |
---|
2135 | |
---|
2136 | The API function C<is_utf8_string> can help; it'll tell you if a string |
---|
2137 | contains only valid UTF8 characters. However, it can't do the work for |
---|
2138 | you. On a character-by-character basis, C<is_utf8_char> will tell you |
---|
2139 | whether the current character in a string is valid UTF8. |
---|
2140 | |
---|
2141 | =head2 How does UTF8 represent Unicode characters? |
---|
2142 | |
---|
2143 | As mentioned above, UTF8 uses a variable number of bytes to store a |
---|
2144 | character. Characters with values 0...127 are stored in one byte, just |
---|
2145 | like good ol' ASCII. Character 128 is stored as C<v194.128>; this |
---|
2146 | continues up to character 191, which is C<v194.191>. Now we've run out of |
---|
2147 | bits (191 is binary C<10111111>) so we move on; 192 is C<v195.128>. And |
---|
2148 | so it goes on, moving to three bytes at character 2048. |
---|
2149 | |
---|
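The byte layout can be made concrete with a small encoder. This is a stand-alone sketch limited to characters below 65536; C<toy_encode_utf8> is a hypothetical helper, not the Perl API's C<uv_to_utf8>, and it makes no attempt to reject values that Unicode reserves.

```c
#include <assert.h>
#include <stddef.h>

/* Encode a code point below 0x10000 as UTF8 into out, which must have
 * room for at least 3 bytes.  Returns the number of bytes written. */
static size_t toy_encode_utf8(unsigned long cp, unsigned char *out)
{
    assert(out != NULL && cp < 0x10000UL);
    if (cp < 0x80) {                 /* 0..127: one byte, same as ASCII */
        out[0] = (unsigned char)cp;
        return 1;
    }
    if (cp < 0x800) {                /* 128..2047: 110xxxxx 10xxxxxx */
        out[0] = (unsigned char)(0xC0 | (cp >> 6));
        out[1] = (unsigned char)(0x80 | (cp & 0x3F));
        return 2;
    }
    /* 2048..65535: 1110xxxx 10xxxxxx 10xxxxxx */
    out[0] = (unsigned char)(0xE0 | (cp >> 12));
    out[1] = (unsigned char)(0x80 | ((cp >> 6) & 0x3F));
    out[2] = (unsigned char)(0x80 | (cp & 0x3F));
    return 3;
}
```

The second byte of a two-byte sequence carries only six bits of the character, which is why the first byte has to change once those six bits are exhausted at character 191.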
2150 | Assuming you know you're dealing with a UTF8 string, you can find out |
---|
2151 | how long the first character in it is with the C<UTF8SKIP> macro: |
---|
2152 | |
---|
2153 | char *utf = "\305\233\340\240\201"; |
---|
2154 | I32 len; |
---|
2155 | |
---|
2156 | len = UTF8SKIP(utf); /* len is 2 here */ |
---|
2157 | utf += len; |
---|
2158 | len = UTF8SKIP(utf); /* len is 3 here */ |
---|
2159 | |
---|
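The length of a character can be read off its first byte alone. The following stand-alone sketch shows the idea for sequences of up to four bytes; C<toy_utf8_skip> and C<toy_utf8_length> are hypothetical helpers and are not how C<UTF8SKIP> is implemented.

```c
#include <assert.h>

/* Length in bytes of the UTF8 sequence that starts with lead byte c.
 * Returns 0 for a continuation byte (10xxxxxx), which cannot start a
 * character.  Sequences longer than four bytes are not handled. */
static int toy_utf8_skip(unsigned char c)
{
    if (c < 0x80) return 1;            /* 0xxxxxxx */
    if (c < 0xC0) return 0;            /* 10xxxxxx: continuation */
    if (c < 0xE0) return 2;            /* 110xxxxx */
    if (c < 0xF0) return 3;            /* 1110xxxx */
    return 4;                          /* 11110xxx */
}

/* Count the characters in a well-formed, NUL-terminated UTF8 string. */
static int toy_utf8_length(const char *s)
{
    int chars = 0;

    assert(s != 0);
    while (*s) {
        int len = toy_utf8_skip((unsigned char)*s);
        assert(len > 0);               /* input is assumed well formed */
        s += len;
        chars++;
    }
    return chars;
}
```

Like the macro, this trusts its input: a truncated or malformed string would send the loop past the end of the buffer, so it is only safe on data already known to be valid.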
2160 | Another way to skip over characters in a UTF8 string is to use |
---|
2161 | C<utf8_hop>, which takes a string and a number of characters to skip |
---|
2162 | over. You're on your own about bounds checking, though, so don't use it |
---|
2163 | lightly. |
---|
2164 | |
---|
2165 | All bytes in a multi-byte UTF8 character will have the high bit set, so |
---|
2166 | you can test if you need to do something special with this character |
---|
2167 | like this: |
---|
2168 | |
---|
2169 | UV uv; |
---|
2170 | |
---|
2171 | if (*utf & 0x80) |
---|
2172 | /* Must treat this as UTF8 */ |
---|
2173 | uv = utf8_to_uv(utf); |
---|
2174 | else |
---|
2175 | /* OK to treat this character as a byte */ |
---|
2176 | uv = *utf; |
---|
2177 | |
---|
2178 | You can also see in that example that we use C<utf8_to_uv> to get the |
---|
2179 | value of the character; the inverse function C<uv_to_utf8> is available |
---|
2180 | for putting a UV into UTF8: |
---|
2181 | |
---|
2182 | if (uv >= 0x80) |
---|
2183 | /* Must treat this as UTF8 */ |
---|
2184 | utf8 = uv_to_utf8(utf8, uv); |
---|
2185 | else |
---|
2186 | /* OK to treat this character as a byte */ |
---|
2187 | *utf8++ = uv; |
---|
2188 | |
---|
2189 | You B<must> convert characters to UVs using the above functions if |
---|
2190 | you're ever in a situation where you have to match UTF8 and non-UTF8 |
---|
2191 | characters. You may not skip over UTF8 characters in this case. If you |
---|
2192 | do this, you'll lose the ability to match hi-bit non-UTF8 characters; |
---|
2193 | for instance, if your UTF8 string contains C<v195.136>, and you skip |
---|
2194 | that character, you can never match a C<chr(200)> in a non-UTF8 string. |
---|
2195 | So don't do that! |
---|
2196 | |
---|
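Matching a UTF8 string against a byte-per-character string therefore means decoding the UTF8 side one character at a time and comparing values, not bytes. Here is a stand-alone sketch under simplifying assumptions: the UTF8 input is well formed and uses at most three bytes per character. C<toy_decode_utf8> is a hypothetical stand-in for C<utf8_to_uv>.

```c
#include <assert.h>
#include <stddef.h>

/* Decode one character of up to three bytes from well-formed UTF8
 * and advance *pp past it. */
static unsigned long toy_decode_utf8(const unsigned char **pp)
{
    const unsigned char *p = *pp;
    unsigned long cp;

    if (p[0] < 0x80) {                       /* single byte */
        cp = p[0];
        *pp = p + 1;
    } else if (p[0] < 0xE0) {                /* two bytes */
        cp = ((unsigned long)(p[0] & 0x1F) << 6) | (p[1] & 0x3F);
        *pp = p + 2;
    } else {                                 /* three bytes */
        cp = ((unsigned long)(p[0] & 0x0F) << 12)
           | ((unsigned long)(p[1] & 0x3F) << 6) | (p[2] & 0x3F);
        *pp = p + 3;
    }
    return cp;
}

/* Return 1 if the UTF8 string u (ulen bytes) holds the same characters
 * as the byte-per-character string b (blen bytes), else 0. */
static int toy_same_chars(const unsigned char *u, size_t ulen,
                          const unsigned char *b, size_t blen)
{
    const unsigned char *uend = u + ulen;
    size_t i = 0;

    assert(u != NULL && b != NULL);
    while (u < uend && i < blen) {
        if (toy_decode_utf8(&u) != b[i++])
            return 0;
    }
    return u == uend && i == blen;
}
```

With this approach a two-byte UTF8 encoding of character 200 compares equal to the single byte 200 in the other string, while a byte-wise comparison of the two buffers would report a mismatch.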
2197 | =head2 How does Perl store UTF8 strings? |
---|
2198 | |
---|
2199 | Currently, Perl deals with Unicode strings and non-Unicode strings |
---|
2200 | slightly differently. If a string has been identified as being UTF-8 |
---|
2201 | encoded, Perl will set a flag in the SV, C<SVf_UTF8>. You can check and |
---|
2202 | manipulate this flag with the following macros: |
---|
2203 | |
---|
2204 | SvUTF8(sv) |
---|
2205 | SvUTF8_on(sv) |
---|
2206 | SvUTF8_off(sv) |
---|
2207 | |
---|
2208 | This flag has an important effect on Perl's treatment of the string: if |
---|
2209 | Unicode data is not properly distinguished, regular expressions, |
---|
2210 | C<length>, C<substr> and other string handling operations will have |
---|
2211 | undesirable results. |
---|
2212 | |
---|
2213 | The problem comes when you have, for instance, a string that isn't |
---|
2214 | flagged as UTF8, and contains a byte sequence that could be UTF8 - |
---|
2215 | especially when combining non-UTF8 and UTF8 strings. |
---|
2216 | |
---|
2217 | Never forget that the C<SVf_UTF8> flag is separate from the PV value; you |
---|
2218 | need to be sure you don't accidentally knock it off while you're |
---|
2219 | manipulating SVs. More specifically, you cannot expect to do this: |
---|
2220 | |
---|
2221 | SV *sv; |
---|
2222 | SV *nsv; |
---|
2223 | STRLEN len; |
---|
2224 | char *p; |
---|
2225 | |
---|
2226 | p = SvPV(sv, len); |
---|
2227 | frobnicate(p); |
---|
2228 | nsv = newSVpvn(p, len); |
---|
2229 | |
---|
2230 | The C<char*> string does not tell you the whole story, and you can't |
---|
2231 | copy or reconstruct an SV just by copying the string value. Check if the |
---|
2232 | old SV has the UTF8 flag set, and act accordingly: |
---|
2233 | |
---|
2234 | p = SvPV(sv, len); |
---|
2235 | frobnicate(p); |
---|
2236 | nsv = newSVpvn(p, len); |
---|
2237 | if (SvUTF8(sv)) |
---|
2238 | SvUTF8_on(nsv); |
---|
2239 | |
---|
2240 | In fact, your C<frobnicate> function should be made aware of whether or |
---|
2241 | not it's dealing with UTF8 data, so that it can handle the string |
---|
2242 | appropriately. |
---|
2243 | |
---|
2244 | =head2 How do I convert a string to UTF8? |
---|
2245 | |
---|
2246 | If you're mixing UTF8 and non-UTF8 strings, you might find it necessary |
---|
2247 | to upgrade one of the strings to UTF8. If you've got an SV, the easiest |
---|
2248 | way to do this is: |
---|
2249 | |
---|
2250 | sv_utf8_upgrade(sv); |
---|
2251 | |
---|
2252 | However, you must not do this, for example: |
---|
2253 | |
---|
2254 | if (!SvUTF8(left)) |
---|
2255 | sv_utf8_upgrade(left); |
---|
2256 | |
---|
2257 | If you do this in a binary operator, you will actually change one of the |
---|
2258 | strings that came into the operator, and, while it shouldn't be noticeable |
---|
2259 | by the end user, it can cause problems. |
---|
2260 | |
---|
2261 | Instead, C<bytes_to_utf8> will give you a UTF8-encoded B<copy> of its |
---|
2262 | string argument. This is useful for having the data available for |
---|
2263 | comparisons and so on, without harming the original SV. There's also |
---|
2264 | C<utf8_to_bytes> to go the other way, but naturally, this will fail if |
---|
2265 | the string contains any characters above 255 that can't be represented |
---|
2266 | in a single byte. |
---|
2267 | |
---|
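What "a copy" means here can be sketched without the Perl headers. The helper below, C<toy_bytes_to_utf8>, is hypothetical and is not the Perl API function. It assumes each input byte is one character in the range 0..255 and returns freshly allocated storage that the caller must free.

```c
#include <assert.h>
#include <stdlib.h>

/* Return a newly malloc'ed UTF8 copy of the len bytes at s, treating
 * each byte as one character, and store the new length in *newlen.
 * The input is left untouched.  Returns NULL if allocation fails. */
static unsigned char *toy_bytes_to_utf8(const unsigned char *s, size_t len,
                                        size_t *newlen)
{
    unsigned char *out, *d;
    size_t i;

    assert(s != NULL && newlen != NULL);
    out = (unsigned char *)malloc(2 * len + 1);  /* worst case: all double */
    if (out == NULL)
        return NULL;
    d = out;
    for (i = 0; i < len; i++) {
        if (s[i] < 0x80) {
            *d++ = s[i];                         /* ASCII stays as it is */
        } else {
            *d++ = (unsigned char)(0xC0 | (s[i] >> 6));
            *d++ = (unsigned char)(0x80 | (s[i] & 0x3F));
        }
    }
    *d = '\0';
    *newlen = (size_t)(d - out);
    return out;
}
```

Since the original buffer is never written to, the operand that came into a binary operator keeps its encoding, and only the temporary copy is in UTF8.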
2268 | =head2 Is there anything else I need to know? |
---|
2269 | |
---|
2270 | Not really. Just remember these things: |
---|
2271 | |
---|
2272 | =over 3 |
---|
2273 | |
---|
2274 | =item * |
---|
2275 | |
---|
2276 | There's no way to tell if a string is UTF8 or not. You can tell if an SV |
---|
2277 | is UTF8 by looking at its C<SvUTF8> flag. Don't forget to set the flag if |
---|
2278 | something should be UTF8. Treat the flag as part of the PV, even though |
---|
2279 | it's not - if you pass on the PV to somewhere, pass on the flag too. |
---|
2280 | |
---|
2281 | =item * |
---|
2282 | |
---|
2283 | If a string is UTF8, B<always> use C<utf8_to_uv> to get at the value, |
---|
2284 | unless C<!(*s & 0x80)> in which case you can use C<*s>. |
---|
2285 | |
---|
2286 | =item * |
---|
2287 | |
---|
2288 | When writing to a UTF8 string, B<always> use C<uv_to_utf8>, unless |
---|
2289 | C<< uv < 0x80 >> in which case you can use C<*s = uv>. |
---|
2290 | |
---|
2291 | =item * |
---|
2292 | |
---|
2293 | Mixing UTF8 and non-UTF8 strings is tricky. Use C<bytes_to_utf8> to get |
---|
2294 | a new string which is UTF8 encoded. There are tricks you can use to |
---|
2295 | delay deciding whether you need to use a UTF8 string until you get to a |
---|
2296 | high character - C<HALF_UPGRADE> is one of those. |
---|
2297 | |
---|
2298 | =back |
---|
2299 | |
---|
2300 | =head1 AUTHORS |
---|
2301 | |
---|
2302 | Until May 1997, this document was maintained by Jeff Okamoto |
---|
2303 | <okamoto@corp.hp.com>. It is now maintained as part of Perl itself |
---|
2304 | by the Perl 5 Porters <perl5-porters@perl.org>. |
---|
2305 | |
---|
2306 | With lots of help and suggestions from Dean Roehrich, Malcolm Beattie, |
---|
2307 | Andreas Koenig, Paul Hudson, Ilya Zakharevich, Paul Marquess, Neil |
---|
2308 | Bowers, Matthew Green, Tim Bunce, Spider Boardman, Ulrich Pfeifer, |
---|
2309 | Stephen McCamant, and Gurusamy Sarathy. |
---|
2310 | |
---|
2311 | API Listing originally by Dean Roehrich <roehrich@cray.com>. |
---|
2312 | |
---|
2313 | Modifications to autogenerate the API listing (L<perlapi>) by Benjamin |
---|
2314 | Stuhl. |
---|
2315 | |
---|
2316 | =head1 SEE ALSO |
---|
2317 | |
---|
2318 | perlapi(1), perlintern(1), perlxs(1), perlembed(1) |
---|