source: trunk/third/perl/pod/perlguts.pod @ 17035

Revision 17035, 87.4 KB checked in by zacheiss, 23 years ago
This commit was generated by cvs2svn to compensate for changes in r17034, which included commits to RCS files with non-trunk default branches.

=head1 NAME

perlguts - Introduction to the Perl API

=head1 DESCRIPTION

This document attempts to describe how to use the Perl API, as well as
containing some info on the basic workings of the Perl core. It is far
from complete and probably contains many errors. Please refer any
questions or comments to the author below.

=head1 Variables

=head2 Datatypes

Perl has three typedefs that handle Perl's three main data types:

    SV  Scalar Value
    AV  Array Value
    HV  Hash Value

Each typedef has specific routines that manipulate the various data types.

=head2 What is an "IV"?

Perl uses a special typedef IV which is a simple signed integer type that is
guaranteed to be large enough to hold a pointer (as well as an integer).
Additionally, there is the UV, which is simply an unsigned IV.

Perl also uses two special typedefs, I32 and I16, which will always be at
least 32 bits and 16 bits long, respectively. (Again, there are U32 and U16,
as well.)

=head2 Working with SVs

An SV can be created and loaded with one command.  There are four types of
values that can be loaded: an integer value (IV), a double (NV),
a string (PV), and another scalar (SV).

The six routines are:

    SV*  newSViv(IV);
    SV*  newSVnv(double);
    SV*  newSVpv(const char*, STRLEN);
    SV*  newSVpvn(const char*, STRLEN);
    SV*  newSVpvf(const char*, ...);
    SV*  newSVsv(SV*);

To change the value of an I<already-existing> SV, there are eight routines:

    void  sv_setiv(SV*, IV);
    void  sv_setuv(SV*, UV);
    void  sv_setnv(SV*, double);
    void  sv_setpv(SV*, const char*);
    void  sv_setpvn(SV*, const char*, STRLEN);
    void  sv_setpvf(SV*, const char*, ...);
    void  sv_setpvfn(SV*, const char*, STRLEN, va_list *, SV **, I32, bool *);
    void  sv_setsv(SV*, SV*);

Notice that you can choose to specify the length of the string to be
assigned by using C<sv_setpvn>, C<newSVpvn>, or C<newSVpv>, or you may
allow Perl to calculate the length by using C<sv_setpv> or by specifying
0 as the second argument to C<newSVpv>.  Be warned, though, that Perl will
determine the string's length by using C<strlen>, which depends on the
string terminating with a NUL character.
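
The difference matters as soon as the data contains a NUL byte. The
following standalone C sketch does not call the Perl API; it only imitates
the rule described above for C<newSVpv>, where a length of 0 means "use
C<strlen>". The helper C<measured_length> and the buffer C<sample> are
hypothetical names used for illustration.

    #include <string.h>

    /* Hypothetical helper imitating newSVpv()'s length rule:
       0 means "compute it with strlen()", anything else is used as is. */
    size_t measured_length(const char *s, size_t len)
    {
        return len ? len : strlen(s);
    }

    /* Five bytes of data with an embedded NUL after "ab". */
    const char sample[5] = { 'a', 'b', '\0', 'c', 'd' };

With these definitions C<measured_length(sample, 0)> yields 2, while
C<measured_length(sample, sizeof sample)> yields 5: the C<strlen> route
silently drops everything after the embedded NUL.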

The arguments of C<sv_setpvf> are processed like C<sprintf>, and the
formatted output becomes the value.

C<sv_setpvfn> is an analogue of C<vsprintf>, but it allows you to specify
either a pointer to a variable argument list or the address and length of
an array of SVs.  The last argument points to a boolean; on return, if that
boolean is true, then locale-specific information has been used to format
the string, and the string's contents are therefore untrustworthy (see
L<perlsec>).  This pointer may be NULL if that information is not
important.  Note that this function requires you to specify the length of
the format.
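
The C<sv_setpvf>/C<sv_setpvfn> pairing follows the usual C idiom in which a
variadic front end packs its arguments into a C<va_list> and hands them to
a worker. The sketch below shows only that idiom, using the standard
C<vsnprintf> on a plain C<char> buffer. C<fmt_into> and C<vfmt_into> are
hypothetical names, and unlike C<sv_setpvfn> the worker takes neither a
format length nor an array of SVs.

    #include <stdarg.h>
    #include <stdio.h>

    /* Hypothetical va_list worker, the analogue of sv_setpvfn(). */
    int vfmt_into(char *buf, size_t size, const char *fmt, va_list args)
    {
        return vsnprintf(buf, size, fmt, args);
    }

    /* Hypothetical variadic front end, the analogue of sv_setpvf():
       it only collects its arguments and forwards them. */
    int fmt_into(char *buf, size_t size, const char *fmt, ...)
    {
        va_list args;
        int n;

        va_start(args, fmt);
        n = vfmt_into(buf, size, fmt, args);
        va_end(args);
        return n;
    }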

STRLEN is an integer type (Size_t, usually defined as size_t in
config.h) guaranteed to be large enough to represent the size of
any string that perl can handle.

The C<sv_set*()> functions are not generic enough to operate on values
that have "magic".  See L<Magic Virtual Tables> later in this document.

All SVs that contain strings should be terminated with a NUL character.
If a string is not NUL-terminated, there is a risk of
core dumps and corruption from code that passes the string to C
functions or system calls that expect a NUL-terminated string.
Perl's own functions typically add a trailing NUL for this reason.
Nevertheless, you should be very careful when you pass a string stored
in an SV to a C function or system call.

To access the actual value that an SV points to, you can use the macros:

    SvIV(SV*)
    SvUV(SV*)
    SvNV(SV*)
    SvPV(SV*, STRLEN len)
    SvPV_nolen(SV*)

which will automatically coerce the actual scalar type into an IV, UV, double,
or string.

In the C<SvPV> macro, the length of the string returned is placed into the
variable C<len> (this is a macro, so you do I<not> use C<&len>).  If you do
not care what the length of the data is, use the C<SvPV_nolen> macro.
Historically the C<SvPV> macro with the global variable C<PL_na> has been
used in this case.  But that can be quite inefficient because C<PL_na> must
be accessed in thread-local storage in threaded Perl.  In any case, remember
that Perl allows arbitrary strings of data that may both contain NULs and
might not be terminated by a NUL.

Also remember that C doesn't allow you to safely say C<foo(SvPV(s, len),
len);>. It might work with your compiler, but it won't work for everyone:
C does not specify the order in which function arguments are evaluated,
so the second C<len> may be read before C<SvPV> has stored the length in it.
Break this sort of statement up into separate assignments:

        SV *s;          /* assumed to point to a valid SV */
        STRLEN len;
        char *ptr;
        ptr = SvPV(s, len);
        foo(ptr, len);  /* foo() stands for any C function */

If you want to know if the scalar value is TRUE, you can use:

    SvTRUE(SV*)

Although Perl will automatically grow strings for you, if you need to force
Perl to allocate more memory for your SV, you can use the macro

    SvGROW(SV*, STRLEN newlen)

which will determine if more memory needs to be allocated.  If so, it will
call the function C<sv_grow>.  Note that C<SvGROW> can only increase, not
decrease, the allocated memory of an SV and that it does not automatically
add a byte for a trailing NUL (perl's own string functions typically do
C<SvGROW(sv, len + 1)>).
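
The grow-only behaviour of C<SvGROW>, and the reason callers ask for
C<len + 1>, can be modelled with a plain C buffer. This is a simplified
sketch, not Perl's implementation: C<struct buf>, C<buf_grow> and
C<buf_setpvn> are hypothetical, C<cur> plays the role of C<SvCUR>, and
C<len> plays the role of C<SvLEN> (the allocated size).

    #include <stdlib.h>
    #include <string.h>

    /* Hypothetical stand-in for an SV's string slot. */
    struct buf {
        char   *pv;   /* like SvPVX: start of the string       */
        size_t  cur;  /* like SvCUR: bytes currently in use    */
        size_t  len;  /* like SvLEN: bytes currently allocated */
    };

    /* Like SvGROW: reallocate only if newlen exceeds the current
       allocation, and never shrink.  Returns NULL on failure. */
    char *buf_grow(struct buf *b, size_t newlen)
    {
        if (newlen > b->len) {
            char *p = realloc(b->pv, newlen);
            if (p == NULL)
                return NULL;
            b->pv  = p;
            b->len = newlen;
        }
        return b->pv;
    }

    /* Copy n bytes in, reserving one extra byte for the trailing NUL,
       in the spirit of SvGROW(sv, len + 1). */
    int buf_setpvn(struct buf *b, const char *s, size_t n)
    {
        if (buf_grow(b, n + 1) == NULL)
            return -1;
        memcpy(b->pv, s, n);
        b->pv[n] = '\0';
        b->cur = n;
        return 0;
    }

After storing C<"hello"> the model has C<cur> 5 and C<len> 6; a later
request for 2 bytes leaves the allocation at 6.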

If you have an SV and want to know what kind of data Perl thinks is stored
in it, you can use the following macros to check the type of SV you have.

    SvIOK(SV*)
    SvNOK(SV*)
    SvPOK(SV*)

You can get and set the current length of the string stored in an SV with
the following macros:

    SvCUR(SV*)
    SvCUR_set(SV*, I32 val)

You can also get a pointer to the end of the string stored in the SV
with the macro:

    SvEND(SV*)

But note that these last three macros are valid only if C<SvPOK()> is true.

If you want to append something to the end of the string stored in an C<SV*>,
you can use the following functions:

    void  sv_catpv(SV*, const char*);
    void  sv_catpvn(SV*, const char*, STRLEN);
    void  sv_catpvf(SV*, const char*, ...);
    void  sv_catpvfn(SV*, const char*, STRLEN, va_list *, SV **, I32, bool *);
    void  sv_catsv(SV*, SV*);

The first function calculates the length of the string to be appended by
using C<strlen>.  In the second, you specify the length of the string
yourself.  The third function processes its arguments like C<sprintf> and
appends the formatted output.  The fourth function works like C<vsprintf>.
You can specify the address and length of an array of SVs instead of the
va_list argument. The fifth function extends the string stored in the first
SV with the string stored in the second SV.  It also forces the second SV
to be interpreted as a string.

The C<sv_cat*()> functions are not generic enough to operate on values that
have "magic".  See L<Magic Virtual Tables> later in this document.

If you know the name of a scalar variable, you can get a pointer to its SV
by using the following:

    SV*  get_sv("package::varname", FALSE);

This returns NULL if the variable does not exist.

If you want to know if this variable (or any other SV) is actually C<defined>,
you can call:

    SvOK(SV*)

The scalar C<undef> value is stored in an SV instance called C<PL_sv_undef>.  Its
address can be used whenever an C<SV*> is needed.

There are also the two values C<PL_sv_yes> and C<PL_sv_no>, which contain Boolean
TRUE and FALSE values, respectively.  Like C<PL_sv_undef>, their addresses can
be used whenever an C<SV*> is needed.

Do not be fooled into thinking that C<(SV *) 0> is the same as C<&PL_sv_undef>.
Take this code:

    SV* sv = (SV*) 0;
    if (want_real_value) {      /* hypothetical condition */
            sv = sv_2mortal(newSViv(42));
    }
    sv_setsv(ST(0), sv);

This code tries to return a new SV (which contains the value 42) if it should
return a real value, or undef otherwise.  Instead it has returned a NULL
pointer which, somewhere down the line, will cause a segmentation violation,
bus error, or just weird results.  Change the zero to C<&PL_sv_undef> in the first
line and all will be well.

To free an SV that you've created, call C<SvREFCNT_dec(SV*)>.  Normally this
call is not necessary (see L<Reference Counts and Mortality>).

=head2 Offsets

Perl provides the function C<sv_chop> to efficiently remove characters
from the beginning of a string; you give it an SV and a pointer to
somewhere inside the PV, and it discards everything before the
pointer. The efficiency comes by means of a little hack: instead of
actually removing the characters, C<sv_chop> sets the flag C<OOK>
(offset OK) to signal to other functions that the offset hack is in
effect, and it puts the number of bytes chopped off into the IV field
of the SV. It then moves the PV pointer (called C<SvPVX>) forward that
many bytes, and adjusts C<SvCUR> and C<SvLEN>.

Hence, at this point, the start of the buffer that we allocated lives
at C<SvPVX(sv) - SvIV(sv)> in memory and the PV pointer is pointing
into the middle of this allocated storage.

This is best demonstrated by example:

  % ./perl -Ilib -MDevel::Peek -le '$a="12345"; $a=~s/.//; Dump($a)'
  SV = PVIV(0x8128450) at 0x81340f0
    REFCNT = 1
    FLAGS = (POK,OOK,pPOK)
    IV = 1  (OFFSET)
    PV = 0x8135781 ( "1" . ) "2345"\0
    CUR = 4
    LEN = 5

Here the number of bytes chopped off (1) is put into IV, and
C<Devel::Peek::Dump> helpfully reminds us that this is an offset. The
portion of the string between the "real" and the "fake" beginnings is
shown in parentheses, and the values of C<SvCUR> and C<SvLEN> reflect
the fake beginning, not the real one.
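
The bookkeeping behind the offset hack can be sketched in plain C. This is
a simplified model under stated assumptions, not Perl's code: C<struct
chopbuf>, C<chop_front> and C<chopbuf_free> are hypothetical, C<off> stands
in for the IV slot that holds the offset, and C<cur>/C<len> are measured
from the fake beginning, as in the C<Devel::Peek> output above.

    #include <assert.h>
    #include <stdlib.h>

    /* Hypothetical model of a string SV with the OOK offset hack. */
    struct chopbuf {
        char   *pv;   /* like SvPVX: the "fake" start of the string  */
        size_t  off;  /* like the IV slot when OOK is on              */
        size_t  cur;  /* like SvCUR, measured from the fake start     */
        size_t  len;  /* like SvLEN, measured from the fake start     */
    };

    /* Discard everything before ptr without moving any bytes:
       advance pv, remember how far, and shrink cur and len. */
    void chop_front(struct chopbuf *b, char *ptr)
    {
        size_t n = (size_t)(ptr - b->pv);

        assert(ptr >= b->pv && n <= b->cur);
        b->pv  += n;
        b->off += n;
        b->cur -= n;
        b->len -= n;
    }

    /* The real allocation starts at pv - off; only that pointer
       may be handed back to free(). */
    void chopbuf_free(struct chopbuf *b)
    {
        free(b->pv - b->off);
        b->pv = NULL;
    }

Chopping one byte from a 6-byte buffer holding C<"12345"> leaves C<off> 1,
C<cur> 4 and C<len> 5, matching the C<IV>, C<CUR> and C<LEN> lines of the
dump.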

Something similar to the offset hack is performed on AVs to enable
efficient shifting and splicing off the beginning of the array; while
C<AvARRAY> points to the first element in the array that is visible from
Perl, C<AvALLOC> points to the real start of the C array. These are
usually the same, but a C<shift> operation can be carried out by
increasing C<AvARRAY> by one and decreasing C<AvFILL> and C<AvMAX>.
Again, the location of the real start of the C array only comes into
play when freeing the array. See C<av_shift> in F<av.c>.
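
The same idea applied to arrays can be sketched as follows. This is a
hypothetical, simplified model, not Perl's C<av_shift>: C<alloc>, C<array>
and C<fill> mirror C<AvALLOC>, C<AvARRAY> and C<AvFILL>, C<max> records the
highest usable index, and the elements are plain C<int>s rather than
C<SV*>s.

    #include <assert.h>
    #include <stdlib.h>

    /* Hypothetical model of an AV's storage. */
    struct intarray {
        int    *alloc;  /* like AvALLOC: real start of the C array       */
        int    *array;  /* like AvARRAY: first element visible to Perl   */
        long    fill;   /* like AvFILL: index of the last element, or -1 */
        long    max;    /* highest index usable from array[]             */
    };

    /* Shift off the first element by moving the visible start forward
       instead of copying the remaining elements down. */
    int intarray_shift(struct intarray *a)
    {
        int first;

        assert(a->fill >= 0);
        first = a->array[0];
        a->array++;
        a->fill--;
        a->max--;
        return first;
    }

    /* Only the real start may be freed. */
    void intarray_free(struct intarray *a)
    {
        free(a->alloc);
        a->alloc = a->array = NULL;
    }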

=head2 What's Really Stored in an SV?

Recall that the usual method of determining the type of scalar you have is
to use C<Sv*OK> macros.  Because a scalar can be both a number and a string,
these macros will usually return TRUE, and calling the C<Sv*V>
macros will do the appropriate conversion of string to integer/double or
integer/double to string.

If you I<really> need to know if you have an integer, double, or string
pointer in an SV, you can use the following three macros instead:

    SvIOKp(SV*)
    SvNOKp(SV*)
    SvPOKp(SV*)

These will tell you if you truly have an integer, double, or string pointer
stored in your SV.  The "p" stands for private.

In general, though, it's best to use the C<Sv*V> macros.

=head2 Working with AVs

There are two ways to create and load an AV.  The first method creates an
empty AV:

    AV*  newAV();

The second method both creates the AV and initially populates it with SVs:

    AV*  av_make(I32 num, SV **ptr);

The second argument points to an array containing C<num> C<SV*>'s.  Once the
AV has been created, the SVs can be destroyed, if so desired.

Once the AV has been created, the following operations are possible on AVs:

    void  av_push(AV*, SV*);
    SV*   av_pop(AV*);
    SV*   av_shift(AV*);
    void  av_unshift(AV*, I32 num);

These should be familiar operations, with the exception of C<av_unshift>.
This routine adds C<num> elements at the front of the array with the C<undef>
value.  You must then use C<av_store> (described below) to assign values
to these new elements.

Here are some other functions:

    I32   av_len(AV*);
    SV**  av_fetch(AV*, I32 key, I32 lval);
    SV**  av_store(AV*, I32 key, SV* val);

The C<av_len> function returns the highest index value in the array (just
like C<$#array> in Perl).  If the array is empty, -1 is returned.  The
C<av_fetch> function returns the value at index C<key>, but if C<lval>
is non-zero and there is no element at that index, C<av_fetch> will first
store an undef value there.
The C<av_store> function stores the value C<val> at index C<key>, and does
not increment the reference count of C<val>.  Thus the caller is responsible
for taking care of that, and if C<av_store> returns NULL, the caller will
have to decrement the reference count to avoid a memory leak.  Note that
C<av_fetch> and C<av_store> both return C<SV**>'s, not C<SV*>'s as their
return value.

    void  av_clear(AV*);
    void  av_undef(AV*);
    void  av_extend(AV*, I32 key);

The C<av_clear> function deletes all the elements in the AV* array, but
does not actually delete the array itself.  The C<av_undef> function will
delete all the elements in the array plus the array itself.  The
C<av_extend> function extends the array so that it contains at least C<key+1>
elements.  If C<key+1> is less than the currently allocated length of the array,
then nothing is done.

If you know the name of an array variable, you can get a pointer to its AV
by using the following:

    AV*  get_av("package::varname", FALSE);

This returns NULL if the variable does not exist.

See L<Understanding the Magic of Tied Hashes and Arrays> for more
information on how to use the array access functions on tied arrays.

=head2 Working with HVs

To create an HV, you use the following routine:

    HV*  newHV();

Once the HV has been created, the following operations are possible on HVs:

    SV**  hv_store(HV*, const char* key, U32 klen, SV* val, U32 hash);
    SV**  hv_fetch(HV*, const char* key, U32 klen, I32 lval);

The C<klen> parameter is the length of the key being passed in (note that
you cannot pass 0 in as a value of C<klen> to tell Perl to measure the
length of the key).  The C<val> argument contains the SV pointer to the
scalar being stored, and C<hash> is the precomputed hash value (zero if
you want C<hv_store> to calculate it for you).  The C<lval> parameter
indicates whether this fetch is actually a part of a store operation, in
which case a new undefined value will be added to the HV with the supplied
key and C<hv_fetch> will return as if the value had already existed.

Remember that C<hv_store> and C<hv_fetch> return C<SV**>'s and not just
C<SV*>.  To access the scalar value, you must first dereference the return
value.  However, you should check to make sure that the return value is
not NULL before dereferencing it.

The following two functions check whether a hash table entry exists and
delete an entry, respectively:

    bool  hv_exists(HV*, const char* key, U32 klen);
    SV*   hv_delete(HV*, const char* key, U32 klen, I32 flags);

If C<flags> does not include the C<G_DISCARD> flag then C<hv_delete> will
create and return a mortal copy of the deleted value.

And more miscellaneous functions:

    void   hv_clear(HV*);
    void   hv_undef(HV*);

Like their AV counterparts, C<hv_clear> deletes all the entries in the hash
table but does not actually delete the hash table.  The C<hv_undef> deletes
both the entries and the hash table itself.

Perl keeps the actual data in a linked list of structures with a typedef of HE.
These contain the actual key and value pointers (plus extra administrative
overhead).  The key is a string pointer; the value is an C<SV*>.  However,
once you have an C<HE*>, to get the actual key and value, use the routines
specified below.

    I32    hv_iterinit(HV*);
            /* Prepares starting point to traverse hash table */
    HE*    hv_iternext(HV*);
            /* Get the next entry, and return a pointer to a
               structure that has both the key and value */
    char*  hv_iterkey(HE* entry, I32* retlen);
            /* Get the key from an HE structure and also return
               the length of the key string */
    SV*    hv_iterval(HV*, HE* entry);
            /* Return an SV pointer to the value of the HE
               structure */
    SV*    hv_iternextsv(HV*, char** key, I32* retlen);
            /* This convenience routine combines hv_iternext,
               hv_iterkey, and hv_iterval.  The key and retlen
               arguments are return values for the key and its
               length.  The value is the function's SV* return
               value */

If you know the name of a hash variable, you can get a pointer to its HV
by using the following:

    HV*  get_hv("package::varname", FALSE);

This returns NULL if the variable does not exist.

The hash algorithm is defined in the C<PERL_HASH(hash, key, klen)> macro:

    hash = 0;
    while (klen--)
        hash = (hash * 33) + *key++;
    hash = hash + (hash >> 5);                  /* after 5.6 */

The last step was added in version 5.6 to improve distribution of
lower bits in the resulting hash value.
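
For experimenting outside Perl, the algorithm shown above can be
transcribed into a standalone C function. This is a sketch of the formula
as documented here, not a copy of the C<PERL_HASH> macro itself: the name
C<perl56_hash> is hypothetical, and reading bytes as C<unsigned char> with
32-bit wraparound is an assumption about how the macro treats its input
(for plain ASCII keys the byte signedness makes no difference).

    #include <stddef.h>
    #include <stdint.h>

    /* Hypothetical transcription of the documented hash:
       hash = hash * 33 + byte for each byte, then hash += hash >> 5. */
    uint32_t perl56_hash(const char *key, size_t klen)
    {
        const unsigned char *p = (const unsigned char *)key;
        uint32_t hash = 0;

        while (klen--)
            hash = (hash * 33) + *p++;
        hash = hash + (hash >> 5);      /* the step added in 5.6 */
        return hash;
    }

For example, the key C<"ab"> gives 97 * 33 + 98 = 3299 after the loop,
and 3299 + (3299 >> 5) = 3299 + 103 = 3402 after the final step.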

See L<Understanding the Magic of Tied Hashes and Arrays> for more
information on how to use the hash access functions on tied hashes.

=head2 Hash API Extensions

Beginning with version 5.004, the following functions are also supported:

    HE*     hv_fetch_ent  (HV* tb, SV* key, I32 lval, U32 hash);
    HE*     hv_store_ent  (HV* tb, SV* key, SV* val, U32 hash);

    bool    hv_exists_ent (HV* tb, SV* key, U32 hash);
    SV*     hv_delete_ent (HV* tb, SV* key, I32 flags, U32 hash);

    SV*     hv_iterkeysv  (HE* entry);

Note that these functions take C<SV*> keys, which simplifies writing
of extension code that deals with hash structures.  These functions
also allow passing of C<SV*> keys to C<tie> functions without forcing
you to stringify the keys (unlike the previous set of functions).

They also return and accept whole hash entries (C<HE*>), making their
use more efficient (since the hash number for a particular string
doesn't have to be recomputed every time).  See L<perlapi> for detailed
descriptions.

The following macros must always be used to access the contents of hash
entries.  Note that the arguments to these macros must be simple
variables, since they may get evaluated more than once.  See
L<perlapi> for detailed descriptions of these macros.

    HePV(HE* he, STRLEN len)
    HeVAL(HE* he)
    HeHASH(HE* he)
    HeSVKEY(HE* he)
    HeSVKEY_force(HE* he)
    HeSVKEY_set(HE* he, SV* sv)

These two lower level macros are defined, but must only be used when
dealing with keys that are not C<SV*>s:

    HeKEY(HE* he)
    HeKLEN(HE* he)

Note that neither C<hv_store> nor C<hv_store_ent> increments the
reference count of the stored C<val>, which is the caller's responsibility.
If these functions return a NULL value, the caller will usually have to
decrement the reference count of C<val> to avoid a memory leak.

=head2 References

References are a special type of scalar that point to other data types
(including references).

To create a reference, use either of the following functions:

    SV* newRV_inc((SV*) thing);
    SV* newRV_noinc((SV*) thing);

The C<thing> argument can be any of an C<SV*>, C<AV*>, or C<HV*>.  The
functions are identical except that C<newRV_inc> increments the reference
count of the C<thing>, while C<newRV_noinc> does not.  For historical
reasons, C<newRV> is a synonym for C<newRV_inc>.

Once you have a reference, you can use the following macro to dereference
the reference:

    SvRV(SV*)

then call the appropriate routines, casting the returned C<SV*> to either an
C<AV*> or C<HV*>, if required.

To determine if an SV is a reference, you can use the following macro:

    SvROK(SV*)

To discover what type of value the reference refers to, use the following
macro and then check the return value.

    SvTYPE(SvRV(SV*))

The most useful types that will be returned are:

    SVt_IV    Scalar
    SVt_NV    Scalar
    SVt_PV    Scalar
    SVt_RV    Scalar
    SVt_PVAV  Array
    SVt_PVHV  Hash
    SVt_PVCV  Code
    SVt_PVGV  Glob (possibly a file handle)
    SVt_PVMG  Blessed or Magical Scalar

See the F<sv.h> header file for more details.

=head2 Blessed References and Class Objects

References are also used to support object-oriented programming.  In the
OO lexicon, an object is simply a reference that has been blessed into a
package (or class).  Once blessed, the programmer can use the reference
to access the various methods in the class.

A reference can be blessed into a package with the following function:

    SV* sv_bless(SV* sv, HV* stash);

The C<sv> argument must be a reference.  The C<stash> argument specifies
which class the reference will belong to.  See
L<Stashes and Globs> for information on converting class names into stashes.

/* Still under construction */

Upgrades C<rv> to a reference if it is not already one.  Creates a new SV
for C<rv> to point to.  If C<classname> is non-null, the SV is blessed into
the specified class.  The SV is returned.

        SV* newSVrv(SV* rv, const char* classname);

Copies an integer or double into an SV whose reference is C<rv>.  The SV is
blessed if C<classname> is non-null.

        SV* sv_setref_iv(SV* rv, const char* classname, IV iv);
        SV* sv_setref_nv(SV* rv, const char* classname, NV nv);

Copies the pointer value (I<the address, not the string!>) into an SV whose
reference is C<rv>.  The SV is blessed if C<classname> is non-null.

        SV* sv_setref_pv(SV* rv, const char* classname, void* pv);

Copies a string into an SV whose reference is C<rv>.  Set length to 0 to let
Perl calculate the string length.  The SV is blessed if C<classname> is non-null.

        SV* sv_setref_pvn(SV* rv, const char* classname, char* pv, STRLEN length);

Tests whether the SV is blessed into the specified class.  It does not
check inheritance relationships.

        int  sv_isa(SV* sv, const char* name);

Tests whether the SV is a reference to a blessed object.

        int  sv_isobject(SV* sv);

Tests whether the SV is derived from the specified class. SV can be either
a reference to a blessed object or a string containing a class name. This
is the function implementing the C<UNIVERSAL::isa> functionality.

        bool sv_derived_from(SV* sv, const char* name);

To check if you've got an object derived from a specific class you have
to write:

        if (sv_isobject(sv) && sv_derived_from(sv, class)) { ... }

=head2 Creating New Variables

To create a new Perl variable with an undef value which can be accessed from
your Perl script, use the following routines, depending on the variable type.

    SV*  get_sv("package::varname", TRUE);
    AV*  get_av("package::varname", TRUE);
    HV*  get_hv("package::varname", TRUE);

Notice the use of TRUE as the second parameter.  The new variable can now
be set, using the routines appropriate to the data type.

There are additional macros whose values may be bitwise OR'ed with the
C<TRUE> argument to enable certain extra features.  Those bits are:

    GV_ADDMULTI Marks the variable as multiply defined, thus preventing the
                "Name <varname> used only once: possible typo" warning.
    GV_ADDWARN  Issues the warning "Had to create <varname> unexpectedly" if
                the variable did not exist before the function was called.

If you do not specify a package name, the variable is created in the current
package.

=head2 Reference Counts and Mortality

Perl uses a reference-count-driven garbage collection mechanism. SVs,
AVs, or HVs (xV for short in the following) start their life with a
reference count of 1.  If the reference count of an xV ever drops to 0,
then it will be destroyed and its memory made available for reuse.

This normally doesn't happen at the Perl level unless a variable is
undef'ed or the last variable holding a reference to it is changed or
overwritten.  At the internal level, however, reference counts can be
manipulated with the following macros:

    int SvREFCNT(SV* sv);
    SV* SvREFCNT_inc(SV* sv);
    void SvREFCNT_dec(SV* sv);

However, there is one other function which manipulates the reference
count of its argument.  The C<newRV_inc> function, you will recall,
creates a reference to the specified argument.  As a side effect,
it increments the argument's reference count.  If this is not what
you want, use C<newRV_noinc> instead.

For example, imagine you want to return a reference from an XSUB function.
Inside the XSUB routine, you create an SV which initially has a reference
count of one.  Then you call C<newRV_inc>, passing it the just-created SV.
This returns the reference as a new SV, but the reference count of the
SV you passed to C<newRV_inc> has been incremented to two.  Now you
return the reference from the XSUB routine and forget about the SV.
But Perl hasn't!  Whenever the returned reference is destroyed, the
reference count of the original SV is decreased to one and nothing happens.
The SV will hang around without any way to access it until Perl itself
terminates.  This is a memory leak.

The correct procedure, then, is to use C<newRV_noinc> instead of
C<newRV_inc>.  Then, if and when the last reference is destroyed,
the reference count of the SV will go to zero and it will be destroyed,
stopping any memory leak.
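
The leak described above can be reproduced with a toy reference-counted
object in plain C. Everything here is hypothetical and greatly simplified:
C<struct obj>, C<obj_new>, C<obj_dec>, C<ref_inc>, C<ref_noinc> and
C<ref_drop> are not Perl functions, and the model only mirrors the counting
rules of C<newRV_inc>, C<newRV_noinc> and C<SvREFCNT_dec>.

    #include <stdlib.h>

    /* Hypothetical refcounted thing; `freed` counts destructions. */
    struct obj {
        int refcnt;
    };

    static int freed = 0;

    struct obj *obj_new(void)
    {
        struct obj *o = malloc(sizeof *o);
        if (o != NULL)
            o->refcnt = 1;      /* new things start with a count of 1 */
        return o;
    }

    /* Like SvREFCNT_dec: destroy the object when the count hits 0. */
    void obj_dec(struct obj *o)
    {
        if (--o->refcnt == 0) {
            free(o);
            freed++;
        }
    }

    /* Creating a "reference" either bumps the count or does not. */
    struct obj *ref_inc(struct obj *o)   { o->refcnt++; return o; } /* newRV_inc   */
    struct obj *ref_noinc(struct obj *o) { return o; }              /* newRV_noinc */

    /* Destroying the reference drops one count on its target. */
    void ref_drop(struct obj *target)    { obj_dec(target); }

In the C<ref_inc> case the count goes 1, 2, 1 and the object is never
freed; in the C<ref_noinc> case it goes 1, 0 and the object is freed when
the reference goes away.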

There are some convenience functions available that can help with the
destruction of xVs.  These functions introduce the concept of "mortality".
An xV that is mortal has had its reference count marked to be decremented,
but not actually decremented, until "a short time later".  Generally the
term "short time later" means a single Perl statement, such as a call to
an XSUB function.  The actual determinant for when mortal xVs have their
reference count decremented depends on two macros, C<SAVETMPS> and C<FREETMPS>.
See L<perlcall> and L<perlxs> for more details on these macros.

"Mortalization" then is at its simplest a deferred C<SvREFCNT_dec>.
However, if you mortalize a variable twice, the reference count will
later be decremented twice.

You should be careful about creating mortal variables.  Strange things
can happen if you make the same value mortal within multiple contexts,
or if you make a variable mortal multiple times.
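
A toy model can make the "deferred C<SvREFCNT_dec>" description concrete.
In this hypothetical sketch, C<mortalize> pushes a pointer onto a
temporaries stack (the role C<sv_2mortal> plays) and C<free_tmps> pops the
stack and applies one decrement per entry (the role of C<FREETMPS>). The
fixed stack size and all names are assumptions, and no Perl API is used.
Mortalizing the same object twice queues two decrements, which is the
hazard described above.

    #include <assert.h>
    #include <stddef.h>

    struct thing {
        int refcnt;
    };

    /* Hypothetical temporaries stack; the size 16 is arbitrary. */
    static struct thing *tmps[16];
    static size_t tmps_ix = 0;

    /* Defer one decrement; like sv_2mortal(), return the argument. */
    struct thing *mortalize(struct thing *t)
    {
        assert(tmps_ix < sizeof tmps / sizeof tmps[0]);
        tmps[tmps_ix++] = t;
        return t;
    }

    /* Apply every deferred decrement, like FREETMPS.  This model only
       decrements; a real implementation would destroy things at 0. */
    void free_tmps(void)
    {
        while (tmps_ix > 0)
            tmps[--tmps_ix]->refcnt--;
    }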

To create a mortal variable, use the functions:

    SV*  sv_newmortal()
    SV*  sv_2mortal(SV*)
    SV*  sv_mortalcopy(SV*)

The first call creates a mortal SV, the second converts an existing
SV to a mortal SV (and thus defers a call to C<SvREFCNT_dec>), and the
third creates a mortal copy of an existing SV.

The mortal routines are not just for SVs: AVs and HVs can be
made mortal by passing their address (cast to C<SV*>) to the
C<sv_2mortal> or C<sv_mortalcopy> routines.

=head2 Stashes and Globs

A "stash" is a hash that contains all of the different objects that
are contained within a package.  Each key of the stash is a symbol
name (shared by all the different types of objects that have the same
name), and each value in the hash table is a GV (Glob Value).  This GV
in turn contains references to the various objects of that name,
including (but not limited to) the following:

    Scalar Value
    Array Value
    Hash Value
    I/O Handle
    Format
    Subroutine

There is a single stash called "PL_defstash" that holds the items that exist
in the "main" package.  To get at the items in other packages, append the
string "::" to the package name.  The items in the "Foo" package are in
the stash "Foo::" in PL_defstash.  The items in the "Bar::Baz" package are
in the stash "Baz::" in "Bar::"'s stash.
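
The nesting rule above means a package name is resolved one C<::>-separated
component at a time, each component becoming a stash key with C<::>
appended. The following standalone sketch only computes those key strings;
C<stash_key> is a hypothetical helper that touches no real stash and does
not model special cases such as names beginning with C<::>.

    #include <stdio.h>
    #include <string.h>

    /* Hypothetical helper: write into out the stash key for the
       index-th component of a package name, so for "Bar::Baz"
       index 0 gives "Bar::" and index 1 gives "Baz::".
       Returns 0 on success, -1 if there is no such component
       or out is too small. */
    int stash_key(const char *pkg, int index, char *out, size_t outsize)
    {
        const char *start = pkg;
        const char *sep;
        size_t n;
        int w;

        /* Skip over the first `index` components. */
        while (index-- > 0) {
            sep = strstr(start, "::");
            if (sep == NULL)
                return -1;
            start = sep + 2;
        }

        /* The component runs up to the next "::" or the end. */
        sep = strstr(start, "::");
        n = sep ? (size_t)(sep - start) : strlen(start);

        w = snprintf(out, outsize, "%.*s::", (int)n, start);
        return (w < 0 || (size_t)w >= outsize) ? -1 : 0;
    }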
691
692To get the stash pointer for a particular package, use the function:
693
694    HV*  gv_stashpv(const char* name, I32 create)
695    HV*  gv_stashsv(SV*, I32 create)
696
697The first function takes a literal string, the second uses the string stored
698in the SV.  Remember that a stash is just a hash table, so you get back an
699C<HV*>.  The C<create> flag will create a new package if it is set.
700
701The name that C<gv_stash*v> wants is the name of the package whose symbol table
702you want.  The default package is called C<main>.  If you have multiply nested
703packages, pass their names to C<gv_stash*v>, separated by C<::> as in the Perl
704language itself.
705
706Alternately, if you have an SV that is a blessed reference, you can find
707out the stash pointer by using:
708
709    HV*  SvSTASH(SvRV(SV*));
710
711then use the following to get the package name itself:
712
713    char*  HvNAME(HV* stash);
714
715If you need to bless or re-bless an object you can use the following
716function:
717
718    SV*  sv_bless(SV*, HV* stash)
719
720where the first argument, an C<SV*>, must be a reference, and the second
721argument is a stash.  The returned C<SV*> can now be used in the same way
722as any other SV.
723
724For more information on references and blessings, consult L<perlref>.
725
726=head2 Double-Typed SVs
727
728Scalar variables normally contain only one type of value, an integer,
729double, pointer, or reference.  Perl will automatically convert the
730actual scalar data from the stored type into the requested type.
731
732Some scalar variables contain more than one type of scalar data.  For
733example, the variable C<$!> contains either the numeric value of C<errno>
734or its string equivalent from either C<strerror> or C<sys_errlist[]>.
735
736To force multiple data values into an SV, you must do two things: use the
737C<sv_set*v> routines to add the additional scalar type, then set a flag
738so that Perl will believe it contains more than one type of data.  The
739four macros to set the flags are:
740
741        SvIOK_on
742        SvNOK_on
743        SvPOK_on
744        SvROK_on
745
746The particular macro you must use depends on which C<sv_set*v> routine
747you called first.  This is because every C<sv_set*v> routine turns on
748only the bit for the particular type of data being set, and turns off
749all the rest.
750
751For example, to create a new Perl variable called "dberror" that contains
752both the numeric and descriptive string error values, you could use the
753following code:
754
755    extern int  dberror;
756    extern char *dberror_list;
757
758    SV* sv = get_sv("dberror", TRUE);
759    sv_setiv(sv, (IV) dberror);
760    sv_setpv(sv, dberror_list[dberror]);
761    SvIOK_on(sv);
762
763If the order of C<sv_setiv> and C<sv_setpv> had been reversed, then the
764macro C<SvPOK_on> would need to be called instead of C<SvIOK_on>.
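The flag behaviour described above can be modelled in plain C.  The
following is a toy sketch only: C<struct toy_sv>, C<toy_setiv>,
C<toy_setpv> and C<toy_iok_on> are hypothetical names, not Perl API, and
a real SV carries many more flags.  It shows why the "OK" flag of the
value stored first has to be switched back on after the second
C<sv_set*v> call.

```c
/* Hypothetical flag bits standing in for Perl's IOK/POK "OK" flags. */
enum { TOY_IOK = 0x1, TOY_POK = 0x2 };

struct toy_sv {
    unsigned flags;   /* which value slots are currently valid */
    long iv;          /* integer slot */
    const char *pv;   /* string slot */
};

/* Like sv_setiv: store the integer and mark ONLY the integer slot valid. */
void toy_setiv(struct toy_sv *sv, long iv)
{
    sv->iv = iv;
    sv->flags = TOY_IOK;
}

/* Like sv_setpv: store the string and mark ONLY the string slot valid. */
void toy_setpv(struct toy_sv *sv, const char *pv)
{
    sv->pv = pv;
    sv->flags = TOY_POK;
}

/* Like SvIOK_on: declare that the integer slot is valid again. */
void toy_iok_on(struct toy_sv *sv)
{
    sv->flags |= TOY_IOK;
}
```

After C<toy_setiv> then C<toy_setpv>, only C<TOY_POK> is set; calling
C<toy_iok_on> makes both values visible, mirroring the C<dberror> example.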
765
766=head2 Magic Variables
767
768[This section still under construction.  Ignore everything here.  Post no
769bills.  Everything not permitted is forbidden.]
770
771Any SV may be magical, that is, it has special features that a normal
772SV does not have.  These features are stored in the SV structure in a
773linked list of C<struct magic>'s, typedef'ed to C<MAGIC>.
774
775    struct magic {
776        MAGIC*      mg_moremagic;
777        MGVTBL*     mg_virtual;
778        U16         mg_private;
779        char        mg_type;
780        U8          mg_flags;
781        SV*         mg_obj;
782        char*       mg_ptr;
783        I32         mg_len;
784    };
785
786Note this is current as of patchlevel 0, and could change at any time.
787
788=head2 Assigning Magic
789
790Perl adds magic to an SV using the sv_magic function:
791
792    void sv_magic(SV* sv, SV* obj, int how, const char* name, I32 namlen);
793
794The C<sv> argument is a pointer to the SV that is to acquire a new magical
795feature.
796
797If C<sv> is not already magical, Perl uses the C<SvUPGRADE> macro to
798set the C<SVt_PVMG> flag for the C<sv>.  Perl then continues by adding
799it to the beginning of the linked list of magical features.  Any prior
800entry of the same type of magic is deleted.  Note that this can be
801overridden, and multiple instances of the same type of magic can be
802associated with an SV.
803
804The C<name> and C<namlen> arguments are used to associate a string with
805the magic, typically the name of a variable. C<namlen> is stored in the
806C<mg_len> field and if C<name> is non-null and C<namlen> >= 0 a malloc'd
807copy of the name is stored in C<mg_ptr> field.
808
809The sv_magic function uses C<how> to determine which, if any, predefined
810"Magic Virtual Table" should be assigned to the C<mg_virtual> field.
811See the "Magic Virtual Table" section below.  The C<how> argument is also
812stored in the C<mg_type> field.
813
814The C<obj> argument is stored in the C<mg_obj> field of the C<MAGIC>
815structure.  If it is not the same as the C<sv> argument, the reference
816count of the C<obj> object is incremented.  If it is the same, or if
817the C<how> argument is "#", or if it is a NULL pointer, then C<obj> is
818merely stored, without the reference count being incremented.
819
820There is also a function to add magic to an C<HV>:
821
822    void hv_magic(HV *hv, GV *gv, int how);
823
824This simply calls C<sv_magic> and coerces the C<gv> argument into an C<SV>.
825
826To remove the magic from an SV, call the function sv_unmagic:
827
828    void sv_unmagic(SV *sv, int type);
829
830The C<type> argument should be equal to the C<how> value when the C<SV>
831was initially made magical.
832
833=head2 Magic Virtual Tables
834
835The C<mg_virtual> field in the C<MAGIC> structure is a pointer to a
836C<MGVTBL>, which is a structure of function pointers and stands for
837"Magic Virtual Table" to handle the various operations that might be
838applied to that variable.
839
840The C<MGVTBL> has five pointers to the following routine types:
841
842    int  (*svt_get)(SV* sv, MAGIC* mg);
843    int  (*svt_set)(SV* sv, MAGIC* mg);
844    U32  (*svt_len)(SV* sv, MAGIC* mg);
845    int  (*svt_clear)(SV* sv, MAGIC* mg);
846    int  (*svt_free)(SV* sv, MAGIC* mg);
847
848This MGVTBL structure is set at compile-time in C<perl.h> and there are
849currently 19 types (or 21 with overloading turned on).  These different
850structures contain pointers to various routines that perform additional
851actions depending on which function is being called.
852
853    Function pointer    Action taken
854    ----------------    ------------
855    svt_get             Do something after the value of the SV is retrieved.
856    svt_set             Do something after the SV is assigned a value.
857    svt_len             Report on the SV's length.
858    svt_clear           Clear something the SV represents.
859    svt_free            Free any extra storage associated with the SV.
860
861For instance, the MGVTBL structure called C<vtbl_sv> (which corresponds
862to an C<mg_type> of '\0') contains:
863
864    { magic_get, magic_set, magic_len, 0, 0 }
865
866Thus, when an SV is determined to be magical and of type '\0', if a get
867operation is being performed, the routine C<magic_get> is called.  All
868the various routines for the various magical types begin with C<magic_>.
869NOTE: the magic routines are not considered part of the Perl API, and may
870not be exported by the Perl library.
871
872The current kinds of Magic Virtual Tables are:
873
874    mg_type  MGVTBL              Type of magic
875    -------  ------              ----------------------------
876    \0       vtbl_sv             Special scalar variable
877    A        vtbl_amagic         %OVERLOAD hash
878    a        vtbl_amagicelem     %OVERLOAD hash element
879    c        (none)              Holds overload table (AMT) on stash
880    B        vtbl_bm             Boyer-Moore (fast string search)
881    D        vtbl_regdata        Regex match position data (@+ and @- vars)
882    d        vtbl_regdatum       Regex match position data element
883    E        vtbl_env            %ENV hash
884    e        vtbl_envelem        %ENV hash element
885    f        vtbl_fm             Formline ('compiled' format)
886    g        vtbl_mglob          m//g target / study()ed string
887    I        vtbl_isa            @ISA array
888    i        vtbl_isaelem        @ISA array element
889    k        vtbl_nkeys          scalar(keys()) lvalue
890    L        (none)              Debugger %_<filename
891    l        vtbl_dbline         Debugger %_<filename element
892    o        vtbl_collxfrm       Locale transformation
893    P        vtbl_pack           Tied array or hash
894    p        vtbl_packelem       Tied array or hash element
895    q        vtbl_packelem       Tied scalar or handle
896    S        vtbl_sig            %SIG hash
897    s        vtbl_sigelem        %SIG hash element
898    t        vtbl_taint          Taintedness
899    U        vtbl_uvar           Available for use by extensions
900    v        vtbl_vec            vec() lvalue
901    x        vtbl_substr         substr() lvalue
902    y        vtbl_defelem        Shadow "foreach" iterator variable /
903                                  smart parameter vivification
904    *        vtbl_glob           GV (typeglob)
905    #        vtbl_arylen         Array length ($#ary)
906    .        vtbl_pos            pos() lvalue
907    ~        (none)              Available for use by extensions
908
909When an uppercase and lowercase letter both exist in the table, then the
910uppercase letter is used to represent some kind of composite type (a list
911or a hash), and the lowercase letter is used to represent an element of
912that composite type.
913
914The '~' and 'U' magic types are defined specifically for use by
915extensions and will not be used by perl itself.  Extensions can use
916'~' magic to 'attach' private information to variables (typically
917objects).  This is especially useful because there is no way for
918normal perl code to corrupt this private information (unlike using
919extra elements of a hash object).
920
921Similarly, 'U' magic can be used much like tie() to call a C function
922any time a scalar's value is used or changed.  The C<MAGIC>'s
923C<mg_ptr> field points to a C<ufuncs> structure:
924
925    struct ufuncs {
926        I32 (*uf_val)(IV, SV*);
927        I32 (*uf_set)(IV, SV*);
928        IV uf_index;
929    };
930
931When the SV is read from or written to, the C<uf_val> or C<uf_set>
932function will be called with C<uf_index> as the first arg and a
933pointer to the SV as the second.  A simple example of how to add 'U'
934magic is shown below.  Note that the ufuncs structure is copied by
935sv_magic, so you can safely allocate it on the stack.
936
937    void
938    Umagic(sv)
939        SV *sv;
940    PREINIT:
941        struct ufuncs uf;
942    CODE:
943        uf.uf_val   = &my_get_fn;
944        uf.uf_set   = &my_set_fn;
945        uf.uf_index = 0;
946        sv_magic(sv, 0, 'U', (char*)&uf, sizeof(uf));
947
948Note that because multiple extensions may be using '~' or 'U' magic,
949it is important for extensions to take extra care to avoid conflict.
950Typically only using the magic on objects blessed into the same class
951as the extension is sufficient.  For '~' magic, it may also be
952appropriate to add an I32 'signature' at the top of the private data
953area and check that.
954
955Also note that the C<sv_set*()> and C<sv_cat*()> functions described
956earlier do B<not> invoke 'set' magic on their targets.  This must
957be done by the user either by calling the C<SvSETMAGIC()> macro after
958calling these functions, or by using one of the C<sv_set*_mg()> or
959C<sv_cat*_mg()> functions.  Similarly, generic C code must call the
960C<SvGETMAGIC()> macro to invoke any 'get' magic if they use an SV
961obtained from external sources in functions that don't handle magic.
962See L<perlapi> for a description of these functions.
963For example, calls to the C<sv_cat*()> functions typically need to be
964followed by C<SvSETMAGIC()>, but they don't need a prior C<SvGETMAGIC()>
965since their implementation handles 'get' magic.
966
967=head2 Finding Magic
968
969    MAGIC* mg_find(SV*, int type); /* Finds the magic pointer of that type */
970
971This routine returns a pointer to the C<MAGIC> structure stored in the SV.
972If the SV does not have that magical feature, C<NULL> is returned.  Also,
973if the SV is not of type SVt_PVMG, Perl may core dump.
974
975    int mg_copy(SV* sv, SV* nsv, const char* key, STRLEN klen);
976
977This routine checks to see what types of magic C<sv> has.  If the mg_type
978field is an uppercase letter, then the mg_obj is copied to C<nsv>, but
979the mg_type field is changed to be the lowercase letter.
980
981=head2 Understanding the Magic of Tied Hashes and Arrays
982
983Tied hashes and arrays are magical beasts of the 'P' magic type.
984
985WARNING: As of the 5.004 release, proper usage of the array and hash
986access functions requires understanding a few caveats.  Some
987of these caveats are actually considered bugs in the API, to be fixed
988in later releases, and are bracketed with [MAYCHANGE] below. If
989you find yourself actually applying such information in this section, be
990aware that the behavior may change in the future, umm, without warning.
991
992The perl tie function associates a variable with an object that implements
993the various GET, SET etc methods.  To perform the equivalent of the perl
994tie function from an XSUB, you must mimic this behaviour.  The code below
995carries out the necessary steps - firstly it creates a new hash, and then
996creates a second hash which it blesses into the class which will implement
997the tie methods. Lastly it ties the two hashes together, and returns a
998reference to the new tied hash.  Note that the code below does NOT call the
999TIEHASH method in the MyTie class -
1000see L<Calling Perl Routines from within C Programs> for details on how
1001to do this.
1002
1003    SV*
1004    mytie()
1005    PREINIT:
1006        HV *hash;
1007        HV *stash;
1008        SV *tie;
1009    CODE:
1010        hash = newHV();
1011        tie = newRV_noinc((SV*)newHV());
1012        stash = gv_stashpv("MyTie", TRUE);
1013        sv_bless(tie, stash);
1014        hv_magic(hash, tie, 'P');
1015        RETVAL = newRV_noinc(hash);
1016    OUTPUT:
1017        RETVAL
1018
1019The C<av_store> function, when given a tied array argument, merely
1020copies the magic of the array onto the value to be "stored", using
1021C<mg_copy>.  It may also return NULL, indicating that the value did not
1022actually need to be stored in the array.  [MAYCHANGE] After a call to
1023C<av_store> on a tied array, the caller will usually need to call
1024C<mg_set(val)> to actually invoke the perl level "STORE" method on the
1025TIEARRAY object.  If C<av_store> did return NULL, a call to
1026C<SvREFCNT_dec(val)> will also be usually necessary to avoid a memory
1027leak. [/MAYCHANGE]
1028
1029The previous paragraph is applicable verbatim to tied hash access using the
1030C<hv_store> and C<hv_store_ent> functions as well.
1031
1032C<av_fetch> and the corresponding hash functions C<hv_fetch> and
1033C<hv_fetch_ent> actually return an undefined mortal value whose magic
1034has been initialized using C<mg_copy>.  Note the value so returned does not
1035need to be deallocated, as it is already mortal.  [MAYCHANGE] But you will
1036need to call C<mg_get()> on the returned value in order to actually invoke
1037the perl level "FETCH" method on the underlying TIE object.  Similarly,
1038you may also call C<mg_set()> on the return value after possibly assigning
1039a suitable value to it using C<sv_setsv>,  which will invoke the "STORE"
1040method on the TIE object. [/MAYCHANGE]
1041
1042[MAYCHANGE]
1043In other words, the array or hash fetch/store functions don't really
1044fetch and store actual values in the case of tied arrays and hashes.  They
1045merely call C<mg_copy> to attach magic to the values that were meant to be
1046"stored" or "fetched".  Later calls to C<mg_get> and C<mg_set> actually
1047do the job of invoking the TIE methods on the underlying objects.  Thus
1048the magic mechanism currently implements a kind of lazy access to arrays
1049and hashes.
1050
1051Currently (as of perl version 5.004), use of the hash and array access
1052functions requires the user to be aware of whether they are operating on
1053"normal" hashes and arrays, or on their tied variants.  The API may be
1054changed to provide more transparent access to both tied and normal data
1055types in future versions.
1056[/MAYCHANGE]
1057
1058You would do well to understand that the TIEARRAY and TIEHASH interfaces
1059are mere sugar to invoke some perl method calls while using the uniform hash
1060and array syntax.  The use of this sugar imposes some overhead (typically
1061about two to four extra opcodes per FETCH/STORE operation, in addition to
1062the creation of all the mortal variables required to invoke the methods).
1063This overhead will be comparatively small if the TIE methods are themselves
1064substantial, but if they are only a few statements long, the overhead
1065will not be insignificant.
1066
1067=head2 Localizing changes
1068
1069Perl has a very handy construction
1070
1071  {
1072    local $var = 2;
1073    ...
1074  }
1075
1076This construction is I<approximately> equivalent to
1077
1078  {
1079    my $oldvar = $var;
1080    $var = 2;
1081    ...
1082    $var = $oldvar;
1083  }
1084
1085The biggest difference is that the first construction would
1086reinstate the initial value of $var, irrespective of how control exits
1087the block: C<goto>, C<return>, C<die>/C<eval> etc. It is a little bit
1088more efficient as well.
1089
1090There is a way to achieve a similar task from C via Perl API: create a
1091I<pseudo-block>, and arrange for some changes to be automatically
1092undone at the end of it, either explicit, or via a non-local exit (via
1093die()). A I<block>-like construct is created by a pair of
1094C<ENTER>/C<LEAVE> macros (see L<perlcall/"Returning a Scalar">).
1095Such a construct may be created specially for some important localized
1096task, or an existing one (like boundaries of enclosing Perl
1097subroutine/block, or an existing pair for freeing TMPs) may be
1098used. (In the second case the overhead of additional localization must
1099be almost negligible.) Note that any XSUB is automatically enclosed in
1100an C<ENTER>/C<LEAVE> pair.
1101
1102Inside such a I<pseudo-block> the following service is available:
1103
1104=over 4
1105
1106=item C<SAVEINT(int i)>
1107
1108=item C<SAVEIV(IV i)>
1109
1110=item C<SAVEI32(I32 i)>
1111
1112=item C<SAVELONG(long i)>
1113
1114These macros arrange things to restore the value of integer variable
1115C<i> at the end of enclosing I<pseudo-block>.
1116
1117=item C<SAVESPTR(s)>
1118
1119=item C<SAVEPPTR(p)>
1120
1121These macros arrange things to restore the value of pointers C<s> and
1122C<p>. C<s> must be a pointer of a type which survives conversion to
1123C<SV*> and back, C<p> should be able to survive conversion to C<char*>
1124and back.
1125
1126=item C<SAVEFREESV(SV *sv)>
1127
1128The refcount of C<sv> would be decremented at the end of
1129I<pseudo-block>.  This is similar to C<sv_2mortal> in that it is also a
1130mechanism for doing a delayed C<SvREFCNT_dec>.  However, while C<sv_2mortal>
1131extends the lifetime of C<sv> until the beginning of the next statement,
1132C<SAVEFREESV> extends it until the end of the enclosing scope.  These
1133lifetimes can be wildly different.
1134
1135Also compare C<SAVEMORTALIZESV>.
1136
1137=item C<SAVEMORTALIZESV(SV *sv)>
1138
1139Just like C<SAVEFREESV>, but mortalizes C<sv> at the end of the current
1140scope instead of decrementing its reference count.  This usually has the
1141effect of keeping C<sv> alive until the statement that called the currently
1142live scope has finished executing.
1143
1144=item C<SAVEFREEOP(OP *op)>
1145
1146The C<OP *> is op_free()ed at the end of I<pseudo-block>.
1147
1148=item C<SAVEFREEPV(p)>
1149
1150The chunk of memory which is pointed to by C<p> is Safefree()ed at the
1151end of I<pseudo-block>.
1152
1153=item C<SAVECLEARSV(SV *sv)>
1154
1155Clears a slot in the current scratchpad which corresponds to C<sv> at
1156the end of I<pseudo-block>.
1157
1158=item C<SAVEDELETE(HV *hv, char *key, I32 length)>
1159
1160The key C<key> of C<hv> is deleted at the end of I<pseudo-block>. The
1161string pointed to by C<key> is Safefree()ed.  If one has a I<key> in
1162short-lived storage, the corresponding string may be reallocated like
1163this:
1164
1165  SAVEDELETE(PL_defstash, savepv(tmpbuf), strlen(tmpbuf));
1166
1167=item C<SAVEDESTRUCTOR(DESTRUCTORFUNC_NOCONTEXT_t f, void *p)>
1168
1169At the end of I<pseudo-block> the function C<f> is called with the
1170only argument C<p>.
1171
1172=item C<SAVEDESTRUCTOR_X(DESTRUCTORFUNC_t f, void *p)>
1173
1174At the end of I<pseudo-block> the function C<f> is called with the
1175implicit context argument (if any), and C<p>.
1176
1177=item C<SAVESTACK_POS()>
1178
1179The current offset on the Perl internal stack (cf. C<SP>) is restored
1180at the end of I<pseudo-block>.
1181
1182=back
1183
1184The following API list contains functions, thus one needs to
1185provide pointers to the modifiable data explicitly (either C pointers,
1186or Perlish C<GV *>s).  Where the above macros take C<int>, a similar
1187function takes C<int *>.
1188
1189=over 4
1190
1191=item C<SV* save_scalar(GV *gv)>
1192
1193Equivalent to Perl code C<local $gv>.
1194
1195=item C<AV* save_ary(GV *gv)>
1196
1197=item C<HV* save_hash(GV *gv)>
1198
1199Similar to C<save_scalar>, but localize C<@gv> and C<%gv>.
1200
1201=item C<void save_item(SV *item)>
1202
1203Duplicates the current value of C<SV>, on the exit from the current
1204C<ENTER>/C<LEAVE> I<pseudo-block> will restore the value of C<SV>
1205using the stored value.
1206
1207=item C<void save_list(SV **sarg, I32 maxsarg)>
1208
1209A variant of C<save_item> which takes multiple arguments via an array
1210C<sarg> of C<SV*> of length C<maxsarg>.
1211
1212=item C<SV* save_svref(SV **sptr)>
1213
1214Similar to C<save_scalar>, but will reinstate a C<SV *>.
1215
1216=item C<void save_aptr(AV **aptr)>
1217
1218=item C<void save_hptr(HV **hptr)>
1219
1220Similar to C<save_svref>, but localize C<AV *> and C<HV *>.
1221
1222=back
1223
1224The C<Alias> module implements localization of the basic types within the
1225I<caller's scope>.  People who are interested in how to localize things in
1226the containing scope should take a look there too.
1227
1228=head1 Subroutines
1229
1230=head2 XSUBs and the Argument Stack
1231
1232The XSUB mechanism is a simple way for Perl programs to access C subroutines.
1233An XSUB routine will have a stack that contains the arguments from the Perl
1234program, and a way to map from the Perl data structures to a C equivalent.
1235
1236The stack arguments are accessible through the C<ST(n)> macro, which returns
1237the C<n>'th stack argument.  Argument 0 is the first argument passed in the
1238Perl subroutine call.  These arguments are C<SV*>, and can be used anywhere
1239an C<SV*> is used.
1240
1241Most of the time, output from the C routine can be handled through use of
1242the RETVAL and OUTPUT directives.  However, there are some cases where the
1243argument stack is not already long enough to handle all the return values.
1244An example is the POSIX tzname() call, which takes no arguments, but returns
1245two, the local time zone's standard and summer time abbreviations.
1246
1247To handle this situation, the PPCODE directive is used and the stack is
1248extended using the macro:
1249
1250    EXTEND(SP, num);
1251
1252where C<SP> is the macro that represents the local copy of the stack pointer,
1253and C<num> is the number of elements the stack should be extended by.
1254
1255Now that there is room on the stack, values can be pushed on it using the
1256macros to push IVs, doubles, strings, and SV pointers respectively:
1257
1258    PUSHi(IV)
1259    PUSHn(double)
1260    PUSHp(char*, I32)
1261    PUSHs(SV*)
1262
1263And now the Perl program calling C<tzname>, the two values will be assigned
1264as in:
1265
1266    ($standard_abbrev, $summer_abbrev) = POSIX::tzname;
1267
1268An alternate (and possibly simpler) method to pushing values on the stack is
1269to use the macros:
1270
1271    XPUSHi(IV)
1272    XPUSHn(double)
1273    XPUSHp(char*, I32)
1274    XPUSHs(SV*)
1275
1276These macros automatically adjust the stack for you, if needed.  Thus, you
1277do not need to call C<EXTEND> to extend the stack.
1278However, see L</Putting a C value on Perl stack>
1279
1280For more information, consult L<perlxs> and L<perlxstut>.
1281
1282=head2 Calling Perl Routines from within C Programs
1283
1284There are four routines that can be used to call a Perl subroutine from
1285within a C program.  These four are:
1286
1287    I32  call_sv(SV*, I32);
1288    I32  call_pv(const char*, I32);
1289    I32  call_method(const char*, I32);
1290    I32  call_argv(const char*, I32, register char**);
1291
1292The routine most often used is C<call_sv>.  The C<SV*> argument
1293contains either the name of the Perl subroutine to be called, or a
1294reference to the subroutine.  The second argument consists of flags
1295that control the context in which the subroutine is called, whether
1296or not the subroutine is being passed arguments, how errors should be
1297trapped, and how to treat return values.
1298
1299All four routines return the number of arguments that the subroutine returned
1300on the Perl stack.
1301
1302These routines used to be called C<perl_call_sv> etc., before Perl v5.6.0,
1303but those names are now deprecated; macros of the same name are provided for
1304compatibility.
1305
1306When using any of these routines (except C<call_argv>), the programmer
1307must manipulate the Perl stack.  These include the following macros and
1308functions:
1309
1310    dSP
1311    SP
1312    PUSHMARK()
1313    PUTBACK
1314    SPAGAIN
1315    ENTER
1316    SAVETMPS
1317    FREETMPS
1318    LEAVE
1319    XPUSH*()
1320    POP*()
1321
1322For a detailed description of calling conventions from C to Perl,
1323consult L<perlcall>.
1324
1325=head2 Memory Allocation
1326
1327All memory meant to be used with the Perl API functions should be manipulated
1328using the macros described in this section.  The macros provide the necessary
1329transparency between differences in the actual malloc implementation that is
1330used within perl.
1331
1332It is suggested that you enable the version of malloc that is distributed
1333with Perl.  It keeps pools of various sizes of unallocated memory in
1334order to satisfy allocation requests more quickly.  However, on some
1335platforms, it may cause spurious malloc or free errors.
1336
1337    New(x, pointer, number, type);
1338    Newc(x, pointer, number, type, cast);
1339    Newz(x, pointer, number, type);
1340
1341These three macros are used to initially allocate memory.
1342
1343The first argument C<x> was a "magic cookie" that was used to keep track
1344of who called the macro, to help when debugging memory problems.  However,
1345the current code makes no use of this feature (most Perl developers now
1346use run-time memory checkers), so this argument can be any number.
1347
1348The second argument C<pointer> should be the name of a variable that will
1349point to the newly allocated memory.
1350
1351The third and fourth arguments C<number> and C<type> specify how many of
1352the specified type of data structure should be allocated.  The argument
1353C<type> is passed to C<sizeof>.  The final argument to C<Newc>, C<cast>,
1354should be used if the C<pointer> argument is different from the C<type>
1355argument.
1356
1357Unlike the C<New> and C<Newc> macros, the C<Newz> macro calls C<memzero>
1358to zero out all the newly allocated memory.
1359
1360    Renew(pointer, number, type);
1361    Renewc(pointer, number, type, cast);
1362    Safefree(pointer)
1363
1364These three macros are used to change a memory buffer size or to free a
1365piece of memory no longer needed.  The arguments to C<Renew> and C<Renewc>
1366match those of C<New> and C<Newc> with the exception of not needing the
1367"magic cookie" argument.
1368
1369    Move(source, dest, number, type);
1370    Copy(source, dest, number, type);
1371    Zero(dest, number, type);
1372
1373These three macros are used to move, copy, or zero out previously allocated
1374memory.  The C<source> and C<dest> arguments point to the source and
1375destination starting points.  Perl will move, copy, or zero out C<number>
1376instances of the size of the C<type> data structure (using the C<sizeof>
1377function).
1378
1379=head2 PerlIO
1380
1381The most recent development releases of Perl has been experimenting with
1382removing Perl's dependency on the "normal" standard I/O suite and allowing
1383other stdio implementations to be used.  This involves creating a new
1384abstraction layer that then calls whichever implementation of stdio Perl
1385was compiled with.  All XSUBs should now use the functions in the PerlIO
1386abstraction layer and not make any assumptions about what kind of stdio
1387is being used.
1388
1389For a complete description of the PerlIO abstraction, consult L<perlapio>.
1390
1391=head2 Putting a C value on Perl stack
1392
1393A lot of opcodes (this is an elementary operation in the internal perl
1394stack machine) put an SV* on the stack. However, as an optimization
1395the corresponding SV is (usually) not recreated each time. The opcodes
1396reuse specially assigned SVs (I<target>s) which are (as a corollary)
1397not constantly freed/created.
1398
1399Each of the targets is created only once (but see
1400L<Scratchpads and recursion> below), and when an opcode needs to put
1401an integer, a double, or a string on stack, it just sets the
1402corresponding parts of its I<target> and puts the I<target> on stack.
1403
1404The macro to put this target on stack is C<PUSHTARG>, and it is
1405directly used in some opcodes, as well as indirectly in zillions of
1406others, which use it via C<(X)PUSH[pni]>.
1407
1408Because the target is reused, you must be careful when pushing multiple
1409values on the stack. The following code will not do what you think:
1410
1411    XPUSHi(10);
1412    XPUSHi(20);
1413
1414This translates as "set C<TARG> to 10, push a pointer to C<TARG> onto
1415the stack; set C<TARG> to 20, push a pointer to C<TARG> onto the stack".
1416At the end of the operation, the stack does not contain the values 10
1417and 20, but actually contains two pointers to C<TARG>, which we have set
1418to 20. If you need to push multiple different values, use C<XPUSHs>,
1419which bypasses C<TARG>.
1420
1421On a related note, if you do use C<(X)PUSH[npi]>, then you're going to
1422need a C<dTARG> in your variable declarations so that the C<*PUSH*>
1423macros can make use of the local variable C<TARG>.
1424
1425=head2 Scratchpads
1426
1427The question remains on when the SVs which are I<target>s for opcodes
1428are created. The answer is that they are created when the current unit --
1429a subroutine or a file (for opcodes for statements outside of
1430subroutines) -- is compiled. During this time a special anonymous Perl
1431array is created, which is called a scratchpad for the current
1432unit.
1433
1434A scratchpad keeps SVs which are lexicals for the current unit and are
1435targets for opcodes. One can deduce that an SV lives on a scratchpad
1436by looking on its flags: lexicals have C<SVs_PADMY> set, and
1437I<target>s have C<SVs_PADTMP> set.
1438
1439The correspondence between OPs and I<target>s is not 1-to-1. Different
1440OPs in the compile tree of the unit can use the same target, if this
1441would not conflict with the expected life of the temporary.
1442
1443=head2 Scratchpads and recursion
1444
1445In fact it is not 100% true that a compiled unit contains a pointer to
1446the scratchpad AV. In fact it contains a pointer to an AV of
1447(initially) one element, and this element is the scratchpad AV. Why do
1448we need an extra level of indirection?
1449
1450The answer is B<recursion>, and maybe (sometime soon) B<threads>. Both
1451these can create several execution pointers going into the same
1452subroutine. For the subroutine-child not write over the temporaries
1453for the subroutine-parent (lifespan of which covers the call to the
1454child), the parent and the child should have different
1455scratchpads. (I<And> the lexicals should be separate anyway!)
1456
1457So each subroutine is born with an array of scratchpads (of length 1).
1458On each entry to the subroutine it is checked that the current
1459depth of the recursion is not more than the length of this array, and
1460if it is, new scratchpad is created and pushed into the array.
1461
1462The I<target>s on this scratchpad are C<undef>s, but they are already
1463marked with correct flags.
1464
1465=head1 Compiled code
1466
1467=head2 Code tree
1468
1469Here we describe the internal form your code is converted to by
1470Perl. Start with a simple example:
1471
1472  $a = $b + $c;
1473
1474This is converted to a tree similar to this one:
1475
1476             assign-to
1477           /           \
1478          +             $a
1479        /   \
1480      $b     $c
1481
1482(but slightly more complicated).  This tree reflects the way Perl
1483parsed your code, but has nothing to do with the execution order.
1484There is an additional "thread" going through the nodes of the tree
1485which shows the order of execution of the nodes.  In our simplified
1486example above it looks like:
1487
1488     $b ---> $c ---> + ---> $a ---> assign-to
1489
1490But with the actual compile tree for C<$a = $b + $c> it is different:
1491some nodes I<optimized away>.  As a corollary, though the actual tree
1492contains more nodes than our simplified example, the execution order
1493is the same as in our example.

=head2 Examining the tree

If you have your perl compiled for debugging (usually done with
C<-Doptimize=-g> on the C<Configure> command line), you may examine the
compiled tree by specifying C<-Dx> on the Perl command line.  The
output takes several lines per node, and for C<$b+$c> it looks like
this:

    5           TYPE = add  ===> 6
                TARG = 1
                FLAGS = (SCALAR,KIDS)
                {
                    TYPE = null  ===> (4)
                      (was rv2sv)
                    FLAGS = (SCALAR,KIDS)
                    {
    3                   TYPE = gvsv  ===> 4
                        FLAGS = (SCALAR)
                        GV = main::b
                    }
                }
                {
                    TYPE = null  ===> (5)
                      (was rv2sv)
                    FLAGS = (SCALAR,KIDS)
                    {
    4                   TYPE = gvsv  ===> 5
                        FLAGS = (SCALAR)
                        GV = main::c
                    }
                }

This tree has 5 nodes (one per C<TYPE> specifier), but only 3 of them
are not optimized away (one per number in the left column).  The
immediate children of a given node correspond to C<{}> pairs on the
same level of indentation, so this listing corresponds to the tree:

                   add
                 /     \
               null    null
                |       |
               gvsv    gvsv

The execution order is indicated by C<===E<gt>> marks, so it is
C<3 4 5 6> (node C<6> is not included in the above listing), i.e.,
C<gvsv gvsv add whatever>.

Each of these nodes represents an op, a fundamental operation inside the
Perl core. The code which implements each operation can be found in the
F<pp*.c> files; the function which implements the op with type C<gvsv>
is C<pp_gvsv>, and so on. As the tree above shows, different ops have
different numbers of children: C<add> is a binary operator, as one would
expect, and so has two children. To accommodate the varying numbers of
children, there are various types of op data structure, and they link
together in different ways.

The simplest type of op structure is C<OP>: this has no children. Unary
operators, C<UNOP>s, have one child, and this is pointed to by the
C<op_first> field. Binary operators (C<BINOP>s) have not only an
C<op_first> field but also an C<op_last> field. The most complex type of
op is a C<LISTOP>, which has any number of children. In this case, the
first child is pointed to by C<op_first> and the last child by
C<op_last>. The children in between can be found by iteratively
following the C<op_sibling> pointer from the first child to the last.
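
A toy version of this linking scheme shows how the children of a
C<LISTOP>-like node are visited. The fields C<first>, C<last> and
C<sibling> are hypothetical simplified stand-ins for C<op_first>,
C<op_last> and C<op_sibling>; the real structures have many more fields.

```c
    #include <stddef.h>

    /* Simplified stand-in for the op structures (names hypothetical). */
    struct toy_op {
        const char    *name;
        struct toy_op *first;    /* first child, or NULL */
        struct toy_op *last;     /* last child, or NULL */
        struct toy_op *sibling;  /* next child of the same parent */
    };

    /* Count the children of a node by following the sibling chain
       from the first child until the last one has been seen. */
    int toy_count_kids(const struct toy_op *parent)
    {
        const struct toy_op *kid;
        int n = 0;

        for (kid = parent->first; kid != NULL; kid = kid->sibling) {
            n++;
            if (kid == parent->last)
                break;
        }
        return n;
    }
```

The same loop works for a binary node, where C<first> and C<last> are
simply the two operands.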

There are also two other op types: a C<PMOP> holds a regular expression
and has no children, and a C<LOOP> may or may not have children. If the
C<op_children> field is non-zero, it behaves like a C<LISTOP>. To
complicate matters, if a C<UNOP> is actually a C<null> op after
optimization (see L</Compile pass 2: context propagation>), it will still
have children in accordance with its former type.

=head2 Compile pass 1: check routines

The tree is created by the compiler while I<yacc> code feeds it
the constructions it recognizes. Since I<yacc> works bottom-up, so does
the first pass of perl compilation.

What makes this pass interesting for perl developers is that some
optimization may be performed on this pass.  This is optimization by
so-called "check routines".  The correspondence between node names
and corresponding check routines is described in F<opcode.pl> (do not
forget to run C<make regen_headers> if you modify this file).

A check routine is called when the node is fully constructed except
for the execution-order thread.  Since at this time there are no
back-links to the currently constructed node, one can do almost any
operation to the top-level node, including freeing it and/or creating
new nodes above/below it.

The check routine returns the node which should be inserted into the
tree (if the top-level node was not modified, the check routine returns
its argument).

By convention, check routines have names C<ck_*>. They are usually
called from the C<new*OP> subroutines (or C<convert>), which in turn are
called from F<perly.y>.
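
The contract of a check routine (it receives a fully built node and
returns whatever should go into the tree) can be sketched with a toy.
The node type, C<toy_new_negate> and C<toy_ck_negate> are hypothetical,
and the rewrite shown, turning C<-(-x)> into C<x>, is chosen only for
illustration; it is not a claim about what any real C<ck_*> routine does.

```c
    #include <stdlib.h>

    enum toy_type { TOY_CONST, TOY_NEGATE };

    struct toy_op {
        enum toy_type  type;
        int            value;   /* TOY_CONST: the constant */
        struct toy_op *first;   /* TOY_NEGATE: its only child */
    };

    /* Toy check routine.  The node is complete apart from the execution
       thread and nothing points back at it yet, so it may be freed or
       replaced.  Here -(-x) is rewritten to plain x. */
    struct toy_op *toy_ck_negate(struct toy_op *o)
    {
        struct toy_op *kid = o->first;

        if (kid != NULL && kid->type == TOY_NEGATE) {
            struct toy_op *grandkid = kid->first;
            free(kid);
            free(o);
            return grandkid;      /* insert this node instead */
        }
        return o;                 /* unmodified: return the argument */
    }

    /* Toy constructor in the spirit of new*OP: build, then check. */
    struct toy_op *toy_new_negate(struct toy_op *kid)
    {
        struct toy_op *o = malloc(sizeof *o);

        if (o == NULL)
            return NULL;
        o->type  = TOY_NEGATE;
        o->value = 0;
        o->first = kid;
        return toy_ck_negate(o);
    }
```

Because the constructor always returns what the check routine returns,
its caller never needs to know whether a rewrite happened.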

=head2 Compile pass 1a: constant folding

Immediately after the check routine is called, the returned node is
checked for being compile-time executable.  If it is (the value is
judged to be constant), it is immediately executed, and a I<constant>
node with the "return value" of the corresponding subtree is
substituted instead.  The subtree is deleted.

If constant folding was not performed, the execution-order thread is
created.
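
Here is a toy of constant folding in the same spirit, with a hypothetical
node type and helpers: an add node whose two operands are both constants
is evaluated at once and replaced by a single constant node, and the old
subtree is freed. Perl's real folding executes the ops themselves to get
the value; this sketch just adds two integers.

```c
    #include <stdlib.h>

    enum toy_type { TOY_CONST, TOY_VAR, TOY_ADD };

    struct toy_op {
        enum toy_type  type;
        int            value;         /* TOY_CONST: the constant */
        struct toy_op *first, *last;  /* TOY_ADD: left/right operands */
    };

    struct toy_op *toy_new(enum toy_type type, int value,
                           struct toy_op *first, struct toy_op *last)
    {
        struct toy_op *o = malloc(sizeof *o);

        if (o != NULL) {
            o->type  = type;
            o->value = value;
            o->first = first;
            o->last  = last;
        }
        return o;
    }

    /* Toy constant folding, run right after the check routine: an add
       whose operands are both constants is evaluated now, the subtree
       is deleted, and a single constant node is returned in its place. */
    struct toy_op *toy_fold(struct toy_op *o)
    {
        struct toy_op *k;

        if (o->type != TOY_ADD
            || o->first->type != TOY_CONST
            || o->last->type != TOY_CONST)
            return o;                    /* not constant: keep the node */

        k = toy_new(TOY_CONST, o->first->value + o->last->value, NULL, NULL);
        if (k == NULL)
            return o;                    /* out of memory: leave unfolded */
        free(o->first);                  /* delete the old subtree */
        free(o->last);
        free(o);
        return k;
    }
```

An add with a variable operand comes back unchanged, and only then would
its execution-order thread be built.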

=head2 Compile pass 2: context propagation

When a context for a part of the compile tree is known, it is propagated
down through the tree.  At this time the context can have 5 values
(instead of 2 for runtime context): void, boolean, scalar, list, and
lvalue.  In contrast with pass 1, this pass is processed from top
to bottom: a node's context determines the context for its children.

Additional context-dependent optimizations are performed at this time.
Since at this moment the compile tree contains back-references (via
"thread" pointers), nodes cannot be free()d now.  To allow nodes to be
optimized away at this stage, such nodes are null()ified instead of
being free()d (i.e. their type is changed to OP_NULL).

=head2 Compile pass 3: peephole optimization

After the compile tree for a subroutine (or for an C<eval> or a file)
is created, an additional pass over the code is performed. This pass
is neither top-down nor bottom-up, but in the execution order (with
additional complications for conditionals).  These optimizations are
done in the subroutine peep().  Optimizations performed at this stage
are subject to the same restrictions as in pass 2.
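
Why nodes are null-ified rather than freed, and what a pass in execution
order looks like, can both be seen in a toy. The names below are
hypothetical and this is not how peep() is written: C<toy_nullify> only
changes a node's type, so any pointers still aimed at it stay valid, and
C<toy_peep> follows the thread and relinks each node past any null nodes
that come after it.

```c
    #include <stddef.h>

    enum toy_type { TOY_NULL, TOY_GVSV, TOY_ADD };

    struct toy_op {
        enum toy_type  type;
        struct toy_op *next;   /* execution-order thread */
    };

    /* Pass-2 style removal: other ops may still point at o through
       their next pointers, so it is turned into a no-op, not freed. */
    void toy_nullify(struct toy_op *o)
    {
        o->type = TOY_NULL;
    }

    /* Peephole-style walk in execution order: make each visited op
       skip over any null ops that follow it.  (A null op at the very
       start is left alone in this toy.) */
    void toy_peep(struct toy_op *start)
    {
        struct toy_op *o;

        for (o = start; o != NULL; o = o->next) {
            while (o->next != NULL && o->next->type == TOY_NULL)
                o->next = o->next->next;
        }
    }
```

After C<toy_peep>, a null-ified node still exists and still points
forward, but nothing on the thread reaches it any more.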

=head1 Examining internal data structures with the C<dump> functions

To aid debugging, the source file F<dump.c> contains a number of
functions which produce formatted output of internal data structures.

The most commonly used of these functions is C<Perl_sv_dump>; it's used
for dumping SVs, AVs, HVs, and CVs. The C<Devel::Peek> module calls
C<sv_dump> to produce debugging output from Perl-space, so users of that
module should already be familiar with its format.

C<Perl_op_dump> can be used to dump an C<OP> structure or any of its
derivatives, and produces output similar to C<perl -Dx>; in fact,
C<Perl_dump_eval> will dump the main root of the code being evaluated,
exactly like C<-Dx>.

Other useful functions are C<Perl_dump_sub>, which dumps the op tree of
the subroutine held in a C<GV>; C<Perl_dump_packsubs>, which calls
C<Perl_dump_sub> on all the subroutines in a package, like so
(thankfully, these are all xsubs, so there is no op tree to print):

    (gdb) print Perl_dump_packsubs(PL_defstash)

    SUB attributes::bootstrap = (xsub 0x811fedc 0)

    SUB UNIVERSAL::can = (xsub 0x811f50c 0)

    SUB UNIVERSAL::isa = (xsub 0x811f304 0)

    SUB UNIVERSAL::VERSION = (xsub 0x811f7ac 0)

    SUB DynaLoader::boot_DynaLoader = (xsub 0x805b188 0)

and C<Perl_dump_all>, which dumps all the subroutines in the stash and
the op tree of the main root.

=head1 How multiple interpreters and concurrency are supported

=head2 Background and PERL_IMPLICIT_CONTEXT

The Perl interpreter can be regarded as a closed box: it has an API
for feeding it code or otherwise making it do things, but it also has
functions for its own use.  This smells a lot like an object, and
there are ways for you to build Perl so that you can have multiple
interpreters, with one interpreter represented either as a C++ object,
a C structure, or inside a thread.  The thread, the C structure, or
the C++ object contains all the context, i.e. the state, of that
interpreter.

Three macros control the major Perl build flavors: MULTIPLICITY,
USE_THREADS and PERL_OBJECT.  The MULTIPLICITY build has a C structure
that packages all the interpreter state; there is a similar
thread-specific data structure under USE_THREADS; and the (now
deprecated) PERL_OBJECT build has a C++ class to maintain interpreter
state.  In all three cases, PERL_IMPLICIT_CONTEXT is also normally
defined, and enables support for passing in a "hidden" first argument
that represents whichever of the three data structures is in use.

All this obviously requires a way for the Perl internal functions to be
either C++ methods, subroutines taking some kind of structure as the
first argument, or subroutines taking nothing as the first argument.  To
enable these three very different ways of building the interpreter,
the Perl source (as it does in so many other situations) makes heavy
use of macros and subroutine naming conventions.

First problem: deciding which functions will be public API functions and
which will be private.  All functions whose names begin with C<S_> are
private (think "S" for "secret" or "static").  All other functions begin
with "Perl_", but just because a function begins with "Perl_" does not
mean it is part of the API. (See L</Internal Functions>.) The easiest way
to be B<sure> a function is part of the API is to find its entry in
L<perlapi>.  If it exists in L<perlapi>, it's part of the API.  If it
doesn't, and you think it should be (i.e., you need it for your
extension), send mail via L<perlbug> explaining why you think it should
be.

Second problem: there must be a syntax so that the same subroutine
declarations and calls can pass a structure as their first argument,
or pass nothing.  To solve this, the subroutines are named and
declared in a particular way.  Here's a typical start of a static
function used within the Perl guts:

  STATIC void
  S_incline(pTHX_ char *s)

STATIC becomes "static" in C, and is #define'd to nothing in C++.

A public function (i.e. part of the internal API, but not necessarily
sanctioned for use in extensions) begins like this:

  void
  Perl_sv_setsv(pTHX_ SV* dsv, SV* ssv)

C<pTHX_> is one of a number of macros (in F<perl.h>) that hide the
details of the interpreter's context.  THX stands for "thread", "this",
or "thingy", as the case may be.  (And no, George Lucas is not involved. :-)
The first character could be 'p' for a B<p>rototype, 'a' for B<a>rgument,
or 'd' for B<d>eclaration, so we have C<pTHX>, C<aTHX> and C<dTHX>, and
their variants.

When Perl is built without options that set PERL_IMPLICIT_CONTEXT, there
is no first argument containing the interpreter's context.  The trailing
underscore in the pTHX_ macro indicates that the macro expansion needs a
comma after the context argument because other arguments follow it.  If
PERL_IMPLICIT_CONTEXT is not defined, pTHX_ expands to nothing, and the
subroutine is not prototyped to take the extra argument.  The form of the
macro without the trailing underscore is used when there are no additional
explicit arguments.
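
To see how the trailing-underscore convention lets one set of
declarations serve both builds, here is a self-contained imitation. The
macros C<MY_pTHX>, C<MY_pTHX_>, C<MY_aTHX> and C<MY_aTHX_>, the switch
C<MY_IMPLICIT_CONTEXT>, the type C<my_interp> and both functions are
hypothetical; the real macros live in F<perl.h> and differ in detail.

```c
    #define MY_IMPLICIT_CONTEXT   /* remove this line for the plain build */

    typedef struct my_interp { int calls; } my_interp;

    #ifdef MY_IMPLICIT_CONTEXT
    #  define MY_pTHX   my_interp *my_perl  /* prototype: declare it  */
    #  define MY_pTHX_  MY_pTHX,            /* ...more params follow */
    #  define MY_aTHX   my_perl             /* argument: pass it on   */
    #  define MY_aTHX_  MY_aTHX,
    #else
    #  define MY_pTHX   void
    #  define MY_pTHX_
    #  define MY_aTHX
    #  define MY_aTHX_
    #endif

    /* A "public" function: the context, if any, is the hidden first arg. */
    int My_add_one(MY_pTHX_ int n)
    {
    #ifdef MY_IMPLICIT_CONTEXT
        my_perl->calls++;       /* per-interpreter state */
    #endif
        return n + 1;
    }

    /* Short name that passes the caller's context automatically. */
    #define my_add_one(n) My_add_one(MY_aTHX_ n)

    /* A function with no explicit arguments uses the form without "_". */
    int My_answer(MY_pTHX)
    {
        return my_add_one(41);
    }
```

With C<MY_IMPLICIT_CONTEXT> defined, C<My_answer(MY_aTHX)> becomes
C<My_answer(my_perl)> and the inner call becomes
C<My_add_one(my_perl, 41)>; with that line removed, the same source
compiles to C<My_answer()> and C<My_add_one(41)>.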

When a core function calls another, it must pass the context.  This
is normally hidden via macros.  Consider C<sv_setsv>.  It is defined
something like this:

    #ifdef PERL_IMPLICIT_CONTEXT
      #define sv_setsv(a,b)      Perl_sv_setsv(aTHX_ a, b)
      /* can't do this for vararg functions, see below */
    #else
      #define sv_setsv           Perl_sv_setsv
    #endif

This works well, and means that XS authors can gleefully write:

    sv_setsv(foo, bar);

and still have it work under all the modes Perl could have been
compiled with.

Under PERL_OBJECT in the core, that will translate to either:

    CPerlObj::Perl_sv_setsv(foo,bar);  /* in CPerlObj functions,
                                          C++ takes care of 'this' */
  or

    pPerl->Perl_sv_setsv(foo,bar);     /* in truly static functions,
                                          see objXSUB.h */

Under PERL_OBJECT in extensions (aka PERL_CAPI), or under
MULTIPLICITY/USE_THREADS with PERL_IMPLICIT_CONTEXT in both core
and extensions, it will become:

    Perl_sv_setsv(aTHX_ foo, bar);     /* the canonical Perl "API"
                                          for all build flavors */

This doesn't work so cleanly for varargs functions, though, as macros
imply that the number of arguments is known in advance.  Instead we
either need to spell them out fully, passing C<aTHX_> as the first
argument (the Perl core tends to do this with functions like
Perl_warner), or use a context-free version.

The context-free version of Perl_warner is called
Perl_warner_nocontext, and does not take the extra argument.  Instead
it uses C<dTHX;> to fetch the context from thread-local storage.  We
C<#define warner Perl_warner_nocontext> so that extensions get source
compatibility at the expense of performance.  (Passing an arg is
cheaper than grabbing it from thread-local storage.)
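
The trade-off between a context-taking varargs function and its
C<_nocontext> twin can be sketched in standalone C11. Everything here is
hypothetical (C<my_interp>, C<My_warner>, C<My_warner_nocontext>,
C<my_set_context>): a C<_Thread_local> pointer stands in for Perl's own
thread-local storage, and C<my_set_context> plays a role similar to
C<PERL_SET_CONTEXT>.

```c
    #include <stdarg.h>
    #include <stdio.h>

    /* Hypothetical interpreter type: it records the last warning. */
    typedef struct my_interp {
        int  warnings;
        char last_msg[128];
    } my_interp;

    /* Thread-local "current interpreter" slot (C11). */
    static _Thread_local my_interp *my_current;

    void my_set_context(my_interp *interp) { my_current = interp; }

    /* Shared worker taking an explicit context and a va_list. */
    static void my_vwarner(my_interp *my_perl, const char *pat, va_list args)
    {
        vsnprintf(my_perl->last_msg, sizeof my_perl->last_msg, pat, args);
        my_perl->warnings++;
    }

    /* Context-taking form: the caller passes the interpreter first. */
    void My_warner(my_interp *my_perl, const char *pat, ...)
    {
        va_list args;
        va_start(args, pat);
        my_vwarner(my_perl, pat, args);
        va_end(args);
    }

    /* "_nocontext" form: no extra argument, so it fetches the context
       from thread-local storage, which costs more than passing it. */
    void My_warner_nocontext(const char *pat, ...)
    {
        va_list args;
        va_start(args, pat);
        my_vwarner(my_current, pat, args);
        va_end(args);
    }

    /* What extensions would call, for source compatibility. */
    #define my_warner My_warner_nocontext
```

Code that already holds the interpreter writes C<My_warner(interp, ...)>;
code that does not calls C<my_warner(...)> and pays for the thread-local
lookup instead.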

You can ignore [pad]THX[xo] when browsing the Perl headers/sources.
Those are strictly for use within the core.  Extensions and embedders
need only be aware of [pad]THX.

=head2 So what happened to dTHR?

C<dTHR> was introduced in perl 5.005 to support the older thread model.
The older thread model now uses the C<THX> mechanism to pass context
pointers around, so C<dTHR> is not useful any more.  Perl 5.6.0 and
later still have it for backward source compatibility, but it is defined
to be a no-op.

=head2 How do I use all this in extensions?

When Perl is built with PERL_IMPLICIT_CONTEXT, extensions that call
any functions in the Perl API will need to pass the initial context
argument somehow.  The kicker is that you will need to write it in
such a way that the extension still compiles when Perl hasn't been
built with PERL_IMPLICIT_CONTEXT enabled.

There are three ways to do this.  First, the easy but inefficient way,
which is also the default, in order to maintain source compatibility
with extensions: whenever F<XSUB.h> is #included, it redefines the aTHX
and aTHX_ macros to call a function that will return the context.
Thus, something like:

        sv_setsv(asv, bsv);

in your extension will translate to this when PERL_IMPLICIT_CONTEXT is
in effect:

        Perl_sv_setsv(Perl_get_context(), asv, bsv);

or to this otherwise:

        Perl_sv_setsv(asv, bsv);

You don't have to do anything new in your extension to get this; since
the Perl library provides Perl_get_context(), it will all just
work.

The second, more efficient way is to use the following template for
your F<Foo.xs>:

        #define PERL_NO_GET_CONTEXT     /* we want efficiency */
        #include "EXTERN.h"
        #include "perl.h"
        #include "XSUB.h"

        static SV *my_private_function(int arg1, int arg2);

        static SV *
        my_private_function(int arg1, int arg2)
        {
            dTHX;       /* fetch context */
            ... call many Perl API functions ...
        }

        [... etc ...]

        MODULE = Foo            PACKAGE = Foo

        /* typical XSUB */

        void
        my_xsub(arg)
                int arg
            CODE:
                my_private_function(arg, 10);

Note that the only two changes from the normal way of writing an
extension are the addition of a C<#define PERL_NO_GET_CONTEXT> before
including the Perl headers and a C<dTHX;> declaration at the start of
every function that will call the Perl API.  (You'll know which
functions need this, because the C compiler will complain that there's
an undeclared identifier in those functions.)  No changes are needed for
the XSUBs themselves, because the XS() macro is correctly defined to
pass in the implicit context if needed.

The third, even more efficient way is to ape how it is done within
the Perl guts:

        #define PERL_NO_GET_CONTEXT     /* we want efficiency */
        #include "EXTERN.h"
        #include "perl.h"
        #include "XSUB.h"

        /* pTHX_ only needed for functions that call Perl API */
        static SV *my_private_function(pTHX_ int arg1, int arg2);

        static SV *
        my_private_function(pTHX_ int arg1, int arg2)
        {
            /* dTHX; not needed here, because THX is an argument */
            ... call Perl API functions ...
        }

        [... etc ...]

        MODULE = Foo            PACKAGE = Foo

        /* typical XSUB */

        void
        my_xsub(arg)
                int arg
            CODE:
                my_private_function(aTHX_ arg, 10);

This implementation never has to fetch the context using a function
call, since it is always passed as an extra argument.  Depending on
your needs for simplicity or efficiency, you may mix the previous
two approaches freely.

Never add a comma after C<pTHX> yourself; always use the form of the
macro with the underscore for functions that take explicit arguments,
and the form without the underscore for functions with no explicit
arguments.

=head2 Should I do anything special if I call perl from multiple threads?

If you create interpreters in one thread and then proceed to call them in
another, you need to make sure perl's own Thread Local Storage (TLS) slot is
initialized correctly in each of those threads.

The C<perl_alloc> and C<perl_clone> API functions will automatically set
the TLS slot to the interpreter they created, so that there is no need to do
anything special if the interpreter is always accessed in the same thread that
created it, and that thread did not create or call any other interpreters
afterwards.  If that is not the case, you have to set the TLS slot of the
thread before calling any functions in the Perl API on that particular
interpreter.  This is done by calling the C<PERL_SET_CONTEXT> macro in that
thread as the first thing you do:

        /* do this before doing anything else with some_perl */
        PERL_SET_CONTEXT(some_perl);

        ... other Perl API calls on some_perl go here ...

=head2 Future Plans and PERL_IMPLICIT_SYS

Just as PERL_IMPLICIT_CONTEXT provides a way to bundle up everything
that the interpreter knows about itself and pass it around, so too are
there plans to allow the interpreter to bundle up everything it knows
about the environment it's running on.  This is enabled with the
PERL_IMPLICIT_SYS macro.  Currently it only works with PERL_OBJECT
and USE_THREADS on Windows (see inside F<iperlsys.h>).

This makes it possible to provide an extra pointer (called the "host"
environment) for all the system calls, so that all the system-level
services can maintain their own state, broken down into seven C
structures.  These are thin wrappers around the usual system calls (see
F<win32/perllib.c>) for the default perl executable, but for a more
ambitious host (like the one that would do fork() emulation) all the
extra work needed to pretend that different interpreters are actually
different "processes" would be done here.

The Perl engine/interpreter and the host are orthogonal entities.
There could be one or more interpreters in a process, and one or
more "hosts", with free association between them.

=head1 Internal Functions

All of Perl's internal functions which will be exposed to the outside
world are prefixed by C<Perl_> so that they will not conflict with XS
functions or functions used in a program in which Perl is embedded.
Similarly, all global variables begin with C<PL_>. (By convention,
static functions start with C<S_>.)

Inside the Perl core, you can get at the functions either with or
without the C<Perl_> prefix, thanks to a bunch of defines that live in
F<embed.h>. This header file is generated automatically from
F<embed.pl>. F<embed.pl> also creates the prototyping header files for
the internal functions, generates the documentation and a lot of other
bits and pieces. It's important that when you add a new function to the
core or change an existing one, you change the data in the table at the
end of F<embed.pl> as well. Here's a sample entry from that table:

    Apd |SV**   |av_fetch   |AV* ar|I32 key|I32 lval

The second column is the return type, the third column the name. Columns
after that are the arguments. The first column is a set of flags:

=over 3

=item A

This function is a part of the public API.

=item p

This function has a C<Perl_> prefix; i.e., it is defined as C<Perl_av_fetch>.

=item d

This function has documentation using the C<apidoc> feature which we'll
look at in a second.

=back

Other available flags are:

=over 3

=item s

This is a static function and is defined as C<S_whatever>, and usually
called within the sources as C<whatever(...)>.

=item n

This does not use C<aTHX_> and C<pTHX> to pass interpreter context. (See
L<perlguts/Background and PERL_IMPLICIT_CONTEXT>.)

=item r

This function never returns, like C<croak>, C<exit> and friends.

=item f

This function takes a variable number of arguments, C<printf> style.
The argument list should end with C<...>, like this:

    Afprd   |void   |croak          |const char* pat|...

=item M

This function is part of the experimental development API, and may change
or disappear without notice.

=item o

This function should not have a compatibility macro to define, say,
C<Perl_parse> to C<parse>. It must be called as C<Perl_parse>.

=item j

This function is not a member of C<CPerlObj>. If you don't know
what this means, don't use it.

=item x

This function isn't exported out of the Perl core.

=back

If you edit F<embed.pl>, you will need to run C<make regen_headers> to
force a rebuild of F<embed.h> and other auto-generated files.
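
Since each entry is a single line of C<|>-separated columns, its layout
is easy to check mechanically. F<embed.pl> itself is a Perl script; the
following C sketch, with hypothetical helpers C<embed_split> and
C<embed_has_flag>, only illustrates how an entry breaks down into trimmed
fields and how the flags column can be tested.

```c
    #include <string.h>

    #define MAX_FIELDS 16

    /* Trim leading/trailing blanks in place; returns the trimmed start. */
    static char *trim(char *s)
    {
        char *end;

        while (*s == ' ' || *s == '\t')
            s++;
        end = s + strlen(s);
        while (end > s && (end[-1] == ' ' || end[-1] == '\t'))
            *--end = '\0';
        return s;
    }

    /* Split an entry (modified in place) on '|' into at most max
       trimmed fields; returns the number of fields found. */
    int embed_split(char *line, char *fields[], int max)
    {
        int n = 0;
        char *p = line;

        while (n < max) {
            char *bar = strchr(p, '|');
            if (bar != NULL)
                *bar = '\0';
            fields[n++] = trim(p);
            if (bar == NULL)
                break;
            p = bar + 1;
        }
        return n;
    }

    /* True if the flags column (field 0) contains the given letter. */
    int embed_has_flag(const char *flags, char flag)
    {
        return flag != '\0' && strchr(flags, flag) != NULL;
    }
```

On the sample entry this yields six fields: C<Apd>, C<SV**>, C<av_fetch>
and the three arguments.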

=head2 Formatted Printing of IVs, UVs, and NVs

When printing IVs, UVs, or NVs, instead of the stdio(3) style
formatting codes like C<%d>, C<%ld>, C<%f>, you should use the
following macros for portability:

        IVdf            IV in decimal
        UVuf            UV in decimal
        UVof            UV in octal
        UVxf            UV in hexadecimal
        NVef            NV %e-like
        NVff            NV %f-like
        NVgf            NV %g-like

These will take care of 64-bit integers and long doubles.
For example:

        printf("IV is %"IVdf"\n", iv);

The IVdf will expand to whatever is the correct format for the IVs.

If you are printing addresses of pointers, use UVxf combined
with PTR2UV(); do not use %lx or %p.
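
The macros go outside the quotes because each one is a string literal
that the C compiler joins to its neighbours. The sketch below imitates
the arrangement with C99 F<inttypes.h>; the C<My*> names are hypothetical
stand-ins, and Perl's Configure chooses the real types and format strings
for each platform.

```c
    #include <inttypes.h>
    #include <stdio.h>

    /* Hypothetical stand-ins: IV/UV as pointer-sized integers with
       C99 <inttypes.h> format macros. */
    typedef intptr_t  MyIV;
    typedef uintptr_t MyUV;
    #define MyIVdf PRIdPTR
    #define MyUVuf PRIuPTR
    #define MyUVxf PRIxPTR
    #define MyPTR2UV(p) ((MyUV)(p))

    /* The format macro is a string literal, so the compiler pastes it
       between the pieces of the format string. */
    int format_iv(char *buf, size_t len, MyIV iv)
    {
        return snprintf(buf, len, "IV is %" MyIVdf, iv);
    }

    /* A pointer is printed as hex through an integer, not with %p. */
    int format_ptr(char *buf, size_t len, const void *p)
    {
        return snprintf(buf, len, "%" MyUVxf, MyPTR2UV(p));
    }
```

The same source then prints correctly whether the integer type behind
C<MyIV> is 32 or 64 bits wide.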

=head2 Pointer-To-Integer and Integer-To-Pointer

Because pointer size does not necessarily equal integer size,
use the following macros to convert between the two correctly.

        PTR2UV(pointer)
        PTR2IV(pointer)
        PTR2NV(pointer)
        INT2PTR(pointertotype, integer)

For example:

        IV  iv = ...;
        SV *sv = INT2PTR(SV*, iv);

and

        AV *av = ...;
        UV  uv = PTR2UV(av);
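
A standalone imitation of these macros, with hypothetical names
(C<MY_PTR2UV>, C<MY_PTR2IV>, C<MY_INT2PTR>, C<struct int_slot>), routes
every conversion through C99's C<uintptr_t> and C<intptr_t> so that no
bits are lost. Perl's own definitions are selected by Configure and may
be written differently.

```c
    #include <stdint.h>

    /* Hypothetical stand-ins modelled on PTR2UV, PTR2IV and INT2PTR. */
    #define MY_PTR2UV(p)         ((uintptr_t)(p))
    #define MY_PTR2IV(p)         ((intptr_t)(p))
    #define MY_INT2PTR(type, i)  ((type)(uintptr_t)(i))

    /* Example: stash a C pointer in an integer field and get it back,
       much as an XS module might keep a C struct pointer in an IV. */
    struct int_slot {
        intptr_t iv;
    };

    void slot_store(struct int_slot *s, void *p)
    {
        s->iv = MY_PTR2IV(p);
    }

    void *slot_fetch(const struct int_slot *s)
    {
        return MY_INT2PTR(void *, s->iv);
    }
```

The round trip returns exactly the pointer that was stored.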

=head2 Source Documentation

There's an effort going on to document the internal functions and
automatically produce reference manuals from them; L<perlapi> is one
such manual, which details all the functions which are available to XS
writers. L<perlintern> is the autogenerated manual for the functions
which are not part of the API and are supposedly for internal use only.

Source documentation is created by putting POD comments into the C
source, like this:

 /*
 =for apidoc sv_setiv

 Copies an integer into the given SV.  Does not handle 'set' magic.  See
 C<sv_setiv_mg>.

 =cut
 */

Please try to supply some documentation if you add functions to the
Perl core.

=head1 Unicode Support

Perl 5.6.0 introduced Unicode support. It's important for porters and XS
writers to understand this support and make sure that the code they
write does not corrupt Unicode data.

=head2 What B<is> Unicode, anyway?

In the olden, less enlightened times, we all used to use ASCII. Most of
us did, anyway. The big problem with ASCII is that it's American. Well,
no, that's not actually the problem; the problem is that it's not
particularly useful for people who don't use the Roman alphabet. What
used to happen was that particular languages would stick their own
alphabet in the upper range of the sequence, between 128 and 255. Of
course, we then ended up with plenty of variants that weren't quite
ASCII, and the whole point of it being a standard was lost.

Worse still, if you've got a language like Chinese or Japanese that has
hundreds or thousands of characters, then you really can't fit them
into a mere 256, so they had to forget about ASCII altogether, and build
their own systems using pairs of numbers to refer to one character.

To fix this, some people formed Unicode, Inc. and produced a new
character set containing all the characters you can possibly think of
and more. There are several ways of representing these characters, and
the one Perl uses is called UTF8. UTF8 uses a variable number of bytes
to represent a character, instead of just one. You can learn more about
Unicode at http://www.unicode.org/

=head2 How can I recognise a UTF8 string?

You can't. This is because UTF8 data is stored in bytes just like
non-UTF8 data. The Unicode character 200 (C<0xC8> for you hex types),
capital E with a grave accent, is represented by the two bytes
C<v195.136>. Unfortunately, the non-Unicode string C<chr(195).chr(136)>
has that byte sequence as well. So you can't tell just by looking; this
is what makes Unicode input an interesting problem.

The API function C<is_utf8_string> can help; it'll tell you if a string
contains only valid UTF8 characters. However, that only tells you the
bytes I<could> be UTF8, not that they are meant to be. On a
character-by-character basis, C<is_utf8_char> will tell you whether the
current character in a string is valid UTF8.

=head2 How does UTF8 represent Unicode characters?

As mentioned above, UTF8 uses a variable number of bytes to store a
character. Characters with values 0...127 are stored in one byte, just
like good ol' ASCII. Character 128 is stored as C<v194.128>; this
continues up to character 191, which is C<v194.191>. Now we've run out of
bits (191 is binary C<10111111>) so we move on; 192 is C<v195.128>. And
so it goes on, moving to three bytes at character 2048.
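
The byte patterns just described can be generated with a few shifts and
masks. This is a hypothetical standalone encoder, not Perl's
C<uv_to_utf8>: it handles only code points up to 0x10FFFF, while Perl's
own encoding can represent larger values, and it does not reject
surrogates.

```c
    #include <stddef.h>

    /* Encode uv as UTF-8 into out (room for 4 bytes).  Returns the
       number of bytes written, or 0 if uv is above 0x10FFFF. */
    size_t toy_uv_to_utf8(unsigned long uv, unsigned char *out)
    {
        if (uv < 0x80) {                     /* 0..127: one byte */
            out[0] = (unsigned char)uv;
            return 1;
        }
        if (uv < 0x800) {                    /* 128..2047: two bytes */
            out[0] = (unsigned char)(0xC0 | (uv >> 6));
            out[1] = (unsigned char)(0x80 | (uv & 0x3F));
            return 2;
        }
        if (uv < 0x10000) {                  /* from 2048: three bytes */
            out[0] = (unsigned char)(0xE0 | (uv >> 12));
            out[1] = (unsigned char)(0x80 | ((uv >> 6) & 0x3F));
            out[2] = (unsigned char)(0x80 | (uv & 0x3F));
            return 3;
        }
        if (uv < 0x110000) {                 /* four bytes */
            out[0] = (unsigned char)(0xF0 | (uv >> 18));
            out[1] = (unsigned char)(0x80 | ((uv >> 12) & 0x3F));
            out[2] = (unsigned char)(0x80 | ((uv >> 6) & 0x3F));
            out[3] = (unsigned char)(0x80 | (uv & 0x3F));
            return 4;
        }
        return 0;
    }
```

For example, 200 comes out as the two bytes 195 and 136, and 2048 is the
first value that needs three bytes.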

Assuming you know you're dealing with a UTF8 string, you can find out
how long the first character in it is with the C<UTF8SKIP> macro:

    char *utf = "\305\233\340\240\201";
    I32 len;

    len = UTF8SKIP(utf); /* len is 2 here */
    utf += len;
    len = UTF8SKIP(utf); /* len is 3 here */

Another way to skip over characters in a UTF8 string is to use
C<utf8_hop>, which takes a string and a number of characters to skip
over. You're on your own about bounds checking, though, so don't use it
lightly.
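
Both operations depend only on the lead byte of each character. The
sketch below uses hypothetical names and is not Perl's table-driven
C<UTF8SKIP>; unlike C<utf8_hop>, it takes an end pointer so that it
cannot run past the buffer.

```c
    #include <stddef.h>

    /* Length of a UTF-8 character from its lead byte.  Stray
       continuation bytes (0x80..0xBF) are treated as length 1. */
    size_t toy_utf8_skip(const unsigned char *s)
    {
        unsigned char c = s[0];

        if (c < 0xC0) return 1;    /* ASCII, or a stray continuation */
        if (c < 0xE0) return 2;    /* 110xxxxx */
        if (c < 0xF0) return 3;    /* 1110xxxx */
        return 4;                  /* 11110xxx and above */
    }

    /* Skip n characters forward, never stepping past end. */
    const unsigned char *toy_utf8_hop(const unsigned char *s,
                                      const unsigned char *end, size_t n)
    {
        while (n-- > 0 && s < end) {
            size_t len = toy_utf8_skip(s);
            if ((size_t)(end - s) < len)
                return end;        /* truncated character: stop at end */
            s += len;
        }
        return s;
    }
```

On the string C<"\305\233\340\240\201"> the first character is 2 bytes
long, the second is 3, and hopping 2 characters lands exactly at the end.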

All bytes in a multi-byte UTF8 character will have the high bit set, so
you can test if you need to do something special with this character
like this:

    UV uv;

    if (*utf & 0x80)
        /* Must treat this as UTF8 */
        uv = utf8_to_uv(utf);
    else
        /* OK to treat this character as a byte */
        uv = *utf;

You can also see in that example that we use C<utf8_to_uv> to get the
value of the character; the inverse function C<uv_to_utf8> is available
for putting a UV into UTF8:

    if (uv >= 0x80)
        /* Must treat this as UTF8 */
        utf8 = uv_to_utf8(utf8, uv);
    else
        /* OK to treat this character as a byte */
        *utf8++ = uv;

You B<must> convert characters to UVs using the above functions if
you're ever in a situation where you have to match UTF8 and non-UTF8
characters. You may not skip over UTF8 characters in this case. If you
do this, you'll lose the ability to match hi-bit non-UTF8 characters;
for instance, if your UTF8 string contains C<v195.136>, and you skip
that character, you can never match a C<chr(200)> in a non-UTF8 string.
So don't do that!

=head2 How does Perl store UTF8 strings?

Currently, Perl deals with Unicode strings and non-Unicode strings
slightly differently. If a string has been identified as being UTF-8
encoded, Perl will set a flag in the SV, C<SVf_UTF8>. You can check and
manipulate this flag with the following macros:

    SvUTF8(sv)
    SvUTF8_on(sv)
    SvUTF8_off(sv)

This flag has an important effect on Perl's treatment of the string: if
Unicode data is not properly distinguished, regular expressions,
C<length>, C<substr> and other string handling operations will have
undesirable results.

The problem comes when you have, for instance, a string that isn't
flagged as UTF8 but contains a byte sequence that could be UTF8,
especially when combining non-UTF8 and UTF8 strings.

Never forget that the C<SVf_UTF8> flag is separate from the PV value; you
need to be sure you don't accidentally knock it off while you're
manipulating SVs. More specifically, you cannot expect to do this:

    SV *sv;
    SV *nsv;
    STRLEN len;
    char *p;

    p = SvPV(sv, len);
    frobnicate(p);
    nsv = newSVpvn(p, len);

The C<char*> string does not tell you the whole story, and you can't
copy or reconstruct an SV just by copying the string value. Check if the
old SV has the UTF8 flag set, and act accordingly:

    p = SvPV(sv, len);
    frobnicate(p);
    nsv = newSVpvn(p, len);
    if (SvUTF8(sv))
        SvUTF8_on(nsv);

In fact, your C<frobnicate> function should be made aware of whether or
not it's dealing with UTF8 data, so that it can handle the string
appropriately.

=head2 How do I convert a string to UTF8?

If you're mixing UTF8 and non-UTF8 strings, you might find it necessary
to upgrade one of the strings to UTF8. If you've got an SV, the easiest
way to do this is:

    sv_utf8_upgrade(sv);

However, you must not do this, for example:

    if (!SvUTF8(left))
        sv_utf8_upgrade(left);

If you do this in a binary operator, you will actually change one of the
strings that came into the operator, and, while it shouldn't be noticeable
by the end user, it can cause problems.

Instead, C<bytes_to_utf8> will give you a UTF8-encoded B<copy> of its
string argument. This is useful for having the data available for
comparisons and so on, without harming the original SV. There's also
C<utf8_to_bytes> to go the other way, but naturally, this will fail if
the string contains any characters above 255, since they can't be
represented in a single byte.

=head2 Is there anything else I need to know?

Not really. Just remember these things:

=over 3

=item *

There's no way to tell whether a bare C<char*> string is UTF8 or not. You
can tell if an SV is UTF8 by looking at its C<SvUTF8> flag. Don't forget
to set the flag if something should be UTF8. Treat the flag as part of
the PV, even though it's not: if you pass the PV on somewhere, pass the
flag on too.

=item *

If a string is UTF8, B<always> use C<utf8_to_uv> to get at the value,
unless C<!(*s & 0x80)> (that is, the byte is plain ASCII), in which case
you can use C<*s>.

=item *

When writing to a UTF8 string, B<always> use C<uv_to_utf8>, unless
C<uv < 0x80>, in which case you can use C<*s = uv>.

=item *

Mixing UTF8 and non-UTF8 strings is tricky. Use C<bytes_to_utf8> to get
a new string which is UTF8 encoded. There are tricks you can use to
delay deciding whether you need to use a UTF8 string until you get to a
high character; C<HALF_UPGRADE> is one of those.

=back

=head1 AUTHORS

Until May 1997, this document was maintained by Jeff Okamoto
<okamoto@corp.hp.com>.  It is now maintained as part of Perl itself
by the Perl 5 Porters <perl5-porters@perl.org>.

With lots of help and suggestions from Dean Roehrich, Malcolm Beattie,
Andreas Koenig, Paul Hudson, Ilya Zakharevich, Paul Marquess, Neil
Bowers, Matthew Green, Tim Bunce, Spider Boardman, Ulrich Pfeifer,
Stephen McCamant, and Gurusamy Sarathy.

API Listing originally by Dean Roehrich <roehrich@cray.com>.

Modifications to autogenerate the API listing (L<perlapi>) by Benjamin
Stuhl.

=head1 SEE ALSO

perlapi(1), perlintern(1), perlxs(1), perlembed(1)