1 | =head1 NAME |
---|
2 | |
---|
3 | perlhack - How to hack at the Perl internals |
---|
4 | |
---|
5 | =head1 DESCRIPTION |
---|
6 | |
---|
7 | This document attempts to explain how Perl development takes place, |
---|
8 | and ends with some suggestions for people wanting to become bona fide |
---|
9 | porters. |
---|
10 | |
---|
11 | The perl5-porters mailing list is where the Perl standard distribution |
---|
12 | is maintained and developed. The list can get anywhere from 10 to 150 |
---|
13 | messages a day, depending on the heatedness of the debate. Most days |
---|
14 | there are two or three patches, extensions, features, or bugs being |
---|
15 | discussed at a time. |
---|
16 | |
---|
17 | A searchable archive of the list is at: |
---|
18 | |
---|
19 | http://www.xray.mpe.mpg.de/mailing-lists/perl5-porters/ |
---|
20 | |
---|
21 | The list is also archived under the usenet group name |
---|
22 | C<perl.porters-gw> at: |
---|
23 | |
---|
24 | http://www.deja.com/ |
---|
25 | |
---|
26 | List subscribers (the porters themselves) come in several flavours. |
---|
27 | Some are quiet curious lurkers, who rarely pitch in and instead watch |
---|
28 | the ongoing development to ensure they're forewarned of new changes or |
---|
29 | features in Perl. Some are representatives of vendors, who are there |
---|
30 | to make sure that Perl continues to compile and work on their |
---|
31 | platforms. Some patch any reported bug that they know how to fix, |
---|
32 | some are actively patching their pet area (threads, Win32, the regexp |
---|
33 | engine), while others seem to do nothing but complain. In other |
---|
34 | words, it's your usual mix of technical people. |
---|
35 | |
---|
36 | Over this group of porters presides Larry Wall. He has the final word |
---|
37 | in what does and does not change in the Perl language. Various |
---|
38 | releases of Perl are shepherded by a ``pumpking'', a porter |
---|
39 | responsible for gathering patches and deciding, on a patch-by-patch, |
---|
40 | feature-by-feature basis, what will and will not go into the release. |
---|
41 | For instance, Gurusamy Sarathy is the pumpking for the 5.6 release of |
---|
42 | Perl. |
---|
43 | |
---|
44 | In addition, various people are pumpkings for different things. For |
---|
45 | instance, Andy Dougherty and Jarkko Hietaniemi share the I<Configure> |
---|
46 | pumpkin, and Tom Christiansen is the documentation pumpking. |
---|
47 | |
---|
48 | Larry sees Perl development along the lines of the US government: |
---|
49 | there's the Legislature (the porters), the Executive branch (the |
---|
50 | pumpkings), and the Supreme Court (Larry). The legislature can |
---|
51 | discuss and submit patches to the executive branch all they like, but |
---|
52 | the executive branch is free to veto them. Rarely, the Supreme Court |
---|
53 | will side with the executive branch over the legislature, or the |
---|
54 | legislature over the executive branch. Mostly, however, the |
---|
55 | legislature and the executive branch are supposed to get along and |
---|
56 | work out their differences without impeachment or court cases. |
---|
57 | |
---|
58 | You might sometimes see reference to Rule 1 and Rule 2. Larry's power |
---|
59 | as Supreme Court is expressed in The Rules: |
---|
60 | |
---|
61 | =over 4 |
---|
62 | |
---|
63 | =item 1 |
---|
64 | |
---|
65 | Larry is always by definition right about how Perl should behave. |
---|
66 | This means he has final veto power on the core functionality. |
---|
67 | |
---|
68 | =item 2 |
---|
69 | |
---|
70 | Larry is allowed to change his mind about any matter at a later date, |
---|
71 | regardless of whether he previously invoked Rule 1. |
---|
72 | |
---|
73 | =back |
---|
74 | |
---|
75 | Got that? Larry is always right, even when he was wrong. It's rare |
---|
76 | to see either Rule exercised, but they are often alluded to. |
---|
77 | |
---|
78 | New features and extensions to the language are contentious, because |
---|
79 | the criteria used by the pumpkings, Larry, and other porters to decide |
---|
80 | which features should be implemented and incorporated are not codified |
---|
81 | in a few small design goals as with some other languages. Instead, |
---|
82 | the heuristics are flexible and often difficult to fathom. Here is |
---|
83 | one person's list, roughly in decreasing order of importance, of |
---|
84 | heuristics that new features have to be weighed against: |
---|
85 | |
---|
86 | =over 4 |
---|
87 | |
---|
88 | =item Does the concept match the general goals of Perl? |
---|
89 | |
---|
90 | These haven't been written anywhere in stone, but one approximation |
---|
91 | is: |
---|
92 | |
---|
93 | 1. Keep it fast, simple, and useful. |
---|
94 | 2. Keep features/concepts as orthogonal as possible. |
---|
95 | 3. No arbitrary limits (platforms, data sizes, cultures). |
---|
96 | 4. Keep it open and exciting to use/patch/advocate Perl everywhere. |
---|
97 | 5. Either assimilate new technologies, or build bridges to them. |
---|
98 | |
---|
99 | =item Where is the implementation? |
---|
100 | |
---|
101 | All the talk in the world is useless without an implementation. In |
---|
102 | almost every case, the person or people who argue for a new feature |
---|
103 | will be expected to be the ones who implement it. Porters capable |
---|
104 | of coding new features have their own agendas, and are not available |
---|
105 | to implement your (possibly good) idea. |
---|
106 | |
---|
107 | =item Backwards compatibility |
---|
108 | |
---|
109 | It's a cardinal sin to break existing Perl programs. New warnings are |
---|
110 | contentious--some say that a program that emits warnings is not |
---|
111 | broken, while others say it is. Adding keywords has the potential to |
---|
112 | break programs, and changing the meaning of existing token sequences or |
---|
113 | functions might break them too. |
---|
114 | |
---|
115 | =item Could it be a module instead? |
---|
116 | |
---|
117 | Perl 5 has extension mechanisms, modules and XS, specifically to avoid |
---|
118 | the need to keep changing the Perl interpreter. You can write modules |
---|
119 | that export functions, you can give those functions prototypes so they |
---|
120 | can be called like built-in functions, you can even write XS code to |
---|
121 | mess with the runtime data structures of the Perl interpreter if you |
---|
122 | want to implement really complicated things. If it can be done in a |
---|
123 | module instead of in the core, it's highly unlikely to be added. |
---|
124 | |
---|
125 | =item Is the feature generic enough? |
---|
126 | |
---|
127 | Is this something that only the submitter wants added to the language, |
---|
128 | or would it be broadly useful? Sometimes, instead of adding a feature |
---|
129 | with a tight focus, the porters might decide to wait until someone |
---|
130 | implements the more generalized feature. For instance, instead of |
---|
131 | implementing a ``delayed evaluation'' feature, the porters are waiting |
---|
132 | for a macro system that would permit delayed evaluation and much more. |
---|
133 | |
---|
134 | =item Does it potentially introduce new bugs? |
---|
135 | |
---|
136 | Radical rewrites of large chunks of the Perl interpreter have the |
---|
137 | potential to introduce new bugs. The smaller and more localized the |
---|
138 | change, the better. |
---|
139 | |
---|
140 | =item Does it preclude other desirable features? |
---|
141 | |
---|
142 | A patch is likely to be rejected if it closes off future avenues of |
---|
143 | development. For instance, a patch that placed a true and final |
---|
144 | interpretation on prototypes is likely to be rejected because there |
---|
145 | are still options for the future of prototypes that haven't been |
---|
146 | addressed. |
---|
147 | |
---|
148 | =item Is the implementation robust? |
---|
149 | |
---|
150 | Good patches (tight code, complete, correct) stand more chance of |
---|
151 | going in. Sloppy or incorrect patches might be placed on the back |
---|
152 | burner until the pumpking has time to fix them, or might be discarded |
---|
153 | altogether without further notice. |
---|
154 | |
---|
155 | =item Is the implementation generic enough to be portable? |
---|
156 | |
---|
157 | The worst patches make use of system-specific features. It's highly |
---|
158 | unlikely that nonportable additions to the Perl language will be |
---|
159 | accepted. |
---|
160 | |
---|
161 | =item Is there enough documentation? |
---|
162 | |
---|
163 | Patches without documentation are probably ill-thought out or |
---|
164 | incomplete. Nothing can be added without documentation, so submitting |
---|
165 | a patch for the appropriate manpages as well as the source code is |
---|
166 | always a good idea. If appropriate, patches should add to the test |
---|
167 | suite as well. |
---|
168 | |
---|
169 | =item Is there another way to do it? |
---|
170 | |
---|
171 | Larry said ``Although the Perl Slogan is I<There's More Than One Way |
---|
172 | to Do It>, I hesitate to make 10 ways to do something''. This is a |
---|
173 | tricky heuristic to navigate, though--one man's essential addition is |
---|
174 | another man's pointless cruft. |
---|
175 | |
---|
176 | =item Does it create too much work? |
---|
177 | |
---|
178 | Work for the pumpking, work for Perl programmers, work for module |
---|
179 | authors, ... Perl is supposed to be easy. |
---|
180 | |
---|
181 | =item Patches speak louder than words |
---|
182 | |
---|
183 | Working code is always preferred to pie-in-the-sky ideas. A patch to |
---|
184 | add a feature stands a much higher chance of making it to the language |
---|
185 | than does a random feature request, no matter how fervently argued the |
---|
186 | request might be. This ties into ``Will it be useful?'', as the fact |
---|
187 | that someone took the time to make the patch demonstrates a strong |
---|
188 | desire for the feature. |
---|
189 | |
---|
190 | =back |
---|
191 | |
---|
192 | If you're on the list, you might hear the word ``core'' bandied |
---|
193 | around. It refers to the standard distribution. ``Hacking on the |
---|
194 | core'' means you're changing the C source code to the Perl |
---|
195 | interpreter. ``A core module'' is one that ships with Perl. |
---|
196 | |
---|
197 | =head2 Keeping in sync |
---|
198 | |
---|
199 | The source code to the Perl interpreter, in its different versions, is |
---|
200 | kept in a repository managed by a revision control system (which is |
---|
201 | currently the Perforce program, see http://perforce.com/). The |
---|
202 | pumpkings and a few others have access to the repository to check in |
---|
203 | changes. Periodically the pumpking for the development version of Perl |
---|
204 | will release a new version, so the rest of the porters can see what's |
---|
205 | changed. The current state of the main trunk of the repository, and patches |
---|
206 | that describe the individual changes that have happened since the last |
---|
207 | public release, are available at this location: |
---|
208 | |
---|
209 | ftp://ftp.linux.activestate.com/pub/staff/gsar/APC/ |
---|
210 | |
---|
211 | If you are a member of the perl5-porters mailing list, it is a good |
---|
212 | thing to keep in touch with the most recent changes, if only to |
---|
213 | verify that what you would have posted as a bug report isn't already |
---|
214 | solved in the most recent available perl development branch, also |
---|
215 | known as perl-current, bleading edge perl, bleedperl or bleadperl. |
---|
216 | |
---|
217 | Needless to say, the source code in perl-current is usually in a perpetual |
---|
218 | state of evolution. You should expect it to be very buggy. Do B<not> use |
---|
219 | it for any purpose other than testing and development. |
---|
220 | |
---|
221 | Keeping in sync with the most recent branch can be done in several ways, |
---|
222 | but the most convenient and reliable way is using B<rsync>, available at |
---|
223 | ftp://rsync.samba.org/pub/rsync/ . (You can also get the most recent |
---|
224 | branch by FTP.) |
---|
225 | |
---|
226 | If you choose to keep in sync using rsync, there are two approaches |
---|
227 | to doing so: |
---|
228 | |
---|
229 | =over 4 |
---|
230 | |
---|
231 | =item rsync'ing the source tree |
---|
232 | |
---|
233 | Presuming you are in the directory where your perl source resides |
---|
234 | and you have rsync installed and available, you can `upgrade' to |
---|
235 | the bleadperl using: |
---|
236 | |
---|
237 | # rsync -avz rsync://ftp.linux.activestate.com/perl-current/ . |
---|
238 | |
---|
239 | This takes care of updating every single item in the source tree to |
---|
240 | the latest applied patch level, creating files that are new (to your |
---|
241 | distribution) and setting date/time stamps of existing files to |
---|
242 | reflect the bleadperl status. |
---|
243 | |
---|
244 | You can then check what patch was the latest that was applied by |
---|
245 | looking in the file B<.patch>, which will show the number of the |
---|
246 | latest patch. |
---|
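That check can also be scripted. The following is only a minimal sketch: it assumes that F<.patch> records the patch number as the first run of digits in the file, and the helper name C<latest_patch_level> is made up for this example.

```perl
use strict;
use warnings;

# Return the patch number recorded in a .patch file, or undef if the
# file is missing or holds no number. Assumption: the number is the
# first run of digits found in the file.
sub latest_patch_level {
    my ($path) = @_;
    open my $fh, '<', $path or return undef;
    while (my $line = <$fh>) {
        return $1 if $line =~ /(\d+)/;
    }
    return undef;
}

my $level = latest_patch_level('.patch');
print defined $level
    ? "source tree is at patch $level\n"
    : "no patch level found in .patch\n";
```

Run it from the top of the source tree right after the rsync finishes.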
247 | |
---|
248 | If you have more than one machine to keep in sync, and not all of |
---|
249 | them have access to the WAN (so you are not able to rsync all the |
---|
250 | source trees to the real source), there are some ways to get around |
---|
251 | this problem. |
---|
252 | |
---|
253 | =over 4 |
---|
254 | |
---|
255 | =item Using rsync over the LAN |
---|
256 | |
---|
257 | Set up a local rsync server which makes the rsynced source tree |
---|
258 | available to the LAN and sync the other machines against this |
---|
259 | directory. |
---|
260 | |
---|
261 | From http://rsync.samba.org/README.html: |
---|
262 | |
---|
263 | "Rsync uses rsh or ssh for communication. It does not need to be |
---|
264 | setuid and requires no special privileges for installation. It |
---|
265 | does not require an inetd entry or a daemon. You must, however, |
---|
266 | have a working rsh or ssh system. Using ssh is recommended for |
---|
267 | its security features." |
---|
268 | |
---|
269 | =item Pushing over NFS |
---|
270 | |
---|
271 | Having the other systems mounted over the NFS, you can take an |
---|
272 | active pushing approach by checking the just updated tree against |
---|
273 | the other not-yet synced trees. An example would be |
---|
274 | |
---|
275 | #!/usr/bin/perl -w |
---|
276 | |
---|
277 | use strict; |
---|
278 | use File::Copy; |
---|
279 | |
---|
280 | my %MF = map { |
---|
281 |     m/(\S+)/; |
---|
282 |     $1 => [ (stat $1)[2, 7, 9] ]; # mode, size, mtime |
---|
283 | } `cat MANIFEST`; |
---|
284 | |
---|
285 | my %remote = map { $_ => "/$_/pro/3gl/CPAN/perl-5.7.1" } qw(host1 host2); |
---|
286 | |
---|
287 | foreach my $host (keys %remote) { |
---|
288 |     unless (-d $remote{$host}) { |
---|
289 |         print STDERR "Cannot Xsync for host $host\n"; |
---|
290 |         next; |
---|
291 |     } |
---|
292 |     foreach my $file (keys %MF) { |
---|
293 |         my $rfile = "$remote{$host}/$file"; |
---|
294 |         my ($mode, $size, $mtime) = (stat $rfile)[2, 7, 9]; |
---|
295 |         defined $size or ($mode, $size, $mtime) = (0, 0, 0); # missing remotely |
---|
296 |         $size == $MF{$file}[1] && $mtime == $MF{$file}[2] and next; # unchanged |
---|
297 |         printf "%4s %-34s %8d %9d %8d %9d\n", |
---|
298 |             $host, $file, $MF{$file}[1], $MF{$file}[2], $size, $mtime; |
---|
299 |         unlink $rfile; |
---|
300 |         copy ($file, $rfile); |
---|
301 |         utime time, $MF{$file}[2], $rfile; # keep the local mtime |
---|
302 |         chmod $MF{$file}[0], $rfile; # keep the local mode |
---|
303 |     } |
---|
304 | } |
---|
305 | |
---|
306 | though this is not perfect. It could be improved by checking |
---|
307 | file checksums before updating. Not all NFS systems support |
---|
308 | utime reliably (when used over NFS). |
---|
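The checksum idea could look roughly like this. It is a sketch, not part of the script above: it assumes the C<Digest::MD5> module is installed, and the helper names C<file_md5> and C<same_content> are hypothetical.

```perl
use strict;
use warnings;
use Digest::MD5;

# Return the MD5 hex digest of a file's contents, or undef if the
# file cannot be opened.
sub file_md5 {
    my ($path) = @_;
    open my $fh, '<', $path or return undef;
    binmode $fh;
    return Digest::MD5->new->addfile($fh)->hexdigest;
}

# True only if both files can be read and have identical contents.
sub same_content {
    my ($local, $remote) = @_;
    my ($l, $r) = (file_md5($local), file_md5($remote));
    return defined $l && defined $r && $l eq $r;
}

# Small demonstration: a readable file always matches itself.
print "identical\n" if same_content(__FILE__, __FILE__);
```

In the push loop, a test such as C<same_content($file, $rfile) and next;> could then stand in for the size and mtime comparison, at the cost of reading every file on both sides.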
309 | |
---|
310 | =back |
---|
311 | |
---|
312 | =item rsync'ing the patches |
---|
313 | |
---|
314 | The source tree is maintained by the pumpking who applies patches to |
---|
315 | the files in the tree. These patches are either created by the |
---|
316 | pumpking himself using C<diff -c> after updating the file manually, or |
---|
317 | sent in by posters on the perl5-porters list. |
---|
318 | These patches are also saved and rsync'able, so you can apply them |
---|
319 | yourself to the source files. |
---|
320 | |
---|
321 | Presuming you are in a directory where your patches reside, you can |
---|
322 | get them in sync with |
---|
323 | |
---|
324 | # rsync -avz rsync://ftp.linux.activestate.com/perl-current-diffs/ . |
---|
325 | |
---|
326 | This makes sure the latest available patch is downloaded to your |
---|
327 | patch directory. |
---|
328 | |
---|
329 | It's then up to you to apply these patches, using something like |
---|
330 | |
---|
331 | # last=`ls -rt1 *.gz | tail -1` |
---|
332 | # rsync -avz rsync://ftp.linux.activestate.com/perl-current-diffs/ . |
---|
333 | # find . -name '*.gz' -newer $last -exec gzcat {} \; >blead.patch |
---|
334 | # cd ../perl-current |
---|
335 | # patch -p1 -N <../perl-current-diffs/blead.patch |
---|
336 | |
---|
337 | or, since this is only a hint towards how it works, use CPAN-patchaperl |
---|
338 | from Andreas König to have better control over the patching process. |
---|
339 | |
---|
340 | =back |
---|
341 | |
---|
342 | =head2 Why rsync the source tree |
---|
343 | |
---|
344 | =over 4 |
---|
345 | |
---|
346 | =item It's easier |
---|
347 | |
---|
348 | Since you don't have to apply the patches yourself, you are sure all |
---|
349 | files in the source tree are in the right state. |
---|
350 | |
---|
351 | =item It's more recent |
---|
352 | |
---|
353 | According to Gurusamy Sarathy: |
---|
354 | |
---|
355 | "... The rsync mirror is automatic and syncs with the repository |
---|
356 | every five minutes. |
---|
357 | |
---|
358 | "Updating the patch area still requires manual intervention |
---|
359 | (with all the goofiness that implies, which you've noted) and |
---|
360 | is typically on a daily cycle. Making this process automatic |
---|
361 | is on my tuit list, but don't ask me when." |
---|
362 | |
---|
363 | =item It's more reliable |
---|
364 | |
---|
365 | Well, since the patches are updated by hand, I don't have to say any |
---|
366 | more ... (see Sarathy's remark). |
---|
367 | |
---|
368 | =back |
---|
369 | |
---|
370 | =head2 Why rsync the patches |
---|
371 | |
---|
372 | =over 4 |
---|
373 | |
---|
374 | =item It's easier |
---|
375 | |
---|
376 | If you have more than one machine that you want to keep in sync with |
---|
377 | bleadperl, it's easier to rsync the patches only once and then apply |
---|
378 | them to all the source trees on the different machines. |
---|
379 | |
---|
380 | If you try to keep pace on 5 different machines, of which |
---|
381 | only one has access to the WAN, rsync'ing all the source |
---|
382 | trees would then have to be done 5 times over NFS. Having |
---|
383 | rsync'ed the patches only once, you can apply them to all the source |
---|
384 | trees automatically. Need I say more ;-) |
---|
385 | |
---|
386 | =item It's a good reference |
---|
387 | |
---|
388 | If you not only like to have the most recent development branch, |
---|
389 | but also like to B<fix> bugs or extend features, you will want to dive |
---|
390 | into the sources. If you are a seasoned perl core diver, you don't |
---|
391 | need no manuals, tips, roadmaps, perlguts.pod or other aids to find |
---|
392 | your way around. But if you are a starter, the patches may help you |
---|
393 | in finding where you should start and how to change the bits that |
---|
394 | bug you. |
---|
395 | |
---|
396 | The file B<Changes> is updated on occasions the pumpking sees as his |
---|
397 | own little sync points. On those occasions, he releases a tar-ball of |
---|
398 | the current source tree (e.g. perl@7582.tar.gz), which will be an |
---|
399 | excellent point to start with when choosing to use the 'rsync the |
---|
400 | patches' scheme. Starting with perl@7582, which means a set of source |
---|
401 | files on which the latest applied patch is number 7582, you apply all |
---|
402 | succeeding patches available from then on (7583, 7584, ...). |
---|
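The bookkeeping described above (start from patch 7582, then apply 7583, 7584 and so on, in order) can be sketched in Perl. This is an illustration only: it assumes the patch files are named after their patch number, such as F<7583.gz>, and the helper name C<patches_after> is made up for this example.

```perl
use strict;
use warnings;

# Given the number of the last applied patch and a list of file names,
# return the names of the patches still to be applied, in numerical
# order. Assumption: patch files are named "<number>.gz"; any other
# name is ignored.
sub patches_after {
    my ($last, @names) = @_;
    my %number_of;
    for my $name (@names) {
        $number_of{$name} = $1 if $name =~ /^(\d+)\.gz$/;
    }
    return sort { $number_of{$a} <=> $number_of{$b} }
           grep { $number_of{$_} > $last }
           keys %number_of;
}

my @todo = patches_after(7582, qw(7581.gz 7583.gz 10001.gz 7584.gz README));
print "$_\n" for @todo;
```

The numeric comparison matters here: a plain string sort would put F<10001.gz> ahead of F<7583.gz>, and the patches would then be applied out of order.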
403 | |
---|
404 | You can use the patches later as a kind of search archive. |
---|
405 | |
---|
406 | =over 4 |
---|
407 | |
---|
408 | =item Finding a start point |
---|
409 | |
---|
410 | If you want to fix/change the behaviour of function/feature Foo, just |
---|
411 | scan the patches for patches that mention Foo either in the subject, |
---|
412 | the comments, or the body of the fix. There is a good chance the patch |
---|
413 | shows you the files that are affected, which are very likely |
---|
414 | to be the starting point of your journey into the guts of perl. |
---|
415 | |
---|
416 | =item Finding how to fix a bug |
---|
417 | |
---|
418 | If you've found I<where> the function/feature Foo misbehaves, but you |
---|
419 | don't know how to fix it (but you do know the change you want to |
---|
420 | make), you can, again, peruse the patches for similar changes and |
---|
421 | look at how others applied the fix. |
---|
422 | |
---|
423 | =item Finding the source of misbehaviour |
---|
424 | |
---|
425 | When you keep in sync with bleadperl, the pumpking would love to |
---|
426 | I<see> that the community efforts really work. So after each of his |
---|
427 | sync points, you are to 'make test' to check if everything is still |
---|
428 | in working order. If it is, you do 'make ok', which will send an OK |
---|
429 | report to perlbug@perl.org. (If you do not have access to a mailer |
---|
430 | on the system where you just successfully finished 'make test', you can |
---|
431 | do 'make okfile', which creates the file C<perl.ok>, which you can |
---|
432 | then take to your favourite mailer and mail yourself). |
---|
433 | |
---|
434 | But of course, as always, things will not always go smoothly, |
---|
435 | and one or more tests may fail during 'make test'. Before |
---|
436 | sending in a bug report (using 'make nok' or 'make nokfile'), check |
---|
437 | the mailing list if someone else has reported the bug already and if |
---|
438 | so, confirm it by replying to that message. If not, you might want to |
---|
439 | trace the source of that misbehaviour B<before> sending in the bug, |
---|
440 | which will help all the other porters in finding the solution. |
---|
441 | |
---|
442 | Here the saved patches come in very handy. You can check the list of |
---|
443 | patches to see which patch changed what file and what change caused |
---|
444 | the misbehaviour. If you note that in the bug report, it saves |
---|
445 | whoever is trying to solve it from having to look for that point. |
---|
446 | |
---|
447 | =back |
---|
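The search described under "Finding a start point" can be sketched as follows. The sketch assumes that each patch names the files it touches on C<Index:> lines, which may not hold for every patch in the archive, and the helper name C<files_touched_if_mentions> is hypothetical.

```perl
use strict;
use warnings;

# Return the files named on "Index:" lines of a patch if the patch text
# mentions $term anywhere (subject, comments or body); otherwise return
# an empty list. Relying on "Index:" lines is an assumption about how
# the patches are laid out.
sub files_touched_if_mentions {
    my ($patch_text, $term) = @_;
    return () unless $patch_text =~ /\Q$term\E/;
    my %seen;
    return grep { !$seen{$_}++ } $patch_text =~ /^Index:\s*(\S+)/mg;
}

# A made-up patch, used only to exercise the function.
my $patch = <<'END';
Subject: fix sprintf("%v02x") handling
Index: perl/sv.c
--- perl/sv.c.orig
+++ perl/sv.c
@@ -1,1 +1,1 @@
-old line
+new line
Index: perl/t/op/sprintf.t
END

print "$_\n" for files_touched_if_mentions($patch, 'sprintf');
```

Feeding it the uncompressed text of each saved patch in turn gives a rough list of source files worth opening first.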
448 | |
---|
449 | If searching the patches is too bothersome, you might consider using |
---|
450 | perl's bugtron to find more information about discussions and |
---|
451 | ramblings on posted bugs. |
---|
452 | |
---|
453 | =back |
---|
454 | |
---|
455 | If you want to get the best of both worlds, rsync both the source |
---|
456 | tree for convenience, reliability and ease and rsync the patches |
---|
457 | for reference. |
---|
458 | |
---|
459 | =head2 Submitting patches |
---|
460 | |
---|
461 | Always submit patches to I<perl5-porters@perl.org>. This lets other |
---|
462 | porters review your patch, which catches a surprising number of errors |
---|
463 | in patches. Either use the diff program (available in source code |
---|
464 | form from I<ftp://ftp.gnu.org/pub/gnu/>), or use Johan Vromans' |
---|
465 | I<makepatch> (available from I<CPAN/authors/id/JV/>). Unified diffs |
---|
466 | are preferred, but context diffs are accepted. Do not send RCS-style |
---|
467 | diffs or diffs without context lines. More information is given in |
---|
468 | the I<Porting/patching.pod> file in the Perl source distribution. |
---|
469 | Please patch against the latest B<development> version (e.g., if |
---|
470 | you're fixing a bug in the 5.005 track, patch against the latest |
---|
471 | 5.005_5x version). Only patches that survive the heat of the |
---|
472 | development branch get applied to maintenance versions. |
---|
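Before mailing a patch it can be worth checking which kind of diff you produced. The following Perl sketch guesses the format from the hunk markers. The patterns reflect the usual layout of unified and context diffs and are a heuristic, not a validator; the function name C<diff_format> is made up for this example.

```perl
use strict;
use warnings;

# Guess the format of a diff from its hunk markers. Returns 'unified',
# 'context', or 'other' (plain or RCS-style diffs, which lack the
# context lines asked for above).
sub diff_format {
    my ($text) = @_;
    # Unified hunks start with a line such as "@@ -1,2 +1,2 @@".
    return 'unified' if $text =~ /^\@\@ -\d+(?:,\d+)? \+\d+(?:,\d+)? \@\@/m;
    # Context hunks start with a row of 15 stars, then "*** 1,2 ****".
    return 'context' if $text =~ /^\*{15}$/m
                     && $text =~ /^\*\*\* \d+(?:,\d+)? \*\*\*\*$/m;
    return 'other';
}

my $unified = "--- a\n+++ b\n\@\@ -1,2 +1,2 \@\@\n same\n-old\n+new\n";
my $plain   = "2c2\n< old\n---\n> new\n";
print "unified sample: ", diff_format($unified), "\n";
print "plain sample:   ", diff_format($plain), "\n";
```

Anything reported as C<other> is probably worth regenerating with C<diff -u> or C<diff -c> before sending.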
473 | |
---|
474 | Your patch should update the documentation and test suite. |
---|
475 | |
---|
476 | To report a bug in Perl, use the program I<perlbug> which comes with |
---|
477 | Perl (if you can't get Perl to work, send mail to the address |
---|
478 | I<perlbug@perl.org> or I<perlbug@perl.com>). Reporting bugs through |
---|
479 | I<perlbug> feeds into the automated bug-tracking system, access to |
---|
480 | which is provided through the web at I<http://bugs.perl.org/>. It |
---|
481 | often pays to check the archives of the perl5-porters mailing list to |
---|
482 | see whether the bug you're reporting has been reported before, and if |
---|
483 | so whether it was considered a bug. See above for the location of |
---|
484 | the searchable archives. |
---|
485 | |
---|
486 | The CPAN testers (I<http://testers.cpan.org/>) are a group of |
---|
487 | volunteers who test CPAN modules on a variety of platforms. Perl Labs |
---|
488 | (I<http://labs.perl.org/>) automatically tests Perl source releases on |
---|
489 | platforms and gives feedback to the CPAN testers mailing list. Both |
---|
490 | efforts welcome volunteers. |
---|
491 | |
---|
492 | It's a good idea to read and lurk for a while before chipping in. |
---|
493 | That way you'll get to see the dynamic of the conversations, learn the |
---|
494 | personalities of the players, and hopefully be better prepared to make |
---|
495 | a useful contribution when you do speak up. |
---|
496 | |
---|
497 | If after all this you still think you want to join the perl5-porters |
---|
498 | mailing list, send mail to I<perl5-porters-subscribe@perl.org>. To |
---|
499 | unsubscribe, send mail to I<perl5-porters-unsubscribe@perl.org>. |
---|
500 | |
---|
501 | To hack on the Perl guts, you'll need to read the following things: |
---|
502 | |
---|
503 | =over 3 |
---|
504 | |
---|
505 | =item L<perlguts> |
---|
506 | |
---|
507 | This is of paramount importance, since it's the documentation of what |
---|
508 | goes where in the Perl source. Read it over a couple of times and it |
---|
509 | might start to make sense - don't worry if it doesn't yet, because the |
---|
510 | best way to study it is to read it in conjunction with poking at Perl |
---|
511 | source, and we'll do that later on. |
---|
512 | |
---|
513 | You might also want to look at Gisle Aas's illustrated perlguts - |
---|
514 | there's no guarantee that this will be absolutely up-to-date with the |
---|
515 | latest documentation in the Perl core, but the fundamentals will be |
---|
516 | right. (http://gisle.aas.no/perl/illguts/) |
---|
517 | |
---|
518 | =item L<perlxstut> and L<perlxs> |
---|
519 | |
---|
520 | A working knowledge of XSUB programming is incredibly useful for core |
---|
521 | hacking; XSUBs use techniques drawn from the PP code, the portion of the |
---|
522 | guts that actually executes a Perl program. It's a lot gentler to learn |
---|
523 | those techniques from simple examples and explanation than from the core |
---|
524 | itself. |
---|
525 | |
---|
526 | =item L<perlapi> |
---|
527 | |
---|
528 | The documentation for the Perl API explains what some of the internal |
---|
529 | functions do, as well as the many macros used in the source. |
---|
530 | |
---|
531 | =item F<Porting/pumpkin.pod> |
---|
532 | |
---|
533 | This is a collection of words of wisdom for a Perl porter; some of it is |
---|
534 | only useful to the pumpkin holder, but most of it applies to anyone |
---|
535 | wanting to go about Perl development. |
---|
536 | |
---|
537 | =item The perl5-porters FAQ |
---|
538 | |
---|
539 | This is posted to perl5-porters at the beginning of every month, and |
---|
540 | should be available from http://perlhacker.org/p5p-faq; alternatively, |
---|
541 | you can get the FAQ emailed to you by sending mail to |
---|
542 | C<perl5-porters-faq@perl.org>. It contains hints on reading |
---|
543 | perl5-porters, information on how perl5-porters works and how Perl |
---|
544 | development in general works. |
---|
545 | |
---|
546 | =back |
---|
547 | |
---|
548 | =head2 Finding Your Way Around |
---|
549 | |
---|
550 | Perl maintenance can be split into a number of areas, and certain people |
---|
551 | (pumpkins) will have responsibility for each area. These areas sometimes |
---|
552 | correspond to files or directories in the source kit. Among the areas are: |
---|
553 | |
---|
554 | =over 3 |
---|
555 | |
---|
556 | =item Core modules |
---|
557 | |
---|
558 | Modules shipped as part of the Perl core live in the F<lib/> and F<ext/> |
---|
559 | subdirectories: F<lib/> is for the pure-Perl modules, and F<ext/> |
---|
560 | contains the core XS modules. |
---|
561 | |
---|
562 | =item Documentation |
---|
563 | |
---|
564 | Documentation maintenance includes looking after everything in the |
---|
565 | F<pod/> directory (as well as contributing new documentation) and |
---|
566 | the documentation to the modules in core. |
---|
567 | |
---|
568 | =item Configure |
---|
569 | |
---|
570 | The configure process is the way we make Perl portable across the |
---|
571 | myriad of operating systems it supports. Responsibility for the |
---|
572 | configure, build and installation process, as well as the overall |
---|
573 | portability of the core code rests with the configure pumpkin - others |
---|
574 | help out with individual operating systems. |
---|
575 | |
---|
576 | The files involved are the operating system directories (F<win32/>, |
---|
577 | F<os2/>, F<vms/> and so on), the shell scripts which generate F<config.h> |
---|
578 | and F<Makefile>, as well as the metaconfig files which generate |
---|
579 | F<Configure>. (metaconfig isn't included in the core distribution.) |
---|
580 | |
---|
581 | =item Interpreter |
---|
582 | |
---|
583 | And of course, there's the core of the Perl interpreter itself. Let's |
---|
584 | have a look at that in a little more detail. |
---|
585 | |
---|
586 | =back |
---|
587 | |
---|
588 | Before we leave looking at the layout, though, don't forget that |
---|
589 | F<MANIFEST> contains not only the file names in the Perl distribution, |
---|
590 | but short descriptions of what's in them, too. For an overview of the |
---|
591 | important files, try this: |
---|
592 | |
---|
593 | perl -lne 'print if /^[^\/]+\.[ch]\s+/' MANIFEST |
---|
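The same filter can be written as a small script when you also want the descriptions split out. This is a sketch that assumes each F<MANIFEST> line has the form "file name, whitespace, description"; the helper names C<parse_manifest_line> and C<is_top_level_c_file> are hypothetical.

```perl
use strict;
use warnings;

# Split one MANIFEST line into (file name, description). Assumes the
# layout "name<whitespace>description"; the description may be empty.
# Returns an empty list for a blank line.
sub parse_manifest_line {
    my ($line) = @_;
    chomp $line;
    my ($file, $desc) = $line =~ /^(\S+)\s*(.*)$/ or return;
    return ($file, $desc);
}

# True for C sources and headers in the top-level directory, which is
# what the one-liner above selects.
sub is_top_level_c_file {
    my ($file) = @_;
    return $file =~ m{^[^/]+\.[ch]$} ? 1 : 0;
}

# Two made-up lines in MANIFEST layout, used only for demonstration.
for my $line ("av.c\t\t\tArray value code\n", "ext/B/B.xs\t\tCompiler backend\n") {
    my ($file, $desc) = parse_manifest_line($line);
    printf "%-10s %s\n", $file, $desc if is_top_level_c_file($file);
}
```

Replacing the two sample lines with a loop over the real F<MANIFEST> would print one line per top-level C file.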
594 | |
---|
595 | =head2 Elements of the interpreter |
---|
596 | |
---|
597 | The work of the interpreter has two main stages: compiling the code |
---|
598 | into the internal representation, or bytecode, and then executing it. |
---|
599 | L<perlguts/Compiled code> explains exactly how the compilation stage |
---|
600 | happens. |
---|
601 | |
---|
602 | Here is a short breakdown of perl's operation: |
---|
603 | |
---|
604 | =over 3 |
---|
605 | |
---|
606 | =item Startup |
---|
607 | |
---|
608 | The action begins in F<perlmain.c> (or F<miniperlmain.c> for miniperl). |
---|
609 | This is very high-level code, enough to fit on a single screen, and it |
---|
610 | resembles the code found in L<perlembed>; most of the real action takes |
---|
611 | place in F<perl.c>. |
---|
612 | |
---|
613 | First, F<perlmain.c> allocates some memory and constructs a Perl |
---|
614 | interpreter: |
---|
615 | |
---|
616 | 1 PERL_SYS_INIT3(&argc,&argv,&env); |
---|
617 | 2 |
---|
618 | 3 if (!PL_do_undump) { |
---|
619 | 4     my_perl = perl_alloc(); |
---|
620 | 5     if (!my_perl) |
---|
621 | 6         exit(1); |
---|
622 | 7     perl_construct(my_perl); |
---|
623 | 8     PL_perl_destruct_level = 0; |
---|
624 | 9 } |
---|
625 | |
---|
626 | Line 1 is a macro, and its definition is dependent on your operating |
---|
627 | system. Line 3 references C<PL_do_undump>, a global variable - all |
---|
628 | global variables in Perl start with C<PL_>. This tells you whether the |
---|
629 | current running program was created with the C<-u> flag to perl and then |
---|
630 | F<undump>, which means it's going to be false in any sane context. |
---|
631 | |
---|
632 | Line 4 calls a function in F<perl.c> to allocate memory for a Perl |
---|
633 | interpreter. It's quite a simple function, and the guts of it looks like |
---|
634 | this: |
---|
635 | |
---|
636 | my_perl = (PerlInterpreter*)PerlMem_malloc(sizeof(PerlInterpreter)); |
---|
637 | |
---|
638 | Here you see an example of Perl's system abstraction, which we'll see |
---|
639 | later: C<PerlMem_malloc> is either your system's C<malloc>, or Perl's |
---|
640 | own C<malloc> as defined in F<malloc.c> if you selected that option at |
---|
641 | configure time. |
---|
642 | |
---|
643 | Next, in line 7, we construct the interpreter; this sets up all the |
---|
644 | special variables that Perl needs, the stacks, and so on. |
---|
645 | |
---|
646 | Now we pass Perl the command line options, and tell it to go: |
---|
647 | |
---|
648 | exitstatus = perl_parse(my_perl, xs_init, argc, argv, (char **)NULL); |
---|
649 | if (!exitstatus) { |
---|
650 | exitstatus = perl_run(my_perl); |
---|
651 | } |
---|
652 | |
---|
653 | |
---|
654 | C<perl_parse> is actually a wrapper around C<S_parse_body>, as defined |
---|
655 | in F<perl.c>, which processes the command line options, sets up any |
---|
656 | statically linked XS modules, opens the program and calls C<yyparse> to |
---|
657 | parse it. |
---|
658 | |
---|
659 | =item Parsing |
---|
660 | |
---|
661 | The aim of this stage is to take the Perl source, and turn it into an op |
---|
662 | tree. We'll see what one of those looks like later. Strictly speaking, |
---|
663 | there are three things going on here. |
---|
664 | |
---|
665 | C<yyparse>, the parser, lives in F<perly.c>, although you're better off |
---|
666 | reading the original YACC input in F<perly.y>. (Yes, Virginia, there |
---|
667 | B<is> a YACC grammar for Perl!) The job of the parser is to take your |
---|
668 | code and `understand' it, splitting it into sentences, deciding which |
---|
669 | operands go with which operators and so on. |
---|
670 | |
---|
671 | The parser is nobly assisted by the lexer, which chunks up your input |
---|
672 | into tokens, and decides what type of thing each token is: a variable |
---|
673 | name, an operator, a bareword, a subroutine, a core function, and so on. |
---|
674 | The main point of entry to the lexer is C<yylex>, and that and its |
---|
675 | associated routines can be found in F<toke.c>. Perl isn't much like |
---|
676 | other computer languages; it's highly context sensitive at times, and it can |
---|
677 | be tricky to work out what sort of token something is, or where a token |
---|
678 | ends. As such, there's a lot of interplay between the tokeniser and the |
---|
679 | parser, which can get pretty frightening if you're not used to it. |
---|
680 | |
---|
681 | As the parser understands a Perl program, it builds up a tree of |
---|
682 | operations for the interpreter to perform during execution. The routines |
---|
683 | which construct and link together the various operations are to be found |
---|
684 | in F<op.c>, and will be examined later. |
---|
685 | |
---|
686 | =item Optimization |
---|
687 | |
---|
688 | Now the parsing stage is complete, and the finished tree represents |
---|
689 | the operations that the Perl interpreter needs to perform to execute our |
---|
690 | program. Next, Perl does a dry run over the tree looking for |
---|
691 | optimisations: constant expressions such as C<3 + 4> will be computed |
---|
692 | now, and the optimizer will also see if any multiple operations can be |
---|
693 | replaced with a single one. For instance, to fetch the variable C<$foo>, |
---|
694 | instead of grabbing the glob C<*foo> and looking at the scalar |
---|
695 | component, the optimizer fiddles the op tree to use a function which |
---|
696 | directly looks up the scalar in question. The main optimizer is C<peep> |
---|
697 | in F<op.c>, and many ops have their own optimizing functions. |
---|
698 | |
---|
699 | =item Running |
---|
700 | |
---|
701 | Now we're finally ready to go: we have compiled Perl byte code, and all |
---|
702 | that's left to do is run it. The actual execution is done by the |
---|
703 | C<runops_standard> function in F<run.c>; more specifically, it's done by |
---|
704 | these three innocent looking lines: |
---|
705 | |
---|
706 | while ((PL_op = CALL_FPTR(PL_op->op_ppaddr)(aTHX))) { |
---|
707 | PERL_ASYNC_CHECK(); |
---|
708 | } |
---|
709 | |
---|
710 | You may be more comfortable with the Perl version of that: |
---|
711 | |
---|
712 | PERL_ASYNC_CHECK() while $Perl::op = &{$Perl::op->{function}}; |
---|
713 | |
---|
714 | Well, maybe not. Anyway, each op contains a function pointer, which |
---|
715 | stipulates the function which will actually carry out the operation. |
---|
716 | This function will return the next op in the sequence - this allows for |
---|
717 | things like C<if> which choose the next op dynamically at run time. |
---|
718 | The C<PERL_ASYNC_CHECK> makes sure that things like signals interrupt |
---|
719 | execution if required. |
---|
720 | |
---|
721 | The actual functions called are known as PP code, and they're spread |
---|
722 | between four files: F<pp_hot.c> contains the `hot' code, which is most |
---|
723 | often used and highly optimized, F<pp_sys.c> contains all the |
---|
724 | system-specific functions, F<pp_ctl.c> contains the functions which |
---|
725 | implement control structures (C<if>, C<while> and the like) and F<pp.c> |
---|
726 | contains everything else. These are, if you like, the C code for Perl's |
---|
727 | built-in functions and operators. |
---|
728 | |
---|
729 | =back |
---|
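To make the run loop concrete, here is a toy model of it in C. It is a sketch only: C<RUN_OP>, C<toy_runops>, C<toy_demo> and the C<toy_pp_*> functions are invented for illustration and are not perl's real code, although the fields C<op_ppaddr>, C<op_next> and C<op_other> are named after fields perl uses. Each op's function returns the op to run next, which is what lets a branching op pick its successor at run time.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/* Toy op: a function pointer plus links to the ops that may follow it. */
typedef struct run_op RUN_OP;
struct run_op {
    RUN_OP *(*op_ppaddr)(RUN_OP *self); /* function that carries out the op */
    RUN_OP *op_next;                    /* default successor */
    RUN_OP *op_other;                   /* alternative successor for branches */
    const char *name;
};

static int toy_condition;  /* hypothetical run-time state a branch looks at */
static int toy_ops_run;    /* how many ops have been dispatched */

/* An ordinary op: do the work, then hand back the next op in sequence. */
static RUN_OP *toy_pp_plain(RUN_OP *self)
{
    toy_ops_run++;
    printf("running %s\n", self->name);
    return self->op_next;
}

/* A branching op: chooses its successor dynamically, as `if` has to. */
static RUN_OP *toy_pp_branch(RUN_OP *self)
{
    toy_ops_run++;
    printf("running %s\n", self->name);
    return toy_condition ? self->op_other : self->op_next;
}

/* The run loop: call the current op until one of them returns NULL. */
static void toy_runops(RUN_OP *op)
{
    assert(op != NULL);
    while ((op = op->op_ppaddr(op)) != NULL) {
        /* perl's real loop does its PERL_ASYNC_CHECK() here */
    }
}

/* Run "enter; if (condition) { print }; leave" and count the ops executed. */
int toy_demo(int condition)
{
    static RUN_OP leave  = { toy_pp_plain,  NULL,    NULL,   "leave"  };
    static RUN_OP print  = { toy_pp_plain,  &leave,  NULL,   "print"  };
    static RUN_OP branch = { toy_pp_branch, &leave,  &print, "branch" };
    static RUN_OP enter  = { toy_pp_plain,  &branch, NULL,   "enter"  };

    toy_condition = condition;
    toy_ops_run = 0;
    toy_runops(&enter);
    return toy_ops_run;
}
```

With a true condition C<toy_demo> dispatches all four ops; with a false one the branch returns C<leave> directly and the C<print> op is never run.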
730 | |
---|
731 | =head2 Internal Variable Types |
---|
732 | |
---|
733 | You should by now have had a look at L<perlguts>, which tells you about |
---|
734 | Perl's internal variable types: SVs, HVs, AVs and the rest. If not, do |
---|
735 | that now. |
---|
736 | |
---|
737 | These variables are used not only to represent Perl-space variables, but |
---|
738 | also any constants in the code, as well as some structures completely |
---|
739 | internal to Perl. The symbol table, for instance, is an ordinary Perl |
---|
740 | hash. Your code is represented by an SV as it's read into the parser; |
---|
741 | any program files you call are opened via ordinary Perl filehandles, and |
---|
742 | so on. |
---|
743 | |
---|
744 | The core L<Devel::Peek|Devel::Peek> module lets us examine SVs from a |
---|
745 | Perl program. Let's see, for instance, how Perl treats the constant |
---|
746 | C<"hello">. |
---|
747 | |
---|
748 | % perl -MDevel::Peek -e 'Dump("hello")' |
---|
749 | 1 SV = PV(0xa041450) at 0xa04ecbc |
---|
750 | 2 REFCNT = 1 |
---|
751 | 3 FLAGS = (POK,READONLY,pPOK) |
---|
752 | 4 PV = 0xa0484e0 "hello"\0 |
---|
753 | 5 CUR = 5 |
---|
754 | 6 LEN = 6 |
---|
755 | |
---|
756 | Reading C<Devel::Peek> output takes a bit of practice, so let's go |
---|
757 | through it line by line. |
---|
758 | |
---|
759 | Line 1 tells us we're looking at an SV which lives at C<0xa04ecbc> in |
---|
760 | memory. SVs themselves are very simple structures, but they contain a |
---|
761 | pointer to a more complex structure. In this case, it's a PV, a |
---|
762 | structure which holds a string value, at location C<0xa041450>. Line 2 |
---|
763 | is the reference count; there are no other references to this data, so |
---|
764 | it's 1. |
---|
765 | |
---|
766 | Line 3 shows the flags for this SV - it's OK to use it as a PV, it's a |
---|
767 | read-only SV (because it's a constant) and the data is a PV internally. |
---|
768 | Next we've got the contents of the string, starting at location |
---|
769 | C<0xa0484e0>. |
---|
770 | |
---|
771 | Line 5 gives us the current length of the string - note that this does |
---|
772 | B<not> include the null terminator. Line 6 is not the length of the |
---|
773 | string, but the length of the currently allocated buffer; as the string |
---|
774 | grows, Perl automatically extends the available storage via a routine |
---|
775 | called C<SvGROW>. |
---|
776 | |
---|
777 | You can get at any of these quantities from C very easily; just add |
---|
778 | C<Sv> to the name of the field shown in the snippet, and you've got a |
---|
779 | macro which will return the value: C<SvCUR(sv)> returns the current |
---|
780 | length of the string, C<SvREFCNT(sv)> returns the reference count, |
---|
781 | C<SvPV(sv, len)> returns the string itself with its length, and so on. |
---|
782 | More macros to manipulate these properties can be found in L<perlguts>. |
---|
783 | |
---|
784 | Let's take an example of manipulating a PV, from C<sv_catpvn>, in F<sv.c>: |
---|
785 | |
---|
786 | 1 void |
---|
787 | 2 Perl_sv_catpvn(pTHX_ register SV *sv, register const char *ptr, register STRLEN len) |
---|
788 | 3 { |
---|
789 | 4 STRLEN tlen; |
---|
790 | 5 char *junk; |
---|
791 | |
---|
792 | 6 junk = SvPV_force(sv, tlen); |
---|
793 | 7 SvGROW(sv, tlen + len + 1); |
---|
794 | 8 if (ptr == junk) |
---|
795 | 9 ptr = SvPVX(sv); |
---|
796 | 10 Move(ptr,SvPVX(sv)+tlen,len,char); |
---|
797 | 11 SvCUR(sv) += len; |
---|
798 | 12 *SvEND(sv) = '\0'; |
---|
799 | 13 (void)SvPOK_only_UTF8(sv); /* validate pointer */ |
---|
800 | 14 SvTAINT(sv); |
---|
801 | 15 } |
---|
802 | |
---|
803 | This is a function which adds a string, C<ptr>, of length C<len> onto |
---|
804 | the end of the PV stored in C<sv>. The first thing we do in line 6 is |
---|
805 | make sure that the SV B<has> a valid PV, by calling the C<SvPV_force> |
---|
806 | macro to force a PV. As a side effect, C<tlen> gets set to the current |
---|
807 | length of the PV, and the PV itself is returned to C<junk>. |
---|
808 | |
---|
809 | In line 7, we make sure that the SV will have enough room to accommodate |
---|
810 | the old string, the new string and the null terminator. If C<LEN> isn't |
---|
811 | big enough, C<SvGROW> will reallocate space for us. |
---|
812 | |
---|
813 | Now, if C<junk> is the same as the string we're trying to add, we can |
---|
814 | grab the string directly from the SV; C<SvPVX> is the address of the PV |
---|
815 | in the SV. |
---|
816 | |
---|
817 | Line 10 does the actual catenation: the C<Move> macro moves a chunk of |
---|
818 | memory around: we move the string C<ptr> to the end of the PV - that's |
---|
819 | the start of the PV plus its current length. We're moving C<len> bytes |
---|
820 | of type C<char>. After doing so, we need to tell Perl we've extended the |
---|
821 | string, by altering C<CUR> to reflect the new length. C<SvEND> is a |
---|
822 | macro which gives us the end of the string, so that needs to be a |
---|
823 | C<"\0">. |
---|
824 | |
---|
825 | Line 13 manipulates the flags; since we've changed the PV, any IV or NV |
---|
826 | values will no longer be valid: if we have C<$a=10; $a.="6";> we don't |
---|
827 | want to use the old IV of 10. C<SvPOK_only_UTF8> is a special UTF8-aware |
---|
828 | version of C<SvPOK_only>, a macro which turns off the IOK and NOK flags |
---|
829 | and turns on POK. The final C<SvTAINT> is a macro which marks the SV as |
---|
830 | tainted if taint mode is turned on and tainted data is involved. |
---|
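The bookkeeping that C<sv_catpvn> does with C<CUR> and C<LEN> can be imitated in a few lines of standalone C. This is only a sketch under simplifying assumptions: C<TOY_PV>, C<toy_grow> and C<toy_catpvn> are hypothetical names, there are no flags, no taint handling and no UTF8, and the buffer grows to exactly the size requested, which is not a claim about how C<SvGROW> sizes its allocations.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Toy string value with the three fields shown in the Devel::Peek dump. */
typedef struct {
    char  *pv;   /* the buffer itself (PV) */
    size_t cur;  /* bytes in use, not counting the trailing NUL (CUR) */
    size_t len;  /* bytes allocated (LEN) */
} TOY_PV;

/* Make sure at least `want` bytes are allocated. Returns 0 on success. */
int toy_grow(TOY_PV *s, size_t want)
{
    char *p;

    assert(s != NULL);
    if (want <= s->len)
        return 0;               /* already big enough */
    p = realloc(s->pv, want);
    if (p == NULL)
        return -1;
    s->pv = p;
    s->len = want;
    return 0;
}

/* Append n bytes from ptr, following the same steps as the listing above. */
int toy_catpvn(TOY_PV *s, const char *ptr, size_t n)
{
    /* Check for self-append before growing: realloc may move the buffer. */
    int appending_self = (ptr == s->pv);

    /* Room for the old string, the new string and the terminator. */
    if (toy_grow(s, s->cur + n + 1) != 0)
        return -1;
    if (appending_self)
        ptr = s->pv;            /* pick up the buffer's new address */

    memmove(s->pv + s->cur, ptr, n);  /* copy to the end of the old string */
    s->cur += n;                      /* record the new length */
    s->pv[s->cur] = '\0';             /* keep the string NUL-terminated */
    return 0;
}
```

Starting from an empty C<TOY_PV> (all fields zero), appending C<"hello"> leaves C<cur> at 5 and C<len> at 6, the same relationship between C<CUR> and C<LEN> that the dump showed.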
831 | |
---|
832 | AVs and HVs are more complicated, but SVs are by far the most common |
---|
833 | variable type being thrown around. Having seen something of how we |
---|
834 | manipulate these, let's go on and look at how the op tree is |
---|
835 | constructed. |
---|
836 | |
---|
837 | =head2 Op Trees |
---|
838 | |
---|
839 | First, what is the op tree, anyway? The op tree is the parsed |
---|
840 | representation of your program, as we saw in our section on parsing, and |
---|
841 | it's the sequence of operations that Perl goes through to execute your |
---|
842 | program, as we saw in L</Running>. |
---|
843 | |
---|
844 | An op is a fundamental operation that Perl can perform: all the built-in |
---|
845 | functions and operators are ops, and there are a series of ops which |
---|
846 | deal with concepts the interpreter needs internally - entering and |
---|
847 | leaving a block, ending a statement, fetching a variable, and so on. |
---|
848 | |
---|
849 | The op tree is connected in two ways: you can imagine that there are two |
---|
850 | "routes" through it, two orders in which you can traverse the tree. |
---|
851 | First, parse order reflects how the parser understood the code, and |
---|
852 | secondly, execution order tells perl what order to perform the |
---|
853 | operations in. |
---|
854 | |
---|
855 | The easiest way to examine the op tree is to stop Perl after it has |
---|
856 | finished parsing, and get it to dump out the tree. This is exactly what |
---|
857 | the compiler backends L<B::Terse|B::Terse> and L<B::Debug|B::Debug> do. |
---|
858 | |
---|
859 | Let's have a look at how Perl sees C<$a = $b + $c>: |
---|
860 | |
---|
861 | % perl -MO=Terse -e '$a=$b+$c' |
---|
862 | 1 LISTOP (0x8179888) leave |
---|
863 | 2 OP (0x81798b0) enter |
---|
864 | 3 COP (0x8179850) nextstate |
---|
865 | 4 BINOP (0x8179828) sassign |
---|
866 | 5 BINOP (0x8179800) add [1] |
---|
867 | 6 UNOP (0x81796e0) null [15] |
---|
868 | 7 SVOP (0x80fafe0) gvsv GV (0x80fa4cc) *b |
---|
869 | 8 UNOP (0x81797e0) null [15] |
---|
870 | 9 SVOP (0x8179700) gvsv GV (0x80efeb0) *c |
---|
871 | 10 UNOP (0x816b4f0) null [15] |
---|
872 | 11 SVOP (0x816dcf0) gvsv GV (0x80fa460) *a |
---|
873 | |
---|
874 | Let's start in the middle, at line 4. This is a BINOP, a binary |
---|
875 | operator, which is at location C<0x8179828>. The specific operator in |
---|
876 | question is C<sassign> - scalar assignment - and you can find the code |
---|
877 | which implements it in the function C<pp_sassign> in F<pp_hot.c>. As a |
---|
878 | binary operator, it has two children: the add operator, providing the |
---|
879 | result of C<$b+$c>, is uppermost on line 5, and the left hand side is on |
---|
880 | line 10. |
---|
881 | |
---|
882 | Line 10 is the null op: this does exactly nothing. What is that doing |
---|
883 | there? If you see the null op, it's a sign that something has been |
---|
884 | optimized away after parsing. As we mentioned in L</Optimization>, |
---|
885 | the optimization stage sometimes converts two operations into one, for |
---|
886 | example when fetching a scalar variable. When this happens, instead of |
---|
887 | rewriting the op tree and cleaning up the dangling pointers, it's easier |
---|
888 | just to replace the redundant operation with the null op. Originally, |
---|
889 | the tree would have looked like this: |
---|
890 | |
---|
891 | 10 UNOP (0x816b4f0) rv2sv [15] |
---|
892 | 11 SVOP (0x816dcf0) gv GV (0x80fa460) *a |
---|
893 | |
---|
894 | That is, fetch the C<a> entry from the main symbol table, and then look |
---|
895 | at the scalar component of it: C<gvsv> (C<pp_gvsv> in F<pp_hot.c>) |
---|
896 | happens to do both these things. |
---|
897 | |
---|
898 | The right hand side, starting at line 5 is similar to what we've just |
---|
899 | seen: we have the C<add> op (C<pp_add> also in F<pp_hot.c>) add together |
---|
900 | two C<gvsv>s. |
---|
901 | |
---|
902 | Now, what's this about? |
---|
903 | |
---|
904 | 1 LISTOP (0x8179888) leave |
---|
905 | 2 OP (0x81798b0) enter |
---|
906 | 3 COP (0x8179850) nextstate |
---|
907 | |
---|
908 | C<enter> and C<leave> are scoping ops, and their job is to perform any |
---|
909 | housekeeping every time you enter and leave a block: lexical variables |
---|
910 | are tidied up, unreferenced variables are destroyed, and so on. Every |
---|
911 | program will have those first three lines: C<leave> is a list, and its |
---|
912 | children are all the statements in the block. Statements are delimited |
---|
913 | by C<nextstate>, so a block is a collection of C<nextstate> ops, with |
---|
914 | the ops to be performed for each statement being the children of |
---|
915 | C<nextstate>. C<enter> is a single op which functions as a marker. |
---|
916 | |
---|
917 | That's how Perl parsed the program, from top to bottom: |
---|
918 | |
---|
919 | Program |
---|
920 | | |
---|
921 | Statement |
---|
922 | | |
---|
923 | = |
---|
924 | / \ |
---|
925 | / \ |
---|
926 | $a + |
---|
927 | / \ |
---|
928 | $b $c |
---|
929 | |
---|
930 | However, it's impossible to B<perform> the operations in this order: |
---|
931 | you have to find the values of C<$b> and C<$c> before you add them |
---|
932 | together, for instance. So, the other thread that runs through the op |
---|
933 | tree is the execution order: each op has a field C<op_next> which points |
---|
934 | to the next op to be run, so following these pointers tells us how perl |
---|
935 | executes the code. We can traverse the tree in this order using |
---|
936 | the C<exec> option to C<B::Terse>: |
---|
937 | |
---|
938 | % perl -MO=Terse,exec -e '$a=$b+$c' |
---|
939 | 1 OP (0x8179928) enter |
---|
940 | 2 COP (0x81798c8) nextstate |
---|
941 | 3 SVOP (0x81796c8) gvsv GV (0x80fa4d4) *b |
---|
942 | 4 SVOP (0x8179798) gvsv GV (0x80efeb0) *c |
---|
943 | 5 BINOP (0x8179878) add [1] |
---|
944 | 6 SVOP (0x816dd38) gvsv GV (0x80fa468) *a |
---|
945 | 7 BINOP (0x81798a0) sassign |
---|
946 | 8 LISTOP (0x8179900) leave |
---|
947 | |
---|
948 | This probably makes more sense for a human: enter a block, start a |
---|
949 | statement. Get the values of C<$b> and C<$c>, and add them together. |
---|
950 | Find C<$a>, and assign one to the other. Then leave. |
---|
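The two routes through the tree can be modelled with two sets of pointers on each node. The sketch below is a toy, not perl's C<struct op>: C<TREE_OP>, C<toy_build>, C<toy_parse_order> and C<toy_exec_order> are invented names, the nulled ops are left out, and only the field names C<op_first>, C<op_sibling> and C<op_next> are borrowed from fields perl has used. It builds the ops for C<$a = $b + $c> and walks them both ways.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

typedef struct tree_op TREE_OP;
struct tree_op {
    const char *name;
    TREE_OP *op_first;    /* first child (parse order) */
    TREE_OP *op_sibling;  /* next child of the same parent (parse order) */
    TREE_OP *op_next;     /* next op to run (execution order) */
};

enum { T_LEAVE, T_ENTER, T_NEXTSTATE, T_SASSIGN,
       T_ADD, T_GVSV_B, T_GVSV_C, T_GVSV_A, T_COUNT };

static TREE_OP tree_ops[T_COUNT];

/* Build the toy tree; returns the root and stores the first op to run. */
const TREE_OP *toy_build(const TREE_OP **start)
{
    static const char *const names[T_COUNT] = {
        "leave", "enter", "nextstate", "sassign",
        "add", "gvsv_b", "gvsv_c", "gvsv_a"
    };
    /* Execution order as listed by the Terse,exec output in the text. */
    static const int exec[T_COUNT] = {
        T_ENTER, T_NEXTSTATE, T_GVSV_B, T_GVSV_C,
        T_ADD, T_GVSV_A, T_SASSIGN, T_LEAVE
    };
    int i;

    for (i = 0; i < T_COUNT; i++) {
        tree_ops[i].name = names[i];
        tree_ops[i].op_first = NULL;
        tree_ops[i].op_sibling = NULL;
        tree_ops[i].op_next = NULL;
    }
    /* Parse-order links: who is whose child. */
    tree_ops[T_LEAVE].op_first       = &tree_ops[T_ENTER];
    tree_ops[T_ENTER].op_sibling     = &tree_ops[T_NEXTSTATE];
    tree_ops[T_NEXTSTATE].op_sibling = &tree_ops[T_SASSIGN];
    tree_ops[T_SASSIGN].op_first     = &tree_ops[T_ADD];
    tree_ops[T_ADD].op_first         = &tree_ops[T_GVSV_B];
    tree_ops[T_GVSV_B].op_sibling    = &tree_ops[T_GVSV_C];
    tree_ops[T_ADD].op_sibling       = &tree_ops[T_GVSV_A];
    /* Execution-order links: thread op_next through the same nodes. */
    for (i = 0; i + 1 < T_COUNT; i++)
        tree_ops[exec[i]].op_next = &tree_ops[exec[i + 1]];

    *start = &tree_ops[T_ENTER];
    return &tree_ops[T_LEAVE];
}

/* Append a name to a space-separated list, never overflowing `out`. */
static void toy_append(char *out, size_t size, const char *name)
{
    size_t used = strlen(out);

    assert(size > 0 && used < size);
    if (used > 0 && used + 1 < size) {
        out[used++] = ' ';
        out[used] = '\0';
    }
    strncat(out, name, size - used - 1);
}

/* Parse order: each op, then its children, then its siblings. */
void toy_parse_order(const TREE_OP *op, char *out, size_t size)
{
    for (; op != NULL; op = op->op_sibling) {
        toy_append(out, size, op->name);
        toy_parse_order(op->op_first, out, size);
    }
}

/* Execution order: just follow op_next until it runs out. */
void toy_exec_order(const TREE_OP *op, char *out, size_t size)
{
    for (; op != NULL; op = op->op_next)
        toy_append(out, size, op->name);
}
```

Walking from the root with C<toy_parse_order> gives the ops in the order of the first dump; walking from the start op with C<toy_exec_order> gives the order of the second.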
951 | |
---|
952 | The way Perl builds up these op trees in the parsing process can be |
---|
953 | unravelled by examining F<perly.y>, the YACC grammar. Let's take the |
---|
954 | piece we need to construct the tree for C<$a = $b + $c>: |
---|
955 | |
---|
956 | 1 term : term ASSIGNOP term |
---|
957 | 2 { $$ = newASSIGNOP(OPf_STACKED, $1, $2, $3); } |
---|
958 | 3 | term ADDOP term |
---|
959 | 4 { $$ = newBINOP($2, 0, scalar($1), scalar($3)); } |
---|
960 | |
---|
961 | If you're not used to reading BNF grammars, this is how it works: You're |
---|
962 | fed certain things by the tokeniser, which generally end up in upper |
---|
963 | case. Here, C<ADDOP> is provided when the tokeniser sees C<+> in your |
---|
964 | code. C<ASSIGNOP> is provided when C<=> is used for assigning. These are |
---|
965 | `terminal symbols', because you can't get any simpler than them. |
---|
966 | |
---|
967 | The grammar, lines one and three of the snippet above, tells you how to |
---|
968 | build up more complex forms. These complex forms, `non-terminal symbols', |
---|
969 | are generally placed in lower case. C<term> here is a non-terminal |
---|
970 | symbol, representing a single expression. |
---|
971 | |
---|
972 | The grammar gives you the following rule: you can make the thing on the |
---|
973 | left of the colon if you see all the things on the right in sequence. |
---|
974 | This is called a "reduction", and the aim of parsing is to completely |
---|
975 | reduce the input. There are several different ways you can perform a |
---|
976 | reduction, separated by vertical bars: so, C<term> followed by C<=> |
---|
977 | followed by C<term> makes a C<term>, and C<term> followed by C<+> |
---|
978 | followed by C<term> can also make a C<term>. |
---|
979 | |
---|
980 | So, if you see two terms with an C<=> or C<+> between them, you can |
---|
981 | turn them into a single expression. When you do this, you execute the |
---|
982 | code in the block on the next line: if you see C<=>, you'll do the code |
---|
983 | in line 2. If you see C<+>, you'll do the code in line 4. It's this code |
---|
984 | which contributes to the op tree. |
---|
985 | |
---|
986 | | term ADDOP term |
---|
987 | { $$ = newBINOP($2, 0, scalar($1), scalar($3)); } |
---|
988 | |
---|
989 | What this does is create a new binary op, and feed it a number of |
---|
990 | variables. The variables refer to the tokens: C<$1> is the first token in |
---|
991 | the input, C<$2> the second, and so on - think regular expression |
---|
992 | backreferences. C<$$> is the op returned from this reduction. So, we |
---|
993 | call C<newBINOP> to create a new binary operator. The first parameter to |
---|
994 | C<newBINOP>, a function in F<op.c>, is the op type. It's an addition |
---|
995 | operator, so we want the type to be C<ADDOP>. We could specify this |
---|
996 | directly, but it's right there as the second token in the input, so we |
---|
997 | use C<$2>. The second parameter is the op's flags: 0 means `nothing |
---|
998 | special'. Then the things to add: the left and right hand side of our |
---|
999 | expression, in scalar context. |
---|
1000 | |
---|
1001 | =head2 Stacks |
---|
1002 | |
---|
1003 | When perl executes something like C<addop>, how does it pass on its |
---|
1004 | results to the next op? The answer is, through the use of stacks. Perl |
---|
1005 | has a number of stacks to store things it's currently working on, and |
---|
1006 | we'll look at the three most important ones here. |
---|
1007 | |
---|
1008 | =over 3 |
---|
1009 | |
---|
1010 | =item Argument stack |
---|
1011 | |
---|
1012 | Arguments are passed to PP code and returned from PP code using the |
---|
1013 | argument stack, C<ST>. The typical way to handle arguments is to pop |
---|
1014 | them off the stack, deal with them how you wish, and then push the result |
---|
1015 | back onto the stack. This is how, for instance, the cosine operator |
---|
1016 | works: |
---|
1017 | |
---|
1018 | NV value; |
---|
1019 | value = POPn; |
---|
1020 | value = Perl_cos(value); |
---|
1021 | XPUSHn(value); |
---|
1022 | |
---|
1023 | We'll see a more tricky example of this when we consider Perl's macros |
---|
1024 | below. C<POPn> gives you the NV (floating point value) of the top SV on |
---|
1025 | the stack: the C<$x> in C<cos($x)>. Then we compute the cosine, and push |
---|
1026 | the result back as an NV. The C<X> in C<XPUSHn> means that the stack |
---|
1027 | should be extended if necessary - it can't be necessary here, because we |
---|
1028 | know there's room for one more item on the stack, since we've just |
---|
1029 | removed one! The C<XPUSH*> macros at least guarantee safety. |
---|
1030 | |
---|
1031 | Alternatively, you can fiddle with the stack directly: C<SP> gives you |
---|
1032 | the first element in your portion of the stack, and C<TOP*> gives you |
---|
1033 | the top SV/IV/NV/etc. on the stack. So, for instance, to do unary |
---|
1034 | negation of an integer: |
---|
1035 | |
---|
1036 | SETi(-TOPi); |
---|
1037 | |
---|
1038 | Just set the integer value of the top stack entry to its negation. |
---|
1039 | |
---|
1040 | Argument stack manipulation in the core is exactly the same as it is in |
---|
1041 | XSUBs - see L<perlxstut>, L<perlxs> and L<perlguts> for a longer |
---|
1042 | description of the macros used in stack manipulation. |
---|
1043 | |
---|
1044 | =item Mark stack |
---|
1045 | |
---|
1046 | I say `your portion of the stack' above because PP code doesn't |
---|
1047 | necessarily get the whole stack to itself: if your function calls |
---|
1048 | another function, you'll only want to expose the arguments aimed for the |
---|
1049 | called function, and not (necessarily) let it get at your own data. The |
---|
1050 | way we do this is to have a `virtual' bottom-of-stack, exposed to each |
---|
1051 | function. The mark stack keeps bookmarks to locations in the argument |
---|
1052 | stack usable by each function. For instance, when dealing with a tied |
---|
1053 | variable, (internally, something with `P' magic) Perl has to call |
---|
1054 | methods for accesses to the tied variables. However, we need to separate |
---|
1055 | the arguments exposed to the method from the arguments exposed to the |
---|
1056 | original function - the store or fetch or whatever it may be. Here's how |
---|
1057 | the tied C<push> is implemented; see C<av_push> in F<av.c>: |
---|
1058 | |
---|
1059 | 1 PUSHMARK(SP); |
---|
1060 | 2 EXTEND(SP,2); |
---|
1061 | 3 PUSHs(SvTIED_obj((SV*)av, mg)); |
---|
1062 | 4 PUSHs(val); |
---|
1063 | 5 PUTBACK; |
---|
1064 | 6 ENTER; |
---|
1065 | 7 call_method("PUSH", G_SCALAR|G_DISCARD); |
---|
1066 | 8 LEAVE; |
---|
1067 | 9 POPSTACK; |
---|
1068 | |
---|
1069 | The lines which concern the mark stack are the first, fifth and last |
---|
1070 | lines: they save away, restore and remove the current position of the |
---|
1071 | argument stack. |
---|
1072 | |
---|
1073 | Let's examine the whole implementation, for practice: |
---|
1074 | |
---|
1075 | 1 PUSHMARK(SP); |
---|
1076 | |
---|
1077 | Push the current state of the stack pointer onto the mark stack. This is |
---|
1078 | so that when we've finished adding items to the argument stack, Perl |
---|
1079 | knows how many things we've added recently. |
---|
1080 | |
---|
1081 | 2 EXTEND(SP,2); |
---|
1082 | 3 PUSHs(SvTIED_obj((SV*)av, mg)); |
---|
1083 | 4 PUSHs(val); |
---|
1084 | |
---|
1085 | We're going to add two more items onto the argument stack: when you have |
---|
1086 | a tied array, the C<PUSH> subroutine receives the object and the value |
---|
1087 | to be pushed, and that's exactly what we have here - the tied object, |
---|
1088 | retrieved with C<SvTIED_obj>, and the value, the SV C<val>. |
---|
1089 | |
---|
1090 | 5 PUTBACK; |
---|
1091 | |
---|
1092 | Next we tell Perl to make the change to the global stack pointer: C<dSP> |
---|
1093 | only gave us a local copy, not a reference to the global. |
---|
1094 | |
---|
1095 | 6 ENTER; |
---|
1096 | 7 call_method("PUSH", G_SCALAR|G_DISCARD); |
---|
1097 | 8 LEAVE; |
---|
1098 | |
---|
1099 | C<ENTER> and C<LEAVE> localise a block of code - they make sure that all |
---|
1100 | variables are tidied up, everything that has been localised gets |
---|
1101 | its previous value returned, and so on. Think of them as the C<{> and |
---|
1102 | C<}> of a Perl block. |
---|
1103 | |
---|
1104 | To actually do the magic method call, we have to call a subroutine in |
---|
1105 | Perl space: C<call_method> takes care of that, and it's described in |
---|
1106 | L<perlcall>. We call the C<PUSH> method in scalar context, and we're |
---|
1107 | going to discard its return value. |
---|
1108 | |
---|
1109 | 9 POPSTACK; |
---|
1110 | |
---|
1111 | Finally, we remove the value we placed on the mark stack, since we |
---|
1112 | don't need it any more. |
---|
1113 | |
---|
1114 | =item Save stack |
---|
1115 | |
---|
1116 | C doesn't have a concept of local scope, so perl provides one. We've |
---|
1117 | seen that C<ENTER> and C<LEAVE> are used as scoping braces; the save |
---|
1118 | stack implements the C equivalent of, for example: |
---|
1119 | |
---|
1120 | { |
---|
1121 | local $foo = 42; |
---|
1122 | ... |
---|
1123 | } |
---|
1124 | |
---|
1125 | See L<perlguts/Localising Changes> for how to use the save stack. |
---|
1126 | |
---|
1127 | =back |
---|
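The relationship between the argument stack and the mark stack is easier to see in miniature. The following is a toy sketch, assuming a stack of plain doubles rather than SV pointers; C<toy_push>, C<toy_pop>, C<toy_pushmark>, C<toy_popmark_items> and C<toy_sum_args> are hypothetical names, not perl API. The mark records where the callee's arguments begin, so the callee never touches the caller's own data underneath.

```c
#include <assert.h>

#define TOY_STACK_MAX 32

static double toy_args[TOY_STACK_MAX];   /* the argument stack */
static int    toy_args_top = -1;         /* index of the top item, -1 if empty */
static int    toy_marks[TOY_STACK_MAX];  /* the mark stack: saved positions */
static int    toy_marks_top = -1;

void toy_push(double v)
{
    assert(toy_args_top + 1 < TOY_STACK_MAX);
    toy_args[++toy_args_top] = v;
}

double toy_pop(void)
{
    assert(toy_args_top >= 0);
    return toy_args[toy_args_top--];
}

/* Remember the current top of the argument stack. */
void toy_pushmark(void)
{
    assert(toy_marks_top + 1 < TOY_STACK_MAX);
    toy_marks[++toy_marks_top] = toy_args_top;
}

/* Drop the newest mark and report how many items were pushed after it. */
int toy_popmark_items(void)
{
    int mark;

    assert(toy_marks_top >= 0);
    mark = toy_marks[toy_marks_top--];
    return toy_args_top - mark;
}

/* A callee that sums only its own arguments: everything above the mark.
 * It replaces them with the total, which it also returns for convenience. */
double toy_sum_args(void)
{
    int items = toy_popmark_items();
    double total = 0.0;

    while (items-- > 0)
        total += toy_pop();
    toy_push(total);
    return total;
}
```

If the caller pushes a value of its own, sets a mark, and then pushes three arguments, C<toy_sum_args> consumes exactly those three and leaves the caller's value where it was, just below the result.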
1128 | |
---|
1129 | =head2 Millions of Macros |
---|
1130 | |
---|
1131 | One thing you'll notice about the Perl source is that it's full of |
---|
1132 | macros. Some have called the pervasive use of macros the hardest thing |
---|
1133 | to understand; others find it adds to clarity. Let's take an example, |
---|
1134 | the code which implements the addition operator: |
---|
1135 | |
---|
1136 | 1 PP(pp_add) |
---|
1137 | 2 { |
---|
1138 | 3 dSP; dATARGET; tryAMAGICbin(add,opASSIGN); |
---|
1139 | 4 { |
---|
1140 | 5 dPOPTOPnnrl_ul; |
---|
1141 | 6 SETn( left + right ); |
---|
1142 | 7 RETURN; |
---|
1143 | 8 } |
---|
1144 | 9 } |
---|
1145 | |
---|
1146 | Every line here (apart from the braces, of course) contains a macro. The |
---|
1147 | first line sets up the function declaration as Perl expects for PP code; |
---|
1148 | line 3 sets up variable declarations for the argument stack and the |
---|
1149 | target, the return value of the operation. Finally, it tries to see if |
---|
1150 | the addition operation is overloaded; if so, the appropriate subroutine |
---|
1151 | is called. |
---|
1152 | |
---|
1153 | Line 5 is another variable declaration - all variable declarations start |
---|
1154 | with C<d> - which pops from the top of the argument stack two NVs (hence |
---|
1155 | C<nn>) and puts them into the variables C<right> and C<left>, hence the |
---|
1156 | C<rl>. These are the two operands to the addition operator. Next, we |
---|
1157 | call C<SETn> to set the NV of the return value to the result of adding |
---|
1158 | the two values. This done, we return - the C<RETURN> macro makes sure |
---|
1159 | that our return value is properly handled, and we pass the next operator |
---|
1160 | to run back to the main run loop. |
---|
1161 | |
---|
1162 | Most of these macros are explained in L<perlapi>, and some of the more |
---|
1163 | important ones are explained in L<perlxs> as well. Pay special attention |
---|
1164 | to L<perlguts/Background and PERL_IMPLICIT_CONTEXT> for information on |
---|
1165 | the C<[pad]THX_?> macros. |
---|
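If the macro style still feels opaque, it can help to write a stripped-down imitation of it. The macros below are assumptions made for illustration: they carry a C<TOY_> prefix because they are not perl's definitions, they work on a stack of plain doubles, and they ignore targets, overloading and everything else that C<pp_add> really has to handle. What they do show is how a declaration macro can introduce the local variables that later macros quietly rely on.

```c
#include <assert.h>

#define TOY_NV_MAX 16

/* Slot 0 is never used, so an empty stack has the pointer at the base. */
static double  toy_nv_stack[TOY_NV_MAX];
static double *toy_nv_sp = toy_nv_stack;

/* Declaration macros: like perl's d-prefixed macros, these declare locals. */
#define TOY_dSP          double *sp = toy_nv_sp
#define TOY_dPOPTOPnnrl  double right = *sp--; double left = *sp

/* Stack access macros that depend on the local `sp` declared above. */
#define TOY_SETn(v)      (*sp = (v))
#define TOY_PUTBACK      (toy_nv_sp = sp)
#define TOY_RETURN       do { TOY_PUTBACK; return; } while (0)

/* Toy addition op: pop the right operand, overwrite the left with the sum. */
void toy_pp_add(void)
{
    TOY_dSP;
    {
        TOY_dPOPTOPnnrl;
        TOY_SETn(left + right);
        TOY_RETURN;
    }
}

/* Helpers for driving the toy stack from outside. */
void toy_push_nv(double v)
{
    assert(toy_nv_sp + 1 < toy_nv_stack + TOY_NV_MAX);
    *++toy_nv_sp = v;
}

double toy_top_nv(void)
{
    assert(toy_nv_sp > toy_nv_stack);
    return *toy_nv_sp;
}

int toy_depth(void)
{
    return (int)(toy_nv_sp - toy_nv_stack);
}
```

Pushing two values and calling C<toy_pp_add> leaves a single item on the toy stack, their sum. Note that the function body mentions C<sp>, C<left> and C<right> without any visible declaration, which is exactly the property that makes the real source hard to read at first.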
1166 | |
---|
1167 | |
---|
1168 | =head2 Poking at Perl |
---|
1169 | |
---|
1170 | To really poke around with Perl, you'll probably want to build Perl for |
---|
1171 | debugging, like this: |
---|
1172 | |
---|
1173 | ./Configure -d -D optimize=-g |
---|
1174 | make |
---|
1175 | |
---|
1176 | C<-g> is a flag to the C compiler to have it produce debugging |
---|
1177 | information which will allow us to step through a running program. |
---|
1178 | F<Configure> will also turn on the C<DEBUGGING> compilation symbol which |
---|
1179 | enables all the internal debugging code in Perl. There are a whole bunch |
---|
1180 | of things you can debug with this: L<perlrun> lists them all, and the |
---|
1181 | best way to find out about them is to play about with them. The most |
---|
1182 | useful options are probably |
---|
1183 | |
---|
1184 | l Context (loop) stack processing |
---|
1185 | t Trace execution |
---|
1186 | o Method and overloading resolution |
---|
1187 | c String/numeric conversions |
---|
1188 | |
---|
1189 | Some of the functionality of the debugging code can be achieved using XS |
---|
1190 | modules. |
---|
1191 | |
---|
1192 | -Dr => use re 'debug' |
---|
1193 | -Dx => use O 'Debug' |
---|
1194 | |
---|
1195 | =head2 Using a source-level debugger |
---|
1196 | |
---|
1197 | If the debugging output of C<-D> doesn't help you, it's time to step |
---|
1198 | through perl's execution with a source-level debugger. |
---|
1199 | |
---|
1200 | =over 3 |
---|
1201 | |
---|
1202 | =item * |
---|
1203 | |
---|
1204 | We'll use C<gdb> for our examples here; the principles will apply to any |
---|
1205 | debugger, but check the manual of the one you're using. |
---|
1206 | |
---|
1207 | =back |
---|
1208 | |
---|
1209 | To fire up the debugger, type |
---|
1210 | |
---|
1211 | gdb ./perl |
---|
1212 | |
---|
1213 | You'll want to do that in your Perl source tree so the debugger can read |
---|
1214 | the source code. You should see the copyright message, followed by the |
---|
1215 | prompt. |
---|
1216 | |
---|
1217 | (gdb) |
---|
1218 | |
---|
1219 | C<help> will get you into the documentation, but here are the most |
---|
1220 | useful commands: |
---|
1221 | |
---|
1222 | =over 3 |
---|
1223 | |
---|
1224 | =item run [args] |
---|
1225 | |
---|
1226 | Run the program with the given arguments. |
---|
1227 | |
---|
1228 | =item break function_name |
---|
1229 | |
---|
1230 | =item break source.c:xxx |
---|
1231 | |
---|
1232 | Tells the debugger that we'll want to pause execution when we reach |
---|
1233 | either the named function (but see L<perlguts/Internal Functions>!) or the given |
---|
1234 | line in the named source file. |
---|
1235 | |
---|
1236 | =item step |
---|
1237 | |
---|
1238 | Steps through the program a line at a time. |
---|
1239 | |
---|
1240 | =item next |
---|
1241 | |
---|
1242 | Steps through the program a line at a time, without descending into |
---|
1243 | functions. |
---|
1244 | |
---|
1245 | =item continue |
---|
1246 | |
---|
1247 | Run until the next breakpoint. |
---|
1248 | |
---|
1249 | =item finish |
---|
1250 | |
---|
1251 | Run until the end of the current function, then stop again. |
---|
1252 | |
---|
1253 | =item 'enter' |
---|
1254 | |
---|
1255 | Just pressing Enter will do the most recent operation again - it's a |
---|
1256 | blessing when stepping through miles of source code. |
---|
1257 | |
---|
1258 | =item print |
---|
1259 | |
---|
1260 | Execute the given C code and print its results. B<WARNING>: Perl makes |
---|
1261 | heavy use of macros, and F<gdb> is not aware of macros. You'll have to |
---|
1262 | substitute them yourself. So, for instance, you can't say |
---|
1263 | |
---|
1264 | print SvPV_nolen(sv) |
---|
1265 | |
---|
1266 | but you have to say |
---|
1267 | |
---|
1268 | print Perl_sv_2pv_nolen(sv) |
---|
1269 | |
---|
1270 | You may find it helpful to have a "macro dictionary", which you can |
---|
1271 | produce by saying C<cpp -dM perl.c | sort>. Even then, F<cpp> won't |
---|
1272 | recursively apply the macros for you. |
---|
1273 | |
---|
1274 | =back |
---|
1275 | |
---|
1276 | =head2 Dumping Perl Data Structures |
---|
1277 | |
---|
1278 | One way to get around this macro hell is to use the dumping functions in |
---|
1279 | F<dump.c>; these work a little like an internal |
---|
1280 | L<Devel::Peek|Devel::Peek>, but they also cover OPs and other structures |
---|
1281 | that you can't get at from Perl. Let's take an example. We'll use the |
---|
1282 | C<$a = $b + $c> we used before, but give it a bit of context: |
---|
1283 | C<$b = "6XXXX"; $c = 2.3;>. Where's a good place to stop and poke around? |
---|
1284 | |
---|
1285 | What about C<pp_add>, the function we examined earlier to implement the |
---|
1286 | C<+> operator: |
---|
1287 | |
---|
1288 | (gdb) break Perl_pp_add |
---|
1289 | Breakpoint 1 at 0x46249f: file pp_hot.c, line 309. |
---|
1290 | |
---|
1291 | Notice we use C<Perl_pp_add> and not C<pp_add> - see L<perlguts/Internal Functions>. |
---|
1292 | With the breakpoint in place, we can run our program: |
---|
1293 | |
---|
1294 | (gdb) run -e '$b = "6XXXX"; $c = 2.3; $a = $b + $c' |
---|
1295 | |
---|
1296 | Lots of junk will go past as gdb reads in the relevant source files and |
---|
1297 | libraries, and then: |
---|
1298 | |
---|
1299 | Breakpoint 1, Perl_pp_add () at pp_hot.c:309 |
---|
1300 | 309 dSP; dATARGET; tryAMAGICbin(add,opASSIGN); |
---|
1301 | (gdb) step |
---|
1302 | 311 dPOPTOPnnrl_ul; |
---|
1303 | (gdb) |
---|
1304 | |
---|
1305 | We looked at this bit of code before, and we said that C<dPOPTOPnnrl_ul> |
---|
1306 | arranges for two C<NV>s to be placed into C<left> and C<right> - let's |
---|
1307 | slightly expand it: |
---|
1308 | |
---|
1309 | #define dPOPTOPnnrl_ul NV right = POPn; \ |
---|
1310 | SV *leftsv = TOPs; \ |
---|
1311 | NV left = USE_LEFT(leftsv) ? SvNV(leftsv) : 0.0 |
---|
1312 | |
---|
1313 | C<POPn> takes the SV from the top of the stack and obtains its NV either |
---|
1314 | directly (if C<SvNOK> is set) or by calling the C<sv_2nv> function. |
---|
1315 | C<TOPs> takes the next SV from the top of the stack - yes, C<POPn> uses |
---|
1316 | C<TOPs> - but doesn't remove it. We then use C<SvNV> to get the NV from |
---|
1317 | C<leftsv> in the same way as before - yes, C<POPn> uses C<SvNV>. |
---|
1318 | |
---|
1319 | Since we don't have an NV for C<$b>, we'll have to use C<sv_2nv> to |
---|
1320 | convert it. If we step again, we'll find ourselves there: |
---|
1321 | |
---|
1322 | Perl_sv_2nv (sv=0xa0675d0) at sv.c:1669 |
---|
1323 | 1669 if (!sv) |
---|
1324 | (gdb) |
---|
1325 | |
---|
1326 | We can now use C<Perl_sv_dump> to investigate the SV, by typing C<print Perl_sv_dump(sv)> at the prompt: |
---|
1327 | |
---|
1328 | SV = PV(0xa057cc0) at 0xa0675d0 |
---|
1329 | REFCNT = 1 |
---|
1330 | FLAGS = (POK,pPOK) |
---|
1331 | PV = 0xa06a510 "6XXXX"\0 |
---|
1332 | CUR = 5 |
---|
1333 | LEN = 6 |
---|
1334 | $1 = void |
---|
1335 | |
---|
1336 | We know we're going to get C<6> from this, so let's finish the |
---|
1337 | subroutine: |
---|
1338 | |
---|
1339 | (gdb) finish |
---|
1340 | Run till exit from #0 Perl_sv_2nv (sv=0xa0675d0) at sv.c:1671 |
---|
1341 | 0x462669 in Perl_pp_add () at pp_hot.c:311 |
---|
1342 | 311 dPOPTOPnnrl_ul; |
---|
1343 | |
---|
1344 | We can also dump out this op: the current op is always stored in |
---|
1345 | C<PL_op>, and we can dump it with C<Perl_op_dump>. This'll give us |
---|
1346 | similar output to L<B::Debug|B::Debug>. |
---|
1347 | |
---|
1348 | { |
---|
1349 | 13 TYPE = add ===> 14 |
---|
1350 | TARG = 1 |
---|
1351 | FLAGS = (SCALAR,KIDS) |
---|
1352 | { |
---|
1353 | TYPE = null ===> (12) |
---|
1354 | (was rv2sv) |
---|
1355 | FLAGS = (SCALAR,KIDS) |
---|
1356 | { |
---|
1357 | 11 TYPE = gvsv ===> 12 |
---|
1358 | FLAGS = (SCALAR) |
---|
1359 | GV = main::b |
---|
1360 | } |
---|
1361 | } |
---|
1362 | |
---|
1363 | < finish this later > |
---|
1364 | |
---|
1365 | =head2 Patching |
---|
1366 | |
---|
1367 | All right, we've now had a look at how to navigate the Perl sources and |
---|
1368 | some things you'll need to know when fiddling with them. Let's now get |
---|
1369 | on and create a simple patch. Here's something Larry suggested: if a |
---|
1370 | C<U> is the first active format during a C<pack> (for example, |
---|
1371 | C<pack "U3C8", @stuff>), then the resulting string should be treated as |
---|
1372 | UTF8 encoded. |
---|
1373 | |
---|
1374 | How do we prepare to fix this up? First we locate the code in question - |
---|
1375 | the C<pack> happens at runtime, so it's going to be in one of the F<pp> |
---|
1376 | files. Sure enough, C<pp_pack> is in F<pp.c>. Since we're going to be |
---|
1377 | altering this file, let's copy it to F<pp.c~>. |
---|
1378 | |
---|
1379 | Now let's look over C<pp_pack>: we take a pattern into C<pat>, and then |
---|
1380 | loop over the pattern, taking each format character in turn into |
---|
1381 | C<datumtype>. Then for each possible format character, we swallow up |
---|
1382 | the other arguments in the pattern (a field width, an asterisk, and so |
---|
1383 | on) and convert the next chunk of input into the specified format, adding |
---|
1384 | it onto the output SV C<cat>. |
---|
1385 | |
---|
1386 | How do we know if the C<U> is the first format in the C<pat>? Well, if |
---|
1387 | we have a pointer to the start of C<pat> then, if we see a C<U> we can |
---|
1388 | test whether we're still at the start of the string. So, here's where |
---|
1389 | C<pat> is set up: |
---|
1390 | |
---|
1391 | STRLEN fromlen; |
---|
1392 | register char *pat = SvPVx(*++MARK, fromlen); |
---|
1393 | register char *patend = pat + fromlen; |
---|
1394 | register I32 len; |
---|
1395 | I32 datumtype; |
---|
1396 | SV *fromstr; |
---|
1397 | |
---|
1398 | We'll have another string pointer in there: |
---|
1399 | |
---|
1400 | STRLEN fromlen; |
---|
1401 | register char *pat = SvPVx(*++MARK, fromlen); |
---|
1402 | register char *patend = pat + fromlen; |
---|
1403 | + char *patcopy; |
---|
1404 | register I32 len; |
---|
1405 | I32 datumtype; |
---|
1406 | SV *fromstr; |
---|
1407 | |
---|
1408 | And just before we start the loop, we'll set C<patcopy> to be the start |
---|
1409 | of C<pat>: |
---|
1410 | |
---|
1411 | items = SP - MARK; |
---|
1412 | MARK++; |
---|
1413 | sv_setpvn(cat, "", 0); |
---|
1414 | + patcopy = pat; |
---|
1415 | while (pat < patend) { |
---|
1416 | |
---|
1417 | Now if we see a C<U> which was at the start of the string, we turn on |
---|
1418 | the UTF8 flag for the output SV, C<cat>: |
---|
1419 | |
---|
1420 | + if (datumtype == 'U' && pat==patcopy+1) |
---|
1421 | + SvUTF8_on(cat); |
---|
1422 | if (datumtype == '#') { |
---|
1423 | while (pat < patend && *pat != '\n') |
---|
1424 | pat++; |
---|
1425 | |
---|
1426 | Remember that it has to be C<patcopy+1> because the first character of |
---|
1427 | the string is the C<U> which has been swallowed into C<datumtype>! |
---|
1428 | |
---|
1429 | Oops, we forgot one thing: what if there are spaces at the start of the |
---|
1430 | pattern? C<pack(" U*", @stuff)> will have C<U> as the first active |
---|
1431 | character, even though it's not the first thing in the pattern. In this |
---|
1432 | case, we have to advance C<patcopy> along with C<pat> when we see spaces: |
---|
1433 | |
---|
1434 | if (isSPACE(datumtype)) |
---|
1435 | continue; |
---|
1436 | |
---|
1437 | needs to become |
---|
1438 | |
---|
1439 | if (isSPACE(datumtype)) { |
---|
1440 | patcopy++; |
---|
1441 | continue; |
---|
1442 | } |
---|
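The rule the C code now implements can be modelled outside the interpreter, which is handy for working out what the regression tests ought to expect. The sketch below is a shell model of the behaviour, not a translation of F<pp.c>: C<first_active_is_U> is a hypothetical helper, and it only considers leading spaces, not the other characters C<isSPACE> accepts.

```shell
# Shell model of the rule: skip leading spaces, then ask whether the
# first remaining character of the pattern is "U".
# first_active_is_U is a made-up name, not part of Perl.
first_active_is_U() {
    pattern=$1
    while :; do
        case $pattern in
            ' '*) pattern=${pattern#?} ;;   # like patcopy++ on a space
            *)    break ;;
        esac
    done
    case $pattern in
        U*) return 0 ;;
        *)  return 1 ;;
    esac
}

# Try the three kinds of pattern discussed in the text.
for pat in 'U*' ' U*' 'C0U*'; do
    if first_active_is_U "$pat"; then
        echo "'$pat': UTF8 flag would be turned on"
    else
        echo "'$pat': UTF8 flag left alone"
    fi
done
```

The first two patterns report that the flag would be turned on and the third that it is left alone, which is the behaviour the new regression tests should pin down.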
1443 | |
---|
1444 | OK. That's the C part done. Now we must do two additional things before |
---|
1445 | this patch is ready to go: we've changed the behaviour of Perl, and so |
---|
1446 | we must document that change. We must also provide some more regression |
---|
1447 | tests to make sure our patch works and doesn't create a bug somewhere |
---|
1448 | else along the line. |
---|
1449 | |
---|
1450 | The regression tests for each operator live in F<t/op/>, and so we make |
---|
1451 | a copy of F<t/op/pack.t> to F<t/op/pack.t~>. Now we can add our tests |
---|
1452 | to the end. First, we'll test that the C<U> does indeed create Unicode |
---|
1453 | strings: |
---|
1454 | |
---|
1455 | print 'not ' unless "1.20.300.4000" eq sprintf "%vd", pack("U*",1,20,300,4000); |
---|
1456 | print "ok $test\n"; $test++; |
---|
1457 | |
---|
1458 | Now we'll test that we got that space-at-the-beginning business right: |
---|
1459 | |
---|
1460 | print 'not ' unless "1.20.300.4000" eq |
---|
1461 | sprintf "%vd", pack(" U*",1,20,300,4000); |
---|
1462 | print "ok $test\n"; $test++; |
---|
1463 | |
---|
1464 | And finally we'll test that we don't make Unicode strings if C<U> is B<not> |
---|
1465 | the first active format: |
---|
1466 | |
---|
1467 | print 'not ' unless "1.20.300.4000" ne |
---|
1468 | sprintf "%vd", pack("C0U*",1,20,300,4000); |
---|
1469 | print "ok $test\n"; $test++; |
---|
1470 | |
---|
1471 | Mustn't forget to change the number of tests which appears at the top, or |
---|
1472 | else the automated tester will get confused: |
---|
1473 | |
---|
1474 | -print "1..156\n"; |
---|
1475 | +print "1..159\n"; |
---|
1476 | |
---|
1477 | We now compile up Perl, and run it through the test suite. Our new |
---|
1478 | tests pass, hooray! |
---|
1479 | |
---|
1480 | Finally, the documentation. The job is never done until the paperwork is |
---|
1481 | over, so let's describe the change we've just made. The relevant place |
---|
1482 | is F<pod/perlfunc.pod>; again, we make a copy, and then we'll insert |
---|
1483 | this text in the description of C<pack>: |
---|
1484 | |
---|
1485 | =item * |
---|
1486 | |
---|
1487 | If the pattern begins with a C<U>, the resulting string will be treated |
---|
1488 | as Unicode-encoded. You can force UTF8 encoding on in a string with an |
---|
1489 | initial C<U0>, and the bytes that follow will be interpreted as Unicode |
---|
1490 | characters. If you don't want this to happen, you can begin your pattern |
---|
1491 | with C<C0> (or anything else) to force Perl not to UTF8 encode your |
---|
1492 | string, and then follow this with a C<U*> somewhere in your pattern. |
---|
1493 | |
---|
1494 | All done. Now let's create the patch. F<Porting/patching.pod> tells us |
---|
1495 | that if we're making major changes, we should copy the entire directory |
---|
1496 | to somewhere safe before we begin fiddling, and then do |
---|
1497 | |
---|
1498 | diff -ruN old new > patch |
---|
1499 | |
---|
1500 | However, we know which files we've changed, and we can simply do this: |
---|
1501 | |
---|
1502 | diff -u pp.c~ pp.c > patch |
---|
1503 | diff -u t/op/pack.t~ t/op/pack.t >> patch |
---|
1504 | diff -u pod/perlfunc.pod~ pod/perlfunc.pod >> patch |
---|
1505 | |
---|
1506 | We end up with a patch looking a little like this: |
---|
1507 | |
---|
1508 | --- pp.c~ Fri Jun 02 04:34:10 2000 |
---|
1509 | +++ pp.c Fri Jun 16 11:37:25 2000 |
---|
1510 | @@ -4375,6 +4375,7 @@ |
---|
1511 | register I32 items; |
---|
1512 | STRLEN fromlen; |
---|
1513 | register char *pat = SvPVx(*++MARK, fromlen); |
---|
1514 | register char *patend = pat + fromlen; |
---|
1515 | + char *patcopy; |
---|
1516 | register I32 len; |
---|
1517 | I32 datumtype; |
---|
1518 | @@ -4405,6 +4406,7 @@ |
---|
1519 | ... |
---|
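If you haven't made a patch this way before, it may help to rehearse the copy, edit and diff cycle on a throwaway file first. The sketch below does that in a scratch directory; F<demo.c> and its contents are stand-ins for illustration, not the real F<pp.c>.

```shell
workdir=$(mktemp -d)
cd "$workdir" || exit 1

# Stand-in for a source file; the real one would be pp.c.
printf '%s\n' 'register char *pat;' 'register char *patend;' > demo.c

# Keep a pristine copy under the "~" name before editing.
cp demo.c demo.c~

# Make the change (here: append one declaration).
printf '%s\n' 'char *patcopy;' >> demo.c

# diff exits with status 1 when the files differ, which is the
# normal case when making a patch, so it is not treated as an error.
status=0
diff -u demo.c~ demo.c > patch || status=$?

cat patch
```

The old file goes first on the C<diff> command line and the new one second, so that added lines show up with a leading C<+>.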
1520 | |
---|
1521 | And finally, we submit it, with our rationale, to perl5-porters. Job |
---|
1522 | done! |
---|
1523 | |
---|
1524 | =head1 EXTERNAL TOOLS FOR DEBUGGING PERL |
---|
1525 | |
---|
1526 | Sometimes it helps to use external tools while debugging and |
---|
1527 | testing Perl. This section tries to guide you through using |
---|
1528 | some common testing and debugging tools with Perl. This is |
---|
1529 | meant as a guide to interfacing these tools with Perl, not |
---|
1530 | as any kind of guide to the use of the tools themselves. |
---|
1531 | |
---|
1532 | =head2 Rational Software's Purify |
---|
1533 | |
---|
1534 | Purify is a commercial tool that is helpful in identifying |
---|
1535 | memory overruns, wild pointers, memory leaks and other such |
---|
1536 | badness. Perl must be compiled in a specific way for |
---|
1537 | optimal testing with Purify. Purify is available under |
---|
1538 | Windows NT, Solaris, HP-UX, SGI, and Siemens Unix. |
---|
1539 | |
---|
1540 | The only currently known leaks happen when there are |
---|
1541 | compile-time errors within eval or require. (Fixing these |
---|
1542 | is non-trivial, unfortunately, but they must be fixed |
---|
1543 | eventually.) |
---|
1544 | |
---|
1545 | =head2 Purify on Unix |
---|
1546 | |
---|
1547 | On Unix, Purify creates a new Perl binary. To get the most |
---|
1548 | benefit out of Purify, you should create the perl to Purify |
---|
1549 | using: |
---|
1550 | |
---|
1551 | sh Configure -Accflags=-DPURIFY -Doptimize='-g' \ |
---|
1552 | -Uusemymalloc -Dusemultiplicity |
---|
1553 | |
---|
1554 | where these arguments mean: |
---|
1555 | |
---|
1556 | =over 4 |
---|
1557 | |
---|
1558 | =item -Accflags=-DPURIFY |
---|
1559 | |
---|
1560 | Disables Perl's arena memory allocation functions, as well as |
---|
1561 | forcing use of memory allocation functions derived from the |
---|
1562 | system malloc. |
---|
1563 | |
---|
1564 | =item -Doptimize='-g' |
---|
1565 | |
---|
1566 | Adds debugging information so that you see the exact source |
---|
1567 | statements where the problem occurs. Without this flag, all |
---|
1568 | you will see is the source filename of where the error occurred. |
---|
1569 | |
---|
1570 | =item -Uusemymalloc |
---|
1571 | |
---|
1572 | Disable Perl's malloc so that Purify can more closely monitor |
---|
1573 | allocations and leaks. Using Perl's malloc will make Purify |
---|
1574 | report most leaks in the "potential" leaks category. |
---|
1575 | |
---|
1576 | =item -Dusemultiplicity |
---|
1577 | |
---|
1578 | Enabling the multiplicity option allows perl to clean up |
---|
1579 | thoroughly when the interpreter shuts down, which reduces the |
---|
1580 | number of bogus leak reports from Purify. |
---|
1581 | |
---|
1582 | =back |
---|
1583 | |
---|
1584 | Once you've compiled a perl suitable for Purify'ing, you |
---|
1585 | can just run: |
---|
1586 | |
---|
1587 | make pureperl |
---|
1588 | |
---|
1589 | which creates a binary named 'pureperl' that has been Purify'ed. |
---|
1590 | This binary is used in place of the standard 'perl' binary |
---|
1591 | when you want to debug Perl memory problems. |
---|
1592 | |
---|
1593 | As an example, to show any memory leaks produced during the |
---|
1594 | standard Perl test suite you would create and run the Purify'ed |
---|
1595 | perl as: |
---|
1596 | |
---|
1597 | make pureperl |
---|
1598 | cd t |
---|
1599 | ../pureperl -I../lib harness |
---|
1600 | |
---|
1601 | which would run the test harness under the Purify'ed perl and report any memory problems. |
---|
1602 | |
---|
1603 | Purify outputs messages in "Viewer" windows by default. If |
---|
1604 | you don't have a windowing environment or if you simply |
---|
1605 | want the Purify output to unobtrusively go to a log file |
---|
1606 | instead of to the interactive window, use the following |
---|
1607 | options to output to the log file "perl.log": |
---|
1608 | |
---|
1609 | setenv PURIFYOPTIONS "-chain-length=25 -windows=no \ |
---|
1610 | -log-file=perl.log -append-logfile=yes" |
---|
1611 | |
---|
1612 | If you plan to use the "Viewer" windows, then you only need this option: |
---|
1613 | |
---|
1614 | setenv PURIFYOPTIONS "-chain-length=25" |
---|
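The C<setenv> lines above are csh syntax. If you use a Bourne-style shell instead, a roughly equivalent setting would look like this; the option string is copied from the text above and only the shell syntax is changed.

```shell
# Bourne-shell form of the log-file settings shown above.
PURIFYOPTIONS="-chain-length=25 -windows=no -log-file=perl.log -append-logfile=yes"
export PURIFYOPTIONS

# Show what any program started from this shell will see.
echo "$PURIFYOPTIONS"
```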
1615 | |
---|
1616 | =head2 Purify on NT |
---|
1617 | |
---|
1618 | Purify on Windows NT instruments the Perl binary 'perl.exe' |
---|
1619 | on the fly. There are several options in the makefile you |
---|
1620 | should change to get the most use out of Purify: |
---|
1621 | |
---|
1622 | =over 4 |
---|
1623 | |
---|
1624 | =item DEFINES |
---|
1625 | |
---|
1626 | You should add -DPURIFY to the DEFINES line so the DEFINES |
---|
1627 | line looks something like: |
---|
1628 | |
---|
1629 | DEFINES = -DWIN32 -D_CONSOLE -DNO_STRICT $(CRYPT_FLAG) -DPURIFY=1 |
---|
1630 | |
---|
1631 | to disable Perl's arena memory allocation functions, as |
---|
1632 | well as to force use of memory allocation functions derived |
---|
1633 | from the system malloc. |
---|
1634 | |
---|
1635 | =item USE_MULTI = define |
---|
1636 | |
---|
1637 | Enabling the multiplicity option allows perl to clean up |
---|
1638 | thoroughly when the interpreter shuts down, which reduces the |
---|
1639 | number of bogus leak reports from Purify. |
---|
1640 | |
---|
1641 | =item #PERL_MALLOC = define |
---|
1642 | |
---|
1643 | Disable Perl's malloc so that Purify can more closely monitor |
---|
1644 | allocations and leaks. Using Perl's malloc will make Purify |
---|
1645 | report most leaks in the "potential" leaks category. |
---|
1646 | |
---|
1647 | =item CFG = Debug |
---|
1648 | |
---|
1649 | Adds debugging information so that you see the exact source |
---|
1650 | statements where the problem occurs. Without this flag, all |
---|
1651 | you will see is the source filename of where the error occurred. |
---|
1652 | |
---|
1653 | =back |
---|
1654 | |
---|
1655 | As an example, to show any memory leaks produced during the |
---|
1656 | standard Perl test suite you would create and run Purify as: |
---|
1657 | |
---|
1658 | cd win32 |
---|
1659 | make |
---|
1660 | cd ../t |
---|
1661 | purify ../perl -I../lib harness |
---|
1662 | |
---|
1663 | which would instrument Perl in memory, run the test harness with it, |
---|
1664 | then finally report any memory problems. |
---|
1665 | |
---|
1666 | =head2 CONCLUSION |
---|
1667 | |
---|
1668 | We've had a brief look around the Perl source, an overview of the stages |
---|
1669 | F<perl> goes through when it's running your code, and how to use a |
---|
1670 | debugger to poke at the Perl guts. We took a very simple problem and |
---|
1671 | demonstrated how to solve it fully - with documentation, regression |
---|
1672 | tests, and finally a patch for submission to p5p. We also talked |
---|
1673 | about how to use external tools to debug and test Perl. |
---|
1674 | |
---|
1675 | I'd now suggest you read over those references again, and then, as soon |
---|
1676 | as possible, get your hands dirty. The best way to learn is by doing, |
---|
1677 | so: |
---|
1678 | |
---|
1679 | =over 3 |
---|
1680 | |
---|
1681 | =item * |
---|
1682 | |
---|
1683 | Subscribe to perl5-porters, follow the patches and try and understand |
---|
1684 | them; don't be afraid to ask if there's a portion you're not clear on - |
---|
1685 | who knows, you may unearth a bug in the patch... |
---|
1686 | |
---|
1687 | =item * |
---|
1688 | |
---|
1689 | Keep up to date with the bleeding edge Perl distributions and get |
---|
1690 | familiar with the changes. Try and get an idea of what areas people are |
---|
1691 | working on and the changes they're making. |
---|
1692 | |
---|
1693 | =item * |
---|
1694 | |
---|
1695 | Do read the README associated with your operating system, e.g. README.aix |
---|
1696 | on the IBM AIX OS. Don't hesitate to supply patches to that README if |
---|
1697 | you find anything missing or changed over a new OS release. |
---|
1698 | |
---|
1699 | =item * |
---|
1700 | |
---|
1701 | Find an area of Perl that seems interesting to you, and see if you can |
---|
1702 | work out how it works. Scan through the source, and step over it in the |
---|
1703 | debugger. Play, poke, investigate, fiddle! You'll probably get to |
---|
1704 | understand not just your chosen area but a much wider range of F<perl>'s |
---|
1705 | activity as well, and probably sooner than you'd think. |
---|
1706 | |
---|
1707 | =back |
---|
1708 | |
---|
1709 | =over 3 |
---|
1710 | |
---|
1711 | =item I<The Road goes ever on and on, down from the door where it began.> |
---|
1712 | |
---|
1713 | =back |
---|
1714 | |
---|
1715 | If you can do these things, you've started on the long road to Perl porting. |
---|
1716 | Thanks for wanting to help make Perl better - and happy hacking! |
---|
1717 | |
---|
1718 | =head1 AUTHOR |
---|
1719 | |
---|
1720 | This document was written by Nathan Torkington, and is maintained by |
---|
1721 | the perl5-porters mailing list. |
---|
1722 | |
---|