X-Git-Url: https://perl5.git.perl.org/perl5.git/blobdiff_plain/39a14fad02382ad7256334689a46a1f17b6f766f..0bec6c03caec93aa207fcfeff506a7c46c7019e9:/pod/perlhack.pod
diff --git a/pod/perlhack.pod b/pod/perlhack.pod
index fc3c686..5a7263c 100644
--- a/pod/perlhack.pod
+++ b/pod/perlhack.pod
@@ -361,7 +361,7 @@ patch directory.

 It's then up to you to apply these patches, using something like

- # last=`ls -t *.gz | sed q`
+ # last="`cat ../perl-current/.patch`.gz"
  # rsync -avz rsync://public.activestate.com/perl-current-diffs/ .
  # find . -name '*.gz' -newer $last -exec gzcat {} \; >blead.patch
  # cd ../perl-current
@@ -1356,10 +1356,6 @@ roughly how the tied C is implemented; see C in F:
  7 call_method("PUSH", G_SCALAR|G_DISCARD);
  8 LEAVE;

-The lines which concern the mark stack are the first, fifth and last
-lines: they save away, restore and remove the current position of the
-argument stack.
-
 Let's examine the whole implementation, for practice:

  1 PUSHMARK(SP);
@@ -1379,8 +1375,8 @@ retrieved with C, and the value, the SV C.

  5 PUTBACK;

-Next we tell Perl to make the change to the global stack pointer: C
-only gave us a local copy, not a reference to the global.
+Next we tell Perl to update the global stack pointer from our internal
+variable: C only gave us a local copy, not a reference to the global.

  6 ENTER;
  7 call_method("PUSH", G_SCALAR|G_DISCARD);
@@ -1394,7 +1390,9 @@ C<}> of a Perl block.
 To actually do the magic method call, we have to call a subroutine in
 Perl space: C takes care of that, and it's described in
 L. We call the C method in scalar context, and we're
-going to discard its return value.
+going to discard its return value. The call_method() function
+removes the top element of the mark stack, so there is nothing for
+the caller to clean up.

 =item Save stack

@@ -1457,6 +1455,141 @@ You can expand the macros in a F file by saying

 which will expand the macros using cpp. Don't be scared by the results.

+=head1 SOURCE CODE STATIC ANALYSIS
+
+Various tools exist for analysing C source code B<statically>, as
+opposed to B<dynamically>, that is, without executing the code.
+It is possible to detect resource leaks, undefined behaviour, type
+mismatches, portability problems, code paths that would cause illegal
+memory accesses, and other similar problems by just parsing the C code
+and examining what the resulting graph reveals about the
+execution and data flows. As a matter of fact, this is exactly
+how C compilers know to give warnings about dubious code.
+
+=head2 lint, splint
+
+The good old C code quality inspector, C<lint>, is available on
+several platforms, but please be aware that there are several
+different implementations of it by different vendors, which means that
+the flags are not identical across different platforms.
+
+There is a lint variant called C<splint> (Secure Programming Lint)
+available from http://www.splint.org/ that should compile on any
+Unix-like platform.
+
+There are C<lint> and C<splint> targets in Makefile, but you may have
+to diddle with the flags (see above).
+
+=head2 Coverity
+
+Coverity (http://www.coverity.com/) is a product similar to lint. As
+a testbed for their product they periodically check several open
+source projects, and they give open source developers accounts
+to the defect databases.
+
+=head2 cpd (cut-and-paste detector)
+
+The cpd tool detects cut-and-paste coding. If one instance of the
+cut-and-pasted code changes, all the other spots should probably be
+changed, too. Therefore such code should probably be turned into a
+subroutine or a macro.
+
+cpd (http://pmd.sourceforge.net/cpd.html) is part of the pmd project
+(http://pmd.sourceforge.net/). pmd was originally written for static
+analysis of Java code, but later the cpd part of it was extended to
+also parse C and C++.
+
+Download the pmd-X.Y.jar from the SourceForge site, and then run
+it on source code thusly:
+
+ java -cp pmd-X.Y.jar net.sourceforge.pmd.cpd.CPD --minimum-tokens 100 --files /some/where/src --language c > cpd.txt
+
+You may run into memory limits, in which case you should use the -Xmx option:
+
+ java -Xmx512M ...
+
+=head2 gcc warnings
+
+Though much can be written about the inconsistency and coverage
+problems of gcc warnings (like C<-Wall> not meaning "all the
+warnings", or some common portability problems not being covered by
+C<-Wall>, or C<-ansi> and C<-pedantic> both being a poorly defined
+collection of warnings, and so forth), gcc is still a useful tool in
+keeping our coding nose clean.
+
+C<-Wall> is on by default.
+
+C<-ansi> (and its sidekick, C<-pedantic>) would be nice to have on
+always, but unfortunately they are not safe on all platforms:
+they can for example cause fatal conflicts with the system headers
+(Solaris being a prime example). The C frontend selects
+C<-ansi -pedantic> for the platforms where they are known to be safe.
+
+Starting with Perl 5.9.4 the following extra flags are added:
+
+=over 4
+
+=item *
+
+C<-Wendif-labels>
+
+=item *
+
+C<-Wextra>
+
+=item *
+
+C<-Wdeclaration-after-statement>
+
+=back
+
+The following flags would be nice to have but they would first need
+their own Stygian stablemaster:
+
+=over 4
+
+=item *
+
+C<-Wpointer-arith>
+
+=item *
+
+C<-Wshadow>
+
+=item *
+
+C<-Wstrict-prototypes>
+
+=item *
+
+=back
+
+C<-Wtraditional> is another example of the annoying tendency of
+gcc to bundle a lot of warnings under one switch -- it would be
+impossible to deploy in practice because it would complain a lot -- but
+it does contain some warnings that would be beneficial to have available
+on their own, such as the warning about string constants inside macros
+containing the macro arguments: this behaved differently pre-ANSI
+than it does in ANSI, and some C compilers are still in transition,
+AIX being an example.
+
+=head2 Warnings of other C compilers
+
+Other C compilers (yes, there B<are> other C compilers than gcc) often
+have their "strict ANSI" or "strict ANSI with some portability extensions"
+modes on: for example, Sun Workshop has its C<-Xa> mode on
+(though implicitly), and DEC (these days, HP...) has its C<-std1>
+mode on.
+
+=head2 DEBUGGING
+
+You can compile a special debugging version of Perl, which allows you
+to use the C<-D> option of Perl to tell more about what Perl is doing.
+But sometimes there is no alternative but to dive in with a debugger,
+either to see the stack trace of a core dump (very useful in a bug
+report), or to figure out what went wrong before the core dump
+happened, or how we ended up with wrong or unexpected results.
+
 =head2 Poking at Perl

 To really poke around with Perl, you'll probably want to build Perl for
@@ -1466,7 +1599,11 @@ debugging, like this:
  make

 C<-g> is a flag to the C compiler to have it produce debugging
-information which will allow us to step through a running program.
+information which will allow us to step through a running program,
+and to see which C function we are in (without the debugging
+information we might see only the numerical addresses of the functions,
+which is not very helpful).
+
 F will also turn on the C compilation symbol which
 enables all the internal debugging code in Perl. There are a whole bunch
 of things you can debug with this: L lists them all, and the
@@ -1493,8 +1630,9 @@ through perl's execution with a source-level debugger.

 =item *

-We'll use C for our examples here; the principles will apply to any
-debugger, but check the manual of the one you're using.
+We'll use C for our examples here; the principles will apply to
+any debugger (many vendors call their debugger C), but check the
+manual of the one you're using.

 =back

@@ -1502,6 +1640,10 @@ To fire up the debugger, type

  gdb ./perl

+Or if you have a core dump:
+
+ gdb ./perl core
+
 You'll want to do that in your Perl source tree so the debugger can read
 the source code. You should see the copyright message, followed by the
 prompt.
@@ -2210,6 +2352,279 @@ running 'make test_notty'.

 =back

+=head2 Common problems when patching Perl source code
+
+Perl source plays by ANSI C89 rules: no C99 (or C++) extensions. In
+some cases we have to take pre-ANSI requirements into consideration.
+You don't care about some particular platform having broken Perl?
+I hear there is still a strong demand for J2EE programmers.
+
+=head2 Perl environment problems
+
+=over 4
+
+=item *
+
+Not compiling with threading
+
+Compiling with threading (-Duseithreads) completely rewrites
+the function prototypes of Perl. You had better try your changes
+with that. Related to this is the difference between "Perl_-less"
+and "Perl_-ly" APIs, for example:
+
+ Perl_sv_setiv(aTHX_ ...);
+ sv_setiv(...);
+
+The first one explicitly passes in the context, which is needed for
+e.g. threaded builds. The second one does that implicitly; do not mix
+them up.
+
+See L
+for further discussion about context.
+
+=item *
+
+Not compiling with -DDEBUGGING
+
+The DEBUGGING define exposes more code to the compiler,
+therefore more ways for things to go wrong. You should try it.
+
+=item *
+
+Not exporting your new function
+
+Some platforms (Win32, AIX, VMS, OS/2, to name a few) require any
+function that is part of the public API (the shared Perl library)
+to be explicitly marked as exported. See the discussion about
+F in L.
+
+=item *
+
+Exporting your new function
+
+The new shiny result of either genuine new functionality or your
+arduous refactoring is now ready and correctly exported. So what
+could possibly be wrong?
+
+Maybe simply that your function did not need to be exported in the
+first place. Perl has a long and not so glorious history of exporting
+functions that it should not have.
+
+If the function is used only inside one source code file, make it
+static. See the discussion about F in L.
+
+If the function is used across several files, but intended only for
+Perl's internal use (and this should be the common case), do not
+export it to the public API. See the discussion about F
+in L.
+
+=back
+
+=head2 Portability problems
+
+The following are common causes of compilation and/or execution
+failures, not specific to Perl as such. The C FAQ is good bedtime
+reading. Please test your changes with as many C compilers and
+platforms as possible -- we will, anyway, and it's nice to save
+oneself from public embarrassment.
+
+Also study L carefully to avoid any bad assumptions
+about the operating system, filesystem, and so forth.
+
+Do not assume an operating system indicates a certain compiler.
+
+=over 4
+
+=item *
+
+Casting pointers to integers or casting integers to pointers
+
+ void castaway(U8* p)
+ {
+     IV i = p;
+
+or
+
+ void castaway(U8* p)
+ {
+     IV i = (IV)p;
+
+Both are bad, and broken, and unportable. Use the PTR2IV()
+macro that does it right. (Likewise, there are PTR2UV(), PTR2NV(),
+INT2PTR(), and NUM2PTR().)
+
+=item *
+
+Casting between function pointers and data pointers
+
+Technically speaking, casting between function pointers and data
+pointers is unportable and undefined. Practically speaking it seems
+to work, but you should still use the FPTR2DPTR() and DPTR2FPTR()
+macros. Sometimes you can also play games with unions.
+
+=item *
+
+Assuming sizeof(int) == sizeof(long)
+
+There are platforms where longs are 64 bits, and platforms where ints
+are 64 bits, and while we are out to shock you, even platforms where
+shorts are 64 bits. This is all legal according to the C standard.
+(In other words, "long long" is not a portable way to specify 64 bits,
+and "long long" is not even guaranteed to be any wider than "long".)
+Use the definitions IV, UV, IVSIZE, I32SIZE, and so forth. Avoid
+things like I32 because they are B<not> guaranteed to be I<exactly>
+32 bits, they are I<at least> 32 bits, nor are they guaranteed to
+be B<int> or B<long>. If you really explicitly need 64-bit variables,
+use I64 and U64, but only if guarded by HAS_QUAD.
+
+=item *
+
+Assuming one can dereference any type of pointer for any type of data
+
+ char *p = ...;
+ long pony = *(long *)p;
+
+Many platforms, quite rightly so, will give you a core dump instead
+of a pony if the p happens not to be correctly aligned.
+
+=item *
+
+Lvalue casts
+
+ (int)*p = ...;
+
+Simply not portable. Get your lvalue to be of the right type,
+or maybe use temporary variables.
+
+=item *
+
+Mixing #define and #ifdef
+
+ #define BURGLE(x) ... \
+ #ifdef BURGLE_OLD_STYLE
+ ... do it the old way ... \
+ #else
+ ... do it the new way ... \
+ #endif
+
+You cannot portably "stack" cpp directives. For example in the
+above you need two separate #defines, one in each #ifdef branch.
+
+=item *
+
+Using //-comments
+
+ // This function bamfoodles the zorklator.
+
+That is C99 or C++. Perl is C89. Using the //-comments is silently
+allowed by many C compilers but cranking up the ANSI C89 strictness
+(which we like to do) causes the compilation to fail.
+
+=item *
+
+Mixing declarations and code
+
+ void zorklator()
+ {
+     int n = 3;
+     set_zorkmids(n);
+     int q = 4;
+
+That is C99 or C++. Some C compilers allow that, but you shouldn't.
+
+=item *
+
+Introducing variables inside for()
+
+ for(int i = ...; ...; ...)
+
+That is C99 or C++. While it would indeed be awfully nice to have that
+also in C89, to limit the scope of the loop variable, alas, we cannot.
+
+=item *
+
+Mixing signed char pointers with unsigned char pointers
+
+ int foo(char *s) { ... }
+ ...
+ unsigned char *t = ...; /* Or U8* t = ... */
+ foo(t);
+
+While this is legal practice, it is certainly dubious, and downright
+fatal on at least one platform: for example VMS cc considers this a
+fatal error. One cause for people often making this mistake is that a
+"naked char" and therefore dereferencing a "naked char pointer" have
+implementation-defined signedness: it depends on the compiler and the
+platform whether the result is signed or unsigned.
+
+=item *
+
+Macros that have string constants and their arguments as substrings of
+the string constants
+
+ #define FOO(n) printf("number = %d\n", n)
+ FOO(10);
+
+Pre-ANSI semantics for that was equivalent to
+
+ printf("10umber = %d\10");
+
+which is probably not what you were expecting. Unfortunately at least
+one reasonably common and modern C compiler does "real backward
+compatibility" here: on AIX that is what still happens even though the
+rest of the AIX compiler is very happily C89.
+
+=item *
+
+Blindly using variadic macros
+
+gcc has had them for a while with its own syntax, and C99
+brought them with a standardized syntax. Don't use the former,
+and use the latter only if HAS_C99_VARIADIC_MACROS is defined.
+
+=item *
+
+Blindly passing va_list
+
+Not all platforms support passing va_list to further varargs (stdarg)
+functions. The right thing to do is to copy the va_list using
+Perl_va_copy() if NEED_VA_COPY is defined.
+
+=back
+
+=head2 Security problems
+
+Last but not least, here are various tips for safer coding.
+
+=over 4
+
+=item *
+
+Do not use gets()
+
+Or we will publicly ridicule you. Seriously.
+
+=item *
+
+Do not use strcpy() or strcat()
+
+While some uses of these still linger in the Perl source code,
+we have inspected them for safety and are very, very ashamed of them,
+and plan to get rid of them. In places where there are strlcpy()
+and strlcat() we prefer to use them, and there is a plan to integrate
+the strlcpy/strlcat implementation of INN.
+
+=item *
+
+Do not use sprintf() or vsprintf()
+
+If you really want just plain byte strings, use my_snprintf()
+and my_vsnprintf() instead, which will try to use snprintf() and
+vsnprintf() if those safer APIs are available. If you want something
+fancier than a plain byte string, use SVs and Perl_sv_catpvf().
+
+=back
+
 =head1 EXTERNAL TOOLS FOR DEBUGGING PERL

 Sometimes it helps to use external tools while debugging and
@@ -2712,14 +3127,16 @@ see L.

 =back

-=head2 CONCLUSION
+=head1 CONCLUSION

-We've had a brief look around the Perl source, an overview of the stages
-F goes through when it's running your code, and how to use a
-debugger to poke at the Perl guts. We took a very simple problem and
-demonstrated how to solve it fully - with documentation, regression
-tests, and finally a patch for submission to p5p. Finally, we talked
-about how to use external tools to debug and test Perl.
+We've had a brief look around the Perl source, how to maintain quality
+of the source code, an overview of the stages F goes through
+when it's running your code, how to use debuggers to poke at the Perl
+guts, and finally how to analyse the execution of Perl. We took a very
+simple problem and demonstrated how to solve it fully - with
+documentation, regression tests, and finally a patch for submission to
+p5p. Finally, we talked about how to use external tools to debug and
+test Perl.

 I'd now suggest you read over those references again, and then, as soon
 as possible, get your hands dirty. The best way to learn is by doing,