perlhack: more portability musings

[perl5.git] / pod / perlhack.pod
diff --git a/pod/perlhack.pod b/pod/perlhack.pod

index fc3c686..5a7263c 100644
--- a/pod/perlhack.pod
+++ b/pod/perlhack.pod
@@ -361,7 +361,7 @@ patch directory.
  
  It's then up to you to apply these patches, using something like
  
- # last=`ls -t *.gz | sed q`
+ # last="`cat ../perl-current/.patch`.gz"
   # rsync -avz rsync://public.activestate.com/perl-current-diffs/ .
   # find . -name '*.gz' -newer $last -exec gzcat {} \; >blead.patch
   # cd ../perl-current
@@ -1356,10 +1356,6 @@ roughly how the tied C<push> is implemented; see C<av_push> in F<av.c>:
       7 call_method("PUSH", G_SCALAR|G_DISCARD);
       8 LEAVE;
  
-The lines which concern the mark stack are the first, fifth and last
-lines: they save away, restore and remove the current position of the
-argument stack. 
-
  Let's examine the whole implementation, for practice:
  
       1 PUSHMARK(SP);
@@ -1379,8 +1375,8 @@ retrieved with C<SvTIED_obj>, and the value, the SV C<val>.
  
       5 PUTBACK;
  
-Next we tell Perl to make the change to the global stack pointer: C<dSP>
-only gave us a local copy, not a reference to the global.
+Next we tell Perl to update the global stack pointer from our internal
+variable: C<dSP> only gave us a local copy, not a reference to the global.
  
       6 ENTER;
       7 call_method("PUSH", G_SCALAR|G_DISCARD);
  
@@ -1394,7 +1390,9 @@ C<}> of a Perl block.
  To actually do the magic method call, we have to call a subroutine in
  Perl space: C<call_method> takes care of that, and it's described in
  L<perlcall>. We call the C<PUSH> method in scalar context, and we're
-going to discard its return value.
+going to discard its return value.  The call_method() function
+removes the top element of the mark stack, so there is nothing for
+the caller to clean up.
  
  =item Save stack
  
@@ -1457,6 +1455,141 @@ You can expand the macros in a F<foo.c> file by saying
  
  which will expand the macros using cpp.  Don't be scared by the results.
  
+=head1 SOURCE CODE STATIC ANALYSIS
+
+Various tools exist for analysing C source code B<statically>, that is,
+without executing it, as opposed to B<dynamically>.  It is possible to
+detect resource leaks, undefined behaviour, type mismatches,
+portability problems, code paths that would cause illegal memory
+accesses, and other similar problems just by parsing the C code and
+looking at what the resulting graph tells about the execution and
+data flows.  As a matter of fact, this is exactly how C compilers know
+to give warnings about dubious code.
+
+=head2 lint, splint
+
+The good old C code quality inspector, C<lint>, is available on
+several platforms, but please be aware that there are several
+different implementations of it by different vendors, which means that
+the flags are not identical across different platforms.
+
+There is a lint variant called C<splint> (Secure Programming Lint)
+available from http://www.splint.org/ that should compile on any
+Unix-like platform.
+
+There are C<lint> and C<splint> targets in the Makefile, but you may
+have to diddle with the flags (see above).
+
+=head2 Coverity
+
+Coverity (http://www.coverity.com/) is a product similar to lint.  As
+a testbed for their product they periodically check several open
+source projects, and they give open source developers accounts to the
+defect databases.
+
+=head2 cpd (cut-and-paste detector)
+
+The cpd tool detects cut-and-paste coding.  If one instance of the
+cut-and-pasted code changes, all the other spots should probably be
+changed, too.  Therefore such code should probably be turned into a
+subroutine or a macro.
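As a hypothetical illustration (the function names below are invented, not taken from the Perl source), suppose cpd reported the same scale-and-clamp arithmetic pasted into two functions. A minimal sketch of the refactoring it suggests, written in the same C89 style as the Perl source, moves the shared logic into a single static helper so that a future fix only has to be made once:

```c
/* Hypothetical shared helper: the arithmetic that used to be
 * pasted into both callers now lives in exactly one place. */
static int scale_clamped(int value, int factor, int limit)
{
    int scaled = value * factor;

    return scaled > limit ? limit : scaled;
}

/* The two former cut-and-paste sites now just call the helper. */
int scale_width(int w)
{
    return scale_clamped(w, 2, 100);
}

int scale_height(int h)
{
    return scale_clamped(h, 3, 100);
}
```

A macro would also work for something this small, but a static function gets argument type checking and evaluates each argument exactly once.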
+
+cpd (http://pmd.sourceforge.net/cpd.html) is part of the pmd project
+(http://pmd.sourceforge.net/).  pmd was originally written for static
+analysis of Java code, but later the cpd part of it was extended to
+also parse C and C++.
+
+Download the pmd-X.Y.jar from the SourceForge site, and then run
+it on source code thusly:
+
+  java -cp pmd-X.Y.jar net.sourceforge.pmd.cpd.CPD \
+    --minimum-tokens 100 --files /some/where/src --language c > cpd.txt
+
+You may run into memory limits, in which case you should use the
+C<-Xmx> option:
+
+  java -Xmx512M ...
+
+=head2 gcc warnings
+
+Though much can be written about the inconsistency and coverage
+problems of gcc warnings (like C<-Wall> not meaning "all the
+warnings", or some common portability problems not being covered by
+C<-Wall>, or C<-ansi> and C<-pedantic> both being a poorly defined
+collection of warnings, and so forth), gcc is still a useful tool in
+keeping our coding nose clean.
+
+C<-Wall> is on by default.
+
+It would be nice to have C<-ansi> (and its sidekick, C<-pedantic>)
+always on, but unfortunately they are not safe on all platforms: they
+can, for example, cause fatal conflicts with the system headers
+(Solaris being a prime example).  The C<cflags> frontend selects
+C<-ansi -pedantic> for the platforms where they are known to be safe.
+
+Starting from Perl 5.9.4 the following extra flags are added:
+
+=over 4
+
+=item *
+
+C<-Wendif-labels>
+
+=item *
+
+C<-Wextra>
+
+=item *
+
+C<-Wdeclaration-after-statement>
+
+=back
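To make C<-Wdeclaration-after-statement> concrete, here is a small sketch (the function is hypothetical) laid out the way C89 wants it: every declaration at the start of a block, before any statement. A later declaration is still fine if it opens a new block, as the loop body below does. Something like C<gcc -ansi -pedantic -Wall -Wextra -Wdeclaration-after-statement -c> should compile it without complaint.

```c
#include <string.h>

/* Hypothetical example: count the vowels in a string.
 * All declarations come first, before any statement. */
size_t count_vowels(const char *s)
{
    size_t count = 0;
    size_t i;
    size_t len;

    len = strlen(s);
    for (i = 0; i < len; i++) {
        /* A declaration is fine here: it starts a new block. */
        const char *hit = strchr("aeiouAEIOU", s[i]);

        if (hit != NULL)
            count++;
    }
    return count;
}
```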
+
+The following flags would be nice to have but they would first need
+their own Augean stablemaster:
+
+=over 4
+
+=item *
+
+C<-Wpointer-arith>
+
+=item *
+
+C<-Wshadow>
+
+=item *
+
+C<-Wstrict-prototypes>
+
+=back
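C<-Wstrict-prototypes> is the easiest of these to illustrate. In C89, empty parentheses in a declaration such as C<int zorkmids();> mean "unspecified arguments", not "no arguments", so the compiler cannot check calls against it. A minimal sketch, using hypothetical function names, spells out C<(void)> and the parameter types instead:

```c
/* Hypothetical example for -Wstrict-prototypes.  Writing
 * "int zorkmids();" would declare a function with unspecified
 * arguments; "(void)" says it really takes none. */
int zorkmids(void);
void set_zorkmids(int n);

static int zork_count = 0;

void set_zorkmids(int n)
{
    zork_count = n;
}

int zorkmids(void)
{
    return zork_count;
}
```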
+
+C<-Wtraditional> is another example of the annoying tendency of gcc
+to bundle a lot of warnings under one switch (it would be impossible
+to deploy in practice because it would complain a lot), but it does
+contain some warnings that would be beneficial to have available on
+their own, such as the warning about string constants inside macros
+containing the macro arguments: this behaved differently pre-ANSI
+than it does in ANSI, and some C compilers are still in transition,
+AIX being an example.
+
+=head2 Warnings of other C compilers
+
+Other C compilers (yes, there B<are> other C compilers than gcc) often
+have their "strict ANSI" or "strict ANSI with some portability extensions"
+modes on: for example, the Sun Workshop compiler has its C<-Xa> mode on
+(though implicitly), and the DEC (these days, HP...) compiler has its
+C<-std1> mode on.
+
+=head2 DEBUGGING
+
+You can compile a special debugging version of Perl, which allows you
+to use the C<-D> option of Perl to tell more about what Perl is doing.
+But sometimes there is no alternative to diving in with a debugger,
+either to see the stack trace of a core dump (very useful in a bug
+report), or to figure out what went wrong before the core dump
+happened, or how we ended up with wrong or unexpected results.
+
  =head2 Poking at Perl
  
  To really poke around with Perl, you'll probably want to build Perl for
@@ -1466,7 +1599,11 @@ debugging, like this:
      make
  
  C<-g> is a flag to the C compiler to have it produce debugging
-information which will allow us to step through a running program.
+information which will allow us to step through a running program,
+and to see which C function we are in (without the debugging
+information we might see only the numerical addresses of the functions,
+which is not very helpful).
+
  F<Configure> will also turn on the C<DEBUGGING> compilation symbol which
  enables all the internal debugging code in Perl. There are a whole bunch
  of things you can debug with this: L<perlrun> lists them all, and the
@@ -1493,8 +1630,9 @@ through perl's execution with a source-level debugger.
  
  =item *
  
-We'll use C<gdb> for our examples here; the principles will apply to any
-debugger, but check the manual of the one you're using.
+We'll use C<gdb> for our examples here; the principles will apply to
+any debugger (many vendors call their debugger C<dbx>), but check the
+manual of the one you're using.
  
  =back
  
@@ -1502,6 +1640,10 @@ To fire up the debugger, type
  
      gdb ./perl
  
+Or if you have a core dump:
+
+    gdb ./perl core
+
  You'll want to do that in your Perl source tree so the debugger can read
  the source code. You should see the copyright message, followed by the
  prompt.
@@ -2210,6 +2352,279 @@ running 'make test_notty'.
  
  =back
  
+=head2 Common problems when patching Perl source code
+
+Perl source plays by ANSI C89 rules: no C99 (or C++) extensions.  In
+some cases we have to take pre-ANSI requirements into consideration.
+You don't care about some particular platform having broken Perl?
+I hear there is still a strong demand for J2EE programmers.
+
+=head2 Perl environment problems
+
+=over 4
+
+=item *
+
+Not compiling with threading
+
+Compiling with threading (-Duseithreads) completely rewrites
+the function prototypes of Perl.  You had better try your changes
+with that.  Related to this is the difference between "Perl_-less"
+and "Perl_-ly" APIs, for example:
+
+  Perl_sv_setiv(aTHX_ ...);
+  sv_setiv(...);
+
+The first one explicitly passes in the context, which is needed for
+e.g. threaded builds.  The second one does that implicitly; do not get
+them mixed up.
+
+See L<perlguts/"How multiple interpreters and concurrency are supported">
+for further discussion about context.
+
+=item *
+
+Not compiling with -DDEBUGGING
+
+The DEBUGGING define exposes more code to the compiler, and
+therefore more ways for things to go wrong.  You should try it.
+
+=item *
+
+Not exporting your new function
+
+Some platforms (Win32, AIX, VMS, OS/2, to name a few) require any
+function that is part of the public API (the shared Perl library)
+to be explicitly marked as exported.  See the discussion about
+F<embed.pl> in L<perlguts>.
+
+=item *
+
+Exporting your new function
+
+The new shiny result of either genuine new functionality or your
+arduous refactoring is now ready and correctly exported.  So what
+could possibly be wrong?
+
+Maybe simply that your function did not need to be exported in the
+first place.  Perl has a long and not so glorious history of exporting
+functions that it should not have.
+
+If the function is used only inside one source code file, make it
+static.  See the discussion about F<embed.pl> in L<perlguts>.
+
+If the function is used across several files, but intended only for
+Perl's internal use (and this should be the common case), do not
+export it to the public API.  See the discussion about F<embed.pl>
+in L<perlguts>.
+
+=back
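The Perl_-ly/Perl_-less distinction is easier to see in a toy model. The following is a deliberately simplified sketch, not Perl's actual macros (the real pTHX_/aTHX_ definitions live in F<perl.h> and the generated F<embed.h>, and do more). Every name in it (SKETCH_THREADED, Interp, sketch_global_interp, Sketch_sv_setiv(), sketch_demo()) is hypothetical. Compiled with C<-DSKETCH_THREADED>, each function gains an explicit interpreter argument. Without that define, the context arguments expand to nothing and a single global interpreter is used instead:

```c
/* A toy model of Perl's context-passing macros; all names are
 * hypothetical and much simpler than the real pTHX_/aTHX_. */
typedef struct interp {
    long last_iv;               /* some per-interpreter state */
} Interp;

Interp sketch_global_interp;    /* the only interpreter of a non-threaded build */

#ifdef SKETCH_THREADED
/* Threaded: the interpreter is passed as an explicit first argument. */
#  define pTHX_  Interp *my_perl,
#  define aTHX_  my_perl,
#  define INTERP (*my_perl)
#else
/* Non-threaded: the context arguments vanish and a global is used. */
#  define pTHX_
#  define aTHX_
#  define INTERP sketch_global_interp
#endif

/* The "Perl_-ly" name: its prototype changes with the build. */
void Sketch_sv_setiv(pTHX_ long *sv, long iv)
{
    *sv = iv;
    INTERP.last_iv = iv;
}

/* The "Perl_-less" name: a macro that supplies the context itself. */
#define sv_setiv(sv, iv) Sketch_sv_setiv(aTHX_ (sv), (iv))

/* A caller written with the Perl_-less name compiles unchanged in both
 * builds, as long as a context (here, its own pTHX_) is in scope. */
long sketch_demo(pTHX_ long value)
{
    long sv = 0;

    sv_setiv(&sv, value);
    return sv + INTERP.last_iv;
}
```

In this model, a call written as C<Sketch_sv_setiv(&sv, value)> without C<aTHX_> compiles in the non-threaded build but fails in the threaded one. That is exactly why changes should also be tried with threading enabled.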
+
+=head2 Portability problems
+
+The following are common causes of compilation and/or execution
+failures, not specific to Perl as such.  The C FAQ is good bedtime
+reading.  Please test your changes with as many C compilers and
+platforms as possible; we will, anyway, and it's nice to save
+oneself from public embarrassment.
+
+Also study L<perlport> carefully to avoid any bad assumptions
+about the operating system, filesystem, and so forth.
+
+Do not assume an operating system indicates a certain compiler.
+
+=over 4
+
+=item *
+
+Casting pointers to integers or casting integers to pointers
+
+    void castaway(U8* p)
+    {
+      IV i = p;
+
+or
+
+    void castaway(U8* p)
+    {
+      IV i = (IV)p;
+
+Both are bad, and broken, and unportable.  Use the PTR2IV()
+macro that does it right.  (Likewise, there are PTR2UV(), PTR2NV(),
+INT2PTR(), and NUM2PTR().)
+
+=item *
+
+Casting between function pointers and data pointers
+
+Technically speaking, casting between function pointers and data
+pointers is unportable and undefined; practically speaking it seems
+to work, but you should use the FPTR2DPTR() and DPTR2FPTR() macros.
+Sometimes you can also play games with unions.
+
+=item *
+
+Assuming sizeof(int) == sizeof(long)
+
+There are platforms where longs are 64 bits, and platforms where ints
+are 64 bits, and while we are out to shock you, even platforms where
+shorts are 64 bits.  This is all legal according to the C standard.
+(In other words, "long long" is not a portable way to specify 64 bits,
+and "long long" is not even guaranteed to be any wider than "long".)
+Use the definitions IV, UV, IVSIZE, I32SIZE, and so forth.  Avoid
+things like I32 because they are B<not> guaranteed to be I<exactly>
+32 bits (they are I<at least> 32 bits), nor are they guaranteed to
+be B<int> or B<long>.  If you really explicitly need 64-bit variables,
+use I64 and U64, but only if guarded by HAS_QUAD.
+
+=item *
+
+Assuming one can dereference any type of pointer for any type of data
+
+  char *p = ...;
+  long pony = *p;
+
+Many platforms, quite rightly so, will give you a core dump instead
+of a pony if p happens not to be correctly aligned.
+
+=item *
+
+Lvalue casts
+
+  (int)*p = ...;
+
+Simply not portable.  Get your lvalue to be of the right type,
+or maybe use temporary variables.
+
+=item *
+
+Mixing #define and #ifdef
+
+  #define BURGLE(x) ... \
+  #ifdef BURGLE_OLD_STYLE
+  ... do it the old way ... \
+  #else
+  ... do it the new way ... \
+  #endif
+
+You cannot portably "stack" cpp directives.  For example, in the
+above you need two separate #defines, one in each #ifdef branch.
+
+=item *
+
+Using //-comments
+
+  // This function bamfoodles the zorklator.
+
+That is C99 or C++.  Perl is C89.  Using the //-comments is silently
+allowed by many C compilers but cranking up the ANSI C89 strictness
+(which we like to do) causes the compilation to fail.
+
+=item *
+
+Mixing declarations and code
+
+  void zorklator()
+  {
+    int n = 3;
+    set_zorkmids(n);
+    int q = 4;
+
+That is C99 or C++.  Some C compilers allow that, but you shouldn't.
+
+=item *
+
+Introducing variables inside for()
+
+  for(int i = ...; ...; ...)
+
+That is C99 or C++.  While it would indeed be awfully nice to have that
+also in C89, to limit the scope of the loop variable, alas, we cannot.
+
+=item *
+
+Mixing signed char pointers with unsigned char pointers
+
+  int foo(char *s) { ... }
+  ...
+  unsigned char *t = ...; /* Or U8* t = ... */
+  foo(t);
+
+While many compilers accept this (at most with a warning), it is
+certainly dubious, and downright fatal on at least one platform: VMS
+cc, for example, considers this a fatal error.  One reason people
+often make this mistake is that a "naked char", and therefore the
+result of dereferencing a "naked char pointer", has
+implementation-defined signedness: it depends on the compiler and the
+platform whether the result is signed or unsigned.
+
+=item *
+
+Macros that have string constants and their arguments as substrings of
+the string constants
+
+  #define FOO(n) printf("number = %d\n", n)
+  FOO(10);
+
+Pre-ANSI semantics for that was equivalent to
+
+  printf("10umber = %d\10");
+
+which is probably not what you were expecting.  Unfortunately at least
+one reasonably common and modern C compiler does "real backward
+compatibility" here: on AIX that is what still happens, even though
+the rest of the AIX compiler is very happily C89.
+
+=item *
+
+Blindly using variadic macros
+
+gcc has had them for a while with its own syntax, and C99
+brought them with a standardized syntax.  Don't use the former,
+and use the latter only if HAS_C99_VARIADIC_MACROS is defined.
+
+=item *
+
+Blindly passing va_list
+
+Not all platforms support passing va_list to further varargs (stdarg)
+functions.  The right thing to do is to copy the va_list using
+Perl_va_copy() if NEED_VA_COPY is defined.
+
+=back
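Several of the items above have straightforward portable counterparts. The following is a minimal C89 sketch, not code from the Perl source. The function names are hypothetical, and the two BURGLE bodies are placeholders standing in for "the old way" and "the new way". The sketch reads a long through memcpy() instead of dereferencing a possibly misaligned pointer, gives each #ifdef branch its own complete #define, and converts explicitly wherever the signedness of char matters:

```c
#include <string.h>

/* Alignment: instead of "long pony = *(long *)p;", copy the bytes
 * into a properly aligned object.  memcpy() works byte by byte, so
 * it has no alignment requirements on its arguments. */
long read_long_unaligned(const char *p)
{
    long pony;

    memcpy(&pony, p, sizeof pony);
    return pony;
}

/* Conditional macros: one complete #define per branch, never an
 * #ifdef inside a macro body.  The bodies are placeholders. */
#ifdef BURGLE_OLD_STYLE
#  define BURGLE(x) ((x) + 1)   /* ... the old way ... */
#else
#  define BURGLE(x) ((x) * 2)   /* ... the new way ... */
#endif

/* Signedness: a plain char may be signed or unsigned, so convert
 * explicitly when the numeric value of a byte matters. */
int byte_value(const char *s)
{
    return (unsigned char)s[0];   /* always 0..UCHAR_MAX */
}

/* ...and cast explicitly when an API wants char * but you hold U8 *. */
size_t u8_length(const unsigned char *t)
{
    return strlen((const char *)t);
}
```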
+
+=head2 Security problems
+
+Last but not least, here are various tips for safer coding.
+
+=over 4
+
+=item *
+
+Do not use gets()
+
+Or we will publicly ridicule you.  Seriously.
+
+=item *
+
+Do not use strcpy() or strcat()
+
+While some uses of these still linger in the Perl source code,
+we have inspected them for safety and are very, very ashamed of them,
+and plan to get rid of them.  Where strlcpy() and strlcat() are
+available we prefer to use them, and there is a plan to integrate
+the strlcpy/strlcat implementation of INN.
+
+=item *
+
+Do not use sprintf() or vsprintf()
+
+If you really want just plain byte strings, use my_snprintf()
+and my_vsnprintf() instead, which will try to use snprintf() and
+vsnprintf() if those safer APIs are available.  If you want something
+fancier than a plain byte string, use SVs and Perl_sv_catpvf().
+
+=back
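The strlcpy()/strlcat() contract mentioned above is simple enough to sketch. The following C89 functions are hypothetical illustrations of that contract, not the INN implementation or anything in the Perl source. They never write more than C<size> bytes into the destination, and they return the length of the string they tried to create. A result of C<size> or more therefore means the output was truncated:

```c
#include <string.h>

/* Hypothetical strlcpy()-style copy: writes at most `size` bytes to
 * dst, NUL-terminates whenever size > 0, and returns strlen(src). */
size_t sketch_strlcpy(char *dst, const char *src, size_t size)
{
    size_t srclen = strlen(src);

    if (size > 0) {
        size_t n = srclen < size - 1 ? srclen : size - 1;

        memcpy(dst, src, n);
        dst[n] = '\0';
    }
    return srclen;
}

/* Hypothetical strlcat()-style append: `size` is the full size of
 * dst, and the result is the length of the string it tried to make. */
size_t sketch_strlcat(char *dst, const char *src, size_t size)
{
    size_t dstlen = 0;

    /* Find the end of dst, but never look past `size` bytes. */
    while (dstlen < size && dst[dstlen] != '\0')
        dstlen++;
    if (dstlen == size)                 /* dst was not terminated */
        return size + strlen(src);
    return dstlen + sketch_strlcpy(dst + dstlen, src, size - dstlen);
}
```

The truncation check is the point: unlike strcpy() and strcat(), these always tell the caller whether the buffer was big enough.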
+
  =head1 EXTERNAL TOOLS FOR DEBUGGING PERL
  
  Sometimes it helps to use external tools while debugging and
@@ -2712,14 +3127,16 @@ see L<perlclib>.
  
  =back
  
-=head2 CONCLUSION
+=head1 CONCLUSION
  
-We've had a brief look around the Perl source, an overview of the stages
-F<perl> goes through when it's running your code, and how to use a
-debugger to poke at the Perl guts. We took a very simple problem and
-demonstrated how to solve it fully - with documentation, regression
-tests, and finally a patch for submission to p5p.  Finally, we talked
-about how to use external tools to debug and test Perl.
+We've had a brief look around the Perl source, how to maintain the
+quality of the source code, an overview of the stages F<perl> goes
+through when it's running your code, how to use debuggers to poke at
+the Perl guts, and how to analyse the execution of Perl.  We took a
+very simple problem and demonstrated how to solve it fully: with
+documentation, regression tests, and finally a patch for submission to
+p5p.  Lastly, we talked about how to use external tools to debug and
+test Perl.
  
  I'd now suggest you read over those references again, and then, as soon
  as possible, get your hands dirty. The best way to learn is by doing,