perlhack: more portability musings

[perl5.git] / pod / perlhack.pod
diff --git a/pod/perlhack.pod b/pod/perlhack.pod

index fc3c686..5a7263c 100644 (file)
--- a/pod/perlhack.pod
+++ b/pod/perlhack.pod
@@ -361,7 +361,7 @@ patch directory.
  
  It's then up to you to apply these patches, using something like
  
- # last=`ls -t *.gz | sed q`
+ # last="`cat ../perl-current/.patch`.gz"
   # rsync -avz rsync://public.activestate.com/perl-current-diffs/ .
   # find . -name '*.gz' -newer $last -exec gzcat {} \; >blead.patch
   # cd ../perl-current
@@ -1356,10 +1356,6 @@ roughly how the tied C<push> is implemented; see C<av_push> in F<av.c>:
       7 call_method("PUSH", G_SCALAR|G_DISCARD);
       8 LEAVE;
  
-The lines which concern the mark stack are the first, fifth and last
-lines: they save away, restore and remove the current position of the
-argument stack. 
-
  Let's examine the whole implementation, for practice:
  
       1 PUSHMARK(SP);
@@ -1379,8 +1375,8 @@ retrieved with C<SvTIED_obj>, and the value, the SV C<val>.
  
       5 PUTBACK;
  
-Next we tell Perl to make the change to the global stack pointer: C<dSP>
-only gave us a local copy, not a reference to the global.
+Next we tell Perl to update the global stack pointer from our internal
+variable: C<dSP> only gave us a local copy, not a reference to the global.
  
       6 ENTER;
       7 call_method("PUSH", G_SCALAR|G_DISCARD);
@@ -1394,7 +1390,9 @@ C<}> of a Perl block.
  To actually do the magic method call, we have to call a subroutine in
  Perl space: C<call_method> takes care of that, and it's described in
  L<perlcall>. We call the C<PUSH> method in scalar context, and we're
-going to discard its return value.
+going to discard its return value.  The call_method() function
+removes the top element of the mark stack, so there is nothing for
+the caller to clean up.
  
  =item Save stack
  
@@ -1457,6 +1455,141 @@ You can expand the macros in a F<foo.c> file by saying
  
  which will expand the macros using cpp.  Don't be scared by the results.
  
+=head1 SOURCE CODE STATIC ANALYSIS
+
+Various tools exist for analysing C source code B<statically>, as
+opposed to B<dynamically>, that is, without executing the code.
+It is possible to detect resource leaks, undefined behaviour, type
+mismatches, portability problems, code paths that would cause illegal
+memory accesses, and other similar problems by just parsing the C code
+and looking at the resulting graph, what does it tell about the
+execution and data flows.  As a matter of fact, this is exactly
+how C compilers know to give warnings about dubious code.
+
+=head2 lint, splint
+
+The good old C code quality inspector, C<lint>, is available in
+several platforms, but please be aware that there are several
+different implementations of it by different vendors, which means that
+the flags are not identical across different platforms.
+
+There is a lint variant called C<splint> (Secure Programming Lint)
+available from http://www.splint.org/ that should compile on any
+Unix-like platform.
+
+There are C<lint> and <splint> targets in Makefile, but you may have
+to diddle with the flags (see above).
+
+=head2 Coverity
+
+Coverity (http://www.coverity.com/) is a product similar to lint and
+as a testbed for their product they periodically check several open
+source projects, and they give out accounts to open source developers
+to the defect databases.
+
+=head2 cpd (cut-and-paste detector)
+
+The cpd tool detects cut-and-paste coding.  If one instance of the
+cut-and-pasted code changes, all the other spots should probably be
+changed, too.  Therefore such code should probably be turned into a
+subroutine or a macro.
+
+cpd (http://pmd.sourceforge.net/cpd.html) is part of the pmd project
+(http://pmd.sourceforge.net/).  pmd was originally written for static
+analysis of Java code, but later the cpd part of it was extended to
+parse also C and C++.
+
+Download the pmd-X.y.jar from the SourceForge site, and then run
+it on source code thusly:
+
+  java -cp pmd-X.Y.jar net.sourceforge.pmd.cpd.CPD --minimum-tokens 100 --files /some/where/src --language c > cpd.txt
+
+You may run into memory limits, in which case you should use the -Xmx option:
+
+  java -Xmx512M ...
+
+=head2 gcc warnings
+
+Though much can be written about the inconsistency and coverage
+problems of gcc warnings (like C<-Wall> not meaning "all the
+warnings", or some common portability problems not being covered by
+C<-Wall>, or C<-ansi> and C<-pedantic> both being a poorly defined
+collection of warnings, and so forth), gcc is still a useful tool in
+keeping our coding nose clean.
+
+The C<-Wall> is by default on.
+
+The C<-ansi> (and its sidekick, C<-pedantic>) would be nice to be
+on always, but unfortunately they are not safe on all platforms,
+they can for example cause fatal conflicts with the system headers
+(Solaris being a prime example).  The C<cflags> frontend selects
+C<-ansi -pedantic> for the platforms where they are known to be safe.
+
+Starting from Perl 5.9.4 the following extra flags are added:
+
+=over 4
+
+=item *
+
+C<-Wendif-labels>
+
+=item *
+
+C<-Wextra>
+
+=item *
+
+C<-Wdeclaration-after-statement>
+
+=back
+
+The following flags would be nice to have but they would first need
+their own Stygian stablemaster:
+
+=over 4
+
+=item *
+
+C<-Wpointer-arith>
+
+=item *
+
+C<-Wshadow>
+
+=item *
+
+C<-Wstrict-prototypes>
+
+=item *
+
+=back
+
+The C<-Wtraditional> is another example of the annoying tendency of
+gcc to bundle a lot of warnings under one switch -- it would be
+impossible to deploy in practice because it would complain a lot -- but
+it does contain some warnings that would be beneficial to have available
+on their own, such as the warning about string constants inside macros
+containing the macro arguments: this behaved differently pre-ANSI
+than it does in ANSI, and some C compilers are still in transition,
+AIX being an example.
+
+=head2 Warnings of other C compilers
+
+Other C compilers (yes, there B<are> other C compilers than gcc) often
+have their "strict ANSI" or "strict ANSI with some portability extensions"
+modes on, like for example the Sun Workshop has its C<-Xa> mode on
+(though implicitly), or the DEC (these days, HP...) has its C<-std1>
+mode on.
+
+=head2 DEBUGGING
+
+You can compile a special debugging version of Perl, which allows you
+to use the C<-D> option of Perl to tell more about what Perl is doing.
+But sometimes there is no alternative than to dive in with a debugger,
+either to see the stack trace of a core dump (very useful in a bug
+report), or trying to figure out what went wrong before the core dump
+happened, or how did we end up having wrong or unexpected results.
+
  =head2 Poking at Perl
  
  To really poke around with Perl, you'll probably want to build Perl for
@@ -1466,7 +1599,11 @@ debugging, like this:
      make
  
  C<-g> is a flag to the C compiler to have it produce debugging
-information which will allow us to step through a running program.
+information which will allow us to step through a running program,
+and to see in which C function we are at (without the debugging
+information we might see only the numerical addresses of the functions,
+which is not very helpful).
+
  F<Configure> will also turn on the C<DEBUGGING> compilation symbol which
  enables all the internal debugging code in Perl. There are a whole bunch
  of things you can debug with this: L<perlrun> lists them all, and the
@@ -1493,8 +1630,9 @@ through perl's execution with a source-level debugger.
  
  =item *
  
-We'll use C<gdb> for our examples here; the principles will apply to any
-debugger, but check the manual of the one you're using.
+We'll use C<gdb> for our examples here; the principles will apply to
+any debugger (many vendors call their debugger C<dbx>), but check the
+manual of the one you're using.
  
  =back
  
@@ -1502,6 +1640,10 @@ To fire up the debugger, type
  
      gdb ./perl
  
+Or if you have a core dump:
+
+    gdb ./perl core
+
  You'll want to do that in your Perl source tree so the debugger can read
  the source code. You should see the copyright message, followed by the
  prompt.
@@ -2210,6 +2352,279 @@ running 'make test_notty'.
  
  =back
  
+=head2 Common problems when patching Perl source code
+
+Perl source plays by ANSI C89 rules: no C99 (or C++) extensions.  In
+some cases we have to take pre-ANSI requirements into consideration.
+You don't care about some particular platform having broken Perl?
+I hear there is still a strong demand for J2EE programmers.
+
+=head2 Perl environment problems
+
+=over 4
+
+=item *
+
+Not compiling with threading
+
+Compiling with threading (-Duseithreads) completely rewrites
+the function prototypes of Perl.  You better try your changes
+with that.  Related to this is the difference between "Perl_-less"
+and "Perl_-ly" APIs, for example:
+
+  Perl_sv_setiv(aTHX_ ...);
+  sv_setiv(...);
+
+The first one explicitly passes in the context, which is needed for
+e.g. threaded builds.  The second one does that implicitly; do not get
+them mixed.
+
+See L<perlguts/"How multiple interpreters and concurrency are supported">
+for further discussion about context.
+
+=item *
+
+Not compiling with -DDEBUGGING
+
+The DEBUGGING define exposes more code to the compiler,
+therefore more ways for things to go wrong.  You should try it.
+
+=item *
+
+Not exporting your new function
+
+Some platforms (Win32, AIX, VMS, OS/2, to name a few) require any
+function that is part of the public API (the shared Perl library)
+to be explicitly marked as exported.  See the discussion about
+F<embed.pl> in L<perlguts>.
+
+=item *
+
+Exporting your new function
+
+The new shiny result of either genuine new functionality or your
+arduous refactoring is now ready and correctly exported.  So what
+could possibly be wrong?
+
+Maybe simply that your function did not need to be exported in the
+first place.  Perl has a long and not so glorious history of exporting
+functions that it should not have.
+
+If the function is used only inside one source code file, make it
+static.  See the discussion about F<embed.pl> in L<perlguts>.
+
+If the function is used across several files, but intended only for
+Perl's internal use (and this should be the common case), do not
+export it to the public API.  See the discussion about F<embed.pl>
+in L<perlguts>.
+
+=back
+
+=head Portability problems
+
+The following are common causes of compilation and/or execution
+failures, not common to Perl as such.  The C FAQ is good bedtime
+reading.  Please test your changes with as many C compilers and
+platforms as possible -- we will, anyway, and it's nice to save
+oneself from public embarrassment.
+
+Also study L<perlport> carefully to avoid any bad assumptions
+about the operating system, filesystem, and so forth.
+
+Do not assume an operating system indicates a certain compiler.
+
+=over 4
+
+=item *
+
+Casting pointers to integers or casting integers to pointers
+
+    void castaway(U8* p)
+    {
+      IV i = p;
+
+or
+
+    void castaway(U8* p)
+    {
+      IV i = (IV)p;
+
+Either are bad, and broken, and unportable.  Use the PTR2IV()
+macro that does it right.  (Likewise, there are PTR2UV(), PTR2NV(),
+INT2PTR(), and NUM2PTR().)
+
+=item *
+
+Casting between data function pointers and data pointers
+
+Technically speaking casting between function pointers and data
+pointers is unportable and undefined, but practically speaking
+it seems to work, but you should use the FPTR2DPTR() and DPTR2FPTR()
+macros.  Sometimes you can also play games with unions.
+
+=item *
+
+Assuming sizeof(int) == sizeof(long)
+
+There are platforms where longs are 64 bits, and platforms where ints
+are 64 bits, and while we are out to shock you, even platforms where
+shorts are 64 bits.  This is all legal according to the C standard.
+(In other words, "long long" is not a portable way to specify 64 bits,
+and "long long" is not even guaranteed to be any wider than "long".)
+Use the definitions IV, UV, IVSIZE, I32SIZE, and so forth.  Avoid
+things like I32 because they are B<not> guaranteed to be I<exactly>
+32 bits, they are I<at least> 32 bits, nor are they guaranteed to
+be B<int> or B<long>.  If you really explicitly need 64-bit variables,
+use I64 and U64, but only if guarded by HAS_QUAD.
+
+=item *
+
+Assuming one can dereference any type of pointer for any type of data
+
+  char *p = ...;
+  long pony = *p;
+
+Many platforms, quite rightly so, will give you a core dump instead
+of a pony if the p happens not be correctly aligned.
+
+=item *
+
+Lvalue casts
+
+  (int)*p = ...;
+
+Simply not portable.  Get your lvalue to be of the right type,
+or maybe use temporary variables.
+
+=item *
+
+Mixing #define and #ifdef
+
+  #define BURGLE(x) ... \
+  #ifdef BURGLE_OLD_STYLE
+  ... do it the old way ... \
+  #else
+  ... do it the new way ... \
+  #endif
+
+You cannot portably "stack" cpp directives.  For example in the
+above you need two separate #defines, one in each #ifdef branch.
+
+=item *
+
+Using //-comments
+
+  // This function bamfoodles the zorklator.
+
+That is C99 or C++.  Perl is C89.  Using the //-comments is silently
+allowed by many C compilers but cranking up the ANSI C89 strictness
+(which we like to do) causes the compilation to fail.
+
+=item *
+
+Mixing declarations and code
+
+  void zorklator()
+  {
+    int n = 3;
+    set_zorkmids(n);
+    int q = 4;
+
+That is C99 or C++.  Some C compilers allow that, but you shouldn't.
+
+=item *
+
+Introducing variables inside for()
+
+  for(int i = ...; ...; ...)
+
+That is C99 or C++.  While it would indeed be awfully nice to have that
+also in C89, to limit the scope of the loop variable, alas, we cannot.
+
+=item *
+
+Mixing signed char pointers with unsigned char pointers
+
+  int foo(char *s) { ... }
+  ...
+  unsigned char *t = ...; /* Or U8* t = ... */
+  foo(t);
+
+While this is legal practice, it is certainly dubious, and downright
+fatal in at least one platform: for example VMS cc considers this a
+fatal error.  One cause for people often making this mistake is that a
+"naked char" and therefore dereferencing a "naked char pointer" have
+an undefined signedness: it depends on the compiler and the platform
+whether the result is signed or unsigned.
+
+=item *
+
+Macros that have string constants and their arguments as substrings of
+the string constants
+
+  #define FOO(n) printf("number = %d\n", n)
+  FOO(10);
+
+Pre-ANSI semantics for that was equivalent to
+
+  printf("10umber = %d\10");
+
+which is probably not what you were expecting.  Unfortunately at least
+one reasonably common and modern C compiler does "real backward
+compatibility here", in AIX that is what still happens even though the
+rest of the AIX compiler is very happily C89.
+
+=item *
+
+Blindly using variadic macros
+
+gcc has had them for a while with its own syntax, and C99
+brought them with a standardized syntax.  Don't use the former,
+and use the latter only if the HAS_C99_VARIADIC_MACROS.
+
+=item *
+
+Blindly passing va_list
+
+Not all platforms support passing va_list to further varargs (stdarg)
+functions.  The right thing to do is to copy the va_list using the
+Perl_va_copy() if the NEED_VA_COPY is defined.
+
+=back
+
+=head2 Security problems
+
+Last but not least, here are various tips for safer coding.
+
+=over 4
+
+=item *
+
+Do not use gets()
+
+Or we will publicly ridicule you.  Seriously.
+
+=item *
+
+Do not use strcpy() or strcat()
+
+While some uses of these still linger in the Perl source code,
+we have inspected them for safety and are very, very ashamed of them,
+and plan to get rid of them.  In places where there are strlcpy()
+and strlcat() we prefer to use them, and there is a plan to integrate
+the strlcpy/strlcat implementation of INN.
+
+=item *
+
+Do not use sprintf() or vsprintf()
+
+If you really want just plain byte strings, use my_snprintf()
+and my_vnsprintf() instead, which will try to use snprintf() and
+vsnprintf() if those safer APIs are available.  If you want something
+fancier than a plain byte string, use SVs and Perl_sv_catpvf().
+
+=back
+
  =head1 EXTERNAL TOOLS FOR DEBUGGING PERL
  
  Sometimes it helps to use external tools while debugging and
@@ -2712,14 +3127,16 @@ see L<perlclib>.
  
  =back
  
-=head2 CONCLUSION
+=head1 CONCLUSION
  
-We've had a brief look around the Perl source, an overview of the stages
-F<perl> goes through when it's running your code, and how to use a
-debugger to poke at the Perl guts. We took a very simple problem and
-demonstrated how to solve it fully - with documentation, regression
-tests, and finally a patch for submission to p5p.  Finally, we talked
-about how to use external tools to debug and test Perl.
+We've had a brief look around the Perl source, how to maintain quality
+of the source code, an overview of the stages F<perl> goes through
+when it's running your code, how to use debuggers to poke at the Perl
+guts, and finally how to analyse the execution of Perl. We took a very
+simple problem and demonstrated how to solve it fully - with
+documentation, regression tests, and finally a patch for submission to
+p5p.  Finally, we talked about how to use external tools to debug and
+test Perl.
  
  I'd now suggest you read over those references again, and then, as soon
  as possible, get your hands dirty. The best way to learn is by doing,