perltodo: Revise utf8 todo

diff --git a/pod/perltodo.pod b/pod/perltodo.pod
index ee35bf6..3bd0c06 100644
--- a/pod/perltodo.pod
+++ b/pod/perltodo.pod
@@ -4,11 +4,14 @@ perltodo - Perl TO-DO List
  
  =head1 DESCRIPTION
  
-This is a list of wishes for Perl. The tasks we think are smaller or
-easier are listed first. Anyone is welcome to work on any of these,
-but it's a good idea to first contact I<perl5-porters@perl.org> to
-avoid duplication of effort, and to learn from any previous attempts.
-By all means contact a pumpking privately first if you prefer.
+This is a list of wishes for Perl. The most up-to-date version of this file
+is at http://perl5.git.perl.org/perl.git/blob_plain/HEAD:/pod/perltodo.pod
+
+The tasks we think are smaller or easier are listed first. Anyone is welcome
+to work on any of these, but it's a good idea to first contact
+I<perl5-porters@perl.org> to avoid duplication of effort, and to learn from
+any previous attempts. By all means contact a pumpking privately first if you
+prefer.
  
  Whilst patches to make the list shorter are most welcome, ideas to add to
  the list are also encouraged. Check the perl5-porters archives for past
@@ -23,55 +26,55 @@ programming languages offer you 1 line of immortality?
  
  =head1 Tasks that only need Perl knowledge
  
-=head2 Smartmatch design issues
+=head2 Migrate t/ from custom TAP generation
  
-In 5.10.0 the smartmatch operator C<~~> isn't working quite "right". But
-before we can fix the implementation, we need to define what "right" is.
-The first problem is that Robin Houston implemented the Perl 6 smart match
-spec as of February 2006, when smart match was axiomatically symmetrical:
-L<http://groups.google.com/group/perl.perl6.language/msg/bf2b486f089ad021>
+Many tests below F<t/> still generate TAP by "hand", rather than using library
+functions. As explained in L<perlhack/Writing a test>, tests in F<t/> are
+written in a particular way to test that more complex constructions actually
+work before using them routinely. Hence they don't use C<Test::More>, but
+instead there is an intentionally simpler library, F<t/test.pl>. However,
+quite a few tests in F<t/> have not been refactored to use it. Refactoring
+any of these tests, one at a time, is a useful thing TODO.
  
-Since then the Perl 6 target moved, but the Perl 5 implementation did not.
+The subdirectories F<base>, F<cmd> and F<comp>, which contain the most
+basic tests, should be excluded from this task.
  
-So it would be useful for someone to compare the Perl 6 smartmatch table
-as of February 2006 L<http://svn.perl.org/viewvc/perl6/doc/trunk/design/syn/S03.pod?view=markup&pathrev=7615>
-and the current table L<http://svn.perl.org/viewvc/perl6/doc/trunk/design/syn/S03.pod?revision=14556&view=markup>
-and tabulate the differences in Perl 6. The annotated view of changes is
-L<http://svn.perl.org/viewvc/perl6/doc/trunk/design/syn/S03.pod?view=annotate> and the diff is
-C<svn diff -r7615:14556 http://svn.perl.org/perl6/doc/trunk/design/syn/S03.pod>
--- search for C<=head1 Smart matching>. (In theory F<viewvc> can generate that,
-but in practice when I tried it hung forever, I assume "thinking")
+=head2 Test that regen.pl was run
  
-With that done and published, someone (else) can then map any changed Perl 6
-semantics back to Perl 5, based on how the existing semantics map to Perl 5:
-L<http://search.cpan.org/~rgarcia/perl-5.10.0/pod/perlsyn.pod#Smart_matching_in_detail>
+There are various generated files shipped with the perl distribution, for
+things like header files generated from data. The generation scripts are
+written in perl, and all can be run by F<regen.pl>. However, because they're
+written in perl, we can't run them before we've built perl. We can't run them
+as part of the F<Makefile>, because changing files underneath F<make> confuses
+it completely, and we don't want to run them automatically anyway, as they
+change files shipped by the distribution, something we seek not to do.
  
+If someone changes the data, but forgets to re-run F<regen.pl> then the
+generated files are out of sync. It would be good to have a test in
+F<t/porting> that checks that the generated files are in sync, and fails
+otherwise, to alert someone before they make a poor commit. I suspect that this
+would require adapting the scripts run from F<regen.pl> to have dry-run
+options, and invoking them with these, or by refactoring them into a library
+that does the generation, which can be called by the scripts, and by the test.
  
-There are also some questions that need answering:
+=head2 Automate perldelta generation
  
-=over 4
+The perldelta file accompanying each release summarises the major changes.
+It's mostly manually generated currently, but some of that could be
+automated with a bit of perl, specifically the generation of
  
-=item *
+=over
  
-How do you negate one?  (documentation issue)
-http://www.xray.mpe.mpg.de/mailing-lists/perl5-porters/2008-01/msg00071.html
+=item Modules and Pragmata
  
-=item *
+=item New Documentation
  
-Array behaviors
-http://www.xray.mpe.mpg.de/mailing-lists/perl5-porters/2007-12/msg00799.html
-
-* Should smart matches be symmetrical? (Perl 6 says no)
-
-* Other differences between Perl 5 and Perl 6 smart match?
-
-=item *
-
-Objects and smart match
-http://www.xray.mpe.mpg.de/mailing-lists/perl5-porters/2007-12/msg00865.html
+=item New Tests
  
  =back
  
+See F<Porting/how_to_write_a_perldelta.pod> for details.
+
  =head2 Remove duplication of test setup.
  
  Schwern notes, that there's duplication of code - lots and lots of tests have
@@ -91,17 +94,17 @@ is needed to improve the cross-linking.
  The addition of C<Pod::Simple> and its related modules may make this task
  easier to complete.
  
-=head2 Parallel testing
+=head2 Make ExtUtils::ParseXS use strict;
  
-(This probably impacts much more than the core: also the Test::Harness
-and TAP::* modules on CPAN.)
+F<lib/ExtUtils/ParseXS.pm> contains this line
  
-All of the tests in F<t/> can now be run in parallel, if C<$ENV{TEST_JOBS}>
-is set. However, tests within each directory in F<ext> and F<lib> are still
-run in series, with directories run in parallel. This is an adequate
-heuristic, but it might be possible to relax it further, and get more
-throughput. Specifically, it would be good to audit all of F<lib/*.t>, and
-make them use C<File::Temp>.
+    # use strict;  # One of these days...
+
+Simply uncomment it, and fix all the resulting issues :-)
+
+The more practical approach, to break the task down into manageable chunks, is
+to work your way through the code from bottom to top, adding extra
+C<{ ... }> blocks where necessary, and turning on strict within them.
  
  =head2 Make Schwern poorer
  
@@ -112,7 +115,7 @@ cash.
  
  =head2 Improve the coverage of the core tests
  
-Use Devel::Cover to ascertain the core modules's test coverage, then add
+Use Devel::Cover to ascertain the core modules' test coverage, then add
  tests that are currently missing.
  
  =head2 test B
@@ -143,12 +146,6 @@ do so. Test it with older perl releases, and fix the problems you find.
  To make a minimal perl distribution, it's useful to look at
  F<t/lib/commonsense.t>.
  
-=head2 Bundle dual life modules in ext/
-
-For maintenance (and branch merging) reasons, it would be useful to move
-some architecture-independent dual-life modules from lib/ to ext/, if this
-has no negative impact on the build of perl itself.
-
  =head2 POSIX memory footprint
  
  Ilya observed that use POSIX; eats memory like there's no tomorrow, and at
@@ -192,6 +189,11 @@ The F<installman> script is slow. All it is doing text processing, which we're
  told is something Perl is good at. So it would be nice to know what it is doing
  that is taking so much CPU, and where possible address it.
  
+=head2 enable lexical enabling/disabling of individual warnings
+
+Currently, warnings can only be enabled or disabled by category. There
+are times when it would be useful to quash a single warning, not a
+whole category.
  
  =head1 Tasks that need a little sysadmin-type knowledge
  
@@ -247,7 +249,7 @@ to do this manually are roughly
  =item *
  
  do a normal C<Configure>, but include Devel::Cover as a module to install
-(see F<INSTALL> for how to do this)
+(see L<INSTALL> for how to do this)
  
  =item *
  
@@ -324,7 +326,8 @@ visibility just to symbols declared in that file. It would be good to extend
  F<makedef.pl> to support this format, and to provide a means within
  C<Configure> to enable it. This would allow Unix users to test that the
  export list is correct, and to build a perl that does not pollute the global
-namespace with private symbols.
+namespace with private symbols, and will fail in the same way as msvc or mingw 
+builds or when using PERL_DL_NONLAZY=1.
  
  =head2 Cross-compile support
  
@@ -397,6 +400,29 @@ C<$Config{link}> and institute a fall-back plan if it weren't found."
  Although I can see that as confusing, given that C<$Config{d_link}> is true
  when (hard) links are available.
  
+=head2 Configure Windows using PowerShell
+
+Currently, Windows uses hard-coded config files to build the
+config.h for compiling Perl.  Makefiles are also hard-coded and need to be
+hand edited prior to building Perl. While this makes it easy to create a perl.exe
+that works across multiple Windows versions, being able to accurately
+configure a perl.exe for a specific Windows version and VS C++ would be
+a nice enhancement.  With PowerShell available on Windows XP and up, this 
+may now be possible.  Step 1 might be to investigate whether this is possible
+and use this to clean up our current makefile situation.  Step 2 would be to 
+see if there would be a way to use our existing metaconfig units to configure a
+Windows Perl or whether we go in a separate direction and make it so.  Of 
+course, we all know what step 3 is.
+
+=head2 decouple -g and -DDEBUGGING
+
+Currently F<Configure> automatically adds C<-DDEBUGGING> to the C compiler
+flags if it spots C<-g> in the optimiser flags. The pre-processor directive
+C<DEBUGGING> enables F<perl>'s command line C<-D> options, but in the process
+makes F<perl> slower. It would be good to disentangle this logic, so that
+C-level debugging with C<-g> and Perl level debugging with C<-D> can easily
+be enabled independently.
+
  =head1 Tasks that need a little C knowledge
  
  These tasks would need a little C knowledge, but don't need any specific
@@ -437,7 +463,7 @@ Natively 64-bit systems need neither -Duse64bitint nor -Duse64bitall.
  On these systems, it might be the default compilation mode, and there
  is currently no guarantee that passing no use64bitall option to the
  Configure process will build a 32bit perl. Implementing -Duse32bit*
-options would be nice for perl 5.12.
+options would be nice for perl 5.14.
  
  =head2 Profile Perl - am I hot or not?
  
@@ -572,6 +598,101 @@ These tasks would need C knowledge, and roughly the level of knowledge of
  the perl API that comes from writing modules that use XS to interface to
  C.
  
+=head2 Write an XS cookbook
+
+Create pod/perlxscookbook.pod with short, task-focused 'recipes' in XS that
+demonstrate common tasks and good practices.  (Some of these might be
+extracted from perlguts.) The target audience should be XS novices, who need
+more examples than perlguts but something less overwhelming than perlapi.
+Recipes should provide "one pretty good way to do it" instead of TIMTOWTDI.
+
+Rather than focusing on interfacing Perl to C libraries, such a cookbook
+should probably focus on how to optimize Perl routines by re-writing them
+in XS.  This will likely be more motivating to those who mostly work in
+Perl but are looking to take the next step into XS.
+
+Deconstructing and explaining some simpler XS modules could be one way to
+bootstrap a cookbook.  (List::Util? Class::XSAccessor? Tree::Ternary_XS?)
+Another option could be deconstructing the implementation of some simpler
+functions in op.c.
+
+=head2 Allow XSUBs to inline themselves as OPs
+
+For a simple XSUB, often the subroutine dispatch takes more time than the
+XSUB itself. The tokeniser already has the ability to inline constant
+subroutines - it would be good to provide a way to inline other subroutines.
+
+Specifically, the simplest approach looks to be to allow an XSUB to provide an
+alternative implementation of itself as a custom OP. A new flag bit in
+C<CvFLAGS()> would signal to the peephole optimiser to take an optree
+such as this:
+
+    b  <@> leave[1 ref] vKP/REFC ->(end)
+    1     <0> enter ->2
+    2     <;> nextstate(main 1 -e:1) v:{ ->3
+    a     <2> sassign vKS/2 ->b
+    8        <1> entersub[t2] sKS/TARG,1 ->9
+    -           <1> ex-list sK ->8
+    3              <0> pushmark s ->4
+    4              <$> const(IV 1) sM ->5
+    6              <1> rv2av[t1] lKM/1 ->7
+    5                 <$> gv(*a) s ->6
+    -              <1> ex-rv2cv sK ->-
+    7                 <$> gv(*x) s/EARLYCV ->8
+    -        <1> ex-rv2sv sKRM*/1 ->a
+    9           <$> gvsv(*b) s ->a
+
+perform the symbol table lookup of C<rv2cv> and C<gv(*x)>, locate the
+pointer to the custom OP that provides the direct implementation, and re-
+write the optree something like:
+
+    b  <@> leave[1 ref] vKP/REFC ->(end)
+    1     <0> enter ->2
+    2     <;> nextstate(main 1 -e:1) v:{ ->3
+    a     <2> sassign vKS/2 ->b
+    7        <1> custom_x -> 8
+    -           <1> ex-list sK ->7
+    3              <0> pushmark s ->4
+    4              <$> const(IV 1) sM ->5
+    6              <1> rv2av[t1] lKM/1 ->7
+    5                 <$> gv(*a) s ->6
+    -              <1> ex-rv2cv sK ->-
+    -                 <$> ex-gv(*x) s/EARLYCV ->7
+    -        <1> ex-rv2sv sKRM*/1 ->a
+    8           <$> gvsv(*b) s ->a
+
+I<i.e.> the C<gv(*x)> OP has been nulled and spliced out of the execution
+path, and the C<entersub> OP has been replaced by the custom op.
+
+This approach should provide a measurable speed up to simple XSUBs inside
+tight loops. Initially one would have to write the OP alternative
+implementation by hand, but it's likely that this should be reasonably
+straightforward for the type of XSUB that would benefit the most. Longer
+term, once the run-time implementation is proven, it should be possible to
+progressively update ExtUtils::ParseXS to generate OP implementations for
+some XSUBs.
+
+=head2 Remove the use of SVs as temporaries in dump.c
+
+F<dump.c> contains debugging routines to dump out the contents of perl data
+structures, such as C<SV>s, C<AV>s and C<HV>s. Currently, the dumping code
+B<uses> C<SV>s for its temporary buffers, which was a logical initial
+implementation choice, as they provide ready made memory handling.
+
+However, they also lead to a lot of confusion when it happens that what you're
+trying to debug is seen by the code in F<dump.c>, correctly or incorrectly, as
+a temporary scalar it can use for a temporary buffer. It's also not possible
+to dump scalars before the interpreter is properly set up, such as during
+ithreads cloning. It would be good to progressively replace the use of scalars
+as string accumulation buffers with something much simpler, directly allocated
+by C<malloc>. The F<dump.c> code is (or should be) only producing 7 bit
+US-ASCII, so output character sets are not an issue.
+
+Producing and proving an internal simple buffer allocation would make it easier
+to re-write the internals of the PerlIO subsystem to avoid using C<SV>s for
+B<its> buffers, use of which can cause problems similar to those of F<dump.c>,
+at similar times.
+
  =head2 safely supporting POSIX SA_SIGINFO
  
  Some years ago Jarkko supplied patches to provide support for the POSIX
@@ -689,12 +810,6 @@ See L</"Virtualize operating system access">.
  Currently glob patterns and filenames returned from File::Glob::glob()
  are always byte strings.  See L</"Virtualize operating system access">.
  
-=head2 Unicode and lc/uc operators
-
-Some built-in operators (C<lc>, C<uc>, etc.) behave differently, based on
-what the internal encoding of their argument is. That should not be the
-case. Maybe add a pragma to switch behaviour.
-
  =head2 use less 'memory'
  
  Investigate trade offs to switch out perl's choices on memory usage.
@@ -794,6 +909,29 @@ also the warning messages (see L<perllexwarn>, C<warnings.pl>).
  These tasks would need C knowledge, and knowledge of how the interpreter works,
  or a willingness to learn.
  
+=head2 forbid labels with keyword names
+
+Currently C<goto keyword> "computes" the label value:
+
+    $ perl -e 'goto print'
+    Can't find label 1 at -e line 1.
+
+It is controversial whether the right way to avoid the confusion is to forbid
+labels with keyword names, or whether it would be better to always treat
+bareword expressions after a "goto" as a label and never as a keyword.
+
+=head2 truncate() prototype
+
+The prototype of truncate() is currently C<$$>. It should probably
+be C<*$> instead. (This would be changed in F<opcode.pl>.)
+
+=head2 decapsulation of smart match argument
+
+Currently C<$foo ~~ $object> will die with the message "Smart matching a
+non-overloaded object breaks encapsulation". It would be nice to allow
+bypassing this by explicitly using the syntax C<$foo ~~ %$object> or
+C<$foo ~~ @$object>.
+
  =head2 error reporting of [$a ; $b]
  
  Using C<;> inside brackets is a syntax error, and we don't propose to change
@@ -828,11 +966,13 @@ years for this discrepancy.
  
  =head2 UTF-8 revamp
  
-The handling of Unicode is unclean in many places. For example, the regexp
-engine matches in Unicode semantics whenever the string or the pattern is
-flagged as UTF-8, but that should not be dependent on an internal storage
-detail of the string. Likewise, case folding behaviour is dependent on the
-UTF8 internal flag being on or off.
+The handling of Unicode is unclean in many places.  In the regex engine
+there are especially many problems.  The swash data structure could be
+replaced by something better.  Inversion lists and maps are likely
+candidates.  The whole Unicode database could be placed in-core for a
+huge speed-up.  Only minimal work was done on the optimizer when utf8
+was added, with the result that the synthetic start class often will
+fail to narrow down the possible choices when given non-Latin1 input.
  
  =head2 Properly Unicode safe tokeniser and pads.
  
@@ -872,6 +1012,15 @@ L<http://www.xray.mpe.mpg.de/mailing-lists/perl5-porters/2007-03/msg00481.html>
  There is no method on tied filehandles to allow them to be called back by
  formats.
  
+=head2 Propagate compilation hints to the debugger
+
+Currently a debugger started with -dE on the command-line doesn't see the
+features enabled by -E. More generally hints (C<$^H> and C<%^H>) aren't
+propagated to the debugger. Probably it would be a good thing to propagate
+hints from the innermost non-C<DB::> scope: this would make code eval'ed
+in the debugger see the features (and strictures, etc.) currently in
+scope.
+
  =head2 Attach/detach debugger from running program
  
  The old perltodo notes "With C<gdb>, you can attach the debugger to a running
@@ -879,12 +1028,6 @@ program if you pass the process ID. It would be good to do this with the Perl
  debugger on a running Perl program, although I'm not sure how it would be
  done." ssh and screen do this with named pipes in /tmp. Maybe we can too.
  
-=head2 Optimize away empty destructors
-
-Defining an empty DESTROY method might be useful (notably in
-AUTOLOAD-enabled classes), but it's still a bit expensive to call. That
-could probably be optimized.
-
  =head2 LVALUE functions for lists
  
  The old perltodo notes that lvalue functions don't work for list or hash
@@ -895,11 +1038,6 @@ slices. This would be good to fix.
  The regexp optimiser is not optional. It should configurable to be, to allow
  its performance to be measured, and its bugs to be easily demonstrated.
  
-=head2 delete &function
-
-Allow to delete functions. One can already undef them, but they're still
-in the stash.
-
  =head2 C</w> regex modifier
  
  That flag would enable to match whole words, and also to interpolate
@@ -961,7 +1099,7 @@ in fact, all of L<perlport> is.)
  This has actually already been implemented (but only for Win32),
  take a look at F<iperlsys.h> and F<win32/perlhost.h>.  While all Win32
  variants go through a set of "vtables" for operating system access,
-non-Win32 systems currently go straight for the POSIX/UNIX-style
+non-Win32 systems currently go straight for the POSIX/Unix-style
  system/library call.  Similar system as for Win32 should be
  implemented for all platforms.  The existing Win32 implementation
  probably does not need to survive alongside this proposed new
@@ -983,7 +1121,7 @@ See also L</"Extend PerlIO and PerlIO::Scalar">.
  
  =head2 Investigate PADTMP hash pessimisation
  
-The peephole optimier converts constants used for hash key lookups to shared
+The peephole optimiser converts constants used for hash key lookups to shared
  hash key scalars. Under ithreads, something is undoing this work.
  See http://www.xray.mpe.mpg.de/mailing-lists/perl5-porters/2007-09/msg00793.html
  
@@ -1097,10 +1235,16 @@ combines the code in pp_entersub, pp_leavesub.  This should probably
  be done 1st in XS, and using B::Generate to patch the new OP into the
  optrees.
  
+=head2 Add C<0oddddd>
+
+It has been proposed that octal constants be specifiable through the syntax
+C<0oddddd>, parallel to the existing construct to specify hex constants
+C<0xddddd>.
+
  =head1 Big projects
  
  Tasks that will get your name mentioned in the description of the "Highlights
-of 5.12"
+of 5.14"
  
  =head2 make ithreads more robust
  
@@ -1109,7 +1253,8 @@ Generally make ithreads more robust. See also L</iCOW>
  This task is incremental - even a little bit of work on it will help, and
  will be greatly appreciated.
  
-One bit would be to write the missing code in sv.c:Perl_dirp_dup.
+One bit would be to determine how to clone directory handles on systems
+without a C<fchdir> function (in sv.c:Perl_dirp_dup).
  
  Fix Perl_sv_dup, et al so that threads can return objects.
  
@@ -1123,13 +1268,29 @@ it would be a good thing.
  
  Fix (or rewrite) the implementation of the C</(?{...})/> closures.
  
-=head2 A re-entrant regexp engine
-
-This will allow the use of a regex from inside (?{ }), (??{ }) and
-(?(?{ })|) constructs.
-
  =head2 Add class set operations to regexp engine
  
  Apparently these are quite useful. Anyway, Jeffery Friedl wants them.
  
  demerphq has this on his todo list, but right at the bottom.  
+
+
+=head1 Tasks for microperl
+
+
+[ Each and every one of these may be obsolete, but they were listed
+  in the old Todo.micro file]
+
+
+=head2 make creating uconfig.sh automatic 
+
+=head2 make creating Makefile.micro automatic
+
+=head2 do away with fork/exec/wait?
+
+(system, popen should be enough?)
+
+=head2 some of the uconfig.sh really needs to be probed (using cc) at build time:
+
+(uConfigure? :-) native datatype widths and endianness come to mind
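
Note on the "Add 0oddddd" entry in the patch above: the proposed notation can be compared with the octal and hex spellings the entry refers to. The following is only a minimal sketch. The helper name parse_0o is hypothetical and not part of perl; it merely emulates the proposed prefix on strings using the existing oct() builtin, and says nothing about how the tokeniser would implement it.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of perl): parse a string written in the
# proposed 0oNNN notation and return its numeric value, or undef if the
# string does not use that notation.
sub parse_0o {
    my ($text) = @_;
    return undef unless defined $text && $text =~ /\A0o([0-7]+)\z/;
    return oct($1);    # oct() reads a plain digit string as octal
}

# Existing spellings of the same number, for comparison.
my $octal_literal = 0755;          # a leading zero already means octal
my $hex_literal   = 0x1ED;         # hex constants use the 0x prefix
my $from_string   = oct('755');    # run-time conversion from a string

# prints: 493 493 493 493
printf "%d %d %d %d\n",
    $octal_literal, $hex_literal, $from_string, parse_0o('0o755');
```

The helper rejects anything that is not exactly 0o followed by octal digits, so parse_0o('0755') and parse_0o('0o789') both return undef. That strictness is a choice made for the sketch, not something the todo entry specifies.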
+