=head1 DESCRIPTION
-This is a list of wishes for Perl. The tasks we think are smaller or
-easier are listed first. Anyone is welcome to work on any of these,
-but it's a good idea to first contact I<perl5-porters@perl.org> to
-avoid duplication of effort, and to learn from any previous attempts.
-By all means contact a pumpking privately first if you prefer.
+This is a list of wishes for Perl. The most up-to-date version of this file
+is at http://perl5.git.perl.org/perl.git/blob_plain/HEAD:/pod/perltodo.pod
+
+The tasks we think are smaller or easier are listed first. Anyone is welcome
+to work on any of these, but it's a good idea to first contact
+I<perl5-porters@perl.org> to avoid duplication of effort, and to learn from
+any previous attempts. By all means contact a pumpking privately first if you
+prefer.
Whilst patches to make the list shorter are most welcome, ideas to add to
the list are also encouraged. Check the perl5-porters archives for past
=head1 Tasks that only need Perl knowledge
+=head2 Migrate t/ from custom TAP generation
+
+Many tests below F<t/> still generate TAP by "hand", rather than using library
+functions. As explained in L<perlhack/Writing a test>, tests in F<t/> are
+written in a particular way to test that more complex constructions actually
+work before using them routinely. Hence they don't use C<Test::More>, but
+instead there is an intentionally simpler library, F<t/test.pl>. However,
+quite a few tests in F<t/> have not been refactored to use it. Refactoring
+any of these tests, one at a time, is a useful thing TODO.
+
+The subdirectories F<base>, F<cmd> and F<comp>, that contain the most
+basic tests, should be excluded from this task.
+
+=head2 Test that regen.pl was run
+
+There are various generated files shipped with the perl distribution, for
+things like header files generated from data. The generation scripts are
+written in perl, and all can be run by F<regen.pl>. However, because they're
+written in perl, we can't run them before we've built perl. We can't run them
+as part of the F<Makefile>, because changing files underneath F<make> confuses
+it completely, and we don't want to run them automatically anyway, as they
+change files shipped by the distribution, something we seek not to do.
+
+If someone changes the data, but forgets to re-run F<regen.pl> then the
+generated files are out of sync. It would be good to have a test in
+F<t/porting> that checks that the generated files are in sync, and fails
+otherwise, to alert someone before they make a poor commit. I suspect that this
+would require adapting the scripts run from F<regen.pl> to have dry-run
+options, and invoking them with these, or by refactoring them into a library
+that does the generation, which can be called by the scripts, and by the test.
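+
+A minimal sketch of the comparison such a test might perform, assuming the
+generation code has been refactored so that it can return its output as a
+string instead of writing it to disk. The name C<in_sync> is hypothetical and
+is not an existing function in F<regen.pl>:
+
+ use strict;
+ use warnings;
+
+ # Hypothetical helper: true if $file on disk matches the freshly
+ # generated text, false if it differs or cannot be read.
+ sub in_sync {
+     my ($file, $generated) = @_;
+     open my $fh, '<', $file or return 0;
+     local $/;                    # slurp mode: read the whole file at once
+     my $on_disk = <$fh>;
+     return defined $on_disk && $on_disk eq $generated;
+ }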
+
+=head2 Automate perldelta generation
+
+The perldelta file accompanying each release summarises the major changes.
+It's mostly manually generated currently, but some of that could be
+automated with a bit of perl, specifically the generation of
+
+=over
+
+=item Modules and Pragmata
+
+=item New Documentation
+
+=item New Tests
+
+=back
+
+See F<Porting/how_to_write_a_perldelta.pod> for details.
+
=head2 Remove duplication of test setup.
Schwern notes, that there's duplication of code - lots and lots of tests have
into a file, change it to build an C<%Is> hash and require it. Maybe just put
it into F<test.pl>. Throw in the handy tainting subroutines.
-=head2 common test code for timed bail out
-
-Write portable self destruct code for tests to stop them burning CPU in
-infinite loops. This needs to avoid using alarm, as some of the tests are
-testing alarm/sleep or timers.
-
=head2 POD -E<gt> HTML conversion in the core still sucks
Which is crazy given just how simple POD purports to be, and how simple HTML
The addition of C<Pod::Simple> and its related modules may make this task
easier to complete.
-=head2 merge checkpods and podchecker
-
-F<pod/checkpods.PL> (and C<make check> in the F<pod/> subdirectory)
-implements a very basic check for pod files, but the errors it discovers
-aren't found by podchecker. Add this check to podchecker, get rid of
-checkpods and have C<make check> use podchecker.
-
-=head2 Parallel testing
-
-(This probably impacts much more than the core: also the Test::Harness
-and TAP::* modules on CPAN.)
+=head2 Make ExtUtils::ParseXS use strict;
-The core regression test suite is getting ever more comprehensive, which has
-the side effect that it takes longer to run. This isn't so good. Investigate
-whether it would be feasible to give the harness script the B<option> of
-running sets of tests in parallel. This would be useful for tests in
-F<t/op/*.t> and F<t/uni/*.t> and maybe some sets of tests in F<lib/>.
+F<lib/ExtUtils/ParseXS.pm> contains this line
-Questions to answer
+ # use strict; # One of these days...
-=over 4
-
-=item 1
-
-How does screen layout work when you're running more than one test?
-
-=item 2
+Simply uncomment it, and fix all the resulting issues :-)
-How does the caller of test specify how many tests to run in parallel?
-
-=item 3
-
-How do setup/teardown tests identify themselves?
-
-=back
-
-Pugs already does parallel testing - can their approach be re-used?
+A more practical approach, which breaks the task down into manageable chunks,
+is to work your way through the code from bottom to top, if necessary adding
+extra C<{ ... }> blocks and turning on strict within them.
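+
+As a rough sketch of that chunked approach, strictures can be confined to a
+bare block while the rest of the file is left alone. The subroutine
+C<trim_line> is hypothetical and is not taken from F<ParseXS.pm>:
+
+ {
+     use strict;    # lexically scoped: only code inside this block is affected
+
+     # Hypothetical helper: under strict every variable must be declared.
+     sub trim_line {
+         my ($line) = @_;
+         $line =~ s/^\s+|\s+$//g;    # strip leading and trailing whitespace
+         return $line;
+     }
+ }
+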
=head2 Make Schwern poorer
=head2 Improve the coverage of the core tests
-Use Devel::Cover to ascertain the core modules's test coverage, then add
+Use Devel::Cover to ascertain the core modules' test coverage, then add
tests that are currently missing.
=head2 test B
A full test suite for the B module would be nice.
-=head2 Deparse inlined constants
-
-Code such as this
-
- use constant PI => 4;
- warn PI
-
-will currently deparse as
-
- use constant ('PI', 4);
- warn 4;
-
-because the tokenizer inlines the value of the constant subroutine C<PI>.
-This allows various compile time optimisations, such as constant folding
-and dead code elimination. Where these haven't happened (such as the example
-above) it ought be possible to make B::Deparse work out the name of the
-original constant, because just enough information survives in the symbol
-table to do this. Specifically, the same scalar is used for the constant in
-the optree as is used for the constant subroutine, so by iterating over all
-symbol tables and generating a mapping of SV address to constant name, it
-would be possible to provide B::Deparse with this functionality.
-
=head2 A decent benchmark
C<perlbench> seems impervious to any recent changes made to the perl core. It
To make a minimal perl distribution, it's useful to look at
F<t/lib/commonsense.t>.
-=head2 Bundle dual life modules in ext/
-
-For maintenance (and branch merging) reasons, it would be useful to move
-some architecture-independent dual-life modules from lib/ to ext/, if this
-has no negative impact on the build of perl itself.
-
-However, we need to make sure that they are still installed in
-architecture-independent directories by C<make install>.
-
-=head2 Improving C<threads::shared>
-
-Investigate whether C<threads::shared> could share aggregates properly with
-only Perl level changes to shared.pm
-
=head2 POSIX memory footprint
Ilya observed that use POSIX; eats memory like there's no tomorrow, and at
There's a similar problem with SelfLoader.
+=head2 profile installman
+
+The F<installman> script is slow. All it is doing is text processing, which we're
+told is something Perl is good at. So it would be nice to know what it is doing
+that is taking so much CPU, and where possible address it.
+
+=head2 enable lexical enabling/disabling of individual warnings
+
+Currently, warnings can only be enabled or disabled by category. There
+are times when it would be useful to quash a single warning, not a
+whole category.
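+
+For comparison, a minimal sketch of the finest control available today, which
+is per category. Here the whole C<uninitialized> category is silenced for one
+block, even though only a single message from it was unwanted:
+
+ use strict;
+ use warnings;
+
+ my $name;                           # deliberately left undefined
+ my $greeting = do {
+     no warnings 'uninitialized';    # disables the entire category here
+     "Hello, $name";                 # would otherwise warn
+ };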
+
=head1 Tasks that need a little sysadmin-type knowledge
Or if you prefer, tasks that you would learn from, and broaden your skills
=item *
do a normal C<Configure>, but include Devel::Cover as a module to install
-(see F<INSTALL> for how to do this)
+(see L<INSTALL> for how to do this)
=item *
F<makedef.pl> to support this format, and to provide a means within
C<Configure> to enable it. This would allow Unix users to test that the
export list is correct, and to build a perl that does not pollute the global
-namespace with private symbols.
+namespace with private symbols, and will fail in the same way as msvc or mingw
+builds or when using PERL_DL_NONLAZY=1.
=head2 Cross-compile support
Make F<pod/roffitall> be updated by F<pod/buildtoc>.
+=head2 Split "linker" from "compiler"
+
+Right now, Configure probes for two commands, and sets two variables:
+
+=over 4
+
+=item * C<cc> (in F<cc.U>)
+
+This variable holds the name of a command to execute a C compiler which
+can resolve multiple global references that happen to have the same
+name. Usual values are F<cc> and F<gcc>.
+Fervent ANSI compilers may be called F<c89>. AIX has F<xlc>.
+
+=item * C<ld> (in F<dlsrc.U>)
+
+This variable indicates the program to be used to link
+libraries for dynamic loading. On some systems, it is F<ld>.
+On ELF systems, it should be C<$cc>. Mostly, we'll try to respect
+the hint file setting.
+
+=back
+
+There is an implicit historical assumption from around Perl5.000alpha
+something, that C<$cc> is also the correct command for linking object files
+together to make an executable. This may be true on Unix, but it's not true
+on other platforms, and there is a maze of workarounds in other places (such
+as F<Makefile.SH>) to cope with this.
+
+Ideally, we should create a new variable to hold the name of the executable
+linker program, probe for it in F<Configure>, and centralise all the special
+case logic there or in hints files.
+
+A small bikeshed issue remains - what to call it, given that C<$ld> is already
+taken (arguably for the wrong thing now, but on SunOS 4.1 it is the command
+for creating dynamically-loadable modules) and C<$link> could be confused with
+the Unix command line executable of the same name, which does something
+completely different. Andy Dougherty makes the counter argument "In parrot, I
+tried to call the command used to link object files and libraries into an
+executable F<link>, since that's what my vaguely-remembered DOS and VMS
+experience suggested. I don't think any real confusion has ensued, so it's
+probably a reasonable name for perl5 to use."
+
+"Alas, I've always worried that introducing it would make things worse,
+since now the module building utilities would have to look for
+C<$Config{link}> and institute a fall-back plan if it weren't found."
+Although I can see that as confusing, given that C<$Config{d_link}> is true
+when (hard) links are available.
+
+=head2 Configure Windows using PowerShell
+
+Currently, Windows uses hard-coded config files to build the
+config.h for compiling Perl. Makefiles are also hard-coded and need to be
+hand edited prior to building Perl. While this makes it easy to create a perl.exe
+that works across multiple Windows versions, being able to accurately
+configure a perl.exe for a specific Windows version and VS C++ would be
+a nice enhancement. With PowerShell available on Windows XP and up, this
+may now be possible. Step 1 might be to investigate whether this is possible
+and use this to clean up our current makefile situation. Step 2 would be to
+see if there would be a way to use our existing metaconfig units to configure a
+Windows Perl or whether we go in a separate direction and make it so. Of
+course, we all know what step 3 is.
+
+=head2 decouple -g and -DDEBUGGING
+
+Currently F<Configure> automatically adds C<-DDEBUGGING> to the C compiler
+flags if it spots C<-g> in the optimiser flags. The pre-processor directive
+C<DEBUGGING> enables F<perl>'s command line C<-D> options, but in the process
+makes F<perl> slower. It would be good to disentangle this logic, so that
+C-level debugging with C<-g> and Perl level debugging with C<-D> can easily
+be enabled independently.
+
=head1 Tasks that need a little C knowledge
These tasks would need a little C knowledge, but don't need any specific
Configure process will build a 32bit perl. Implementing -Duse32bit*
options would be nice for perl 5.12.
-=head2 Make it clear from -v if this is the exact official release
-
-Currently perl from C<p4>/C<rsync> ships with a F<patchlevel.h> file that
-usually defines one local patch, of the form "MAINT12345" or "RC1". The output
-of perl -v doesn't report that a perl isn't an official release, and this
-information can get lost in bugs reports. Because of this, the minor version
-isn't bumped up until RC time, to minimise the possibility of versions of perl
-escaping that believe themselves to be newer than they actually are.
-
-It would be useful to find an elegant way to have the "this is an interim
-maintenance release" or "this is a release candidate" in the terse -v output,
-and have it so that it's easy for the pumpking to remove this just as the
-release tarball is rolled up. This way the version pulled out of rsync would
-always say "I'm a development release" and it would be safe to bump the
-reported minor version as soon as a release ships, which would aid perl
-developers.
-
-This task is really about thinking of an elegant way to arrange the C source
-such that it's trivial for the Pumpking to flag "this is an official release"
-when making a tarball, yet leave the default source saying "I'm not the
-official release".
-
=head2 Profile Perl - am I hot or not?
The Perl source code is stable enough that it makes sense to profile it,
want to determine what ops I<really> are the most commonly used. And in turn
suggest evictions and promotions to achieve a better F<pp_hot.c>.
+One piece of Perl code that might make a good testbed is F<installman>.
+
=head2 Allocate OPs from arenas
Currently all new OP structures are individually malloc()ed and free()d.
C<MAGIC> is allocated/deallocated more often, but in turn, is also something
more externally visible, so changing the rules here may bite external code.
+=head2 Shared arenas
+
+Several SV body structs are now the same size, notably PVMG and PVGV, PVAV and
+PVHV, and PVCV and PVFM. It should be possible to allocate and return same
+sized bodies from the same actual arena, rather than maintaining one arena for
+each. This could save 4-6K per thread, of memory no longer tied up in the
+not-yet-allocated part of an arena.
+
=head1 Tasks that need a knowledge of XS
the perl API that comes from writing modules that use XS to interface to
C.
-=head2 investigate removing int_macro_int from POSIX.xs
-
-As a hang over from the original C<constant> implementation, F<POSIX.xs>
-contains a function C<int_macro_int> which in conjunction with C<AUTOLOAD> is
-used to wrap the C functions C<WEXITSTATUS>, C<WIFEXITED>, C<WIFSIGNALED>,
-C<WIFSTOPPED>, C<WSTOPSIG> and C<WTERMSIG>. It's probably worth replacing
-this complexity with 5 simple direct wrappings of those 5 functions.
-
-However, it would be interesting if someone could measure the memory usage
-before and after, both for the case of C<use POSIX();> and the case of
-actually calling the Perl space functions.
+=head2 Write an XS cookbook
+
+Create pod/perlxscookbook.pod with short, task-focused 'recipes' in XS that
+demonstrate common tasks and good practices. (Some of these might be
+extracted from perlguts.) The target audience should be XS novices, who need
+more examples than perlguts but something less overwhelming than perlapi.
+Recipes should provide "one pretty good way to do it" instead of TIMTOWTDI.
+
+Rather than focusing on interfacing Perl to C libraries, such a cookbook
+should probably focus on how to optimize Perl routines by re-writing them
+in XS. This will likely be more motivating to those who mostly work in
+Perl but are looking to take the next step into XS.
+
+Deconstructing and explaining some simpler XS modules could be one way to
+bootstrap a cookbook. (List::Util? Class::XSAccessor? Tree::Ternary_XS?)
+Another option could be deconstructing the implementation of some simpler
+functions in op.c.
+
+=head2 Allow XSUBs to inline themselves as OPs
+
+For a simple XSUB, often the subroutine dispatch takes more time than the
+XSUB itself. The tokeniser already has the ability to inline constant
+subroutines - it would be good to provide a way to inline other subroutines.
+
+Specifically, the simplest approach looks to be to allow an XSUB to provide an
+alternative implementation of itself as a custom OP. A new flag bit in
+C<CvFLAGS()> would signal to the peephole optimiser to take an optree
+such as this:
+
+ b <@> leave[1 ref] vKP/REFC ->(end)
+ 1 <0> enter ->2
+ 2 <;> nextstate(main 1 -e:1) v:{ ->3
+ a <2> sassign vKS/2 ->b
+ 8 <1> entersub[t2] sKS/TARG,1 ->9
+ - <1> ex-list sK ->8
+ 3 <0> pushmark s ->4
+ 4 <$> const(IV 1) sM ->5
+ 6 <1> rv2av[t1] lKM/1 ->7
+ 5 <$> gv(*a) s ->6
+ - <1> ex-rv2cv sK ->-
+ 7 <$> gv(*x) s/EARLYCV ->8
+ - <1> ex-rv2sv sKRM*/1 ->a
+ 9 <$> gvsv(*b) s ->a
+
+perform the symbol table lookup of C<rv2cv> and C<gv(*x)>, locate the
+pointer to the custom OP that provides the direct implementation, and re-
+write the optree something like:
+
+ b <@> leave[1 ref] vKP/REFC ->(end)
+ 1 <0> enter ->2
+ 2 <;> nextstate(main 1 -e:1) v:{ ->3
+ a <2> sassign vKS/2 ->b
+ 7 <1> custom_x -> 8
+ - <1> ex-list sK ->7
+ 3 <0> pushmark s ->4
+ 4 <$> const(IV 1) sM ->5
+ 6 <1> rv2av[t1] lKM/1 ->7
+ 5 <$> gv(*a) s ->6
+ - <1> ex-rv2cv sK ->-
+ - <$> ex-gv(*x) s/EARLYCV ->7
+ - <1> ex-rv2sv sKRM*/1 ->a
+ 8 <$> gvsv(*b) s ->a
+
+I<i.e.> the C<gv(*x)> OP has been nulled and spliced out of the execution
+path, and the C<entersub> OP has been replaced by the custom op.
+
+This approach should provide a measurable speed up to simple XSUBs inside
+tight loops. Initially one would have to write the OP alternative
+implementation by hand, but it's likely that this should be reasonably
+straightforward for the type of XSUB that would benefit the most. Longer
+term, once the run-time implementation is proven, it should be possible to
+progressively update ExtUtils::ParseXS to generate OP implementations for
+some XSUBs.
+
+=head2 Remove the use of SVs as temporaries in dump.c
+
+F<dump.c> contains debugging routines to dump out the contents of perl data
+structures, such as C<SV>s, C<AV>s and C<HV>s. Currently, the dumping code
+B<uses> C<SV>s for its temporary buffers, which was a logical initial
+implementation choice, as they provide ready made memory handling.
+
+However, they also lead to a lot of confusion when it happens that what you're
+trying to debug is seen by the code in F<dump.c>, correctly or incorrectly, as
+a temporary scalar it can use for a temporary buffer. It's also not possible
+to dump scalars before the interpreter is properly set up, such as during
+ithreads cloning. It would be good to progressively replace the use of scalars
+as string accumulation buffers with something much simpler, directly allocated
+by C<malloc>. The F<dump.c> code is (or should be) only producing 7 bit
+US-ASCII, so output character sets are not an issue.
+
+Producing and proving an internal simple buffer allocation would make it easier
+to re-write the internals of the PerlIO subsystem to avoid using C<SV>s for
+B<its> buffers, use of which can cause problems similar to those of F<dump.c>,
+at similar times.
=head2 safely supporting POSIX SA_SIGINFO
Currently glob patterns and filenames returned from File::Glob::glob()
are always byte strings. See L</"Virtualize operating system access">.
-=head2 Unicode and lc/uc operators
-
-Some built-in operators (C<lc>, C<uc>, etc.) behave differently, based on
-what the internal encoding of their argument is. That should not be the
-case. Maybe add a pragma to switch behaviour.
-
=head2 use less 'memory'
Investigate trade offs to switch out perl's choices on memory usage.
These tasks would need C knowledge, and knowledge of how the interpreter works,
or a willingness to learn.
+=head2 forbid labels with keyword names
+
+Currently C<goto keyword> "computes" the label value:
+
+ $ perl -e 'goto print'
+ Can't find label 1 at -e line 1.
+
+It is controversial if the right way to avoid the confusion is to forbid
+labels with keyword names, or if it would be better to always treat
+bareword expressions after a "goto" as a label and never as a keyword.
+
+=head2 truncate() prototype
+
+The prototype of truncate() is currently C<$$>. It should probably
+be C<*$> instead. (This is changed in F<opcode.pl>)
+
+=head2 decapsulation of smart match argument
+
+Currently C<$foo ~~ $object> will die with the message "Smart matching a
+non-overloaded object breaks encapsulation". It would be nice to allow
+this to be bypassed by explicitly using the syntax C<$foo ~~ %$object> or
+C<$foo ~~ @$object>.
+
+=head2 error reporting of [$a ; $b]
+
+Using C<;> inside brackets is a syntax error, and we don't propose to change
+that by giving it any meaning. However, it's not reported very helpfully:
+
+ $ perl -e '$a = [$b; $c];'
+ syntax error at -e line 1, near "$b;"
+ syntax error at -e line 1, near "$c]"
+ Execution of -e aborted due to compilation errors.
+
+It should be possible to hook into the tokeniser or the lexer, so that when a
+C<;> is parsed where it is not legal as a statement terminator (I<i.e.> inside
+C<{}> used as a hashref, C<[]> or C<()>) it issues an error something like
+I<';' isn't legal inside an expression - if you need multiple statements use a
+do {...} block>. See the thread starting at
+http://www.xray.mpe.mpg.de/mailing-lists/perl5-porters/2008-09/msg00573.html
+
=head2 lexicals used only once
This warns:
The handling of Unicode is unclean in many places. For example, the regexp
engine matches in Unicode semantics whenever the string or the pattern is
flagged as UTF-8, but that should not be dependent on an internal storage
-detail of the string. Likewise, case folding behaviour is dependent on the
-UTF8 internal flag being on or off.
+detail of the string.
=head2 Properly Unicode safe tokeniser and pads.
There is no method on tied filehandles to allow them to be called back by
formats.
+=head2 Propagate compilation hints to the debugger
+
+Currently a debugger started with -dE on the command-line doesn't see the
+features enabled by -E. More generally hints (C<$^H> and C<%^H>) aren't
+propagated to the debugger. Probably it would be a good thing to propagate
+hints from the innermost non-C<DB::> scope: this would make code eval'ed
+in the debugger see the features (and strictures, etc.) currently in
+scope.
+
=head2 Attach/detach debugger from running program
The old perltodo notes "With C<gdb>, you can attach the debugger to a running
debugger on a running Perl program, although I'm not sure how it would be
done." ssh and screen do this with named pipes in /tmp. Maybe we can too.
-=head2 Optimize away empty destructors
-
-Defining an empty DESTROY method might be useful (notably in
-AUTOLOAD-enabled classes), but it's still a bit expensive to call. That
-could probably be optimized.
-
=head2 LVALUE functions for lists
The old perltodo notes that lvalue functions don't work for list or hash
slices. This would be good to fix.
-=head2 LVALUE functions in the debugger
-
-The old perltodo notes that lvalue functions don't work in the debugger. This
-would be good to fix.
-
=head2 regexp optimiser optional
The regexp optimiser is not optional. It should be configurable, to allow
its performance to be measured, and its bugs to be easily demonstrated.
-=head2 delete &function
-
-Allow to delete functions. One can already undef them, but they're still
-in the stash.
-
=head2 C</w> regex modifier
That flag would enable to match whole words, and also to interpolate
This has actually already been implemented (but only for Win32),
take a look at F<iperlsys.h> and F<win32/perlhost.h>. While all Win32
variants go through a set of "vtables" for operating system access,
-non-Win32 systems currently go straight for the POSIX/UNIX-style
+non-Win32 systems currently go straight for the POSIX/Unix-style
system/library call. Similar system as for Win32 should be
implemented for all platforms. The existing Win32 implementation
probably does not need to survive alongside this proposed new
=head2 Investigate PADTMP hash pessimisation
-The peephole optimier converts constants used for hash key lookups to shared
+The peephole optimiser converts constants used for hash key lookups to shared
hash key scalars. Under ithreads, something is undoing this work.
See http://www.xray.mpe.mpg.de/mailing-lists/perl5-porters/2007-09/msg00793.html
be done 1st in XS, and using B::Generate to patch the new OP into the
optrees.
+=head2 Add C<0oddddd>
+
+It has been proposed that octal constants be specifiable through the syntax
+C<0oddddd>, parallel to the existing construct to specify hex constants
+C<0xddddd>.
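+
+For reference, a short sketch of the forms that already exist and that the
+proposed C<0o> prefix would sit alongside (the variable names are only
+illustrative):
+
+ my $hex   = 0x1ff;          # hexadecimal literal
+ my $octal = 0777;           # octal literal, marked only by the leading zero
+ my $str   = oct("0777");    # oct() parses an octal string at run time
+ print "all equal\n" if $hex == $octal && $octal == $str;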
+
=head1 Big projects
Tasks that will get your name mentioned in the description of the "Highlights
This task is incremental - even a little bit of work on it will help, and
will be greatly appreciated.
-One bit would be to write the missing code in sv.c:Perl_dirp_dup.
+One bit would be to determine how to clone directory handles on systems
+without a C<fchdir> function (in sv.c:Perl_dirp_dup).
Fix Perl_sv_dup, et al so that threads can return objects.
Fix (or rewrite) the implementation of the C</(?{...})/> closures.
-=head2 A re-entrant regexp engine
-
-This will allow the use of a regex from inside (?{ }), (??{ }) and
-(?(?{ })|) constructs.
-
=head2 Add class set operations to regexp engine
Apparently these are quite useful. Anyway, Jeffrey Friedl wants them.
demerphq has this on his todo list, but right at the bottom.
+
+
+=head1 Tasks for microperl
+
+
+[ Each and every one of these may be obsolete, but they were listed
+ in the old Todo.micro file]
+
+
+=head2 make creating uconfig.sh automatic
+
+=head2 make creating Makefile.micro automatic
+
+=head2 do away with fork/exec/wait?
+
+(system, popen should be enough?)
+
+=head2 some of the uconfig.sh really needs to be probed (using cc) in buildtime:
+
+(uConfigure? :-) native datatype widths and endianness come to mind
+