perl5.git.perl.org Git - perl5.git/blame_incremental

Commit	Line	Data
	1	=head1 NAME
	2
	3	perltodo - Perl TO-DO List
	4
	5	=head1 DESCRIPTION
	6
	7	This is a list of wishes for Perl. The tasks we think are smaller or
	8	easier are listed first. Anyone is welcome to work on any of these,
	9	but it's a good idea to first contact I<perl5-porters@perl.org> to
	10	avoid duplication of effort, and to learn from any previous attempts.
	11	By all means contact a pumpking privately first if you prefer.
	12
	13	Whilst patches to make the list shorter are most welcome, ideas to add to
	14	the list are also encouraged. Check the perl5-porters archives for past
	15	ideas, and any discussion about them. One set of archives may be found at:
	16
	17	http://www.xray.mpe.mpg.de/mailing-lists/perl5-porters/
	18
	19	What can we offer you in return? Fame, fortune, and everlasting glory? Maybe
	20	not, but if your patch is incorporated, then we'll add your name to the
	21	F<AUTHORS> file, which ships in the official distribution. How many other
	22	programming languages offer you 1 line of immortality?
	23
	24	=head1 Tasks that only need Perl knowledge
	25
	26	=head2 Remove duplication of test setup.
	27
	28	Schwern notes that there's duplication of code - lots and lots of tests have
	29	some variation on the big block of C<$Is_Foo> checks. We can safely put this
	30	into a file, change it to build an C<%Is> hash and require it. Maybe just put
	31	it into F<test.pl>. Throw in the handy tainting subroutines.
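
A minimal sketch of what such a shared helper might look like. The name
C<build_is_hash> and the exact set of keys are assumptions made for
illustration, not an agreed interface for F<test.pl>:

```perl
use strict;
use warnings;

# Hypothetical helper for t/test.pl: build the platform hash once so that
# individual tests no longer repeat their own block of $Is_Foo checks.
# The key names are illustrative, not an agreed list.
sub build_is_hash {
    my ($osname) = @_;
    $osname = $^O unless defined $osname;

    my %Is = map { $_ => ($osname eq $_ ? 1 : 0) }
        qw(VMS MSWin32 NetWare os2 dos cygwin MacOS);

    # A derived flag of the kind many tests currently compute by hand.
    $Is{Dosish} = ($Is{MSWin32} || $Is{NetWare} || $Is{os2} || $Is{dos}) ? 1 : 0;

    return %Is;
}

our %Is = build_is_hash();
print "# running on $^O", ($Is{Dosish} ? " (DOS-ish)" : ""), "\n";
```

A test would then write C<skip(...) if $Is{VMS}> rather than declaring its
own C<$Is_VMS>.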
	32
	33	=head2 common test code for timed bail out
	34
	35	Write portable self destruct code for tests to stop them burning CPU in
	36	infinite loops. This needs to avoid using alarm, as some of the tests are
	37	testing alarm/sleep or timers.
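
One possible shape for this, sketched under the assumption that C<fork> is
available (so it is not yet the fully portable answer the task asks for).
The names C<start_watchdog> and C<stop_watchdog> are hypothetical:

```perl
use strict;
use warnings;
use POSIX ();

# Hypothetical helper: start a watchdog process that kills this test if it
# is still running after $timeout seconds. Returns the watchdog's PID, or
# undef if fork() is unavailable or fails.
sub start_watchdog {
    my ($timeout) = @_;
    my $parent = $$;

    my $pid = eval { fork() };
    return undef unless defined $pid;

    if ($pid == 0) {
        # Child: wait with 4-argument select(), so neither alarm() nor
        # sleep() is involved, even in the watchdog itself.
        select(undef, undef, undef, $timeout);
        # Only shoot if our parent is still the test we were guarding.
        kill 'KILL', $parent if getppid() == $parent;
        POSIX::_exit(0);
    }
    return $pid;
}

# Cancel the watchdog once the test has finished normally.
sub stop_watchdog {
    my ($pid) = @_;
    return unless defined $pid;
    kill 'KILL', $pid;
    waitpid($pid, 0);
    return;
}

my $dog = start_watchdog(300);
# ... the body of the test would go here ...
stop_watchdog($dog);
```

Because the timer lives in a separate process, the test itself remains free
to exercise C<alarm>, C<sleep> and timers.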
	38
	39	=head2 POD -E<gt> HTML conversion in the core still sucks
	40
	41	Which is crazy given just how simple POD purports to be, and how simple HTML
	42	can be. It's not actually I<as> simple as it sounds, particularly with the
	43	flexibility POD allows for C<=item>, but it would be good to improve the
	44	visual appeal of the HTML generated, and to avoid it having any validation
	45	errors. See also L</make HTML install work>, as the layout of the installation
	46	tree is needed to improve the cross-linking.
	47
	48	The addition of C<Pod::Simple> and its related modules may make this task
	49	easier to complete.
	50
	51	=head2 merge checkpods and podchecker
	52
	53	F<pod/checkpods.PL> (and C<make check> in the F<pod/> subdirectory)
	54	implements a very basic check for pod files, but the errors it discovers
	55	aren't found by podchecker. Add this check to podchecker, get rid of
	56	checkpods and have C<make check> use podchecker.
	57
	58	=head2 Parallel testing
	59
	60	(This probably impacts much more than the core: also the Test::Harness
	61	and TAP::* modules on CPAN.)
	62
	63	The core regression test suite is getting ever more comprehensive, which has
	64	the side effect that it takes longer to run. This isn't so good. Investigate
	65	whether it would be feasible to give the harness script the B<option> of
	66	running sets of tests in parallel. This would be useful for tests in
	67	F<t/op/*.t> and F<t/uni/*.t> and maybe some sets of tests in F<lib/>.
	68
	69	Questions to answer
	70
	71	=over 4
	72
	73	=item 1
	74
	75	How does screen layout work when you're running more than one test?
	76
	77	=item 2
	78
	79	How does the caller of the tests specify how many tests to run in parallel?
	80
	81	=item 3
	82
	83	How do setup/teardown tests identify themselves?
	84
	85	=back
	86
	87	Pugs already does parallel testing - can their approach be re-used?
	88
	89	=head2 Make Schwern poorer
	90
	91	We should have tests for everything. When all the core's modules are tested,
	92	Schwern has promised to donate $500 to TPF. We may need volunteers to
	93	hold him upside down and shake vigorously in order to actually extract the
	94	cash.
	95
	96	=head2 Improve the coverage of the core tests
	97
	98	Use Devel::Cover to ascertain the core modules' test coverage, then add
	99	tests that are currently missing.
	100
	101	=head2 test B
	102
	103	A full test suite for the B module would be nice.
	104
	105	=head2 Deparse inlined constants
	106
	107	Code such as this
	108
	109	    use constant PI => 4;
	110	    warn PI
	111
	112	will currently deparse as
	113
	114	    use constant ('PI', 4);
	115	    warn 4;
	116
	117	because the tokenizer inlines the value of the constant subroutine C<PI>.
	118	This allows various compile time optimisations, such as constant folding
	119	and dead code elimination. Where these haven't happened (such as the example
	120	above) it ought to be possible to make B::Deparse work out the name of the
	121	original constant, because just enough information survives in the symbol
	122	table to do this. Specifically, the same scalar is used for the constant in
	123	the optree as is used for the constant subroutine, so by iterating over all
	124	symbol tables and generating a mapping of SV address to constant name, it
	125	would be possible to provide B::Deparse with this functionality.
	126
	127	=head2 A decent benchmark
	128
	129	C<perlbench> seems impervious to any recent changes made to the perl core. It
	130	would be useful to have a reasonable general benchmarking suite that roughly
	131	represented what current perl programs do, and measurably reported whether
	132	tweaks to the core improve, degrade or don't really affect performance, to
	133	guide people attempting to optimise the guts of perl. Gisle would welcome
	134	new tests for perlbench.
	135
	136	=head2 fix tainting bugs
	137
	138	Fix the bugs revealed by running the test suite with the C<-t> switch (via
	139	C<make test.taintwarn>).
	140
	141	=head2 Dual life everything
	142
	143	As part of the "dists" plan, anything that doesn't belong in the smallest perl
	144	distribution needs to be dual lifed. Anything else can be too. Figure out what
	145	changes would be needed to package that module and its tests up for CPAN, and
	146	do so. Test it with older perl releases, and fix the problems you find.
	147
	148	To make a minimal perl distribution, it's useful to look at
	149	F<t/lib/commonsense.t>.
	150
	151	=head2 Bundle dual life modules in ext/
	152
	153	For maintenance (and branch merging) reasons, it would be useful to move
	154	some architecture-independent dual-life modules from lib/ to ext/, if this
	155	has no negative impact on the build of perl itself.
	156
	157	However, we need to make sure that they are still installed in
	158	architecture-independent directories by C<make install>.
	159
	160	=head2 Improving C<threads::shared>
	161
	162	Investigate whether C<threads::shared> could share aggregates properly with
	163	only Perl level changes to F<shared.pm>.
	164
	165	=head2 POSIX memory footprint
	166
	167	Ilya observed that C<use POSIX;> eats memory like there's no tomorrow, and at
	168	various times worked to cut it down. There is probably still fat to cut out -
	169	for example POSIX passes Exporter some very memory hungry data structures.
	170
	171	=head2 embed.pl/makedef.pl
	172
	173	There is a script F<embed.pl> that generates several header files to prefix
	174	all of Perl's symbols in a consistent way, to provide some semblance of
	175	namespace support in C<C>. Functions are declared in F<embed.fnc>, variables
	176	in F<interpvar.h>. Quite a few of the functions and variables
	177	are conditionally declared there, using C<#ifdef>. However, F<embed.pl>
	178	doesn't understand the C macros, so the rules about which symbols are present
	179	when are duplicated in F<makedef.pl>. Writing things twice is bad, m'kay.
	180	It would be good to teach F<embed.pl> to understand the conditional
	181	compilation, and hence remove the duplication, and the mistakes it has caused.
	182
	183	=head2 use strict; and AutoLoad
	184
	185	Currently if you write
	186
	187	    package Whack;
	188	    use AutoLoader 'AUTOLOAD';
	189	    use strict;
	190	    1;
	191	    __END__
	192	    sub bloop {
	193	        print join (' ', No, strict, here), "!\n";
	194	    }
	195
	196	then C<use strict;> isn't in force within the autoloaded subroutines. It would
	197	be more consistent (and less surprising) to arrange for all lexical pragmas
	198	in force at the __END__ block to be in force within each autoloaded subroutine.
	199
	200	There's a similar problem with SelfLoader.
	201
	202	=head2 profile installman
	203
	204	The F<installman> script is slow. All it is doing is text processing, which we're
	205	told is something Perl is good at. So it would be nice to know what it is doing
	206	that is taking so much CPU, and where possible address it.
	207
	208
	209	=head1 Tasks that need a little sysadmin-type knowledge
	210
	211	Or if you prefer, tasks that you would learn from, and broaden your skills
	212	base...
	213
	214	=head2 make HTML install work
	215
	216	There is an C<installhtml> target in the Makefile. It's marked as
	217	"experimental". It would be good to get this tested, make it work reliably, and
	218	remove the "experimental" tag. This would include
	219
	220	=over 4
	221
	222	=item 1
	223
	224	Checking that cross linking between various parts of the documentation works.
	225	In particular that links work between the modules (files with POD in F<lib/>)
	226	and the core documentation (files in F<pod/>).
	227
	228	=item 2
	229
	230	Work out how to split C<perlfunc> into chunks, preferably one per function
	231	group, preferably with general case code that could be used elsewhere.
	232	Challenges here are correctly identifying the groups of functions that go
	233	together, and making the right named external cross-links point to the right
	234	page. Things to be aware of are C<-X>, groups such as C<getpwnam> to
	235	C<endservent>, two or more C<=items> giving the different parameter lists, such
	236	as
	237
	238	    =item substr EXPR,OFFSET,LENGTH,REPLACEMENT
	239	    =item substr EXPR,OFFSET,LENGTH
	240	    =item substr EXPR,OFFSET
	241
	242	and different parameter lists having different meanings (e.g. C<select>).
	243
	244	=back
	245
	246	=head2 compressed man pages
	247
	248	Be able to install them. This would probably need a configure test to see how
	249	the system does compressed man pages (same directory/different directory?
	250	same filename/different filename), as well as tweaking the F<installman> script
	251	to compress as necessary.
	252
	253	=head2 Add a code coverage target to the Makefile
	254
	255	Make it easy for anyone to run Devel::Cover on the core's tests. The steps
	256	to do this manually are roughly
	257
	258	=over 4
	259
	260	=item *
	261
	262	do a normal C<Configure>, but include Devel::Cover as a module to install
	263	(see F<INSTALL> for how to do this)
	264
	265	=item *
	266
	267	make perl
	268
	269	=item *
	270
	271	cd t; HARNESS_PERL_SWITCHES=-MDevel::Cover ./perl -I../lib harness
	272
	273	=item *
	274
	275	Process the resulting Devel::Cover database
	276
	277	=back
	278
	279	This just gives you the coverage of the F<.pm>s. To also get the C level
	280	coverage you need to
	281
	282	=over 4
	283
	284	=item *
	285
	286	Additionally tell C<Configure> to use the appropriate C compiler flags for
	287	C<gcov>
	288
	289	=item *
	290
	291	make perl.gcov
	292
	293	(instead of C<make perl>)
	294
	295	=item *
	296
	297	After running the tests run C<gcov> to generate all the F<.gcov> files.
	298	(Including down in the subdirectories of F<ext/>.)
	299
	300	=item *
	301
	302	(From the top level perl directory) run C<gcov2perl> on all the C<.gcov> files
	303	to get their stats into the cover_db directory.
	304
	305	=item *
	306
	307	Then process the Devel::Cover database
	308
	309	=back
	310
	311	It would be good to add a single switch to C<Configure> to specify that you
	312	wanted to perform perl level coverage, and another to specify C level
	313	coverage, and have C<Configure> and the F<Makefile> do all the right things
	314	automatically.
	315
	316	=head2 Make Config.pm cope with differences between built and installed perl
	317
	318	Quite often vendors ship a perl binary compiled with their (pay-for)
	319	compilers. People install a free compiler, such as gcc. To work out how to
	320	build extensions, Perl interrogates C<%Config>, so in this situation
	321	C<%Config> describes compilers that aren't there, and extension building
	322	fails. This forces people into choosing between re-compiling perl themselves
	323	using the compiler they have, or only using modules that the vendor ships.
	324
	325	It would be good to find a way to teach C<Config.pm> about the installation setup,
	326	possibly involving probing at install time or later, so that the C<%Config> in
	327	a binary distribution better describes the installed machine, when the
	328	installed machine differs from the build machine in some significant way.
	329
	330	=head2 linker specification files
	331
	332	Some platforms mandate that you provide a list of a shared library's external
	333	symbols to the linker, so the core already has the infrastructure in place to
	334	do this for generating shared perl libraries. My understanding is that the
	335	GNU toolchain can accept an optional linker specification file, and restrict
	336	visibility just to symbols declared in that file. It would be good to extend
	337	F<makedef.pl> to support this format, and to provide a means within
	338	C<Configure> to enable it. This would allow Unix users to test that the
	339	export list is correct, and to build a perl that does not pollute the global
	340	namespace with private symbols.
	341
	342	=head2 Cross-compile support
	343
	344	Currently C<Configure> understands the C<-Dusecrosscompile> option. This option
	345	arranges for C<miniperl> to be built for the TARGET machine, on the assumption
	346	that this C<miniperl> will then be copied to the TARGET machine and used in
	347	place of the full C<perl> executable.
	348
	349	This could be done a little differently. Namely C<miniperl> should be built for
	350	HOST and then full C<perl> with extensions should be compiled for TARGET.
	351	This, however, might require extra trickery for %Config: we have one config
	352	first for HOST and then another for TARGET. Tools like MakeMaker will be
	353	mightily confused. Having around two different types of executables and
	354	libraries (HOST and TARGET) makes life interesting for Makefiles and
	355	shell (and Perl) scripts. There is $Config{run}, normally empty, which
	356	can be used as an execution wrapper. Also note that in some
	357	cross-compilation/execution environments the HOST and the TARGET do
	358	not see the same filesystem(s), so $Config{run} may need to do some
	359	file/directory copying back and forth.
	360
	361	=head2 roffitall
	362
	363	Make F<pod/roffitall> be updated by F<pod/buildtoc>.
	364
	365	=head1 Tasks that need a little C knowledge
	366
	367	These tasks would need a little C knowledge, but don't need any specific
	368	background or experience with XS, or how the Perl interpreter works.
	369
	370	=head2 Weed out needless PERL_UNUSED_ARG
	371
	372	The C code uses the macro C<PERL_UNUSED_ARG> to stop compilers warning about
	373	unused arguments. Often the arguments can't be removed, as there is an
	374	external constraint that determines the prototype of the function, so this
	375	approach is valid. However, there are some cases where C<PERL_UNUSED_ARG>
	376	could be removed. Specifically
	377
	378	=over 4
	379
	380	=item *
	381
	382	The prototypes of (nearly all) static functions can be changed
	383
	384	=item *
	385
	386	Unused arguments generated by short cut macros are wasteful - the short cut
	387	macro used can be changed.
	388
	389	=back
	390
	391	=head2 Modernize the order of directories in @INC
	392
	393	The way @INC is laid out by default, one cannot upgrade core (dual-life)
	394	modules without overwriting files. This causes problems for binary
	395	package builders. One possible proposal is laid out in this
	396	message:
	397	L<http://www.xray.mpe.mpg.de/mailing-lists/perl5-porters/2002-04/msg02380.html>.
	398
	399	=head2 -Duse32bit*
	400
	401	Natively 64-bit systems need neither -Duse64bitint nor -Duse64bitall.
	402	On these systems, it might be the default compilation mode, and there
	403	is currently no guarantee that passing no use64bitall option to the
	404	Configure process will build a 32bit perl. Implementing -Duse32bit*
	405	options would be nice for perl 5.12.
	406
	407	=head2 Make it clear from -v if this is the exact official release
	408
	409	Currently perl from C<p4>/C<rsync> ships with a F<patchlevel.h> file that
	410	usually defines one local patch, of the form "MAINT12345" or "RC1". The output
	411	of perl -v doesn't report that a perl isn't an official release, and this
	412	information can get lost in bug reports. Because of this, the minor version
	413	isn't bumped up until RC time, to minimise the possibility of versions of perl
	414	escaping that believe themselves to be newer than they actually are.
	415
	416	It would be useful to find an elegant way to have the "this is an interim
	417	maintenance release" or "this is a release candidate" in the terse -v output,
	418	and have it so that it's easy for the pumpking to remove this just as the
	419	release tarball is rolled up. This way the version pulled out of rsync would
	420	always say "I'm a development release" and it would be safe to bump the
	421	reported minor version as soon as a release ships, which would aid perl
	422	developers.
	423
	424	This task is really about thinking of an elegant way to arrange the C source
	425	such that it's trivial for the Pumpking to flag "this is an official release"
	426	when making a tarball, yet leave the default source saying "I'm not the
	427	official release".
	428
	429	=head2 Profile Perl - am I hot or not?
	430
	431	The Perl source code is stable enough that it makes sense to profile it,
	432	identify and optimise the hotspots. It would be good to measure the
	433	performance of the Perl interpreter using free tools such as cachegrind,
	434	gprof, and dtrace, and work to reduce the bottlenecks they reveal.
	435
	436	As part of this, the idea of F<pp_hot.c> is that it contains the I<hot> ops,
	437	the ops that are most commonly used. The idea is that by grouping them, their
	438	object code will be adjacent in the executable, so they have a greater chance
	439	of already being in the CPU cache (or swapped in) due to being near another op
	440	already in use.
	441
	442	Except that it's not clear if these really are the most commonly used ops. So
	443	as part of exercising your skills with coverage and profiling tools you might
	444	want to determine what ops I<really> are the most commonly used. And in turn
	445	suggest evictions and promotions to achieve a better F<pp_hot.c>.
	446
	447	One piece of Perl code that might make a good testbed is F<installman>.
	448
	449	=head2 Allocate OPs from arenas
	450
	451	Currently all new OP structures are individually malloc()ed and free()d.
	452	All C<malloc> implementations have space overheads, and are not as fast as
	453	custom allocators, so it would use both less memory and less CPU to allocate
	454	the various OP structures from arenas. The SV arena code can probably be
	455	re-used for this.
	456
	457	Note that Configuring perl with C<-Accflags=-DPL_OP_SLAB_ALLOC> will use
	458	Perl_Slab_alloc() to pack optrees into a contiguous block, which is
	459	probably superior to the use of OP arenas, esp. from a cache locality
	460	standpoint. See L</Profile Perl - am I hot or not?>.
	461
	462	=head2 Improve win32/wince.c
	463
	464	Currently, numerous functions look virtually, if not completely,
	465	identical in both C<win32/wince.c> and C<win32/win32.c> files, which can't
	466	be good.
	467
	468	=head2 Use secure CRT functions when building with VC8 on Win32
	469
	470	Visual C++ 2005 (VC++ 8.x) deprecated a number of CRT functions on the basis
	471	that they were "unsafe" and introduced differently named secure versions of
	472	them as replacements, e.g. instead of writing
	473
	474	    FILE* f = fopen(__FILE__, "r");
	475
	476	one should now write
	477
	478	    FILE* f;
	479	    errno_t err = fopen_s(&f, __FILE__, "r");
	480
	481	Currently, the warnings about these deprecations have been disabled by adding
	482	-D_CRT_SECURE_NO_DEPRECATE to the CFLAGS. It would be nice to remove that
	483	warning suppressant and actually make use of the new secure CRT functions.
	484
	485	There is also a similar issue with POSIX CRT function names like fileno having
	486	been deprecated in favour of ISO C++ conformant names like _fileno. These
	487	warnings are also currently suppressed by adding -D_CRT_NONSTDC_NO_DEPRECATE. It
	488	might be nice to do as Microsoft suggest here too, although, unlike the secure
	489	functions issue, there is presumably little or no benefit in this case.
	490
	491	=head2 Fix POSIX::access() and chdir() on Win32
	492
	493	These functions currently take no account of DACLs and therefore do not behave
	494	correctly in situations where access is restricted by DACLs (as opposed to the
	495	read-only attribute).
	496
	497	Furthermore, POSIX::access() behaves differently for directories having the
	498	read-only attribute set depending on what CRT library is being used. For
	499	example, the _access() function in the VC6 and VC7 CRTs (wrongly) claims that
	500	such directories are not writable, whereas in fact all directories are writable
	501	unless access is denied by DACLs. (In the case of directories, the read-only
	502	attribute actually only means that the directory cannot be deleted.) This CRT
	503	bug is fixed in the VC8 and VC9 CRTs (but, of course, the directory may still
	504	not actually be writable if access is indeed denied by DACLs).
	505
	506	For the chdir() issue, see ActiveState bug #74552:
	507	http://bugs.activestate.com/show_bug.cgi?id=74552
	508
	509	Therefore, DACLs should be checked both for consistency across CRTs and for
	510	the correct answer.
	511
	512	(Note that perl's -w operator should not be modified to check DACLs. It has
	513	been written so that it reflects the state of the read-only attribute, even
	514	for directories (whatever CRT is being used), for symmetry with chmod().)
	515
	516	=head2 strcat(), strcpy(), strncat(), strncpy(), sprintf(), vsprintf()
	517
	518	Maybe create a utility that checks after each libperl.a creation that
	519	none of the above (nor, SHUDDER, gets()) ever creeps back into
	520	libperl.a.
	521
	522	    nm libperl.a | ./miniperl -alne '$o = $F[0] if /:$/; print "$o $F[1]" if $F[0] eq "U" && $F[1] =~ /^(?:strn?c(?:at|py)|v?sprintf|gets)$/'
	523
	524	Note, of course, that this will only tell whether B<your> platform
	525	is using those naughty interfaces.
	526
	527	=head2 -D_FORTIFY_SOURCE=2, -fstack-protector
	528
	529	Recent glibcs support C<-D_FORTIFY_SOURCE=2> and recent gcc
	530	(4.1 onwards?) supports C<-fstack-protector>, both of which give
	531	protection against various kinds of buffer overflow problems.
	532	These should probably be used for compiling Perl whenever available.
	533	Configure and/or hints files should be adjusted to probe for the
	534	availability of these features and enable them as appropriate.
	535
	536	=head2 Arenas for GPs? For MAGIC?
	537
	538	C<struct gp> and C<struct magic> are both currently allocated by C<malloc>.
	539	It might be a speed or memory saving to change to using arenas. Or it might
	540	not. It would need some suitable benchmarking first. In particular, C<GP>s
	541	can probably be changed with minimal compatibility impact (probably nothing
	542	outside of the core, or even outside of F<gv.c> allocates them), but they
	543	probably aren't allocated/deallocated often enough for a speed saving. Whereas
	544	C<MAGIC> is allocated/deallocated more often, but in turn, is also something
	545	more externally visible, so changing the rules here may bite external code.
	546
	547
	548	=head1 Tasks that need a knowledge of XS
	549
	550	These tasks would need C knowledge, and roughly the level of knowledge of
	551	the perl API that comes from writing modules that use XS to interface to
	552	C.
	553
	554	=head2 investigate removing int_macro_int from POSIX.xs
	555
	556	As a hangover from the original C<constant> implementation, F<POSIX.xs>
	557	contains a function C<int_macro_int> which in conjunction with C<AUTOLOAD> is
	558	used to wrap the C functions C<WEXITSTATUS>, C<WIFEXITED>, C<WIFSIGNALED>,
	559	C<WIFSTOPPED>, C<WSTOPSIG> and C<WTERMSIG>. It's probably worth replacing
	560	this complexity with 6 simple direct wrappings of those 6 functions.
	561
	562	However, it would be interesting if someone could measure the memory usage
	563	before and after, both for the case of C<use POSIX();> and the case of
	564	actually calling the Perl space functions.
	565
	566	=head2 safely supporting POSIX SA_SIGINFO
	567
	568	Some years ago Jarkko supplied patches to provide support for the POSIX
	569	SA_SIGINFO feature in Perl, passing the extra data to the Perl signal handler.
	570
	571	Unfortunately, it only works with "unsafe" signals, because under safe
	572	signals, by the time Perl gets to run the signal handler, the extra
	573	information has been lost. Moreover, it's not easy to store it somewhere,
	574	as you can't lock mutexes, or do anything else fancy, from inside a signal
	575	handler.
	576
	577	So it strikes me that we could provide safe SA_SIGINFO support as follows:
	578
	579	=over 4
	580
	581	=item 1
	582
	583	Provide global variables for two file descriptors
	584
	585	=item 2
	586
	587	When the first request is made via C<sigaction> for C<SA_SIGINFO>, create a
	588	pipe, store the reader in one, the writer in the other
	589
	590	=item 3
	591
	592	In the "safe" signal handler (C<Perl_csighandler()>/C<S_raise_signal()>), if
	593	the C<siginfo_t> pointer is non-C<NULL>, and the writer file handle is open,
	594
	595	=over 8
	596
	597	=item 1
	598
	599	serialise signal number, C<siginfo_t> (or at least the parts we care
	600	about) into a small automatic C<char> buffer
	601
	602	=item 2
	603
	604	C<write()> that (non-blocking) to the writer fd
	605
	606	=over 12
	607
	608	=item 1
	609
	610	if it writes 100%, flag the signal in a counter of "signals on the pipe" akin
	611	to the current per-signal-number counts
	612
	613	=item 2
	614
	615	if it writes 0%, assume the pipe is full. Flag the data as lost?
	616
	617	=item 3
	618
	619	if it writes partially, croak a panic, as your OS is broken.
	620
	621	=back
	622
	623	=back
	624
	625	=item 4
	626
	627	in the regular C<PERL_ASYNC_CHECK()> processing, if there are "signals on
	628	the pipe", read the data out, deserialise, build the Perl structures on
	629	the stack (code in C<Perl_sighandler()>, the "unsafe" handler), and call as
	630	usual.
	631
	632	=back
	633
	634	I think that this gets us decent C<SA_SIGINFO> support, without the current risk
	635	of running Perl code inside the signal handler context. (With all the dangers
	636	of things like C<malloc> corruption that that currently offers us.)
	637
	638	For more information see the thread starting with this message:
	639	http://www.xray.mpe.mpg.de/mailing-lists/perl5-porters/2008-03/msg00305.html
	640
	641	=head2 autovivification
	642
	643	Make all autovivification consistent w.r.t. LVALUE/RVALUE and strict/no strict.
	644
	645	This task is incremental - even a little bit of work on it will help.
	646
	647	=head2 Unicode in Filenames
	648
	649	chdir, chmod, chown, chroot, exec, glob, link, lstat, mkdir, open,
	650	opendir, qx, readdir, readlink, rename, rmdir, stat, symlink, sysopen,
	651	system, truncate, unlink, utime, -X. All these could potentially accept
	652	Unicode filenames either as input or output (and in the case of system
	653	and qx Unicode in general, as input or output to/from the shell).
	654	Whether a given filesystem and operating system pair understands Unicode in
	655	filenames varies.
	656
	657	Known combinations that have some level of understanding include
	658	Microsoft NTFS, Apple HFS+ (in Mac OS 9 and X), Apple UFS (in Mac
	659	OS X) and of course Plan 9; NFS v4 is rumored to be Unicode. How to
	660	create Unicode filenames, what forms of Unicode are accepted and used
	661	(UCS-2, UTF-16, UTF-8), what (if any) is the normalization form used,
	662	and so on, varies. Finding the right level of interfacing to Perl
	663	requires some thought. Remember that an OS does not imply a
	664	filesystem.
	665
	666	(The Windows -C command flag "wide API support" has been at least
	667	temporarily retired in 5.8.1, and the -C has been repurposed, see
	668	L<perlrun>.)
	669
	670	Most probably the right way to do this would be this:
	671	L</"Virtualize operating system access">.
	672
	673	=head2 Unicode in %ENV
	674
	675	Currently the %ENV entries are always byte strings.
	676	See L</"Virtualize operating system access">.
	677
	678	=head2 Unicode and glob()
	679
	680	Currently glob patterns and filenames returned from File::Glob::glob()
	681	are always byte strings. See L</"Virtualize operating system access">.
	682
	683	=head2 Unicode and lc/uc operators
	684
	685	Some built-in operators (C<lc>, C<uc>, etc.) behave differently, based on
	686	what the internal encoding of their argument is. That should not be the
	687	case. Maybe add a pragma to switch behaviour.
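
The difference can be seen with a small script. This is a sketch that
assumes no C<locale> pragma and no C<unicode_strings> feature is in effect;
the variable names are illustrative only:

```perl
use strict;
use warnings;

# One character, U+00E9 (LATIN SMALL LETTER E WITH ACUTE), held two ways.
my $bytes = "\xE9";          # stored internally as a single byte
my $chars = "\xE9";
utf8::upgrade($chars);       # same string, stored internally as UTF-8

# Perl considers the two strings equal...
print "strings are equal\n" if $bytes eq $chars;

# ...yet uc() may treat them differently, depending on internal encoding.
printf "uc(bytes) = %vX\n", uc $bytes;
printf "uc(chars) = %vX\n", uc $chars;
```

Only the upgraded copy is expected to be mapped to U+00C9; the byte-encoded
copy is left unchanged.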
	688
	689	=head2 use less 'memory'
	690
	691	Investigate trade offs to switch out perl's choices on memory usage.
	692	Particularly perl should be able to give memory back.
	693
	694	This task is incremental - even a little bit of work on it will help.
	695
	696	=head2 Re-implement C<:unique> in a way that is actually thread-safe
	697
	698	The old implementation made bad assumptions on several levels. A good 90%
	699	solution might be just to make C<:unique> work to share the string buffer
	700	of SvPVs. That way large constant strings can be shared between ithreads,
	701	such as the configuration information in F<Config>.
	702
	703	=head2 Make tainting consistent
	704
	705	Tainting would be easier to use if it didn't take documented shortcuts and
	706	allow taint to "leak" everywhere within an expression.
	707
	708	=head2 readpipe(LIST)
	709
	710	system() accepts a LIST syntax (and a PROGRAM LIST syntax) to avoid
	711	running a shell. readpipe() (the function behind qx//) could be similarly
	712	extended.
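
Until then, much the same effect can be had with the list form of a piped
C<open>. The sketch below uses a hypothetical helper name, C<readpipe_list>,
and assumes a platform where list-form pipe opens are supported:

```perl
use strict;
use warnings;

# Hypothetical stand-in for a LIST form of readpipe(): run @cmd without a
# shell and return its output (all of it in scalar context, a list of lines
# in list context). Callers can inspect $? afterwards, as with qx//.
sub readpipe_list {
    my @cmd = @_;
    # With a single argument, open() may still hand the string to a shell,
    # so insist on a program name plus at least one argument.
    die "readpipe_list needs a program and at least one argument\n"
        if @cmd < 2;

    open(my $fh, '-|', @cmd) or die "Cannot run $cmd[0]: $!\n";
    my @lines = <$fh>;
    close($fh);    # waits for the child and sets $?
    return wantarray ? @lines : join('', @lines);
}

# The semicolon below is data, not a shell command separator.
my $out = readpipe_list($^X, '-e', 'print "@ARGV\n"', 'hello; echo pwned');
print $out;
```

A built-in C<readpipe(LIST)> would remove the need for such a wrapper and
would make C<qx//>-style capture as safe as list-form C<system()>.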
	713
	714	=head2 Audit the code for destruction ordering assumptions
	715
	716	Change 25773 notes
	717
	718	    /* Need to check SvMAGICAL, as during global destruction it may be that
	719	       AvARYLEN(av) has been freed before av, and hence the SvANY() pointer
	720	       is now part of the linked list of SV heads, rather than pointing to
	721	       the original body. */
	722	    /* FIXME - audit the code for other bugs like this one. */
	723
	724	adding the C<SvMAGICAL> check to
	725
	726	    if (AvARYLEN(av) && SvMAGICAL(AvARYLEN(av))) {
	727	        MAGIC *mg = mg_find (AvARYLEN(av), PERL_MAGIC_arylen);
	728
	729	Go through the core and look for similar assumptions that SVs have particular
	730	types, as all bets are off during global destruction.
	731
	732	=head2 Extend PerlIO and PerlIO::Scalar
	733
	734	PerlIO::Scalar doesn't know how to truncate(). Implementing this
	735	would require extending the PerlIO vtable.
	736
	737	Similarly the PerlIO vtable doesn't know about formats (write()), or
	738	about stat(), or chmod()/chown(), utime(), or flock().
	739
	740	(For PerlIO::Scalar it's hard to see what e.g. mode bits or ownership
	741	would mean.)
	742
	743	PerlIO doesn't do directories or symlinks, either: mkdir(), rmdir(),
	744	opendir(), closedir(), seekdir(), rewinddir(), glob(); symlink(),
	745	readlink().
	746
	747	See also L</"Virtualize operating system access">.
	748
	749	=head2 -C on the #! line
	750
	751	It should be possible to make -C work correctly if found on the #! line,
	752	given that all perl command line options are strict ASCII, and -C changes
	753	only the interpretation of non-ASCII characters, and not for the script file
	754	handle. To make it work needs some investigation of the ordering of function
	755	calls during startup, and (by implication) a bit of tweaking of that order.
	756
	757	=head2 Organize error messages
	758
	759	Perl's diagnostics (error messages, see L<perldiag>) could use
	760	reorganizing and formalizing so that each error message has its
	761	stable-for-all-eternity unique id, categorized by severity, type, and
	762	subsystem. (The error messages would be listed in a datafile outside
	763	of the Perl source code, and the source code would only refer to the
	764	messages by the id.) This clean-up and regularizing should apply
	765	to all croak() messages.
	766
	767	This would enable all sorts of things: easier translation/localization
	768	of the messages (though please do keep in mind the caveats of
	769	L<Locale::Maketext> about too straightforward approaches to
	770	translation), filtering by severity, and instead of grepping for a
	771	particular error message one could look for a stable error id. (Of
	772	course, changing the error messages by default would break all the
	773	existing software depending on some particular error message...)
	774
	775	This kind of functionality is known as I<message catalogs>. For
	776	inspiration, look at the catgets() system, and possibly even use it
	777	if available, but B<only> if available: B<not> all platforms
	778	have catgets().
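
To make the idea concrete, here is a rough sketch of a catalog lookup in plain Perl. Everything in it is hypothetical: the ids, the severity and subsystem names, and the C<format_message> helper are invented for illustration and do not correspond to anything in the core.

```perl
use strict;
use warnings;

# Hypothetical catalog: ids, severities and subsystems are invented here.
my %CATALOG = (
    'PERL-IO-0001' => {
        severity  => 'fatal',
        subsystem => 'io',
        text      => q{Can't open %s: %s},
    },
    'PERL-NUM-0002' => {
        severity  => 'warning',
        subsystem => 'numeric',
        text      => q{Argument "%s" isn't numeric},
    },
);

# Look a message up by its stable id and fill in the arguments.
sub format_message {
    my ($id, @args) = @_;
    my $entry = $CATALOG{$id} or die "Unknown message id $id";
    return sprintf("[%s] $entry->{text}", $id, @args);
}

print format_message('PERL-IO-0001', 'foo.txt', 'No such file or directory'), "\n";

# Filtering by severity becomes a lookup rather than a grep over message text.
my @fatal = grep { $CATALOG{$_}{severity} eq 'fatal' } sort keys %CATALOG;
print "fatal ids: @fatal\n";
```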
	779
	780	For the really pure at heart, consider extending this item to also
	781	cover the warning messages (see L<perllexwarn>, C<warnings.pl>).
	782
	783	=head1 Tasks that need a knowledge of the interpreter
	784
	785	These tasks would need C knowledge, and knowledge of how the interpreter works,
	786	or a willingness to learn.
	787
	788	=head2 lexicals used only once
	789
	790	This warns:
	791
	792	$ perl -we '$pie = 42'
	793	Name "main::pie" used only once: possible typo at -e line 1.
	794
	795	This does not:
	796
	797	$ perl -we 'my $pie = 42'
	798
	799	Logically all lexicals used only once should warn, if the user asks for
	800	warnings. An unworked RT ticket (#5087) has been open for almost seven
	801	years for this discrepancy.
	802
	803	=head2 UTF-8 revamp
	804
	805	The handling of Unicode is unclean in many places. For example, the regexp
	806	engine matches in Unicode semantics whenever the string or the pattern is
	807	flagged as UTF-8, but that should not be dependent on an internal storage
	808	detail of the string. Likewise, case folding behaviour is dependent on the
	809	UTF8 internal flag being on or off.
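
The dependence on the internal flag is easy to demonstrate. The sketch below assumes an ASCII platform, no C<use locale>, and no C<unicode_strings> feature in scope; under those assumptions the same character behaves differently depending on how it happens to be stored.

```perl
use strict;
use warnings;

my $native   = "\xE9";        # e with acute accent, stored as a single byte
my $upgraded = $native;
utf8::upgrade($upgraded);     # same character, now stored internally as UTF-8

# The two strings compare equal ...
print "equal\n" if $native eq $upgraded;

# ... yet regexp matching and case mapping depend on the internal flag.
printf "native   \\w: %s\n", ($native   =~ /\w/ ? "match" : "no match");
printf "upgraded \\w: %s\n", ($upgraded =~ /\w/ ? "match" : "no match");
printf "uc changes native:   %s\n", (uc($native)   ne $native   ? "yes" : "no");
printf "uc changes upgraded: %s\n", (uc($upgraded) ne $upgraded ? "yes" : "no");
```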
	810
	811	=head2 Properly Unicode safe tokeniser and pads.
	812
	813	The tokeniser isn't actually very UTF-8 clean. C<use utf8;> is a hack -
	814	variable names are stored in stashes as raw bytes, without the utf-8 flag
	815	set. The pad API only takes a C<char *> pointer, so that's all bytes too. The
	816	tokeniser ignores the UTF-8-ness of C<PL_rsfp>, or any SVs returned from
	817	source filters. All this could be fixed.
	818
	819	=head2 state variable initialization in list context
	820
	821	Currently this is illegal:
	822
	823	state ($a, $b) = foo();
	824
	825	In Perl 6, C<state ($a) = foo();> and C<(state $a) = foo();> have different
	826	semantics, which is tricky to implement in Perl 5 as currently they produce
	827	the same opcode trees. The Perl 6 design is firm, so it would be good to
	828	implement the necessary code in Perl 5. There are comments in
	829	C<Perl_newASSIGNOP()> that show the code paths taken by various assignment
	830	constructions involving state variables.
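
In the meantime, one workaround is to keep the list behind a single scalar state variable, since scalar initialization runs only once. A small sketch, with C<foo()> and C<pair()> as stand-in names:

```perl
use strict;
use warnings;
use feature 'state';

my $calls = 0;
sub foo { $calls++; return (1, 2) }   # stand-in for an expensive initialiser

sub pair {
    # A scalar state variable is initialised only on the first call,
    # so hold the whole list in one array reference.
    state $init = [ foo() ];
    return @$init;
}

pair() for 1 .. 3;
print "foo() ran $calls time(s)\n";
```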
	831
	832	=head2 Implement $value ~~ 0 .. $range
	833
	834	It would be nice to extend the syntax of the C<~~> operator to also
	835	understand numeric (and maybe alphanumeric) ranges.
	836
	837	=head2 A does() built-in
	838
	839	Like ref(), only useful. It would call the C<DOES> method on objects; it
	840	would also tell whether something can be dereferenced as an
	841	array/hash/etc., or used as a regexp, etc.
	842	L<http://www.xray.mpe.mpg.de/mailing-lists/perl5-porters/2007-03/msg00481.html>
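
A pure-Perl approximation gives a feel for what such a built-in might answer. This is only a sketch: C<does_sketch> is a hypothetical name, and the mapping from reference types to overload keys is one possible reading of "can be dereferenced as".

```perl
use strict;
use warnings;
use Scalar::Util qw(blessed reftype);
use overload ();

# Hypothetical approximation of the proposed built-in.
sub does_sketch {
    my ($thing, $what) = @_;
    my %deref = (ARRAY => '@{}', HASH => '%{}', CODE => '&{}', SCALAR => '${}');

    if (exists $deref{$what}) {
        return 0 unless ref $thing;
        return 1 if reftype($thing) eq $what;          # underlying type
        return 1 if blessed($thing)
                 && overload::Method($thing, $deref{$what});  # overloaded deref
        return 0;
    }

    # Anything else is treated as a role or class name.
    return blessed($thing) && $thing->DOES($what) ? 1 : 0;
}

my $obj = bless {}, 'Some::Class';
printf "array ref does ARRAY: %d\n", does_sketch([], 'ARRAY');
printf "hash ref does ARRAY:  %d\n", does_sketch({}, 'ARRAY');
printf "object does its class: %d\n", does_sketch($obj, 'Some::Class');
```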
	843
	844	=head2 Tied filehandles and write() don't mix
	845
	846	There is no method on tied filehandles to allow them to be called back by
	847	formats.
	848
	849	=head2 Attach/detach debugger from running program
	850
	851	The old perltodo notes "With C<gdb>, you can attach the debugger to a running
	852	program if you pass the process ID. It would be good to do this with the Perl
	853	debugger on a running Perl program, although I'm not sure how it would be
	854	done." ssh and screen do this with named pipes in /tmp. Maybe we can too.
	855
	856	=head2 Optimize away empty destructors
	857
	858	Defining an empty DESTROY method might be useful (notably in
	859	AUTOLOAD-enabled classes), but it's still a bit expensive to call. That
	860	could probably be optimized.
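
The reason people write empty destructors shows up in a short example: without one, object destruction falls through to AUTOLOAD. The class names below are made up for the demonstration.

```perl
use strict;
use warnings;

our @autoloaded;   # every method name that reached an AUTOLOAD

{
    package Lazy::NoDestroy;
    our $AUTOLOAD;
    sub new      { return bless {}, shift }
    sub AUTOLOAD { push @main::autoloaded, $AUTOLOAD; return }
}
{
    package Lazy::EmptyDestroy;
    our $AUTOLOAD;
    sub new      { return bless {}, shift }
    sub AUTOLOAD { push @main::autoloaded, $AUTOLOAD; return }
    sub DESTROY  { }   # empty, but still looked up and called per object
}

{ my $obj = Lazy::NoDestroy->new; }      # destruction ends up in AUTOLOAD
{ my $obj = Lazy::EmptyDestroy->new; }   # destruction hits the empty DESTROY

print "AUTOLOAD saw: @autoloaded\n";
```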
	861
	862	=head2 LVALUE functions for lists
	863
	864	The old perltodo notes that lvalue functions don't work for list or hash
	865	slices. This would be good to fix.
	866
	867	=head2 LVALUE functions in the debugger
	868
	869	The old perltodo notes that lvalue functions don't work in the debugger. This
	870	would be good to fix.
	871
	872	=head2 regexp optimiser optional
	873
	874	The regexp optimiser is not optional. It should be configurable to be optional, to allow
	875	its performance to be measured, and its bugs to be easily demonstrated.
	876
	877	=head2 delete &function
	878
	879	Allow functions to be deleted. One can already undef them, but they're still
	880	in the stash.
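
The current state of affairs, and the blunt workaround of deleting the whole glob, can be seen in a few lines. The sub name C<doomed> is just an example, and the code assumes it runs in package C<main>.

```perl
use strict;
use warnings;

sub doomed { return 42 }

undef &doomed;    # discards the body, but not the name

my %after_undef = (
    is_defined   => (defined(&doomed)        ? 1 : 0),
    still_exists => (exists(&doomed)         ? 1 : 0),
    in_stash     => (exists $main::{doomed}  ? 1 : 0),
);
print "after undef: ",
    join(', ', map { "$_=$after_undef{$_}" } sort keys %after_undef), "\n";

# Today's workaround removes the whole glob, which would also take
# $doomed, @doomed and %doomed with it.
delete $main::{doomed};
my $in_stash_after_delete = exists $main::{doomed} ? 1 : 0;
print "after delete: in_stash=$in_stash_after_delete\n";
```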
	881
	882	=head2 C</w> regex modifier
	883
	884	This flag would enable whole-word matching, and would also interpolate
	885	arrays as alternations. With it, C</P/w> would be roughly equivalent to:
	886	
	887	do { local $"='|'; /\b(?:P)\b/ }
	888
	889	See L<http://www.xray.mpe.mpg.de/mailing-lists/perl5-porters/2007-01/msg00400.html>
	890	for the discussion.
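
With an array in the pattern, the hand-written equivalent looks like this today. The array C<@pets> and the sample strings are only illustrative.

```perl
use strict;
use warnings;

my @pets = qw(cat dog);

# Join the array with '|' while interpolating, and wrap the resulting
# alternation in word boundaries.
my $re = do { local $" = '|'; qr/\b(?:@pets)\b/ };

for my $text ('a dog barked', 'hotdog stand', 'concatenate') {
    printf "%-14s %s\n", $text, ($text =~ $re ? 'matches' : 'no match');
}
```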
	891
	892	=head2 optional optimizer
	893
	894	Make the peephole optimizer optional. Currently it performs two tasks as
	895	it walks the optree - genuine peephole optimisations, and necessary fixups of
	896	ops. It would be good to find an efficient way to switch out the
	897	optimisations whilst keeping the fixups.
	898
	899	=head2 You WANT how many
	900
	901	Currently contexts are void, scalar and list. split has a special mechanism in
	902	place to pass in the number of return values wanted. It would be useful to
	903	have a general mechanism for this, one that is backwards compatible and costs little speed.
	904	This would allow proposals such as short circuiting sort to be implemented
	905	as a module on CPAN.
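
For reference, this is all a subroutine can find out about its calling context at present, alongside the implicit limit that split receives. The sub name C<context> is arbitrary.

```perl
use strict;
use warnings;

# wantarray() distinguishes only three contexts.
sub context {
    return wantarray          ? 'list'
         : defined(wantarray) ? 'scalar'
         :                      'void';
}

my @list   = context();
my $scalar = context();

# split does learn how many values are wanted: assigning to a list of
# N scalars implies a limit of N + 1 fields, so the rest is never split.
my ($first, $second) = split /,/, 'a,b,c,d,e';

print "$list[0] $scalar $first $second\n";
```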
	906
	907	=head2 lexical aliases
	908
	909	Allow lexical aliases (maybe via the syntax C<my \$alias = \$foo>).
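
The nearest thing available without new syntax is the aliasing that foreach already performs on its loop variable, which only lasts for the body of the loop:

```perl
use strict;
use warnings;

my $foo = 1;

# foreach aliases its loop variable to each element, so inside the block
# $alias is another name for $foo rather than a copy of it.
for my $alias ($foo) {
    $alias++;
}

print "$foo\n";
```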
	910
	911	=head2 entersub XS vs Perl
	912
	913	At the moment pp_entersub is huge, and has code to deal with entering both
	914	perl and XS subroutines. Subroutine implementations rarely change between
	915	perl and XS at run time, so investigate using 2 ops to enter subs (one for
	916	XS, one for perl) and swap between if a sub is redefined.
	917
	918	=head2 Self-ties
	919
	920	Self-ties are currently illegal because they caused too many segfaults. Maybe
	921	the causes of these could be tracked down and self-ties on all types
	922	reinstated.
	923
	924	=head2 Optimize away @_
	925
	926	The old perltodo notes "Look at the "reification" code in C<av.c>".
	927
	928	=head2 Virtualize operating system access
	929
	930	Implement a set of "vtables" that virtualizes operating system access
	931	(open(), mkdir(), unlink(), readdir(), getenv(), etc.) At the very
	932	least these interfaces should take SVs as "name" arguments instead of
	933	bare char pointers; probably the most flexible and extensible way
	934	would be for the Perl-facing interfaces to accept HVs. The system
	935	needs to be per-operating-system and per-file-system
	936	hookable/filterable, preferably both from XS and Perl level
	937	(L<perlport/"Files and Filesystems"> is good reading at this point,
	938	in fact, all of L<perlport> is.)
	939
	940	This has actually already been implemented (but only for Win32),
	941	take a look at F<iperlsys.h> and F<win32/perlhost.h>. While all Win32
	942	variants go through a set of "vtables" for operating system access,
	943	non-Win32 systems currently go straight for the POSIX/UNIX-style
	944	system/library call. A system similar to the Win32 one should be
	945	implemented for all platforms. The existing Win32 implementation
	946	probably does not need to survive alongside this proposed new
	947	implementation; the approaches could be merged.
	948
	949	What would this give us? One often-asked-for feature this would
	950	enable is using Unicode for filenames, and other "names" like %ENV,
	951	usernames, hostnames, and so forth.
	952	(See L<perlunicode/"When Unicode Does Not Happen">.)
	953
	954	But this kind of virtualization would also allow for things like
	955	virtual filesystems, virtual networks, and "sandboxes" (though as long
	956	as dynamic loading of random object code is allowed, not very safe
	957	sandboxes, since external code of course knows nothing of Perl's vtables).
	958	An example of a smaller "sandbox" is that this feature can be used to
	959	implement per-thread working directories: Win32 already does this.
	960
	961	See also L</"Extend PerlIO and PerlIO::Scalar">.
	962
	963	=head2 Investigate PADTMP hash pessimisation
	964
	965	The peephole optimiser converts constants used for hash key lookups to shared
	966	hash key scalars. Under ithreads, something is undoing this work.
	967	See http://www.xray.mpe.mpg.de/mailing-lists/perl5-porters/2007-09/msg00793.html
	968
	969	=head2 Store the current pad in the OP slab allocator
	970
	971	=for clarification
	972	I hope that I got that "current pad" part correct
	973
	974	Currently we leak ops in various cases of parse failure. I suggested that we
	975	could solve this by always using the op slab allocator, and walking it to
	976	free ops. Dave comments that as some ops are already freed during optree
	977	creation one would have to mark which ops are freed, and not double free them
	978	when walking the slab. He notes that one problem with this is that for some ops
	979	you have to know which pad was current at the time of allocation, which does
	980	change. I suggested storing a pointer to the current pad in the memory allocated
	981	for the slab, and swapping to a new slab each time the pad changes. Dave thinks
	982	that this would work.
	983
	984	=head2 repack the optree
	985
	986	Repacking the optree after execution order is determined could allow
	987	removal of NULL ops, and optimal ordering of OPs with respect to cache-line
	988	filling. The slab allocator could be reused for this purpose. I think that
	989	the best way to do this is to make it an optional step just before the
	990	completed optree is attached to anything else, and to use the slab allocator
	991	unchanged, so that freeing ops is identical whether or not this step runs.
	992	Note that the slab allocator allocates ops downwards in memory, so one would
	993	have to actually "allocate" the ops in reverse-execution order to get them
	994	contiguous in memory in execution order.
	995
	996	See http://www.nntp.perl.org/group/perl.perl5.porters/2007/12/msg131975.html
	997
	998	Note that running this copy, and then freeing all the old location ops would
	999	cause their slabs to be freed, which would eliminate possible memory wastage if
	1000	the previous suggestion is implemented, and we swap slabs more frequently.
	1001
	1002	=head2 eliminate incorrect line numbers in warnings
	1003
	1004	This code
	1005
	1006	use warnings;
	1007	my $undef;
	1008
	1009	if ($undef == 3) {
	1010	} elsif ($undef == 0) {
	1011	}
	1012
	1013	used to produce this output:
	1014
	1015	Use of uninitialized value in numeric eq (==) at wrong.pl line 4.
	1016	Use of uninitialized value in numeric eq (==) at wrong.pl line 4.
	1017
	1018	where the line of the second warning was misreported - it should be line 5.
	1019	Rafael fixed this - the problem arose because there was no nextstate OP
	1020	between the execution of the C<if> and the C<elsif>, hence C<PL_curcop> still
	1021	reports that the currently executing line is line 4. The solution was to inject
	1022	a nextstate OP for each C<elsif>, although it turned out that the nextstate
	1023	OP needed to be a nulled OP, rather than a live nextstate OP, else other line
	1024	numbers became misreported. (Jenga!)
	1025
	1026	The problem is more general than C<elsif> (although the C<elsif> case is the
	1027	most common and the most confusing). Ideally this code
	1028
	1029	use warnings;
	1030	my $undef;
	1031
	1032	my $a = $undef + 1;
	1033	my $b
	1034	= $undef
	1035	+ 1;
	1036
	1037	would produce this output
	1038
	1039	Use of uninitialized value $undef in addition (+) at wrong.pl line 4.
	1040	Use of uninitialized value $undef in addition (+) at wrong.pl line 7.
	1041
	1042	(rather than lines 4 and 5), but this would seem to require every OP to carry
	1043	(at least) line number information.
	1044
	1045	What might work is to have an optional line number in memory just before the
	1046	BASEOP structure, with a flag bit in the op to say whether it's present.
	1047	Initially during compile every OP would carry its line number. Then add a late
	1048	pass to the optimiser (potentially combined with L</repack the optree>) which
	1049	looks at the two ops on every edge of the graph of the execution path. If
	1050	the line number changes, flag the destination OP with this information.
	1051	Once all paths are traced, replace every op with the flag with a
	1052	nextstate-light op (that just updates C<PL_curcop>), which in turn then passes
	1053	control on to the true op. All ops would then be replaced by variants that
	1054	do not store the line number. (Which, logically, is why it would work best
	1055	conjunction with L</repack the optree>, as that is already copying/reallocating
	1056	all the OPs)
	1057
	1058	(Although I should note that we're not certain that doing this for the general
	1059	case is worth it)
	1060
	1061	=head2 optimize tail-calls
	1062
	1063	Tail-calls present an opportunity for broadly applicable optimization;
	1064	anywhere that C<< return foo(...) >> is called, the outer return can
	1065	be replaced by a goto, and foo will return directly to the outer
	1066	caller, saving (conservatively) 25% of perl's call&return cost, which
	1067	is relatively higher than in C. The Scheme language is known to do
	1068	this heavily. B::Concise provides good insight into where this
	1069	optimization is possible, i.e. anywhere an entersub,leavesub op-sequence
	1070	occurs.
	1071
	1072	perl -MO=Concise,-exec,a,b,-main -e 'sub a{ 1 }; sub b {a()}; b(2)'
	1073
	1074	The bottom line on this is probably a new pp_tailcall function which
	1075	combines the code in pp_entersub and pp_leavesub. This should probably
	1076	be done first in XS, using B::Generate to patch the new OP into the
	1077	optrees.
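
The effect on the call stack can already be observed by writing the tail call by hand with C<goto &sub>. This is not the proposed optimization itself, only a sketch of the behaviour it would produce automatically; the sub names are invented.

```perl
use strict;
use warnings;

# Count the sub frames currently on the call stack.
sub depth {
    my $n = 0;
    $n++ while caller($n);
    return $n;
}

sub via_return { return depth() }   # ordinary call: via_return's frame stays
sub via_goto   { goto &depth }      # hand-written tail call: frame is replaced

my $with_return = via_return();
my $with_goto   = via_goto();
printf "return: %d frame(s), goto: %d frame(s)\n", $with_return, $with_goto;
```

The goto form runs C<depth> with one frame fewer, which is the saving a C<pp_tailcall> op would aim to make without the programmer having to ask for it.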
	1078
	1079	=head1 Big projects
	1080
	1081	Tasks that will get your name mentioned in the description of the "Highlights
	1082	of 5.12"
	1083
	1084	=head2 make ithreads more robust
	1085
	1086	Generally make ithreads more robust. See also L</iCOW>.
	1087
	1088	This task is incremental - even a little bit of work on it will help, and
	1089	will be greatly appreciated.
	1090
	1091	One bit would be to write the missing code in sv.c:Perl_dirp_dup.
	1092
	1093	Fix Perl_sv_dup, et al so that threads can return objects.
	1094
	1095	=head2 iCOW
	1096
	1097	Sarathy and Arthur have a proposal for an improved Copy On Write which
	1098	specifically will be able to COW new ithreads. If this can be implemented
	1099	it would be a good thing.
	1100
	1101	=head2 (?{...}) closures in regexps
	1102
	1103	Fix (or rewrite) the implementation of the C</(?{...})/> closures.
	1104
	1105	=head2 A re-entrant regexp engine
	1106
	1107	This will allow the use of a regex from inside (?{ }), (??{ }) and
	1108	(?(?{ })|) constructs.
	1109
	1110	=head2 Add class set operations to regexp engine
	1111
	1112	Apparently these are quite useful. Anyway, Jeffrey Friedl wants them.
	1113
	1114	demerphq has this on his todo list, but right at the bottom.