perl5.git.perl.org Git - perl5.git/blame_incremental

Commit	Line	Data
	1	=head1 NAME
	2
	3	perltodo - Perl TO-DO List
	4
	5	=head1 DESCRIPTION
	6
	7	This is a list of wishes for Perl. The most up to date version of this file
	8	is at http://perl5.git.perl.org/perl.git/blob_plain/HEAD:/pod/perltodo.pod
	9
	10	The tasks we think are smaller or easier are listed first. Anyone is welcome
	11	to work on any of these, but it's a good idea to first contact
	12	I<perl5-porters@perl.org> to avoid duplication of effort, and to learn from
	13	any previous attempts. By all means contact a pumpking privately first if you
	14	prefer.
	15
	16	Whilst patches to make the list shorter are most welcome, ideas to add to
	17	the list are also encouraged. Check the perl5-porters archives for past
	18	ideas, and any discussion about them. One set of archives may be found at:
	19
	20	http://www.xray.mpe.mpg.de/mailing-lists/perl5-porters/
	21
	22	What can we offer you in return? Fame, fortune, and everlasting glory? Maybe
	23	not, but if your patch is incorporated, then we'll add your name to the
	24	F<AUTHORS> file, which ships in the official distribution. How many other
	25	programming languages offer you 1 line of immortality?
	26
	27	=head1 Tasks that only need Perl knowledge
	28
	29	=head2 Smartmatch design issues
	30
	31	In 5.10.0 the smartmatch operator C<~~> isn't working quite "right". But
	32	before we can fix the implementation, we need to define what "right" is.
	33	The first problem is that Robin Houston implemented the Perl 6 smart match
	34	spec as of February 2006, when smart match was axiomatically symmetrical:
	35	L<http://groups.google.com/group/perl.perl6.language/msg/bf2b486f089ad021>
	36
	37	Since then the Perl 6 target moved, but the Perl 5 implementation did not.
	38
	39	So it would be useful for someone to compare the Perl 6 smartmatch table
	40	as of February 2006 L<http://svn.perl.org/viewvc/perl6/doc/trunk/design/syn/S03.pod?view=markup&pathrev=7615>
	41	and the current table L<http://svn.perl.org/viewvc/perl6/doc/trunk/design/syn/S03.pod?revision=14556&view=markup>
	42	and tabulate the differences in Perl 6. The annotated view of changes is
	43	L<http://svn.perl.org/viewvc/perl6/doc/trunk/design/syn/S03.pod?view=annotate> and the diff is
	44	C<svn diff -r7615:14556 http://svn.perl.org/perl6/doc/trunk/design/syn/S03.pod>
	45	-- search for C<=head1 Smart matching>. (In theory F<viewvc> can generate that,
	46	but in practice when I tried it hung forever, I assume "thinking")
	47
	48	With that done and published, someone (else) can then map any changed Perl 6
	49	semantics back to Perl 5, based on how the existing semantics map to Perl 5:
	50	L<http://search.cpan.org/~rgarcia/perl-5.10.0/pod/perlsyn.pod#Smart_matching_in_detail>
	51
	52
	53	There are also some questions that need answering:
	54
	55	=over 4
	56
	57	=item *
	58
	59	How do you negate one? (documentation issue)
	60	http://www.xray.mpe.mpg.de/mailing-lists/perl5-porters/2008-01/msg00071.html
	61
	62	=item *
	63
	64	Array behaviors
	65	http://www.xray.mpe.mpg.de/mailing-lists/perl5-porters/2007-12/msg00799.html
	66
	67	* Should smart matches be symmetrical? (Perl 6 says no)
	68
	69	* Other differences between Perl 5 and Perl 6 smart match?
	70
	71	=item *
	72
	73	Objects and smart match
	74	http://www.xray.mpe.mpg.de/mailing-lists/perl5-porters/2007-12/msg00865.html
	75
	76	=back
	77
	78	=head2 Remove duplication of test setup.
	79
	80	Schwern notes that there's duplication of code: lots and lots of tests have
	81	some variation on the big block of C<$Is_Foo> checks. We can safely put this
	82	into a file, change it to build an C<%Is> hash and require it. Maybe just put
	83	it into F<test.pl>. Throw in the handy tainting subroutines.
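
A minimal sketch of the C<%Is> idea, not a worked-out patch. The helper name
C<build_is_hash> and the platform list are assumptions made for illustration;
the real list would come from the C<$Is_Foo> checks found in the tests.

```perl
use strict;
use warnings;

# Hypothetical platform list: extend it with whatever $^O values the tests check.
my @platforms = qw(VMS MSWin32 NetWare os2 dos cygwin);

# Build a hash with one true/false entry per platform name, replacing the
# per-test block of $Is_VMS, $Is_MSWin32, ... assignments.
sub build_is_hash {
    my ($osname) = @_;
    return map { ( $_ => ($osname eq $_ ? 1 : 0) ) } @platforms;
}

our %Is = build_is_hash($^O);

print "Skipping fork tests on VMS\n" if $Is{VMS};
```

A test would then C<require> the shared file and consult C<$Is{VMS}> instead of
declaring its own C<$Is_VMS>.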
	84
	85	=head2 POD -E<gt> HTML conversion in the core still sucks
	86
	87	Which is crazy given just how simple POD purports to be, and how simple HTML
	88	can be. It's not actually I<as> simple as it sounds, particularly with the
	89	flexibility POD allows for C<=item>, but it would be good to improve the
	90	visual appeal of the HTML generated, and to avoid it having any validation
	91	errors. See also L</make HTML install work>, as the layout of installation tree
	92	is needed to improve the cross-linking.
	93
	94	The addition of C<Pod::Simple> and its related modules may make this task
	95	easier to complete.
	96
	97	=head2 Parallel testing
	98
	99	(This probably impacts much more than the core: also the Test::Harness
	100	and TAP::* modules on CPAN.)
	101
	102	All of the tests in F<t/> can now be run in parallel, if C<$ENV{TEST_JOBS}>
	103	is set. However, tests within each directory in F<ext> and F<lib> are still
	104	run in series, with directories run in parallel. This is an adequate
	105	heuristic, but it might be possible to relax it further, and get more
	106	throughput. Specifically, it would be good to audit all of F<lib/*.t>, and
	107	make them use C<File::Temp>.
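
As a rough illustration of what such an audit would change, a test that
currently writes to a fixed scratch filename could ask C<File::Temp> for a
unique one instead. The filename template below is an arbitrary choice.

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);

# tempfile() picks a unique name, so two tests running at the same time
# never write to the same scratch file. UNLINK => 1 removes it at exit.
my ($fh, $filename) = tempfile('perltodo_XXXXXX', TMPDIR => 1, UNLINK => 1);

print {$fh} "scratch data\n";
close $fh or die "close $filename: $!";

open my $in, '<', $filename or die "open $filename: $!";
my $line = <$in>;
close $in;

print "read back: $line";
```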
	108
	109	=head2 Make Schwern poorer
	110
	111	We should have tests for everything. When all the core's modules are tested,
	112	Schwern has promised to donate $500 to TPF. We may need volunteers to
	113	hold him upside down and shake vigorously in order to actually extract the
	114	cash.
	115
	116	=head2 Improve the coverage of the core tests
	117
	118	Use Devel::Cover to ascertain the core modules' test coverage, then add
	119	tests that are currently missing.
	120
	121	=head2 test B
	122
	123	A full test suite for the B module would be nice.
	124
	125	=head2 A decent benchmark
	126
	127	C<perlbench> seems impervious to any recent changes made to the perl core. It
	128	would be useful to have a reasonable general benchmarking suite that roughly
	129	represented what current perl programs do, and measurably reported whether
	130	tweaks to the core improve, degrade or don't really affect performance, to
	131	guide people attempting to optimise the guts of perl. Gisle would welcome
	132	new tests for perlbench.
	133
	134	=head2 fix tainting bugs
	135
	136	Fix the bugs revealed by running the test suite with the C<-t> switch (via
	137	C<make test.taintwarn>).
	138
	139	=head2 Dual life everything
	140
	141	As part of the "dists" plan, anything that doesn't belong in the smallest perl
	142	distribution needs to be dual lifed. Anything else can be too. Figure out what
	143	changes would be needed to package that module and its tests up for CPAN, and
	144	do so. Test it with older perl releases, and fix the problems you find.
	145
	146	To make a minimal perl distribution, it's useful to look at
	147	F<t/lib/commonsense.t>.
	148
	149	=head2 Bundle dual life modules in ext/
	150
	151	For maintenance (and branch merging) reasons, it would be useful to move
	152	some architecture-independent dual-life modules from lib/ to ext/, if this
	153	has no negative impact on the build of perl itself.
	154
	155	=head2 POSIX memory footprint
	156
	157	Ilya observed that C<use POSIX;> eats memory like there's no tomorrow, and at
	158	various times worked to cut it down. There is probably still fat to cut out -
	159	for example POSIX passes Exporter some very memory hungry data structures.
	160
	161	=head2 embed.pl/makedef.pl
	162
	163	There is a script F<embed.pl> that generates several header files to prefix
	164	all of Perl's symbols in a consistent way, to provide some semblance of
	165	namespace support in C<C>. Functions are declared in F<embed.fnc>, variables
	166	in F<interpvar.h>. Quite a few of the functions and variables
	167	are conditionally declared there, using C<#ifdef>. However, F<embed.pl>
	168	doesn't understand the C macros, so the rules about which symbols are present
	169	when are duplicated in F<makedef.pl>. Writing things twice is bad, m'kay.
	170	It would be good to teach F<embed.pl> to understand the conditional
	171	compilation, and hence remove the duplication, and the mistakes it has caused.
	172
	173	=head2 use strict; and AutoLoad
	174
	175	Currently if you write
	176
	177	    package Whack;
	178	    use AutoLoader 'AUTOLOAD';
	179	    use strict;
	180	    1;
	181	    __END__
	182	    sub bloop {
	183	        print join (' ', No, strict, here), "!\n";
	184	    }
	185
	186	then C<use strict;> isn't in force within the autoloaded subroutines. It would
	187	be more consistent (and less surprising) to arrange for all lexical pragmas
	188	in force at the __END__ block to be in force within each autoloaded subroutine.
	189
	190	There's a similar problem with SelfLoader.
	191
	192	=head2 profile installman
	193
	194	The F<installman> script is slow. All it is doing is text processing, which we're
	195	told is something Perl is good at. So it would be nice to know what it is doing
	196	that is taking so much CPU, and where possible address it.
	197
	198
	199	=head1 Tasks that need a little sysadmin-type knowledge
	200
	201	Or if you prefer, tasks that you would learn from, and broaden your skills
	202	base...
	203
	204	=head2 make HTML install work
	205
	206	There is an C<installhtml> target in the Makefile. It's marked as
	207	"experimental". It would be good to get this tested, make it work reliably, and
	208	remove the "experimental" tag. This would include
	209
	210	=over 4
	211
	212	=item 1
	213
	214	Checking that cross linking between various parts of the documentation works.
	215	In particular that links work between the modules (files with POD in F<lib/>)
	216	and the core documentation (files in F<pod/>)
	217
	218	=item 2
	219
	220	Work out how to split C<perlfunc> into chunks, preferably one per function
	221	group, preferably with general case code that could be used elsewhere.
	222	Challenges here are correctly identifying the groups of functions that go
	223	together, and making the right named external cross-links point to the right
	224	page. Things to be aware of are C<-X>, groups such as C<getpwnam> to
	225	C<endservent>, two or more C<=items> giving the different parameter lists, such
	226	as
	227
	228	    =item substr EXPR,OFFSET,LENGTH,REPLACEMENT
	229	    =item substr EXPR,OFFSET,LENGTH
	230	    =item substr EXPR,OFFSET
	231
	232	and different parameter lists having different meanings. (eg C<select>)
	233
	234	=back
	235
	236	=head2 compressed man pages
	237
	238	Be able to install them. This would probably need a configure test to see how
	239	the system does compressed man pages (same directory/different directory?
	240	same filename/different filename), as well as tweaking the F<installman> script
	241	to compress as necessary.
	242
	243	=head2 Add a code coverage target to the Makefile
	244
	245	Make it easy for anyone to run Devel::Cover on the core's tests. The steps
	246	to do this manually are roughly
	247
	248	=over 4
	249
	250	=item *
	251
	252	do a normal C<Configure>, but include Devel::Cover as a module to install
	253	(see F<INSTALL> for how to do this)
	254
	255	=item *
	256
	257	make perl
	258
	259	=item *
	260
	261	cd t; HARNESS_PERL_SWITCHES=-MDevel::Cover ./perl -I../lib harness
	262
	263	=item *
	264
	265	Process the resulting Devel::Cover database
	266
	267	=back
	268
	269	This just gives you the coverage of the F<.pm>s. To also get the C level
	270	coverage you need to
	271
	272	=over 4
	273
	274	=item *
	275
	276	Additionally tell C<Configure> to use the appropriate C compiler flags for
	277	C<gcov>
	278
	279	=item *
	280
	281	make perl.gcov
	282
	283	(instead of C<make perl>)
	284
	285	=item *
	286
	287	After running the tests run C<gcov> to generate all the F<.gcov> files.
	288	(Including down in the subdirectories of F<ext/>)
	289
	290	=item *
	291
	292	(From the top level perl directory) run C<gcov2perl> on all the C<.gcov> files
	293	to get their stats into the cover_db directory.
	294
	295	=item *
	296
	297	Then process the Devel::Cover database
	298
	299	=back
	300
	301	It would be good to add a single switch to C<Configure> to specify that you
	302	wanted to perform perl level coverage, and another to specify C level
	303	coverage, and have C<Configure> and the F<Makefile> do all the right things
	304	automatically.
	305
	306	=head2 Make Config.pm cope with differences between built and installed perl
	307
	308	Quite often vendors ship a perl binary compiled with their (pay-for)
	309	compilers. People install a free compiler, such as gcc. To work out how to
	310	build extensions, Perl interrogates C<%Config>, so in this situation
	311	C<%Config> describes compilers that aren't there, and extension building
	312	fails. This forces people into choosing between re-compiling perl themselves
	313	using the compiler they have, or only using modules that the vendor ships.
	314
	315	It would be good to find a way to teach C<Config.pm> about the installation setup,
	316	possibly involving probing at install time or later, so that the C<%Config> in
	317	a binary distribution better describes the installed machine, when the
	318	installed machine differs from the build machine in some significant way.
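
The kind of install-time probe this suggests might start by checking whether
the compiler recorded in C<%Config> exists on the current machine. This is only
a sketch under that assumption: C<find_command> is a hypothetical helper, and a
real probe would also have to look at flags, libraries and the linker.

```perl
use strict;
use warnings;
use Config;
use File::Spec;

# Hypothetical helper: return the full path of the first executable file
# named $cmd in @dirs, or nothing if there is none.
sub find_command {
    my ($cmd, @dirs) = @_;
    for my $dir (@dirs) {
        my $candidate = File::Spec->catfile($dir, $cmd);
        return $candidate if -f $candidate && -x $candidate;
    }
    return;
}

# $Config{cc} may carry extra words (flags, a wrapper), so keep only the first.
my ($cc) = split ' ', (defined $Config{cc} ? $Config{cc} : '');

if (defined $cc && length $cc) {
    my $found = File::Spec->file_name_is_absolute($cc)
        ? (-x $cc ? $cc : undef)
        : find_command($cc, File::Spec->path);
    print defined $found
        ? "compiler from %Config found: $found\n"
        : "compiler from %Config ('$cc') is not installed here\n";
}
```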
	319
	320	=head2 linker specification files
	321
	322	Some platforms mandate that you provide a list of a shared library's external
	323	symbols to the linker, so the core already has the infrastructure in place to
	324	do this for generating shared perl libraries. My understanding is that the
	325	GNU toolchain can accept an optional linker specification file, and restrict
	326	visibility just to symbols declared in that file. It would be good to extend
	327	F<makedef.pl> to support this format, and to provide a means within
	328	C<Configure> to enable it. This would allow Unix users to test that the
	329	export list is correct, and to build a perl that does not pollute the global
	330	namespace with private symbols.
	331
	332	=head2 Cross-compile support
	333
	334	Currently C<Configure> understands the C<-Dusecrosscompile> option. This option
	335	arranges for C<miniperl> to be built for the TARGET machine, on the assumption
	336	that this C<miniperl> is then copied to the TARGET machine and used in place of
	337	the full C<perl> executable.
	338
	339	This could be done a little differently. Namely, C<miniperl> should be built for
	340	HOST and then the full C<perl> with extensions should be compiled for TARGET.
	341	This, however, might require extra trickery for C<%Config>: we have one config
	342	first for HOST and then another for TARGET. Tools like MakeMaker will be
	343	mightily confused. Having two different types of executables and
	344	libraries (HOST and TARGET) around makes life interesting for Makefiles and
	345	shell (and Perl) scripts. There is C<$Config{run}>, normally empty, which
	346	can be used as an execution wrapper. Also note that in some
	347	cross-compilation/execution environments the HOST and the TARGET do
	348	not see the same filesystem(s), so C<$Config{run}> may need to do some
	349	file/directory copying back and forth.
	350
	351	=head2 roffitall
	352
	353	Make F<pod/roffitall> be updated by F<pod/buildtoc>.
	354
	355	=head2 Split "linker" from "compiler"
	356
	357	Right now, Configure probes for two commands, and sets two variables:
	358
	359	=over 4
	360
	361	=item * C<cc> (in F<cc.U>)
	362
	363	This variable holds the name of a command to execute a C compiler which
	364	can resolve multiple global references that happen to have the same
	365	name. Usual values are F<cc> and F<gcc>.
	366	Fervent ANSI compilers may be called F<c89>. AIX has F<xlc>.
	367
	368	=item * C<ld> (in F<dlsrc.U>)
	369
	370	This variable indicates the program to be used to link
	371	libraries for dynamic loading. On some systems, it is F<ld>.
	372	On ELF systems, it should be C<$cc>. Mostly, we'll try to respect
	373	the hint file setting.
	374
	375	=back
	376
	377	There is an implicit historical assumption from around Perl5.000alpha
	378	something, that C<$cc> is also the correct command for linking object files
	379	together to make an executable. This may be true on Unix, but it's not true
	380	on other platforms, and there is a maze of workarounds in other places (such
	381	as F<Makefile.SH>) to cope with this.
	382
	383	Ideally, we should create a new variable to hold the name of the executable
	384	linker program, probe for it in F<Configure>, and centralise all the special
	385	case logic there or in hints files.
	386
	387	A small bikeshed issue remains - what to call it, given that C<$ld> is already
	388	taken (arguably for the wrong thing now, but on SunOS 4.1 it is the command
	389	for creating dynamically-loadable modules) and C<$link> could be confused with
	390	the Unix command line executable of the same name, which does something
	391	completely different. Andy Dougherty makes the counter argument "In parrot, I
	392	tried to call the command used to link object files and libraries into an
	393	executable F<link>, since that's what my vaguely-remembered DOS and VMS
	394	experience suggested. I don't think any real confusion has ensued, so it's
	395	probably a reasonable name for perl5 to use."
	396
	397	"Alas, I've always worried that introducing it would make things worse,
	398	since now the module building utilities would have to look for
	399	C<$Config{link}> and institute a fall-back plan if it weren't found."
	400	Although I can see that as confusing, given that C<$Config{d_link}> is true
	401	when (hard) links are available.
	402
	403	=head1 Tasks that need a little C knowledge
	404
	405	These tasks would need a little C knowledge, but don't need any specific
	406	background or experience with XS, or how the Perl interpreter works.
	407
	408	=head2 Weed out needless PERL_UNUSED_ARG
	409
	410	The C code uses the macro C<PERL_UNUSED_ARG> to stop compilers warning about
	411	unused arguments. Often the arguments can't be removed, as there is an
	412	external constraint that determines the prototype of the function, so this
	413	approach is valid. However, there are some cases where C<PERL_UNUSED_ARG>
	414	could be removed. Specifically:
	415
	416	=over 4
	417
	418	=item *
	419
	420	The prototypes of (nearly all) static functions can be changed
	421
	422	=item *
	423
	424	Unused arguments generated by short cut macros are wasteful - the short cut
	425	macro used can be changed.
	426
	427	=back
	428
	429	=head2 Modernize the order of directories in @INC
	430
	431	The way @INC is laid out by default, one cannot upgrade core (dual-life)
	432	modules without overwriting files. This causes problems for binary
	433	package builders. One possible proposal is laid out in this
	434	message:
	435	L<http://www.xray.mpe.mpg.de/mailing-lists/perl5-porters/2002-04/msg02380.html>.
	436
	437	=head2 -Duse32bit*
	438
	439	Natively 64-bit systems need neither -Duse64bitint nor -Duse64bitall.
	440	On these systems, it might be the default compilation mode, and there
	441	is currently no guarantee that passing no use64bitall option to the
	442	Configure process will build a 32bit perl. Implementing -Duse32bit*
	443	options would be nice for perl 5.12.
	444
	445	=head2 Profile Perl - am I hot or not?
	446
	447	The Perl source code is stable enough that it makes sense to profile it,
	448	identify and optimise the hotspots. It would be good to measure the
	449	performance of the Perl interpreter using free tools such as cachegrind,
	450	gprof, and dtrace, and work to reduce the bottlenecks they reveal.
	451
	452	As part of this, the idea of F<pp_hot.c> is that it contains the I<hot> ops,
	453	the ops that are most commonly used. The idea is that by grouping them, their
	454	object code will be adjacent in the executable, so they have a greater chance
	455	of already being in the CPU cache (or swapped in) due to being near another op
	456	already in use.
	457
	458	Except that it's not clear if these really are the most commonly used ops. So
	459	as part of exercising your skills with coverage and profiling tools you might
	460	want to determine what ops I<really> are the most commonly used. And in turn
	461	suggest evictions and promotions to achieve a better F<pp_hot.c>.
	462
	463	One piece of Perl code that might make a good testbed is F<installman>.
	464
	465	=head2 Allocate OPs from arenas
	466
	467	Currently all new OP structures are individually malloc()ed and free()d.
	468	All C<malloc> implementations have space overheads, and are not as fast as
	469	custom allocators, so it would use both less memory and less CPU to allocate
	470	the various OP structures from arenas. The SV arena code can probably be
	471	re-used for this.
	472
	473	Note that Configuring perl with C<-Accflags=-DPL_OP_SLAB_ALLOC> will use
	474	Perl_Slab_alloc() to pack optrees into a contiguous block, which is
	475	probably superior to the use of OP arenas, esp. from a cache locality
	476	standpoint. See L</"Profile Perl - am I hot or not?">.
	477
	478	=head2 Improve win32/wince.c
	479
	480	Currently, numerous functions look virtually, if not completely,
	481	identical in both C<win32/wince.c> and C<win32/win32.c> files, which can't
	482	be good.
	483
	484	=head2 Use secure CRT functions when building with VC8 on Win32
	485
	486	Visual C++ 2005 (VC++ 8.x) deprecated a number of CRT functions on the basis
	487	that they were "unsafe" and introduced differently named secure versions of
	488	them as replacements, e.g. instead of writing
	489
	490	    FILE* f = fopen(__FILE__, "r");
	491
	492	one should now write
	493
	494	    FILE* f;
	495	    errno_t err = fopen_s(&f, __FILE__, "r");
	496
	497	Currently, the warnings about these deprecations have been disabled by adding
	498	-D_CRT_SECURE_NO_DEPRECATE to the CFLAGS. It would be nice to remove that
	499	warning suppressant and actually make use of the new secure CRT functions.
	500
	501	There is also a similar issue with POSIX CRT function names like fileno having
	502	been deprecated in favour of ISO C++ conformant names like _fileno. These
	503	warnings are also currently suppressed by adding -D_CRT_NONSTDC_NO_DEPRECATE. It
	504	might be nice to do as Microsoft suggest here too, although, unlike the secure
	505	functions issue, there is presumably little or no benefit in this case.
	506
	507	=head2 Fix POSIX::access() and chdir() on Win32
	508
	509	These functions currently take no account of DACLs and therefore do not behave
	510	correctly in situations where access is restricted by DACLs (as opposed to the
	511	read-only attribute).
	512
	513	Furthermore, POSIX::access() behaves differently for directories having the
	514	read-only attribute set depending on what CRT library is being used. For
	515	example, the _access() function in the VC6 and VC7 CRTs (wrongly) claims that
	516	such directories are not writable, whereas in fact all directories are writable
	517	unless access is denied by DACLs. (In the case of directories, the read-only
	518	attribute actually only means that the directory cannot be deleted.) This CRT
	519	bug is fixed in the VC8 and VC9 CRTs (but, of course, the directory may still
	520	not actually be writable if access is indeed denied by DACLs).
	521
	522	For the chdir() issue, see ActiveState bug #74552:
	523	http://bugs.activestate.com/show_bug.cgi?id=74552
	524
	525	Therefore, DACLs should be checked both for consistency across CRTs and for
	526	the correct answer.
	527
	528	(Note that perl's -w operator should not be modified to check DACLs. It has
	529	been written so that it reflects the state of the read-only attribute, even
	530	for directories (whatever CRT is being used), for symmetry with chmod().)
	531
	532	=head2 strcat(), strcpy(), strncat(), strncpy(), sprintf(), vsprintf()
	533
	534	Maybe create a utility that checks after each libperl.a creation that
	535	none of the above (nor, SHUDDER, gets())
	536	ever creeps back into libperl.a.
	537
	538	    nm libperl.a | ./miniperl -alne '$o = $F[0] if /:$/; print "$o $F[1]" if $F[0] eq "U" && $F[1] =~ /^(?:strn?c(?:at|py)|v?sprintf|gets)$/'
	539
	540	Note, of course, that this will only tell whether B<your> platform
	541	is using those naughty interfaces.
	542
	543	=head2 -D_FORTIFY_SOURCE=2, -fstack-protector
	544
	545	Recent glibcs support C<-D_FORTIFY_SOURCE=2> and recent gcc
	546	(4.1 onwards?) supports C<-fstack-protector>, both of which give
	547	protection against various kinds of buffer overflow problems.
	548	These should probably be used for compiling Perl whenever available;
	549	Configure and/or hints files should be adjusted to probe for the
	550	availability of these features and enable them as appropriate.
	551
	552	=head2 Arenas for GPs? For MAGIC?
	553
	554	C<struct gp> and C<struct magic> are both currently allocated by C<malloc>.
	555	It might be a speed or memory saving to change to using arenas. Or it might
	556	not. It would need some suitable benchmarking first. In particular, C<GP>s
	557	can probably be changed with minimal compatibility impact (probably nothing
	558	outside of the core, or even outside of F<gv.c> allocates them), but they
	559	probably aren't allocated/deallocated often enough for a speed saving. Whereas
	560	C<MAGIC> is allocated/deallocated more often, but in turn, is also something
	561	more externally visible, so changing the rules here may bite external code.
	562
	563	=head2 Shared arenas
	564
	565	Several SV body structs are now the same size, notably PVMG and PVGV, PVAV and
	566	PVHV, and PVCV and PVFM. It should be possible to allocate and return same
	567	sized bodies from the same actual arena, rather than maintaining one arena for
	568	each. This could save 4-6K per thread, of memory no longer tied up in the
	569	not-yet-allocated part of an arena.
	570
	571
	572	=head1 Tasks that need a knowledge of XS
	573
	574	These tasks would need C knowledge, and roughly the level of knowledge of
	575	the perl API that comes from writing modules that use XS to interface to
	576	C.
	577
	578	=head2 safely supporting POSIX SA_SIGINFO
	579
	580	Some years ago Jarkko supplied patches to provide support for the POSIX
	581	SA_SIGINFO feature in Perl, passing the extra data to the Perl signal handler.
	582
	583	Unfortunately, it only works with "unsafe" signals, because under safe
	584	signals, by the time Perl gets to run the signal handler, the extra
	585	information has been lost. Moreover, it's not easy to store it somewhere,
	586	as you can't lock mutexes, or do anything else fancy, from inside a signal
	587	handler.
	588
	589	So it strikes me that we could provide safe SA_SIGINFO support as follows:
	590
	591	=over 4
	592
	593	=item 1
	594
	595	Provide global variables for two file descriptors
	596
	597	=item 2
	598
	599	When the first request is made via C<sigaction> for C<SA_SIGINFO>, create a
	600	pipe, store the reader in one, the writer in the other
	601
	602	=item 3
	603
	604	In the "safe" signal handler (C<Perl_csighandler()>/C<S_raise_signal()>), if
	605	the C<siginfo_t> pointer is non-C<NULL>, and the writer file handle is open,
	606
	607	=over 8
	608
	609	=item 1
	610
	611	serialise signal number, C<siginfo_t> (or at least the parts we care
	612	about) into a small automatic C<char> buffer
	613
	614	=item 2
	615
	616	C<write()> that (non-blocking) to the writer fd
	617
	618	=over 12
	619
	620	=item 1
	621
	622	if it writes 100%, flag the signal in a counter of "signals on the pipe" akin
	623	to the current per-signal-number counts
	624
	625	=item 2
	626
	627	if it writes 0%, assume the pipe is full. Flag the data as lost?
	628
	629	=item 3
	630
	631	if it writes partially, croak a panic, as your OS is broken.
	632
	633	=back
	634
	635	=back
	636
	637	=item 4
	638
	639	in the regular C<PERL_ASYNC_CHECK()> processing, if there are "signals on
	640	the pipe", read the data out, deserialise, build the Perl structures on
	641	the stack (code in C<Perl_sighandler()>, the "unsafe" handler), and call as
	642	usual.
	643
	644	=back
	645
	646	I think that this gets us decent C<SA_SIGINFO> support, without the current risk
	647	of running Perl code inside the signal handler context. (With all the dangers
	648	of things like C<malloc> corruption that that currently offers us)
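
The pipe protocol in the steps above can be tried out at the Perl level. The
sketch below is only a model of it, assuming a POSIX system: the real work
would be C inside C<Perl_csighandler()>, and the record layout (three 32-bit
integers for signal number, pid and uid) and the names C<queue_siginfo> and
C<drain_siginfo> are invented for illustration.

```perl
use strict;
use warnings;
use Fcntl qw(F_GETFL F_SETFL O_NONBLOCK);

# Steps 1 and 2: one pipe, with the writer end made non-blocking.
pipe(my $reader, my $writer) or die "pipe: $!";
my $flags = fcntl($writer, F_GETFL, 0) or die "fcntl F_GETFL: $!";
fcntl($writer, F_SETFL, $flags | O_NONBLOCK) or die "fcntl F_SETFL: $!";

my $pending = 0;    # count of "signals on the pipe"
my $lost    = 0;    # records dropped because the pipe was full

# Step 3: serialise a fixed-size record and write it without blocking.
sub queue_siginfo {
    my ($signo, $pid, $uid) = @_;
    my $record = pack('l3', $signo, $pid, $uid);    # 12 bytes
    my $wrote  = syswrite($writer, $record);
    if (!defined $wrote || $wrote == 0) {           # nothing written: pipe full
        $lost++;
        return 0;
    }
    die "panic: partial write to siginfo pipe" if $wrote != length $record;
    $pending++;
    return 1;
}

# Step 4: at a safe point, read the records back and deserialise them.
sub drain_siginfo {
    my @events;
    while ($pending > 0) {
        my $got = sysread($reader, my $record, 12);
        die "short read from siginfo pipe" unless defined $got && $got == 12;
        push @events, [ unpack('l3', $record) ];
        $pending--;
    }
    return @events;
}

queue_siginfo(10, 1234, 501);
printf "signal %d from pid %d (uid %d)\n", @$_ for drain_siginfo();
```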
	649
	650	For more information see the thread starting with this message:
	651	http://www.xray.mpe.mpg.de/mailing-lists/perl5-porters/2008-03/msg00305.html
	652
	653	=head2 autovivification
	654
	655	Make all autovivification consistent w.r.t. LVALUE/RVALUE and strict/no strict.
	656
	657	This task is incremental - even a little bit of work on it will help.
	658
	659	=head2 Unicode in Filenames
	660
	661	chdir, chmod, chown, chroot, exec, glob, link, lstat, mkdir, open,
	662	opendir, qx, readdir, readlink, rename, rmdir, stat, symlink, sysopen,
	663	system, truncate, unlink, utime, -X. All these could potentially accept
	664	Unicode filenames either as input or output (and in the case of system
	665	and qx Unicode in general, as input or output to/from the shell).
	666	Whether a given filesystem and operating system pair understands Unicode in
	667	filenames varies.
	668
	669	Known combinations that have some level of understanding include
	670	Microsoft NTFS, Apple HFS+ (in Mac OS 9 and X), Apple UFS (in Mac
	671	OS X), and of course Plan 9; NFS v4 is rumored to be Unicode. How to
	672	create Unicode filenames, what forms of Unicode are accepted and used
	673	(UCS-2, UTF-16, UTF-8), what (if any) is the normalization form used,
	674	and so on, varies. Finding the right level of interfacing to Perl
	675	requires some thought. Remember that an OS does not imply a
	676	filesystem.
	677
	678	(The Windows -C command flag "wide API support" has been at least
	679	temporarily retired in 5.8.1, and the -C has been repurposed, see
	680	L<perlrun>.)
	681
	682	Most probably the right way to do this would be this:
	683	L</"Virtualize operating system access">.
	684
	685	=head2 Unicode in %ENV
	686
	687	Currently the %ENV entries are always byte strings.
	688	See L</"Virtualize operating system access">.
	689
	690	=head2 Unicode and glob()
	691
	692	Currently glob patterns and filenames returned from File::Glob::glob()
	693	are always byte strings. See L</"Virtualize operating system access">.
	694
	695	=head2 Unicode and lc/uc operators
	696
	697	Some built-in operators (C<lc>, C<uc>, etc.) behave differently, based on
	698	what the internal encoding of their argument is. That should not be the
	699	case. Maybe add a pragma to switch behaviour.
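
A small demonstration of the inconsistency, assuming a perl on which neither
C<use locale> nor the C<unicode_strings> feature is in effect. The two strings
hold the same single character and compare equal, yet C<uc> may treat them
differently because only one of them is stored as UTF-8 internally.

```perl
use strict;
use warnings;

my $native   = "\xE9";        # e-acute, stored as a single byte
my $upgraded = $native;
utf8::upgrade($upgraded);     # same character, UTF-8 internal encoding

my $same_string   = $native eq $upgraded ? 1 : 0;
my $same_after_uc = uc($native) eq uc($upgraded) ? 1 : 0;

printf "equal before uc: %d, equal after uc: %d\n",
    $same_string, $same_after_uc;
```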
	700
	701	=head2 use less 'memory'
	702
	703	Investigate trade offs to switch out perl's choices on memory usage.
	704	Particularly perl should be able to give memory back.
	705
	706	This task is incremental - even a little bit of work on it will help.
	707
	708	=head2 Re-implement C<:unique> in a way that is actually thread-safe
	709
	710	The old implementation made bad assumptions on several levels. A good 90%
	711	solution might be just to make C<:unique> work to share the string buffer
	712	of SvPVs. That way large constant strings can be shared between ithreads,
	713	such as the configuration information in F<Config>.
	714
	715	=head2 Make tainting consistent
	716
	717	Tainting would be easier to use if it didn't take documented shortcuts and
	718	allow taint to "leak" everywhere within an expression.
	719
	720	=head2 readpipe(LIST)
	721
	722	system() accepts a LIST syntax (and a PROGRAM LIST syntax) to avoid
	723	running a shell. readpipe() (the function behind qx//) could be similarly
	724	extended.
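
Until then, the list form of a piped C<open> gives much the same effect. The
wrapper below is a sketch of what a C<readpipe(LIST)> might behave like, not a
proposed implementation; the name C<readpipe_list> is made up, and it assumes
a platform where list-form pipe opens are supported.

```perl
use strict;
use warnings;

# Run a command given as a list, bypassing the shell, and return its output
# as a list of lines or as one string depending on context.
sub readpipe_list {
    my @cmd = @_;
    open(my $fh, '-|', @cmd) or die "cannot run $cmd[0]: $!";
    my @lines = <$fh>;
    close $fh;    # the child's exit status is left in $?
    return wantarray ? @lines : join '', @lines;
}

# The last argument contains shell metacharacters; with no shell involved
# it reaches the child process untouched.
my $out = readpipe_list($^X, '-e', 'print $ARGV[0]', '$HOME; echo hi');
print "child saw: $out\n";
```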
	725
	726	=head2 Audit the code for destruction ordering assumptions
	727
	728	Change 25773 notes
	729
	730	/* Need to check SvMAGICAL, as during global destruction it may be that
	731	AvARYLEN(av) has been freed before av, and hence the SvANY() pointer
	732	is now part of the linked list of SV heads, rather than pointing to
	733	the original body. */
	734	/* FIXME - audit the code for other bugs like this one. */
	735
	736	adding the C<SvMAGICAL> check to
	737
	738	if (AvARYLEN(av) && SvMAGICAL(AvARYLEN(av))) {
	739	MAGIC *mg = mg_find (AvARYLEN(av), PERL_MAGIC_arylen);
	740
	741	Go through the core and look for similar assumptions that SVs have particular
	742	types, as all bets are off during global destruction.
	743
	744	=head2 Extend PerlIO and PerlIO::Scalar
	745
	746	PerlIO::Scalar doesn't know how to truncate(). Implementing this
	747	would require extending the PerlIO vtable.
	748
	749	Similarly the PerlIO vtable doesn't know about formats (write()), or
	750	about stat(), or chmod()/chown(), utime(), or flock().
	751
	752	(For PerlIO::Scalar it's hard to see what e.g. mode bits or ownership
	753	would mean.)
	754
	755	PerlIO doesn't do directories or symlinks, either: mkdir(), rmdir(),
	756	opendir(), closedir(), seekdir(), rewinddir(), glob(); symlink(),
	757	readlink().
	758
	759	See also L</"Virtualize operating system access">.
	760
	761	=head2 -C on the #! line
	762
	763	It should be possible to make -C work correctly if found on the #! line,
	764	given that all perl command line options are strict ASCII, and -C changes
	765	only the interpretation of non-ASCII characters, and not for the script file
	766	handle. To make it work needs some investigation of the ordering of function
	767	calls during startup, and (by implication) a bit of tweaking of that order.
	768
	769	=head2 Organize error messages
	770
	771	Perl's diagnostics (error messages, see L<perldiag>) could use
	772	reorganizing and formalizing so that each error message has its
	773	stable-for-all-eternity unique id, categorized by severity, type, and
	774	subsystem. (The error messages would be listed in a datafile outside
	775	of the Perl source code, and the source code would only refer to the
	776	messages by the id.) This clean-up and regularizing should apply
	777	to all croak() messages.
	778
	779	This would enable all sorts of things: easier translation/localization
	780	of the messages (though please do keep in mind the caveats of
	781	L<Locale::Maketext> about too straightforward approaches to
	782	translation), filtering by severity, and instead of grepping for a
	783	particular error message one could look for a stable error id. (Of
	784	course, changing the error messages by default would break all the
	785	existing software depending on some particular error message...)
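
A minimal sketch of the idea follows, assuming a catalog keyed by stable id. Every id, field name and function name here is made up for illustration; nothing like this exists in the core today.

```perl
use strict;
use warnings;

# Hypothetical catalog: in the proposal this data would live in a
# datafile outside the Perl source, not in a hash literal.
my %CATALOG = (
    'IO-0001' => {
        severity  => 'fatal',
        subsystem => 'io',
        text      => q{Can't open %s: %s},
    },
    'NUM-0001' => {
        severity  => 'warning',
        subsystem => 'numeric',
        text      => q{Argument "%s" isn't numeric},
    },
);

# Build the user-visible message from a stable id plus arguments.
sub format_message {
    my ($id, @args) = @_;
    my $entry = $CATALOG{$id} or die "Unknown message id '$id'";
    return sprintf("[%s] " . $entry->{text}, $id, @args);
}

# Filtering by severity becomes a lookup rather than a grep over text.
sub ids_with_severity {
    my ($severity) = @_;
    my @ids = grep { $CATALOG{$_}{severity} eq $severity } keys %CATALOG;
    my @sorted = sort @ids;
    return @sorted;
}

my $message = format_message('IO-0001', 'foo.txt', 'No such file');
my @fatal   = ids_with_severity('fatal');
print "$message\n";
print "fatal ids: @fatal\n";
```

Callers that currently match on message text could match on the bracketed id instead, so the wording could change (or be translated) without breaking them.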
	786
	787	This kind of functionality is known as I<message catalogs>. Look for
	788	inspiration for example in the catgets() system, possibly even use it
	789	if available, but B<only> if available: B<not> all platforms will
	790	have catgets().
	791
	792	For the really pure at heart, consider extending this item to cover
	793	also the warning messages (see L<perllexwarn>, C<warnings.pl>).
	794
	795	=head1 Tasks that need a knowledge of the interpreter
	796
	797	These tasks would need C knowledge, and knowledge of how the interpreter works,
	798	or a willingness to learn.
	799
	800	=head2 error reporting of [$a ; $b]
	801
	802	Using C<;> inside brackets is a syntax error, and we don't propose to change
	803	that by giving it any meaning. However, it's not reported very helpfully:
	804
	805	$ perl -e '$a = [$b; $c];'
	806	syntax error at -e line 1, near "$b;"
	807	syntax error at -e line 1, near "$c]"
	808	Execution of -e aborted due to compilation errors.
	809
	810	It should be possible to hook into the tokeniser or the lexer, so that when a
	811	C<;> is parsed where it is not legal as a statement terminator (i.e. inside
	812	C<{}> used as a hashref, C<[]> or C<()>) it issues an error something like
	813	I<';' isn't legal inside an expression - if you need multiple statements use a
	814	do {...} block>. See the thread starting at
	815	http://www.xray.mpe.mpg.de/mailing-lists/perl5-porters/2008-09/msg00573.html
	816
	817	=head2 lexicals used only once
	818
	819	This warns:
	820
	821	$ perl -we '$pie = 42'
	822	Name "main::pie" used only once: possible typo at -e line 1.
	823
	824	This does not:
	825
	826	$ perl -we 'my $pie = 42'
	827
	828	Logically all lexicals used only once should warn, if the user asks for
	829	warnings. An unworked RT ticket (#5087) has been open for almost seven
	830	years for this discrepancy.
	831
	832	=head2 UTF-8 revamp
	833
	834	The handling of Unicode is unclean in many places. For example, the regexp
	835	engine matches in Unicode semantics whenever the string or the pattern is
	836	flagged as UTF-8, but that should not be dependent on an internal storage
	837	detail of the string. Likewise, case folding behaviour is dependent on the
	838	UTF8 internal flag being on or off.
	839
	840	=head2 Properly Unicode safe tokeniser and pads.
	841
	842	The tokeniser isn't actually very UTF-8 clean. C<use utf8;> is a hack -
	843	variable names are stored in stashes as raw bytes, without the utf-8 flag
	844	set. The pad API only takes a C<char *> pointer, so that's all bytes too. The
	845	tokeniser ignores the UTF-8-ness of C<PL_rsfp>, or any SVs returned from
	846	source filters. All this could be fixed.
	847
	848	=head2 state variable initialization in list context
	849
	850	Currently this is illegal:
	851
	852	state ($a, $b) = foo();
	853
	854	In Perl 6, C<state ($a) = foo();> and C<(state $a) = foo();> have different
	855	semantics, which is tricky to implement in Perl 5 as currently they produce
	856	the same opcode trees. The Perl 6 design is firm, so it would be good to
	857	implement the necessary code in Perl 5. There are comments in
	858	C<Perl_newASSIGNOP()> that show the code paths taken by various assignment
	859	constructions involving state variables.
	860
	861	=head2 Implement $value ~~ 0 .. $range
	862
	863	It would be nice to extend the syntax of the C<~~> operator to also
	864	understand numeric (and maybe alphanumeric) ranges.
	865
	866	=head2 A does() built-in
	867
	868	Like ref(), only useful. It would call the C<DOES> method on objects; it
	869	would also tell whether something can be dereferenced as an
	870	array/hash/etc., or used as a regexp, etc.
	871	L<http://www.xray.mpe.mpg.de/mailing-lists/perl5-porters/2007-03/msg00481.html>
	872
	873	=head2 Tied filehandles and write() don't mix
	874
	875	There is no method on tied filehandles to allow them to be called back by
	876	formats.
	877
	878	=head2 Attach/detach debugger from running program
	879
	880	The old perltodo notes "With C<gdb>, you can attach the debugger to a running
	881	program if you pass the process ID. It would be good to do this with the Perl
	882	debugger on a running Perl program, although I'm not sure how it would be
	883	done." ssh and screen do this with named pipes in /tmp. Maybe we can too.
	884
	885	=head2 LVALUE functions for lists
	886
	887	The old perltodo notes that lvalue functions don't work for list or hash
	888	slices. This would be good to fix.
	889
	890	=head2 regexp optimiser optional
	891
	892	The regexp optimiser is not optional. It should be configurable, to allow
	893	its performance to be measured, and its bugs to be easily demonstrated.
	894
	895	=head2 delete &function
	896
	897	Allow functions to be deleted. One can already undef them, but they're still
	898	in the stash.
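
The current behaviour can be seen with a short script. The sub name C<greet> is just an example, and the final step shows the blunt workaround available today, which removes the whole glob rather than only the function.

```perl
use strict;
use warnings;

sub greet { return "hello" }

# undef removes the body, but the sub stays declared and in the stash.
undef &greet;

my $is_defined     = defined(&greet)       ? 1 : 0;
my $still_declared = exists(&greet)        ? 1 : 0;
my $in_stash       = exists $main::{greet} ? 1 : 0;

# Workaround today: delete the stash entry. This also discards any
# scalar, array or hash of the same name, so it is not a true
# "delete &function".
delete $main::{greet};
my $in_stash_after = exists $main::{greet} ? 1 : 0;

print "defined: $is_defined, declared: $still_declared, ",
      "in stash: $in_stash, after delete: $in_stash_after\n";
```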
	899
	900	=head2 C</w> regex modifier
	901
	902	That flag would enable matching whole words, and would also interpolate
	903	arrays as alternations. With it, C</P/w> would be roughly equivalent to:
	904
	905	do { local $"='|'; /\b(?:P)\b/ }
	906
	907	See L<http://www.xray.mpe.mpg.de/mailing-lists/perl5-porters/2007-01/msg00400.html>
	908	for the discussion.
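
The equivalence can be tried today by spelling it out by hand. In this sketch the variable names and sample text are arbitrary, and the array elements are assumed to contain no regex metacharacters (they are interpolated unquoted).

```perl
use strict;
use warnings;

my @words = qw(cat dog);
my $text  = 'the dog chased a caterpillar';

# Join the array with '|' on interpolation and anchor on word boundaries,
# which is roughly what the proposed flag would do implicitly.
my @found = do {
    local $" = '|';
    $text =~ /\b(?:@words)\b/g;
};

print "matched: @found\n";
```

Only C<dog> matches: C<cat> occurs in the text, but not as a whole word.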
	909
	910	=head2 optional optimizer
	911
	912	Make the peephole optimizer optional. Currently it performs two tasks as
	913	it walks the optree - genuine peephole optimisations, and necessary fixups of
	914	ops. It would be good to find an efficient way to switch out the
	915	optimisations whilst keeping the fixups.
	916
	917	=head2 You WANT how many
	918
	919	Currently contexts are void, scalar and list. split has a special mechanism in
	920	place to pass in the number of return values wanted. It would be useful to
	921	have a general mechanism for this, backwards compatible and with little speed hit.
	922	This would allow proposals such as short circuiting sort to be implemented
	923	as a module on CPAN.
	924
	925	=head2 lexical aliases
	926
	927	Allow lexical aliases (maybe via the syntax C<my \$alias = \$foo>).
	928
	929	=head2 entersub XS vs Perl
	930
	931	At the moment pp_entersub is huge, and has code to deal with entering both
	932	perl and XS subroutines. Subroutine implementations rarely change between
	933	perl and XS at run time, so investigate using 2 ops to enter subs (one for
	934	XS, one for perl) and swap between if a sub is redefined.
	935
	936	=head2 Self-ties
	937
	938	Self-ties are currently illegal because they caused too many segfaults. Maybe
	939	the causes of these could be tracked down and self-ties on all types
	940	reinstated.
	941
	942	=head2 Optimize away @_
	943
	944	The old perltodo notes "Look at the "reification" code in C<av.c>".
	945
	946	=head2 Virtualize operating system access
	947
	948	Implement a set of "vtables" that virtualizes operating system access
	949	(open(), mkdir(), unlink(), readdir(), getenv(), etc.) At the very
	950	least these interfaces should take SVs as "name" arguments instead of
	951	bare char pointers; probably the most flexible and extensible way
	952	would be for the Perl-facing interfaces to accept HVs. The system
	953	needs to be per-operating-system and per-file-system
	954	hookable/filterable, preferably both from XS and Perl level
	955	(L<perlport/"Files and Filesystems"> is good reading at this point,
	956	in fact, all of L<perlport> is.)
	957
	958	This has actually already been implemented (but only for Win32),
	959	take a look at F<iperlsys.h> and F<win32/perlhost.h>. While all Win32
	960	variants go through a set of "vtables" for operating system access,
	961	non-Win32 systems currently go straight for the POSIX/UNIX-style
	962	system/library call. A system similar to the one for Win32 should be
	963	implemented for all platforms. The existing Win32 implementation
	964	probably does not need to survive alongside this proposed new
	965	implementation, the approaches could be merged.
	966
	967	What would this give us? One often-asked-for feature this would
	968	enable is using Unicode for filenames, and other "names" like %ENV,
	969	usernames, hostnames, and so forth.
	970	(See L<perlunicode/"When Unicode Does Not Happen">.)
	971
	972	But this kind of virtualization would also allow for things like
	973	virtual filesystems, virtual networks, and "sandboxes" (though as long
	974	as dynamic loading of random object code is allowed, not very safe
	975	sandboxes since external code of course knows nothing of Perl's vtables).
	976	An example of a smaller "sandbox" is that this feature can be used to
	977	implement per-thread working directories: Win32 already does this.
	978
	979	See also L</"Extend PerlIO and PerlIO::Scalar">.
	980
	981	=head2 Investigate PADTMP hash pessimisation
	982
	983	The peephole optimiser converts constants used for hash key lookups to shared
	984	hash key scalars. Under ithreads, something is undoing this work.
	985	See http://www.xray.mpe.mpg.de/mailing-lists/perl5-porters/2007-09/msg00793.html
	986
	987	=head2 Store the current pad in the OP slab allocator
	988
	989	=for clarification
	990	I hope that I got that "current pad" part correct
	991
	992	Currently we leak ops in various cases of parse failure. I suggested that we
	993	could solve this by always using the op slab allocator, and walking it to
	994	free ops. Dave comments that as some ops are already freed during optree
	995	creation one would have to mark which ops are freed, and not double free them
	996	when walking the slab. He notes that one problem with this is that for some ops
	997	you have to know which pad was current at the time of allocation, which does
	998	change. I suggested storing a pointer to the current pad in the memory allocated
	999	for the slab, and swapping to a new slab each time the pad changes. Dave thinks
	1000	that this would work.
	1001
	1002	=head2 repack the optree
	1003
	1004	Repacking the optree after execution order is determined could allow
	1005	removal of NULL ops, and optimal ordering of OPs with respect to cache-line
	1006	filling. The slab allocator could be reused for this purpose. I think that
	1007	the best way to do this is to make it an optional step just before the
	1008	completed optree is attached to anything else, and to use the slab allocator
	1009	unchanged, so that freeing ops is identical whether or not this step runs.
	1010	Note that the slab allocator allocates ops downwards in memory, so one would
	1011	have to actually "allocate" the ops in reverse-execution order to get them
	1012	contiguous in memory in execution order.
	1013
	1014	See http://www.nntp.perl.org/group/perl.perl5.porters/2007/12/msg131975.html
	1015
	1016	Note that running this copy, and then freeing all the old location ops would
	1017	cause their slabs to be freed, which would eliminate possible memory wastage if
	1018	the previous suggestion is implemented, and we swap slabs more frequently.
	1019
	1020	=head2 eliminate incorrect line numbers in warnings
	1021
	1022	This code
	1023
	1024	use warnings;
	1025	my $undef;
	1026
	1027	if ($undef == 3) {
	1028	} elsif ($undef == 0) {
	1029	}
	1030
	1031	used to produce this output:
	1032
	1033	Use of uninitialized value in numeric eq (==) at wrong.pl line 4.
	1034	Use of uninitialized value in numeric eq (==) at wrong.pl line 4.
	1035
	1036	where the line of the second warning was misreported - it should be line 5.
	1037	Rafael fixed this - the problem arose because there was no nextstate OP
	1038	between the execution of the C<if> and the C<elsif>, hence C<PL_curcop> still
	1039	reports that the currently executing line is line 4. The solution was to inject
	1040	a nextstate OP for each C<elsif>, although it turned out that the nextstate
	1041	OP needed to be a nulled OP, rather than a live nextstate OP, else other line
	1042	numbers became misreported. (Jenga!)
	1043
	1044	The problem is more general than C<elsif> (although the C<elsif> case is the
	1045	most common and the most confusing). Ideally this code
	1046
	1047	use warnings;
	1048	my $undef;
	1049
	1050	my $a = $undef + 1;
	1051	my $b
	1052	= $undef
	1053	+ 1;
	1054
	1055	would produce this output
	1056
	1057	Use of uninitialized value $undef in addition (+) at wrong.pl line 4.
	1058	Use of uninitialized value $undef in addition (+) at wrong.pl line 7.
	1059
	1060	(rather than lines 4 and 5), but this would seem to require every OP to carry
	1061	(at least) line number information.
	1062
	1063	What might work is to have an optional line number in memory just before the
	1064	BASEOP structure, with a flag bit in the op to say whether it's present.
	1065	Initially during compile every OP would carry its line number. Then add a late
	1066	pass to the optimiser (potentially combined with L</repack the optree>) which
	1067	looks at the two ops on every edge of the graph of the execution path. If
	1068	the line number changes, flag the destination OP with this information.
	1069	Once all paths are traced, replace every op with the flag with a
	1070	nextstate-light op (that just updates C<PL_curcop>), which in turn then passes
	1071	control on to the true op. All ops would then be replaced by variants that
	1072	do not store the line number. (Which, logically, is why it would work best in
	1073	conjunction with L</repack the optree>, as that is already copying/reallocating
	1074	all the OPs.)
	1075
	1076	(Although I should note that we're not certain that doing this for the general
	1077	case is worth it)
	1078
	1079	=head2 optimize tail-calls
	1080
	1081	Tail-calls present an opportunity for broadly applicable optimization;
	1082	anywhere that C<< return foo(...) >> is called, the outer return can
	1083	be replaced by a goto, and foo will return directly to the outer
	1084	caller, saving (conservatively) 25% of perl's call&return cost, which
	1085	is relatively higher than in C. The Scheme language is known to do
	1086	this heavily. B::Concise provides good insight into where this
	1087	optimization is possible, i.e. anywhere an entersub, leavesub op sequence
	1088	occurs.
	1089
	1090	perl -MO=Concise,-exec,a,b,-main -e 'sub a{ 1 }; sub b {a()}; b(2)'
	1091
	1092	Bottom line on this is probably a new pp_tailcall function which
	1093	combines the code in pp_entersub, pp_leavesub. This should probably
	1094	be done first in XS, using B::Generate to patch the new OP into the
	1095	optrees.
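
The intended semantics can be seen at the Perl level by comparing an ordinary nested call with C<goto &sub>, which replaces the current call frame rather than adding one. This sketch only illustrates the frame replacement; it says nothing about speed, and the sub names are arbitrary.

```perl
use strict;
use warnings;

# Count how many sub call frames are active at the point of the call.
sub depth {
    my $frames = 0;
    $frames++ while caller($frames);
    return $frames;
}

# Ordinary call in tail position: plain_call's frame stays on the stack.
sub plain_call { return depth(@_) }

# goto &sub replaces the current frame, so depth() returns straight to
# the outer caller.
sub tail_call { goto &depth }

my $plain_frames = plain_call();
my $tail_frames  = tail_call();
printf "plain: %d frames\n", $plain_frames;
printf "goto:  %d frames\n", $tail_frames;
```

The nested call sees one more frame than the C<goto> form, which is the frame an automatic tail-call optimization would avoid creating.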
	1096
	1097	=head1 Big projects
	1098
	1099	Tasks that will get your name mentioned in the description of the "Highlights
	1100	of 5.12"
	1101
	1102	=head2 make ithreads more robust
	1103
	1104	Generally make ithreads more robust. See also L</iCOW>.
	1105
	1106	This task is incremental - even a little bit of work on it will help, and
	1107	will be greatly appreciated.
	1108
	1109	One bit would be to write the missing code in sv.c:Perl_dirp_dup.
	1110
	1111	Fix Perl_sv_dup, et al so that threads can return objects.
	1112
	1113	=head2 iCOW
	1114
	1115	Sarathy and Arthur have a proposal for an improved Copy On Write which
	1116	specifically will be able to COW new ithreads. If this can be implemented
	1117	it would be a good thing.
	1118
	1119	=head2 (?{...}) closures in regexps
	1120
	1121	Fix (or rewrite) the implementation of the C</(?{...})/> closures.
	1122
	1123	=head2 A re-entrant regexp engine
	1124
	1125	This will allow the use of a regex from inside (?{ }), (??{ }) and
	1126	(?(?{ })|) constructs.
	1127
	1128	=head2 Add class set operations to regexp engine
	1129
	1130	Apparently these are quite useful. Anyway, Jeffrey Friedl wants them.
	1131
	1132	demerphq has this on his todo list, but right at the bottom.