This is a live mirror of the Perl 5 development currently hosted at https://github.com/perl/perl5
[perl5.git] / pod / perltodo.pod
=head1 NAME

perltodo - Perl TO-DO List

=head1 DESCRIPTION

This is a list of wishes for Perl. The tasks we think are smaller or
easier are listed first. Anyone is welcome to work on any of these,
but it's a good idea to first contact I<perl5-porters@perl.org> to
avoid duplication of effort, and to learn from any previous attempts.
By all means contact a pumpking privately first if you prefer.

Whilst patches to make the list shorter are most welcome, ideas to add to
the list are also encouraged. Check the perl5-porters archives for past
ideas, and any discussion about them. One set of archives may be found at:

    http://www.xray.mpe.mpg.de/mailing-lists/perl5-porters/

What can we offer you in return? Fame, fortune, and everlasting glory? Maybe
not, but if your patch is incorporated, then we'll add your name to the
F<AUTHORS> file, which ships in the official distribution. How many other
programming languages offer you 1 line of immortality?

=head1 Tasks that only need Perl knowledge

=head2 Remove duplication of test setup.

Schwern notes that there's duplication of code - lots and lots of tests have
some variation on the big block of C<$Is_Foo> checks. We can safely put this
into a file, change it to build an C<%Is> hash and require it. Maybe just put
it into F<test.pl>. Throw in the handy tainting subroutines.

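A minimal sketch of what such a shared file might contain. The helper name
C<build_is_hash> and the list of platform names are assumptions made for
illustration; the real list would have to be collected from the existing
C<$Is_Foo> checks in the tests.

```perl
use strict;
use warnings;

# Hypothetical helper: build the platform hash once from an OS name
# (normally $^O), instead of repeating $Is_Foo checks in every test.
sub build_is_hash {
    my ($osname) = @_;
    my %is;
    # Assumed platform list, for illustration only.
    for my $name (qw(VMS MSWin32 NetWare os2 dos cygwin MacOS)) {
        $is{$name} = $osname eq $name ? 1 : 0;
    }
    return %is;
}

our %Is = build_is_hash($^O);
print "Running on VMS\n" if $Is{VMS};
```

A test would then write C<$Is{VMS}> where it currently writes C<$Is_VMS>.
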
=head2 merge common code in installperl and installman

There are some common subroutines and a common C<BEGIN> block in F<installperl>
and F<installman>. These should probably be merged. It would also be good to
check for duplication in all the utility scripts supplied in the source
tarball. It might be good to move them all to a subdirectory, but this would
require careful checking to find all places that call them, and change those
correctly.

=head2 common test code for timed bail out

Write portable self destruct code for tests to stop them burning CPU in
infinite loops. This needs to avoid using alarm, as some of the tests are
testing alarm/sleep or timers.

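One possible starting point, sketched under the assumption that C<fork> is
available: a separate watchdog process that kills the test if it overruns.
The names C<start_watchdog> and C<stop_watchdog> are hypothetical. This is
not portable to platforms without a usable C<fork>, so it does not complete
the task; it only shows the shape of an approach in which the test process
itself never calls C<alarm>.

```perl
use strict;
use warnings;

# Hypothetical helper: fork a watchdog that kills the test process if it
# is still running after $timeout seconds. The parent never calls alarm().
sub start_watchdog {
    my ($timeout) = @_;
    my $parent = $$;
    my $pid = fork();
    die "fork failed: $!" unless defined $pid;
    if ($pid == 0) {
        # Child: wait, then terminate the (presumably looping) parent.
        sleep $timeout;
        kill 'KILL', $parent;
        exit 0;
    }
    return $pid;
}

# Call once the test has finished normally, to cancel the watchdog.
sub stop_watchdog {
    my ($pid) = @_;
    kill 'KILL', $pid;
    return waitpid($pid, 0);
}

my $watchdog = start_watchdog(300);
# ... the real tests would run here ...
stop_watchdog($watchdog);
```

A test that exits without calling C<stop_watchdog> leaves the watchdog
running, so real code would also need to cancel it from an C<END> block.
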
=head2 POD -E<gt> HTML conversion in the core still sucks

Which is crazy given just how simple POD purports to be, and how simple HTML
can be. It's not actually I<as> simple as it sounds, particularly with the
flexibility POD allows for C<=item>, but it would be good to improve the
visual appeal of the HTML generated, and to avoid it having any validation
errors. See also L</make HTML install work>, as the layout of the installation
tree is needed to improve the cross-linking.

The addition of C<Pod::Simple> and its related modules may make this task
easier to complete.

=head2 merge checkpods and podchecker

F<pod/checkpods.PL> (and C<make check> in the F<pod/> subdirectory)
implements a very basic check for pod files, but the errors it discovers
aren't found by podchecker. Add this check to podchecker, get rid of
checkpods and have C<make check> use podchecker.

=head2 perlmodlib.PL rewrite

Currently perlmodlib.PL needs to be run from a source directory where perl
has been built, or some modules won't be found, and others will be
skipped. Make it run from a clean perl source tree (so it's reproducible).

=head2 Parallel testing

(This probably impacts much more than the core: also the Test::Harness
and TAP::* modules on CPAN.)

The core regression test suite is getting ever more comprehensive, which has
the side effect that it takes longer to run. This isn't so good. Investigate
whether it would be feasible to give the harness script the B<option> of
running sets of tests in parallel. This would be useful for tests in
F<t/op/*.t> and F<t/uni/*.t> and maybe some sets of tests in F<lib/>.

Questions to answer

=over 4

=item 1

How does screen layout work when you're running more than one test?

=item 2

How does the caller of the tests specify how many tests to run in parallel?

=item 3

How do setup/teardown tests identify themselves?

=back

Pugs already does parallel testing - can their approach be re-used?

=head2 Make Schwern poorer

We should have tests for everything. When all the core's modules are tested,
Schwern has promised to donate $500 to TPF. We may need volunteers to
hold him upside down and shake vigorously in order to actually extract the
cash.

=head2 Improve the coverage of the core tests

Use Devel::Cover to ascertain the core modules' test coverage, then add
tests that are currently missing.

=head2 test B

A full test suite for the B module would be nice.

=head2 Deparse inlined constants

Code such as this

    use constant PI => 4;
    warn PI

will currently deparse as

    use constant ('PI', 4);
    warn 4;

because the tokenizer inlines the value of the constant subroutine C<PI>.
This allows various compile time optimisations, such as constant folding
and dead code elimination. Where these haven't happened (such as the example
above) it ought to be possible to make B::Deparse work out the name of the
original constant, because just enough information survives in the symbol
table to do this. Specifically, the same scalar is used for the constant in
the optree as is used for the constant subroutine, so by iterating over all
symbol tables and generating a mapping of SV address to constant name, it
would be possible to provide B::Deparse with this functionality.

=head2 A decent benchmark

C<perlbench> seems impervious to any recent changes made to the perl core. It
would be useful to have a reasonable general benchmarking suite that roughly
represented what current perl programs do, and measurably reported whether
tweaks to the core improve, degrade or don't really affect performance, to
guide people attempting to optimise the guts of perl. Gisle would welcome
new tests for perlbench.

=head2 fix tainting bugs

Fix the bugs revealed by running the test suite with the C<-t> switch (via
C<make test.taintwarn>).

=head2 Dual life everything

As part of the "dists" plan, anything that doesn't belong in the smallest perl
distribution needs to be dual lifed. Anything else can be too. Figure out what
changes would be needed to package that module and its tests up for CPAN, and
do so. Test it with older perl releases, and fix the problems you find.

To make a minimal perl distribution, it's useful to look at
F<t/lib/commonsense.t>.

=head2 Improving C<threads::shared>

Investigate whether C<threads::shared> could share aggregates properly with
only Perl level changes to F<shared.pm>.

=head2 POSIX memory footprint

Ilya observed that C<use POSIX;> eats memory like there's no tomorrow, and at
various times worked to cut it down. There is probably still fat to cut out -
for example POSIX passes Exporter some very memory hungry data structures.

=head2 embed.pl/makedef.pl

There is a script F<embed.pl> that generates several header files to prefix
all of Perl's symbols in a consistent way, to provide some semblance of
namespace support in C<C>. Functions are declared in F<embed.fnc>, variables
in F<interpvar.h>. Quite a few of the functions and variables
are conditionally declared there, using C<#ifdef>. However, F<embed.pl>
doesn't understand the C macros, so the rules about which symbols are present
when are duplicated in F<makedef.pl>. Writing things twice is bad, m'kay.
It would be good to teach F<embed.pl> to understand the conditional
compilation, and hence remove the duplication, and the mistakes it has caused.

=head2 use strict; and AutoLoad

Currently if you write

    package Whack;
    use AutoLoader 'AUTOLOAD';
    use strict;
    1;
    __END__
    sub bloop {
        print join (' ', No, strict, here), "!\n";
    }

then C<use strict;> isn't in force within the autoloaded subroutines. It would
be more consistent (and less surprising) to arrange for all lexical pragmas
in force at the __END__ block to be in force within each autoloaded subroutine.

There's a similar problem with SelfLoader.

=head1 Tasks that need a little sysadmin-type knowledge

Or if you prefer, tasks that you would learn from, and broaden your skills
base...

=head2 make HTML install work

There is an C<installhtml> target in the Makefile. It's marked as
"experimental". It would be good to get this tested, make it work reliably, and
remove the "experimental" tag. This would include

=over 4

=item 1

Checking that cross linking between various parts of the documentation works.
In particular that links work between the modules (files with POD in F<lib/>)
and the core documentation (files in F<pod/>).

=item 2

Work out how to split C<perlfunc> into chunks, preferably one per function
group, preferably with general case code that could be used elsewhere.
Challenges here are correctly identifying the groups of functions that go
together, and making the right named external cross-links point to the right
page. Things to be aware of are C<-X>, groups such as C<getpwnam> to
C<endservent>, two or more C<=item>s giving the different parameter lists, such
as

    =item substr EXPR,OFFSET,LENGTH,REPLACEMENT
    =item substr EXPR,OFFSET,LENGTH
    =item substr EXPR,OFFSET

and different parameter lists having different meanings (e.g. C<select>).

=back

=head2 compressed man pages

Be able to install them. This would probably need a configure test to see how
the system does compressed man pages (same directory/different directory?
same filename/different filename), as well as tweaking the F<installman> script
to compress as necessary.

=head2 Add a code coverage target to the Makefile

Make it easy for anyone to run Devel::Cover on the core's tests. The steps
to do this manually are roughly

=over 4

=item *

do a normal C<Configure>, but include Devel::Cover as a module to install
(see F<INSTALL> for how to do this)

=item *

    make perl

=item *

    cd t; HARNESS_PERL_SWITCHES=-MDevel::Cover ./perl -I../lib harness

=item *

Process the resulting Devel::Cover database

=back

This just gives you the coverage of the F<.pm>s. To also get the C level
coverage you need to

=over 4

=item *

Additionally tell C<Configure> to use the appropriate C compiler flags for
C<gcov>

=item *

    make perl.gcov

(instead of C<make perl>)

=item *

After running the tests run C<gcov> to generate all the F<.gcov> files.
(Including down in the subdirectories of F<ext/>)

=item *

(From the top level perl directory) run C<gcov2perl> on all the F<.gcov> files
to get their stats into the cover_db directory.

=item *

Then process the Devel::Cover database

=back

It would be good to add a single switch to C<Configure> to specify that you
wanted to perform perl level coverage, and another to specify C level
coverage, and have C<Configure> and the F<Makefile> do all the right things
automatically.

=head2 Make Config.pm cope with differences between built and installed perl

Quite often vendors ship a perl binary compiled with their (pay-for)
compilers. People install a free compiler, such as gcc. To work out how to
build extensions, Perl interrogates C<%Config>, so in this situation
C<%Config> describes compilers that aren't there, and extension building
fails. This forces people into choosing between re-compiling perl themselves
using the compiler they have, or only using modules that the vendor ships.

It would be good to find a way to teach C<Config.pm> about the installation
setup, possibly involving probing at install time or later, so that the
C<%Config> in a binary distribution better describes the installed machine,
when the installed machine differs from the build machine in some significant
way.

=head2 linker specification files

Some platforms mandate that you provide a list of a shared library's external
symbols to the linker, so the core already has the infrastructure in place to
do this for generating shared perl libraries. My understanding is that the
GNU toolchain can accept an optional linker specification file, and restrict
visibility just to symbols declared in that file. It would be good to extend
F<makedef.pl> to support this format, and to provide a means within
C<Configure> to enable it. This would allow Unix users to test that the
export list is correct, and to build a perl that does not pollute the global
namespace with private symbols.

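As a rough illustration of the output side of that task, the sketch below
formats a symbol list as an anonymous GNU ld version script. The helper name
C<version_script> is hypothetical and is not part of F<makedef.pl>; how
F<makedef.pl> would actually gather the symbols is left open.

```perl
use strict;
use warnings;

# Hypothetical helper: format a list of exported symbols as an anonymous
# GNU ld version script, hiding everything that is not listed.
sub version_script {
    my @symbols = @_;
    my $out = "{\n    global:\n";
    $out .= "        $_;\n" for sort @symbols;
    $out .= "    local:\n        *;\n};\n";
    return $out;
}

print version_script(qw(Perl_sv_setiv Perl_newSV perl_alloc));
```

With the GNU toolchain, a file in this format is handed to the linker through
its C<--version-script> option.
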
=head2 Cross-compile support

Currently C<Configure> understands the C<-Dusecrosscompile> option. This option
arranges for building C<miniperl> for the TARGET machine, so this C<miniperl>
is assumed then to be copied to the TARGET machine and used as a replacement
for the full C<perl> executable.

This could be done a little differently. Namely C<miniperl> should be built for
HOST and then full C<perl> with extensions should be compiled for TARGET.
This, however, might require extra trickery for %Config: we have one config
first for HOST and then another for TARGET. Tools like MakeMaker will be
mightily confused. Having around two different types of executables and
libraries (HOST and TARGET) makes life interesting for Makefiles and
shell (and Perl) scripts. There is $Config{run}, normally empty, which
can be used as an execution wrapper. Also note that in some
cross-compilation/execution environments the HOST and the TARGET do
not see the same filesystem(s), so the $Config{run} wrapper may need to do
some file/directory copying back and forth.

=head2 roffitall

Make F<pod/roffitall> be updated by F<pod/buildtoc>.

=head1 Tasks that need a little C knowledge

These tasks would need a little C knowledge, but don't need any specific
background or experience with XS, or how the Perl interpreter works.

=head2 Weed out needless PERL_UNUSED_ARG

The C code uses the macro C<PERL_UNUSED_ARG> to stop compilers warning about
unused arguments. Often the arguments can't be removed, as there is an
external constraint that determines the prototype of the function, so this
approach is valid. However, there are some cases where C<PERL_UNUSED_ARG>
could be removed. Specifically

=over 4

=item *

The prototypes of (nearly all) static functions can be changed

=item *

Unused arguments generated by short cut macros are wasteful - the short cut
macro used can be changed.

=back

=head2 Modernize the order of directories in @INC

The way @INC is laid out by default, one cannot upgrade core (dual-life)
modules without overwriting files. This causes problems for binary
package builders. One possible proposal is laid out in this
message:
L<http://www.xray.mpe.mpg.de/mailing-lists/perl5-porters/2002-04/msg02380.html>.

=head2 -Duse32bit*

Natively 64-bit systems need neither -Duse64bitint nor -Duse64bitall.
On these systems, it might be the default compilation mode, and there
is currently no guarantee that passing no use64bitall option to the
Configure process will build a 32bit perl. Implementing -Duse32bit*
options would be nice for perl 5.12.

=head2 Make it clear from -v if this is the exact official release

Currently perl from C<p4>/C<rsync> ships with a F<patchlevel.h> file that
usually defines one local patch, of the form "MAINT12345" or "RC1". The output
of perl -v doesn't report that a perl isn't an official release, and this
information can get lost in bug reports. Because of this, the minor version
isn't bumped up until RC time, to minimise the possibility of versions of perl
escaping that believe themselves to be newer than they actually are.

It would be useful to find an elegant way to have the "this is an interim
maintenance release" or "this is a release candidate" in the terse -v output,
and have it so that it's easy for the pumpking to remove this just as the
release tarball is rolled up. This way the version pulled out of rsync would
always say "I'm a development release" and it would be safe to bump the
reported minor version as soon as a release ships, which would aid perl
developers.

This task is really about thinking of an elegant way to arrange the C source
such that it's trivial for the Pumpking to flag "this is an official release"
when making a tarball, yet leave the default source saying "I'm not the
official release".

=head2 Profile Perl - am I hot or not?

The Perl source code is stable enough that it makes sense to profile it,
identify and optimise the hotspots. It would be good to measure the
performance of the Perl interpreter using free tools such as cachegrind,
gprof, and dtrace, and work to reduce the bottlenecks they reveal.

As part of this, the idea of F<pp_hot.c> is that it contains the I<hot> ops,
the ops that are most commonly used. The idea is that by grouping them, their
object code will be adjacent in the executable, so they have a greater chance
of already being in the CPU cache (or swapped in) due to being near another op
already in use.

Except that it's not clear if these really are the most commonly used ops. So
as part of exercising your skills with coverage and profiling tools you might
want to determine what ops I<really> are the most commonly used. And in turn
suggest evictions and promotions to achieve a better F<pp_hot.c>.

=head2 Allocate OPs from arenas

Currently all new OP structures are individually malloc()ed and free()d.
All C<malloc> implementations have space overheads, and are not as fast as
custom allocators, so it would use both less memory and less CPU to allocate
the various OP structures from arenas. The SV arena code can probably be
re-used for this.

Note that Configuring perl with C<-Accflags=-DPL_OP_SLAB_ALLOC> will use
Perl_Slab_alloc() to pack optrees into a contiguous block, which is
probably superior to the use of OP arenas, esp. from a cache locality
standpoint. See L</"Profile Perl - am I hot or not?">.

=head2 Improve win32/wince.c

Currently, numerous functions look virtually, if not completely,
identical in both F<win32/wince.c> and F<win32/win32.c> files, which can't
be good.

=head2 Use secure CRT functions when building with VC8 on Win32

Visual C++ 2005 (VC++ 8.x) deprecated a number of CRT functions on the basis
that they were "unsafe" and introduced differently named secure versions of
them as replacements, e.g. instead of writing

    FILE* f = fopen(__FILE__, "r");

one should now write

    FILE* f;
    errno_t err = fopen_s(&f, __FILE__, "r");

Currently, the warnings about these deprecations have been disabled by adding
-D_CRT_SECURE_NO_DEPRECATE to the CFLAGS. It would be nice to remove that
warning suppressant and actually make use of the new secure CRT functions.

There is also a similar issue with POSIX CRT function names like fileno having
been deprecated in favour of ISO C++ conformant names like _fileno. These
warnings are also currently suppressed by adding -D_CRT_NONSTDC_NO_DEPRECATE. It
might be nice to do as Microsoft suggest here too, although, unlike the secure
functions issue, there is presumably little or no benefit in this case.

=head2 Fix POSIX::access() and chdir() on Win32

These functions currently take no account of DACLs and therefore do not behave
correctly in situations where access is restricted by DACLs (as opposed to the
read-only attribute).

Furthermore, POSIX::access() behaves differently for directories having the
read-only attribute set depending on what CRT library is being used. For
example, the _access() function in the VC6 and VC7 CRTs (wrongly) claims that
such directories are not writable, whereas in fact all directories are writable
unless access is denied by DACLs. (In the case of directories, the read-only
attribute actually only means that the directory cannot be deleted.) This CRT
bug is fixed in the VC8 and VC9 CRTs (but, of course, the directory may still
not actually be writable if access is indeed denied by DACLs).

For the chdir() issue, see ActiveState bug #74552:
http://bugs.activestate.com/show_bug.cgi?id=74552

Therefore, DACLs should be checked both for consistency across CRTs and for
the correct answer.

(Note that perl's -w operator should not be modified to check DACLs. It has
been written so that it reflects the state of the read-only attribute, even
for directories (whatever CRT is being used), for symmetry with chmod().)

=head2 strcat(), strcpy(), strncat(), strncpy(), sprintf(), vsprintf()

Maybe create a utility that checks after each libperl.a creation that
none of the above (nor, *SHUDDER*, gets()) ever creep back to libperl.a.

    nm libperl.a | ./miniperl -alne '$o = $F[0] if /:$/; print "$o $F[1]" if $F[0] eq "U" && $F[1] =~ /^(?:strn?c(?:at|py)|v?sprintf|gets)$/'

Note, of course, that this will only tell whether B<your> platform
is using those naughty interfaces.

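The one-liner above could be grown into such a utility along the following
lines. This is only a sketch: the function name C<find_unsafe_refs> is
hypothetical, and it assumes the same C<nm> output layout as the one-liner
(an C<object.o:> line followed by that object's symbols).

```perl
use strict;
use warnings;

# Hypothetical helper: given lines of `nm libperl.a` output, return
# "object symbol" pairs for undefined references to the unsafe functions.
sub find_unsafe_refs {
    my @lines = @_;
    my $unsafe = qr/^(?:strn?c(?:at|py)|v?sprintf|gets)$/;
    my $object = '';
    my @found;
    for my $line (@lines) {
        my @f = split ' ', $line;
        next unless @f;
        if ($f[0] =~ /:$/) {
            # A line such as "av.o:" starts a new object file.
            $object = $f[0];
        }
        elsif ($f[0] eq 'U' && defined $f[1] && $f[1] =~ $unsafe) {
            # "U" marks a symbol that is referenced but not defined here.
            push @found, "$object $f[1]";
        }
    }
    return @found;
}

my @sample = (
    "av.o:",
    "                 U strcpy",
    "0000000000000000 T Perl_av_clear",
    "sv.o:",
    "                 U memcpy",
);
print "$_\n" for find_unsafe_refs(@sample);
```

A build step could treat a non-empty result as a failure.
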
=head2 -D_FORTIFY_SOURCE=2, -fstack-protector

Recent glibcs support C<-D_FORTIFY_SOURCE=2> and recent gcc
(4.1 onwards?) supports C<-fstack-protector>, both of which give
protection against various kinds of buffer overflow problems.
These should probably be used for compiling Perl whenever available.
Configure and/or hints files should be adjusted to probe for the
availability of these features and enable them as appropriate.

=head2 Arenas for GPs? For MAGIC?

C<struct gp> and C<struct magic> are both currently allocated by C<malloc>.
It might be a speed or memory saving to change to using arenas. Or it might
not. It would need some suitable benchmarking first. In particular, C<GP>s
can probably be changed with minimal compatibility impact (probably nothing
outside of the core, or even outside of F<gv.c> allocates them), but they
probably aren't allocated/deallocated often enough for a speed saving. Whereas
C<MAGIC> is allocated/deallocated more often, but in turn, is also something
more externally visible, so changing the rules here may bite external code.

=head1 Tasks that need a knowledge of XS

These tasks would need C knowledge, and roughly the level of knowledge of
the perl API that comes from writing modules that use XS to interface to
C.

=head2 autovivification

Make all autovivification consistent w.r.t LVALUE/RVALUE and strict/no strict.

This task is incremental - even a little bit of work on it will help.

=head2 Unicode in Filenames

chdir, chmod, chown, chroot, exec, glob, link, lstat, mkdir, open,
opendir, qx, readdir, readlink, rename, rmdir, stat, symlink, sysopen,
system, truncate, unlink, utime, -X. All these could potentially accept
Unicode filenames either as input or output (and in the case of system
and qx Unicode in general, as input or output to/from the shell).
Whether a filesystem/operating system pair understands Unicode in
filenames varies.

Known combinations that have some level of understanding include
Microsoft NTFS, Apple HFS+ (in Mac OS 9 and X) and Apple UFS (in Mac
OS X), and of course Plan 9; NFS v4 is rumored to be Unicode. How to
create Unicode filenames, what forms of Unicode are accepted and used
(UCS-2, UTF-16, UTF-8), what (if any) is the normalization form used,
and so on, varies. Finding the right level of interfacing to Perl
requires some thought. Remember that an OS does not imply a particular
filesystem.

(The Windows -C command flag "wide API support" has been at least
temporarily retired in 5.8.1, and the -C has been repurposed, see
L<perlrun>.)

Most probably the right way to do this would be this:
L</"Virtualize operating system access">.

=head2 Unicode in %ENV

Currently the %ENV entries are always byte strings.
See L</"Virtualize operating system access">.

=head2 Unicode and glob()

Currently glob patterns and filenames returned from File::Glob::glob()
are always byte strings. See L</"Virtualize operating system access">.

=head2 Unicode and lc/uc operators

Some built-in operators (C<lc>, C<uc>, etc.) behave differently, based on
what the internal encoding of their argument is. That should not be the
case. Maybe add a pragma to switch behaviour.

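A small script that shows the dependence on internal encoding, assuming no
locale or Unicode-related pragma is in effect. It builds the same
one-character string twice, once in each internal encoding, and prints what
C<uc> returns for each.

```perl
use strict;
use warnings;

# The same character, U+00E9 (e with acute accent), stored in two ways.
my $bytes    = "\xE9";      # not flagged as UTF-8 internally
my $upgraded = "\xE9";
utf8::upgrade($upgraded);   # same string value, UTF-8 internal encoding

# The two strings compare equal ...
print "equal\n" if $bytes eq $upgraded;

# ... yet uc() may treat them differently.
printf "uc(bytes)    = U+%04X\n", ord uc $bytes;
printf "uc(upgraded) = U+%04X\n", ord uc $upgraded;
```

On a perl with the behaviour described here, the second line of output is
U+00C9 while the first is expected to stay at U+00E9.
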
=head2 use less 'memory'

Investigate trade offs to switch out perl's choices on memory usage.
Particularly perl should be able to give memory back.

This task is incremental - even a little bit of work on it will help.

=head2 Re-implement C<:unique> in a way that is actually thread-safe

The old implementation made bad assumptions on several levels. A good 90%
solution might be just to make C<:unique> work to share the string buffer
of SvPVs. That way large constant strings can be shared between ithreads,
such as the configuration information in F<Config>.

=head2 Make tainting consistent

Tainting would be easier to use if it didn't take documented shortcuts and
allow taint to "leak" everywhere within an expression.

=head2 readpipe(LIST)

system() accepts a LIST syntax (and a PROGRAM LIST syntax) to avoid
running a shell. readpipe() (the function behind qx//) could be similarly
extended.

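Until readpipe() itself is extended, the intended behaviour can be
approximated in pure Perl with the list form of a piped C<open>. The wrapper
name C<readpipe_list> is hypothetical, and the list form of piped C<open> is
not available on every platform, so treat this as a sketch of the semantics
rather than a replacement.

```perl
use strict;
use warnings;

# Hypothetical helper: capture a command's output without a shell, using
# the list form of a piped open. Returns all output as one string.
sub readpipe_list {
    my @command = @_;
    open(my $fh, '-|', @command)
        or die "Cannot run $command[0]: $!";
    local $/;                       # slurp mode
    my $output = <$fh>;
    close $fh;
    return defined $output ? $output : '';
}

# $^X is the running perl; the argument containing spaces and quotes is
# passed through untouched because no shell is involved.
my $out = readpipe_list($^X, '-e', 'print "Hello, world"');
print "$out\n";
```

Because the arguments never pass through a shell, no quoting or escaping of
them is needed, which is the same benefit the LIST form gives system().
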
=head2 Audit the code for destruction ordering assumptions

Change 25773 notes

    /* Need to check SvMAGICAL, as during global destruction it may be that
       AvARYLEN(av) has been freed before av, and hence the SvANY() pointer
       is now part of the linked list of SV heads, rather than pointing to
       the original body.  */
    /* FIXME - audit the code for other bugs like this one.  */

adding the C<SvMAGICAL> check to

    if (AvARYLEN(av) && SvMAGICAL(AvARYLEN(av))) {
        MAGIC *mg = mg_find (AvARYLEN(av), PERL_MAGIC_arylen);

Go through the core and look for similar assumptions that SVs have particular
types, as all bets are off during global destruction.

=head2 Extend PerlIO and PerlIO::Scalar

PerlIO::Scalar doesn't know how to truncate(). Implementing this
would require extending the PerlIO vtable.

Similarly the PerlIO vtable doesn't know about formats (write()), or
about stat(), or chmod()/chown(), utime(), or flock().

(For PerlIO::Scalar it's hard to see what e.g. mode bits or ownership
would mean.)

PerlIO doesn't do directories or symlinks, either: mkdir(), rmdir(),
opendir(), closedir(), seekdir(), rewinddir(), glob(); symlink(),
readlink().

See also L</"Virtualize operating system access">.

=head2 -C on the #! line

It should be possible to make -C work correctly if found on the #! line,
given that all perl command line options are strict ASCII, and -C changes
only the interpretation of non-ASCII characters, and not for the script file
handle. To make it work needs some investigation of the ordering of function
calls during startup, and (by implication) a bit of tweaking of that order.

=head2 Organize error messages

Perl's diagnostics (error messages, see L<perldiag>) could use
reorganizing and formalizing so that each error message has its own
stable-for-all-eternity unique id, categorized by severity, type, and
subsystem. (The error messages would be listed in a datafile outside
of the Perl source code, and the source code would only refer to the
messages by the id.) This clean-up and regularizing should apply
for all croak() messages.

This would enable all sorts of things: easier translation/localization
of the messages (though please do keep in mind the caveats of
L<Locale::Maketext> about too straightforward approaches to
translation), filtering by severity, and instead of grepping for a
particular error message one could look for a stable error id. (Of
course, changing the error messages by default would break all the
existing software depending on some particular error message...)

This kind of functionality is known as I<message catalogs>. Look for
inspiration for example in the catgets() system, possibly even use it
if available, but B<only> if available: B<not> all platforms will
have catgets().

For the really pure at heart, consider extending this item to cover
also the warning messages (see L<perllexwarn>, F<warnings.pl>).

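To make the idea concrete, here is a toy catalog in Perl. Everything in it is
an assumption made for illustration: the ids C<E0001> and C<W0001>, the
C<severity> field, and the function names. The message texts are only
modelled on existing diagnostics, and a real implementation would live in the
C source and load its table from a data file.

```perl
use strict;
use warnings;

# Hypothetical catalog: in a real implementation this table would be
# loaded from a data file kept outside the source code.
my %catalog = (
    E0001 => { severity => 'fatal',
               text     => q{Can't locate %s in @INC} },
    W0001 => { severity => 'warning',
               text     => q{Name "%s" used only once: possible typo} },
);

# Look a message up by its stable id and fill in the arguments.
sub message {
    my ($id, @args) = @_;
    my $entry = $catalog{$id} or die "Unknown message id $id";
    return sprintf "[%s] $entry->{text}", $id, @args;
}

# Select ids by severity, something a catalog makes straightforward.
sub ids_with_severity {
    my ($severity) = @_;
    return sort grep { $catalog{$_}{severity} eq $severity } keys %catalog;
}

print message('W0001', 'main::pie'), "\n";
```

Code that currently greps for the message text could match on the bracketed
id instead, and a translated catalog would only need to replace the C<text>
fields.
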
0bdfc961 693=head1 Tasks that need a knowledge of the interpreter
3298bd4d 694
0bdfc961
NC
695These tasks would need C knowledge, and knowledge of how the interpreter works,
696or a willingness to learn.
3298bd4d 697
718140ec
NC
698=head2 lexicals used only once
699
700This warns:
701
702 $ perl -we '$pie = 42'
703 Name "main::pie" used only once: possible typo at -e line 1.
704
705This does not:
706
707 $ perl -we 'my $pie = 42'
708
709Logically all lexicals used only once should warn, if the user asks for
d6f4ea2e
SP
710warnings. An unworked RT ticket (#5087) has been open for almost seven
711years for this discrepancy.
718140ec 712
a3d15f9a
RGS
713=head2 UTF-8 revamp
714
715The handling of Unicode is unclean in many places. For example, the regexp
716engine matches in Unicode semantics whenever the string or the pattern is
717flagged as UTF-8, but that should not be dependent on an internal storage
718detail of the string. Likewise, case folding behaviour is dependent on the
719UTF8 internal flag being on or off.
720
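The dependence on the internal flag can be observed from Perl code. The
following is a small sketch, assuming a perl that behaves as described above
and a script that enables no locale or Unicode-related pragmas; the variable
names are only illustrative.

```perl
use strict;
use warnings;

my $native   = "\xE9";        # e-acute, held as a single byte
my $upgraded = "\xE9";
utf8::upgrade($upgraded);     # same character, UTF-8 flag now on

# The two strings compare equal ...
my $same = $native eq $upgraded ? 1 : 0;

# ... yet matching and case mapping may disagree between them.
my $word_native   = $native   =~ /\w/ ? 1 : 0;
my $word_upgraded = $upgraded =~ /\w/ ? 1 : 0;
my $uc_native     = uc($native)   ne $native   ? 1 : 0;
my $uc_upgraded   = uc($upgraded) ne $upgraded ? 1 : 0;

printf "equal=%d  \\w: native=%d upgraded=%d  uc changes: native=%d upgraded=%d\n",
    $same, $word_native, $word_upgraded, $uc_native, $uc_upgraded;
```

Any difference between the "native" and "upgraded" columns is exactly the
kind of storage-dependent behaviour this item asks to remove.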
721=head2 Properly Unicode safe tokeniser and pads.
722
723The tokeniser isn't actually very UTF-8 clean. C<use utf8;> is a hack -
724variable names are stored in stashes as raw bytes, without the UTF-8 flag
725set. The pad API only takes a C<char *> pointer, so that's all bytes too. The
726tokeniser ignores the UTF-8-ness of C<PL_rsfp>, or any SVs returned from
727source filters. All this could be fixed.
728
636e63cb
NC
729=head2 state variable initialization in list context
730
731Currently this is illegal:
732
733 state ($a, $b) = foo();
734
a2874905 735In Perl 6, C<state ($a) = foo();> and C<(state $a) = foo();> have different
a8d0aeb9 736semantics, which is tricky to implement in Perl 5 as currently they produce
a2874905 737the same opcode trees. The Perl 6 design is firm, so it would be good to
a8d0aeb9 738implement the necessary code in Perl 5. There are comments in
a2874905
NC
739C<Perl_newASSIGNOP()> that show the code paths taken by various assignment
740constructions involving state variables.
636e63cb 741
4fedb12c
RGS
742=head2 Implement $value ~~ 0 .. $range
743
744It would be nice to extend the syntax of the C<~~> operator to also
745understand numeric (and maybe alphanumeric) ranges.
a393eb28
RGS
746
747=head2 A does() built-in
748
749Like ref(), only useful. It would call the C<DOES> method on objects; it
750would also tell whether something can be dereferenced as an
751array/hash/etc., or used as a regexp, etc.
752L<http://www.xray.mpe.mpg.de/mailing-lists/perl5-porters/2007-03/msg00481.html>
753
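To make the intent concrete, here is a pure-Perl approximation. The name
C<does_sketch> and its two-argument interface are assumptions made for this
example, not a proposed API; a real built-in would be implemented in C and
might well behave differently.

```perl
use strict;
use warnings;
use Scalar::Util qw(blessed reftype);
use overload ();

# Dereference operators that a class may overload.
my %DEREF_OP = (
    SCALAR => '${}', ARRAY => '@{}', HASH => '%{}',
    CODE   => '&{}', GLOB  => '*{}',
);

# Hypothetical helper: true if $thing "does" $what, where $what is
# either a class/role name or a basic reference type.
sub does_sketch {
    my ($thing, $what) = @_;
    return 0 unless ref $thing;
    if (blessed $thing) {
        return 1 if $thing->DOES($what);
        my $op = $DEREF_OP{$what};
        return 1 if $op && overload::Method($thing, $op);
    }
    return (reftype($thing) // '') eq $what ? 1 : 0;
}

{ package Foo; sub new { return bless {}, shift } }

print does_sketch([],       'ARRAY'), "\n";
print does_sketch(Foo->new, 'Foo'),   "\n";
print does_sketch(Foo->new, 'HASH'),  "\n";
print does_sketch('plain',  'ARRAY'), "\n";
```

The sketch answers "can this be used as an X?" in one place, which is the
part that C<ref()> alone cannot do.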
754=head2 Tied filehandles and write() don't mix
755
756There is no method on tied filehandles to allow them to be called back by
757formats.
4fedb12c 758
d10fc472 759=head2 Attach/detach debugger from running program
1626a787 760
cd793d32
NC
761The old perltodo notes "With C<gdb>, you can attach the debugger to a running
762program if you pass the process ID. It would be good to do this with the Perl
0bdfc961
NC
763debugger on a running Perl program, although I'm not sure how it would be
764done." ssh and screen do this with named pipes in /tmp. Maybe we can too.
1626a787 765
a8cb5b9e
RGS
766=head2 Optimize away empty destructors
767
768Defining an empty DESTROY method might be useful (notably in
769AUTOLOAD-enabled classes), but it's still a bit expensive to call. That
770could probably be optimized.
771
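The situation in question looks roughly like this. The class name C<Lazy>
and the bookkeeping array are invented for the example.

```perl
use strict;
use warnings;

package Lazy;

our $AUTOLOAD;
my @autoloaded;                 # names that reached AUTOLOAD

sub new { return bless {}, shift }

# Catch-all: every unknown method call ends up here.
sub AUTOLOAD {
    my $self = shift;
    (my $name = $AUTOLOAD) =~ s/.*:://;
    push @autoloaded, $name;
    return "called $name";
}

# Empty on purpose: without it, destruction would go through AUTOLOAD.
sub DESTROY { }

package main;

{
    my $obj = Lazy->new;
    print $obj->hello, "\n";
}   # $obj is destroyed here, and the empty DESTROY is still called

print "autoloaded: @autoloaded\n";
```

The empty C<DESTROY> does nothing useful at run time, so skipping the call
would be invisible to the program.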
0bdfc961
NC
772=head2 LVALUE functions for lists
773
774The old perltodo notes that lvalue functions don't work for list or hash
775slices. This would be good to fix.
776
777=head2 LVALUE functions in the debugger
778
779The old perltodo notes that lvalue functions don't work in the debugger. This
780would be good to fix.
781
0bdfc961
NC
782=head2 regexp optimiser optional
783
784The regexp optimiser is not optional. It should be configurable, to allow
785its performance to be measured, and its bugs to be easily demonstrated.
786
02f21748
RGS
787=head2 delete &function
788
789Allow functions to be deleted. One can already undef them, but they're still
790in the stash.
791
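The current state of affairs can be checked with a few lines of Perl. The
sub name C<doomed> is arbitrary.

```perl
use strict;
use warnings;

sub doomed { return 42 }

undef &doomed;                  # throws away the body ...

# ... but the declaration and the stash entry stay behind.
my $still_declared = exists(&doomed)         ? 1 : 0;
my $still_defined  = defined(&doomed)        ? 1 : 0;
my $still_in_stash = exists($main::{doomed}) ? 1 : 0;

print "declared=$still_declared defined=$still_defined in_stash=$still_in_stash\n";
```

After the C<undef> the sub is no longer defined, yet it is still declared
and still present in C<%main::>, which is what C<delete &function> would
clean up.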
ef36c6a7
RGS
792=head2 C</w> regex modifier
793
794That flag would enable matching whole words, and also interpolating
795arrays as alternations. With it, C</P/w> would be roughly equivalent to:
796
797 do { local $"='|'; /\b(?:P)\b/ }
798
799See L<http://www.xray.mpe.mpg.de/mailing-lists/perl5-porters/2007-01/msg00400.html>
800for the discussion.
801
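For comparison, the hand-written equivalent with an actual array looks like
this today. The word list and test strings are only examples.

```perl
use strict;
use warnings;

my @words = qw(cat dog);

# What /@words/w would abbreviate: join with '|' and anchor on word boundaries.
my $re = do { local $" = '|'; qr/\b(?:@words)\b/ };

my @hits = grep { /$re/ } ('hot dog', 'dogma', 'cat', 'concat');
print "@hits\n";    # prints "hot dog cat"
```

Note that the words are interpolated as patterns, so anything containing
regex metacharacters would need C<quotemeta> first.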
0bdfc961
NC
802=head2 optional optimizer
803
804Make the peephole optimizer optional. Currently it performs two tasks as
805it walks the optree - genuine peephole optimisations, and necessary fixups of
806ops. It would be good to find an efficient way to switch out the
807optimisations whilst keeping the fixups.
808
809=head2 You WANT *how* many
810
811Currently contexts are void, scalar and list. split has a special mechanism in
812place to pass in the number of return values wanted. It would be useful to
813have a general mechanism for this, backwards compatible and with little speed hit.
814This would allow proposals such as short circuiting sort to be implemented
815as a module on CPAN.
816
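The limitation is easy to see from Perl space: C<wantarray> can report only
the three contexts, whereas C<split> is told how many fields the assignment
needs. A short sketch (the sub name C<context> is arbitrary):

```perl
use strict;
use warnings;

# wantarray distinguishes only void, scalar and list.
sub context {
    return wantarray          ? 'list'
         : defined(wantarray) ? 'scalar'
         :                      'void';
}

my ($first, $second) = context();   # list context; "two wanted" is invisible
my $only             = context();   # scalar context

# split, by contrast, acts as if given a limit of 3 here, because two
# variables are being assigned to.
my ($head, $next) = split /,/, 'a,b,c,d';

print "$first $only $head $next\n";  # prints "list scalar a b"
```

A general mechanism would let C<context()> learn that exactly two values
were wanted, just as C<split> does.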
817=head2 lexical aliases
818
819Allow lexical aliases (maybe via the syntax C<my \$alias = \$foo>).
820
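The closest existing approximation is the aliasing that C<foreach> performs
on its loop variable. This sketch shows that behaviour only; it does not use
the proposed syntax.

```perl
use strict;
use warnings;

my $foo = 1;

# foreach aliases its loop variable to each element, so inside the
# block $alias is another name for $foo rather than a copy of it.
for my $alias ($foo) {
    $alias = 2;
}

print "$foo\n";   # prints 2
```

A real lexical alias would give the same effect without needing a loop
block around the code that uses it.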
821=head2 entersub XS vs Perl
822
823At the moment pp_entersub is huge, and has code to deal with entering both
824perl and XS subroutines. Subroutine implementations rarely change between
825perl and XS at run time, so investigate using 2 ops to enter subs (one for
826XS, one for perl) and swap between them if a sub is redefined.
2810d901 827
de535794 828=head2 Self-ties
2810d901 829
de535794 830Self-ties are currently illegal because they caused too many segfaults. Maybe
a8d0aeb9 831the causes of these could be tracked down and self-ties on all types
de535794 832reinstated.
0bdfc961
NC
833
834=head2 Optimize away @_
835
836The old perltodo notes "Look at the "reification" code in C<av.c>".
837
f092b1f4
RGS
838=head2 The yada yada yada operators
839
840Perl 6's Synopsis 3 says:
841
842I<The ... operator is the "yada, yada, yada" list operator, which is used as
843the body in function prototypes. It complains bitterly (by calling fail)
844if it is ever executed. Variant ??? calls warn, and !!! calls die.>
845
846Those would be nice to add to Perl 5. That could be done without new ops.
847
87a942b1
JH
848=head2 Virtualize operating system access
849
850Implement a set of "vtables" that virtualizes operating system access
851(open(), mkdir(), unlink(), readdir(), getenv(), etc.) At the very
852least these interfaces should take SVs as "name" arguments instead of
853bare char pointers; probably the most flexible and extensible way
e1a3d5d1
JH
854would be for the Perl-facing interfaces to accept HVs. The system
855needs to be per-operating-system and per-file-system
856hookable/filterable, preferably both from XS and Perl level
87a942b1
JH
857(L<perlport/"Files and Filesystems"> is good reading at this point,
858in fact, all of L<perlport> is.)
859
e1a3d5d1
JH
860This has actually already been implemented (but only for Win32),
861take a look at F<iperlsys.h> and F<win32/perlhost.h>. While all Win32
862variants go through a set of "vtables" for operating system access,
863non-Win32 systems currently go straight for the POSIX/UNIX-style
864system/library call. A system similar to the one for Win32 should be
865implemented for all platforms. The existing Win32 implementation
866probably does not need to survive alongside this proposed new
867implementation; the approaches could be merged.
87a942b1
JH
868
869What would this give us? One often-asked-for feature this would
94da6c29
JH
870enable is using Unicode for filenames, and other "names" like %ENV,
871usernames, hostnames, and so forth.
872(See L<perlunicode/"When Unicode Does Not Happen">.)
873
874But this kind of virtualization would also allow for things like
875virtual filesystems, virtual networks, and "sandboxes" (though as long
876as dynamic loading of random object code is allowed, not very safe
877sandboxes since external code of course knows nothing of Perl's vtables).
878An example of a smaller "sandbox" is that this feature can be used to
879implement per-thread working directories: Win32 already does this.
880
881See also L</"Extend PerlIO and PerlIO::Scalar">.
87a942b1 882
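The shape of such a "vtable" can be mimicked in Perl with a hash of code
references. This is a toy model under stated assumptions: the table layout,
the operation names and the in-memory sandbox are all invented for the
example, and it bears no relation to the actual structures in
F<iperlsys.h>.

```perl
use strict;
use warnings;

# Hypothetical "vtable" that passes straight through to the real system.
my %real_fs = (
    mkdir  => sub { my ($name) = @_; return mkdir $name },
    unlink => sub { my ($name) = @_; return unlink $name },
    exists => sub { my ($name) = @_; return (-e $name) ? 1 : 0 },
);

# A sandboxed table with the same interface, backed by a plain hash.
my %tree;
my %sandbox_fs = (
    mkdir  => sub { my ($name) = @_; return 0 if $tree{$name}; $tree{$name} = 1; return 1 },
    unlink => sub { my ($name) = @_; return delete($tree{$name}) ? 1 : 0 },
    exists => sub { my ($name) = @_; return $tree{$name} ? 1 : 0 },
);

# Callers go through whichever table is installed and never see the difference.
my $fs = \%sandbox_fs;
print $fs->{mkdir}->('/tmp/demo')  ? "created\n" : "failed\n";
print $fs->{exists}->('/tmp/demo') ? "present\n" : "absent\n";
```

Swapping C<$fs> to C<\%real_fs> would send the same calls to the operating
system, which is the hookability this item is after.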
ac6197af
NC
883=head2 Investigate PADTMP hash pessimisation
884
885The peephole optimiser converts constants used for hash key lookups to shared
886hash key scalars. Under ithreads, something is undoing this work. See
887http://www.xray.mpe.mpg.de/mailing-lists/perl5-porters/2007-09/msg00793.html
888
52960e22
JC
889=head2 repack the optree
890
891Repacking the optree after execution order is determined could allow
892removal of NULL ops, and optimal ordering of OPs wrt cache-line
893filling. The slab allocator could be reused for this purpose.
894
895http://www.nntp.perl.org/group/perl.perl5.porters/2007/12/msg131975.html
896
897=head2 optimize tail-calls
898
899Tail-calls present an opportunity for broadly applicable optimization;
900anywhere that C<< return foo(...) >> is called, the outer return can
901be replaced by a goto, and foo will return directly to the outer
902caller, saving (conservatively) 25% of perl's call&return cost, which
903is relatively higher than in C. The Scheme language is known to do
904this heavily. B::Concise provides good insight into where this
905optimization is possible, i.e. anywhere an entersub,leavesub op-sequence
906occurs.
907
908 perl -MO=Concise,-exec,a,b,-main -e 'sub a{ 1 }; sub b {a()}; b(2)'
909
910Bottom line on this is probably a new pp_tailcall function which
911combines the code in pp_entersub, pp_leavesub. This should probably
912be done first in XS, using B::Generate to patch the new OP into the
913optrees.
914
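The transformation can already be written by hand with C<goto &sub>, which
shows the intended semantics. This is only an illustration of the rewrite:
the sub names are invented, and it makes no claim about the speed of
C<goto &sub> compared with the proposed C<pp_tailcall>.

```perl
use strict;
use warnings;

# Ordinary form: each call adds a frame that the result passes back through.
sub count_down_plain {
    my ($n) = @_;
    return 'done' if $n == 0;
    return count_down_plain($n - 1);
}

# Hand-written tail call: goto &sub replaces the current frame, so the
# callee returns directly to the original caller.
sub count_down_tail {
    my ($n) = @_;
    return 'done' if $n == 0;
    @_ = ($n - 1);
    goto &count_down_tail;
}

print count_down_plain(10), "\n";
print count_down_tail(100_000), "\n";
```

An automatic optimization would perform the second form's frame replacement
wherever the first form's C<return foo(...)> pattern appears.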
0bdfc961
NC
915=head1 Big projects
916
917Tasks that will get your name mentioned in the description of the "Highlights
87a942b1 918of 5.12".
0bdfc961
NC
919
920=head2 make ithreads more robust
921
4e577f8b 922Generally make ithreads more robust. See also L</iCOW>.
0bdfc961
NC
923
924This task is incremental - even a little bit of work on it will help, and
925will be greatly appreciated.
926
6c047da7
YST
927One bit would be to write the missing code in sv.c:Perl_dirp_dup.
928
59c7f7d5
RGS
929Fix Perl_sv_dup, et al. so that threads can return objects.
930
0bdfc961
NC
931=head2 iCOW
932
933Sarathy and Arthur have a proposal for an improved Copy On Write which
934specifically will be able to COW new ithreads. If this can be implemented
935it would be a good thing.
936
937=head2 (?{...}) closures in regexps
938
939Fix (or rewrite) the implementation of the C</(?{...})/> closures.
940
941=head2 A re-entrant regexp engine
942
943This will allow the use of a regex from inside (?{ }), (??{ }) and
944(?(?{ })|) constructs.
6bda09f9 945
6bda09f9
YO
946=head2 Add class set operations to regexp engine
947
948Apparently these are quite useful. Anyway, Jeffrey Friedl wants them.
949
950demerphq has this on his todo list, but right at the bottom.