=head1 NAME

todo - Perl TO-DO list

=head1 DESCRIPTION

This is a list of wishes for Perl. The most up to date version of this file
is at L<http://perl5.git.perl.org/perl.git/blob_plain/HEAD:/Porting/todo.pod>

The tasks we think are smaller or easier are listed first. Anyone is welcome
to work on any of these, but it's a good idea to first contact
I<perl5-porters@perl.org> to avoid duplication of effort, and to learn from
any previous attempts. By all means contact a pumpking privately first if you
prefer.

Whilst patches to make the list shorter are most welcome, ideas to add to
the list are also encouraged. Check the perl5-porters archives for past
ideas, and any discussion about them. One set of archives may be found at
L<http://www.xray.mpe.mpg.de/mailing-lists/perl5-porters/>

What can we offer you in return? Fame, fortune, and everlasting glory? Maybe
not, but if your patch is incorporated, then we'll add your name to the
F<AUTHORS> file, which ships in the official distribution. How many other
programming languages offer you 1 line of immortality?

=head1 Tasks that only need Perl knowledge

=head2 Migrate t/ from custom TAP generation

Many tests below F<t/> still generate TAP by "hand", rather than using library
functions. As explained in L<perlhack/TESTING>, tests in F<t/> are
written in a particular way to test that more complex constructions actually
work before using them routinely. Hence they don't use C<Test::More>, but
instead there is an intentionally simpler library, F<t/test.pl>. However,
quite a few tests in F<t/> have not been refactored to use it. Refactoring
any of these tests, one at a time, is a useful thing TODO.

The subdirectories F<base>, F<cmd> and F<comp>, that contain the most
basic tests, should be excluded from this task.

=head2 Automate perldelta generation

The perldelta file accompanying each release summarises the major changes.
It's mostly manually generated currently, but some of that could be
automated with a bit of perl, specifically the generation of

=over

=item Modules and Pragmata

=item New Documentation

=item New Tests

=back

See F<Porting/how_to_write_a_perldelta.pod> for details.

=head2 Make Schwern poorer

We should have tests for everything. When all the core's modules are tested,
Schwern has promised to donate $500 to TPF. We may need volunteers to
hold him upside down and shake vigorously in order to actually extract the
cash.

=head2 Write descriptions for all tests

Many individual tests in the test suite lack descriptions (or names, or labels
-- call them what you will). Many files completely lack descriptions, meaning
that the only output you get is the test numbers. If all tests had
descriptions, understanding what the tests are testing and why they sometimes
fail would both get a whole lot easier.

=head2 Improve the coverage of the core tests

Use Devel::Cover to ascertain the core modules' test coverage, then add
tests that are currently missing.

=head2 test B

A full test suite for the B module would be nice.

=head2 A decent benchmark

C<perlbench> seems impervious to any recent changes made to the perl core. It
would be useful to have a reasonable general benchmarking suite that roughly
represented what current perl programs do, and measurably reported whether
tweaks to the core improve, degrade or don't really affect performance, to
guide people attempting to optimise the guts of perl. Gisle would welcome
new tests for perlbench. Steffen Schwingon would welcome help with
L<Benchmark::Perl::Formance>

=head2 fix tainting bugs

Fix the bugs revealed by running the test suite with the C<-t> switch (via
C<make test.taintwarn>).

=head2 Dual life everything

As part of the "dists" plan, anything that doesn't belong in the smallest perl
distribution needs to be dual lifed. Anything else can be too. Figure out what
changes would be needed to package that module and its tests up for CPAN, and
do so. Test it with older perl releases, and fix the problems you find.

To make a minimal perl distribution, it's useful to look at
F<t/lib/commonsense.t>.

=head2 POSIX memory footprint

Ilya observed that C<use POSIX;> eats memory like there's no tomorrow, and at
various times worked to cut it down. There is probably still fat to cut out -
for example POSIX passes Exporter some very memory hungry data structures.

=head2 makedef.pl and conditional compilation

The script F<makedef.pl> generates the list of exported symbols on
platforms which need this. Functions are declared in F<embed.fnc>, variables
in F<intrpvar.h>. Quite a few of the functions and variables are conditionally
declared there, using C<#ifdef>. However, F<makedef.pl> doesn't understand the
C macros, so the rules about which symbols are present when are duplicated in
the Perl code. Writing things twice is bad, m'kay. It would be good to teach
F<makedef.pl> to understand the conditional compilation, and hence remove the
duplication, and the mistakes it has caused.

=head2 use strict; and AutoLoad

Currently if you write

    package Whack;
    use AutoLoader 'AUTOLOAD';
    use strict;
    1;
    __END__
    sub bloop {
        print join (' ', No, strict, here), "!\n";
    }

then C<use strict;> isn't in force within the autoloaded subroutines. It would
be more consistent (and less surprising) to arrange for all lexical pragmas
in force at the __END__ block to be in force within each autoloaded subroutine.

There's a similar problem with SelfLoader.

=head2 profile installman

The F<installman> script is slow. All it is doing is text processing, which we're
told is something Perl is good at. So it would be nice to know what it is doing
that is taking so much CPU, and where possible address it.

=head2 enable lexical enabling/disabling of individual warnings

Currently, warnings can only be enabled or disabled by category. There
are times when it would be useful to quash a single warning, not a
whole category.

=head2 document diagnostics

Many diagnostic messages are not currently documented. The list is at the end
of F<t/porting/diag.t>.

=head1 Tasks that need a little sysadmin-type knowledge

Or if you prefer, tasks that you would learn from, and broaden your skills
base...

=head2 make HTML install work

There is an C<install.html> target in the Makefile. It's marked as
"experimental". It would be good to get this tested, make it work reliably, and
remove the "experimental" tag. This would include

=over 4

=item 1

Checking that cross linking between various parts of the documentation works.
In particular that links work between the modules (files with POD in F<lib/>)
and the core documentation (files in F<pod/>)

=item 2

Improving the code that splits C<perlfunc> into chunks, preferably with
general case code added to L<Pod::Functions> that could be used elsewhere.

Challenges here are correctly identifying the groups of functions that go
together, and making the right named external cross-links point to the right
page. Currently this works reasonably well in the general case, and correctly
parses two or more C<=items> giving the different parameter lists for the
same function, such as used by C<substr>. However it fails completely where
I<different> functions are listed as a sequence of C<=items> but share the
same description. All the functions from C<getpwnam> to C<endprotoent> have
individual stub pages, with only the page for C<endservent> holding the
description common to all. Likewise C<q>, C<qq> and C<qw> have stub pages,
instead of sharing the body of C<qx>.

Note also the current code isn't ideal with the two forms of C<select>, mushing
them both into one F<select.html> with the two descriptions run together.
Fixing this may well be a special case.

=back

=head2 compressed man pages

Be able to install them. This would probably need a configure test to see how
the system does compressed man pages (same directory/different directory?
same filename/different filename), as well as tweaking the F<installman> script
to compress as necessary.

=head2 Add a code coverage target to the Makefile

Make it easy for anyone to run Devel::Cover on the core's tests. The steps
to do this manually are roughly

=over 4

=item *

do a normal C<Configure>, but include Devel::Cover as a module to install
(see L<INSTALL> for how to do this)

=item *

    make perl

=item *

    cd t; HARNESS_PERL_SWITCHES=-MDevel::Cover ./perl -I../lib harness

=item *

Process the resulting Devel::Cover database

=back

This just gives you the coverage of the F<.pm>s. To also get the C level
coverage you need to

=over 4

=item *

Additionally tell C<Configure> to use the appropriate C compiler flags for
C<gcov>

=item *

    make perl.gcov

(instead of C<make perl>)

=item *

After running the tests run C<gcov> to generate all the F<.gcov> files.
(Including down in the subdirectories of F<ext/>)

=item *

(From the top level perl directory) run C<gcov2perl> on all the C<.gcov> files
to get their stats into the cover_db directory.

=item *

Then process the Devel::Cover database

=back

It would be good to add a single switch to C<Configure> to specify that you
wanted to perform perl level coverage, and another to specify C level
coverage, and have C<Configure> and the F<Makefile> do all the right things
automatically.

=head2 Make Config.pm cope with differences between built and installed perl

Quite often vendors ship a perl binary compiled with their (pay-for)
compilers. People install a free compiler, such as gcc. To work out how to
build extensions, Perl interrogates C<%Config>, so in this situation
C<%Config> describes compilers that aren't there, and extension building
fails. This forces people into choosing between re-compiling perl themselves
using the compiler they have, or only using modules that the vendor ships.

It would be good to find a way to teach C<Config.pm> about the installation setup,
possibly involving probing at install time or later, so that the C<%Config> in
a binary distribution better describes the installed machine, when the
installed machine differs from the build machine in some significant way.

=head2 linker specification files

Some platforms mandate that you provide a list of a shared library's external
symbols to the linker, so the core already has the infrastructure in place to
do this for generating shared perl libraries. Florian Ragwitz has been working
to offer this for the GNU toolchain, to allow Unix users to test that the
export list is correct, and to build a perl that does not pollute the global
namespace with private symbols, and will fail in the same way as msvc or mingw
builds or when using PERL_DL_NONLAZY=1. See the branch smoke-me/rafl/ld_export

=head2 Cross-compile support

We get requests for "how to cross compile Perl". The vast majority of these
seem to be for a couple of scenarios:

=over 4

=item *

Platforms that could build natively using F<./Configure> (I<e.g.> Linux or
NetBSD on MIPS or ARM) but people want to use a beefier machine (and on the
same OS) to build more easily.

=item *

Platforms that can't build natively, but no (significant) porting changes
are needed to our current source code. Prime example of this is Android.

=back

There are several scripts and tools for cross-compiling perl for other
platforms. However, these are somewhat inconsistent and scattered across the
codebase, none are documented well, none are clearly flexible enough to
be confident that they can support any TARGET/HOST platform pair other than
that which they were developed on, and it's not clear how bitrotted they are.

For example, C<Configure> understands the C<-Dusecrosscompile> option. This
option arranges for building C<miniperl> for the TARGET machine, so this
C<miniperl> is assumed then to be copied to the TARGET machine and used as a
replacement for the full C<perl> executable. This code is almost 10 years old.
Meanwhile, the F<Cross/> directory contains two different approaches for cross
compiling to ARM Linux targets, relying on hand curated F<config.sh> files,
but that code is getting on for 5 years old, and requires insider knowledge of
perl's build system to draft a F<config.sh> for a new platform.

Jess Robinson has submitted a grant to TPF to work on cleaning this up.

=head2 Split "linker" from "compiler"

Right now, Configure probes for two commands, and sets two variables:

=over 4

=item * C<cc> (in F<cc.U>)

This variable holds the name of a command to execute a C compiler which
can resolve multiple global references that happen to have the same
name. Usual values are F<cc> and F<gcc>.
Fervent ANSI compilers may be called F<c89>. AIX has F<xlc>.

=item * C<ld> (in F<dlsrc.U>)

This variable indicates the program to be used to link
libraries for dynamic loading. On some systems, it is F<ld>.
On ELF systems, it should be C<$cc>. Mostly, we'll try to respect
the hint file setting.

=back

There is an implicit historical assumption from around Perl5.000alpha
something, that C<$cc> is also the correct command for linking object files
together to make an executable. This may be true on Unix, but it's not true
on other platforms, and there is a maze of workarounds in other places (such
as F<Makefile.SH>) to cope with this.

Ideally, we should create a new variable to hold the name of the executable
linker program, probe for it in F<Configure>, and centralise all the special
case logic there or in hints files.

A small bikeshed issue remains - what to call it, given that C<$ld> is already
taken (arguably for the wrong thing now, but on SunOS 4.1 it is the command
for creating dynamically-loadable modules) and C<$link> could be confused with
the Unix command line executable of the same name, which does something
completely different. Andy Dougherty makes the counter argument "In parrot, I
tried to call the command used to link object files and libraries into an
executable F<link>, since that's what my vaguely-remembered DOS and VMS
experience suggested. I don't think any real confusion has ensued, so it's
probably a reasonable name for perl5 to use."

"Alas, I've always worried that introducing it would make things worse,
since now the module building utilities would have to look for
C<$Config{link}> and institute a fall-back plan if it weren't found."
Although I can see that as confusing, given that C<$Config{d_link}> is true
when (hard) links are available.

=head2 Configure Windows using PowerShell

Currently, Windows uses hard-coded config files to build the
config.h for compiling Perl. Makefiles are also hard-coded and need to be
hand edited prior to building Perl. While this makes it easy to create a perl.exe
that works across multiple Windows versions, being able to accurately
configure a perl.exe for a specific Windows version and VS C++ would be
a nice enhancement. With PowerShell available on Windows XP and up, this
may now be possible. Step 1 might be to investigate whether this is possible
and use this to clean up our current makefile situation. Step 2 would be to
see if there would be a way to use our existing metaconfig units to configure a
Windows Perl or whether we go in a separate direction and make it so. Of
course, we all know what step 3 is.

=head1 Tasks that need a little C knowledge

These tasks would need a little C knowledge, but don't need any specific
background or experience with XS, or how the Perl interpreter works.

=head2 Weed out needless PERL_UNUSED_ARG

The C code uses the macro C<PERL_UNUSED_ARG> to stop compilers warning about
unused arguments. Often the arguments can't be removed, as there is an
external constraint that determines the prototype of the function, so this
approach is valid. However, there are some cases where C<PERL_UNUSED_ARG>
could be removed. Specifically

=over 4

=item *

The prototypes of (nearly all) static functions can be changed

=item *

Unused arguments generated by short cut macros are wasteful - the short cut
macro used can be changed.

=back

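As a rough illustration of the two situations described above, the following C
sketch contrasts a callback whose prototype is fixed externally with a static
helper whose prototype can simply be narrowed. C<UNUSED_ARG>, C<visit_fn> and
all the function names are hypothetical stand-ins invented for this example;
none of them is taken from the perl source.

```c
#include <stddef.h>

/* Hypothetical stand-in for PERL_UNUSED_ARG, for illustration only. */
#define UNUSED_ARG(x) ((void)(x))

/* An externally imposed callback prototype: every visitor gets a context. */
typedef int (*visit_fn)(int value, void *ctx);

/* Apply fn to each element and sum the results. */
static int sum_with(visit_fn fn, const int *v, size_t n, void *ctx)
{
    int total = 0;
    size_t i;
    for (i = 0; i < n; i++)
        total += fn(v[i], ctx);
    return total;
}

/* The prototype is dictated by visit_fn, so the unused ctx has to stay
 * and the macro is a legitimate way to silence the warning. */
static int double_it(int value, void *ctx)
{
    UNUSED_ARG(ctx);
    return value * 2;
}

/* Before: a static helper that carries an argument it never uses. */
static int clamp_before(int value, int lo, int hi, int flags)
{
    UNUSED_ARG(flags);
    return value < lo ? lo : value > hi ? hi : value;
}

/* After: nothing outside this file constrains the prototype, so the
 * argument (and the macro) can simply be dropped. */
static int clamp_after(int value, int lo, int hi)
{
    return value < lo ? lo : value > hi ? hi : value;
}
```

In the second case every caller has to be updated too, which is only practical
because a static function's callers all live in the same file.
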
=head2 -Duse32bit*

Natively 64-bit systems need neither -Duse64bitint nor -Duse64bitall.
On these systems, it might be the default compilation mode, and there
is currently no guarantee that passing no use64bitall option to the
Configure process will build a 32bit perl. Implementing -Duse32bit*
options would be nice for perl 5.18.0.

=head2 Profile Perl - am I hot or not?

The Perl source code is stable enough that it makes sense to profile it,
identify and optimise the hotspots. It would be good to measure the
performance of the Perl interpreter using free tools such as cachegrind,
gprof, and dtrace, and work to reduce the bottlenecks they reveal.

As part of this, the idea of F<pp_hot.c> is that it contains the I<hot> ops,
the ops that are most commonly used. The idea is that by grouping them, their
object code will be adjacent in the executable, so they have a greater chance
of already being in the CPU cache (or swapped in) due to being near another op
already in use.

Except that it's not clear if these really are the most commonly used ops. So
as part of exercising your skills with coverage and profiling tools you might
want to determine what ops I<really> are the most commonly used. And in turn
suggest evictions and promotions to achieve a better F<pp_hot.c>.

One piece of Perl code that might make a good testbed is F<installman>.

=head2 Improve win32/wince.c

Currently, numerous functions look virtually, if not completely,
identical in both F<win32/wince.c> and F<win32/win32.c> files, which can't
be good.

=head2 Use secure CRT functions when building with VC8 on Win32

Visual C++ 2005 (VC++ 8.x) deprecated a number of CRT functions on the basis
that they were "unsafe" and introduced differently named secure versions of
them as replacements, e.g. instead of writing

    FILE* f = fopen(__FILE__, "r");

one should now write

    FILE* f;
    errno_t err = fopen_s(&f, __FILE__, "r");

Currently, the warnings about these deprecations have been disabled by adding
-D_CRT_SECURE_NO_DEPRECATE to the CFLAGS. It would be nice to remove that
warning suppressant and actually make use of the new secure CRT functions.

There is also a similar issue with POSIX CRT function names like fileno having
been deprecated in favour of ISO C++ conformant names like _fileno. These
warnings are also currently suppressed by adding -D_CRT_NONSTDC_NO_DEPRECATE. It
might be nice to do as Microsoft suggest here too, although, unlike the secure
functions issue, there is presumably little or no benefit in this case.

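One way such a migration could be staged is behind a small wrapper, so that
callers see a single interface whichever CRT is in use. The sketch below is
only an illustration: C<open_for_read> is a hypothetical helper rather than an
existing perl function, and the C<< _MSC_VER >= 1400 >> test assumes that this
value identifies VC8 and later.

```c
#include <errno.h>
#include <stdio.h>

/* Hypothetical helper: open path for reading, storing the stream in *out.
 * Returns 0 on success, or a non-zero error code on failure. */
static int open_for_read(FILE **out, const char *path)
{
#if defined(_MSC_VER) && _MSC_VER >= 1400
    /* VC8 and later: use the secure variant, which reports errors
     * through its return value. */
    return (int)fopen_s(out, path, "r");
#else
    /* Everywhere else: plain fopen(), reporting errno on failure. */
    *out = fopen(path, "r");
    if (*out != NULL)
        return 0;
    return errno ? errno : -1;
#endif
}
```

Keeping the conditional in one place would let the C<-D_CRT_SECURE_NO_DEPRECATE>
suppression be dropped without scattering C<#ifdef>s through the callers.
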
=head2 Fix POSIX::access() and chdir() on Win32

These functions currently take no account of DACLs and therefore do not behave
correctly in situations where access is restricted by DACLs (as opposed to the
read-only attribute).

Furthermore, POSIX::access() behaves differently for directories having the
read-only attribute set depending on what CRT library is being used. For
example, the _access() function in the VC6 and VC7 CRTs (wrongly) claims that
such directories are not writable, whereas in fact all directories are writable
unless access is denied by DACLs. (In the case of directories, the read-only
attribute actually only means that the directory cannot be deleted.) This CRT
bug is fixed in the VC8 and VC9 CRTs (but, of course, the directory may still
not actually be writable if access is indeed denied by DACLs).

For the chdir() issue, see ActiveState bug #74552:
L<http://bugs.activestate.com/show_bug.cgi?id=74552>

Therefore, DACLs should be checked both for consistency across CRTs and for
the correct answer.

(Note that perl's -w operator should not be modified to check DACLs. It has
been written so that it reflects the state of the read-only attribute, even
for directories (whatever CRT is being used), for symmetry with chmod().)

=head2 strcat(), strcpy(), strncat(), strncpy(), sprintf(), vsprintf()

Maybe create a utility that checks after each libperl.a creation that
none of the above (nor sprintf(), vsprintf(), or *SHUDDER* gets())
ever creep back to libperl.a.

    nm libperl.a | ./miniperl -alne '$o = $F[0] if /:$/; print "$o $F[1]" if $F[0] eq "U" && $F[1] =~ /^(?:strn?c(?:at|py)|v?sprintf|gets)$/'

Note, of course, that this will only tell whether B<your> platform
is using those naughty interfaces.

=head2 -D_FORTIFY_SOURCE=2

Recent glibcs support C<-D_FORTIFY_SOURCE=2> which gives
protection against various kinds of buffer overflow problems.
It should probably be used for compiling Perl whenever available.
Configure and/or hints files should be adjusted to probe for the
availability of this feature and enable it as appropriate.

=head2 Arenas for GPs? For MAGIC?

C<struct gp> and C<struct magic> are both currently allocated by C<malloc>.
It might be a speed or memory saving to change to using arenas. Or it might
not. It would need some suitable benchmarking first. In particular, C<GP>s
can probably be changed with minimal compatibility impact (probably nothing
outside of the core, or even outside of F<gv.c> allocates them), but they
probably aren't allocated/deallocated often enough for a speed saving. Whereas
C<MAGIC> is allocated/deallocated more often, but in turn, is also something
more externally visible, so changing the rules here may bite external code.

=head2 Shared arenas

Several SV body structs are now the same size, notably PVMG and PVGV, PVAV and
PVHV, and PVCV and PVFM. It should be possible to allocate and return same
sized bodies from the same actual arena, rather than maintaining one arena for
each. This could save 4-6K per thread, of memory no longer tied up in the
not-yet-allocated part of an arena.

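To make the shared-arena idea concrete, here is a minimal sketch of an arena
selected by body I<size> rather than by body type, so that two types of equal
size would draw from one pool. It is not how perl's own arenas are implemented:
C<struct size_arena>, C<BODIES_PER_CHUNK> and the C<arena_*> functions are
names made up for this example, and chunks are never returned to the system.

```c
#include <stddef.h>
#include <stdlib.h>

#define BODIES_PER_CHUNK 64

/* Header at the start of each malloc'd chunk. The extra members exist only
 * to give the bodies that follow the header a generous alignment. */
union chunk_header {
    union chunk_header *next;
    long double         ld;
    long long           ll;
};

/* One arena per body size, shared by every type of that size. */
struct size_arena {
    size_t              body_size;  /* rounded-up size of each body */
    void               *free_list;  /* free bodies, linked via their first word */
    union chunk_header *chunks;     /* every chunk allocated so far */
};

static void arena_init(struct size_arena *a, size_t body_size)
{
    size_t unit = sizeof(union chunk_header);
    if (body_size < sizeof(void *))
        body_size = sizeof(void *);
    /* Round up so that consecutive bodies stay aligned. */
    a->body_size = (body_size + unit - 1) / unit * unit;
    a->free_list = NULL;
    a->chunks = NULL;
}

/* Carve a new chunk into bodies and push them all onto the free list. */
static int arena_grow(struct size_arena *a)
{
    size_t i;
    union chunk_header *chunk =
        malloc(sizeof *chunk + BODIES_PER_CHUNK * a->body_size);
    if (chunk == NULL)
        return -1;
    chunk->next = a->chunks;
    a->chunks = chunk;
    for (i = 0; i < BODIES_PER_CHUNK; i++) {
        void *body = (char *)(chunk + 1) + i * a->body_size;
        *(void **)body = a->free_list;
        a->free_list = body;
    }
    return 0;
}

/* Pop a body off the free list, growing the arena if it is empty. */
static void *arena_alloc(struct size_arena *a)
{
    void *body;
    if (a->free_list == NULL && arena_grow(a) != 0)
        return NULL;
    body = a->free_list;
    a->free_list = *(void **)body;
    return body;
}

/* Return a body to the free list of the arena it came from. */
static void arena_free(struct size_arena *a, void *body)
{
    *(void **)body = a->free_list;
    a->free_list = body;
}
```

With one such arena per distinct size, only one partially used chunk exists per
size, which is where the saving described above would come from.
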
=head1 Tasks that need a knowledge of XS

These tasks would need C knowledge, and roughly the level of knowledge of
the perl API that comes from writing modules that use XS to interface to
C.

=head2 Write an XS cookbook

Create pod/perlxscookbook.pod with short, task-focused 'recipes' in XS that
demonstrate common tasks and good practices. (Some of these might be
extracted from perlguts.) The target audience should be XS novices, who need
more examples than perlguts but something less overwhelming than perlapi.
Recipes should provide "one pretty good way to do it" instead of TIMTOWTDI.

Rather than focusing on interfacing Perl to C libraries, such a cookbook
should probably focus on how to optimize Perl routines by re-writing them
in XS. This will likely be more motivating to those who mostly work in
Perl but are looking to take the next step into XS.

Deconstructing and explaining some simpler XS modules could be one way to
bootstrap a cookbook. (List::Util? Class::XSAccessor? Tree::Ternary_XS?)
Another option could be deconstructing the implementation of some simpler
functions in op.c.

=head2 Document how XSUBs can use C<cv_set_call_checker> to inline themselves as OPs

For a simple XSUB, often the subroutine dispatch takes more time than the
XSUB itself. v5.14.0 now allows XSUBs to register a function which will be
called when the parser is finished building an C<entersub> op which calls
them.

Registration is done with C<Perl_cv_set_call_checker>, is documented at the
API level in L<perlapi>, and L<perl5140delta/Custom per-subroutine check hooks>
notes that it can be used to inline a subroutine, by replacing it with a
custom op. However there is no further detail of the code needed to do this.
It would be useful to add one or more annotated examples of how to create
XSUBs that inline.

This should provide a measurable speed up to simple XSUBs inside
tight loops. Initially one would have to write the OP alternative
implementation by hand, but it's likely that this should be reasonably
straightforward for the type of XSUB that would benefit the most. Longer
term, once the run-time implementation is proven, it should be possible to
progressively update ExtUtils::ParseXS to generate OP implementations for
some XSUBs.

=head2 Remove the use of SVs as temporaries in dump.c

F<dump.c> contains debugging routines to dump out the contents of perl data
structures, such as C<SV>s, C<AV>s and C<HV>s. Currently, the dumping code
B<uses> C<SV>s for its temporary buffers, which was a logical initial
implementation choice, as they provide ready made memory handling.

However, they also lead to a lot of confusion when it happens that what you're
trying to debug is seen by the code in F<dump.c>, correctly or incorrectly, as
a temporary scalar it can use for a temporary buffer. It's also not possible
to dump scalars before the interpreter is properly set up, such as during
ithreads cloning. It would be good to progressively replace the use of scalars
as string accumulation buffers with something much simpler, directly allocated
by C<malloc>. The F<dump.c> code is (or should be) only producing 7 bit
US-ASCII, so output character sets are not an issue.

Producing and proving an internal simple buffer allocation would make it easier
to re-write the internals of the PerlIO subsystem to avoid using C<SV>s for
B<its> buffers, use of which can cause problems similar to those of F<dump.c>,
at similar times.

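A string accumulation buffer of the kind suggested above could be as small as
the following sketch. It assumes plain C<realloc> and NUL-terminated 7 bit
strings; C<struct strbuf>, C<strbuf_append> and C<strbuf_free> are names
invented for this example and do not exist in F<dump.c>.

```c
#include <stdlib.h>
#include <string.h>

/* A growable, always NUL-terminated byte buffer. Zero-initialise before use. */
struct strbuf {
    char  *data;  /* NULL until the first append */
    size_t len;   /* bytes used, not counting the trailing NUL */
    size_t cap;   /* bytes allocated */
};

/* Append s to the buffer, growing it geometrically as needed.
 * Returns 0 on success, -1 if memory could not be allocated
 * (in which case the existing contents are left untouched). */
static int strbuf_append(struct strbuf *b, const char *s)
{
    size_t add = strlen(s);
    size_t need = b->len + add + 1;

    if (need > b->cap) {
        size_t cap = b->cap ? b->cap : 64;
        char *p;
        while (cap < need)
            cap *= 2;
        p = realloc(b->data, cap);
        if (p == NULL)
            return -1;
        b->data = p;
        b->cap = cap;
    }
    memcpy(b->data + b->len, s, add + 1);  /* copies the NUL as well */
    b->len += add;
    return 0;
}

/* Release the storage and return the buffer to its initial state. */
static void strbuf_free(struct strbuf *b)
{
    free(b->data);
    b->data = NULL;
    b->len = 0;
    b->cap = 0;
}
```

Because nothing here touches the interpreter, such a buffer would remain usable
before the interpreter is set up, for instance during ithreads cloning.
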
=head2 safely supporting POSIX SA_SIGINFO

Some years ago Jarkko supplied patches to provide support for the POSIX
SA_SIGINFO feature in Perl, passing the extra data to the Perl signal handler.

Unfortunately, it only works with "unsafe" signals, because under safe
signals, by the time Perl gets to run the signal handler, the extra
information has been lost. Moreover, it's not easy to store it somewhere,
as you can't call mutexes, or do anything else fancy, from inside a signal
handler.

So it strikes me that we could provide safe SA_SIGINFO support

=over 4

=item 1

Provide global variables for two file descriptors

=item 2

When the first request is made via C<sigaction> for C<SA_SIGINFO>, create a
pipe, store the reader in one, the writer in the other

=item 3

In the "safe" signal handler (C<Perl_csighandler()>/C<S_raise_signal()>), if
the C<siginfo_t> pointer is non-C<NULL>, and the writer file handle is open,

=over 8

=item 1

serialise signal number, C<struct siginfo_t> (or at least the parts we care
about) into a small auto char buffer

=item 2

C<write()> that (non-blocking) to the writer fd

=over 12

=item 1

if it writes 100%, flag the signal in a counter of "signals on the pipe" akin
to the current per-signal-number counts

=item 2

if it writes 0%, assume the pipe is full. Flag the data as lost?

=item 3

if it writes partially, croak a panic, as your OS is broken.

=back

=back

=item 4

in the regular C<PERL_ASYNC_CHECK()> processing, if there are "signals on
the pipe", read the data out, deserialise, build the Perl structures on
the stack (code in C<Perl_sighandler()>, the "unsafe" handler), and call as
usual.

=back

I think that this gets us decent C<SA_SIGINFO> support, without the current risk
of running Perl code inside the signal handler context. (With all the dangers
of things like C<malloc> corruption that that currently offers us)

For more information see the thread starting with this message:
L<http://www.xray.mpe.mpg.de/mailing-lists/perl5-porters/2008-03/msg00305.html>

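The numbered steps above can be sketched in standalone C roughly as follows.
This is an illustration of the pipe technique only, not perl's signal code:
C<struct sig_record> and the C<sig_*> names are invented for the example, it
uses a simple pending flag instead of the per-signal counter the list
describes, and it treats any short write as lost data rather than panicking.

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <fcntl.h>
#include <signal.h>
#include <stddef.h>
#include <string.h>
#include <sys/types.h>
#include <unistd.h>

/* The parts of siginfo_t this sketch chooses to keep. */
struct sig_record {
    int   signo;
    int   code;   /* si_code */
    pid_t pid;    /* si_pid */
};

/* Step 1: globals for the two descriptors, plus two status flags. */
static int sig_pipe[2] = { -1, -1 };           /* [0] reader, [1] writer */
static volatile sig_atomic_t sig_pending = 0;  /* records may be on the pipe */
static volatile sig_atomic_t sig_lost = 0;     /* a record could not be written */

/* Step 2: create the pipe and make both ends non-blocking. */
static int sig_pipe_setup(void)
{
    int i;
    if (pipe(sig_pipe) != 0)
        return -1;
    for (i = 0; i < 2; i++) {
        int flags = fcntl(sig_pipe[i], F_GETFL);
        if (flags < 0 || fcntl(sig_pipe[i], F_SETFL, flags | O_NONBLOCK) < 0)
            return -1;
    }
    return 0;
}

/* Step 3: the handler only fills in a small record and write()s it. */
static void sig_safe_handler(int signo, siginfo_t *info, void *context)
{
    int saved_errno = errno;
    (void)context;
    if (info != NULL && sig_pipe[1] >= 0) {
        struct sig_record rec;
        rec.signo = signo;
        rec.code = info->si_code;
        rec.pid = info->si_pid;
        if (write(sig_pipe[1], &rec, sizeof rec) == (ssize_t)sizeof rec)
            sig_pending = 1;
        else
            sig_lost = 1;   /* pipe full (or worse): this record is lost */
    }
    errno = saved_errno;
}

/* Install the handler for one signal with SA_SIGINFO set. */
static int sig_install(int signo)
{
    struct sigaction sa;
    memset(&sa, 0, sizeof sa);
    sa.sa_sigaction = sig_safe_handler;
    sa.sa_flags = SA_SIGINFO;
    sigemptyset(&sa.sa_mask);
    return sigaction(signo, &sa, NULL);
}

/* Step 4: from normal (non-handler) context, read back up to max records.
 * Returns the number of records stored in out. */
static int sig_pipe_drain(struct sig_record *out, int max)
{
    int count = 0;
    sig_pending = 0;
    while (count < max) {
        ssize_t n = read(sig_pipe[0], &out[count], sizeof out[count]);
        if (n == (ssize_t)sizeof out[count])
            count++;
        else if (n < 0 && errno == EINTR)
            continue;
        else
            break;          /* typically EAGAIN: nothing left on the pipe */
    }
    if (count == max)
        sig_pending = 1;    /* there may be more; ask to be called again */
    return count;
}
```

Each record is far smaller than C<PIPE_BUF>, so POSIX requires the C<write()>
to be atomic, which is why a partial write would indicate a broken OS.
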
=head2 autovivification

Make all autovivification consistent w.r.t LVALUE/RVALUE and strict/no strict.

This task is incremental - even a little bit of work on it will help.

=head2 Unicode in Filenames

chdir, chmod, chown, chroot, exec, glob, link, lstat, mkdir, open,
opendir, qx, readdir, readlink, rename, rmdir, stat, symlink, sysopen,
system, truncate, unlink, utime, -X. All these could potentially accept
Unicode filenames either as input or output (and in the case of system
and qx Unicode in general, as input or output to/from the shell).
Whether a given filesystem and operating system pair understands Unicode in
filenames varies.

Known combinations that have some level of understanding include
Microsoft NTFS, Apple HFS+ (in Mac OS 9 and X) and Apple UFS (in Mac
OS X), NFS v4 is rumored to be Unicode, and of course Plan 9. How to
create Unicode filenames, what forms of Unicode are accepted and used
(UCS-2, UTF-16, UTF-8), what (if any) is the normalization form used,
and so on, varies. Finding the right level of interfacing to Perl
requires some thought. Remember that an OS does not imply a
filesystem.

(The Windows -C command flag "wide API support" has been at least
temporarily retired in 5.8.1, and the -C has been repurposed, see
L<perlrun>.)

Most probably the right way to do this would be this:
L</"Virtualize operating system access">.

=head2 Unicode in %ENV

Currently the %ENV entries are always byte strings.
See L</"Virtualize operating system access">.

(See RT ticket #113536 for information on Win32's handling of %ENV,
which was fixed to work with native ANSI codepage characters in the
environment, but still doesn't work with other characters outside of
that codepage present in the environment.)

=head2 Unicode and glob()

Currently glob patterns and filenames returned from File::Glob::glob()
are always byte strings. See L</"Virtualize operating system access">.

=head2 use less 'memory'

Investigate trade-offs to switch out perl's choices on memory usage.
In particular, perl should be able to give memory back.

This task is incremental - even a little bit of work on it will help.

=head2 Re-implement C<:unique> in a way that is actually thread-safe

The old implementation made bad assumptions on several levels. A good 90%
solution might be just to make C<:unique> work to share the string buffer
of SvPVs. That way large constant strings can be shared between ithreads,
such as the configuration information in F<Config>.

=head2 Make tainting consistent

Tainting would be easier to use if it didn't take documented shortcuts and
allow taint to "leak" everywhere within an expression.

=head2 readpipe(LIST)

system() accepts a LIST syntax (and a PROGRAM LIST syntax) to avoid
running a shell. readpipe() (the function behind qx//) could be similarly
extended.

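Until something like that exists, the list form of a piped open gives a
similar effect. The following is only a sketch of the desired behaviour,
assuming a platform where list-form pipe opens are supported;
C<readpipe_list> is a hypothetical helper name, not a core function.

```perl
use strict;
use warnings;

# Hypothetical helper: run a command given as a LIST, bypassing the shell,
# and return everything it wrote to STDOUT.
sub readpipe_list {
    my @cmd = @_;
    # The list form of open with '-|' runs the program directly.
    open(my $fh, '-|', @cmd) or die "Cannot run $cmd[0]: $!";
    local $/;                      # slurp mode
    my $output = <$fh>;
    close($fh);
    return $output;
}

# $^X is the running perl; the argument needs no shell quoting.
my $out = readpipe_list($^X, '-e', 'print "a b; c\n"');
print $out;
```

Because no shell is involved, the C<;> in the argument is passed through
as data rather than being treated as a command separator.
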
=head2 Audit the code for destruction ordering assumptions

Change 25773 notes

    /* Need to check SvMAGICAL, as during global destruction it may be that
       AvARYLEN(av) has been freed before av, and hence the SvANY() pointer
       is now part of the linked list of SV heads, rather than pointing to
       the original body. */
    /* FIXME - audit the code for other bugs like this one. */

adding the C<SvMAGICAL> check to

    if (AvARYLEN(av) && SvMAGICAL(AvARYLEN(av))) {
        MAGIC *mg = mg_find (AvARYLEN(av), PERL_MAGIC_arylen);

Go through the core and look for similar assumptions that SVs have particular
types, as all bets are off during global destruction.

=head2 Extend PerlIO and PerlIO::Scalar

PerlIO::Scalar doesn't know how to truncate(). Implementing this
would require extending the PerlIO vtable.

Similarly the PerlIO vtable doesn't know about formats (write()), or
about stat(), or chmod()/chown(), utime(), or flock().

(For PerlIO::Scalar it's hard to see what e.g. mode bits or ownership
would mean.)

PerlIO doesn't do directories or symlinks, either: mkdir(), rmdir(),
opendir(), closedir(), seekdir(), rewinddir(), glob(); symlink(),
readlink().

See also L</"Virtualize operating system access">.

=head2 Organize error messages

Perl's diagnostics (error messages, see L<perldiag>) could use
reorganizing and formalizing so that each error message has its
stable-for-all-eternity unique id, categorized by severity, type, and
subsystem. (The error messages would be listed in a datafile outside
of the Perl source code, and the source code would only refer to the
messages by the id.) This clean-up and regularizing should apply
to all croak() messages.

This would enable all sorts of things: easier translation/localization
of the messages (though please do keep in mind the caveats of
L<Locale::Maketext> about too straightforward approaches to
translation), filtering by severity, and instead of grepping for a
particular error message one could look for a stable error id. (Of
course, changing the error messages by default would break all the
existing software depending on some particular error message...)

This kind of functionality is known as I<message catalogs>. Look for
inspiration for example in the catgets() system, possibly even use it
if available, but B<only> if available: B<not> all platforms will
have catgets().

For the really pure at heart, consider extending this item to cover
also the warning messages (see L<perllexwarn>, C<warnings.pl>).

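As a rough illustration of the idea, a catalog could map each stable id to
its severity, subsystem and message format. This is a sketch only: the ids,
field names and the C<format_message> helper are invented for the example
and do not correspond to anything in the core.

```perl
use strict;
use warnings;

# Hypothetical catalog: stable id => severity, subsystem and message format.
my %catalog = (
    'PERL-IO-0001' => {
        severity  => 'fatal',
        subsystem => 'io',
        format    => "Can't open %s: %s",
    },
    'PERL-NUM-0002' => {
        severity  => 'warning',
        subsystem => 'numeric',
        format    => 'Argument "%s" isn\'t numeric',
    },
);

# Look a message up by id and fill in its arguments.
sub format_message {
    my ($id, @args) = @_;
    my $entry = $catalog{$id} or die "Unknown message id $id";
    return sprintf("[%s] %s", $id, sprintf($entry->{format}, @args));
}

# Filter by severity rather than by matching message text.
my @fatal_ids = sort grep { $catalog{$_}{severity} eq 'fatal' } keys %catalog;

print format_message('PERL-IO-0001', 'foo.txt', 'No such file or directory'), "\n";
print "fatal: @fatal_ids\n";
```

With something along these lines, a translation only needs to replace the
C<format> strings, and callers can match on the id instead of the text.
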
=head1 Tasks that need a knowledge of the interpreter

These tasks would need C knowledge, and knowledge of how the interpreter works,
or a willingness to learn.

=head2 forbid labels with keyword names

Currently C<goto keyword> "computes" the label value:

    $ perl -e 'goto print'
    Can't find label 1 at -e line 1.

It is controversial whether the right way to avoid the confusion is to forbid
labels with keyword names, or whether it would be better to always treat
bareword expressions after a "goto" as a label and never as a keyword.

=head2 truncate() prototype

The prototype of truncate() is currently C<$$>. It should probably
be C<*$> instead. (This is changed in F<opcode.pl>)

=head2 error reporting of [$a ; $b]

Using C<;> inside brackets is a syntax error, and we don't propose to change
that by giving it any meaning. However, it's not reported very helpfully:

    $ perl -e '$a = [$b; $c];'
    syntax error at -e line 1, near "$b;"
    syntax error at -e line 1, near "$c]"
    Execution of -e aborted due to compilation errors.

It should be possible to hook into the tokeniser or the lexer, so that when a
C<;> is parsed where it is not legal as a statement terminator (i.e. inside
C<{}> used as a hashref, C<[]> or C<()>) it issues an error something like
I<';' isn't legal inside an expression - if you need multiple statements use a
do {...} block>. See the thread starting at
L<http://www.xray.mpe.mpg.de/mailing-lists/perl5-porters/2008-09/msg00573.html>

=head2 strict as warnings

See L<http://markmail.org/message/vbrupaslr3bybmvk>, where Joshua ben Jore
writes: I've been of the opinion that everything strict.pm does ought to be
able to be considered just warnings that have been promoted to 'FATAL'.

=head2 lexicals used only once

This warns:

    $ perl -we '$pie = 42'
    Name "main::pie" used only once: possible typo at -e line 1.

This does not:

    $ perl -we 'my $pie = 42'

Logically all lexicals used only once should warn, if the user asks for
warnings. An unworked RT ticket (#5087) has been open for almost seven
years for this discrepancy.

=head2 UTF-8 revamp

The handling of Unicode is unclean in many places. In the regex engine
there are especially many problems. The swash data structure could be
replaced by something better. Inversion lists and maps are likely
candidates. The whole Unicode database could be placed in-core for a
huge speed-up. Only minimal work was done on the optimizer when utf8
was added, with the result that the synthetic start class often will
fail to narrow down the possible choices when given non-Latin1 input.
Karl Williamson has been working on this - talk to him.

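For readers unfamiliar with the term, an inversion list stores a set of
code points as the sorted list of positions where membership flips. The
sketch below shows the general technique in Perl; it is not the core's
implementation, and C<in_invlist> is a name made up for the example.

```perl
use strict;
use warnings;

# Elements at even indices start a range that is in the set; elements at
# odd indices start a range that is not. This list describes [0-9A-Za-z].
my @alnum = (0x30, 0x3A, 0x41, 0x5B, 0x61, 0x7B);

# Return true if $cp is in the set described by the inversion list.
sub in_invlist {
    my ($list, $cp) = @_;
    my ($lo, $hi) = (0, scalar @$list);
    # Binary search for the number of elements <= $cp.
    while ($lo < $hi) {
        my $mid = int(($lo + $hi) / 2);
        if ($list->[$mid] <= $cp) { $lo = $mid + 1 }
        else                      { $hi = $mid }
    }
    # An odd count means the last flip at or before $cp turned the set "on".
    return $lo % 2 == 1;
}

printf "%s: %s\n", $_, (in_invlist(\@alnum, ord $_) ? "in" : "out")
    for qw(a Z 5 _);
```

Lookup cost grows with the logarithm of the number of ranges, and the
list itself needs only two entries per contiguous range.
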
=head2 state variable initialization in list context

Currently this is illegal:

    state ($a, $b) = foo();

In Perl 6, C<state ($a) = foo();> and C<(state $a) = foo();> have different
semantics, which is tricky to implement in Perl 5 as currently they produce
the same opcode trees. The Perl 6 design is firm, so it would be good to
implement the necessary code in Perl 5. There are comments in
C<Perl_newASSIGNOP()> that show the code paths taken by various assignment
constructions involving state variables.

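In the meantime, one possible workaround is to initialise a single scalar
state variable with an array reference and unpack it on each call. This is
a sketch of that workaround, not of the proposed feature; C<foo> and
C<counter> are placeholder names.

```perl
use strict;
use warnings;
use feature 'state';

my $calls = 0;
sub foo { $calls++; return (1, 2) }    # stand-in for an expensive initialiser

sub counter {
    # A scalar state variable may be initialised; keep the list behind it.
    state $init = [ foo() ];
    my ($x, $y) = @$init;
    return $x + $y;
}

counter() for 1 .. 3;
print "foo() ran $calls time(s)\n";
```

Note that C<$x> and C<$y> here are per-call copies, so assignments to them
do not persist between calls the way assignments to real state variables
would.
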
=head2 A does() built-in

Like ref(), only useful. It would call the C<DOES> method on objects; it
would also tell whether something can be dereferenced as an
array/hash/etc., or used as a regexp, etc.
L<http://www.xray.mpe.mpg.de/mailing-lists/perl5-porters/2007-03/msg00481.html>

=head2 Tied filehandles and write() don't mix

There is no method on tied filehandles to allow them to be called back by
formats.

=head2 Propagate compilation hints to the debugger

Currently a debugger started with -dE on the command-line doesn't see the
features enabled by -E. More generally hints (C<$^H> and C<%^H>) aren't
propagated to the debugger. Probably it would be a good thing to propagate
hints from the innermost non-C<DB::> scope: this would make code eval'ed
in the debugger see the features (and strictures, etc.) currently in
scope.

=head2 Attach/detach debugger from running program

The old perltodo notes "With C<gdb>, you can attach the debugger to a running
program if you pass the process ID. It would be good to do this with the Perl
debugger on a running Perl program, although I'm not sure how it would be
done." ssh and screen do this with named pipes in /tmp. Maybe we can too.

=head2 LVALUE functions for lists

The old perltodo notes that lvalue functions don't work for list or hash
slices. This would be good to fix.

=head2 regexp optimiser optional

The regexp optimiser is not optional. It should be possible to switch it off,
to allow its performance to be measured, and its bugs to be easily demonstrated.

=head2 C</w> regex modifier

That flag would enable matching whole words, and also interpolating
arrays as alternations. With it, C</P/w> would be roughly equivalent to:

    do { local $"='|'; /\b(?:P)\b/ }

See
L<http://www.xray.mpe.mpg.de/mailing-lists/perl5-porters/2007-01/msg00400.html>
for the discussion.

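Spelled out by hand with an array, the equivalence looks roughly like
this. It is only a sketch of what the proposed flag might expand to;
C<matches_word> and C<@words> are names chosen for the example.

```perl
use strict;
use warnings;

my @words = qw(cat dog);

# What a hypothetical /@words/w might mean, written out explicitly:
# join the array with '|' and anchor the alternation at word boundaries.
sub matches_word {
    my ($text) = @_;
    local $" = '|';
    return $text =~ /\b(?:@words)\b/ ? 1 : 0;
}

print matches_word('hot dog'), "\n";    # "dog" appears as a whole word
print matches_word('dogma'),   "\n";    # "dog" is only a prefix here
```

The array elements are interpolated as regex source, so metacharacters in
them are not escaped; whether C</w> should quote them is one of the points
a real design would have to settle.
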
=head2 optional optimizer

Make the peephole optimizer optional. Currently it performs two tasks as
it walks the optree - genuine peephole optimisations, and necessary fixups of
ops. It would be good to find an efficient way to switch out the
optimisations whilst keeping the fixups.

=head2 You WANT *how* many

Currently contexts are void, scalar and list. split has a special mechanism in
place to pass in the number of return values wanted. It would be useful to
have a general mechanism for this that is backwards compatible and has little
speed hit. This would allow proposals such as short circuiting sort to be
implemented as a module on CPAN.

=head2 lexical aliases

Allow lexical aliases (maybe via the syntax C<my \$alias = \$foo>).

=head2 Self-ties

Self-ties are currently illegal because they caused too many segfaults. Maybe
the causes of these could be tracked down and self-ties on all types
reinstated.

=head2 Optimize away @_

The old perltodo notes "Look at the "reification" code in C<av.c>".

=head2 Virtualize operating system access

Implement a set of "vtables" that virtualizes operating system access
(open(), mkdir(), unlink(), readdir(), getenv(), etc.). At the very
least these interfaces should take SVs as "name" arguments instead of
bare char pointers; probably the most flexible and extensible way
would be for the Perl-facing interfaces to accept HVs. The system
needs to be per-operating-system and per-file-system
hookable/filterable, preferably both from XS and Perl level.
(L<perlport/"Files and Filesystems"> is good reading at this point,
in fact, all of L<perlport> is.)

This has actually already been implemented (but only for Win32),
take a look at F<iperlsys.h> and F<win32/perlhost.h>. While all Win32
variants go through a set of "vtables" for operating system access,
non-Win32 systems currently go straight for the POSIX/Unix-style
system/library call. A system similar to the Win32 one should be
implemented for all platforms. The existing Win32 implementation
probably does not need to survive alongside this proposed new
implementation, the approaches could be merged.

What would this give us? One often-asked-for feature this would
enable is using Unicode for filenames, and other "names" like %ENV,
usernames, hostnames, and so forth.
(See L<perlunicode/"When Unicode Does Not Happen">.)

But this kind of virtualization would also allow for things like
virtual filesystems, virtual networks, and "sandboxes" (though as long
as dynamic loading of random object code is allowed, not very safe
sandboxes, since external code of course knows nothing of Perl's vtables).
An example of a smaller "sandbox" is that this feature can be used to
implement per-thread working directories: Win32 already does this.

See also L</"Extend PerlIO and PerlIO::Scalar">.

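To make the shape of the idea concrete, here is a toy Perl-level model of
such a table. It is purely illustrative and says nothing about how the C
level would look: C<%real_os>, C<%sandbox_os> and C<os_call> are
hypothetical names, and only three operations are modelled.

```perl
use strict;
use warnings;

# Hypothetical Perl-level "vtable": operation name => implementation.
# The default table just forwards to the real built-ins.
my %real_os = (
    mkdir  => sub { my ($name) = @_; mkdir $name },
    unlink => sub { my ($name) = @_; unlink $name },
    getenv => sub { my ($name) = @_; $ENV{$name} },
);

# A sandboxed table: same interface, but nothing touches the real system.
my %fake_fs;
my %sandbox_os = (
    mkdir  => sub { my ($name) = @_; $fake_fs{$name} = 'dir'; 1 },
    unlink => sub { my ($name) = @_; defined(delete $fake_fs{$name}) ? 1 : 0 },
    getenv => sub { undef },
);

# Callers go through whichever table is installed.
sub os_call {
    my ($vtable, $op, @args) = @_;
    my $impl = $vtable->{$op} or die "Unsupported operation $op";
    return $impl->(@args);
}

os_call(\%sandbox_os, 'mkdir', 'scratch');
print join(',', sort keys %fake_fs), "\n";
```

Swapping C<\%sandbox_os> for C<\%real_os> changes where the calls land
without changing the calling code, which is the property the hooks are
meant to provide.
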
=head2 repack the optree

Repacking the optree after execution order is determined could allow
removal of NULL ops, and optimal ordering of OPs with respect to cache-line
filling. I think that
the best way to do this is to make it an optional step just before the
completed optree is attached to anything else, and to use the slab allocator
unchanged (but allocating a single slab of the right size, avoiding partial
slabs), so that freeing ops is identical whether or not this step runs.
Note that the slab allocator allocates ops downwards in memory, so one would
have to actually "allocate" the ops in reverse-execution order to get them
contiguous in memory in execution order.

See
L<http://www.nntp.perl.org/group/perl.perl5.porters/2007/12/msg131975.html>

Note that running this copy, and then freeing all the old location ops would
cause their slabs to be freed, which would eliminate possible memory wastage if
the previous suggestion is implemented, and we swap slabs more frequently.

=head2 eliminate incorrect line numbers in warnings

This code

    use warnings;
    my $undef;

    if ($undef == 3) {
    } elsif ($undef == 0) {
    }

used to produce this output:

    Use of uninitialized value in numeric eq (==) at wrong.pl line 4.
    Use of uninitialized value in numeric eq (==) at wrong.pl line 4.

where the line of the second warning was misreported - it should be line 5.
Rafael fixed this - the problem arose because there was no nextstate OP
between the execution of the C<if> and the C<elsif>, hence C<PL_curcop> still
reported that the currently executing line was line 4. The solution was to
inject a nextstate OP for each C<elsif>, although it turned out that the
nextstate OP needed to be a nulled OP, rather than a live nextstate OP, else
other line numbers became misreported. (Jenga!)

The problem is more general than C<elsif> (although the C<elsif> case is the
most common and the most confusing). Ideally this code

    use warnings;
    my $undef;

    my $a = $undef + 1;
    my $b
      = $undef
      + 1;

would produce this output

    Use of uninitialized value $undef in addition (+) at wrong.pl line 4.
    Use of uninitialized value $undef in addition (+) at wrong.pl line 7.

(rather than lines 4 and 5), but this would seem to require every OP to carry
(at least) line number information.

What might work is to have an optional line number in memory just before the
BASEOP structure, with a flag bit in the op to say whether it's present.
Initially during compile every OP would carry its line number. Then add a late
pass to the optimiser (potentially combined with L</repack the optree>) which
looks at the two ops on every edge of the graph of the execution path. If
the line number changes, flag the destination OP with this information.
Once all paths are traced, replace every op that has the flag with a
nextstate-light op (that just updates C<PL_curcop>), which in turn then passes
control on to the true op. All ops would then be replaced by variants that
do not store the line number. (Which, logically, is why it would work best in
conjunction with L</repack the optree>, as that is already copying/reallocating
all the OPs)

(Although I should note that we're not certain that doing this for the general
case is worth it)

=head2 optimize tail-calls

Tail-calls present an opportunity for broadly applicable optimization;
anywhere that C<< return foo(...) >> is called, the outer return can
be replaced by a goto, and foo will return directly to the outer
caller, saving (conservatively) 25% of perl's call&return cost, which
is relatively higher than in C. The Scheme language is known to do
this heavily. B::Concise provides good insight into where this
optimization is possible, i.e. anywhere an entersub,leavesub op-sequence
occurs.

    perl -MO=Concise,-exec,a,b,-main -e 'sub a{ 1 }; sub b {a()}; b(2)'

Bottom line on this is probably a new pp_tailcall function which
combines the code in pp_entersub, pp_leavesub. This should probably
be done first in XS, and using B::Generate to patch the new OP into the
optrees.

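The effect being aimed for can already be obtained by hand with
C<goto &sub>, which replaces the current sub's frame instead of adding a
new one. The sketch below only demonstrates that existing manual form, not
the proposed automatic optimization; the sub names are made up for the
example.

```perl
use strict;
use warnings;

# Report how many sub frames are on the call stack.
sub depth {
    my $d = 0;
    $d++ while caller($d);
    return $d;
}

sub plain_call { return depth() }          # depth() gets a new frame
sub tail_call  { @_ = (); goto &depth }    # depth() replaces this frame

my $plain = plain_call();
my $tail  = tail_call();
print "plain: $plain, tail: $tail\n";
```

The tail-call version reports one frame fewer than the plain call, which
is the saving an optimizer would be trying to apply without the programmer
having to write the C<goto>.
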
=head2 Add C<0odddd>

It has been proposed that octal constants be specifiable through the syntax
C<0oddddd>, parallel to the existing construct to specify hex constants
C<0xddddd>.

=head1 Big projects

Tasks that will get your name mentioned in the description of the "Highlights
of 5.18.0".

=head2 make ithreads more robust

Generally make ithreads more robust.

This task is incremental - even a little bit of work on it will help, and
will be greatly appreciated.

One bit would be to determine how to clone directory handles on systems
without a C<fchdir> function (in sv.c:Perl_dirp_dup).

Fix Perl_sv_dup, et al so that threads can return objects.

=head2 Add class set operations to regexp engine

Apparently these are quite useful. Anyway, Jeffrey Friedl wants them.

demerphq has this on his todo list, but right at the bottom.

=head1 Tasks for microperl

[ Each and every one of these may be obsolete, but they were listed
in the old Todo.micro file]

=head2 do away with fork/exec/wait?

(system, popen should be enough?)

=head2 some of the uconfig.sh really needs to be probed (using cc) in buildtime:

(uConfigure? :-) native datatype widths and endianness come to mind
