=head1 NAME

perlfork - Perl's fork() emulation (EXPERIMENTAL, subject to change)

=head1 SYNOPSIS

    WARNING:  As of the 5.6.1 release, the fork() emulation continues
    to be an experimental feature.  Use in production applications is
    not recommended.  See the "BUGS" and "CAVEATS AND LIMITATIONS"
    sections below.

Perl provides a fork() keyword that corresponds to the Unix system call
of the same name.  On most Unix-like platforms where the fork() system
call is available, Perl's fork() simply calls it.

On some platforms such as Windows where the fork() system call is not
available, Perl can be built to emulate fork() at the interpreter level.
While the emulation is designed to be as compatible as possible with the
real fork() at the level of the Perl program, there are certain
important differences that stem from the fact that all the pseudo child
"processes" created this way live in the same real process as far as the
operating system is concerned.

This document provides a general overview of the capabilities and
limitations of the fork() emulation.  Note that the issues discussed here
are not applicable to platforms where a real fork() is available and Perl
has been configured to use it.

=head1 DESCRIPTION

The fork() emulation is implemented at the level of the Perl interpreter.
What this means in general is that running fork() will actually clone the
running interpreter and all its state, and run the cloned interpreter in
a separate thread, beginning execution in the new thread just after the
point where the fork() was called in the parent.  We will refer to the
thread that implements this child "process" as the pseudo-process.

To the Perl program that called fork(), all this is designed to be
transparent.  The parent returns from the fork() with a pseudo-process
ID that can be subsequently used in any process manipulation functions;
the child returns from the fork() with a value of C<0> to signify that
it is the child pseudo-process.
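
As a minimal sketch of this calling convention, the following forks once,
has the child exit with a known status, and has the parent collect that
status with waitpid().  The helper name C<run_child> and the status C<7>
are illustrative choices, not part of Perl.  The code is written so that
it should behave the same under a real fork() and under the emulation:

    use strict;
    use warnings;

    # Hypothetical helper: fork, run $code in the child, and return
    # the child's exit status as seen by the parent.
    sub run_child {
        my ($code) = @_;
        my $pid = fork();
        die "fork() failed: $!" unless defined $pid;
        if ($pid == 0) {
            # child (pseudo-)process: fork() returned 0
            exit $code->();
        }
        # parent: $pid is the (pseudo-)process ID of the child
        waitpid($pid, 0);
        return $? >> 8;
    }

    my $status = run_child(sub { return 7 });
    print "child exited with status $status\n";

Testing C<defined $pid> before anything else separates a failed fork()
from the child branch, since both C<undef> and C<0> are false.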

=head2 Behavior of other Perl features in forked pseudo-processes

Most Perl features behave in a natural way within pseudo-processes.

=over 8

=item $$ or $PROCESS_ID

This special variable is correctly set to the pseudo-process ID.
It can be used to identify pseudo-processes within a particular
session.  Note that this value is subject to recycling if any
pseudo-processes are launched after others have been wait()-ed on.

=item %ENV

Each pseudo-process maintains its own virtual environment.  Modifications
to %ENV affect the virtual environment, and are only visible within that
pseudo-process, and in any processes (or pseudo-processes) launched from
it.

=item chdir() and all other builtins that accept filenames

Each pseudo-process maintains its own virtual idea of the current directory.
Modifications to the current directory using chdir() are only visible within
that pseudo-process, and in any processes (or pseudo-processes) launched from
it.  All file and directory accesses from the pseudo-process correctly map
the virtual working directory to the real working directory.

=item wait() and waitpid()

wait() and waitpid() can be passed a pseudo-process ID returned by fork().
These calls will properly wait for the termination of the pseudo-process
and return its status.

=item kill()

kill() can be used to terminate a pseudo-process by passing it the ID returned
by fork().  This should not be used except under dire circumstances, because
the operating system may not guarantee integrity of the process resources
when a running thread is terminated.  Note that using kill() on a
pseudo-process typically causes memory leaks, because the thread that
implements the pseudo-process does not get a chance to clean up its resources.

=item exec()

Calling exec() within a pseudo-process actually spawns the requested
executable in a separate process and waits for it to complete before
exiting with the same exit status as that process.  This means that the
process ID reported within the running executable will be different from
what the earlier Perl fork() might have returned.  Similarly, any process
manipulation functions applied to the ID returned by fork() will affect the
waiting pseudo-process that called exec(), not the real process it is
waiting for after the exec().
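
A sketch of how this looks from the parent's side, assuming the running
perl binary (C<$^X>) can be re-invoked as the external program.  The
parent waits on the ID returned by fork() and receives the exit status of
the program started by exec(), even though, under the emulation, that ID
refers to the waiting pseudo-process rather than to the new program:

    use strict;
    use warnings;

    my $pid = fork();
    die "fork() failed: $!" unless defined $pid;
    if ($pid == 0) {
        # Replace the child with another program.  Under the emulation
        # this spawns a real process and waits for it.
        exec($^X, '-e', 'exit 5') or die "exec() failed: $!";
    }

    # Wait on the ID from fork(), not on the new program's own ID.
    waitpid($pid, 0);
    my $exec_status = $? >> 8;
    print "program exited with status $exec_status\n";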

=item exit()

exit() always exits just the executing pseudo-process, after automatically
wait()-ing for any outstanding child pseudo-processes.  Note that this means
that the process as a whole will not exit unless all running pseudo-processes
have exited.

=item Open handles to files, directories and network sockets

All open handles are dup()-ed in pseudo-processes, so that closing
any handles in one process does not affect the others.  See below for
some limitations.

=back
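
The isolation described in the list above for %ENV and chdir() can be
observed with a short sketch.  The variable name C<PERLFORK_DEMO> is made
up for this example, and the temporary directory reported by File::Spec is
used only as a convenient place to change into:

    use strict;
    use warnings;
    use Cwd qw(getcwd);
    use File::Spec;

    # Illustrative only: PERLFORK_DEMO is a made-up variable name.
    $ENV{PERLFORK_DEMO} = 'parent';
    my $start_dir = getcwd();

    my $pid = fork();
    die "fork() failed: $!" unless defined $pid;
    if ($pid == 0) {
        # Changes made here stay in the child (pseudo-)process.
        $ENV{PERLFORK_DEMO} = 'child';
        chdir(File::Spec->tmpdir) or exit 1;
        exit 0;
    }

    waitpid($pid, 0);                   # returns once the child has exited
    my $child_status = $? >> 8;         # 0 if the child's chdir() worked

    # The parent's environment and working directory are untouched.
    print "env: $ENV{PERLFORK_DEMO}\n";
    print "cwd unchanged\n" if getcwd() eq $start_dir;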

=head2 Resource limits

In the eyes of the operating system, pseudo-processes created via the fork()
emulation are simply threads in the same process.  This means that any
process-level limits imposed by the operating system apply to all
pseudo-processes taken together.  This includes any limits imposed by the
operating system on the number of open file, directory and socket handles,
as well as limits on disk space usage, memory size, CPU utilization, and
so on.

=head2 Killing the parent process

If the parent process is killed (either using Perl's kill() builtin, or
using some external means) all the pseudo-processes are killed as well,
and the whole process exits.

=head2 Lifetime of the parent process and pseudo-processes

During the normal course of events, the parent process and every
pseudo-process started by it will wait for their respective pseudo-children
to complete before they exit.  This means that the parent and every
pseudo-child created by it that is also a pseudo-parent will only exit
after their pseudo-children have exited.
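
Rather than relying on this implicit wait at exit, a program can collect
its children explicitly.  The following is one possible sketch, in which
the number of children and their exit statuses are arbitrary choices made
for illustration:

    use strict;
    use warnings;

    # Start three children; each exits with its own index as status.
    my %status_of;                  # pid => exit status, filled in below
    my @pids;
    for my $n (1 .. 3) {
        my $pid = fork();
        die "fork() failed: $!" unless defined $pid;
        exit $n if $pid == 0;       # child: finish immediately
        push @pids, $pid;           # parent: remember the child
    }

    # Wait for each child by ID and record its exit status.
    for my $pid (@pids) {
        waitpid($pid, 0);
        $status_of{$pid} = $? >> 8;
    }

    my @statuses = sort { $a <=> $b } values %status_of;
    print "statuses: @statuses\n";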

A way to mark pseudo-processes as running detached from their parent (so
that the parent would not have to wait() for them if it doesn't want to)
will be provided in the future.

=head2 CAVEATS AND LIMITATIONS

=over 8

=item BEGIN blocks

The fork() emulation will not work entirely correctly when called from
within a BEGIN block.  The forked copy will run the contents of the
BEGIN block, but will not continue parsing the source stream after the
BEGIN block.  For example, consider the following code:

    BEGIN {
        fork and exit;          # fork child and exit the parent
        print "inner\n";
    }
    print "outer\n";

This will print:

    inner

rather than the expected:

    inner
    outer

This limitation arises from fundamental technical difficulties in
cloning and restarting the stacks used by the Perl parser in the
middle of a parse.

=item Open filehandles

Any filehandles open at the time of the fork() will be dup()-ed.  Thus,
the files can be closed independently in the parent and child, but beware
that the dup()-ed handles will still share the same seek pointer.  Changing
the seek position in the parent will change it in the child and vice-versa.
One can avoid this by opening files that need distinct seek pointers
separately in the child.
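
A minimal sketch of that workaround, using a temporary file created with
File::Temp purely for illustration.  The child ignores the inherited
handle and opens the file again by name, so its reads start at the
beginning regardless of how far the parent has read:

    use strict;
    use warnings;
    use File::Temp qw(tempfile);

    # Illustrative data file with two lines.
    my ($tmp_fh, $path) = tempfile(UNLINK => 1);
    print {$tmp_fh} "first\nsecond\n";
    close $tmp_fh or die "close failed: $!";

    open(my $in, '<', $path) or die "open failed: $!";
    my $parent_line = <$in>;        # parent has now read past "first"

    my $pid = fork();
    die "fork() failed: $!" unless defined $pid;
    if ($pid == 0) {
        # Open a separate handle instead of reusing the inherited $in,
        # so the child gets its own seek pointer.
        open(my $own, '<', $path) or exit 2;
        my $line = <$own>;
        exit($line eq "first\n" ? 0 : 1);
    }

    waitpid($pid, 0);
    my $child_saw_first = ($? >> 8) == 0;
    print "child read the first line independently\n" if $child_saw_first;

The child reports its result through its exit status only to keep the
sketch short; a pipe would serve equally well.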

=item Forking pipe open() not yet implemented

The C<open(FOO, "|-")> and C<open(BAR, "-|")> constructs are not yet
implemented.  This limitation can be easily worked around in new code
by creating a pipe explicitly.  The following example shows how to
write to a forked child:

    # simulate open(FOO, "|-")
    sub pipe_to_fork ($) {
	my $parent = shift;
	pipe my $child, $parent or die;
	my $pid = fork();
	die "fork() failed: $!" unless defined $pid;
	if ($pid) {
	    close $child;
	}
	else {
	    close $parent;
	    open(STDIN, "<&=" . fileno($child)) or die;
	}
	$pid;
    }

    if (pipe_to_fork('FOO')) {
	# parent
	print FOO "pipe_to_fork\n";
	close FOO;
    }
    else {
	# child
	while (<STDIN>) { print; }
	close STDIN;
	exit(0);
    }

And this one reads from the child:

    # simulate open(BAR, "-|")
    sub pipe_from_fork ($) {
	my $parent = shift;
	pipe $parent, my $child or die;
	my $pid = fork();
	die "fork() failed: $!" unless defined $pid;
	if ($pid) {
	    close $child;
	}
	else {
	    close $parent;
	    open(STDOUT, ">&=" . fileno($child)) or die;
	}
	$pid;
    }

    if (pipe_from_fork('BAR')) {
	# parent
	while (<BAR>) { print; }
	close BAR;
    }
    else {
	# child
	print "pipe_from_fork\n";
	close STDOUT;
	exit(0);
    }

Forking pipe open() constructs will be supported in future.

=item Global state maintained by XSUBs 

External subroutines (XSUBs) that maintain their own global state may
not work correctly.  Such XSUBs will either need to maintain locks to
protect simultaneous access to global data from different pseudo-processes,
or maintain all their state on the Perl symbol table, which is copied
naturally when fork() is called.  A callback mechanism that provides
extensions an opportunity to clone their state will be provided in the
near future.

=item Interpreter embedded in larger application

The fork() emulation may not behave as expected when it is executed in an
application which embeds a Perl interpreter and calls Perl APIs that can
evaluate bits of Perl code.  This stems from the fact that the emulation
only has knowledge about the Perl interpreter's own data structures and
knows nothing about the containing application's state.  For example, any
state carried on the application's own call stack is out of reach.

=item Thread-safety of extensions

Since the fork() emulation runs code in multiple threads, extensions
calling into non-thread-safe libraries may not work reliably when
calling fork().  As Perl's threading support gradually becomes more
widely adopted even on platforms with a native fork(), such extensions
are expected to be fixed for thread-safety.

=back

=head1 BUGS

=over 8

=item *

Perl's regular expression engine currently does not play very nicely
with the fork() emulation.  There are known race conditions arising
from the regular expression engine modifying state carried in the opcode
tree at run time (the fork() emulation relies on the opcode tree being
immutable).  This typically happens when the regex contains paren groups
or variables interpolated within it that force a run time recompilation
of the regex.  Due to this major bug, the fork() emulation is not
recommended for use in production applications at this time.

=item *

Having pseudo-process IDs be negative integers breaks down for the integer
C<-1> because the wait() and waitpid() functions treat this number as
being special.  The tacit assumption in the current implementation is that
the system never allocates a thread ID of C<1> for user threads.  A better
representation for pseudo-process IDs will be implemented in future.

=item *

This document may be incomplete in some respects.

=back

=head1 AUTHOR

Support for concurrent interpreters and the fork() emulation was implemented
by ActiveState, with funding from Microsoft Corporation.

This document is authored and maintained by Gurusamy Sarathy
E<lt>gsar@activestate.comE<gt>.

=head1 SEE ALSO

L<perlfunc/"fork">, L<perlipc>

=cut