=encoding utf8

=head1 NAME

perlthrtut - Tutorial on threads in Perl

=head1 DESCRIPTION

This tutorial describes the use of Perl interpreter threads (sometimes
referred to as I<ithreads>), which were first introduced in Perl 5.6.0. In this
model, each thread runs in its own Perl interpreter, and any data sharing
between threads must be explicit. The user-level interface for I<ithreads>
uses the L<threads> class.

B<NOTE>: There was another, older Perl threading flavor called the 5.005 model
that used the L<Thread> class. This old model was known to have problems, was
deprecated, and was removed in release 5.10. You are
strongly encouraged to migrate any existing 5.005 threads code to the new
model as soon as possible.

You can see which (or neither) threading flavor you have by
running C<perl -V> and looking at the C<Platform> section.
If you have C<useithreads=define>, you have ithreads; if you
have C<use5005threads=define>, you have 5.005 threads.
If you have neither, you don't have any thread support built in.
If you have both, you are in trouble.

The L<threads> and L<threads::shared> modules are included in the core Perl
distribution. Additionally, they are maintained as separate modules on
CPAN, so you can check there for any updates.

=head1 What Is A Thread Anyway?

A thread is a flow of control through a program with a single
execution point.

Sounds an awful lot like a process, doesn't it? Well, it should.
Threads are one of the pieces of a process. Every process has at least
one thread and, up until now, every process running Perl had only one
thread. With 5.8, though, you can create extra threads. We're going
to show you how, when, and why.

=head1 Threaded Program Models

There are three basic ways that you can structure a threaded
program. Which model you choose depends on what you need your program
to do. For many non-trivial threaded programs, you'll need to choose
different models for different pieces of your program.

=head2 Boss/Worker

The boss/worker model usually has one I<boss> thread and one or more
I<worker> threads. The boss thread gathers or generates tasks that need
to be done, then parcels those tasks out to the appropriate worker
thread.

This model is common in GUI and server programs, where a main thread
waits for some event and then passes that event to the appropriate
worker threads for processing. Once the event has been passed on, the
boss thread goes back to waiting for another event.

The boss thread does relatively little work. While tasks aren't
necessarily performed faster than with any other method, it tends to
have the best user-response times.
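
As a rough illustration, here is a minimal boss/worker sketch using the L<threads> and L<Thread::Queue> modules covered later in this tutorial. The main thread plays the boss. The task (squaring a number), the number of workers, and the names C<$tasks>, C<$results> and C<worker> are all made up for the example.

```perl
    use strict;
    use warnings;
    use threads;
    use Thread::Queue;

    # Hypothetical task: square each number. Real workers would do real work.
    my $tasks   = Thread::Queue->new();   # boss -> workers
    my $results = Thread::Queue->new();   # workers -> boss
    my $NUM_WORKERS = 3;

    sub worker {
        # dequeue() blocks until a task arrives; undef tells the worker to stop
        while (defined(my $n = $tasks->dequeue())) {
            $results->enqueue($n * $n);
        }
    }

    my @workers = map { threads->create(\&worker) } 1 .. $NUM_WORKERS;

    # The boss parcels out the tasks, then one stop marker per worker
    $tasks->enqueue(1 .. 10);
    $tasks->enqueue(undef) for @workers;
    $_->join() for @workers;

    # Gather the results; their order depends on scheduling, so just sum them
    my $sum = 0;
    while (defined(my $r = $results->dequeue_nb())) {
        $sum += $r;
    }
    print("Sum of squares: $sum\n");   # 385
```

Because each worker pulls its next task as soon as it is free, one slow task does not hold up the others. Sending one C<undef> per worker is just one common way to shut the workers down.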

=head2 Work Crew

In the work crew model, several threads are created that do
essentially the same thing to different pieces of data. It closely
mirrors classical parallel processing and vector processors, where a
large array of processors do the exact same thing to many pieces of
data.

This model is particularly useful if the system running the program
will distribute multiple threads across different processors. It can
also be useful in ray tracing or rendering engines, where the
individual threads can pass on interim results to give the user visual
feedback.
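
A minimal work-crew sketch follows, assuming a summing job chosen only for illustration. Every thread runs the same subroutine (the hypothetical C<sum_slice>) on its own slice of an array, and the main thread adds up the partial results returned through C<join()>.

```perl
    use strict;
    use warnings;
    use threads;

    my @data      = (1 .. 100);
    my $CREW_SIZE = 4;

    # The same code runs in every thread, on different data
    sub sum_slice {
        my $sum = 0;
        $sum += $_ for @_;
        return $sum;
    }

    # Split @data into $CREW_SIZE roughly equal, contiguous slices
    my $per_thread = int((@data + $CREW_SIZE - 1) / $CREW_SIZE);
    my @crew;
    for (my $start = 0; $start <= $#data; $start += $per_thread) {
        my $end = $start + $per_thread - 1;
        $end = $#data if $end > $#data;
        # Arguments are copied into the new thread, so nothing needs sharing.
        # The scalar assignment creates the thread in scalar context.
        my $thr = threads->create(\&sum_slice, @data[$start .. $end]);
        push(@crew, $thr);
    }

    my $total = 0;
    $total += $_->join() for @crew;
    print("Total: $total\n");   # 5050
```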

=head2 Pipeline

The pipeline model divides up a task into a series of steps, and
passes the results of one step on to the thread processing the
next. Each thread does one thing to each piece of data and passes the
results to the next thread in line.

This model makes the most sense if you have multiple processors, so that two
or more threads can execute in parallel, though it can often
make sense in other contexts as well. It tends to keep the individual
tasks small and simple, as well as allowing some parts of the pipeline
to block (on I/O or system calls, for example) while other parts keep
going. If you're running different parts of the pipeline on different
processors, you may also take advantage of the caches on each
processor.
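
Here is a minimal two-stage pipeline sketch built on L<Thread::Queue> (covered later). The two stages (doubling, then adding one), the queue names and the helper C<stage> are illustrative assumptions. C<undef> travels down the pipeline as an end-of-data marker so that each stage knows when to stop.

```perl
    use strict;
    use warnings;
    use threads;
    use Thread::Queue;

    my $q_in  = Thread::Queue->new();   # main thread -> stage 1
    my $q_mid = Thread::Queue->new();   # stage 1     -> stage 2
    my $q_out = Thread::Queue->new();   # stage 2     -> main thread

    # Generic stage: apply $op to each item, pass the result downstream,
    # and forward the undef end marker once the input runs dry.
    sub stage {
        my ($in, $out, $op) = @_;
        while (defined(my $item = $in->dequeue())) {
            $out->enqueue($op->($item));
        }
        $out->enqueue(undef);
    }

    # Each entry closure builds its stage's operation inside the new thread,
    # so no code reference has to be passed as a thread argument.
    my $doubler = threads->create(sub { stage($q_in,  $q_mid, sub { $_[0] * 2 }) });
    my $adder   = threads->create(sub { stage($q_mid, $q_out, sub { $_[0] + 1 }) });

    $q_in->enqueue(1 .. 5, undef);

    my @results;
    while (defined(my $r = $q_out->dequeue())) {
        push(@results, $r);
    }
    $_->join() for ($doubler, $adder);
    print("@results\n");   # 3 5 7 9 11
```

Since each stage is a single thread reading a first-in, first-out queue, the items come out in the order they went in.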

This model is also handy for a form of recursive programming where,
rather than having a subroutine call itself, it instead creates
another thread. Prime and Fibonacci generators both map well to this
form of the pipeline model. (A version of a prime number generator is
presented later on.)

=head1 What kind of threads are Perl threads?

If you have experience with other thread implementations, you might
find that things aren't quite what you expect. It's very important to
remember when dealing with Perl threads that I<Perl Threads Are Not X
Threads> for all values of X. They aren't POSIX threads, or
DecThreads, or Java's Green threads, or Win32 threads. There are
similarities, and the broad concepts are the same, but if you start
looking for implementation details you're going to be either
disappointed or confused. Possibly both.

This is not to say that Perl threads are completely different from
everything that's ever come before. They're not. Perl's threading
model owes a lot to other thread models, especially POSIX. Just as
Perl is not C, though, Perl threads are not POSIX threads. So if you
find yourself looking for mutexes, or thread priorities, it's time to
step back a bit and think about what you want to do and how Perl can
do it.

However, it is important to remember that Perl threads cannot magically
do things unless your operating system's threads allow it. So if your
system blocks the entire process on C<sleep()>, Perl usually will, as well.

B<Perl Threads Are Different.>

=head1 Thread-Safe Modules

The addition of threads has changed Perl's internals
substantially. There are implications for people who write
modules with XS code or external libraries. However, since Perl data is
not shared among threads by default, Perl modules stand a high chance of
being thread-safe, or of being easily made thread-safe. Modules that are not
tagged as thread-safe should be tested or code-reviewed before being used
in production code.

Not all modules that you might use are thread-safe, and you should
always assume a module is unsafe unless the documentation says
otherwise. This includes modules that are distributed as part of the
core. Threads are a relatively new feature, and even some of the standard
modules aren't thread-safe.

Even if a module is thread-safe, it doesn't mean that the module is optimized
to work well with threads. A module could possibly be rewritten to utilize
the new features in threaded Perl to increase performance in a threaded
environment.

If you're using a module that's not thread-safe for some reason, you
can protect yourself by using it from one, and only one, thread.
If you need multiple threads to access such a module, you can use semaphores and
lots of programming discipline to control access to it. Semaphores
are covered in L</"Basic semaphores">.

See also L</"Thread-Safety of System Libraries">.

=head1 Thread Basics

The L<threads> module provides the basic functions you need to write
threaded programs. In the following sections, we'll cover the basics,
showing you what you need to do to create a threaded program. After
that, we'll go over some of the features of the L<threads> module that
make threaded programming easier.

=head2 Basic Thread Support

Thread support is a Perl compile-time option. It's something that's
turned on or off when Perl is built at your site, rather than when
your programs are compiled. If your Perl wasn't compiled with thread
support enabled, then any attempt to use threads will fail.

Your programs can use the Config module to check whether threads are
enabled. If your program can't run without them, you can say something
like:

    use Config;
    $Config{useithreads} or die('Recompile Perl with threads to run this program.');

A possibly-threaded program using a possibly-threaded module might
have code like this:

    use Config;
    use MyMod;

    BEGIN {
        if ($Config{useithreads}) {
            # We have threads
            require MyMod_threaded;
            MyMod_threaded->import();
        } else {
            require MyMod_unthreaded;
            MyMod_unthreaded->import();
        }
    }

Since code that runs both with and without threads is usually pretty
messy, it's best to isolate the thread-specific code in its own
module. In our example above, that's what C<MyMod_threaded> is, and it's
only imported if we're running on a threaded Perl.

=head2 A Note about the Examples

In a real situation, care should be taken that all threads are finished
executing before the program exits. That care has B<not> been taken in these
examples in the interest of simplicity. Running these examples I<as is> will
produce error messages, usually caused by the fact that there are still
threads running when the program exits. You should not be alarmed by this.

=head2 Creating Threads

The L<threads> module provides the tools you need to create new
threads. Like any other module, you need to tell Perl that you want to use
it; C<use threads;> imports all the pieces you need to create basic
threads.

The simplest, most straightforward way to create a thread is with C<create()>:

    use threads;

    my $thr = threads->create(\&sub1);

    sub sub1 {
        print("In the thread\n");
    }

The C<create()> method takes a reference to a subroutine and creates a new
thread that starts executing in the referenced subroutine. Control
then passes both to the subroutine and the caller.

If you need to, your program can pass parameters to the subroutine as
part of the thread startup. Just include the list of parameters as
part of the C<threads-E<gt>create()> call, like this:

    use threads;

    my $Param3 = 'foo';
    my $thr1 = threads->create(\&sub1, 'Param 1', 'Param 2', $Param3);
    my @ParamList = (42, 'Hello', 3.14);
    my $thr2 = threads->create(\&sub1, @ParamList);
    my $thr3 = threads->create(\&sub1, qw(Param1 Param2 Param3));

    sub sub1 {
        my @InboundParameters = @_;
        print("In the thread\n");
        print('Got parameters >', join('<>', @InboundParameters), "<\n");
    }

The last example illustrates another feature of threads. You can spawn
off several threads using the same subroutine. Each thread executes
the same subroutine, but in a separate thread with a separate
environment and potentially separate arguments.

C<new()> is a synonym for C<create()>.

=head2 Waiting For A Thread To Exit

Since threads are also subroutines, they can return values. To wait
for a thread to exit and extract any values it might return, you can
use the C<join()> method:

    use threads;

    my ($thr) = threads->create(\&sub1);

    my @ReturnData = $thr->join();
    print('Thread returned ', join(', ', @ReturnData), "\n");

    sub sub1 { return ('Fifty-six', 'foo', 2); }

In the example above, the C<join()> method returns as soon as the thread
ends. In addition to waiting for a thread to finish and gathering up
any values that the thread might have returned, C<join()> also performs
any OS cleanup necessary for the thread. That cleanup might be
important, especially for long-running programs that spawn lots of
threads. If you don't want the return values and don't want to wait
for the thread to finish, you should call the C<detach()> method
instead, as described next.

NOTE: In the example above, the thread returns a list, thus necessitating
that the thread creation call be made in list context (i.e., C<my ($thr)>).
See L<< threads/"$thr->join()" >> and L<threads/"THREAD CONTEXT"> for more
details on thread context and return values.
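
Instead of relying on the context of the assignment, the L<threads> module also accepts a hash reference of options as the first argument to C<create()>, in which the context can be named explicitly. A short sketch follows; check the L<threads> documentation of your installed version for the exact options it supports.

```perl
    use strict;
    use warnings;
    use threads;

    sub sub1 { return ('Fifty-six', 'foo', 2); }

    # Explicitly request list context, even though the assignment is scalar
    my $thr = threads->create({'context' => 'list'}, \&sub1);
    my @ReturnData = $thr->join();
    print('Thread returned ', join(', ', @ReturnData), "\n");

    # Without the option, a scalar assignment gives the thread scalar
    # context, so the list in sub1's return statement yields its last element
    my $thr_scalar = threads->create(\&sub1);
    my $last = $thr_scalar->join();
    print("Scalar-context thread returned $last\n");   # 2
```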

=head2 Ignoring A Thread

C<join()> does three things: it waits for a thread to exit, cleans up
after it, and returns any data the thread may have produced. But what
if you're not interested in the thread's return values, and you don't
really care when the thread finishes? All you want is for the thread
to get cleaned up after when it's done.

In this case, you use the C<detach()> method. Once a thread is detached,
it'll run until it's finished; then Perl will clean up after it
automatically.

    use threads;

    my $thr = threads->create(\&sub1);   # Spawn the thread

    $thr->detach();   # Now we officially don't care any more

    sleep(15);        # Let thread run for a while

    sub sub1 {
        my $count = 0;
        while (1) {
            $count++;
            print("\$count is $count\n");
            sleep(1);
        }
    }

Once a thread is detached, it may not be joined, and any return data
that it might have produced (if it was done and waiting for a join) is
lost.

C<detach()> can also be called as a class method to allow a thread to
detach itself:

    use threads;

    my $thr = threads->create(\&sub1);

    sub sub1 {
        threads->detach();
        # Do more work
    }

=head2 Process and Thread Termination

With threads, one must be careful to make sure they all have a chance to
run to completion, assuming that is what you want.

An action that terminates a process will terminate I<all> running
threads. Calling C<exit()> from any thread has this property, as does an
uncaught C<die()> in the main thread (a C<die()> inside any other thread
ends only that thread). Perl also does an exit when the main thread exits,
perhaps implicitly by falling off the end of your code,
even if that's not what you want.

As an example of this case, this code prints the message
"Perl exited with active threads: 2 running and unjoined":

    use threads;
    my $thr1 = threads->new(\&thrsub, "test1");
    my $thr2 = threads->new(\&thrsub, "test2");
    sub thrsub {
        my ($message) = @_;
        sleep 1;
        print "thread $message\n";
    }

But when the following lines are added at the end:

    $thr1->join();
    $thr2->join();

it prints two lines of output, a perhaps more useful outcome.

=head1 Threads And Data

Now that we've covered the basics of threads, it's time for our next
topic: Data. Threading introduces a couple of complications to data
access that non-threaded programs never need to worry about.

=head2 Shared And Unshared Data

The biggest difference between Perl I<ithreads> and the old 5.005 style
threading, or, for that matter, most other threading systems out there,
is that by default, no data is shared. When a new Perl thread is created,
all the data associated with the current thread is copied to the new
thread, and is subsequently private to that new thread!
This is similar in feel to what happens when a Unix process forks,
except that in this case, the data is just copied to a different part of
memory within the same process rather than a real fork taking place.

To make use of threading, however, one usually wants the threads to share
at least some data between themselves. This is done with the
L<threads::shared> module and the C<:shared> attribute:

    use threads;
    use threads::shared;

    my $foo :shared = 1;
    my $bar = 1;
    threads->create(sub { $foo++; $bar++; })->join();

    print("$foo\n");  # Prints 2 since $foo is shared
    print("$bar\n");  # Prints 1 since $bar is not shared

In the case of a shared array, all the array's elements are shared, and for
a shared hash, all the keys and values are shared. This places
restrictions on what may be assigned to shared array and hash elements: only
simple values or references to shared variables are allowed - this is
so that a private variable can't accidentally become shared. A bad
assignment will cause the thread to die. For example:

    use threads;
    use threads::shared;

    my $var = 1;
    my $svar :shared = 2;
    my %hash :shared;

    # ... create some threads ...

    $hash{a} = 1;       # All threads see exists($hash{a}) and $hash{a} == 1
    $hash{a} = $var;    # okay - copy-by-value: same effect as previous
    $hash{a} = $svar;   # okay - copy-by-value: same effect as previous
    $hash{a} = \$svar;  # okay - a reference to a shared variable
    $hash{a} = \$var;   # This will die
    delete($hash{a});   # okay - all threads will see !exists($hash{a})
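
When a nested structure has to go into a shared container, L<threads::shared> provides C<shared_clone()>, which makes a shared deep copy, and C<share()>, which marks an existing variable as shared. A small sketch, assuming a version of L<threads::shared> recent enough to include C<shared_clone()>; the data is made up for the example.

```perl
    use strict;
    use warnings;
    use threads;
    use threads::shared;

    my %hash :shared;

    # $hash{config} = $config would die: $config refers to private data.
    # shared_clone() copies the whole structure into shared storage first.
    my $config = { name => 'worker', limits => [1, 2, 3] };
    $hash{config} = shared_clone($config);

    # share() marks an existing variable as shared. Sharing an array
    # discards its current contents, so share it while it is still empty.
    my @list;
    share(@list);
    push(@list, 'a', 'b');
    $hash{list} = \@list;    # okay - a reference to a shared variable

    threads->create(sub {
        # The new thread modifies the same shared nested array
        push(@{ $hash{config}{limits} }, 4);
    })->join();

    print("@{ $hash{config}{limits} }\n");   # 1 2 3 4
    print("@{ $hash{list} }\n");             # a b
```

The private C<$config> is untouched by the thread; only the shared copy grows.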

Note that a shared variable guarantees that if two or more threads try to
modify it at the same time, the internal state of the variable will not
become corrupted. However, there are no guarantees beyond this, as
explained in the next section.

=head2 Thread Pitfalls: Races

While threads bring a new set of useful tools, they also bring a
number of pitfalls. One pitfall is the race condition:

    use threads;
    use threads::shared;

    my $a :shared = 1;
    my $thr1 = threads->create(\&sub1);
    my $thr2 = threads->create(\&sub2);

    $thr1->join();
    $thr2->join();
    print("$a\n");

    sub sub1 { my $foo = $a; $a = $foo + 1; }
    sub sub2 { my $bar = $a; $a = $bar + 1; }

What do you think C<$a> will be? The answer, unfortunately, is I<it
depends>. Both C<sub1()> and C<sub2()> access the global variable C<$a>, once
to read and once to write. Depending on factors ranging from your
thread implementation's scheduling algorithm to the phase of the moon,
C<$a> can be 2 or 3.

Race conditions are caused by unsynchronized access to shared
data. Without explicit synchronization, there's no way to be sure that
nothing has happened to the shared data between the time you access it
and the time you update it. Even this simple code fragment has the
possibility of error:

    use threads;
    use threads::shared;

    my $a :shared = 2;
    my $b :shared;
    my $c :shared;
    my $thr1 = threads->create(sub { $b = $a; $a = $b + 1; });
    my $thr2 = threads->create(sub { $c = $a; $a = $c + 1; });
    $thr1->join();
    $thr2->join();

Two threads both access C<$a>. Each thread can potentially be interrupted
at any point, or be executed in any order. At the end, C<$a> could be 3
or 4, and both C<$b> and C<$c> could be 2 or 3.

Even operations as simple as C<$a += 5> or C<$a++> are not guaranteed to be atomic.

Whenever your program accesses data or resources that can be accessed
by other threads, you must take steps to coordinate access or risk
data inconsistency and race conditions. Note that Perl will protect its
internals from your race conditions, but it won't protect you from you.
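
One way to repair such a race is to make the whole read-modify-write sequence exclusive with C<lock()>, described under L</"Controlling access: lock()">. A minimal sketch; the counter, the number of threads and the iteration count are arbitrary choices for illustration.

```perl
    use strict;
    use warnings;
    use threads;
    use threads::shared;

    my $counter :shared = 0;
    my $ITERATIONS = 10_000;

    sub bump {
        for (1 .. $ITERATIONS) {
            # Each loop iteration is its own block, so the lock is held
            # only for this one increment and released at the end of it.
            lock($counter);
            $counter++;
        }
    }

    my @thr = map { threads->create(\&bump) } 1 .. 4;
    $_->join() for @thr;
    print("counter = $counter\n");   # 40000
```

Without the C<lock()> line, the final value can come out lower, because two threads may read the same old value and both write back that value plus one.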

=head1 Synchronization and control

Perl provides a number of mechanisms to coordinate the interactions
between threads and their data, to avoid race conditions and the like.
Some of these are designed to resemble the common techniques used in thread
libraries such as C<pthreads>; others are Perl-specific. Often, the
standard techniques are clumsy and difficult to get right (such as
condition waits). Where possible, it is usually easier to use Perlish
techniques such as queues, which remove some of the hard work involved.

=head2 Controlling access: lock()

The C<lock()> function takes a shared variable and puts a lock on it.
No other thread may lock the variable until the variable is unlocked
by the thread holding the lock. Unlocking happens automatically
when the locking thread exits the block that contains the call to the
C<lock()> function. Using C<lock()> is straightforward. This example has
several threads doing some calculations in parallel, and occasionally
updating a running total:

    use threads;
    use threads::shared;

    my $total :shared = 0;

    sub calc {
        while (1) {
            my $result;
            # (... do some calculations and set $result ...)
            {
                lock($total);  # Block until we obtain the lock
                $total += $result;
            }                  # Lock implicitly released at end of scope
            last if $result == 0;
        }
    }

    my $thr1 = threads->create(\&calc);
    my $thr2 = threads->create(\&calc);
    my $thr3 = threads->create(\&calc);
    $thr1->join();
    $thr2->join();
    $thr3->join();
    print("total=$total\n");

C<lock()> blocks the thread until the variable being locked is
available. When C<lock()> returns, your thread can be sure that no other
thread can lock that variable until the block containing the
lock exits.

It's important to note that locks don't prevent access to the variable
in question, only lock attempts. This is in keeping with Perl's
longstanding tradition of courteous programming, and the advisory file
locking that C<flock()> gives you.

You may lock arrays and hashes as well as scalars. Locking an array,
though, will not block subsequent locks on array elements, just lock
attempts on the array itself.

Locks are recursive, which means it's okay for a thread to
lock a variable more than once. The lock will last until the outermost
C<lock()> on the variable goes out of scope. For example:

    my $x :shared;
    doit();

    sub doit {
        {
            {
                lock($x); # Wait for lock
                lock($x); # NOOP - we already have the lock
                {
                    lock($x); # NOOP
                    {
                        lock($x); # NOOP
                        lockit_some_more();
                    }
                }
            } # *** Implicit unlock here ***
        }
    }

    sub lockit_some_more {
        lock($x); # NOOP
    } # Nothing happens here

Note that there is no C<unlock()> function - the only way to unlock a
variable is to allow the lock to go out of scope.

A lock can either be used to guard the data contained within the variable
being locked, or it can be used to guard something else, like a section
of code. In this latter case, the variable in question does not hold any
useful data, and exists only for the purpose of being locked. In this
respect, the variable behaves like the mutexes and basic semaphores of
traditional thread libraries.

=head2 A Thread Pitfall: Deadlocks

Locks are a handy tool to synchronize access to data, and using them
properly is the key to safe shared data. Unfortunately, locks aren't
without their dangers, especially when multiple locks are involved.
Consider the following code:

    use threads;
    use threads::shared;

    my $a :shared = 4;
    my $b :shared = 'foo';
    my $thr1 = threads->create(sub {
        lock($a);
        sleep(20);
        lock($b);
    });
    my $thr2 = threads->create(sub {
        lock($b);
        sleep(20);
        lock($a);
    });
    $thr1->join();
    $thr2->join();

This program will probably hang until you kill it. The only way it
won't hang is if one of the two threads acquires both locks
first. A guaranteed-to-hang version is more complicated, but the
principle is the same.

The first thread will grab a lock on C<$a>, then, after a pause during which
the second thread has probably had time to do some work, try to grab a
lock on C<$b>. Meanwhile, the second thread grabs a lock on C<$b>, then later
tries to grab a lock on C<$a>. The second lock attempt for both threads will
block, each waiting for the other to release its lock.

This condition is called a deadlock, and it occurs whenever two or
more threads are trying to get locks on resources that the others
own. Each thread will block, waiting for the other to release a lock
on a resource. That never happens, though, since the thread with the
resource is itself waiting for a lock to be released.

There are a number of ways to handle this sort of problem. The best
way is to always have all threads acquire locks in the exact same
order. If, for example, you lock variables C<$a>, C<$b>, and C<$c>, always lock
C<$a> before C<$b>, and C<$b> before C<$c>. It's also best to hold on to locks for
as short a period of time as possible to minimize the risks of deadlock.
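
Applied to the deadlock example above, the ordering rule means that both threads lock the same variable first. A sketch, in which the variable names and the work done while holding the locks are made up for illustration:

```perl
    use strict;
    use warnings;
    use threads;
    use threads::shared;

    my $x :shared = 4;
    my $y :shared = 'foo';

    # Every thread takes the locks in the same order: $x, then $y.
    # A thread that holds $x can therefore never be waiting on a thread
    # that holds $y but still wants $x.
    sub update_both {
        my ($tag) = @_;
        lock($x);
        sleep(1);              # the pause that caused trouble before
        lock($y);
        $x++;
        $y .= "-$tag";
    }

    my $thr1 = threads->create(\&update_both, 'one');
    my $thr2 = threads->create(\&update_both, 'two');
    $_->join() for ($thr1, $thr2);
    print("$x $y\n");   # 6, then foo-one-two or foo-two-one
```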

The other synchronization primitives described below can suffer from
similar problems.

=head2 Queues: Passing Data Around

A queue is a special thread-safe object that lets you put data in one
end and take it out the other without having to worry about
synchronization issues. They're pretty straightforward, and look like
this:

    use threads;
    use Thread::Queue;

    my $DataQueue = Thread::Queue->new();
    my $thr = threads->create(sub {
        # Test definedness so that values like 0 or "" don't end the loop
        while (defined(my $DataElement = $DataQueue->dequeue())) {
            print("Popped $DataElement off the queue\n");
        }
    });

    $DataQueue->enqueue(12);
    $DataQueue->enqueue("A", "B", "C");
    sleep(10);
    $DataQueue->enqueue(undef);
    $thr->join();

You create the queue with C<Thread::Queue-E<gt>new()>. Then you can
add lists of scalars onto the end with C<enqueue()>, and pop scalars off
the front of it with C<dequeue()>. A queue has no fixed size, and can grow
as needed to hold everything pushed on to it.

If a queue is empty, C<dequeue()> blocks until another thread enqueues
something. This makes queues ideal for event loops and other
communications between threads.

=head2 Semaphores: Synchronizing Data Access

Semaphores are a kind of generic locking mechanism. In their most basic
form, they behave very much like lockable scalars, except that they
can't hold data, and that they must be explicitly unlocked. In their
advanced form, they act like a kind of counter, and can allow multiple
threads to have the I<lock> at any one time.

=head2 Basic semaphores

Semaphores have two methods, C<down()> and C<up()>: C<down()> decrements the resource
count, while C<up()> increments it. Calls to C<down()> will block if the
semaphore's current count would decrement below zero. This program
gives a quick demonstration:

    use threads;
    use threads::shared;
    use Thread::Semaphore;

    my $semaphore = Thread::Semaphore->new();
    my $GlobalVariable :shared = 0;

    my $thr1 = threads->create(\&sample_sub, 1);
    my $thr2 = threads->create(\&sample_sub, 2);
    my $thr3 = threads->create(\&sample_sub, 3);

    sub sample_sub {
        my $SubNumber = shift(@_);
        my $TryCount = 10;
        my $LocalCopy;
        sleep(1);
        while ($TryCount--) {
            $semaphore->down();
            $LocalCopy = $GlobalVariable;
            print("$TryCount tries left for sub $SubNumber (\$GlobalVariable is $GlobalVariable)\n");
            sleep(2);
            $LocalCopy++;
            $GlobalVariable = $LocalCopy;
            $semaphore->up();
        }
    }

    $thr1->join();
    $thr2->join();
    $thr3->join();

The three invocations of the subroutine all run concurrently. The
semaphore, though, makes sure that only one thread is accessing the
global variable at once.

=head2 Advanced Semaphores

By default, semaphores behave like locks, letting only one thread
C<down()> them at a time. However, there are other uses for semaphores.

Each semaphore has a counter attached to it. By default, semaphores are
created with the counter set to one, C<down()> decrements the counter by
one, and C<up()> increments by one. However, we can override any or all
of these defaults simply by passing in different values:

    use threads;
    use Thread::Semaphore;

    my $semaphore = Thread::Semaphore->new(5);
                    # Creates a semaphore with the counter set to five

    my $thr1 = threads->create(\&sub1);
    my $thr2 = threads->create(\&sub1);

    sub sub1 {
        $semaphore->down(5); # Decrements the counter by five
        # Do stuff here
        $semaphore->up(5);   # Increments the counter by five
    }

    $thr1->detach();
    $thr2->detach();

If C<down()> attempts to decrement the counter below zero, it blocks until
the counter is large enough. Note that while a semaphore can be created
with a starting count of zero, any C<up()> or C<down()> always changes the
counter by at least one, and so C<< $semaphore->down(0) >> is the same as
C<< $semaphore->down(1) >>.

The question, of course, is why would you do something like this? Why
create a semaphore with a starting count that's not one, or why
decrement or increment it by more than one? The answer is resource
availability. Many resources that you want to manage access for can be
safely used by more than one thread at once.

For example, let's take a GUI-driven program. It has a semaphore that
it uses to synchronize access to the display, so only one thread is
ever drawing at once. Handy, but of course you don't want any thread
to start drawing until things are properly set up. In this case, you
can create a semaphore with a counter set to zero, and up it when
things are ready for drawing.
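
A minimal sketch of that setup, assuming three hypothetical drawing threads that merely record what they did in a shared array. The semaphore starts at zero, so every C<down()> blocks until the main thread calls C<up()> once setup is finished; each drawer then passes the single token on when it is done, which also keeps the drawing one thread at a time.

```perl
    use strict;
    use warnings;
    use threads;
    use threads::shared;
    use Thread::Semaphore;

    my $display = Thread::Semaphore->new(0);   # nobody may draw yet
    my @log :shared;

    sub drawer {
        my ($id) = @_;
        $display->down();           # wait until the display is available
        {
            lock(@log);
            push(@log, "drawing $id");
        }
        $display->up();             # hand the display to the next thread
    }

    my @thr = map { threads->create(\&drawer, $_) } 1 .. 3;

    {
        lock(@log);
        push(@log, 'setup done');   # stand-in for real setup work
    }
    $display->up();                 # count 0 -> 1: drawing may begin

    $_->join() for @thr;
    print("$_\n") for @log;
```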

Semaphores with counters greater than one are also useful for
establishing quotas. Say, for example, that you have a number of
threads that can do I/O at once. You don't want all the threads
reading or writing at once though, since that can potentially swamp
your I/O channels, or deplete your process's quota of filehandles. You
can use a semaphore initialized to the number of concurrent I/O
requests (or open files) that you want at any one time, and have your
threads quietly block and unblock themselves.
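
A sketch of such a quota, assuming a limit of two concurrent "I/O" operations and using C<sleep()> as a stand-in for real I/O. The shared C<$peak> counter exists only to show that the limit holds.

```perl
    use strict;
    use warnings;
    use threads;
    use threads::shared;
    use Thread::Semaphore;

    my $MAX_CONCURRENT = 2;
    my $io_slots = Thread::Semaphore->new($MAX_CONCURRENT);

    my $active :shared = 0;   # threads currently inside the I/O section
    my $peak   :shared = 0;   # highest value $active has reached

    sub do_io {
        $io_slots->down();    # take a slot; blocks while none are free
        {
            lock($active);
            $active++;
            $peak = $active if $active > $peak;
        }
        sleep(1);             # stand-in for real I/O
        {
            lock($active);
            $active--;
        }
        $io_slots->up();      # return the slot
    }

    my @thr = map { threads->create(\&do_io) } 1 .. 5;
    $_->join() for @thr;
    print("peak concurrency: $peak\n");   # never more than 2
```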

Larger increments or decrements are handy in those cases where a
thread needs to check out or return a number of resources at once.

=head2 Waiting for a Condition

The functions C<cond_wait()> and C<cond_signal()>
can be used in conjunction with locks to notify
co-operating threads that a resource has become available. They are
very similar in use to the functions found in C<pthreads>. However
for most purposes, queues are simpler to use and more intuitive. See
L<threads::shared> for more details.
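
A minimal sketch of the pattern, using C<cond_wait()> and C<cond_signal()> from L<threads::shared>; the flag and payload variables are illustrative. The waiting thread re-checks its condition in a loop, which covers both spurious wakeups and a signal that arrives before it starts waiting.

```perl
    use strict;
    use warnings;
    use threads;
    use threads::shared;

    my $ready   :shared = 0;
    my $payload :shared;

    my $consumer = threads->create(sub {
        lock($ready);
        # cond_wait() releases the lock while waiting and reacquires it
        # before returning, so the condition is always checked under the lock
        cond_wait($ready) until $ready;
        return "got $payload";
    });

    {
        lock($ready);
        $payload = 'hello';
        $ready   = 1;
        cond_signal($ready);   # wake one thread waiting on $ready
    }

    my $msg = $consumer->join();
    print("$msg\n");   # got hello
```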
760
761=head2 Giving up control
762
763There are times when you may find it useful to have a thread
764explicitly give up the CPU to another thread. You may be doing something
765processor-intensive and want to make sure that the user-interface thread
766gets called frequently. Regardless, there are times that you might want
767a thread to give up the processor.
768
769Perl's threading package provides the C<yield()> function that does
770this. C<yield()> is pretty straightforward, and works like this:
771
772 use threads;
773
774 sub loop {
775 my $thread = shift;
776 my $foo = 50;
777 while($foo--) { print("In thread $thread\n"); }
778 threads->yield();
779 $foo = 50;
780 while($foo--) { print("In thread $thread\n"); }
781 }
782
783 my $thr1 = threads->create(\&loop, 'first');
784 my $thr2 = threads->create(\&loop, 'second');
785 my $thr3 = threads->create(\&loop, 'third');
786
787It is important to remember that C<yield()> is only a hint to give up the CPU,
788it depends on your hardware, OS and threading libraries what actually happens.
789B<On many operating systems, yield() is a no-op.> Therefore it is important
790to note that one should not build the scheduling of the threads around
791C<yield()> calls. It might work on your platform but it won't work on another
792platform.
793
794=head1 General Thread Utility Routines
795
796We've covered the workhorse parts of Perl's threading package, and
797with these tools you should be well on your way to writing threaded
798code and packages. There are a few useful little pieces that didn't
799really fit in anyplace else.
800
801=head2 What Thread Am I In?
802
803The C<threads-E<gt>self()> class method provides your program with a way to
804get an object representing the thread it's currently in. You can use this
805object in the same way as the ones returned from thread creation.
806
=head2 Thread IDs

C<tid()> is a thread object method that returns the thread ID of the
thread the object represents. Thread IDs are integers, with the main
thread in a program being 0. Currently Perl assigns a unique TID to
every thread ever created in your program, assigning the first thread
to be created a TID of 1, and increasing the TID by 1 for each new
thread that's created. When used as a class method, C<threads-E<gt>tid()>
can be used by a thread to get its own TID.

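The numbering rules can be checked with a short sketch, assuming an
ithreads-enabled Perl and a program that has not created any other threads
first: the main thread reports TID 0 through the class-method form, and
three new threads receive TIDs 1, 2, and 3, which each thread also reports
for itself.

    use strict;
    use warnings;
    use threads;

    # Class-method form: the calling thread's own TID (0 in the main thread)
    my $main_tid = threads->tid();

    # Create three threads; each one returns the TID it sees for itself
    my @thr     = map { threads->create(sub { return threads->tid(); }) } 1 .. 3;
    my @outside = map { $_->tid() } @thr;     # object-method form
    my @inside  = map { $_->join() } @thr;    # what each thread reported

    print "main: $main_tid\n";
    print "created: @outside\n";        # 1 2 3 in a fresh program
    print "self-reported: @inside\n";
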
=head2 Are These Threads The Same?

The C<equal()> method takes two thread objects (the one it is called on
and one passed as an argument) and returns true if the objects represent
the same thread, and false if they don't.

Thread objects also overload the C<==> operator, so you can write
C<$thr1 == $thr2> instead of C<$thr1-E<gt>equal($thr2)>.

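A short sketch, assuming an ithreads-enabled Perl: C<threads-E<gt>list()>
builds fresh objects, so the object it returns for a thread is a different
Perl reference from the one C<create()> returned, yet both C<equal()> and
C<==> report that the two represent the same thread. The thread bodies are
trivial placeholders.

    use strict;
    use warnings;
    use threads;

    my $thr_a = threads->create(sub { return 1; });
    my $thr_b = threads->create(sub { return 1; });

    # list() returns new objects; pick the one for $thr_a's thread
    my ($again) = grep { $_->tid() == $thr_a->tid() } threads->list();

    my $by_method = $thr_a->equal($again) ? 1 : 0;    # same thread
    my $by_op     = ($thr_a == $again)    ? 1 : 0;    # overloaded ==
    my $a_vs_b    = ($thr_a == $thr_b)    ? 1 : 0;    # different threads

    $_->join() for $thr_a, $thr_b;
    print "equal(): $by_method, ==: $by_op, a == b: $a_vs_b\n";
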
=head2 What Threads Are Running?

C<threads-E<gt>list()> returns a list of thread objects, one for each thread
that has been neither joined nor detached (including threads that have
finished running but have not yet been joined). This is handy for a number
of things, including cleaning up at the end of your program (from the main
Perl thread, of course):

    # Loop through all the threads
    foreach my $thr (threads->list()) {
        $thr->join();
    }

If some threads have not finished running when the main Perl thread
ends, Perl will warn you about it and die, since it is impossible for Perl
to clean up after itself while other threads are running.

NOTE: The main Perl thread (thread 0) is in a I<detached> state, and so
does not appear in the list returned by C<threads-E<gt>list()>.

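The cleanup idiom can be sketched end to end as follows, assuming an
ithreads-enabled Perl. The placeholder workers are created in list context
(inside C<map>) so that C<join()> hands back their return values. Calling
C<threads-E<gt>list()> in scalar context gives a count, which shows that the
main thread is not included and that joined threads drop out of the list.

    use strict;
    use warnings;
    use threads;

    # Three placeholder workers that each double their argument
    my @workers = map { threads->create(sub { return $_[0] * 2; }, $_) } 1 .. 3;

    # In scalar context list() gives a count; thread 0 is not included
    my $before = scalar(threads->list());

    # Join every thread that is still waiting to be joined
    my @results = sort { $a <=> $b } map { $_->join() } threads->list();

    my $after = scalar(threads->list());
    print "before: $before, after: $after, results: @results\n";
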
=head1 A Complete Example

Confused yet? It's time for an example program to show some of the
things we've covered. This program finds prime numbers using threads.

    1 #!/usr/bin/perl
    2 # prime-pthread, courtesy of Tom Christiansen
    3
    4 use strict;
    5 use warnings;
    6
    7 use threads;
    8 use Thread::Queue;
    9
   10 sub check_num {
   11     my ($upstream, $cur_prime) = @_;
   12     my $kid;
   13     my $downstream = Thread::Queue->new();
   14     while (my $num = $upstream->dequeue()) {
   15         next unless ($num % $cur_prime);
   16         if ($kid) {
   17             $downstream->enqueue($num);
   18         } else {
   19             print("Found prime: $num\n");
   20             $kid = threads->create(\&check_num, $downstream, $num);
   21             if (! $kid) {
   22                 warn("Sorry. Ran out of threads.\n");
   23                 last;
   24             }
   25         }
   26     }
   27     if ($kid) {
   28         $downstream->enqueue(undef);
   29         $kid->join();
   30     }
   31 }
   32
   33 my $stream = Thread::Queue->new(3..1000, undef);
   34 check_num($stream, 2);

This program uses the pipeline model to generate prime numbers. Each
thread in the pipeline has an input queue that feeds numbers to be
checked, a prime number that it's responsible for, and an output queue
into which it funnels numbers that are not evenly divisible by that prime.
If the thread has such a number and there's no child thread yet, then
the number must be a new prime. In that case, a new child thread is
created for that prime and stuck on the end of the pipeline.

This probably sounds a bit more confusing than it really is, so let's
go through this program piece by piece and see what it does. (For
those of you who might be trying to remember exactly what a prime
number is, it's a number that's only evenly divisible by itself and 1.)

The bulk of the work is done by the C<check_num()> subroutine, which
takes a reference to its input queue and a prime number that it's
responsible for. After pulling in the input queue and the prime that
the subroutine is checking (line 11), we create a new queue (line 13)
and reserve a scalar for the thread that we're likely to create later
(line 12).

The while loop from line 14 to line 26 grabs a scalar off the input
queue and checks it against the prime this thread is responsible
for. Line 15 checks to see if there's a remainder when we divide the
number to be checked by our prime; if there isn't, the number is
discarded. If there is one, the number is not evenly divisible by our
prime, so we need to either pass it on to the next thread if we've
created one (line 17) or create a new thread if we haven't.

The new thread creation is line 20. We pass on to it a reference to
the queue we've created, and the prime number we've found. In lines 21
through 24, we check to make sure that our new thread got created, and
if not, we stop checking any remaining numbers in the queue.

Finally, once the loop terminates (because we got a 0 or C<undef> in the
queue, which serves as a note to terminate), if we've created a child, we
pass the notice on to it and wait for it to exit (lines 27 through 30).

Meanwhile, back in the main thread, we first create a queue (line 33) and
queue up all the numbers from 3 to 1000 for checking, plus a termination
notice. Then all we have to do to get the ball rolling is pass the queue
and the first prime to the C<check_num()> subroutine (line 34).

That's how it works. It's pretty simple; as with many Perl programs,
the explanation is much longer than the program.

=head1 Different implementations of threads

Here is some background on thread implementations from the operating
system's viewpoint. There are three basic categories of threads: user-mode
threads, kernel threads, and multiprocessor kernel threads.

User-mode threads are threads that live entirely within a program and
its libraries. In this model, the OS knows nothing about threads. As
far as it's concerned, your process is just a process.

This is the easiest way to implement threads, and the way most OSes
start. The big disadvantage is that, since the OS knows nothing about
threads, if one thread blocks they all do. Typical blocking activities
include most system calls, most I/O, and things like C<sleep()>.

Kernel threads are the next step in thread evolution. The OS knows
about kernel threads, and makes allowances for them. The main
difference between a kernel thread and a user-mode thread is
blocking. With kernel threads, things that block a single thread don't
block other threads. This is not the case with user-mode threads,
where the kernel blocks at the process level and not the thread level.

This is a big step forward, and can give a threaded program quite a
performance boost over non-threaded programs. Threads that block
performing I/O, for example, won't block threads that are doing other
things. Each process still has only one thread running at once,
though, regardless of how many CPUs a system might have.

Since kernel threading can interrupt a thread at any time, it will
uncover some of the implicit locking assumptions you may make in your
program. For example, something as simple as C<$a = $a + 2> can behave
unpredictably with kernel threads if C<$a> is visible to other
threads, as another thread may have changed C<$a> between the time it
was fetched on the right-hand side and the time the new value is
stored.

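As a minimal sketch of guarding such a read-modify-write, assuming an
ithreads-enabled Perl with L<threads::shared>: the counter is shared, and
each update takes C<lock()> first, so no other thread can change the value
between the read and the store. The counter is named C<$total> here rather
than C<$a>, and the thread and iteration counts are arbitrary. Without the
C<lock()> line, lost updates become possible, though any single run may
not show them.

    use strict;
    use warnings;
    use threads;
    use threads::shared;

    my $total :shared = 0;

    sub add_twos {
        for (1 .. 10_000) {
            # The lock is held until the end of this loop iteration's block
            lock($total);
            $total = $total + 2;
        }
        return;
    }

    my @thr = map { threads->create(\&add_twos) } 1 .. 4;
    $_->join() for @thr;

    print "total: $total\n";    # 4 threads x 10_000 updates x 2 = 80000
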
Multiprocessor kernel threads are the final step in thread
support. With multiprocessor kernel threads on a machine with multiple
CPUs, the OS may schedule two or more threads to run simultaneously on
different CPUs.

This can give a serious performance boost to your threaded program,
since more than one thread will be executing at the same time. As a
tradeoff, though, any of those nagging synchronization issues that
might not have shown up with basic kernel threads will appear with a
vengeance.

In addition to the different levels of OS involvement in threads,
different OSes (and different thread implementations for a particular
OS) allocate CPU cycles to threads in different ways.

Cooperative multitasking systems have running threads give up control
if one of two things happens. If a thread calls a yield function, it
gives up control. It also gives up control if the thread does
something that would cause it to block, such as performing I/O. In a
cooperative multitasking implementation, one thread can starve all the
others for CPU time if it so chooses.

Preemptive multitasking systems interrupt threads at regular intervals
while the system decides which thread should run next. In a preemptive
multitasking system, one thread usually won't monopolize the CPU.

On some systems, there can be cooperative and preemptive threads
running simultaneously. (Threads running with realtime priorities
often behave cooperatively, for example, while threads running at
normal priorities behave preemptively.)

Most modern operating systems support preemptive multitasking.

=head1 Performance considerations

The main thing to bear in mind when comparing Perl's I<ithreads> to other
threading models is that for each new thread created, a complete copy of
all the variables and data of the parent thread has to be taken. Thus,
thread creation can be quite expensive, both in terms of memory usage and
time spent in creation. The ideal way to reduce these costs is to have a
relatively small number of long-lived threads, all created fairly early
on (before the base thread has accumulated too much data). Of course, this
may not always be possible, so compromises have to be made. However, after
a thread has been created, its performance and extra memory usage should
be little different from those of ordinary code.

Also note that under the current implementation, shared variables
use a little more memory and are a little slower than ordinary variables.

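One way to follow this advice is to start a small, fixed pool of
long-lived worker threads before the main thread builds its large data,
then feed the work to them through a queue. The sketch below assumes
L<Thread::Queue> is available and uses C<undef> as the end-of-work marker,
as the prime-number example does; the pool size and the summing job are
hypothetical placeholders for real work.

    use strict;
    use warnings;
    use threads;
    use Thread::Queue;

    my $jobs = Thread::Queue->new();

    # Start the pool early, while there is little data for each thread to copy
    my @pool = map {
        threads->create(sub {
            my $sum = 0;
            while (defined(my $n = $jobs->dequeue())) {
                $sum += $n;    # placeholder for real work
            }
            return $sum;
        });
    } 1 .. 4;

    # Only now build the (potentially large) data in the main thread
    my @data = (1 .. 10_000);
    $jobs->enqueue(@data);

    # One undef per worker tells each worker to stop
    $jobs->enqueue((undef) x @pool);

    my $grand_total = 0;
    for my $thr (@pool) {
        my ($part) = $thr->join();
        $grand_total += $part;
    }
    print "total: $grand_total\n";    # 1 + 2 + ... + 10_000 = 50005000
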
=head1 Process-scope Changes

Note that while threads themselves are separate execution threads and
Perl data is thread-private unless explicitly shared, the threads can
still change process-scope state, which affects all the threads.

The most common example of this is changing the current working
directory using C<chdir()>. One thread calls C<chdir()>, and the working
directory of all the threads changes. (The exception is MSWin32, where
each thread maintains its own current working directory; see L<threads>.)

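A small sketch of the effect, assuming a Unix-like system (the L<threads>
documentation notes that on MSWin32 each thread keeps its own current
working directory, so the result differs there): a child thread changes to
the root directory, and the main thread then sees the new directory too.
L<Cwd> and L<File::Spec> are core modules.

    use strict;
    use warnings;
    use threads;
    use Cwd qw(getcwd);
    use File::Spec;

    my $before = getcwd();

    # Only the child thread calls chdir()...
    threads->create(sub {
        chdir(File::Spec->rootdir()) or die "chdir failed: $!";
        return;
    })->join();

    # ...but the main thread's working directory has changed as well
    my $after = getcwd();
    print "before: $before\nafter:  $after\n";
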
An even more drastic example of a process-scope change is C<chroot()>:
the root directory of all the threads changes, and no thread can
undo it (as opposed to C<chdir()>).

Further examples of process-scope changes include C<umask()> and
changing uids and gids.

Thinking of mixing C<fork()> and threads? Please lie down and wait
until the feeling passes. Be aware that the semantics of C<fork()> vary
between platforms. For example, some Unix systems copy all the current
threads into the child process, while others only copy the thread that
called C<fork()>. You have been warned!

Similarly, mixing signals and threads may be problematic.
Implementations are platform-dependent, and even the POSIX
semantics may not be what you expect (and Perl doesn't even
give you the full POSIX API). For example, there is no way to
guarantee that a signal sent to a multi-threaded Perl application
will get intercepted by any particular thread. (However, a recently
added feature does provide the capability to send signals between
threads. See L<threads/THREAD SIGNALLING> for more details.)

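A minimal sketch of thread signalling, assuming an ithreads-enabled Perl
with safe signals (the default) on a platform that has C<SIGUSR1>, so not
Windows: the worker installs a C<USR1> handler that calls
C<threads-E<gt>exit()>, tells the main thread through a shared flag that the
handler is in place, and the main thread then sends the signal with
C<$thr-E<gt>kill()>. A thread acts on such a signal only after its current
operation completes, so the worker sleeps in one-second steps rather than
blocking indefinitely. The flag name C<$ready> is illustrative.

    use strict;
    use warnings;
    use threads;
    use threads::shared;

    my $ready :shared = 0;

    my $thr = threads->create(sub {
        # This handler runs in the worker when another thread signals it
        $SIG{'USR1'} = sub { threads->exit(); };

        # Tell the main thread that the handler is installed
        { lock($ready); $ready = 1; cond_signal($ready); }

        sleep(1) while 1;    # placeholder for real work
    });

    # Don't signal until the worker has installed its handler
    { lock($ready); cond_wait($ready) until $ready; }

    # kill() returns the thread object, so it can be joined directly
    $thr->kill('USR1')->join();
    print "worker stopped\n";
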
=head1 Thread-Safety of System Libraries

Whether various library calls are thread-safe is outside the control
of Perl. Calls that are often not thread-safe include
C<localtime()>, C<gmtime()>, functions fetching user, group and
network information (such as C<getgrent()>, C<gethostent()>,
C<getnetent()> and so on), C<readdir()>, C<rand()>, and C<srand()>. In
general, be wary of any call that depends on some global external state.

If the system on which Perl is compiled has thread-safe variants of such
calls, they will be used. Beyond that, Perl is at the mercy of
the thread-safety or -unsafety of the calls. Please consult your
C library call documentation.

On some platforms the thread-safe library interfaces may fail if the
result buffer is too small (for example, the user and group databases may
be rather large, and the reentrant interfaces may have to carry around
a full snapshot of those databases). Perl will start with a small
buffer, but keep retrying and growing the result buffer
until the result fits. If this limitless growth sounds bad for
security or memory consumption reasons, you can recompile Perl with
C<PERL_REENTRANT_MAXSIZE> defined to the maximum number of bytes you will
allow.

=head1 Conclusion

A complete thread tutorial could fill a book (and has, many times),
but with what we've covered in this introduction, you should be well
on your way to becoming a threaded Perl expert.

=head1 SEE ALSO

Annotated POD for L<threads>:
L<http://annocpan.org/?mode=search&field=Module&name=threads>

Latest version of L<threads> on CPAN:
L<http://search.cpan.org/search?module=threads>

Annotated POD for L<threads::shared>:
L<http://annocpan.org/?mode=search&field=Module&name=threads%3A%3Ashared>

Latest version of L<threads::shared> on CPAN:
L<http://search.cpan.org/search?module=threads%3A%3Ashared>

Perl threads mailing list:
L<http://lists.cpan.org/showlist.cgi?name=iThreads>

=head1 Bibliography

Here's a short bibliography courtesy of Jürgen Christoffel:

=head2 Introductory Texts

Birrell, Andrew D. An Introduction to Programming with
Threads. Digital Equipment Corporation, 1989, DEC-SRC Research Report
#35, available online as
ftp://ftp.dec.com/pub/DEC/SRC/research-reports/SRC-035.pdf
(highly recommended).

Robbins, Kay A., and Steven Robbins. Practical Unix Programming: A
Guide to Concurrency, Communication, and
Multithreading. Prentice-Hall, 1996.

Lewis, Bill, and Daniel J. Berg. Multithreaded Programming with
Pthreads. Prentice Hall, 1997, ISBN 0-13-443698-9 (a well-written
introduction to threads).

Nelson, Greg (editor). Systems Programming with Modula-3. Prentice
Hall, 1991, ISBN 0-13-590464-1.

Nichols, Bradford, Dick Buttlar, and Jacqueline Proulx Farrell.
Pthreads Programming. O'Reilly & Associates, 1996, ISBN 1-56592-115-1
(covers POSIX threads).

=head2 OS-Related References

Boykin, Joseph, David Kirschen, Alan Langerman, and Susan
LoVerso. Programming under Mach. Addison-Wesley, 1994, ISBN
0-201-52739-1.

Tanenbaum, Andrew S. Distributed Operating Systems. Prentice Hall,
1995, ISBN 0-13-219908-4 (great textbook).

Silberschatz, Abraham, and Peter B. Galvin. Operating System Concepts,
4th ed. Addison-Wesley, 1995, ISBN 0-201-59292-4.

=head2 Other References

Arnold, Ken, and James Gosling. The Java Programming Language, 2nd
ed. Addison-Wesley, 1998, ISBN 0-201-31006-6.

comp.programming.threads FAQ,
L<http://www.serpentine.com/~bos/threads-faq/>

Le Sergent, T., and B. Berthomieu. "Incremental MultiThreaded Garbage
Collection on Virtually Shared Memory Architectures" in Memory
Management: Proc. of the International Workshop IWMM 92, St. Malo,
France, September 1992, Yves Bekkers and Jacques Cohen, eds. Springer,
1992, ISBN 3-540-55940-X (real-life thread applications).

Artur Bergman, "Where Wizards Fear To Tread", June 11, 2002,
L<http://www.perl.com/pub/a/2002/06/11/threads.html>

=head1 Acknowledgements

Thanks (in no particular order) to Chaim Frenkel, Steve Fink, Gurusamy
Sarathy, Ilya Zakharevich, Benjamin Sugars, Jürgen Christoffel, Joshua
Pritikin, and Alan Burlison, for their help in reality-checking and
polishing this article. Big thanks to Tom Christiansen for his rewrite
of the prime number generator.

=head1 AUTHOR

Dan Sugalski E<lt>dan@sidhe.orgE<gt>

Slightly modified by Artur Bergman to fit the new thread model/module.

Reworked slightly by Jörg Walter E<lt>jwalt@cpan.orgE<gt> to be more concise
about thread-safety of Perl code.

Rearranged slightly by Elizabeth Mattijsen E<lt>liz@dijkmat.nlE<gt> to put
less emphasis on yield().

=head1 Copyrights

The original version of this article appeared in The Perl
Journal #10, and is copyright 1998 The Perl Journal. It appears courtesy
of Jon Orwant and The Perl Journal. This document may be distributed
under the same terms as Perl itself.

=cut