=head1 NAME
perlipc - Perl interprocess communication (signals, fifos, pipes, safe subprocesses, sockets, and semaphores)
=head1 DESCRIPTION
The basic IPC facilities of Perl are built out of the good old Unix
signals, named pipes, pipe opens, the Berkeley socket routines, and SysV
IPC calls. Each is used in slightly different situations.
=head1 Signals
Perl uses a simple signal handling model: the %SIG hash contains names
or references of user-installed signal handlers. These handlers will
be called with an argument which is the name of the signal that
triggered it. A signal may be generated intentionally from a
particular keyboard sequence like control-C or control-Z, sent to you
from another process, or triggered automatically by the kernel when
special events transpire, like a child process exiting, your process
running out of stack space, or hitting a file size limit.
For example, to trap an interrupt signal, set up a handler like this:
sub catch_zap {
my $signame = shift;
$shucks++;
die "Somebody sent me a SIG$signame";
}
$SIG{INT} = 'catch_zap'; # could fail in modules
$SIG{INT} = \&catch_zap; # best strategy
Prior to Perl 5.7.3 it was necessary to do as little as you possibly
could in your handler; notice how all we do is set a global variable
and then raise an exception. That's because on most systems,
libraries are not re-entrant; particularly, memory allocation and I/O
routines are not. That meant that doing nearly I<anything> in your
handler could in theory trigger a memory fault and subsequent core
dump - see L</Deferred Signals (Safe Signals)> below.
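As a minimal sketch of the handler mechanics described above, the following script installs a handler through C<%SIG> and then signals itself. The variable name C<$caught> and the choice of SIGUSR1 are illustrative assumptions, not part of the original example, and the code assumes a Unix-like system on which a process may signal itself.

```perl
use strict;
use warnings;

my $caught = '';                       # set by the handler (illustrative name)

# The handler receives the signal name without the "SIG" prefix.
$SIG{USR1} = sub {
    my $signame = shift;
    $caught = $signame;
};

kill 'USR1', $$;                       # send SIGUSR1 to ourselves

# Give the interpreter a moment to dispatch the handler, in case
# delivery is not immediate on this platform.
my $tries = 0;
select(undef, undef, undef, 0.05) while !$caught && $tries++ < 20;

print "caught SIG$caught\n";
```

The handler only records the signal name; anything more elaborate is better done in the main flow once the flag is seen.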
The names of the signals are the ones listed out by C<kill -l> on your
system, or you can retrieve them from the Config module. Set up an
@signame list indexed by number to get the name and a %signo table
indexed by name to get the number:
use Config;
defined $Config{sig_name} || die "No sigs?";
foreach $name (split(' ', $Config{sig_name})) {
$signo{$name} = $i;
$signame[$i] = $name;
$i++;
}
So to check whether signal 17 and SIGALRM were the same, do just this:
print "signal #17 = $signame[17]\n";
if ($signo{ALRM}) {
print "SIGALRM is $signo{ALRM}\n";
}
You may also choose to assign the strings C<'IGNORE'> or C<'DEFAULT'> as
the handler, in which case Perl will try to discard the signal or do the
default thing.
On most Unix platforms, the C<CHLD> (sometimes also known as C<CLD>) signal
has special behavior with respect to a value of C<'IGNORE'>.
Setting C<$SIG{CHLD}> to C<'IGNORE'> on such a platform has the effect of
not creating zombie processes when the parent process fails to C<wait()>
on its child processes (i.e. child processes are automatically reaped).
Calling C<wait()> with C<$SIG{CHLD}> set to C<'IGNORE'> usually returns
C<-1> on such platforms.
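The following sketch illustrates the behaviour just described, under the assumption that the platform reaps children automatically when CHLD is ignored. The variable names are illustrative, and on platforms without this behaviour C<wait> would return the child's PID instead.

```perl
use strict;
use warnings;
use POSIX ();

local $SIG{CHLD} = 'IGNORE';           # ask the system to reap children itself

defined(my $pid = fork) or die "can't fork: $!";
if ($pid == 0) {
    POSIX::_exit(0);                   # child: leave immediately
}

# With CHLD ignored, wait() typically blocks until no children remain
# and then reports -1 instead of a PID.
my $reaped = wait;
print "wait returned $reaped\n";
```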
Some signals can be neither trapped nor ignored, such as
the KILL and STOP (but not the TSTP) signals. One strategy for
temporarily ignoring signals is to use a local() statement; the previous
handler is automatically restored once your block is exited. (Remember that local()
values are "inherited" by functions called from within that block.)
sub precious {
local $SIG{INT} = 'IGNORE';
&more_functions;
}
sub more_functions {
# interrupts still ignored, for now...
}
Sending a signal to a negative process ID means that you send the signal
to the entire Unix process-group. This code sends a hang-up signal to all
processes in the current process group (and sets $SIG{HUP} to IGNORE so
it doesn't kill itself):
{
local $SIG{HUP} = 'IGNORE';
kill HUP => -$$;
# snazzy writing of: kill('HUP', -$$)
}
Another interesting signal to send is signal number zero. This doesn't
actually affect a child process, but instead checks whether it's alive
or has changed its UID.
unless (kill 0 => $kid_pid) {
warn "something wicked happened to $kid_pid";
}
When directed at a process whose UID is not identical to that
of the sending process, signal number zero may fail because
you lack permission to send the signal, even though the process is alive.
You may be able to determine the cause of failure using C<%!>.
unless (kill 0 => $pid or $!{EPERM}) {
warn "$pid looks dead";
}
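As a hedged illustration of the signal-zero check, the sketch below wraps it in a helper and tries it on the current process and on a child that has already been reaped. The helper name C<process_exists> is hypothetical, and the second result assumes the child's PID has not been reused in the meantime.

```perl
use strict;
use warnings;
use POSIX ();

# Hypothetical helper: true if $pid exists, even when we may not signal it.
sub process_exists {
    my $pid = shift;
    return 1 if kill 0 => $pid;
    return $!{EPERM} ? 1 : 0;
}

defined(my $kid = fork) or die "can't fork: $!";
POSIX::_exit(0) if $kid == 0;          # child exits at once

my $before_own  = process_exists($$);  # we are certainly alive
waitpid($kid, 0);                      # reap the child
my $after_child = process_exists($kid);

print "self: $before_own, reaped child: $after_child\n";
```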
You might also want to employ anonymous functions for simple signal
handlers:
$SIG{INT} = sub { die "\nOutta here!\n" };
But that will be problematic for the more complicated handlers that need
to reinstall themselves. Because Perl's signal mechanism is currently
based on the signal(3) function from the C library, you may sometimes be so
unfortunate as to run on systems where that function is "broken", that
is, it behaves in the old unreliable SysV way rather than the newer, more
reasonable BSD and POSIX fashion. So you'll see defensive people writing
signal handlers like this:
sub REAPER {
$waitedpid = wait;
# loathe SysV: it makes us not only reinstate
# the handler, but place it after the wait
$SIG{CHLD} = \&REAPER;
}
$SIG{CHLD} = \&REAPER;
# now do something that forks...
or better still:
use POSIX ":sys_wait_h";
sub REAPER {
my $child;
# If a second child dies while in the signal handler caused by the
# first death, we won't get another signal. So must loop here else
# we will leave the unreaped child as a zombie. And the next time
# two children die we get another zombie. And so on.
while (($child = waitpid(-1,WNOHANG)) > 0) {
$Kid_Status{$child} = $?;
}
$SIG{CHLD} = \&REAPER; # still loathe SysV
}
$SIG{CHLD} = \&REAPER;
# do something that forks...
Signal handling is also used for timeouts in Unix. While safely
protected within an C<eval{}> block, you set a signal handler to trap
alarm signals and then schedule to have one delivered to you in some
number of seconds. Then try your blocking operation, clearing the alarm
when it's done but not before you've exited your C<eval{}> block. If it
goes off, you'll use die() to jump out of the block, much as you might
using longjmp() or throw() in other languages.
Here's an example:
eval {
local $SIG{ALRM} = sub { die "alarm clock restart" };
alarm 10;
flock(FH, 2); # blocking write lock
alarm 0;
};
if ($@ and $@ !~ /alarm clock restart/) { die }
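The same idiom can be packaged as a reusable routine. This is only a sketch: the name C<with_timeout> and its return convention are assumptions made for illustration, and it presumes that sleep() is not itself implemented with alarm() on your system.

```perl
use strict;
use warnings;

# Hypothetical helper: run $code, giving up after $secs seconds.
# Returns 1 if $code finished, 0 if the alarm fired first.
sub with_timeout {
    my ($secs, $code) = @_;
    my $ok = eval {
        local $SIG{ALRM} = sub { die "timeout\n" };
        alarm $secs;
        $code->();
        alarm 0;                       # finished in time: cancel the alarm
        1;
    };
    my $err = $@;
    alarm 0;                           # also cancel if $code died on its own
    return 1 if $ok;
    die $err unless $err eq "timeout\n";   # re-raise unrelated errors
    return 0;
}

my $slow = with_timeout(1, sub { sleep 5 });
my $fast = with_timeout(5, sub { 1 });
print "slow finished? $slow; fast finished? $fast\n";
```

Comparing C<$@> against an exact string (with its trailing newline) keeps the timeout distinguishable from any other exception raised inside the block.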
If the operation being timed out is system() or qx(), this technique
is liable to generate zombies. If this matters to you, you'll
need to do your own fork() and exec(), and kill the errant child process.
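One way to do your own fork() and exec() with a timeout might look like the sketch below. The helper name C<run_with_timeout> and its return convention are hypothetical, and the example command is simply another copy of the running perl (C<$^X>).

```perl
use strict;
use warnings;
use POSIX ();

# Hypothetical helper: run @cmd without a shell, killing it after $secs
# seconds. Returns the wait status, or undef if the command timed out.
sub run_with_timeout {
    my ($secs, @cmd) = @_;
    defined(my $pid = fork) or die "can't fork: $!";
    if ($pid == 0) {                   # child
        exec { $cmd[0] } @cmd
            or POSIX::_exit(127);      # only reached if exec failed
    }
    my $status;
    my $finished = eval {
        local $SIG{ALRM} = sub { die "timeout\n" };
        alarm $secs;
        waitpid($pid, 0);
        $status = $?;
        alarm 0;
        1;
    };
    my $err = $@;
    alarm 0;
    return $status if $finished;
    die $err unless $err eq "timeout\n";
    kill 'KILL', $pid;                 # the errant child is ours to kill...
    waitpid($pid, 0);                  # ...and to reap, so no zombie remains
    return;
}

my $status = run_with_timeout(1, $^X, '-e', 'sleep 10');
print defined $status ? "exited with status $status\n" : "timed out\n";
```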
For more complex signal handling, you might see the standard POSIX
module. Lamentably, this is almost entirely undocumented, but
the F file from the Perl source distribution has some
examples in it.
=head2 Handling the SIGHUP Signal in Daemons
A process that usually starts when the system boots and shuts down
when the system is shut down is called a daemon (Disk And Execution
MONitor). If a daemon process has a configuration file which is
modified after the process has been started, there should be a way to
tell that process to re-read its configuration file, without stopping
the process. Many daemons provide this mechanism using the C<SIGHUP>
signal handler. When you want to tell the daemon to re-read the file
you simply send it the C<SIGHUP> signal.
Not all platforms automatically reinstall their (native) signal
handlers after a signal delivery. This means that the handler works
only the first time the signal is sent. The solution to this problem
is to use C<POSIX> signal handlers if available; their behaviour
is well-defined.
The following example implements a simple daemon, which restarts
itself every time the C<SIGHUP> signal is received. The actual code is
located in the subroutine C<code()>, which simply prints some debug
info to show that it works and should be replaced with the real code.
#!/usr/bin/perl -w
use POSIX ();
use FindBin ();
use File::Basename ();
use File::Spec::Functions;
$|=1;
# make the daemon cross-platform, so exec always calls the script
# itself with the right path, no matter how the script was invoked.
my $script = File::Basename::basename($0);
my $SELF = catfile $FindBin::Bin, $script;
# POSIX unmasks the sigprocmask properly
my $sigset = POSIX::SigSet->new();
my $action = POSIX::SigAction->new('sigHUP_handler',
$sigset,
&POSIX::SA_NODEFER);
POSIX::sigaction(&POSIX::SIGHUP, $action);
sub sigHUP_handler {
print "got SIGHUP\n";
exec($SELF, @ARGV) or die "Couldn't restart: $!\n";
}
code();
sub code {
print "PID: $$\n";
print "ARGV: @ARGV\n";
my $c = 0;
while (++$c) {
sleep 2;
print "$c\n";
}
}
__END__
=head1 Named Pipes
A named pipe (often referred to as a FIFO) is an old Unix IPC
mechanism for processes communicating on the same machine. It works
just like regular, connected anonymous pipes, except that the
processes rendezvous using a filename and don't have to be related.
To create a named pipe, use the C<POSIX::mkfifo()> function.
use POSIX qw(mkfifo);
mkfifo($path, 0700) or die "mkfifo $path failed: $!";
You can also use the Unix command mknod(1) or on some
systems, mkfifo(1). These may not be in your normal path.
# system return val is backwards, so && not ||
#
$ENV{PATH} .= ":/etc:/usr/etc";
if ( system('mknod', $path, 'p')
&& system('mkfifo', $path) )
{
die "mk{nod,fifo} $path failed";
}
A fifo is convenient when you want to connect a process to an unrelated
one. When you open a fifo, the program will block until there's something
on the other end.
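The blocking rendezvous can be seen in a small self-contained sketch. It assumes a Unix-like system with fifo support; the scratch directory and the fifo name F<demo.fifo> are made up for the example.

```perl
use strict;
use warnings;
use POSIX qw(mkfifo);
use File::Temp qw(tempdir);

my $dir  = tempdir(CLEANUP => 1);      # scratch directory (assumption)
my $path = "$dir/demo.fifo";           # hypothetical fifo name
mkfifo($path, 0700) or die "mkfifo $path failed: $!";

defined(my $pid = fork) or die "can't fork: $!";
if ($pid == 0) {                       # child: the writer
    # This open blocks until the parent opens the fifo for reading.
    open(my $out, '>', $path) or die "can't write $path: $!";
    print {$out} "hello from the other end\n";
    close($out) or die "can't close $path: $!";
    POSIX::_exit(0);                   # skip the parent's cleanup handlers
}

# Parent: the reader. This open blocks until the child opens for writing.
open(my $in, '<', $path) or die "can't read $path: $!";
my $line = <$in>;
close $in;
waitpid($pid, 0);
print "read: $line";
```

The two processes here happen to be related, but only the filename connects them; an unrelated program opening the same path would work the same way.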
For example, let's say you'd like to have your F<.signature> file be a
named pipe that has a Perl program on the other end. Now every time any
program (like a mailer, news reader, finger program, etc.) tries to read
from that file, the reading program will block and your program will
supply the new signature. We'll use the pipe-checking file test B<-p>
to find out whether anyone (or anything) has accidentally removed our fifo.
chdir; # go home
$FIFO = '.signature';
while (1) {
unless (-p $FIFO) {
unlink $FIFO;
require POSIX;
POSIX::mkfifo($FIFO, 0700)
or die "can't mkfifo $FIFO: $!";
}
# next line blocks until there's a reader
open (FIFO, "> $FIFO") || die "can't write $FIFO: $!";
print FIFO "John Smith (smith\@host.org)\n", `fortune -s`;
close FIFO;
sleep 2; # to avoid dup signals
}
=head2 Deferred Signals (Safe Signals)
Before Perl 5.7.3, installing Perl code to deal with signals
exposed you to danger from two things. First,
few system library functions are re-entrant. If the signal interrupts
while Perl is executing one function (like malloc(3) or printf(3)),
and your signal handler then calls the same function again, you could
get unpredictable behavior--often, a core dump. Second, Perl isn't
itself re-entrant at the lowest levels. If the signal interrupts Perl
while Perl is changing its own internal data structures, similarly
unpredictable behaviour may result.
There were two things you could do, knowing this: be paranoid or be
pragmatic. The paranoid approach was to do as little as possible in your
signal handler. Set an existing integer variable that already has a
value, and return. This doesn't help you if you're in a slow system call,
which will just restart. That means you have to C<die> to longjmp(3) out
of the handler. Even this is a little cavalier for the true paranoiac,
who avoids C<die> in a handler because the system I<is> out to get you.
The pragmatic approach was to say "I know the risks, but prefer the
convenience", and to do anything you wanted in your signal handler,
and be prepared to clean up core dumps now and again.
In Perl 5.7.3 and later, to avoid these problems, signals are
"deferred": that is, when the signal is delivered to the process by
the system (to the C code that implements Perl), a flag is set and the
handler returns immediately. Then at strategic "safe" points in the
Perl interpreter (e.g. when it is about to execute a new opcode) the
flags are checked and the Perl-level handler from %SIG is
executed. The "deferred" scheme allows much more flexibility in the
coding of signal handlers, as we know the Perl interpreter is in a safe
state, and that we are not in a system library function when the
handler is called. However, the implementation does differ from
previous Perls in the following ways:
=over 4
=item Long-running opcodes
As the Perl interpreter only looks at the signal flags when it is about
to execute a new opcode, a signal that arrives during a long-running
opcode (e.g. a regular expression operation on a very large string) will
not be seen until the current opcode completes.
N.B. If a signal of any given type fires multiple times during an opcode
(such as from a fine-grained timer), the handler for that signal will
only be called once after the opcode completes, and all the other
instances will be discarded. Furthermore, if your system's signal queue
gets flooded to the point that there are signals that have been raised
but not yet caught (and thus not deferred) at the time an opcode
completes, those signals may well be caught and deferred during
subsequent opcodes, with sometimes surprising results. For example, you
may see alarms delivered even after calling C<alarm(0)> as the latter
stops the raising of alarms but does not cancel the delivery of alarms
raised but not yet caught. Do not depend on the behaviors described in
this paragraph as they are side effects of the current implementation and
may change in future versions of Perl.
=item Interrupting IO
When a signal is delivered (e.g. INT control-C) the operating system
breaks into IO operations like C<read> (used to implement Perl's
E<lt>E<gt> operator). On older Perls the handler was called
immediately (and as C<read> is not "unsafe" this worked well). With
the "deferred" scheme the handler is not called immediately, and if
Perl is using the system's C<stdio> library that library may re-start the
C<read> without returning to Perl and giving it a chance to call the
%SIG handler. If this happens on your system the solution is to use the
C<:perlio> layer to do IO, at least on those handles which you want
to be able to break into with signals. (The C<:perlio> layer checks
the signal flags and calls %SIG handlers before resuming IO operation.)
Note that the default in Perl 5.7.3 and later is to automatically use
the C<:perlio> layer.
Note that some networking library functions like gethostbyname() are
known to have their own implementations of timeouts which may conflict
with your timeouts. If you are having problems with such functions,
you can try using the POSIX sigaction() function, which bypasses the
Perl safe signals (note that this means subjecting yourself to
possible memory corruption, as described above). Instead of setting
C<$SIG{ALRM}>:
local $SIG{ALRM} = sub { die "alarm" };
try something like the following:
use POSIX qw(SIGALRM);
POSIX::sigaction(SIGALRM,
POSIX::SigAction->new(sub { die "alarm" }))
or die "Error setting SIGALRM handler: $!\n";
Another way to disable the safe signal behavior locally is to use
the C<Perl::Unsafe::Signals> module from CPAN (which will affect
all signals).
=item Restartable system calls
On systems that supported it, older versions of Perl used the
SA_RESTART flag when installing %SIG handlers. This meant that
restartable system calls would continue rather than returning when
a signal arrived. In order to deliver deferred signals promptly,
Perl 5.7.3 and later do I<not> use SA_RESTART. Consequently,
restartable system calls can fail (with $! set to C<EINTR>) in places
where they previously would have succeeded.
Note that the default C<:perlio> layer will retry C<read>, C<write>
and C<close> as described above and that interrupted C<wait> and
C<waitpid> calls will always be retried.
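Code that calls the unbuffered primitives directly may therefore want to retry on an interrupted call itself. The sketch below shows one way to do that for sysread(); the helper name C<sysread_retry> is hypothetical, and the child process exists only to interrupt the parent with SIGUSR1 while it is blocked reading.

```perl
use strict;
use warnings;
use POSIX ();

# Hypothetical helper: sysread() that retries when a signal interrupts it.
sub sysread_retry {
    my ($fh, $len) = @_;
    while (1) {
        my $n = sysread($fh, my $buf, $len);
        return $buf if defined $n;     # data, or '' at end of file
        next if $!{EINTR};             # interrupted by a signal: try again
        die "read failed: $!";
    }
}

my $hits = 0;
$SIG{USR1} = sub { $hits++ };          # installed before fork to avoid a race

pipe(my $reader, my $writer) or die "pipe failed: $!";
defined(my $pid = fork) or die "can't fork: $!";
if ($pid == 0) {                       # child: interrupt the parent, then write
    close $reader;
    select(undef, undef, undef, 0.2);
    kill 'USR1', getppid();
    select(undef, undef, undef, 0.2);
    syswrite($writer, "data");
    POSIX::_exit(0);
}
close $writer;
my $got = sysread_retry($reader, 16);
waitpid($pid, 0);
print "read '$got' after $hits interruption(s)\n";
```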
=item Signals as "faults"
Certain signals, e.g. SEGV, ILL, and BUS, are generated as a result of
virtual memory or other "faults". These are normally fatal and there is
little a Perl-level handler can do with them, so Perl now delivers them
immediately rather than attempting to defer them.
=item Signals triggered by operating system state
On some operating systems certain signal handlers are supposed to "do
something" before returning. One example is CHLD or CLD, which
indicates a child process has completed. On some operating systems the
signal handler is expected to C<wait> for the completed child
process. On such systems the deferred signal scheme will not work for
those signals (it does not do the C<wait>). Again the failure will
look like a loop as the operating system will re-issue the signal as
there are un-waited-for completed child processes.
=back
If you want the old signal behaviour back regardless of possible
memory corruption, set the environment variable C<PERL_SIGNALS> to
C<"unsafe"> (a new feature since Perl 5.8.1).
=head1 Using open() for IPC
Perl's basic open() statement can also be used for unidirectional
interprocess communication by either appending or prepending a pipe
symbol to the second argument to open(). Here's how to start
something up in a child process you intend to write to:
open(SPOOLER, "| cat -v | lpr -h 2>/dev/null")
|| die "can't fork: $!";
local $SIG{PIPE} = sub { die "spooler pipe broke" };
print SPOOLER "stuff\n";
close(SPOOLER) || die "bad spool: $! $?";
And here's how to start up a child process you intend to read from:
open(STATUS, "netstat -an 2>&1 |")
|| die "can't fork: $!";
while (<STATUS>) {
next if /^(tcp|udp)/;
print;
}
close(STATUS) || die "bad netstat: $! $?";
If one can be sure that a particular program is a Perl script that is
expecting filenames in @ARGV, the clever programmer can write something
like this:
% program f1 "cmd1|" - f2 "cmd2|" f3 < tmpfile
and irrespective of which shell it's called from, the Perl program will
read from the file F<f1>, the process F<cmd1>, standard input (F<tmpfile>
in this case), the F<f2> file, the F<cmd2> command, and finally the F<f3>
file. Pretty nifty, eh?
You might notice that you could use backticks for much the
same effect as opening a pipe for reading:
print grep { !/^(tcp|udp)/ } `netstat -an 2>&1`;
die "bad netstat" if $?;
While this is true on the surface, it's much more efficient to process the
file one line or record at a time because then you don't have to read the
whole thing into memory at once. It also gives you finer control of the
whole process, letting you kill off the child process early if you'd
like.
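A minimal sketch of that finer control follows: it reads only the first few lines from a child and then stops it. The child is another copy of the running perl (C<$^X>) printing many lines, chosen purely for illustration, and the list form of open() is used so that no shell is involved.

```perl
use strict;
use warnings;

# The child here is another perl printing many lines; any long-running
# command would do.
my $pid = open(my $kid, '-|', $^X, '-e', 'print "line $_\n" for 1 .. 1_000_000')
    or die "can't fork: $!";

my @first;
while (my $line = <$kid>) {
    push @first, $line;
    last if @first == 3;               # we have seen enough
}

kill 'TERM', $pid;                     # stop the child early
close $kid;                            # reaps the child; $? holds its status
print @first;
```

Because the child was killed, close() is expected to return false here, so its result is deliberately not treated as an error.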
Be careful to check both the open() and the close() return values. If
you're I<writing> to a pipe, you should also trap SIGPIPE. Otherwise,
think of what happens when you start up a pipe to a command that doesn't
exist: the open() will in all likelihood succeed (it only reflects the
fork()'s success), but then your output will fail--spectacularly. Perl
can't know whether the command worked because your command is actually
running in a separate process whose exec() might have failed. Therefore,
while readers of bogus commands return just a quick end of file, writers
to bogus commands will trigger a signal they'd better be prepared to
handle. Consider:
open(FH, "|bogus") or die "can't fork: $!";
print FH "bang\n" or die "can't write: $!";
close FH or die "can't close: $!";
That won't blow up until the close, and it will blow up with a SIGPIPE.
To catch it, you could use this:
$SIG{PIPE} = 'IGNORE';
open(FH, "|bogus") or die "can't fork: $!";
print FH "bang\n" or die "can't write: $!";
close FH or die "can't close: status=$?";
=head2 Filehandles
Both the main process and any child processes it forks share the same
STDIN, STDOUT, and STDERR filehandles. If both processes try to access
them at once, strange things can happen. You may also want to close
or reopen the filehandles for the child. You can get around this by
opening your pipe with open(), but on some systems this means that the
child process cannot outlive the parent.
=head2 Background Processes
You can run a command in the background with:
system("cmd &");
The command's STDOUT and STDERR (and possibly STDIN, depending on your
shell) will be the same as the parent's. You won't need to catch
SIGCHLD because of the double-fork taking place (see below for more
details).
=head2 Complete Dissociation of Child from Parent
In some cases (starting server processes, for instance) you'll want to
completely dissociate the child process from the parent. This is
often called daemonization. A well behaved daemon will also chdir()
to the root directory (so it doesn't prevent unmounting the filesystem
containing the directory from which it was launched) and redirect its
standard file descriptors from and to F</dev/null> (so that random
output doesn't wind up on the user's terminal).
use POSIX 'setsid';
sub daemonize {
chdir '/' or die "Can't chdir to /: $!";
open STDIN, '/dev/null' or die "Can't read /dev/null: $!";
open STDOUT, '>/dev/null'
or die "Can't write to /dev/null: $!";
defined(my $pid = fork) or die "Can't fork: $!";
exit if $pid;
die "Can't start a new session: $!" if setsid == -1;
open STDERR, '>&STDOUT' or die "Can't dup stdout: $!";
}
The fork() has to come before the setsid() to ensure that you aren't a
process group leader (the setsid() will fail if you are). If your
system doesn't have the setsid() function, open F</dev/tty> and use the
C<TIOCNOTTY> ioctl() on it instead. See tty(4) for details.
Non-Unix users should check their Your_OS::Process module for other
solutions.
=head2 Safe Pipe Opens
Another interesting approach to IPC is making your single program go
multiprocess and communicate between (or even amongst) yourselves. The
open() function will accept a file argument of either C<"-|"> or C<"|-">
to do a very interesting thing: it forks a child connected to the
filehandle you've opened. The child is running the same program as the
parent. This is useful for safely opening a file when running under an
assumed UID or GID, for example. If you open a pipe I<to> minus, you can
write to the filehandle you opened and your kid will find it in his
STDIN. If you open a pipe I<from> minus, you can read from the filehandle
you opened whatever your kid writes to his STDOUT.
use English '-no_match_vars';
my $sleep_count = 0;
do {
$pid = open(KID_TO_WRITE, "|-");
unless (defined $pid) {
warn "cannot fork: $!";
die "bailing out" if $sleep_count++ > 6;
sleep 10;
}
} until defined $pid;
if ($pid) { # parent
print KID_TO_WRITE @some_data;
close(KID_TO_WRITE) || warn "kid exited $?";
} else { # child
($EUID, $EGID) = ($UID, $GID); # suid progs only
open (FILE, "> /safe/file")
|| die "can't open /safe/file: $!";
while (<STDIN>) {
print FILE; # child's STDIN is parent's KID_TO_WRITE
}
exit; # don't forget this
}
Another common use for this construct is when you need to execute
something without the shell's interference. With system(), it's
straightforward, but you can't use a pipe open or backticks safely.
That's because there's no way to stop the shell from getting its hands on
your arguments. Instead, use lower-level control to call exec() directly.
Here's a safe backtick or pipe open for read:
# add error processing as above
$pid = open(KID_TO_READ, "-|");
if ($pid) { # parent
while (<KID_TO_READ>) {
# do something interesting
}
close(KID_TO_READ) || warn "kid exited $?";
} else { # child
($EUID, $EGID) = ($UID, $GID); # suid only
exec($program, @options, @args)
|| die "can't exec program: $!";
# NOTREACHED
}
And here's a safe pipe open for writing:
# add error processing as above
$pid = open(KID_TO_WRITE, "|-");
$SIG{PIPE} = sub { die "whoops, $program pipe broke" };
if ($pid) { # parent
for (@data) {
print KID_TO_WRITE;
}
close(KID_TO_WRITE) || warn "kid exited $?";
} else { # child
($EUID, $EGID) = ($UID, $GID);
exec($program, @options, @args)
|| die "can't exec program: $!";
# NOTREACHED
}
It is very easy to dead-lock a process using this form of open(), or
indeed any use of pipe() and multiple sub-processes. The above
example is 'safe' because it is simple and calls exec(). See
L</"Avoiding Pipe Deadlocks"> for general safety principles, but there
are extra gotchas with Safe Pipe Opens.
In particular, if you opened the pipe using C<open FH, "|-">, then you
cannot simply use close() in the parent process to close an unwanted
writer. Consider this code:
$pid = open WRITER, "|-";
defined $pid or die "fork failed; $!";
if ($pid) {
if (my $sub_pid = fork()) {
close WRITER;
# do something else...
}
else {
# write to WRITER...
exit;
}
}
else {
# do something with STDIN...
exit;
}
In the above, the true parent does not want to write to the WRITER
filehandle, so it closes it. However, because WRITER was opened using
C<open FH, "|-">, it has a special behaviour: closing it will call
waitpid() (see L<perlfunc/waitpid>), which waits for the sub-process
to exit. If the child process ends up waiting for something happening
in the section marked "do something else", then you have a deadlock.
This can also be a problem with intermediate sub-processes in more
complicated code, which will call waitpid() on all open filehandles
during global destruction, in no predictable order.
To solve this, you must manually use pipe(), fork(), and the form of
open() which sets one file descriptor to another, as below:
pipe(READER, WRITER);
$pid = fork();
defined $pid or die "fork failed; $!";
if ($pid) {
close READER;
if (my $sub_pid = fork()) {
close WRITER;
}
else {
# write to WRITER...
exit;
}
# do something else...
}
else {
open STDIN, "<&READER";
close WRITER;
# do something...
exit;
}
Since Perl 5.8.0, you can also use the list form of C<open> for pipes;
the syntax
open KID_PS, "-|", "ps", "aux" or die $!;
forks the ps(1) command (without spawning a shell, as there are more than
three arguments to open()), and reads its standard output via the
C<KID_PS> filehandle. The corresponding syntax to write to command
pipes (with C<"|-"> in place of C<"-|">) is also implemented.
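To see that the list form really bypasses the shell, the sketch below passes arguments that a shell would normally split or expand. The child is another copy of the running perl (C<$^X>) that echoes its arguments in brackets; the argument values are arbitrary examples.

```perl
use strict;
use warnings;

# Arguments that a shell would mangle: a space, a variable, a command separator.
my @args = ('a b', '$HOME', '; echo oops');

# The child is another perl that prints each argument it received in brackets.
open(my $kid, '-|', $^X, '-e', 'print "[$_]\n" for @ARGV', @args)
    or die "can't fork: $!";
chomp(my @seen = <$kid>);
close($kid) or die "child failed: status=$?";

print "$_\n" for @seen;
```

Each argument should arrive in the child exactly as written, which is the property that makes this form suitable for untrusted input.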
Note that these operations are full Unix forks, which means they may not be
correctly implemented on alien systems. Additionally, these are not true
multithreading. If you'd like to learn more about threading, see the
F file mentioned below in the SEE ALSO section.
=head2 Avoiding Pipe Deadlocks
In general, if you have more than one sub-process, you need to be very
careful that any process which does not need the writer half of any
pipe you create for inter-process communication does not have it open.
The reason for this is that any child process which is reading from
the pipe and expecting an EOF will never receive it, and therefore
never exit. A single process closing a pipe is not enough to close it;
the last process with the pipe open must close it for it to read EOF.
There are some features built in to Unix to help prevent this most of
the time. For instance, filehandles have a 'close on exec' flag (set
I<en masse> with Perl using the C<$^F> variable; see L<perlvar>), so that any
filehandles which you didn't explicitly route to the STDIN, STDOUT or
STDERR of a child I<program> will automatically be closed for you.
So, always explicitly and immediately call close() on the writable end
of any pipe, unless that process is actually writing to it. If you
don't explicitly call close() then be warned Perl will still close()
all the filehandles during global destruction. As warned above, if
those filehandles were opened with Safe Pipe Open, they will also call
waitpid() and you might again deadlock.
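The rule can be seen in a small sketch with one pipe and one child. The variable names are illustrative, and the child reports its line count through its exit status only to keep the example short.

```perl
use strict;
use warnings;
use POSIX ();

pipe(my $reader, my $writer) or die "pipe failed: $!";
defined(my $pid = fork) or die "fork failed: $!";

if ($pid == 0) {                       # child: reads only
    close $writer;                     # otherwise our own copy blocks EOF
    my $count = 0;
    $count++ while <$reader>;
    POSIX::_exit($count);              # report the line count as exit status
}

close $reader;                         # parent: writes only
print {$writer} "line $_\n" for 1 .. 3;
close $writer;                         # last writer gone: child now sees EOF
waitpid($pid, 0);
my $lines = $? >> 8;
print "child read $lines lines\n";
```

If the child skipped its own C<close $writer>, its read loop would wait forever, because the child itself would still be holding the write end open.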
=head2 Bidirectional Communication with Another Process
While this works reasonably well for unidirectional communication, what
about bidirectional communication? The obvious thing you'd like to do
doesn't actually work:
open(PROG_FOR_READING_AND_WRITING, "| some program |")
and if you forget to use the C