To: stktrc <stktrc at yahoo com>
Cc: scsh at zurich csail mit edu
Subject: Re: Threads, forks and file descriptors
From: Martin Gasbichler <gasbichl at informatik uni-tuebingen de>
Date: Tue, 04 Nov 2003 14:57:33 +0100
Message-ID: <x9wuagp776.fsf at informatik uni-tuebingen de>

stktrc <stktrc at yahoo com> writes:
> In a recent message I suggested a work-around to a problem presented
> by ZHAO Wei, which later on appeared to not be a problem, but just
> lack of understanding the system. The reason I suggested the
> work-around was that I thought ZHAO Wei was experiencing a problem
> similar to one that I had earlier, for which I had discovered the
> suggested work-around.
>
> After realizing ZHAO Wei's problem really wasn't a problem, but a lack
> of understanding the system, I decided to go back and track down my
> old problem to see if the case was the same there. I found out it
> likely isn't, so I'm presenting it here now.
>
> The program in question was required to read commands on standard
> input and at the same time both read and write to a subprocess (with
> reads/writes happening without synchronization). This was done by
> spawning a thread responsible for reading from the subprocess which
> forked the subprocess with pipes set up appropriately on the stdio
> file descriptors.
>
> The problem was that the forked SCSH subprocess seemed to block, and
> didn't get to execing the real program supposed to run in the
> subprocess.
>
> I now managed to narrow it down to move->fdes blocking, as
> demonstrated by the SCSH program below. The program first spawns a
> thread after which it tries to read from standard input. The spawned
> thread forks the process and the child calls to move->fdes to set up
> stdin, but here it blocks, and never outputs "not reached" as the
> intention is.
>
> A work-around to make the program work as intended is to add a
> (sleep 1) immediately after the spawn.
>
> #!/bin/sh
> exec scsh -o threads -e main -s "$0" "$@"
> !#
>
> (define (main args)
>   (spawn
>    (lambda ()
>      (receive
>       (rport wport) (pipe)
>       (fork
>        (lambda ()
>          (display "trying move->fdes") (newline)
>          (move->fdes rport 0)
>          (display "not reached") (newline))))))
>   ;; uncomment next line to make it work
>   ;(sleep 1)
>   (read-line))
>
> So, why doesn't the program work as intended? I can only speculate as
> I don't understand enough of what is going on. I'm looking for an
> explanation.
>
> I have a wild guess of what is happening though:
>
> The main thread spawns a new thread (which isn't scheduled for
> execution yet though) and then blocks on the read on standard input.
> Now the spawned thread is scheduled for execution and goes on and
> forks the SCSH process. The child SCSH process starts executing and
> calls move->fdes, where it blocks indefinitely.
>
> Why does it block there? I speculate that when the SCSH process
> forked, there was some kind of lock active on file descriptor 0 due to
> the other thread attempting to read stdin, which was carried over to
> the child process. But if this is so, the lock isn't relevant in the
> child, because there is no other thread in the child which is trying
> to read stdin. In that case, maybe this is a question of deadlock due
> to killing threads (during the fork) without giving them a chance to
> return allocated resources.
>
> What supports this theory is that by adding the (sleep 1) makes the
> program work as intended, explained by that the thread doing the fork
> then has a chance to fork before there is any lock on stdin. Also,
> allowing threads to continue in the child, by adding a #t argument to
> the fork call as shown below, makes the move->fdes call unblock after
> something has been written on stdin.
>
> ; (display "not reached") (newline)) #t))))
>
> What I'd like to know is if preventing this alleged deadlock is the
> responsibility of the user, or if SCSH could help here.
Your "wild guess" describes pretty precisely what's going on. The lock
in question is part of the port data structure and is used to
synchronize access to the port. Scsh does not release locks held by
other threads when forking, and if the fork happens during (READ-LINE),
a whole I/O operation is in progress at the time of the fork, which
cannot easily be canceled.

In cases where the point in time at which a thread is created matters,
I would suggest using a procedure like

  (define (spawn/wait thunk)
    (let ((placeholder (make-placeholder)))
      (spawn (lambda ()
               (placeholder-set! placeholder #f)
               (thunk)))
      (placeholder-value placeholder)))

to spawn the thread and wait for its creation.
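
As an illustration only, here is a sketch of how the same placeholder
idea might be applied to the program quoted above. Instead of waiting
merely for the thread to start, the main thread waits on a placeholder
that the spawned thread sets once FORK has returned in the parent, so
the read on stdin cannot begin before the fork has happened. The name
`forked` is made up for this sketch, and it assumes that the
placeholders structure is opened in addition to threads (for example
with -o placeholders). It is untested against any particular scsh
release.

```scheme
;; Sketch under assumptions: requires the threads and placeholders
;; structures; `forked` is a hypothetical name used only here.
(define (main args)
  (let ((forked (make-placeholder)))
    (spawn
     (lambda ()
       (receive (rport wport) (pipe)
         ;; Fork while the main thread is still waiting on the
         ;; placeholder, i.e. before it starts reading from stdin.
         (fork
          (lambda ()
            (display "trying move->fdes") (newline)
            (move->fdes rport 0)
            (display "reached") (newline)))
         ;; Parent side of the fork: let the main thread continue.
         (placeholder-set! forked #t))))
    ;; Block until the spawned thread reports that the fork is done.
    (placeholder-value forked)
    (read-line)))
```

The intent is the same as the (sleep 1) work-around, but the ordering
is made explicit rather than depending on timing.
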
A real solution for the example above is a lock-free I/O system as in
the recent versions of Scheme 48. It will take some time until we have
included this in scsh.
--
Martin