Assig. 4 - locks and zombies

  > Alex Newton writes:
  >
  > let's say that more processes (brokers) want to READ ONLY some data from
  > shared memory. Do we still need a semaphore on that shared memory to
  > control the access to it?

I think we already talked about this in class weeks ago.  

The answer is: generally speaking, no, you don't, but only if strictly READ
ONLY behaviour is actually guaranteed.  In particular, a process must not
decide to modify a shared variable based on a value (of that variable, or of
other shared variables) that it read before taking the lock: the value must
be read, or read again, after locking.  Consider the case of two concurrent
processes P1 and P2, sharing a variable A, which is to be decreased if
positive, and left alone if it's zero. Here is what might happen:

                start 
                A=1;
        P1 reads A (without locking it using a semaphore), 
           and decides to decrease it;
        P1 is blocked before decreasing it;
                      P2 reads A (without locking it using a semaphore),  
                         and decides to decrease it;
                      P2 locks A using a semaphore;
                      P2 decreases A;
                      P2 unlocks A;
                      P2 is blocked;
        P1 is resumed;
        P1 locks A using a semaphore;
        P1 decreases A;
        P1 unlocks A;
        
Now A equals -1, which was not supposed to happen.
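
The fix for the trace above is to do the test and the update while holding
the lock. Here is a minimal sketch, assuming System V semaphores on Linux
(where the caller has to define union semun). The names lock_create, lock_op
and decrement_if_positive are made up for this illustration, and in the
assignment A would live in a shared memory segment rather than in an
ordinary int.

```c
#include <sys/types.h>
#include <sys/ipc.h>
#include <sys/sem.h>

/* Assumption: Linux, where the caller must define union semun (semctl(2)). */
union semun {
    int val;
    struct semid_ds *buf;
    unsigned short *array;
};

/* Create a private semaphore initialised to 1 (unlocked); -1 on error. */
int lock_create(void)
{
    union semun arg;
    int id = semget(IPC_PRIVATE, 1, 0600);

    if (id == -1)
        return -1;
    arg.val = 1;
    if (semctl(id, 0, SETVAL, arg) == -1) {
        semctl(id, 0, IPC_RMID);
        return -1;
    }
    return id;
}

/* Add delta to the semaphore: -1 locks (may block), +1 unlocks. */
static int lock_op(int id, short delta)
{
    struct sembuf op;

    op.sem_num = 0;
    op.sem_op = delta;
    op.sem_flg = SEM_UNDO;  /* release the lock if the holder dies */
    return semop(id, &op, 1);
}

/* Decrease *a if it is positive.
 * Returns 1 if decreased, 0 if left alone, -1 if locking failed. */
int decrement_if_positive(int id, int *a)
{
    int changed = 0;

    if (lock_op(id, -1) == -1)
        return -1;
    if (*a > 0) {           /* A is read only after locking */
        (*a)--;
        changed = 1;
    }
    lock_op(id, +1);
    return changed;
}
```

With A=1, the first caller decreases A to 0 and the second one finds A equal
to 0 and leaves it alone, whatever the interleaving.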
        
        
  > I am trying to test some of my functions and the system very often says 
  > "no more processes". Even now (it's almost 1 a.m.) the same thing happens.
  > Can one do something about that?

Probably you (or some of your fellow users) are into horror movies, of the
"Zombie Night" variety ;-)

Remember that:
         
        If child processes exit before their parents, the system must save
        an entry for them in the process table, because the parents may want
        to wait(2) for them in the future (and thus learn about the children's
        exit status). These entries are named "zombie" processes because the
        processes are dead indeed (their executable images are not in memory
        anymore), but their semi-rotten corpses still hold slots of the
        process table.  Unfortunately the process table has a finite size, so
        fork(2)s of new baby processes will fail as soon as the zombies take
        it over.  The solution: always wait for your children. If you can't
        stop to wait, then periodically call waitpid(2) with the WNOHANG
        option. Or use the fact that SIGCHLD is delivered to the parent when
        a child exits, and catch it in a handler that just waits, like:
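                #include <errno.h>
                #include <signal.h>
                #include <sys/wait.h>
                #include <unistd.h>

                void sig_chld(int sig) {
                    int saved_errno = errno;  /* waitpid may change errno */
                    /* Reap every child that has already exited */
                    while (waitpid(-1, NULL, WNOHANG) > 0)
                        ;
                    /* Reinstall handler after reaping; signal returns
                       SIG_ERR (not -1) on failure */
                    if (signal(SIGCHLD, sig_chld) == SIG_ERR)
                        _exit(1);  /* perror, exit: not async-signal-safe */
                    errno = saved_errno;
                }
        The loop is there because signals are not queued: if two children die
        at about the same time, only one SIGCHLD may be delivered, and a
        single wait would leave a zombie behind.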

        On the other hand, when a parent exits before its children, these are
        adopted by process 1 (init), which automatically waits for them, thus
        preventing the creation of zombies. The parent's own entry in the
        process table is deleted as soon as its own parent has waited for it.
  
 > Robert Sim writes:
 > 
 > As a postscript to your answer to someone's question about the error
 > "No more processes", it might be helpful to note that the same error
 > is returned when a message queue reaches its capacity.  If I am not
 > mistaken, this amounts to 40 messages in the queue at a time, which is
 > a limit that could easily be exceeded with 10 processes simultaneously
 > using the same queue.

Very good point - errno is set to EAGAIN in both cases (for msgsnd(2), when
the queue is full and IPC_NOWAIT was requested). This shows that perror(3)
does not always give informative messages. When in doubt, check the value of
errno directly, and of course the manpage of the syscall that fails.
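
One way to follow this advice is to report the name of the failing call and
the numeric errno together, since the same errno means different things for
different calls. The helper below is a sketch only: format_failure is a
made-up name, and the hints simply restate the two cases discussed in this
message.

```c
#include <errno.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: describe a failed call in buf.
 * call is the name of the syscall, err the errno value it left.
 * Returns what snprintf returns. */
int format_failure(char *buf, size_t len, const char *call, int err)
{
    const char *hint = "";

    if (err == EAGAIN) {
        if (strcmp(call, "fork") == 0)
            hint = " (process limit reached; check for zombies)";
        else if (strcmp(call, "msgsnd") == 0)
            hint = " (message queue full)";
    }
    return snprintf(buf, len, "%s failed: errno=%d, %s%s",
                    call, err, strerror(err), hint);
}
```

A caller would save errno right after the failing call, for example
"int e = errno;", and then pass e to format_failure before printing buf.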



Franco Callari