** Readings: Chapter 8 -- topics on critical sections and deadlock

Context Switches on the Sparc Processor

This document describes those portions of the Sparc processor and its operation needed to understand the context switching code given in the second assignment. Included is a description of the Sparc registers, register windows and subroutine linkage. Note that because the Sparc processor is relatively complex, this description has been simplified somewhat.

PC, Stack Pointer, and Frame Pointer

The Program Counter (PC) points to the instruction currently being executed. Except on branches, subroutine calls and subroutine returns, the PC is incremented by 4, the size of an instruction, whenever an instruction is executed, so that it points to the next instruction.

The Stack Pointer points to the top of the stack, and changes continuously during the execution of a program. The stack is used by subroutines to store dynamic local data (data that need to be valid only during the execution of the subroutine). It is thus typically used to store variables that are local to the subroutine, the incoming parameters, and the contents of (some of the) registers when other subroutines are called (so that they can use the registers). The size of the stack is increased on each subroutine call by decrementing the Stack Pointer -- recall that the stack grows downwards. Later, on subroutine return, the size of the stack is decreased by incrementing the Stack Pointer appropriately.

The Frame Pointer is always equal to the value the stack pointer had when the current subroutine was called, before it was decremented for the current subroutine. Because the stack pointer is constantly changing, the data stored on the stack for the current subroutine are typically dereferenced via the Frame Pointer, since the Frame Pointer stays constant during the execution of the subroutine.
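The relationship between the two pointers can be sketched in C. This is only a model under stated assumptions: addresses are plain integers, and the names frame_regs, enter_subroutine, leave_subroutine and local_address are hypothetical, invented for this illustration. Nothing here touches a real stack.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical model of the two pointers; addresses are plain integers. */
typedef struct {
    uint32_t sp;  /* stack pointer: top of stack, moves as the stack grows */
    uint32_t fp;  /* frame pointer: value sp had when the subroutine was entered */
} frame_regs;

/* Enter a subroutine that needs frame_bytes of stack space.
 * Returns the caller's pointers so they can be put back on return. */
static frame_regs enter_subroutine(frame_regs *r, uint32_t frame_bytes)
{
    frame_regs caller = *r;
    r->fp = r->sp;                /* old stack pointer becomes the frame pointer */
    r->sp = r->sp - frame_bytes;  /* the stack grows downwards */
    return caller;
}

/* Leave the subroutine: the stack shrinks back and the caller's frame
 * pointer is reinstated. */
static void leave_subroutine(frame_regs *r, frame_regs caller)
{
    assert(r->fp == caller.sp);   /* fp stayed constant during the subroutine */
    *r = caller;
}

/* Data in the current frame is addressed relative to fp, which does not
 * move while the subroutine runs. */
static uint32_t local_address(const frame_regs *r, uint32_t offset_below_fp)
{
    return r->fp - offset_below_fp;
}
```

For example, entering a subroutine with a 112-byte frame while sp is 0x1000 leaves fp at 0x1000 and sp at 0x1000 - 112; a local stored 4 bytes below the frame pointer sits at 0x0ffc no matter how sp moves afterwards.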

Registers

At any point in execution, a Sparc process has access to 32 registers. They are divided into four groups:
%g0..%g7:
global registers that have meaning to an entire program and are accessible from any subroutine.
%i0..%i7:
in registers that contain the incoming parameters to the subroutine in execution.
%l0..%l7:
local registers that are used for local subroutine and temporary variables.
%o0..%o7:
out registers that are used for temporaries and for passing arguments to called subroutines, as well as for holding the values those subroutines return.
Some of these registers are used in a special way:
%g0:
is always zero.
%o6:
contains the stack pointer
%o7:
contains the address of the most recent call instruction made by this subroutine; the callee sees this value in %i7 after its save
%i6:
contains the frame pointer
%i7:
contains the subroutine return address
%fp and %sp are aliases for %i6 and %o6, respectively.
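As a small illustration of the four groups and the special registers, the sketch below classifies a register number from 0 to 31. It assumes the Sparc numbering in which registers 0..7 are the globals, 8..15 the outs, 16..23 the locals and 24..31 the ins; the type and function names (reg_role, reg_info, describe_register) are hypothetical.

```c
#include <assert.h>

/* Special uses listed above; ROLE_NONE marks an ordinary register. */
typedef enum {
    ROLE_NONE,
    ROLE_ALWAYS_ZERO,     /* %g0 */
    ROLE_STACK_POINTER,   /* %o6, alias %sp */
    ROLE_CALL_ADDRESS,    /* %o7, written by the call instruction */
    ROLE_FRAME_POINTER,   /* %i6, alias %fp */
    ROLE_RETURN_ADDRESS   /* %i7, read by ret */
} reg_role;

typedef struct {
    char group;      /* 'g', 'o', 'l' or 'i' */
    unsigned index;  /* 0..7 within the group */
    reg_role role;
} reg_info;

/* Classify register number r (0..31), assuming the numbering g, o, l, i. */
static reg_info describe_register(unsigned r)
{
    static const char groups[4] = { 'g', 'o', 'l', 'i' };
    reg_info info;

    assert(r < 32);
    info.group = groups[r / 8];
    info.index = r % 8;
    info.role = ROLE_NONE;

    if (info.group == 'g' && info.index == 0) info.role = ROLE_ALWAYS_ZERO;
    if (info.group == 'o' && info.index == 6) info.role = ROLE_STACK_POINTER;
    if (info.group == 'o' && info.index == 7) info.role = ROLE_CALL_ADDRESS;
    if (info.group == 'i' && info.index == 6) info.role = ROLE_FRAME_POINTER;
    if (info.group == 'i' && info.index == 7) info.role = ROLE_RETURN_ADDRESS;
    return info;
}
```

With this numbering, register 14 is %o6 (the stack pointer) and register 30 is %i6 (the frame pointer).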


Subroutine Linkage

To call a subroutine, the "call <immediate>" Sparc instruction is typically used. This instruction has two effects:
  1. it moves the value of %pc into %o7, and
  2. changes %pc to point to the first instruction of the subroutine.
Because of the pipelined nature of the Sparc processor, the instruction immediately after the call instruction is still executed before the first instruction of the subroutine. As a result, the instruction following a call is typically a nop (in non-optimized code), which has no effect.

On subroutine entry, the callee sees the same values in the registers as were seen by the caller (except for %o7). The save instruction is used to remap the registers as follows: the in and local registers are (logically) saved to memory, and the out registers are copied to the in registers. After the remapping, the out and local registers can be used freely by the subroutine. The global registers remain unchanged. Note that save causes the current value of the stack pointer to become the new frame pointer.

After the registers are remapped as described, the save instruction behaves as an add instruction. The actual syntax used is: "save s_reg,ival,t_reg", which adds ival to the contents of the source register, s_reg, and stores the sum in the target register, t_reg. The only peculiarity is that s_reg refers to the register before the remapping of the registers, while t_reg refers to the register after the remapping. Most often, the save instruction is used to adjust the value of the stack pointer as needed: "save %sp,-112,%sp", which grows the stack by 112 bytes. (Recall that the stack grows downwards.) The 112 bytes are enough to save the registers and other data in that area of the stack. (By the way, STACK_ADJUST is needed in part to provide the space for saving the registers, should the OS decide to save them.)

Just before returning from a subroutine, the data that were in the registers on subroutine entry are restored by executing the restore instruction: the in registers are copied to the out registers, and the original in and local registers are restored from the stack. Again, the global registers remain unchanged. Note that restore causes the frame pointer to become the new stack pointer (thus automagically shrinking the size of the stack to what it was before the subroutine was called). Also, restore restores the original frame pointer.
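The logical effect of save and restore described above can be sketched in C. This is a model only, not how the hardware is built: the array saved stands in for memory, model_save handles only the common form "save %sp,ival,%sp", and all names (cpu_model, model_save, model_restore) are hypothetical.

```c
#include <assert.h>
#include <string.h>

enum { GROUP = 8, MAX_DEPTH = 16 };

/* Hypothetical logical model: one visible set of in/local/out registers,
 * plus an array that stands in for the memory the registers are saved to. */
typedef struct {
    unsigned in[GROUP], local[GROUP], out[GROUP];
    unsigned saved[MAX_DEPTH][2 * GROUP];
    int depth;
} cpu_model;

/* Models "save %sp,ival,%sp". */
static void model_save(cpu_model *c, int ival)
{
    unsigned sum;

    assert(c->depth < MAX_DEPTH);
    sum = c->out[6] + (unsigned)ival;        /* source %sp is read BEFORE the remap */

    /* the in and local registers are (logically) saved to memory */
    memcpy(c->saved[c->depth], c->in, sizeof c->in);
    memcpy(c->saved[c->depth] + GROUP, c->local, sizeof c->local);
    c->depth++;

    memcpy(c->in, c->out, sizeof c->in);     /* outs become ins: old %sp is now %fp */
    c->out[6] = sum;                         /* target %sp is written AFTER the remap */
    /* local[] and the other out[] entries are now free for the callee;
     * the model simply leaves their old contents in place. */
}

/* Models "restore" with no operands. */
static void model_restore(cpu_model *c)
{
    assert(c->depth > 0);
    memcpy(c->out, c->in, sizeof c->out);    /* ins go back to outs: %fp becomes %sp */
    c->depth--;
    memcpy(c->in, c->saved[c->depth], sizeof c->in);
    memcpy(c->local, c->saved[c->depth] + GROUP, sizeof c->local);
}
```

Note how a value the callee leaves in its %i0 shows up in the caller's %o0 after the restore, while the caller's local registers come back untouched.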

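Finally, the ret instruction causes %pc to be set to the value %i7 + 8. Since %i7 holds the address of the call instruction, %i7 + 8 skips both the call and the instruction that follows it (which was already executed), so execution resumes at the second instruction after the call.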

To illustrate this sequence of calling a subroutine, consider the following C program, bla.c:

        main()
          {
              bla() ;
          }

        bla()
          {
          }
If this program is compiled without any optimizations and with the "-S" flag, i.e. "gcc -S bla.c", then the file bla.s is created containing the following assembly program:
    main:
        save %sp,-112,%sp
        call bla,0
        nop
        ret
        restore


    bla:
        save %sp,-112,%sp

        ret
        restore
Note that because of the pipelining characteristics of the Sparc processor, the restore instruction still gets executed (while the ret instruction is being executed), even though it appears one instruction after the ret instruction.
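To see the order in which the instructions of bla.s actually execute, here is a small C sketch that steps through the program. It makes several assumptions: the instructions are placed at the made-up addresses 0 through 28, the delayed transfer of control is modelled with a second counter npc that holds the address of the instruction after the current one, and the array slot stands in for the %o7/%i7 registers of each nesting level. All names are hypothetical.

```c
#include <assert.h>

typedef enum { OP_SAVE, OP_RESTORE, OP_CALL, OP_RET, OP_NOP } opcode;

typedef struct {
    opcode op;
    unsigned target;   /* used by OP_CALL only */
} instruction;

enum { MAX_DEPTH = 8, HALT = 32 };

/* bla.s laid out at hypothetical addresses 0, 4, ..., 28 */
static const instruction program[] = {
    { OP_SAVE, 0 },     /*  0: main: save %sp,-112,%sp     */
    { OP_CALL, 20 },    /*  4:       call bla              */
    { OP_NOP, 0 },      /*  8:       nop      (after call) */
    { OP_RET, 0 },      /* 12:       ret                   */
    { OP_RESTORE, 0 },  /* 16:       restore  (after ret)  */
    { OP_SAVE, 0 },     /* 20: bla:  save %sp,-112,%sp     */
    { OP_RET, 0 },      /* 24:       ret                   */
    { OP_RESTORE, 0 },  /* 28:       restore  (after ret)  */
};

/* Runs the program and records the address of each executed instruction.
 * slot[d] plays the role of %i7 at nesting depth d, which is the same
 * register as %o7 at depth d - 1. Returns the number of instructions run. */
static int run_trace(unsigned *trace, int max)
{
    unsigned slot[MAX_DEPTH + 1] = { 0 };
    unsigned pc = 0, npc = 4;
    int depth = 0, n = 0;

    slot[1] = HALT - 8;  /* pretend the startup code called main from HALT - 8 */

    while (pc != HALT && n < max) {
        const instruction *ins;
        unsigned next_npc = npc + 4;

        assert(pc < HALT && depth >= 0 && depth < MAX_DEPTH);
        ins = &program[pc / 4];
        switch (ins->op) {
        case OP_CALL:    slot[depth + 1] = pc; next_npc = ins->target; break;
        case OP_RET:     next_npc = slot[depth] + 8;                   break;
        case OP_SAVE:    depth++;                                      break;
        case OP_RESTORE: depth--;                                      break;
        case OP_NOP:                                                   break;
        }
        trace[n++] = pc;
        pc = npc;        /* the instruction after a call or ret still runs */
        npc = next_npc;  /* only then does control reach the new target */
    }
    return n;
}
```

In this model the instructions execute in the address order 0, 4, 8, 20, 24, 28, 12, 16: the nop at 8 runs before bla is entered, and each restore runs after its ret but before control arrives back in the caller.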


Register Sets and Register Windows

As an optimization, the save and restore instructions do not always copy the register values to and from memory. Instead, the processor may use an internal cache in the form of a large number of registers, so that the values only have to be copied between registers in the common case. (Recall that in a modern computer, a memory access can take as much as 100 times longer than a register access.)

For this purpose, the Sparc processor provides for a register file with a mapping register that indicates which registers are currently active. For example, the processor may provide a register file with 128 registers, divided into 8 register sets of 16 registers each: 8 in registers and 8 local registers. (The actual number of register sets is implementation dependent.)

           ------------  
           | in-regs  |   \
           |----------|    >  register set 1
           | loc-regs |   /
           |==========|
           | in-regs  |   \
           |----------|    >  register set 2
           | loc-regs |   /
           |==========|
           | in-regs  |   \
           |----------|    >  register set 3
           | loc-regs |   /
           |==========|
           | in-regs  |   \
           |----------|    >  register set 4
           | loc-regs |   /
           |==========|
           | ......   |
           :          :
At any given time, a program can access only a register window, which consists of the in and local registers of the currently active register set together with the in registers of the next register set. The in registers of the next register set are referred to as the out registers.
                    ------------     window i         window i+1
                    | in-regs  |
                    |----------|
                    | loc-regs |
               _    |==========|
              /     | in-regs  |  <- in regs
             /      |----------|
  window i  <       | loc-regs |  <- local regs     
             \    _ |==========|
              \_ /  | in-regs  |  <- out regs      <- in regs
                /   |----------| 
   window i+1  <    | loc-regs |                   <- local regs
                \   |==========|
                 \_ | in-regs  |                   <- out regs
                    |----------|
                    | loc-regs |
                    |==========|
                    | ......   |
                    :          :
With a register file and register windows, the save instruction in the default case then just shifts the register window by one register set, and similarly, the restore instruction shifts the register window back by one register set, eliminating the need to save the registers to memory.
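The overlap between windows can be sketched in C as follows. The model follows the layout of the diagrams above (8 sets of 16 registers, with save moving to the next set); the direction in which real hardware moves its window pointer may differ, window overflow is deliberately not modelled, and %g0 is not forced to zero. The names window_model, reg, window_save and window_restore are hypothetical.

```c
#include <assert.h>

enum { NSETS = 8, SET_SIZE = 16, NREGS = NSETS * SET_SIZE };

/* Hypothetical model of the register file from the diagrams above. */
typedef struct {
    unsigned file[NREGS];  /* in and local registers of all register sets */
    unsigned globals[8];
    int current;           /* index of the currently active register set */
} window_model;

/* Locate register <group><n> in the current window. The out registers are
 * simply the in registers of the next register set. */
static unsigned *reg(window_model *w, char group, int n)
{
    int base = w->current * SET_SIZE;
    int next = ((w->current + 1) % NSETS) * SET_SIZE;

    assert(n >= 0 && n < 8);
    switch (group) {
    case 'g': return &w->globals[n];
    case 'i': return &w->file[base + n];
    case 'l': return &w->file[base + 8 + n];
    case 'o': return &w->file[next + n];
    }
    assert(0 && "unknown register group");
    return 0;
}

/* save: shift the window by one register set; nothing is copied. */
static void window_save(window_model *w)
{
    w->current = (w->current + 1) % NSETS;
}

/* restore: shift the window back by one register set. */
static void window_restore(window_model *w)
{
    w->current = (w->current + NSETS - 1) % NSETS;
}
```

A value written to %o0 before window_save is read as %i0 afterwards without any copying, because both names refer to the same slot in the register file; the caller's local registers are simply out of view until window_restore.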

Registers need to be saved to or retrieved from memory only when the window shifts past either end of the register file. When that happens, an exception occurs, and the operating system either writes out the values of the register sets to the area pointed to by the frame pointer (conveniently located in %i6 of each register set) or reads a register set back in from the stack, depending on the situation. A user-level program can also explicitly ask the operating system to save the register sets to the stack by issuing the appropriate trap instruction:

        ta 3
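When memory traffic actually occurs can be illustrated with a simple counting model in C. It is a rough sketch under several assumptions: every register set in the file can hold a frame, the operating system moves exactly one register set per exception, and the explicit trap writes out every set except the current one. Real implementations differ in these details, and the names spill_model, on_save, on_restore and flush_windows are hypothetical.

```c
#include <assert.h>

/* Hypothetical bookkeeping for where each frame's registers live. */
typedef struct {
    int nsets;      /* register sets in the register file */
    int resident;   /* frames whose registers are in the register file */
    int in_memory;  /* frames whose registers were written to the stack */
    int spills;     /* register sets written to memory so far */
    int fills;      /* register sets read back from memory so far */
} spill_model;

/* save: if the register file is full, an exception lets the operating
 * system write the oldest register set to the stack first. */
static void on_save(spill_model *m)
{
    if (m->resident == m->nsets) {
        m->resident--;
        m->in_memory++;
        m->spills++;
    }
    m->resident++;
}

/* restore: if the frame being returned to is no longer in the register
 * file, an exception lets the operating system read it back in. */
static void on_restore(spill_model *m)
{
    assert(m->resident > 0);
    m->resident--;
    if (m->resident == 0 && m->in_memory > 0) {
        m->in_memory--;
        m->resident++;
        m->fills++;
    }
}

/* Explicit request (the trap above): write out every set but the current one. */
static void flush_windows(spill_model *m)
{
    while (m->resident > 1) {
        m->resident--;
        m->in_memory++;
        m->spills++;
    }
}
```

With 8 register sets and one frame already active, the first 7 nested calls cause no memory traffic at all; only the calls beyond that each force one register set out to the stack.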