
6.7 Subroutines

Subroutines and methods are the basic building blocks of larger programs. Every subroutine call performs two fundamental actions: it stores the current location so that control can come back to it, and it transfers control to the subroutine. The bsr opcode does both. It pushes the address of the next instruction onto the control stack, and then branches to a label that marks the subroutine:

  print "in main\n"
  bsr _sub
  print "and back\n"
  end
_sub:
  print "in sub\n"
  ret

At the end of the subroutine, the ret instruction pops a location back off the control stack and goes there, returning control to the caller. The jsr opcode likewise pushes the current location onto the control stack and jumps to a subroutine. Just like the jump opcode, it takes an absolute address in an integer register, so the address has to be calculated first with the set_addr opcode:

  print "in main\n"
  set_addr I0, _sub
  jsr I0
  print "and back\n"
  end
_sub:
  print "in sub\n"
  ret
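
The push-then-branch behavior described above can be modeled outside Parrot. The following Python sketch is not Parrot code: the tuple-based instruction format and the run function are assumptions made for illustration, and only the bsr, ret, print, and end behavior described in the text is imitated.

```python
def run(program):
    """Run a toy instruction list and return the text it prints."""
    # Map each label name to its index in the program (hypothetical format).
    labels = {arg: i for i, (op, arg) in enumerate(program) if op == "label"}
    control_stack = []   # return addresses pushed by bsr, popped by ret
    output = []
    pc = 0
    while True:
        op, arg = program[pc]
        pc += 1                        # pc now names the next instruction
        if op == "print":
            output.append(arg)
        elif op == "bsr":
            control_stack.append(pc)   # remember where to come back to
            pc = labels[arg]           # transfer control to the subroutine
        elif op == "ret":
            pc = control_stack.pop()   # resume just after the call
        elif op == "end":
            return "".join(output)
        # "label" entries are markers only and do nothing when executed


program = [
    ("print", "in main\n"),
    ("bsr", "_sub"),
    ("print", "and back\n"),
    ("end", None),
    ("label", "_sub"),
    ("print", "in sub\n"),
    ("ret", None),
]
print(run(program), end="")
```

A jsr would differ only in where the target comes from: an address already held in a register rather than a label resolved by the branch.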

6.7.1 Calling Conventions

A bsr or jsr is fine for a simple subroutine call, but few subroutines are quite that simple. The biggest issues revolve around register usage. Parrot has 32 registers of each type, and the caller and the subroutine share the same set of registers. How does the subroutine keep from destroying the caller's values? More importantly, who is responsible for saving and restoring registers? Where are arguments for the subroutine stored? Where are the subroutine's return values stored? A number of different answers are possible. You've seen how many ways Parrot has of storing values. The critical point is that the caller and the called subroutine have to agree on all the answers.

6.7.1.1 Reserved registers

A very simple system would be to declare that the caller uses registers 0 through 15, and the subroutine uses 16 through 31. This works in a small program with light register usage. But what about a subroutine call from within another subroutine or a recursive call? The solution doesn't extend to a large scale.

6.7.1.2 Callee saves

Another possibility is to make the subroutine responsible for saving the caller's registers:

  set I0, 42
  save I0              # pass args on stack
  bsr _inc             # j = inc(i)
  restore I1           # get return value from stack
  print I1
  print "\n"
  end
_inc:
  saveall              # preserve all registers
  restore I0           # get argument
  inc I0               # do all the work
  save I0              # push return value
  restoreall           # restore caller's registers
  ret

This example stores arguments to the subroutine and return values from the subroutine on the user stack. The first statement in the _inc subroutine is a saveall to save all the caller's registers onto the backing stacks, and the last statement before the return restores them.

One advantage of this approach is that the subroutine can choose to save and restore only the register frames it actually uses, for a small speed gain. The example above could use pushi and popi instead of saveall and restoreall because it only uses integer registers. One disadvantage is that it doesn't allow optimization of tail calls, where the last statement of a recursive subroutine is the call to itself.
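
The interplay between the user stack and the register backing stack in the callee-saves example can be sketched in Python. This is a simplified model under stated assumptions: the Machine class and its method names are hypothetical, and only the integer register set is modeled, so saveall here behaves like the narrower integer-only save mentioned above.

```python
class Machine:
    """Toy model: 32 integer registers, a user stack, a backing stack."""

    def __init__(self):
        self.I = [0] * 32          # integer registers I0..I31
        self.user_stack = []       # used by save/restore
        self.backing_stack = []    # used by saveall/restoreall

    def save(self, n):
        self.user_stack.append(self.I[n])

    def restore(self, n):
        self.I[n] = self.user_stack.pop()

    def saveall(self):
        self.backing_stack.append(list(self.I))

    def restoreall(self):
        self.I = self.backing_stack.pop()


def inc(m):
    m.saveall()       # preserve the caller's registers
    m.restore(0)      # get argument from the user stack
    m.I[0] += 1       # do all the work
    m.save(0)         # push return value onto the user stack
    m.restoreall()    # caller's registers come back unchanged


def main():
    m = Machine()
    m.I[0] = 42
    m.save(0)         # pass argument on the user stack
    inc(m)
    m.restore(1)      # fetch return value into I1
    return m.I[0], m.I[1]


print(main())
```

Because the return value travels on the user stack, it survives the restoreall that puts the caller's I0 back to 42.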

6.7.1.3 Parrot calling conventions

Internal subroutines can use whatever calling convention serves them best. Externally visible subroutines and methods need stricter rules, since they might be called from a variety of contexts, even from multiple different high-level languages.

Under the Parrot calling conventions,[10] the caller is responsible for preserving its own registers. The first 11 arguments of each register type are passed in Parrot registers, as are several other pieces of information. Register usage for subroutine calls is listed in Table 6-4.

[10] These conventions are still open to changes, so you'll want to check for the latest details in Parrot Design Document 3 (pdd03), available at http://dev.perl.org/perl6/pdd/ and in docs/pdds/pdd03_calling_conventions.pod.

Table 6-4. Calling conventions

Register      Usage
P0            Subroutine object.
P1            Continuation if applicable.
P2            Object for a method call.
P3            Array with overflow parameters.
S0            Fully qualified subroutine name.
I0            True for prototyped parameters.
I1            Number of overflow arguments.
I3            Expected return type.
I5 ... I15    First 11 integer arguments.
N5 ... N15    First 11 float arguments.
S5 ... S15    First 11 string arguments.
P5 ... P15    First 11 PMC arguments.

If there are more than 11 arguments of one type for the subroutine, overflow parameters are passed in an array in P3. Subroutines without a prototype pass all their arguments on the user stack or in the overflow array.[11]

[11] Prototyped subroutines have a defined signature.
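
To make the register assignment concrete, here is a rough Python sketch of how a prototyped call might distribute its arguments according to Table 6-4. The place_arguments function and its (kind, value) argument format are hypothetical, and the ordering of mixed-type values inside the overflow array is an assumption; pdd03 remains the authority.

```python
FIRST_ARG_REG = 5    # arguments start in register 5 of each set
LAST_ARG_REG = 15    # and run through register 15 (11 registers)


def place_arguments(args):
    """Assign (kind, value) pairs to registers; kind is 'I', 'N', 'S' or 'P'."""
    registers = {}
    overflow = []
    next_free = {kind: FIRST_ARG_REG for kind in "INSP"}
    for kind, value in args:
        n = next_free[kind]
        if n <= LAST_ARG_REG:
            registers[f"{kind}{n}"] = value
            next_free[kind] = n + 1
        else:
            overflow.append(value)    # argument 12 and up of this type
    registers["I0"] = 1               # true: prototyped parameters
    registers["I1"] = len(overflow)   # number of overflow arguments
    registers["P3"] = overflow        # array with overflow parameters
    return registers


regs = place_arguments([("I", n) for n in range(12)] + [("S", "hi")])
print(regs["I5"], regs["I15"], regs["S5"], regs["P3"])
```

With twelve integer arguments, the first eleven land in I5 through I15 and the twelfth is the only entry in the overflow array.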

Return values and additional information about them are also passed in registers. The individual registers used on return are listed in Table 6-5.

Table 6-5. Return conventions

Register      Usage
I0            Registers on the stack.
I1            Number of integer return results.
I2            Number of string return results.
I3            Number of PMC return results.
I4            Number of float return results.
P3            Array with overflow return values.
I5 ... I15    First 11 integer return values.
N5 ... N15    First 11 float return values.
S5 ... S15    First 11 string return values.
P5 ... P15    First 11 PMC return values.

Overflow return values and return values from a subroutine without a prototype are passed in the overflow array, just like subroutine arguments.

The _inc subroutine from above can be rewritten as a prototyped subroutine:

  set I0, 42
  new P0, .Sub       # create a new Sub object
  set_addr I1, _inc  # get address of function
  set P0, I1         # and set it on the Sub object
  set I5, I0         # first integer argument
  set I0, 1          # prototype used
  saveall            # preserve environment
  invoke             # call function object in P0
  save I5            # save return value
  restoreall         # restore registers
  restore I1         # restore return value from stack
  print I1
  print "\n"
  end
_inc:
  inc I5             # do all the work
  ret

Instead of using a simple bsr, this set of conventions uses a subroutine object. There are several kinds of subroutine-like objects, but Sub is a class for PASM subroutines. The location of the subroutine is set in the Sub object by the absolute address of the subroutine's label.

Subroutine objects of all kinds can be called with the invoke opcode. With no arguments, it calls the subroutine in P0, which is the standard for the Parrot calling conventions. There is also an invoke Px instruction for calling objects held in a different register.

6.7.2 Native Call Interface

A special version of the Parrot calling conventions is used by the Native Call Interface (NCI) for calling subroutines with a known prototype in shared libraries. This is not really portable across all libraries, but it's worth a short example. This is the first of some tests in t/pmc/nci.t:

  loadlib P1, "libnci.so"       # get library object for a shared lib
  print "loaded\n"
  dlfunc P0, P1, "nci_dd", "dd" # obtain the function object
  print "dlfunced\n"
  set I0, 1                     # prototype used - unchecked
  set I1, 0                     # items on stack - unchecked
  set N5, 4.0                   # first argument
  saveall                       # preserve regs
  invoke                        # call nci_dd
  save N5                       # save return result
  restoreall                    # restore registers
  restore N5
  ne N5, 8.0, nok_1             # the test function returns 2*arg
  print "ok 1\n"
  end
nok_1:
  ...

This shows two new instructions: loadlib obtains a handle for a shared library, and dlfunc gets a function object from a loaded library (second argument) of a specified name (third argument) with a known function signature (fourth argument). The function signature is a string where the first character is the return value and the rest of the characters are the function parameters. The characters used in NCI function signatures are listed in Table 6-6.

Table 6-6. Function signature letters

Character    Register set    C type
v            -               void (no return value)
c            I               char
s            I               short
i            I               int
l            I               long
f            N               float
d            N               double
t            S               char *
p            P               void * (or other pointer)
I            -               Parrot_Interp *interpreter
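
Reading a signature string is mechanical: the first letter describes the return value and each remaining letter describes one parameter. The Python sketch below decodes a signature using the letters from Table 6-6. The describe_signature helper is hypothetical and is not part of Parrot; it only looks letters up in the table.

```python
# Letter -> (register set or None, C type), transcribed from Table 6-6.
SIGNATURE_LETTERS = {
    "v": (None, "void"),
    "c": ("I", "char"),
    "s": ("I", "short"),
    "i": ("I", "int"),
    "l": ("I", "long"),
    "f": ("N", "float"),
    "d": ("N", "double"),
    "t": ("S", "char *"),
    "p": ("P", "void *"),
    "I": (None, "Parrot_Interp *"),
}


def describe_signature(signature):
    """Split a signature into (return type, [parameter types])."""
    ret, *params = signature          # first character is the return value
    return SIGNATURE_LETTERS[ret], [SIGNATURE_LETTERS[c] for c in params]


print(describe_signature("dd"))
```

For the "dd" signature used with nci_dd, both the return value and the single parameter are C doubles carried in N registers, which matches the use of N5 in the test above.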

6.7.3 Closures

A closure is a subroutine that keeps values from the lexical scope where it was defined, even when it's called from an entirely different scope. The closure shown here is equivalent to this Perl 5 code snippet:

  #   sub foo {
  #       my ($n) = @_;
  #       sub {$n += shift}
  #   }
  #   my $closure = foo(10);
  #   print &$closure(3), "\n";
  #   print &$closure(20), "\n";

  # call _foo
  new P0, .Sub           # new subroutine object
  set_addr I3, _foo      # get address of _foo
  set P0, I3             # attach address
  new P5, .PerlInt       # define $n
  set P5, 10
  saveall                # caller save
  invoke                 # call foo
  save P5                # save return value
  restoreall             # restore registers
  restore P0             # get return value (the closure)

  # call _closure
  new P5, .PerlInt       # argument to closure
  set P5, 3
  saveall
  invoke                 # call closure(3)
  save P5                # return value
  restoreall
  restore P2             # print result
  print P2               # prints 13
  print "\n"

  # call _closure
  set P5, 20             # and again
  saveall
  invoke                 # call closure(20)
  save P5
  restoreall
  restore P2
  print P2               # prints 33
  print "\n"
  end

_foo:
  new_pad 0              # push a new pad
  store_lex 0, "n", P5   # store $n
  new P5, .Sub           # P5 has the lexical "n" in the pad
  set_addr I3, _closure  # because the Sub inherits the lex pad
  set P5, I3             # set address of function
  pop_pad                # cleanup
  ret                    # the Sub in P5 is the return value

_closure:
  find_lex P2, "n"       # invoking the Sub pushes the lexical pad
                         # of the closure on the pad stack
  add P2, P5             # n += shift
  set P5, P2             # set return value
  pop_pad                # on each call, the lex pad is there
  ret                    # so pop it at end and return

That's quite a lot of PASM code for such a little bit of Perl 5 code, but anonymous subroutines and closures hide a lot of magic under that simple interface. The core of this example is that when the new subroutine is created in _foo with:

  new P5, .Sub           # P5 has the lexical "n" in the pad

it inherits and stores the current lexical scratchpad—the topmost scratchpad on the pad stack at the time. Later, when _closure is invoked from the main body of code, the stored pad is automatically pushed onto the pad stack. So, all the lexical variables that were available when _closure was defined are available when it's called.
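
The capture-and-push behavior can be imitated in Python with an explicit pad stack. This is only a model: the Sub class, the pad_stack list, and the dictionary used as a scratchpad are assumptions made for illustration, chosen to mirror new_pad, store_lex, find_lex, and pop_pad in the listing.

```python
pad_stack = []   # stack of lexical scratchpads (plain dicts in this model)


class Sub:
    def __init__(self, body):
        self.body = body
        # Inherit the topmost pad at creation time, if there is one.
        self.pad = pad_stack[-1] if pad_stack else None

    def invoke(self, arg):
        if self.pad is not None:
            pad_stack.append(self.pad)   # stored pad is pushed automatically
        return self.body(arg)


def closure_body(arg):
    pad = pad_stack[-1]      # find_lex "n" looks in the pushed pad
    pad["n"] += arg          # n += shift
    result = pad["n"]
    pad_stack.pop()          # pop_pad before returning
    return result


def foo(n):
    pad_stack.append({"n": n})   # new_pad, then store_lex "n"
    sub = Sub(closure_body)      # the new Sub captures the current pad
    pad_stack.pop()              # pop_pad: cleanup
    return sub


closure = foo(10)
print(closure.invoke(3))    # 13
print(closure.invoke(20))   # 33
```

The pad outlives the call to foo because the Sub still holds a reference to it, which is why the second call sees the 13 left by the first.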

6.7.4 Coroutines

As we mentioned in the previous chapter, coroutines are subroutines that can suspend themselves and return control to the caller—and then pick up where they left off the next time they're called, as if they never left.

In PASM, coroutines are subroutine-like objects:

  new P0, .Coroutine

The Coroutine object has its own user stack, context stack, and pad stack. The pad stack is inherited from the caller. When the coroutine invokes itself, it returns to the caller. The next time it's invoked, it continues to execute where it returned:

  new_pad 0                # push a new lexical pad on stack
  new P0, .PerlInt         # save one variable in it
  set P0, 10
  store_lex -1, "var", P0

  new P0, .Coroutine       # make a new coroutine object
  set_addr I0, _cor
  set P0, I0               # set the address
  saveall                  # preserve environment
  invoke                   # invoke the coroutine
  restoreall
  print "back\n"
  saveall
  invoke                   # invoke coroutine again
  restoreall
  print "done\n"
  pop_pad
  end

_cor:
  find_lex P1, "var"       # inherited pad from caller
  print "in cor "
  print P1
  print "\n"
  inc P1                   # var++
  invoke                   # yield()
  print "again "
  branch _cor              # next invocation of the coroutine

This prints out the result:

in cor 10
back
again in cor 11
done

The invoke inside the coroutine is commonly referred to as "yield." The coroutine never ends. When it reaches the bottom, it branches back up to _cor and executes until it hits invoke again.
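
For readers more familiar with generators, the same control flow can be approximated in Python. This is an analogy rather than a description of how Parrot implements coroutines: the pad is modeled as a dictionary, and the output is collected in a list so the result can be inspected.

```python
def cor(pad, out):
    while True:                                  # the coroutine never ends
        out.append(f"in cor {pad['var']}\n")
        pad["var"] += 1                          # var++
        yield                                    # hand control back to the caller
        out.append("again ")                     # resumes here on the next call


def main():
    out = []
    pad = {"var": 10}        # lexical pad shared with the coroutine
    co = cor(pad, out)
    next(co)                 # first invocation
    out.append("back\n")
    next(co)                 # second invocation resumes after the yield
    out.append("done\n")
    return "".join(out)


print(main(), end="")
```

The second next call picks up immediately after the yield, which is why "again" appears before the second "in cor" line.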

6.7.5 Continuations

A continuation is a subroutine that gets a complete copy of the caller's context, including its own copy of the call stack. Invoking a continuation starts or restarts it at the entry point:

  new P1, .PerlInt
  set P1, 5

  new P0, .Continuation
  set_addr I0, _con
  set P0, I0
_con:
  print "in cont "
  print P1
  print "\n"
  dec P1
  unless P1, done
  invoke                        # P0
done:
  print "done\n"
  end

This prints:

in cont 5
in cont 4
in cont 3
in cont 2
in cont 1
done
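
Python has no first-class continuations, so the following sketch imitates only the control flow of the example: each invocation re-enters the code at its entry point until P1 reaches zero. The trampoline loop, the state dictionary, and the function names are all assumptions made for illustration; nothing here copies a context or a stack the way a real continuation does.

```python
def run():
    out = []
    state = {"P1": 5}

    def con():
        out.append(f"in cont {state['P1']}\n")
        state["P1"] -= 1              # dec P1
        if state["P1"] == 0:          # unless P1, done
            return None
        return con                    # invoke: restart at the entry point

    step = con
    while step is not None:           # trampoline drives the re-entries
        step = step()
    out.append("done\n")
    return "".join(out)


print(run(), end="")
```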

6.7.6 Evaluating a Code String

This isn't really a subroutine operation, but it does produce a code object that can be invoked. In this case, it's a bytecode segment object.

The first step is to get an assembler or compiler for the target language:

  compreg P1, "PASM1"

Within the Parrot interpreter the only language available is PASM1, which compiles a single, fully qualified PASM instruction to bytecode:[12]

[12] IMCC also accepts PASM for PASM source files, and PIR for PIR source files.

  compile P0, P1, "set_i_ic I0, 10"

This places a bytecode segment object into the destination register P0, which can then be invoked with invoke:

  compreg P1, "PASM1"                # get compiler
  set S1, "in eval\n"
  compile P0, P1, "print_s S1"
  invoke                             # eval code P0
  print "back again\n"
  end

Fully qualified opcode names include the types of their arguments in the name: i is an integer register, ic is an integer constant, s is a string register, sc is a string constant, n is a float register, nc is a float constant, and p is a PMC register.
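
The naming rule can be expressed as a small function. The Python sketch below derives a fully qualified name from a short opcode name and its operands, using only the seven suffixes listed above. The qualify helper and its operand-recognition patterns are hypothetical; the real assembler's rules may differ in details such as numeric formats.

```python
import re


def operand_suffix(operand):
    """Return the type suffix for one operand written as PASM source text."""
    for prefix, suffix in (("I", "i"), ("N", "n"), ("S", "s"), ("P", "p")):
        if re.fullmatch(prefix + r"\d+", operand):
            return suffix                         # a register
    if re.fullmatch(r'".*"', operand, re.DOTALL):
        return "sc"                               # string constant
    if re.fullmatch(r"[-+]?\d+", operand):
        return "ic"                               # integer constant
    if re.fullmatch(r"[-+]?\d+\.\d*", operand):
        return "nc"                               # float constant
    raise ValueError(f"unrecognized operand: {operand!r}")


def qualify(opcode, operands):
    """Build a fully qualified opcode name such as set_i_ic."""
    return "_".join([opcode] + [operand_suffix(o) for o in operands])


print(qualify("set", ["I0", "10"]))
print(qualify("print", ["S1"]))
```

The two calls reproduce the names used in the examples above, set_i_ic and print_s.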
