5.6 Advanced Features

Since the languages Parrot targets (like Perl and Ruby) have sophisticated concepts as core features, it's in Parrot's best interest to have core support for them. This section covers some (but not all) of these features.

5.6.1 Garbage Collection

It's expected that modern languages have garbage collection built in. The programmer shouldn't have to worry about explicitly cleaning up after dead variables, or even identifying them. For interpreted languages, this requires support from the interpreter engine, so Parrot provides that support.

Parrot has two separate allocation systems built into it. Each allocation system has its own garbage collection scheme. Parrot also has some strict rules over what can be referenced and from where. This allows it to have a more efficient garbage collection system.

The first allocation system is responsible for PMC and string structures. These are fixed-sized objects that Parrot allocates out of arenas, which are pools of identically sized things. Using arenas makes it easy for Parrot to find and track them, and speeds up the detection of dead objects.

Parrot's dead object detection system works by first running through all the arenas and marking all strings and PMCs as dead. It then runs through the stacks and registers, marking all strings and PMCs they reference as alive. Next, it iteratively runs through all the live PMCs and strings and marks everything they reference as alive. Finally, it sweeps through all the arenas looking for newly dead PMCs and strings, which it puts on the free list. At this point, any PMC that has a custom destruction routine, such as an object with a DESTROY method, has its destruction routine called. The dead object detector is triggered whenever Parrot runs out of free objects, and can be explicitly triggered by running code. Often a language compiler will force a dead object sweep when leaving a block or subroutine.

Parrot's memory allocation system is used to allocate space for the contents of strings and PMCs. Allocations don't have a fixed size; they come from pools of memory that Parrot maintains. Whenever Parrot runs out of memory in its memory pools, it makes a compacting run—squeezing out unused sections from the pools. When it's done, one end of each pool is entirely actively used memory, and the other end is one single chunk of free memory. This makes allocating memory from the pools faster, as there's no need to walk a free list looking for a segment of memory large enough to satisfy the request for memory. It also makes more efficient use of memory, as there's less overhead than in a traditional memory allocation system.

Splitting memory pool compaction from dead object detection has a nice performance benefit for Perl and languages like it. For most Perl programs, the interpreter allocates and reallocates far more memory for string and variable contents than it does actual string and variable structures. The structures are reused over and over as their contents change. With a traditional single-collector system, each time the interpreter runs out of memory it has to do a full scan for dead objects and compact the pools after. With a split system, Parrot can just sweep through the variables it thinks are live and compact their contents. This does mean that Parrot will sometimes move data for variables and strings that are really dead because it hasn't found that out yet. That expense is normally much less than the expense of doing a full tracing run to find out which variables are actually dead.

Parrot's allocation and collection systems have some compromises that make interfacing with low-level code easier. The structure that describes a PMC or string is guaranteed not to move over the lifetime of the string or variable. This allows C code to store pointers to variables in internal structures without worrying that what they're referencing may move. It also means that the garbage collection system doesn't have to worry about updating pointers that C code might hold, which it would have to do if PMC or string structures could move.

5.6.2 Signature-Based Dispatching

Signature-based dispatching, or multimethod dispatching, is a powerful technique that uses the parameters of a function or method call to help decide at runtime what function or method Parrot should call. This is one of the new features being built into Perl 6. It allows you to have two or more subroutines or methods with the same name that differ only in the types of their arguments. At runtime Parrot looks at the parameters for a subroutine or method call and figures out what the best subroutine or method to call is.

This allows for some very powerful behaviors, as well as giving a good opportunity for optimization. For example, take the following three lines of code:

$a = 1 + 2;
$b = 1 + 2.0;
$c = 1 + "2";

The result is the same in each case—all the variables end up with a value of 3. But each of those three statements needs to execute different code.

In the first case, it's plain integer addition, which is the fastest possible way. The math itself probably takes a single CPU cycle. The second case adds an integer to a floating-point value, something that requires different code than adding two integers does. In the third case, we need to turn one of the arguments from a string into a number.

In this example we can easily figure out at compile time which kind of addition code to use. If the code was instead something like:

$a = $b + $c;

we couldn't know at compile time what kind of addition was needed. We have to check at runtime, and do the right kind of math based on the types of the two arguments.

Multimethod dispatch is also very useful for normal subroutines and methods. You've probably found yourself writing code that checks the types of its arguments and does different things based on what types are passed in, something like:

sub foo {
    my ($self, $arg) = @_;
    if ($arg->isa("Thing")) {
        #...
    } elsif ($arg->isa("OtherThing")) {
        #...
    } elsif ($arg->isa("Doodad")) {
        #...
    } else {
        #...
    }
}

This is a manual form of multimethod dispatch. If you've done it, you no doubt found that once you get past three or four different checks, the code becomes very unwieldy and hard to maintain. Using multimethods, you do the same thing with three or four methods that you don't have to check at all, as the system does it for you.

Finally, multimethod dispatch provides a way to carefully inject code into another package, which is especially useful when dealing with overloaded operators. It's not at all uncommon to have a data type that behaves differently depending on whether it's on the left or right side of an operator. Normally, the data on the left-hand side of an operator controls the operation. If you're adding a new data type such as a matrix, the code for the data type on the left-hand side of the operator won't know about your new type, so it can't perform the operation properly. Using multimethod dispatch, and allowing code to add in new variants of a function or method at runtime, makes it much more likely that your program will get the correct answer.

5.6.3 Continuations

Continuations are possibly the most powerful high-level flow control construct. Originating with lambda calculus, and built into Lisp over thirty years ago, continuations can be thought of as a closure for control flow. They not only capture their lexical scope, which gets restored when they're invoked, but also capture their call stack, so when they're invoked it's as if you never left the spot where they were created.

Continuations are phenomenally powerful, but with great power comes great confusion. Indeed, some languages specifically designed to be as obfuscated as possible include continuations just because they're so mind-warping. Still, you can duplicate any control structure you can think of with continuations, and there are times when their power is necessary to do some advanced things. Because of this, and because Ruby provides them as core functionality, Parrot has support for them.

Interestingly, this did trigger a number of other useful features. Efficient continuation support requires a framed, segmented call stack and copy-on-write semantics for interpreter control structures. These are very useful for threads, efficient string usage, and coroutines, so the feature was well worth it. It also allows Parrot to provide all the semantics necessary for an interoperable Scheme implementation, which is a good thing as well.

5.6.4 Coroutines

A coroutine is a subroutine or method that can suspend itself partway through, then later pick up where it left off. This isn't quite the same thing as a continuation, though it may seem so at first. Coroutines are often used to implement iterators and generators, as well as threads on systems that don't have native threading support. Since they are so useful, and since Perl 6 and Python provide them either directly or as generators, Parrot has support for them built in.

Coroutines present some interesting technical challenges. Calling into an existing coroutine requires reestablishing not only the lexical state and potentially the hypothetical state of variables, but also the control state for just the routine. Coroutines can be implemented in terms of continuations if need be, but that requires using a full continuation-passing function call system, something we chose not to do.