5.6 Advanced Features
Since the languages Parrot targets (like Perl and Ruby) have
sophisticated concepts as core features, it's in
Parrot's best interest to have core support for
them. This section covers some (but not all) of these features.
5.6.1 Garbage Collection
It's expected that modern languages have
garbage
collection built in. The programmer shouldn't have
to worry about explicitly cleaning up after dead variables, or even
identifying them. For interpreted languages, this requires support
from the interpreter engine, so Parrot provides that support.
Parrot has two separate allocation systems built into it. Each
allocation system has its own garbage collection scheme. Parrot also
has some strict rules over what can be referenced and from where.
This allows it to have a more efficient garbage collection system.
The first allocation system is responsible for
PMC and string structures. These are
fixed-sized objects that Parrot allocates out of arenas, which are
pools of identically sized things. Using arenas makes it easy for
Parrot to find and track them, and speeds up the detection of dead
objects.
Parrot's dead object
detection system works by first running through all the arenas and
marking all strings and PMCs as dead. It then runs through the stacks
and registers, marking all strings and PMCs they reference as alive.
Next, it iteratively runs through all the live PMCs and strings and
marks everything they reference as alive. Finally, it sweeps through
all the arenas looking for newly dead PMCs and strings, which it puts
on the free list. At this point, any PMC that has a custom
destruction routine, such as an object with a
DESTROY method, has its destruction routine
called. The dead object detector is triggered whenever Parrot runs
out of free objects, and can be explicitly triggered by running code.
Often a language compiler will force a dead object sweep when leaving
a block or subroutine.
Parrot's memory allocation system is used to
allocate space for the contents of strings and PMCs. Allocations
don't have a fixed size; they come from pools of
memory that Parrot maintains. Whenever Parrot runs out of memory in
its memory pools, it makes a compacting run—squeezing out
unused sections from the pools. When it's done, one
end of each pool is entirely actively used memory, and the other end
is one single chunk of free memory. This makes allocating memory from
the pools faster, as there's no need to walk a free
list looking for a segment of memory large enough to satisfy the
request for memory. It also makes more efficient use of memory, as
there's less overhead than in a traditional memory
allocation system.
Splitting memory pool compaction from dead object detection has a
nice performance benefit for Perl and languages like it. For most
Perl programs, the interpreter allocates and reallocates far more
memory for string and variable contents than it does actual string
and variable structures. The structures are reused over and over as
their contents change. With a traditional single-collector system,
each time the interpreter runs out of memory it has to do a full scan
for dead objects and compact the pools after. With a split system,
Parrot can just sweep through the variables it thinks are live and
compact their contents. This does mean that Parrot will sometimes
move data for variables and strings that are really dead because it
hasn't found that out yet. That expense is normally
much less than the expense of doing a full tracing run to find out
which variables are actually dead.
Parrot's allocation and collection systems have some
compromises that make interfacing with low-level code easier. The
structure that describes a PMC or string is guaranteed not to move
over the lifetime of the string or variable. This allows C code to
store pointers to variables in internal structures without worrying
that what they're referencing may move. It also
means that the garbage collection system doesn't
have to worry about updating pointers that C code might hold, which
it would have to do if PMC or string structures could move.
5.6.2 Signature-Based Dispatching
Signature-based dispatching, or
multimethod
dispatching, is a powerful technique that uses the parameters of a
function or method call to help decide at runtime what function or
method Parrot should call. This is one of the new features being
built into Perl 6. It allows you to have two or more subroutines or
methods with the same name that differ only in the types of their
arguments. At runtime Parrot looks at the parameters for a subroutine
or method call and figures out what the best subroutine or method to
call is.
This allows for some very powerful behaviors, as well as giving a
good opportunity for optimization. For example, take the following
three lines of code:
$a = 1 + 2;
$b = 1 + 2.0;
$c = 1 + "2";
The result is the same in each case—all the variables end up
with a value of 3. But each of those three statements needs to
execute different code.
In the first case, it's plain integer addition,
which is the fastest possible way. The math itself probably takes a
single CPU cycle. The second case adds an integer to a floating-point
value, something that requires different code than adding two
integers does. In the third case, we need to turn one of the
arguments from a string into a number.
In this example we can easily figure out at compile time which kind
of addition code to use. If the code was instead something like:
$a = $b + $c;
we couldn't know at compile time what kind of
addition was needed. We have to check at runtime, and do the right
kind of math based on the types of the two arguments.
Multimethod dispatch is also very useful for normal subroutines and
methods. You've probably found yourself writing code
that checks the types of its arguments and does different things
based on what types are passed in, something like:
sub foo {
my ($self, $arg) = @_;
if ($arg->isa("Thing")) {
#...
} elsif ($arg->isa("OtherThing")) {
#...
} elsif ($arg->isa("Doodad")) {
#...
} else {
#...
}
}
This is a manual form of multimethod dispatch. If
you've done it, you no doubt found that once you get
past three or four different checks, the code becomes very unwieldy
and hard to maintain. Using multimethods, you do the same thing with
three or four methods that you don't have to check
at all, as the system does it for you.
Finally, multimethod dispatch provides a way to carefully inject code
into another package, which is especially useful when dealing with
overloaded operators. It's not at all uncommon to
have a data type that behaves differently depending on whether
it's on the left or right side of an operator.
Normally, the data on the left-hand side of an operator controls the
operation. If you're adding a new data type such as
a matrix, the code for the data type on the left-hand side of the
operator won't know about your new type, so it
can't perform the operation properly. Using
multimethod dispatch, and allowing code to add in new variants of a
function or method at runtime, makes it much more likely that your
program will get the correct answer.
5.6.3 Continuations
Continuations
are possibly the most powerful high-level flow
control construct. Originating with lambda calculus, and built into
Lisp over thirty years ago, continuations can be thought of as a
closure for control flow. They not only capture their lexical scope,
which gets restored when they're invoked, but also
capture their call stack, so when they're invoked
it's as if you never left the spot where they were
created.
Continuations are phenomenally powerful, but with great power comes
great confusion. Indeed, some languages specifically designed to be
as obfuscated as possible include continuations just because
they're so mind-warping. Still, you can duplicate
any control structure you can think of with continuations, and there
are times when their power is necessary to do some advanced things.
Because of this, and because Ruby provides them as core
functionality, Parrot has support for them.
Interestingly, this did trigger a number of other useful features.
Efficient continuation support requires a framed, segmented call
stack and copy-on-write semantics for interpreter control structures.
These are very useful for threads, efficient string usage, and
coroutines, so the feature was well worth it. It also allows Parrot
to provide all the semantics necessary for an interoperable Scheme
implementation, which is a good thing as well.
5.6.4 Coroutines
A coroutine is a
subroutine or method that can suspend itself partway through, then
later pick up where it left off. This isn't quite
the same thing as a continuation, though it may seem so at first.
Coroutines are often used to implement iterators and generators, as
well as threads on systems that don't have native
threading support. Since they are so useful, and since Perl 6 and
Python provide them either directly or as generators, Parrot has
support for them built in.
Coroutines present some interesting technical challenges. Calling
into an existing coroutine requires reestablishing not only the
lexical state and potentially the hypothetical state of variables,
but also the control state for just the routine. Coroutines can be
implemented in terms of continuations if need be, but that requires
using a full continuation-passing function call system, something we
chose not to do.
|