1.13 Flattening a Nested Sequence
Credit: Luther Blissett
1.13.1 Problem
You
have
a sequence, such as a list, some of
whose items may in turn be lists, and so on. You need to flatten it
out into a sequence of its scalar items (the leaves, if you think of
the nested sequence as a tree).
1.13.2 Solution
Of course, we need to be able to tell which of the elements
we're handling are to be deemed scalar. For
generality, say we're passed as an argument a
predicate that defines what is scalar�a function that we can
call on any element and that returns 1 if the
element is scalar or 0 otherwise. Given this, one
approach is:
def flatten(sequence, scalarp, result=None):
if result is None: result = []
for item in sequence:
if scalarp(item): result.append(item)
else: flatten(item, scalarp, result)
return result
In Python 2.2, a simple generator is an interesting alternative, and,
if all the caller needs to do is loop over the flattened sequence,
may save the memory needed for the result list:
from _ _future_ _ import generators
def flatten22(sequence, scalarp):
for item in sequence:
if scalarp(item):
yield item
else:
for subitem in flatten22(item, scalarp):
yield subitem
1.13.3 Discussion
The only problem with this recipe is that determining what is a
scalar is not as obvious as it might seem, which is why I delegated
that decision to a callable predicate argument that the caller is
supposed to pass to flatten. Of course, we must be
able to loop over the items of any non-scalar with a
for statement, or flatten will
raise an exception (since it does, via a recursive call, attempt a
for statement over any non-scalar item). In Python
2.2, that's easy to check:
def canLoopOver(maybeIterable):
try: iter(maybeIterable)
except: return 0
else: return 1
The built-in function iter, new in Python 2.2,
returns an iterator, if possible. for x
in s implicitly calls the iter
function, so the canLoopOver function can easily
check if for is applicable by calling
iter explicitly and seeing if that raises an
exception.
In Python 2.1 and earlier, there is no iter
function, so we have to try more directly:
def canLoopOver(maybeIterable):
try:
for x in maybeIterable:
return 1
else:
return 1
except:
return 0
Here we have to rely on the for statement itself
raising an exception if maybeIterable is not
iterable after all. Note that this approach is not fully suitable for
Python 2.2: if maybeIterable is an iterator
object, the for in this approach consumes its
first item.
Neither of these implementations of canLoopOver is
entirely satisfactory, by itself, as our scalar-testing predicate.
The problem is with strings, Unicode strings, and other string-like
objects. These objects are perfectly good sequences, and we could
loop on them with a for statement, but we
typically want to treat them as scalars. And even if we
didn't, we would at least have to treat any
string-like objects with a length of 1 as scalars. Otherwise, since
such strings are iterable and yield themselves as their only items,
our flatten function would not cease recursion
until it exhausted the call stack and raised a
RuntimeError due to "maximum
recursion depth exceeded."
Fortunately, we can easily distinguish string-like objects by
attempting a typical string operation on them:
def isStringLike(obj):
try: obj+''
except TypeError: return 0
else: return 1
Now, we finally have a good implementation for the scalar-checking
predicate:
def isScalar(obj):
return isStringLike(obj) or not canLoopOver(obj)
By simply placing this isScalar function and the
appropriate implementation of canLoopOver in our
module, before the recipe's functions, we can change
the signatures of these functions to make them easier to call in most
cases. For example:
def flatten22(sequence, scalarp=isScalar):
Now the caller needs to pass the scalarp argument
only in those (hopefully rare) cases where our definition of what is
scalar does not quite meet the caller's
application-specific needs.
1.13.4 See Also
The Library Reference section on sequence types.
|