3.8 Filtering a String for a Set of Characters
Credit: Jürgen Hermann, Nick Perkins
3.8.1 Problem
Given a set of characters to
keep, you need to build a filtering
functor
(a function-like, callable object). The specific functor you need to
build is one that, applied to any string s,
returns a copy of s that contains only characters
in the set.
3.8.2 Solution
The string.maketrans function and
translate method of string objects are fast and
handy for all tasks of this ilk:
import string
# Make a reusable string of all characters
_allchars = string.maketrans('', '')
def makefilter(keep):
""" Return a functor that takes a string and returns a partial copy of that
string consisting of only the characters in 'keep'.
"""
# Make a string of all characters that are not in 'keep'
delchars = _allchars.translate(_allchars, keep)
# Return the functor, binding the two strings as default args
return lambda s, a=_allchars, d=delchars: s.translate(a, d)
def canonicform(keep):
""" Given a string, considered as a set of characters, return the
string's characters as a canonic-form string: alphabetized
and without duplicates.
"""
return makefilter(keep)(_allchars)
if _ _name_ _ == '_ _main_ _':
identifier = makefilter(string.letters + string.digits + '_')
print identifier(_allchars)
3.8.3 Discussion
The key to understanding this recipe lies in the definitions of the
translate
and maketrans functions in the
string module. translate takes
a string and replaces each character in it with the corresponding
character in the translation table passed in as the second argument,
deleting the characters specified in the third argument.
maketrans is a utility routine that helps create
the translation tables.
Efficiency is vastly improved by splitting the filtering task into
preparation and execution phases. The string of all characters is
clearly reusable, so we build it once and for all when this module is
imported. That way, we ensure that each filtering functor has a
reference to the same string of all characters, not wasting any
memory. The string of characters to delete depends on the set of
characters to keep, so we build it in the
makefilter
factory function. This is done quite rapidly using the
translate method to delete the characters to keep
from the string of all characters. The translate
method is very fast, as are the construction and execution of these
useful little functors. The solution also supplies an extremely
simple function to put any set of characters, originally an arbitrary
string, into canonic-string form (alphabetically sorted, without
duplicates). The same trick encapsulated in the
canonicform
function is also explicitly used in the test code that is executed
when this runs as a script.
Of course, you don't have to use
lambda (here or anywhere else). A named
function local to the factory function will do just as well. In other
words, this recipe works fine if you change
makefilter's
return statement into the following two
statements:
def filter(s, a=_allchars, d=delchars): return s.translate(a, d)
return filter
Many Pythonistas would consider this clearer and more readable.
This isn't a big issue, but remember that
lambda is never necessary. In any case in which
you find yourself straining to fit code into a
lambda's limitations (i.e., just
an expression, with no statements allowed), you can and should always
use a local named function instead, to avoid all the limitations and
problems.
With Python 2.2, or Python 2.1 and a
from _ _future_ _ import nested_scopes, you get
lexically nested scopes, so that if you want to, you can avoid
binding _allchars and delchars
as default values for arguments in the returned functor. However, it
is (marginally) faster to use this binding anyway: local variables
are the fastest kind to access, and arguments are nothing but
prebound local variables. Globals and names from nested scopes
require a little more effort from the interpreter (and sometimes,
perhaps more significantly, from a human being who is reading the
code). This is why we bind _allchars as argument
a here despite the fact that, in any release of
Python, we could have just accessed it as a global variable.
3.8.4 See Also
Documentation for the maketrans function in the
string module in the Library
Reference.
|