5.18 Implementing a Set Class
Credit: Thomas Heller
5.18.1 Problem
You need to model a set (i.e., a
collection of items, with no particular ordering and no duplicates).
5.18.2 Solution
A Python dictionary is a good representation for a set abstract data
type, but by wrapping it into a class, we can use it more naturally:
class Set:
def _ _init_ _(self, *args):
self._dict = {}
for arg in args:
self.add(arg)
def _ _repr_ _(self):
import string
elems = map(repr, self._dict.keys( ))
elems.sort( )
return "%s(%s)" % (self._ _class_ _._ _name_ _, string.join(elems, ', '))
def extend(self, args):
""" Add several items at once. """
for arg in args:
self.add(arg)
def add(self, item):
""" Add one item to the set. """
self._dict[item] = item
def remove(self, item):
""" Remove an item from the set. """
del self._dict[item]
def contains(self, item):
""" Check whether the set contains a certain item. """
return self._dict.has_key(item)
# High-performance membership test for Python 2.0 and later
_ _contains_ _ = contains
def _ _getitem_ _(self, index):
""" Support the 'for item in set:' protocol. """
return self._dict.keys( )[index]
def _ _iter_ _(self):
""" Better support of 'for item in set:' via Python 2.2 iterators """
return iter(self._dict.copy( ))
def _ _len_ _(self):
""" Return the number of items in the set """
return len(self._dict)
def items(self):
""" Return a list containing all items in sorted order, if possible """
result = self._dict.keys( )
try: result.sort( )
except: pass
return result
5.18.3 Discussion
Sets are such fundamental constructs that you often find yourself
needing one. Python dictionaries model them well, and the
Set class in this recipe is basically a thin
veneer of nice-looking syntactic sugar over a dictionary.
Of course, an important limitation (both of bare dictionaries and of
this Set class) is that the items must be hashable
(which typically means immutable). As with other recipes involving
dictionaries, to work around this one can sometimes use
cPickle.dumps(item) as the dictionary key
corresponding to item, but this may be
inapplicable or too heavyweight in some cases.
It's not hard to make this Set
into a
Bag,
if that's what you need. A Bag
differs from a Set in that each item is in a
Bag a certain number of times (i.e., duplications
are counted rather than ignored or disallowed).
For Bag-modeling purposes, rather than keeping the
item as both key and value, use the item's
membership count as the value. Adding an item becomes:
self._dict[item]=1+self.get(item,0)
and so on. You probably need both a .remove method
(decrements the count by one) and a .wipeout
method (removes the entry totally, no matter what). Similar
duplications (and hard choices about which version to use as the
basic special-method) loom for other functionality (e.g.,
what's _ _len_ _, the number of
different items or the total?). Python gives you all the bricks, but
it's still up to you to put them together to form
the right shape.
Another extension worth considering is the possibility of adding set
operations. Rather than overloading operators, which might lead to
confusion, it's probably best to define union,
intersection, and so on as either methods or standalone functions.
Both choices are quite defensible. Functions have the advantage of
being able to coerce both arguments more naturally. Of course, apart
from performance issues, set operations can be implemented in terms
of the abstract primitives supplied by the
Set class, another consideration that argues
for using free-standing functions rather than methods. For example:
def union(s1, s2):
import copy
result = copy.copy(s1)
for item in s2:
result.add(item)
return result
This allows highly polymorphic use of such operations and amounts to
a decisive advantage for free-standing functions over methods in most
cases. It is for similar reasons that many of
Python's most fundamental built-ins, such as
len, are free-standing functions (which may call a
special method if present but still afford polymorphic use on many
objects that lack the special method).
5.18.4 See Also
The Library Reference section on sequence types.
|