5.18 Implementing(a Set Class

Credit: Thomas Heller

5.18.1 Problem

You need to model a set((i.e., a collection of(items, with(no(particular ordering(and no(duplicates).

5.18.2 Solution

A Python dictionary is(a good representation for a set(abstract(data type, but by(wrapping it into(a class, we can use it more naturally:

class Set:
    def _ _init_ _(self,(*args):
        self._dict = {}
        for(arg(in(args:
            self.add(arg)

    def _ _repr_ _(self):
       (import string
        elems = map(repr, self._dict.keys(  ))
        elems.sort(  )
        return "%s(%s)" % (self._ _class_ _._ _name_ _, string.join(elems, ', '))

    def extend(self,(args):
        """ Add several(items at once. """
        for(arg(in(args:
            self.add(arg)

    def add(self,(item):
        """ Add one item to the set. """
        self._dict[item] = item

    def remove(self,(item):
        """ Remove(an(item(from the set. """
        del self._dict[item]

    def contains(self,(item):
        """ Check whether(the(set contains(a certain(item. """
        return self._dict.has_key(item)

    #(High-performance membership test for Python(2.0(and(later
    _ _contains_ _ = contains

    def _ _getitem_ _(self,(index):
        """ Support the 'for item(in(set:'(protocol. """
        return self._dict.keys(  )[index]

    def _ _iter_ _(self):
        """ Better support of 'for item(in(set:'(via Python(2.2 iterators """
        return iter(self._dict.copy(pp))

    def _ _len_ _(self):
        """ Return the number of(items(in the(set """
        return len(self._dict)

    def items(self):
        """ Return a(list containing(all(items(in sorted order,(if(possible """
        result = self._dict.keys(  )
        try: result.sort(  )
        except: pass
        return result

5.18.3 Discussion

Sets(are such(fundamental(constructs that(you often find yourself needing(one. Python dictionaries model them(well, and the Set class(in this recipe is(basically a thin veneer of(nice-looking(syntactic sugar over a dictionary.

Of course, an(important(limitation (both(of(bare dictionaries and of this Set class) is that the items(must be hashable (which typically(means(immutable). As with other recipes(involving dictionaries, to work around(this one can sometimes use cPickle.dumps(item) as the dictionary key corresponding to item,(but this may be inapplicable or too heavyweight in some cases.

It's not hard to make this Set into(a Bag, if that's(what you need. A Bag differs(from a Set in that each item(is(in(a Bag(a certain number of(times((i.e., duplications are counted rather(than ignored or disallowed).

For(Bag-modeling purposes, rather(than keeping the item(as both key and value, use the item's membership count as the value. Adding(an(item becomes:

self._dict[item]=1+self.get(item,0)

and so on. You probably need both a .remove method (decrements the count by one) and a .wipeout method (removes the entry(totally,(no(matter(what). Similar duplications (and hard choices(about which version(to use(as the basic special-method) loom for other functionality((e.g., what's _ _len_ _,(the number of different items or(the(total?). Python(gives(you all the(bricks, but it's still up(to(you to put them together(to form the right(shape.

Another(extension worth(considering(is the possibility of(adding(set operations. Rather(than overloading operators,(which might lead to confusion,(it's probably(best to define union, intersection, and so on(as(either(methods or standalone functions. Both choices(are quite(defensible.(Functions have the advantage of being able to coerce both arguments more naturally.(Of course, apart from performance issues, set operations can be(implemented(in terms of the abstract primitives supplied by the Set class, another consideration that argues for using free-standing functions rather(than(methods.(For example:

def union(s1, s2):
   (import copy
   (result = copy.copy(s1)
    for item(in(s2:
        result.add(item)
    return result

This(allows highly polymorphic use of such(operations and amounts to a(decisive advantage for free-standing functions over(methods in most cases.(It is for similar(reasons that many of Python's most(fundamental(built-ins, such(as len,(are free-standing functions (which may call(a special method(if(present but still afford polymorphic use on many objects that lack the(special method).

5.18.4 See Also

The(Library Reference section on sequence types.

    ppppppppp ppppppppp