I l@ve RuBoard Previous Section Next Section

2.4 Sorting While Guaranteeing Sort Stability

Credit: Alex Martelli, David Goodger

2.4.1 Problem

You need to sort a Python list in a guaranteed-stable way (i.e., with no alteration in the relative ordering of items that compare equal).

2.4.2 Solution

In addition to speed, decorate-sort-undecorate (DSU) offers flexibility that sort with a comparison function argument just can't match. For example, you can ensure the sort's stability, as follows:

def stable_sorted_copy(alist, _indices=xrange(sys.maxint)):
    # Decorate: prepare a suitable auxiliary list
    decorated = zip(alist, _indices)

    # Sort: do a plain built-in sort on the auxiliary list
    decorated.sort(  )

    # Undecorate: extract the list from the sorted auxiliary list
    return [ item for item, index in decorated ]

def stable_sort_inplace(alist):
    # To sort in place: assign sorted result to all-list slice of original list
    alist[:] = stable_sorted_copy(alist)

2.4.3 Discussion

The notion of a stable sort is typically not relevant if the sequences you are sorting contain objects that are uniform in type and are simple, such as a list of integers. In such cases, objects that compare equal are equal in all measurable ways. However, in cases where equality for the sake of ordering doesn't correspond to "deep" equality�such as tuples containing floating-point and integer numbers, or, more commonly, rich objects with arbitrary internal structure�it may matter that the elements that start off earlier in the list than their "equal" counterparts remain earlier after the sort.

Python lists' sort method is not guaranteed to be stable: items that compare equal may or may not be in unchanged order (they often are, but you cannot be sure). Ensuring stability is easy, as one of the many applications of the common DSU idiom. For another specific example of DSU usage, see Recipe 2.7.

First, you build an auxiliary list (the decorate step), where each item is a tuple made up of all sort keys, in descending order of significance, of the corresponding item of the input sequence. You must include in each tuple of the auxiliary list all of the information in the corresponding item, and/or an index to it, so that you can fetch the item back, or reconstruct it, in the third step.

In the second step, you sort the auxiliary list by its built-in sort method, without arguments (this is crucial for performance).

Finally, you reconstruct the desired sorted list by undecorating the now-sorted auxiliary list. Steps 1 and 3 can be performed in several ways: with map, zip, list comprehensions, or explicit loops. List comprehensions and zip are generally simpler, although they require Python 2.0 or later. If you need to be compatible with Python 1.5.2, you will have to use map, which unfortunately is often less clear and readable.

This idiom is also known as the "Schwartzian transform," by analogy with a related Perl idiom. However, that term implies using Perl's map and grep functions and performing the whole idiom inside of a single statement.

DSU inherently supplies a sorted copy, but if the input sequence is a list, we can just assign to its include-everything slice to get an in-place effect instead.

This recipe specifically demonstrates using DSU to achieve a stable sort (i.e., a sort where items that compare equal keep the same relative order in the result list as they had in the input sequence). For this specific task, passing an argument to the built-in sort method is of no use. More generally, the DSU idiom can sometimes be replaced by passing a comparison function argument to sort, but DSU tends to be much faster, and speed often matters when you are sorting sequences that aren't tiny.

The speed comes from maximally accelerating (by not using a Python-coded function as an argument to sort) the O(N log N) part, which dominates sorting time for sequences of substantial length N. The decoration and undecoration steps are both O(N), and thus they contribute negligible time if N is large enough and reasonably little time even for many practical values of N.

Note the named argument _indices. This ensures that a single copy of the necessary xrange object is generated at function-definition time. It can then be reused to decorate any argument sequence with indexes, by exploiting zip's truncation behavior on unequal-length arguments (see Recipe 1.14).

2.4.4 See Also

Recipe 1.14 and Recipe 2.7.

    I l@ve RuBoard Previous Section Next Section