
3.19 Printing Unicode Characters to Standard Output

Credit: David Ascher

3.19.1 Problem

You want to print Unicode strings to standard output (e.g., for debugging), but they don't fit in the default encoding.

3.19.2 Solution

Wrap the stdout stream with a converter, using the codecs module:

import codecs, sys
sys.stdout = codecs.lookup('iso8859-1')[-1](sys.stdout)
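The index [-1] works because codecs.lookup returns a tuple whose last item is the codec's stream writer class. As a hedged alternative, the sketch below uses codecs.getwriter, which returns that same class by name. To stay independent of any terminal, it writes to an in-memory io.BytesIO buffer standing in for stdout. The helper name wrap_stream is hypothetical and not part of the recipe.

```python
import codecs
import io

def wrap_stream(byte_stream, encoding='iso8859-1'):
    # Hypothetical helper: return a writer that accepts Unicode text
    # and writes it to byte_stream encoded as `encoding`.
    return codecs.getwriter(encoding)(byte_stream)

# An in-memory byte stream stands in for sys.stdout in this sketch.
buf = io.BytesIO()
writer = wrap_stream(buf)
writer.write(u"caf\xe9")  # e-acute fits in Latin-1
latin1_bytes = buf.getvalue()

# The same wrapping done the recipe's way, via codecs.lookup(...)[-1].
buf2 = io.BytesIO()
codecs.lookup('iso8859-1')[-1](buf2).write(u"caf\xe9")
lookup_bytes = buf2.getvalue()
```

Both routes should produce identical bytes, so the choice between them is mostly a matter of readability.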

3.19.3 Discussion

Unicode strings live in a large space, big enough for all of the characters in every language worldwide, and thankfully the internal representation of Unicode strings is irrelevant to users of Unicode. Alas, a file stream, such as sys.stdout, deals with bytes and has an encoding associated with it. You can change the default encoding that is used for new files by modifying the site module. That, however, requires changing your entire Python installation, which is likely to confuse other applications that expect the encoding you originally configured Python to use (typically ASCII).

This recipe instead rebinds sys.stdout to a stream that expects Unicode input and outputs it in ISO8859-1 (also known as Latin-1). Rebinding doesn't change the encoding of any previous references to sys.stdout, as illustrated here. First, we keep a reference to the original, ASCII-encoded stdout:

>>> old = sys.stdout

Then we create a Unicode string that the default, ASCII-encoded stdout cannot print:

>>> char = u"\N{GREEK CAPITAL LETTER GAMMA}"  # a character that doesn't fit in ASCII
>>> print char
Traceback (most recent call last):
  File "<stdin>", line 1, in ?
UnicodeError: ASCII encoding error: ordinal not in range(128)

Latin-1 cannot represent Greek letters either, so now we wrap stdout in the codecs stream writer for UTF-8, a much richer encoding, rebind sys.stdout to it, and try again:

>>> sys.stdout = codecs.lookup('utf-8')[-1](sys.stdout)
>>> print char
Γ
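
The choice of encoding matters because a stream writer can only emit characters that its codec can represent. The following sketch, which uses an in-memory io.BytesIO buffer in place of stdout, shows the Gamma character failing under Latin-1 and succeeding under UTF-8. It also tries the stream writer's optional errors argument with the value 'replace' as one possible way to degrade gracefully. The helper name try_write is an assumption made for illustration.

```python
import codecs
import io

GAMMA = u"\N{GREEK CAPITAL LETTER GAMMA}"

def try_write(text, encoding, errors='strict'):
    # Hypothetical helper: encode text through a codecs stream writer
    # into an in-memory buffer; return the bytes, or None on failure.
    buf = io.BytesIO()
    writer = codecs.getwriter(encoding)(buf, errors)
    try:
        writer.write(text)
    except UnicodeError:
        return None
    return buf.getvalue()

# Gamma is not in Latin-1, so strict encoding fails.
latin1_result = try_write(GAMMA, 'iso8859-1')
# UTF-8 can encode it as a two-byte sequence.
utf8_result = try_write(GAMMA, 'utf-8')
# With errors='replace', the unencodable character becomes '?'.
replaced = try_write(GAMMA, 'iso8859-1', errors='replace')
```

For debugging output, the 'replace' handler may be an acceptable fallback when the terminal cannot display UTF-8, at the cost of losing the original character.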

3.19.4 See Also

Documentation for the codecs and site modules, and for setdefaultencoding in the sys module, in the Library Reference.
