3.19 Printing Unicode Characters to Standard Output
Credit: David Ascher
3.19.1 Problem
You want to print Unicode strings to
standard output (e.g., for debugging), but they
don't fit in the default encoding.
3.19.2 Solution
Wrap the stdout stream with a converter, using the
codecs module:
import codecs, sys
sys.stdout = codecs.getwriter('iso8859-1')(sys.stdout)
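The effect of the wrapper can be checked without touching the real stdout. The following is a minimal sketch, not part of the original recipe. It uses codecs.getwriter, which returns the same stream writer class that codecs.lookup exposes as its last item, and an in-memory io.BytesIO buffer (named fake_stdout here purely for illustration) as a stand-in for a byte-oriented stream:

```python
import codecs
import io

# fake_stdout is a hypothetical stand-in for a byte-oriented stream.
fake_stdout = io.BytesIO()

# Wrap it so that Unicode text written to it is encoded as ISO8859-1.
latin1_out = codecs.getwriter('iso8859-1')(fake_stdout)
latin1_out.write(u"caf\N{LATIN SMALL LETTER E WITH ACUTE}\n")

# The underlying stream receives the encoded bytes, one per character here.
encoded = fake_stdout.getvalue()
```

The wrapper accepts Unicode text and passes encoded bytes to the stream it wraps, which is what the rebound sys.stdout does for print.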
3.19.3 Discussion
Unicode strings live in a large space, big enough for the
characters of every language worldwide. Thankfully, the internal
representation of Unicode strings is irrelevant to users of Unicode.
Alas, a file stream, such as sys.stdout, deals
with bytes and has an encoding associated with it. You can change the
default encoding that Python uses for implicit conversions between
Unicode strings and bytes by modifying the
site module. That, however, changes the behavior of your
entire Python installation, which is likely to confuse other
applications that expect the encoding Python was originally configured
to use (typically ASCII). This recipe instead rebinds
sys.stdout to a stream that expects Unicode
input and outputs it in ISO8859-1 (also known as Latin-1). The
rebinding doesn't affect any reference to the original
sys.stdout that was saved earlier, as illustrated here.
First, we keep a reference to the original, ASCII-encoded
stdout:
>>> old = sys.stdout
Then we create a Unicode string that can't normally be
printed to stdout:
>>> char = u"\N{GREEK CAPITAL LETTER GAMMA}" # a character that doesn't fit in ASCII
>>> print char
Traceback (most recent call last):
File "<stdin>", line 1, in ?
UnicodeError: ASCII encoding error: ordinal not in range(128)
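The traceback comes from an implicit conversion to ASCII. As a small sketch (an illustration, not part of the original recipe), the same failure can be triggered explicitly with the string's encode method, and the same call with UTF-8 succeeds:

```python
# A Unicode character outside the 128 ASCII code points.
char = u"\N{GREEK CAPITAL LETTER GAMMA}"

try:
    char.encode('ascii')
    failed = False
except UnicodeError:
    # Raised because U+0393 has no ASCII representation.
    failed = True

# UTF-8 can represent the character, as a two-byte sequence.
utf8_bytes = char.encode('utf-8')
```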
Now we wrap stdout in the
codecs stream writer for UTF-8, a much richer
encoding, rebind sys.stdout to it, and try again:
>>> sys.stdout = codecs.getwriter('utf-8')(sys.stdout)
>>> print char
Γ
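Two consequences of this approach are worth checking. The sketch below is an illustration rather than part of the original recipe, and it uses in-memory io.BytesIO buffers (the hypothetical raw, strict_buf, and lenient_buf) in place of the real stdout. It shows that the wrapper keeps the underlying stream object intact, and that an ISO8859-1 writer cannot encode the Greek letter unless an errors policy such as 'replace' is passed:

```python
import codecs
import io

char = u"\N{GREEK CAPITAL LETTER GAMMA}"

# raw is a hypothetical stand-in for the original byte-oriented stdout.
raw = io.BytesIO()
utf8_out = codecs.getwriter('utf-8')(raw)

# The writer keeps the wrapped stream in its .stream attribute.
same_stream = utf8_out.stream is raw

utf8_out.write(char)
utf8_bytes = raw.getvalue()

# Latin-1 has no Greek letters, so a strict Latin-1 writer raises.
strict_buf = io.BytesIO()
try:
    codecs.getwriter('iso8859-1')(strict_buf).write(char)
    latin1_failed = False
except UnicodeError:
    latin1_failed = True

# With errors='replace', unencodable characters are written as '?'.
lenient_buf = io.BytesIO()
codecs.getwriter('iso8859-1')(lenient_buf, errors='replace').write(char)
replaced = lenient_buf.getvalue()
```

Because the original stream is never modified, assigning the saved reference back (sys.stdout = old in the session above) undoes the rebinding.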
3.19.4 See Also
Documentation for the codecs and
site modules, and for
setdefaultencoding in the sys module, in
the Library Reference.