4.4 Searching and Replacing Text in a File
Credit: Jeff Bauer
4.4.1 Problem
You need to change one string into another throughout a file.
4.4.2 Solution
String substitution is most simply performed by the replace method of string objects. The work here is to support reading from the specified file (or standard input) and writing to the specified file (or standard output):
#!/usr/bin/env python
import os, sys
nargs = len(sys.argv)
if not 3 <= nargs <= 5:
    print "usage: %s search_text replace_text [infile [outfile]]" % \
        os.path.basename(sys.argv[0])
else:
    stext = sys.argv[1]
    rtext = sys.argv[2]
    # Default to standard input and output unless file names are given
    input = sys.stdin
    output = sys.stdout
    if nargs > 3:
        input = open(sys.argv[3])
    if nargs > 4:
        output = open(sys.argv[4], 'w')
    # Copy each line, replacing every occurrence of stext with rtext
    for s in input.xreadlines():
        output.write(s.replace(stext, rtext))
    # Closing the output file flushes any buffered data
    output.close()
    input.close()
4.4.3 Discussion
This recipe is really simple, but that's what's beautiful about it: why do complicated stuff when simple stuff suffices? The recipe is a simple main script, as indicated by the leading "shebang" line. The script looks at its arguments to determine the search text, the replacement text, the input file (defaulting to standard input), and the output file (defaulting to standard output). Then, it loops over each line of the input file, writing to the output file a copy of the line with the substitution performed on it. That's all! To make sure all buffered output actually gets written, it closes both files at the end.
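The recipe's code is Python 2: the print statement and the xreadlines method no longer exist in Python 3. As a minimal sketch, assuming Python 3.3 or later, the same logic can be wrapped in a function; the name `main` and its return codes are our own choices, not part of the recipe. Unlike the original, it uses `contextlib.ExitStack` so that only the files it actually opened get closed, leaving the standard streams alone.

```python
import contextlib
import os
import sys


def main(argv):
    """Replace argv[1] with argv[2] in a file or stream (Python 3 sketch)."""
    if not 3 <= len(argv) <= 5:
        print("usage: %s search_text replace_text [infile [outfile]]"
              % os.path.basename(argv[0]))
        return 2
    stext, rtext = argv[1], argv[2]
    with contextlib.ExitStack() as stack:
        # Files opened here are registered with the stack and closed on exit;
        # sys.stdin and sys.stdout are used as defaults and never closed.
        infile = (stack.enter_context(open(argv[3]))
                  if len(argv) > 3 else sys.stdin)
        outfile = (stack.enter_context(open(argv[4], "w"))
                   if len(argv) > 4 else sys.stdout)
        # Iterating over a file object reads one line at a time.
        for line in infile:
            outfile.write(line.replace(stext, rtext))
    return 0
```

A script built around this sketch would end with `sys.exit(main(sys.argv))`, and could then be run as, for example, `python replace.py cat dog in.txt out.txt` (the script filename is arbitrary).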
As long as the file fits comfortably in memory in two copies (one before and one after the replacement, since strings are immutable), we could, with some speed gain, operate on the whole input file's contents at once instead of looping. On a machine with 256 MB of memory, typical of PCs when this recipe was written, handling files of up to about 100 MB should not be a problem. It suffices to replace the for loop with one single statement:
output.write(input.read().replace(stext, rtext))
As you can see, that's even simpler than the loop
used in the recipe.
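The whole-file variant can be sketched the same way in Python 3; `replace_whole_file` is a hypothetical name chosen for illustration. Note one behavioral difference from the line-by-line loop: because the replacement runs over the entire text, a search string that contains a newline can match across a line boundary, which the per-line version can never do.

```python
def replace_whole_file(stext, rtext, src_path, dst_path):
    """Replace every occurrence of stext with rtext, reading the file at once."""
    # The original text and the replaced copy are both held in memory here,
    # which is why this approach suits only files that fit comfortably twice.
    with open(src_path) as src:
        data = src.read()
    with open(dst_path, "w") as dst:
        dst.write(data.replace(stext, rtext))
```

For instance, replacing `"line\nnext"` joins two lines into one, something the per-line loop would silently skip.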
If you're stuck with an older version of Python,
such as 1.5.2, you may still be able to use this recipe. Change the
import statement to:
import os, sys, string
and change the two lines of the for loop into:
for s in input.readlines():
    output.write(string.replace(s, stext, rtext))
The xreadlines method used in the recipe was introduced with Python 2.1. It reads the file incrementally rather than loading all of it into memory at once, whereas readlines must read the whole file into a list and thus may have problems with truly huge files.
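To see why incremental reading matters, here is a minimal Python 3 sketch of the same idea as a generator; `replaced_lines` is a hypothetical name. Like iterating with xreadlines (or directly over a file object), it pulls one line at a time from its source, so it works even on a source that never ends, where a list-building approach like readlines could never finish.

```python
import itertools


def replaced_lines(lines, stext, rtext):
    """Lazily yield each line with every stext replaced by rtext."""
    for line in lines:
        yield line.replace(stext, rtext)


# An endless source of lines: only the lines actually consumed are processed.
endless = itertools.repeat("spam and eggs\n")
first_three = list(itertools.islice(replaced_lines(endless, "spam", "ham"), 3))
```

The same generator accepts an open file object as `lines`, since iterating over a file yields its lines one by one.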
In Python 2.2 and later, file objects are iterable, so the for loop can also be written more directly as:
for s in input:
    output.write(s.replace(stext, rtext))
This offers the fastest and simplest approach.
4.4.4 See Also
Documentation for the open built-in function and
file objects in the Library Reference.