4.24 Computing Directory Sizes in a Cross-Platform Way
Credit: Frank Fejes
4.24.1 Problem
You need to compute the
total size of a directory (or set of directories) in a way that works
under both Windows and Unix-like platforms.
4.24.2 Solution
There are easier platform-dependent solutions, such as
Unix's du, but Python also
makes it quite feasible to have a cross-platform solution:
import os
from os.path import join, isdir, islink

class DirSizeError(Exception): pass

def dir_size(start, follow_links=0, start_depth=0, max_depth=0, skip_errs=0,
             units='b'):
    # Get a list of all names of files and subdirectories in directory start
    try: dir_list = os.listdir(start)
    except OSError:
        # If start is a directory, we probably have permission problems
        if isdir(start):
            raise DirSizeError('Cannot list directory %s' % start)
        else:  # otherwise, just re-raise the error so that it propagates
            raise
    total = 0L
    for item in dir_list:
        # Get statistics on each item--file and subdirectory--of start
        path = join(start, item)
        try: stats = os.stat(path)
        except OSError:
            if not skip_errs:
                raise DirSizeError('Cannot stat %s' % path)
            continue  # skip_errs is set: ignore this item and move on
        # The size in bytes is in the seventh item of the stats tuple, so:
        total += stats[6]
        # recursive descent if warranted
        if isdir(path) and (follow_links or not islink(path)):
            nbytes = dir_size(path, follow_links, start_depth+1, max_depth,
                              skip_errs, units)
            total += nbytes
            # report this subdirectory only if it is within the print depth
            if max_depth and (start_depth < max_depth):
                print_path(path, nbytes, units)
    return total

def print_path(path, nbytes, units='b'):
    # Print the size, left-justified in a fixed-width column, then the path
    if units == 'k':
        print '%-8ld%s' % (nbytes / 1024, path)
    elif units == 'm':
        print '%-5ld%s' % (nbytes / 1024 / 1024, path)
    else:
        print '%-11ld%s' % (nbytes, path)

def usage(name):
    print "usage: %s [-bkLm] [-d depth] directory [directory...]" % name
    print '\t-b\t\tDisplay in Bytes (default)'
    print '\t-k\t\tDisplay in Kilobytes'
    print '\t-m\t\tDisplay in Megabytes'
    print '\t-L\t\tFollow symbolic links (meaningful on Unix only)'
    print '\t-d, --depth\t# of directories down to print (default = 0)'

if __name__ == '__main__':
    # When used as a script:
    import sys, getopt
    units = 'b'
    follow_links = 0
    depth = 0
    try:
        opts, args = getopt.getopt(sys.argv[1:], "bkLmd:", ["depth="])
    except getopt.GetoptError:
        usage(sys.argv[0])
        sys.exit(1)
    for o, a in opts:
        if o == '-b': units = 'b'
        elif o == '-k': units = 'k'
        elif o == '-L': follow_links = 1
        elif o == '-m': units = 'm'
        elif o in ('-d', '--depth'):
            try: depth = int(a)
            except ValueError:
                print "Not a valid integer: (%s)" % a
                usage(sys.argv[0])
                sys.exit(1)
    if len(args) < 1:
        print "No directories specified"
        usage(sys.argv[0])
        sys.exit(1)
    else:
        paths = args
    for path in paths:
        # Pass units along so that the -b, -k, and -m options take effect
        try: nbytes = dir_size(path, follow_links, 0, depth, 0, units)
        except DirSizeError, x: print "Error:", x
        else: print_path(path, nbytes, units)
4.24.3 Discussion
Unix-like platforms have the
du command, but that
doesn't help when you need to get information about
disk-space usage in a cross-platform way. This recipe has been tested
under both Windows and Unix, although it
is most useful under Windows, where the normal way of getting this
information requires using a GUI. In any case, the
recipe's code can be used either as a module (in which
case you'll normally call only the
dir_size function) or as a command-line script.
Typical use as a script is:
C:\> python dir_size.py "c:\Program Files"
This will give you some idea of where all your disk space has gone.
To help you narrow the search, you can, for example, display each
subdirectory:
C:\> python dir_size.py --depth=1 "c:\Program Files"
The recipe's operation is based on recursive
descent.
os.listdir
provides a list of names of all the files and subdirectories of a
given directory. If dir_size finds a subdirectory,
it calls itself recursively. An alternative architecture might be
based on
os.path.walk, which
handles the recursion on our behalf and just does callbacks to a
function we specify, for each subdirectory it visits. However, here
we need to keep track of the depth of descent (e.g., to support
the useful --depth command-line option, which
turns into the max_depth argument of the
dir_size function and limits how many levels of
subdirectories are reported). This bookkeeping is easier to
manage when we administer the recursion directly, rather than letting
os.path.walk handle it on our behalf.
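For comparison, here is a minimal sketch of the walker-based architecture. It does not use os.path.walk itself but the related generator os.walk (added in Python 2.3), and it relies on os.path.relpath (added in Python 2.6), so it assumes a later Python than the recipe does. The function name walk_dir_size and its return convention are hypothetical, chosen only for illustration. Unlike dir_size, this sketch counts only the sizes of regular files, ignores symbolic links, and silently skips directories it cannot list (the default behavior of os.walk).

```python
import os

def walk_dir_size(start, max_depth=0):
    """Return (total, report) for the tree rooted at start.

    total is the combined size in bytes of all files under start.
    report lists (path, size) for each subdirectory that lies at most
    max_depth levels below start (hypothetical convention).
    """
    start = os.path.normpath(start)
    sizes = {}    # directory path -> bytes used by its whole subtree
    report = []
    # topdown=False yields children before parents, so each directory's
    # subtree total is ready by the time its parent is visited.
    for dirpath, dirnames, filenames in os.walk(start, topdown=False):
        subtotal = 0
        for name in filenames:
            path = os.path.join(dirpath, name)
            if not os.path.islink(path):    # symbolic links are ignored
                subtotal += os.path.getsize(path)
        for name in dirnames:
            # Directories that os.walk did not visit (e.g., symbolic
            # links) have no entry and contribute nothing.
            subtotal += sizes.get(os.path.join(dirpath, name), 0)
        sizes[dirpath] = subtotal
        # Depth is derived from the path, since the walker does not
        # report it: 0 for start itself, 1 for its children, and so on.
        rel = os.path.relpath(dirpath, start)
        if rel == os.curdir:
            depth = 0
        else:
            depth = rel.count(os.sep) + 1
        if 0 < depth <= max_depth:
            report.append((dirpath, subtotal))
    return sizes.get(start, 0), report
```

Calling walk_dir_size(r"c:\Program Files", 1) would return the grand total together with one (path, size) pair per immediate subdirectory. Note that the depth has to be reconstructed from each path, which is the kind of extra bookkeeping that the recipe's explicit recursion avoids.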
4.24.4 See Also
Documentation for the os.path and
getopt modules in the Library
Reference.