
4.24 Computing Directory Sizes in a Cross-Platform Way

Credit: Frank Fejes

4.24.1 Problem

You need to compute the total size of a directory (or set of directories) in a way that works under both Windows and Unix-like platforms.

4.24.2 Solution

There are easier platform-dependent solutions, such as Unix's du, but Python also makes it quite feasible to have a cross-platform solution:

import os
from os.path import join, isdir, islink

class DirSizeError(Exception): pass

def dir_size(start, follow_links=0, start_depth=0, max_depth=0, skip_errs=0,
             units='b'):

    # Get a list of all names of files and subdirectories in directory start
    try: dir_list = os.listdir(start)
    except OSError:
        # If start is a directory, we probably have permission problems
        if os.path.isdir(start):
            raise DirSizeError('Cannot list directory %s'%start)
        else:  # otherwise, just re-raise the error so that it propagates
            raise

    total = 0
    for item in dir_list:
        # Get statistics on each item (file or subdirectory) of start
        path = join(start, item)
        try: stats = os.stat(path)
        except OSError:
            if not skip_errs:
                raise DirSizeError('Cannot stat %s'%path)
            continue  # skip_errs is set: leave this item out of the total
        # The size in bytes is in the seventh item of the stats tuple, so:
        total += stats[6]
        # recursive descent if warranted
        if isdir(path) and (follow_links or not islink(path)):
            nbytes = dir_size(path, follow_links, start_depth+1, max_depth,
                              skip_errs, units)
            total += nbytes
            if max_depth and (start_depth < max_depth):
                print_path(path, nbytes, units)
    return total

def print_path(path, nbytes, units='b'):
    # Integer division keeps the sizes whole numbers in every unit
    if units == 'k':
        print('%-8d%s' % (nbytes // 1024, path))
    elif units == 'm':
        print('%-5d%s' % (nbytes // 1024 // 1024, path))
    else:
        print('%-11d%s' % (nbytes, path))

def usage(name):
    print("usage: %s [-bkLm] [-d depth] directory [directory...]" % name)
    print('\t-b\t\tDisplay in Bytes (default)')
    print('\t-k\t\tDisplay in Kilobytes')
    print('\t-m\t\tDisplay in Megabytes')
    print('\t-L\t\tFollow symbolic links (meaningful on Unix only)')
    print('\t-d, --depth\t# of directories down to print (default = 0)')

if __name__ == '__main__':
    # When used as a script:
    import sys, getopt

    units = 'b'
    follow_links = 0
    depth = 0

    try:
        opts, args = getopt.getopt(sys.argv[1:], "bkLmd:", ["depth="])
    except getopt.GetoptError:
        usage(sys.argv[0])
        sys.exit(1)

    for o, a in opts:
        if o == '-b': units = 'b'
        elif o == '-k': units = 'k'
        elif o == '-L': follow_links = 1
        elif o == '-m': units = 'm'
        elif o in ('-d', '--depth'):
            try: depth = int(a)
            except ValueError:
                print("Not a valid integer: (%s)" % a)
                usage(sys.argv[0])
                sys.exit(1)

    if len(args) < 1:
        print("No directories specified")
        usage(sys.argv[0])
        sys.exit(1)
    else:
        paths = args

    for path in paths:
        # Pass units along so subdirectory lines use the same unit as totals
        try: nbytes = dir_size(path, follow_links, 0, depth, units=units)
        except DirSizeError as x: print("Error: %s" % x)
        else: print_path(path, nbytes, units)

4.24.3 Discussion

Unix-like platforms have the du command, but that doesn't help when you need to get information about disk-space usage in a cross-platform way. This recipe has been tested under both Windows and Unix, although it is most useful under Windows, where the normal way of getting this information requires using a GUI. In any case, the recipe's code can be used either as a module (in which case you'll normally call only the dir_size function) or as a command-line script. Typical use as a script is:

C:\> python dir_size.py "c:\Program Files"

This prints the total size of the directory, which gives you some idea of where your disk space has gone. To narrow the search, you can also display the size of each immediate subdirectory:

C:\> python dir_size.py --depth=1 "c:\Program Files"

The recipe's operation is based on recursive descent. os.listdir provides a list of names of all the files and subdirectories of a given directory. If dir_size finds a subdirectory, it calls itself recursively. An alternative architecture might be based on os.path.walk, which handles the recursion on our behalf and just does callbacks to a function we specify, for each subdirectory it visits. However, here we need to be able to control the depth of descent (e.g., to allow the useful --depth command-line option, which turns into the max_depth argument of the dir_size function). This control is easier to attain when we administer the recursion directly, rather than letting os.path.walk handle it on our behalf.
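For comparison, here is a minimal sketch of the walk-based alternative. It is not part of the recipe and rests on some assumptions: it targets Python 3, where os.path.walk no longer exists, and so uses os.walk (the generator-based successor, available since Python 2.3). The name walk_sizes and its return value are hypothetical, chosen only for illustration. Walking bottom-up means each subdirectory's total is known before its parent is visited; the depth limit then has to be recovered from the path, which is the bookkeeping the recipe avoids by managing the recursion itself.

```python
import os

def walk_sizes(start, max_depth=0):
    """Hypothetical helper: return (total, report) for the tree at start.

    total is the size in bytes of everything below start; report lists
    (path, size) pairs for subdirectories at most max_depth levels down.
    """
    start = os.fspath(start)
    sizes = {}    # directory path -> cumulative size of its contents
    report = []
    # topdown=False yields a directory only after all its subdirectories,
    # so every child's total is already in sizes when the parent is seen.
    # Symbolic links to directories are not followed (os.walk's default).
    for dirpath, dirnames, filenames in os.walk(start, topdown=False):
        total = 0
        for name in dirnames + filenames:
            path = os.path.join(dirpath, name)
            try:
                # Count the entry itself, as the recipe does with os.stat
                total += os.stat(path).st_size
            except OSError:
                pass  # unreadable entry: skip it (like skip_errs)
            # Add the contents of a subdirectory that was walked earlier
            total += sizes.pop(path, 0)
        sizes[dirpath] = total
        # Recover the depth below start from the relative path
        rel = os.path.relpath(dirpath, start)
        depth = 0 if rel == os.curdir else rel.count(os.sep) + 1
        if 0 < depth <= max_depth:
            report.append((dirpath, total))
    return sizes.get(start, 0), report
```

Calling walk_sizes(some_directory, max_depth=1) would return the overall total together with one (path, size) pair per immediate subdirectory, roughly what the script prints with --depth=1. Unlike the recipe, this sketch silently skips directories it cannot list, because os.walk ignores such errors unless an onerror callback is supplied.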

4.24.4 See Also

Documentation for the os.path and getopt modules in the Library Reference.
