3.16 Converting Between Different Naming Conventions
Credit: Sami Hangaslammi
3.16.1 Problem
You have a body of code whose identifiers use one of the common naming
conventions to represent multiple words in a single identifier
(CapitalizedWords, mixedCase, or under_scores), and you need to convert
the code to another naming convention in order to merge it smoothly
with other code.
3.16.2 Solution
re.sub covers the two hard cases,
converting underscore to and from the others:
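import re

def cw2us(x):  # capwords to underscore notation
    # An underscore goes before a capital that follows a lowercase letter,
    # or before a non-initial capital that is followed by a lowercase
    # letter (this catches the end of an acronym, as in "IOError").
    return re.sub(r'(?<=[a-z])[A-Z]|(?<!^)[A-Z](?=[a-z])',
                  r"_\g<0>", x).lower()

def us2mc(x):  # underscore to mixed-case notation
    return re.sub(r'_([a-z])', lambda m: m.group(1).upper(), x)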
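To see what each alternative of the cw2us pattern contributes, the sketch below reports every capital that receives an underscore and which branch matched it. The pattern is the one from cw2us with each alternative wrapped in a named group; the group names and the helper show_split_points are hypothetical, introduced only for illustration.

```python
import re

# Same two alternatives as in cw2us, each in a named group (names are
# illustrative) so that m.lastgroup tells us which one matched.
PATTERN = re.compile(r'(?P<after_lower>(?<=[a-z])[A-Z])'
                     r'|(?P<before_lower>(?<!^)[A-Z](?=[a-z]))')

def show_split_points(name):
    """Return (index, letter, branch) for each capital that gets an underscore."""
    return [(m.start(), m.group(), m.lastgroup) for m in PATTERN.finditer(name)]

print(show_split_points("SetXYPosition"))
# [(3, 'X', 'after_lower'), (5, 'P', 'before_lower')]
```

The first branch marks the start of a capitalized run after lowercase text; the second marks the last capital of an acronym, which is why "XY" stays together while "Position" is split off.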
Mixed-case to underscore is just like capwords to underscore (the
case-lowering of the first character becomes redundant, but it does
no harm):
def mc2us(x):  # mixed-case to underscore notation
    return cw2us(x)
Underscore to capwords can similarly exploit the underscore to
mixed-case conversion, but it needs an extra twist to uppercase the
start:
def us2cw(x):  # underscore to capwords notation
    s = us2mc(x)
    return s[0].upper() + s[1:]
Conversion between mixed-case and capwords is, of course, just an
issue of lowercasing or uppercasing the first character, as
appropriate:
def mc2cw(x):  # mixed-case to capwords
    return x[0].upper() + x[1:]

def cw2mc(x):  # capwords to mixed-case
    return x[0].lower() + x[1:]
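A quick way to check that the six specialized functions agree with one another is to push a few underscore names around each pair of formats and back. This is a minimal sketch: the functions are restated so the block runs on its own, with mc2cw uppercasing and cw2mc lowercasing the first character, and the sample names are arbitrary.

```python
import re

def cw2us(x):
    return re.sub(r'(?<=[a-z])[A-Z]|(?<!^)[A-Z](?=[a-z])', r"_\g<0>", x).lower()

def us2mc(x):
    return re.sub(r'_([a-z])', lambda m: m.group(1).upper(), x)

def mc2us(x):
    return cw2us(x)

def us2cw(x):
    s = us2mc(x)
    return s[0].upper() + s[1:]

def mc2cw(x):
    return x[0].upper() + x[1:]

def cw2mc(x):
    return x[0].lower() + x[1:]

# Round trips starting from underscore notation; names containing
# acronyms are left out on purpose (see the note below).
for us in ["print_html", "set_position", "get_x"]:
    assert us2cw(us) == mc2cw(us2mc(us))
    assert cw2us(us2cw(us)) == us
    assert mc2us(us2mc(us)) == us
    assert cw2mc(us2cw(us)) == us2mc(us)
print("round trips OK")
```

Starting from capwords is lossier: cw2us("PrintHTML") gives 'print_html', and us2cw turns that back into 'PrintHtml', because the underscore form has no way to record that "html" was an acronym.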
3.16.3 Discussion
Here are some usage examples:
>>> cw2us("PrintHTML")
'print_html'
>>> cw2us("IOError")
'io_error'
>>> cw2us("SetXYPosition")
'set_xy_position'
>>> cw2us("GetX")
'get_x'
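Since the Problem is converting a whole body of code rather than single names, here is one naive way to apply a converter to every identifier-looking word in a source string. It is a sketch under loud assumptions: the helper convert_identifiers and its keep parameter are hypothetical, the regex cannot tell identifiers from words inside string literals or comments, and ALL_CAPS constants are lowercased by cw2us unless listed in keep.

```python
import re

def cw2us(x):  # capwords/mixed-case to underscore notation, as in the Solution
    return re.sub(r'(?<=[a-z])[A-Z]|(?<!^)[A-Z](?=[a-z])', r"_\g<0>", x).lower()

# ASCII identifiers only; words starting with a digit are skipped by \b.
IDENTIFIER = re.compile(r'\b[A-Za-z_][A-Za-z0-9_]*\b')

def convert_identifiers(source, convert, keep=frozenset()):
    """Apply `convert` to each identifier-like word in `source`,
    leaving the names listed in `keep` untouched."""
    def repl(m):
        word = m.group()
        return word if word in keep else convert(word)
    return IDENTIFIER.sub(repl, source)

code = "def getValue(self): return self.currentValue\n"
print(convert_identifiers(code, cw2us), end="")
# def get_value(self): return self.current_value
```

For real source files, the standard tokenize module is a safer starting point, because it reports NAME tokens separately from strings and comments.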
The set of functions in this recipe is useful, and very practical, if
you need to homogenize naming styles in a bunch of code, but the
approach may be a bit obscure. In the interest of clarity, you might
want to adopt a conceptual stance that is general and fruitful: to
convert several formats into each other, find a neutral format and
write conversions from each of the N formats into the neutral one and
back again. This means having 2N conversion functions rather than
N × (N-1), a big win for large N, but the point here (in which N is
only three, so both counts come to six) is really one of clarity.
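The arithmetic behind that trade-off is easy to tabulate. The helper converter_counts is hypothetical and simply evaluates the two formulas from the text.

```python
def converter_counts(n):
    """Return (direct, via_hub) converter counts for n formats."""
    direct = n * (n - 1)  # one converter per ordered pair of distinct formats
    via_hub = 2 * n       # one reader and one writer per format
    return direct, via_hub

for n in range(2, 6):
    direct, via_hub = converter_counts(n)
    print(f"N={n}: direct={direct}, via hub={via_hub}")
# N=2: direct=2, via hub=4
# N=3: direct=6, via hub=6
# N=4: direct=12, via hub=8
# N=5: direct=20, via hub=10
```

The hub only starts saving functions beyond three formats. For identifiers it does a little better than 2N, since the formats are self-describing and a single reader can handle all of them, leaving N + 1 functions.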
Clearly, the underlying neutral format that each identifier style is
encoding is a list of words. Let's say, for
definiteness and without loss of generality, that they are lowercase
words:
import re

def anytolw(x):  # any format of identifier to list of lowercased words
    # First, see if there are underscores:
    lw = x.split('_')
    if len(lw) > 1:
        return [w.lower() for w in lw]
    # No. Then uppercase letters are the splitters; the capturing group
    # makes re.split keep each splitter in the resulting list:
    pieces = re.split('([A-Z])', x)
    # Ensure first word follows the same rules as the others:
    if pieces[0]:
        pieces = [''] + pieces
    else:
        pieces = pieces[1:]
    # Join two by two, lowercasing the splitters as you go
    return [pieces[i].lower() + pieces[i+1] for i in range(0, len(pieces), 2)]
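Running anytolw on a few names shows that the three styles of the same identifier all reduce to one word list. The function is restated here in Python 3 form so the block runs on its own; the sample names are arbitrary.

```python
import re

def anytolw(x):  # any format of identifier to list of lowercased words
    lw = x.split('_')
    if len(lw) > 1:
        return [w.lower() for w in lw]
    pieces = re.split('([A-Z])', x)
    pieces = [''] + pieces if pieces[0] else pieces[1:]
    return [pieces[i].lower() + pieces[i + 1] for i in range(0, len(pieces), 2)]

for name in ["print_html", "printHtml", "PrintHtml", "SetXYPosition"]:
    print(name, anytolw(name))
# print_html ['print', 'html']
# printHtml ['print', 'html']
# PrintHtml ['print', 'html']
# SetXYPosition ['set', 'x', 'y', 'position']
```

Note the last line: every capital starts a new word here, so acronyms are split letter by letter. The specialized cw2us keeps them together ('set_xy_position'), which is one concrete respect in which the two approaches differ.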
There's no need to specify the format, since it's self-describing.
Conversely, when translating from our internal form to an output
format, we do need to specify the format we want, but the functions
are very simple:
def lwtous(x): return '_'.join(x)
def lwtocw(x): return ''.join(w.capitalize() for w in x)
def lwtomc(x): return x[0] + ''.join(w.capitalize() for w in x[1:])
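Fed the same word list, the three writers produce the three styles. This sketch restates them with str.capitalize, which is a method in current Python rather than a function in the string module.

```python
def lwtous(x):
    return '_'.join(x)

def lwtocw(x):
    return ''.join(w.capitalize() for w in x)

def lwtomc(x):
    return x[0] + ''.join(w.capitalize() for w in x[1:])

words = ['set', 'position']
print(lwtous(words), lwtocw(words), lwtomc(words))
# set_position SetPosition setPosition
```

str.capitalize also lowercases the rest of each word ('hTML'.capitalize() is 'Html'), which is harmless here only because the neutral form is defined as lowercase words.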
Any other combination is a simple issue of functional composition:
def anytous(x): return lwtous(anytolw(x))
cwtous = mctous = anytous
def anytocw(x): return lwtocw(anytolw(x))
ustocw = mctocw = anytocw
def anytomc(x): return lwtomc(anytolw(x))
cwtomc = ustomc = anytomc
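Because every conversion passes through the word list, the writers can also live in a table keyed by target style, so a single function covers all combinations. The table WRITERS, its keys, and the function convert are hypothetical names chosen for this sketch.

```python
import re

def anytolw(x):  # any format of identifier to list of lowercased words
    lw = x.split('_')
    if len(lw) > 1:
        return [w.lower() for w in lw]
    pieces = re.split('([A-Z])', x)
    pieces = [''] + pieces if pieces[0] else pieces[1:]
    return [pieces[i].lower() + pieces[i + 1] for i in range(0, len(pieces), 2)]

# One writer per output style; the keys are illustrative.
WRITERS = {
    'us': lambda w: '_'.join(w),
    'cw': lambda w: ''.join(s.capitalize() for s in w),
    'mc': lambda w: w[0] + ''.join(s.capitalize() for s in w[1:]),
}

def convert(name, style):
    """Convert identifier `name` to `style` ('us', 'cw' or 'mc')."""
    return WRITERS[style](anytolw(name))

print(convert("getValue", "us"), convert("get_value", "cw"), convert("GetValue", "mc"))
# get_value GetValue getValue
```

Supporting another output style then means adding one table entry, while anytolw keeps handling every input format.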
The specialized approach is slimmer and faster, but this generalized
stance may ease understanding as well as offering wider application.
3.16.4 See Also
The Library Reference sections on the
re and string modules.