5.6 General IP Design Strategies

Until now, I have looked only at theoretical ideas about how subnetting works. Now I want to talk about how to use it in a real network. The first step, before anything else can be done, is to decide how many segments are required and how many devices these segments need to support. These numbers need only be rough estimates because a good designer always assumes that a network will grow in time. They constrain what sort of IP-address range is required. Generally, the network should be built out of subnets of a larger network because you will want to take advantage of route summarization later. Simply allocating a new distinct Class C network for every user segment is not useful.

5.6.1 Unregistered Addresses

At one time, there was a hot debate about the use of unregistered addresses on large LANs. Many organizations developed internal policies that forbade the use of unregistered addresses on principle. Before the advent of firewalls with NAT, it would have been impossible to connect these networks to the public Internet (or even to build nonpublic shared Internets between collaborating organizations) without the IETF centrally controlling all IP address allocations.

This sort of policy had an unfortunate side effect that nearly destroyed the Internet. An IP address has only 4 octets, so there can be at most 4,294,967,296 distinct addresses. Four billion sounds like it should be enough, but remember that the first half of these addresses are allocated as Class A networks, of which only 128 are possible (and some are reserved, as mentioned above). The next quarter includes the 16,384 possible Class B networks (and again, some of these are reserved). Thus, three quarters of the available address range is used up on just 16 thousand large companies, universities, and government agencies. The Internet has many millions of participants, though, and they all must have registered IP addresses. Clearly, using registered addresses on internal corporate networks is neither a practical nor a responsible use of scarce resources.
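The arithmetic behind these fractions is easy to check. The following is only a back-of-the-envelope sketch in Python; the variable names are made up for illustration, and it ignores reserved networks.

```python
# Rough check of how the classful scheme divides the 32-bit address space.
TOTAL = 2 ** 32                      # every possible 4-octet address

# Class A: leading bit 0, leaving 7 network bits and 24 host bits
class_a_networks = 2 ** 7
class_a_share = class_a_networks * 2 ** 24 / TOTAL

# Class B: leading bits 10, leaving 14 network bits and 16 host bits
class_b_networks = 2 ** 14
class_b_share = class_b_networks * 2 ** 16 / TOTAL

print(TOTAL)                               # 4294967296
print(class_a_networks, class_a_share)     # 128 0.5
print(class_b_networks, class_b_share)     # 16384 0.25
```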

The alternative is using unregistered addresses, but you have to be careful with them. If you arbitrarily pick an address range for internal use, the chances are good that this range is already in use somewhere on the Internet. As long as you hide everything behind a firewall and use NAT to hide your unregistered addresses, you won't conflict openly with anything. But if one day you want to exchange email with whoever actually owns this address range, or even connect to their web site, it will not work.

There is an easy resolution to this problem: you just need to use addresses that you know are not in use and never will be. The IETF set aside several ranges of addresses for exactly this purpose, and they are documented in RFC 1918. The allowed ranges are shown in Table 5-5.

Table 5-5. RFC-allowed unregistered IP addresses
Class	Network	Mask	Comment
Class A	10.0.0.0	255.0.0.0	One large Class A network
Class B	172.16.0.0 through 172.31.0.0	255.255.0.0	16 Class B networks
Class C	192.168.0.0 through 192.168.255.0	255.255.255.0	256 Class C networks
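As a quick illustration, the three ranges in Table 5-5 can be written as the CIDR blocks 10.0.0.0/8, 172.16.0.0/12, and 192.168.0.0/16. The sketch below uses Python's standard ipaddress module to test whether an address falls in one of them; the helper name is_rfc1918 is hypothetical and not part of any library.

```python
import ipaddress

# The RFC 1918 ranges from Table 5-5, written as single CIDR blocks.
RFC1918_BLOCKS = [
    ipaddress.ip_network("10.0.0.0/8"),       # one Class A network
    ipaddress.ip_network("172.16.0.0/12"),    # 172.16.0.0 through 172.31.0.0
    ipaddress.ip_network("192.168.0.0/16"),   # 192.168.0.0 through 192.168.255.0
]

def is_rfc1918(address: str) -> bool:
    """Return True if the address lies in one of the unregistered ranges."""
    addr = ipaddress.ip_address(address)
    return any(addr in block for block in RFC1918_BLOCKS)

# Count the classful networks contained in the Class B and Class C rows.
class_b_count = len(list(RFC1918_BLOCKS[1].subnets(new_prefix=16)))
class_c_count = len(list(RFC1918_BLOCKS[2].subnets(new_prefix=24)))

print(is_rfc1918("192.168.1.10"))    # True
print(is_rfc1918("172.32.0.1"))      # False
print(class_b_count, class_c_count)  # 16 256
```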

Anybody is free to use these addresses for anything they like, as long as they don't connect them directly to the Internet. For a very small LAN, such as a home or small office, it makes sense to use one of the 192.168 addresses. On my own home LAN, I use 192.168.1.0, for example, with a firewall to connect to the Internet. The Internet Service Provider (ISP) supplies a registered address for the outside interface of the firewall. For larger networks where a Class B is required, the organization is free to pick from any of the 16 indicated unregistered addresses. There is only one unregistered Class A network, so almost every large network in the world uses 10.0.0.0 for its internal addressing. This doesn't cause any problems unless these organizations need to communicate directly with one another without intervening firewalls. That sometimes happens, particularly when one organization provides some sort of network service to another, as might occur with information service providers, network-management service providers, and corporate mergers. When conflicts like this occur, the best way to get around them is to carve off separate sections of the network interconnected by firewalls performing NAT.

5.6.2 IP Addressing Schemes

A successful IP addressing scheme operates on two levels. It works on a global level, allowing related groups of devices to share common ranges of addresses. It also works on a local level, ensuring that addresses are available for all local devices, without wasting addresses.

The global issue assumes that you can break up the large network into connected regions. Having done so, you should summarize routing information between these regions. To make route summarization work in the final network, you need a routing protocol that is able to do this work for you. Thus, a key part of any successful IP addressing scheme is understanding the specific routing protocol or protocols to be used.

Another important global-scale issue is the network's physical geography. An organization with a branch-office WAN usually needs a large number of small subnets for each of the branch offices. It also probably has a similar large number of point-to-point circuits (perhaps Frame Relay or ATM virtual circuits) for the actual WAN connections.

However, an organization that is concentrated on a single campus, perhaps with a small number of satellite sites, needs to break up its address ranges in a completely different way. Many organizations are a hybrid of these two extremes, having a few extremely large sites and a large number of small sites. Other organizations may start off in one extreme and, through growth, mergers, and acquisitions, find themselves at the other end. I can't really recommend a single IP addressing strategy that suits every organization, but I can talk about some principles that go into building a good strategy:

Create large, yet easily summarized, chunks
Set standard subnet masks for common uses
Ensure that there is enough capacity in each chunk for everything it needs to do
Provide enough flexibility to allow integration of new networks and new technologies

5.6.2.1 Easily summarized ranges of addresses

Take these points one at a time. First, creating large, yet easily summarized, chunks is relatively straightforward. Summarization makes it easier to build the network in distinct modules. This means that routers can deal with all of the routes for a particular section of the network with a single routing table entry. This ability is useful, no matter what sort of dynamic (or static) routing protocol is used in the network.

For a chunk of addresses to be summarized, you have to be able to write it with a single simple netmask. For example, if you use the 10.0.0.0 unregistered Class A range, then you might make your chunks by changing the second octet. The first chunk might be 10.1.0.0 with a mask of 255.255.0.0. This chunk will usually be written 10.1.0.0/16 to indicate 16 bits of mask. Then the second chunk would be 10.2.0.0/16, and so forth. If a chunk of this size turns out to be too small for the requirements of the network, it is easy enough to work with a shorter mask. Then you might summarize in groups of four, as in 10.4.0.0/14, 10.8.0.0/14, and so forth. Here, the mask is 255.252.0.0.

Another approach to creating easily summarized chunks of addresses uses the unregistered Class B range of addresses. In this case, you might simply start with 172.16.0.0/16 for the first chunk, 172.17.0.0/16 for the second, and so forth. Remember that only 16 of these unregistered Class B ranges are available. Instead, you might make your chunks smaller, as in 172.16.0.0/18, 172.16.64.0/18, 172.16.128.0/18, and 172.16.192.0/18.
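The chunk boundaries in these examples can be generated rather than worked out by hand. This is a minimal sketch using Python's standard ipaddress module; the function name chunks is an assumption made for illustration.

```python
import ipaddress

def chunks(parent: str, new_prefix: int) -> list:
    """List the equal-sized chunks of a parent range at the given prefix length."""
    net = ipaddress.ip_network(parent)
    return [str(s) for s in net.subnets(new_prefix=new_prefix)]

# Groups of four /16s in the Class A range
print(chunks("10.0.0.0/8", 14)[:3])
# ['10.0.0.0/14', '10.4.0.0/14', '10.8.0.0/14']

# Quarter-sized chunks of one unregistered Class B range
print(chunks("172.16.0.0/16", 18))
# ['172.16.0.0/18', '172.16.64.0/18', '172.16.128.0/18', '172.16.192.0/18']

print(ipaddress.ip_network("10.4.0.0/14").netmask)   # 255.252.0.0

# A chunk must start on a boundary that matches its mask.
# 10.2.0.0 is not a multiple of four in the second octet, so this is rejected.
try:
    ipaddress.ip_network("10.2.0.0/14")
    aligned = True
except ValueError:
    aligned = False
print(aligned)   # False
```

The last check shows why the groups of four start at 10.4.0.0 and 10.8.0.0 rather than at arbitrary values of the second octet.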

The two key issues here are figuring out how many of these chunks are required and how big they must be to accommodate the network's requirements. If the number of chunks becomes too large, then you will need to create a hierarchy of address ranges. As I will discuss in Chapter 6, these chunks are appropriate for the size of an Open Shortest Path First (OSPF) area, but if you create too many areas, you need to be able to break your network into multiple Autonomous Systems (ASes). Then you will also require route summarization between ASes. I define these terms in Chapter 6, but for now you can just think of an OSPF area as an easily summarized group of addresses and of an AS as an easily summarized group of areas.

For a quick example of how this might be done, suppose you want to use 10.0.0.0 for the network. Then you might make your OSPF areas with a mask of 255.255.0.0, so each area has the same number of addresses as one Class B network—they will be denoted 10.0.0.0/16, 10.1.0.0/16, 10.2.0.0/16, and so forth. You might decide that for performance reasons you need to restrict the number of areas within an AS to a number like 50, for example. Unfortunately, 50 is not a nice "round" binary number, but it is not far from 64, which is.

Providing a few too many potential areas may turn out to be useful later, if you have to make one AS slightly larger than the others. It is not a bad thing to have to go to 64. In this case, the ASes are summarized on a mask of 255.192.0.0. The first one will be 10.0.0.0/10, the second will be 10.64.0.0/10, and so forth.

One final note on summarizing: the chunks do not all need to be the same size. One area can have small and large subnets, as long as the area can be summarized. Similarly, one AS can have small and large areas, as long as you can still summarize every area. You can even mix differently sized ASes, as long as they also can be summarized easily.

The first AS could be 10.0.0.0/10, as noted previously. The second and third could be 10.64.0.0/11 and 10.96.0.0/11. Here the second 10-bit mask range, 10.64.0.0/10, is broken into two 11-bit mask ranges. Breaking up the ranges this way, by subdividing some of the larger chunks with a longer mask, is the best way to look at the problem. The same idea applies to subdividing area-sized chunks that are larger than required.
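To illustrate the mixed-size layout just described, the sketch below uses Python's standard ipaddress module to find which AS range contains a given area and to confirm that the two /11 ranges are the two halves of 10.64.0.0/10. The AS labels and the owning_as helper are hypothetical names chosen for this example.

```python
import ipaddress

# AS-sized ranges from the example: one /10 and one /10 split into two /11s.
as_ranges = {
    "AS1": ipaddress.ip_network("10.0.0.0/10"),
    "AS2": ipaddress.ip_network("10.64.0.0/11"),
    "AS3": ipaddress.ip_network("10.96.0.0/11"),
}

def owning_as(area: str):
    """Return the label of the AS range that contains the area, or None."""
    area_net = ipaddress.ip_network(area)
    for name, rng in as_ranges.items():
        if area_net.subnet_of(rng):
            return name
    return None

# Number of /16 area-sized chunks that fit in each AS range
areas_per_as = {name: 2 ** (16 - rng.prefixlen) for name, rng in as_ranges.items()}
print(areas_per_as)               # {'AS1': 64, 'AS2': 32, 'AS3': 32}

print(owning_as("10.5.0.0/16"))   # AS1
print(owning_as("10.70.0.0/16"))  # AS2
print(owning_as("10.100.0.0/16")) # AS3

# The two /11 ranges merge back into the second /10 range.
merged = list(ipaddress.collapse_addresses([as_ranges["AS2"], as_ranges["AS3"]]))
print(merged)                     # [IPv4Network('10.64.0.0/10')]
```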

Dynamic routing protocols don't require this sort of summarization, but summarizing routes will result in a more stable and easily administered network.

5.6.2.2 Sufficient capacity in each range

How big does each chunk of addresses need to be? As mentioned before, it is easier to subdivide ranges of addresses than it is to merge them. You probably want to err on the large side, if you can. The only way to answer the question is to decide what you're going to put in this range. For the time being, suppose that the address range is for an OSPF area. The same reasoning applies to sizing OSPF ASes, but on a larger scale. If you use a different dynamic routing protocol, such as RIP or EIGRP, the differences are again just a matter of the appropriate scales for these protocols. Focusing on the OSPF area version of the problem, the usual rule for area size is 50 routers in an area.

I will discuss this in more detail later when I talk about OSPF. As you will also see later, there must be a Core or backbone area that all of the other areas connect to. Thus, you have to be concerned about sizing the Core area as well as the peripheral areas.

The largest hierarchical LAN designs, discussed in Chapter 3, had two routers in each VLAN Distribution Area and a central Core with a handful of routers. Even with this design, the network would probably need at least 15 VLAN Distribution Areas before needing to be broken up into OSPF areas. Conversely, OSPF area structure becomes important very quickly in even a modest-sized WAN. The sizes of your LAN and WAN OSPF areas will probably be completely different, and one certainly wouldn't expect them to have the same sort of internal structure.

This book is about building large-scale LANs, so I will carry on with a relatively simple example involving a large campus network that connects 100 departments. Each department is its own VLAN Distribution Area in your hierarchical design model, so each department has two routers and several VLANs.

A VLAN Distribution Area is far too small to be a good OSPF area. However, all routers in each OSPF area must connect to the Core area through a small number (I will assume two) of Area Border Routers (ABRs). The main constraint in the size of each OSPF area is not the rule of 50 routers per area. Rather, you will quickly run into bandwidth limitations on those ABRs if you connect too many VLAN Distribution routers to them, so you might want to set a limit of 10 VLAN Distribution Areas per OSPF area, or perhaps only 5. A detailed bandwidth requirement study would yield the most appropriate topology.

Suppose that the network will have five VLAN Distribution Areas per OSPF area. You need to look at how many VLANs live in each Distribution Area and the netmask of each VLAN. If you know you have up to 25 VLANs, each with a mask of 255.255.255.0, then you can finish the puzzle. You need about 125 Class C-sized subnets in each OSPF area, and you need this chunk to be summarized. That number is easily accommodated in a Class B-sized range. With a mask of 255.255.0.0, you could fit in 256 subnets.

Note that this example implies that a mask one bit longer could have been used to accommodate 128 subnets. However, as I mentioned earlier, it is good to err on the high side in these sorts of estimates. The difference between 125 and 128 is only a 2% margin of error, which is far too close for such back-of-the-envelope estimates.
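The sizing estimate can be reduced to a couple of lines of arithmetic. The following is a rough sketch in Python that uses the numbers from the example (5 Distribution Areas of up to 25 VLANs each, all with 24-bit masks); the function name summary_prefix is made up for illustration.

```python
def summary_prefix(subnets_needed: int, subnet_prefix: int = 24) -> int:
    """Longest summary prefix that still holds the requested number of subnets."""
    bits = (subnets_needed - 1).bit_length()   # bits needed to number the subnets
    return subnet_prefix - bits

needed = 5 * 25                    # Distribution Areas x VLANs per area
tight = summary_prefix(needed)     # smallest range that fits
roomy = tight - 1                  # one bit shorter, twice the room

for prefix in (tight, roomy):
    capacity = 2 ** (24 - prefix)
    spare = capacity - needed
    print(f"/{prefix}: {capacity} subnets, {spare} spare ({spare / capacity:.1%})")
# /17: 128 subnets, 3 spare (2.3%)
# /16: 256 subnets, 131 spare (51.2%)
```

The /17 range technically fits, but with only three spare subnets; the /16 range leaves about half of the space free for growth.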

The whole campus has 100 departments and a total of 20 departmental OSPF areas, each containing 5 departments. In addition, the network has a Core OSPF area, for a total of 21 areas. If each area has a mask of 255.255.0.0, then the whole network has to be able to accommodate 21 ranges of this size. Clearly, this network won't be able to use the 172.16.0.0/16-172.31.0.0/16 set of Class B addresses, which provides only 16 such ranges. There is more than enough room in the 10.0.0.0/8 Class A network.

This example shows the general thought process that needs to be followed when finding the appropriate sizes for the area-sized chunks of addresses. It is also easily extended to AS-sized chunks of addresses. Suppose, for example, that bandwidth and stability issues force the network engineer to break up the Core of this example campus network. To make the example interesting, suppose that the engineer has to break the network into three ASes.

There are 20 OSPF areas to divide among these three, which could mean two sets of 7 and a 6. Each of these ASes has its own Core area, giving two 8s and a 7. The nearest round number in binary is 8. That leaves no room for growth, so once again, it is good to err on the large side and use groups of 16. This means that the ASes will have a summarization mask of 255.240.0.0. The first one would be 10.0.0.0/12, the second 10.16.0.0/12, and the third 10.32.0.0/12.
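The same reasoning can be followed step by step in code. This is only a sketch of the example's arithmetic, written in Python with the standard ipaddress module; the variable names are assumptions, and the even split of areas among ASes is just one possible choice.

```python
import ipaddress

areas, as_count = 20, 3

# Spread the departmental areas as evenly as possible, then add one Core area each.
base, extra = divmod(areas, as_count)
per_as = [base + (1 if i < extra else 0) + 1 for i in range(as_count)]
print(per_as)                                    # [8, 8, 7]

# Bits needed to number the areas in the largest AS, plus one bit for growth
tight_bits = (max(per_as) - 1).bit_length()      # 3 bits, groups of 8
roomy_bits = tight_bits + 1                      # 4 bits, groups of 16
as_prefix = 16 - roomy_bits                      # areas are /16, so ASes are /12

all_blocks = ipaddress.ip_network("10.0.0.0/8").subnets(new_prefix=as_prefix)
as_blocks = [next(all_blocks) for _ in range(as_count)]

print([str(b) for b in as_blocks])
# ['10.0.0.0/12', '10.16.0.0/12', '10.32.0.0/12']
print(as_blocks[0].netmask)                      # 255.240.0.0
```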

5.6.2.3 Standard subnet masks for common uses

One of the most useful things network designers can do to simplify the design of an IP network is to set up rules for how to use subnets. There are actually three types of rules:

What subnet masks to use for what functions
How to select a subnet from the larger group of addresses for the area
How to allocate the addresses within the subnet

The fewer different subnet masks in use in a network, the easier it is to work with the network. Many people, particularly less experienced network personnel, find the binary arithmetic for subnetting confusing.

In many networks, it is possible to get away with only three different subnet masks. For point-to-point links that can only physically support two devices, you can safely use the longest mask, 255.255.255.252. For most regular LAN segments, you can use 255.255.255.0, which is the same size as a Class C, and relatively easy to understand. Then you can allocate one other netmask for special subnets that are guaranteed to remain small, but nonetheless contain more than two devices. A good mask for this purpose is 255.255.255.240, which supports up to 14 devices.
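For reference, the capacity of each of these three masks follows directly from the number of host bits, less the network and broadcast addresses. The sketch below checks this with Python's standard ipaddress module; the helper name usable_hosts is hypothetical.

```python
import ipaddress

def usable_hosts(mask: str) -> int:
    """Usable device addresses in a subnet with the given dotted-decimal mask."""
    net = ipaddress.ip_network(f"0.0.0.0/{mask}")
    return net.num_addresses - 2     # exclude network and broadcast addresses

for mask in ("255.255.255.252", "255.255.255.240", "255.255.255.0"):
    prefix = ipaddress.ip_network(f"0.0.0.0/{mask}").prefixlen
    print(f"{mask} (/{prefix}): {usable_hosts(mask)} devices")
# 255.255.255.252 (/30): 2 devices
# 255.255.255.240 (/28): 14 devices
# 255.255.255.0 (/24): 254 devices
```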

Since there are broadcast-efficiency issues on larger LANs, it is best to try to keep the number of devices in a VLAN below a reasonable threshold. A good natural number for this purpose is the 254 host maximum allowed by the 24-bit mask, 255.255.255.0. Nonetheless, many organizations like to expand their VLAN-addressing range by using a mask of 255.255.254.0 or even 255.255.252.0. There is certainly nothing wrong with doing this. But if a network uses a mixture of VLANs with masks of 255.255.252.0 and 255.255.255.0, it is very easy to get confused in the heat of troubleshooting a difficult problem. For this reason, I tend to avoid these larger masks. I also feel that broadcast issues make Class C-sized subnets more efficient in a VLAN, but this latter issue will not be true on every network.

Many organizations also like to try to improve their address-allocation efficiency by using other in-between-sized subnet masks. For example, for smaller user LAN segments, they might opt to use a mask of 255.255.255.224. This mask undoubtedly winds up being necessary when trying to address a large network with a single Class B address. For example, if a network designer insisted on using a registered Class B range for a network, he might find that this kind of measure is needed to avoid running out of addresses. Getting into this sort of crunch using the unregistered Class A 10.0.0.0 would take either a monstrously huge network or terrible inefficiency.

Suppose you allocate an OSPF area's address range to a set of user VLANs. Suppose you have selected a standard netmask for all such subnets, but you also need to decide how to allocate these addresses from the larger range. This allocation is largely arbitrary, but it is useful to have a common standard for how to do it. The usual way to do this is to divide the area range into different groups according to netmask. For example, suppose the area range has a mask of 255.255.0.0 and that three different types of masks are in use: 255.255.255.0 (24 bits), 255.255.255.224 (27 bits), and 255.255.255.252 (30 bits).

The range consists of 256 Class C-sized units. The first mask size uses one of these units for every subnet. The second one allows you to fit up to 8 subnets into each unit. You can also fit 64 of the smallest subnets into each unit. Then work out how many of each type of subnet you expect to require.

The smallest-sized subnets actually have two uses. First, it is useful to assign a unique internal loopback IP address to every router. Some networks use a mask of 255.255.255.255 for this purpose, but under these rules an entire Class C-sized group of addresses is set aside for loopbacks anyway. There should never be more than about 50 routers in any OSPF area. Since 64 30-bit subnets are in one of these groups, and since keeping the number of different masks to a minimum is a good idea, it makes sense to use a mask of 255.255.255.252 for these loopback addresses. Second, you need to see how many real point-to-point subnets are needed. This step requires a better idea of the network topology. The rules should be as general as possible. In effect, I am talking about the worst cases, so I can follow the rule no matter how much the future surprises me with new technology.

I might want to say that up to 50 routers will be in an OSPF area and perhaps 3 point-to-point circuits on each one. That is 50 loopback subnets plus at most 150 circuit subnets, or 200 in total, which tells me to set aside the first 4 Class C-sized groups (256 subnets) for 30-bit subnets. Then I need to figure out how many 27-bit subnets I will require. I can fit 8 of these subnets into one Class C-sized group, so if I think that 64 of these will be enough, then perhaps I can reserve the next 8 groups. This will leave the remaining 244 groups for 24-bit subnets.

Note that throughout these arguments I made an effort to break up the groups along bit-mask lines. I could have said that I wanted 5 groups of 30-bit subnets, but I chose 4 groups to keep the subgroups aligned in sets that the network can easily summarize with another netmask. I did this not because I have any foreseeable need to do so, but because one day I might have to break up an area into parts. If that happens, I want to make things as easy to split up as possible. At the same time, I don't want to make life more complicated if that split is not required.
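The allocation just described can be laid out explicitly. The following is a minimal sketch in Python using the standard ipaddress module; the area range 10.1.0.0/16 and the names plan and capacity are assumptions chosen for illustration.

```python
import ipaddress

area = ipaddress.ip_network("10.1.0.0/16")      # hypothetical area range
groups = list(area.subnets(new_prefix=24))      # 256 Class C-sized groups

plan = {
    30: groups[0:4],     # loopbacks and point-to-point links
    27: groups[4:12],    # small special-purpose subnets
    24: groups[12:],     # regular user VLANs
}

def capacity(prefix: int) -> int:
    """Number of subnets of this prefix length that fit in its reserved groups."""
    return 2 ** (prefix - 24) * len(plan[prefix])

print(capacity(30), capacity(27), capacity(24))   # 256 64 244

# Worst case from the text: 50 loopbacks plus 3 circuits on each of 50 routers
print(50 + 50 * 3 <= capacity(30))                # True

# Because 4 groups are aligned on a bit boundary, they collapse to one summary.
print(list(ipaddress.collapse_addresses(plan[30])))
# [IPv4Network('10.1.0.0/22')]

# Five groups would not: the fifth is left over as a separate route.
print(list(ipaddress.collapse_addresses(groups[0:5])))
# [IPv4Network('10.1.0.0/22'), IPv4Network('10.1.4.0/24')]
```

The last two lines show the practical difference between reserving 4 groups and reserving 5: only the aligned block can be described with a single netmask.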

You could make up a scheme where, for example, every sixteenth group contains 30-bit subnets and the next two are used for 27-bit subnets. This scheme would work, and it might make subdividing the area somewhat easier. However, subdividing an area will be hard no matter what you do, so it's more important to make everyday life easier.

Finally, the network designer needs to have standards for how she uses the addresses within a subnet. This depends not only on the subnet mask, but also on its use. Once again, if a small number of generally applicable rules can be made up, then troubleshooting a problem at 3 A.M. will be much easier.

A common example of this sort of rule involves the default gateway for the subnet. Most network designers like to make the first address in the subnet belong to the main router to get off this subnet. For the 24-bit subnet (255.255.255.0) 10.1.2.0/24, this address would be 10.1.2.1. In a subnet that uses HSRP or VRRP, this default gateway address would be the virtual or standby address. The real router interfaces would then have the next two addresses, 10.1.2.2 and 10.1.2.3, respectively. Many designers like to reserve a block of addresses at the start of the range just for network devices.

In a 30-bit point-to-point subnet (255.255.255.252) such as 10.1.2.4/30, only two addresses are available, 10.1.2.5 and 10.1.2.6. Devising a general rule for deciding which device gets the lower number is useful. I like to use the same rule mentioned earlier and make the lower number the address of the device leading to the Core. If this address is used to connect to a remote branch, then the remote side gets 10.1.2.6 and the head-office side will get 10.1.2.5. Sometimes this link might be a connection between two remote sites or two Core devices. In this case, which router gets the lower number becomes arbitrary. In the case of point-to-point links between a router and a host, the router gets the lower number.
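Conventions like these are simple enough to encode, which helps keep them consistent. The sketch below applies the two rules from the text using Python's standard ipaddress module; the function names and dictionary keys are hypothetical.

```python
import ipaddress

def lan_gateway_plan(subnet: str) -> dict:
    """First address for the virtual gateway, next two for the real routers."""
    hosts = list(ipaddress.ip_network(subnet).hosts())
    return {
        "virtual_gateway": str(hosts[0]),
        "router_a": str(hosts[1]),
        "router_b": str(hosts[2]),
    }

def p2p_plan(subnet: str) -> dict:
    """Lower address faces the Core, higher address goes to the remote side."""
    core_side, remote_side = list(ipaddress.ip_network(subnet).hosts())
    return {"core_side": str(core_side), "remote_side": str(remote_side)}

print(lan_gateway_plan("10.1.2.0/24"))
# {'virtual_gateway': '10.1.2.1', 'router_a': '10.1.2.2', 'router_b': '10.1.2.3'}
print(p2p_plan("10.1.2.4/30"))
# {'core_side': '10.1.2.5', 'remote_side': '10.1.2.6'}
```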

Establishing specific rules for how the addresses are allocated can be useful for any VLAN. Many organizations have rules so specific that it is possible to tell from just looking at the IP address whether the device in question is a user workstation, a server, a printer, or a network device. This knowledge can greatly simplify troubleshooting and implementation.

5.6.2.4 Flexibility for future requirements

So far, I have tried to encourage designers to leave extra space. You need extra addresses in each subnet, extra subnets in each area, and extra room for more areas in the network. Network growth is driven purely by business factors that are largely unpredictable or, at least, unknown to the network designer.

One of the most profound challenges that a network designer can face is the acquisition of another company. This situation usually involves merging two networks that, in all likelihood, share address space and use conflicting standards. Even if this doesn't happen, healthy organizations tend to grow over time, which means that their networks must have growth capacity. A good IP addressing strategy always involves carefully overestimating the requirements, just in case.

5.6.3 The Default Gateway Question

The default gateway on any subnet is the IP address of the router that gets you off the segment. In fact, many routers may exist on a subnet. These routers may all lead to different destinations, but the default gateway is the one that you send a packet to when you don't know which one of the other routers can handle it.

In hierarchical network architectures, it is not common to put several routers on a segment. In this sort of design, it is generally best to use two routers and HSRP or VRRP for redundancy instead. But in a general network it is possible to have multiple routers all leading to different destinations.

The end devices need to have some sort of local routing table. In its simplest form, this routing table says two things. First, it directs all packets destined for the local subnet to just use the network interface card. Second, it contains a route for the default gateway, often expressed as a route to the IP address 0.0.0.0, with a mask of 0.0.0.0. This default gateway route is traditionally handled in one of two ways. Either it points to the local router, or it simply directs everything to use its own LAN interface without specifying a next hop. This second option requires the local router to act as an ARP proxy device for the remote networks it can route to. When the end station wants to send the packet to the unknown network, it first sends out an ARP packet for the destination device. That device is not actually on the local LAN segment, so it cannot respond to this ARP, but the router that knows how to get there responds for it. In proxy ARP, the router responds to the ARP packet with its own MAC address. The end device then communicates directly with the router at Layer 2 and the packets are routed normally.
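The simplest form of this end-device routing table can be modeled in a few lines. This is an illustrative sketch only, written in Python with the standard ipaddress module; the subnet 10.1.2.0/24, the gateway 10.1.2.1, and the name next_hop are assumptions, and real operating systems hold more state than this.

```python
import ipaddress

# Two-entry routing table of a host on a hypothetical subnet 10.1.2.0/24.
# A next hop of None means "deliver directly on the local interface".
ROUTES = [
    (ipaddress.ip_network("10.1.2.0/24"), None),       # local subnet
    (ipaddress.ip_network("0.0.0.0/0"), "10.1.2.1"),   # default gateway route
]

def next_hop(destination: str):
    """Pick the most specific matching route and return its next hop."""
    dest = ipaddress.ip_address(destination)
    matches = [(net, hop) for net, hop in ROUTES if dest in net]
    net, hop = max(matches, key=lambda m: m[0].prefixlen)
    return hop

print(next_hop("10.1.2.77"))    # None (on the local segment)
print(next_hop("192.0.2.10"))   # 10.1.2.1 (sent to the default gateway)
```

Because 0.0.0.0 with a mask of 0.0.0.0 matches every destination, it only wins when no more specific route applies. In the proxy ARP variant described above, the default entry would have no next hop either, and the host would ARP for the remote destination itself.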

At one time, this second option was the only way to reliably give a LAN segment router redundancy. If one router for the segment died, the second one would simply take over the traffic. Both would be configured for proxy ARP and both would handle live traffic all the time under normal operating conditions.

There are two problems with this strategy. The first is that every ARP query is a broadcast. Even in a fully switched VLAN architecture, every time a device wants to communicate outside of its subnet, it must disturb every other device in the VLAN. This disturbance is unlikely to cause large traffic overhead, but it is nonetheless inefficient. Furthermore, because it must ARP for every off-segment host separately, there is a short additional latency in every call setup.

The second problem is more serious. Most end devices use a simple ARP cache system that allows only one MAC address to be mapped to each destination IP address. If one router fails, the second will not be able to take over. Rather, the end device will continue trying the first router until the ARP cache entry times out. This timeout period is typically at least 5 minutes and often as long as 20. Clearly this time is not good enough if the network actually requires a robust fault-recovery system. But a shorter time is clearly inefficient.

This proxy ARP approach does give a convenient way to build IP-level fault tolerance for a LAN segment. However, the advent of VRRP and HSRP provides a much quicker and more efficient way of achieving the same result. In a hierarchical LAN design, the best high-availability topology involves two routers running VRRP or HSRP. Every end device on the subnet then treats the virtual address shared by these two routers as the default gateway.