2.3 Overview of TCP/IP

TCP/IP is a set of communications protocols that define how different types of computers talk to one another. It's named for its two most common protocols, the Transmission Control Protocol and the Internet Protocol. The Internet Protocol moves data between hosts: it splits data into packets, which are then forwarded to machines via the network. The Transmission Control Protocol ensures that the packets in a message are reassembled in the correct order at their final destination and that any missing packets are re-sent until they are correctly received. Other protocols provided as part of TCP/IP include:

Address Resolution Protocol (ARP): Translates between Internet and local hardware addresses (Ethernet, etc.)
Internet Control Message Protocol (ICMP): Error-message and control protocol
Point-to-Point Protocol (PPP): Enables TCP/IP (and other protocols) to be carried across both synchronous and asynchronous point-to-point serial links
Reverse Address Resolution Protocol (RARP): Translates between local hardware and Internet addresses (opposite of ARP)
Simple Mail Transfer Protocol (SMTP): Used by sendmail to send mail via TCP/IP
Simple Network Management Protocol (SNMP): Performs distributed network management functions via TCP/IP
User Datagram Protocol (UDP): Provides data transfer, but without the reliable delivery capabilities of TCP
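
The reordering and re-sending that TCP performs can be illustrated with a toy model. This is only a sketch, not real TCP: the segment numbers, the `reassemble` function, and the fixed segment count are assumptions made for the example.

```python
from typing import Dict, List, Optional, Tuple


def reassemble(received: Dict[int, bytes], total: int) -> Tuple[Optional[bytes], List[int]]:
    """Rebuild a message from numbered segments (toy model, not real TCP).

    `received` maps a hypothetical segment number (0..total-1) to its payload.
    Returns (message, missing); message is None while segments are missing.
    """
    missing = [seq for seq in range(total) if seq not in received]
    if missing:
        # A real TCP sender would re-send these until they are acknowledged.
        return None, missing
    # Arrival order does not matter; the segments are joined by number.
    return b"".join(received[seq] for seq in range(total)), []


# Segments arrive out of order, and segment 1 is lost on the first attempt.
arrived = {2: b"world", 0: b"hello"}
print(reassemble(arrived, 3))   # (None, [1])
arrived[1] = b", "              # the re-sent segment arrives
print(reassemble(arrived, 3))   # (b'hello, world', [])
```

UDP offers no equivalent of the `missing` list: a lost datagram is simply gone unless the application notices and recovers on its own.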

TCP/IP is covered in depth in the three-volume set Internetworking with TCP/IP (Prentice Hall). The commands in this chapter and the next are described in more detail in TCP/IP Network Administration and Linux Network Administrator's Guide, both published by O'Reilly.

In the architecture of TCP/IP protocols, data is passed down the stack (toward the Network Access Layer) when it is sent to the network, and up the stack when it is received from the network (see Figure 2-1).

Figure 2-1. Layers in the TCP/IP protocol architecture
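
The movement of data down and up the stack can be sketched as each layer adding a header on the way down and removing it on the way up. The layer names below are assumed (only the Network Access Layer is named in the text), and the bracketed "headers" are placeholders, not real header formats.

```python
# Layer names are assumed; the bracketed "headers" are placeholders only.
LAYERS = ["application", "transport", "internet", "network-access"]


def send(data: bytes) -> bytes:
    """Pass data down the stack: each layer prepends its own header."""
    for layer in LAYERS:
        data = f"[{layer}]".encode() + data
    return data


def receive(frame: bytes) -> bytes:
    """Pass data up the stack: each layer strips the header its peer added."""
    for layer in reversed(LAYERS):
        header = f"[{layer}]".encode()
        if not frame.startswith(header):
            raise ValueError(f"missing {layer} header")
        frame = frame[len(header):]
    return frame


frame = send(b"hello")
print(frame)           # b'[network-access][internet][transport][application]hello'
print(receive(frame))  # b'hello'
```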

2.3.1 IP Addresses

The IP (Internet Protocol) address is a 32-bit binary number that differentiates your machine from all others on the network. Each machine must have a unique IP address. An IP address contains two parts: a network part and a host part. The number of address bits used to identify the network and host differs according to the class of the address. There are three main address classes: A, B, and C (see Figure 2-2). The leftmost bits indicate what class each address is.

Figure 2-2. IP address structure
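
As a rough illustration of how the leftmost bits identify the class, the following sketch uses Python's standard ipaddress module. The function name `address_class` is made up for the example; the bit patterns tested (0 for class A, 10 for class B, 110 for class C) are the conventional ones.

```python
import ipaddress


def address_class(addr: str) -> str:
    """Return the classful category of an IPv4 address from its leading bits."""
    first_octet = int(ipaddress.IPv4Address(addr)) >> 24
    if first_octet & 0b10000000 == 0:           # leading bit 0
        return "A"
    if first_octet & 0b11000000 == 0b10000000:  # leading bits 10
        return "B"
    if first_octet & 0b11100000 == 0b11000000:  # leading bits 110
        return "C"
    return "other"                              # outside the three main classes


for sample in ("10.0.0.1", "172.16.0.1", "192.168.1.2"):
    print(sample, address_class(sample))
# 10.0.0.1 A
# 172.16.0.1 B
# 192.168.1.2 C
```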

A standard called Classless Inter-Domain Routing (CIDR) extends the class system's idea of using initial bits to identify where packets should be routed. Under CIDR, a new domain can be created with any number of fixed leftmost bits (not just a multiple of 8).
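
A prefix that is not a multiple of 8 can be explored with Python's standard ipaddress module. The block 192.168.4.0/22 is an arbitrary example chosen for this sketch: 22 fixed leftmost bits leave 10 bits for hosts.

```python
import ipaddress

# 192.168.4.0/22 is an illustrative block: 22 fixed leftmost bits, 10 host bits.
net = ipaddress.ip_network("192.168.4.0/22")

print(net.netmask)                                   # 255.255.252.0
print(net.num_addresses)                             # 1024
print(ipaddress.ip_address("192.168.7.9") in net)    # True
print(ipaddress.ip_address("192.168.8.1") in net)    # False
```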

Another new standard called IPv6 changes the method of addressing and increases the number of fields. An IPv6 address is 128 bits. When written, it is usually divided into eight 16-bit hexadecimal blocks separated by colons. For example:

FE80:0000:0000:0000:0202:B3FF:FE1E:8329

To shorten this, leading zeros may be skipped, and any one set of consecutive zeros can be replaced with double colons. For example, the above address can be reduced to:

FE80::202:B3FF:FE1E:8329
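
The shortening rules can be checked with Python's standard ipaddress module, as in this brief sketch. Note that the module prints hexadecimal digits in lowercase.

```python
import ipaddress

addr = ipaddress.IPv6Address("FE80:0000:0000:0000:0202:B3FF:FE1E:8329")

print(addr.compressed)  # fe80::202:b3ff:fe1e:8329
print(addr.exploded)    # fe80:0000:0000:0000:0202:b3ff:fe1e:8329
```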

When IPv4 and IPv6 networks are mixed, the IPv4 address can be packed into the lower four bytes, yielding an address like 0:0:0:0:0:0:192.168.1.2, or ::192.168.1.2, or even ::C0A8:102.
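
The three spellings above denote the same 128-bit value, whose lower four bytes are the IPv4 address. A short sketch with Python's standard ipaddress module confirms this:

```python
import ipaddress

forms = ["0:0:0:0:0:0:192.168.1.2", "::192.168.1.2", "::C0A8:102"]

# Convert every spelling to its integer value; identical values collapse in a set.
values = {int(ipaddress.IPv6Address(form)) for form in forms}
ipv4_value = int(ipaddress.IPv4Address("192.168.1.2"))

print(values == {ipv4_value})  # True
```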

Because improvements in IPv4, including CIDR, have relieved much of the pressure to migrate to IPv6, organizations have been slow to adopt IPv6. Some use it experimentally, but communications between organizations using IPv6 internally are still usually encapsulated inside IPv4 datagrams, and it will be a while before IPv6 becomes common.

If you wish to connect to the Internet, contact an Internet Service Provider (ISP) and have them assign you a network address or range of addresses. If you are not connecting to an outside network, you can choose your own network address as long as it conforms to the IP address syntax. You should use the special reserved addresses provided in RFC 1597 (since superseded by RFC 1918), which lists IP network numbers for private networks that don't have to be registered with the IANA (Internet Assigned Numbers Authority). An IP address is different from an Ethernet address, which is assigned by the manufacturer of the physical Ethernet card.
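
The reserved private blocks are 10.0.0.0/8, 172.16.0.0/12, and 192.168.0.0/16. The following sketch checks whether an address falls in one of them; the function name `is_reserved_private` is invented for the example.

```python
import ipaddress

# The three private address blocks reserved for networks not connected to the Internet.
PRIVATE_NETWORKS = [
    ipaddress.ip_network("10.0.0.0/8"),
    ipaddress.ip_network("172.16.0.0/12"),
    ipaddress.ip_network("192.168.0.0/16"),
]


def is_reserved_private(addr: str) -> bool:
    """Return True if addr lies inside one of the reserved private blocks."""
    ip = ipaddress.ip_address(addr)
    return any(ip in net for net in PRIVATE_NETWORKS)


print(is_reserved_private("192.168.1.2"))  # True
print(is_reserved_private("172.32.0.1"))   # False (just past 172.16.0.0/12)
```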

2.3.2 Gateways and Routing

Gateways are hosts responsible for exchanging routing information and forwarding data from one network to another. Each portion of a network that is under a separate local administration is called an autonomous system (AS). Autonomous systems connect to each other via exterior gateways. An AS also may contain its own system of networks, linked via interior gateways.

2.3.2.1 Gateway protocols

Gateway protocols include:

EGP (Exterior Gateway Protocol), BGP (Border Gateway Protocol): Protocols for exterior gateways to exchange information
RIP (Routing Information Protocol): Interior gateway protocol; most popular for LANs
Hello Protocol, OSPF (Open Shortest Path First): Interior gateway protocols

2.3.2.2 Routing daemons

Although most networks use a dedicated router as a gateway, either of the routing daemons GNU Zebra and routed can be run on a host to make it function as a gateway. Only one of them can run on a host at any given time. Zebra is the gateway routing daemon that replaces the older gated routing daemon. It allows a host to function as both an exterior and interior gateway, and simplifies the routing configuration by combining the protocols RIP, Hello, BGP, EGP, and OSPF into a single package. We do not cover GNU Zebra in this book.

routed, a network routing daemon that uses RIP, allows a host to function as an interior gateway only, and manages the Internet routing tables. For more details on routed, see Chapter 3.

2.3.2.3 Routing tables

Routing tables provide information needed to route packets to their destinations. This information includes destination network, gateway to use, route status, and number of packets transmitted. Routing tables can be displayed with the netstat command.
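
How a destination network and gateway are used to pick a route can be sketched as follows. The table entries and the `next_hop` function are hypothetical, and the sketch assumes the common rule that the most specific matching network wins, with 0.0.0.0/0 acting as the default route.

```python
import ipaddress
from typing import List, Optional, Tuple

# Hypothetical table of (destination network, gateway) pairs.
ROUTES: List[Tuple[str, str]] = [
    ("192.168.1.0/24", "direct"),
    ("10.0.0.0/8", "192.168.1.254"),
    ("0.0.0.0/0", "192.168.1.1"),   # default route
]


def next_hop(dest: str, routes: List[Tuple[str, str]] = ROUTES) -> Optional[str]:
    """Return the gateway of the most specific route that contains dest."""
    ip = ipaddress.ip_address(dest)
    best = None
    for network, gateway in routes:
        net = ipaddress.ip_network(network)
        # Prefer the matching network with the longest prefix.
        if ip in net and (best is None or net.prefixlen > best[0].prefixlen):
            best = (net, gateway)
    return best[1] if best else None


print(next_hop("192.168.1.77"))  # direct
print(next_hop("10.4.5.6"))      # 192.168.1.254
print(next_hop("8.8.8.8"))       # 192.168.1.1
```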

2.3.3 Name Service

Each host on a network has a name that points to information about that host. Hostnames can be assigned to any device that has an IP address. Name service translates the hostnames (which are easy for people to remember) to IP addresses (the numbers the computer deals with).

2.3.3.1 DNS and BIND

The Domain Name System (DNS) is a distributed database of information about hosts on a network. Its structure is similar to that of the Unix filesystem—an inverted tree, with the root at the top. The branches of the tree are called domains (or subdomains) and correspond to IP addresses. The most popular implementation of DNS is the BIND (Berkeley Internet Name Domain) software.

DNS works as a client/server application. The resolver is the client, the software that asks questions about host information. The name server is the process that answers the questions. The server side of BIND is the named daemon. You can interactively query name servers for host information with the dig and host commands. See Chapter 3 for more details on named, dig, and host.
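
A program can also act as a resolver client. The sketch below uses Python's standard socket module, which passes the question to the system's resolver; the helper name `resolve` is made up for the example, and the answer for any real hostname depends on the name service data at the time of the query.

```python
import socket
from typing import Optional


def resolve(hostname: str) -> Optional[str]:
    """Ask the system resolver for an IPv4 address; return None if the lookup fails."""
    try:
        return socket.gethostbyname(hostname)
    except socket.gaierror:
        return None
```

Calling resolve("www.oreilly.com") returns a dotted-quad string if a name server can answer, and None otherwise. A string that is already an IPv4 address is returned unchanged.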

The name server of a domain is responsible for keeping (and providing on request) the names of the machines in its domain. Other name servers on the network forward requests for these machines to that name server.

2.3.3.2 Domain names

The full domain name is the sequence of names from the current domain back to the root, with a period separating the names. For instance, oreilly.com indicates the domain oreilly (for O'Reilly & Associates), which is under the domain com (for commercial). One machine under this domain is www.oreilly.com. Top-level domains include:

aero: Air-transport industry
biz: Commercial organizations
com: Commercial organizations
coop: Cooperatives
edu: United States educational organizations
gov: United States government organizations
info: Informative sites
int: International organizations
mil: United States military departments
museum: Museums
name: Names of individuals
net: Commercial Internet organizations, usually Internet service providers
org: Miscellaneous organizations
pro: Professionals, including accountants, lawyers, and physicians

Some domains (e.g., edu, gov, and mil) are sponsored by organizations that restrict their use; others (e.g., com, info, net, and org) are unrestricted. Countries also have their own two-letter top-level domains based on two-letter country codes. One special domain, arpa, is used for technical infrastructure purposes. The Internet Corporation for Assigned Names and Numbers (ICANN) oversees top-level domains and provides contact information for sponsored domains.
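
The way a full domain name reads from the current domain back toward the root can be sketched by splitting the name at its periods. The function name `domain_chain` is invented for this example.

```python
from typing import List


def domain_chain(name: str) -> List[str]:
    """List each enclosing domain of name, from the full name up to the top level."""
    labels = name.rstrip(".").split(".")
    return [".".join(labels[i:]) for i in range(len(labels))]


print(domain_chain("www.oreilly.com"))
# ['www.oreilly.com', 'oreilly.com', 'com']
```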

2.3.4 Configuring TCP/IP

Certain commands are normally run in the system's startup files to enable a system to connect to a network. These commands can also be run interactively.

2.3.4.1 ifconfig

The network interface represents the way that the networking software uses the hardware—the driver, the IP address, and so forth. To configure a network interface, use the ifconfig command. With ifconfig, you can assign an address to a network interface, setting the netmask, broadcast address, and IP address at boot time. You can also set network interface parameters, including the use of ARP, the use of driver-dependent debugging code, the use of one-packet mode, and the address of the correspondent on the other end of a point-to-point link. For more information on ifconfig, see Chapter 3.
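
The netmask and broadcast address that go with an interface address follow directly from the address and its prefix length. This sketch works them out with Python's standard ipaddress module; the address 192.168.1.2/24 is an arbitrary example, and the code only computes the values without configuring any interface.

```python
import ipaddress

# Example interface address with a 24-bit network part (illustrative values).
iface = ipaddress.ip_interface("192.168.1.2/24")

print(iface.ip)                         # 192.168.1.2
print(iface.network.netmask)            # 255.255.255.0
print(iface.network.broadcast_address)  # 192.168.1.255
```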

2.3.4.2 Serial-line communication

There are two protocols for serial-line communication: Serial Line IP (SLIP) and Point-to-Point Protocol (PPP). These protocols let computers transfer information using a serial port instead of a network card, and a serial cable instead of an Ethernet cable. SLIP is rarely used anymore, having been replaced by PPP.

PPP was intended to remedy some of SLIP's failings—it can hold packets from non-Internet protocols, it implements client authorization and error detection/correction, and it dynamically configures each network protocol that passes through it. Under Linux, PPP exists as a driver in the kernel and as the daemon pppd. For more information on pppd, see Chapter 3.

2.3.5 Troubleshooting TCP/IP

The following commands can be used to troubleshoot TCP/IP. For more details on these commands, see Chapter 3.

ifconfig: Provide information about the basic configuration of the network interface.
netstat: Display network status.
ping: Indicate whether a remote host can be reached.
nslookup: Query the DNS name service.
traceroute: Trace route taken by packets to reach network host.