4.2 Intel Processors
Nearly all PCs use either an Intel CPU or an
Intel-compatible CPU made by AMD (K6/Athlon/Duron series). The
dominance of Intel in CPUs and Microsoft in operating systems gave
rise to the hybrid term Wintel, which refers to
systems that run Windows on an Intel or compatible CPU. Intel
processors are referred to generically as x86
processors, based on Intel's early
processor naming convention, 8086, 80186, 80286, etc. Intel has
produced seven CPU generations, the first five of which are now
obsolete and the sixth obsolescent.
- First generation
-
The 8086 was Intel's first
mainstream processor, and used 16 bits for both internal and external
communications. The 8086 was first used in the late 1970s in
dedicated word processors and minicomputers like the DisplayWriter
and the System/23 DataMaster. When IBM shipped their first PC in
1981, they used the 8088, an 8086 variant that used 16 bits
internally but only 8 bits externally, because 8-bit peripherals were
at that time more readily available and less expensive than were
16-bit components. The 8086 achieved prominence much later when
Compaq created the DeskPro as an improved clone of the IBM PC/XT. A
few early PCs, notably Radio Shack models, were also built around the
80186 and 80188 CPUs, which were enhanced versions of the 8086 and
8088 respectively. The 8088 and 8086 CPUs did not include a
floating-point unit (FPU), although an 8087 FPU, called a
math coprocessor, was available as an optional
upgrade chip. First-generation Intel CPUs (or their modern
equivalents) are still used in some embedded applications, but they
are long obsolete as general-purpose CPUs.
- Second generation
-
In 1982, Intel introduced the
long-awaited follow-on to their first-generation processors. The
80286, based on the iAPX-32 core, provided a quantum leap in
processor performance, executing instructions as much as five times
faster than an 808x processor running at the same clock speed. The
80286 processed instructions as fast as many mainframe processors of
the time. The 80286 also increased addressable memory from 1 MB to 16
MB, and introduced protected mode operations.
The IBM PC/AT was the first commercial implementation of the 80286.
The optional 80287 FPU chip added floating-point acceleration to
80286 systems. Although long obsolete as a general-purpose CPU, the
80286 is still used in embedded controllers.
- Third generation
-
Intel's next generation debuted in 1985 as the
80386, later shortened to just 386. The 386 was
Intel's first 32-bit CPU, which communicated
internally and externally with a 32-bit data bus and 32-bit address
bus. The 386 was available in 16, 20, 25, and 33 MHz versions.
Although 386 clock speeds were only slightly faster than those of the
80286, improved architecture resulted in significant performance
increases. The optional 80387 FPU added floating-point acceleration
to 386 systems. Intel later renamed the 386 to the 386DX and released
a cheaper version called the 386SX, which used 32 bits internally but
only 16 bits externally. The 386SX was notable as the first Intel
processor that included an internal (L1) cache, although it was only
8 KB and relatively inefficient. The 386 is long obsolete as a
general-purpose CPU, but is still commonly used in embedded
controllers.
- Fourth generation
-
Intel's next generation
debuted in 1989 as the 486 (there never was an 80486). The 486 was a
full 32-bit CPU with 8 KB of L1 cache, included a built-in FPU, and
was available in speeds from 20 MHz to 50 MHz. Intel released 486DX
and 486SX versions. The 486SX was in fact a 486DX with the FPU
disabled. Intel sold the 487SX, which was actually a full-blown
486DX. Installing a 487SX in the coprocessor socket simply disabled
the existing 486SX. The 486DX/2, introduced in 1992, was the first
Intel processor that ran internally at a multiple of the memory bus
speed. The 486DX/2 clock ran at twice bus speed, and was available in
25/50, 33/66, and 40/80 MHz versions. The 486DX/4, introduced in
1994, ran (despite its name) at thrice bus speed, doubled L1 cache to
16 KB, and was available in 25/75, 33/100, and 40/120 versions. The
486 is obsolete, but not for the reason you might think. A fast 486
with sufficient memory is still fast enough to run Windows 9X or
Linux in undemanding applications, but these systems are so old that
essentially none of them were Y2K compliant when manufactured, and
BIOS updates to make them so are generally unavailable. The only
practical way to upgrade a 486 system is to discard it and buy or
build a new system.
- Fifth generation
-
The Intel Pentium CPU defines the fifth generation. It provides much
better performance than its 486 ancestors by incorporating several
architectural improvements, most notably an increase in data bus
width from 32 bits to 64 bits and an increase in CPU-memory bus speed
from 33 MHz to 60 and 66 MHz. Intel actually shipped several
different versions of the Pentium, including:
- Pentium P54
-
The original Pentium shipped in 1993 in 50, 60, and 66 MHz versions
using a 1X CPU multiplier, ran (hot) at 5.0 volts, contained a dual 8
KB + 8 KB L1 cache, and fit Socket 4 motherboards.
- Pentium P54C
-
The "Classic Pentium" first shipped
in 1994, was available in speeds from 75 to 200 MHz using CPU
multipliers from 1.5 to 3.0, used 3.3 volts, and contained the same
dual L1 cache as the P54. P54C CPUs fit Socket 5 motherboards and
most Socket 7 motherboards.
- Pentium P55C
-
The Pentium/MMX shipped in 1997, was available in speeds from 166 to
233 MHz using CPU multipliers from 2.5 to 3.5, used 3.3 volts, and
contained a dual 16 KB + 16 KB L1 cache, twice the size of earlier
Pentiums. The other major change from the P54C was the addition of
the MMX instruction set, a set of additional instructions that
greatly improved graphics processing speed. P55C CPUs fit Socket 7
motherboards, and were still commercially available as late as 2000.
- Sixth generation
-
This generation began with the 1995 introduction of the Pentium Pro,
and includes recent Intel processors such as the Pentium II, Celeron,
and Pentium III. Late-model sixth-generation Intel desktop processors
are now relegated to entry-level systems, and will be gradually
phased out during 2002, with only the Tualatin-core Celeron
processors remaining as representatives of this generation by the end
of 2002.
- Seventh generation
-
This is the current generation of Intel processors, and includes
Intel's flagship Pentium 4 and the P4-based
Willamette128-core Celeron.
Intel currently manufactures several sixth-generation processors,
including numerous variants and derivatives of the Celeron and
Pentium III, and two seventh-generation processors, the Pentium 4 and
the Willamette-core Celeron. The following sections describe current
and recent Intel processors.
4.2.1 Pentium, Pentium/MMX
Intel originally designated their processors by number rather than by
name—Intel 8086, 8088, 80186, 80286, and so on. Intel dropped
the "80" prefix early in the life
cycle of the 80386, relabeling it as the 386. (Intel never made an
"80486" processor despite what some
people believe.) By the time Intel shipped their fourth-generation
processors, they were tired of other makers using similar names for
their compatible processors. Intel believed that these similar names
could lead to confusion among customers, and so tried to trademark
their X86 naming scheme. When Intel learned that part numbers cannot
be trademarked, they decided to drop the
"86" naming scheme and create a
made-up word to name their fifth-generation processors. They came up
with Pentium.
Intel has produced the following three major subgenerations of
Pentium:
- P54
-
These earliest Pentium CPUs, first shipped in March 1993, fit Socket
4 motherboards, use a 3.1 million transistor core, have 16 KB L1
cache, and use 5.0 volts for both core and I/O components. P54-based
systems use a 50, 60, or 66 MHz memory bus and a fixed 1.0 CPU
multiplier to yield processor speeds of 50, 60, or 66 MHz.
- P54C
-
The so-called Classic Pentium CPUs, first
shipped in October 1994, fit Socket 5 and most Socket 7 motherboards,
use a 3.3 million transistor core, have 16 KB L1 cache, and generally
use 3.3 volts for both core and I/O components. P54C-based systems
use a 50, 60, or 66 MHz memory bus and CPU multipliers of 1.5, 2.0,
2.5, and 3.0x to yield processor speeds of 75, 90, 100, 120, 133,
150, 166, and 200 MHz.
- P55C
-
The Pentium/MMX CPUs (shown in Figure 4-1), first shipped in January 1997, fit Socket
7 motherboards, use a 4.1 million transistor core, have a 32 KB L1
cache, improved branch prediction logic, and generally use a 2.8 volt
core and 3.3 volt I/O components. P55C-based systems use a 60 or 66
MHz memory bus and CPU multipliers of 2.5, 3.0, 3.5, 4.0, 4.5, and
5.0x to yield processor speeds of 120, 133, 150, 166, 200, 233, 266,
and 300 MHz.
The Pentium was a quantum leap from the
486 in complexity and architectural efficiency. It is a CISC (Complex
Instruction Set Computer) processor, and was initially built on a
0.35 micron process (later 0.25 micron). Pentiums, like 486s, use
32-bit operations internally. Externally, however, the Pentium
doubles the 32-bit 486 data bus to 64 bits, allowing it to access
eight full bytes at a time from memory. With the Pentium, Intel also
introduced new chipsets to support this wider data bus and other
Pentium enhancements.
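To put numbers on the wider data bus, the following minimal sketch computes theoretical peak transfer rates. It assumes one transfer per bus clock and ignores wait states, so real throughput is lower. The function name is ours, chosen for illustration, and the bus speeds are the nominal 33 MHz and 66 MHz figures.

```python
def peak_bandwidth_mb_per_s(bus_width_bits, bus_mhz):
    """Theoretical peak rate in millions of bytes per second.

    Assumes exactly one transfer per bus clock (a simplification).
    """
    bytes_per_transfer = bus_width_bits // 8
    return bytes_per_transfer * bus_mhz


# 486: 32-bit data bus on a nominal 33 MHz bus
print(peak_bandwidth_mb_per_s(32, 33))  # 132

# Pentium: 64-bit data bus on a nominal 66 MHz bus
print(peak_bandwidth_mb_per_s(64, 66))  # 528
```

Doubling the bus width and doubling the bus clock each double the peak rate, which is why the two changes together give roughly a fourfold increase on paper.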
The Pentium uses a dual-pipelined superscalar
design, which, relative to the 486 and earlier CPUs, allows it to
execute more instructions per clock cycle. The Pentium executes
integer instructions using the same five stages as the
486—Prefetch,
Instruction Decode,
Address Generate,
Execute, and
Write Back—but the Pentium has two
parallel integer pipelines versus the 486's one,
which allows the Pentium to execute two integer operations
in parallel. This means that, for equal clock speeds,
the Pentium processes integer instructions about twice as fast as a
486.
The Pentium includes an improved 80-bit FPU that is much more
efficient than the 486 FPU. The Pentium also includes a
branch target buffer to provide dynamic branch
prediction, a process that greatly enhances instruction execution
efficiency. Finally, the Pentium includes a system
management module that can control power use by the
processor and peripherals.
P54 Pentiums also improved upon 486 L1 caching. The 486 has one 8 KB
L1 cache (16 KB for the 486DX/4) that uses the inefficient
write-through algorithm. P54 and P54C Pentiums
have dual 8 KB L1 caches—one for data and one for
instructions—that use the much more efficient two-way
set associative write-back algorithm.
This doubling of L1 cache buffers and the improved caching algorithm
combine to greatly enhance CPU performance. P55C Pentiums double
each L1 cache to 16 KB (32 KB total), providing
still more improvement.
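The difference between write-through and write-back caching can be sketched by counting how many writes actually reach main memory. This is a simplified model, not a description of the Pentium's cache logic: it assumes every address fits in the cache, nothing is evicted, and dirty lines are flushed once at the end. The function name and the sample addresses are hypothetical.

```python
def memory_writes(stores, policy):
    """Count writes that reach main memory for a list of (address, value) stores."""
    if policy == "write-through":
        # Every store is forwarded to main memory immediately.
        return len(stores)
    if policy == "write-back":
        # Stores only mark the cached line dirty; each dirty line is
        # written to memory once, when it is flushed.
        dirty_lines = {address for address, _ in stores}
        return len(dirty_lines)
    raise ValueError(f"unknown policy: {policy}")


# A loop counter stored 100 times at one address, plus one unrelated store.
stores = [(0x1000, i) for i in range(100)] + [(0x2000, 7)]

print(memory_writes(stores, "write-through"))  # 101
print(memory_writes(stores, "write-back"))     # 2
```

Code that updates the same locations repeatedly, as most loops do, benefits most from write-back caching, because the slow memory bus sees only the final value.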
The changes from the P54 to the P54C were relatively minor. Higher
voltages and faster CPU speeds generate more heat, so Intel reduced
the core and I/O voltages from 5.0/5.0V in the P54 to 3.3/3.3V in the
P54C, allowing them to run the CPUs faster without excessive heating.
They also introduced support for CPU multipliers, which allow the CPU
to run internally at some multiple of the memory bus speed.
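As a rough illustration of how CPU multipliers work, the sketch below multiplies memory bus speed by the multiplier to get the internal clock. One detail is our addition rather than something stated above: the nominal "66 MHz" bus actually runs at about 66.67 MHz, which is why 66 x 1.5 is sold as a 100 MHz part. The bus and multiplier pairings listed are the usual ones for the P54C and are given for illustration only.

```python
# Nominal bus label -> actual bus frequency in MHz.
# The "66 MHz" bus is really 200/3, or about 66.67 MHz.
BUS_MHZ = {50: 50.0, 60: 60.0, 66: 200 / 3}


def core_clock_mhz(nominal_bus, multiplier):
    """Internal CPU clock for a given memory bus and CPU multiplier."""
    return BUS_MHZ[nominal_bus] * multiplier


# (nominal bus, multiplier) pairs; the marketed speed truncates the result.
combos = [(50, 1.5), (60, 1.5), (66, 1.5), (60, 2.0),
          (66, 2.0), (60, 2.5), (66, 2.5), (66, 3.0)]

for bus, mult in combos:
    print(f"{bus} MHz bus x {mult} = {core_clock_mhz(bus, mult):.1f} MHz")
```

Run as written, the loop reproduces the P54C speed grades from 75 MHz through 200 MHz, with the 166.7 MHz result marketed as 166 MHz.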
The changes from the P54C Classic to the P55C MMX were much more
significant. In fact, had Intel not already introduced the Pentium
Pro (their first sixth-generation CPU) before the P55C, the P55C
might have been considered the first of a new CPU generation. In
addition to doubling L1 cache size, the P55C incorporated two major
architectural enhancements:
- MMX
-
Although sometimes described as MultiMedia
eXtensions or Matrix Math
eXtensions, Intel says officially that MMX stands for
nothing. MMX is a set of 57 added instructions that are dedicated to
manipulating audio, video, and graphics data more efficiently.
- SIMD
-
Single Instruction Multiple
Data (SIMD) is an architectural enhancement that allows one
instruction to operate simultaneously on multiple sets of similar
data.
In conjunction, MMX and SIMD greatly extend the
Pentium's ability to perform parallel operations,
processing eight bytes of data per clock cycle rather than one byte.
This is particularly important for heavily graphics-oriented
operations such as video, because it allows the P55C to retrieve and
process eight one-byte pixels in one operation rather than
manipulating those eight bytes as eight separate operations. Intel
estimates that MMX and SIMD used with non-optimized software yield
performance increases of as much as 20%, and can yield increases of
60% when used with MMX-aware applications.
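The idea of operating on eight one-byte pixels at once can be modeled in a few lines. The sketch below is a Python model of a packed byte addition, not MMX code: it packs eight bytes into one 64-bit integer and adds them lane by lane, wrapping within each lane, in the spirit of the MMX PADDB instruction. The helper names and pixel values are hypothetical.

```python
LOW7 = 0x7F7F7F7F7F7F7F7F   # low seven bits of each of the eight lanes
HIGH1 = 0x8080808080808080  # top bit of each lane


def pack(values):
    """Pack eight byte values (0-255) into one 64-bit integer."""
    return int.from_bytes(bytes(values), "little")


def unpack(word):
    """Unpack a 64-bit integer into a list of eight byte values."""
    return list(word.to_bytes(8, "little"))


def packed_add_bytes(a, b):
    """Add eight unsigned bytes at once, wrapping within each lane."""
    # Adding only the low 7 bits of each lane cannot carry into the
    # neighboring lane; the top bit of each lane is then folded in with XOR.
    low = (a & LOW7) + (b & LOW7)
    return low ^ ((a ^ b) & HIGH1)


pixels = [10, 20, 30, 40, 250, 60, 70, 80]
brighten = [10] * 8

# One "instruction" on the packed word...
simd_result = unpack(packed_add_bytes(pack(pixels), pack(brighten)))

# ...versus eight separate byte additions.
scalar_result = [(p + d) % 256 for p, d in zip(pixels, brighten)]

print(simd_result)
print(simd_result == scalar_result)
```

The fifth pixel wraps from 250 to 4, because each lane holds only eight bits. MMX also provides saturating variants that clamp instead of wrapping, which is usually what image code wants.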
For additional information about Pentium processors, including
detailed identification tables, visit http://www.hardwareguys.com/supplement/pentium.html.
4.2.2 Pentium Pro
Intel's first
sixth-generation CPU, the Pentium Pro, was introduced in November
1995—along with the new 3.3 volt 387-pin
Socket 8 motherboards required to
accept it—and was discontinued in late 1998. Intel positioned
the Pentium Pro for servers, a niche it never escaped, and where
it continued to sell in
shrinking numbers until its replacement, the Pentium II Xeon, shipped
in mid-1998. The Pentium Pro pre-dated the P55C Pentium/MMX, and
never shipped in an MMX version. The Pentium Pro never sold in large
numbers for two reasons:
- Cost
-
The Pentium Pro was a very expensive processor to build. Its core
logic comprised 5.5 million transistors (versus 4.1 million in the
P55C), but the real problem was that the Pentium Pro also included a
large L2 cache on the same substrate as the CPU. This L2 cache
required millions of additional transistors, which in turn required a
much larger die size and resulted in a much lower percentage yield of
usable processors, both factors that kept Pentium Pro prices very
high relative to other Intel CPUs.
- 32-bit optimization
-
The Pentium Pro was optimized to execute 32-bit operations
efficiently at the expense of 16-bit performance. For servers, 32-bit
optimization is ideal, but slow 16-bit operations meant that a
Pentium Pro actually ran many Windows 95 client applications slower
than a Pentium running at the same clock speed.
The Pentium Pro shipped in 133, 150, 166, 180, and 200 MHz versions
with 256 KB, 512 KB, or 1 MB of L2 cache, and was never upgraded to a
faster version. The Pentium Pro continued to sell long after the
introduction of much faster Pentium II CPUs for only one reason: the
first Pentium II chipsets supported only two-way Symmetric
Multiprocessing (SMP) while Pentium Pro chipsets supported four-way
SMP. In some server environments, four 200 MHz Pentium Pro CPUs
outperformed two 450 MHz Pentium II CPUs. The introduction of the
450NX chipset, which supports four-way SMP, and the mid-1998
introduction of the Pentium II Xeon processor, which supports
eight-way SMP, removed the raison
d'être for the Pentium Pro,
and it died a quick death.
4.2.2.1 Pentium Pro Processor Architecture
Although the Pentium Pro is discontinued, it was the first Intel
sixth-generation processor, and as such introduced many important
architectural improvements. Understanding the Pentium Pro
vis-à-vis the Pentium helps to
understand current Intel CPU models. The two CPUs differ in the
following major respects:
- Secondary (L2) cache
-
Pentium-based systems may optionally be equipped with an external L2
secondary cache of any size supported by the chipset. Typical Pentium
systems have a 256 KB
L2 cache, but high-performance motherboards may include a 512 KB, 1
MB, or larger L2 cache. But Pentium L2 caches use a narrow (32-bit),
slow (60 or 66 MHz memory bus speed) link between the
processor's L1 cache and the L2 cache. The Pentium
Pro L2 cache is internal, located on the CPU itself, and the Pentium
Pro uses a 64-bit data path running at full processor speed to link
L1 cache to L2 cache. The dedicated high-speed bus used to connect to
cache is called the back-side bus
(BSB), as opposed
to the traditional CPU-to-memory bus, which is now designated the
front-side bus (FSB). In
conjunction, the BSB and FSB are called
the dual independent
bus (DIB) architecture. DIB architecture yields
dramatically improved cache performance. In effect, 256 KB of Pentium
Pro L2 cache provides about the same performance boost as 2 MB or
more of Pentium L2 cache.
- Dynamic execution
-
The Pentium Pro uses a combination of techniques—including
branch prediction, data flow
analysis, and speculative
execution—that collectively are referred
to as dynamic
execution. Using these techniques, the Pentium
Pro productively uses clock cycles that would otherwise be wasted, as
they are with the Pentium.
- Super-pipelining
-
Super-pipelining
is a technique that allows the Pentium Pro to use
out-of-order instruction execution, another
method to avoid wasting clock cycles. The Pentium executes
instructions on a first-come, first-served basis, which means that it
waits for all required data to process an earlier instruction instead
of processing a later instruction for which it already has all of the
data. Because it uses linear instruction
sequencing, or standard pipelining,
the Pentium wastes what could otherwise be productive clock cycles
executing no-op instructions. The Pentium Pro is the first Intel CPU
to use super-pipelining. It has a 14-stage pipeline, divided into
three sections. The first section, the in-order front
end, comprises eight stages, and decodes and
issues instructions. The second section, the out-of-order
core, comprises three stages, and executes
instructions in the most efficient order possible based on available
data, regardless of the order in which it received the instructions.
The third and final section, the in-order retirement
section, receives and forwards the results of the second
section.
- CISC versus RISC core
-
The most
significant architectural difference between the Pentium and the
sixth-generation processors is how they handle instructions
internally. Pentiums use a Complex Instruction Set
Computer (CISC) core. CISC means that the processor
understands a large number of complicated instructions, each of which
accomplishes a common task in just one instruction. The Pentium Pro
was the first Intel CPU to use a Reduced Instruction Set
Computer (RISC) core. RISC means that the processor
understands only a few simple instructions. Complex operations are
performed by stringing together multiple simple instructions.
Although RISC CPUs must perform many simple instructions to
accomplish the same task that CISC CPUs do with just one or a few
complex instructions, the simple RISC instructions execute much
faster than CISC instructions.
The Pentium Pro translates standard Intel x86 CISC instructions into
RISC instructions that the Pentium Pro micro-code uses internally,
and then passes those RISC instructions to the internal out-of-order
execution core. This translation helps avoid limitations of the
standard x86 CISC instruction set and supports the out-of-order
execution that prevents pipeline stalls, but those benefits have a
price. Although the time required is measured in nanoseconds,
converting from CISC to RISC does take time, and that slows program
execution. Also, 16-bit instructions convert inefficiently and
frequently result in pipeline stalls in the out-of-order execution
unit, which commonly result in CPU wait states of as many as seven
clock cycles. The upshot is that, for pure 32-bit operations, the
benefit of RISC conversion greatly outweighs the drawbacks, but for
16-bit operations, the converse is true.
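The benefit of out-of-order execution described above can be illustrated with a toy scheduler. This is a deliberately simplified model, not the Pentium Pro's actual pipeline: it issues at most one instruction per cycle, and the instruction names and latencies are invented for the example. In-order mode may issue only the oldest pending instruction; out-of-order mode issues the oldest instruction whose inputs are ready.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Instr:
    name: str
    deps: tuple = ()   # names of instructions whose results this one needs
    latency: int = 1   # cycles from issue until the result is available


def cycles_to_finish(program, out_of_order):
    """Return the cycle at which the last result becomes available."""
    done_at = {}             # instruction name -> cycle its result is ready
    pending = list(program)  # program order, oldest first
    cycle = 0
    while pending:
        ready = [ins for ins in pending
                 if all(done_at.get(dep, float("inf")) <= cycle
                        for dep in ins.deps)]
        if out_of_order:
            chosen = ready[0] if ready else None
        else:
            # In-order: a stalled oldest instruction blocks everything behind it.
            chosen = pending[0] if pending[0] in ready else None
        if chosen is not None:
            done_at[chosen.name] = cycle + chosen.latency
            pending.remove(chosen)
        cycle += 1
    return max(done_at.values())


program = [
    Instr("load_a", latency=4),        # slow memory load
    Instr("add_a", deps=("load_a",)),  # must wait for the load
    Instr("mul_b"),                    # independent work
    Instr("sub_c"),                    # independent work
]

print(cycles_to_finish(program, out_of_order=False))  # 7
print(cycles_to_finish(program, out_of_order=True))   # 5
```

In the in-order run, the two independent instructions sit idle behind the stalled add. In the out-of-order run, they execute while the load is still in flight, so the same work finishes two cycles sooner.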
For additional information about Pentium Pro processors, including
detailed identification tables, visit http://www.hardwareguys.com/supplement/pentium-pro.html.
4.2.3 Pentium II Family
Intel's first mainstream sixth-generation CPU, the
Pentium II, shipped in May 1997. Intel subsequently shipped many
variants of the Pentium II, which differ chiefly in packaging, the
type and amount of L2 cache they include, the processor core they
use, and the FSB speeds they support. All members of the Pentium II
family use the Dynamic Execution Technology and DIB architecture
introduced with the Pentium Pro. Intel reduced the core voltage from
the 3.3 volts used by Pentium Pro to 2.8 volts or less in Pentium II
processors, which allows them to run much faster while using less
power and producing less heat. In effect, you're not
far wrong if you think of Pentium II, Celeron, and Pentium III
processors as faster versions of the Pentium Pro with MMX (or the
enhanced SSE version of MMX) added, and with the following major
changes:
- L2 cache
-
The Pentium Pro taught Intel the folly of
embedding the L2 cache onto the CPU substrate itself, at least for
the then-current state of the technology. Early Pentium II family
processors use discrete L2 cache Static RAM
(SRAM) chips that reside within the CPU package but are not a part of
the CPU substrate. Advances in fabrication technology have allowed
Intel again to place L2 cache directly on the processor substrate on
later Pentium II family processor models. Some Pentium II family
processors run L2 cache at full processor speed, while others run it
at half processor speed. The least expensive Pentium II family
processors have no L2 cache at all. The L2 cache in later members of
the Pentium II family is improved not just in size and/or speed, but
in functionality. The most recent Pentium III processors, for
example, use an 8-way set associative cache,
which is more efficient than the caching schemes used on earlier
variants.
- Packaging
-
The Pentium Pro used the huge, complicated 387-pin Dual
Pattern Staggered Pin Grid Array (DP-SPGA)
Socket 8. The extra pins provide
data and power lines for the on-board L2 cache. Intel developed
simplified alternative packaging methods for various members of the
Pentium II family processors, which are described below.
- Improved 16-bit performance
-
High cost aside, the major reason the Pentium Pro was never widely
used other than in servers was its poor performance with 16-bit
software. Although represented as a 32-bit operating system, Windows
95/98 still contains much 16-bit code. Users quickly discovered that
Windows 95 actually ran slower on a Pentium Pro than on a Pentium of
the same speed. Intel solved the 16-bit problem by using the Pentium
segment descriptor cache in the Pentium II.
Members of the Pentium II family include the Pentium II, Pentium II
Overdrive, Pentium II Xeon, Celeron, Pentium III, and Pentium III
Xeon. Each of these processors is described in the following
sections.
4.2.3.1 Pentium II
First-generation Pentium II processors shipped in 233, 266, 300, and
333 MHz versions with the Klamath core and a 66
MHz FSB. In mid-1998, Intel shipped second-generation Pentium II
processors, based on the Deschutes core, that
ran at 350, 400, and 450 MHz, and used a 100 MHz FSB. Pentium II
processors have 512 KB of L2 cache that runs at half internal CPU
speed, versus 256 KB to 1 MB of full CPU speed L2 cache in the
Pentium Pro. Pentium II processors use a Single-Edge
Contact Cartridge (SECC) or SECC2
cartridge, which contains the CPU and L2 cache. (The SECC cartridge
is shown in Figure 4-2.) The SECC/SECC2 package
mates with a 242-contact slot connector,
formerly known as Slot 1, which resembles
a standard expansion slot. Klamath-based processors run at 2.8 volts
and were built on a 0.35 micron fab. Deschutes-based processors,
including all 100 MHz FSB processors and recent 66 MHz FSB
processors, run at 2.0 volts and are built on a 0.25 micron fab.
Excepting FSB speed and fab process, all Slot 1 Pentium II processors
are functionally identical. As of June 2002, Pentium II processors
are still in limited distribution, but they are now considered
obsolescent.
For additional information about Pentium II processors, including
detailed identification tables, visit http://www.hardwareguys.com/supplement/pentium-ii.html.
For information about the Pentium II Overdrive processor, see
http://www.hardwareguys.com/supplement/pii-overdrive.html.
For information about the Pentium II Xeon processor, see http://www.hardwareguys.com/supplement/pii-xeon.html.
4.2.3.2 Celeron
The Celeron was initially an inexpensive
variant of the Pentium II, and, in later models, an inexpensive
variant of the Pentium III or Pentium 4. Klamath-based
(Covington-core) Celerons shipped in April 1998 in 266 and 300 MHz
versions without L2 cache. Performance was poor, so in fall 1998
Intel began shipping modified Deschutes-based (Mendocino-core)
Celerons with 128 KB L2 cache. The smaller Celeron L2 cache runs at
full CPU speed, and provides L2 cache performance similar to that of
the larger but slower Pentium II L2 cache for most applications.
Mendocino (0.25 micron) Celerons have been manufactured in 300A
(to differentiate it from the cacheless 300), 333, 366, 400, 433,
466, 500, and 533 MHz versions, all of which use the 66 MHz FSB.
With the introduction of the Coppermine-core Pentium III processor,
Intel also introduced Celeron processors based on a variant of the
Coppermine
core called the Coppermine128 core. Celerons
based on this 0.18 micron, 1.6v core began shipping in 533A,
566, and 600 MHz versions soon after their announcement in May 2000
and were eventually produced in speeds as high as 1.1 GHz, which
approaches the limit of the Coppermine core itself.
Coppermine128-core Celerons have half of the 256 KB on-die L2 cache
disabled to bring L2 cache size to the Celeron-standard 128 KB, and
use a 4-way set associative L2 cache rather than the 8-way version used
by the Coppermine Pentium III. Coppermine128-core Celerons through
the Celeron/766, shipped in November 2000, use the 66 MHz FSB speed.
Coppermine128-core Celerons that use the 100 MHz FSB speed began
shipping in March 2001, beginning with 800 MHz units and eventually
reaching 1.1 GHz. Other than the differences in L2 cache size and
type, processor bus speed differences, and
official support for SMP, Coppermine128-core Celerons support the
standard Coppermine-core Pentium III features, including SSE,
described below.
Because Coppermine128 Celerons effectively are
Pentium IIIs, some may be easy to overclock. For
example, a Celeron/600 (66 MHz FSB) is effectively a down-rated
Pentium III/900 (100 MHz FSB). During the ramp-up to
Coppermine128-core Celerons, we believe that Intel recycled Pentium
III processors that tested as unreliable at 100 MHz or 133 MHz as 66
MHz Celerons, although Intel has never confirmed this. Many early
Coppermine128-core Celerons were not good overclockers, although that
changed as production ramped up. Note, however, that overclocking
Coppermine128-core Celerons is viable only for the slower 66 MHz FSB
models—the Celeron/566 and /600. Attempting to overclock a
faster Celeron by running it with a 100 MHz FSB would cause it to run
near or over 1.1 GHz, which appears to be the effective limit of the
Coppermine core itself.
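The arithmetic behind this note can be sketched as follows. The sketch assumes that the CPU multiplier is fixed, so raising the FSB from 66 MHz to 100 MHz scales the core clock in proportion. It also assumes, as our addition, that the nominal "66 MHz" FSB is about 66.67 MHz. The 1,100 MHz ceiling is the approximate Coppermine limit cited above, and the function name is hypothetical.

```python
NOMINAL_66_MHZ = 200 / 3  # the "66 MHz" FSB is about 66.67 MHz
CORE_LIMIT_MHZ = 1100     # approximate Coppermine ceiling cited in the text


def overclocked_mhz(rated_mhz, rated_fsb=NOMINAL_66_MHZ, new_fsb=100.0):
    """Core clock after raising the FSB, assuming a fixed multiplier."""
    multiplier = rated_mhz / rated_fsb
    return multiplier * new_fsb


for rated in (566, 600, 766):
    target = overclocked_mhz(rated)
    verdict = "within" if target <= CORE_LIMIT_MHZ else "beyond"
    print(f"Celeron/{rated}: about {target:.0f} MHz at 100 MHz FSB "
          f"({verdict} the core limit)")
```

The Celeron/566 and /600 land at roughly 850 and 900 MHz, comfortably inside the limit, while the Celeron/766 would need about 1,150 MHz.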
In November 2001, Intel began shipping
Celerons based on the latest Pentium III core, codenamed Tualatin.
The first Tualatin-core Celerons ran at 1.2 GHz using the 100 MHz
FSB. Intel subsequently shipped a 1.3 GHz Celeron, and finally in May
2002 shipped the 1.4 GHz Celeron, the final Tualatin-core model.
These Celerons also differed from earlier models in that they include
a full 256 KB L2 cache, the same as Coppermine-core Pentium III
models. Celerons have been produced in four
form factors:
- Single-Edge Processor Package cartridge
-
All Celerons through 433 MHz were produced in Single-Edge
Processor Package (SEPP) cartridge form,
which resembles the Pentium II SECC and SECC2 package, and is
compatible with the Pentium II 242-contact slot. In mid-1999 Intel
largely abandoned SEPP in favor of PPGA, but they continue to sell
SEPP Celerons in 400 and 433 MHz varieties. Figure 4-3 shows an SEPP Celeron.
- Plastic Pin Grid Array
-
As a cheaper alternative to SEPP, Intel developed the
Plastic Pin Grid Array
(PPGA). PPGA processors fit Socket
370, which resembles Socket 7 but accepts only PPGA Celeron
processors. All Mendocino-core Celerons are manufactured in PPGA. The
Celeron/466 was the first Celeron produced only in PPGA. PPGA
processors can be used in most Socket 370 motherboards, although a
few accept only Socket 370 Pentium III processors. Figure 4-4 shows a PPGA Celeron.
- Flip Chip Pin Grid Array
-
With the introduction of the Socket 370 version of the Pentium III,
Intel introduced a modified version of PPGA called Flip
Chip PGA (FC-PGA), which uses slightly different pinouts
than PPGA. FC-PGA essentially reverses the position of the processor
core from PPGA, placing the core on top (where it can make better
contact with the heatsink) rather than on the bottom side with the
pins. All Socket 370 Pentium III and Coppermine128-core Celerons (the
533A, 566, 600, and faster) require an FC-PGA compliant motherboard.
FC-PGA processors physically fit older PPGA motherboards, but if you
install an FC-PGA processor in a PPGA-only Socket 370 motherboard,
the processor doesn't work, although no harm is
done. Figure 4-5 shows an FC-PGA Celeron.
- Flip Chip Pin Grid Array 2
-
Tualatin-core Celerons use the FC-PGA2 packaging, which is
essentially FC-PGA with the addition of a flat metal plate, called an
integrated heat spreader, that covers the
processor chip itself. Although these processors physically fit any
Socket 370 motherboard, only very recent Socket 370 chipsets support
the Tualatin core. Intel designates their own motherboard models that
support Tualatin as "Universal"
models. Other manufacturers use other terminology, but the important
thing to remember is that the motherboard must explicitly support
Tualatin if it is to run these processors. Figure 4-6 shows an FC-PGA2 Celeron.
Intel has produced five major variants of the PIII-based Celeron,
using four packages, four cores, two bus speeds, four fab sizes, and
more than 20 clock speeds. Table 4-1 summarizes
the major differences between these variants.
Table 4-1. Comparison of Celeron variants
| Core | Covington | Mendocino | Coppermine128 | Coppermine128 | Tualatin |
| Package | SECC | SECC2, PPGA | FC-PGA | FC-PGA | FC-PGA2 |
| Production dates | 1998 | 1998 - 2000 | 2000 - | 2001 - | 2001 - |
| Clock speeds (MHz) | 266, 300 | 300A, 333, 366, 400, 433, 466, 500, 533 | 500A, 533A, 566, 600, 633, 667, 700, 733, 766 | 800, 850, 900, 950, 1000, 1100 | 1200 |
| L2 cache size | none | 128 KB | 128 KB | 128 KB | 256 KB |
| L2 cache bus width | n/a | 64 bits | 256 bits | 256 bits | 256 bits |
| System bus speed | 66 MHz | 66 MHz | 66 MHz | 100 MHz | 100 MHz |
| SSE instructions | no | no | yes | yes | yes |
| Dual CPU capable | yes | yes | no | no | no |
| Fabrication process | 0.35 micron | 0.25 micron | 0.18 micron | 0.18 micron | 0.13 micron |
Dual-CPU capability deserves an explanation. Although Intel never
officially supported Celerons for SMP operation, the two earliest
Celeron variants did in fact support dual-CPU operation. For
Covington-core and SECC2 Mendocino-core Celerons, dual-CPU operation
was impractical, because enabling SMP required physical surgery on
the processor package—literally drilling holes in the package
and soldering wires. With PPGA Mendocino-core Celerons, dual-CPU
operation was eminently practical, because many dual Socket 370
motherboards were designed specifically to accept two Celerons, and
no changes to the processors themselves were necessary. Beginning
with the 66 MHz Coppermine128 Celerons, Intel physically disabled SMP
operation in the core itself, so it is impossible to operate
Coppermine- or Tualatin-core Celerons in SMP mode.
For additional information about Celeron processors, including
detailed identification tables, visit http://www.hardwareguys.com/supplement/celeron.html.
4.2.3.3 Pentium III
The Pentium III, Intel's final sixth-generation
processor, began shipping in February 1999. The Pentium III has been
manufactured in numerous variants, including speeds from 450 MHz to
1.33 GHz (Intel defines 1 GHz as 1,000 MHz), two bus speeds (100 MHz
and 133 MHz), four packages (SECC, SECC2, FC-PGA, and FC-PGA2), and
the following three cores:
- Pentium III (Katmai core)
-
Initial Pentium III variants use the Katmai core,
essentially an enhanced Deschutes with the addition of 70 new
Streaming SIMD Extensions (SSE) instructions (formerly called Katmai
New Instructions or KNI, and known colloquially as MMX/2) that improve
3D graphics rendering and speech processing. They use the
0.25µ process, operate at 2.0V core voltage (with some
versions requiring marginally higher voltage), use a 100 MHz FSB,
incorporate 512 KB L2 cache running at half CPU speed, and have
glueless support for two-way SMP. Katmai-core processors are
available in SECC2 (Slot 1/SC242) packaging.
- Pentium III (Coppermine core)
-
Later Pentium III variants use the Coppermine
core, which is essentially a refined version of the Katmai
core. Coppermine processors use the 0.18µ process, which
reduces die size, heat production, and cost. They operate at nominal
1.6V core voltage (with faster versions requiring marginally higher
voltage) and (in most variants) support SMP. Coppermine-core
processors are available in SECC2 (Slot 1/SC242) and FC-PGA (Socket
370) packaging, each in both 100 MHz and 133 MHz FSB variants.
Finally, Coppermine also incorporates the following significant
improvements in L2 cache implementation and buffering:
- Advanced Transfer Cache
-
Advanced Transfer
Cache (ATC) is how Intel summarizes
the several important improvements in L2 cache implementation from
Katmai to Coppermine. Although L2 cache size is reduced from 512 KB
to 256 KB, it is now on-die (rather than discrete SRAM chips) and,
like the Celeron, operates at full CPU speed rather than half.
Bandwidth is also quadrupled, from the 64-bit bus used on Katmai and
Mendocino-core Celeron processors to a 256-bit bus. Finally,
Coppermine uses an 8-way set associative cache, rather than the 4-way
set associative cache used by earlier Pentium III and Celeron
processors. Migrating L2 cache on-die increased transistor count from
just under 10 million for the Katmai to nearly 30 million for
Coppermine, which may account for the reported early yield problems
with the Coppermine.
- Advanced System Buffering
-
Advanced System Buffering (ASB) is how Intel describes
the larger number of buffers in the Coppermine relative to the
Katmai-core Pentium III and earlier processors: six fill buffers
rather than four, eight queue entry buffers rather than four, and four
writeback buffers rather than one. The additional buffers were
primarily intended to prevent bottlenecks with 133 MHz FSB
Coppermines, but they also benefit those running at 100 MHz.
- Pentium III (Tualatin core)
-
The most recent Pentium III variants use the Tualatin
core, which is the last Pentium III core Intel will ever
produce. Tualatin processors use the 0.13µ process, which
reduces die size, heat production, and cost, and allows considerably
higher clock speeds than the Coppermine core. Had it not been for
Intel's rapid transition to the Pentium 4,
Tualatin-core Pentium IIIs could have been Intel's
flagship processor through 2002 and into 2003. Intel could have
shipped Tualatins at ever-increasing clock speeds, beating the
0.18µ Palomino-core AMD Athlon on both clock speed and
actual performance. Instead, Intel opted to compete using the Pentium
4. Through their pricing, Intel has effectively exiled
Tualatin-core Pentium IIIs to niche status by selling fast Pentium 4
processors for much less than comparable Tualatin Pentium IIIs.
Tualatins use the 133 MHz FSB, and are available in two major
variants, both of which use the FC-PGA2 packaging (with Integrated
Heat Spreader). The first variant, intended for desktop systems, has
the standard 256 KB L2 cache. The second variant, intended for
entry-level servers and workstations, has 512 KB L2 cache. Both
variants are SMP-capable. Finally, Intel removed the much-hated
Processor Serial Number from all Tualatin-core processors.
Table 4-2 summarizes the important differences
between Pentium III variants available as of June 2002. When
necessary to differentiate processors of the same speed, Intel uses
the E suffix to indicate support for ATC and
ASB, the B suffix to indicate 133 MHz FSB, and
the EB suffix to indicate both. An
A suffix designates 0.13µ
Tualatin-core processors. All processors faster than 600 MHz include
both ATC and ASB. Note that A-step FC-PGA processors do not support
SMP. B-step and higher FC-PGA and FC-PGA2 processors support SMP,
except the 1B GHz processor, which is not SMP-capable in any
stepping.
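The suffix rules in the preceding paragraph can be captured in a small lookup table. The following Python sketch is illustrative only. The function name decode_model and the convention of writing a model as a speed in MHz followed by its suffix (for example, 600EB) are assumptions made for this example, and the sketch encodes only the rules stated above, not the stepping-dependent SMP exceptions.

```python
# Hypothetical helper: maps the Pentium III suffixes described in the text
# to the features they signal. Feature labels are our own wording.
SUFFIX_FEATURES = {
    "E": {"ATC", "ASB"},
    "B": {"133 MHz FSB"},
    "EB": {"ATC", "ASB", "133 MHz FSB"},
    "A": {"0.13-micron Tualatin core"},
}


def decode_model(model):
    """Split a string such as '600EB' into (speed in MHz, set of features).

    Assumes the speed is a whole number of MHz followed by an optional
    suffix made up of the letters A, B, and E.
    """
    speed_text = model.rstrip("ABE")      # strip the trailing suffix letters
    suffix = model[len(speed_text):]      # whatever was stripped
    speed_mhz = int(speed_text)           # raises ValueError if not numeric
    if suffix and suffix not in SUFFIX_FEATURES:
        raise ValueError("unknown suffix: " + suffix)
    features = set(SUFFIX_FEATURES.get(suffix, ()))
    # Rule from the text: every part faster than 600 MHz has ATC and ASB,
    # whether or not a suffix is printed.
    if speed_mhz > 600:
        features |= {"ATC", "ASB"}
    return speed_mhz, features
```

For example, decode_model("533B") reports only the 133 MHz FSB, while decode_model("733") reports ATC and ASB even though no suffix is present. An absent feature in the result means only that the suffix does not assert it.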
Table 4-2. Intel Pentium III variants as of June 2002
Package | FC-PGA2 | FC-PGA2 | SECC2 | SECC2 | FC-PGA | FC-PGA | SECC2 | SECC2
Process size | 0.13µ | 0.13µ | 0.18µ | 0.18µ | 0.18µ | 0.18µ | 0.25µ | 0.25µ
FSB speed | 133 MHz | 133 MHz | 133 MHz | 100 MHz | 100 MHz | 133 MHz | 133 MHz | 100 MHz
L2 cache size | 512 KB | 256 KB | 256 KB | 256 KB | 256 KB | 256 KB | 512 KB | 512 KB
L2 cache speed | CPU | CPU | CPU | CPU | CPU | CPU | 1/2 CPU | 1/2 CPU
SMP support | | | | | | | |
Processor S/N | | | | | | | |
When Intel introduced the Pentium III in FC-PGA form, they changed
Socket 370 pinouts. Those changes mean that, although an FC-PGA
processor physically fits any Socket 370 motherboard, it will not run
in motherboards designed for the Celeron/PPGA. Motherboards designed
for FC-PGA processors are nearly all backward compatible with PPGA
Celeron processors. Similarly, as with Tualatin-core Celerons,
Tualatin-core Pentium IIIs operate only in late-model Socket 370
motherboards that use chipsets with explicit Tualatin support. Most
motherboards designed to use PPGA Celerons or FC-PGA Coppermine-core
Pentium IIIs are not compatible with Tualatin-core Pentium IIIs.
Figure 4-7 shows a
Pentium III processor in the Single Edge Contact
Cartridge 2 (SECC2) package. Some early Pentium III models
were produced in the original SECC package, which closely resembles
the Pentium II SECC package shown in Figure 4-2.
Figure 4-8 shows a Pentium III processor in the
Flip Chip Plastic Grid Array (FC-PGA) package.
Other than labeling, the Pentium III processor in the FC-PGA2 package
closely resembles the FC-PGA2 Celeron processor shown in Figure 4-6.
For additional
information about Pentium III processors, including detailed
identification tables, visit http://www.hardwareguys.com/supplement/pentium-iii.html.
For information about Pentium III Xeon processors, visit http://www.hardwareguys.com/supplement/piii-xeon.html.
4.2.4 Pentium 4
By late 2000, Intel found themselves in a conundrum. In March of that
year, AMD had forced Intel's hand by releasing an
Athlon running at 1 GHz. Intel planned to release a 1.0 GHz version
of their flagship processor, the Coppermine-core Pentium III, but not
until much later. The Athlon/1.0G introduction was a wakeup call for
Intel. They had to ship a Pentium III/1.0G immediately if they were
to remain competitive on clock speed with the Athlon. One week after
the Athlon/1.0G shipped, Intel shipped a Pentium III running at the
magic 1.0 GHz.
The problem was that the Pentium III Coppermine core effectively
topped out at about 1.0 GHz, while the Athlon Thunderbird core had
plenty of headroom. For the next several months, AMD shipped faster
and faster Athlons, while Intel remained stuck at 1.0 GHz. And to
make matters worse, AMD could ship fast Athlons in volume, while
Intel had very low yields on the fast Pentium III parts. Although 1.0
GHz Pentium IIIs were theoretically available, in reality even the
933 MHz parts were hard to come by. So Intel had to make the best of
things, shipping mostly sub-900 MHz Pentium IIIs while AMD claimed
the high end. Intel must have been gritting their collective
teeth.
Adding insult to injury, Intel attempted unsuccessfully to ship a
faster Pentium III, the ill-fated Pentium III/1.13G. These processors
were available in such small volumes that many observers believed
they must have been almost hand-made. Adding to
Intel's embarrassment, popular enthusiast web sites
including Tom's Hardware (http://www.tomshardware.com) and AnandTech
(http://www.anandtech.com)
reported that the 1.13 GHz parts did not function reliably. Intel was
forced to admit this was true and withdraw the 1.13 GHz part,
although they later reintroduced it successfully.
Intel had two possible responses to the growing clock speed gap. They
could expedite the release of 0.13µ Tualatin-core Pentium
IIIs, which have clock speed headroom at least equivalent to the
Thunderbird-core and later Palomino-core Athlons, or they could
introduce their seventh-generation Pentium 4 processor sooner than
planned. Intel wasn't anywhere near ready to convert
their fabs to 0.13µ Tualatin-core Pentium III production,
so their only real choice was to get the Pentium 4 to market quickly.
There were several problems with that course, not least of which were
that the 0.18µ Willamette-core Pentium 4 was not really
ready for release and that the only Pentium 4 chipsets Intel had
available supported only Rambus RDRAM, which was hideously expensive
at the time. But in November 2000, Intel was finally able, if only
just, to ship the Pentium 4 processor running at 1.3, 1.4, and 1.5
GHz. Many observers noted that this version of the Pentium 4 was a
dead-end processor because it used Socket 423, which was due to be
replaced by Socket 478 only months after the initial release. They
also noted that, despite its higher clock speed, the Pentium 4 had
lower performance than Athlons running at lower clock speeds. Even so,
the Pentium 4 did at least allow Intel to regain the clock speed
crown, an inestimable marketing advantage.
Despite all that, the seventh-generation Pentium 4 (shown in Figure 4-9) is the most significant new Intel processor
since the original Pentium Pro, which kicked off the sixth
generation. The Pentium 4 is significant not so much for what it is
now as for what it will become. Just as Intel scaled the clock speeds
of sixth-generation cores from the 120 MHz of the first Pentium Pro
to the 1.2+ GHz of the last Pentium III, we expect that they will
scale the clock speed of the Pentium 4 by an order of magnitude or
more, eventually reaching 10 GHz to 15 GHz before (we presume)
introducing the Pentium 5.
With the Pentium 4, Intel has launched the fastest ramp-up in their
history. In earlier generations, new processors coexisted with older
processors for quite some time. Intel continued to derive substantial
revenues from the 386 long after the 486 shipped, from the 486 long
after the Pentium shipped, and from the Pentium long after the
Pentium II shipped. With the Pentium 4, they've
abandoned that sequence. Intel wants to kill their sixth-generation
processors as quickly as possible, leaving the Pentium 4 and its
derivatives as the only mainstream Intel processors.
4.2.4.1 Pentium 4 Processor Features
Relative to sixth-generation processors, the Pentium 4 incorporates
the following architectural improvements, which together define the
seventh generation and which Intel collectively calls NetBurst
Micro-architecture.
- Hyper Pipelined Technology
-
Hyper-pipelining
doubles the pipeline depth compared to the Pentium III
micro-architecture. The branch prediction/recovery pipeline, for
example, is implemented in 20 stages in the Pentium 4, as compared to
10 stages in the Pentium III. Deep pipelines are a double-edged
sword. Using a very deep pipeline makes it possible to achieve very
high clock speeds, but a deep pipeline also means that fewer
instructions can be completed per clock cycle. That means the Pentium
4 can run at much higher clock speeds than the Pentium III (or
Athlon), but that it needs those higher clock speeds to do the same
amount of work.
Early Pentium 4 processors were roundly condemned by many observers
because they were outperformed by Pentium III and Athlon processors
running at much lower clock speeds, which is solely attributable to
the relative inefficiency of the Pentium 4 in terms of instructions
per cycle (IPC). Ultimately, the low IPC efficiency of the Pentium 4
won't matter, because Intel can easily boost the
clock speed until the Pentium 4 greatly outperforms the fastest
Pentium III or Athlon that can be produced. What superficially
appears to be a weakness of the Pentium 4 is in fact its greatest
strength.
- Improved Branch Prediction
-
The deep pipeline of the Pentium 4 made it mandatory to use a
superior Branch Prediction Unit (BPU), because a deep pipeline with
anything less than excellent branch prediction would bring the
processor to its knees. When the pipeline is very deep, a pipeline
clog wastes massive numbers of clock ticks, and the function of a BPU
is to prevent that from happening. The Pentium 4 BPU is the most
advanced available, 33% more efficient at avoiding mispredictions
than the Pentium III BPU or the comparable Athlon BPU. The Pentium 4
BPU uses both a more effective branch prediction algorithm and a
dedicated 4 KB branch target buffer that stores details about
branching history to achieve these results. The improved BPU is one
component of the Advanced Dynamic Execution (ADE) engine,
Intel's name for their very deep, out-of-order
speculative execution engine.
- Level 1 Execution Trace Cache
-
In addition to the standard Level 1 8 KB data cache, the Pentium 4
includes a 12 KB L1 Execution Trace Cache. This cache stores decoded
micro-op instructions in the order they will be executed, optimizing
storage efficiency and performance by removing the micro-op decoder
from the main execution loop and storing only those micro-op
instructions that will be needed. By caching micro-op instructions
before they are needed, the Execution Trace Cache ensures that the
processor execution units seldom have to wait for instructions, and
that the effects of branch mispredictions are minimized.
- Rapid Execution Engine
-
Even with an excellent BPU, integer code is more likely than
floating-point code to be mispredicted, and such mispredictions have
a catastrophic effect on throughput. To minimize their effect, the
Pentium 4 includes two Arithmetic Logic Units
(ALUs) that operate at twice the processor core frequency. For
example, the Rapid Execution Engine on a 2 GHz Pentium 4 actually
runs at 4 GHz. That allows a basic integer operation (e.g., Add,
Subtract, AND, OR) to execute in half a clock cycle.
- 400 or 533 MHz System Bus
-
One Achilles' heel of the Pentium III (and, to a
lesser extent, the Athlon) is the relatively slow link between the
processor and memory. For example, using PC133 SDR-SDRAM, the Pentium
III achieves peak data transfer rates of only 1,067 MB/s (133 MHz
times 8 bytes/transfer). In practice, sustained data transfer rates
are lower still because SDRAM is not 100% efficient and the SDRAM
interface uses only minimal buffering. Conversely, the Pentium 4 has
the fastest system bus available on any desktop processor. Although
the bus actually operates at only 100 or 133 MHz, data transfers are
quad-pumped for an effective bus speed of 400 or 533 MHz. Also, Intel
uses elaborate buffering that ensures sustained true 400 or 533 MHz
data transfers when using Rambus RDRAM memory. Sustained data
transfer rates using SDR-SDRAM or DDR-SDRAM are lower than peak
transfer rates, but are still superior to the data transfer rates of
the Pentium III or Athlon using similar memory.
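The clock and bandwidth figures quoted in the list above follow from simple multiplication. The short Python sketch below reproduces them. The function names are our own, and it assumes (as the 1,067 MB/s figure implies) that a nominal 133 MHz bus actually runs at 133.33 MHz, that the Pentium 4 system bus is 8 bytes wide like that of the Pentium III, and that 1 MB/s means one million bytes per second.

```python
def peak_bandwidth_mb_s(bus_clock_mhz, transfers_per_clock=1, bus_width_bytes=8):
    """Peak (not sustained) transfer rate in MB/s, with 1 MB = 10**6 bytes."""
    return bus_clock_mhz * transfers_per_clock * bus_width_bytes


def alu_clock_ghz(core_clock_ghz, multiplier=2):
    """Clock of the double-speed ALUs in the Rapid Execution Engine."""
    return core_clock_ghz * multiplier


# Assumption: a nominal "133 MHz" bus runs at 400/3 = 133.33 MHz.
NOMINAL_133_MHZ = 400 / 3

# Pentium III with PC133 SDR-SDRAM: one transfer per bus clock.
pentium_iii = peak_bandwidth_mb_s(NOMINAL_133_MHZ)        # about 1,067 MB/s

# Pentium 4: the same kind of bus clock, but quad-pumped.
pentium_4_400 = peak_bandwidth_mb_s(100, 4)               # 3,200 MB/s
pentium_4_533 = peak_bandwidth_mb_s(NOMINAL_133_MHZ, 4)   # about 4,267 MB/s

# Rapid Execution Engine on a 2 GHz Pentium 4.
alu_2ghz = alu_clock_ghz(2.0)                             # 4.0 GHz
```

These are peak figures only. As the text notes, sustained rates are lower and depend on memory type and buffering.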
In addition to its new features, the Pentium 4 also has two features
that have been significantly enhanced relative to the Pentium III:
- Advanced Transfer Cache
-
Intel has enhanced the performance of the
L2 Advanced Transfer Cache (ATC) that first appeared in the Pentium
III. The Pentium 4 uses a non-blocking, 8-way set associative,
inclusive, full-CPU-speed, on-die, L2 cache with a 256-bit interface
that transfers data during each clock cycle. Because the Pentium 4
clock is faster than that of the Pentium III, L2 cache transfers also
support a much higher data rate. For example, a Pentium III operating
at 1 GHz transfers L2 cache data at 16 GB/s whereas a Pentium 4 at
1.5 GHz transfers L2 cache data at 48 GB/s (three times the transfer
rate for a processor operating at 1.5 times the speed). The ATC also
includes improved Data Prefetch Logic that anticipates what data will
be needed by a program and loads it into cache before it is needed.
Willamette-core Pentium 4 processors have a 256 KB L2 cache.
Northwood-core Pentium 4 processors have a 512 KB L2 cache.
- Enhanced floating-point and SSE functionality
-
The Pentium 4 uses 128-bit
floating-point registers and adds a dedicated register for data
movement. These enhancements improve performance relative to the
Pentium III on floating-point and multimedia applications. The
Pentium 4 also includes SSE2, an updated version of the SSE that
debuted with the Pentium III. SSE, which stands for
Streaming SIMD Extensions, is an acronym within
an acronym. SIMD, or Single Instruction Multiple
Data, allows one instruction to be applied to multiple
data items at once, e.g., the elements of an array, which greatly speeds performance in such
applications as video/image processing, encryption, speech
recognition, and heavy-duty scientific number crunching. SSE2 adds
144 new instructions to the SSE instruction set, including 128-bit
SIMD integer arithmetic operations and 128-bit SIMD double-precision
floating-point operations. These new instructions can greatly reduce
the number of steps needed to execute some tasks, but the catch is
that the application software must explicitly support SSE2. For
example, an application that is not designed to use SSE2 might run at
the same speed on a Pentium 4 and an Athlon, while an SSE2-capable
version of that application might run literally twice as fast on the
Pentium 4.
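The numbers in the two items above can be checked with a little arithmetic. The Python sketch below is a simplified model, not a measurement, and its function names are our own. It assumes that the Pentium III L2 cache completes a 256-bit transfer only every other clock cycle; that assumption is inferred from the 16 GB/s figure quoted above, since the text states only that the Pentium 4 transfers on every clock. The SIMD part merely counts how many instructions an idealized 128-bit unit would issue; it does not execute real SSE2 code.

```python
def l2_bandwidth_gb_s(core_clock_ghz, bus_width_bits=256, clocks_per_transfer=1):
    """Peak L2 cache transfer rate in GB/s, with 1 GB = 10**9 bytes."""
    return core_clock_ghz * (bus_width_bits // 8) / clocks_per_transfer


# Assumption: Pentium III moves 32 bytes every second clock, Pentium 4 every clock.
pentium_iii_l2 = l2_bandwidth_gb_s(1.0, clocks_per_transfer=2)   # 16.0 GB/s
pentium_4_l2 = l2_bandwidth_gb_s(1.5)                            # 48.0 GB/s


def simd_lanes(register_bits, element_bits):
    """How many elements of a given size fit in one SIMD register."""
    return register_bits // element_bits


def simd_add(a, b, lanes):
    """Add two equal-length lists 'lanes' elements per modeled instruction.

    Returns (sums, instructions issued). lanes=1 models scalar code.
    """
    sums = []
    instructions = 0
    for start in range(0, len(a), lanes):
        chunk_a = a[start:start + lanes]
        chunk_b = b[start:start + lanes]
        sums.extend(x + y for x, y in zip(chunk_a, chunk_b))
        instructions += 1
    return sums, instructions


doubles_per_register = simd_lanes(128, 64)    # 2 double-precision values
x = [1.0, 2.0, 3.0, 4.0]
y = [10.0, 20.0, 30.0, 40.0]
scalar_result, scalar_count = simd_add(x, y, lanes=1)
simd_result, simd_count = simd_add(x, y, lanes=doubles_per_register)
```

Both calls produce the same sums, but the two-lane version issues half as many modeled instructions, which is the idealized basis for a doubling on double-precision work. Real speedups depend on how much of an application can actually be expressed as SIMD operations.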
4.2.4.2 Pentium 4 Processor Variants
Intel produces Pentium 4 processors using two cores (the
0.18µ Willamette core and the
0.13µ Northwood core) and two
form factors, the 423-pin PGA-423 and the smaller 478-pin mPGA-478.
Willamette-core processors were produced in both PGA-423 and mPGA-478
at core speeds of 1.30, 1.40, 1.50, 1.60, 1.70, 1.80, 1.90, and 2
GHz, all with 256 KB of L2 cache. Northwood-core Pentium 4 processors
are produced only in mPGA-478, initially at 2 and 2.20 GHz core
speeds, with faster versions planned. Intel also produces
Northwood-core processors slower than 2 GHz. Northwood-core
processors have 512 KB L2 cache.
The Willamette core and PGA-423 were stopgap solutions, released
solely to combat AMD's clock speed lead until the
"real" Pentium 4—the mPGA-478
Northwood-core processor—could be shipped. Although Intel
originally intended to phase out PGA-423 as a mainstream technology
by late 2001, in the process relegating PGA-423 to upgrade status
only, the very strong demand for mPGA-478 motherboards and processors
caused shortages throughout 2001 and into 2002. We expect that
PGA-423 and Willamette parts will continue to be sold in new systems
until mid-2002. That does not change the fact that PGA-423 and
Willamette are dead-end technologies, and should be avoided. Do not
buy any Pentium 4 system or components that do not use mPGA-478 parts
and the Northwood core.
For additional information about Pentium 4 processors, including
detailed identification tables, visit http://www.hardwareguys.com/supplement/pentium-4.html.
For information about Xeon processors, visit http://www.hardwareguys.com/supplement/p4-xeon.html.
4.2.5 Celeron 4
In May 2002, Intel shipped Celeron processors based on the Pentium 4.
These processors, which we call the Celeron 4, use standard mPGA-478
packaging and fit Socket 478 motherboards. Not all older Socket 478
motherboards support the Celeron 4, and those that do require a BIOS
upgrade. The first Celeron 4 models use a modified 0.18µ Pentium 4
Willamette core called the Willamette128 core, which has L2 cache
halved to 128 KB. Intel shipped the 1.7 GHz Celeron 4 initially, with
the 1.8 GHz model following in June 2002. We expect Intel to ship 1.9
and 2.0 GHz Willamette128 Celeron 4 models later in 2002, followed by
faster models based on the Northwood core with L2 cache reduced from
the 512 KB Northwood standard to 256 KB.
Initial testing shows that even the 1.7 GHz Celeron 4 outperforms the
fastest available AMD Duron, so we expect forthcoming Celeron 4
models to be excellent choices for building fast, inexpensive
entry-level systems on the Intel 845G and 845GL platforms.
Intel has manufactured mobile variants of many of their processors,
including the Pentium, Pentium II, Celeron, and Pentium III. These
mobile versions are used in notebook computers and are not
user-replaceable, so for all intents and purposes a notebook computer
will always use the processor that was originally installed. For that
reason, we have chosen to devote our available space to issues that
are more likely to be important to more of our readers. For
additional information about Intel mobile processors, visit http://developer.intel.com/design/mobile/.