17.1 PC Audio Types
Sound cards support two categories of audio, which are detailed in
the following sections:
- Waveform audio
Waveform audio files, also called simply sound
files, store actual audio data. When you record
waveform audio, the sound card encodes the analog audio data in
digital format and stores it as a file. When you play waveform audio,
the sound card reads the digital audio data contained in the file and
converts it to analog audio, which is then reproduced on speakers or
headphones. Waveform audio files can store any type of audio,
including speech, singing, instrumental music, and sound effects. The
playback quality of waveform audio depends primarily on how much
detail was captured in the original recording and how much of that
data, if any, was lost in compressing the data before storing it on
disk. Uncompressed waveform audio files (such as .WAV files) are
large, requiring as much as 10 MB per minute of audio stored.
Compressed audio files may be one twentieth that size or smaller,
although high compression generally results in lower sound quality.
- MIDI audio
Rather than storing actual audio data,
MIDI
(Musical Instrument Digital Interface) files
store instructions that a sound card can use to create audio on the
fly. MIDI audio files store only instrumental music and sound
effects, not speech or singing. Originally used almost solely by
professional musicians, MIDI is now commonly used by games and other
applications for background music and sound effects, so MIDI support
has become an important sound card issue. Because MIDI sound is
created synthetically by the sound card, playback quality of MIDI
files depends both on the quality of the MIDI file itself and on the
features and quality of the MIDI support in the sound card. A MIDI
file of an orchestral concert, for example, may sound like a
child's toy when played by a cheap sound card, but
may closely resemble a CD recording when played by a high-end sound
card. MIDI audio files are small, requiring only a few KB per minute
of audio stored.
17.1.1 Waveform Audio
Waveform audio files are created using a process called
sampling or
digitizing to convert analog sound to digital
format. Sampling takes periodic snapshots of the instantaneous state
of the analog signal, encodes the data, and stores the audio in
digital form. Just as digital images can be stored at different
resolutions according to their intended use, audio data can be stored
at different resolutions to trade off sound quality against file
size. Five parameters determine the quality of digital sound files
and how much space they occupy:
- Sample size
Sample size specifies how much data is stored
for each sample. A larger sample size stores more information about
each sample, contributing to higher sound quality. Sample size is
specified as the number of bits stored for each sample. CD audio, for
example, uses 16-bit samples, which allow the waveform amplitude to
be specified as one of 65,536 discrete values. All sound cards
support at least 16-bit samples.
- Sampling rate
Sampling rate specifies how often samples are
taken. Sampling rate is specified in Hz (Hertz, or cycles/second) or
kHz (kilohertz, one thousand Hertz). Higher-frequency data inherently
changes more often. Changes that occur between samples are lost, so
the sampling rate determines the highest-frequency sounds that can be
sampled. Two samples are required to capture a change, so the highest
frequency that can be sampled, called the Nyquist
frequency, is half the sampling rate. For
example, a 10,000 Hz sampling rate captures sounds no higher than
5,000 Hz. In practice, the danger is that frequencies above the
Nyquist frequency will be aliased, reappearing as spurious
lower-frequency distortion, so real-world implementations filter the
analog signal to cut off audio frequencies somewhat below the Nyquist
frequency, for example by filtering all frequencies above 4,500 Hz
when using a 10,000 Hz sampling rate. CD audio, for example, uses a
44,100 Hz sampling rate, which provides a Nyquist frequency of 22,050
Hz, allowing full bandwidth response up to ~20,000 Hz after filtering.
All sound cards support at least 44,100 Hz sampling, and many support
the Digital Audio Tape (DAT) standard of 48,000
Hz.
- Sampling method
Sampling method specifies how samples are taken
and encoded. For example, Windows WAV files use
either PCM (Pulse Code
Modulation), a linear method that encodes the absolute
value of each sample as an 8-bit or 16-bit value, or
ADPCM (Adaptive Differential
PCM), which encodes 4-bit samples based on the differences
(deltas) between one sample and the preceding sample. ADPCM generates
smaller files, but at the expense of reduced audio quality and the
increased processor overhead needed to encode and decode the data. (A
simplified sketch of delta encoding appears after this list.)
- Recording format
Recording format specifies how data is structured and
encoded within the file and what means of compression, if any, is
used. Common formats, indicated by filename extensions, include
WAV (Windows audio); AU
(Sun audio format, commonly used by Unix systems and on the
Internet); AIFF or AIF
(Audio Interchange File Format, used by Apple and SGI);
RA (RealAudio, a proprietary streaming audio
format); and MP3 (MPEG-1 Layer 3). Some formats
use lossless compression, which provides lower
compression ratios but allows all the original data to be recovered.
Others use
lossy compression, which sacrifices some less
important data in order to produce the smallest possible file sizes.
Some, such as PCM WAV, do not compress the data at all. Some
compressed formats, such as MP3, allow selectable compression ratios,
while others use fixed ratios.
- Number of channels
Depending on the recording setup, one channel
(monaural or
mono sound), two channels
(stereo sound), or more can be recorded.
Additional channels provide audio separation, which increases the
realism of the sound during playback. Various formats store 1, 2, 4,
or 5 audio
channels. Some formats store only two channels, but with additional
data that can be used to simulate additional channels.
Table 17-1 lists the three standard Windows
recording modes for PCM WAV, the most common uncompressed waveform
audio format, along with three commonly used MP3 recording modes. MP3
at 256 kb/s uses little more storage than the Windows AM radio mode,
but produces sound files that are near CD quality. MP3 bitrates are
approximate.
Table 17-1. Windows uncompressed WAV storage modes and common MP3 compressed storage modes

| Mode           | Sample size | Sampling rate | Channels   | Bytes per minute | Encoding (ratio) |
|----------------|-------------|---------------|------------|------------------|------------------|
| Telephone      | 8-bit       | 11,025 Hz     | 1 (mono)   | 661,500          | PCM (1:1)        |
| AM radio       | 8-bit       | 22,050 Hz     | 1 (mono)   | 1,323,000        | PCM (1:1)        |
| CD audio       | 16-bit      | 44,100 Hz     | 2 (stereo) | 10,584,000       | PCM (1:1)        |
| MP3 (64 kb/s)  | 16-bit      | 44,100 Hz     | 2 (stereo) | ~500,000         | MP3 (~20:1)      |
| MP3 (128 kb/s) | 16-bit      | 44,100 Hz     | 2 (stereo) | ~1,000,000       | MP3 (~10:1)      |
| MP3 (256 kb/s) | 16-bit      | 44,100 Hz     | 2 (stereo) | ~2,000,000       | MP3 (~5:1)       |
17.1.2 MIDI Audio
A MIDI file is the digital equivalent of
sheet music. Rather than containing actual audio data, a MIDI file
contains detailed instructions for creating the sounds represented by
that file. And, just as the same sheet music played by different
musicians can sound different, the exact sounds produced by a MIDI
file depend on which sound card you use to play it.
Three PC MIDI standards exist. The first, General MIDI, is the
official standard; it actually predates multimedia PCs and is the
oldest and most comprehensive of the three. The other two, Basic MIDI
and Extended MIDI, are Microsoft standards and, despite the name of
the latter, both are subsets of the General MIDI standard. In the
early days of sound cards, General MIDI support was an
unrealistically high target, so many sound cards implemented only one
of the Microsoft MIDI subsets. All current sound cards we know of
support full General MIDI.
MIDI was developed about 20 years
ago, originally as a method to provide a standard interface between
electronic music keyboards and electronic sound generators like Moog
synthesizers. A MIDI interface supports 16 channels, allowing up to
16 instruments or groups of instruments (selected from a palette of
128 available instruments) to play simultaneously. MIDI interfaces
can be stacked. Some MIDI devices support 16 or more interfaces
simultaneously, allowing 256 or more channels. The MIDI
specification defines both a serial communication protocol and the
formatting of the MIDI messages transferred via
that protocol. MIDI transfers 8-bit data at 31,250 bps over a 5 mA
current loop, using optoisolators to electrically isolate MIDI
devices from each other. All MIDI devices use a standard 5-pin DIN
connector, but the MIDI port on a sound card is simply a subset of
the pins on the standard DB-15 gameport connector (see Chapter 21). That means a gameport-to-MIDI adapter is
needed to connect a sound card to an external MIDI device such as a
MIDI keyboard. MIDI messages are simply strings of bytes encoded to
represent the important characteristics of a musical score, including
the instrument to be used, the note to be played, volume, and so on.
MIDI messages usually comprise a status
byte followed by one, two, or three
data bytes, but a MIDI feature called
Running Status
allows any number of additional bytes received to be treated as data
bytes until a new status byte is received. Here are the functions of
those byte types (a short sketch in code follows this list):
- Status byte
MIDI messages always begin with a status byte,
which identifies the type of message and is flagged as a status byte
by having the high-order bit set to 1. The most significant
(high-order) four bits (nibble) of this byte define the action to be
taken, such as a command to turn a note on or off or to modify the
characteristics of a note that is already playing. The least
significant nibble defines the channel to which the message is
addressed, which in turn determines the instrument to be used to play
the note. Although represented in binary as a 4-bit value between 0
and 15, channels are actually designated 1 through 16.
- Data byte
A data byte is flagged as such by having its
high-order bit set to zero, which limits it to communicating 128
states. What those states represent depends on the command type of
the status byte. When it follows a Note On command, for example, the
first data byte defines the pitch of the note. Assuming standard
Western tuning (A=440 Hz), this byte can assume any of 128 values
from C (8.18 Hz) to G (12,543.85 Hz). The second data
byte specifies velocity, or how hard the key was
pressed, which corresponds generally to volume, depending on the MIDI
device and instrument. The note continues playing until a status byte
with a Note Off command for that note is received, although it may
decay to inaudibility in the interim, depending on how the instrument
is programmed.