When a JPEG image is selected for viewing, its markers must be read before
any image data is loaded. In a JPEG image, the very first marker is the SOI,
or Start Of Image, marker. This is the file's first "hey, I'm a JPEG"
declaration. The JPEG standard, as written by the Joint Photographic Experts
Group, specified the JPEG interchange format. That format had several
shortcomings, and the JFIF (JPEG File Interchange Format) was an attempt to
remedy them. JFIF is the format used by almost all JPEG file
readers/writers. It tells image readers, "Hey, I'm a JPEG that almost
anyone can understand."
Most markers will have additional information following them. When this
is the case, the marker and its associated information are referred to as a
"header." In a header the marker is immediately followed by two bytes
that indicate the length of the information, in bytes, that the header
contains. The two bytes that indicate the length are always included in
that count; the two marker bytes are not.
A marker is two bytes long, and its first byte is always FF (hexadecimal).
The marker/header information that follows does not cover all known
markers, just the ones essential for baseline JPEG.
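As a rough illustration of the length rule, the Python sketch below walks the headers of a JPEG file held in memory. The function name `walk_segments` is hypothetical, and the sketch assumes a well-formed file in which every marker other than SOI and EOI is followed by a 2-byte big-endian length. It stops at SOS because the entropy-coded data that follows that header has no length field.

```python
import struct

def walk_segments(data: bytes):
    """Yield (marker, offset, length) for each marker from SOI up to SOS or EOI.

    Hypothetical helper: assumes a well-formed file where every marker other
    than SOI and EOI carries a 2-byte big-endian length field.
    """
    if data[:2] != b"\xff\xd8":
        raise ValueError("missing SOI marker (FFD8)")
    yield 0xD8, 0, 0  # SOI has no length field
    pos = 2
    while pos + 1 < len(data):
        if data[pos] != 0xFF:
            raise ValueError(f"expected an FF byte at offset {pos}")
        marker = data[pos + 1]
        if marker == 0xFF:  # an FF fill byte may precede a marker; skip it
            pos += 1
            continue
        if marker == 0xD9:  # EOI has no length field either
            yield marker, pos, 0
            return
        (length,) = struct.unpack(">H", data[pos + 2:pos + 4])
        yield marker, pos, length
        if marker == 0xDA:  # SOS: entropy-coded data follows the header
            return
        # The length counts its own two bytes but not the two marker bytes.
        pos += 2 + length

# A tiny hand-built file: SOI, a comment with 3 bytes of text
# (length = 2 + 3 = 5), then EOI.
sample = b"\xff\xd8" + b"\xff\xfe\x00\x05hi!" + b"\xff\xd9"
for marker, offset, length in walk_segments(sample):
    print(f"FF{marker:02X} at offset {offset}, length {length}")
```

The next marker is always found at the current marker's offset plus 2 (the marker) plus the length, which is why the length must include its own two bytes consistently.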
A component is a specific color channel in an image. For instance, an
RGB image contains three components: Red, Green, and Blue.
© 1998 by James R. Weeks
Start of Image (SOI) marker -- two bytes (FFD8)
JFIF (APP0) marker (FFE0)
- length -- two bytes
- identifier -- five bytes: 4A, 46, 49, 46, 00 (the ASCII codes of a
  zero-terminated "JFIF" string)
- version -- two bytes: often 01, 02 (version 1.02)
  - the most significant byte is used for major revisions
  - the least significant byte for minor revisions
- units -- one byte: units for the X and Y densities
  - 0 => no units, X and Y specify the pixel aspect ratio
  - 1 => X and Y are dots per inch
  - 2 => X and Y are dots per cm
- Xdensity -- two bytes: horizontal pixel density
- Ydensity -- two bytes: vertical pixel density
- Xthumbnail -- one byte: thumbnail width in pixels (0 = no thumbnail)
- Ythumbnail -- one byte: thumbnail height in pixels (0 = no thumbnail)
- (RGB)n -- 3n bytes: packed (24-bit) RGB values for the thumbnail pixels,
  where n = Xthumbnail * Ythumbnail
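The field list above maps directly onto a fixed-layout parse. The sketch below is one possible way to read it with Python's `struct` module; `JfifHeader` and `parse_jfif` are hypothetical names, and the input is assumed to start at the length field (the two bytes right after FFE0).

```python
import struct
from dataclasses import dataclass

@dataclass
class JfifHeader:
    version: tuple        # (major, minor), e.g. (1, 2) for bytes 01 02
    units: int            # 0 = aspect ratio only, 1 = dots/inch, 2 = dots/cm
    x_density: int
    y_density: int
    x_thumbnail: int      # 0 = no thumbnail
    y_thumbnail: int
    thumbnail_rgb: bytes  # 3 * Xthumbnail * Ythumbnail packed RGB bytes

def parse_jfif(segment: bytes) -> JfifHeader:
    """Parse a JFIF APP0 header, starting at its 2-byte length field."""
    (length,) = struct.unpack(">H", segment[:2])
    if segment[2:7] != b"JFIF\x00":
        raise ValueError("identifier is not a zero-terminated 'JFIF'")
    # version (1 + 1 bytes), units (1), Xdensity (2), Ydensity (2),
    # Xthumbnail (1), Ythumbnail (1): 9 bytes, big-endian, no padding
    major, minor, units, xd, yd, xt, yt = struct.unpack(">BBBHHBB", segment[7:16])
    n = xt * yt
    if length < 16 + 3 * n or len(segment) < 16 + 3 * n:
        raise ValueError("segment too short for the declared thumbnail")
    return JfifHeader((major, minor), units, xd, yd, xt, yt,
                      segment[16:16 + 3 * n])

# Version 1.02, 72 x 72 dots per inch, no thumbnail: length = 16 bytes.
app0 = (struct.pack(">H", 16) + b"JFIF\x00" + bytes([1, 2, 1])
        + struct.pack(">HH", 72, 72) + bytes([0, 0]))
header = parse_jfif(app0)
```

With no thumbnail the header is always 16 bytes long (2 length + 5 identifier + 2 version + 1 units + 4 density + 2 thumbnail-size bytes); each thumbnail pixel adds 3 more.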
Define Quantization Table (DQT) marker (FFDB)
- the first two bytes after the marker, the length, indicate the number of
  bytes, including the two length bytes, that this header contains
- until the length is exhausted (usually two quantization tables for
  baseline JPEG):
  - the precision and the quantization table index -- one byte: precision is
    specified by the higher four bits and index by the lower four bits
    - precision is either 0 or 1 and indicates the size of each table
      entry: 8 bits (baseline) for 0 and 16 bits for 1
  - the quantization values -- 64 bytes (128 bytes when precision is 1)
    - the quantization tables are stored in zigzag order
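A minimal sketch of reading a DQT header and undoing the zigzag order might look like the following. The names `zigzag_to_natural`, `ZIGZAG` and `parse_dqt` are hypothetical; the zigzag table is generated rather than typed out, by sorting the 64 row-major indices by anti-diagonal and alternating the direction on each diagonal.

```python
import struct

def zigzag_to_natural():
    """List z -> row-major index of the z-th coefficient in zigzag order."""
    def key(i):
        row, col = divmod(i, 8)
        diag = row + col  # anti-diagonal the coefficient sits on
        # Even diagonals run bottom-left to top-right (column rising),
        # odd diagonals top-right to bottom-left (row rising).
        return (diag, col if diag % 2 == 0 else row)
    return sorted(range(64), key=key)

ZIGZAG = zigzag_to_natural()

def parse_dqt(segment: bytes) -> dict:
    """Parse a DQT header (from its length field) into 8x8 tables.

    Returns {table_index: [[row 0], ..., [row 7]]} in natural order.
    """
    (length,) = struct.unpack(">H", segment[:2])
    pos, tables = 2, {}
    while pos < length:
        precision, index = segment[pos] >> 4, segment[pos] & 0x0F
        pos += 1
        size = 2 if precision == 1 else 1  # bytes per table entry
        natural = [0] * 64
        for z in range(64):
            entry = segment[pos + z * size:pos + (z + 1) * size]
            natural[ZIGZAG[z]] = int.from_bytes(entry, "big")
        pos += 64 * size
        tables[index] = [natural[r * 8:(r + 1) * 8] for r in range(8)]
    return tables

# One 8-bit table (precision 0, index 0) whose zigzag entries are 1..64.
dqt = struct.pack(">H", 2 + 1 + 64) + bytes([0x00]) + bytes(range(1, 65))
table = parse_dqt(dqt)[0]
```

In the example, the third entry in zigzag order (value 3) lands at row 1, column 0, and the sixth (value 6) at row 0, column 2, which is the zigzag path in action.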
Define Huffman Table (DHT) marker (FFC4)
- the first two bytes after the marker, the length, indicate the number of
  bytes, including the two length bytes, that this header contains
- until the length is exhausted (usually four Huffman tables):
  - index -- one byte: the high four bits give the table class (0 = DC,
    1 = AC) and the low four bits the table number, so a value greater than
    15 (i.e. 0x10 or more) means an AC table, otherwise a DC table
  - bits -- 16 bytes: the number of codes of each length, from 1 to 16 bits
  - Huffman values -- # of bytes = the sum of the previous 16 bytes
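The 16 "bits" counts are enough to rebuild the actual code words, because JPEG uses canonical Huffman codes: codes of the same length are consecutive integers, and moving to the next length appends a 0 bit. The sketch below assumes that scheme; `build_codes` and `parse_dht` are hypothetical names, and the example uses the counts and values of the commonly used luminance DC table.

```python
import struct

def build_codes(counts, symbols):
    """Assign canonical Huffman codes: {(bit_length, code): symbol}."""
    codes, code, k = {}, 0, 0
    for bit_length in range(1, 17):
        for _ in range(counts[bit_length - 1]):
            codes[(bit_length, code)] = symbols[k]
            code += 1  # next code of the same length
            k += 1
        code <<= 1  # longer codes: append a 0 bit
    return codes

def parse_dht(segment: bytes) -> dict:
    """Parse a DHT header (from its length field) into code tables."""
    (length,) = struct.unpack(">H", segment[:2])
    pos, tables = 2, {}
    while pos < length:
        table_class, number = segment[pos] >> 4, segment[pos] & 0x0F
        counts = list(segment[pos + 1:pos + 17])  # codes of length 1..16
        pos += 17
        symbols = list(segment[pos:pos + sum(counts)])
        pos += sum(counts)
        kind = "AC" if table_class == 1 else "DC"
        tables[(kind, number)] = build_codes(counts, symbols)
    return tables

counts = [0, 1, 5, 1, 1, 1, 1, 1, 1, 0, 0, 0, 0, 0, 0, 0]
dht = (struct.pack(">H", 2 + 1 + 16 + 12) + bytes([0x00])
       + bytes(counts) + bytes(range(12)))
dc0 = parse_dht(dht)[("DC", 0)]
```

Here symbol 0 gets the 2-bit code 00, symbols 1 to 5 get the 3-bit codes 010 to 110, and symbol 11 ends up with the 9-bit code 111111110.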
Start of Frame (SOF0) marker (FFC0)
- the first two bytes after the marker, the length, indicate the number of
  bytes, including the two length bytes, that this header contains
- P -- one byte: sample precision in bits (always 8 for baseline JPEG)
- Y -- two bytes: the image height, in lines
- X -- two bytes: the image width, in samples per line
- Nf -- one byte: the number of components in the image
  - 3 for color baseline JPEG images
  - 1 for grayscale baseline JPEG images
- Nf times:
  - Component ID -- one byte
  - H and V sampling factors -- one byte: H is the high four bits and V is
    the low four bits
  - Quantization table number -- one byte
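Since the frame header has a fixed 8-byte prefix followed by 3 bytes per component, it can be read with one `struct` call plus a loop. This is a sketch only; `parse_sof0` is a hypothetical name and the input is assumed to start at the length field.

```python
import struct

def parse_sof0(segment: bytes) -> dict:
    """Parse a baseline SOF0 header, starting at its length field."""
    length, precision, height, width, nf = struct.unpack(">HBHHB", segment[:8])
    if length != 8 + 3 * nf:
        raise ValueError("length does not match the number of components")
    components = []
    for i in range(nf):
        cid, hv, qtable = segment[8 + 3 * i:11 + 3 * i]
        components.append({"id": cid, "H": hv >> 4, "V": hv & 0x0F,
                           "qtable": qtable})
    return {"precision": precision, "height": height, "width": width,
            "components": components}

# 640 x 480 colour image: Y sampled 2x2, Cb and Cr 1x1, Q-tables 0, 1, 1.
sof0 = (struct.pack(">HBHHB", 8 + 9, 8, 480, 640, 3)
        + bytes([1, 0x22, 0, 2, 0x11, 1, 3, 0x11, 1]))
frame = parse_sof0(sof0)
```

Note that Y (height) comes before X (width) in the header, so it is easy to swap them by accident.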
The H and V sampling factors dictate the final size of the component they are
associated with. For instance, in the Jpeg-6a library by the Independent JPEG
Group, the color space defaults to YCbCr and the H and V sampling factors for
the Y, Cb, and Cr components default to 2, 1, and 1, respectively (2 for both
H and V of the Y component, and so on). This gives the Y component twice the
resolution of the other two components in each direction. To achieve this
difference, the lower-resolution Cb and Cr components are quartered in size
(halved horizontally and vertically) during compression. Thus, the Cb and Cr
components must be quadrupled in size during decompression.
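The size of each component can be worked out from the frame size and the sampling factors. The sketch below assumes the rule used by the JPEG standard: each component's width is the image width scaled by H/Hmax and its height the image height scaled by V/Vmax, both rounded up. `component_sizes` is a hypothetical name.

```python
def component_sizes(width, height, factors):
    """Return (width, height) in samples for each component.

    factors is a list of (H, V) pairs, one per component, as read from SOF0.
    """
    h_max = max(h for h, _ in factors)
    v_max = max(v for _, v in factors)
    # Integer ceiling division: ceil(a / b) == (a + b - 1) // b
    return [((width * h + h_max - 1) // h_max,
             (height * v + v_max - 1) // v_max)
            for h, v in factors]

sizes = component_sizes(640, 480, [(2, 2), (1, 1), (1, 1)])
odd = component_sizes(641, 481, [(2, 2), (1, 1), (1, 1)])
```

For a 640 x 480 image with the default factors, Cb and Cr come out at 320 x 240, a quarter of the Y samples; odd dimensions round up, so 641 x 481 gives 321 x 241 chroma planes.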
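Start of Scan (SOS) marker (FFDA)
- the first two bytes after the marker, the length, indicate the number of
  bytes, including the two length bytes, that this header contains
- Number of components, n -- one byte: the number of components in this scan
- n times:
  - Component ID -- one byte
  - DC and AC table numbers -- one byte: DC # is the high four bits and AC #
    is the low four bits
- Ss -- one byte: start of spectral selection (0 for baseline)
- Se -- one byte: end of spectral selection (63 for baseline)
- Ah and Al -- one byte: successive approximation bit positions, Ah in the
  high four bits and Al in the low four bits (both 0 for baseline)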
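A sketch of reading the scan header, under the same assumptions as the frame example (input starts at the length field; `parse_sos` is a hypothetical name):

```python
import struct

def parse_sos(segment: bytes) -> dict:
    """Parse an SOS header, starting at its length field."""
    length, n = struct.unpack(">HB", segment[:3])
    if length != 6 + 2 * n:
        raise ValueError("length does not match the number of components")
    components = []
    for i in range(n):
        cid, tables = segment[3 + 2 * i:5 + 2 * i]
        components.append({"id": cid, "dc_table": tables >> 4,
                           "ac_table": tables & 0x0F})
    ss, se, a = segment[3 + 2 * n:6 + 2 * n]
    return {"components": components, "Ss": ss, "Se": se,
            "Ah": a >> 4, "Al": a & 0x0F}

# Baseline scan of three components: Y uses tables 0/0, Cb and Cr use 1/1.
sos = (struct.pack(">HB", 6 + 6, 3) + bytes([1, 0x00, 2, 0x11, 3, 0x11])
       + bytes([0, 63, 0]))
scan = parse_sos(sos)
```

For a baseline file, a decoder can sanity-check that Ss, Se, Ah and Al come out as 0, 63, 0 and 0.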
Comment (COM) marker (FFFE)
- the first two bytes after the marker, the length, indicate the number of
  bytes, including the two length bytes, that this header contains
- the comment text -- the remaining (length - 2) bytes
End of Image (EOI) marker (FFD9)
------------------------------------------------
JPEG is rather complex in this aspect, so we shall just
give an overview of the basic principles (see the JPEG Book, chapter 7, for the
full picture).
JPEG data is divided into segments, each of
which starts with a 2-byte marker.
All markers are byte-aligned: they start on the byte
boundaries of the transmission/storage medium. Any variable-length data which
precedes a marker is padded with extra 1 bits to achieve this.
The first byte of each marker is FF (hex). The second byte defines the type of
marker.
To allow for recovery in the presence of errors, it
must be possible to detect markers without decoding all of the intervening data.
Hence markers must be unique. To achieve this, if an FF byte occurs in the
middle of a segment, an extra 00 stuffed byte is inserted after it, and 00 is
never used as the second byte of a marker.
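Byte stuffing is simple enough to show in a few lines. In this sketch, `stuff` and `unstuff` are hypothetical names, and `unstuff` assumes its input contains only stuffed entropy-coded bytes with no real markers mixed in.

```python
def stuff(entropy_coded: bytes) -> bytes:
    """Insert a 00 byte after every FF so it cannot be read as a marker."""
    return entropy_coded.replace(b"\xff", b"\xff\x00")

def unstuff(stuffed: bytes) -> bytes:
    """Drop the 00 that follows each FF.

    Assumes the input holds only stuffed entropy-coded bytes, with no real
    markers (such as restart markers) mixed in.
    """
    return stuffed.replace(b"\xff\x00", b"\xff")

raw = bytes([0x12, 0xFF, 0x00, 0xFF, 0xFF, 0x34])
coded = stuff(raw)
```

Because every FF in the stuffed stream is immediately followed by an inserted 00, removing one 00 after each FF restores the original, even when the original data itself contained an FF 00 pair.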
Some important markers, in the order they are often used, are listed in
table 1:
| Name | Code (hex) | Purpose |
|------|------------|---------|
| SOI | FFD8 | Start of image. |
| COM | FFFE | Comment (segment ignored by decoder). <Length>, <Text comments> |
| DQT | FFDB | Define quantisation table(s). <Length>, <Table index, Q-table values> |
| SOF0 | FFC0 | Start of Baseline DCT frame. <Length>, <Frame size, no. of components (colours), sub-sampling factors, Q-table selectors> |
| DHT | FFC4 | Define Huffman table(s). <Length>, <DC Size and AC (Run,Size) tables for each component> |
| SOS | FFDA | Start of scan. <Length>, <Huffman table selectors for each component> <Entropy coded DCT blocks> |
| EOI | FFD9 | End of image. |
In table 1, the data which follows each marker is shown
between <> brackets. The first 2-byte word of most segments is the length
(in bytes) of the segment, <Length>, which counts the two length bytes but
not the marker itself. The length of <Entropy coded DCT
blocks>, which forms the main bulk of the compressed data, is not specified
explicitly, since it may be determined by decoding the entropy codes. This also
allows the data to be transmitted with minimal delay, since it is not necessary
to determine the total length of the compressed data before any of the DCT block
data can be sent.
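Because the entropy-coded data carries no length, a decoder that wants to skip over it (or simply find where it ends) can scan for the next marker instead. The sketch below, with the hypothetical name `entropy_data_end`, treats FF 00 as a stuffed byte and any other FF xx pair as a marker; it does not treat restart markers specially, so it stops at the first one.

```python
def entropy_data_end(data: bytes, start: int) -> int:
    """Offset of the first marker at or after start (len(data) if none)."""
    pos = start
    while pos + 1 < len(data):
        if data[pos] == 0xFF:
            if data[pos + 1] != 0x00:
                return pos  # FF followed by anything but 00 is a marker
            pos += 2  # skip the stuffed FF 00 pair
        else:
            pos += 1
    return len(data)

scan_bytes = bytes([0x12, 0xFF, 0x00, 0x34, 0xFF, 0xD9])
end = entropy_data_end(scan_bytes, 0)
```

In the example the stuffed FF 00 at offset 1 is stepped over and the scan stops at the EOI marker at offset 4.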
Long blocks of entropy-coded data are rather prone to
being corrupted by transmission errors. To mitigate the worst aspects of this,
Restart Markers (FFD0 to FFD7) may be included at regular intervals (say at the
start of each row of DCT blocks in the image) so that separate parts of the
entropy coded stream may be decoded independently of errors in other parts. The
restart interval, if required, is defined by a DRI (FFDD) marker segment. There
are 8 restart markers, which are used in sequence, so that if one (or more) is
corrupted by errors, its absence may be easily detected.
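The sequence check can be sketched as follows. `split_restart_intervals` is a hypothetical helper that assumes its input is only the entropy-coded bytes of one scan (no SOS header, no EOI) and that the restart markers cycle RST0 (FFD0), RST1, ..., RST7 (FFD7), then back to RST0. A marker that is out of order signals that one or more markers were lost or corrupted.

```python
def split_restart_intervals(scan: bytes) -> list:
    """Split one scan's entropy-coded bytes at restart markers.

    Raises ValueError when a restart marker breaks the RST0..RST7 cycle.
    The returned intervals are still byte-stuffed.
    """
    intervals, start, expected, pos = [], 0, 0, 0
    while pos + 1 < len(scan):
        if scan[pos] == 0xFF and 0xD0 <= scan[pos + 1] <= 0xD7:
            found = scan[pos + 1] - 0xD0
            if found != expected:
                raise ValueError(f"expected RST{expected}, found RST{found}")
            intervals.append(scan[start:pos])
            expected = (expected + 1) % 8  # RST7 wraps round to RST0
            pos += 2
            start = pos
        elif scan[pos] == 0xFF:
            pos += 2  # a stuffed FF 00 pair belongs to the data
        else:
            pos += 1
    intervals.append(scan[start:])
    return intervals

data = b"\x01\x02" + b"\xff\xd0" + b"\x03\xff\x00" + b"\xff\xd1" + b"\x04"
parts = split_restart_intervals(data)
```

Each interval can then be decoded on its own, so damage in one interval does not stop the others from being recovered.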
The use of multiple scans within each image frame and
multiple frames within a given image allows many variations on the ordering and
interleaving of the compressed data. For example:
- Chrominance and luminance components may be sent in separate
scans or interleaved into a single scan.
- Lower frequency DCT coefs may be sent in one or more scans
before higher frequency coefs.
- Coarsely quantised coefs may be sent in one or more scans
before finer (refinement) coefs.
- A coarsely sampled frame of the image may be sent initially and
then the detail may be progressively improved by adding differentially-coded
correction frames of increasing resolution.
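The second option above (spectral selection) is what the Ss and Se fields of the SOS header express: each scan carries the zigzag coefficients Ss through Se. The sketch below checks a hypothetical scan plan for one component. It assumes spectral selection only (no successive approximation refinement scans, which would resend a band), and it assumes the progressive-mode rule that the DC coefficient is sent alone, before any AC band. `check_spectral_plan` is a hypothetical name.

```python
def check_spectral_plan(bands):
    """Check a hypothetical progressive scan plan for one component.

    bands lists (Ss, Se) for each scan, in transmission order. Every zigzag
    index 0..63 should be sent exactly once, DC (index 0) alone and first.
    """
    if not bands or bands[0] != (0, 0):
        raise ValueError("the first scan should carry only the DC coefficient")
    sent = [0] * 64
    for ss, se in bands:
        if not (0 <= ss <= se <= 63) or (ss == 0 and se != 0):
            raise ValueError(f"invalid band Ss={ss}, Se={se}")
        for k in range(ss, se + 1):
            sent[k] += 1
    if sent != [1] * 64:
        raise ValueError("each coefficient should be sent exactly once")
    return True

# Low frequencies first: DC, then AC coefficients 1-5, then 6-63.
plan = [(0, 0), (1, 5), (6, 63)]
ok = check_spectral_plan(plan)
```

A plan like this lets a viewer show a blurry image after the first two scans and sharpen it when the high-frequency band arrives.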