Chapter 2
INTEL ARCHITECTURE MMX TECHNOLOGY FEATURES
This chapter provides a general overview of the architectural features of the Intel Architecture MMX technology.
The IA MMX technology defines the following four new 64-bit data types (See Figure 21):
The IA MMX technology provides eight 64-bit, general-purpose registers. These registers are aliased on the floating-point registers. The operating system handles the MMX technology as it would handle floating-point. (See Section 4.3 for more details on register aliasing.)
The MMX registers can hold packed 64-bit data types. The MMX instructions access the MMX registers directly using the register names MM0 to MM7 (See Figure 22).
MMX registers can be used to perform calculations on data. They cannot be used to address memory; addressing is accomplished by using the integer registers and standard IA addressing modes.
The MMX instructions implement two new principles (discussed in section Packed Data 2.4.2.):
For example, the PADDB (Add Packed Bytes) instruction adds two groups of eight packed bytes. The PADDW (Add Packed Words) instruction, which adds packed words, could operate on the same 64 bits as the PADDB instruction treating the 64 bits as four 16-bit words. X_PackedData
In wraparound mode, results that overflow or underflow are truncated and only the lower (least significant) bits of the result are returned. That is, the carry is ignored.
In saturation mode, results of an operation that overflow or underflow are clipped (saturated) to a data-range limit for the data type (see Table 21). The result of an operation that exceeds the range of a data-type saturates to the maximum value of the range. A result that is less than the range of a data type saturates to the minimum value of the range. This is useful in many cases, such as color calculations.
For example, when the result exceeds the data range limit for signed bytes, it is saturated to 0x7F (0xFF for unsigned bytes). If a value is less than the data range limit, it is saturated to 0x80 for signed bytes (0x00 for unsigned bytes).
Saturation provides a useful feature of avoiding wraparound artifacts. In the example of color calculations, saturation causes a color to remain pure black or pure white without allowing for an inversion.
| ||||
Byte | 80H | -128 | 7FH | 127 |
Word | 8000H | -32,768 | 7FFFH | 32,767 |
Byte | 00H | 0 | FFH | 255 |
Word | 0000H | 0 | FFFFH | 65,535 |
MMX instructions do not indicate overflow or underflow occurrence by generating exceptions or setting flags.
The PADD (Packed Add) and PSUB (Packed Subtract) instructions add or subtract the signed or unsigned data elements of the source operand to or from the destination operand in wrap- around mode. These instructions support packed byte, packed word, and packed doubleword data types.
The PADDS (Packed Add with Saturation) and PSUBS (Packed Subtract with Saturation) instructions add or subtract the signed data elements of the source operand to or from the signed data elements of the destination operand and saturate the result to the limits of the signed data-type range. These instructions support packed byte and packed word data types.
The PADDUS (Packed Add Unsigned with Saturation) and PSUBUS (Packed Subtract Unsigned with Saturation) instructions add or subtract the unsigned data elements of the source operand to or from the unsigned data elements of the destination operand and saturate the result to the limits of the unsigned data-type range. These instructions support packed byte and packed word data types.
Packed Multiplication
Packed multiplication instructions perform four multiplications on pairs of signed 16-bit operands, producing 32-bit intermediate results. Users may choose the low-order or high-order parts of each 32-bit result.
The PMULHW (Packed Multiply High) and PMULLW (Packed Multiply Low) instructions multiply the signed words of the source and destination operands and write the high-order or low-order 16 bits of each of the results to the destination operand.
Packed Multiply Add
The PMADDWD (Packed Multiply and Add) instruction calculates the products of the signed words of the source and destination operands. The four intermediate 32-bit doubleword products are summed in pairs to produce two 32-bit doubleword results.
These instructions support packed byte, packed word and packed doubleword data types.
The Pack and Unpack instructions perform conversions between the packed data types.
The PACKSS (Packed with Signed Saturation) instruction converts signed words into signed bytes or signed doublewords into signed words, in signed saturation mode.
The PACKUS (Packed with Unsigned Saturation) instruction converts signed words into unsigned bytes, in unsigned saturation mode.
The PUNPCKH (Unpack High Packed Data) and PUNPCKL (Unpack Low Packed Data) instructions convert bytes to words, words to doublewords, or doublewords to quadwords.
The PSLL (Packed Shift Left Logical) and PSRL (Packed Shift Right Logical) instructions perform a logical left or right shift, and fill the empty high or low order bit positions with zeros. These instructions support packed word, packed doubleword, and quadword data types.
The PSRA (Packed Shift Right Arithmetic) instruction performs an arithmetic right shift, copying the sign bit into empty bit positions on the upper end of the operand. This instruction supports packed word and packed doubleword data types.
The MOVQ (Move 64 Bits) instruction transfers 64-bits of packed data from memory to MMX registers and vise versa, or transfers data between MMX registers.
For example, a two-operand instruction would be decoded as:
DEST(left operand) DEST (left operand) OP SRC (right operand)
The source operand for all the MMX instructions (except the data transfer instructions), can reside either in memory or in an MMX register. The destination operand resides in an MMX register.
For data transfer instructions, the source and destination operands can also be an integer register (for the MOVD instruction) or memory location (for both the MOVD and MOVQ instructions).
MMX technology uses the same interface techniques
between the floating-point architecture and the operating system
(primarily for task switching purposes). For more detail, see
Section 4.1.