Intel Developer Relations - Programmer's Reference Manual

Chapter 2
INTEL ARCHITECTURE MMX™ TECHNOLOGY FEATURES

This chapter provides a general overview of the architectural features of the Intel Architecture MMX™ technology.

2.1 NEW FEATURES

MMX technology provides the following new features while maintaining backward compatibility with all existing Intel Architecture microprocessors, IA applications, and operating systems:

New data types
Eight MMX registers
Enhanced instruction set

Applications that use these new features of MMX technology can achieve enhanced performance.

2.2 NEW DATA TYPES

The principal data type of the IA MMX technology is the packed fixed-point integer. The decimal point of the fixed-point values is implicit and is left for the user to control for maximum flexibility.

The IA MMX technology defines the following four new 64-bit data types (see Figure 2-1):

Packed byte: Eight bytes packed into one 64-bit quantity
Packed word: Four words packed into one 64-bit quantity
Packed doubleword: Two doublewords packed into one 64-bit quantity
Quadword: One 64-bit quantity

Figure 2-1. Packed Data Types
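As an illustration of the four data types, the following C sketch reads individual elements out of one 64-bit quantity. The function names are hypothetical and are not part of the MMX instruction set; element 0 is assumed to be the least significant element, which matches the little-endian layout of the Intel Architecture.

```c
#include <stdint.h>

/* Hypothetical helpers: read element i of a 64-bit quantity, counting
 * from the least significant end. The casts discard the upper bits. */
static uint8_t packed_byte(uint64_t q, int i)   /* i in 0..7 */
{
    return (uint8_t)(q >> (8 * i));
}

static uint16_t packed_word(uint64_t q, int i)  /* i in 0..3 */
{
    return (uint16_t)(q >> (16 * i));
}

static uint32_t packed_dword(uint64_t q, int i) /* i in 0..1 */
{
    return (uint32_t)(q >> (32 * i));
}
```

The same 64 bits yield eight bytes, four words, or two doublewords depending only on which helper is called.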

2.3 MMX™ REGISTERS

The IA MMX technology provides eight 64-bit, general-purpose registers. These registers are aliased on the floating-point registers. The operating system handles the MMX technology as it would handle floating-point. (See Section 4.3 for more details on register aliasing.)

The MMX registers can hold packed 64-bit data types. The MMX instructions access the MMX registers directly using the register names MM0 to MM7 (see Figure 2-2).

MMX registers can be used to perform calculations on data. They cannot be used to address memory; addressing is accomplished by using the integer registers and standard IA addressing modes.

Figure 2-2. MMX™ Register Set

2.4 EXTENDED INSTRUCTION SET

The IA MMX instruction set supplies a rich set of instructions that operate on all data elements of a packed data type, in parallel. The MMX instructions can operate on either signed or unsigned data elements.

The MMX instructions implement two new principles (discussed in Sections 2.4.2 and 2.4.3):

Operations on packed data
Saturation arithmetic

2.4.2 Packed Data

The MMX instructions can operate on groups of eight bytes, four words, and two doublewords. These groups of 64 bits are referred to as packed data. The same 64 bits of data can be treated as any one of the packed data types. Data is cast by the type specified by the instruction.

For example, the PADDB (Add Packed Bytes) instruction adds two groups of eight packed bytes. The PADDW (Add Packed Words) instruction, which adds packed words, could operate on the same 64 bits as the PADDB instruction, treating the 64 bits as four 16-bit words.
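The difference between the two instructions can be sketched in portable C. The functions below are a scalar emulation written for illustration, not Intel code; they take the 64-bit operands as plain uint64_t values and use wraparound addition within each element.

```c
#include <stdint.h>

/* Emulates PADDB: add eight 8-bit elements independently.
 * A carry out of one element never reaches its neighbor. */
static uint64_t paddb(uint64_t dest, uint64_t src)
{
    uint64_t result = 0;
    for (int i = 0; i < 8; i++) {
        uint8_t sum = (uint8_t)((dest >> (8 * i)) + (src >> (8 * i)));
        result |= (uint64_t)sum << (8 * i);
    }
    return result;
}

/* Emulates PADDW: the same 64 bits treated as four 16-bit elements. */
static uint64_t paddw(uint64_t dest, uint64_t src)
{
    uint64_t result = 0;
    for (int i = 0; i < 4; i++) {
        uint16_t sum = (uint16_t)((dest >> (16 * i)) + (src >> (16 * i)));
        result |= (uint64_t)sum << (16 * i);
    }
    return result;
}
```

With dest = 0x00FF00FF00FF00FF and src = 0x0001000100010001, paddb returns 0 because every 0xFF byte wraps to 0x00, while paddw returns 0x0100010001000100 because the carry stays inside each 16-bit word.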

2.4.3 Saturation Arithmetic and Wrap Around

The MMX technology supports a new arithmetic capability known as saturating arithmetic. Saturation is best defined by contrasting it with wraparound mode.

In wraparound mode, results that overflow or underflow are truncated and only the lower (least significant) bits of the result are returned. That is, the carry is ignored.

In saturation mode, results of an operation that overflow or underflow are clipped (saturated) to a data-range limit for the data type (see Table 2-1). The result of an operation that exceeds the range of a data type saturates to the maximum value of the range. A result that is less than the range of a data type saturates to the minimum value of the range. This is useful in many cases, such as color calculations.

For example, when the result exceeds the data range limit for signed bytes, it is saturated to 0x7F (0xFF for unsigned bytes). If a value is less than the data range limit, it is saturated to 0x80 for signed bytes (0x00 for unsigned bytes).

Saturation provides a useful feature of avoiding wraparound artifacts. In the example of color calculations, saturation causes a color to remain pure black or pure white without allowing for an inversion.

Table 2-1. Data Range Limits for Saturation

                  Lower Limit              Upper Limit
                  Hexadecimal   Decimal    Hexadecimal   Decimal
Signed
  Byte            80H           -128       7FH           127
  Word            8000H         -32,768    7FFFH         32,767
Unsigned
  Byte            00H           0          FFH           255
  Word            0000H         0          FFFFH         65,535

MMX instructions do not indicate overflow or underflow occurrence by generating exceptions or setting flags.
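The contrast between the two modes can be shown on a single byte element. This is a minimal C sketch with hypothetical function names; it models only the arithmetic rule, not the instructions themselves.

```c
#include <stdint.h>

/* Wraparound: keep only the low 8 bits of the sum; the carry is ignored. */
static uint8_t add_u8_wrap(uint8_t a, uint8_t b)
{
    return (uint8_t)(a + b);
}

/* Unsigned saturation: clip the sum to the upper limit 0xFF. */
static uint8_t add_u8_sat(uint8_t a, uint8_t b)
{
    int sum = a + b;
    return (uint8_t)(sum > 255 ? 255 : sum);
}

/* Unsigned saturation on subtraction: clip to the lower limit 0x00. */
static uint8_t sub_u8_sat(uint8_t a, uint8_t b)
{
    return (uint8_t)(a > b ? a - b : 0);
}

/* Signed saturation: clip to the range -128 (0x80) to 127 (0x7F). */
static int8_t add_s8_sat(int8_t a, int8_t b)
{
    int sum = a + b;
    if (sum > 127)  sum = 127;
    if (sum < -128) sum = -128;
    return (int8_t)sum;
}
```

Adding 100 to a pixel value of 200 gives 44 in wraparound mode, which turns a bright pixel dark, but 255 in unsigned saturation mode, which leaves it at full intensity.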

2.5 INSTRUCTION GROUP OVERVIEW

This section provides an overview of the MMX instruction groups. See Chapter 5 for detailed information on the instructions, including information on encoding, operation, and exceptions. The fifty-seven new MMX instructions are grouped into these categories:

Arithmetic Instructions
Comparison Instructions
Conversion Instructions
Logical Instructions
Shift Instructions
Data Transfer Instructions
Empty MMX State (EMMS) Instruction

2.5.2 Arithmetic Instructions

Packed Addition and Subtraction

The PADD (Packed Add) and PSUB (Packed Subtract) instructions add or subtract the signed or unsigned data elements of the source operand to or from the destination operand in wraparound mode. These instructions support packed byte, packed word, and packed doubleword data types.

The PADDS (Packed Add with Saturation) and PSUBS (Packed Subtract with Saturation) instructions add or subtract the signed data elements of the source operand to or from the signed data elements of the destination operand and saturate the result to the limits of the signed data-type range. These instructions support packed byte and packed word data types.

The PADDUS (Packed Add Unsigned with Saturation) and PSUBUS (Packed Subtract Unsigned with Saturation) instructions add or subtract the unsigned data elements of the source operand to or from the unsigned data elements of the destination operand and saturate the result to the limits of the unsigned data-type range. These instructions support packed byte and packed word data types.
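As a sketch of how signed saturation applies across a whole packed operand, the following C function emulates PADDS on packed words. It is an illustrative scalar model, and it assumes a two's-complement target when converting a 16-bit pattern to int16_t.

```c
#include <stdint.h>

/* Emulates PADDS on packed words: four signed 16-bit additions,
 * each clipped to the signed word range of Table 2-1. */
static uint64_t paddsw(uint64_t dest, uint64_t src)
{
    uint64_t result = 0;
    for (int i = 0; i < 4; i++) {
        /* Reinterpret each 16-bit element as signed (two's complement assumed). */
        int32_t a = (int16_t)(uint16_t)(dest >> (16 * i));
        int32_t b = (int16_t)(uint16_t)(src >> (16 * i));
        int32_t sum = a + b;
        if (sum > INT16_MAX) sum = INT16_MAX;   /* 7FFFH */
        if (sum < INT16_MIN) sum = INT16_MIN;   /* 8000H */
        result |= (uint64_t)(uint16_t)sum << (16 * i);
    }
    return result;
}
```

For example, with dest = 0x7FFF80000001FFFF and src = 0x0001FFFF00010001, the two upper words saturate to 7FFFH and 8000H, while the two lower words produce the ordinary sums 2 and 0.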

Packed Multiplication

Packed multiplication instructions perform four multiplications on pairs of signed 16-bit operands, producing 32-bit intermediate results. Users may choose the low-order or high-order parts of each 32-bit result.

The PMULHW (Packed Multiply High) and PMULLW (Packed Multiply Low) instructions multiply the signed words of the source and destination operands and write the high-order or low-order 16 bits of each of the results to the destination operand.
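The relationship between the two instructions can be sketched as follows. This C emulation is illustrative only; the helper word_product is a hypothetical name, and a two's-complement target is assumed for the signed conversions.

```c
#include <stdint.h>

/* Hypothetical helper: signed 32-bit product of word element i. */
static int32_t word_product(uint64_t dest, uint64_t src, int i)
{
    int16_t a = (int16_t)(uint16_t)(dest >> (16 * i));
    int16_t b = (int16_t)(uint16_t)(src >> (16 * i));
    return (int32_t)a * (int32_t)b;
}

/* Emulates PMULLW: keep the low-order 16 bits of each product. */
static uint64_t pmullw(uint64_t dest, uint64_t src)
{
    uint64_t result = 0;
    for (int i = 0; i < 4; i++) {
        uint32_t p = (uint32_t)word_product(dest, src, i);
        result |= (uint64_t)(uint16_t)p << (16 * i);
    }
    return result;
}

/* Emulates PMULHW: keep the high-order 16 bits of each product. */
static uint64_t pmulhw(uint64_t dest, uint64_t src)
{
    uint64_t result = 0;
    for (int i = 0; i < 4; i++) {
        uint32_t p = (uint32_t)word_product(dest, src, i);
        result |= (uint64_t)(uint16_t)(p >> 16) << (16 * i);
    }
    return result;
}
```

Running both functions on the same operands recovers the full 32-bit products, at the cost of two multiply instructions.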

Packed Multiply Add

The PMADDWD (Packed Multiply and Add) instruction calculates the products of the signed words of the source and destination operands. The four intermediate 32-bit doubleword products are summed in pairs to produce two 32-bit doubleword results.
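The pairing of products is easiest to see in code. The following C function is an illustrative emulation of PMADDWD, assuming a two's-complement target; the pair sums are formed in unsigned arithmetic so that the one input combination that overflows wraps instead of being undefined in C.

```c
#include <stdint.h>

/* Emulates PMADDWD: multiply four pairs of signed words, then add
 * products 0 and 1 into doubleword 0 and products 2 and 3 into
 * doubleword 1. */
static uint64_t pmaddwd(uint64_t dest, uint64_t src)
{
    uint32_t product[4];
    for (int i = 0; i < 4; i++) {
        int16_t a = (int16_t)(uint16_t)(dest >> (16 * i));
        int16_t b = (int16_t)(uint16_t)(src >> (16 * i));
        product[i] = (uint32_t)((int32_t)a * (int32_t)b);
    }
    uint32_t low  = product[0] + product[1];
    uint32_t high = product[2] + product[3];
    return ((uint64_t)high << 32) | low;
}
```

With words (1, 2, 3, 4) in the destination and (10, 20, -1, 5) in the source, listed from least to most significant, the result holds 1*10 + 2*20 = 50 in the low doubleword and 3*(-1) + 4*5 = 17 in the high doubleword. This is the building block of a dot product.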

2.5.3 Comparison Instructions

The PCMPEQ (Packed Compare for Equal) and PCMPGT (Packed Compare for Greater Than) instructions compare the corresponding data elements in the source and destination operands for equality or value greater than, respectively. For each element, these instructions generate a mask of all ones or all zeros, which is written to the destination operand. Logical operations can use the mask to select elements. This can be used to implement a packed conditional move operation without a branch or a set of branch instructions. No flags are set.

These instructions support packed byte, packed word and packed doubleword data types.
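A branch-free conditional move built from a compare mask can be sketched as follows. The C code emulates PCMPGT on packed words and then combines the mask with the logical operations of Section 2.5.5; packed_max_words is a hypothetical name chosen for this example, and a two's-complement target is assumed.

```c
#include <stdint.h>

/* Emulates PCMPGT on packed words: each element becomes FFFFH if the
 * signed destination element is greater than the source element,
 * otherwise 0000H. */
static uint64_t pcmpgtw(uint64_t dest, uint64_t src)
{
    uint64_t mask = 0;
    for (int i = 0; i < 4; i++) {
        int16_t a = (int16_t)(uint16_t)(dest >> (16 * i));
        int16_t b = (int16_t)(uint16_t)(src >> (16 * i));
        if (a > b)
            mask |= UINT64_C(0xFFFF) << (16 * i);
    }
    return mask;
}

/* Element-wise signed maximum without a branch per element:
 * a & mask corresponds to PAND, ~mask & b to PANDN, and | to POR. */
static uint64_t packed_max_words(uint64_t a, uint64_t b)
{
    uint64_t mask = pcmpgtw(a, b);
    return (a & mask) | (~mask & b);
}
```

Where the mask element is all ones the result takes the element from a; where it is all zeros the result takes the element from b.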

2.5.4 Conversion Instructions

Pack and Unpack

The Pack and Unpack instructions perform conversions between the packed data types.

The PACKSS (Pack with Signed Saturation) instruction converts signed words into signed bytes or signed doublewords into signed words, in signed saturation mode.

The PACKUS (Pack with Unsigned Saturation) instruction converts signed words into unsigned bytes, in unsigned saturation mode.

The PUNPCKH (Unpack High Packed Data) and PUNPCKL (Unpack Low Packed Data) instructions convert bytes to words, words to doublewords, or doublewords to quadwords by interleaving elements from the high or low half of the destination and source operands.
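The following C sketch emulates one unpack and one pack operation to show how they widen and narrow elements. It is an illustrative scalar model that assumes a two's-complement target. The element ordering reflects the instruction definitions: the unpack interleaves destination and source bytes starting with the destination, and the pack places the converted destination words in the low half of the result and the converted source words in the high half.

```c
#include <stdint.h>

/* Emulates PUNPCKL on bytes: interleave the low four bytes of dest and
 * src. With src = 0 this zero-extends four bytes to four words. */
static uint64_t punpcklbw(uint64_t dest, uint64_t src)
{
    uint64_t result = 0;
    for (int i = 0; i < 4; i++) {
        uint64_t d = (dest >> (8 * i)) & 0xFF;
        uint64_t s = (src >> (8 * i)) & 0xFF;
        result |= d << (16 * i);        /* even byte from dest */
        result |= s << (16 * i + 8);    /* odd byte from src   */
    }
    return result;
}

/* Emulates PACKUS: convert eight signed words (four from dest, then
 * four from src) to unsigned bytes, clipping each to 0..255. */
static uint64_t packuswb(uint64_t dest, uint64_t src)
{
    uint64_t result = 0;
    for (int i = 0; i < 8; i++) {
        uint64_t from = (i < 4) ? dest : src;
        int16_t w = (int16_t)(uint16_t)(from >> (16 * (i % 4)));
        uint8_t b = (uint8_t)(w < 0 ? 0 : (w > 255 ? 255 : w));
        result |= (uint64_t)b << (8 * i);
    }
    return result;
}
```

A common pattern is to unpack bytes to words, do arithmetic with the extra headroom, and then pack the results back to bytes with saturation.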

2.5.5 Logical Instructions

The PAND (Bitwise Logical And), PANDN (Bitwise Logical And Not), POR (Bitwise Logical OR), and PXOR (Bitwise Logical Exclusive OR) instructions perform bitwise logical operations on 64-bit quantities.

2.5.6 Shift Instructions

The logical shift left, logical shift right and arithmetic shift right instructions shift each element by a specified number of bits. The logical left and right shifts also enable a 64-bit quantity (quadword) to be shifted as one block, assisting in data type conversions and alignment operations.

The PSLL (Packed Shift Left Logical) and PSRL (Packed Shift Right Logical) instructions perform a logical left or right shift and fill the vacated low-order or high-order bit positions, respectively, with zeros. These instructions support packed word, packed doubleword, and quadword data types.

The PSRA (Packed Shift Right Arithmetic) instruction performs an arithmetic right shift, copying the sign bit into empty bit positions on the upper end of the operand. This instruction supports packed word and packed doubleword data types.
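The difference between the logical and arithmetic right shifts can be sketched on packed words. This C emulation is illustrative; the handling of shift counts above 15 (all zeros for the logical shift, all sign bits for the arithmetic shift) follows the instruction definitions rather than anything stated in this chapter.

```c
#include <stdint.h>

/* Emulates PSRL on packed words: vacated high-order bits become zero. */
static uint64_t psrlw(uint64_t dest, unsigned count)
{
    if (count > 15)
        return 0;
    uint64_t result = 0;
    for (int i = 0; i < 4; i++) {
        uint16_t w = (uint16_t)(dest >> (16 * i));
        result |= (uint64_t)(uint16_t)(w >> count) << (16 * i);
    }
    return result;
}

/* Emulates PSRA on packed words: vacated high-order bits are filled
 * with a copy of each element's sign bit. */
static uint64_t psraw(uint64_t dest, unsigned count)
{
    if (count > 15)
        count = 15;
    uint64_t result = 0;
    for (int i = 0; i < 4; i++) {
        uint16_t w = (uint16_t)(dest >> (16 * i));
        uint16_t shifted = (uint16_t)(w >> count);
        if (w & 0x8000)
            shifted |= (uint16_t)~(0xFFFFu >> count);  /* sign fill */
        result |= (uint64_t)shifted << (16 * i);
    }
    return result;
}
```

Shifting 0x80007FFFFFF00010 right by 4 gives 0x080007FF0FFF0001 with the logical shift and 0xF80007FFFFFF0001 with the arithmetic shift; only the elements with the sign bit set differ.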

2.5.7 Data Transfer Instructions

The MOVD (Move 32 Bits) instruction transfers 32 bits of packed data from memory to MMX registers and vice versa, or from integer registers to MMX registers and vice versa.

The MOVQ (Move 64 Bits) instruction transfers 64 bits of packed data from memory to MMX registers and vice versa, or transfers data between MMX registers.

2.5.8 EMMS (Empty MMX™ State) Instruction

The EMMS instruction empties the MMX state. This instruction must be used to clear the IA MMX state (empty the floating-point tag word) at the end of an MMX routine before calling other routines that can execute floating-point instructions.

2.6 INSTRUCTION OPERAND

All MMX instructions, except the EMMS instruction, reference and operate on two operands: the source and destination operands. The right operand is the source and the left operand is the destination. The destination operand may also be a second source operand for the operation. The instruction overwrites the destination operand with the result.

For example, a two-operand instruction would be decoded as:

DEST (left operand) ← DEST (left operand) OP SRC (right operand)

The source operand for all the MMX instructions (except the data transfer instructions) can reside either in memory or in an MMX register. The destination operand resides in an MMX register.

For data transfer instructions, the source and destination operands can also be an integer register (for the MOVD instruction) or memory location (for both the MOVD and MOVQ instructions).

2.7 COMPATIBILITY

The IA MMX state is aliased upon the IA floating-point state. No new state or mode is added to support the MMX technology. The same floating-point instructions that save and restore the floating-point state also handle the IA MMX state (for example, during context switching).

MMX technology uses the same interface techniques between the floating-point architecture and the operating system (primarily for task switching purposes). For more detail, see Section 4.1.

© 1997 Intel Corporation