

## **AP-258**

## APPLICATION NOTE

# High Speed Numerics with the 80186/80188 and 8087

STEVE FARRER
APPLICATIONS ENGINEER

February 1986

Order Number: 231590-001

Information in this document is provided in connection with Intel products. Intel assumes no liability whatsoever, including infringement of any patent or copyright, for sale and use of Intel products except as provided in Intel's Terms and Conditions of Sale for such products.

Intel retains the right to make changes to these specifications at any time, without notice. Microcomputer Products may have minor variations to this specification known as errata.

\*Other brands and names are the property of their respective owners.

†Since publication of documents referenced in this document, registration of the Pentium, OverDrive and iCOMP trademarks has been issued to Intel Corporation.

Contact your local Intel sales office or your distributor to obtain the latest specifications before placing your product order.

Copies of documents which have an ordering number and are referenced in this document, or other Intel literature, may be obtained from:

Intel Corporation P.O. Box 7641 Mt. Prospect, IL 60056-7641 or call 1-800-879-4683

COPYRIGHT © INTEL CORPORATION, 1996

## HIGH SPEED NUMERICS WITH THE 80186/80188 AND 8087

| CONTENTS                                                | PAGE |
|---------------------------------------------------------|------|
| 1.0 INTRODUCTION                                        | 1    |
| 2.0 OVERVIEW OF THE 80186/80188                         | 1    |
| 3.0 NUMERICS OVERVIEW                                   | 2    |
| 3.1 The Benefits of Numeric Coprocessing                | 2    |
| 3.2 Introduction to the 8087                            |      |
| 3.3 Escape Instructions                                 | 2    |
| 3.4 Host Response to Escape Instructions                | 3    |
| 3.5 Coprocessor Response to Escape Instructions         |      |
| 4.0 OVERVIEW OF THE 82188<br>INTEGRATED BUS CONTROLLER  | 4    |
| 4.1 Introduction                                        | 4    |
| 4.2 Bus Control Signals                                 | 4    |
| 4.3 Bus Arbitration                                     | 4    |
| 4.4 Interface Logic                                     | 4    |
| 5.0 DESIGNING THE SYSTEM                                | 4    |
| 5.1 Circuit Schematics of the 80186/8-82188-8087 System | 4    |
| 5.2 Queue Status Interface                              | 5    |
| 5.3 Control Signals                                     | 6    |
| 5.3.1 ALE                                               | 6    |
| 5.3.2 Read & Write                                      |      |
| 5.3.3 <u>DEN</u>                                        |      |
| 5.3.4 DT/R                                              |      |
| 5.4 Chip Selects                                        |      |
| 5.4.1 Introduction                                      |      |
| 5.4.2 CSI and CSO of the 82188                          |      |
| 5.4.3 System Design Example                             |      |
| 5.5 Wait State & Ready Logic                            |      |
| 5.5.1 Internal Wait States with Instruction Fetches     | 10   |
| 5.5.2 Internal Wait States with Data & I/C              | 10   |
| 5.5.3 Automatic Wait States at Reset                    |      |
| 5.5.4 External Ready Synchronization .                  |      |
| 5.6 Bus Arbitration                                     |      |
| 5.7 Speed Requirements                                  | 11   |

| CONTENTS                           | PAGE   | CONTENTS                    | PAGE    |
|------------------------------------|--------|-----------------------------|---------|
| 6.0 BENCHMARKS                     | 12     | 6.4 Whetstone Benchmark Rou | tine 13 |
| 6.1 Introduction                   |        |                             |         |
| 6.2 Interest Rate Calculations     | 12     | 7.0 CONCLUSION              | 15      |
| 6.3 Matrix Multiply Benchmark Rout | ine 13 | 7.0 CONCLUSION              |         |



#### 1.0 INTRODUCTION

From their introduction in 1982, the highly integrated 16-bit 80186 and its 8-bit external bus version, the 80188, have been ideal processor choices for high-performance, low-cost embedded control applications. The integrated peripheral functions and enhanced 8086 CPU of the 80186 and 80188 allow for an easy upgrade of older generation control applications to achieve higher performance while lowering the overall system cost through reduced board space, and a simplified production flow.

More and more controller applications need even higher performance in numerics, yet still require the low-cost and small form factor of the 80186 and 80188. The 8087 Numerics Data Coprocessor satisfies this need as an optional add-on component.

The 8087 Numeric Data Coprocessor is interfaced to the 80186 and 80188 through the 82188 IBC (Integrated Bus Controller). The IBC provides a highly integrated interface solution which replaces the 8288 used in 8086–8087 systems. The IBC incorporates all the necessary bus control for the 8087 while also providing the necessary logic to support the interface between the 80186/8 and the 8087.

This application note discusses the design considerations associated with using the 8087 Numeric Data Coprocessor with the 80186 and 80188. Sections two,

three, and four contain an overview of the integrated circuits involved in the numerics configuration. Section five discusses the interfacing aspects between the 80186/8 and the 8087, including the role of the 82188 Integrated Bus Controller and the operation of the integrated peripherals on the 80186/8 with the 8087. Section six compares the advantages of using an 8087 Numeric Data Coprocessor over software routines written for the host processor as well as the advantage of using an 80186/8 numerics system.

Except where noted, all future references to the 80186 will apply equally to the 80188.

#### 2.0 OVERVIEW OF THE 80186

The 80186 and 80188 are highly integrated microprocessors which effectively combine up to 20 of the most common system components onto a single chip. The 80186 and 80188 processors are designed to provide both higher performance and a more highly integated solution to the total system.

Higher integration results from integrating system peripherals onto the microprocessor. The peripherals consist of a clock generator, an interrupt controller, a DMA controller, a counter/timer unit, a programmable wait state generator, programmable chip selects, and a bus controller. (See Figure 1.)



Figure 1. 80186/8 Block Diagram



Higher performance results from enhancements to both general and specific areas of the 8086 CPU, including faster effective address calculation, improvement in the execution speed of many instructions, and the inclusion of new instructions which are designed to produce optimum 80186 code.

The 80186 and 80188 are completely object code compatible with the 8086 and 8088. They have the same basic register set, memory organization, and addressing modes. The differences between the 80186 and 80188 are the same as the differences between the 8086 and 8088: the 80186 has a 16-bit architecture and 16-bit bus interface; the 80188 has a 16-bit internal architecture and an 8-bit data bus interface. The instruction execution times of the two processors differ accordingly: for each non-immediate 16-bit data read/write instruction, 4 additional clock cycles are required by the 80188.

#### 3.0 NUMERICS OVERVIEW

## 3.1 The Benefits of Numeric Coprocessing

The 8086/8 and 80186/8 are general purpose microprocessors, designed for a very wide range of applications. Typically, these applications need fast, efficient data movement and general purpose control instructions. Arithmetic on data values tends to be simple in these applications. The 8086/8 and 80186/8 fulfill these needs in a low cost, effective manner.

However, some applications require extremely fast and complex math functions which are not provided by a general purpose processor. Such functions as square root, sine, cosine, and logarithms are not directly available in a general purpose processor. Software routines required to implement these functions tend to be slow and not very accurate. Integer data types and their arithmetic operations (i.e., add, subtract, multiply and divide) which are directly available on general purpose processors, still may not meet the needs for accuracy, speed and ease of use.

Providing fast, accurate, complex math can be quite complicated, requiring large areas of silicon on integrated circuits. A general data processor does not provide these features due to the extra cost burden that less complex general applications must take on. For such features, a special numeric data processor is required — one which is easy to use and has a high level of support in hardware and software.

#### 3.2 Introduction to the 8087

The 8087 is a numeric data coprocessor which is capable of performing complex mathematical functions while the host processor (i.e. the main CPU) performs

more general tasks. It supports the necessary data types and operations and allows use of all the current hardware and software support for the 8086/8 and 80186/8 microprocessors. The fact that the 8087 is a coprocessor means it is capable of operating in parallel with the host CPU, which greatly improves the processing power of the system.

The 8087 can increase the performance of floatingpoint calculations by 50 to 100 times, providing the performance and precision required for small business and graphics applications as well as scientific data processing.

The 8087 numeric coprocessor adds 68 floating-point instructions and eight 80-bit floating-point registers to the basic 8086 programming architecture. All the numeric instructions and data types of the 8087 are used by the programmer in the same manner as the general data types and instructions of the host.

The numeric data formats and arithmetic operations provided by the 8087 support the proposed IEEE Microprocessor Floating Point Standard. All of the proposed IEEE floating point standard algorithms, exception detection, exception handling, infinity arithmetic and rounding controls are implemented. The IEEE standard makes it easier to use floating point and helps to avoid common problems that are inherent to floating point.

#### 3.3 Escape Instructions

The coprocessing capabilities of the 8087 are achieved by monitoring the local bus of the host processor. Certain instructions within the 8086 assembly language known as ESCAPE instructions are defined to be coprocessor instructions and, as such, are treated differently.

The coprocessor monitors program execution of the host processor to detect the occurrence of an ESCAPE instruction. The fetching of instructions is monitored via the data bus and bus cycle status S2–S0, while the execution of instructions is monitored via the queue status lines QS0 and QS1.

All ESCAPE instructions start with the high-order 5-bits of the instruction opcode being 11011. They have two basic forms, the memory reference form and the non-memory reference form. The non-memory form, shown in Figure 2A, initiates some activity in the co-processor using the nine available bits of the ESCAPE instruction to indicate which function to perform.

Memory reference forms of the ESCAPE instruction, shown in Figure 2B, allow the host to point out a memory operand to the coprocessor using any host memory





Figure 2A. Non-Memory Reference ESCAPE Instructions

addressing mode. Six bits are available in the memory reference form to identify what to do with the memory operand.

Memory reference forms of ESCAPE instructions are identified by bits 7 and 6 of the byte following the ESCAPE opcode. These two bits are the MOD field of the 8086/8 or 80186/8 effective address calculation byte. Together with the R/M field (bits 2 through 0), they determine the addressing mode and how many subsequent bytes remain in the instruction.

## 3.4 Host Response to Escape Instructions

The host performs one of two possible actions when encountering an ESCAPE instruction: do nothing (operation is internal to 8087) or calculate an effective address and read a word value beginning at that address (required for all LOADS and STORES). The host ignores the value of the word read and hence the cycle is referred to as a "Dummy Read Cycle." ESCAPE instructions do not change any registers in the host other than advancing the IP. If there is no coprocessor or the coprocessor ignores the ESCAPE instruction, the ESCAPE instruction is effectively a NOP to the host. Other than calculating a memory address and reading a word of memory, the host makes no other assumptions regarding coprocessor activity.

The memory reference ESCAPE instructions have two purposes: to identify a memory operand and, for certain instructions, to transfer a word from memory to the coprocessor.

## 3.5 Coprocessor Response to Escape Instructions

The 8087 performs basically three types of functions when encountering an ESCAPE instruction: LOAD (read from memory), STORE (write to memory), and EXECUTE (perform one of the internal 8087 math functions).

When the host executes a memory reference ESCAPE instruction intended to cause a read operation by the 8087, the host always reads the low-order word of any 8087 memory operand. The 8087 will save the address and data read. To read any subsequent words of the operand, the 8087 must become a local bus master.

When the 8087 has the local bus, it increments the 20bit physical address it saved to address the remaining words of the operand.

When the ESCAPE instruction is intended to cause a write operation by the 8087, the 8087 will save the address but ignore the data read. Eventually, it will get control of the local bus and perform successive writes incrementing the 20-bit address after each word until the entire numeric variable has been written.

ESCAPE instructions intended to cause the execution of a coprocessor calculation do not require any bus activity. Numeric calculations work off of an internal register stack which has been initialized using a LOAD operation. The calculation takes place using one or two of the stack positions specified by the ESCAPE instruction. The result of the operation is also placed in one of the stack positions specified by the ESCAPE instruction. The result may then be returned to memory using a STORE instruction, thus allowing the host processor to access it.



Figure 2B. Memory Reference ESCAPE Instruction Forms



## 4.0 OVERVIEW OF THE 82188 INTEGRATED BUS CONTROLLER

#### 4.1 Introduction

The 82188 Integrated Bus Controller (IBC) is a highly integrated version of the 8288 Bus Controller. The IBC provides command and control timing signals for bus control and all of the necessary logic to interface the 80186 to the 8087.

#### 4.2 Bus Control Signals

The bus command and control signals consist of  $\overline{RD}$ ,  $\overline{WR}$ ,  $\overline{DEN}$ ,  $\overline{DT/R}$ , and ALE. The timings and levels are driven following the latching of valid signals on the status lines S0–S2. When S0–S2 change state from passive to active, the IBC begins cycling through a state machine which drives the corresponding control and command lines for the bus cycle. As with the 8288, an address enable input  $(\overline{AEN})$  is present to allow tri-stat-

ing when other bus masters supply their own bus control signals.

#### 4.3 Bus Arbitration

The IBC also has the ability to convert bus arbitration protocols of  $\overline{RQ/GT}$  to HOLD-HLDA. This allows the 82586 Local Area Network (LAN) Coprocessor, the 82730 Text Coprocessor, and other coprocessors using the HOLD-HLDA protocol to be interfaced to the 8086/8 as well as allowing the 80186/8 to be interfaced to the 8087. In addition to converting arbitration protocols, the IBC makes it possible to arbitrate between two bus masters using HOLD-HLDA with a third using  $\overline{RQ/GT}$ .

#### 4.4 Interface Logic

In addition to all the bus control and arbitration features, the IBC provides logic to connect the queue status to the 8087, a chip-select for the 8087, and the necessary READY synchronization required between the 8087 and the 80186/8.

#### 5.0 DESIGNING THE SYSTEM

#### 5.1 Circuit Schematics of the 80186/8-82888-8087 System



Figure 3. 80186/8-82188-8087 Circuit Diagram



#### 5.2 Queue Status

The 8087 tracks the instruction execution of the 80186 by keeping an internal instruction queue which is identical to the processor's instruction queue. Each time the processor performs an instruction fetch, the 8087 latches the instruction into its own queue in parallel with the processor. Each time the processor removes the first byte of an instruction from the queue, the 8087 removes the byte at the top of the 8087 queue and checks to see if the byte is an ESCAPE prefix. If it is, the 8087 decodes the following bytes in parallel with the processor to determine which numeric instruction the bytes represent. If the first byte of the instruction is not an ESCAPE prefix, the 8087 discards it along with the subsequent bytes of the non-numeric instruction as the 80186 removes them from the queue for execution.

The 8087 operates its internal instruction queue by monitoring the two queue status lines from the CPU. This status information is made available by the CPU by placing it into queue status mode. This requires strapping the  $\overline{RD}$  pin on the 80186 to ground. When  $\overline{RD}$  is tied to ground, ALE and  $\overline{WR}$  become QS0 (Queue Status #0) and QS1 (Queue Status #1) respectively.

Table 1. Queue Status Decoding

| QS1 | QS0 | Queue Operation            |  |  |
|-----|-----|----------------------------|--|--|
| 0   | 0   | No queue operation         |  |  |
| 0   | 1   | First byte from queue      |  |  |
| 1   | 0   | Subsequent byte from queue |  |  |
| 1   | 1   | Reserved                   |  |  |

Each time the 80186 begins decoding a new instruction, the queue status lines indicate "first byte of instruction taken from the queue". This signals the 8087 to check for an ESCAPE prefix. As the remaining bytes of the instruction are removed, the queue status indicates "subsequent byte removed from queue". The 8087 uses this status to either continue decoding subsequent bytes, if the first byte was an ESCAPE prefix, or to discard the subsequent bytes if the first byte was not an ESCAPE prefix.

The QS0(ALE) and QS1( $\overline{WR}$ ) pins of the 80186 are fed directly to the 82188 where they are latched and delayed by one-half-clock. The delayed queue status from the 82188 is then presented directly to the 8087.

The waveforms of the queue status signals are shown in Figure 4. The critical timings are the setup time into the 82188 from the 80186 and the setup and hold time into the 8087 from the 82188. The calculations for an 8 MHz system are as follows:





Figure 4. Queue Status Timing



#### 5.3 Bus Control Signals

When the 80186 is in Queue Status mode, another component must generate the ALE,  $\overline{\text{RD}}$ , and  $\overline{\text{WR}}$  signals. The 82188 provides these signals by monitoring the CPU bus cycle status ( $\overline{\text{S0}}-\overline{\text{S2}}$ ). Also provided are  $\overline{\text{DEN}}$  and  $\overline{\text{DT/R}}$  which may be used for extra drive capability on the control bus. With the exception of ALE, all control signals on the 82188 are almost identical to their corresponding 80186 control signals. This section discusses the differences between the 80186 and the 82188 control signals for the purpose of upgrading an 80186 design to an 80186–8087 design. For original 80186–8087 designs, there is no need to compare control signal timings of the 82188 with the 80186.

#### 5.3.1 ALE

The ALE (Address Latch Enable) signal goes active one clock phase earlier on the 80186 than on the 82188. Timing of the ALE signal on the 82188 is closer to that of the 8086 and 8288 bus controller because the bus cycle status is used to generate the ALE pulse. ALE on the 80186 goes active before the bus cycle status lines are valid.

The inactive edge of ALE occurs in the same clock phase for both the 80186 and the 82188. The setup and hold times of the 80186 address relative to the 82188 ALE signal are shown in Figure 5 and are calculated for an 8 MHz system as follows:

#### NOTE:

The hold time calculation is the same for both the 80186 and 8087.

These timings provide adequate setup and hold times for a 74LS373 address latch.



Figure 5. Address Latch Timings

Setup Time For 
$$80186 = T_{AVCH}$$
 (186 min) +  $T_{CHLL}$  (82188 min) =  $10 + 0 = 10$  ns.

For  $8087 = 0.5$  ( $T_{CLCL}$ ) -  $T_{CLAV}$  (8087 max) +  $T_{CHLL}$  (82188 min) =  $0.5$  (125) -  $55 + 0 = 7.5$ 

Hold Time =  $0.5$  ( $T_{CLCL}$ ) -  $T_{CHLL}$  (82188 max) +  $T_{CLAZ}$  (186 min) =  $0.5$  (125) -  $30 + 10 = 42.5$  ns.





Figure 6. Read and Write Timings

#### 5.3.2 Read and Write

The read and write signals of the 82188 have identical timings to those of the 80186 with one exception: the 82188  $\overline{WR}$  inactive edge may not go inactive quite as early as the 80186. This spec is, in fact, a tighter spec than the 80186  $\overline{WR}$  timing and should make designs easier. The timings for  $\overline{RD}$  and  $\overline{WR}$  are shown in Figure 6 for both the 80186 and the 82188.

#### 5.3.3 **DEN**

The  $\overline{\rm DEN}$  signal on the 82188 is identical to the  $\overline{\rm DEN}$  signal on the 80186 but with a tighter timing specification. This makes designs easier with the 82188 and makes upgrades from 80186 bus control to 82188 bus control more straightforward. The timings for  $\overline{\rm DEN}$  on both the 80186 and 82188 are shown in Figure 7.

#### 5.3.4 DT/R

The operation of the DT/ $\overline{R}$  signal varies somewhat between the 80186 and the 82188. The 80186 DT/ $\overline{R}$  signal will remain in an active high state for all write cycles and will default to a high state when the system bus is idle (i.e., no bus activity). The 80186 DT/ $\overline{R}$  goes low only for read cycles and does so only for the duration of the bus cycle. At the end of the read cycle, assuming the following cycle is a non-read, the DT/ $\overline{R}$  signal will default back to a high state. Back-to-back read cycles will result in the DT/ $\overline{R}$  signal remaining low until the end of the last read cycle.

The  $DT/\overline{R}$  signal on the 82188 operates differently by making transitions only at the start of a bus cycle. The 82188  $DT/\overline{R}$  signal has no default state and therefore will remain in whichever state the previous bus cycle required. The 82188  $DT/\overline{R}$  signal will only change states when the current bus cycle requires a state different from the previous bus cycle.



Figure 7. Data Control Timings





Figure 8. Data Transmit & Receive Timings

#### 5.4 Chip Selects

#### **5.4.1 INTRODUCTION**

Chip-select circuitry is typically accomplished by using a discrete decoder to decode two or more of the upper address lines. When a valid address appears on the address bus, the decoder generates a valid chip-select. With this method, any bus master capable of placing an address on the system bus is able to generate a chip-select. An example of this is shown in Figure 9 where an 8086/8087 system uses a common decoder on the address bus. Note the decoder is able to operate regardless of which processor is in control of the bus.



Figure 9. Typical 8086/8087 System

With high integration processors like the 80186 and 80188, the chip-select decoder is integrated onto the processor chip. The integrated chip-selects on the 80186 enable direct processor connection to the chipenable pins on many memory devices, thus eliminating an external decoder. But because the integrated chip-selects decode the 80186's internal bus, an external bus master, such as the 8087, is unable to activate them. The 82188 IBC solves this problem by supplying a chip-select mechanism which may be activated by both the host processor and a second processor.

#### 5.4.2 CSI AND CSO OF THE 82188

The  $\overline{\text{CSI}}$  (chip select in) and  $\overline{\text{CSO}}$  (chip select out) pins of the 82188 provide a way for a second bus master to select memory while also making use of the 80186 integrated chip-selects. The  $\overline{\text{CSI}}$  pin of the 82188 connects directly to one of the 80186's chip-selects while  $\overline{\text{CSO}}$  connects to the memory device designated for the chip-selects range. An example of this is shown in Figure 10.



Figure 10. Typical 80186/82188/8087 System



When the 80186 has control of the bus, the circuit acts just as a buffer and the memory device gets selected as if the circuit had not been there. Whenever  $\overline{\text{CSI}}$  goes active,  $\overline{\text{CSO}}$  goes active. When a second bus master, such as the 8087, takes control of the bus,  $\overline{\text{CSO}}$  goes active and remains active until the 8087 passes control back to the processor. At this time  $\overline{\text{CSO}}$  is deactivated.

A functional block diagram of the  $\overline{\text{CSI}}$ – $\overline{\text{CSO}}$  circuit is shown in Figure 11. A grant pulse on the  $\overline{\text{RQ}}/\overline{\text{GT0}}$  line gives control to the 8087 and also causes the 8087CONTROL signal to go active, which in turn causes  $\overline{\text{CSO}}$  to go active. The 8087CONTROL signal goes inactive when either a release is received on  $\overline{\text{RQ}}/\overline{\text{GT0}}$ , indicating that the 8087 is relinquishing control to the main processor, or a grant is received on the  $\overline{\text{RQ}}/\overline{\text{GT1}}$  line, indicating that the 8087 is relinquishing control to a third processor. Both actions signify that the 8087 is relinquishing the bus. If  $\overline{\text{CSO}}$  goes inactive because a third processor took control of the bus, then  $\overline{\text{CSO}}$  will go active again for the 8087 when a release pulse is transmitted on the  $\overline{\text{RQ}}/\overline{\text{GT1}}$  line to the 8087. This release pulse occurs as a result of SYSHLDA going inactive from the third processor.

#### **5.4.3 SYSTEM DESIGN EXAMPLE**

To provide the 8087 access to data in low memory through an integrated chip-select, the  $\overline{LCS}$  pin should be disconnected from the bank that it is currently selecting and fed directly into the 82188  $\overline{CSI}$ . The  $\overline{CSI}$  output should be connected to the banks which the  $\overline{LCS}$  formerly selected. The  $\overline{LCS}$  will still select the same banks because  $\overline{CSO}$  goes active whenever  $\overline{CSI}$  goes active. But now the 8087, when taking control of the bus, may also select these banks.

Care must be taken in locating the 8087 data area because it must reside in the area in which the chip-select is defined. If the 8087 generates an address outside of the  $\overline{LCS}$  range, the  $\overline{CSO}$  will still go active, but the address will erroneously select a part of the lower bank. Note also that this chip-select limits the size of the 8087 data area to the maximum size memory which can be selected with one chip-select. However, this does not place a limit on instruction code size or non-8087 data size. All 80186 and 8087 instructions are fetched by the processor and therefore do not require that the 8087 be



Figure 11. 82188 Chip Select Circuitry



able to address them. Likewise, non-8087 data is never accessed by the 8087 and therefore does not require an 8087 chip-select.

#### 5.5 Wait State & Ready Logic

The 8087 must accurately track every instruction fetch the 80186 performs so that each op-code may be read from the system bus by the 8087 in parallel with the processor. This means that for instruction code areas, the 80186 cannot use internally generated wait states. All ready logic for these areas must be generated externally and sent into the 82188. The 82188 then presents a synchronous ready out (SRO) signal to both the 80186 and the 8087.

## 5.5.1 INTERNAL WAIT STATES WITH INSTRUCTION FETCHES

If internal wait states are used by the processor with the 8087 at zero wait states, then the 8087 will latch opcodes using a four clock bus cycle while the processor is using between five and seven clocks on each bus cycle. If the wait states are truly necessary to latch valid data from memory, then a four clock bus cycle will force the 8087 to latch invalid data. The invalid data may then be possibly interpreted to be an ESCAPE prefix when, in reality, it is not. The reverse may also hold true in that the 8087 may not recognize an ESCAPE prefix when it is fetched. These conditions could cause a system to hang (i.e., cease to operate), or operate with erroneous results.

If the memory is fast enough to allow latching of valid data within a four clock bus cycle, then the 80186 internal wait states will not cause the system to hang. Both processors will receive valid data during their respective bus cycles. The 8087 will finish its bus cycle earlier than the processor, but this is of no consequence to system operation. The 8087 will synchronize with the processor using the status lines S0–S2 at the start of the next instruction fetch.

### 5.5.2 INTERNAL WAIT STATES WITH DATA & I/O CYCLES

With the exception of "Dummy Read Cycles" and instruction fetches, all memory and I/O bus cycles executed by the host processor are ignored by the 8087. Coprocessor synchronization is not required for untracked bus cycles and, therefore, internally generated wait states do not affect system operation. All of the I/O space and any part of memory used strictly for data may use the internal wait state generator on the 80186.

Memory used for 8087 data is somewhat different. Here, as in the case of code segment areas, the system must rely on an external ready signal or else the memory must be fast enough to support zero wait state operation. Without an external ready signal, the 8087 will always perform a four clock bus cycle which, when used with slow memories, results in the latching of invalid data.

Internal wait states will not affect system operation for data cycles performed by the 8087. In this case the 8087 has control of the bus and the two processors operate independently.

One type of data cycle has not yet been considered. Each time a numerics variable is accessed, the host processor runs a "Dummy Read Cycle" in order to calculate the operand address for the 8087. The 8087 latches the address and then takes control of the bus to fetch any subsequent bytes which are necessary. If the 8087 variables are located at even addresses, then an internally generated wait state will not present any problems to the system. If any numeric variables are located at odd addresses, then the interface between the host and coprocessor becomes asynchronous causing erroneous results.

The erroneous results are due to the 80186 running two back-to-back bus cycles with wait states while the 8087 runs two back-to-back bus cycles without wait states. The start of the second bus cycle is completely uncoordinated between the two processors and the 8087 is unable to latch the correct address for subsequent transfers. For this reason, 8087 variables in a 80186 system must always lie on even boundaries when using the internal wait state generator to access them.

Numeric variables in an 80188 system must never be in a section of memory which uses the internal wait state generator. The 80188 will always perform consecutive bus cycles which would be equivalent to the 80186 performing an odd addressed "Dummy Read Cycle."

#### 5.5.3 AUTOMATIC WAIT STATES AT RESET

The 80186 automatically inserts three wait states to the predefined upper memory chip select range upon power up and reset. This enables designers to use slow memories for system boot ROM if so desired. If slow ROM's are chosen, then no further programming is necessary. If fast ROM's are chosen, then the wait state logic may simply be reprogrammed to the appropriate number of wait states.

The automatic wait states have the possibility of presenting the same problem as described in section 5.5.1 if



the boot ROM needs one or more wait states. Under these conditions the 8087 would be forced to latch invalid opcodes and possibly mistake one for an ESCAPE instruction.

If the boot ROM requires wait states, then some sort of external ready logic is necessary. This allows both processors to run with the same number of wait states and insures that they always receive valid data.

If the boot ROM does not require wait states, then there is no need to design external ready logic for the upper chip select region. But if 8087 code is present in the upper memory chip select region, the situation described in section 3.4 regarding "Dummy Read Cycles" must be considered.

The 82188 solves this problem by inserting three wait states on the SRO line to the 8087 for the first 256 bus cycles. By doing this the 82188 inserts the same number of wait states to both processors keeping them synchronized. The initialization code for the 80186 must program the upper memory chip select to look at external ready and to insert zero wait states within these first 256 bus cycles. At the end of the 256 bus cycles, the 82188 stops inserting wait states and both processors run at zero wait states.

#### 5.5.4 EXTERNAL READY SYNCHRONIZATION

The 80186 and 8087 sample READY on different clock edges. This implies that some sort of external synchronization is required to insure that both processors sample the same ready state. Without the synchronization, it would be possible for the external signal to change state between samples. The 80186 may sample ready high while the 8087 samples ready low. This would lead to the two processors running different length bus cycles and possibly cause the system to hang.

The 82188 provides ready synchronization through the ARDY and SRDY inputs. Once a valid transition is recorded, the 82188 presents the results on the SRO output and holds the output in that state until both processors have had a chance to sample the signal.

#### **5.6 BUS ARBITRATION**

In order for the 8087 to read and write numeric data to and from memory, it must have a means of taking control of the local bus. With the 8086/88 this is accomplished through a request-grant exchange protocol. The 80186, however, makes use of HOLD/HOLD AC-

KNOWLEDGE protocol to exchange control of the bus with another processor. The 82188 supplies the necessary conversion to interface  $\overline{RQ}/\overline{GT}$  to HOLD/HLDA signals. The  $\overline{RQ}/\overline{GT}$  signal of the 8087 connects directly to the 82188's  $\overline{RQ}/\overline{GT}$ 0 input while the 82188's HOLD and HLDA pins connect to the 80186's HOLD and HLDA pins.

When the 8087 requires control of the bus, the 8087 sends a request on the  $\overline{RQ}/\overline{GT}0$  line to the 82188. The 82188 responds by sending a HOLD request to the 80186. When HLDA is received back from the 80186, the 82188 sends a grant back to the 8087 on the same  $\overline{RQ}/\overline{GT}0$  line.

The 82188 also has provisions for adding a third bus-master to the system which uses HOLD/HLDA protocol. This is accomplished by using the 82188 SYSHOLD, SYSHLDA, and  $\overline{RQ}/\overline{GT}1$  signals. The third processor requests the bus by pulling the SYSHOLD line high. The 82188 will route (and translate if necessary) the requests to the current bus master. If the 8087 has control, the 82188 will request control via the  $\overline{RQ}/\overline{GT}1$  line which should be connected to the 8087's  $\overline{RQ}/\overline{GT}1$  line.

The 8087 will relinquish control by getting off the bus and sending a grant pulse on the  $\overline{RQ/GT1}$  line. The 82188 responds by sending a SYSHLDA to the third processor. The third processor lowers SYSHOLD when it has finished on the bus. The 82188 routes this in the form of a release pulse on the  $\overline{RQ/GT1}$  line to the 8087. The 8087 then continues bus activity where it left off. The maximum latency from SYSHOLD to SYSHLDA is equal to the 80186 latency + 8087 latency + 82188 latency.

#### 5.7 SPEED REQUIREMENTS

One of the most important timing specs associated with the 80186-8087 interface is the speed at which the system should run. The 8087 was designed to operate with a 33% duty cycle clock whereas the 80186 and 80188 were designed to operate with a 50% duty cycle clock. In order to run both parts off the same clock, the 8087 must run at a slower speed than is typically implied by its dash number in the 8086/88 family.



To determine the speed at which an 8087 may run (with a 50% duty cycle clock), the minimum low and high times of the 8087 must be examined. The maximum of these two minimum spees becomes the half-period of the 50% duty cycle system clock. For example, the 8087-1 provides worst case spec compatibility with the 80186 at system clock-speeds of up to 8 MHz. The clock waveforms are shown in Figure 12 using 10 MHz timings.

The minimum clock low time spec ( $T_{\rm CLCH}$ ) of the 10 MHz 8087 is 53 ns. The clock low time of an 8 MHz 80186 is specified to be:

$$\frac{1}{2}(T_{CLCL}) - 7.5$$

Solving for  $T_{\rm CLCL}$  of the 80186 using  $T_{\rm CLCH}$  of the 8087 yields the following:

$$1/_{2}(T_{CLCL}) - 7.5 = T_{CLCH}$$
 $(T_{CLCL}) = 2(T_{CLCH} + 7.5)$ 
 $T_{CLCL} = 121 \text{ ns}$ 

The calculation shows minimum cycle time of the 80186 to be 121 ns. This time translates into a maximum frequency of 8.26 MHz.

#### **6.0 BENCHMARKS**

#### 6.1 Introduction

The following benchmarks compare the overall system performance of an 8086, 80188, and an 80186 in numeric applications. Results are shown for all three processors in systems with the 8087 coprocessor and in systems using an 8087 software emulator. Three FORTRAN benchmark programs are used to dem-

onstrate the large increase in floating-point math performance provided by the 8087 and also the increase in performance due to the enhanced 80186 and 80188 host processors.

The 8086 results were measured on an Intellec Series III Microcomputer Development System with an iSBC 86/12 board and an iSBC 337 multimodule. Typically, one wait state for memory read cycles and two wait states for memory write cycles are experienced in this environment.

The 80186 and 80188 results were measured on a prototype board which allowed zero wait state operation at 8 MHz. The benchmarks measured using the 8087 showed little sensitivity to wait states. Instructions executed on the 8087 tend to be long in comparison to the amount of bus activity required and, therefore, are not affected much by wait states.

The benchmarks measured using the software emulator are much more bus intensive and average from 10 to 15 percent performance degradation for one wait state.

All execution times shown here represent 8 MHz operation. The 8086 results were measured at 5 MHz and extrapolated to achieve 8 MHz execution times.

#### 6.2 Interest Rate Calculations

Routines were written in FORTRAN-86 to calculate the final value of a fund given the annual interest and the present value. It is assumed that the interest will be compounded daily, which requires the calculation of the yearly effective rate. This value, which is the equivalent annual interest if the interest were compounded daily, is determined by the following formula:

$$yer = (1 + (ir/np))**np - 1$$



Figure 12. Clock Cycle Timing



where:

yer is the yearly effective rate ir is the annual interest rate np is the number of compounding periods per annum

Once the yer is determined, the final value of the fund is determined by the formula:

$$fv = (1 + yer) * pv$$

where:

pv is the present value fv is the future value

Results are obtained using single-precision, double-precision, and temporary real precision operands when:

ir is set to 10% (0.1) np is set to 365 (for daily compounding) pv is set to \$2,000,000

#### THE RESULTS:

|                              | yer     | Final Value    |
|------------------------------|---------|----------------|
| Single-Precision (32-bit)    | 10.514% | \$2,210,287.50 |
| Double-Precision<br>(64-bit) | 10.516% | \$2,210,311.57 |
| Temporary Real<br>Precision  | 10.516% | \$2,210,311.57 |

The difference between the final single-precision and double-precision values is \$24.07; the difference in the final value between the double-precision and the temporary real precision is 0.000062 cents. Since the 8087 performs all internal calculations on 80-bit floating point numbers (temp real format), temporary real precision operations perform faster than single- or double-precision. No data conversions are required when loading or storing temporary real values in the 8087. Thus, for business applications, the double-precision computing of the 8087 is essential for accurate results, and the performance advantage of using the 8087 turns out to be as much as 100 times the equivalent software emulation program.

## 6.3 Matrix Multiply Benchmark Routine

A routine was written in FORTRAN-86 to compute the product of two matrices using a simple row/column inner-product method. Execution times were obtained for the multiplication of  $32\times32$  matrices using double precision. The results of the benchmark are shown in Figure 14.

The results show the 8087 coprocessor systems performing from 23 to 31 times faster than the equivalent software emulation program. Both the 80188/87 and the 80186/87 systems outperform the 8086/87 system by 34 to 75 percent. This difference is mainly attributed to the fact that the matrix program largely consists of effective address calculations used in array accessing. The hardware effective address calculator of the 80186 and 80188 allow each array access to improve by as much as three times the 8086 effective address calculation.

#### 6.4 Whetstone Benchmark Routine

The Whetstone benchmark program was developed by Harry Curnow for the Central Computer Agency of the British government. This benchmark has received high visibility in the scientific community as a measurement of main frame computer performance. It is a "synthetic" program. That is, it does not solve a real problem, but rather contains a mix of FORTRAN statements which reflect the frequency of such statements as measured in over 900 actual programs. The program computes a performance metric: "thousands of Whetstone instructions per second (KIPS)."

Simple variable and array addressing, fixed- and floating- point arithmetic, subroutine calls and parameter passing, and standard mathematical functions are performed in eleven separate modules or loops of a prescribed number of iterations.

**Table 2. Interest Rate Benchmark Results** 

|                     | 8087 Software Emulator |         |         | 8087 Coprocessor |        |        |  |
|---------------------|------------------------|---------|---------|------------------|--------|--------|--|
|                     | 80188                  | 8086    | 80186   | 80188            | 8086   | 80186  |  |
| Single Precision    | 70.3 ms                | 62.8 ms | 43.4 ms | .70 ms           | .66 ms | .61 ms |  |
| Double Precision    | 72.1 ms                | 62.9 ms | 44.4 ms | .71 ms           | .66 ms | .61 ms |  |
| Temp Real Precision | 72.6 ms                | 63.0 ms | 44.8 ms | .69 ms           | .65 ms | .59 ms |  |
| Average             | 71.7 ms                | 62.9 ms | 44.2 ms | .70 ms           | .66 ms | .60 ms |  |



The original coding of the Whetstone benchmark was written in Algol-60 and used single-precision values. It was rewritten in FORTRAN with single-precision values to exactly reflect the original intent. Another version was created using double-precision values. The results are shown in Table 3.

The results show the 8087 systems with the 80186 and 80188 outperforming the equivalent software emulation by 60 to 83 times. Additionally, the 80186 coupled with the 8087 outperformed the 8086/87 system by 22 percent.



Figure 13. Interest Rate Benchmark Results



Figure 14. Double Precision Matrix Multiplication



| Table 3. | Whetstone | Benchmar | k Results |
|----------|-----------|----------|-----------|
|----------|-----------|----------|-----------|

| Units = KIPS        | 8087 Software Emulator |      |       | 8087 Coprocessor |       |       |  |
|---------------------|------------------------|------|-------|------------------|-------|-------|--|
| Ollits Kii S        | 80188                  | 8086 | 80186 | 80188            | 8086  | 80186 |  |
| Single<br>Precision | 2                      | 2.3  | 3.3   | 165.8            | 178.0 | 197.6 |  |
| Double<br>Precision | 2                      | 2.2  | 3.2   | 151.7            | 152.0 | 185.2 |  |

#### 6.5 Benchmark Conclusions

These benchmarks show that the 8087 Numeric Data Coprocessor, coupled with either the 80186 or the 80188, can increase the performance of a numeric application by 75 to 100 times the equivalent software emulation program.

Applications which require array accessing with effective address calculations will benefit even more by using the 80188 and 80186 as the host processor as compared to the 8086. The results of the matrix multiplication show both the 80188 and 80186 outperforming the 8086 by 34 and 75%, respectively, in an 8087 system. In general, an 80186/8087 system will offer a 10% to a 75% improvement over an equivalent 8086/8087 system, depending on the instruction mix.

#### 7.0 CONCLUSION

For controller applications which require high performance in numerics and low system cost, the 16-bit 80186 or 8-bit 80188 coupled with the 8087 offers an ideal solution. The integrated features of the 80186 and

80188 offer a low system cost through reduced board space and a simplified production flow while the 8087 fulfills the performance requirements of numeric applications.

The 82188 IBC provides a straightforward, highly integrated solution to interfacing the 80188 or 80186 to the 8087. The bus control timings of the 82188 are compatible with the 80186 and 80188, allowing easy upgrades from existing designs. The 82188 features present a highly integrated solution to both new and old designs.

The coprocessing capabilities of the 8087 bring performance improvements of 75 to 100 times the equivalent 80186 or 80188 software emulation program and an 80186/8087 system will offer a 10% to a 75% improvement over an equivalent 8086/8087 system depending on the instruction mix.

In addition a growing base of high-level language support (FORTRAN, Pascal, C, Basic, PL/M, etc.) from Intel and numerous third-party software vendors facilitates the timely and efficient generation of application software.

#### **REFERENCES:**

82188 Data Sheet #231051 80186 Data Sheet #210451 80188 Data Sheet #210706 iAPX 86/88 80186/188 Users Manual Programmers Reference #210911 Hardware Reference #210912 AP-113 "Getting Started with the Numeric Data Processor" #207865



INTEL CORPORATION, 2200 Mission College Blvd., Santa Clara, CA 95052; Tel. (408) 765-8080

INTEL CORPORATION (U.K.) Ltd., Swindon, United Kingdom; Tel. (0793) 696 000

INTEL JAPAN k.k., Ibaraki-ken; Tel. 029747-8511