Big-Endian Programming Using i960(R) Processors

(#2638) Big-Endian Programming Using i960® Processors

The i960® CA and CF microprocessors can access memory in two byte order formats: little-endian and big-endian. This document shows the difference between the two modes and describes how application developers can use the big-endian support of the i960® processor family.

The following scenario describes a problem:

I am designing a high performance upgrade to an existing product. The new design must provide more CPU horsepower than our current design provides. I have some C source code from the current design that I hope to use for the upgrade. Our current design uses a processor that accesses memory in "big-endian" form. I feel locked into the processor family that we are already using. How can I use an 80960 processor for my upgrade?

BIG vs. LITTLE ENDIAN

Addressing Fixed Length Data

Let's examine a hypothetical processor that can address 8-bit words. This conceptual processor has registers that are 8 bits wide; each 8-bit word has a unique individual address. Table 1 shows an example of this processor addressing an 8-bit word at location 5. When the processor accesses this 8-bit quantity, the word's address (5) is placed on the bus, then the 8-bit word (0xab) is transferred between memory and an internal register. If the processor has the ability to address and transfer only 8-bit words, and the software uses only 8-bit words, there is no "endian-ness".

Table 1. Processor Addressing an 8-bit Word at Location 5

This concept of not having endian-ness can be applied to a hypothetical 16-bit processor as well. Most 16-bit processors can address 8-bit data; for this example, consider one that can only address 16-bit data. Every 16-bit word has a unique individual address. When the processor accesses the word at location 5 (see Table 2), the address of this 16-bit word is placed on the address bus. Then the data transfer occurs between memory and an internal register. The value in this case is 0x6655. Address 6 refers to the next word, not to any byte. If the processor has the ability to address and transfer only 16-bit words, and the software uses only 16-bit words, there is no "endian-ness".

Table 2. Processor Addressing a 16-bit Word at Location 5

A processor that transfers 32 bits at once into 32-bit registers, and that can address only 32-bit words, also has no "endian-ness". When the CPU accesses address 5 in Table 3, the access is to a 32-bit word (0x6666aaaa). Again, this processor can only address 32-bit words, so location 6 is the word after word 5. Since the individual bytes (8 bits) within each word are not addressable, the concept of byte order cannot be applied to this processor; it is neither little-endian nor big-endian.

Table 3. Processor Addressing a 32-bit Word at Location 5

Addressing Data of Varying Sizes

Data Larger Than a Word

Referring back to the hypothetical 8-bit processor of Table 1, assume that software can assemble multiple 8-bit words to form larger data types; 16-bit integers for example. Since this processor can transfer only 8-bit words, there are two options in transferring a 16-bit quantity:

1) transfer the most significant 8-bit word of the integer first, then transfer the least significant 8-bit word, or

2) transfer the least significant 8-bit word, then transfer the most significant 8-bit word.

A programmer could access one 16-bit value with the high-order byte first and a different 16-bit value low-order byte first; however, to avoid unnecessary confusion, the programmer should follow one ordering convention when transferring a 16-bit quantity. If a 16-bit quantity is stored into memory high byte first and read out of memory low byte first, the value will be scrambled. Byte order is not important when working only with 8-bit words, but with 16-bit values byte order is very important. Referring to Table 1, a 16-bit access to location 5 has two possible values: 0xabcd (big endian) or 0xcdab (little endian).
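The two transfer orders can be sketched in C. This is only an illustration: the function names are hypothetical, and the byte at location 6 is assumed to be 0xcd, which is what the values 0xabcd and 0xcdab quoted above imply.

```c
#include <stdint.h>

/* Combine two consecutive 8-bit words, treating the word at the lower
   address as the most significant one (the big endian convention). */
uint16_t read16_big(const uint8_t *mem, unsigned addr)
{
    return (uint16_t)(((uint16_t)mem[addr] << 8) | mem[addr + 1]);
}

/* Combine the same two words, treating the word at the lower address
   as the least significant one (the little endian convention). */
uint16_t read16_little(const uint8_t *mem, unsigned addr)
{
    return (uint16_t)(mem[addr] | ((uint16_t)mem[addr + 1] << 8));
}
```

With 0xab at location 5 and 0xcd at location 6, read16_big(mem, 5) yields 0xabcd and read16_little(mem, 5) yields 0xcdab. Writing with one convention and reading with the other produces the scrambled value described above.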

Data Smaller Than a Word

Byte order is important when assembling multiple addressable units into a larger data type as in the above example. In the above case, an 8-bit processor is accessing 16-bit data. A similar byte ordering problem exists with a 16-bit processor accessing 8-bit data.

In Table 2, which shows a 16-bit processor accessing 16-bit words, address 5 refers to the fifth addressable unit. Since an addressable unit in Table 2 is 16 bits wide, the word at address 5 is the fifth 16-bit word. Other complications arise when byte addressing capability is added to this 16-bit processor. Table 4 shows a 16-bit word machine with byte addressing capability. Even though this 16-bit processor can now address bytes, a word is still 16 bits wide.

To access the fifth 16-bit word, address 5 can no longer be used as it was in Table 2. Since each address now refers to a byte, not a 16-bit word, the address of the fifth word is 10: 5 words x 2 bytes per word. A 16-bit word at byte address 10 is the fifth word.

Comparing Table 2 with Table 4, a word access at address 5 on the 16-bit word-addressable processor is the same as a word access at address 10 on the byte-addressable 16-bit processor. This word's value is 0x6655. This processor supports byte addressing; therefore, the fifth word consists of two addressable bytes: address 10 and address 11.

On a big endian processor, the byte value of address 10 is 0x66; on a little endian processor, the byte value of address 10 is 0x55.

Table 4. Byte Address Assignments

Big endian processors store the most significant byte in the lowest byte address; little endian processors store the least significant byte in the lowest byte address.
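The word-to-byte address arithmetic and the two storage rules can be sketched as follows. The helper names are hypothetical, and the byte array merely stands in for the byte-addressable memory of the 16-bit machine discussed above.

```c
#include <stdint.h>

/* Byte address of a 16-bit word given its word index: 2 bytes per word. */
unsigned word_to_byte_addr(unsigned word_index)
{
    return word_index * 2u;
}

/* Big endian rule: most significant byte goes to the lowest address. */
void store16_big(uint8_t *mem, unsigned byte_addr, uint16_t value)
{
    mem[byte_addr]     = (uint8_t)(value >> 8);
    mem[byte_addr + 1] = (uint8_t)(value & 0xffu);
}

/* Little endian rule: least significant byte goes to the lowest address. */
void store16_little(uint8_t *mem, unsigned byte_addr, uint16_t value)
{
    mem[byte_addr]     = (uint8_t)(value & 0xffu);
    mem[byte_addr + 1] = (uint8_t)(value >> 8);
}
```

Storing 0x6655 at word_to_byte_addr(5), which is byte address 10, leaves 0x66 at address 10 with store16_big and 0x55 at address 10 with store16_little.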

In general, endian-ness exists when it is possible to address items of more than one size, such as 8, 16 and 32-bit integers.

INTEL i960® PROCESSOR ADDRESSING FEATURES

All i960® processors share some addressing features:

Internally, each uses 32-bit words.
Bytes are addressable.
Every i960® processor can perform little endian memory accesses.
There are memory instructions to access:
- 8 bit bytes (stob, stib, ldob, ldib)
- 16 bit short words (stos, stis, ldos, ldis)
- 32 bit words (st, ld, atadd, atmod)
- two 32 bit words (stl, ldl)
- three 32 bit words (stt, ldt)
- four 32 bit words (stq, ldq)

In addition to these features, the i960® CA and CF processors can also perform big endian accesses. The bus size is also configurable on these processors as 8, 16 or 32 bits wide. Memory is divided into 16 regions. A table in memory, the control table, contains information about each region, including the region's byte order. It is therefore possible to have one area of memory configured as big endian and another area configured as little endian.
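The idea of a per-region byte order can be modeled in a few lines of C. This is a conceptual sketch only: it assumes the 16 regions are equal in size and selected by the upper four bits of a 32-bit address, and the region_table type is a hypothetical stand-in, not the layout of the processor's control table.

```c
#include <stdint.h>

enum byte_order { ORDER_LITTLE = 0, ORDER_BIG = 1 };

/* Hypothetical stand-in for the per-region byte order information. */
struct region_table {
    enum byte_order order[16];
};

/* Assumption: the region index is the upper four address bits. */
unsigned region_of(uint32_t addr)
{
    return (unsigned)(addr >> 28);
}

/* Byte order that applies to an access at the given address. */
enum byte_order order_of(const struct region_table *table, uint32_t addr)
{
    return table->order[region_of(addr)];
}
```

Under these assumptions, marking region 10 as big endian makes every access in the range 0xA0000000 to 0xAFFFFFFF big endian, while the other regions remain little endian.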

i960® Little Endian Addressing

Little endian refers to the addressing convention in which the least significant portion of a multi-byte value is stored at the lowest address. For the data of Table 5, held in a little endian region:

ld 12, r3 loads 0x12345678 into register r3
ldob 12, r3 loads 0x78 into r3
ldos 12, r3 loads 0x5678 into r3
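The three loads can be emulated in C over a byte array. This is a sketch, not processor code: the function names are hypothetical, and the array is assumed to hold the word 0x12345678 at byte address 12 in little endian order (0x78 at address 12, 0x12 at address 15).

```c
#include <stdint.h>

/* Emulates a byte load: byte order plays no role for a single byte. */
uint32_t load8(const uint8_t *mem, uint32_t addr)
{
    return mem[addr];
}

/* Emulates a 16-bit load from a little endian region. */
uint32_t load16_little(const uint8_t *mem, uint32_t addr)
{
    return (uint32_t)mem[addr] | ((uint32_t)mem[addr + 1] << 8);
}

/* Emulates a 32-bit load from a little endian region. */
uint32_t load32_little(const uint8_t *mem, uint32_t addr)
{
    return (uint32_t)mem[addr]
         | ((uint32_t)mem[addr + 1] << 8)
         | ((uint32_t)mem[addr + 2] << 16)
         | ((uint32_t)mem[addr + 3] << 24);
}
```

With that memory image, load32_little(mem, 12) returns 0x12345678, load8(mem, 12) returns 0x78 and load16_little(mem, 12) returns 0x5678, matching the three instructions above.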

Data is transferred over the bus according to Table 6 for an 8-bit bus, Table 7 for a 16-bit bus and Table 8 for a 32-bit bus.

Note the data lines that transfer the data:
Byte 0 is always transferred on data lines D7-D0
Byte 1 is transferred on data lines D7-D0 for an 8-bit bus; lines D15-D8 for 16 and 32-bit buses
Byte 2 is transferred on data lines D7-D0 for 8 and 16-bit buses; D23-D16 for 32-bit buses
Byte 3 is transferred on data lines D7-D0 for an 8-bit bus; D15-D8 for a 16-bit bus; D31-D24 for a 32-bit bus
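The list above follows a simple pattern: the byte's position within one bus-width transfer selects an 8-line group. A minimal sketch, with hypothetical function names and assuming bus widths of 8, 16 or 32 bits only:

```c
/* Lowest data line (the n in Dn) that carries the byte at byte_addr
   on a bus that is bus_bits wide (8, 16 or 32). */
unsigned low_data_line(unsigned byte_addr, unsigned bus_bits)
{
    unsigned bytes_per_transfer = bus_bits / 8u;      /* 1, 2 or 4 */
    return (byte_addr % bytes_per_transfer) * 8u;
}

/* Highest data line that carries the same byte. */
unsigned high_data_line(unsigned byte_addr, unsigned bus_bits)
{
    return low_data_line(byte_addr, bus_bits) + 7u;
}
```

For example, byte 3 maps to lines 7 down to 0 on an 8-bit bus, 15 down to 8 on a 16-bit bus, and 31 down to 24 on a 32-bit bus.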

Table 5. Little Endian 32-bit Data

Table 6. Byte Addresses on 8-bit Bus

Table 7. Byte Addresses on 16-bit Bus

Table 8. Byte Addresses on 32-bit Bus

i960® Big Endian Addressing

When the i960® processor performs a big endian access, it stores the most significant byte to the lowest address. Table 9 shows eight words stored in big endian memory:

ld 12, r3 loads 0x12345678 into register r3 as in the little endian example above
ldob 12, r3 loads 0x12 into r3, unlike the little endian example, which loaded 0x78
ldos 12, r3 loads 0x1234 into r3
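The big endian loads can be emulated the same way in C. Again this is only a sketch with hypothetical function names; the byte array is assumed to hold the word 0x12345678 at byte address 12 in big endian order (0x12 at address 12, 0x78 at address 15).

```c
#include <stdint.h>

/* Emulates a byte load from a big endian region. */
uint32_t load8_big(const uint8_t *mem, uint32_t addr)
{
    return mem[addr];
}

/* Emulates a 16-bit load from a big endian region. */
uint32_t load16_big(const uint8_t *mem, uint32_t addr)
{
    return ((uint32_t)mem[addr] << 8) | (uint32_t)mem[addr + 1];
}

/* Emulates a 32-bit load from a big endian region. */
uint32_t load32_big(const uint8_t *mem, uint32_t addr)
{
    return ((uint32_t)mem[addr] << 24)
         | ((uint32_t)mem[addr + 1] << 16)
         | ((uint32_t)mem[addr + 2] << 8)
         | (uint32_t)mem[addr + 3];
}
```

Here load32_big(mem, 12) returns 0x12345678, load8_big(mem, 12) returns 0x12 and load16_big(mem, 12) returns 0x1234. The full word reads the same in both modes; only the narrower accesses differ.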

The data lines are the same ones used in the little endian model (see Tables 10, 11 and 12).

Table 9. Big Endian 32-bit Data

Table 10. Byte Addresses on 8-bit Bus

Table 11. Byte Addresses on 16-bit Bus

Table 12. Byte Addresses on 32-bit Bus

How Other Processors Perform a Transfer

Many 32-bit processors and DMA devices that perform big endian transfers use different data lines for the transfers. Byte 0 on such processors uses data lines D31-D24. If any other processors or big endian DMA devices are used in an i960® processor-based system, be sure to match the byte addresses: connect byte address 0 on the "other" system to byte address 0 on the i960® processor. As shown in Figure 1, this causes bits D31-D24 to connect to D7-D0; D23-D16 to D15-D8; D15-D8 to D23-D16; and D7-D0 to D31-D24.
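The effect of this wiring on a 32-bit value, viewed as the state of data lines D31-D0, is a reversal of its four byte lanes. A small C model (the function name is hypothetical; it describes the wiring, it is not code that needs to run on the target):

```c
#include <stdint.h>

/* Model of the crossed connection: the byte on D31-D24 of one device
   appears on D7-D0 of the other, D23-D16 appears on D15-D8, and so on. */
uint32_t cross_byte_lanes(uint32_t lines)
{
    return ((lines & 0x000000ffu) << 24)
         | ((lines & 0x0000ff00u) << 8)
         | ((lines & 0x00ff0000u) >> 8)
         | ((lines & 0xff000000u) >> 24);
}
```

Applying the mapping twice returns the original value, which reflects that the same wiring serves both transfer directions.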

Figure 1. Byte Address Matching Between the i960® Processor and Other 32-bit Device

MOVING AN APPLICATION TO AN i960® PROCESSOR

The previous section explained the differences between little and big endian modes. This section covers the steps required to use another processor's big endian code on an i960® processor.

Starting From C Code

Figure 2 is an example of application code that looks at Ethernet frames and filters out IP frames.

Figure 2. Big Endian C Code Filter

The function frame_check is called after an Ethernet frame is received. The Ethernet length field contains the frame type. If this value is 0x0800, the frame is an IP frame. If the frame is IP, the function frame_check calls address_check, which performs optional address checking.

Address checking may or may not be performed in the address_check routine; the important thing to notice is that a structure is being passed as a parameter to that function. If the address_check function returns a non-zero value, frame_check returns the value 1, which means that the frame passed the filter. If the frame is not an IP frame or if address_check returns zero, frame_check returns zero, which means that the frame did not pass the filter.
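The logic just described can be sketched in C as follows. This is not the original listing of Figure 2: the function signatures, the ENET_TYPE_IP constant and the placeholder body of address_check are assumptions, while the structure and member names (enet_hdr, dst_addr, src_addr, enet_len) follow the names used in this document.

```c
#include <stdint.h>

#define ENET_TYPE_IP 0x0800u    /* value that marks an IP frame */

struct enet_hdr {
    uint8_t  dst_addr[6];       /* destination address */
    uint8_t  src_addr[6];       /* source address */
    uint16_t enet_len;          /* length field, holds the frame type */
};

/* Placeholder for the optional address check (assumption: accepts
   every frame). The structure is passed by value, as in the text. */
int address_check(struct enet_hdr hdr)
{
    (void)hdr;
    return 1;
}

/* Returns 1 if the frame passes the filter, 0 otherwise. */
int frame_check(const struct enet_hdr *hdr)
{
    if (hdr->enet_len != ENET_TYPE_IP)
        return 0;                       /* not an IP frame */
    return address_check(*hdr) ? 1 : 0; /* structure passed by value */
}
```

The comparison of enet_len against 0x0800 gives the intended result only when the 16-bit field is read with the same byte order in which it was written to memory, which is the subject of the following sections.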

Other Big Endian Processors

Assume that an external Ethernet controller chip, programmed in big endian mode, creates the Ethernet structure. Note that Ethernet controller byte 0 connects to the processor's byte 0. Assume that the Ethernet controller transfers the most significant byte over data lines D31-D24. Assuming also that the processor's most significant byte is D31-D24, the data lines are connected as shown in Table 13.

Table 13. Ethernet Controller to Non-Intel Big Endian Processor Data Line Connections

The buffer is filled with data in the form:

Structure Format

The structure enet_hdr is defined as six bytes of destination address, followed by six bytes of source address, followed by a 16-bit integer that is the length field. Table 14 shows the structure member offsets.

Table 14. Big Endian Structure Member Offsets

dst_addr and src_addr are arrays of bytes. enet_len is a 16-bit integer. The high order byte of enet_len is at offset 12; its low order byte is at offset 13. The compiler defines the structure as follows:

The format is the same as the data in the buffer starting at offset 16. To use the structure, the program assigns buf+ to hdr. The reference to hdr -> enet_len allows access to the correct 16-bit value in memory. If the program had a reference to dst_addr[5] it would access the DA6 byte in memory; a reference to src_addr[0] would access SA1.
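The member offsets can be checked on any C compiler with offsetof. The sketch below assumes the structure is declared with fixed-width types from stdint.h; the original declaration may use different type names, and the helper functions are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

struct enet_hdr {
    uint8_t  dst_addr[6];   /* offsets 0..5   */
    uint8_t  src_addr[6];   /* offsets 6..11  */
    uint16_t enet_len;      /* offsets 12..13 */
};

/* Offsets of the members, as the compiler actually laid them out. */
size_t dst_addr_offset(void) { return offsetof(struct enet_hdr, dst_addr); }
size_t src_addr_offset(void) { return offsetof(struct enet_hdr, src_addr); }
size_t enet_len_offset(void) { return offsetof(struct enet_hdr, enet_len); }

/* Address of the byte at a given offset from the start of the header. */
const uint8_t *hdr_byte(const struct enet_hdr *hdr, size_t offset)
{
    return (const uint8_t *)hdr + offset;
}
```

On common ABIs the three helpers return 0, 6 and 12. The offsets themselves do not depend on byte order; what changes between big and little endian compilations is which of the two bytes at offsets 12 and 13 is treated as the high-order byte of enet_len.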

Little Endian Processor

What would happen if the big endian processor, interfaced to a big endian Ethernet controller, is replaced with a little endian processor? If the program described above is compiled for a little endian processor, the structure changes. Table 15 shows the locations used for the structure members.

Table 15. Little Endian Structure Member Offsets

At first glance, this might look correct. The actual byte locations used are the same as the big endian compilation, but the data format does not match the data from the Ethernet controller. A little endian compilation yields the following structure definition:

This looks very different from the buffer structure that the Ethernet controller creates. Notice the different notation for the byte addresses. The big endian structure shows the addresses on the left side of a word; the little endian structure shows the addresses on the right. This is because the tables show how words look in memory. A closer look at word organization reveals an interesting problem. If we look at the connection between the Ethernet controller and the little endian processor we see that the byte addresses are crossed (see Table 16).

Table 16. Ethernet Controller to Little Endian Processor Data Line Connections
(byte addresses crossed; data lines match)

With D31-D0 of the little endian processor connected to D31-D0 of the big endian Ethernet controller, the program does not access the correct data. The reference to hdr -> enet_len accesses the 16-bit value after the real enet_len field in memory. If the program had a reference to dst_addr[5] it would access the SA1 byte in memory; a reference to src_addr[0] would access DA6. It seems all we have to do is to connect byte 0 to byte 0, 1 to 1, 2 to 2, and 3 to 3 and our structure should match. So let us connect the big endian Ethernet controller to the little endian processor with the correct byte addresses (see Table 17).
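The crossed byte addresses that result from matching data lines can be modeled by inverting the two low address bits. This sketch assumes 32-bit words and a header that starts on a word boundary; the function names are hypothetical.

```c
#include <stdint.h>

/* Byte address, in the controller's numbering, that the little endian
   processor actually reaches when it accesses byte_addr and the data
   lines (not the byte addresses) are matched. Within each 32-bit word,
   byte 0 swaps with byte 3 and byte 1 swaps with byte 2. */
uint32_t crossed_addr(uint32_t byte_addr)
{
    return byte_addr ^ 3u;
}

/* Byte the processor sees at an offset into the controller's buffer. */
uint8_t read_crossed(const uint8_t *buf, uint32_t offset)
{
    return buf[crossed_addr(offset)];
}
```

With offsets counted from the start of the header, dst_addr[5] (offset 5) lands on offset 6, which holds SA1, and src_addr[0] (offset 6) lands on offset 5, which holds DA6. The enet_len bytes at offsets 12 and 13 land on offsets 15 and 14, the 16 bits after the real field.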

Table 17. Ethernet Controller to Little Endian Processor Data Line Connections
(Byte Addresses match; Data Lines crossed)

The big endian Ethernet controller transfers data just as it did when connected to a big endian processor. The little endian processor now sees memory as:

If the program had a reference to dst_addr[5] it would access the DA6 byte in memory. That is what we want. A reference to src_addr[0] would access SA1. That is also correct. A reference to hdr -> enet_len yields scrambled data. It is not possible to match the little endian structure with the memory contents even if the byte addresses match.
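The scrambled enet_len can be reproduced with a short C model. The frame bytes are assumed to be 0x08 at offset 12 and 0x00 at offset 13 (an IP frame, type 0x0800), and the function names are hypothetical. The second function reads the field byte by byte; that is a generic software technique shown only for comparison, not the approach this document takes.

```c
#include <stdint.h>

#define ENET_LEN_OFFSET 12u   /* offset of enet_len within the header */

/* What a little endian 16-bit load of enet_len produces: the byte at
   the lower address is taken as the low-order byte. */
uint16_t enet_len_little_load(const uint8_t *hdr_bytes)
{
    return (uint16_t)(hdr_bytes[ENET_LEN_OFFSET]
                    | ((uint16_t)hdr_bytes[ENET_LEN_OFFSET + 1] << 8));
}

/* Byte-wise read in the controller's (big endian) order; gives the same
   result regardless of the byte order of the processor running it. */
uint16_t enet_len_from_wire(const uint8_t *hdr_bytes)
{
    return (uint16_t)(((uint16_t)hdr_bytes[ENET_LEN_OFFSET] << 8)
                    | hdr_bytes[ENET_LEN_OFFSET + 1]);
}
```

For the assumed frame, enet_len_little_load returns 0x0008, so a comparison against 0x0800 fails, while enet_len_from_wire returns 0x0800.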

i960® Processor Big Endian Support

Currently the i960® processor family has two members that support both little endian and big endian accesses: the i960® CA and i960® CF processors. There are at least two Intel compilers that support big endian data: the GNU960 C compiler and the CTOOLS960 compiler. There is a command line switch for each compiler to tell the tools to compile for a little endian or big endian system (-Gbe for CTOOLS960; -G for GNU960). The example program above creates the following structure when using the big endian switch:

This is exactly the same as the previous big endian example of a non-Intel big endian processor. There is a difference in how the actual data lines are connected to the big endian Ethernet controller, however. The i960® CA and CF processors keep the same byte ordering on the bus in both little and big endian modes. Byte 0 is data lines D7-D0, byte 3 is D31-D24; the Ethernet controller has byte 0 as D31-D24 and byte 3 as D7-D0. To connect the two together, follow the convention shown in Table 18.

Table 18. Ethernet Controller to Intel Big Endian Data Line Connections

The reference to hdr -> enet_len accesses the correct 16-bit value in memory. If the program had a reference to dst_addr[5] it would access the DA6 byte in memory; a reference to src_addr[0] would access SA1. The program will access the correct data.

Unaligned Big Endian Accesses

The i960® CA processor's D-step and the i960® CF processor's C-step both support unaligned big endian data. Steppings previous to these did not support unaligned big endian. Aligned big endian data can be accessed using any load or store instructions without any restrictions; however, there is a restriction when using unaligned big endian accesses. Unaligned accesses larger than 32 bits are not supported.

There are very few instances where multi-word unaligned accesses can occur using the GNU960 or CTOOLS960 C compilers. The compilers generate aligned data unless #pragma align or #pragma pack (available on the GNU960 compiler) is used to force a different alignment of structure elements. It is possible, however, for a pointer to a structure to be unaligned. In the example C code in Figure 2, the compiler generates a quad-word load to pass (*hdr) to the address_check function. If hdr were not aligned, this would cause problems when compiled with the Intel CTOOLS960 compiler. The CTOOLS960 compiler requires proper alignment when performing structure assignments and when passing structures by value, as in this example.

When using the GNU960 compiler, the programmer must use the #pragma align directive to ensure that the compiler recognizes the potential for unaligned structures. With the proper #pragma align directive, the compiler properly handles any alignment. Structure assignments and passing structures to functions were not allowed in the Kernighan & Ritchie version of C, but ANSI C does not have this restriction.
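A generic C way to sidestep an unaligned structure pointer is to copy the bytes into a properly aligned local structure before assigning it or passing it by value. The sketch below uses the standard memcpy for this; it is a portable technique offered for comparison, not something the compilers discussed here are documented to require, and the copy_hdr name and the stdint.h types are assumptions.

```c
#include <stdint.h>
#include <string.h>

struct enet_hdr {
    uint8_t  dst_addr[6];
    uint8_t  src_addr[6];
    uint16_t enet_len;
};

/* Copy a header from a possibly unaligned buffer position into an
   aligned structure. memcpy makes no alignment assumption about its
   source, so the later by-value uses operate on aligned data. */
struct enet_hdr copy_hdr(const void *unaligned_src)
{
    struct enet_hdr local;
    memcpy(&local, unaligned_src, sizeof local);
    return local;
}
```

A caller would write something like struct enet_hdr h = copy_hdr(buf + offset); and then pass h to functions that take the structure by value. The copy preserves the bytes exactly as they sit in the buffer, so the byte order considerations described earlier still apply to enet_len.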

This is correct for CTOOLS960 Release 4.0; however, the next release is expected to limit unaligned big endian accesses to one word.

©INTEL CORPORATION, 1995
