Intel Compilers White Paper (C/C++ Site)

Intel C/C++ and FORTRAN Compilers

White Paper
The Intel C/C++ Compiler plug-in and FORTRAN compiler allow software developers to gain superior performance on Intel Architecture processors. The objective of these compilers is to allow software to take advantage of the full potential of Intel Architecture processors. The Intel C/C++ Compiler plug-in and FORTRAN compiler are the first compilers to offer processor-specific optimizations with the introduction of each new Intel processor generation. This allows developers to take immediate advantage of each new processor's improvements. Support of MMX™ technology is a vivid example of the Intel compilers’ evolution in parallel with the latest line of Intel Architecture processors.
Feature Summary
Intel compilers have been developed to be compatible with Microsoft* Visual C++*. To ensure ease of use, the Intel C/C++ Compiler plug-in is usable as a first generation plug-in to the Microsoft Developer Studio*. This capability combines the power of Intel compilers with the features of Microsoft's Integrated Development Environment (IDE).
The following summarizes the features for each of the compilers:
C/C++ plug-in:

Provides MMX™ technology support through the use of C "intrinsics." This allows C developers to take advantage of MMX Technology using C function call syntax rather than having to manually code assembly language statements.
Supports in-line assembly language insertions for C developers who have particular performance demands.
Offers profile-guided optimizations, which allow the compiler to adjust the flow of the program to achieve optimum performance based on previous executions with the same data set.
Provides a "blended" code optimization switch that allows you to generate code with optimal performance for any Intel Architecture processor. Also, for developers with more specific targets, the compilers provide processor-specific optimizations that maximize performance for a specific Intel processor.
Provides maximum floating-point instruction throughput by using the full power of the floating-point stack.

FORTRAN:

Supports dynamic COMMON
Supports thread-safe code generation for multi-threaded applications

System Requirements
The Intel C/C++ and FORTRAN compilers run under the Microsoft Windows* 95 or Windows NT* operating systems.
Language Support
The Intel C/C++ compiler is a plug-in to Microsoft Visual C++* version 4.x, which provides the development and run-time environments plus the MFC* libraries.
Note: There are known compatibility problems with MSVC++ 5.0 (vs. MSVC++ 4.2) which some users may experience. For specifics, please consult the release notes for this release. Intel expects to have all MSVC++ 5.0 issues resolved in a mid-April '97 Beta update.
The following FORTRAN languages and extensions are supported:

Full support for ANSI FORTRAN 77 (X3.9-1978) and ISO 1539:1980
Many extensions popularized by DEC* (VMS*)

Microsoft Visual C/C++ 4.x Compatibility
The Intel C/C++ compiler is compatible with Microsoft Visual C++ 4.x in the following areas:

Compilation switches
Makefile support
In-line assembly language syntax
Object module, library, and DLL formats
Debug and C++ symbol "ic" formats
If you have MSVC++ 4.x on your system when you install the Intel C/C++ compiler, the installation procedure automatically integrates the Intel C/C++ compiler within the tools menu of the Visual C++ IDE. This gives you the choice of using the Intel C/C++ compiler to compile the projects that you create in Visual C/C++. Just click on "Tools," then click on "Select Compiler" and the selection window provided by Intel appears.
Application Support
The Intel C/C++ Compiler plug-in is particularly efficient in support of the applications described in the sections that follow.
Graphics / Multimedia Applications

MMX™ technology adds 57 powerful new assembly instructions to the Intel Architecture instruction set which are designed to efficiently manipulate and process video, audio, and graphical data. The Pentium® and Pentium Pro processors with MMX™ technology include these new instructions to enhance performance of multimedia applications. The Intel C/C++ Compiler plug-in supports these new MMX instructions in C/C++ programs by using special compiler intrinsics that are coded using C function call syntax.
The compiler allows you to use C language variables in place of hardware registers, which frees you from managing these registers. The compiler generates the corresponding MMX instructions and reorders them to maximize performance through the Pentium processor’s dual instruction pipeline. In addition, the compiler also handles the loads and stores of the C variables to and from memory. Here is an example of an Intel C Compiler intrinsic and a description of its function:
__m64 _m_pmaddwd (__m64 m1, __m64 m2)
This intrinsic multiplies the four 16-bit values in m1 by the four 16-bit values in m2 to produce four 32-bit intermediate results, which are then summed by pairs to produce two 32-bit results.
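The arithmetic described for this intrinsic can be modeled in portable C. The following is a minimal sketch, not the intrinsic itself: the function name pmaddwd_model and its array-based interface are hypothetical, chosen only to make the pairwise multiply-and-add visible without MMX hardware or headers.

```c
#include <stdint.h>

/* Hypothetical portable model of the packed multiply-add described above.
 * m1 and m2 each hold four signed 16-bit values; result receives two
 * 32-bit sums. This is an illustration, not the compiler intrinsic. */
void pmaddwd_model(const int16_t m1[4], const int16_t m2[4], int32_t result[2])
{
    int i;
    for (i = 0; i < 2; ++i) {
        /* Widen to 32 bits before multiplying so no product is lost. */
        int32_t first  = (int32_t)m1[2 * i]     * (int32_t)m2[2 * i];
        int32_t second = (int32_t)m1[2 * i + 1] * (int32_t)m2[2 * i + 1];

        /* Add through unsigned arithmetic so the single overflowing case
         * (all four inputs equal to -32768) wraps instead of being
         * undefined behavior in C. */
        result[i] = (int32_t)((uint32_t)first + (uint32_t)second);
    }
}
```

With m1 = {1, 2, 3, 4} and m2 = {5, 6, 7, 8}, the model produces 1*5 + 2*6 = 17 and 3*7 + 4*8 = 53.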
The Intel C/C++ Compiler plug-in also provides a rounding control option, which optimizes floating-point-to-integer conversions. The system default rounding mode is round-to-nearest. Because the C language requires that floating-point-to-integer conversions truncate, the compiler must generate additional instructions to change the rounding mode to truncation before each conversion and then change it back afterwards. With the -Qrcd switch you can eliminate the overhead of the instructions that change the rounding mode back and forth. This option has no effect on floating-point calculations, but conversions to integer will not conform to C semantics. Graphics applications that use floating-point data as input into their rendering operations can benefit from this type of optimization.
Consider the following example:
int a;
float f;

void func()
{
    a = f;
}

The following is the standard code generation that would take place:

fld DWORD PTR _f[0+eax*4]
fnstcw [esp+24]
mov DWORD PTR [esp+20], eax
mov eax, DWORD PTR [esp+24]
or eax, 3072
mov DWORD PTR [esp+16], eax
mov eax, DWORD PTR [esp+20]
fldcw [esp+16]
fistp DWORD PTR _a[0+eax*4]
fldcw [esp+24]

Notice that it takes ten instructions to complete this function. Using the rounding control option -Qrcd, the optimized code looks like this:

fld DWORD PTR _f[0+eax*4]
fistp DWORD PTR _a[0+eax*4]

You can see that it takes only two instructions to complete the function. This has reduced the total number of instructions by 80%.
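Because the rounding mode is no longer switched to truncation, converted values can differ from what C semantics require. The sketch below illustrates the difference in plain C. It is a model under stated assumptions, not code the compiler generates: the function names are hypothetical, convert_nearest assumes ties round to the even integer (the usual behavior of the default round-to-nearest mode), and inputs are assumed to lie well inside the range of int.

```c
/* C semantics: a float-to-int conversion discards the fraction. */
int convert_truncate(float f)
{
    return (int)f;
}

/* Hypothetical model of a conversion performed while the rounding mode
 * is still round-to-nearest. Ties are assumed to go to the even integer.
 * Assumes f is well within the range of int. */
int convert_nearest(float f)
{
    int t = (int)f;             /* integer part, truncated toward zero */
    float frac = f - (float)t;  /* fraction, same sign as f */

    if (frac > 0.5f || (frac == 0.5f && (t & 1)))
        return t + 1;           /* round up, away from zero */
    if (frac < -0.5f || (frac == -0.5f && (t & 1)))
        return t - 1;           /* round down, away from zero */
    return t;
}
```

For an input of 2.7, truncation gives 2 while round-to-nearest gives 3; for -2.7 the results are -2 and -3. Code that depends on truncation, such as computing array indices from floating-point coordinates, should be checked before this kind of option is enabled.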
Scientific / Engineering Applications

The Intel C/C++ Compiler plug-in provides analysis for interprocedural optimizations to assist you with programs that contain many small or medium-sized frequently used functions, especially for programs that contain calls within loops. Potential optimizations around calling points are normally inhibited due to a lack of information about what happens in the called procedure. Interprocedural analysis examines the relationship between calling and called procedures and enables the following optimizations:

Function inlining
Passing arguments in registers
Interprocedural constant propagation
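
The effect of these optimizations on a call inside a loop can be shown by hand. The following sketch is illustrative only: the functions scale, sum_scaled, and sum_scaled_inlined are hypothetical, and the second loop is a manual rewrite of what inlining plus interprocedural constant propagation could produce, not actual compiler output.

```c
/* Small helper that is always called with the same constant factor. */
int scale(int value, int factor)
{
    return value * factor;
}

/* As written: one function call per loop iteration. */
int sum_scaled(const int *data, int n)
{
    int i, total = 0;
    for (i = 0; i < n; ++i)
        total += scale(data[i], 4);
    return total;
}

/* Hand-written equivalent of the loop after the call is inlined and the
 * constant argument 4 is propagated into the function body. */
int sum_scaled_inlined(const int *data, int n)
{
    int i, total = 0;
    for (i = 0; i < n; ++i)
        total += data[i] * 4;
    return total;
}
```

Both functions return the same value for the same input; the second form simply removes the call overhead and exposes the multiplication to further optimization within the loop.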

In addition, the Intel C/C++ Compiler plug-in exploits the use of the floating-point (FP) stack by implementing code generation optimizations that allow FP instructions to execute more efficiently. Most floating-point operations require that one operand and the result use the top of stack. This makes each FP instruction dependent on the previous one and inhibits overlapping the instructions. The compiler breaks this dependency by allowing a program to arrange for one of the inputs for the next operation to always be at the top of stack. It provides this capability by effective use of the FXCH instruction, which comes at almost no additional cost on the Pentium® Pro processor.
Consider the following expression:
a = ((b + c) * b) + ((d + e) * d);
This expression can be presented graphically as follows:

[Figure: Serial Instruction Sequence compared with Parallel Instruction Sequence]

The serial instruction sequence depicts instructions executed one at a time with no overlapping because of the top-of-stack dependency. The parallel instruction sequence uses the FXCH instruction that provides the following gains:

Overlapping instructions can place their results in different stack registers, not necessarily at the top of the stack
More parallelism is achieved
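
The source of the parallelism is visible in the expression itself: its two halves share no operands, so neither has to wait for the other. The sketch below writes the expression as two independent chains in C. The function name eval_expression and the use of double are assumptions made for illustration; how the compiler actually schedules the FP stack is not shown here.

```c
/* Evaluates a = ((b + c) * b) + ((d + e) * d) as two independent chains. */
double eval_expression(double b, double c, double d, double e)
{
    double left  = (b + c) * b;  /* chain 1: uses only b and c */
    double right = (d + e) * d;  /* chain 2: uses only d and e */

    /* The chains meet only here, so their instructions can be
     * interleaved rather than executed strictly one after another. */
    return left + right;
}
```

With b = 1, c = 2, d = 3, and e = 4, the left chain yields 3, the right chain yields 21, and the result is 24.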

Database Server Applications

The Intel C/C++ Compiler plug-in has been proven to assist large database applications through a combination of interprocedural analysis and profile-guided optimization. Profile-guided optimization provides detailed information on program execution, so you can optimize the performance-critical areas of large applications where most of the execution time is spent. Profile-guided optimizations can help eliminate instruction cache thrashing by reorganizing code layout, shrinking code size, and reducing branch mispredictions.
Information collected during program execution can be fed back into the compiler to allow a higher degree of optimization. For example, profile-guided optimization might find that a particular section of code is rarely executed. This code would then be moved to the end of the module, which lets the processor fetch instructions more efficiently. The following are the three phases of profile-guided optimization that, when completed, provide the data that can significantly improve the performance of large applications:

Phase 1: Instrumentation Compilation. The compiler inserts code into your program to produce profile information. The resulting code is said to be instrumented by the compiler.

Phase 2: Instrumented Execution. When you execute the instrumented program, it creates a dynamic information file. This file contains data that represents the actual behavior of the program during execution.

Phase 3: Feedback Compilation. When you compile your program a second time, the compiler uses the data in the dynamic information file to help optimize your program. This data helps the compiler determine the most heavily traveled paths through the program and optimizes along these paths. You can use additional optimization switches during this phase so that other compilation optimization routines can also benefit from the dynamic information.
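
The kind of information gathered in the first two phases can be imitated by hand. The sketch below is a toy model only: the counters, the function names, and the 1% threshold are all hypothetical, and in-memory counters stand in for the dynamic information file. The real instrumentation is inserted by the compiler and its format is not described here.

```c
/* Hypothetical counters standing in for the dynamic information file. */
unsigned long hot_path_count = 0;
unsigned long cold_path_count = 0;

/* Phase 1 (modeled by hand): the function carries counting code on
 * each path through it. */
int process(int value)
{
    if (value >= 0) {
        ++hot_path_count;   /* normal case */
        return value * 2;
    }
    ++cold_path_count;      /* error case */
    return 0;
}

/* Phase 2 (modeled): run the instrumented code on representative data. */
void run_workload(const int *data, int n)
{
    int i;
    for (i = 0; i < n; ++i)
        process(data[i]);
}

/* Phase 3 (modeled): a decision the feedback could support. Returns 1
 * when the error path accounts for under 1% of executions, which would
 * make it a candidate for placement away from the hot code. The 1%
 * threshold is an arbitrary assumption for this sketch. */
int cold_path_is_rare(void)
{
    unsigned long total = hot_path_count + cold_path_count;
    return total > 0 && cold_path_count * 100 < total;
}
```

Running the workload over 200 values of which one is negative leaves the counters at 199 and 1, and cold_path_is_rare reports that the error path is a candidate for relocation.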

Application Optimizations Summary

The following table summarizes the optimizations that the compiler applies to your program for each optimization switch. The entry "any" in the Option column means that the compiler automatically performs this optimization, even when optimizations are disabled.

| Optimization | Affected Aspect of the Program | Option |
|---|---|---|
| optimized code selection | instruction selection / addressing modes | any |
| global register allocation | register use | -O1 / -O2 |
| instruction scheduling | instruction reordering | -O1 / -O2 |
| register variable detection | register use | -O1 / -O2 |
| common subexpression elimination | constants and expression evaluation | -O1 / -O2 |
| dead-code elimination | instruction sequencing | -O1 / -O2 |
| variable renaming | register use | -O1 / -O2 |
| loop-invariant code movement | instruction sequencing | -O1 / -O2 |
| copy propagation | constants and expression evaluation | -O1 / -O2 |
| constant propagation | constants and expression evaluation | -O1 / -O2 |
| strength reduction/induction variable simplification | instruction selection/sequencing, constants and expression evaluation | -O1 / -O2 |
| tail recursion elimination | calls, further optimization | -O1 / -O2 |
| in-line function expansion | calls, jumps, branches, and loops | -Qip / -Qipo |
| interprocedural constant propagation | arguments, global variables, and return values | -Qip / -Qipo |
| passing arguments in registers | calls, register usage | -Qip / -Qipo |
| monitoring module-level static variables | further optimizations, loop invariant code | -Qip / -Qipo |
| multifile optimization | affects the same aspects as -Qip, but across multiple files | -Qipo |
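
Three of the optimizations listed above (common subexpression elimination, loop-invariant code movement, and strength reduction) can be illustrated by applying them to a small loop by hand. This is a sketch for illustration: the function names are hypothetical and the second version is a manual rewrite, not output captured from the compiler.

```c
/* As written: a + b is computed twice and i * 8 is recomputed on every
 * iteration, even though (a + b) * (a + b) never changes in the loop. */
void fill_original(int *out, int n, int a, int b)
{
    int i;
    for (i = 0; i < n; ++i)
        out[i] = i * 8 + (a + b) * (a + b);
}

/* Hand-transformed equivalent showing the three optimizations. */
void fill_transformed(int *out, int n, int a, int b)
{
    int i;
    int sum = a + b;            /* common subexpression: computed once */
    int invariant = sum * sum;  /* loop-invariant: moved out of the loop */
    int step = 0;               /* strength reduction: replaces i * 8 */

    for (i = 0; i < n; ++i) {
        out[i] = step + invariant;
        step += 8;              /* an addition instead of a multiplication */
    }
}
```

Both functions store the same values. With a = 2 and b = 3, each element is 8 * i + 25.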

Future Enhancements
The following list summarizes the enhancements expected to be added to forthcoming releases of the Intel compiler products:

Support for FORTRAN 90 / FORTRAN 95 / MIL-STD-1753
Many extensions popularized by Cray*, IBM*, Sun*, and Microsoft*
Multi-threading support (including SGI*-compatible SMP directives)
Improved optimizer that requires less memory and runs faster
Automatic MMX™ technology code generation for vector operations
Global pointer tracking for improved alias detection
Improved dependence analysis for threading and loop transformations
Optimizations in presence of exception handling
Enhanced code and data layout optimizations to improve cache efficiency
Code coverage tool with a Graphical User Interface (GUI)
Interprocedural pointer analysis that includes knowledge of library functions

Conclusion
Intel is dedicated to providing a suite of software performance products to assist developers with creating the most powerful applications that run on Intel Architecture processors. The Intel C/C++ and FORTRAN compilers make up a part of this suite, and as Intel’s microprocessor technology evolves, our advanced compiler technology will be right there alongside our newest high-performance processors to let you benefit from every performance gain.

© 1997 Intel Corporation