Performance Comparisons

Performance Comparison of Sequential and Parallel Fortran 77, Fortran 90 and C++ Programs

A small subset of programs written in Fortran 77, Fortran 90 and C++ is compared by run-time performance. The Fortran 90 and C++ programs are object-oriented and were derived from the original Fortran 77 programs. Simulations have been developed in one, two and three dimensions. For illustration, we have selected test cases from a two-stream instability experiment.

The benchmark code is a plasma particle-in-cell (PIC) code based on the General Concurrent PIC algorithm [2]. The Fortran 77 codes have been well benchmarked [1]. The Fortran 90 and C++ versions [3,4,5] were derived from the original Fortran 77 codes.

IBM RS/6000 (AIX 4.1) Sequential Performance Comparison

Machine	Language	Compiler	Particles	Time (sec)
One-Dimensional Program
RS/6000	Fortran 77	IBM xlf	450,000	245.49
RS/6000	Fortran 90	IBM xlf90	450,000	364.25
RS/6000	C++	IBM xlC	450,000	508.00
Two-Dimensional Program
RS/6000	Fortran 90	IBM xlf90	327,680	526.71
RS/6000	Fortran 77	IBM xlf	327,680	549.23
RS/6000	C++	IBM xlC	327,680	667.00

Calls to functions that access private data, made without inlining, contributed to the Fortran 90 overhead in the one-dimensional program. A different object model with better abstractions allows the Fortran 90 program to outperform both the Fortran 77 and C++ versions in the two-dimensional case, as seen in the graph below.
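The call-overhead effect can be illustrated with a minimal C++ sketch. The class `Species` and its member names are hypothetical and are not taken from the benchmark codes. The sketch only contrasts reading private data one element at a time through an accessor with running the whole loop inside a single member function. Whether a given compiler inlines the accessor depends on the compiler and its options.

```cpp
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>

// Hypothetical particle container; all names are illustrative only.
class Species {
public:
    explicit Species(std::vector<double> velocities)
        : v_(std::move(velocities)) {}

    std::size_t size() const { return v_.size(); }

    // Per-element accessor: one function call per particle unless the
    // compiler inlines it.
    double velocity(std::size_t i) const {
        assert(i < v_.size());
        return v_[i];
    }

    // Whole-array operation: a single call, with the loop running
    // directly over the private data.
    double kinetic_energy(double mass) const {
        double sum = 0.0;
        for (double v : v_) sum += v * v;
        return 0.5 * mass * sum;
    }

private:
    std::vector<double> v_;  // private particle velocities
};

// Same quantity computed from outside the class through the accessor.
double kinetic_energy_via_accessor(const Species& s, double mass) {
    double sum = 0.0;
    for (std::size_t i = 0; i < s.size(); ++i) {
        const double v = s.velocity(i);
        sum += v * v;
    }
    return 0.5 * mass * sum;
}
```

Both routes return the same value; they differ only in how many calls cross the class boundary when the accessor is not inlined.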

IBM SP2 Parallel Performance Comparison

The table below compares the performance of the two-dimensional parallel program in Fortran 77, Fortran 90 and C++, using the MPI message-passing library.

Machine	PEs	Language	Compiler	Particles	Time (sec)
Two-Dimensional Program
SP2	32	Fortran 77	IBM xlf	3,571,712	159.08
SP2	32	Fortran 90	IBM xlf90	3,571,712	202.88
SP2	32	C++	IBM xlC	3,571,712	359.00
Two-Dimensional Program
SP2	4	Fortran 77	IBM xlf	327,680	114.31
SP2	4	Fortran 90	IBM xlf90	327,680	117.49
SP2	4	C++	IBM xlC	327,680	249.00

Much more extensive performance comparisons are available in the publications, including comparisons among various machines and compilers from additional vendors. A plot of the 32-processor experiment is shown below.

Performance results for a three-dimensional parallel Fortran 90 program, using MPI, are also available. Details of this work can be found in [5].

Machine	PEs	Language	Compiler	Particles	Time (sec)
Three-Dimensional Program
SP2	32	Fortran 77	IBM xlf90	7,962,624	1548.71
SP2	32	Fortran 77	IBM xlf	7,962,624	1550.14
SP2	32	Fortran 90	IBM xlf90	7,962,624	1339.91
SP2	32	C++	IBM xlC	7,962,624	2797.00

The Fortran 90 version outperformed the Fortran 77 versions because of better cache utilization of the field components. The Fortran 90 and C++ versions encapsulate the components in a single derived type, whereas the Fortran 77 version stores field elements in separate arrays.
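The layout difference can be sketched in C++ as follows. The type and function names (`FieldPoint`, `SeparateFields`, `field_energy_grouped`, `field_energy_separate`) are hypothetical and do not come from the benchmark codes. The sketch only shows that the grouped layout places the three components of one grid point next to each other in memory, while the separate-array layout places them in three different regions.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Grouped layout: the three components of one grid point are contiguous,
// analogous to a single derived type holding all components.
struct FieldPoint {
    double ex, ey, ez;
};

// Separate-array layout: each component lives in its own array,
// analogous to three separate Fortran 77 arrays.
struct SeparateFields {
    std::vector<double> ex, ey, ez;
};

// Sum of squared components over the grid, grouped layout.
double field_energy_grouped(const std::vector<FieldPoint>& f) {
    double sum = 0.0;
    for (const FieldPoint& p : f)
        sum += p.ex * p.ex + p.ey * p.ey + p.ez * p.ez;
    return 0.5 * sum;
}

// Same quantity, separate-array layout: each iteration reads from
// three different memory regions.
double field_energy_separate(const SeparateFields& f) {
    assert(f.ex.size() == f.ey.size() && f.ey.size() == f.ez.size());
    double sum = 0.0;
    for (std::size_t i = 0; i < f.ex.size(); ++i)
        sum += f.ex[i] * f.ex[i] + f.ey[i] * f.ey[i] + f.ez[i] * f.ez[i];
    return 0.5 * sum;
}
```

A PIC particle push typically needs all field components at a grid point together, so the grouped layout tends to touch fewer cache lines per particle. The actual benefit depends on the grid size, the access pattern and the machine.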

Comparison against the KAI Optimizing C++ Compiler

The chart below shows results for a 3D code on the Cornell SP, recently upgraded with P2SC chips. The C++ code was compiled with the KAI C++ compiler.

The most aggressive optimizations produced the fastest timings; these are the timings shown in the table. Compilation with the KAI C++ compiler using K3 -O3 --abstract_pointer took over 2 hours. Compilation with the IBM F90 compiler using -O3 -qlanglvl=90std -qstrict -qalias=noaryovrlp took 5 minutes. (The KAI compiler generated faster executables than the IBM xlC C++ compiler.)

Times in yellow use the -qarch=pwr2 -qtune=pwr2 hardware optimization switches.

3D Parallel Plasma PIC Experiments - CPU Times for Various Compilers
(KAI C++, IBM F90, and IBM F77 with IBM MPI)

References

[1] Skeleton PIC Codes for Parallel Computers
V. K. Decyk
Computer Physics Communications, 87(1&2):87-94, May II, 1995.
[2] A General Concurrent Algorithm for Plasma Particle-in-Cell Simulation Codes
P. C. Liewer and V. K. Decyk
J. of Computational Physics, 85:302-322, 1989.
[3] On Parallel Object Oriented Programming in Fortran 90
C. D. Norton, V. K. Decyk, and B. K. Szymanski
ACM SIGAPP Applied Computing Review, 4(1):27-31, Spring 1996.
[4] Object Oriented Parallel Computation for Plasma Simulation
C. D. Norton, B. K. Szymanski, and V. K. Decyk
Communications of the ACM, 38(10):88-100, Oct. 1995.
[5] High Performance Object-Oriented Programming in Fortran 90
C. D. Norton, V. K. Decyk, and B. K. Szymanski
To appear in Proc. Eighth SIAM Conference on Parallel Processing for Scientific Computing, March 14-17, 1997.