HiPERiSM's Technical Reports

HiPERiSM - High Performance Algorism Consulting

What we found when we tested products with applications: see the summary below and the collection of downloadable PDF files.

Our benchmarks compare compilers on ia32 and ia64 architectures.



 
 

HiPERiSM Consulting issues ad-hoc technical reports on selected products and problems in multiprocessor computing. These technical reports are available in electronic form at this site and are copyrighted by HiPERiSM Consulting, LLC. All trade names mentioned are the property of their respective owners, and the opinions expressed here are not necessarily shared by them. HiPERiSM Consulting, LLC, will not be liable for incidental or consequential damages in connection with, or arising out of, the furnishing, performance, or use of the products or source code discussed in these reports.

These reports compare the performance of compilers that target workstations. The comparison is usually for the Microsoft Windows™ and Linux™ operating systems (with occasional proprietary vendor platforms). The aim is not to subscribe to a "winner-takes-all" approach but rather to learn how different compilers behave and what features they offer applications developers. Our focus is on a select group of benchmarks that we are familiar with on multiple platforms. An extensive performance analysis on ia32 and ia64 platforms for serial and parallel versions of CMAQ has been completed (under contract to the U.S. EPA) and is also presented here. GPGPU benchmarks for CMAQ loop nests are also included.

Each compiler studied in these reports has been completely tested and debugged by its vendor, and each is considered to be an "industrial strength" product. The results of these reports do not in any way imply defects in the products discussed. On the contrary, each product has been chosen for study because it is generally acknowledged to have outstanding features or performance. Where technical support staff of the respective vendors have made suggestions, or otherwise responded to the results reported here, this is acknowledged where appropriate.

| Report | Benchmark | Hardware | Compiler |
|---|---|---|---|
| HCTR-1999-1 | Kallman [1] | Intel Pentium 2 | Absoft 6.0, Digital VF 5.0, NAGWare FTN90 2.18 |
| HCTR-2001-1 | Kallman [1] | Intel Pentium 3 | Absoft 6.2, Portland 3.1 |
| HCTR-2001-2 | SOM [2] (OpenMP) | Intel Pentium 3 | Absoft 6.2, Portland 3.1 |
| HCTR-2001-3 | SOM [2] (1-D MPI) | Sun Microsystems E10000 | Sun f77 |
| HCTR-2001-4 | SOM [2] (2-D MPI) | Sun Microsystems E10000 | Sun f77 |
| HCTR-2001-5 | SOM [2] (1-D MPI+OpenMP) | Sun Microsystems E10000 | Sun f77 |
| HCTR-2001-6 | SOM [2] (2-D MPI+OpenMP) | Sun Microsystems E10000 | Sun f77 |
| HCTR-2004-1 | Kallman [1] | Intel Pentium 3, Intel Pentium 4 Xeon | Absoft 8.0, Intel 7.1 & 8.0, Lahey 5.6 & 6.2, Portland 4.0 |
| HCTR-2004-2 | SOM [2] | Intel Pentium 3, Intel Pentium 4 Xeon | Absoft 8.0, Intel 7.1 & 8.0, Lahey 5.6 & 6.2, Portland 4.0 |
| HCTR-2004-3 | POM [3] | Intel Pentium 3, Intel Pentium 4 Xeon | Absoft 8.0, Intel 7.1 & 8.0, Lahey 5.6 & 6.2, Portland 4.0 |
| HCTR-2004-4 | STREAM [4] | Intel Pentium 3, Intel Pentium 4 Xeon | Absoft 8.0, Intel 7.1 & 8.0, Lahey 5.6 & 6.2, Portland 4.0 |
| HCTR-2004-5 | STREAM [4] (OpenMP) | Intel Pentium 3, Intel Pentium 4 Xeon | Absoft 8.0, Intel 7.1 & 8.0, Lahey 5.6 & 6.2, Portland 4.0 |
| HCTR-2004-6 | MM5 [5] | Intel Pentium 3, Intel Pentium 4 Xeon | Intel 8.0, Portland 4.0 & 5.1 |
| HCTR-2005-1 | AERMOD [6] | Intel Pentium 4 Xeon, Intel Pentium 4 Xeon EM64T | Absoft 9.0, Intel 9.0, Portland 6.0 |
| HCTR-2006-1 | AERMOD [6] | Intel Pentium 4 Xeon, Intel Pentium 4 Xeon EM64T | Absoft 10.0, Intel 9.1, Portland 6.1 |
| HCTR-2006-2 | CAMx [7] | Intel Pentium 4 Xeon, Intel Pentium 4 Xeon EM64T | Absoft 10.0, Intel 9.1, Portland 6.1 |
| HCTR-2009-1 | CMAQ [8] | Intel IA64, Pentium 4 Xeon EM64T, Intel X5450 | Intel 11.x |
| HCTR-2010-1, HCTR-2010-2, HCTR-2010-3, HCTR-2010-4 | CMAQ [8] | Intel IA64, Intel X5450, and W5590 | Intel 11.x, Portland 10.x |
| HCTR-2010-4, HCTR-2010-5 | CMAQ [8] | Intel X5450 and Nvidia C1060 GPGPU device | Portland 10.x, Accelerator Fortran |
| HCTR-2011-1 | Bandwidth [9] | Intel Itanium2, X5450, W5590, AMD 6176SE | pgcc and gcc for b_eff.c |
| HCTR-2011-2 | SOM [2] | AMD 6176SE | Absoft 11.0, Intel 11.0, Portland 10.6 |
| HCTR-2011-3 | SOM [2] | Intel W5590 | Absoft 11.0, Intel 11.0, Portland 10.6 |
| HCTR-2011-4 | CMAQ [8] | Intel W5590, AMD 6176SE | Intel 11.0 |
| HCTR-2011-5 | SOM [2] | AMD 6176SE | Absoft 11.1, Intel 12.0, Portland 11.1 |
| HCTR-2011-6 | SOM [2] | Intel W5590 | Absoft 11.1, Intel 12.0, Portland 11.1 |

1) Kallman is an integer logical algorithm with a small instruction and data set that resides entirely in cache and produces negligible memory traffic. It is suitable for testing the limits of CPU speed.

2) SOM is the Stommel Ocean Model, where the compute kernel is a Jacobi iteration that sweeps over a two-dimensional grid. The loop structure is excellent for testing compiler optimizations and problem size scalability.

3) POM is the Princeton Ocean Model. This is an example of a "real world" model that has over five hundred vectorizable loops. This version of POM was developed to produce good scalability for vector register architectures and is suitable for testing how well compilers can optimize for cache-based architectures.

4) STREAM is the benchmark for Sustainable Memory Bandwidth in High Performance Computers (http://www.cs.virginia.edu/stream) and is used here to test memory bandwidth differences between compilers on commodity hardware with dual processor platforms. The OpenMP version is used to measure memory bandwidth loss as the thread count is increased.

5) MM5 is the PSU/NCAR Mesoscale Modeling System (also known as MM5 Modeling System Version 3). This is an example of a "real world" model that has vectorizable loops. It was developed to produce good scalability for vector register architectures and is suitable for testing how well compilers can optimize for cache-based architectures.

6) AERMOD is an Air Quality Model (AQM) in current use and describes pollutant dispersion and deposition. The source is characterized by negligible vector code, voluminous memory traffic with large rates of control transfer instructions such as branching logic, high procedure calling overhead, and I/O.

7) CAMx is an Air Quality Model (AQM) in current use and describes atmospheric chemistry. The source is characterized by negligible vector code, voluminous memory traffic with large rates of control transfer instructions such as branching logic, high procedure calling overhead, and voluminous I/O.

8) CMAQ, the Community Multiscale Air Quality model, is an Air Quality Model (AQM) in current use and describes atmospheric chemistry (http://www.cmaq-model.org/). The source is characterized by some vector code (depending on the solver used), heavy memory traffic, high procedure calling overhead, and voluminous I/O. A hybrid parallel version has been developed by HiPERiSM Consulting.

9) Bandwidth is measured with the b_eff.c package (https://fs.hlrs.de/projects/par/mpi/b_eff/).
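The Jacobi sweep mentioned in note 2 can be illustrated with a minimal sketch. This is not the SOM source code, which is not reproduced on this page. It assumes a generic five-point stencil for a Poisson-type problem on a unit-spaced grid with fixed boundary values, and the names `jacobi_sweep`, `solve`, `psi`, and `forcing` are hypothetical. The doubly nested loop is the kind of structure a compiler would try to optimize, and the grid dimensions set the problem size.

```python
def jacobi_sweep(psi, forcing):
    """Perform one Jacobi sweep over the interior of a 2-D grid.

    Assumes unit grid spacing and fixed boundary values.
    Returns the updated grid and the largest absolute change.
    """
    ny = len(psi)
    nx = len(psi[0])
    new = [row[:] for row in psi]  # boundary rows and columns are copied unchanged
    max_change = 0.0
    for j in range(1, ny - 1):
        for i in range(1, nx - 1):
            # Five-point stencil: every update reads only the old grid,
            # which is what distinguishes Jacobi from Gauss-Seidel.
            new[j][i] = 0.25 * (psi[j][i - 1] + psi[j][i + 1]
                                + psi[j - 1][i] + psi[j + 1][i]
                                - forcing[j][i])
            max_change = max(max_change, abs(new[j][i] - psi[j][i]))
    return new, max_change


def solve(psi, forcing, tol=1e-8, max_iters=10_000):
    """Repeat sweeps until the largest change drops below tol."""
    for iteration in range(1, max_iters + 1):
        psi, change = jacobi_sweep(psi, forcing)
        if change < tol:
            return psi, iteration
    return psi, max_iters
```

Calling `solve` with larger grids increases the work per sweep and the number of sweeps needed, which is one simple way to explore problem size scalability in a sketch like this.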

Web sites that offer downloadable files of benchmark suites are listed in the following table.

| Source | Web site |
|---|---|
| NAS Parallel Benchmarks | https://www.nas.nasa.gov/software/npb.html |
| Polyhedron Software, Ltd. | http://www.polyhedron.co.uk |

 


HiPERiSM Consulting, LLC, (919) 484-9803 (Voice)

(919) 806-2813 (Facsimile)