HiPERiSM Consulting, LLC: Technical Reports

HiPERiSM's Technical Reports

HiPERiSM - High Performance Algorism Consulting

What we found when we tested products with applications: see the summary below, along with a collection of downloadable PDF files.

Our benchmarks compare compilers on ia32 and ia64 architectures.

HiPERiSM Consulting issues ad-hoc technical reports on selected products and problems in multiprocessor computing. These technical reports are available in electronic form at this site and are copyright by HiPERiSM Consulting, LLC. All trade names mentioned are the property of the owners and opinions expressed here are not necessarily shared by them. HiPERiSM Consulting, LLC, will not be liable for incidental or consequential damages in connection with, or arising out of, the furnishing, performance, or use of the products or source code discussed in these reports.

These reports compare the performance of compilers with workstations as targets. The comparison will usually be for the Microsoft Windows™ and Linux™ operating systems (with occasional proprietary vendor platforms). The aim is not to subscribe to a "winner-takes-all" approach but rather to learn how different compilers behave and what features they offer application developers. Our focus is on a select group of benchmarks that we are familiar with on multiple platforms. An extensive performance analysis of serial and parallel versions of CMAQ on ia32 and ia64 platforms has been completed (under contract to the U.S. EPA) and is also presented here. GPGPU benchmarks for CMAQ loop nests are also included.

Each compiler studied in these reports has been completely tested and debugged by its vendor, and each is considered to be an "industrial strength" product. The results of these reports do not in any way imply defects in the products discussed; on the contrary, each product has been chosen for study because it is generally acknowledged to have outstanding features or performance. Where technical support staff of the respective vendors have made suggestions, or otherwise responded to the results reported here, this is acknowledged where appropriate.

Report	Benchmark	Hardware	Compiler
HCTR-1999-1	Kallman¹	Intel Pentium 2	Absoft 6.0, Digital VF 5.0, NAGWare FTN90 2.18
HCTR-2001-1	Kallman¹	Intel Pentium 3	Absoft 6.2, Portland 3.1
HCTR-2001-2	SOM² (OpenMP)	Intel Pentium 3	Absoft 6.2, Portland 3.1
HCTR-2001-3	SOM² (1-D MPI)	Sun Micro Systems E10000	Sun f77
HCTR-2001-4	SOM² (2-D MPI)	Sun Micro Systems E10000	Sun f77
HCTR-2001-5	SOM² (1-D MPI+OpenMP)	Sun Micro Systems E10000	Sun f77
HCTR-2001-6	SOM² (2-D MPI+OpenMP)	Sun Micro Systems E10000	Sun f77
HCTR-2004-1	Kallman¹	Intel Pentium 3, Intel Pentium 4 Xeon	Absoft 8.0, Intel 7.1 & 8.0, Lahey 5.6 & 6.2, Portland 4.0
HCTR-2004-2	SOM²	Intel Pentium 3, Intel Pentium 4 Xeon	Absoft 8.0, Intel 7.1 & 8.0, Lahey 5.6 & 6.2, Portland 4.0
HCTR-2004-3	POM³	Intel Pentium 3, Intel Pentium 4 Xeon	Absoft 8.0, Intel 7.1 & 8.0, Lahey 5.6 & 6.2, Portland 4.0
HCTR-2004-4	STREAM⁴	Intel Pentium 3, Intel Pentium 4 Xeon	Absoft 8.0, Intel 7.1 & 8.0, Lahey 5.6 & 6.2, Portland 4.0
HCTR-2004-5	STREAM⁴ (OpenMP)	Intel Pentium 3, Intel Pentium 4 Xeon	Absoft 8.0, Intel 7.1 & 8.0, Lahey 5.6 & 6.2, Portland 4.0
HCTR-2004-6	MM5⁵	Intel Pentium 3, Intel Pentium 4 Xeon	Intel 8.0, Portland 4.0 & 5.1
HCTR-2005-1	AERMOD⁶	Intel Pentium 4 Xeon, Intel Pentium 4 Xeon 64EMT	Absoft 9.0, Intel 9.0, Portland 6.0
HCTR-2006-1	AERMOD⁶	Intel Pentium 4 Xeon, Intel Pentium 4 Xeon 64EMT	Absoft 10.0, Intel 9.1, Portland 6.1
HCTR-2006-2	CAMx⁷	Intel Pentium 4 Xeon, Intel Pentium 4 Xeon 64EMT	Absoft 10.0, Intel 9.1, Portland 6.1
HCTR-2009-1	CMAQ⁸	Intel IA64, Pentium 4 Xeon 64EMT, Intel X5450	Intel 11.x
HCTR-2010-1, HCTR-2010-2, HCTR-2010-3, HCTR-2010-4	CMAQ⁸	Intel IA64, Intel X5450, and W5590	Intel 11.x, Portland 10.x
HCTR-2010-4, HCTR-2010-5	CMAQ⁸	Intel X5450 and Nvidia C1060 GPGPU device	Portland 10.x, Accelerator Fortran
HCTR-2011-1	Bandwidth⁹	Intel Itanium2, X5450, W5590, AMD 6176SE	pgcc and gcc for b_eff.c
HCTR-2011-2	SOM²	AMD 6176SE	Absoft 11.0, Intel 11.0, Portland 10.6
HCTR-2011-3	SOM²	Intel W5590	Absoft 11.0, Intel 11.0, Portland 10.6
HCTR-2011-4	CMAQ⁸	Intel W5590, AMD 6176SE	Intel 11.0
HCTR-2011-5	SOM²	AMD 6176SE	Absoft 11.1, Intel 12.0, Portland 11.1
HCTR-2011-6	SOM²	Intel W5590	Absoft 11.1, Intel 12.0, Portland 11.1

1) Kallman is an integer logical algorithm with a small instruction and data set that resides entirely in cache and produces negligible memory traffic, which makes it suitable for testing the limits of CPU speed.

2) SOM is the Stommel Ocean Model, where the compute kernel is a Jacobi iteration that sweeps over a two-dimensional grid. The loop structure is excellent for testing compiler optimizations and problem size scalability.

3) POM is the Princeton Ocean Model. This is an example of a "real world" model that has over five hundred vectorizable loops. This version of POM was developed to produce good scalability for vector register architectures and is suitable for testing how well compilers can optimize for cache-based architectures.

4) STREAM is the benchmark for Sustainable Memory Bandwidth in High Performance Computers (http://www.cs.virginia.edu/stream) and is used here to test memory bandwidth differences between compilers on commodity hardware with dual processor platforms. The OpenMP version is used to measure memory bandwidth loss as the thread count is increased.

5) MM5 is the PSU/NCAR Mesoscale Modeling System (also known as MM5 Modeling System Version 3). This is an example of a "real world" model that has vectorizable loops. It was developed to produce good scalability for vector register architectures and is suitable for testing how well compilers can optimize for cache-based architectures.

6) AERMOD is an Air Quality Model (AQM) in current use and describes pollutant dispersion and deposition. The source is characterized by negligible vector code, voluminous memory traffic with large rates of control transfer instructions such as branching logic, high procedure calling overhead, and I/O.

7) CAMx is an Air Quality Model (AQM) in current use and describes atmospheric chemistry. The source is characterized by negligible vector code, voluminous memory traffic with large rates of control transfer instructions such as branching logic, high procedure calling overhead, and voluminous I/O.

8) CMAQ (Community Multiscale Air Quality model) is an Air Quality Model (AQM) in current use and describes atmospheric chemistry (http://www.cmaq-model.org/). The source is characterized by some vector code (depending on the solver used), heavy memory traffic, high procedure calling overhead, and voluminous I/O. A hybrid parallel version has been developed by HiPERiSM Consulting.

9) Bandwidth is measured with the b_eff.c package (https://fs.hlrs.de/projects/par/mpi/b_eff/).
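
To make the kind of loop nest described in note 2 concrete, the following is a minimal sketch of a Jacobi iteration sweeping over a two-dimensional grid. It is not the SOM source code: it assumes a generic five-point stencil for a Poisson-type problem with unit grid spacing, and the names jacobi_sweep, jacobi_solve, rhs, tol and max_iters are hypothetical, chosen only for illustration.

```python
def jacobi_sweep(grid, rhs):
    """Perform one Jacobi sweep over the interior points of a 2-D grid.

    Assumption: generic five-point stencil with unit grid spacing
    (not the actual Stommel model coefficients). Boundary values are
    left unchanged. Returns the updated grid and the largest change.
    """
    ny, nx = len(grid), len(grid[0])
    new = [row[:] for row in grid]          # Jacobi reads only old values
    max_diff = 0.0
    for j in range(1, ny - 1):              # outer loop over rows
        for i in range(1, nx - 1):          # inner loop over columns
            new[j][i] = 0.25 * (grid[j - 1][i] + grid[j + 1][i]
                                + grid[j][i - 1] + grid[j][i + 1]
                                - rhs[j][i])
            max_diff = max(max_diff, abs(new[j][i] - grid[j][i]))
    return new, max_diff


def jacobi_solve(grid, rhs, tol=1e-8, max_iters=10000):
    """Repeat sweeps until the largest change drops below tol.

    Returns the final grid and the number of sweeps performed.
    """
    for iteration in range(1, max_iters + 1):
        grid, diff = jacobi_sweep(grid, rhs)
        if diff < tol:
            return grid, iteration
    return grid, max_iters


if __name__ == "__main__":
    n = 6
    # Boundary held at 1.0, interior starts at 0.0, zero right-hand side.
    start = [[1.0 if j in (0, n - 1) or i in (0, n - 1) else 0.0
              for i in range(n)] for j in range(n)]
    zero_rhs = [[0.0] * n for _ in range(n)]
    solution, sweeps = jacobi_solve(start, zero_rhs)
    print("sweeps:", sweeps, "centre value:", solution[n // 2][n // 2])
```

Because each sweep reads only the previous grid, the inner loop carries no dependence between iterations, which is the property that makes loops of this shape a convenient target for compiler optimization. Changing n varies the problem size.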

Web sites that offer downloadable files of benchmark suites are listed in the following table.

Source	Web site
NAS Parallel Benchmarks	https://www.nas.nasa.gov/software/npb.html
Polyhedron Software, Ltd.	http://www.polyhedron.co.uk

HiPERiSM Consulting, LLC, (919) 484-9803 (Voice)

(919) 806-2813 (Facsimile)