| Phase | Step | Action |
|---|---|---|
| Baseline | 1 | Select performance metrics from the VTune performance analyzer |
| | 2 | Select suitable test code/data and run the baseline case |
| | 3 | Record baseline metric values of all counters in the VTune performance analyzer |
| Serial | 4 | Generate a routine calling tree |
| | 5 | Profile and rank the routines by decreasing CPU time usage |
| | 6 | In the top-ranking routine, analyze the loop structure |
| | 7 | Optimize the top-ranking routine using code modifications or compiler options |
| | 8 | Repeat steps 2-3 for the modified test code and compare to the baseline case |
| | 9 | Repeat steps 4-8 (for each new top routine) |
| Parallel | 10 | Present serial optimized test code to the vendor auto-parallel preprocessor |
| | 11 | Study output source and replace vendor directives with OpenMP directives (modify as needed) |
| | 12 | Repeat steps 2-3 for parallel test code and compare to the baseline case |
| | 13 | If parallel code produces incorrect numerical results, go to step 15 |
| | 14 | Optimize the OpenMP parallel code using the Intel Thread Checker™ |
| | 15 | Validate the parallel code using the Intel Thread Checker™ |
| | 16 | Repeat steps 2-3 for modified test code and compare to the baseline case |
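
The bookkeeping behind steps 5, 8, 12, 13 and 16 can be sketched in a few lines. This is only an illustrative sketch: it does not call any VTune or Thread Checker interface, it assumes the metric values have been copied out of the profiler by hand, and every name in it (`RoutineProfile`, `rank_routines`, `compare_to_baseline`, `results_match`) as well as the sample numbers is hypothetical.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class RoutineProfile:
    """CPU time attributed to one routine (hypothetical record type)."""
    name: str
    cpu_seconds: float


def rank_routines(profiles):
    """Step 5: order routines by decreasing CPU time."""
    return sorted(profiles, key=lambda p: p.cpu_seconds, reverse=True)


def compare_to_baseline(baseline, modified):
    """Steps 8, 12, 16: relative change of each metric versus the baseline.

    Returns {metric: (modified - baseline) / baseline}; a negative value
    means the metric decreased. Metrics missing from the modified run, or
    with a zero baseline, are skipped because no ratio can be formed.
    """
    changes = {}
    for metric, base_value in baseline.items():
        if metric not in modified or base_value == 0:
            continue
        changes[metric] = (modified[metric] - base_value) / base_value
    return changes


def results_match(reference, candidate, rel_tol=1e-9):
    """Step 13: compare parallel results with the serial reference.

    The tolerance is an assumption; parallel reductions reorder
    floating-point sums, so bitwise equality is usually too strict.
    """
    return len(reference) == len(candidate) and all(
        math.isclose(r, c, rel_tol=rel_tol)
        for r, c in zip(reference, candidate)
    )


if __name__ == "__main__":
    # Made-up numbers, for illustration only.
    profiles = [
        RoutineProfile("solve", 41.0),
        RoutineProfile("assemble", 12.5),
        RoutineProfile("io", 3.2),
    ]
    print("top routine:", rank_routines(profiles)[0].name)

    baseline = {"cpu_seconds": 60.0, "cache_misses": 2.0e9}
    modified = {"cpu_seconds": 45.0, "cache_misses": 1.5e9}
    for metric, change in compare_to_baseline(baseline, modified).items():
        print(f"{metric}: {change:+.1%}")

    print("results match:", results_match([1.0, 2.0], [1.0, 2.0]))
```

Keeping the comparison relative to the original baseline, rather than to the previous iteration, follows the table: every "repeat steps 2-3" row compares against the baseline case.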