

2.10 Example ACML programs demonstrating performance

The examples/performance subdirectory of the top-level ACML installation directory contains several timing programs that show how ACML performs on your machine. Possible default locations for this subdirectory are /opt/acml3.6.1/gnu64/examples/performance under Linux and c:\Program Files\AMD\acml3.6.1\gnu32\examples\performance under Windows. As with the other examples, a GNUmakefile may be used to build and run them.

Depending on where your copy of the ACML is installed, and which compiler and flags you wish to use, it may be necessary to modify some variables in the GNUmakefile before using it.

The 32- and 64-bit Windows versions of ACML assume that you have the Cygwin UNIX-like tools installed and can use the make command that comes with them to build the examples.

In addition, the GNUmakefile uses the gnuplot plotting program to display graphs of the timing results. If gnuplot is not installed, the timing programs still run and show their results, but no graphs are plotted. Under Linux, gnuplot may come with your distribution, but you may need to install it explicitly. Version 4.0 or later of gnuplot is required.

The gnuplot program is also available for Windows machines. See http://www.gnuplot.info for more information.

If you are on an SMP (multiprocessor) machine and have installed an OpenMP version of ACML, then a command such as the following, typed in the examples/performance directory,

      % make OMP_NUM_THREADS=5

will run the timing programs on P processors for P = 1, 2, 4 and 5. That is, P takes every integer power of 2 that does not exceed OMP_NUM_THREADS, plus the value of OMP_NUM_THREADS itself if that value is not a power of 2. The results for each routine are concatenated into one file, and gnuplot then displays a single graph per routine comparing the results obtained with the different numbers of processors.
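The rule for choosing the processor counts can be sketched in Python as below. This is a hypothetical helper written for illustration only; it is not part of ACML or its GNUmakefile, and it assumes the rule is exactly "all powers of 2 up to OMP_NUM_THREADS, plus OMP_NUM_THREADS if it is not itself a power of 2", as inferred from the example above.

```python
from typing import List


def processor_counts(omp_num_threads: int) -> List[int]:
    """Hypothetical helper: list the processor counts P used for timing runs.

    Assumed rule (inferred from the example in the text): every power of 2
    that does not exceed omp_num_threads, followed by omp_num_threads itself
    when it is not a power of 2.
    """
    if omp_num_threads < 1:
        raise ValueError("OMP_NUM_THREADS must be a positive integer")
    counts = []
    p = 1
    while p <= omp_num_threads:  # collect 1, 2, 4, ... up to the limit
        counts.append(p)
        p *= 2
    if counts[-1] != omp_num_threads:  # limit is not a power of 2
        counts.append(omp_num_threads)
    return counts


print(processor_counts(5))  # [1, 2, 4, 5]
print(processor_counts(8))  # [1, 2, 4, 8]
```

With OMP_NUM_THREADS=8 no extra run is added, because 8 is already the largest power of 2 in the sequence.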

Setting OMP_NUM_THREADS in this way has no benefit unless you are on an SMP machine and are using an OpenMP version of ACML. Nor is it useful to set OMP_NUM_THREADS to a value higher than the number of processors (or processor cores) on your machine. Under Linux, one way to find the number of processors (or cores) is to examine the special file /proc/cpuinfo, which has an entry for every core.
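A minimal Python sketch of that check follows: it counts the lines of /proc/cpuinfo whose field name is "processor". The function name is hypothetical and not part of ACML. On processors with hardware multithreading, the kernel lists one entry per logical processor, so the count may exceed the number of physical cores.

```python
import os


def count_cpuinfo_processors(cpuinfo_text: str) -> int:
    """Hypothetical helper: count "processor" entries in /proc/cpuinfo text.

    Each logical processor listed by the Linux kernel begins with a line of
    the form "processor\t: N"; other lines use different field names.
    """
    count = 0
    for line in cpuinfo_text.splitlines():
        key, sep, _ = line.partition(":")
        if sep and key.strip() == "processor":
            count += 1
    return count


if __name__ == "__main__":
    path = "/proc/cpuinfo"
    if os.path.exists(path):
        with open(path) as f:
            print("processor entries:", count_cpuinfo_processors(f.read()))
    else:
        print(path, "not found (this check only works under Linux)")
```

The parsing is kept separate from the file access so that the same function can be applied to a saved copy of /proc/cpuinfo from another machine.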

Not all routines in ACML are SMP-parallelized, so the OMP_NUM_THREADS setting affects only the timing programs for parallelized routines, such as time_cfft2d.f, time_dgemm.f and time_dgetrf.f. The other timing programs run on one thread regardless of the setting of OMP_NUM_THREADS.

In all cases, timing graphs can be viewed without regenerating timing results by typing the command

      % make plots

Note that all results generated by the timing programs will vary depending on the load on your machine at run time.