Future Technologies Group
Berkeley Lab Computing Sciences


We are currently collecting results from all kinds of high-performance platforms.

If you would like to help with this effort, you can download the code and send us your results. We plan to publish all the results. The following are some sample results we have collected.

Sequential Results

The sequential results are obtained by using the following command:

./Apex-Map-Seq -n1000 -m67108864 -i1024

It reports how performance, measured in cycles per data access, changes as temporal locality (a) varies from 0.001 to 1 and spatial locality (L) varies from 1 to 65536, with a memory size (M) of 512MB.
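The link between the -m67108864 argument and the 512MB memory size can be checked with a short calculation. This is a minimal sketch, assuming that -m counts data elements and that each element is 8 bytes wide; the element size is inferred from the two numbers quoted above and is not stated on this page. The names WORD_BYTES and memory_size_mib are hypothetical.

```python
# Assumption: each data element ("word") occupies 8 bytes. This is inferred
# from the text, which pairs -m67108864 with a 512MB memory size.
WORD_BYTES = 8


def memory_size_mib(words: int, word_bytes: int = WORD_BYTES) -> float:
    """Return the memory footprint in MiB for a given element count."""
    return words * word_bytes / (1024 ** 2)


# 67108864 is 2**26 elements; at 8 bytes each that is 2**29 bytes.
print(memory_size_mib(67108864))  # 512.0
```

Under the same assumption, halving the -m value would halve the memory footprint.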

The following images show the results on a superscalar platform and on a vector platform. The performance of the vector platform clearly depends strongly on the spatial locality (L): the higher the spatial locality, the better the performance. The relationship between L and the performance is almost perfectly linear when L <= 256. Temporal locality, however, has almost no effect on the vector platform.

In contrast, on the superscalar platform both temporal locality and spatial locality affect the performance significantly.

superscalar.seq.gif

vector.seq.gif

Parallel Results

The following two figures show the parallel results for 256 MPI processes, obtained on the same two platforms as the sequential case by using the following command:

mpirun -np 256 ./Apex-Map-Par -n1000 -m67108864 -i1024 -t

Both show how the achieved aggregate bandwidth (MB/s) changes as temporal locality (a) varies from 0.001 to 1 and spatial locality (L) varies from 1 to 65536 words, with a local memory size of 512MB. Compared with the sequential case, the shapes of the following two figures are much more similar to each other. For higher spatial locality, the vector platform delivers over 40 times higher aggregate bandwidth.

superscalar.par.gif

vector.par.gif

We are also particularly interested in comparing the performance across all possible numbers of processes (P) for random access when L=1 and L=4096. The commands are:

mpirun -np P ./Apex-Map-Par -n1000 -m67108864 -i1024 -a1.0 -l1 -t

mpirun -np P ./Apex-Map-Par -n1000 -m67108864 -i1024 -a1.0 -l4096 -t
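The two commands above differ only in the process count P and the -l value, so a sweep can be generated mechanically. The following is a rough sketch rather than the script actually used: it assumes P is stepped through powers of two up to 4096, whereas the page only says "all possible number of processes". The helper names apex_map_par_command and sweep_commands are hypothetical; the flags are copied verbatim from the commands above.

```python
def apex_map_par_command(procs: int, spatial_l: int) -> str:
    """Build one Apex-Map-Par command line with the flags quoted above."""
    return (
        f"mpirun -np {procs} ./Apex-Map-Par "
        f"-n1000 -m67108864 -i1024 -a1.0 -l{spatial_l} -t"
    )


def sweep_commands(max_procs: int, spatial_values=(1, 4096)) -> list:
    """List commands for P = 1, 2, 4, ... up to max_procs.

    Assumption: powers of two are used for P; the original text does not
    say which process counts were actually run.
    """
    commands = []
    procs = 1
    while procs <= max_procs:
        for spatial_l in spatial_values:
            commands.append(apex_map_par_command(procs, spatial_l))
        procs *= 2
    return commands


if __name__ == "__main__":
    # Print the commands instead of running them, so the list can be
    # reviewed or adapted to a batch system first.
    for command in sweep_commands(4096):
        print(command)
```

Printing the commands rather than launching them keeps the sketch independent of any particular MPI launcher or job scheduler.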

First, we notice the SMP effect (16 CPUs per node) on the superscalar platform. Second, the vector platform scales much better than the superscalar platform. The aggregate bandwidth using 4096 processors on the superscalar platform is only slightly higher than the aggregate bandwidth using 16 processors on the vector platform.

aggregate.gif
