Application Performance Characterization - Memory Access Probe (Apex-MAP)
Our current focus is on the performance of memory access, because memory performance has increasingly become the dominant performance factor for many applications. The resulting probe is called Application Performance Characterization - Memory Access Probe (Apex-MAP). Developing it involves two challenging steps. The first is to identify the parameters that characterize an application. Such a characterization also gives manufacturers an efficient and accurate description of computational requirements. The second is to have Apex-MAP generate, from these parameters, a memory access stream that simulates the corresponding application's memory access pattern. In our approach, we characterize memory access streams by their regularity, the amount of memory accessed (M), the vector length of data access (L), the temporal reuse of data (k), and the stride used in memory access (S).
We first characterize memory access with respect to its regularity, distinguishing only two extreme categories: random access and regular access. We implement these two cases with non-uniform random access streams and regular loop-based access streams, respectively. To characterize an application's temporal reuse of data, we approximate the cumulative temporal distribution of its memory access stream by a power distribution function. The shape parameter (α) of this function characterizes data reuse and has a simple relationship to an appropriately defined reuse number (k). M is the total amount of memory accessed by the application. L is the number of data elements accessed contiguously after a random starting address has been selected; it represents spatial locality. With these three input parameters (α, M, and L), Apex-MAP generates a non-uniform data access stream using the power distribution function to simulate the application's memory access behavior.
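As a rough illustration of the random case, the sketch below draws block starting addresses from a power distribution with CDF F(x) = x^α on [0, 1), using inverse-transform sampling x = r^(1/α) with r uniform on [0, 1), and then reads L contiguous elements from each start. The function names, the use of Python, and the scaling of x by (M - L + 1) are assumptions made for illustration; this is not the Apex-MAP source. Under this form, α = 1 gives uniform random access over all of M, while smaller α concentrates accesses in a small region of memory, i.e. more temporal reuse.

```python
import random


def random_access_stream(M, L, alpha, n_blocks, seed=0):
    """Hypothetical sketch: yield start indices of L-element blocks in a memory of M elements.

    Start index = floor(r ** (1 / alpha) * (M - L + 1)) with r uniform in [0, 1),
    so every block [start, start + L) lies inside [0, M).
    """
    if not (0 < alpha <= 1):
        raise ValueError("alpha must be in (0, 1]")
    if not (1 <= L <= M):
        raise ValueError("need 1 <= L <= M")
    rng = random.Random(seed)  # fixed seed for a reproducible stream
    for _ in range(n_blocks):
        r = rng.random()
        # Inverse-transform sample of the power distribution; small alpha skews toward 0.
        yield int(r ** (1.0 / alpha) * (M - L + 1))


def apex_map_random_kernel(data, L, alpha, n_blocks, seed=0):
    """Hypothetical kernel: sum L contiguous elements after each random start address."""
    total = 0.0
    for start in random_access_stream(len(data), L, alpha, n_blocks, seed):
        for i in range(start, start + L):  # contiguous run models spatial locality
            total += data[i]
    return total
```

For example, with M = 10000 and α = 0.1, roughly 63% of the starting addresses fall in the first 1% of memory (since 0.01^0.1 ≈ 0.63), compared with about 1% when α = 1.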
For the regular case, we currently focus mainly on strided access. This requires an additional parameter, the stride S of memory access. Generating regular access in Apex-MAP is relatively straightforward.
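A minimal sketch of the regular case, assuming a simple loop that sweeps a memory of M elements with stride S (indices 0, S, 2S, ... below M). The function names and the `sweeps` repetition count are hypothetical and only illustrate the idea; they are not taken from the Apex-MAP code.

```python
def stride_access_stream(M, S):
    """Hypothetical sketch: indices 0, S, 2S, ... that are smaller than M."""
    if M < 1 or S < 1:
        raise ValueError("need M >= 1 and S >= 1")
    return range(0, M, S)


def apex_map_stride_kernel(data, S, sweeps=1):
    """Hypothetical kernel: sum every S-th element, repeated `sweeps` times."""
    total = 0.0
    for _ in range(sweeps):
        for i in stride_access_stream(len(data), S):
            total += data[i]
    return total
```

With S = 1 this degenerates to a unit-stride sweep; larger S touches fewer elements per cache line, which is what makes stride a useful knob for the regular case.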
Here is the Apex-MAP code, including both sequential and parallel implementations.
We have characterized several important kernel applications using one or two access streams for the sequential case.
The performance of Apex-MAP correlates very well with that of the corresponding kernel applications. The following figures show the performance ratio between Apex-MAP and the corresponding kernel applications, Nbody and Radix, on six different platforms: IBM Power3 (375 MHz and 200 MHz), IBM Power4, Intel Xeon, Cray X1, and AMD Opteron.
More results are discussed in our MASCOTS'04 paper.
We are currently collecting performance data for a range of high-performance computing platforms.
You are very welcome to test the code and send us your results.