Because of the widening gap between processor speeds and Dynamic Random Access Memory (DRAM) speeds, the performance of the memory subsystem will most likely govern the performance of the system as a whole. This is especially true for current and future symmetric multiprocessor (SMP) systems, which can consist of hundreds of processors and thousands of gigabytes of memory. Consequently, the main focus of our research is the accurate and efficient study of the memory subsystem of large SMP systems.
Our approach uses sampled performance monitor event traces, which are captured in real time with no perturbation of the application's execution. At each periodic occurrence of an event of interest, an event record containing information associated with that event is stored in a trace.
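The sampling idea can be illustrated with a minimal software sketch. It is not the hardware mechanism used in this work: the class name `EventSampler`, the record fields (timestamp, CPU, data address), and the count-based period are all assumptions made for illustration, since the actual record layout is not described here.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Hypothetical sketch: keeps one record for every `period`-th event occurrence.
class EventSampler {
    // Assumed record layout; the real trace format is not given in the text.
    static final class EventRecord {
        final long timestamp;
        final int cpu;
        final long address;

        EventRecord(long timestamp, int cpu, long address) {
            this.timestamp = timestamp;
            this.cpu = cpu;
            this.address = address;
        }
    }

    private final int period;
    private long occurrences = 0;
    private final List<EventRecord> trace = new ArrayList<>();

    EventSampler(int period) {
        if (period <= 0) {
            throw new IllegalArgumentException("period must be positive");
        }
        this.period = period;
    }

    // Called once per occurrence of the event of interest.
    void onEvent(long timestamp, int cpu, long address) {
        occurrences++;
        if (occurrences % period == 0) {
            trace.add(new EventRecord(timestamp, cpu, address));
        }
    }

    // Read-only view of the records sampled so far.
    List<EventRecord> trace() {
        return Collections.unmodifiableList(trace);
    }

    public static void main(String[] args) {
        EventSampler sampler = new EventSampler(3);
        for (int i = 1; i <= 10; i++) {
            sampler.onEvent(i, 0, 0x1000L + i);
        }
        System.out.println("sampled records: " + sampler.trace().size());
    }
}
```

A larger period yields a smaller trace at the cost of coarser coverage of the event stream.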
To analyze the sampled event traces, we created a performance evaluation framework. The framework consists of a database management system that stores the event traces and Java tools that allow the user both to load the event traces into the database and to query the database to generate reports of interest.
Using this methodology, we have successfully conducted memory performance studies of eight- and 32-processor Power-based, shared-memory multiprocessor systems. The following is an overview of the results we have obtained thus far, as well as a pointer to the corresponding publication.
• Identification of "hot" levels of the memory hierarchy, address regions, segments, pages, and cache lines.
• Analysis of thread migration, compulsory misses, and false sharing.
• Characterization of the differences between private and shared data loads.
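As a rough illustration of the first kind of result, the sketch below ranks cache lines by how often they appear among sampled data addresses. It is an in-memory stand-in for what would be a database query in the framework, not the framework's own code. The class name `HotLineReport`, its methods, and the 128-byte line size are assumptions.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch: finds the most frequently sampled ("hot") cache lines.
class HotLineReport {
    // Assumed cache line size in bytes; must be a power of two.
    static final long LINE_SIZE = 128;

    // Maps a sampled data address to the base address of its cache line.
    static long lineOf(long address) {
        return address & ~(LINE_SIZE - 1);
    }

    // Counts sampled accesses per cache line.
    static Map<Long, Integer> countByLine(long[] addresses) {
        Map<Long, Integer> counts = new HashMap<>();
        for (long address : addresses) {
            counts.merge(lineOf(address), 1, Integer::sum);
        }
        return counts;
    }

    // Returns up to n lines, hottest first; ties are broken by lower address.
    static List<Map.Entry<Long, Integer>> hottest(Map<Long, Integer> counts, int n) {
        List<Map.Entry<Long, Integer>> entries = new ArrayList<>(counts.entrySet());
        entries.sort((a, b) -> {
            int byCount = Integer.compare(b.getValue(), a.getValue());
            return byCount != 0 ? byCount : Long.compare(a.getKey(), b.getKey());
        });
        return new ArrayList<>(entries.subList(0, Math.min(n, entries.size())));
    }

    public static void main(String[] args) {
        long[] sampled = {0x1000L, 0x1008L, 0x1080L, 0x107FL, 0x2000L};
        for (Map.Entry<Long, Integer> e : hottest(countByLine(sampled), 2)) {
            System.out.println(Long.toHexString(e.getKey()) + " " + e.getValue());
        }
    }
}
```

The same grouping applies to pages or segments by changing the granularity constant; because the traces are sampled, the counts are estimates of relative frequency rather than exact totals.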
Current work includes process/thread characterization, as well as the application of our performance evaluation framework to studying the effects of Processor Virtualization, available on Power5-based SMP systems, on memory subsystem performance.
This work is being conducted in collaboration with the AIX Performance Group, IBM Corporation in Austin, TX.