The significance of cache-conscious algorithms has gradually become familiar to computer scientists. However, cache-conscious algorithms are not as widely studied as they ought to be, given that the processor-memory performance gap is steadily growing. The primary roadblock is difficulty in cache profiling. Various cache profiling methods and tools have been developed, but there has been no simple and portable solution to the study of basic cache performance. Many tools require a large amount of work to get started with them, and most are bound to a specific hardware architecture and operating system. This paper describes SMAT (STL Memory Access Tracer), a portable and easy-to-use cache profiling tool in the form of a trace-driven cache simulator, with traces collected by the source-level instrumentation. The trace-generating function is implemented as an iterator adaptor compatible with the C++ Standard Template Library (STL), so it requires little or no modification of the algorithm to be profiled. The difficulty in source-based trace generation is that it traces all objects, while object-based instrumentation, which is widely used, traces only objects which are not assigned to registers. In addition, source-level instrumentation potentially disables several important compile-time optimizations, such as copy propagation. How SMAT overcomes these complications and provides convincing cache profiling is a principal focus of this paper. Also detailed are applications of SMAT to profile STL sorting algorithms, including new cache-conscious versions of introsort and mergesort. |