CoEx: Multi-Grained Level Application Profiler
The complexity of embedded systems has grown steadily for the past 30 years. We have come a long way since the first 8-bit microprocessors, and the boundary between general-purpose computing and embedded systems has blurred. Today a myriad of such systems sit in mobile consumer devices that can execute virtually any application. This not only demands higher overall device performance, but also imposes severe constraints on the device's power and energy consumption. On top of this, product design lifetimes have shrunk, with a new version of a given product being released practically every year.
Application Specific Instruction Set Processors (ASIPs) have repeatedly proven to be one of the most suitable solutions for meeting these performance and energy requirements, but the technology must rely on mature and efficient tools and methodologies that shorten design time. The design cycle of an ASIP starts with an application, which is first profiled and analyzed to discover the optimization, customization and acceleration opportunities that lie within its functionality. This step is traditionally performed by several different tools at the source level (source level profiling, SLP), which, although they provide valuable information about the code, may be too high level for a microarchitectural analysis.
CoEx was developed to fill the gap between SLP and microarchitectural level profiling, and its main role in ASIP design is to provide a complete solution that helps an engineer through the entire development cycle of a new tailored processor. With CoEx the designer can obtain the same statistics that traditional SLP tools provide, but can also focus on the most interesting functions (those for which the ASIP is going to be implemented), changing the granularity of the analysis at every step to collect either general or very detailed information. For this reason we consider the profiler to be not only fine-grained but multi-grained, a characteristic that helps designers quickly prune uninteresting parts of the code and therefore increases productivity.
CoEx consists of three parts which, although integrated in a single tool in the current implementation, can easily be reused individually in different technologies:
- A multi-granularity, LLVM-based code instrumenter and API that reads its configuration in XML form and inserts API-defined function calls into the code. The API implementation can be replaced at any time, enabling its reuse inside custom flows.
- A highly optimized profiling library that handles the storage and bookkeeping of profiling information.
- A GUI that is independent of the compiler framework: all language-specific and compiler-framework-specific information is contained in a standardized XML format.
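The replaceable API implementation mentioned in the first item can be pictured with a small model. The sketch below is only an illustration in Python, not CoEx code: the hook names `enter_function` and `enter_basic_block`, the two backends and the hand-instrumented `sum_positive` function are all hypothetical, standing in for the calls an instrumenter would insert automatically.

```python
from collections import Counter
from typing import Protocol


class ProfilingAPI(Protocol):
    """Hypothetical hook interface that instrumented code calls into."""

    def enter_function(self, name: str) -> None: ...
    def enter_basic_block(self, function: str, block: str) -> None: ...


class CountingBackend:
    """Backend that counts function and basic block executions."""

    def __init__(self) -> None:
        self.function_counts: Counter = Counter()
        self.block_counts: Counter = Counter()

    def enter_function(self, name: str) -> None:
        self.function_counts[name] += 1

    def enter_basic_block(self, function: str, block: str) -> None:
        self.block_counts[(function, block)] += 1


class NullBackend:
    """Backend that ignores every hook, e.g. to estimate hook overhead."""

    def enter_function(self, name: str) -> None:
        pass

    def enter_basic_block(self, function: str, block: str) -> None:
        pass


def sum_positive(values: list[int], api: ProfilingAPI) -> int:
    """Hand-instrumented stand-in for code an instrumenter would rewrite."""
    api.enter_function("sum_positive")
    api.enter_basic_block("sum_positive", "entry")
    total = 0
    for v in values:
        api.enter_basic_block("sum_positive", "loop")
        if v > 0:
            api.enter_basic_block("sum_positive", "add")
            total += v
    api.enter_basic_block("sum_positive", "exit")
    return total


backend = CountingBackend()
result = sum_positive([3, -1, 4], backend)
```

Because the instrumented function only depends on the hook interface, swapping `CountingBackend` for `NullBackend` (or for a backend belonging to a custom flow) requires no change to the instrumented code.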
The multi-granularity of the approach lies in several levels. At the highest level the designer can choose to gather only execution time information about the program. This is the most lightweight, top-level analysis and serves as a first look into the application's behavior. At the second level of granularity the user can choose between instrumenting the whole program globally or only locally, for "interesting" functions. This feature helps to control the amount of information that has to be understood, to focus the application analysis, and to enable easy optimization and algorithmic exploration. At this level, engineers can choose among several analyses, each with different functionality and overhead:
- Function and basic block execution analysis: keeps track of the number of times the program's functions and basic blocks (BBs) were executed. In doing so, it also tracks the branches within a BB and their nature (conditional/unconditional), and records the function calls within a BB. This analysis additionally records the type and count of the operations in a given basic block, which helps to analyze the nature of the underlying algorithm.
- Heap/Stack profiling and debugging: when configured, keeps track of the maximum application stack size. It also keeps statistics on memory allocations and the maximum required heap size, and informs the user about possible leaks.
- Memory access and value profiling: these two combined analyses keep track of which statements access a given source level memory element (heap variable, local variable or argument) and record the minimum and maximum value of each individual variable. This feature is useful when tailoring architectures and also during algorithmic exploration.
- Tracing: tracing can be enabled individually for functions and basic blocks, heap related operations (allocation/deallocation), memory accesses, stack size and heap size.
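The bookkeeping behind the heap and value profiling analyses above can be sketched as follows. This is a minimal Python model under stated assumptions, not the CoEx profiling library: the class and method names (`MemoryProfile`, `record_access`, `record_alloc`, `record_free`) are hypothetical, values are assumed to be integers, and source statements are identified by line number.

```python
from dataclasses import dataclass, field


@dataclass
class ValueRange:
    """Minimum and maximum value observed for one variable."""
    minimum: int
    maximum: int

    def update(self, value: int) -> None:
        self.minimum = min(self.minimum, value)
        self.maximum = max(self.maximum, value)


@dataclass
class MemoryProfile:
    ranges: dict = field(default_factory=dict)     # variable -> ValueRange
    accessors: dict = field(default_factory=dict)  # variable -> set of source lines
    live: dict = field(default_factory=dict)       # address -> allocated bytes
    current_heap: int = 0
    peak_heap: int = 0

    def record_access(self, variable: str, value: int, line: int) -> None:
        """Remember which statement touched the variable and widen its range."""
        self.accessors.setdefault(variable, set()).add(line)
        if variable in self.ranges:
            self.ranges[variable].update(value)
        else:
            self.ranges[variable] = ValueRange(value, value)

    def record_alloc(self, address: int, size: int) -> None:
        """Track a heap allocation and the peak heap size reached so far."""
        self.live[address] = size
        self.current_heap += size
        self.peak_heap = max(self.peak_heap, self.current_heap)

    def record_free(self, address: int) -> None:
        """Release a tracked allocation (raises KeyError for unknown addresses)."""
        self.current_heap -= self.live.pop(address)

    def possible_leaks(self) -> dict:
        """Allocations that are still live, i.e. leak candidates at program exit."""
        return dict(self.live)


profile = MemoryProfile()
profile.record_access("gain", 12, line=10)
profile.record_access("gain", -3, line=14)
profile.record_access("gain", 7, line=10)
profile.record_alloc(0x1000, 64)
profile.record_alloc(0x2000, 32)
profile.record_free(0x1000)
```

In this run the variable `gain` ends with the range [-3, 12] and two accessing lines, the peak heap size is 96 bytes, and the 32-byte block at `0x2000` is reported as a possible leak. A value range of this kind is what would let a designer pick a narrower register or memory width for a tailored architecture.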
The tool produces its output in XML form for the execution statistics and as binary files for the trace statistics. The execution statistics are usually the input to the GUI, which designers use to analyze the application. The GUI displays the information in an intuitive way, linking each result with its corresponding statement in the code. In this way the designer can immediately relate the profiling information to the application, which improves their understanding of it.
The results can nevertheless also be fed to other tools for instruction set customization, automatic software partitioning and memory mapping, among others, a feature that makes the tool extremely valuable for complex embedded design flows.
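To illustrate how XML execution statistics could be exchanged with a downstream tool, the following Python sketch writes basic block counts to XML and reads them back. The element and attribute names (`profile`, `function`, `basic_block`, `name`, `count`) are assumptions made for this example; the actual CoEx XML format is not described here.

```python
import xml.etree.ElementTree as ET


def stats_to_xml(block_counts: dict) -> str:
    """Serialize {(function, block): count} into a hypothetical XML layout."""
    root = ET.Element("profile")
    functions = {}
    for (function, block), count in sorted(block_counts.items()):
        node = functions.get(function)
        if node is None:
            node = ET.SubElement(root, "function", name=function)
            functions[function] = node
        ET.SubElement(node, "basic_block", name=block, count=str(count))
    return ET.tostring(root, encoding="unicode")


def xml_to_stats(text: str) -> dict:
    """Parse the layout above back into {(function, block): count}."""
    root = ET.fromstring(text)
    return {
        (f.get("name"), b.get("name")): int(b.get("count"))
        for f in root.findall("function")
        for b in f.findall("basic_block")
    }


counts = {("main", "entry"): 1, ("filter", "loop"): 128, ("filter", "entry"): 4}
document = stats_to_xml(counts)
recovered = xml_to_stats(document)
```

A consumer such as a partitioning or instruction set customization tool would only need the parsing half, for example to rank functions by how often their basic blocks execute.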
Juan Fernando Eusse
Ceng, J., Hohenauer, M., Leupers, R., Ascheid, G., Meyr, H. and Braun, G.: C Compiler Retargeting Based on Instruction Semantics Models, in DATE (Munich, Germany), Mar. 2005
Meyr, H., Schliebusch, O., Wieferink, A., Kammler, D., Witte, E. M., Lüthje, O., Hohenauer, M., Braun, G. and Chattopadhyay, A.: Designing and Modeling MPSoC Processors and Communication Architectures, in Building ASIPs: The Mescal Methodology, Springer, pp. 229-280, Jun. 2005, ISBN: 0-387-26057-9