Today, I have three tips for using Swap & Play with Linux systems.
- Launching benchmark software automatically on boot
- Setting application breakpoints for Swap & Play checkpoints
- Adding markers in benchmark software to track progress
With the availability of the Cortex-A15 and Cortex-A53 Swap & Play models as well as the upcoming release of the Cortex-A57 Swap & Play model, Carbon users are able to run Linux benchmark applications for system performance analysis. This enables users to create, validate, and analyze the combination of hardware and software using cycle accurate virtual prototypes running realistic software workloads. Combine this with access to models of candidate IP, and the result is a unique flow which delivers cycle accurate ARM system models to design teams.
Swap & Play Overview
Carbon Swap & Play technology enables high-performance simulation (based on ARM Fast Models) to be executed up to user specified breakpoints, and the state of the simulation to be resumed using a cycle accurate virtual prototype.One of the most common uses of Swap & Play is to run Linux benchmark applications to profile how the software executes on a given hardware design. Linux can be booted quickly and then the benchmark run using the cycle accurate virtual prototype. These tips make it easier to automate the entire process and get to the system performance analysis.
Launch Benchmarks on Boot
The first tip is to automatically launch the benchmark when Linux is booted.Carbon Linux CPAKs on System Exchange use a single executable file (.axf) for each system with the following artifacts linked into the images:
- Minimal Boot loader
- Kernel image
- Device Tree
- RAM-based File System with applications
To customize and automate the execution of a desired Linux benchmark application, a Linux device tree entry can be created to select the application to run after boot.
The device tree support for “include” can be used to include a .dtsi file containing the kernel command line, which launches the desired Linux application.
Below is the top of the device tree source file from an A15 CPAK. If one of the benchmarks to be run is the bw_pipe test from the LMBench suite a .dtsi file is included in the device tree.
The include line pulls in a description of the kernel command line. For example, if the bw_pipe benchmark from LMbench is to be run, the include file contains the kernel arguments shown below:
The rdinit kernel command line parameter is used to launch a script that executes the Linux application to be run automatically. The bw_pipe.sh can then run the bw_pipe executable with the desired command line arguments.
Scripting or manually editing the device tree can be used to modify the include line for each benchmark to be run. A unique .axf file for each Linux application to be run can be created. This gives an easy to use .axf file that will automatically launch the benchmark without the need for any interactive typing. Having unique .axf files for each benchmark also makes it easy to hand off to other engineers since they don’t need to know anything about how to run the benchmark; just load the .axf file and the application will automatically run.
I also recommend to create an .axf image which runs /bin/bash to use for testing new benchmark applications in the file system. I normally run all of the benchmarks manually from the shell first on the ARM Fast Model to make sure they are working correctly.
Setting Application Breakpoints
Once benchmarks are automatically running after boot, the next step is to set application breakpoints to use for Swap & Play checkpoints. Linux uses virtual memory which can make it difficult to set breakpoints in user space. While there are application-aware debuggers and other techniques to debug applications, most are either difficult to automate or overkill for system performance analysis.
One way to easily locate breakpoints is to call from the application into the Linux kernel, where it is much easier to put a breakpoint. Any system call which is unused by the benchmark application can be utilized for locating breakpoints. Preferably, the chosen system call would not have any other side effects that would impact the benchmark results.
To illustrate how to do this, consider a benchmark application to be run automatically on boot. Let’s say the first checkpoint should be taken when main() begins. Place a call to the sched_yield() function as the first action in main(). Make sure to include the header file sched.h in the C program. This will call into the Linux kernel. A breakpoint can be placed in the Linux kernel file kernel/sched/core.c at the entry point for the sched_yield system call.
Here is the bw_pipe benchmark with the added system call.
Put a breakpoint in the Linux kernel at the system call and when the breakpoint is hit save the Swap & Play checkpoint. Here is the code in the Linux kernel.
The same technique can be used to easily identify other locations in the benchmark application including the end of the benchmark top stop simulation and gather results for analysis.
The sched_yield system call yields the current processor to other threads, but in the controlled environment of a benchmark application it is not likely to do any rescheduling at the start or at the end of a program. If used in the middle of a multi-threaded benchmark it may impact the scheduler.
Tracking Benchmark Progress
From time to time it is nice to see that a benchmark is proceeding as expected and be able to estimate how long it will take to finish. Using print statements is one way to do this, but adding to many print statements can negatively impact performance analysis. Amazingly, even a simple printf() call in a C program to a UART under Linux is a somewhat complex sequence involving the C library, some system calls, UART device driver activations, and 4 or 5 interrupts for an ordinary length string.
A lighter way to get some feedback about benchmark application process is to bypass all of the printf() overhead and make a system call directly from the benchmark application and use very short strings which can be processed with 1 interrupt and fit in the UART FIFO.
Below is a C program showing how to do it.
By using short strings which are just a few characters, it’s easy to insert some markers in the benchmark application to track progress without getting in the way of benchmark results. This is also a great tool to really learn what happens when a Linux system call is invoked by tracing the activity in the kernel from the start of the system call to the UART driver.
Summary
Hopefully these 3 tips will help Swap & Play users run benchmark applications and get the most benefit when doing system performance analysis. I’m sure readers have other ideas how to best automate the running of Linux application benchmarks as well locating Swap & Play breakpoints, but these should get the creative ideas flowing.
Jason Andrews