Recently, Carbon released the first ARMv8 Linux CPAK utilizing theARM CoreLink CCN-504 Cache Coherent Network on Carbon System Exchange. The CCN family of interconnect offers a wide range of high bandwidth, low latency options for networking and data center infrastructure.
The new CPAK uses anARM Cortex-A57octa-core configuration to run Linux on a system withAMBA 5 CHI. Switching the Cortex-A57 configuration from ACE to CHI on Carbon IP Exchange is as easy as changing a pull-down menu item on the model build page. After that, a number of configuration parameters must be set to enable the CHI protocol correctly. Many of them were discussed in aprevious article covering usage of the CCN-504. Using native AMBA 5 CHI for the CPU interface coupled with the CCN-504 interconnect provides high-frequency, non-blocking data transfers. Linux is commonly used in many infrastructure products such as set-top boxes, networking equipment, and servers so the Linux CPAK is applicable for many of these system designs.
Selecting AMBA 5 CHI for the memory interface makes the system drastically different at the hardware level compared to a Linux CPAK using the ARM CoreLink CCI-400 Cache Coherent Interconnect, but the software stack is not significantly different.
From the software point of view, a change in interconnect usually requires some change in initial system configuration. It also impacts performance analysis as each interconnect technology has different solutions for monitoring performance metrics. An interconnect change can also impact other system construction issues such as interrupt configuration and connections.
Some of the details involved in migrating a multi-cluster Linux CPAK from CCI to CCN are covered below.
Software Configuration
Special configuration for the CCN-504 is done using the Linux boot wrapper which runs immediately after reset. The CPAK doesn’t include the boot wrapper source code, but instead usesgitto download it fromkernel.organd then patch the changes needed for CCN configuration. The added code performs the following tasks:
- Set the SMP enable bit in the A57 Extended Control Register (ECR)
- Terminate barriers at the HN-I
- Enable multi-cluster snooping
- Program HN-F SAM control registers
The most critical software task is to make sure multi-cluster snooping is operational. Without this Linux will not run properly. If you are designing a new multi-cluster CCN-based system it is worth running a bare metal software program to verify snooping across clusters is working correctly. It’s much easier to debug the system with bare metal software, and there are a number ofmulti-cluster CCN CPAKsavailable with bare metal software which can be used.
I always recommend a similar approach for other hardware specific programming. Many times users have hardware registers that need to be programmed before starting Linux and it’s easy to put this code into the boot wrapper and less error prone compared to using simulator scripts to force register values.The Linux device tree provided with the CPAK also contains a device tree entry for the CCN-504. The device tree entry has a base address which must match the PERIPHBASE parameter on the CCN-504 model. In this case the PERIPHBASE is set to 0x30 which means the address in the device tree is 0x30000000.
All Linux CPAKS come with an application note which provides details on how to configure and compile Linux to generate a single .axf file.
GIC-400 Identification of CPU Accesses
One of the new things in the CPAK is the method used to get the CPU Core ID and Cluster ID information to the GIC-400.
The GIC-400 requires the AWUSER and ARUSER bits on AXI be used to indicate the CPU which is making an access to the GIC. A number between 0 and 7 must be driven on these signals so the GIC knows which CPU is reading or writing, but getting the proper CPU number on the AxUSER bits can be a challenge.
In Linux CPAKs with CCI, this is done by the GIC automatically by inspecting the AXI transaction ID bits and then setting the AxUSER bits as input to the GIC-400. Each CPU will indicate the CPU number within the core (0-3) and the CCI will add information about which slave port received the transaction to indicate the cluster.
Users don’t need to add any special components in the design because the mapping is done inside the Carbon model of the GIC-400 using a parameter called “AXI User Gen Rule”. This parameter has a default value which assumes a 2 cluster system in which each cluster has 4 cores. This is a standard 8 core configuration which uses all of the ports of the GIC-400. The parameter can be adjusted for other configurations as needed.
The User Gen Rule does even more because the ARM Fast Model for the GIC-400 uses the concept of Cluster ID to indicate which CPU is accessing the GIC. The Cluster ID concept is familiar for software reading the MPIDR register and exists in hardware as a CPU configuration input, but is not present in each bus transaction coming from a CPU and has no direct correlation to the CCI functionality of adding to the ID bits based on slave port.
To create systems which use cycle accurate models and can also be mapped to ARM Fast Models the User Gen Rule includes all of the following information for each of the 8 CPUs supported by the GIC:
- Cluster ID value which is used to create the Fast Model system
- CCI Port which determines the originating cluster in the Cycle Accurate system
- Core ID which determines the CPU within a cluster for both Fast Model and Cycle Accurate systems
With all of this information Linux can successfully run on multi-cluster systems with the GIC-400.
AMBA 5 CHI Systems
In a system with CHI the Cluster ID and the CPU ID values must also be presented to the GIC in the same way as the ACE systems. For CHI systems, the CPU will use the RSVDC signals to indicate the Core ID. The new CCN-504 CPAK introduces a SoC Designer component to add Cluster ID information. This component is a CHI-CHI pass through which has a parameter for Cluster ID and adds the given Cluster ID into to the RSVDC bits.
For CCN configurations with AXI master ports to memory, the CCN will automatically drive the AxUSER bits correctly for the GIC-400. For systems which bridge CHI to AXI using the SoC Designer CHI-AXI converter, this converter takes care of driving the AxUSER bits based on the RSVDC inputs. In both cases, the AxUSER bits are driven to the GIC. The main difference for CHI systems is the GIC User Gen Rule parameter must be disabled by setting the “AXI4 Enable Change USER” parameter to false so no additional modification is done by the Carbon model of the GIC-400.
Conclusion
All of this may be a bit confusing, but demonstrates the value of Carbon CPAKs. All of the system requirements needed to put various models together to form a running Linux system have already been figured out so users don’t need to know it all if they are not interested. For engineers who are interested, CPAKs offer a way to confirm the expected behavior read in the documentation by using a live simulation and actual waveforms.