AR100

The AR100, also called the CPUS, arisc, or ARISC in SoC documentation, is a coprocessor present in the A31 and newer sunxi SoCs (including the popular H3 and all 64-bit chips). It is not another ARM core, but instead uses the 32-bit OpenRISC 1000 instruction set architecture.

Allwinner releases a closed-source firmware blob for the AR100 as part of their BSP. This blob provides power management services to software running on the ARM CPUs, such as Linux and U-Boot. It also implements deep power-saving modes ("super standby") for the BSP kernel. The AR100 is not currently used for anything on mainline Linux, as power management there is implemented using native drivers. A few projects have begun to write free firmware for the AR100, using it for power management or as an independent microcontroller.

= Hardware =

While the name "AR100" refers only to the OpenRISC CPU core, the processor is tightly integrated with other "RTC block" hardware. In general, any device whose name begins with "R_" is intended to be controlled by the AR100. This includes the R_PIO, R_PRCM, and several timers. This also includes the R_CIR infrared receiver, so a remote control can be used to wake the SoC from deep sleep.

CPU core
The AR100 is based on the OR1200 implementation of the OpenRISC 1000 architecture. This is an open-source CPU; Verilog source code is available at https://github.com/openrisc/or1200. The AR100 CPU reports hardware revision 1 in its "Version Register" SPR (below). Upstream changed the revision to 8 in commit 31c7fde6, so the AR100 hardware cannot be based on any OR1200 commit newer than 4a4a9675. Thus, the AR100 is a very old design with several known bugs, some of which are detailed below.

Instruction set
The OpenRISC architecture is very flexible, with many optional features. The AR100 only supports the 32-bit base instruction set ("ORBIS32"), so it has no floating-point or vector arithmetic instructions. Even within the ORBIS32, several instructions are optional. Running an unimplemented instruction should cause an "Illegal Instruction" exception, but this is not always the case due to bugs. Optional instruction support is described in the following table:

There is also space for 8 "custom" instructions reserved in the ORBIS32 instruction set. None of those are implemented.

OR1200 features
The OR1200 itself is also very configurable. Bits in the "Unit Present Register" and other SPRs describe which features are available. In summary:


 * General purpose registers: 32
 * Instruction set(s) supported: ORBIS32
 * Delay slot: 1 present
 * Byte ordering: big-endian only (but see Memory below)
 * Instruction cache: 4KiB, one-way, physically tagged, 16-byte cache blocks
 * Data cache: not present
 * MMU: not present
 * Multiply-Accumulate (MAC) unit: present
 * Debug unit: present
 * Performance counters: not present
 * Power management: present (broken)
 * Programmable interrupt controller: present (broken)
 * Tick timer: present
 * FPU: not present
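The instruction cache geometry listed above (4 KiB, one-way, 16-byte blocks) implies 256 blocks. The following is a minimal sketch of how an address would split into tag, index, and offset, assuming the textbook direct-mapped layout; the actual OR1200 wiring has not been checked here, and the function name is only illustrative.

```python
CACHE_SIZE = 4 * 1024   # 4 KiB instruction cache
BLOCK_SIZE = 16         # 16-byte cache blocks
WAYS = 1                # one-way, i.e. direct-mapped

NUM_SETS = CACHE_SIZE // (BLOCK_SIZE * WAYS)   # 256 blocks
OFFSET_BITS = BLOCK_SIZE.bit_length() - 1      # 4 bits of byte offset
INDEX_BITS = NUM_SETS.bit_length() - 1         # 8 bits of block index


def split_address(addr):
    """Return (tag, index, offset) for a physical address.

    Assumes the conventional direct-mapped split; this is a model,
    not something read back from the hardware.
    """
    offset = addr & (BLOCK_SIZE - 1)
    index = (addr >> OFFSET_BITS) & (NUM_SETS - 1)
    tag = addr >> (OFFSET_BITS + INDEX_BITS)
    return tag, index, offset
```

Under this model, two addresses exactly 4 KiB apart share an index and differ only in the tag, so code that alternates between them would keep evicting each other's block.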

SPR data
The OpenRISC 1000 architecture defines several special-purpose registers, or SPRs. Some of the informational ones are detailed here:

Note: the DCCFGR values are not meaningful, since the data cache is not implemented.
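As an illustration of reading one of these SPRs, the sketch below splits a Version Register value into its fields. The field positions (VER in bits 31-24, CFG in bits 23-16, REV in bits 5-0) are taken from the OpenRISC 1000 architecture manual rather than from this page, and the sample value is a hypothetical one chosen to match "revision 1"; it is not a dump from real hardware.

```python
def decode_vr(vr):
    """Split a 32-bit Version Register value into its fields.

    Field layout assumed from the OpenRISC 1000 architecture manual.
    """
    return {
        "ver": (vr >> 24) & 0xFF,  # CPU version
        "cfg": (vr >> 16) & 0xFF,  # configuration template
        "rev": vr & 0x3F,          # hardware revision
    }


# Hypothetical sample value, used only to show the decoding.
sample = decode_vr(0x12000001)
```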

Byte swapping/endianness
While the CPU itself is big-endian, the address and data buses coming out of the CPU are byte swapped. This makes 32-bit memory access appear to be "little-endian", as each group of 4 memory bytes is reversed. This is extremely convenient, as the MMIO register definitions from the SoC manual can be used as is. However, 8-bit or 16-bit memory reads/writes will access the wrong data (see the table below), so transferring strings or small integers between the ARM world and the AR100 requires swapping bytes. For this reason, MMIO access from the AR100 must always use 32-bit loads and stores.
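The effect on narrow accesses can be modelled on a host. The sketch below assumes the swap is exactly a reversal of every aligned 4-byte group, as described above; under that assumption a naturally aligned AR100 access of `size` bytes lands on the ARM-view address `addr ^ (4 - size)`. The function names are illustrative and not part of any existing tool.

```python
def arm_address(ar100_addr, size):
    """ARM-view address touched by an aligned AR100 access of `size` bytes.

    Assumes each aligned 4-byte group is byte-reversed on the bus.
    """
    if size not in (1, 2, 4) or ar100_addr % size:
        raise ValueError("only naturally aligned 1/2/4-byte accesses are modelled")
    return ar100_addr ^ (4 - size)


def ar100_load(arm_mem, ar100_addr, size):
    """Value the AR100 would load, given memory contents as the ARM sees them."""
    base = arm_address(ar100_addr, size)
    return int.from_bytes(arm_mem[base:base + size], "little")


arm_mem = bytes([0x11, 0x22, 0x33, 0x44])   # ARM reads the word 0x44332211
word = ar100_load(arm_mem, 0, 4)            # same value on the AR100
byte0 = ar100_load(arm_mem, 0, 1)           # but byte 0 is ARM's byte 3
```

In this model the 32-bit load agrees with the ARM view, while the byte load at offset 0 returns what the ARM side stored at offset 3, which is why strings need explicit swapping.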

Byte swapping also affects the AR100's instruction stream. The toolchain writes instructions in big-endian byte order, and the AR100 CPU expects to read them in big-endian byte order. However, due to the byte swapping, if the instructions are stored in SRAM as-is, they will be read by the CPU as little-endian, and they will not run. To solve this, the instructions must be reversed before writing them to SRAM; they will be un-reversed when read by the AR100. This can be done using objcopy when creating a binary firmware image:

${CROSS_COMPILE}objcopy -O binary --reverse-bytes 4 firmware.elf firmware.bin
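For tooling that cannot call objcopy, the same transformation can be sketched in a few lines. This is a minimal host-side equivalent under the assumption that the image length is a multiple of 4; the function name is hypothetical, and the `l.nop` encoding (0x15000000) used in the example comes from the OpenRISC 1000 manual, not from this page.

```python
def reverse_words(image, width=4):
    """Reverse the bytes of every `width`-byte group in a firmware image."""
    if len(image) % width:
        raise ValueError("image length must be a multiple of the word width")
    return b"".join(
        image[i:i + width][::-1] for i in range(0, len(image), width)
    )


# A big-endian l.nop as emitted by the toolchain (encoding assumed from
# the OpenRISC 1000 manual) and the form that would be written to SRAM.
nop_be = bytes.fromhex("15000000")
nop_in_sram = reverse_words(nop_be)
```

Applying the function twice returns the original image, which mirrors the description above: the bus un-reverses what was reversed before writing.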

H3
To be investigated: something seems to be weird about the SRAM A1 and DRAM access times in H3 when compared to A31. Maybe the MBUS clock speed makes some difference too?

A64/H5
The ARM BROM is mapped into the AR100 address space because it was moved into the previous location of SRAM A1, but the ARM/AR100 remapping was not updated. (SRAM A1 was moved to make space for the BROM; the BROM was moved to allow for more than 2 GiB of DRAM address space.)

The copy of the end of SRAM A2 mapped at the AR100's 0x00050000 is interesting because its access is so much slower than expected, as if the data is going in a loop through several buses in the SoC back to AHB0.

CPUS_CLK_CFG_REG
The CPU clock can be configured with a register referenced as CCMU_CPUS_CFG in the Allwinner sun6i Linux source code. It is documented in the A80 and A83T manuals under R_PRCM as CPUS_CLK_CFG_REG and CPUS_CLK_REG, respectively.

The register is generally the first register in the PRCM, located at 0x01f01400 on H3/A64/H5.

Known issues
Since the AR100 is based on an extremely old OR1200 commit, any bugs in the CPU core since then can be considered "known issues". This includes:
 * Multiply-accumulate unit (MAC) bugs, fixed by e.g. d24b2173 and 57a449d2.
 * <tt>l.fl1</tt> returns the same value as <tt>l.ff1</tt> (fixed by 66efe9cd).
 * More multiply/divide bugs, fixed by e.g. bc9b53bc.
 * Arithmetic carry/overflow flags are not implemented (done in 2c0765d7).
 * The "infamous <tt>l.rfe</tt> fix", in f0255fab.
 * <tt>l.lws</tt> does not do anything (fixed by 385ffbf3).
 * A bug with filling the instruction cache (fixed by bd5b48dc).
 * <tt>l.ror</tt> appears to be implemented, even though it is not (fixed by 26febe37).
 * Plus other bugfix commits not explained in the commit message (they point to a now-defunct bugzilla instance).

Other issues found while developing for the AR100 include:
 * The division instructions claim to be implemented but do not work at all. They return either 2 or 10 for all inputs.
 * <tt>l.cmov</tt> has some undefined effect when present 4 bytes into an instruction cache block (so at address 0x???4) with the instruction cache enabled. It appears to affect later instructions in the pipeline, as if the next few instructions are skipped. Workaround is to not use <tt>l.cmov</tt> (it's not generated by <tt>gcc</tt> by default anyway).
 * <tt>l.ror</tt> does not work, most likely because it is unimplemented. Due to a known bug in the OR1200 (see above), it does not cause an "Illegal Instruction" exception even when unimplemented.
 * All bits in the power management register appear to be ignored. Probably this means the signals from the OR1200 core are not connected to any control logic in the SoC. This makes it impossible to stop or slow down the AR100 CPU when it is idle. Workaround is to control the clock using the register in the PRCM.
 * The programmable interrupt controller (PIC) registers claim to be implemented, but have no effect. No workaround is needed, as all interrupts come in through an external interrupt controller (<tt>R_INTC</tt>, compatible with the interrupt controller in the A13).

These issues may be due to a (later fixed or still unfixed) bug in the OR1200, modifications made by Allwinner to the OR1200 core, or a silicon bug in the SoC.
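Because the `l.cmov` problem described above depends on where the instruction sits in a cache block, a firmware image can be checked for it on the host. The sketch below scans big-endian instruction words (that is, the image before any byte reversal) for `l.cmov` at an address ending in 0x4. The encoding test (major opcode 0x38, bits 9-8 zero, low nibble 0xE) is taken from the OpenRISC 1000 manual and should be treated as an assumption; the function names and the load address in the example are hypothetical. Data words that happen to match the pattern would show up as false positives.

```python
import struct

ICACHE_BLOCK = 16  # bytes per instruction cache block


def is_cmov(insn):
    """True if a 32-bit word looks like l.cmov (encoding assumed from the manual)."""
    return (insn >> 26) == 0x38 and (insn & 0x30F) == 0x00E


def find_risky_cmov(code, load_addr):
    """List addresses of l.cmov words located 4 bytes into a cache block.

    `code` holds big-endian instruction words; `load_addr` is the address
    at which the first word will execute.
    """
    hits = []
    for off in range(0, len(code) - len(code) % 4, 4):
        (insn,) = struct.unpack_from(">I", code, off)
        addr = load_addr + off
        if is_cmov(insn) and addr % ICACHE_BLOCK == 4:
            hits.append(addr)
    return hits


NOP = 0x15000000    # l.nop
CMOV = 0xE064280E   # l.cmov r3,r4,r5
code = struct.pack(">6I", NOP, CMOV, NOP, NOP, CMOV, NOP)
risky = find_risky_cmov(code, 0x4000)
```

In the example only the first `l.cmov` is reported, since the second one falls at the start of a block.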

= Software =

Toolchain
GCC has been ported to the OpenRISC architecture twice, but upstream GNU GCC does not currently support OpenRISC. The first port, during the GCC 3.x era, used the <tt>or32</tt> architecture name. It is still available at an archive of the meansoffreedom.com website (file listing). This toolchain, based on GCC 3.4.4, is the one used by Allwinner to compile their firmware blob. Because it is able to elide function prologues/epilogues, it actually generates smaller code than the newest <tt>or1k-gcc</tt> release.

The second port is known as <tt>or1k-gcc</tt> (so it calls the architecture <tt>or1k</tt>). It has not been updated since an experimental GCC 6 release, with the latest stable version being GCC 5.4.0. It is available at https://github.com/openrisc/or1k-gcc; however, that version will never be contributed upstream. The reason is explained in the #openrisc irc log from 2016-11-25:

olofk Everyone, except for one guy has given permission for copyright assignment. Unfortunately, his work is very early in the development, so technically the rest is based upin that
wbx and this guy is no longer interested in or1k?
olofk The latest idea we had was to see if the stuff he wrote has actually been replaced by other patches
olofk He is actually running very much involved, as he works for a company that makes proprietary versions of OpenRISC
wbx isn't is possible to convince the guy to just offer his code as public domain, so no special fsf agreement required.
olofk No. His standpoint is that he doesn't want to give up his ownership of the code
olofk Which of course is just pure fucking bogus
wbx so he thinks his company benefits from these actions so that toolchain support isn't upstream or what?
wbx i don't understand such actions from people working with open source.
olofk Well, me neither. But there is not much more we can do to convince him :/

The forked GCC is still GPL-licensed free software, so using it is perfectly legal (the linux-sunxi wiki provides installation instructions). Packaging a usable OpenRISC toolchain in Linux distributions (such as Debian) is another story, because this may involve some political arm wrestling.

Downloads of <tt>or1k-gcc</tt> binaries are available here and here. The musl toolchain is the smallest, and also works for bare metal development. If you want to build your own GCC, smaeul's fork of musl-cross-make has integrated the GCC 5.4.0 patch. To use it, run:

make TARGET=or1k-linux-musl GCC_VER=5.4.0

On the other hand, binutils, gdb, and newlib all have upstream OpenRISC 1000 support. It is possible to use the latest binutils release with both <tt>or1k-gcc</tt> and <tt>or32-gcc</tt> (though some symlinking is necessary in this second case because the architecture names are different).

Allwinner blob
Allwinner's blob provides a power management API using the message box and spinlock devices, as well as a shared memory area in DRAM. The API is used by Linux (<tt>drivers/arisc</tt>), U-Boot, and ATF. All three clients have code for loading and starting the firmware blob. An example of the header containing the API definitions is available in the ATF source at <tt>plat/sun50iw2p1/include/arisc.h</tt>.

Blobs can be found in some BSP Linux kernel source trees, e.g. https://github.com/tinalinux/linux-3.10/tree/r18-v0.9/drivers/arisc/binary. The <tt>arisc_*.code</tt> files contain both the blob and an encrypted source archive. The blobs can also be found in the BSP as <tt>tools/pack/chips/*/bin/scp.bin</tt>.

Blob versions
These versions have been extracted from the blobs in smaeul's sunxi-blobs repository. Hopefully these can give some insight into the evolution of Allwinner's blob over time and between SoCs.

Decompiling the H3 blob
To ease reverse engineering of the firmware for H3, you can use a script that takes the <tt>arisc_sun8iw7p1.bin</tt> file (available in the lichee H3 SDK from Draco) and produces readable pseudocode. The pseudocode is split into cross-referenced functions and basic blocks; code within basic blocks is emulated; and register assignments use evaluated values if they are known. Memory and register addresses are renamed based on the map of known locations. Most of the functions are named based on their purpose.

The resulting code can be used to understand the H3 suspend/resume function in particular, and to write a mainline implementation.

It is available on GitHub: megous/h3-ar100-firmware-decompiler

Reverse-engineering tools for all blob versions
Another project, sunxi-blobs, provides more generic tools for disassembling and analyzing AR100 firmware blobs (as well as boot ROMs). These scripts can take a firmware dump and generate an annotated disassembly listing as well as an SVG control flow graph (using graphviz). See the project's README for more details.

Information-gathering programs

 * https://github.com/skristiansson/ar100-info
 * https://github.com/Icenowy/ar100-info (A64/H5 support in the is-a64-h5 branch).
 * https://github.com/megous/h3-firmware

Power management firmware

 * h3fakeoff (from the H3Droid project)
 * https://github.com/Icenowy/h3-arisc-shutdown
 * https://github.com/crust-firmware/crust

Resource Sharing
The AR100 and the ARM cores run asynchronously, and can access the same memory and registers. To prevent the AR100 from corrupting the state of the ARM cores, one of two conditions must hold:
 * 1) If the AR100 only accesses chip resources that are otherwise unused by Linux, it can do so safely at any time.
 * 2) If the AR100 needs to access shared resources (such as GPIO), it needs to synchronise with the kernel, and the kernel needs to be aware of it.

To synchronise with the kernel, the chips implement hardware spinlocks. There is a kernel driver for these spinlocks, but it has not yet been submitted upstream in final form due to a lack of testing.
 * https://github.com/montjoie/linux/tree/sun8i-hwspinlock-wip_next-20180301

Getting this driver into mainline should be considered essential for any use of the AR100 as a realtime coprocessor. At a minimum, hardware spinlocks should protect GPIO writes and GPIO direction changes. Other GPIO state, such as the pin function, should be initialised by the kernel before the AR100 firmware is started. There should be little need to reinitialise GPIO modes inside AR100 code.
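The reason GPIO writes need a lock is that they are read-modify-write sequences on registers shared by both CPUs. The following is a toy host-side model of that pattern, not firmware code. It assumes the lock register behaves as "a read returns 0 and takes the lock if it was free, otherwise returns 1; writing 0 releases it", which is an assumption about the sunxi hardware spinlock block and is not stated on this page. The class, the function, and the register name `data` are all hypothetical.

```python
class HwSpinlock:
    """Toy model of one hardware spinlock register (semantics assumed)."""

    def __init__(self):
        self._taken = False

    def read(self):
        """Return 0 and take the lock if it was free, else return 1."""
        if self._taken:
            return 1
        self._taken = True
        return 0

    def write(self, value):
        """Writing 0 releases the lock; other values are ignored here."""
        if value == 0:
            self._taken = False


def update_bits(regs, name, mask, value, lock, max_spins=1000):
    """Read-modify-write regs[name] under the lock, with a bounded spin."""
    for _ in range(max_spins):
        if lock.read() == 0:
            break
    else:
        raise TimeoutError("hardware spinlock not acquired")
    try:
        regs[name] = (regs[name] & ~mask) | (value & mask)
    finally:
        lock.write(0)  # always release, even if the update fails


regs = {"data": 0b1010}   # hypothetical GPIO data register
lock = HwSpinlock()
update_bits(regs, "data", 0b0100, 0b0100, lock)
```

Bounding the spin is a design choice for the sketch: firmware that waits forever on a lock held by a crashed kernel would hang, so a timeout gives it a chance to recover.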

To keep AR100 firmware simple, it is suggested that a set of known hardware spinlocks be defined in a kernel header file. AR100 firmware should then be compiled against this header, which keeps the firmware synchronised with the kernel version. The kernel initialises the predefined hardware spinlocks early, during board initialisation, using <tt>struct hwspinlock *hwspin_lock_request_specific(unsigned int id);</tt>

See for details.

= Documentation =


 * OpenRISC 1000 Architecture Manual 1.2
 * OpenRISC 1000 Architecture Manual 1.1 (HTML)
 * OpenRISC 1200 IP Core Specification (Preliminary Draft)
 * OpenRISC 1200 Supplementary Programmer's Reference Manual

= Links =


 * OpenRISC project website
 * OpenRISC on GitHub
 * or1k port of <tt>libgloss</tt> (contains low-level library functions and useful headers)

= Notes =