AR100/HardwareSharing

A proper suspend and resume implementation for the sunxi SoC family requires running code outside the normal ARM CPUs, for three major purposes:
 * 1) Cleanly shutting down the main CPU cores and clusters
 * 2) Listening for a wakeup event
 * 3) Cleanly turning the main clusters and CPU cores back on

A successful suspend/resume cycle requires a very specific set of steps that must be performed in the right order and with precise timing. These include flushing the L2 cache, asserting resets and "clamps" which isolate the CPU cores from the rest of the SoC, and turning off the power supply. Resuming the system requires reconfiguring the CPU cores before turning them back on; otherwise they would jump back to the BROM and the device would hang or reboot.

In addition to the actual suspend/resume process, power management firmware is responsible for performing other operations that save power while the system is suspended. These include turning off unnecessary clocks and PLLs, disabling external voltage regulators, and putting the DRAM into self-refresh mode. While some of these actions can be done from the host OS (Linux), others require the main CPU cores to be off first.

To perform these tasks, the A31 and newer SoCs include the AR100, a secondary OpenRISC CPU, where management firmware runs. The peripheral devices ("IP blocks") inside the sunxi family of SoCs are divided into two groups, the main block and the RTC block, roughly based on which CPU is intended to control them. The devices in the main block are designed for use by Linux; the devices in the RTC block are prefixed with R_ and are generally reserved for management firmware.

However, there is significant overlap. Two devices in the main block, the message box and the hardware spinlock controller, are specifically designed for communication between the host OS and the firmware. Conversely, some peripherals in the RTC block are useful to the host OS. For example, the eDP bridge in the Pine Pinebook laptop is connected to R_TWI, and the AC100 audio codec is usually connected to R_RSB. Where either the OS or firmware could equally well implement a feature (such as DVFS or PMIC control), Allwinner generally chose to put their implementation in firmware, while mainline developers prefer to write a "native" Linux driver. Most problematic are the peripherals, such as the RTC-block pin and clock controllers, which must be used by both the host OS and the management firmware at different times, or even at the same time.

Because of this overlap, and the resulting synchronization requirements, OS and firmware implementers must reach consensus about which driver has permission to access (read or write) a peripheral at any given time, and how to synchronize access to hardware that must be shared between multiple software packages. Furthermore, considering the security features available in sunxi SoCs while developing these rules is key to future support for booting the SoC with the secure mode eFuse bit set.
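The kind of rule this document aims for can be written down as a small ownership table. The following is only a sketch, not code from any of the four projects: the state and owner names and the may_access() helper are hypothetical, the two-state model is a simplification, and the four entries are taken from the per-peripheral notes later in this document.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical model: two coarse system states and two possible owners. */
enum sys_state { STATE_LINUX_RUNNING, STATE_ARM_CORES_OFF, STATE_COUNT };
enum sw_owner { OWNER_NONE, OWNER_LINUX, OWNER_CRUST };

struct periph_rule {
	const char *name;
	enum sw_owner owner[STATE_COUNT]; /* indexed by enum sys_state */
};

/* Entries follow the per-peripheral notes in this document. */
static const struct periph_rule periph_rules[] = {
	{ "R_CIR-RX", { OWNER_LINUX, OWNER_CRUST } }, /* wakeup source while suspended */
	{ "R_PWM",    { OWNER_LINUX, OWNER_LINUX } }, /* never used by Crust */
	{ "R_TWD",    { OWNER_CRUST, OWNER_CRUST } }, /* secure-only watchdog */
	{ "R_CPUCFG", { OWNER_CRUST, OWNER_CRUST } }, /* Linux should not use it at all */
};

/* Returns true if `who` may read or write `name` in the given state. */
static bool may_access(const char *name, enum sw_owner who, enum sys_state state)
{
	assert(state < STATE_COUNT);
	for (size_t i = 0; i < sizeof(periph_rules) / sizeof(periph_rules[0]); i++) {
		if (strcmp(periph_rules[i].name, name) == 0)
			return periph_rules[i].owner[state] == who;
	}
	return false; /* unknown peripherals are denied by default */
}
```

With this table, may_access("R_CIR-RX", OWNER_CRUST, STATE_LINUX_RUNNING) is false, while the same query for STATE_ARM_CORES_OFF is true. Peripherals that need finer-grained sharing (R_PIO, R_PRCM, R_RSB) do not fit a single-owner table and are deliberately left out.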

Scope
This document applies to the four major software components that implement low-level hardware access: Linux, U-Boot, and ARM Trusted Firmware-A (abbreviated ATF) running on the ARM CPUs; and SCP (System Control Processor, aka power management) firmware running on the AR100.

While no official choice has been made about the name or maintainership of the SCP firmware, Crust is the most mature implementation to date, having demonstrated successful suspend and resume on various A64 and H5-based boards, including the Pinebook. For that reason, "Crust" will be used interchangeably with "the SCP firmware" in the remainder of this document. However, other implementations of firmware for the AR100 exist, and the current version of Crust does not implement the specification below. If a decision is made to continue development under a new name, this document will be updated accordingly.

This document only applies to the "mainline" version of each software project. Allwinner provides a binary blob for use with their Linux 3.4/3.10/4.4 BSP. The Allwinner blob is much larger (in scope and code size) than the firmware described below, and uses an [entirely different API] for power management requests. Thus it is not compatible with mainline ATF or Linux.

While this document applies to all sunxi SoCs that contain an AR100 (meaning the A31 and newer), the newest few generations of SoCs are receiving the most attention at this time. SCP firmware support is planned in the following order:
 * 1) A64/H5 -- Existing, if buggy, support; used in current laptops, which makes it the highest priority
 * 2) H3 -- Most similar to the A64/H5; hardware widely available for testing
 * 3) H6 -- Reorganizes the memory address space and the PRCM; hardware only recently available
 * 4) A83T -- The oldest SoC with blobs easily available; less common in hardware; different clock tree from newer SoCs

Adding support for each SoC will likely require reverse-engineering a version of the proprietary blob released for that SoC, as Allwinner only provides documentation for this part of the SoC under NDA. (Do blobs for sun8i SoCs older than the A83T exist? If you know, ping smaeul on #linux-sunxi)

Development Status
smaeul is actively (albeit slowly) working on implementation of the plan in this document for all four projects involved. Help with this specification, blob reverse-engineering, development (patches), and upstreaming is welcome. Comments on the design are especially welcome, to ensure that the requirements of all interested parties are met. Email me (my address is on my GitHub profile), or ping me on #linux-sunxi.

Most work to date has been done under the [crust-firmware] GitHub organization.
 * [Linux] adds the mailbox controller driver, irqchip mailbox client, required clock changes, and the device tree updates to enable SCPI.
 * [ATF] adds the mailbox controller driver, the PSCI implementation backed by SCPI (using the existing SCPI client), and code to start Crust.
 * [U-Boot] adds code to load Crust into SRAM (from SPL; it goes in the FIT image with ATF and U-Boot proper).

An updated version of the Linux side is being developed as an RFC patch series. It includes preparation patches (which can go in today), as well as v2 of the mailbox controller driver, and the initial implementation of the irqchip mailbox client.

Upstreaming Plan
Nothing is upstreamed yet. However, waiting to test until everything is upstream is not an option. Either firmware won't work at all, or it is required to boot. There's no middle ground. We can't switch everybody over until it's tested, so people have to test with out-of-tree patches. Ideally we minimize these to A/B enable/disable patches.

Simple, preferred plan with a single flag day, where we pick a future Linux release where firmware becomes required:
 * 1) Upstream CCU prep patches and Linux drivers now. These won't be enabled by default or be in the DT, so nobody is affected.
 * 2) Upstream U-Boot now. People who don't want to use firmware just don't provide it at build time.
 * 3) Upstream ATF now. People who don't want to use firmware must use an old ATF commit, which should be fine (there's no functionality impact--it's a stable project). People who do use firmware must use a patched version of Linux with the DT bits.
 * 4) At the flag day, once the firmware and ATF part is stable, change over the device tree and Kconfig defaults in upstream Linux.

Memory
Firmware components mostly use SRAM areas A1 and A2. Other SRAM areas (C, CE, etc.) are designed for specific purposes, may be unreliable, and may require [additional setup to use].

DRAM
Allwinner's firmware blob is split into two parts. The first takes up the entirety of SRAM A2, and the second is loaded at the beginning of DRAM. The only reason for using DRAM appears to be that they ran out of space in SRAM A2. Accessing DRAM is incredibly slow since the AR100 does not have a data cache. If firmware space becomes an issue in the future, it would make more sense to move ATF to DRAM (or SRAM A1), and allow Crust to use more of SRAM A2. To be specific, the largest function in Allwinner's blob is the DRAM initialization. Supporting a suspend mode that requires DRAM re-init is the only reason that firmware would need significantly more space than currently allocated.

Mainline ATF does not currently use any DRAM, since it is also loaded into SRAM A2. However Allwinner's ATF blobs (and their source code dump) load ATF at the beginning of DRAM, and use the TZASC to limit that region of DRAM to secure access only. Returning to this arrangement is a future option for mainline ATF if either SCP firmware or ATF increases in size so that both no longer fit in SRAM A2. Furthermore, a TEE, if one is loaded, would also go in a secure region of DRAM, so it makes sense to put that adjacent to ATF. Moving ATF to DRAM requires coordination with U-Boot to update the load addresses and reserve the secure memory area in the Linux DT.

SRAM A1
SRAM A1 is where the BROM loads U-Boot SPL (or some other first-stage loader). Currently, this region of SRAM is unused after U-Boot SPL transfers control to ATF. It is not possible for SPL to load a firmware image here without relocating itself. However, this SRAM could be used for dynamic data. A [modification to the ATF linker script] puts its .bss, stacks, page tables, locks, etc. in SRAM A1. Although it might be difficult to upstream, this is the simplest way to free up space in SRAM A2.

SRAM A1 is not mapped into the AR100's address space by default on some SoCs, thus making it unusable by the SCP firmware. Allwinner claims that it can be mapped, but provides no information about how to do so.

SRAM A2
Both ATF and SCP firmware are loaded into SRAM A2. SCP firmware must be stored there; ATF uses this memory region out of convenience. Because the two firmwares are separate binaries, and because neither firmware is relocatable, the division between the two is ABI. If either firmware needs to grow such that the other shrinks, the ABI change must be synchronized with the other firmware, as well as U-Boot, which is responsible for loading both into SRAM. Furthermore, if the shared memory area used for SCPI moves, the Linux device tree must also be updated.

To minimize the risk of changing the ATF load address or the SCPI shared memory address, it has been decided to put ATF in the lower portion of SRAM A2, and SCP firmware in the upper portion. This also avoids adding code to U-Boot (a bounce buffer) to handle the special OpenRISC vector region at the beginning of SRAM A2.

However, since the SCP firmware is not loaded adjacent to the OpenRISC vectors, those will need to be initialized separately by ATF before turning on the AR100. This also places limitations on the SCP firmware, since its exception handling scheme becomes ABI.

The line of division between the two is based on size constraints. This table is for the A64 and H5:

The H3 has a smaller SRAM A2, but does not use ATF since it is a 32-bit SoC. The A83T has the same size of SRAM A2 as the A64, but like the H3, does not need to share it with ATF. The H6 does need to share, but has a larger SRAM A2. No decision has been made about where the split will be on the H6. However, for the reasons given above, the SCPI shared memory should be kept at the very end of the region.
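The arrangement described in this section (vector region first, ATF in the lower portion, SCP firmware in the upper portion, SCPI shared memory at the very end) can be checked with a little arithmetic. This is a sketch only: the struct and function names are hypothetical, it assumes ATF starts directly after the vector region, and all sizes are parameters because this text does not fix them.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

struct region {
	uint32_t base;
	uint32_t size;
};

struct sram_a2_layout {
	struct region vectors; /* OpenRISC exception vectors, start of SRAM A2 */
	struct region atf;     /* lower portion */
	struct region scp;     /* upper portion */
	struct region scpi;    /* shared memory, very end of the region */
};

/*
 * Place the fixed-size pieces and give ATF whatever is left in between.
 * Returns false if the fixed-size pieces leave no room for ATF.
 */
static bool sram_a2_split(struct region sram, uint32_t vectors_size,
			  uint32_t scp_size, uint32_t scpi_size,
			  struct sram_a2_layout *out)
{
	/* 64-bit sum so that oversized inputs cannot wrap around. */
	uint64_t fixed = (uint64_t)vectors_size + scp_size + scpi_size;

	assert(out != NULL);
	if (fixed >= sram.size)
		return false;

	out->vectors.base = sram.base;
	out->vectors.size = vectors_size;
	out->scpi.base = sram.base + sram.size - scpi_size;
	out->scpi.size = scpi_size;
	out->scp.base = out->scpi.base - scp_size;
	out->scp.size = scp_size;
	out->atf.base = sram.base + vectors_size;
	out->atf.size = out->scp.base - out->atf.base;
	return true;
}
```

Because the SCPI area is computed from the end of the region and ATF from the start, growing or shrinking the SCP firmware only moves the ATF/SCP boundary; the ATF load address and the SCPI address stay put, which is the property the split was chosen for.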

CCU
Crust will only access the CCU while Linux is not running, and then only for three purposes:
 * 1) Enabling gates and resets at boot for devices needed to run the firmware (below, mostly MSGBOX). This is done while ATF is spinning in a loop waiting for the "SCP Ready" message.
 * 2) Adjusting clock speed when entering/exiting suspend (think DRAM), while the ARM cores are off.
 * 3) Turning off clocks and PLLs during shutdown after the ARM cores are off. (ATF does some of this now; SCP firmware can do it more thoroughly.)

Crust would need to read the PLL_PERIPH0 register at arbitrary times if that was set as the CPUS/AHB0 clock parent. Since 24MHz has been plenty fast enough so far, I don't expect that to happen (plus it increases power usage, and requires reparenting the CPU while it is running, which has caused crashes in the past).

HWSPINLOCK
This isn't used yet, but I want to reserve it (at least partially) for future use with Crust. It would be needed [especially for the PIO].

MSGBOX
This can be set up (draining FIFOs, setting directions) by either Crust or ATF. ATF will likely start polling for new messages before Crust boots enough to turn the MSGBOX on, but that's fine. Whoever sets it up should turn off interrupts by default for the ARM cores, since we can't have IRQs in EL3.

Linux should not touch the directions or ATF/Crust's IRQs. It should only send/receive messages and control its own IRQs.

PIO
This probably shouldn't be used by Crust, since it has major synchronization issues, but I'm sure some board has a power button or LED here.

THS
This makes sense to use in Crust, but Linux controls the module clock and likes to turn it off. Since Linux is doing DVFS, I'm inclined to not let Crust touch it.

USB (EHCI0)
This is a possible wakeup source, mentioned in Allwinner's firmware: "USB EHCI0 wakeup" is a string in the [tinalinux blob]. It would be only used for that purpose, so it would only be touched while Linux/ATF are not running. I don't know how complicated this would be to support (Is it just an IRQ, or do we have to program things?)

RTC
The interesting parts for Crust are the PLL/oscillator control and the "general purpose" registers in the RTC power domain. Would allow RTC wakeup. Not currently used by Crust, need to evaluate it. I don't think any of the parts that the AR100 would use are used by Linux (maybe the one clock control?).

R_CIR-RX
This is a possible wakeup source. It should be owned by Linux while it is running, and should only be used by Crust while the ARM cores are off.

This could potentially be complicated to support in Crust if we care about waking up on a specific key code.

This is the device that has a module clock in [R_PRCM]. So if Linux doesn't set that up, it would need to be done in Crust. A fixed configuration would be fine for Crust's use, but if we want to provide the SCPI clock API, we'd need fuller clock rate calculation code.

R_CPUCFG
Crust uses several registers related to turning the cores/clusters/power domains on and off.

The only thing that ATF should do here is take the AR100 out of reset. Linux shouldn't use this at all.

R_INTC
Necessary for any functioning AR100 firmware. Linux currently uses it for the NMI, but will need to stop using it. I have a solution that forwards IRQs over a pair of mailbox channels, which works well so far (see below).

R_PIO
Both Linux and Crust need to use this... possibly use HWSPINLOCK for synchronization if that is an issue.

Crust should not need to write any registers while Linux is running. It only needs to write in order to:
 * change pin functions at boot (R_RSB)
 * set up interrupts for pins (power button, lid switch, WLAN/BT wakeup, etc.)
 * control the power LED GPIO

It would only need to read registers while Linux is running in order to handle IRQs (which it may not need to do while Linux is running).

Currently both Linux and Crust read the IRQ directly, and Linux usually wins, so Crust cannot receive IRQs while Linux is running. This could be solved by putting the R_PIO_PL/R_PIO_PM interrupts behind the message-box forwarder.

R_PRCM
Currently used (the "CCU" part and the security lockdown registers) by Linux/ATF. This only needs to be touched at boot or suspend/resume while Linux/ATF aren't running, so it could be shared if everything was marked CLK_IGNORE_UNUSED in Linux. On the other hand, SCPI provides a clock API, and most of these are simple gates, so it would be ideal to let Crust own the hardware block and Linux use the API.

Particularly, Linux should not touch the CPUS or APB0 clock registers, since Crust may rely on child clocks having specific rates.

Crust uses both the module gates/resets and several other registers related to turning the cores/clusters/power domains on and off.

R_PWM
Used by Linux; not used by Crust. Owned by Linux.

R_RSB
This needs to be used by both Linux and Crust. We need to divide functionality between the two (battery charge, GPIO, LEDs, etc.). Linux owns all regulators and IRQs while it is running (IRQ sharing doesn't work from my experience). Currently Crust saves and restores all IRQ settings on suspend/resume. While suspended, it turns off all IRQs except wakeup, and turns off unnecessary regulators (e.g. DCDC1-DCDC3).
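The save/restore behavior described above can be sketched as follows. This is not Crust's code: the PMIC is modeled as a plain array instead of going through the RSB bus, the register count and all names are hypothetical, and the sketch assumes that "all IRQs except wakeup" means the enable registers are overwritten with a wakeup-only mask.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define PMIC_IRQ_REGS 5 /* hypothetical number of IRQ enable registers */

/* Stand-in for the PMIC; real firmware would go through the RSB driver. */
struct fake_pmic {
	uint8_t irq_enable[PMIC_IRQ_REGS];
};

struct pmic_irq_state {
	uint8_t saved[PMIC_IRQ_REGS];
};

/* Save Linux's IRQ enables, then leave only the wakeup sources enabled. */
static void pmic_irq_suspend(struct fake_pmic *pmic, struct pmic_irq_state *state,
			     const uint8_t wakeup[PMIC_IRQ_REGS])
{
	assert(pmic != NULL && state != NULL && wakeup != NULL);
	for (size_t i = 0; i < PMIC_IRQ_REGS; i++) {
		state->saved[i] = pmic->irq_enable[i];
		pmic->irq_enable[i] = wakeup[i];
	}
}

/* Put back exactly what Linux had configured before suspend. */
static void pmic_irq_resume(struct fake_pmic *pmic, const struct pmic_irq_state *state)
{
	assert(pmic != NULL && state != NULL);
	for (size_t i = 0; i < PMIC_IRQ_REGS; i++)
		pmic->irq_enable[i] = state->saved[i];
}
```

Restoring the saved values verbatim means Linux never observes a change in the IRQ configuration it owns while running, which matters given that IRQ sharing has not worked in practice.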

How to handle the PMIC I²C→RSB switchover? It seems that doing it multiple times doesn't hurt. How to handle the clock rate?

R_TIMER
Used by Crust in the past, not used by the new version. There's no reason for Linux/ATF to use this.

R_TWD
Used by Crust. It's secure-only anyway. There's no reason for Linux/ATF to use this.

R_TWI
Used for PMIC and non-PMIC communication. Owned by Linux while it is running. Crust may use it while the ARM cores are off for suspend/resume as long as it doesn't change settings... of course, that assumes the controller is on, which is not true with device PM. Probably best to not touch this one in Crust.

R_UART
Usually used for Bluetooth, could possibly be a wakeup source. Crust currently logs to UART0, though once the bugs are fixed it shouldn't log at all :)

Owned by Linux while Linux is running?

R_WDOG
Not used by Crust (R_TWD is nicer). There's no reason for Linux/ATF to use this.

PMIC
The plan at the moment is that ATF does all of the boot-time setup, Linux does all of the runtime regulators, and Crust just does suspend/resume control of regulators and wakeup IRQs.

Unclear parts:
 * ADC/TS inputs, PMIC thermal IRQs
 * GPIO -- e.g. if there's a power LED
 * Battery charge control
 * Do we want A/C attachment to be a wakeup source? low battery? battery full?

Who sets them up? Who is allowed to touch them and when? Who is responsible for handling IRQs?

IRQ Forwarding

 * Client: https://github.com/smaeul/linux/blob/1b86d762959416d60861457c4ff9fe4929ca4313/drivers/irqchip/irq-mbox.c
 * Server: https://github.com/crust-firmware/crust/blob/crust-mini/common/irqf.c
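As an illustration of the idea mentioned under R_INTC (forwarding IRQs over a pair of mailbox channels), here is a minimal sketch of the server-side bookkeeping. It is not the protocol used by the linked client and server: the message encoding, the opcode names, the 32-IRQ limit, and the assumption of 32-bit mailbox messages are all made up for this example. One channel carries requests from Linux to the firmware (mask, unmask, EOI); the other carries the number of a pending IRQ from the firmware to Linux.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define IRQF_MAX_IRQS 32u /* hypothetical: one bit per forwarded IRQ */

/* Hypothetical request opcodes, sent by the client in the top byte. */
enum irqf_op { IRQF_OP_MASK = 1, IRQF_OP_UNMASK = 2, IRQF_OP_EOI = 3 };

/* Client side: pack an opcode and an IRQ number into one 32-bit message. */
static uint32_t irqf_request(enum irqf_op op, uint32_t hwirq)
{
	return ((uint32_t)op << 24) | (hwirq & 0x00ffffffu);
}

struct irqf_server {
	uint32_t unmasked;  /* IRQs the client wants to receive */
	uint32_t in_flight; /* IRQs forwarded but not yet EOI'd */
};

/* Server side: apply one request received on the client-to-server channel. */
static void irqf_handle_request(struct irqf_server *srv, uint32_t msg)
{
	uint32_t op = msg >> 24;
	uint32_t hwirq = msg & 0x00ffffffu;

	assert(srv != NULL);
	if (hwirq >= IRQF_MAX_IRQS)
		return; /* ignore out-of-range requests */

	uint32_t bit = UINT32_C(1) << hwirq;
	switch (op) {
	case IRQF_OP_MASK:   srv->unmasked &= ~bit;  break;
	case IRQF_OP_UNMASK: srv->unmasked |= bit;   break;
	case IRQF_OP_EOI:    srv->in_flight &= ~bit; break;
	default:             break; /* unknown opcode: ignore */
	}
}

/*
 * Server side: called when a hardware IRQ fires. Returns true and fills
 * *msg if the IRQ should be sent on the server-to-client channel.
 */
static bool irqf_forward(struct irqf_server *srv, uint32_t hwirq, uint32_t *msg)
{
	assert(srv != NULL && msg != NULL);
	if (hwirq >= IRQF_MAX_IRQS)
		return false;

	uint32_t bit = UINT32_C(1) << hwirq;
	if (!(srv->unmasked & bit) || (srv->in_flight & bit))
		return false;

	srv->in_flight |= bit;
	*msg = hwirq;
	return true;
}
```

Tracking in-flight IRQs keeps a level-triggered source from flooding the mailbox FIFO: an IRQ is forwarded once and not again until the client acknowledges it with an EOI.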

SCPI
Implemented commands for ATF:
 * SCP Ready
 * Set system power
 * CSS Get/Set

Implemented commands for Linux:
 * SCP Capability
 * Clock Cap/Info/Set/Get
 * PSU Cap/Info/Set/Get for reset lines?

SCPI shared memory size: 2 clients × 2 directions × 128 byte payload → 512 bytes

See:
 * https://github.com/crust-firmware/crust/blob/crust-mini/common/scpi.c
 * https://github.com/crust-firmware/crust/blob/master/common/scpi_cmds.c
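The shared memory size given above can be expressed as a struct layout, which lets the compiler check the arithmetic. This is a sketch under assumptions: the type and enumerator names are hypothetical, the ordering of clients and directions within the region is arbitrary, and each area is treated as an opaque 128-byte block without modeling where the SCPI header sits.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Counts and area size are from the text; the names are hypothetical. */
enum { SCPI_CLIENTS = 2, SCPI_DIRECTIONS = 2, SCPI_AREA_SIZE = 128 };

enum scpi_client { SCPI_CLIENT_ATF, SCPI_CLIENT_LINUX }; /* order is an assumption */
enum scpi_dir { SCPI_DIR_RX, SCPI_DIR_TX };              /* order is an assumption */

struct scpi_area {
	uint8_t bytes[SCPI_AREA_SIZE];
};

struct scpi_shared_mem {
	struct scpi_area area[SCPI_CLIENTS][SCPI_DIRECTIONS];
};

/* 2 clients x 2 directions x 128 bytes = 512 bytes. */
static_assert(sizeof(struct scpi_shared_mem) == 512,
	      "SCPI shared memory must be 512 bytes");

/* Byte offset of one client's area for one direction within the region. */
static size_t scpi_area_offset(enum scpi_client client, enum scpi_dir dir)
{
	return offsetof(struct scpi_shared_mem, area) +
	       ((size_t)client * SCPI_DIRECTIONS + (size_t)dir) *
	       sizeof(struct scpi_area);
}
```

Since this region's address appears in the Linux device tree, any change to these constants is an ABI change in the sense described in the SRAM A2 section.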

Suspend/Resume Process
Questions:
 * How complicated is entering and exiting self-refresh mode? Is it mostly clock setup, or full retraining?

Additional Notes
TODO:
 * Specify which clocks get CLK_IGNORE_UNUSED or CLK_CRITICAL -- applies to those mentioned in CCU, as well as the whole PRCM as long as Linux touches it.
 * Check which peripherals are secure-only or can be switched to secure. Secure-only means Linux shouldn't be using them in the first place. Switchable devices only used by the AR100 or ATF should be switched to secure if possible.
 * Come up with a way for Crust to request an orderly sleep/shutdown from Linux.
 * Is "Crust" a good name for the AR100 firmware? Any suggestions?
 * Should we continue working on Crust or start something clean/fresh? (I sort of did this already)
 * Add URLs for links
 * Update linked crust-firmware READMEs