Bootable SPI flash

Introduction
All currently known Allwinner SoCs can boot from SPI flash, which usually has the lowest boot priority and is probed only after all the other options fail (SD card, NAND and eMMC).

Information for devboard designers
The SPI flash can be used to store a bootable firmware on the low cost development boards, which do not offer any other kind of non-removable storage (NAND or eMMC). The minimum amount of the required storage would be 1 MiB (8 Mbit) to fit a user friendly bootloader with some advanced features. The prices of suitable SPI NOR flash chips seem to be around 10-20 cents on AliExpress. This is non-negligible, but might be still worth it at least to avoid the frustrated "I plugged the power but there is nothing on the monitor" support requests from inexperienced users.

The SPI flash chip needs to be connected to SPI0 pins (port C), which are multiplexed with NAND. The table below lists the exact pins for different SoC variants and some additional notes:

It looks like SPI is getting gradually phased out from the newer Allwinner SoCs. This may be a problem for providing the necessary SPI pins for the Raspberry Pi compatible expansion headers or OLIMEX UEXT connectors.

The BROM implementation details


The BROM sets up ~5.5 MHz clock frequency for SPI0 and then issues a sequence of Read Data Bytes (03h) commands. Each of these commands is encoded in 4 bytes (1 byte for the command id and 3 bytes for the address). The first command reads 256 bytes from the address 0. If a valid eGON header is recognized, then a sequence of commands reading 2048 byte blocks is done next. The first 2048 byte block is read from the address 0, the second 2048 byte block is read from the address 2048 and this continues until the whole first stage bootloader is transferred.

As an experiment, it is possible to configure SPI on one board in the slave mode, connect jumper wires and emulate the SPI flash for the BROM in another board. But the timing constraints are too tight to do a perfect emulation. A perfect emulation would need to correctly handle the Read Data Bytes command, which means that after the last bit of the address is received, the first bit of data from that address needs to be served back in the next cycle. With such a protocol, we can't benefit from any kind of receive and transmit buffering and make use of the hardware SPI controller. For the GPIO bit-banging implementation of the 5.5MHz SPI, we have around ~180 cycles per SPI bit at the 1008 MHz CPU clock frequency. This time is comparable to the DRAM access latency, so we are in a big trouble if we get any L2 cache misses (though this can be mitigated by prefetching the right cache line after receiving just enough of the address bits). Moreover, the GPIO itself is relatively slow and it takes a huge amount of CPU cycles to read/write GPIO registers. So a simplistic approach is just to use the SPI controller hardware, ignore any received commands and stream the data according to the expected pattern. That would be 4 padding bytes, then 256 initial bytes of the firmware, then 4 bytes padding again and 2048 initial bytes of the firmware, etc. And this works, see the picture on the right side :-)

Such SPI flash emulation using another board probably does not make much practical sense because it is possible to just connect a real SPI flash chip to the SPI pins. This was only a method get some information about the BROM behaviour. A more complete SPI flash emulation might be an interesting exercise to be done using the upcoming iCE40HX1K-EVB open source hardware FPGA board from OLIMEX.

The SPL
This is expected to be really trivial. The SPL can check the fact that we have booted from SPI (by looking at the byte at the offset 0x28 in the SPL header). Then setup pins muxing, initialize the SPI0 controller and read the main U-Boot binary starting from the address 32768. Doing all of this in a pretty much the same way as the BROM does. Having very small and simple code is the best, especially considering that we have pretty tight size limit for the SPL.

Taking care of the ATF and the AR100 firmware may need some additional tricks on AArch64 (packing multiple blobs in a FIT container? or use something more lightweight?). Though this is not really SPI boot specific.

Running SPI at only 5.5 MHz might be not fast enough and adding something like 0.5 second to the boot time when loading the main U-Boot binary. In order to improve boot time, probably the SPL header can be also extended to include a special optional field for the supported SPI clock speed and also the number of dummy cycles for the Read Data Bytes at Higher Speed (0Bh) command. This information can be added to the SPL header by the firmware flasher software (see the "Upgrading the SPI flash firmware" section).

The main U-Boot binary
The main U-Boot binary can get a more complete implementation for handling SPI flash, also with a full write support by making use of the driver model and the existing SPI framework. However please also see the "Security considerations" section below, because it might be unreasonable to allow accessing the SPI flash from U-Boot in the case if U-Boot runs in the non-secure mode on AArch64. Either way, the SPI flash support in the main U-Boot binary is very much optional.

Security considerations
It's a good idea to prevent unauthorized update of the firmware code (search for "BIOS trojan" keywords on google for more information on this topic). A malicious software trying to gain even more control over the system after exploiting one of the privilege escalation bugs in Linux could try a few tricks, listed in the table below.

Upgrading the SPI flash firmware
Multiple methods could be potentially used:
 * For dealing with completely bricked non-bootable boards, the most simple solution would be probably to use the sunxi-fel tool with an added feature to backup and flash the firmware.
 * For additional users convenience, it would be nice to support upgrading the firmware from the running system too.