Bootable SPI flash

Introduction
All currently known Allwinner SoCs can boot from SPI flash, which usually has the lowest boot priority and is probed only after all the other options fail (SD card, NAND and eMMC).

Information for devboard designers
The SPI flash can be used to store a bootable firmware on the low cost development boards, which do not offer any other kind of non-removable storage (NAND or eMMC).

The minimum amount of the required storage would be 1 MiB (8 Mbit) to fit a user friendly bootloader with some advanced features. The prices of suitable SPI NOR flash chips seem to be around 10-20 cents on AliExpress (or even as low as only 4 cents?). This is non-negligible, but might be still worth it at least to avoid the frustrated "I plugged the power but there is nothing on the monitor" support requests from inexperienced users.

There is no point using SPI NOR flash chips larger than 16 MiB (128 Mbit). The primary use for this additional space would be a storage of some size reduced Linux kernel together with a small Buildroot generated compressed initrd image. And if no operating system is found on the SD card, then this built-in kernel+initrd can be booted instead. The whole point of having a kernel+initrd bundle in addition to just a bootloader is that developing the initrd is relatively easy, because it can be implemented using scripting languages and rely on existing GUI toolkits (Qt, FLTK, ...) or offer a text based user interface (via dialog or something similar). As for the provided functionality, it can do some hardware diagnostic self-tests and even update itself over Internet. In order to have a realistic size estimate, we can look at the kernel+intrd used for lima-memtester and see that it's size is less than 7 MiB, which would fit even into a 8 MiB (64 Mbit) SPI NOR flash chip. So it's a tight fit with 8 MiB (64 Mbit) and a lot of headroom with 16 MiB (128 Mbit). And an additional factor to consider is that programming NOR flash is slow and maxes at around ~200 KiB/s, so having smaller firmware reduces the time needed for flashing.

The SPI flash chip needs to be connected to SPI0 pins (port C), which are multiplexed with NAND. The table below lists the exact pins for different SoC variants and some additional notes:

It looks like SPI is getting gradually phased out from the newer Allwinner SoCs. This may be a problem for providing the necessary SPI pins for the Raspberry Pi compatible expansion headers or OLIMEX UEXT connectors.

The BROM implementation details


The BROM sets up 6 MHz (OSC24M with divisor 4?) clock frequency for SPI0 and then issues a sequence of Read Data Bytes (03h) commands. Each of these commands is encoded in 4 bytes (1 byte for the command id and 3 bytes for the address). The first command reads 256 bytes from the address 0. If a valid eGON header is recognized, then a sequence of commands reading 2048 byte blocks is done next. The first 2048 byte block is read from the address 0, the second 2048 byte block is read from the address 2048 and this continues until the whole first stage bootloader is transferred.

As an experiment, it is possible to configure SPI on one board in the slave mode, connect jumper wires and emulate the SPI flash for the BROM in another board. But the timing constraints are too tight to do a perfect emulation. A perfect emulation would need to correctly handle the Read Data Bytes command, which means that after the last bit of the address is received, the first bit of data from that address needs to be served back in the next SPI cycle. With such a protocol, we can't benefit from any kind of receive and transmit buffering and make use of the hardware SPI controller. For the GPIO bit-banging implementation of the 6 MHz SPI, we have around ~170 cycles per SPI bit at the 1008 MHz CPU clock frequency. This time is comparable to the DRAM access latency, so we are in a big trouble if we get any L2 cache misses (though this can be mitigated by prefetching the right cache line after receiving just enough of the address bits). Moreover, the GPIO itself is relatively slow and it takes a huge amount of CPU cycles to read/write GPIO registers. So a simplistic approach is just to use the SPI controller hardware, ignore any received commands and stream the data according to the expected pattern. That would be 4 padding bytes, then 256 initial bytes of the firmware, then 4 bytes padding again and 2048 initial bytes of the firmware, etc. And this works, see the picture on the right side :-)

Such SPI flash emulation using another board probably does not make much practical sense because it is possible to just connect a real SPI flash chip to the SPI pins. This was only a method get some information about the BROM behaviour. A more complete SPI flash emulation might be an interesting exercise to be done using the upcoming iCE40HX1K-EVB open source hardware FPGA board from OLIMEX.

Software development and trying something here and now


While there are no known boards with built-in SPI flash, it is still possible to arrange some test setup. The SPI0 pins are often available on the expansion headers. The picture on the right side shows how the SPI0_CS,SPI0_CLK,SPI0_MOSI,SPI0_MISO,3V3,GND pins on the Xunlong Orange Pi PC expansion header are connected to the CS,CLK,DI,DO,VCC,GND pins on the W25Q SPI flash module. The wires are entangled and tied in a knot, with the SPI flash module being more or less fixated in place and sticking upwards. Not a very aesthetically pleasing sight, but it works fine for testing the software.

The availability of the SPI0 (port C) pins on the expansion headers is both a blessing and a curse for the existing development boards, such as Xunlong Orange Pi PC and Pine64:
 * On one hand, we can connect an SPI flash module to these pins using jumper wires and use this for prototyping and debugging code. Also some vendor may start making and selling nice factory-made SPI flash shields for the Raspberry Pi compatible expansion header.
 * On the other hand, adding on-board SPI flash in the next revision of the same development board may be problematic because this can break compatibility with the existing shields and the software already developed for them. The device tree does not seem to be good enough (please correct this statement if it is wrong) and has no concept of describing standard GPIO expansion headers with full flexibility of transparently remapping pins in different board revisions. For example, replacing SPI0 pins with SPI1 pins on the 40pin Raspberry Pi compatible expansion header and describing this change only in one place in the board-specific DTS file (so that no changes are necessary in any shield-specific device tree overlays or in the userland software).

The SPL
This is expected to be really trivial. The SPL can check the fact that we have booted from SPI (by looking at the byte at the offset 0x28 in the SPL header). Then setup pins muxing, initialize the SPI0 controller and read the main U-Boot binary starting from the address 32768. Doing all of this in a pretty much the same way as the BROM does. Having very small and simple code is the best, especially considering that we have pretty tight size limit for the SPL.

Taking care of the ATF and the AR100 firmware may need some additional tricks on AArch64 (packing multiple blobs in a FIT container? or use something more lightweight?). Though this is not really SPI boot specific.

Running SPI at only 6 MHz might be not fast enough and adding something like ~0.5 second to the boot time (needed to transfer ~500KB of the main U-Boot binary). In order to improve boot time a little bit, probably the SPL header can be extended to include a special optional field for the maximum supported SPI clock speed and also the number of dummy cycles for the Read Data Bytes at Higher Speed (0Bh) command. This information can be added to the SPL header by the firmware flasher software (see the "Upgrading the SPI flash firmware" section).

The main U-Boot binary
The main U-Boot binary can get a more complete implementation for handling SPI flash, also with a full write support by making use of the driver model and the existing SPI framework. However please also see the "Security considerations" section below, because it might be unreasonable to allow accessing the SPI flash from U-Boot in the case if U-Boot runs in the non-secure mode on AArch64. Either way, the SPI flash support in the main U-Boot binary is very much optional.

Reliability considerations
The SPI NOR flash chips are typically rated for 100000 erase cycles and 20 years of data retention. Also they have much lower data density than NAND and are sold without bad blocks. Still it is an open question whether any bad blocks may appear over time on some fraction of devices (if anyone has any relevant references, please add them here).

As a way to mitigate the risks, it may be possible to write a small bootable stub into the first 4096 byte sector on the SPI flash. The smallest possible size reduces the chances of it being damaged, also it does not need to be updated nearly as frequently as U-Boot. Right after this stub, there can be two copies of the regular U-Boot SPL. The bootable stub can then do the checksum verification and pick a non-damaged SPL copy (if one of them goes bad). As an additional bonus, such stub can support 40 KiB size for the SPL, thus overcoming the BROM limitation. As for the main U-Boot binary, storing two copies would waste too much space. But having CRC32 protected data blocks and an extra parity block can make it more damage resistant.

A rather old, but interesting post in the linux-mtd mailing list explained how the NOR flash wears out. Presumably it takes time for the unreliable bit to flip from 0 to 1, so this has some implications on the verification stage after the firmware had been programmed (do we need an extra delay there?).

Security considerations
It's a good idea to prevent unauthorized update of the firmware code (search for "BIOS trojan" keywords on google for more information on this topic). A malicious software trying to gain even more control over the system after exploiting one of the privilege escalation bugs in Linux could try a few tricks, listed in the table below.

Upgrading the SPI flash firmware
Multiple methods could be potentially used:
 * For dealing with completely bricked non-bootable boards, the most simple solution would be probably to use the sunxi-fel tool with an added feature to backup and flash the firmware.
 * For additional users convenience, it would be nice to support upgrading the firmware from the running system too.

Using the sunxi-fel tool
A preliminary experimental wip branch with SPI flash read/write support on Allwinner A13/A64/H3 is here: https://github.com/ssvb/sunxi-tools/commits/20160523-spiflash-wip

The list of known SPI flash chips
The Read JEDEC ID (9Fh) command is supposed to be around since 2003. The Read SFDP command is relatively new and is documented in the JEDEC standard JESD216, published on 2011. The updated JESD216B standard from 2013 also describes how to use capacities larger than 128 Mbit in a generic way (such capacities exceed the legacy 24-bit addressing mode and can't be used with the old commands).

Winbond 25Q128FVSG


This is a 128 Mbit chip. Supports the JEDEC ID (9Fh) command and returns the 0xEF4018 id. Also supports the Read SFDP (0x5A) command and returns: 00000000: 53 46 44 50 00 01 00 ff 00 00 01 09 80 00 00 ff SFDP............ 00000010: ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ................ 00000020: ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ................ 00000030: ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ................ 00000040: ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ................ 00000050: ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ................ 00000060: ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ................ 00000070: ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ................ 00000080: e5 20 f1 ff ff ff ff 07 44 eb 08 6b 08 3b 42 bb. ......D..k.;B. 00000090: fe ff ff ff ff ff 00 00 ff ff 21 eb 0c 20 0f 52 ..........!.. .R 000000a0: 10 d8 00 00 ff ff ff ff ff ff ff ff ff ff ff ff ................ 000000b0: ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ................ It is conforming to the old JESD216 standard. This old standard can only specify write granularity as either 1 byte or 64 bytes, while we want to use full 256 bytes page size for better performance. So it makes sense to identify this chip using the JEDEC ID (9Fh) command and use the retrieved id for a table lookup.

Based on the information from the datasheet, the Typical Page Program Time (256 bytes) is 0.7 ms and the Typical Block Erase Time (64 KiB) is 150 ms. Simple calculations show that the expected flashing speed would be

1 / (0.7 ms / 256 bytes + 150 ms / 65536 bytes) = ~199 kB/s.

Note that unlike the erasing/programming operation, reading speed is very fast for the NOR flash and is only limited by the SPI interface.

25Q128FV


This is a noname 128 Mbit chip. Supports the JEDEC ID (9Fh) command and returns the 0xEF4018 id (the same as the Winbond 25Q128FVSG). Does not support the Read SFDP (0x5A) command, but other than this seems to be pretty much compatible.