SATA

An integrated Serial ATA interface is available on Allwinner A10, A20 and R40 SoCs.

Specifications
From the A10 EVB manual:


 * Supports SATA 1.5Gb/s, and SATA 3.0Gb/s
 * Compliant with SATA Spec. 2.6, and AHCI Revision 1.3 Specifications
 * Supports industry-standard AMBA High-Performance Bus (AHB) and it is fully compliant with the AMBA Specification, Revision 2.0.
 * Supports 32-bit Little Endian
 * OOB signaling detection and generation
 * SATA 1.5Gb/s and SATA 3.0Gb/s speed negotiation when Tx OOB signaling is selected
 * Supports device hot-plugging
 * Supports power management features including automatic Partial to Slumber transition
 * Internal DMA Engine for Command and Data Transactions
 * Supports hardware-assisted Native Command Queuing (NCQ) for up to 32-entries
 * Supports external SATA (eSATA)

The A20 user manual lists identical SATA/AHCI interface features.

Current state
SATA sequential throughput is unbalanced: with appropriate cpufreq settings it's possible to get sequential read speeds above 200 MB/s, while write speeds remain at approx. 45 MB/s. This is caused by wrong DMA settings in the original Allwinner driver, which was copied into the mainline kernel. See also the relevant message on LKML: https://lkml.org/lkml/2019/5/12/84

Unlike on other platforms, sequential SATA transfer rates on A10/A20/R40 scale somewhat linearly with both cpufreq settings and DRAM clock. With the wrong cpufreq settings (e.g. the ondemand governor without the io_is_busy setting) it's impossible to achieve maximum SATA performance.
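To check which governor is active before benchmarking, something like the following sketch can be used. The helper name show_governors and its optional sysfs-root argument are illustrative assumptions, and the location of the io_is_busy tunable depends on the kernel version (global or per policy), so the path in the comment may differ on your system.

```shell
# Print "<cpu> <governor>" for every CPU that exposes cpufreq under the given
# sysfs root. The root defaults to /sys; the parameter is hypothetical and
# only exists to make the helper easy to try against a fake tree.
show_governors() {
    root=${1:-/sys}
    for dir in "$root"/devices/system/cpu/cpu[0-9]*; do
        f=$dir/cpufreq/scaling_governor
        if [ -r "$f" ]; then
            printf '%s %s\n' "${dir##*/}" "$(cat "$f")"
        fi
    done
}

# Typical usage on a real board (as root). The io_is_busy path is an
# assumption: it is the global location used by kernels with system-wide
# governor tunables.
#   show_governors
#   echo 1 >/sys/devices/system/cpu/cpufreq/ondemand/io_is_busy
```

Alternatively, switching to the performance governor for the duration of a test takes the governor out of the equation entirely.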

On the dual-core A20 setting both CONFIG_SCHED_MC=y and CONFIG_SCHED_SMT=y at kernel compile time seems to increase SATA throughput (sequential reads +10 MB/s). Please be aware that this still needs to be confirmed.

Linux's I/O schedulers are also worth a look. If your SATA disk is available as /dev/sda, you can read /sys/block/sda/queue/scheduler to get the list of available I/O schedulers (the active one is printed in brackets). The scheduler can be changed either globally, by adding elevator=deadline to the bootargs environment, or per device, using echo deadline >/sys/block/sdN/queue/scheduler (deadline seems to be the most performant scheduler on A10/A20).
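The query step can be scripted as in the sketch below. The function name active_scheduler is a hypothetical helper, and sda is only an example device. On newer multi-queue kernels the scheduler may be listed as mq-deadline rather than deadline, so check the list before writing a name.

```shell
# Print the active I/O scheduler, i.e. the bracketed entry of a scheduler
# line such as "noop [deadline] cfq". Reads the line from stdin.
active_scheduler() {
    sed -n 's/.*\[\([^]]*\)].*/\1/p'
}

# Typical usage on a real system (sda is an assumption, writing needs root):
#   active_scheduler </sys/block/sda/queue/scheduler
#   echo deadline >/sys/block/sda/queue/scheduler
```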

Since IRQ balancing isn't working on sunxi/ARM, one way to get better SATA throughput on A20/R40 devices is to move all AHCI/SATA IRQs away from the first CPU core (the affinity mask 2 selects the second core), using something like echo 2 >/proc/irq/$(awk -F":" '/ahci/ {print $1}' </proc/interrupts)/smp_affinity
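A slightly more defensive variant of that one-liner is sketched below: it strips the leading padding from the IRQ number and copes with more than one matching line. The helper names ahci_irqs and cpu_mask are illustrative assumptions, not existing tools.

```shell
# Print the IRQ number of every line mentioning "ahci".
# Input is expected in /proc/interrupts format on stdin.
ahci_irqs() {
    awk -F":" '/ahci/ { gsub(/ /, "", $1); print $1 }'
}

# Print the hexadecimal affinity mask for a zero-based CPU index
# (CPU 0 -> 1, CPU 1 -> 2, CPU 2 -> 4, ...).
cpu_mask() {
    printf '%x\n' $((1 << $1))
}

# Typical usage on a real board (as root), pinning AHCI IRQs to CPU 1:
#   for irq in $(ahci_irqs </proc/interrupts); do
#       cpu_mask 1 >/proc/irq/$irq/smp_affinity
#   done
```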

Measuring performance / interpreting numbers
It should be noted that 'passive benchmarking', especially with slow ARM devices, often goes wrong ('passive' in contrast to active benchmarking, where the goal is to produce insights and not just numbers). Always keep an eye on CPU utilization (run 'htop' in one shell and 'iostat 5' in another, and check the cpufreq governor), since many storage benchmarks are bottlenecked by the CPU. This is somewhat different on SoCs that are made for this purpose (e.g. from Marvell, please see this thread for some numbers), but with Allwinner SoCs it's always an issue.

This also affects how to interpret results. Take this comparison of A20 SATA performance and A64 USB/UAS performance, for example: the random IOPS numbers look pretty close, or A64's UAS mode even seems to outperform A20's SATA implementation. But looking at CPU utilization makes it obvious that this test is distorted by CPU performance, since all CPU cores run at 90% utilization or above. The Pine64 has 4 cores running at 1152 MHz, while the A20 was running as a dual-core at 960 MHz. Repeating this test with an R40 (quad-core, up to 1.2 GHz) should show SATA outperforming USB, since the CPU bottleneck is gone. The reason this synthetic benchmark shows lower numbers for A20/SATA than for A64/USB also won't affect 99.9% of real-world use cases: there, random accesses do not happen constantly but only from time to time, and then SATA should always outperform USB due to less overhead.

Port multipliers
A port multiplier allows connecting multiple SATA devices to a single SATA host port. Since sunxi devices with SATA are restricted to only one port, support for the port multiplier protocol (PMP) is a desirable feature. However, this requires suitable hardware (SATA controller) and software (AHCI driver).

PMP support - using SATA port multipliers with sunxi devices

 * The A10 is frequently said not to support PMP due to hardware limitations and/or an older SATA specification. But some documents (A10 EVB manual) indicate capabilities identical to the A20 (see above), and a patch submission from Hans de Goede suggests he tested PMP with both A10- and A20-based devices.
 * The A20's SATA controller is confirmed by a variety of sources to support PMP. It only supports the slower command-based switching and not the faster FIS-based switching.
 * R40/V40 also support port multipliers (the same simple flag has to be set/removed, and performance is as low as with the A20).

Originally the sunxi_ahci driver derived from Allwinner sources deliberately disabled PMP by always indicating that it is unsupported. The reason probably was that while the A20 can do PMP, enabling it breaks compatibility with single drive (non-PMP) mode, so the two are mutually exclusive. (The patch mentioned above states this is due to an inability to issue a proper soft reset to a single drive after port multiplier mode gets enabled.)

A workaround is to compile the driver as a module and set its PMP option as desired at load time (/etc/modprobe.d/ahci-sunxi.conf in most distros), or, with a mainline kernel, to add the corresponding parameter to the kernel command line.

Caveats
If you rely exclusively on a port multiplier to access multiple drives, you're introducing a single point of failure (SPoF). Using this technology in an attempt to increase reliability (e.g. by constructing a RAID array) therefore is questionable.

Cheap port multipliers like JMB321/JMB393 are prone to overheating under load and then start to corrupt data or stop working altogether. This adds significantly to the SPoF problem: if you build a RAID on top of such a port multiplier setup, the likelihood of losing your whole array exactly when you need it increases dramatically, as running a rebuild (after replacing a failed disk) puts significant stress on the system. Combining the cheapest, unreliable components to increase reliability might work in some cases, but definitely not with PM-based RAID.

Mechanical quality
Always keep in mind that all SATA implementations on sunxi devices rely on internal SATA connectors. Unlike eSATA, these connectors are specified for only 50 matings, and cheap cables/connectors die way earlier or start to corrupt data. SATA uses a relatively primitive checksum mechanism (ICRC, Interface Cyclic Redundancy Check) to detect data corruption on the wire. The corresponding S.M.A.R.T. attribute is 199 (unfortunately, disk series exist where this counter does not increase when CRC errors occur and the value remains 0). If you exchanged cables/disks or notice that SATA performance dropped dramatically (due to a huge amount of data retransmits), it's always a good idea to check this attribute using smartctl (contained in the smartmontools package). If the counter increases, something's wrong with the connection between disk and SoC.
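Checking the counter can be scripted roughly as follows. This is a minimal sketch that assumes the usual ten-column attribute table printed by smartctl -A, with the raw value in the last column. The helper name crc_error_count is hypothetical and /dev/sda is only an example device.

```shell
# Print the raw value of S.M.A.R.T. attribute 199 (interface CRC errors).
# Expects the attribute table of "smartctl -A" on stdin, where column 1 is
# the attribute ID and column 10 is the raw value.
crc_error_count() {
    awk '$1 == 199 { print $10 }'
}

# Typical usage on a real system (as root, /dev/sda is an assumption):
#   smartctl -A /dev/sda | crc_error_count
```

Running this before and after a large transfer and comparing the two values shows whether the link is currently producing CRC errors.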