A10 DRAM Controller Performance

Tests with the lima-memspeed program on a Cubietruck with a 1920x1080-32@60Hz monitor
The lima-memspeed program is a tool that simulates different memory-intensive workloads and measures how much of the memory bandwidth is actually available to the CPU, GPU and other peripherals. It is meant to answer two questions: what the optimal relationship between the MBUS and DRAM clock frequencies is, and whether increasing the DRAM clock speed brings any practical improvement. Systems with a 32-bit DRAM bus width can't be analyzed well with tinymembench alone, because tinymembench only measures the bandwidth available to a single CPU core. The CPU alone can't consume all of the DRAM bandwidth; saturating it takes additional load from the other CPU cores and peripherals.

Systems with a 32-bit memory interface benefit from having both MBUS and DRAM clocked at high speed. The Allwinner A20 systems with a 16-bit memory interface (such as the A20-OLinuXino-LIME) should not have any obvious extra bandwidth problems if MBUS is clocked slower than DRAM.
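The difference between the 16-bit and 32-bit cases can be made concrete with the theoretical peak DDR bandwidth, which is the DRAM clock times two transfers per clock times the bus width in bytes. The sketch below is only an illustration: the function name and the 480 MHz clock are assumptions chosen for the example, not measured values, and real sustained bandwidth is always lower than this peak.

```python
def peak_dram_bandwidth_mb_s(dram_clock_mhz, bus_width_bits):
    """Theoretical peak DDR bandwidth in MB/s (1 MB = 10**6 bytes)."""
    transfers_per_clock = 2  # DDR transfers data on both clock edges
    bytes_per_transfer = bus_width_bits / 8
    return dram_clock_mhz * transfers_per_clock * bytes_per_transfer


# 480 MHz is an illustrative clock, not a value taken from the benchmarks.
for width_bits in (16, 32):
    peak = peak_dram_bandwidth_mb_s(480, width_bits)
    print(f"{width_bits}-bit bus at 480 MHz: {peak:.0f} MB/s peak")
```

At the same DRAM clock, a 32-bit interface has twice the peak bandwidth of a 16-bit one, so the rest of the memory path has twice as much data to move before it becomes the bottleneck.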

Tests with the tinymembench program on an A13-OLinuXino-Micro with the screen blanked
The Allwinner A13 user manual specifies a 300MHz clock speed limit for MBUS. And indeed, with only a 16-bit external DDR3 memory interface to serve, clocking MBUS at a very high speed may be unnecessary (assuming that MBUS internally has the same width as in the A10/A20 siblings). So it is interesting to check whether running MBUS at half the DRAM clock speed is fast enough for the A13.

Benchmarks were run on the Olimex A13-OLinuXino-Micro for different MBUS/DRAM clock settings. The CPU clock speed was 1008MHz and the AXI clock speed was 504MHz (overclocked). The screen was blanked so that the display would not consume memory bandwidth. The performance numbers were obtained with the tinymembench tool, using the 'NEON read prefetched (64 bytes step)', 'NEON fill' and 'NEON copy prefetched (64 bytes step)' subtests. The DRAM timings were calculated for each clock frequency, assuming JEDEC Speed Bin 1333H (DDR3 1333 9-9-9).
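As a rough illustration of what calculating timings per clock frequency involves, the sketch below converts minimum timings given in time units into whole clock cycles, rounding up. It is not the procedure actually used for these benchmarks. It assumes that the 1333H bin's 9-9-9 rating at a 1.5 ns clock period corresponds to 13.5 ns minimums for tAA (CAS latency), tRCD and tRP; the names and the example clocks are hypothetical. A real DRAM setup also has to handle many more parameters and may only pick CAS latencies that the chip supports.

```python
# Assumed minimums for the 1333H speed bin: 9 cycles * 1.5 ns = 13.5 ns.
# Stored in picoseconds so the arithmetic stays in exact integers.
SPEED_BIN_1333H_PS = {"tAA (CL)": 13500, "tRCD": 13500, "tRP": 13500}


def timing_cycles(min_time_ps, dram_clock_mhz):
    """Smallest whole number of clock cycles covering min_time_ps."""
    # cycles = ceil(min_time / tCK), where tCK in ps = 10**6 / clock in MHz
    return -(-(min_time_ps * dram_clock_mhz) // 10**6)


# Example clocks are illustrative, not the exact set used in the benchmarks.
for clock_mhz in (400, 480, 533):
    cycles = {name: timing_cycles(ps, clock_mhz)
              for name, ps in SPEED_BIN_1333H_PS.items()}
    print(clock_mhz, "MHz:", cycles)
```

Rounding up keeps every timing at or above its minimum, at the cost of a little extra latency whenever the clock period does not divide the minimum evenly.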

Please note that the DRAM clock speeds above 533 MHz (and MBUS above 300 MHz) may be considered overclocking if we trust the Allwinner manuals!

The non-cached read latency numbers from the table above should involve no TLB misses and exactly one DRAM access per read. They seem to fit the formula (12 * mbus_cycle_time + 95 ns) quite nicely. It might be that MBUS contributes 12 of its own cycles to the memory access latency.
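The empirical fit can be evaluated for any MBUS clock with a few lines of code. This is a minimal sketch: the function name is hypothetical and the clock values in the loop are illustrative inputs, not measurements. The constants 12 and 95 ns come straight from the formula above and only describe this particular set of results.

```python
def estimated_read_latency_ns(mbus_clock_mhz, mbus_cycles=12, fixed_ns=95.0):
    """Evaluate the empirical fit: mbus_cycles * mbus_cycle_time + fixed_ns."""
    mbus_cycle_time_ns = 1000.0 / mbus_clock_mhz  # period of one MBUS cycle
    return mbus_cycles * mbus_cycle_time_ns + fixed_ns


# Illustrative MBUS clocks; compare the output against the measured latencies.
for mbus_mhz in (200, 300, 400):
    latency = estimated_read_latency_ns(mbus_mhz)
    print(f"MBUS {mbus_mhz} MHz: about {latency:.1f} ns")
```

Under this fit, raising the MBUS clock only shrinks the 12-cycle term. The 95 ns part stays constant, so the latency gain from each further MBUS clock increase gets smaller.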