CedarX/Reverse Engineering

= Initial information =

CedarX is a hardware accelerator for media files. It contains sub-engines, each used for one class of media types:


 * MPEG Engine
   * MPEG1
   * MPEG2
   * MPEG4
   * MS-MPEG
   * VP6
   * MJPEG (JPEG)
   * XVID/DIVX (MPEG 3.11)
 * H264 Engine
 * VC1 Engine
 * RMVB Engine
 * ISP Engine
 * AVC Encoder Engine

On June 15 2012 Iain Bullard started reverse engineering the proprietary libraries.

Unfortunately, the kernel driver is only a proxy that maps the hardware registers to userspace using mmap; all the real logic is contained in libvecore.so.


 * open_cdxalloc as a free reimplementation of Allwinner's libcedarxalloc.a.
 * CedarXWrapper as an LD_PRELOADed wrapper to help understand the proprietary libraries.
 * CedarXPlayerTest as a basic player to use when testing.
 * ReCedro (on Gitorious) has tools similar to those from IanB above, but with a different angle; it works really well.

= Object file observations =

While Android and Linux are different beasts from a userspace point of view, the code may have been written in such a way that it compiles for both targets. That would mean the object files could be similar enough to compare.

From android
The android-linux libvecore.a static library (md5sum 1c347a9ad3072ce3288bd6dba625b2a4) contains the files listed under "android functions".

From linux-armhf
The linux-armhf libvecore.so shared object (md5sum a026d27307e5204db191878651cc6394) contains the functions listed under "linux-armhf functions".

The rest of the bits are all open source; see the linux-sunxi GitHub. The exception is libcedarxalloc.a, but as mentioned above, we have open_cdxalloc.

Function references
So far, the following references can easily be observed with readelf -W -s. This is just an indication of some functions and is by no means complete, as a full list would take far too long and is not really needed.

FFmpeg huffman tree builder: ff_huff_build_tree http://ffmpeg.org/doxygen/trunk/huffman_8c.html

libjpeg: get_soi http://sourceforge.net/p/libjpeg-turbo/code/HEAD/tree/trunk/jdmarker.c

libvp62: VP62_InitCoeffScaleFactors http://en.verysource.com/code/5378534_1/libvp62.h.html

H264/AVC Reference encoder/decoder: remove_frame_from_dpb http://iphome.hhi.de/suehring/tml/doc/lenc/html/mbuffer_8c.html#901bd781eb9aef8b79e98b8e10fbc2aa

VC1 Reference decoder: vc1_eResult vc1DECPIC_UnpackInterlaceMVModeParams http://wiki.multimedia.cx/index.php?title=Understanding_VC-1#vc1DECPIC_UnpackInterlaceMVModeParams

MPEG2: Some functions seem to have been renamed; the one Google found was ParseQuantMatrixExtension http://sources.team-mediaportal.com/svn/public/tags/Release%201.0.2/DirectShowFilters/TsReader/source/MpegPesParser.cpp

= Memory Buffers =

CedarX requires linear (physically contiguous) memory for the decoding process.

MPEG-Engine Used Buffers
All buffers contain previous or final images in YCbCr. The MPEG engine has separate registers for the Y and C components (the C buffer, holding Cb+Cr, has the same size as the Y buffer).

Reconstruct buffer: contains the finished frame from the previous step

Forward buffer: contains space for the new frame

BACK buffer: not used in my MP4 playback

ROT (Rotate-Scale buffer): used when the frame needs to be rotated before display (??)

 >>>>MEM ADDR>>>>
 1) (FOR)->(REC)          (ROT)
 2)        (FOR)->(REC)   (ROT)
 3)               (FOR)->

= Driver IOCTL guide =

The blob mostly uses MMIO access, but CedarX must first be clock-gated on and the supporting PLLs must be configured.

CORE IOCTL
IOCTL_GET_ENV_INFO = 0x101

returns some configuration info, such as the reserved memory address for Cedar

IOCTL_WAIT_VE = 0x102

IOCTL_RESET_VE = 0x103

resets the CedarX engine

IOCTL_ENABLE_VE = 0x104

starts the base clocks for CedarX

IOCTL_DISABLE_VE = 0x105

disables the base clocks for CedarX

IOCTL_SET_VE_FREQ = 0x106

configures the CedarX PLLs

AVS2 IOCTL
IOCTL_CONFIG_AVS2 = 0x200

IOCTL_GETVALUE_AVS2 = 0x201

IOCTL_PAUSE_AVS2 = 0x202

IOCTL_START_AVS2 = 0x203

IOCTL_RESET_AVS2 = 0x204

IOCTL_ADJUST_AVS2 = 0x205

ENGINE IOCTL
IOCTL_ENGINE_REQ = 0x206

counts references to the Cedar hardware and, more importantly, starts some clocks required for Cedar init

IOCTL_ENGINE_REL = 0x207

decrements the reference count

IOCTL_ENGINE_CHECK_DELAY = 0x208

IOCTL_GET_IC_VER = 0x209

IOCTL_ADJUST_AVS2_ABS = 0x20a

IOCTL_FLUSH_CACHE = 0x20b

invalidates the CPU cache for Cedar's internal DMA
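For your own userspace code, or for logging in an LD_PRELOAD wrapper, the request numbers above can be collected into a C enum. This is a sketch under one assumption: the numbers are passed to ioctl() as-is, without _IO()/_IOW() encoding. The cedar_ioctl_group() helper is our own hypothetical addition that just mirrors the CORE/AVS2/ENGINE grouping on this page.

```c
/* IOCTL request numbers as listed on this page.
 * Assumption: passed to ioctl() directly, no _IO()/_IOW() encoding. */
enum cedar_ioctl {
    /* CORE */
    IOCTL_GET_ENV_INFO       = 0x101,
    IOCTL_WAIT_VE            = 0x102,
    IOCTL_RESET_VE           = 0x103,
    IOCTL_ENABLE_VE          = 0x104,
    IOCTL_DISABLE_VE         = 0x105,
    IOCTL_SET_VE_FREQ        = 0x106,
    /* AVS2 */
    IOCTL_CONFIG_AVS2        = 0x200,
    IOCTL_GETVALUE_AVS2      = 0x201,
    IOCTL_PAUSE_AVS2         = 0x202,
    IOCTL_START_AVS2         = 0x203,
    IOCTL_RESET_AVS2         = 0x204,
    IOCTL_ADJUST_AVS2        = 0x205,
    /* ENGINE */
    IOCTL_ENGINE_REQ         = 0x206,
    IOCTL_ENGINE_REL         = 0x207,
    IOCTL_ENGINE_CHECK_DELAY = 0x208,
    IOCTL_GET_IC_VER         = 0x209,
    IOCTL_ADJUST_AVS2_ABS    = 0x20a,
    IOCTL_FLUSH_CACHE        = 0x20b,
};

/* Hypothetical helper: map a request number to the group it is listed
 * under on this page (useful when logging intercepted ioctl() calls). */
static const char *cedar_ioctl_group(unsigned long req)
{
    if (req >= 0x101 && req <= 0x106)
        return "core";
    if (req >= 0x200 && req <= 0x205)
        return "avs2";
    if (req >= 0x206 && req <= 0x20b)
        return "engine";
    return "unknown";
}
```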

= HW Registers guide =

REGS_BASE = 0x01C00000 (A10 I/O register base address)

MACC_REGS_BASE = (REGS_BASE + 0x0E000), the media accelerator (VE) I/O space (4 KiB)
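The physical addresses in this guide, including the per-engine bases listed further down, can be collected as C constants. The values are the ones given on this page; the macro names AVC_REGS_OFF/AVC_REGS_BASE and the *_IRQ_EN_OFF names are our own choice.

```c
/* Physical addresses as given on this page (A10/A13). */
#define REGS_BASE       0x01C00000u             /* A10 I/O register base  */
#define MACC_REGS_BASE  (REGS_BASE + 0x0E000u)  /* VE I/O space           */
#define MACC_REGS_SIZE  0x1000u                 /* 4 KiB                  */

/* Engine register blocks, relative to MACC_REGS_BASE. */
#define MPEG_REGS_OFF   0x100u
#define H264_REGS_OFF   0x200u
#define VC1_REGS_OFF    0x300u
#define RMVB_REGS_OFF   0x400u
#define ISP_REGS_OFF    0xa00u
#define AVC_REGS_OFF    0xb00u                  /* AVC encoder (our name) */

#define MPEG_REGS_BASE  (MACC_REGS_BASE + MPEG_REGS_OFF)
#define H264_REGS_BASE  (MACC_REGS_BASE + H264_REGS_OFF)
#define VC1_REGS_BASE   (MACC_REGS_BASE + VC1_REGS_OFF)
#define RMVB_REGS_BASE  (MACC_REGS_BASE + RMVB_REGS_OFF)
#define ISP_REGS_BASE   (MACC_REGS_BASE + ISP_REGS_OFF)
#define AVC_REGS_BASE   (MACC_REGS_BASE + AVC_REGS_OFF)

/* Interrupt enable register offsets inside each engine block. */
#define H264_IRQ_EN_OFF 0x20u
#define VC1_IRQ_EN_OFF  0x24u
#define RMVB_IRQ_EN_OFF 0x14u
#define ISP_IRQ_EN_OFF  0x08u
#define AVC_IRQ_EN_OFF  0x08u
```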

Reset/Clock register
MACC_REGS_BASE + 0x00

The reset logic is not the same across all Cedar revisions.

For the case of a 0x1625 (A13):

Default: 0x00000007; after writing 0 it reads 0x00000000; after writing ~0 it reads 0x1333030f

VE Ready register
MACC_REGS_BASE + 0x1c

reads 0 when ready

reads 0x3f00 when not ready
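A minimal polling sketch for the VE Ready register, assuming the register block is already mapped and that "ready" means the whole register reads 0, as observed above. The function name and the poll limit are hypothetical; a real implementation would probably sleep or back off between reads instead of spinning.

```c
#include <stdint.h>

#define VE_READY_OFF 0x1c   /* MACC_REGS_BASE + 0x1c */

/* Poll the VE Ready register until it reads 0 (ready) or max_polls reads
 * have been made (0x3f00 was observed while not ready).
 * macc points at the mapped MACC_REGS_BASE.
 * Returns 0 when ready, -1 on timeout. */
static int ve_wait_ready(volatile const uint8_t *macc, unsigned max_polls)
{
    volatile const uint32_t *reg =
        (volatile const uint32_t *)(macc + VE_READY_OFF);

    while (max_polls--) {
        if (*reg == 0)
            return 0;
    }
    return -1;
}
```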

VE Revision register
MACC_REGS_BASE + 0xF2

Can be read after the IOCTL init sequence.

Contains the SoC ID as the VE version.

Possible cases:

0x1625 - a13

0x1623 - a10

0x1620 - ???

0x1619 - ???
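Offset 0xF2 is the upper halfword of the 32-bit word at 0xF0. The sketch below assumes a little-endian ARM core and therefore does an aligned 32-bit read at 0xF0 and shifts right by 16, which should give the same value as a 16-bit read at 0xF2. The function names are hypothetical; the ID-to-SoC mapping is exactly the list above.

```c
#include <stdint.h>

/* Read the VE revision: upper 16 bits of the word at MACC_REGS_BASE + 0xF0,
 * i.e. the halfword at +0xF2 on a little-endian CPU (assumption). */
static uint16_t ve_get_version(volatile const uint8_t *macc)
{
    volatile const uint32_t *reg = (volatile const uint32_t *)(macc + 0xF0);
    return (uint16_t)(*reg >> 16);
}

/* Map the IDs seen so far to a SoC name. */
static const char *ve_version_name(uint16_t ver)
{
    switch (ver) {
    case 0x1625: return "A13";
    case 0x1623: return "A10";
    case 0x1620:                  /* seen, SoC not identified yet */
    case 0x1619: return "???";
    default:     return "unknown";
    }
}
```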

MPEG Engine
Base address

MPEG_REGS_BASE = (MACC_REGS_BASE + 0x100)

Media File Header Register(mphr)
MPEG_REGS_BASE + 0x00

Video Object Plane Header Register (vophr)
MPEG_REGS_BASE + 0x04

Video file size(fsize)
MPEG_REGS_BASE + 0x08

contains the video frame width:height in word:word format

Frame Size Register
MPEG_REGS_BASE + 0x0c

contains the video frame size; for example, for a 320x240 media file this register must be set to 0x014000f0, which means Width (bits 31-16):Height (bits 15-0)

0x0140 = 320, 0x00f0 = 240
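Packing the Frame Size Register value is a shift and an OR. A tiny sketch (the function name is ours) that reproduces the 320x240 example:

```c
#include <stdint.h>

/* Value for MPEG_REGS_BASE + 0x0c: width in bits 31-16, height in bits 15-0. */
static uint32_t mpeg_frame_size(uint16_t width, uint16_t height)
{
    return ((uint32_t)width << 16) | height;
}
```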

Macro Block Address Register(mbaddr)
MPEG_REGS_BASE + 0x10

Control register(vectrl)
MPEG_REGS_BASE + 0x14

Contains the IRQ enable bit

VE Trigger Register(vetrigger)(??)
MPEG_REGS_BASE + 0x18

Status register (vestat)
MPEG_REGS_BASE + 0x1c

Busy status bits (bit meanings not confirmed):

 * bit 14 - mc free (maybe macrocell or motion compensation)
 * bit 13 - busy status
 * bit 12 - idct in empty (Inverse Discrete Cosine Transform)
 * bit 11 - iqis in empty (Inverse Quantization and Inverse Scan)
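A sketch of decoding the status register with the bit positions guessed above. It assumes the bit numbers are 0-based; since the meanings themselves are "not sure", treat the macro and field names as hypothetical labels rather than confirmed hardware semantics.

```c
#include <stdbool.h>
#include <stdint.h>

/* Guessed bits of the MPEG status register (vestat, MPEG_REGS_BASE + 0x1c). */
#define VESTAT_MC_FREE       (1u << 14)
#define VESTAT_BUSY          (1u << 13)
#define VESTAT_IDCT_IN_EMPTY (1u << 12)
#define VESTAT_IQIS_IN_EMPTY (1u << 11)

struct mpeg_status {
    bool mc_free;
    bool busy;
    bool idct_in_empty;
    bool iqis_in_empty;
};

/* Split a raw vestat value into the individual (unconfirmed) flags. */
static struct mpeg_status mpeg_decode_status(uint32_t vestat)
{
    struct mpeg_status s = {
        .mc_free       = (vestat & VESTAT_MC_FREE) != 0,
        .busy          = (vestat & VESTAT_BUSY) != 0,
        .idct_in_empty = (vestat & VESTAT_IDCT_IN_EMPTY) != 0,
        .iqis_in_empty = (vestat & VESTAT_IQIS_IN_EMPTY) != 0,
    };
    return s;
}
```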

VE ??(trbtrdfld)(??)
Temporal distance to the last B or P frame:

TRB = display_time(B) - display_time(I)

TRD = display_time(P) - display_time(I)

MPEG_REGS_BASE + 0x20

VE ??(trbtrdfrm)(??)
MPEG_REGS_BASE + 0x24

Variable-Length Decoding(VLD) Block Address Register (vldbaddr)
MPEG_REGS_BASE + 0x28

Variable-Length Decoder(VLD) Block Offset Register(vldboffset)
MPEG_REGS_BASE + 0x2c

Variable-Length Decoding(VLD) Length Register(vldlen)
MPEG_REGS_BASE + 0x30

Video Buffer Verifier(VBV) Address Register(vbvsize)
Contains the maximum VBV buffer address

MPEG_REGS_BASE + 0x34

Variable Length Decoder(VLD) Offset Register(vldoffset) or ??
Has a second usage

MPEG_REGS_BASE + 0x38

VLD length or ??(vldlen)(dcacaddr)(??)
Has a second usage

MPEG_REGS_BASE + 0x3c

Block Address Register(blkaddr)(??)
MPEG_REGS_BASE + 0x40

?? Address Register(??)(ncfaddr)
MPEG_REGS_BASE + 0x44

Reconstruct Buffer Luma Address Register (rec_yframaddr)
YCbCr color space Y component buffer

Contains the previous frame used by the decoder.

MPEG_REGS_BASE + 0x48

Reconstruct Buffer Chroma Address Register(rec_cframaddr)
Contains the previous frame used by the decoder.

YCbCr C (chroma) component

MPEG_REGS_BASE + 0x4c

Forward Buffer Luma Address Register(for_yframaddr)
Space for the Y component of the frame being decoded

MPEG_REGS_BASE + 0x50

Forward Buffer Chroma Address Register(for_cframaddr)
Space for the chroma (C) component of the frame being decoded

MPEG_REGS_BASE + 0x54

BACK Buffer Luma Address Register(back_yframaddr)
MPEG_REGS_BASE + 0x58

BACK Buffer Chroma Address Register(back_cframaddr)
MPEG_REGS_BASE + 0x5c

?? Register(??)(socx)
MPEG_REGS_BASE + 0x60

?? Register(??)(socy)
MPEG_REGS_BASE + 0x64

?? Register(??)(sol)
MPEG_REGS_BASE + 0x68

?? Register(??)(sdlx)
MPEG_REGS_BASE + 0x6c

?? Register(??)(sdly)
MPEG_REGS_BASE + 0x70

?? Register(??)(spriteshifter)
MPEG_REGS_BASE + 0x74

?? Register(??)(sdcx)
MPEG_REGS_BASE + 0x78

?? Register(??)(sdcy)
MPEG_REGS_BASE + 0x7c

Inverse Quantization Minimum Level Register(iqminput)
IQ minimum settings (video compression level) for MPEG decoding

For MJPEG decoding, the IQ table for the current frame must be loaded (pushed) into this register before the frame is decoded

MPEG_REGS_BASE + 0x80

Inverse Quantization Level Register(qcinput)
IQ settings (compression level)

MPEG_REGS_BASE + 0x84

MS-MPEG header(msmpeg4_pichdr)(??)
MPEG_REGS_BASE + 0x88

VP6 header(vp6_pichdr)(??)
MPEG_REGS_BASE + 0x8c

Inverse Quantization and Inverse Discrete Cosine Transform Input Register(iqidctinput)(??)
MPEG_REGS_BASE + 0x90

Macro Block Height Register(mbah)(??)
look like macro cell size reg

MPEG_REGS_BASE + 0x94

Macro Block Vector 1(mbv1)(??)
MPEG_REGS_BASE + 0x98

Macro Block Vector 2(mbv2)(??)
MPEG_REGS_BASE + 0x9c

Macro Block Vector 3(mbv3)(??)
MPEG_REGS_BASE + 0xa0

Macro Block Vector 4(mbv4)(??)
MPEG_REGS_BASE + 0xa4

Macro Block Vector 5(mbv5)(??)
MPEG_REGS_BASE + 0xa8

Macro Block Vector 6(mbv6)(??)
MPEG_REGS_BASE + 0xac

Macro Block Vector 7(mbv7)(??)
MPEG_REGS_BASE + 0xb0

Macro Block Vector 8(mbv8)(??)
MPEG_REGS_BASE + 0xb4

JPEG Decoder Control Register(jpeg_sdctl)
MPEG_REGS_BASE + 0xb8

JPEG MCU Register (jpeg_mcu)
MPEG_REGS_BASE + 0xbc

JPEG Reset Inverse Transform Matrices Register (jpeg_resint)
MPEG_REGS_BASE + 0xc0

Error Flag Register(errflag)
MPEG_REGS_BASE + 0xc4

?? (crtmb)
MPEG_REGS_BASE + 0xc8

Rotate-Scale Buffer Luma Address Register(rotf_yfrmaddr)
Result buffer for MJPEG decoder, Luma Component

MPEG_REGS_BASE + 0xcc

Rotate-Scale Buffer Chroma Address Register(rotf_cfrmaddr)
MPEG_REGS_BASE + 0xd0

Extra Functions Control Register(extra_func_ctrl)
Controls rotation, etc.

MPEG_REGS_BASE + 0xd4

JPEG MCU (Minimum Coded Unit) Start Address Register (Jpg_start_mcuco)
MPEG_REGS_BASE + 0xd8

JPEG MCU (Minimum Coded Unit) End Address Register (Jpg_end_mcuco)
MPEG_REGS_BASE + 0xdc

MJPEG/JPEG Huffman Table Reset Register
Before loading a new Huffman table for MJPEG decoding, this register must be set to 0 to clear the old Huffman table.

MPEG_REGS_BASE + 0xe0

JPEG Huffman Table Load Register
The MJPEG decoder pushes dwords to it in a batch; it looks like the values are automatically moved to shadow registers.

MPEG_REGS_BASE + 0xe4

H264 Engine
Base address

H264_REGS_BASE = (MACC_REGS_BASE + 0x200)

Interrupt enable register = H264_REGS_BASE + 0x20

Bitmask 0x0111 enables/disables IRQs from the H264 decoder

VC1 Engine
Base address

VC1_REGS_BASE = (MACC_REGS_BASE + 0x300)

Interrupt enable register = VC1_REGS_BASE + 0x24

RMVB Engine
Base address

RMVB_REGS_BASE = (MACC_REGS_BASE + 0x400)

Interrupt enable register = RMVB_REGS_BASE + 0x14

ISP ??
Base address

ISP_REGS_BASE = (MACC_REGS_BASE + 0xa00)

Interrupt enable register = ISP_REGS_BASE + 0x8

AVC Encoder engine
Base address

AVC_REGS_BASE = (MACC_REGS_BASE + 0xb00)

Interrupt enable register = AVC_REGS_BASE + 0x8

= Decoding processes =

This part describes, step by step, how the CedarX decoder must be used for each file type.

Kernel driver init procedure
This is required to bring the CedarX hardware registers into a workable state.

IOCTL_ENABLE_VE -> IOCTL_SET_VE_FREQ -> IOCTL_ENGINE_REQ -> IOCTL_RESET_VE

After this step, the userspace library must mmap /dev/cedar_dev to get direct access to the hardware registers.
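A sketch of the init sequence above followed by the mmap, using the device path from this page. Two things here are assumptions, not confirmed on this page: that the driver expects the physical MACC_REGS_BASE (0x01C0E000) as the mmap offset, and that IOCTL_SET_VE_FREQ takes a frequency in MHz (320 is just an example value). cedar_open()/cedar_close() are hypothetical names; check the kernel driver source before relying on either detail.

```c
#include <fcntl.h>
#include <stddef.h>
#include <stdint.h>
#include <sys/ioctl.h>
#include <sys/mman.h>
#include <unistd.h>

#define IOCTL_RESET_VE    0x103
#define IOCTL_ENABLE_VE   0x104
#define IOCTL_SET_VE_FREQ 0x106
#define IOCTL_ENGINE_REQ  0x206
#define IOCTL_ENGINE_REL  0x207

/* Assumptions (see text): mmap offset and frequency unit/value. */
#define CEDAR_MMAP_OFFSET 0x01C0E000
#define CEDAR_MMAP_SIZE   0x1000
#define CEDAR_VE_MHZ      320

/* ENABLE_VE -> SET_VE_FREQ -> ENGINE_REQ -> RESET_VE, then map the VE
 * registers. Returns the open fd and stores the mapping in *regs_out,
 * or returns -1 (leaving *regs_out untouched) on any failure. */
static int cedar_open(const char *path, volatile uint8_t **regs_out)
{
    int fd = open(path, O_RDWR);
    if (fd < 0)
        return -1;

    if (ioctl(fd, IOCTL_ENABLE_VE, 0UL) < 0 ||
        ioctl(fd, IOCTL_SET_VE_FREQ, (unsigned long)CEDAR_VE_MHZ) < 0 ||
        ioctl(fd, IOCTL_ENGINE_REQ, 0UL) < 0 ||
        ioctl(fd, IOCTL_RESET_VE, 0UL) < 0) {
        close(fd);
        return -1;
    }

    void *p = mmap(NULL, CEDAR_MMAP_SIZE, PROT_READ | PROT_WRITE,
                   MAP_SHARED, fd, CEDAR_MMAP_OFFSET);
    if (p == MAP_FAILED) {
        ioctl(fd, IOCTL_ENGINE_REL, 0UL);
        close(fd);
        return -1;
    }
    *regs_out = (volatile uint8_t *)p;
    return fd;
}

/* Undo cedar_open(): unmap, drop the engine reference, close the device. */
static void cedar_close(int fd, volatile uint8_t *regs)
{
    munmap((void *)regs, CEDAR_MMAP_SIZE);
    ioctl(fd, IOCTL_ENGINE_REL, 0UL);
    close(fd);
}
```

Usage would be something like `int fd = cedar_open("/dev/cedar_dev", &regs);`, after which the version check below can read through `regs`.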

After that, the CedarX version should be checked by reading the "VE Revision register", which contains the chip version:

0x1623 for a10, 0x1625 for a13

MPEG Engine reset/clock init procedure
Before the MPEG engine is used for MPEG/MJPEG/DIVX/MS-MPEG/VP6 files, it should be clocked and reset

TODO

MJPEG/JPEG Decoding process
MJPEG is simply a sequence of JPEG images.

!!!ALPHA VERSION!!!!

The MPEG engine can decode JPEG.

JPEG decoding process: (Huffman (VLD) decode) -> (Inverse Quantization (IQ)) -> (Inverse Discrete Cosine Transform (IDCT)) -> (YCbCr to RGB) (does disp do it???)

1) Do the driver IOCTL init sequence and set up the "Reset/Clock register". TODO: full description

2) Reset IQ Table [MPEG_BASE] <- 0xc0 (jpeg_resint)

3) Set the input format: [MPEG_BASE+0x1b] <- 0x3 | (format << 3) (0x1b is the top byte of the trigger register at +0x18, so this is presumably a byte write)

4) Parse the IQ table from the JPEG and load it into MPEG_BASE+0x80 (IQ Min Input register): [MPEG_BASE+0x80] <- TABLE. The table consists of TWO 8x8 matrices, the first for chroma, the second for luma. All 2 * 64 8-bit values are written to this register one after another (and maybe copied to VE SRAM).

5) Set the result buffer (Rotate-Scale buffer registers)

The addresses must be physical (in the reserved space) and relative to the DRAM start: [MPEG_BASE+0xcc] <- luma output buffer address, [MPEG_BASE+0xd0] <- chroma output buffer address (the rotf_yfrmaddr and rotf_cfrmaddr registers above). Data is output in 32x32 pixel blocks; according to the A13 manual, DEFE should be able to reorder and convert this.

6) Set the picture size in MCUs: [MPEG_BASE+0xb8] <- HEIGHT:WIDTH

Height in the upper bits (31:16), width in the lower bits (15:0), both counted from 0 (0 means one MCU)

7) (??) [MPEG_BASE+0xd4] <- 0 (extra functions control register)

8) Reset the Huffman table: [MPEG_BASE+0xe0] <- 0 (Huffman control register)

9) Parse the Huffman tables from the JPEG and load them into MPEG_BASE+0xe4 (Huffman table register): [MPEG_BASE+0xe4] <- TABLE. Cedar Huffman tables are 2 KiB of data written through this register. The first half contains the descriptions of the Huffman trees, the second half contains the data.

 +----------+----------+----------+----------+--------------------------------------------------+
 |  LumaDC  |  LumaAC  | ChromaDC | ChromaAC | Filled with zero (maybe more trees are possible) |
 | 64 bytes | 64 bytes | 64 bytes | 64 bytes |                   768 bytes                      |
 +----------+----------+----------+----------+--------------------------------------------------+
 |  Luma DC Data  |  Luma AC Data  | Chroma DC Data | Chroma AC Data |
 |   256 bytes    |   256 bytes    |   256 bytes    |   256 bytes    |
 +----------------+----------------+----------------+----------------+

Each 64-byte tree description has the following format:

 * first 16 halfwords: the first bit pattern used for data codes at the corresponding depth (or 0xffff if there is no more data)
 * next 16 bytes: offset into the data section for the corresponding depth
 * rest (16 bytes): filled with zero

The 256 byte data sections contain the codes in same format as in JPEG.
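A sketch of building one tree description plus its data section from a JPEG DHT table (the 16 BITS counts and the HUFFVAL symbols), using ordinary canonical Huffman code assignment. Assumptions, all ours: halfwords are stored little-endian, "0xffff if no more data" means every depth after the longest code length in use, and unused depths get offset 0. The slot order follows the layout diagram; the function and enum names are hypothetical. The finished 2 KiB buffer is what step 9 pushes through +0xe4.

```c
#include <stdint.h>
#include <string.h>

#define CEDAR_HUFF_SIZE 2048  /* 1 KiB tree descriptions + 1 KiB data */

/* Tree slots, in the order of the layout diagram. */
enum { HUFF_LUMA_DC = 0, HUFF_LUMA_AC = 1, HUFF_CHROMA_DC = 2, HUFF_CHROMA_AC = 3 };

/* bits[j] = number of codes of length j+1 (JPEG DHT "BITS"),
 * vals    = the symbols in DHT order ("HUFFVAL").
 * Returns 0 on success, -1 for a bad slot or an empty/oversized table. */
static int cedar_huff_fill(uint8_t buf[CEDAR_HUFF_SIZE], int slot,
                           const uint8_t bits[16], const uint8_t *vals)
{
    if (slot < 0 || slot > 3)
        return -1;

    uint8_t *desc = buf + slot * 64;           /* 64-byte tree description */
    uint8_t *data = buf + 1024 + slot * 256;   /* 256-byte data section    */
    unsigned total = 0, last = 0;

    for (unsigned j = 0; j < 16; j++) {
        total += bits[j];
        if (bits[j])
            last = j;                          /* longest used code length */
    }
    if (total == 0 || total > 256)
        return -1;

    unsigned code = 0, offset = 0;             /* canonical code assignment */
    for (unsigned j = 0; j < 16; j++) {
        uint16_t first = (j > last) ? 0xffff : (uint16_t)code;
        desc[2 * j]     = (uint8_t)(first & 0xff);  /* first code, depth j+1 */
        desc[2 * j + 1] = (uint8_t)(first >> 8);
        desc[32 + j]    = (uint8_t)((j > last) ? 0 : offset); /* data offset */
        offset += bits[j];
        code = (code + bits[j]) << 1;
    }
    memset(desc + 48, 0, 16);                  /* last 16 bytes are zero */
    memcpy(data, vals, total);
    return 0;
}
```

With the standard JPEG luma DC table (BITS 0,1,5,1,1,1,1,1,1,0,...), depth 3 starts at code 2 (binary 010) with data offset 1, depth 9 starts at code 510, and depths 10 to 16 get 0xffff.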

10) Set the VBV limit address (end of the reserved source buffer). It is used to raise an IRQ when more data is needed than was reserved in memory for the new part: [MPEG_BASE+0x34] <- SRC_BUFF + SRC_MAX_BUFF_SIZE - 1, usually 0x047fffff

11) Set the work mode in the Control register (unsure; looks more like enabling interrupts, W: not only): [MPEG_BASE+0x14] <- 0x0000007c

12) Set the SRC buffer parameters:

 * [MPEG_BASE+0x2c] <- offset in the SRC buffer in bits (frame offset when there are many frames)
 * [MPEG_BASE+0x30] <- VLD length in bits
 * [MPEG_BASE+0x28] <- (SRC address relative to DRAM start) | 0x70000000. How to access RAM above 256MB?

13) Start: [MPEG_BASE+0x18] <- 0xe (trigger start)

14) Wait for the IRQ (or detect the end somehow), then check the MPEG_BASE+0x1c register for completion (1st bit??? unsure here...)
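Steps 3 to 13 can be sketched as one register-programming function. This is not the blob's code: struct jpeg_job, wr32/wr8 and jpeg_program_and_start are hypothetical names, all addresses are assumed to be already physical and relative to the DRAM start, and step 2 is left out because its register/value is unclear above. Two further assumptions: step 13 is done as a byte write so it does not overwrite the format byte written at +0x1b in step 3, and each IQ value in step 4 is written as one plain register write. Waiting for completion (step 14) is left to the caller.

```c
#include <stdint.h>
#include <string.h>

/* Offsets relative to MPEG_REGS_BASE, as used in the steps above. */
#define MPEG_CTRL       0x14
#define MPEG_TRIGGER    0x18
#define MPEG_VLD_ADDR   0x28
#define MPEG_VLD_OFFSET 0x2c
#define MPEG_VLD_LEN    0x30
#define MPEG_VBV_END    0x34
#define MPEG_IQ_MIN     0x80
#define MPEG_JPEG_SIZE  0xb8
#define MPEG_ROT_LUMA   0xcc
#define MPEG_ROT_CHROMA 0xd0
#define MPEG_EXTRA_FUNC 0xd4
#define MPEG_HUFF_RESET 0xe0
#define MPEG_HUFF_DATA  0xe4

struct jpeg_job {                  /* hypothetical container for parsed data */
    uint8_t  format;               /* step 3, meaning not documented        */
    uint8_t  iq_chroma[64];        /* step 4: chroma matrix first...        */
    uint8_t  iq_luma[64];          /* ...then luma                          */
    uint8_t  huff[2048];           /* step 9: prebuilt 2 KiB Huffman table  */
    uint16_t mcu_w, mcu_h;         /* picture size in MCUs, both >= 1       */
    uint32_t luma_out, chroma_out; /* relative to DRAM start                */
    uint32_t src, src_size;        /* source buffer, relative to DRAM start */
    uint32_t bit_offset, bit_len;  /* this frame's position in the source   */
};

static void wr32(volatile uint8_t *m, unsigned off, uint32_t v)
{
    *(volatile uint32_t *)(m + off) = v;
}

static void wr8(volatile uint8_t *m, unsigned off, uint8_t v)
{
    m[off] = v;
}

/* mpeg points at the mapped MPEG_REGS_BASE. */
static void jpeg_program_and_start(volatile uint8_t *mpeg, const struct jpeg_job *j)
{
    wr8(mpeg, MPEG_TRIGGER + 3, (uint8_t)(0x3 | (j->format << 3)));    /* 3 */
    for (int i = 0; i < 64; i++)                                       /* 4 */
        wr32(mpeg, MPEG_IQ_MIN, j->iq_chroma[i]);
    for (int i = 0; i < 64; i++)
        wr32(mpeg, MPEG_IQ_MIN, j->iq_luma[i]);
    wr32(mpeg, MPEG_ROT_LUMA, j->luma_out);                            /* 5 */
    wr32(mpeg, MPEG_ROT_CHROMA, j->chroma_out);
    wr32(mpeg, MPEG_JPEG_SIZE,                                         /* 6 */
         ((uint32_t)(j->mcu_h - 1) << 16) | (uint32_t)(j->mcu_w - 1));
    wr32(mpeg, MPEG_EXTRA_FUNC, 0);                                    /* 7 */
    wr32(mpeg, MPEG_HUFF_RESET, 0);                                    /* 8 */
    for (int i = 0; i < 2048; i += 4) {                                /* 9 */
        uint32_t w;
        memcpy(&w, j->huff + i, 4);   /* host byte order; target is LE ARM */
        wr32(mpeg, MPEG_HUFF_DATA, w);
    }
    wr32(mpeg, MPEG_VBV_END, j->src + j->src_size - 1);                /* 10 */
    wr32(mpeg, MPEG_CTRL, 0x0000007c);                                 /* 11 */
    wr32(mpeg, MPEG_VLD_OFFSET, j->bit_offset);                        /* 12 */
    wr32(mpeg, MPEG_VLD_LEN, j->bit_len);
    wr32(mpeg, MPEG_VLD_ADDR, j->src | 0x70000000u);  /* below 256MB only */
    wr8(mpeg, MPEG_TRIGGER, 0x0e);                                     /* 13 */
}
```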

MPEG4 Decoding process
!!!ALPHA VERSION!!!!

MPEG decoding requires several operations:

1) Huffman decoding (VLD, Variable Length Decoding)

2) Inverse Quantization (IQ)

3) Inverse Discrete Cosine Transform (IDCT)

4) Inverse Scan (IS)

5) ....

The CedarX MPEG engine can do this in automatic or semi-automatic mode.

MPEG decoding procedure:

 * STREAM -> (DMUX), which splits it into motion data and textures
 * motion: (Motion Decoding) -> (Motion Compensation, using the Previous VOP) -> (VOP Reconstruction)
 * textures: (VLD) -> (IS) -> (Inverse AC and DC prediction) -> (IQ) -> (IDCT) -> (VOP Reconstruction)
 * the reconstructed VOP is fed back as the Previous VOP

Blob exported functions description
libve_open does the clock setup and initial configuration, and initializes the selected engine by calling the corresponding *_open

MJPEG Engine API
mjpeg_setup_anaglagh_transform

mjpeg_set_vbv - Video Buffering Verifier config (CBR/VBR select); calls the callbacks vbv_get_base_addr and vbv_get_size, which we have in source, and saves the info to an internal structure

mjpeg_set_parent - saves a pointer to an internal structure

mjpeg_set_minor_vbv - stub

mjpeg_reset - resets using ve_reset_hardware and an internal reset function that touches the clock/reset register

mjpeg_release - calls the callbacks fbm_release and ve_reset_hardware, which we have in source

mjpeg_open - initializes the MJPEG decoder; called by libve_open

mjpeg_io_control -

mjpeg_get_stream_info - uses a callback for memcpy

mjpeg_get_minor_fbm - stub

mjpeg_get_fbm_num - returns 1

mjpeg_get_fbm - returns an int value

mjpeg_flush - stub

mjpeg_decode - the general (big) decoding function

mjpeg_close_anaglagh_transform

mjpeg_close - calls the callbacks ve_reset_hardware and fbm_release, which we have in source

Other findings

 * Old and new kernel drivers both offer a way to directly access registers. It looks like this approach was chosen for quite a few functions, probably to hide the gory details in the library.