
CU partition mode decision for HEVC hardwired intra encoder using convolution neural network

Zhenyu Liu, Member, Xianyu Yu, Yuan Gao, Shaolin Chen, Xiangyang Ji, Dongsheng Wang. IEEE Transactions on Image Processing (TIP), 2016.

Date: 2016-11-01

We devise a fast algorithm based on a convolution neural network that removes at least two CU partition modes in each CTU from full rate-distortion optimization (RDO) processing, thereby reducing the encoder's hardware complexity.

The intensive computation of High Efficiency Video Coding (HEVC) poses challenges for the hardwired encoder in terms of hardware overhead and power dissipation. On the other hand, the constraints of hardwired encoder design seriously degrade the efficiency of software-oriented fast coding unit (CU) partition mode decision algorithms. A fast algorithm is regarded as VLSI friendly when it possesses the following properties. First, the maximum complexity of encoding a coding tree unit (CTU) is reduced. Second, the parallelism of the hardwired encoder is not degraded. Third, the processing engine of the fast algorithm has low hardware and power overhead. In this paper, we devise a fast algorithm based on a convolution neural network that removes at least two CU partition modes in each CTU from full rate-distortion optimization (RDO) processing, thereby reducing the encoder's hardware complexity. As our algorithm does not depend on the correlations among CU depths or spatially nearby CUs, it is friendly to parallel processing and does not disturb the rhythm of RDO pipelining. Experiments show that, on average, 61.1% of the intra encoding time was saved, whereas the Bjøntegaard-Delta bit-rate increase was 2.67%. Capitalizing on the optimal arithmetic representation, we developed a high-speed [714 MHz in the worst conditions (125 °C, 0.9 V)] and low-cost (42.5k gates) accelerator for our fast algorithm using TSMC 65-nm CMOS technology. One accelerator supports HD 1080p real-time encoding at 55 frames/s. The corresponding power dissipation was 16.2 mW at 714 MHz. Finally, our accelerator scales well: four accelerators fulfill the throughput requirements of UltraHD 4K at 55 frames/s.
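The scaling claim at the end of the abstract can be checked with simple arithmetic. The sketch below is not taken from the paper: it assumes HD 1080p means 1920x1080, UltraHD 4K means 3840x2160, and a 64x64 CTU, and the function names are made up for illustration. It counts CTUs per frame, derives how many accelerators a target format needs relative to one accelerator's stated 1080p at 55 frames/s capacity, and estimates the clock-cycle budget per CTU at 714 MHz.

```python
import math

# Assumptions for illustration only (not stated in the abstract):
CTU_SIZE = 64            # assumed CTU edge length in pixels
HD_1080P = (1920, 1080)  # assumed resolution for "HD1080p"
UHD_4K = (3840, 2160)    # assumed resolution for "UltraHD-4K"

# Figures quoted in the abstract:
CLOCK_HZ = 714e6         # worst-case accelerator clock
FPS = 55                 # real-time frame rate per accelerator at 1080p


def ctus_per_frame(width, height, ctu=CTU_SIZE):
    """Number of CTUs covering one frame, rounding partial CTUs up."""
    return math.ceil(width / ctu) * math.ceil(height / ctu)


def ctu_rate(resolution, fps=FPS):
    """CTUs that must be processed per second for real-time encoding."""
    return ctus_per_frame(*resolution) * fps


def accelerators_needed(resolution, fps=FPS):
    """Accelerators required, taking 1080p at 55 frames/s as one unit of capacity."""
    capacity = ctu_rate(HD_1080P, FPS)
    return math.ceil(ctu_rate(resolution, fps) / capacity)


def cycle_budget_per_ctu(resolution=HD_1080P, fps=FPS, clock_hz=CLOCK_HZ):
    """Average clock cycles available per CTU for a single accelerator."""
    return clock_hz / ctu_rate(resolution, fps)


if __name__ == "__main__":
    print("CTUs per 1080p frame:", ctus_per_frame(*HD_1080P))
    print("CTUs per 4K frame:", ctus_per_frame(*UHD_4K))
    print("Accelerators for 4K at 55 frames/s:", accelerators_needed(UHD_4K))
    print("Cycle budget per CTU at 1080p:", round(cycle_budget_per_ctu()))
```

Under these assumptions a 4K frame holds exactly four times as many CTUs as a 1080p frame, which is consistent with four accelerators meeting the 4K throughput requirement at the same frame rate.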
