GSoC 2018 - rav1e

This page summarizes my contributions to the rav1e project as part of Google Summer of Code 2018.
Organization: VideoLAN
Mentor: Nathan Egge

Proposal

The project information and original proposal description are available on the GSoC website.

Project

rav1e is an experimental AV1 encoder programmed in Rust. Its minimum viable product (MVP) version is planned for delivery in early Autumn 2018. As of August 2018, it implements a minimal subset of features, including intra frames and inter frames without motion compensation, square blocks up to 32x32 pixels, and a subset of prediction modes and transform types.

Contributions

My main contributions involved extending the rate-distortion optimization (RDO) loop to select appropriate partition sizes, prediction modes and transform types. In total, my contributions improved the Bjøntegaard-Delta rate (BD-rate) measured in quality tests by an estimated 15–20%. This figure is approximate because it changes whenever additional features are implemented in the encoder. Merged commits that I authored during the GSoC period are available here.
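As a rough illustration of what an RDO loop does, the sketch below evaluates every combination of a few prediction modes and transform types and keeps the one with the lowest rate-distortion cost J = D + λR. The `PredMode` and `TxType` enums, the `encode` closure and the toy numbers are hypothetical stand-ins, not rav1e's actual types or API.

```rust
// Hypothetical, heavily reduced candidate sets (not rav1e's real enums).
#[derive(Clone, Copy, Debug, PartialEq)]
enum PredMode { Dc, Vertical, Horizontal }

#[derive(Clone, Copy, Debug, PartialEq)]
enum TxType { Dct, Adst }

/// Rate-distortion cost J = D + lambda * R.
fn rd_cost(distortion: u64, rate_bits: u32, lambda: f64) -> f64 {
    distortion as f64 + lambda * rate_bits as f64
}

/// Exhaustively evaluate every (mode, transform) pair and keep the cheapest.
/// `encode` stands in for "predict, transform, quantize, reconstruct" and
/// returns (distortion, rate in bits) for one candidate.
fn best_mode_and_tx<F>(lambda: f64, mut encode: F) -> (PredMode, TxType, f64)
where
    F: FnMut(PredMode, TxType) -> (u64, u32),
{
    let modes = [PredMode::Dc, PredMode::Vertical, PredMode::Horizontal];
    let txs = [TxType::Dct, TxType::Adst];
    let mut best = (modes[0], txs[0], f64::INFINITY);
    for &mode in &modes {
        for &tx in &txs {
            let (d, r) = encode(mode, tx);
            let cost = rd_cost(d, r, lambda);
            if cost < best.2 {
                best = (mode, tx, cost);
            }
        }
    }
    best
}

fn main() {
    // Toy (distortion, bits) measurements per candidate, lambda = 2.0.
    let (mode, tx, cost) = best_mode_and_tx(2.0, |m, t| match (m, t) {
        (PredMode::Vertical, TxType::Adst) => (100, 20),
        (PredMode::Dc, _) => (150, 10),
        _ => (200, 30),
    });
    println!("{:?} {:?} {}", mode, tx, cost); // prints "Vertical Adst 140"
}
```

Cheaper-looking candidates such as DC (fewer bits) lose here because the distortion term dominates at this λ; a larger λ shifts the choice toward candidates that spend fewer bits.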

Two partitioning strategies were implemented. The first is a traditional exhaustive strategy, which breaks each partition down to its smallest possible size, codes each one, then recursively evaluates larger partitions and selects the optimal layout. The second is a faster method, which begins with large partition sizes and breaks them down if and only if the smaller partitions are more efficient.
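The two strategies can be contrasted with a minimal quadtree sketch. It assumes a hypothetical cost oracle that returns the RD cost of coding a square block unsplit, and considers only four-way splits down to a hypothetical 4x4 minimum (AV1 also defines non-square partition types, which are ignored here). The function names are illustrative, not rav1e's.

```rust
/// Smallest block size considered by this sketch (hypothetical).
const MIN_SIZE: usize = 4;

/// Exhaustive search: evaluates every level of the quadtree down to MIN_SIZE;
/// each node keeps the cheaper of "code whole" and "sum of the children's
/// best costs" (equivalent to a bottom-up evaluation).
fn exhaustive(cost: &dyn Fn(usize, usize, usize) -> f64, x: usize, y: usize, size: usize) -> f64 {
    let whole = cost(x, y, size);
    if size == MIN_SIZE {
        return whole;
    }
    let h = size / 2;
    let quads = [(x, y), (x + h, y), (x, y + h), (x + h, y + h)];
    let split = quads
        .iter()
        .map(|&(cx, cy)| exhaustive(cost, cx, cy, h))
        .sum::<f64>();
    whole.min(split)
}

/// Greedy top-down search: split only if the four quadrants, each coded
/// unsplit, beat the whole block; only then recurse into them.
fn greedy(cost: &dyn Fn(usize, usize, usize) -> f64, x: usize, y: usize, size: usize) -> f64 {
    let whole = cost(x, y, size);
    if size == MIN_SIZE {
        return whole;
    }
    let h = size / 2;
    let quads = [(x, y), (x + h, y), (x, y + h), (x + h, y + h)];
    let split = quads.iter().map(|&(cx, cy)| cost(cx, cy, h)).sum::<f64>();
    if split < whole {
        quads.iter().map(|&(cx, cy)| greedy(cost, cx, cy, h)).sum::<f64>()
    } else {
        whole
    }
}

fn main() {
    // Toy cost model (hypothetical): cost depends only on block size.
    let cost = |_x: usize, _y: usize, s: usize| -> f64 {
        match s {
            16 => 100.0,
            8 => 30.0,
            _ => 5.0,
        }
    };
    println!("exhaustive: {}", exhaustive(&cost, 0, 0, 16)); // 80
    println!("greedy:     {}", greedy(&cost, 0, 0, 16)); // 100
}
```

With this toy model the greedy search stops at 16x16 (cost 100) because four 8x8 blocks coded whole would cost 120, whereas the exhaustive search finds that splitting all the way to 4x4 costs 80. The greedy method skips most of the tree in exchange for occasionally missing such layouts.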

Prediction modes and transform types are tested exhaustively. My contributions in these areas included implementing the following prediction modes natively:

I also linked to the reference implementations of the following transform types:

In addition, I implemented unit tests for these prediction modes, as well as comparative tests measuring the performance of the native algorithms against the reference encoder's implementations.
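A comparative test of this kind can be sketched as follows for a 10-bit DC predictor. Both functions are illustrative: `dc_pred_reference` is a hypothetical stand-in for the reference encoder's C function (which in practice would be reached through FFI), and the harness only checks that the native output matches it on pseudo-random block edges. In the crate itself this would normally be a `#[test]` function; `main` is used here so the sketch runs standalone.

```rust
/// Native DC predictor (sketch): fills an n x n block with the rounded mean of
/// the n pixels above and the n pixels to the left. Assumes n is a power of
/// two and both edges are available.
fn dc_pred_native(above: &[u16], left: &[u16], n: usize, out: &mut [u16]) {
    let sum: u32 = above[..n].iter().chain(&left[..n]).map(|&p| p as u32).sum();
    let shift = (2 * n).trailing_zeros(); // log2(2n)
    let avg = ((sum + n as u32) >> shift) as u16;
    out[..n * n].fill(avg);
}

/// Hypothetical stand-in for the reference implementation, written with a
/// division instead of a shift.
fn dc_pred_reference(above: &[u16], left: &[u16], n: usize, out: &mut [u16]) {
    let mut sum = 0u32;
    for i in 0..n {
        sum += above[i] as u32 + left[i] as u32;
    }
    let count = 2 * n as u32;
    let avg = ((sum + count / 2) / count) as u16;
    for px in out.iter_mut().take(n * n) {
        *px = avg;
    }
}

/// Minimal LCG so the sketch needs no external crates.
fn lcg(state: &mut u64) -> u64 {
    *state = state
        .wrapping_mul(6364136223846793005)
        .wrapping_add(1442695040888963407);
    *state >> 33
}

/// Compare native and reference output on random 10-bit edges per block size.
fn compare_dc(iterations: usize) -> bool {
    let mut seed = 42u64;
    for &n in &[4usize, 8, 16, 32] {
        for _ in 0..iterations {
            let above: Vec<u16> = (0..n).map(|_| (lcg(&mut seed) & 0x3ff) as u16).collect();
            let left: Vec<u16> = (0..n).map(|_| (lcg(&mut seed) & 0x3ff) as u16).collect();
            let mut native = vec![0u16; n * n];
            let mut reference = vec![0u16; n * n];
            dc_pred_native(&above, &left, n, &mut native);
            dc_pred_reference(&above, &left, n, &mut reference);
            if native != reference {
                return false;
            }
        }
    }
    true
}

fn main() {
    println!("native matches reference: {}", compare_dc(100));
}
```

A performance comparison would follow the same structure, timing each function over the same inputs instead of (or in addition to) comparing their outputs.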

Finally, I have performed maintenance and support work on the project, including cleaning up outdated issues, contributing to documentation, performing code reviews, assisting users with technical issues, implementing support for building the project on Windows, implementing high bit depth video processing, as well as cleaning up and refactoring project code.

Other Initiatives

For four weeks, I worked on implementing an experimental neural network-based system, trained using the reference encoder, that aimed to predict the rate and distortion measures used in RDO in order to speed up its decision-making process.

Unfortunately, the lack of information on the network's architecture, training and implementation made it difficult to integrate into rav1e at such an early stage. In addition, the network architecture and coefficients changed multiple times during this work, indicating that the reference implementation was still experimental and may not have been expected to provide high-quality results.

The first working reimplementation of the system made exceedingly poor decisions. Since the reference encoder did not use this machine learning method by default, there was no baseline against which to compare the results of the reimplemented system. The project was therefore placed on indefinite hold.

This work is available here.

Subsequent Contributions

I have continued to contribute to the project on a volunteer basis after the GSoC period. Merged commits that I authored since the program completion are available here. I have also begun contributing to other VideoLAN projects, for which my activity is visible here.