Deep Residual Learning in the JPEG Transform Domain

These notes cover in detail the development of the algorithms required to implement the ResNet CNN architecture in the JPEG transform domain. The resulting architecture can perform inference and learning directly on JPEG-compressed images, which do not need to be decompressed before being fed into the network.

Although skipping decompression saves time up front, that is not considered the main contribution of the work. JPEG files are highly sparse, and CNNs mostly perform adds and multiplies, so many of those operations on a JPEG should be no-ops, which should greatly speed up processing throughout the network. Furthermore, sparse data can be stored in much less space than dense data, so this should permit larger batch sizes and therefore more accurate gradients, which may in turn improve the accuracy of the network. Finally, JPEG is by far the most popular image compression format due to its high compression ratio, so this method should find wide applicability. For example, the ImageNet dataset and challenge consist entirely of JPEG images.
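To make the sparsity claim concrete, the following is a minimal sketch of where the zeros come from. It is not code from the project repository. It assumes an orthonormal 8x8 DCT-II, a hypothetical uniform quantization step `q` (real JPEG uses a per-frequency quantization table), and a synthetic smooth block in place of real image data.

```python
import numpy as np


def dct_matrix(n: int = 8) -> np.ndarray:
    """Return the orthonormal DCT-II basis matrix of size n x n."""
    k = np.arange(n).reshape(-1, 1)  # frequency index (rows)
    i = np.arange(n).reshape(1, -1)  # sample index (columns)
    d = np.sqrt(2.0 / n) * np.cos(np.pi * (2 * i + 1) * k / (2 * n))
    d[0, :] = np.sqrt(1.0 / n)       # the DC row uses a different scale
    return d


D = dct_matrix(8)

# Synthetic smooth 8x8 block (a gentle gradient), level-shifted by 128
# so that it is roughly centred on zero, as JPEG does before the DCT.
x, y = np.meshgrid(np.arange(8), np.arange(8))
block = (100.0 + 4.0 * x + 2.0 * y) - 128.0

# 2D DCT of the block: transform the rows, then the columns.
coeffs = D @ block @ D.T

# Hypothetical uniform quantization step (an assumption for illustration).
q = 16.0
quantized = np.round(coeffs / q).astype(int)

nonzero = int(np.count_nonzero(quantized))
total = quantized.size
print(f"{nonzero} of {total} quantized coefficients are nonzero")
```

For smooth content like this, most of the energy lands in a few low-frequency coefficients and the rest quantize to zero. Any multiply or add whose operand is one of those zeros is a candidate for skipping.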

Contributions

A general method for CNN processing in the JPEG transform domain
A model conversion algorithm for pre-trained spatial domain models
Approximated Spatial Masking: An accurate approximation algorithm for computing piecewise linear functions on DCT coefficients (see "Approximated Spatial Masks and ReLu")
Half-Spatial Masking: A highly efficient algorithm for applying spatial domain masks to DCT coefficients (see "Approximated Spatial Masks and ReLu")
The DCT Mean-Variance Theorem (see "Batch Normalization")
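As a rough intuition for why the mean and variance of a block might be recoverable from its DCT coefficients, the sketch below checks a standard identity numerically. It is not the theorem as stated or proved in the "Batch Normalization" note. It assumes an orthonormal 8x8 DCT-II with no quantization, and the random block is made-up data. Under those assumptions the DC coefficient is a scaled mean, and by Parseval's identity the energy of the remaining (AC) coefficients gives the variance.

```python
import numpy as np


def dct_matrix(n: int = 8) -> np.ndarray:
    """Return the orthonormal DCT-II basis matrix of size n x n."""
    k = np.arange(n).reshape(-1, 1)  # frequency index (rows)
    i = np.arange(n).reshape(1, -1)  # sample index (columns)
    d = np.sqrt(2.0 / n) * np.cos(np.pi * (2 * i + 1) * k / (2 * n))
    d[0, :] = np.sqrt(1.0 / n)       # the DC row uses a different scale
    return d


n = 8
D = dct_matrix(n)

# Made-up pixel data for illustration only.
rng = np.random.default_rng(0)
block = rng.uniform(-128.0, 127.0, size=(n, n))

coeffs = D @ block @ D.T

# For an orthonormal n x n DCT, the DC coefficient equals n times the mean.
mean_from_dct = coeffs[0, 0] / n

# Parseval: sum(block**2) == sum(coeffs**2), so removing the DC energy
# leaves n*n times the (population) variance.
ac_energy = np.sum(coeffs ** 2) - coeffs[0, 0] ** 2
var_from_dct = ac_energy / block.size

print(np.isclose(mean_from_dct, block.mean()))
print(np.isclose(var_from_dct, block.var()))
```

Here `block.var()` is the population variance (NumPy's default `ddof=0`), which is the quantity the identity yields.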

Notes

The notes are separated by topic and the individual components of ResNet are developed in isolation.

Background

Understanding the DCT

Convolutions

Tensor Methods for Linear Pixel Manipulation

Nonlinearity and Utilities

End-to-End

Appendix

Proofs

Code

https://gitlab.com/Queuecumber/jpeg-domain-resnet

Paper

Ehrlich, Max, and Larry S. Davis. "Deep residual learning in the JPEG transform domain." In Proceedings of the IEEE International Conference on Computer Vision, pp. 3484-3493. 2019.

http://openaccess.thecvf.com/content_ICCV_2019/papers/Ehrlich_Deep_Residual_Learning_in_the_JPEG_Transform_Domain_ICCV_2019_paper.pdf