Chroma From Luma Intra Prediction for NETVC

Mozilla
331 E Evelyn Ave
Mountain View 94041
USA
negge@mozilla.com

Mozilla
331 E Evelyn Ave
Mountain View 94041
USA
luc@trud.ca

Xiph.Org Foundation
21 College Hill Road
Somerville, MA 1124
USA
b@rr-dav.id.au

Applications and Real-Time Area (art)
NETVC Working Group
Internet-Draft

Chroma from luma (CfL) prediction is a new and promising chroma-only intra
predictor that models chroma pixels as a linear function of the coincident
reconstructed luma pixels. In this document, we propose the CfL predictor
adopted in AOMedia Video 1 (AV1) to the NETVC working group. The proposed CfL
distinguishes itself from prior art not only by reducing decoder complexity,
but also by producing more accurate predictions. On average, CfL reduces the
BD-rate, when measured with CIEDE2000, by 5% for still images and 2% for video
sequences.

Still image and video compression is typically not performed using red, green,
and blue (RGB) color primaries, but rather with a color space that separates
luma from chroma. There are many reasons for this, notably that luma and chroma
are less correlated than the R, G, and B components, which favors compression,
and that the human visual system is less sensitive to chroma, allowing one to
reduce the resolution in the chromatic planes, a technique known as chroma
subsampling.

Another way to improve compression in still images and videos is to subtract a
predictor from the pixels. When this predictor is derived from previously
reconstructed information inside the current frame, it is referred to as an
intra prediction tool. In contrast, an inter prediction tool uses information
from previously reconstructed frames. For example, “DC” prediction is an intra
prediction tool that predicts the pixel values in a block by averaging the
values of neighboring pixels adjacent to the above and left borders of the
block.

Chroma from luma (CfL) prediction is a new and promising chroma-only intra
predictor that models chroma pixels as a linear function of the coincident
reconstructed luma pixels. It was proposed for the HEVC video coding
standard, but was ultimately rejected, as decoder-side model fitting
caused a considerable complexity increase.

More recently, CfL prediction was implemented in the Thor
codec as well as in the Daala codec. The inherent
conceptual differences in the Daala codec, when compared to HEVC, led to
multiple innovative contributions by Egge and Valin to CfL
prediction, most notably a frequency-domain implementation and the absence of
decoder model fitting.

As both Thor and Daala are part of the NETVC working group, a research
initiative was established regarding CfL, the results of which are presented
in this draft. The proposed CfL implementation not only builds on these
innovations, but does so in a way that is compatible with the more
conventional compression tools found in AOMedia Video 1 (AV1). The following
table details the key differences between LM Mode, Thor CfL, Daala CfL (the
previous version of this draft), and the proposed AV1 CfL:

   +-----------------------+---------+----------+-----------+---------+
   |                       | LM Mode | Thor CfL | Daala CfL | AV1 CfL |
   +-----------------------+---------+----------+-----------+---------+
   | Prediction Domain     | Spatial | Spatial  | Frequency | Spatial |
   | Bitstream Signaling   | No      | No       | Sign bit, | Signs + |
   |                       |         |          | PVQ Gain  | Index   |
   | Requires PVQ          | No      | No       | Yes       | No      |
   | Encoder Model Fitting | Yes     | Yes      | Via PVQ   | Search  |
   | Decoder Model Fitting | Yes     | Yes      | No        | No      |
   +-----------------------+---------+----------+-----------+---------+

This new
implementation is considerably different from its predecessors. Its key
contributions are:

   o  Parameter signaling, which avoids model fitting on the decoder and
      results in more precise predictions, as the chroma reference pixels
      are used for fitting (which is impossible when fitting on the
      decoder).

   o  Model fitting the “AC” contribution of the reconstructed luma
      pixels, which simplifies the model and allows for a more precise
      fit.

   o  Chroma “DC” prediction for the “DC” contribution, which requires no
      signaling and is more precise.

Each contribution is detailed below, followed by detailed results of the
compression gains of the proposed CfL prediction implementation in AV1.

As described in the introduction, CfL prediction models chroma pixels as a linear
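function of the coincident reconstructed luma pixels. More precisely, let L
be an M x N matrix of pixels in the luma plane; we define C to be the
chroma pixels spatially coincident to L. Since L is not available to the
decoder, the reconstructed luma pixels, L^r, corresponding to L are
used instead. The chroma pixel prediction, C^p, produced by CfL uses the
following linear equation:

   C^p = \alpha L^r + \beta

Some implementations of CfL, such as LM Mode and Thor CfL, determine the
linear model parameters alpha and beta using linear least-squares regression:

   \alpha = \frac{MN \sum_{i,j} L^r_{i,j} C_{i,j} - \sum_{i,j} L^r_{i,j} \sum_{i,j} C_{i,j}}{MN \sum_{i,j} (L^r_{i,j})^2 - \left( \sum_{i,j} L^r_{i,j} \right)^2}

   \beta = \frac{\sum_{i,j} C_{i,j} - \alpha \sum_{i,j} L^r_{i,j}}{MN}

We classify these as implicit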
implementations of CfL, since alpha and beta are not signaled in the bitstream,
but are implied from the bitstream. The main advantage of the implicit
implementation is the absence of signaling.

However, implicit implementations have numerous disadvantages. As mentioned
before, computing least squares considerably increases decoder complexity.
Another important disadvantage is that the chroma pixels, C, are not available
when computing least squares on the decoder. As such, prediction error
increases, since neighboring reconstructed chroma pixels must be used instead.

Prior work argues that the advantages of explicit signaling considerably
outweigh the signaling cost. Based on these findings, we propose a
hybrid approach that signals alpha and implies beta.

Egge and Valin demonstrate the merits of separating the “DC”
and “AC” contributions of the frequency domain CfL prediction. In the
pixel domain, the “AC” contribution of a block can be obtained by
subtracting its average from it.

An important advantage of the “AC” contribution is that it is zero-mean,
which results in significant simplifications to the least-squares model
parameter equations. More precisely, let L_AC be the zero-meaned
reconstructed luma pixels. Because

   \sum_{i,j} L_{AC,i,j} = 0,

substituting L^r by L_AC yields the following simplified model parameter
equations:

   \alpha_{AC} = \frac{\sum_{i,j} L_{AC,i,j} C_{i,j}}{\sum_{i,j} (L_{AC,i,j})^2}

   \beta_{AC} = \frac{\sum_{i,j} C_{i,j}}{MN}

We define the zero-mean chroma prediction, C_AC, like so:

   C_{AC} = \alpha_{AC} L_{AC} + \beta_{AC}

When computing the zero-mean reconstructed pixels, the resulting values are
stored using 1/8th-precision fixed-point values. This ensures that even with
12-bit integer pixels, the average can be stored in a 16-bit signed integer.

By combining the luma subsampling step with the average subtraction step, not
only do the equations simplify, but the subsampling divisions and the
corresponding rounding error are removed. The equation corresponding to the
combination of both steps simplifies to:

   L_{AC}(u,v) = \frac{8}{s_x s_y} \sum_{i=0}^{s_y-1} \sum_{j=0}^{s_x-1} L^r(s_x u + j, s_y v + i) - \left\lfloor \frac{8}{MN} \sum_{x,y} L^r(x,y) \right\rfloor

Note that this equation uses an integer division.

In the previous equation, sx and sy are the subsampling steps for the x and y
axes, respectively. The proposed CfL only supports 4:2:0, 4:2:2, 4:4:0, and
4:4:4 chroma subsamplings, for which:

   s_x \in \{1, 2\}, \quad s_y \in \{1, 2\}

Also, because both M and N are powers of two, M * N is also a power of two. It
follows that the previous integer divisions can be replaced by bit shift
operations.

Switching the linear model to use zero-mean reconstructed luma pixels also
changes beta_AC, to the extent that it now only depends on C. More precisely,
beta_AC is the average of the chroma pixels.

The chroma pixel average for a given block is not available in the decoder.
However, there already exists an intra prediction tool that predicts this
average. When applied to the chroma plane, the “DC” prediction predicts the
pixel values in a block by averaging the values of neighboring pixels adjacent
to the above and left borders of the block.

Concretely, the output of the chroma “DC” predictor can be injected into the
proposed CfL implementation as an approximation for beta_AC. The proposed CfL
prediction is expressed as follows:

   CfL(\alpha) = \alpha L_{AC} + DC

where DC is the output of the chroma “DC” predictor.

Signaling the scaling parameters allows encoder-only fitting of the linear
model. This reduces decoder complexity and results in a more precise
prediction, as the best scaling parameter can be determined based on the
reference chroma pixels, which are only available to the encoder. The scaling
parameters for both chromatic planes are jointly coded using the following
scheme.

First, we signal the joint sign of both scaling parameters. A sign is either
negative, zero, or positive. In the proposed scheme, signaling (zero, zero) is
not permitted, as it results in “DC” prediction. It follows that the joint sign
requires an eight-value symbol (3 x 3 = 9 sign combinations, minus (zero, zero)).

For each scaling parameter, a 16-value symbol is used to represent magnitudes
ranging from 1/8 to 2 in steps of 1/8. The entropy coding details are beyond
the scope of this document; however, it is important to note that a 16-value
symbol fully utilizes the capabilities of the multi-symbol entropy
encoder. Finally, scaling parameters are signaled only if they are
non-zero.

Signaling the scaling parameters fundamentally changes their selection. In
this context, the least-squares regression used by the implicit
implementations does not yield an RD-optimal solution, as it ignores the
trade-off between the rate and the distortion of the scaling parameters.

For the proposed CfL prediction, the scaling parameter is determined using the
same rate-distortion optimization mechanics as other coding tools and
parameters of AV1. Concretely, given a set of scaling parameters A, the
selected scaling parameter is the one that minimizes the trade-off between the
rate and the distortion:

   \alpha = \arg\min_{a \in A} \left( D(a) + \lambda R(a) \right)

In the previous equation, the distortion, D, is the sum of the squared error
between the reconstructed chroma pixels and the reference chroma pixels,
and the rate, R, is the number of bits required to encode the scaling
parameter and the residual coefficients. Finally, lambda is the weighting
coefficient between rate and distortion used by AV1.

To ensure a valid evaluation of coding efficiency gains, our testing
methodology conforms to that of “Video Codec Testing and Quality
Measurement”. All simulation parameters and a detailed sequence-by-sequence
breakdown for all the results presented in this document are available
online. Furthermore, the bitstreams generated in these simulations can be
retrieved and analyzed online.

The following tables show the average percent rate difference measured using
the Bjontegaard rate difference, also known as BD-rate. The
BD-rate is measured using the following objective metrics: PSNR,
PSNR-HVS, SSIM, CIEDE2000, and MS-SSIM. Of all the previous metrics, only
CIEDE2000 considers both luma and chroma planes. It is also important to note
that the distance measured by this metric is perceptually uniform.

As required by this testing methodology for individual feature changes in
libaom, we use quantizers 20, 32, 43, and 55. We present results for three
test sets: Objective-1-fast, Subset1, and Twitch.

In the following table, we present the results for the Subset1 test
set. This test set contains still images, which are ideal to
evaluate the chroma intra prediction gains of CfL when compared to other intra
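prediction tools in AV1.

   +---------+-------+---------+---------+----------+-------+---------+-----------+
   |         |  PSNR | PSNR Cb | PSNR Cr | PSNR-HVS |  SSIM | MS-SSIM | CIEDE2000 |
   +---------+-------+---------+---------+----------+-------+---------+-----------+
   | Average | -0.53 |  -12.87 |  -10.75 |    -0.31 | -0.34 |   -0.34 |     -4.87 |
   +---------+-------+---------+---------+----------+-------+---------+-----------+

For still images, when compared to all of the other intra prediction tools of
AV1 combined, CfL prediction reduces the rate by an average of 5% for the same
level of visual quality, as measured by CIEDE2000.

For video sequences, the next table breaks down the results obtained over the
Objective-1-fast test set:

   +---------+-------+---------+---------+----------+-------+---------+-----------+
   |         |  PSNR | PSNR Cb | PSNR Cr | PSNR-HVS |  SSIM | MS-SSIM | CIEDE2000 |
   +---------+-------+---------+---------+----------+-------+---------+-----------+
   | Average | -0.43 |   -5.85 |   -5.51 |    -0.42 | -0.38 |   -0.40 |     -2.41 |
   | 1080p   | -0.32 |   -6.80 |   -5.31 |    -0.37 | -0.28 |   -0.31 |     -2.52 |
   | 1080psc | -1.82 |  -17.76 |  -12.00 |    -1.72 | -1.71 |   -1.75 |     -8.22 |
   | 360p    | -0.15 |   -2.17 |   -6.45 |    -0.05 | -0.10 |   -0.04 |     -0.80 |
   | 720p    | -0.12 |   -1.08 |   -1.23 |    -0.11 | -0.07 |   -0.12 |     -0.52 |
   +---------+-------+---------+---------+----------+-------+---------+-----------+

Not only does CfL yield better intra frames, which produce a better reference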
for inter prediction tools, but it also improves chroma intra prediction in
inter frames. We observed that CfL is selected in inter frames when the
predicted content is not available in the reference frames. Overall, CfL
prediction reduces the rate of video sequences by an average of 2% for the
same level of visual quality, as measured by CIEDE2000.

The average rate reductions for 1080psc are considerably higher than those of
other types of content. This indicates that CfL prediction considerably
outperforms other AV1 predictors for screen content coding. As shown in the
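following table, the results on the Twitch test set, which
contains only gaming-based screen content, corroborate this finding.

   +---------+-------+---------+---------+----------+-------+---------+-----------+
   |         |  PSNR | PSNR Cb | PSNR Cr | PSNR-HVS |  SSIM | MS-SSIM | CIEDE2000 |
   +---------+-------+---------+---------+----------+-------+---------+-----------+
   | Average | -1.01 |  -15.58 |   -9.96 |    -0.93 | -0.90 |   -0.81 |     -5.74 |
   +---------+-------+---------+---------+----------+-------+---------+-----------+

Furthermore, individual sequences in the Twitch test set show considerable
gains. We present the results for Minecraft_10_120f (Mine), GTAV_0_120F (GTAV),
and Starcraft_10_120f (Star) in the following table. CfL prediction appears to
be particularly efficient for sequences of the game Minecraft: for
Minecraft_10_120f, it reduces the rate by an average of 20% for the same level
of visual quality, as measured by CIEDE2000.

   +---------+-------+---------+---------+----------+-------+---------+-----------+
   |         |  PSNR | PSNR Cb | PSNR Cr | PSNR-HVS |  SSIM | MS-SSIM | CIEDE2000 |
   +---------+-------+---------+---------+----------+-------+---------+-----------+
   | Mine    | -3.76 |  -31.44 |  -25.54 |    -3.13 | -3.68 |   -3.28 |    -20.69 |
   | GTAV    | -1.11 |  -15.39 |   -5.57 |    -1.11 | -1.01 |   -1.04 |     -5.88 |
   | Star    | -1.41 |   -6.18 |   -6.21 |    -1.43 | -1.38 |   -1.43 |     -4.15 |
   +---------+-------+---------+---------+----------+-------+---------+-----------+

In this document, we presented the chroma from luma prediction tool adopted in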
AV1 that we proposed for NETVC. This new implementation is considerably
different from its predecessors. Its key contributions are: parameter
signaling, model fitting the “AC” contribution of the reconstructed luma
pixels, and chroma “DC” prediction for the “DC” contribution. Not only do these
contributions reduce decoder complexity, but they also reduce prediction error,
resulting in a 5% average reduction in BD-rate, when measured with CIEDE2000,
for still images, and 2% for video sequences.

Possible improvements to CfL for AV2 include non-linear prediction models and
motion-compensated CfL.

   Video Processing and Communications
   Fundamentals of Multimedia
   New intra chroma prediction using inter-channel correlation
   CE6.a.4: Chroma intra prediction by reconstructed luma samples
   Predicting chroma from luma with frequency domain intra prediction
   Improved chroma prediction
   Daala: Building A Next-Generation Video Codec From Unconventional Technology
   AV1 Bitstream Analyzer
   Video Codec Testing and Quality Measurement
   Are We Compressed Yet?
   Calculation of average PSNR differences between RD-curves
   Two new full-reference quality metrics based on HVS
   Image Quality Assessment: From Error Visibility to Structural Similarity
   Multiscale structural similarity for image quality assessment
   Color Image Quality Assessment Based on CIEDE2000
   Test Sets
   Results of Chroma from Luma over the Twitch test set
   Results of Chroma from Luma over the Subset1 test set
   Results of Chroma from Luma over the Objective-1-fast test set
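The decoder-side CfL steps described in this document can be sketched in C. These steps are computing L_AC in 1/8th precision with the subsampling folded into an exact integer scale, forming CfL(alpha) = alpha L_AC + DC, and turning the jointly coded signs and 16-value magnitudes into scaling parameters. This is a minimal, non-normative sketch assuming uint16_t pixel buffers, power-of-two block dimensions, and sx, sy in {1, 2}. The function names (cfl_luma_ac_q3, cfl_predict, cfl_joint_sign, cfl_alpha_q3), the rounding of the alpha * L_AC product, the ordering of the eight joint-sign values, and the mapping of magnitude indices 0..15 to 1/8..2 are illustrative assumptions, not the libaom implementation.

```c
#include <stdint.h>

/* Hypothetical helper: log2 of a power of two. */
static int log2_pow2(int n) {
  int k = 0;
  while ((1 << k) < n) k++;
  return k;
}

/* Computes L_AC in 1/8th precision for a luma_w x luma_h block of
 * reconstructed luma with subsampling steps sx, sy in {1, 2}.
 * ac receives (luma_w / sx) * (luma_h / sy) values. */
void cfl_luma_ac_q3(const uint16_t *luma, int stride, int luma_w, int luma_h,
                    int sx, int sy, int16_t *ac) {
  const int w = luma_w / sx, h = luma_h / sy;
  const int scale = 8 / (sx * sy); /* 8, 4 or 2: exact, no rounding */
  int32_t sum = 0;
  for (int v = 0; v < h; v++) {
    for (int u = 0; u < w; u++) {
      int32_t acc = 0;
      for (int i = 0; i < sy; i++)
        for (int j = 0; j < sx; j++)
          acc += luma[(sy * v + i) * stride + sx * u + j];
      ac[v * w + u] = (int16_t)(acc * scale);
      sum += acc * scale;
    }
  }
  /* w * h is a power of two, so the integer division is a right shift. */
  const int32_t avg_q3 = sum >> log2_pow2(w * h);
  for (int k = 0; k < w * h; k++) ac[k] = (int16_t)(ac[k] - avg_q3);
}

/* Rounds a value with 6 fractional bits to the nearest integer,
 * rounding halves away from zero. */
static int round_q6(int32_t v) {
  return v >= 0 ? (int)((v + 32) >> 6) : -(int)((-v + 32) >> 6);
}

/* CfL(alpha) = alpha * L_AC + DC. alpha_q3 is alpha in 1/8th units and
 * dc is the output of the chroma "DC" predictor. */
void cfl_predict(const int16_t *ac, int n, int alpha_q3, int dc, int bitdepth,
                 uint16_t *pred) {
  const int max_val = (1 << bitdepth) - 1;
  for (int k = 0; k < n; k++) {
    /* 1/8 units (alpha) times 1/8 units (L_AC) gives 1/64 units. */
    int p = dc + round_q6((int32_t)alpha_q3 * ac[k]);
    pred[k] = (uint16_t)(p < 0 ? 0 : (p > max_val ? max_val : p));
  }
}

enum { CFL_SIGN_ZERO = 0, CFL_SIGN_NEG = 1, CFL_SIGN_POS = 2 };

/* Maps (sign_u, sign_v) to one of eight joint-sign values (0..7).
 * (zero, zero) is not permitted, since it reduces to "DC" prediction;
 * -1 is returned for it. */
int cfl_joint_sign(int sign_u, int sign_v) {
  int js = sign_u * 3 + sign_v;
  return js == 0 ? -1 : js - 1;
}

/* Turns a sign and a 16-value magnitude index (0..15) into alpha in
 * 1/8th units, i.e. magnitudes 1/8 .. 2. */
int cfl_alpha_q3(int sign, int index) {
  if (sign == CFL_SIGN_ZERO) return 0; /* not signaled */
  return sign == CFL_SIGN_NEG ? -(index + 1) : index + 1;
}
```

For example, with 4:2:0 subsampling, a 4x4 luma block whose top two rows are 100 and bottom two rows are 200 yields L_AC = {-400, -400, 400, 400} in 1/8th units (its average, 150, becomes 1200). With alpha_q3 = 8 (alpha = 1) and a “DC” prediction of 150, the chroma prediction is {100, 100, 200, 200}. Because 8 / (sx * sy) is an integer for the supported subsamplings and the number of chroma positions is a power of two, the sketch needs only multiplications and shifts.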