There has thus been a need to use several criteria for encoding video sequences to meet the requirements of diverse systems, as shown in Figure 1 [ 2 ]. The multiple coding capability offered by SVC depends on the reconstruction of lower-resolution or lower-quality signals from partial bit streams, as shown in Figure 2 [ 4 , 5 ].
SVC enables the efficient incorporation of three types of scalability, namely, spatial, quality, and temporal scalability. The spatial and quality scalabilities can be realized by a layered approach, wherein one base layer (BL) encodes the lowest temporal, spatial, and quality representation of the video stream, and one or more enhancement layers (ELs) encode additional information.
Using the base layer as a starting point, temporal versions of the video, or versions with higher quality and resolution, can be reconstructed during the decoding process [ 6 ]. In SVC, the block size used for motion estimation, which reduces the temporal redundancy between frames, can be varied.
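The block-matching idea behind such motion estimation can be sketched minimally with a sum-of-absolute-differences (SAD) search over a small window; this is a generic illustration, not the SVC reference search:

```python
import numpy as np

def sad(block_a, block_b):
    """Sum of absolute differences between two equally sized blocks."""
    return int(np.abs(block_a.astype(np.int32) - block_b.astype(np.int32)).sum())

def best_match(ref, cur, y, x, size, search=4):
    """Find the motion vector (dy, dx) into `ref` that best predicts the
    size x size block of `cur` at (y, x), within +/- `search` pixels."""
    block = cur[y:y + size, x:x + size]
    best = (0, 0)
    best_cost = sad(block, ref[y:y + size, x:x + size])
    h, w = ref.shape
    for dy in range(-search, search + 1):
        for dx in range(-search, search + 1):
            yy, xx = y + dy, x + dx
            # Stay inside the reference frame.
            if 0 <= yy and yy + size <= h and 0 <= xx and xx + size <= w:
                cost = sad(block, ref[yy:yy + size, xx:xx + size])
                if cost < best_cost:
                    best, best_cost = (dy, dx), cost
    return best, best_cost
```

Varying `size` here mirrors the variable block sizes mentioned above: smaller blocks track complex motion more precisely at the cost of more searches.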
To ensure efficient video coding, additional processes are required, including interlayer prediction. The main objective of interlayer prediction is the reuse of the motion and texture information of the BL at the EL. Interlayer prediction can be done using any of three methods, namely, interlayer motion prediction, interlayer residue prediction, and interlayer intraprediction from a lower layer.
While all three methods enhance the coding efficiency, they increase the computational complexity of video encoding, making it difficult to realize an SVC encoder in real applications [ 10 ].
Several fast mode decision schemes have been proposed to reduce this computational complexity, including algorithms by Shen et al., Kim et al., Li et al., Wang et al., and Dawei et al. These algorithms reduce the number of candidate modes, for example by calculating the mean peak signal-to-noise ratio (PSNR) of all the frames at a given temporal level for a specific mode. The present paper introduces four algorithms based on the sending of only partial EL data, thus reducing the number of transmitted bits.
They also shorten the transmission time (TT) across the network and reduce the amount of required EL encoding, thereby also shortening the encoding time (ET). The rest of this paper is organized as follows. Section 2 presents an overview of the different types of scalabilities and their underlying concepts. The proposed fast mode decision is introduced in Section 3, while Section 4 describes the experiments performed using the proposed algorithms and presents and discusses the results.
Section 5 concludes the paper with a brief summary of the findings of the study. There are three types of scalability in video coding, namely, spatial scalability, quality scalability, and temporal scalability. Figure 3 shows the differences among the three types.
Spatial scalability involves the coding of a video using multiple spatial resolutions. As shown in Figure 4 , the data decoded at lower resolutions can be used to predict the data of higher resolutions to reduce the bit rate.
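The idea of predicting higher-resolution data from decoded lower-resolution data can be sketched as follows; nearest-neighbour 2x resampling is used here as an assumed, simplified stand-in for the codec's actual filters:

```python
import numpy as np

def downsample(frame):
    # BL produced by 2x decimation of the EL frame (simple sketch).
    return frame[::2, ::2]

def upsample(frame):
    # Nearest-neighbour 2x upsampling used for inter-layer prediction.
    return frame.repeat(2, axis=0).repeat(2, axis=1)

def encode_spatial_layers(el_frame):
    """Encode an EL frame as (BL, residual): the EL is predicted from the
    upsampled BL and only the prediction residual needs to be sent."""
    bl = downsample(el_frame)
    prediction = upsample(bl)
    residual = el_frame.astype(np.int16) - prediction.astype(np.int16)
    return bl, residual

def decode_el(bl, residual):
    """Reconstruct the EL frame from the BL and the transmitted residual."""
    return (upsample(bl).astype(np.int16) + residual).astype(np.uint8)
```

Because the residual is typically small and sparse, it compresses far better than the full high-resolution frame, which is where the bit-rate saving comes from.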
Quality scalability is considered a special case of spatial scalability because the generated stream can be used to predict and decode the video at different qualities, as shown in Figure 5 [ 17 ]. Conversely, in temporal scalability coding, structures containing bidirectional (B) and predictive (P) pictures in the BL are decoded as B and P frames, respectively.
The EL frames are predicted using the lower temporal layer frames as the reference frames. SVC uses the rate-distortion optimization (RDO) technique to select the best coding mode for each macroblock (MB), enabling the target bit rate to be achieved with minimum distortion [ 19 , 20 ]. The ZNCC video analysis method was used in this study to detect the motion type of the video [ 21 ], whether fast, medium, or slow motion.
It was one of the important factors considered in the trade-off among the benefits of the different proposed methods. The ZNCC equation is as follows:

ZNCC = [ Σ_i (H_t(i) − H̄_t)(H_{t+1}(i) − H̄_{t+1}) ] / sqrt( Σ_i (H_t(i) − H̄_t)² · Σ_i (H_{t+1}(i) − H̄_{t+1})² ),

where F_t and F_{t+1}, respectively, denote the video frames obtained at times t and t+1, H_t and H_{t+1} denote the corresponding frame histograms, and i is the i-th histogram bin.

Frame interpolation is one of the main concepts used in the development of the algorithms proposed in this work. The two most popular methods employed in frame interpolation algorithms are the Lagrangian and Eulerian methods.
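The ZNCC measure above, taken here to be the standard zero-mean normalised cross-correlation of frame histograms (an assumption, since the paper's exact rendering is not reproduced), can be sketched as:

```python
import numpy as np

def histogram(frame, bins=256):
    """Intensity histogram of an 8-bit frame."""
    return np.bincount(frame.ravel(), minlength=bins).astype(np.float64)

def zncc(frame_t, frame_t1, bins=256):
    """Zero-mean normalised cross-correlation of two frame histograms.
    Returns a value in [-1, 1]; values near 1 indicate little change
    between frames (slow motion), lower values indicate faster motion."""
    h1 = histogram(frame_t, bins)
    h2 = histogram(frame_t1, bins)
    a = h1 - h1.mean()
    b = h2 - h2.mean()
    denom = np.sqrt((a * a).sum() * (b * b).sum())
    return float((a * b).sum() / denom) if denom else 1.0
```

Thresholds on this score (themselves tuning choices) would then classify a sequence as fast, medium, or slow motion.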
Lagrangian methods are based on motion models [ 14 , 23 ], while Eulerian methods are based on color change per pixel over time [ 24 , 25 ]. The interpolation algorithms proposed in this paper may be considered to utilize an extension of Eulerian methods [ 26 ].
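A minimal Eulerian-style interpolation, here reduced to a per-pixel linear blend purely for illustration (the paper's actual algorithms operate on steerable-pyramid decompositions), can be sketched as:

```python
import numpy as np

def interpolate_frame(f1, f2, t=0.5):
    """Per-pixel linear blend between frames f1 and f2 at time t in (0, 1).
    This models colour change per pixel over time, the core Eulerian idea,
    without any motion estimation."""
    out = (1.0 - t) * f1.astype(np.float64) + t * f2.astype(np.float64)
    return np.clip(np.rint(out), 0, 255).astype(np.uint8)
```

The appeal of the Eulerian family is exactly this locality: each output pixel depends only on the same pixel in the neighbouring frames, with no motion vectors to compute.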
Considering input frames F1, F2, and Fout, and denoting the steerable pyramid decompositions of F1 and F2 by P1 and P2, the first step of the proposed interpolation algorithms is to calculate P1 and P2 [ 27 ].

There is a relationship between the frames in the different layers. The layers are all identical with the exception of certain parameters, which depend on the type of SVC.
This similarity can be used to minimize the number of selected macroblock modes. In the present study, only a limited number of macroblock modes were selected for testing for intra- and interprediction [ 29 ], resulting in reduced encoding time.
In the case of spatial SVC, the proposed approach assigns each EL macroblock the same mode as the corresponding BL macroblock, which is selected by exhaustive search [ 30 ]. The goal of the proposed algorithms is the reduction of the processing time, that is, the encoding time and the transmission time.
Two concepts are employed for this purpose. The first concept involves sending only a part of the EL frame data, with the missing frame data generated at the decoder by interpolation using the information of the surrounding frames, as mentioned in Section 2.
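This first concept can be sketched end to end: the sender keeps only part of the EL frames (odd-indexed frames, as an example policy), and the decoder fills each gap by averaging its neighbouring frames; the exact policy and interpolator here are illustrative assumptions:

```python
import numpy as np

def blend(f1, f2):
    """Average two 8-bit frames without overflow."""
    return ((f1.astype(np.uint16) + f2.astype(np.uint16)) // 2).astype(np.uint8)

def send_partial(el_frames):
    """Sender side: keep only the odd-indexed EL frames for transmission."""
    return {i: f for i, f in enumerate(el_frames) if i % 2 == 1}

def reconstruct(received, n_frames):
    """Decoder side: restore missing frames by interpolating neighbours."""
    frames = [None] * n_frames
    for i, f in received.items():
        frames[i] = f
    for i in range(n_frames):
        if frames[i] is None:
            left = frames[i - 1] if i > 0 else None
            right = frames[i + 1] if i + 1 < n_frames else None
            if left is not None and right is not None:
                frames[i] = blend(left, right)   # interpolate from both sides
            else:
                # Sequence edge: replicate the single available neighbour.
                frames[i] = left if left is not None else right
    return frames
```

Halving the transmitted EL frames roughly halves the EL bits and the EL encoding work, which is the source of the TT and ET savings claimed for the proposed algorithms.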
The second concept is based on the mode-distribution correlation between the BL and the EL, which is affected by the spatial scalability, as mentioned in Section 2. Using these two concepts, the four proposed algorithms enable significant shortening of both the TT and the ET. The first two steps of the four algorithms are the same. In the first step, the BL frames are encoded through an exhaustive search. In the second step, upsampling is used to scale the BL frames to the same size as the EL frames.
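The exhaustive search in the first step can be illustrated as a rate-distortion comparison over a candidate mode set; the candidate list and the Lagrange multiplier below are hypothetical placeholders, not values mandated by the standard:

```python
import numpy as np

LAMBDA = 0.85  # illustrative Lagrange multiplier (placeholder value)

def rd_cost(distortion, rate, lam=LAMBDA):
    """Lagrangian rate-distortion cost J = D + lambda * R."""
    return distortion + lam * rate

def exhaustive_mode_search(block, candidates):
    """Pick the candidate (mode_name, predicted_block, rate_in_bits) with
    the lowest rate-distortion cost, as an exhaustive RDO search would."""
    best_mode, best_cost = None, float("inf")
    for name, pred, rate in candidates:
        # Distortion as sum of squared differences against the prediction.
        d = float(((block.astype(np.int32) - pred.astype(np.int32)) ** 2).sum())
        c = rd_cost(d, rate)
        if c < best_cost:
            best_mode, best_cost = name, c
    return best_mode, best_cost
```

Checking every candidate in this way is what makes the exhaustive BL search expensive, and why the proposed algorithms avoid repeating it at the EL.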
The subsequent steps differ among the algorithms. As outlined in Figure 7, the first algorithm utilizes the interlayer residual concept for spatial SVC. To encode the EL frames, only the odd frames of the interlayer residuals are transmitted, after being encoded by exhaustive search to achieve a high quality.

Everything else in this specification is normative. This specification defines conformance criteria that apply to a single product: the user agent that implements the interfaces that it contains.
Conformance requirements phrased as algorithms or specific steps may be implemented in any manner, so long as the end result is equivalent.
In particular, the algorithms defined in this specification are intended to be easy to follow, and not intended to be performant. The term simulcast envelope refers to the maximum number of simulcast streams and the order of the encoding parameters. The term "S mode" refers to a scalability mode in which multiple encodings are sent on the same SSRC.
The configuration of SVC-capable codecs implemented in browsers fits within this restriction. Implementations of this specification utilize RTCError and OperationError in the prescribed manner when an invalid scalabilityMode value is provided to setParameters or addTransceiver. In this situation the scalabilityMode values configured in sendEncodings may not be supported by the eventually negotiated codec.
However, an error will result only if the requested scalabilityMode value is invalid for any supported codec. To determine whether the requested scalabilityMode values have been applied, an application can call the RTCRtpSender.getParameters() method. If the configuration is not satisfactory, the setParameters method can be used to change it.
When sendEncodings is used to request the sending of multiple simulcast streams using addTransceiver, it is not possible to configure the sending of "S" scalability modes.
As a rule, these streams comprise a basic stream and a secondary stream. The basic stream is transferred in standard quality, while the secondary stream is transferred in enhanced quality, for example with a higher frame rate or video resolution. SVC enables a video conferencing server to adjust the video stream to the varying characteristics of the endpoint terminals, such as CPU capability and bandwidth.
The server sets the specific stream type for each device: endpoints with high bandwidth decode the whole stream, while endpoints with low bandwidth or mobile devices (phones and tablets) receive only the basic stream with a lower data transfer rate. The image and video coding group contributed the following techniques to the SVC design and encoding concepts:

- The first SVC Model, which became the first Working Draft
- Hierarchical prediction structures for providing temporal scalability, generally improving the coding efficiency, and improving the effectiveness of inter-layer prediction tools in spatial and quality scalable coding
- Inter-layer prediction tools for spatial and quality scalable coding
- The concept of decoding enhancement layers with a single motion compensation loop as part of the inter-layer prediction design
- The key picture concept for efficiently controlling the drift in packet-based quality scalable coding
- The concept of transform coefficient partitioning for increasing the granularity of packet-based quality scalable coding
- A rate-distortion-optimized multi-layer encoder control for improving the coding efficiency of SVC encoders