A codec (COder/DECoder) is a device and/or software program that is typically used to convert analogue information (such as speech, image, audio or video content) into a digital stream for transmission, and then back to analogue again at the receiving end. At the same time, codecs are generally designed to compress the information to minimize the amount of bandwidth (bit rate) or storage space used. This might lead to some loss of quality in the transmitted information, so the design seeks to use minimal bandwidth and/or storage whilst preserving an acceptable quality in the information conveyed.
Specification of speech and audio codecs has been one of ETSI's (and 3GPP’s) many success stories.
Our Role & Activities
EVS codec Extension for Immersive Voice and Audio Services (IVAS codec)
The introduction of 4G/5G high-speed wireless access to telecommunications networks, combined with the availability of increasingly powerful hardware platforms, will enable advanced communications and multimedia services to be deployed more quickly and easily than ever before.
The 3GPP™ Enhanced Voice Services (EVS) codec has delivered a highly significant improvement in user experience with the introduction of super-wideband (SWB) and full-band (FB) speech and audio coding, together with improved packet loss resiliency. For a truly immersive experience though, extended audio bandwidth is just one of the dimensions required. Support beyond the mono and multi-mono currently offered by EVS is ideally required to immerse the user in a convincing virtual world in a resource-efficient manner.
In addition, the currently specified stereo codecs in 3GPP™, e.g. Enhanced aacPlus (eAAC+) and AMR-WB+, provide suitable quality and compression for stereo content in an adequate bit rate range but lack the conversational features (e.g. sufficiently low latency) needed for conversational voice and teleconferencing. These coders also lack multi-channel functionality that is necessary for immersive services, including e.g. live streaming, virtual reality (VR) and immersive teleconferencing.
The purpose of this codec is therefore to fill this technology gap and to address the increasing demand for rich multimedia services. In addition, teleconferencing applications over 4G/5G will benefit from this next generation codec used as an improved conversational coder supporting multi-stream coding (e.g. channel, object and scene-based audio). Use cases for this next generation codec include, but are not limited to, conversational voice, multi-stream teleconferencing, VR conversational and user generated live and non-live content streaming. The approach proposed is to build upon the EVS codec with the goal of developing a single codec with attractive features and performance (e.g. excellent audio quality, low delay, spatial audio coding support, appropriate range of bit rates, high-quality error resiliency, practical implementation complexity). In the scope of 3GPP™ the predominant audio rendering instrument is envisaged to be headphones but configurations with e.g. tablet speaker playback may also be of relevance.
Enhanced Voice Services (EVS) codec
The codec for Enhanced Voice Services (EVS), the successor of the current mobile high definition (HD) voice codec AMR-WB, was standardized by the 3rd Generation Partnership Project (3GPP™) in September 2014. The EVS codec addresses 3GPP's need for cutting-edge technology that lets 3GPP™ mobile communication systems operate as competitively as possible in terms of communication quality and efficiency.
The EVS codec enhances coding efficiency and quality for narrowband (NB) and wideband (WB) operation over a large bit rate range, starting from 5.9 kbps variable bit rate (VBR). It further provides a significant step in quality over these traditional telephony bandwidths with super-wideband (SWB) and full-band (FB) operation starting from 9.6 and 16.4 kbps, respectively. The maximum bit rate is 128 kbps. The ability to switch the bit rate at every 20-ms frame allows the codec to easily adapt to changes in channel capacity.
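The per-frame rate switching described above can be illustrated with a minimal sketch. It is not taken from the EVS specification: the candidate list contains only the constant bit rates named in this text (the real codec defines more), and `pick_rate` and the capacity values are hypothetical.

```python
FRAME_MS = 20  # frame duration stated in the text

# Constant bit rates named in the text, in bit/s (illustrative subset only).
RATES_BPS = [9_600, 16_400, 128_000]


def bits_per_frame(rate_bps: int, frame_ms: int = FRAME_MS) -> int:
    """Number of coded bits carried by one frame at a constant bit rate."""
    return rate_bps * frame_ms // 1000


def pick_rate(capacity_bps: int, rates=RATES_BPS) -> int:
    """Hypothetical policy: highest rate that fits, else the lowest rate."""
    fitting = [r for r in rates if r <= capacity_bps]
    return max(fitting) if fitting else min(rates)


if __name__ == "__main__":
    # Hypothetical channel capacity estimates, one per 20 ms frame.
    for capacity in [200_000, 20_000, 12_000, 150_000]:
        rate = pick_rate(capacity)
        print(f"capacity {capacity} bit/s -> rate {rate} bit/s, "
              f"{bits_per_frame(rate)} bits in this frame")
```

At 9.6 kbps a 20 ms frame carries 192 bits, and at 128 kbps it carries 2560 bits, so a rate decision made per frame changes the payload size by more than a factor of ten from one frame to the next.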
The codec features discontinuous transmission (DTX) with algorithms for voice/sound activity detection (VAD) and comfort noise generation (CNG). An error concealment mechanism mitigates the quality impact of channel errors resulting in lost packets. The codec also contains a system for jitter buffer management (JBM). Furthermore, it features a special channel-aware mode achieving increased robustness in particularly adverse channel conditions. Enhanced interoperation with AMR-WB is provided over all nine bit rates between 6.6 kbps and 23.85 kbps.
Mobile streaming audio and messaging services may contain speech only, music only, or speech mixed with music in the background. For this expected mixed streaming content, the codecs described so far have difficulties in performing consistently well for both speech and music at low bit-rates (i.e. well below 32 kbit/s).
Radio resources and channel capacity set further limitations on data rates available for streaming. Streamed audio content should be made available at a low bit-rate well below 32 kbit/s, corresponding to the bit-rate range already used in the AMR-WB codec. If video is included in the content, the data rate should be as low as possible.
For these reasons, in March 2005 3GPP™ introduced (in Release 6) two new 'audio' codecs for Packet Switched Streaming Service (PSS), Multimedia Messaging Service (MMS), Multimedia Broadcast and Multicast Service (MBMS), IMS Messaging Service and Presence Service. These are the Extended AMR Wide Band (AMR-WB+) codec and the Enhanced aacPlus codec.
Enhanced aacPlus codec
Enhanced aacPlus is an extended and improved codec based on the recommended Release 5 audio codec AAC-LC. It is optimized for high audio quality at low bitrates and is therefore well suited for services such as Packet-Switched Streaming Service (PSS), Multimedia Messaging Service (MMS), Multimedia Broadcast Multicast Service (MBMS), and Presence. The Enhanced aacPlus codec offers the following capabilities:
- Excellent (CD-like) audio quality at bitrates well below 64 kbit/s
- Efficient stereo modes, enabling high quality stereo starting at bitrates below 24 kbit/s
- Music quality across the full bit-rate range exceeding that of any other audio codec known today
- Flexible configuration allowing use of any particular bit-rate starting from 8 kbit/s
- Low computational complexity for decoder and encoder
- Fully specified in 3GPP™, including optimized floating-point and fixed-point source code
Extended Adaptive Multi-Rate - Wideband (AMR-WB+) codec
The Extended AMR-WB codec (AMR-WB+) was initially targeted at wideband applications; it extends the AMR-WB codec (with new modes) for use in packet-switched streaming and messaging services, as well as for MBMS, IMS Messaging and Presence. As this codec only adds modes to the existing AMR-WB codec, there are no service or architectural impacts.
The work therefore consisted in enhancing the AMR-WB codec for audio applications by developing an audio extension based on the 3GPP™ AMR-WB speech codec. The audio extension is primarily intended for non-conversational services. Among the main objectives of the audio extension were:
- High perceptual quality with speech, music and mixed content
- Music performance comparable to the quality of state-of-the-art audio codecs
- Speech performance at least as good as that of AMR-WB
- Similar bit-rates as the AMR-WB codec to ensure efficient use of radio resources
- Mono and stereo coding
Adaptive Multi-Rate Wideband (AMR-WB) codec
In March 2001, 3GPP™ approved the technical specifications for the Adaptive Multi-Rate Wideband (AMR-WB) coding algorithm, as part of 3GPP™ Release 5. The International Telecommunication Union (ITU-T) Study Group 16 approved the same wideband coding algorithm as Recommendation G.722.2 and its Annexes in January 2002.
The AMR-WB codec provides an audio bandwidth from 50 Hz up to 7 kHz, compared with the conventional 3.1 kHz of traditional telephony (300-3400 Hz). The codec offers nine modes (bit rates) between 6.60 and 23.85 kbit/s and includes Voice Activity Detection (VAD), Discontinuous Transmission (DTX), and Comfort Noise Generation (CNG) operations. The coding scheme is called 'Multi-Rate Algebraic Code Excited Linear Prediction'.
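To make the nine modes concrete, the sketch below lists them and computes the number of coded speech bits per frame. The text above only gives the lowest and highest rates; the seven intermediate rates and the 20 ms frame duration are filled in here from the AMR-WB specification as an assumption and should be checked against it.

```python
FRAME_MS = 20  # assumed AMR-WB frame duration in milliseconds

# Nine AMR-WB modes in kbit/s. Only 6.60 and 23.85 appear in the text above;
# the intermediate values are an assumption to verify against the standard.
AMR_WB_MODES_KBPS = [6.60, 8.85, 12.65, 14.25, 15.85, 18.25, 19.85, 23.05, 23.85]


def frame_bits(rate_kbps: float, frame_ms: int = FRAME_MS) -> int:
    """Coded speech bits per frame: kbit/s multiplied by ms gives bits."""
    return round(rate_kbps * frame_ms)


if __name__ == "__main__":
    for mode_index, rate in enumerate(AMR_WB_MODES_KBPS):
        print(f"mode {mode_index}: {rate:5.2f} kbit/s -> {frame_bits(rate)} bits per frame")
```

The lowest mode yields 132 bits per frame and the highest 477, which is the range a link adaptation scheme can move across when radio conditions change.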
The range of bit-rates allows the application of the AMR-WB codec for GSM Full Rate channels, GERAN (Enhanced Data rates for GSM Evolution, EDGE) 8-Phase Shift Keying (8-PSK) channels, and 3G UMTS™ Terrestrial Radio Access Network Wideband Code Division Multiple Access (UTRAN WCDMA) channels. In GSM, link adaptation is used to optimize the perceived transmission quality based on measurement reports of the radio channel quality. AMR-WB is required in 3GPP™ for Multimedia Messaging Service (MMS), Packet-switched Streaming Service (PSS), Multimedia Broadcast Multicast Service (MBMS) and Packet-Switched Conversational Services, when 16 kHz sampled speech is used.
In addition to 3GPP™ wireless applications, further applications were targeted by ITU-T standardization, including Voice over IP (VoIP), Internet applications, Public Switched Telephone Network (PSTN) and Integrated Services Digital Network (ISDN) wideband telephony, and audio/video teleconferencing.
Adaptive Multi-Rate (AMR) codec
Between the encoding and decoding processes, which take place at the transmitting and receiving ends of a communication over a digital network, another important function takes place. This is the 'channel coding' process, described in 3GPP™ (GSM) Technical Specification 45.003: this process is indispensable to protect the encoded speech signal against interference in the radio link. The need to balance speech coding and channel coding to optimize network capacity led to the Adaptive Multi-Rate (AMR) speech coder, which appeared in GSM Release 98.
The AMR coder balances the proportion of the available GSM radio channel bit rate (22800 bit/s for full rate or 11400 bit/s for half rate) between speech coding and channel coding, enabling the most effective use of the radio resources.
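As a worked example of this balance, the sketch below subtracts the speech rate of each AMR mode from the gross channel rate to show what is left for channel coding. The gross rates come from the text; the list of AMR mode rates is not given above and is an assumption to verify against the AMR specification. The result is a simplification: it treats everything that is not speech bits as protection and does not model which modes are permitted on a half-rate channel.

```python
# Gross GSM speech channel rates in bit/s, as stated in the text.
GROSS_BPS = {"full-rate": 22_800, "half-rate": 11_400}

# AMR speech mode rates in bit/s (assumption, not listed in the text above).
AMR_MODES_BPS = [4_750, 5_150, 5_900, 6_700, 7_400, 7_950, 10_200, 12_200]


def channel_coding_bps(channel: str, speech_bps: int) -> int:
    """Bits per second left for channel coding after the speech bits."""
    gross = GROSS_BPS[channel]
    if speech_bps >= gross:
        raise ValueError("speech rate does not fit in the channel")
    return gross - speech_bps


if __name__ == "__main__":
    for speech in AMR_MODES_BPS:
        protection = channel_coding_bps("full-rate", speech)
        share = 100 * protection / GROSS_BPS["full-rate"]
        print(f"speech {speech:>6} bit/s -> channel coding {protection:>6} bit/s "
              f"({share:.0f}% of the full-rate channel)")
```

Lowering the speech rate from 12200 to 4750 bit/s on a full-rate channel raises the protection budget from 10600 to 18050 bit/s, which is the trade the coder makes when radio conditions deteriorate.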
For the adaptation of the uplink codec mode, the network must estimate the channel quality, identify the best codec mode for the existing propagation conditions and send this information to the Mobile Station (handset etc.) over the air interface. For the downlink codec adaptation, the Mobile Station must estimate the downlink channel quality and send quality information to the network. This information is used to define a 'suggested' codec mode.
Each link may use a different codec mode but it is mandatory for both links to use the same channel mode (either full rate or half rate). The channel mode is selected by the Radio Resource management function in the network: it is done at call set up or after a handover between cells. The channel type can further be changed during a call as a function of the channel conditions.
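The adaptation loop described above can be sketched as a threshold lookup. This is a simplified illustration, not the procedure from the specifications: the active codec set, the quality metric in dB, the threshold values and the name `suggest_mode` are all hypothetical, and the hysteresis that a real implementation would use to avoid rapid toggling is omitted.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ModeRule:
    speech_bps: int        # speech coding rate of the codec mode
    min_quality_db: float  # hypothetical minimum channel quality for this mode


# Hypothetical active codec set, ordered from most robust to least robust.
ACTIVE_SET = [
    ModeRule(4_750, float("-inf")),
    ModeRule(5_900, 7.0),
    ModeRule(7_400, 11.0),
    ModeRule(12_200, 15.0),
]


def suggest_mode(quality_db: float, active_set=ACTIVE_SET) -> int:
    """Return the highest speech rate whose threshold the estimate meets."""
    chosen = active_set[0].speech_bps
    for rule in active_set:
        if quality_db >= rule.min_quality_db:
            chosen = rule.speech_bps
    return chosen


if __name__ == "__main__":
    # The network estimates the uplink and the Mobile Station the downlink,
    # so the two directions can end up in different codec modes.
    uplink_quality_db, downlink_quality_db = 16.0, 8.0  # hypothetical estimates
    print("uplink codec mode:  ", suggest_mode(uplink_quality_db), "bit/s")
    print("downlink codec mode:", suggest_mode(downlink_quality_db), "bit/s")
```

With these made-up estimates the uplink runs at 12200 bit/s while the downlink falls back to 5900 bit/s, which matches the point that each link may use a different codec mode within the same channel mode.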
Enhanced Full Rate (EFR) codec
In the mid-1990s, the qualitative drawbacks shown by the HR codec, together with the advent of more advanced and powerful digital signal processing technologies, pushed the GSM Association to request the Speech Experts Group (SEG) of ETSI to provide a new and better sounding speech coding algorithm, called Enhanced Full Rate (EFR). It works at 12.2 kbit/s and leaves 10.6 kbit/s for channel coding, which ensured better error protection and avoided dropped calls in poor interference conditions. After 30 years, this is still the best (narrowband) codec used for speech communications over mobile phones in 2G and 3G networks!
GSM Half Rate codec
At the completion of the GSM Full Rate exercise, the Half Rate (HR) speech coding exercise was started, with the objective of matching the basic quality of GSM Full Rate at half the bit rate (the GSM Full Rate speech codec requires 13 kbit/s, plus 9.8 kbit/s for the channel coder, making a total rate for the GSM speech channel of 22.8 kbit/s).
The resulting new algorithm produced the standardized GSM Half Rate codec that used only 5600 bit/s, leaving 5800 bit/s to the associated channel coder, making a total rate for the GSM half-rate speech channel of 11.4 kbit/s. Unfortunately, the HR codec showed that it could suffer in terms of perceived quality in extreme conditions (e.g. with certain background noises, mobile-to-mobile communications (tandeming), or certain languages).
GSM Full Rate codec
When GSM™ was first being specified, the challenge was to prove that the limited available spectrum could be exploited more efficiently than with the existing analogue systems. The capacity of systems (i.e. number of customers the mobile network can support for a given amount of licensed frequency allocation) could be maximized whilst preserving, or even improving, the speech quality as perceived by the user. The work resulted in a digital 'full-rate speech' coding algorithm.