A comparison of Internet audio compression formats

Note: This section covers non-speech audio compression. Speech compression is a separate field and will be covered separately.

Copyright (c) 1995-97, 2001, 2003 Serious Cybernetics
Commissioned by Radio Australia
Original research by Andrew Pam
Updated 1997 by Ben Hemming
"Further information" updated 2001 by Andrew Pam
Ogg information updated 2003 by Andrew Pam with thanks to Joel Forsberg

Quick comparison chart

8KHz mono audio formats
Audio format	16-bit PCM	G.711 mu-law	32Kbps MPEG-1	IMA/DVI ADPCM	GSM 06.10	InterWave VSC112	TrueSpeech 8.5	RealAudio v1.0	ToolVox for the Web
File extension	.wav or .aiff	.au	.mpa or .mp2	.wav	.gsm	.vmf	.wav	.ra	.vox
Data rate	128Kbps	64Kbps	32Kbps	32Kbps	13.2Kbps	11.2Kbps	8.5Kbps	8Kbps	2.4Kbps
File size per minute	960K	480K	240K	240K	96K	82K	62K	59K	18K
Compression factor	1:1	2:1	4:1	4:1	10:1	11:1	15:1	16:1	53:1
Sound quality	5	4	4	3	2	2	2	1	0-3
Relative compression speed	N/A	10	(not tested)	1.2	0.75	3	0.5	0.2	0.25
Windows player	Yes	Yes	Yes	Yes	Yes	Yes	Yes	Yes	Yes
Mac player	Yes	Yes	Yes	Yes	Yes	No	Yes	Yes	Yes
Unix player	Yes	Yes	Yes	Some	Yes	No	No	only v2.0	Promised
Supports higher sample rates	Yes	Yes, but rarely used	Yes	Yes	No	Yes	No	in v2.0	in other products
Streaming playback file	None	None	None	None	.gsd	.vmd	.tsp	.ram	None
Credits	None	None	None	None	None	In .vmd file	None	In .ra file	None
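The "File size per minute" and "Compression factor" rows follow from the data rate. As a rough cross-check, the Python sketch below recomputes both from a codec's data rate, relative to the 128Kbps PCM source. The helper names are hypothetical and not part of any tool used in the original tests.

```python
# Source material for the chart: 8KHz, 16-bit, mono PCM
PCM_RATE_KBPS = 8 * 16 * 1  # 8000 samples/s x 16 bits x 1 channel = 128Kbps


def kilobytes_per_minute(rate_kbps: float) -> float:
    """Storage for one minute of audio, in 1000-byte kilobytes."""
    return rate_kbps * 60 / 8  # kbit/s x 60 s / 8 bits per byte


def compression_factor(rate_kbps: float) -> float:
    """How many times smaller than the 128Kbps PCM source."""
    return PCM_RATE_KBPS / rate_kbps


for name, rate in [("16-bit PCM", 128), ("G.711 mu-law", 64),
                   ("GSM 06.10", 13.2), ("ToolVox", 2.4)]:
    print(f"{name:<12} {kilobytes_per_minute(rate):6.1f}K  "
          f"{compression_factor(rate):5.1f}:1")
```

The chart rounds compression factors to whole numbers, and a few of its per-minute sizes differ slightly from this straight calculation (GSM 06.10 is listed as 96K, whereas 13.2Kbps works out to 99K), so treat the sketch as an approximation of the chart rather than a reproduction of it.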

Audio compression formats

16-bit PCM

16-bit PCM is the original, uncompressed source material. 8-bit sampling is possible, but its dynamic range is much lower; 24-bit and 32-bit samples also exist but are rarely supported. PCs use the Microsoft .WAV format, while Macs and Unix systems use the .AIFF format.
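To make the 128Kbps figure concrete, here is a minimal Python sketch that uses the standard-library wave module to build one second of 8KHz, 16-bit mono PCM in memory and then read the data rate back from the file header. The 440Hz test tone and the helper names are illustrative choices, not part of the original tests.

```python
import io
import math
import struct
import wave


def make_pcm_wav(seconds, rate=8000, freq_hz=440.0):
    """Return the bytes of a 16-bit mono .WAV file holding a sine tone."""
    n = int(rate * seconds)
    samples = [int(10000 * math.sin(2 * math.pi * freq_hz * i / rate))
               for i in range(n)]
    buf = io.BytesIO()
    with wave.open(buf, "wb") as w:
        w.setnchannels(1)   # mono
        w.setsampwidth(2)   # 16 bits = 2 bytes per sample
        w.setframerate(rate)
        # .WAV stores 16-bit PCM as little-endian signed integers
        w.writeframes(struct.pack("<%dh" % n, *samples))
    return buf.getvalue()


def pcm_bit_rate(wav_bytes):
    """Bits per second implied by the file's header fields."""
    with wave.open(io.BytesIO(wav_bytes), "rb") as r:
        return r.getframerate() * r.getsampwidth() * 8 * r.getnchannels()


wav = make_pcm_wav(1.0)
print(pcm_bit_rate(wav), "bits/s;", len(wav), "bytes including header")
```

One second of audio occupies 16000 bytes of sample data plus a small header, so a minute comes to 960,000 bytes, matching the 960K figure in the chart.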

mu-law

mu-law is the international standard telephony encoding format, also known as ITU (formerly CCITT) standard G.711. It packs each 16-bit sample into 8 bits by using a logarithmic table to encode with a 13-bit dynamic range and dropping the least significant 3 bits of precision. Encoding and decoding are very fast and support is universal. A slight variation called A-law is used in European telephone systems.
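The logarithmic packing can be sketched in a few lines of Python. This follows the widely circulated G.711 reference approach (a bias of 0x84 and a clip level of 32635); the function names are my own. Each 16-bit sample becomes a sign bit, a 3-bit segment (exponent) and a 4-bit mantissa, and the resulting byte is bit-inverted as G.711 specifies.

```python
BIAS = 0x84    # added before encoding so that every segment has a leading 1 bit
CLIP = 32635   # largest magnitude that still fits after adding BIAS


def linear_to_ulaw(sample: int) -> int:
    """Encode one signed 16-bit sample as an 8-bit mu-law code."""
    sign = 0x80 if sample < 0 else 0
    magnitude = min(abs(sample), CLIP) + BIAS
    # Segment number = position of the highest set bit above bit 7
    exponent, mask = 7, 0x4000
    while exponent > 0 and not (magnitude & mask):
        exponent -= 1
        mask >>= 1
    mantissa = (magnitude >> (exponent + 3)) & 0x0F
    return ~(sign | (exponent << 4) | mantissa) & 0xFF


def ulaw_to_linear(code: int) -> int:
    """Decode an 8-bit mu-law code back to a signed 16-bit sample."""
    code = ~code & 0xFF
    exponent = (code >> 4) & 0x07
    mantissa = code & 0x0F
    magnitude = (((mantissa << 3) + BIAS) << exponent) - BIAS
    return -magnitude if code & 0x80 else magnitude


print(linear_to_ulaw(1000), ulaw_to_linear(linear_to_ulaw(1000)))
```

Small samples keep fine resolution while large ones are quantised coarsely, which is why a round trip of 1000 returns a nearby value rather than 1000 exactly.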

MPEG

MPEG (from the Moving Picture Experts Group) is the international standard for multimedia. It incorporates both audio and video encoding at a range of data rates. MPEG audio and video are the standard formats used on Video CDs and DVDs. The lowest data rate supported for MPEG-1 mono audio is 32Kbps. Sample rates of 32KHz, 44.1KHz (audio CD) and 48KHz (Digital Audio Tape) are supported; I used 32KHz for the 8KHz source material. There are three types of MPEG audio encoding, Layer I, Layer II and Layer III, in increasing order of sound quality and encoding time. Layer I is the "PASC" compression used in Digital Compact Cassettes and Layer II is the "MUSICAM" compression format. Layer III (aka "MP3") has recently become very popular on the Internet due to its combination of high quality and high compression ratio. MPEG-2 provides broadcast quality audio and video at higher data rates, and MPEG-3 has been absorbed into MPEG-2. The new MPEG-4 standard will add support for lower sample rates (16KHz, 22KHz and 24KHz) and low data rate encoding (down to 8Kbps).
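MPEG-1 audio is carried in fixed-duration frames, and this is how a data rate and a sample rate translate into bytes. The sketch below applies the frame-length arithmetic of the MPEG-1 audio frame layout (384 samples per Layer I frame, 1152 per Layer II or Layer III frame). The function name is hypothetical and it only illustrates the arithmetic; it does not parse real files.

```python
def mpeg1_frame_bytes(bitrate_bps: int, sample_rate_hz: int,
                      layer: int, padding: bool = False) -> int:
    """Length in bytes of one MPEG-1 audio frame, header included."""
    if layer == 1:
        # 384 samples per frame; the padding slot is 4 bytes
        return (12 * bitrate_bps // sample_rate_hz + int(padding)) * 4
    if layer in (2, 3):
        # 1152 samples per frame (1152 / 8 = 144); the padding slot is 1 byte
        return 144 * bitrate_bps // sample_rate_hz + int(padding)
    raise ValueError("layer must be 1, 2 or 3")


# The 32Kbps / 32KHz settings used for the comparison chart
print(mpeg1_frame_bytes(32000, 32000, layer=2))
```

At those settings each Layer II frame is 144 bytes and covers 1152 / 32000 = 36 ms of audio, which works out to 4000 bytes (32Kbits) per second.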

ADPCM

ADPCM (Adaptive Differential Pulse Code Modulation) comes in many varieties. There is the IMA (Interactive Multimedia Association) DVI standard, the ITU (formerly CCITT) standards G.726 and G.727, which supersede the earlier G.721 and G.723 standards, and proprietary versions from Microsoft, Creative Labs, Yamaha and Oki. There is also Sub-Band ADPCM (G.722), which is used for audio on ISDN phone lines.
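To show what "adaptive differential" means in practice, here is a minimal Python sketch of the IMA/DVI variant. Each 16-bit sample is replaced by a 4-bit code describing its difference from a running prediction, and the quantiser step size adapts after every sample. The step and index tables follow the widely published IMA reference algorithm. The sketch starts both encoder and decoder from a zero predictor and index, and it omits the block headers that real IMA ADPCM .wav files use to store that state, so it cannot read or write those files. At 8KHz, 4 bits per sample gives the 32Kbps shown in the chart.

```python
STEP_TABLE = [
    7, 8, 9, 10, 11, 12, 13, 14, 16, 17,
    19, 21, 23, 25, 28, 31, 34, 37, 41, 45,
    50, 55, 60, 66, 73, 80, 88, 97, 107, 118,
    130, 143, 157, 173, 190, 209, 230, 253, 279, 307,
    337, 371, 408, 449, 494, 544, 598, 658, 724, 796,
    876, 963, 1060, 1166, 1282, 1411, 1552, 1707, 1878, 2066,
    2272, 2499, 2749, 3024, 3327, 3660, 4026, 4428, 4871, 5358,
    5894, 6484, 7132, 7845, 8630, 9493, 10442, 11487, 12635, 13899,
    15289, 16818, 18500, 20350, 22385, 24623, 27086, 29794, 32767,
]
# Step-index adjustment per 4-bit code (the sign bit does not matter)
INDEX_TABLE = [-1, -1, -1, -1, 2, 4, 6, 8] * 2


def _clamp(value, lo, hi):
    return max(lo, min(hi, value))


def _update(predictor, index, code):
    """Apply one code to the shared predictor/step-index state."""
    step = STEP_TABLE[index]
    vpdiff = step >> 3
    if code & 4:
        vpdiff += step
    if code & 2:
        vpdiff += step >> 1
    if code & 1:
        vpdiff += step >> 2
    predictor += -vpdiff if code & 8 else vpdiff
    return _clamp(predictor, -32768, 32767), _clamp(index + INDEX_TABLE[code], 0, 88)


def ima_adpcm_encode(samples):
    """Encode 16-bit samples as 4-bit codes (one int 0-15 per sample)."""
    predictor, index, codes = 0, 0, []
    for sample in samples:
        step = STEP_TABLE[index]
        diff = sample - predictor
        code = 8 if diff < 0 else 0   # bit 3 is the sign
        diff = abs(diff)
        for bit, threshold in ((4, step), (2, step >> 1), (1, step >> 2)):
            if diff >= threshold:
                code |= bit
                diff -= threshold
        predictor, index = _update(predictor, index, code)
        codes.append(code)
    return codes


def ima_adpcm_decode(codes):
    """Rebuild samples by tracking the same prediction as the encoder."""
    predictor, index, out = 0, 0, []
    for code in codes:
        predictor, index = _update(predictor, index, code)
        out.append(predictor)
    return out
```

Because the encoder updates its prediction with exactly the same `_update` step the decoder uses, the two stay in lock-step and only the 4-bit codes need to be stored.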

GSM 06.10

GSM 06.10 is the international standard encoding format for digital mobile telephony. It uses linear predictive coding to compress the data substantially: it predicts the likely shape of the sound wave and records the differences between the actual sound and the prediction. Compression and decompression are slow and the quality is not great, but the algorithm is freely available, which has led to its widespread use in products such as CyberPhone, NetPhone and Speak Freely.
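The prediction idea can be illustrated with a generic linear predictor. The sketch below estimates predictor coefficients from a block of samples (the autocorrelation method with the Levinson-Durbin recursion) and then computes the residual, the difference between each sample and its prediction. It is not GSM 06.10 itself, which uses a fixed-point RPE-LTP (regular pulse excitation with long-term prediction) scheme; the function names and the test signal are hypothetical.

```python
import math


def autocorrelation(x, max_lag):
    n = len(x)
    return [sum(x[i] * x[i + lag] for i in range(n - lag))
            for lag in range(max_lag + 1)]


def levinson_durbin(r, order):
    """Coefficients a1..ap such that x[n] is predicted as sum(a_k * x[n-k])."""
    a = [0.0] * (order + 1)   # a[0] is unused
    err = r[0]
    for i in range(1, order + 1):
        k = (r[i] - sum(a[j] * r[i - j] for j in range(1, i))) / err
        prev = a[:]
        a[i] = k
        for j in range(1, i):
            a[j] = prev[j] - k * prev[i - j]
        err *= 1.0 - k * k    # remaining prediction error energy
    return a[1:]


def prediction_residual(x, coeffs):
    """What the coder would actually transmit: actual minus predicted."""
    out = []
    for n in range(len(x)):
        guess = sum(c * x[n - 1 - k] for k, c in enumerate(coeffs) if n - 1 - k >= 0)
        out.append(x[n] - guess)
    return out


signal = [math.sin(0.5 * n) for n in range(400)]
coeffs = levinson_durbin(autocorrelation(signal, 2), 2)
residual = prediction_residual(signal, coeffs)
ratio = sum(e * e for e in residual) / sum(s * s for s in signal)
print(coeffs, ratio)
```

For a steady tone the residual carries only a tiny fraction of the signal energy, which is why coding the residual instead of the raw samples saves so many bits; real speech is less predictable, so practical codecs re-estimate the coefficients every few milliseconds.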

InterWave

InterWave is a proprietary encoding format created by VocalTec. It is designed specifically for real-time audio on the Internet and features a friendly user interface and very rapid encoding times. A small program is provided which uses the Common Gateway Interface on a World Wide Web server to support repositioning during real-time playback. InterWave is also the basis for VocalTec's Internet Phone product. The following encoding formats are available to support a variety of sample rates:

Name	Data rate	Original sample rate
VSC77	7.7Kbps	5KHz
VSC112	11.2Kbps	8KHz
VSC154	15.4Kbps	11KHz
VSC224	22.4Kbps	16KHz

TrueSpeech

TrueSpeech is a proprietary encoding format created by DSP Group, Inc. It is designed for digital telephony use (such as WebPhone) and intended to be implemented in hardware using Digital Signal Processing chips. The decoder can play all TrueSpeech formats but a software encoder is currently only available for TrueSpeech 8.5. The following encoding formats are available to provide varying degrees of compression, using increasingly powerful chips:

Name	Data rate	Compression factor
TrueSpeech 8.5	8.5Kbps	15:1
TrueSpeech 6.3	6.3Kbps	20:1
TrueSpeech 5.3	5.3Kbps	24:1
TrueSpeech 4.8	4.8Kbps	27:1

RealAudio

RealAudio is a proprietary encoding format created by Progressive Networks. It was the first compression format to support live audio over the Internet and thus gained considerable support, but it requires proprietary server software to provide real-time playback. It also supports repositioning during real-time playback. Version 2.0 offers two encoding algorithms: the original v1.0 algorithm at 8Kbps, and a new algorithm at a higher data rate which provides better audio quality from source material with higher sample rates (11KHz, 22KHz and 44KHz). However, RealAudio 2.0 requires fast hardware: a Pentium PC or a PowerPC Mac is needed for best results, although it will work on a 68040/25MHz or faster Mac or a 486/66MHz or faster PC.

ToolVox for the Web

ToolVox is a proprietary encoding format created by VoxWare which achieves very high compression by using vocal modelling. This allows the sound to be played faster or slower without changing the pitch, but the format is designed to work only with spoken voice material; music and sound effects will usually not compress properly. VoxWare's MetaVoice technology is the basis for their TeleVox, ToolVox for Multimedia and ToolVox RT products and will also be the codec used in Netscape's LiveMedia standard.

OggSquish

OggSquish is intended to compete with MPEG layer III, but is still in alpha test. It will provide a range of compression factors from 5:1 up to 18:1 plus a "lossless" compression mode. It is optimised for very high sound quality (source material at 30-48KHz sample rates).

(Later update:) The original OggSquish project evolved into the Xiph.org Foundation, whose projects include the "Ogg Vorbis" audio codec and the "FLAC" lossless codec.

ASPEC

ASPEC (Adaptive Spectral Perceptual Entropy Coding) is one of the higher quality sound compression algorithms. It can produce CD quality sound and supports several bit rates from 128Kbps downward, including 64Kbps. For its lossy compression, ASPEC exploits the frequency limitations of human hearing and applies complex entropy coding.

The best bits of the ASPEC and MUSICAM compression formats have been combined for the MPEG Layer III audio compression standard.

Mac problems

Mac hardware does not support sampling at 8KHz. Therefore, to compress audio on a Mac you will need to sample at a supported rate such as 11KHz and then either use a format which supports the Mac rates (mu-law, MPEG or ADPCM) or resample the audio down to 8KHz before compression. Resampling is very CPU-intensive and further reduces the audio quality. However, you should be able to play back 8KHz samples created on other hardware.
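A minimal sketch of the resampling step, converting 11KHz samples to 8KHz by linear interpolation, is shown below. Treat it as an illustration only: it assumes an 11025Hz input, and it applies no low-pass filter beforehand, so any content above 4KHz aliases into the result. A proper resampler filters before reducing the rate, which is part of why the conversion is CPU-intensive. The function name is hypothetical.

```python
def resample_linear(samples, src_rate, dst_rate):
    """Naive rate conversion by linear interpolation (no anti-alias filter)."""
    if not samples:
        return []
    n_out = (len(samples) * dst_rate) // src_rate
    out = []
    for i in range(n_out):
        pos = i * src_rate / dst_rate   # position in the input, in samples
        j = int(pos)
        frac = pos - j
        nxt = samples[j + 1] if j + 1 < len(samples) else samples[j]
        out.append(round(samples[j] * (1 - frac) + nxt * frac))
    return out


one_second_at_11k = [0] * 11025
print(len(resample_linear(one_second_at_11k, 11025, 8000)))
```

Linear interpolation keeps the code short; better-sounding converters replace it with a windowed-sinc or polyphase filter, at a considerably higher CPU cost.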

Windows problems

When playing compressed audio files under Windows, there may be annoying pauses in the sound. There are two main causes: an inability to make effective use of a fast (28.8Kbps) modem, and interruptions caused by hard drive access during playback. Both are symptoms of interrupt handling problems in Windows which do not occur under other operating systems such as Windows NT, OS/2 or Unix. Also, high speed modems require a "buffered UART" such as the 16550 chip, which has a 16-byte FIFO, rather than the older 8250 and 16450 chips still used in many PC serial ports. If you are using a 28.8Kbps modem, check that you have this hardware and the appropriate driver software installed.
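The need for a buffered UART comes down to simple arithmetic: the CPU must service the serial port before its receive buffer overflows. The sketch below assumes 10 bits on the wire per byte (8 data bits plus start and stop bits) and a serial port running at the modem's 28.8Kbps; in practice the port is often set faster than the line rate, which makes the margin tighter still. It also ignores the 16550's configurable interrupt trigger level. The function name is hypothetical.

```python
def ms_until_overrun(line_rate_bps: int, buffer_bytes: int,
                     bits_per_char: int = 10) -> float:
    """Time before an unserviced receive buffer overflows, in milliseconds."""
    chars_per_second = line_rate_bps / bits_per_char
    return 1000.0 * buffer_bytes / chars_per_second


for chip, buffer_bytes in [("8250/16450", 1), ("16550", 16)]:
    print(f"{chip:<10} {ms_until_overrun(28800, buffer_bytes):.2f} ms")
```

With a one-byte buffer the CPU has only about a third of a millisecond to respond to each incoming character; with the 16550's 16-byte FIFO it has roughly 5.6 ms, enough to ride out a brief stall such as a disk access.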

Further information

More comparative samples: Bunny People and The Cyberspace Report (30 minutes!)
More information about audio formats
Information about MPEG layer 3 compression
Lossy and Lossless compression of Audio
More useful links
Newsgroups: comp.compression and comp.speech

Send comments to xanni@sericyb.com.au (Andrew Pam)