Bonk




Bonk is an audio compression program that operates in both lossy and lossless modes. The compression method Bonk uses is quite simple compared to other lossy formats (such as MP3), and is based on speech compression techniques. It is nevertheless capable of producing a high compression ratio while maintaining good sound quality.

In lossy mode, it averages 14:1 compression while remaining almost perceptually lossless. This corresponds to 95 kbps for reasonable quality audio.

In lossless mode the original audio file can be recovered exactly. Lossless mode typically achieves around 2:1 compression.

Bonk compiles under Linux, FreeBSD, NetBSD and OpenBSD. (Note: I personally only have access to Linux machines. I can't test Bonk on other operating systems; I just apply patches people send me. So if it isn't working on other systems, please email me!)

Advantages (as compared to MP3)

Disadvantages

Download

Fabrice Haberer-Proust has written a Bonk plug-in for XMMS.

Jimmy Taino has written an ncurses interface for Bonk.

Changes

0.1  initial release
0.2  fixed problem with warbling artifact
     for very pure tones
0.3  better high frequency reproduction,
     reduced default quantization level,
     better speech compression mode, options
     to specify output file and artist and
     track name
0.4  removed a clipping problem, more
     elegant (and faster) encoding algorithm,
     slightly faster decoding, endian-safe
     playback (i.e. can now play back on PowerPCs)
0.5  support for BSD variants (courtesy of a
     patch by Christian Weisgerber), now
     automatically increases bitrate for
     tonal sounds
0.6  compile with g++ 3.1

How it works

Bonk's approach to lossy compression of music is, as far as I know, unique. Here is a very, very brief overview.

Basic idea:

Break sound into a sequence of short, equal sized segments. Create a linear predictor for each segment, and use that linear predictor to compress the segment. The compressed file consists of a sequence of blocks, each block describing one segment of sound. Each block consists of the segment's linear predictor plus the sound compressed using that predictor. Both the linear predictor and the compressed sound are quantized to reduce their size.
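To make that structure concrete, here is a minimal sketch in C++ of what one block and the overall compressed stream might look like. This is purely illustrative: the struct and field names are mine, not Bonk's actual file format.

    #include <cstdint>
    #include <vector>

    // One block of the compressed stream: the quantized predictor for one
    // segment plus that segment's sound, compressed using the predictor.
    struct Block {
        std::vector<std::int16_t> predictor;  // quantized predictor coefficients
        std::vector<std::int16_t> residual;   // quantized whitened samples
    };

    // The compressed file is then just a small header followed by one block
    // per segment of sound.
    struct CompressedStream {
        std::uint32_t sample_rate;
        std::uint32_t channels;
        std::vector<Block> blocks;
    };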

Some details:

Ok... so you've got a linear predictor. Apply the predictor to the sound, and you get white noise. Un-apply the predictor to that particular white noise, and you get the original sound back. Un-apply the predictor to just any old random white noise, and you get a random signal with the same frequency envelope as the original sound. (Well, more or less... if you have only a limited sample it's not possible to infer an absolutely precise frequency envelope.)
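To see what "apply" and "un-apply" mean concretely, here is a simplified sketch using plain direct-form prediction in C++. The function names are mine, and Bonk's real filter is a lattice (see below), so treat this as an illustration of the idea rather than Bonk's actual code.

    #include <cstddef>
    #include <vector>

    // "Apply" the predictor: subtract from each sample the prediction made
    // from the previous samples.  The output is the whitened signal.
    std::vector<double> whiten(const std::vector<double>& x,
                               const std::vector<double>& coeffs) {
        std::vector<double> e(x.size());
        for (std::size_t n = 0; n < x.size(); ++n) {
            double prediction = 0.0;
            for (std::size_t j = 0; j < coeffs.size() && j < n; ++j)
                prediction += coeffs[j] * x[n - 1 - j];
            e[n] = x[n] - prediction;
        }
        return e;
    }

    // "Un-apply" the predictor: add the prediction back in.  Feed in the exact
    // whitened signal and you get the original sound; feed in any other white
    // noise and you get a signal with roughly the original's frequency envelope.
    std::vector<double> unwhiten(const std::vector<double>& e,
                                 const std::vector<double>& coeffs) {
        std::vector<double> x(e.size());
        for (std::size_t n = 0; n < e.size(); ++n) {
            double prediction = 0.0;
            for (std::size_t j = 0; j < coeffs.size() && j < n; ++j)
                prediction += coeffs[j] * x[n - 1 - j];
            x[n] = e[n] + prediction;
        }
        return x;
    }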

So here's the trick: once you've applied the predictor to the sound, you quantize that whitened sound -- throw out the bottom few bits. It turns out that's very much like adding white noise to it. Which means that when you un-apply the filter to get the original back, the result has a random signal added to it with much the same frequency envelope as the original. And so (hopefully) that random signal gets very nicely masked by the actual sound.
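The quantization step itself is nothing fancy. A minimal sketch, again with names and step-size handling chosen by me for illustration rather than taken from Bonk:

    #include <cmath>
    #include <cstddef>
    #include <cstdint>
    #include <vector>

    // Drop the bottom few bits of the whitened signal: divide by a step size
    // and round.  The rounding error behaves much like added white noise.
    std::vector<std::int32_t> quantize(const std::vector<double>& e, double step) {
        std::vector<std::int32_t> q(e.size());
        for (std::size_t i = 0; i < e.size(); ++i)
            q[i] = static_cast<std::int32_t>(std::lround(e[i] / step));
        return q;
    }

    // The decoder just scales back up, keeping whatever error was introduced.
    // That error is the "noise" which the un-applied predictor then shapes to
    // follow the original sound's frequency envelope.
    std::vector<double> dequantize(const std::vector<std::int32_t>& q, double step) {
        std::vector<double> e(q.size());
        for (std::size_t i = 0; i < q.size(); ++i)
            e[i] = q[i] * step;
        return e;
    }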

Why does this work? Well, it turns out a linear predictor created from a chunk of sound of just the right length gives a frequency response curve that looks suspiciously like the frequency masking curves for human hearing!

Something that makes this a little bit trickier is that you have to encode in the compressed file not only the whitened sound but also the predictor. And to get that masking effect to work properly, you need a pretty big predictor... way bigger than, say, FLAC would use for plain old lossless compression. So you need to quantize the predictor. Lots. However, it turns out if you just quantize ordinary predictor coefficients chances are you'll make the filter unstable. Feed in anything but the exact original whitened signal and the result quickly heads off to infinity. Not a problem for lossless compression, big problem for lossy compression. So the predictor has to be put into a form where you can guarantee the quantized version will be stable. One such form where you can guarantee this easily (at the cost of a few extra multiplies) is a "lattice filter". Just keep the coefficients -1 < k < 1 and it's guaranteed to be stable. So this is what Bonk uses.
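For the curious, here is roughly what one sample of all-pole lattice synthesis looks like. This is a simplified sketch, not Bonk's actual code; the reflection coefficients k are assumed to have already been quantized and clamped strictly inside (-1, 1).

    #include <algorithm>
    #include <cmath>
    #include <vector>

    // One sample of all-pole lattice synthesis.  state holds the backward
    // prediction errors from the previous sample (zeroed at the start of a
    // segment) and has the same size as k.  As long as every k[i] stays
    // strictly between -1 and 1, the filter is stable, no matter how coarsely
    // the coefficients were quantized.
    double lattice_synth_sample(double input, const std::vector<double>& k,
                                std::vector<double>& state) {
        double f = input;  // forward error, starting at the top of the lattice
        for (int i = static_cast<int>(k.size()) - 1; i >= 0; --i) {
            f -= k[i] * state[i];                    // forward error of stage i
            if (i + 1 < static_cast<int>(state.size()))
                state[i + 1] = state[i] + k[i] * f;  // backward error for next sample
        }
        state[0] = f;  // the reconstructed sample is also the stage-0 backward error
        return f;
    }

    // Quantizing a reflection coefficient stays safe as long as the result is
    // clamped back strictly inside (-1, 1).
    double quantize_reflection(double kval, double step) {
        return std::clamp(std::round(kval / step) * step, -0.999, 0.999);
    }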



