24 – Compression Algorithms

24 – Compression Algorithms

Data compression is a term used to describe reducing the size of a file whilst keeping it in a usable format. There would be little point in making a file smaller if we could no longer use it, so different algorithms have been developed to allow us to make efficient use of space on our digital devices.

Compression techniques can be categorised into two methods:

Lossy – data is removed from the original file to reduce its size.

Lossless – data is not removed from the original file, or can be restored in full.

Lossy Compression

Lossy compression can be applied to files where losing data may not be detected, or if it is, the file is still usable. This generally means that images, video, and sound use lossy compression techniques (but not always!).

When you are watching a streamed video, you will often be given the choice of watching in HD or SD. Whilst we know the Standard Definition video is of inferior quality, it does have the benefits of being a much smaller file so it will work on slower connections. Even though the file is of a lower quality (due to lossy compression), it remains usable.

JPEG image files use a lossy compression technique where the number of colours in the image are reduced to just those within the human spectrum of vision. Often, the number of colours is reduced to far less without a visible difference. In the case of JPEG files, the removal of colours is balanced by replacing the standard RGB (red green blue) colours with luma (brightness & contrast) & chroma (red – blue colour).

Lossless Compression

Lossless compression is employed when we need to reduce the size of the file without losing any data. This may be because removing data would make the file unusable as is the case for text files or program code, or to allow us to uncompress the file back to it’s original state for editing.

Zipped files make use of lossless compression, allowing the files & folders to be uncompressed after sending without losing data. Compressing data, can be acheived using a variety of algorithms, one being Huffman Encoding which reduces the number of characters and makes efficient use of bit patterns.

Another technique that can be applied to both text and images is known as Run Length Encoding. In RLE, pixels or characters that are the same are encoded as a run instead of idividually. For example:

Uncompressed Text:  HAAAHAAAHAAA 

RLE: H1 A3 H1 A3 H1 A3

Activity 1