Windows Storage - Compression

You use compression to save disk space, but compressing data to store it, then decompressing it to read it, uses CPU power. It's a trade-off: you need to decide which is more important, saving CPU cycles or saving disk storage. Generally, you would compress files that are not used much, and leave very active files uncompressed. Compression could reduce your disk usage by about 60%, depending on the data.
FAT does not support compression, so if you've set Windows up to use the FAT file system, forget it.
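
To get a feel for the trade-off described above, the following is a minimal sketch that times one compress/decompress round trip and reports the space saved. It uses Python's zlib module purely as a stand-in (it is not the NTFS compression engine), and the function name `measure` and the sample data are assumptions made for illustration.

```python
import time
import zlib


def measure(data: bytes, level: int = 6) -> dict:
    """Compress and decompress `data` once; return sizes and timings."""
    t0 = time.perf_counter()
    packed = zlib.compress(data, level)
    t1 = time.perf_counter()
    unpacked = zlib.decompress(packed)
    t2 = time.perf_counter()
    if unpacked != data:
        raise ValueError("round trip was not lossless")
    return {
        "original_bytes": len(data),
        "compressed_bytes": len(packed),
        "saving": 1 - len(packed) / len(data),   # fraction of space saved
        "compress_seconds": t1 - t0,             # CPU cost of storing
        "decompress_seconds": t2 - t1,           # CPU cost of reading
    }


if __name__ == "__main__":
    # Hypothetical sample: a highly repetitive record file.
    sample = b"Name:Brown Date-of-birth 11/04/65\n" * 2000
    for key, value in measure(sample).items():
        print(f"{key}: {value}")
```

Running it against a sample of your own rarely used and very active files gives a rough idea of whether the space saved is worth the CPU time.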

On NTFS volumes, you can compress individual files, folders, or entire drives by simply checking a box on the object's Properties sheet. When you compress a whole folder, any new files added to that folder are automatically compressed as well.
How do you know if compression is active? Open any folder window; choose 'View', 'Options'; and on the View tab check the box labeled 'Display Compressed Files and Folders with Alternate Color'.
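
You can also check from a script. The sketch below reads the file attributes that Windows reports and tests the 'compressed' bit. The helper names are assumptions for illustration; `st_file_attributes` is only populated on Windows, so on other platforms this sketch simply reports False.

```python
import os
import stat


def attrs_say_compressed(attributes: int) -> bool:
    """True if the Windows attribute bitmask has the 'compressed' bit set."""
    return bool(attributes & stat.FILE_ATTRIBUTE_COMPRESSED)


def is_compressed(path: str) -> bool:
    """True if Windows reports `path` as a compressed file or folder."""
    info = os.stat(path)
    # st_file_attributes exists only on Windows; default to 0 elsewhere.
    return attrs_say_compressed(getattr(info, "st_file_attributes", 0))


if __name__ == "__main__":
    print(is_compressed("."))
```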
Compression is supported from Windows NT upwards. NT 4.0 lets you apply data compression on a case-by-case basis, and also adds security information to every folder and file.

How does it work?

The engine compresses blocks of 4096 bytes of plain-text data using a modified LZ77 algorithm. Basically, the engine checks for repeated substrings of data, then stores them by referring back to the string position and length.
For example, consider the plain text

  Name:Brown Date-of-birth 11/04/65
  Name:Thompson Date-of-birth 11/04/70

This is compressed to

  Name:Brown Date-of-birth 11/04/65
  (-34,5)Thompson(-37,21)70

So after reading the first line, the compression engine recognizes that 34 bytes back from the current position (counting the line break as one byte), it has already seen the 5-character text 'Name:'. 'Thompson' has to be stored in full, but the 21 characters ' Date-of-birth 11/04/', including the leading space, can all be retrieved from 37 bytes back. Finally, the last two characters, '70', have to be stored in full.

The reference pair (-34,5) is recorded in two bytes, so there is no point in compressing out one- or two-byte substrings. The number pairs are actually stored as positive integers, with 3 subtracted from the length, as you will never compress a string of fewer than 3 characters. So the numbers above are stored as (34,2) and (37,18).
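
The back-reference idea described above can be sketched in a few lines. This is a simplified, greedy LZ77-style encoder and decoder written for illustration only, not the engine Windows actually uses; the names `encode`, `decode`, `MIN_MATCH` and `WINDOW` are assumptions, and tokens are kept as Python objects rather than packed into bytes.

```python
MIN_MATCH = 3   # shorter repeats are not worth a two-byte reference
WINDOW = 4096   # only look back within one 4096-byte block


def encode(data: bytes) -> list:
    """Greedy LZ77 sketch: emit literal bytes or (distance, length) pairs."""
    tokens = []
    pos = 0
    while pos < len(data):
        best_len, best_dist = 0, 0
        # Try every earlier start position and keep the longest match.
        for start in range(max(0, pos - WINDOW), pos):
            length = 0
            while (pos + length < len(data)
                   and data[start + length] == data[pos + length]):
                length += 1
            if length > best_len:
                best_len, best_dist = length, pos - start
        if best_len >= MIN_MATCH:
            tokens.append((best_dist, best_len))   # refer back
            pos += best_len
        else:
            tokens.append(data[pos:pos + 1])       # store in full
            pos += 1
    return tokens


def decode(tokens: list) -> bytes:
    """Rebuild the original bytes from literals and back-references."""
    out = bytearray()
    for token in tokens:
        if isinstance(token, tuple):
            distance, length = token
            for _ in range(length):
                # Copy one byte at a time so overlapping references work.
                out.append(out[-distance])
        else:
            out += token
    return bytes(out)


if __name__ == "__main__":
    text = (b"Name:Brown Date-of-birth 11/04/65\n"
            b"Name:Thompson Date-of-birth 11/04/70")
    tokens = encode(text)
    print([t for t in tokens if isinstance(t, tuple)])
    print(decode(tokens) == text)
```

A greedy matcher like this one may split the text slightly differently from a hand-worked example. Here it folds the final 'n' of 'Thompson' into the second reference, because 'n Date-of-birth 11/04/' also follows 'Brown' in the first line.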

There are cases where compression is applied and the compressed data is actually bigger than the original. This is because control bytes are needed for every data block, which indicate if the block is compressed, and how big it is. If the data won't compress, then the control bytes are just an overhead.
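
The effect is easy to reproduce with any general-purpose compressor. The sketch below uses Python's zlib as a stand-in (its framing differs from the NTFS control bytes, so the exact overhead is not comparable) and compares a 4096-byte block of pseudo-random data with a repetitive block of the same size. The names `BLOCK`, `noisy_block` and `repetitive_block` are assumptions for illustration.

```python
import random
import zlib

BLOCK = 4096  # same block size as discussed above


def noisy_block(seed: int = 0) -> bytes:
    """A block of pseudo-random bytes, which has no repeats to exploit."""
    rng = random.Random(seed)
    return bytes(rng.getrandbits(8) for _ in range(BLOCK))


def repetitive_block() -> bytes:
    """A block built from one repeated record, which compresses well."""
    return (b"Name:Brown Date-of-birth 11/04/65\n" * 200)[:BLOCK]


def compressed_size(block: bytes) -> int:
    """Size in bytes after compression, including the format's overhead."""
    return len(zlib.compress(block))


if __name__ == "__main__":
    print("random block:    ", BLOCK, "->", compressed_size(noisy_block()))
    print("repetitive block:", BLOCK, "->", compressed_size(repetitive_block()))
```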

A UK firm, Ripcord Software, has introduced IISxpress 2.0, a compression engine designed to handle HTTP objects, specifically for Microsoft's IIS web server. IISxpress comes as a Community Edition, which is free for non-commercial and non-governmental use. It supports Windows 2000, XP and Server 2003.

Some IISxpress features include

  • Better small-file performance
  • Configuration wizards to help you configure compression based on your data
  • Multiple CPU support to improve performance
  • 64-bit support
  • You can select the compression mode yourself without requiring a server reboot
  • Real-time statistics to help you see the benefits of compression
  • IISxpress maintains a history of previous compression performance, and will not compress a file again if it gave minimal gain previously
  • IISxpress runs compression at low priority if the CPU is busy, so compression does not degrade overall application performance

If you are working with a highly distributed infrastructure, maybe using Distributed File System, then you could end up transferring a lot of data around your network.
Remote Differential Compression (RDC) is intended to help manage this data transfer over limited-bandwidth networks. If a file is updated, RDC transfers only the changed parts of the file, called deltas, instead of the whole file. Microsoft claims that RDC can reduce bandwidth requirements by as much as 400:1.
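
The delta idea can be illustrated with a much-simplified sketch. This is not the RDC algorithm itself, which is considerably more sophisticated; it assumes fixed-size blocks compared by SHA-256 hash, and the names `signatures`, `make_delta` and `apply_delta` are hypothetical.

```python
import hashlib

BLOCK = 4  # tiny block size so the demo is easy to follow


def signatures(data: bytes, block: int = BLOCK) -> list:
    """Hash each fixed-size block of the receiver's existing copy."""
    return [hashlib.sha256(data[i:i + block]).digest()
            for i in range(0, len(data), block)]


def make_delta(old_sigs: list, new: bytes, block: int = BLOCK) -> list:
    """On the sender: keep unchanged blocks by index, ship changed ones."""
    delta = []
    for index, offset in enumerate(range(0, len(new), block)):
        chunk = new[offset:offset + block]
        same = (index < len(old_sigs)
                and hashlib.sha256(chunk).digest() == old_sigs[index])
        delta.append(("keep", index) if same else ("data", chunk))
    return delta


def apply_delta(old: bytes, delta: list, block: int = BLOCK) -> bytes:
    """On the receiver: rebuild the new file from the old copy plus delta."""
    out = bytearray()
    for kind, value in delta:
        if kind == "keep":
            out += old[value * block:(value + 1) * block]
        else:
            out += value
    return bytes(out)


if __name__ == "__main__":
    old = b"AAAABBBBCCCCDDDD"
    new = b"AAAAXXXXCCCCDDDD"
    delta = make_delta(signatures(old), new)
    sent = sum(len(v) for kind, v in delta if kind == "data")
    print("bytes sent as data:", sent, "of", len(new))
    print(apply_delta(old, delta) == new)
```

Fixed-size blocks are the main simplification here: inserting a single byte near the start of a file shifts every later block, so this sketch would resend almost everything in that case.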
