Implementing applications that communicate over the wire poses many challenges for the developer. You suddenly have to think much more about the security and performance of your application. To protect data from unauthorized use, we can encrypt it. In .NET, classes that deal with cryptography have been available since the first release. To improve performance when transferring data over low-bandwidth channels, we can compress it. Fortunately, classes that help with compression have shipped with the .NET Framework since version 2.0. In many cases, we want to compress and encrypt data at the same time. Both sets of classes are based on streams, so it's very easy to plug them together, as demonstrated by the following code snippet:
using System;
using System.IO;
using System.IO.Compression;
using System.Security.Cryptography;

class Program
{
    static void Main(string[] args)
    {
        using (FileStream inputStream = new FileStream(@"D:\Desktop\blog.xml", FileMode.Open))
        using (FileStream outputStream = new FileStream("Output.out", FileMode.CreateNew))
        using (DeflateStream compressor = new DeflateStream(outputStream, CompressionMode.Compress))
        // Demo only: the key and IV are generated randomly and never saved,
        // so this output cannot be decrypted later.
        using (CryptoStream encryptor = new CryptoStream(compressor,
            new RijndaelManaged().CreateEncryptor(), CryptoStreamMode.Write))
        {
            byte[] buffer = new byte[2048];
            int count;
            // Bytes written to the encryptor are encrypted first, then compressed.
            while ((count = inputStream.Read(buffer, 0, buffer.Length)) != 0)
                encryptor.Write(buffer, 0, count);
        }
    }
}
As you can see, the output of the encryption algorithm is passed to a compression stream, which writes it to a file. This code works, but is it the right way to do it? The answer is no. There's an important rule to follow whenever you both compress and encrypt data: compress before you encrypt.

Why? Let's take a high-level look at how the two work. Most compression algorithms are based on recognizing redundant patterns in the input and replacing them with smaller tokens. They usually use Huffman coding in some way to accomplish this. Achieving a good level of compression is all about good pattern recognition. Compression ratios also depend on the nature of the input data: if there are no redundant sequences, there is nothing to replace.

A characteristic of a good encryption algorithm, on the other hand, is that you should not be able to distinguish its output from random data. After all, its goal is to destroy every possible pattern in the output. Did you notice a theme here? The output of a good encryption algorithm is the worst input you can feed to a compression algorithm! Compressing encrypted data usually increases its size, because the compressor finds nothing to shrink and still adds its own overhead. The above code should be written as:
using (FileStream inputStream = new FileStream(@"D:\Desktop\blog.xml", FileMode.Open))
using (FileStream outputStream = new FileStream("Output.out", FileMode.CreateNew))
using (CryptoStream encryptor = new CryptoStream(outputStream,
    new RijndaelManaged().CreateEncryptor(), CryptoStreamMode.Write))
using (DeflateStream compressor = new DeflateStream(encryptor, CompressionMode.Compress))
{
    byte[] buffer = new byte[2048];
    int count;
    // Bytes written to the compressor are compressed first, then encrypted.
    while ((count = inputStream.Read(buffer, 0, buffer.Length)) != 0)
        compressor.Write(buffer, 0, count);
}
This is essentially the same code snippet with the order of encryption and compression reversed. I compared the size of the output file produced by each approach on a 44KB XML document:
| Method | Output size |
|---|---|
| Original document | 44KB |
| Compressed only | 12KB |
| Encrypted only | 45KB |
| Compressed then encrypted | 12KB |
| Encrypted then compressed | 68KB |
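
The comparison in the table can be reproduced in memory with a sketch along the following lines. It is only an illustration under a few assumptions: `SizeComparison`, `Compress`, `Encrypt`, and `SampleInput` are hypothetical names, the repetitive markup built by `SampleInput` stands in for the original XML document, and `Aes.Create()` is swapped in for `RijndaelManaged`, which newer .NET versions mark as obsolete. The exact sizes will differ from the table, since they depend on the input and on the Deflate implementation of the .NET version you run.

```csharp
using System;
using System.IO;
using System.IO.Compression;
using System.Security.Cryptography;
using System.Text;

static class SizeComparison
{
    // Deflate-compresses the input and returns the compressed bytes.
    public static byte[] Compress(byte[] input)
    {
        using (MemoryStream output = new MemoryStream())
        {
            // Disposing the DeflateStream flushes the final compressed block.
            using (DeflateStream compressor = new DeflateStream(output, CompressionMode.Compress))
                compressor.Write(input, 0, input.Length);
            return output.ToArray(); // ToArray still works after the stream is closed
        }
    }

    // Encrypts the input with the algorithm's current key and IV.
    public static byte[] Encrypt(byte[] input, SymmetricAlgorithm algorithm)
    {
        using (MemoryStream output = new MemoryStream())
        {
            // Disposing the CryptoStream writes the final padded block.
            using (ICryptoTransform transform = algorithm.CreateEncryptor())
            using (CryptoStream encryptor = new CryptoStream(output, transform, CryptoStreamMode.Write))
                encryptor.Write(input, 0, input.Length);
            return output.ToArray();
        }
    }

    // Hypothetical stand-in for the XML document: highly repetitive markup.
    public static byte[] SampleInput()
    {
        StringBuilder xml = new StringBuilder("<blog>");
        for (int i = 0; i < 1000; i++)
            xml.Append("<post id=\"" + i + "\"><title>Compress, then encrypt</title></post>");
        xml.Append("</blog>");
        return Encoding.UTF8.GetBytes(xml.ToString());
    }

    static void Main()
    {
        byte[] original = SampleInput();
        using (Aes aes = Aes.Create()) // random key and IV, which is fine for measuring sizes
        {
            byte[] compressed = Compress(original);
            byte[] encrypted = Encrypt(original, aes);
            Console.WriteLine("Original document:         " + original.Length);
            Console.WriteLine("Compressed only:           " + compressed.Length);
            Console.WriteLine("Encrypted only:            " + encrypted.Length);
            Console.WriteLine("Compressed then encrypted: " + Encrypt(compressed, aes).Length);
            Console.WriteLine("Encrypted then compressed: " + Compress(encrypted).Length);
        }
    }
}
```

On repetitive input like this, the last two lines should show the same pattern as the table: the compress-then-encrypt result stays close to the compressed size, while the encrypt-then-compress result is no smaller than the encrypted data.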
Switching the order of encryption and compression makes a huge difference. Never compress encrypted data. It's basically useless.
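
Reading the data back means undoing the steps in reverse: decrypt first, then decompress. The following is a minimal round-trip sketch, not a definitive implementation. It assumes the caller stores the key and IV somewhere safe (the snippets above throw them away), and it uses `Aes.Create()` rather than `RijndaelManaged`. The names `RoundTrip`, `CompressThenEncrypt`, and `DecryptThenDecompress` are hypothetical, chosen for illustration.

```csharp
using System;
using System.IO;
using System.IO.Compression;
using System.Security.Cryptography;
using System.Text;

static class RoundTrip
{
    // Writing side: data flows compressor -> encryptor -> output.
    public static byte[] CompressThenEncrypt(byte[] plain, byte[] key, byte[] iv)
    {
        using (Aes aes = Aes.Create())
        using (MemoryStream output = new MemoryStream())
        {
            using (ICryptoTransform transform = aes.CreateEncryptor(key, iv))
            using (CryptoStream encryptor = new CryptoStream(output, transform, CryptoStreamMode.Write))
            using (DeflateStream compressor = new DeflateStream(encryptor, CompressionMode.Compress))
                compressor.Write(plain, 0, plain.Length);
            // The streams above are disposed innermost first, so all
            // compressed and padded bytes have reached the output by now.
            return output.ToArray();
        }
    }

    // Reading side: data flows input -> decryptor -> decompressor.
    public static byte[] DecryptThenDecompress(byte[] data, byte[] key, byte[] iv)
    {
        using (Aes aes = Aes.Create())
        using (MemoryStream input = new MemoryStream(data))
        using (ICryptoTransform transform = aes.CreateDecryptor(key, iv))
        using (CryptoStream decryptor = new CryptoStream(input, transform, CryptoStreamMode.Read))
        using (DeflateStream decompressor = new DeflateStream(decryptor, CompressionMode.Decompress))
        using (MemoryStream plain = new MemoryStream())
        {
            decompressor.CopyTo(plain);
            return plain.ToArray();
        }
    }

    static void Main()
    {
        byte[] key, iv;
        using (Aes aes = Aes.Create())
        {
            // Assumption: a real application would persist or derive these securely.
            key = aes.Key;
            iv = aes.IV;
        }

        byte[] original = Encoding.UTF8.GetBytes("<blog><post>Hello</post><post>Hello</post></blog>");
        byte[] protectedBytes = CompressThenEncrypt(original, key, iv);
        byte[] restored = DecryptThenDecompress(protectedBytes, key, iv);
        Console.WriteLine(Encoding.UTF8.GetString(restored));
    }
}
```

The reader wraps the streams in the opposite nesting from the writer: the `CryptoStream` sits directly on the stored bytes, and the `DeflateStream` reads from it.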