Encryption: keys
So we've decided that security by obscurity is not a good option
for encoding our data. Essentially, it's not a good option because:
- we may need to encode independent pieces of data (be they files,
communications across a network, etc.), and we might want them to be encoded differently,
so that the fact that Person A can decode File A' with our system doesn't automatically
mean that they can also decode File B';
- we would like our system to be "provably secure", or at least, since that is impossible
in most practical cases, we'd like some reasonable confidence that it's not insecure;
in practice, it's difficult to have several experts scrutinise an algorithm for
security problems whilst at the same time keeping it secret1.
So instead, the vastly more practical and secure option is as follows:
- design a public algorithm that encrypts data using a specific key supplied
each time it is used;
- design the algorithm so that essentially any random key will work
(or at least, any random key chosen according to some well-defined criteria);
- now, the only secret is the key in a particular conversation
or use of the algorithm.
For now, we can imagine that a "key" is just some specified number of random
bytes. Roughly speaking, this is a sequence of bytes that we'll "mix in" when we
encrypt a particular conversation or file. And the key can change for, say, every file or
communication that we encrypt. But the actual encryption algorithm— which defines the
way in which we "mix in" the key— will always be the same. Our premise is
essentially that "with this algorithm and any permitted randomly-chosen key,
the encrypted data will always be totally undecipherable without a key that is 100% correct".
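To make this concrete, here is a minimal sketch (the class and variable names are invented
for this illustration) of generating a 16-byte key as nothing more than 16 cryptographically
random bytes, using java.security.SecureRandom:

import java.security.SecureRandom;

public class KeySketch {
    public static void main(String[] args) {
        // A "key" here is just 16 random bytes (128 bits), produced by a
        // cryptographically strong random number generator.
        byte[] key = new byte[16];
        new SecureRandom().nextBytes(key);

        // Print the key in hex so we can see it really is just a short run of bytes.
        StringBuilder hex = new StringBuilder();
        for (byte b : key) {
            hex.append(String.format("%02x", b));
        }
        System.out.println("Key: " + hex);
    }
}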
Then, we make our encryption
system public and encourage the world's cryptography experts to deliberately try and find flaws
in our algorithm. Without the specific key that encrypted a given piece of data, the algorithm by itself is not enough to decrypt that data, so it is of no security concern (quite the opposite,
in fact) that the algorithm is public.
The AES process showed that organising a competition for a new standard (where the winning team gets fame/money
beyond their hamster's wildest dreams) can help with this: participants then have an
incentive to look for flaws in each other's submissions, and even for those outside the
competition, there'd be enough kudos in writing a paper that identified a
flaw in an emerging NIST standard that some experts would surely start looking.
Previous examples of active incentives to gauge the strength of cryptographic algorithms
include the contest held by Dr Dobb's Journal to find flaws in Blowfish2,
and RSA's competition to factor various large numbers.
After some time, we round up the resulting attacks and analyses of
our algorithm, and assess how secure our system is. If, in several years, none of the world's leading cryptanalysts has found a flaw (and we're sure they've been looking),
then this gives a high level of confidence that
our algorithm is "secure". If after 6 months, somebody publishes a paper showing how to
break a 9-round version of our algorithm and our algorithm uses 10 rounds (see
the section on block ciphers for the notion of rounds),
then we should be much more worried.
Assuming that our algorithm is secure, then as a secure, key-based algorithm
it has some desirable properties:
- it is secure whatever the key, except possibly for
some well-published set of "weak keys" that we can deliberately avoid;
- the only secret is the key, of which we can easily3 generate as many as we need (it's
much easier for a computer program to generate a series of 16 bytes than to generate a
unique, secure encryption algorithm on each run!); that means that our server can encrypt
Conversation A with Person A' and, with the selfsame algorithm but a different key, encrypt
Conversation B with Person B', and A' and B' cannot illegitimately decrypt each other's
conversations (a sketch of per-conversation key generation follows this list);
- a more minor issue, but still useful in practice as we'll see, is that it reduces
and fixes the amount of secret information that we need to manage— there can be
just (say) 16 bytes of key information per entire file/conversation/hard disk etc.
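Picking up the point about per-conversation keys, the sketch below (class and method names
are invented for this example) uses javax.crypto.KeyGenerator to produce two independent
AES keys; the algorithm stays the same for both conversations, only the key changes:

import javax.crypto.KeyGenerator;
import javax.crypto.SecretKey;

public class PerConversationKeys {
    public static void main(String[] args) throws Exception {
        // The same public algorithm (AES) serves every conversation;
        // only the 16-byte key differs from one conversation to the next.
        KeyGenerator gen = KeyGenerator.getInstance("AES");
        gen.init(128);

        SecretKey keyForConversationA = gen.generateKey();
        SecretKey keyForConversationB = gen.generateKey();

        // Each call produces an independent key, so knowing one key tells
        // an attacker nothing useful about the other.
        System.out.println("Key A: " + toHex(keyForConversationA.getEncoded()));
        System.out.println("Key B: " + toHex(keyForConversationB.getEncoded()));
    }

    private static String toHex(byte[] bytes) {
        StringBuilder sb = new StringBuilder();
        for (byte b : bytes) sb.append(String.format("%02x", b));
        return sb.toString();
    }
}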
Now, our notion of "secure" actually implies a couple of extra things. In particular, we
want decryption to be an "all-or-nothing" operation. It would be no good,
for example, if the encrypted/decrypted output with key 01234567 was similar to the output with
key 01234568 in any detectable way. If it were, then: (a) an attacker could eventually guess
our key by just "trying lots of keys", using failures as a clue as to how "warm" they were,
and thus which key to try next; (b) Person A might be able to use his/her supposedly
unique encryption key to decrypt Person B's data without authorisation. The relationship between
the key and the resulting encrypted text has to appear "random".
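To get a feel for what this key sensitivity means, here's a rough sketch (AES is used purely
as an example algorithm, and ECB mode only to keep the code short; neither choice is a
recommendation) that encrypts the same plaintext under two keys differing by a single bit.
The two ciphertexts printed should look completely unrelated:

import javax.crypto.Cipher;
import javax.crypto.spec.SecretKeySpec;
import java.nio.charset.StandardCharsets;

public class KeySensitivityDemo {
    public static void main(String[] args) throws Exception {
        byte[] key1 = "0123456789abcdef".getBytes(StandardCharsets.US_ASCII);
        byte[] key2 = key1.clone();
        key2[15] ^= 0x01;                      // flip a single bit of the last key byte

        byte[] plaintext = "The same message".getBytes(StandardCharsets.US_ASCII);

        System.out.println("Ciphertext 1: " + toHex(encrypt(plaintext, key1)));
        System.out.println("Ciphertext 2: " + toHex(encrypt(plaintext, key2)));
        // Despite the near-identical keys, the two outputs should share no
        // detectable pattern.
    }

    private static byte[] encrypt(byte[] data, byte[] keyBytes) throws Exception {
        Cipher cipher = Cipher.getInstance("AES/ECB/PKCS5Padding");
        cipher.init(Cipher.ENCRYPT_MODE, new SecretKeySpec(keyBytes, "AES"));
        return cipher.doFinal(data);
    }

    private static String toHex(byte[] bytes) {
        StringBuilder sb = new StringBuilder();
        for (byte b : bytes) sb.append(String.format("%02x", b));
        return sb.toString();
    }
}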
Guarantee of security?
Unfortunately, there's probably no practical encryption system whose
security can be guaranteed: at any given time, we deem a system "secure" in the sense
that the world's cryptanalysts have so far not found a flaw. But in general, no mathematical
proof will tell us that a flaw can't ever be found, or indeed that a flaw will even be
needed to break the system given some future technology4.
Next: symmetric key encryption
On the next page, we begin looking at symmetric key
encryption in Java. This is arguably the simplest case, where the key used to
encrypt is the same as that used to decrypt, and is thus a secret between the parties
involved.
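As a small taster of that idea (the class name and the choice of AES in ECB mode are just
for this sketch; the next page covers the real details), a round trip with a single shared
key might look like this:

import javax.crypto.Cipher;
import javax.crypto.KeyGenerator;
import javax.crypto.SecretKey;
import java.nio.charset.StandardCharsets;

public class SymmetricTaster {
    public static void main(String[] args) throws Exception {
        // One key is used for both encryption and decryption.
        KeyGenerator gen = KeyGenerator.getInstance("AES");
        gen.init(128);
        SecretKey key = gen.generateKey();

        Cipher cipher = Cipher.getInstance("AES/ECB/PKCS5Padding");  // ECB only for brevity
        cipher.init(Cipher.ENCRYPT_MODE, key);
        byte[] ciphertext = cipher.doFinal("Hello, world".getBytes(StandardCharsets.UTF_8));

        cipher.init(Cipher.DECRYPT_MODE, key);
        byte[] decrypted = cipher.doFinal(ciphertext);
        System.out.println(new String(decrypted, StandardCharsets.UTF_8));  // prints "Hello, world"
    }
}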
1. Even if you think you can just "pay enough experts" to look at it under a
non-disclosure agreement, it's extremely difficult to know that it's been seen by
"enough" of the "right" experts. Think about how the Internet has made certain sites explode in
popularity virtually overnight. Your home-grown scheme, or the one that you paid one expert to
look at for one week, may appear to offer adequate security now. But if your program
suddenly gains in popularity, that situation could be reversed too rapidly for you to
do much about the insecurity before it's too late.
2. See Bruce Schneier's 1995 article: Blowfish: One Year Later.
3. Compared to most uses of random numbers, we actually need to be more careful about
how we generate them: see the discussion of
SecureRandom.
4. For example, many of our current estimates of the effort required to guess a key will
need revising if and when quantum computers become mainstream.