CSCI.4210 Operating Systems Fall, 2009 Class 22
Security I: Cryptography

Computer Security

Many serious security breaches are not done by people breaking codes or otherwise using high tech methods to find security holes; they are done by low tech methods like looking over someone's shoulder as they type their password. In fact many security breaches are inside jobs in which people who have access to information steal or modify it.

Everything discussed in this and the following classes is based on the assumption that passwords and keys are secure, and this is not a valid assumption.

Cryptography

Most computer security is based on cryptography, so let's have a little diversion on cryptography basics. I'll start with some definitions

People have been sending coded messages for thousands of years. The original codes were substitutions, in which one letter was substituted for another Here is an example. The substitution key looks like this
ABCDEFGHIJKLMNOPQRSTUVWXYZ
QWERTYUIOPASDFGHJKLZXCVBNM
So the plaintext message ATTACK AT DAWN would be encrypted to QZZQEA QZ RQVF

There are 26! (that's 26 factorial or 26 x 25 x 24 x 23 ...) or roughly 4,000,000,000,000,000,000,000,000 possible keys, so substitution ought to be a pretty safe system.

Wrong!

These are easily broken because of different letter frequencies. In fact there is a substitution cipher called Cryptoquote in the Poly every week.

Remarkably, substitution ciphers were the primary means of encoding for thousands of years. However, as techniques for breaking these codes were learned, people started developing more complex systems. A substitution cipher is based on a one to one mapping between a plaintext character and an encrypted character. All of the more modern ciphers (i.e. after 1500) are polyalphabetic. The letter A could be represented by several different characters. Typical of these is the The Vigenere Cipher (1586).

This requires a matrix of 26 by 26 characters like this.

A B C D E F G H I J K L M N O P Q R S T U V W X Y Z
B C D E F G H I J K L M N O P Q R S T U V W X Y Z A
C D E F G H I J K L M N O P Q R S T U V W X Y Z A B
D E F G H I J K L M N O P Q R S T U V W X Y Z A B C
E F G H I J K L M N O P Q R S T U V W X Y Z A B C D
F G H I J K L M N O P Q R S T U V W X Y Z A B C D E
G H I J K L M N O P Q R S T U V W X Y Z A B C D E F
H I J K L M N O P Q R S T U V W X Y Z A B C D E F G
I J K L M N O P Q R S T U V W X Y Z A B C D E F G H
J K L M N O P Q R S T U V W X Y Z A B C D E F G H I
K L M N O P Q R S T U V W X Y Z A B C D E F G H I J
L M N O P Q R S T U V W X Y Z A B C D E F G H I J K
M N O P Q R S T U V W X Y Z A B C D E F G H I J K L
N O P Q R S T U V W X Y Z A B C D E F G H I J K L M
O P Q R S T U V W X Y Z A B C D E F G H I J K L M N
P Q R S T U V W X Y Z A B C D E F G H I J K L M N O
Q R S T U V W X Y Z A B C D E F G H I J K L M N O P
R S T U V W X Y Z A B C D E F G H I J K L M N O P Q
S T U V W X Y Z A B C D E F G H I J K L M N O P Q R
T U V W X Y Z A B C D E F G H I J K L M N O P Q R S
U V W X Y Z A B C D E F G H I J K L M N O P Q R S T
V W X Y Z A B C D E F G H I J K L M N O P Q R S T U
W X Y Z A B C D E F G H I J K L M N O P Q R S T U V
X Y Z A B C D E F G H I J K L M N O P Q R S T U V W
Y Z A B C D E F G H I J K L M N O P Q R S T U V W X
Z A B C D E F G H I J K L M N O P Q R S T U V W X Y

There is a key word or phrase, known only to the sender and the receiver, CHIP for example. To encrypt a message, align the key up along the plain text message, repeating it as often as necessary, like this.

ATTACK AT DAWN
CHIPCH IP CHIP

The encrypted character is determined from the matrix, with the plain text character indicating the column and the key indicating the row. Thus, the encrypted message would be

CABPES II FHEC

Note that the letter A, which appears in the plaintext four times, is encrypted to four different letters.

Decryption was simply the reverse of this process. Receivers of this message know the key and the encrypted letter, so they can determine the original message

There were numerous variants of this cipher developed, and for about three centuries, this was thought to be entirely unbreakable.

However, in the 19th century, several people (including Charles Babbage) were able to break such ciphers. The first step in breaking such a key is to try to find the period, i.e. the length of the key (four in our example). Once the period is found, then classical substitution cryptanalysis methods can be used.

There are several ways to guess at the period. The first is to look for strings that repeat. These are likely to be in the same phase, and so the period would be a factor of the phase.

The second method is considerably more complex. Note the actual positions of every letter. Take the difference in each and every case. Get the frequency of each factor for each case. For example

A is at positions 6, 45, 83, 89, 92, and 115
B is at positions 16, 17, 40, 41, 80, 86, 104
etc

Differences 45-6 = 39, 83-45=38, 83-6=77

Look for common factors in these values because these are likely to be the period. A variant of this is to look at the distance between repeat digraphs (two letter pairs).

Decryption is easier if you happen to know something about the nature of the plaintext message. If you suspect that certain words will appear often in the plaintext, you can do a search for these.

Clearly, the longer the key, the more difficult it is to break the code. The extreme of this is a one time pad in which the key is infinitely long. Encryption with a one time pad is still completely unbreakable using cryptanalysis (you can still break it if you bribe someone for the key), but it is not widely used because the problem of key distribution is nontrivial, and it is difficult to generate a truly random string.

The Enigma Machine and variants

Until the 20th century, essentially all encryption and decryption was done manually. In 1920 Rudolf Zschweigert was granted a patent on a transposition cipher machine. This was a class of machines that had a series of rotors with different periods. The machine had an ordinary keyboard. As a key was struck, the rotors advanced, and the outputted character was a function of the settings of all of the rotors. The machine was set by setting the start position of each rotor. Suppose a machine had six rotors, with the following periods.
26, 25, 23, 21, 19, 17
The period of such a machine was 26 x 25 x 23 x 21 x 19 x 17 because these are relative primes.

Later versions also added other features such as plug boards to do other forms of translation.

One such class of machines was called the Enigma Machine and was used by the German Navy to communicate with their submarines during World War II. Alan Turing and other mathematicians were able to break this code after intensive effort, and this contributed to the Allied victory in the War.

The details of how this was done are fascinating, but beyond the scope of this course. Here is the Wikipedia page for the interested student.

The Computer Takes Over

The development of computers lead to vast changes in both encryption and decryption. Because a computer could search a huge number of keys fairly quickly, breaking codes like the Vigenere Cipher were trivial, at least with relatively short keys. However, computers also allowed far more elaborate encryption methods which involved passing a message through an encryption function multiple times. Imagine how much more difficult it would be to break the code if the encrypted message of a Vigenere Cipher was encrypted a second time with a different key.

In 1977 the National Bureau of Standards (Now called NIST, the National Institute of Standards and Technology) adopted DES (Data Encryption Standard) as an encryption standard. DES is an iterated cryptosystem based on iterating a weaker function many times. Each iteration is called a round.

DES was based on a system called LUCIFER developed at IBM. LUCIFER was a block cipher based on 128 bit blocks with a key of 128 bits (so every 16 bytes are combined together). When it was adopted by NBS, the keysize was reduced to 56 bits The actual key is 64 bits, but 8 are for parity checking. Block size is also 64 bits.

It involves an initial permutation, followed by 16 iterations of various key based transformations followed by a final permutation. The middle iterations involve successive applications of permutations (i.e. moving bits around) and substitutions. Each of the 16 iterations uses a different 48 bit key based on selected bits of the original 56 bit key.

Details are beyond the scope of this course. The interested student can go to this website to learn more about DES.

DES has the desirable feature that changing any one bit in the key potentially affects every bit in the encrypted block.

The fact that the key size was reduced from 128 bits to 56 bits has caused speculation that the National Security Agency, which has the most powerful computers in the world, requested this because it knew how to break DES with a 56 bit key, but could not break it with the original 128 bit key.

DES can be put on a chip so it can be done very quickly. It is still a widely used standard. The only known way to crack DES is by brute force; i.e. trying all possible keys. Since there are 256 possible keys, this takes a while.

As computers get faster, breaking DES using brute force has moved from the realm of impossible to merely difficult. In 1998 the Electronic Frontier Foundation, using a specially developed computer called the DES Cracker, managed to break DES in less than 3 days. Reports of people breaking DES are now routine. Thus, newer encryption methods have been developed. If you are sufficiently paranoid, you can doubly or even triply encode using two or three different keys. Run DES on your message with the first key, then encrypt the crypted message with the second key, then encrypt this encrypted message with the third key. With triple DES, the effective key size is 168 bits, and virtually everyone is confident that even NSA cannot decrypt triple DES.

There are other encryption standards based on similar strategies. One widely used scheme is the International Data Encryption Algorithm (IDEA), adopted in 1990. This is also a block algorithm, but it works with a 128 bit key. More recent encryption standards include the Advanced Encryption Standard (AES), and Skipjack.

Skipjack was the encryption scheme for the clipper chip. This was proposed by the Federal Government as a standard for encrypting voice. However, the Government installed a backdoor in the chip so that they could (with court authorization) eavesdrop on enrypted conversation. This caused such an uproar in the nerd community that it was abandoned.

All of the systems described so far are symmetric; the same key is used to encrypt and decrypt the message, and this is their fundamental flaw. The key has to be distributed, and any key distribution system is subject to being broken.

Consider the common case where two people who have never met before would like to communicate securely; the obvious example is someone who wishes to make a purchase on the Internet using a credit card. DES and similar schemes are worthless for this purpose, because there is no way for the two users to agree on a key. If one person sends the key to the other, it is possible for a third party to be listening in on the connection, grab the key, and thus be able to read all of the traffic.

Two solutions to this problem were developed about 30 years ago. One is the Diffie Hellman exponential key exchange algorithm, the other is the RSA public key encryption scheme.

The Diffie Hellman exponential key exchange

This scheme allows two people to set up a secret but unauthenticated connection, It allows them to negotiate a secret key, but cannot confirm who is at the other end. It works even if someone is listening on the line and is able to steal all of the communication between the two sides. Here is how it works.

k is the key. Both Alice and Bob know it, but an eavesdropping bad guy cannot calculate it from the information sent over the network.

Here is an example (of course a real example would use much bigger numbers)


Public Key Encryption

The current standard for encryption without key exchange is public key encryption, which uses the RSA algorithm, named after its three developers Rivest, Shamir and Adelman. This is an asymmetrical cipher. The message is encrypted with one key and decrypted with a different key. A person, Alice for example, makes his or her public key known to the world. Everyone knows it, even the bad guys. Someone who wants to send an encrypted message to Alice uses her public key to encrypt the message. However, knowing the key which was used to encrypt the message does not help in decrypting the message. It can only be decrypted using a different key, known only to Alice. This key is called the private key.

Of course there is a relationship between the public key and the private key. I will not go into the mathematics behind RSA but you should know that RSA unbreakability is based on the problem of factoring large numbers. Currently factoring numbers of a hundred or more digits is impossibly difficult. If someone can figure out how to factor large numbers in a reasonable amount of time, much of the world's secure communication will be compromised. Of course it is entirely possible that someone has already solved this problem and but just hasn't told anyone, because they would rather steal credit card numbers or do other nefarious things.

Unlike DES or IDEA, encrypting and decrypting a message using RSA is quite computationally intensive. As a result, RSA is not usually used to encrypt long messages. Rather, it is used to encrypt and send a DES key or and IDEA key, and then the DES or IDEA key is used to decrypt the message.

Other uses of encryption

When people think about encryption, they tend to think solely in terms of protecting the content of a message from intruders. There are a number of other threats as well. Here is a less than complete list.

The last two do not involve bad guys, but reflect a lack of trust between the sender and the receiver There are some encryption related techniques that can be used to address many of these issues.

For Example, RSA can also be used for authentication. Authentication is the problem of confirming that the person on the other end of the connection is who he/she says he/she is. RSA has a unique reversibility feature that allows this. If a person uses RSA to encrypt a message using their private key, it can be decrypted using their public key. Thus, if you get a message from Alice, and you can use Alice's public key to decrypt it, then you know that it must have been encrypted using Alice's private key. Presumably, this is known only to Alice, and therefore the message was sent by Alice, and not by someone pretending to be Alice.

Digital Fingerprints

It is sometimes useful to encode a message using a one-way function, also known as a cryptogrphic hash function. Such a function produces a fixed length result independent of the original document size. One of the most widely used is MD5 (Message Digest Algorithm version 5) which is an Internet standard. A more secure function SHA (Secure Hash Function) developed by NSA is also used (This is actually a family of algorithms which vary depending on the size of the hash result). Such functions are similar to checksums but more secure. Their purpose is to confirm that a message has not been altered in transmission, either by accident or by a malicious entity. If the message has been altered, it would produce a different value. These functions can also be used by a virus checker to confirm that a file has not been altered.

In any discussion of computer and network security, we make these assumptions:

Let's see how two people (Alice and Bob) can communicate securely using these above techniques. These methods are used by PGP (Pretty Good Privacy) a system developed by Phil Zimmerman to provide confidentiality and authentication for Internet communication such as email. PGP does not involve new technology, but it is an assemblage of best practices in cryptography. It was made available for free, it was and still is very widely used. Phil Zimmerman got in a lot of legal trouble because he violated restrictions on exporting munitions without a license (even though the bad guys could read the algorithms in any of a number of books on cryptography).

Here is how it works. Suppose Alice wants to send a message to Bob, and she wants to make sure that no one other than Bob can read it, and Bob wants to be able to authenticate the message; i.e. confirm that it is really from Alice and not someone pretending to be Alice. Both Bob and Alice need to have PGP installed , so each has a public key and a corresponding private key based on RSA. Alice knows Bob's public key and Bob knows Alice's public key. Eve also know both public keys.

Alice writes the message. PGP uses MD5 to generate a 128 bit hash code (fingerprint) of the message. This digital fingerprint is encrypted using Alice's private key, and this encrypted code is appended to the message.

The message and appended encrypted digital fingerprint are compressed using zip.

The compressed file is encrypted using IDEA with a one-time session key, generated by Alice. The one-time session key is then encrypted using RSA with Bob's public key.

This message, with encrypted one-time session key appended, is then emailed to Bob. Eve can see it, but cannot decrypt it.

Bob receives the message. Using his private key, he decrypts the one-time session key (recall that it was encrypted with his public key).

Using the one time session key, he decrypts the message, which had been encrypted using IDEA with the same key.

He unzips the file and removes the digital fingerprint. He decrypts the digital fingerprint with Alice's public key. He also calculates the digital fingerprint using MD5 and compares his answer to the the decrypted value. If they are the same, Bob knows that the message was sent from Alice, because the fingerprint had been encrypted using Alice's private key. He also knows that the message is the same as the one Alice sent, because the MD5 hash code was the same.

Digital Certificates

One flaw in the original PGP was that there was no way that Alice could be sure that Bob's public key was really from Bob and not from someone pretending to be Bob. The solution is a Digital Certificate which has been authenticated by a trusted certificate authority (CA).

Here is a simplified version of how a CA can confirm the authenticity of a public key. An entity (Alice in our example, but more often a company) applies to the CA, providing enough information so that the CA is sure that Alice is who she says she is. She is then issued a public key/private key pair. The CA also sends her a certificate, containing her public key and some other information about Alice, encrypted with the CA's private key. When Bob wants to communicate with Alice, she sends them this certificate. Bob decrypts the certificate using the CA's public key. If the certificate decrypts properly, Bob knows that he is communicating with Alice, and not someone pretending to be Alice, and he also knows that the public key is valid.

There is a standard for such certificates, X.509.

The best known CA is VeriSign.

This method is used to insure secure web communication using a protocol called the Secure Socket Layer (SSL). Every SSL web server has its own certificate. Web browsers come preloaded with the public keys of about 40 CAs.

Here is Alex's file on building your own OS from scratch

Return to the course home page