Understanding RSA Encryption

 

This microstudy serves as the second part of my series in Java cryptography programming.

Diffie and Hellman invented early asymmetric encryption in 1976 and based on that MIT misters Rivest, Shamir, and Adleman, invented RSA (acronym after their names) in 1979.

The basic scheme relies of the difficulty of factoring large prime numbers making the reverse engineering of a key pair difficult and time consuming. The basic RSA process is described n the book CYBER SECURITY ESSENTIALS by editors Graham, Howard, and Olson from CRC Press which I summarize here…

The recipient creates 3 numbers which will be used as:

  • The exponential – e
  • The modulus – n
  • The multiplicative inverse of e with respect to d

The modulus n is the product (result of multiplication) of two very large (in the order of 309 digits or more) prime numbers. These prime numbers will be labeled as p and q so that…

n = pq

The recipient publishes his public key as e,n.

The sender takes his text message and transforms it into an integer. The ascii character code of course would work for that.

The number however must be between zero and (n-1).

Aand n-1 is found by first subtracting 1 from p, and 1 from q, and then multiplying the two together.

And if the message can not fit into the confines of this integer the message is broken into blocks. And of course that’s how most encryption streams work as a byte (8 bits) is one character.

The sender generates the ciphertext c using formula…

c = me mod n

Now the book gives a specific example with small prime numbers for simple explanation and I summarize that here also…

Recipient chooses 2 prime numbers: p = 17 and q = 11 and multiplies them to get n so that n = 187.

The recipient chooses an exponent that is less than (p-1)(q-1) or in other words less than 160 since 17-1 = 16 and q-1 = 10 and both multiplied is 160. But the chosen number must also be relatively prime to 160. Note: The website mathisfun.com says: Two numbers are “relatively prime” when they have no common factors other than 1. In other words you cannot evenly divide both by some common value. 7 and 20 are relatively prime (no common factor). 6 and 20 are not relatively prime because you can evenly divide both by 2 (2 is a common factor. :)

Between zero and 160 there are a number of prime numbers available for selection. These are: 2 3 5 7 11 13 17 19 23 29 31 37 41 43 47 53 59 61 67 71 73 79 83 89 97 101 103 107 109 113 127 131 137 139 149 151 157. But one must be found which is not relatively prime and so without getting into the mathematics of how that is determined the example just says the chosen number is 7.

Then the value of d is found by the formula…

de = 1 (mod 16) with d < 160. Again some of the math here is skipped over but the example says d becomes equal to 23.

Now the recipient develops a private key of 23,187 and public key of 7,187.

The sender having a message which was translated into the integer 88 which is between 0 and 186  the sender calculates 887 mod 187 which is equal to 11 and so the number 11 is the ciphertext so that number or the text equivalent of that number is sent to the recipient.

To decrypt it the recipient uses formula…

1123 mod 187 which equals 88.

There you have it folks. Basic RSA encryption.

The diagram shows that plaintext 88 becomes encryption scheme 887 mod 187 = 11 which is the cypertext and the decryption is 1123 mod 187 = 88 becomes plaintext 88.

The number 7,187 is from 887 mod 187 and that stands for the public key of the recipient. And the number 23,187 is the private key of the recipient derived from 1123 mod 187.

Crazy huh?

Now in the original paper describing RSA the team wrote, “A message is encrypted by representing it as a number M, raising M to a publicly specified power e, and then taking the remainder when the result is
divided by the publicly specified product, n, of two large secret prime numbers p and q. Decryption is similar; only a different, secret, power
d is used, where e · d ≡ 1 (mod (p−1)·(q−1)). The security of the system rests in part onthe difficulty of factoring the published divisor, n.”

The fact that encryption is not regularly and mandatorily used as part of email shows a purposeful plot to undermine privacy for the necessity of such a system in the design of email to safeguard privacy and ensure it was like regular mail was thought out well in advance the team writing, “The era of “electronic mail” [10] may soon be upon us; we must ensure that two important properties of the current “paper mail” system are preserved: (a) messages are private, and (b) messages can be signed. We demonstrate in this paper how to build these capabilities into an electronic mail system.”

They write the procedures have four properties…

(a) Deciphering the enciphered form of a message M[message] yields M. Formally,

D(E(M) = M.

(b) Both E and D are easy to compute.

(c) By publicly revealing E the user does not reveal an easy way to compute D. This means that in practice only he can decrypt messages encrypted with E, or compute D efficiently.

(d) If a message M is first deciphered and then enciphered, M is the result. Formally,

E(D(M) = M.

They write, “How can user Bob send Alice a “signed” message M in a public-key cryptosystem? He first computes his “signature” S for the message M using DB:

S = DB(M) .

They write their encryption and decryption methods are… “To encrypt a message M with our method, using a public encryption key (e,n), proceed as follows. (Here e and n are a pair of positive integers.)

First, represent the message as an integer between 0 and n − 1. (Break a long message into a series of blocks, and represent each block as such an integer.) Use any standard representation. The purpose here is not to encrypt the message but only to get it into the numeric form necessary for encryption.

Then, encrypt the message by raising it to the e th power modulo n. That is, the result (the ciphertext C) is the remainder when M e is divided by n.

To decrypt the ciphertext, raise it to another power d, again modulo n. The encryption and decryption algorithms E and D are thus:

C ≡ E (M) ≡ Me(mod n), for a message M .

D (C)≡Cd (mod n), for a ciphertext C .

Note that encryption does not increase the size of a message; both the message and the ciphertext are integers in the range 0 to n − 1.

The encryption key is thus the pair of positive integers (e,n). Similarly, the decryption key is the pair of positive integers (d,n). Each user makes his encryption key public, and keeps the corresponding decryption key private. (These integers should properly be subscripted as in nA,eA, and dA, since each user has his own set.

However, we will only consider a typical set, and will omit the subscripts.)

How should you choose your encryption and decryption keys, if you want to use our method?

You first compute n as the product of two primes p and q:

n=p · q .

These primes are very large, “random” primes. Although you will make n public, the factors p and q will be effectively hidden from everyone else due to the enormous difficulty of factoring n. This also hides the way d can be derived from e.

You then pick the integer d to be a large, random integer which is relatively prime to (p − 1) · (q − 1). That is, check that d satisfies:

gcd(d,(p−1) · (q−1)) = 1

(“gcd” means “greatest common divisor”).

The integer e is finally computed from p, q, and d to be the “multiplicative inverse” of d, modulo (p − 1) · (q − 1). Thus we have

e · d ≡ 1 (mod (p − 1) · (q − 1)).

There is a detailed section in the team’s paper on the underlying mathematics to proof the work and then a section on how to find large prime numbers which is important in any kind of cryptic work.

They say each user in the scheme must choose two large random numbers being called p and q as they are necessary for creating the public and private keys and that the numbers must be large so it is not computationally feasible for anyone to factor n = p times q.

The number n is placed in a public file accessible for general use for people to send a message to a person but the numbers p and q will not be.

The team recommended using a 100 digit decimal prime numbers for p and q so that n will have 200 digits. Of course we know that the RSA system recommends even larger numbers.

But the procedure for finding a 100 digit “random” prime number is to generate odd 100 digit random number until a prime number is found. According to the prime number theorem about (ln 10100)/2 = 115 numbers will be tested before a prime is found.

They say that to test a large number b for primality they recommend the elegant “probabilistic” algorithm from Solovay and Strassen where a random number a is picked from a uniform distribution on {1,…, b – 1} and tests whether…

gcd(a,b) = 1 and J(a,b) =a(b−1)/2 (mod b),

where J(a,b) is the Jacobi symbol [7]. If b is prime (6) is always true. If b is composite (6) will be false with probability at least 1/2. If (6) holds for 100 randomly chosen values of a then b is almost certainly prime; there is a (negligible) chance of one in 2 100 that b is composite. Even if a composite were accidentally used in our system, the receiver would probably detect this by noticing that decryption didn’t work correctly. When b is odd, a ≤ b, and gcd(a,b) = 1, the Jacobi symbol J(a,b) has a value in{−1,1} and can be efficiently computed by the program:

J(a,b) = if a = 1 then 1 else

if a is even then J (a/2,b) · (−1)(b2−1)/8

else J(b (mod a),a) · (−1)(a−1) · (b−1)/4

So obviously testing large prime numbers is a pretty lengthy process and that’s the whole reason this type of encryption works. Just getting a set of prime numbers is difficult for a computer and becoming increasingly so as the number grows in size. However today simple methods for determining if a number is prime can be much more straight forward http://www.javawithus.com/programs/prime-numbers

public static boolean isPrime(int x){
if (x <= 1){//check is the number is 1 or less
return false;
}
for (int i = 2;i<Math.sqrt(x);i++){
if (x % i==0){
return false;
}
}
return true;
}

The computational power to crack the scheme then means that if the scheme is in common use the decryption of all traffic across a network is impossible. While although as they say even in their time a high-speed computer could determine in several seconds whether a 100-digit number is prime and can find the first prime after a given point in a minute or two reversing the product of two large primes like this to factor them is a huge workload for a computer even in our time.

Okay in the next installment we are going to run through a java program that incorporates classes which help us to use asymmetric encryption. Stay tuned for part 3 of these encryption lessons for java.

Sources: https://people.csail.mit.edu/rivest/Rsapaper.pdf

http://www.mathsisfun.com/definitions/relatively-prime.html

CYBER SECURITY ESSENTIALS Edited by James Graham Richard Howard Ryan Olson Copyright 2011 Auerbach Publications
Taylor & Francis Group 6000 Broken Sound Parkway NW, Suite 300
Boca Raton, FL 33487-2742