Cryptanalysis of the Vigenere Cipher

The keyword of a Vigenère cipher describes the rotation among the Caesar cipher alphabets that are used. Since we know the frequency at which English text appears, we can figure out the plainText by detecting the frequency characteristics of the cipherText.

This is the main weakness of monoalphabetic ciphers; although the letters themselves change, their frequency does not. If we know the length of the keyword, we can often determine the keyword and, hence, decrypt all messages encrypted with that keyword.

Step 1: Finding the key length

The Vigenere cipher uses different Caesar ciphers to consecutive letters, which means that -depending on the key length- the letter at index 0 will be enciphered with a different value than the letter at index 2,3,4..etc. If the key is 'KROB', the first letter will be enciphered with a key of 11 (since K is the 11th letter of the alphabet). The second letter will be enciphered with a key of 18 (R), the third with a key of 15 (O) and the fourth with a key of 2 (B). When we get to the fifth letter, the key starts over using 11 (K).

Using 'KROB' as the key, and starting with the first letter, every fourth letter will be enciphered with K -1, 5, 9, 13...etc. Starting with the second letter, every fourth letter will be enciphered with R -2, 6, 10, 14. Repeat the same process for O and B. If you choose two random letters from the English alphabet, there is a 1 in 26 probability (0.0385) that you chose the same letter both times. If you choose two random letters from English plaintext, then you have approximately a 2 out of 30 probability (0.0667) that the same letter will be choosen both times. If you calculate the probability of coincidences for a text, then calculate the ratio of that probability to the probability of random coincidences in the English language, this is the index of coincidence (IC).

The IC is a statistical technique which gives an indication of how English-like a piece of text is. The I.C. does not change if you apply a substitution cipher to the text, because the I.C. is based on the frequency of letters, and substitution ciphers do not modify the individual letter frequencies. Text similar to English will have an IC around 0.06, and closer to .03-.04 if the characters are uniformly distributed.

To determine the key length of a Vigenere cipher, we extract sequences from our cipherText, which are determined by the current key length we are testing. To be throuough, you could test for key lengths from 2-20, but since the median lenth of a password is 8, I like to first test 6-10; for simplicity, I will test for a key length of 2-4 below.

Using 'STCVVUBPDHIJDVFFWVACOIVPGKCURICXKSCPWVFBXXPVDVJFXKIBVCMJDTONOSODUKCNO' as our cipherText, and first assuming a key length of 2, we extract the two sequences 1,3,5,7... and 2,4,6,8... from the cipherText.

text S T C V V U B P D H I J D V F F W V A C O I V P G K C U R I C X K S C P W V F B X X P V D V J F X K I B V C M J D T O N O S O D U K C N O
1 S   C   V   B   D   I   D   F   W   A   O   V   G   C   R   C   K   C   W   F   X   P   D   J   X   I   V   M   D   O   O   O   U   C   O
2   T   V   U   P   H   J   V   F   V   C   L   P   K   U   I   X   S   P   V   B   X   V   V   F   K   B   C   J   T   N   S   D   K   N  

Using the formula below, we find the IC for both sequences, then the average IC of both sequences. IC calculator

where fi is the count of letter i (where i = A,B,...,Z) in the ciphertext, and N is the total number of letters in the ciphertext.

Key of 2
text S T C V V U B P D H I J D V F F W V A C O I V P G K C U R I C X K S C P W V F B X X P V D V J F X K I B V C M J D T O N O S O D U K C N O IC
1 S   C   V   B   D   I   D   F   W   A   O   V   G   C   R   C   K   C   W   F   X   P   D   J   X   I   V   M   D   O   O   O   U   C   O 0.0554622
2   T   V   U   P   H   J   V   F   V   C   L   P   K   U   I   X   S   P   V   B   X   V   V   F   K   B   C   J   T   N   S   D   K   N   0.0534759
                                                                                                                                    Average 0.0544691

Key of 3
text S T C V V U B P D H I J D V F F W V A C O I V P G K C U R I C X K S C P W V F B X X P V D V J F X K I B V C M J D T O N O S O D U K C N O IC
1 S     V     B     H     D     F     A     I     G     U     C     S     W     B     P     V     X     B     M     T     O     D     C     0.027668
2   T     B     P     I     V     W     C     V     K     R     X     C     V     X     V     J     K     V     J     O     S     U     N   0.055336
3     C     U     D     J     F     V     O     P     C     I     K     P     F     X     D     F     I     C     D     N     O     K     O 0.0592885
                                                                                                                                Average 0.0474308

Key of 4
text S T C V V U B P D H I J D V F F W V A C O I V P G K C U R I C X K S C P W V F B X X P V D V J F X K I B V C M J D T O N O S O D U K C N O IC
1 S       V       D       D       W       O       G       R       K       W       X       D       X       V       D       O       U       O 0.0784314
2   T       U       H       V       V       I       K       I       S       B       X       V       K       C       T       S       K       0.0661765
3     C       B       I       F       A       V       C       C       C       F       P       J       I       M       O       O       C     0.0955882
4       V       P       J       F       C       P       U       X       P       B       V       F       B       J       N       D       N   0.0588235
                                                                                                                                  Average 0.0747549

The sub-set testing for a key of 4 has the highest IC, which indicates the key is probably of length 4.

Step 2: Finding the key

Inferring our key is a length of 4, we only need to break 4 Ceaser ciphers. We start by extracting every fourth letter into its own sequence -totaling 4 sub-sets. Tool to get sub-sets.

sub-set text
1 SVDDWOGRKWXDXVDOU
2 TUHVVIKISVXVKCTS
3 CBIFAVCCCFPJIMOO
4 VPJFCPUXPBVFBJND

Now on each sub-set, we will compare the frequency distribution of our sub-sets using the Chi-squared statisics. In short, we want to decipher each sub-set with each of the 25 possible Ceasers ciphers, and compare frequency distribution of the deciphered text with the frequency distribution of English text for each key. Most of the time, the correct key will match the deciphered text with the lowest Chi-squared statistic. To start, your table should look like the table below.

0 S V D D W O G R K W X D X V D O U Chi-squared
1                                    
2                                    
3                                    
4                                    
5                                    
6                                    
7                                    
8                                    
9                                    
10                                    
11                                    
12                                    
13                                    
14                                    
15                                    
16                                    
17                                    
18                                    
19                                    
20                                    
21                                    
22                                    
23                                    
24                                    
25                                    

After we have shifted each letter in our first sub-set, it should look like the table below.

0 S V D D W O G R K W X D X V D O U
1 T W E E X P H S L X Y E Y W E P V
2 U X F F Y Q I T M Y Z F Z X F Q W
3 V Y G G Z R J U N Z A G A Y G R X
4 W Z H H A S K V O A B H B Z H S Y  
5 X A I I B T L W P B C I C A I T Z  
6 Y B J J C U M X Q C D J D B J U A  
7 Z C K K D V N Y R D E K E C K V B  
8 A D L L E W O Z S E F L F D L W C  
9 B E M M F X P A T F G M G E M X D  
10 C F N N G Y Q B U G H N H F N Y E  
11 D G O O H Z R C V H I O I G O Z F  
12 E H P P I A S D W I J P J H P A G  
13 F I Q Q J B T E X J K Q K I Q B H  
14 G J R R K C U F Y K L R L J R C I  
15 H K S S L D V G Z L M S M K S D J  
16 I L T T M E W H A M N T N L T E K  
17 J M U U N F X I B N O U O M U F L  
18 K N V V O G Y J C O P V P N V G M  
19 L O W W P H Z K D P Q W Q O W H N  
20 M P X X Q I A L E Q R X R P X I O  
21 N Q Y Y R J B M F R S Y S Q Y J P  
22 O R Z Z S K C N G S T Z T R Z K Q  
23 P S A A T L D O H T U A U S A L R  
24 Q T B B U M E P I U V B V T B M S  
25 R U C C V N F Q J V W C W U C N T