Frequency Analysis with Excel

Aim

You are going to use Excel VBA to develop frequency analysis tools.

Text Entry Box.

This workbook uses textboxes and command buttons to count the frequency distribution of the letters in the ciphertext. This is one of the first jobs that you should perform on a new cipher text as it often yields vital clues to the type of cipher that you are dealing with.

The workbook will shift the message into uppercase automatically and eliminate any punctuation.

This webpage assumes that you have already completed:

Open a new Excel workbook and get the worksheet into Design Mode by clicking on the Design Mode toggle.

Add a large text box and a command button to your workbook.
Copy and paste the frequency analysis results for standard English into your workbook.

LetterFrequency %
A 8.2
B 1.5
C 2.8
D 4.3
E 12.7
F 2.2
G 2.0
H 6.1
I 7.0
J 0.2
K 0.8
L 4.0
M 2.4
N 6.7
O 7.5
P 1.9
Q 0.1
R 6.0
S 6.3
T 9.1
U 2.8
V 1.0
W 2.4
X 0.2
Y 2.0
Z 0.1

The first block of VBA code is taken from the clean and case button of the Simple Shift Cipher. Part 2 project.
The only alteration needed is to change textbox2 in the final line to textbox1 as this project only needs one textbox.

Test your code to check that it strips out punctuation etc. and converts lower to uppercase.

Private Sub CommandButton1_Click()
newStr = ""
newStr1 = ""
sourceStr = TextBox1.Value
newStr = Format(sourceStr, ">")
For i = 1 To Len(sourceStr)
If (Asc(Mid(newStr, i, 1)) >= 65) Then
If (Asc(Mid(newStr, i, 1)) <= 90) Then
newStr1 = newStr1 + Mid(newStr, i, 1)
End If
End If
If (Asc(Mid(newStr, i, 1)) = 32) Then
newStr1 = newStr1 + Mid(newStr, i, 1)
End If
Next i
TextBox1.Value = newStr1
End Sub

Now we need to add the code to work through the ciphertext and count up occurences of each letter.
This code is added to the end of the last section of code, just before the closing End Sub line.

sourceStr = TextBox1.Value
For i = 0 To 25
Worksheets("Sheet1").Cells(6 + i, 28).Value = 0
Next i
For i = 1 To Len(sourceStr)
If ((Asc(Mid(sourceStr, i, 1)) >= 65) And (Asc(Mid(sourceStr, i, 1)) <= 90)) Then
letterCount = Worksheets("Sheet1").Cells(Asc(Mid(sourceStr, i, 1)) - 59, 28).Value + 1
Worksheets("Sheet1").Cells(Asc(Mid(sourceStr, i, 1)) - 59, 28).Value = letterCount
End If
Next i
End Sub

Note


The statistics are going to be written into the cells of column M starting at cell M6. To do this you use the Cells command. This takes worksheet references in the form (row,column) as numerical indices counting A as 1. M6 is therefore (6,13).
The first for next loop writes 0's into all the cells in column M.
The second for next loop then reads through the ciphertext. It finds the ASCII code of each letter and then subtracts 59 (65 = ASCII A - 6 for were the data table starts.) to use it as an index in the data array. The current data value is read and incremented before being written back into the cell.

Use Excel in the normal way to draw frequency graphs for normal English and the ciphertext.
I have added the code =SUM(M6:M31) to cell M32 and then used this value to generate the %'s for my graph.

Use this code =100*M6/M$32 in cell N6 to calculate the percentage of A's in the text. Copy it down the column for the other letters.

Extension to Periodic Viginere Ciphers.

Start a new workbook.

Add a 'spin button' from the controls toolbar to the sheet and use this to alter the period of the suspected Viginere repeat. Put the label 'period' into cell I9 and make the target of the spin button cell I10.

Add a button and call it 'run'. Also add the statisticss for plain English and a large text box.

Make sure that you are in design mode and double click the 'run' button to summon the VBA editor. The code you need is shown below with commentary on the right.

Private Sub CommandButton1_Click()
freqjump = Worksheets("Sheet1").Cells(10, 9).Value
tot = 0
sourceStr = TextBox1.Value
Worksheets("Sheet1").Range("L4:bZ29").ClearContents
For n = 1 To freqjump
tot = 0
Worksheets("Sheet1").Cells(3, 11 + n).Value = n
For i = n To Len(sourceStr) Step freqjump
If ((Asc(Mid(sourceStr, i, 1)) >= 65) And (Asc(Mid(sourceStr, i, 1)) <= 90)) Then
tot = tot + 1
Worksheets("Sheet1").Cells(Asc(Mid(sourceStr, i, 1)) - 61, 11 + n).Value = Worksheets("Sheet1").Cells(Asc(Mid(sourceStr, i, 1)) - 61, 11 + n).Value + 1
End If
Worksheets("Sheet1").Cells(31, 11 + n).Value = tot
Next i
For i = 1 To 26
Worksheets("Sheet1").Cells(3 + i, 11 + n).Value = 100 * Worksheets("Sheet1").Cells(3 + i, 11 + n).Value / tot
Next i
Next n
End Sub

freqjump is the number of letters in the suspected password.

The code counts the frequency of every nth letter.

tot is the current total of nth letters

Clear the results area

outer loop goes from 1 to freqjump

reset the total to 0 for the current nth letter and put n at the top of the results column.

The count only works on uppercase letters

Move through the text from start to finish but only counting the nth letter

Use the ascii code value of the nth letter as an index to increment the correct row within the nth column. Capital letters begin with Ascii 65 and an offset is needed for the 3 empty columns at teh top of the sheet.

Then use the total for the column to obtain the percentages for the nth letter.

Example of the 'product'. In this case the guess of 6 for the keyword length is wrong as the stats don't look close enough to english. They are not that for off though.

Once you get a match. Try to identify letters such as ETAION and use these to work back to candidate keywords. IE if E has become such and such what letter is A? The cipher text letter corresponding to A will be one of teh letters of the keyword.

Extension to use with product moment correlation coefficients.

last updated 6th October 2013