Product moment correlation coefficient with Excel

Aim

This webpage assumes that you have already completed:

  • Excel Frequency Analysis.
  • This sheet assumes that you have completed a previous Excel spreadsheet which will carry out frequency analysis and produce a list of percentages for the letters present in the cipher text.

    Correl function

    Copy your statistics for plaintext and the cipher text to two adjacent columns of a new worksheet. Keep the letters of each adjacent to them and then sort plaintext letters against their statistics and ciphertext letters against their statistics.

    If the ciphertext has teh same frequency distribution as plaintext then if the pairs of the two columns of numbers are taken as pairs together then a straight line graph should be produced.

    Excel has a function correl which will give you the product moment correlation coefficient of the two lists of numbers. 1 = perfect positive correlation which indicates that you have a ciphertext with the right statistics.
    0 = no correlation. Anything above 0.9 on a couple of hundred letters is significant enough to warrant careful attention.

    =CORREL(B1:B26,C1:C26)

    B1:B26 are the plaintext values and C1:C26 are the ciphertext values

    Extensions

    PMCC really comes into its own for cracking Viginere or Beaufort ciphers. You can add it to a frequency analysis tool so that having counted the letter frequency of every nth letter you then work through every letter in the alphabet to see if the stats starting on that letter match plaintext. Do this by writing your own subroutine to calculate a PMCC and then roll it through the table with a loop.
    Or cut and paste your stats out and just use the correl function. E or its ciphertext equivalent should be the most abundant letter so chop teh column with this in mind.

    Viginere cipher text = plaintext + key. So its alphabets go down the column like the plaintext.

    Beaufort cipher text = key - plaintext. So its alphabets go up the column opposite to the plaintext.

    NB if you have used sort on your statistics then all you know is that the statistics are right but you then need to work out which direction they are running in etc.

    WIKI PMCC page

    WIKI HOW page on PMCC

    The Excel function help pop up also gives the formula.


    LetterFrequency %
    A 8.2
    B 1.5
    C 2.8
    D 4.3
    E 12.7
    F 2.2
    G 2.0
    H 6.1
    I 7.0
    J 0.2
    K 0.8
    L 4.0
    M 2.4
    N 6.7
    O 7.5
    P 1.9
    Q 0.1
    R 6.0
    S 6.3
    T 9.1
    U 2.8
    V 1.0
    W 2.4
    X 0.2
    Y 2.0
    Z 0.1

    last updated 6th October 2013