## Solving Cryptograms

A cryptogram is a message in code which you are invited to decrypt. The encryption scheme is a simple substitution cipher: each letter is replaced by a different letter, while the original blank spaces and punctuation are left intact. Every A might have become H, every B might have become W, and so on, so that "The" may have become "JYB." If you had the key, you could make the substitutions backwards and read the original message. But it is often possible to deduce the key and the original message just by studying the encrypted message. To do that, you use your knowledge of certain biases of the English (or other) language.
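Here is a minimal Python sketch of the scheme just described. It assumes the key is stored as a dictionary from each plaintext letter to its cipher letter, and the function names are made up for this illustration. It leaves spaces and punctuation alone. Following the rule that each letter is replaced by a different letter, it rejects any key in which a letter would stand for itself.

```python
import random
import string

def make_key(rng: random.Random) -> dict[str, str]:
    """Build a random key (plaintext letter -> cipher letter) with no letter left unchanged."""
    letters = list(string.ascii_uppercase)
    while True:
        shuffled = letters[:]
        rng.shuffle(shuffled)
        # Try again if any letter would be replaced by itself.
        if all(p != c for p, c in zip(letters, shuffled)):
            return dict(zip(letters, shuffled))

def encrypt(plaintext: str, key: dict[str, str]) -> str:
    """Substitute every letter; spaces and punctuation pass through unchanged."""
    return "".join(key.get(ch, ch) for ch in plaintext.upper())

def decrypt(ciphertext: str, key: dict[str, str]) -> str:
    """Make the substitutions backwards by inverting the key."""
    inverse = {c: p for p, c in key.items()}
    return "".join(inverse.get(ch, ch) for ch in ciphertext)

key = make_key(random.Random(7))
secret = encrypt("Meet me at the old mill at noon.", key)
print(secret)                # scrambled letters, same spaces and period
print(decrypt(secret, key))  # MEET ME AT THE OLD MILL AT NOON.
```

Decrypting only requires inverting the dictionary. That is why the puzzle becomes trivial once you have the key.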

In the English language, E is the most common letter, followed by T, A, O, N (with I and S close behind). A and I are the most common one-letter words (with O, usually at the beginning of a sentence, a very rare third). The most common two-letter words are OF, TO, IN, IS, IT, AS, HE, BE, BY, ON, OR, AT (with a few other alternatives). The most common three-letter words are THE and AND (followed by FOR, HIS, NOT, BUT, YOU, ARE, HER, HAD). The most common four-letter word is THAT (followed by WITH, HAVE, FROM). Q is almost always followed by U. E is the most common letter at the end of a word, and T is the most common letter at the beginning of a word.
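The word-level statistics are easy to exploit by machine. The hypothetical helper below groups the words of a cryptogram by length and ranks each group by how often each word occurs, so that a repeated three-letter word (a candidate for THE) stands out. It assumes that a word is simply a run of the letters A to Z.

```python
import re
from collections import Counter

def common_words_by_length(ciphertext: str, top: int = 3) -> dict[int, list[tuple[str, int]]]:
    """Rank the cipher words of each length by how often they occur."""
    by_length: dict[int, Counter] = {}
    for word in re.findall(r"[A-Z]+", ciphertext.upper()):
        by_length.setdefault(len(word), Counter())[word] += 1
    return {n: counts.most_common(top) for n, counts in sorted(by_length.items())}

# The cryptogram given later in this section.
CRYPTOGRAM = ("HJYVW XR BQZ PYJJNJ, YWVNJOPXR BQOB MNX OJZ! OVA BOSZ UOJZ VNB BN "
              "UNVBOPYVOBZ BQZ WKORR LYBQ BQZ JZEKZUBYNV NE MNXJ FYROWZ.")
ranked = common_words_by_length(CRYPTOGRAM)
print(ranked[3][0])  # ('BQZ', 3)
```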

There are also patterns of repeated letters. A word with the pattern ABB may be ALL, TOO, SEE, or one of many rarer words (although if the cipher word really is ABB, it is unlikely to be ALL, because a letter is normally not substituted for itself). A word with the pattern NRVN is most likely to be THAT. THERE, WHERE, THESE, WHICH, and LITTLE also form familiar patterns.
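These repeated-letter patterns can be checked mechanically. A minimal sketch: reduce each word to the order in which new letters appear, then keep the dictionary words with the same shape. The tiny `WORDS` list is only a stand-in for a real word list. The rule that a letter is not substituted for itself is applied as a filter, as in the ALL example.

```python
def pattern(word: str) -> tuple[int, ...]:
    """Number each new letter in order of appearance: THAT -> (0, 1, 2, 0)."""
    first_seen: dict[str, int] = {}
    return tuple(first_seen.setdefault(ch, len(first_seen)) for ch in word.upper())

def candidates(cipher_word: str, word_list: list[str]) -> list[str]:
    """Words with the same repeat pattern, skipping any that would map a letter to itself."""
    target = pattern(cipher_word)
    return [
        w for w in word_list
        if pattern(w) == target
        and all(c != p for c, p in zip(cipher_word.upper(), w.upper()))
    ]

# A tiny stand-in for a real word list.
WORDS = ["ALL", "TOO", "SEE", "THE", "DID", "THAT", "WITH", "NOON"]
print(candidates("NRVN", WORDS))  # ['THAT']
print(candidates("ABB", WORDS))   # ['TOO', 'SEE']
```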

In puzzle books, cryptograms often begin with "THE BEST...", "THE ONLY...", or "ONE OF THE..." Although your best first assumption is that the most common letter in a cryptogram stands for E, you may be wrong. Some day you may see a puzzle that contains no E's whatsoever. Solving cryptograms is a trial-and-error process (even though I normally solve them with pen and ink). Substituting ETAON for the five most common letters in your cryptogram may help you solve it, but some of those letters are almost surely wrong. Knowing one or two letters of a word often gives you the rest of the word. Words that often go together, like "it is", can be valuable clues, too.
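The naive ETAON substitution can be sketched as follows. This hypothetical helper ranks the cipher letters by frequency and pairs them with E, T, A, O, N in turn. Ties are broken alphabetically, which is an arbitrary choice made here so that the result is repeatable.

```python
from collections import Counter

def etaon_guess(ciphertext: str, order: str = "ETAON") -> dict[str, str]:
    """Guess that the most frequent cipher letters stand for E, T, A, O, N in turn.

    Ties in frequency are broken alphabetically so the guess is repeatable.
    """
    counts = Counter(ch for ch in ciphertext.upper() if "A" <= ch <= "Z")
    ranked = sorted(counts, key=lambda ch: (-counts[ch], ch))
    return dict(zip(ranked, order.lower()))

# The cryptogram given later in this section.
CRYPTOGRAM = ("HJYVW XR BQZ PYJJNJ, YWVNJOPXR BQOB MNX OJZ! OVA BOSZ UOJZ VNB BN "
              "UNVBOPYVOBZ BQZ WKORR LYBQ BQZ JZEKZUBYNV NE MNXJ FYROWZ.")
print(etaon_guess(CRYPTOGRAM))  # {'B': 'e', 'O': 't', 'Z': 'a', 'J': 'o', 'N': 'n'}
```

On the cryptogram in this section, this guess pairs B, O, Z, J, and N with e, t, a, o, and n. Compared with the key given at the end, none of the five is right: B is really t, O is a, Z is e, J is r, and N is o. So the guesses are only a starting point, to be tested against the words they produce.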

Solving a cryptogram is a cumulative process (like solving a crossword puzzle) in which you keep accumulating clues. Longer cryptograms are usually easier to solve than shorter ones, because they give you more letters to count and more words to test.

Cryptograms often contain errors, which can be frustrating for the solver. But errors also give some insight into the problems confronting code breakers who solve more complicated and more serious ciphers (in the military, for example). They often have to deal with more errors in an encrypted message than we do.

Here is a cryptogram which you may enjoy solving:

HJYVW XR BQZ PYJJNJ, YWVNJOPXR BQOB MNX OJZ! OVA BOSZ UOJZ VNB BN UNVBOPYVOBZ BQZ WKORR LYBQ BQZ JZEKZUBYNV NE MNXJ FYROWZ.

Solution:

We can count letters and see that B occurs most often (twelve times), followed by Z and O (ten times each). We also see that BQZ occurs three times, and may be "the", which would make B "t" and Z "e". When we see BQOB, and know that the most common four-letter word is "that", we are almost certain that Z=e, B=t, Q=h, and O=a. We also see the two-letter word BN, which is now tN. The only common two-letter word that starts with "t" is "to", so N is probably "o". Let's make those substitutions (in small letters):
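The counting and the first round of substitutions can be reproduced with a short sketch. The helper names are hypothetical. Solved letters are written in lowercase and unsolved cipher letters are left in uppercase, following the convention of this walkthrough.

```python
from collections import Counter

CRYPTOGRAM = ("HJYVW XR BQZ PYJJNJ, YWVNJOPXR BQOB MNX OJZ! OVA BOSZ UOJZ VNB BN "
              "UNVBOPYVOBZ BQZ WKORR LYBQ BQZ JZEKZUBYNV NE MNXJ FYROWZ.")

def letter_counts(text: str) -> Counter:
    """Count the cipher letters (uppercase A-Z), ignoring spaces and punctuation."""
    return Counter(ch for ch in text if "A" <= ch <= "Z")

def partial_decrypt(text: str, guesses: dict[str, str]) -> str:
    """Write guessed letters in lowercase and leave unsolved cipher letters as they are."""
    return "".join(guesses.get(ch, ch) for ch in text)

counts = letter_counts(CRYPTOGRAM)
print(counts["B"], counts["Z"], counts["O"])  # 12 10 10
step1 = partial_decrypt(CRYPTOGRAM, {"Z": "e", "B": "t", "Q": "h", "O": "a", "N": "o"})
print(step1)
```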

HJYVW XR the PYJJoJ, YWVoJaPXR that MoX aJe! aVA taSe UaJe Vot to UoVtaPYVate the WKaRR LYth the JeEKeUtYoV oE MoXJ FYRaWe.

Examining the various words: PYJJoJ looks intriguing; aJe can only be a few words (are, ace, age, ale, ape; not "ate", since "t" is already B); aVA may be "and" because of its place at the beginning of the second sentence; taSe can only be a few words; Vot may be "not" or "got" because it is followed by "to"; LYth can only be a few words; oE can be "of", "on", or "or". Let's look at LYth. Y is probably a vowel, and since "e", "a", and "o" are already taken, only "i" and "u" remain. "Luth" doesn't seem to work, so Y is probably "i", and LYth becomes "with":
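Narrowing down a partly solved word like LYth can also be sketched in code. In this hypothetical helper, lowercase letters are already solved. Each distinct uppercase cipher letter must stand for a distinct plaintext letter, and a plaintext letter that is already taken may not be reused. The short word lists are only examples.

```python
def match_partial(partial: str, taken: set[str], word_list: list[str]) -> list[str]:
    """Return the words that could be this partly decrypted cipher word."""
    matches = []
    for word in word_list:
        if len(word) != len(partial):
            continue
        mapping: dict[str, str] = {}  # unsolved cipher letter -> proposed plaintext letter
        fits = True
        for p_ch, w_ch in zip(partial, word.lower()):
            if p_ch.islower():         # already solved: must agree exactly
                fits = p_ch == w_ch
            elif p_ch in mapping:      # same cipher letter: same plaintext letter
                fits = mapping[p_ch] == w_ch
            else:                      # new cipher letter: needs an unused letter
                fits = w_ch not in taken and w_ch not in mapping.values()
                mapping[p_ch] = w_ch
            if not fits:
                break
        if fits:
            matches.append(word.lower())
    return matches

TAKEN = {"e", "t", "h", "a", "o"}  # letters solved by the first round
print(match_partial("LYth", TAKEN, ["with", "both", "path", "myth", "doth"]))  # ['with', 'myth']
print(match_partial("aJe", TAKEN, ["are", "ate", "ace", "age", "ale", "ape"]))  # ['are', 'ace', 'age', 'ale', 'ape']
```

With this toy list, "myth" survives alongside "with". The extra reasoning in the text (Y is probably a vowel, and "with the" is a very common pairing) is what tips the balance.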

HJiVW XR the PiJJoJ, iWVoJaPXR that MoX aJe! aVA taSe UaJe Vot to UoVtaPiVate the WKaRR with the JeEKeUtioV oE MoXJ FiRaWe.

JeEKeUtioV probably ends in "-tion", so V is probably "n". That should help:

HJinW XR the PiJJoJ, iWnoJaPXR that MoX aJe! anA taSe UaJe not to UontaPinate the WKaRR with the JeEKeUtion oE MoXJ FiRaWe.

We can probably guess some more words: anA is probably "and"; UontaPinate is probably "contaminate"; oE is certainly "of". Let's try those:

HJinW XR the miJJoJ, iWnoJamXR that MoX aJe! and taSe caJe not to contaminate the WKaRR with the JefKection of MoXJ FiRaWe.

Now, miJJoJ can only be "mirror". Then caJe becomes "care", taSe becomes "take", and the second sentence obviously begins "And take care..." And refKection becomes "reflection", especially because of the mention of "mirror". Let's add those:

HrinW XR the mirror, iWnoramXR that MoX are! and take care not to contaminate the WlaRR with the reflection of MoXr FiRaWe.

Now we can see that MoX must be "you" and MoXr must be "your":

HrinW uR the mirror, iWnoramuR that you are! and take care not to contaminate the WlaRR with the reflection of your FiRaWe.

Now, uR becomes "us", WlaRR becomes "glass", iWnoramuR becomes "ignoramus", and HrinW becomes "bring":

bring us the mirror, ignoramus that you are! and take care not to contaminate the glass with the reflection of your Fisage.
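The walkthrough above is cumulative, and a short sketch can replay it. Each entry in `STEPS` is one round of deductions from the text, written as cipher letter to plaintext letter. The hypothetical `add_guesses` helper refuses any guess that would give a cipher letter two meanings, or give two cipher letters the same meaning.

```python
CRYPTOGRAM = ("HJYVW XR BQZ PYJJNJ, YWVNJOPXR BQOB MNX OJZ! OVA BOSZ UOJZ VNB BN "
              "UNVBOPYVOBZ BQZ WKORR LYBQ BQZ JZEKZUBYNV NE MNXJ FYROWZ.")

# The deductions made in each round of the walkthrough (cipher -> plain).
STEPS = [
    {"Z": "e", "B": "t", "Q": "h", "O": "a", "N": "o"},  # the, that, to
    {"Y": "i", "L": "w"},                                 # with
    {"V": "n"},                                           # -tion
    {"A": "d", "U": "c", "P": "m", "E": "f"},             # and, contaminate, of
    {"J": "r", "S": "k", "K": "l"},                       # mirror, take, reflection
    {"M": "y", "X": "u"},                                 # you, your
    {"R": "s", "W": "g", "H": "b"},                       # us, glass, bring
]

def add_guesses(key: dict[str, str], guesses: dict[str, str]) -> dict[str, str]:
    """Merge guesses into a copy of the key, rejecting any contradiction."""
    merged = dict(key)
    for cipher, plain in guesses.items():
        if cipher in merged and merged[cipher] != plain:
            raise ValueError(f"{cipher} already stands for {merged[cipher]}")
        if cipher not in merged and plain in merged.values():
            raise ValueError(f"{plain} is already used by another letter")
        merged[cipher] = plain
    return merged

key: dict[str, str] = {}
stages = []
for step in STEPS:
    key = add_guesses(key, step)
    stages.append("".join(key.get(ch, ch) for ch in CRYPTOGRAM))
print(stages[-1])  # F is still unsolved, as in "Fisage"
```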

Finally, Fisage becomes "visage". Restoring the capitals at the beginning of the sentences:

Bring us the mirror, ignoramus that you are! And take care not to contaminate the glass with the reflection of your visage.

This is one of my favorite quotes. It is from the play Precious Provincials by Moliere, in the English translation by George Gravely.

What if I had encrypted the above message twice, using two different substitution ciphers? Would that be twice as difficult to decrypt? The answer to that is, "No." It is exactly as difficult. Two substitution ciphers applied one after the other amount to a single third substitution cipher: if the first turns A into H and the second turns H into Q, the combination simply turns A into Q. (If the second cipher is the inverse of the first, the combination undoes itself, and the "encrypted" message is just the original unencrypted message.)
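One way to convince yourself is to compose two keys in code. In this sketch, a key is written as a 26-letter string giving the cipher letter for A through Z. That representation is chosen just for this example. Python's `str.maketrans` and `str.translate` do the substitution. Encrypting with `k1` and then `k2` gives the same result as one pass with the combined key. Combining a key with its inverse gives the plain alphabet back.

```python
import string

ALPHABET = string.ascii_uppercase

def substitute(key: str, text: str) -> str:
    """Encrypt text with a key written as the cipher letters for A..Z."""
    return text.translate(str.maketrans(ALPHABET, key))

def compose(first: str, second: str) -> str:
    """The single key equal to encrypting with `first`, then with `second`.

    Running the letters of `first` through `second` gives, for each
    plaintext letter, the letter it ends up as after both steps.
    """
    return substitute(second, first)

def inverse(key: str) -> str:
    """The key that undoes `key`."""
    return ALPHABET.translate(str.maketrans(key, ALPHABET))

k1 = "QWERTYUIOPASDFGHJKLZXCVBNM"
k2 = "MNBVCXZLKJHGFDSAPOIUYTREWQ"
msg = "ATTACK AT DAWN"
twice = substitute(k2, substitute(k1, msg))
once = substitute(compose(k1, k2), msg)
print(twice == once)                         # True
print(compose(k1, inverse(k1)) == ALPHABET)  # True
```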

Above, I said, "In the English language, E is the most common letter, followed by T, A, O, N (with I and S close behind)." That list varies from source to source. One source gives ETOANIRSHDLCWUMFYGPBVKXQJZ. ETAOINSHRDLU is famous. Another source gives ETAOINSRHLDCU... These lists may come from frequency counts of newspapers, novels, or other English texts. They are all useful when studying the letter frequencies of an encrypted message. The disagreement between them should be a warning that solving a cryptogram is seldom automatic, but requires some guesswork.

Another pattern in a language is the digraph, a common two-letter combination. In English, the most common digraphs are TH, HE, AN, IN, ER, RE, ES, ON, EA, TI... Words with a recognizable internal pattern include DID, NOON, and several four-letter words with EE or OO in the middle.
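Digraphs can be counted the same way as single letters. This sketch counts adjacent letter pairs within each word. Pairs that span a space are skipped, which is one reasonable convention but an assumption here. On the cryptogram in this section, the top pair is BQ, the cipher for TH.

```python
import re
from collections import Counter

def digraph_counts(text: str) -> Counter:
    """Count adjacent letter pairs inside words; pairs never cross a space."""
    counts: Counter = Counter()
    for word in re.findall(r"[A-Za-z]+", text):
        word = word.upper()
        counts.update(word[i:i + 2] for i in range(len(word) - 1))
    return counts

CRYPTOGRAM = ("HJYVW XR BQZ PYJJNJ, YWVNJOPXR BQOB MNX OJZ! OVA BOSZ UOJZ VNB BN "
              "UNVBOPYVOBZ BQZ WKORR LYBQ BQZ JZEKZUBYNV NE MNXJ FYROWZ.")
print(digraph_counts(CRYPTOGRAM).most_common(1))  # [('BQ', 5)]
```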

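My cryptogram above is based on this key. Each uppercase letter in the top row is a letter of the cryptogram, and the lowercase letter below it is the plaintext letter it stands for; a "?" marks a letter that does not occur in the cryptogram: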

```
ABCDEFGHIJKLMNOPQRSTUVWXYZ
dt??fv?b?rlwyoamhsk?cnguie
```
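The table can be read straight into code. In this sketch, the top row is taken as the cryptogram letters and the bottom row as the plaintext they stand for, matching the walkthrough, where Z is "e" and B is "t". Reading the pairs the other way round gives the encryption key, which turns the quotation back into the cryptogram.

```python
TOP = "ABCDEFGHIJKLMNOPQRSTUVWXYZ"     # cryptogram letters
BOTTOM = "dt??fv?b?rlwyoamhsk?cnguie"  # plaintext letter each one stands for

# Decryption key: cryptogram letter -> plaintext letter ("?" slots are unused).
decrypt_key = {c: p for c, p in zip(TOP, BOTTOM) if p != "?"}
# Encryption key: the same pairs read from the other side.
encrypt_key = {p.upper(): c for c, p in decrypt_key.items()}

print([c for c, p in zip(TOP, BOTTOM) if p == "?"])  # ['C', 'D', 'G', 'I', 'T']

PLAINTEXT = ("Bring us the mirror, ignoramus that you are! And take care not to "
             "contaminate the glass with the reflection of your visage.")
ciphertext = "".join(encrypt_key.get(ch, ch) for ch in PLAINTEXT.upper())
print(ciphertext)  # reproduces the cryptogram above
```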

Suppose I had used the following key instead. What do you notice about it?

```
ABCDEFGHIJKLMNOPQRSTUVWXYZ
rnjzluyihcwepbxmtavqfskogd
```

This key is unusual in that it swaps letters in pairs: A becomes R and R becomes A, B becomes N and N becomes B, C becomes J and J becomes C, and so on. So, if I send you a message encrypted with it, and we both know the key, you don't have to use the key backwards to decrypt it. You can just encrypt the encrypted message again, and it comes out plain.
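A key like this is self-inverse (sometimes called reciprocal). The sketch below treats the two rows as a lowercase mapping, purely for illustration. It checks that property and shows that encrypting twice gives back the original text.

```python
ALPHABET = "abcdefghijklmnopqrstuvwxyz"
KEY = "rnjzluyihcwepbxmtavqfskogd"

def encrypt(text: str, key: str = KEY) -> str:
    """Substitute each letter using the key; other characters are unchanged."""
    return text.lower().translate(str.maketrans(ALPHABET, key))

def is_self_inverse(key: str) -> bool:
    """True if applying the key twice returns every letter to itself."""
    table = dict(zip(ALPHABET, key))
    return all(table[table[ch]] == ch for ch in ALPHABET)

print(is_self_inverse(KEY))                        # True
print(any(a == b for a, b in zip(ALPHABET, KEY)))  # False: no letter stays put
secret = encrypt("Bring us the mirror!")
print(encrypt(secret))                             # bring us the mirror!
```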