Cryptopals Set 1

Challenge 1 - Convert hex to base64

The string:
49276d206b696c6c696e6720796f757220627261696e206c696b65206120706f69736f6e6f7573206d757368726f6f6d
Should produce:
SSdtIGtpbGxpbmcgeW91ciBicmFpbiBsaWtlIGEgcG9pc29ub3VzIG11c2hyb29t
So go ahead and make that happen. You’ll need to use this code for the rest of the exercises.
Cryptopals Rule
Always operate on raw bytes, never on encoded strings. Only use hex and base64 for pretty-printing.

If you observe the string, you will notice that all characters of the string belong to the hexadecimal (or hex) number system’s character space - 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, a, b, c, d, e, and f. Hex encoding is an encoding format where every character encodes 4 bits of information (2^4 = 16). First the original string is decoded from hex to raw bytes:

import binascii
raw_string = binascii.unhexlify("49276d206b696c6c696e6720796f757220627261696e206c696b65206120706f69736f6e6f7573206d757368726f6f6d")

Next, the string is base64 encoded. Base64 encoding is a an encoding format where every base64 character is used to encode 6 bits of information (2^6 = 64).

import base64
base64_string = base64.b64encode(raw_string)

Both hex and base64 encoding methods are used to print raw bytes in readable form. However base64 encoding results in a smaller string length as compared to hex encoding. Refer to the Wikipedia articles for more information about base64 and hex encoding.

The base64 module in python is used to convert the hex encoded string to a base64 encoded string.

The base64 encoded string is then compared with the final string.

assert OUTPUT_STRING.encode("ascii") == get_base64_string_from_hex_encoded_string(INPUT_STRING)

The entire code can be found at in the arunjohnkuruvilla/cryptopals-crypto-challenges repository in the challenges/set_1_challenge_01.py file.

Challenge 2 - Fixed XOR

Write a function that takes two equal-length buffers and produces their XOR combination.
If your function works properly, then when you feed it the string:
1c0111001f010100061a024b53535009181c
… after hex decoding, and when XOR’d against:
686974207468652062756c6c277320657965
… should produce:
746865206b696420646f6e277420706c6179

XOR is a logical operation between two bits which results in a 1 if both the bits are different, and a 0 if both the bits are the same. Refer to the Wikipedia article to learn for about the XOR operation.

Bitwise XOR in python is done using the ^ operation. First, the two input strings and the output string are converted to raw bytes.

import binascii

INPUT_STRING_1 = "1c0111001f010100061a024b53535009181c"
RAW_INPUT_STRING_1 = binascii.unhexlify(INPUT_STRING_1)

INPUT_STRING_2 = "686974207468652062756c6c277320657965"
RAW_INPUT_STRING_2 = binascii.unhexlify(INPUT_STRING_2)

OUTPUT_STRING = "746865206b696420646f6e277420706c6179"
RAW_OUTPUT_STRING = binascii.unhexlify(OUTPUT_STRING)

The following function takes two raw strings and XOR’ed byte by byte:

def xor(sequence_1, sequence_2):

    if len(sequence_1) != len(sequence_2):
        raise Exception("Inputs are of unequal length")

    return bytes(a ^ b for a, b in zip(sequence_1, sequence_2))

The two input strings are XOR’ed using the function defined above:

The result is encoded back to hex to be compared with the the final string:

assert RAW_OUTPUT_STRING == xor(RAW_INPUT_STRING_1, RAW_INPUT_STRING_2)

The entire code can be found at in the arunjohnkuruvilla/cryptopals-crypto-challenges repository in the challenges/set_1_challenge_02.py file.

Challenge 3 - Single-byte XOR cipher

The hex encoded string:
1b37373331363f78151b7f2b783431333d78397828372d363c78373e783a393b3736
… has been XOR’d against a single character. Find the key, decrypt the message.
You can do this by hand. But don’t: write code to do it for you.
How? Devise some method for “scoring” a piece of English plaintext. Character frequency is a good metric. Evaluate each output and choose the one with the best score.
Achievement Unlocked
You now have our permission to make “ETAOIN SHRDLU” jokes on Twitter.

This challenge is a modified version of the famous “Caesar Cipher”. Instead of rotating the alphabets in a string by a fixed amount as in the case of Caesar cipher, here a single byte is XOR’ed with the entire plaintext.

The simplest attack against this encryption scheme would be to try all possible keys and identify strings that make sense in the English language. This is because the number of keys is very small. The key is a byte, which is 8 bits in length. Therefore, the total number of keys would be 256 (=2^8).

import binascii
INPUT_STRING = "1b37373331363f78151b7f2b783431333d78397828372d363c78373e783a393b3736"
RAW_INPUT_STRING = binascii.unhexlify(INPUT_STRING)

We have a simple function that takes a string and a character, and XORs the whole string with that character.

def single_byte_xor(input_string, key):
    return ''.join(map(lambda x: chr(x ^ ord(key)), input_string))

This function is used to loop over the entire ASCII characterspace. This will generate 256 strings, and they have to manually inspected to identify the correct plaintext. Since the number of keys is limited, this is possible.

There are several ways the number of strings to inspect can be reduced, if there is information available about the plaintext. For example, if it is known that the plaintext is an english sentence, all computed plaintexts without a space can be ignored. Similarly, all computed plaintexts with characters not in the English alphabet can be ignored. The list goes on. After reducing plaintexts in this fashion, the next strategy would be to employ frequency analysis.

Several types of frequency analyses exist, and for this challenge the relative letter frequency analysis can be used. In English sentences, some letters are occur much more frequently than others.

FREQUENCIES = { 
    'a': 0.0651738,
    'b': 0.0124248,
    'c': 0.0217339,
    'd': 0.0349835,
    'e': 0.1041442,
    'f': 0.0197881,
    'g': 0.0158610,
    'h': 0.0492888,
    'i': 0.0558094,
    'j': 0.0009033,
    'k': 0.0050529,
    'l': 0.0331490,
    'm': 0.0202124,
    'n': 0.0564513,
    'o': 0.0596302,
    'p': 0.0137645,
    'q': 0.0008606,
    'r': 0.0497563,
    's': 0.0515760,
    't': 0.0729357,
    'u': 0.0225134,
    'v': 0.0082903,
    'w': 0.0171272,
    'x': 0.0013692,
    'y': 0.0145984,
    'z': 0.0007836,
    ' ': 0.1918182 
}

The following function loops through the whoe ciphertext, computes a score for each key character, and retuns the character with the highest score.

def detect_xor_key(ciphertext):
    scores = {}
    results = {}

    for char_int in range(0,256):
        char = chr(char_int)

        output_string = single_byte_xor(ciphertext, char)

        score = 0.0

        for x in output_string:
            if x.lower() in FREQUENCIES:
                score += FREQUENCIES.get(x.lower())

        scores[char] = score
        results[char] = output_string


    if (len(scores) > 0):
        sorted_scores = dict(sorted(scores.items(), key=lambda item: item[1], reverse=True)[:1])

        for key in sorted_scores.keys():
        return key, results[key], sorted_scores[key]
    else:
        return "NONE", "", 99999

This gives a key of X and a plaintext of Cooking MC's like a pound of bacon.

The entire source code can be found at in the arunjohnkuruvilla/cryptopals-crypto-challenges repository in the challenges/set_1_challenge_03.py file.

Challenge 4 - Detect single-character XOR

One of the 60-character strings in this file has been encrypted by single-character XOR.
Find it.
(Your code from #3 should help.)

The next steps are straight-forward, with calling the detect_xor_key() function from challenge 3 on each line of the file, and keeping track of the maximum value.

import set_1_challenge_03 as challenge_3

def identify_single_char_xor():
    source_file = open("./challenges/files/4.txt", "r")
    lines = source_file.readlines()

    max_line = 0
    max_score = 0.0
    max_key = ''
    max_plaintext = ''

    for counter, line in enumerate(lines):
        ciphertext_line = line.rstrip()
        raw_ciphertext_line = binascii.unhexlify(ciphertext_line)

        current_key, current_plaintext, current_score = challenge_3.detect_xor_key(raw_ciphertext_line)

        if current_score > max_score:
            max_score = current_score
            max_plaintext = current_plaintext
            max_key = current_key
            max_line = counter

    print("Line #" + str(max_line) + " : Key '" + max_key + "' : " + max_plaintext)

This will print the follwoing result -

Line #170 : Key '5' : Now that the party is jumping

The entire source code can be found at in the arunjohnkuruvilla/cryptopals-crypto-challenges repository in the challenges/set_1_challenge_04.py file.

Challenge 5 - Implement repeating-key XOR

Here is the opening stanza of an important work of the English language:
Burning 'em, if you ain't quick and nimble
I go crazy when I hear a cymbal
Encrypt it, under the key “ICE”, using repeating-key XOR.
In repeating-key XOR, you’ll sequentially apply each byte of the key; the first byte of plaintext will be XOR’d against I, the next C, the next E, then I again for the 4th byte, and so on.
It should come out to:
0b3637272a2b2e63622c2e69692a23693a2a3c6324202d623d63343c2a26226324272765272
a282b2f20430a652e2c652a3124333a653e2b2027630c692b20283165286326302e27282f
Encrypt a bunch of stuff using your repeating-key XOR function. Encrypt your mail. Encrypt your password file. Your .sig file. Get a feel for it. I promise, we aren’t wasting your time with this.

This challenge uses the xor() function defined in Challenge 2.

def repeated_xor_encrypt(plaintext, key):

    key_length = len(key)

    ciphertext = b''

    for counter, char in enumerate(plaintext):
        current_key_char = key[counter%key_length]

        current_ciphertext_char = challenge_2.xor(char.encode('utf-8'), current_key_char.encode('utf-8'))

        ciphertext = ciphertext + current_ciphertext_char

    return ciphertext

The plaintext and the key are provided to the repeated_xor_encrypt() function for encryption.

The final output is hex encoded to be printable.

INPUT_STRING = "Burning 'em, if you ain't quick and nimble\nI go crazy when I hear a cymbal"

KEY = "ICE"

OUTPUT_STRING = "0b3637272a2b2e63622c2e69692a23693a2a3c6324202d623d63343c2a26226324272765272a282b2f20430a652e2c652a3124333a653e2b2027630c692b20283165286326302e27282f"

ciphertext_string = repeated_xor_encrypt(INPUT_STRING, KEY)
hex_ciphertext = binascii.hexlify(ciphertext_string).decode('ascii')

assert hex_ciphertext == OUTPUT_STRING

The entire source code can be found at in the arunjohnkuruvilla/cryptopals-crypto-challenges repository in the challenges/set_1_challenge_05.py file.

Challenge 6 - Break repeating-key XOR

It is officially on, now.
This challenge isn’t conceptually hard, but it involves actual error-prone coding. The other challenges in this set are there to bring you up to speed. This one is there to qualify you. If you can do this one, you’re probably just fine up to Set 6.
There’s a file here. It’s been base64’d after being encrypted with repeating-key XOR.
Decrypt it.
Here’s how:
Let KEYSIZE be the guessed length of the key; try values from 2 to (say) 40.
Write a function to compute the edit distance/Hamming distance between two strings. The Hamming distance is just the number of differing bits. The distance between:
this is a test
and
wokka wokka!!!
is 37. Make sure your code agrees before you proceed.
For each KEYSIZE, take the first KEYSIZE worth of bytes, and the second KEYSIZE worth of bytes, and find the edit distance between them. Normalize this result by dividing by KEYSIZE.
The KEYSIZE with the smallest normalized edit distance is probably the key. You could proceed perhaps with the smallest 2-3 KEYSIZE values. Or take 4 KEYSIZE blocks instead of 2 and average the distances.
Now that you probably know the KEYSIZE: break the ciphertext into blocks of KEYSIZE length.
Now transpose the blocks: make a block that is the first byte of every block, and a block that is the second byte of every block, and so on.
Solve each block as if it was single-character XOR. You already have code to do this.
For each block, the single-byte XOR key that produces the best looking histogram is the repeating-key XOR key byte for that block. Put them together and you have the key.
This code is going to turn out to be surprisingly useful later on. Breaking repeating-key XOR (“Vigenere”) statistically is obviously an academic exercise, a “Crypto 101” thing. But more people “know how” to break it than can actually break it, and a similar technique breaks something much more important.
No, that’s not a mistake.
We get more tech support questions for this challenge than any of the other ones. We promise, there aren’t any blatant errors in this text. In particular: the “wokka wokka!!!” edit distance really is 37.

First, define a function to compute hamming distance between two strings.

def hamming_distance(str1, str2):
    assert len(str1) == len(str2)

    hamming_distance = 0

    for counter, str1_char in enumerate(str1):
        str1_char_bits = format(str1_char, '08b')
        str2_char_bits = format(str2[counter], '08b')

        for bit_counter, str1_char_bit in enumerate(str1_char_bits):
            if str1_char_bit != str2_char_bits[bit_counter]:
                hamming_distance = hamming_distance + 1

    return hamming_distance

Next step is to identify the key size. The find_key_size() takes 12 chunks of the ciphertext, with each chunk varying in length between 2 and 41, and computes the highest average hamming distance for each chunk size. The function then sorts the sizes in decreasing order of hamming distances and returns a dictionary.

def get_smallest_normalized_edit_distance(string):
    keysize_dict = {}

    for x in range(2, 40):
        blocks = [string[i:i+x] for i in range(0, len(string), x)][0:5]
        pairs = list(itertools.combinations(blocks, 2))
        scores = [hamming_distance(p[0], p[1])/float(x) for p in pairs]

        keysize_dict[x] = sum(scores) / len(scores)

    return sorted(keysize_dict.items(), key=lambda x:x[1])[0][0]

A function to compute the transpose of a string is also defined. This function makes a string out of the first character of every block, a string out the secod charaters of every block, and so on. This function retruns a block size number of strings.

def transpose(text, size):
    output = {}

    for i in xrange(size):
        output[i] = ''

    for count, char in enumerate(text):
        output[count%size] = output[count%size] + char

    return output

Next, we define a function that takes a ciphertext and performs the decryption of the After identifying the key size, the ciphertext is transposed by breaking up the string as per the key size. Then, all keys are looped over and XOR’ed with each block. For each key, frequency scoring string is identified using the detect_xor_key() function. All individually identified key bytes are combined to recreate the key. The ciphertext is then repeating-key XOR’ed with the key to get the plaintext.

def break_repeating_key_xor(ciphertext):
    raw_ciphertext = base64.b64decode(ciphertext)

    potential_keysize = get_smallest_normalized_edit_distance(raw_ciphertext)

    transposed_blocks = get_transposed_blocks(raw_ciphertext, potential_keysize)

    key = ""
    score = 0.0

    for block in transposed_blocks:
        block_key, block_plaintext, block_score = challenge_03.detect_xor_key(block)
        key = key + block_key
        score = score + block_score

    return key

Using the key, decrypt the ciphertext.

ciphertext_file = open("challenges/files/6.txt")
ciphertext_file_contents = ciphertext_file.readlines()
ciphertext = ""

for line in ciphertext_file_contents:
    ciphertext = ciphertext + line.strip()

raw_ciphertext = base64.b64decode(ciphertext)

key = break_repeating_key_xor(ciphertext)

plaintext_bytes = challenge_05.repeated_xor_encrypt(raw_ciphertext.decode('utf-8'), key)

print("Key: " + key)
print("Plaintext: \n" + plaintext_bytes.decode('utf-8'))

The entire source code can be found at in the arunjohnkuruvilla/cryptopals-crypto-challenges repository in the challenges/set_1_challenge_06.py file.

Challenge 7 - AES in ECB mode

The Base64-encoded content in this file has been encrypted via AES-128 in ECB mode under the key
"YELLOW SUBMARINE".
(case-sensitive, without the quotes; exactly 16 characters; I like “YELLOW SUBMARINE” because it’s exactly 16 bytes long, and now you do too).
Decrypt it. You know the key, after all.
Easiest way: use OpenSSL::Cipher and give it AES-128-ECB as the cipher.
Do this with code.
You can obviously decrypt this using the OpenSSL command-line tool, but we’re having you get ECB working in code for a reason. You’ll need it a lot later on, and not just for attacking ECB.

The Advanced Encryption Standard (AES), is a block cipher used for encryption. AES is a symmetric encryption algorithm, and both parties talking must share the same key beforehand. Refer to the Wikipedia article on AES to learn more.

Electronic Code Book (ECB) is the simplest way of applying the AES algorithm on a message. This involves splitting the message into equal sized blocks and applying the algorithm on each block.

For this challenge, the PyCrypto package is used for crypto functionality. As a rule of thumb, steer away from implementing cryptographic algorithms yourself. As much as possible use well documented public cryptographic implementations only.

The decryption function:

from Crypto.Cipher import AES

def aes_ecb_decrypt(plaintext, key):
    obj = AES.new(key, AES.MODE_ECB)
    return obj.encrypt(plaintext)

The data for the file is read and passed on to this function:

import base64
...
KEY = "YELLOW SUBMARINE"

f = open("challenge7/encrypted.txt", "r")
ciphertext = base64.b64decode(f.read())

print aes_ecb_decrypt(ciphertext, KEY)

The entire source code can be found at in the arunjohnkuruvilla/cryptopals-crypto-challenges repository in the challenges/set_1_challenge_07.py file.

Challenge 8 - Detect AES in ECB mode

In this file are a bunch of hex-encoded ciphertexts.
One of them has been encrypted with ECB.
Detect it.
Remember that the problem with ECB is that it is stateless and deterministic; the same 16 byte plaintext block will always produce the same 16 byte ciphertext.

The key to identifying the usage of ECB mode with AES is to look for identical blocks in the ciphertext. With AES being a deterministic algorithm, the same plaintext will produce the same ciphertext.

The has_repeated_blocks() function will split the ciphertext into blocks of a given size, and check if there are any repeated blocks. The presence of a repeated block indicates that the mode used was ECB.

def detect_ecb_encryption(string, block_size = 16):
    blocks = [string[i:i+block_size] for i in range(0, len(string), block_size)]

    blocks_without_duplicates = [*set(blocks)]

    if len(blocks_without_duplicates) < len(blocks):
        return True
    else:
        return False

The file content is passed line by line to the detect_ecb_encryption() function to identify the line with ECB encryption.

ciphertext_file = open("challenges/files/8.txt")

ciphertext_file_contents = ciphertext_file.readlines()

lineNumber = 1

for line in ciphertext_file_contents:
    raw_ciphertext_line = binascii.unhexlify(line.rstrip())

    if detect_ecb_encryption(raw_ciphertext_line):
        break

    lineNumber += 1

print("Line #" + str(lineNumber))

The source code can be found at source/challenge8.py.

The entire source code can be found at in the arunjohnkuruvilla/cryptopals-crypto-challenges repository in the challenges/set_1_challenge_08.py file.

Challenge 1 - Convert hex to base64#

Challenge 2 - Fixed XOR#

Challenge 3 - Single-byte XOR cipher#

Challenge 4 - Detect single-character XOR#

Challenge 5 - Implement repeating-key XOR#

Challenge 6 - Break repeating-key XOR#

Challenge 7 - AES in ECB mode#

Challenge 8 - Detect AES in ECB mode#