Possibility of Overlapping

A few words about the issue of returning the input file as the output in an encryption algorithm...

1. Algorithm as a black box

If we think of any encryption algorithm, irrespective of its content, like a black box, we can accept the encryption process as an injective mapping from the input file to the output file. If we consider that both plain and chiper alphabets are B = {0, 1}, we can write

E_k : Bⁿ ® Bⁿ for our E_k encryption function. n is the word or block length that is determined by the selected algorithm.

The point to be emphasized here is: It’s not important at all that E_k function forms a Fiestel- or a SP-network, or it’s a stream chipher or represents another encryption method. It’s enough that it just doesn’t contain extension, that means the length of the input file is equivalent to the length of the output file. Moreover, a little detailing would suffice to remove this limitation.

Because Bⁿ = Bⁿ equation is an identity, pt; ct ϵ Bⁿ statement is true ̶ pt stands for plaintext (the input file) and ct for ciphertext (the output file).

The accuracy of this statement lays bare the possibility of pt = ct. This is a possibility that lowers when n increases, but it doesn’t become zero as long as n is finite.

2. Possibility of Overlapping

The inversion of input file to output file in an algorithm mustn’t be able to be followed through indexes (at least easily). So, some barriers such as substitution vectors, S-boxes... are placed in the algorithm to stop the index connection at least in the means of arithmetic relations. It might be considered that the chances for the pt = ct equation to become true get even more lower because of these interruptions since they stop the algorithm being a permutation sequence carried on linear index functions, although they never break the determinist relation between pt and ct. In fact, these barriers (S-boxes for example) can be arranged so that the number of 1s input is different from the number of 1s outcome and pt = ct equation becomes impossible. If this could be done, the thesis of that the content of E_k function is not important at all loses its meaning. However, in order to do that, the result of (pt Ä k) operation which takes place in almost every algorithm and is the most effective place the key is used should be known categorically ̶ that means, it must be known as seperate from the plaintext and key which are variables with the algorithmic structure used. It means that encrypting turns into a lookup table operation rather than an encryption. So, this method doesn’t appear to be a solution. (Also, in order the said to be true, there doesn’t have to be a (pt Ä k) operation in the algorithm. Even it’s not the operation, in the algorithm there will always be a step based on the interaction of the plaintext and the key. Or else, the key loses its meaning.)

3. Algorithm Design Factor

It’s also misleading to perceive the problem as an error of the algorithm. Because the problem is not just p_i = c_i - it doesn’t appear only in this way.

That val (p_i) = val(c_i) providing that val(c) is a function reflecting the value of c determines the same result. Such an example can be given for further explanation. Given that the subscripts are w-white, r-red, g-green, b-blue, y-yellow, p-purple, o-orange, the 8-bit long plaintext (0_𝓨1_r1_w0_g0r1_g 0_b0_𝓨) can be converted to (0_𝓨1_g1_r0_r0_b1_w0_w0_g) or (0_𝓨1_g1_o0_r 0_p1_p0_w0_o) by encrypting. (Here the colors are a markup, they shouldn’t be considered as an alphabet expansion.) First output shows a permutation, the second one shows a substitution, and the reason to mention the second output is to underline that the problem doesn’t result from the permutation. Considered the colors, the input and output files are completely different. But in a black and white view, the three files are the same as each other. Therefore, the point in question is not an algorithm error. So, it can't be said that if the algorithm is designed carefully the input file and the output file can never overlap.

4. A Testing Block

The solution can’t be adding a code sequence in the algorithm to measure and block the similarity between the input and output files. Such a solution means adding information to the algorithm about input/output, more or less, or narrowing the pool which the ciphertext will take place in. Even though narrowing the pool means that the place to look for output will be contracted, it’s not important at all, contrary to the popular belief – if n is big enough, a little contraction doesn’t cause a problem. But such a reduction means that the input takes place in the subtracted part, which is the real danger.

During the World War II, in the enchiphering algorithm designed for ENIGMA which was an encryption machine used by the German cryptographs, encrypting a letter to itself was prevented. It is told that this weakness was the reason for ENIGMA encryption to be broken.

5. An Algebraic Approach

We can also present the matter in an algebraic way. Let p be a parameter and define a parametric function f_pwith one variable, on a finite set A onto its own: f_p(c) : A → A. Construct a set R such as

R = { (c; 𝓨) : 𝓨 = f_p(c) and ; c, 𝓨 ϵ A}.

Now the question is if we can show that R is reflexive or not without knowing the content of f_p_.Furthermore, if we can show this for every possible value of _p_. Can we do it? If not (meaning we can't show it is reflexive or not) we can't say anything about the issue of returning the input file as the output in an encryption algorithm_.Moreover, it is not enough to say that R is not reflexive. It should be proved that there can never be an element in R in reflexive form: (e; e). As to writer's knowledge, there is no way of doing this_.

6. A Knowledge or A Belief (or A Hope)

On the other hand, we say that if the algorithm is strong enough and the key is long enough then the encrytion is safe enough even to cipher our top secrets. We say that because it takes billions times billions times billions...milleniums to check out every key ̶ of course, considering the brute force attack only_. It is knowledge that it takes so much time to check out every possible key_. But this is a kind of gambling - Mallory can make the right shot at the first try_. But just like some believe there exists a God and some believe he exists not, we believe that Mallory can't be that lucky_. So it is a belief. We don't know he will be or not, we hope he won't. We hope nobody is that lucky.

If we don't have an independent way to show that an algorithm will not produce a copy of the input file as an output, we just have a belief that it will not do that.

Ömer KÖKSAL

CRYPTTECH - Cyber Security Intelligence

Bu Blogda Ara

CRYPTTECH BLOG

Possibility of Overlapping

Yorumlar

Yorum Gönder

Bu blogdaki popüler yayınlar

1. Geleneksel Stajyer CTF Soru ve Cevapları

2. Geleneksel Stajyer CTF Soru ve Cevapları - 2017

Package Vulnerability Scanning