Possibility of Overlapping

A few words about the issue of returning the input file as the output in an encryption algorithm...

1. Algorithm as a black box
If we think of any encryption algorithm, irrespective of its content, like a black box, we can accept the encryption process as an injective mapping from the input file to the output file. If we consider that both plain and chiper alphabets are = {0, 1}, we can write 
Ek : Bn ® Bn for our Ek encryption function. n is the word or block length that is determined by the selected algorithm.
The point to be emphasized here is: It’s not important at all that Ek function forms a Fiestel-  or a SP-network, or it’s a stream chipher or represents another encryption method. It’s enough that it just doesn’t contain extension, that means the length of the input file is equivalent to the length of the output file. Moreover, a little detailing would suffice to remove this limitation.

Because Bn = Bn equation is an identity, ptct ϵ Bn statement is true  ̶  pt stands for plaintext (the input file) and ct for ciphertext (the output file).

The accuracy of this statement lays bare the possibility of pt = ct. This is a possibility that lowers when n increases, but it doesn’t become zero as long as n is finite.

2. Possibility of Overlapping
The inversion of input file to output file in an algorithm mustn’t be able to be followed through indexes (at least easily). So, some barriers such as substitution vectors, S-boxes... are placed in the algorithm to stop the index connection at least in the means of arithmetic relations. It might be considered that the chances for the pt = ct equation to become true get even more lower because of these interruptions since they stop the algorithm being a permutation sequence carried on linear index functions, although they never break the determinist relation between pt and ct. In fact, these barriers (S-boxes for example) can be arranged so that the number of 1s input is different from the number of 1s outcome and pt = ct equation becomes impossible. If this could be done, the thesis of that the content of Ek function is not important at all loses its meaning. However, in order to do that, the result of (pt  Ä k) operation which takes place in almost every algorithm and is the most effective place the key is used should be known categorically  ̶  that means, it must be known as seperate from the plaintext and key which are variables with the algorithmic structure used. It means that encrypting turns into a lookup table operation rather than an encryption. So, this method doesn’t appear to be a solution. (Also, in order the said to be true, there doesn’t have to be a (pt Ä k) operation in the algorithm. Even it’s not the operation, in the algorithm there will always be a step based on the interaction of the plaintext and the key. Or else, the key loses its meaning.)


3. Algorithm Design Factor
It’s also misleading to perceive the problem as an error of the algorithm. Because the problem is not just pi = ci - it doesn’t appear only in this way.

That val (pi) = val(ci) providing that val(c) is a function reflecting the value of c determines the same result. Such an example can be given for further explanation. Given that the subscripts are w-white, r-red, g-green, b-blue, y-yellow, p-purple, o-orange, the 8-bit long plaintext (0𝓨1r1w00r1g 0b0 𝓨) can be converted to (0𝓨1g1r00b1w0w0g) or (0 𝓨1g1o0r 0p1p0w0o) by encrypting. (Here the colors are a markup, they shouldn’t be considered as an alphabet expansion.) First output shows a permutation, the second one shows a substitution, and the reason to mention the second output is to underline that the problem doesn’t result from the permutation. Considered the colors, the input and output files are completely different. But in a black and white view, the three files are the same as each other. Therefore, the point in question is not an algorithm error. So, it can't be said that if the algorithm is designed carefully the input file and the output file can never overlap.

4. A Testing Block
The solution can’t be adding a code sequence in the algorithm to measure and block the similarity between the input and output files. Such a solution means adding information to the algorithm about input/output, more or less, or narrowing the pool which the ciphertext will take place in. Even though narrowing the pool means that the place to look for output will be contracted, it’s not important at all, contrary to the popular belief – if n is big enough, a little contraction doesn’t cause a problem. But such a reduction means that the input takes place in the subtracted part, which is the real danger.

During the World War II, in the enchiphering algorithm designed for ENIGMA which was an encryption machine used by the German cryptographs, encrypting a letter to itself was prevented. It is told that this weakness was the reason for ENIGMA encryption to be broken.

5. An Algebraic Approach
We can also present the matter in an algebraic way. Let p be a parameter and define a parametric function fwith one variable, on a finite set A onto its own: f(c) : → A. Construct a set R such as

R = { (c𝓨) : 𝓨  = fp(c) and ;  c𝓨 ϵ A}.

Now the question is if we can show that  is reflexive or not without knowing the content of fpFurthermore, if we can show this for every possible value of p. Can we do it? If not (meaning we can't show it is reflexive or not) we can't say anything about the issue of returning the input file as the output in an encryption algorithmMoreover, it is not enough to say that R is not reflexive. It should be proved that there can never be an element in R in reflexive form: (e; e). As to writer's knowledge, there is no way of doing this.

6. A Knowledge or A Belief (or A Hope)
On the other hand, we say that if the algorithm is strong enough and the key is long enough then the encrytion is safe enough even to cipher our top secrets. We say that because it takes billions times billions times billions... milleniums to check out every key  ̶  of course, considering the brute force attack only. It is  knowledge that it takes so much time to check out every possible key. But this is a kind of gambling - Mallory can make the right shot at the first try. But just like some believe there exists a God and some believe he exists not, we believe that Mallory can't be that lucky. So it is a belief.  We don't know he will be or not, we hope he won't. We hope nobody is that lucky.

If we don't have an independent way to show that an algorithm will not produce a copy of the input file as an output, we just have a belief that it will not do that.

Ömer KÖKSAL
CRYPTTECH - Cyber Security Intelligence



Yorumlar

Bu blogdaki popüler yayınlar

1. Geleneksel Stajyer CTF Soru ve Cevapları

B*-Tree (BTree, BPlusTree) Veri Yapısı ile Veri İndeksleme

2. Geleneksel Stajyer CTF Soru ve Cevapları - 2017