Конференции ИВТ СО РАН

The distributed information-computational resources, X-th conference DICR-2005

October 06 - 08, 2005, Russia, Novosibirsk, Academgorodok

Abstracts

Efficient transformation of plaintext into random sequence

Fionov A.

Siberian State University of Telecommunications and Informatics (Novosibirsk)

One of the methods for breaking ciphers is the statistical analysis of sequences produced by the cipher. Recently, with the aid of the gradient statistical attack, a principal possibility of breaking block ciphers was demonstrated, provided the input plaintext is non-random, i.e., distinguished from a sequence of equiprobable and independent zeros and ones. Therefore, for constructing reliable information protection systems it is an actual problem of coding to transform non-random plaintext into a completely random sequence. Yet in the classical paper of Shannon [1], it has been shown that even a simple cipher built upon such encoded sequence is unbreakable.

We suggest a fast and efficient method for performing the said transformation of the plaintext based on arithmetic coding with a dummy symbol. Due to the use of the dummy symbol we change the source probability distribution so that a usual arithmetic code becomes non-redundant. This leads to production of a completely random code sequence, i.e., all the bits are equiprobable and independent.

Arithmetic coding with a dummy symbol (ACDS) has the same asymptotic complexity as the earlier known arithmetic coding with interval splitting (also suggested by the author), however, a computer implementation of ACDS occurs several times faster.

The idea of ACDS can be illustrated by the following example. Let the source generate letters over the alphabet A={a,b} with probabilities p(a)=3/5, p(b)=2/5 and let the size of current interval in arithmetic coding be 7 units. We have to distribute the source letters over this interval according to their probabilities. But as we have to operate with integer numbers only we can, to our best, allocate 4 units for the letter a and 3 units for b thus truncating the "right" shares (4.2 for a and 2.8 for b). This leads to coding redundancy and, hence, to imperfect statistical properties of the code sequence. We suggest another way. Select a dummy symbol d with probability 2/7, and the source symbol with probability 5/7. We obtain p(a)=3/7, p(b)=2/7, and p(d)=2/7. Now all three symbols are well distributed on the interval of 7 units which leads to generation of zero-redundant arithmetic code with perfect statistical properties (zeros and ones are equiprobable and independent).

With comparison to optimal letterwise homophonic codes, the described method has much smaller message expansion and random bit consumption. The optimal letterwise codes, for each encoded symbol, add 2 bits higher the entropy to the codeword on average, and use 4 external random bits. The reduction of these characteristics can be achieved through blocking symbols which incurs the exponential growth of the algorithm's complexity. The suggested method achieves an arbitrarily low message expansion in almost linear time. Actually, it requires only a few machine operations for encoding and decoding of each symbol. Alike other arithmetic codes, the new method has the advantage of fast adaptation to changing source statistics, while the letterwise methods must rebuild the code tables each time the statistics change. This property occurs very useful in constructing the so-called universal homophonic code [2].

[1] Shannon C. E. Communication theory of secrecy systems // Bell System Technical J. -- 1949. -- V. 28. -- P. 656--715.

[2] Fionov A. Universal homophonic coding // 2001 IEEE International Symposium on Information Theory (ISIT 2001). -- Washington, DC, USA, June 24-29, 2001. -- P. 116.

Note. Abstracts are published in author's edition

Comments

[Home]
[Conference]