Cryptography : is it me ??? [Archive]

fullmetalcoder

24th February 2007, 10:28

Hi,

I am currently writing some kind of paper on algorithms for school and I thought about cryptography to illustrate my work with examples. However when googling around I only found very complicated and time-consuming things without much explanations on the principle : WEP, WPA, Blowfish...

So I took my pencil, unleashed the power of my mind :cool:
and I came to the following code which as you can see is a rather lightweight, though very powerful, XOR crypto :

// this function is called encrypt but is actually a "state-switcher"...
void encrypt(QByteArray& data, const QByteArray& pass)
{
for ( int i = 0; i < data.count(); i++ )
data[i] ^= pass.at(i % pass.count());
}

My question is : Why isn't such a method used everywhere ??? It has many advantages and, provided a shared key is affordable, no real drawbacks AFAIK...

Just to convince you here are some figures :

A WEP passphrase is something like 10 ASCII chars long, which, you shall admit, corresponds to 10*8 = 80 bits
Using a passphrase of this size (which is rather reasonable) in my homemade encryption function gives 2^80 = more than 1.2*10^24 possible combinations
Assuming one attempts a "brute-force" cracking of a 100 characters long text message encoded with the aforementioned key he will have to try decoding the message with each of these combinations and then check if it is likely to be a message...
Assuming 100 T-states for each pass of the encoding/decoding loop we have 100*100 = 10.000 T-states for each encoding/decoding.
With luck the brute-force attack can find the correct key after 2^40 attempts (sounds a lot ? try it and you'll see that it would be quite a luck indeed...:p). That would take him : 10.000 * 2^40 = 1.1*10^16 T-states
You're not used to measure time with T-states are you? So let's convert that to common units. Assuming the cracker uses a 3GHz processor fully dedicated to the task (which never happens but we'll discard hardware questions here...), the attack will last : 1.1*10^16 / 3*10^9 = 3.665.039 seconds = 1018 hours = 42 days and an half : about a month and an half!!! For a simple text message encoded using a WEP-like key!!!However I'm not a specialist of cryptography and may have missed something very important. Thus I'd appreciate if someone could answer my question : Why isn't it used everywhere???

e8johan

24th February 2007, 10:40

One reason not to use this method is that the same key is used for both decryption and encryption. Some modern encryption methods works with one public key used for encryption and a different, private, key for decryption. This means that the encryption key can be spread freely and the the decryption can be protected.

fullmetalcoder

24th February 2007, 11:07

I know that but as I stated this method is intended to be used only when the use of two different keys (a public and a private one) is not necessary. Wifi networks are encrypted without RSA-like cryptography for instance and this whatever the method used (WEP, WPA or anything else...) : there is one key know by the accesspoints and all the computers connecting to it...

wysota

24th February 2007, 14:03

My question is : Why isn't such a method used everywhere ???
Because it's easy to break :)

In case the password is shorter than the input data, you're reusing the same password (this is called Electronic Code Book (ECB) mode), which is the same as if you encrypted two different messages with the same key - it's easy to break. Russian ciphers were compromised because of reusing the same key for multiple messages.

It has many advantages and, provided a shared key is affordable, no real drawbacks AFAIK...
To be honest it's pretty easy to break :) Furthermore it only encrypts the message but you can still change the contents, etc.

Furthermore, try encrypting a message containing only zeros with your method and see what you get as output :)

A WEP passphrase is something like 10 ASCII chars long, which, you shall admit, corresponds to 10*8 = 80 bits

DES has 56 bit keys and it's considered quite safe (not counting the brute force attack, of course). But it's because the way the key is used.

BTW. WEP is not safe because of its keylength. And the RC4 cipher used for Wi-Fi along WEP is not that good either. Usage of WPA with AES is advised.

Using a passphrase of this size (which is rather reasonable) in my homemade encryption function gives 2^80 = more than 1.2*10^24 possible combinations
This is nothing. You can make a brute force attack on it using parallel computing (http://distributed.net) - you'd have it broken within days.

Assuming one attempts a "brute-force" cracking of a 100 characters long text message encoded with the aforementioned key he will have to try decoding the message with each of these combinations and then check if it is likely to be a message...
Not quite. You can limit the number of possibilities by making comparisons knowing the key length.

With luck the brute-force attack can find the correct key after 2^40 attempts (sounds a lot ? try it and you'll see that it would be quite a luck indeed...:p). That would take him : 10.000 * 2^40 = 1.1*10^16 T-states
Even for 2^80 combinations (provided you encrypt messages shorter than the key length) with computers that can do about 2^31 operations per second (for single core 2GHz processors) you only need ~2^50 second with a single processor or 2^40 seconds for 1000 computers, etc. It's not that much.

1.1*10^16 / 3*10^9 = 3.665.039 seconds = 1018 hours = 42 days and an half : about a month and an half!!!
You can break a static key WEP protected network after listening for the traffic for ~4 hours and some time to process the sniffed data.

You can strenghten the cipher by using some feedback mode with your method for messages longer than the key (http://en.wikipedia.org/wiki/Block_cipher_modes_of_operation).

But your cipher is still very weak for shorter messages using a known plaintext attack (you know the plain text, you know the cipher text, you try to find the key).

One reason not to use this method is that the same key is used for both decryption and encryption. Some modern encryption methods works with one public key used for encryption and a different, private, key for decryption. This means that the encryption key can be spread freely and the the decryption can be protected.

Symmetric ciphers are used everywhere, including SSL. Assymetric systems have a wider range of use than symmetric ones and they are safer, because they require more time consuming operations than symmetric ones (the keys are much longer) and additionally you can't implement them in hardware efficiently. For example you can assure that the message is not changed, you can be sure that the message was not forged by someone else, etc.

jacek

24th February 2007, 23:42

Because it's easy to break :)
Precisely, but how easy? Let's see:

Suppose we have a XOR-encrypted file out.txt. It's a paragraph from a book, not that it's my favorite one, but it's quite long and available at Guthenberg project's site, so it's useful when you want to play with some text-handling algorithms.

First let's compute the number of coincidences for every possible key length. How to do that? We just have to compare all bytes that were encrypted using the same character and see how many of them are equal.

I wrote a small script that automates this:

$ ./coin.py < out.txt > coin.txt

If we plot the contents of coin.txt file we will get something like this:
935
Clearly the number of coincidences is significantly larger if we assume that the key length is 8, 16, 24, 32 and so on, so most likely 8-byte key was used.

Now if we divide the cryptogram into 8 groups, so that each group contains bytes that were encrypted using the same character, and find the most frequent byte in each group we will get this:

$ ./hist.py 8 < out.txt
0: best=77 count=27
1: best=79 count=22
2: best=66 count=25
3: best=89 count=22
4: best=68 count=29
5: best=12 count=22
6: best=67 count=23
7: best=75 count=23
Possible password: ********

Now a small riddle: what is the most frequent character in texts?

Finding the password and decrypting the text is left as an exercise for the reader. ;)

Now if you were using XOR to encrypt packets, the job would be event simplier, because you can easily predict values of certain fields, like version or protocol number.

wysota

25th February 2007, 00:28

Precisely, but how easy?
With a known plaintext (like in case of network protocols) the complexity of finding the password is O(1). Just xor the ciphertext with the plaintext. No need to determine key length or anything.

It's a paragraph from a book, not that it's my favorite one, but it's quite long and available at Guthenberg project's site, so it's useful when you want to play with some text-handling algorithms.
A wild guess - Moby Dick? :)

Now a small riddle: what is the most frequent character in texts?
\x20? :)

Now if you were using XOR to encrypt packets, the job would be event simplier, because you can easily predict values of certain fields, like version or protocol number.

You can go even further here. Some fields have exactly specified values (mostly zeros) and there is also padding involved (most often a one and zeros again). Network addresses can also be used (broadcast, source and destination), but zeros are certainly the best ;)
Some Operating-Systems-With-Faulty-TCP/IP-Implementations (C) make the life harder here as they tend to ignore some rules and standards (for instance not zeroing fields that should be set to zero as they assume nobody will read them anyway, so why bother? Let's leave the trash that was in that block of memory (maybe some password, credit card number or email address?) in the packet).

Brandybuck

25th February 2007, 00:45

If you want a good example of an encryption algorithm, take a look at IDEA. It's not at all trivial, but on the other hand, it's fairly easy to understand (the "how" if not the "why"). While it is probably under patent, you're using it in a classroom, so it shouldn't be a problem. There was an article on it in an early 1990's DDJ that was very good. MD5 is also another fairly easy cryptography algorithm.

fullmetalcoder

25th February 2007, 14:17

Because it's easy to break

In case the password is shorter than the input data, you're reusing the same password (this is called Electronic Code Book (ECB) mode), which is the same as if you encrypted two different messages with the same key - it's easy to break. Russian ciphers were compromised because of reusing the same key for multiple messages.

To be honest it's pretty easy to break Furthermore it only encrypts the message but you can still change the contents, etc.

That easy??? My understanding was that the only way to break the code was a brute-force attack which would be very long... Jacek and you keep talking about "known plain-text attack" but I don't really see what you mean here... I view cryptography as a way to hide data using a key and not to hide a key in know data...:p

Furthermore, try encrypting a message containing only zeros with your method and see what you get as output

I'm sorry but I don't see any point in :
sending such a message
encrypting it if soBesides what tells you, when you want to decrypt the message in question that is was only zeros ??? it could be repeated characters but nothing helps you in guessing what they were before encryption...

Not quite. You can limit the number of possibilities by making comparisons knowing the key length.

But your cipher is still very weak for shorter messages using a known plaintext attack (you know the plain text, you know the cipher text, you try to find the key).

My point here is that, assuming I send a message (plain text or not) through network, encrypted with my XOR method, if one intercepts it all he will know will be the encrypted message and he will try to find the original data without knowing the key (and not even its length). In this case the decyphering will be quite complicated won't it? Especially because it needs, at each step an accurate check to determine wether the key used led to clear-looking data which is likely to be the original data. This can get impossible if the encryption uses a second transformation (based on the key and possibly on another variable) before or after the XOR... Then, and only then, the key is known and later messages will be decrypted instantly but before that...

Just to check wether you're right about the weakness of it, here is a small file (http://edyuk.tuxfamily.org/misc/test.xor) (and redundant to make your task easier :p) that I encrypted with a buffered, plain-Qt, version of the algorithm above... Encrypting took 79ms according to QTime (including statistics display and file I/O) which means an encrypting rate of about 15Mb/s... Your mission, if you accept it, is to decrypt this file before next Sunday and to tell me which key I used... :) If you managed to do that, you'd get... my eternal respect! :D

Symmetric ciphers are used everywhere, including SSL. Assymetric systems have a wider range of use than symmetric ones and they are safer, because they require more time consuming operations than symmetric ones (the keys are much longer) and additionally you can't implement them in hardware efficiently. For example you can assure that the message is not changed, you can be sure that the message was not forged by someone else, etc.
Well, asymmetric cyphers are quite powerful ATM but it might not remain true forever... Quantum mechanics is being studied so as to craft a quantic computer that would be able to do integer factorization *VERY FAST* directly from hardware and as RSA and other asymmetric cyphers depend on factorization of (big) prime numbers they are doomed to death (even if it won't happen tomorrow).

wysota

25th February 2007, 15:32

That easy???
Yep...

My understanding was that the only way to break the code was a brute-force attack which would be very long...
Nope.

Jacek and you keep talking about "known plain-text attack" but I don't really see what you mean here... I view cryptography as a way to hide data using a key and not to hide a key in know data...

Yes, but sometimes you just know what is being encrypted. For example if you know the message is probably about a "nuclear weapon", you can try matching the ciphertext to the plaintext and based on that get the key by xoring the ciphertext with the plaintext. Chances are the key is a phrase, which you can then guess by having a part of it.

If you intercept a transmission between a man and a woman having an affair and the transmission takes place on the Valentine's Day, you can try to match the word "love", "miss", the names or even the phrase "I love you", "I miss you", etc.

I'm sorry but I don't see any point in :
sending such a message
encrypting it if so

This is where you're wrong. Files often contain blocks of zeros, so do network packets. And the "zero" case is just an example - you can do it with any known plaintext, the result will just not be that impressive (by xoring the key with zeroes you reveal the key yourself).

If you transmit images or other kinds of files, you can look for file format headers or other known signatures. This is the exact same way antivirus software uses to detect polymorphic viri.

Besides what tells you, when you want to decrypt the message in question that is was only zeros ???
If I see a text which makes sense, I can guess it is a result of xoring a key with zeroes. Besides, I can:
a) forge and plant a message that would get encrypted by you and then intercept the ciphertext and decrypt it by xoring it with the plain text.
b) try my luck and assume there is a block of zeroes somewhere as padding, header or whatever.

it could be repeated characters but nothing helps you in guessing what they were before encryption...
You can always do statistics attack just like Jacek described - knowing the key length you know that each nth byte will be encrypted with a monoalphabet cipher.

My point here is that, assuming I send a message (plain text or not) through network, encrypted with my XOR method, if one intercepts it all he will know will be the encrypted message and he will try to find the original data without knowing the key (and not even its length). In this case the decyphering will be quite complicated won't it?
No. One can safely assume the key length (in bits) is divisable by eight. Then you have three ways - statistics, guessing the plaintext (for example network addresses, reserved fields, padding, etc.) knowing the transmission protocol or trial and error. Each of these will vastly decrease the number of tries needed to break the code.

Especially because it needs, at each step an accurate check to determine wether the key used led to clear-looking data which is likely to be the original data. This can get impossible if the encryption uses a second transformation (based on the key and possibly on another variable) before or after the XOR...
You proposed an algorithm that doesn't do anything like that. We're pointing weaknesses of the method you suggested, not any method that uses xor as one of its blocks. There are successfull ciphers that use xor to scramble data, but xor can't be the main safeguard used.

Then, and only then, the key is known and later messages will be decrypted instantly but before that...
As long as there exists a way to convert the cipher to a set of monoalphabetic ciphers, any text may be deciphered provided it is long enough.

Just to check wether you're right about the weakness of it, here is a small file (http://edyuk.tuxfamily.org/misc/test.xor) (and redundant to make your task easier :p) that I encrypted with a buffered, plain-Qt, version of the algorithm above... Encrypting took 79ms according to QTime (including statistics display and file I/O) which means an encrypting rate of about 15Mb/s... Your mission, if you accept it, is to decrypt this file before next Sunday and to tell me which key I used... :) If you managed to do that, you'd get... my eternal respect! :D

As we already said, you need an insight of what the message contains to decipher it. Most often, if you try to break a cipher, you know what you are looking for - at least you know the language used to write the message.

Well, asymmetric cyphers are quite powerful ATM but it might not remain true forever...
Assymetric ciphers are very simple. Their strength is based on difficulty to perform some mathematic operations, like factorising large numbers or finding a discrete logarithm of large numbers. There is nothing "powerful" about it.

as RSA and other asymmetric cyphers depend on factorization of (big) prime numbers
You can't factorise a prime number :) A prime number, by definition can't be factorised because it contains only two factors - 1 and itself.

they are doomed to death (even if it won't happen tomorrow).

Yes, but when computers become fast enough (or someone finds a fast algorithm) to calculate factors of large numbers in short time, messages encrypted will long have become outdated. You see, you rarely want to protect some data forever - you only need to protect it for a limited period of time - be it 20 minutes or 200 years. You don't have to use strong ciphers to protect your SSH sessions, because the session key is changed every 20 minutes or so, so if it takes someone 21 minutes to break the cipher, it will already be too late, as the connection will be using a different key then. And remember that the main goal of SSH connections is not to hide the content of the transmission, but to prevent one from hijacking the session by performing a man in the middle attack. SSH was designed as a method of securing terminal access, not to plan terroristic attacks.

fullmetalcoder

25th February 2007, 15:59

As we already said, you need an insight of what the message contains to decipher it. Most often, if you try to break a cipher, you know what you are looking for - at least you know the language used to write the message.

Well, that's not always the case but I'll help you out : the language used in the sample encrypted file is C++ (with english comments) and many redundancy. Is that enough as an inside???

Assymetric ciphers are very simple. Their strength is based on difficulty to perform some mathematic operations, like factorising large numbers or finding a discrete logarithm of large numbers. There is nothing "powerful" about it.

There is something very powerful in them because they force the potential hacker to do many complicated computations or to pass away sobbing...:)

You can't factorise a prime number :) A prime number, by definition can't be factorised because it contains only two factors - 1 and itself.

Well, that's not what I meant... English is not my native language and hurry often causes nonsense...:o AFAIK RSA uses (big) prime numbers to encrypt data and a brute force attack would need factorizing encrypted data (into a product of prime numbers because, as everyone know, each integer can be described as a unique product of prime numbers...).

jacek

25th February 2007, 17:18

My understanding was that the only way to break the code was a brute-force attack which would be very long...
There are three conditions which must be met: the key must be random,
the key must be used only once,
the key must have the same length as the message
If all of them are met, this encryption scheme is unbreakable (as proved by Shannon), otherwise it's unsecure.

Jacek and you keep talking about "known plain-text attack" but I don't really see what you mean here... I view cryptography as a way to hide data using a key and not to hide a key in know data...
I know it sounds weird, but suppose that you send several messages that were encrypted using the same key and one of them was compromised. The attacker has a plain text and cipher text of one of the messages and he can perform a know plain-text attack on the cipher to retrieve the key and decipher the rest of messages.

Such attack is also possible if you can guess the contents of plain-text (or part of it). In case of XOR-cipher, each correct guess gives you one character of the key.

Get a copy of Schneier's Applied Crypthography or try this: http://www.cacr.math.uwaterloo.ca/hac/ (just beware, that this one requires a lot of courage as it has been written by mathematicians for mathematicians).

jacek

25th February 2007, 17:52

Your mission, if you accept it, is to decrypt this file before next Sunday and to tell me which key I used...
Next Sunday? Oh... sorry, I haven't noticed "next". :P

The decrypted file is quite long, so I won't attach it, but it looks like this:

/************************************************** **************************
** Resource object code
**
** Created: Fri Feb 23 22:00:14 2007
** by: The Resource Compiler for Qt version 4.2.2
**
** WARNING! All changes made in this file will be lost!
************************************************** ***************************/

#include <QtCore/qglobal.h>
...
Q_CONSTRUCTOR_FUNCTION(qInitResources_Edyuk)
int qCleanupResources_Edyuk()
{
extern bool qUnregisterResourceData(int, const unsigned char *, const unsigned char *, const unsigned char *);
qUnregisterResourceData(0x01, qt_resource_struct, qt_resource_name, qt_resource_data);
return 1;
}
Q_DESTRUCTOR_FUNCTION(qCleanupResources_Edyuk)
The key is: ?10w?weak?

fullmetalcoder

25th February 2007, 17:53

There are three conditions which must be met:
the key must be random,
the key must be used only once,
the key must have the same length as the messageIf all of them are met, this encryption scheme is unbreakable (as proved by Shannon), otherwise it's unsecure.

The first two conditions can be met with proper software. It only needs a random key generator owned by both sender and receiver and initialized with common data so that the sequence of random key is the same under both computers. I guess this can be done but then, then initial data with which the random key generator has been set up becomes the real key...

About the third condition : a key with the exact same length as the message sounds quite weird and even though Shannon used this assumption to prove th unbreakability I'm pretty sure that as long as the key is not fully-repeated (i.e. key.size() > message.size() / 2) the unbreakability remain or at least we're very close to it....

Besides, if the two first conditions are met, I don't think a mismatch of the third on make the encryption so weak. And, as wysota pointed out, the encrypted data doesn't need to be encrypted forever...

fullmetalcoder

25th February 2007, 17:55

Next Sunday? Oh... sorry, I haven't noticed "next". :P

The decrypted file is quite long, so I won't attach it, but it looks like this:

/************************************************** **************************
** Resource object code
**
** Created: Fri Feb 23 22:00:14 2007
** by: The Resource Compiler for Qt version 4.2.2
**
** WARNING! All changes made in this file will be lost!
************************************************** ***************************/

#include <QtCore/qglobal.h>
...
Q_CONSTRUCTOR_FUNCTION(qInitResources_Edyuk)
int qCleanupResources_Edyuk()
{
extern bool qUnregisterResourceData(int, const unsigned char *, const unsigned char *, const unsigned char *);
qUnregisterResourceData(0x01, qt_resource_struct, qt_resource_name, qt_resource_data);
return 1;
}
Q_DESTRUCTOR_FUNCTION(qCleanupResources_Edyuk)The key is: ?10w?weak?
I must admit that I'm quite impressed.... :eek: Just for my information, how much time did it take to decrypt it???

jacek

25th February 2007, 18:05

The first two conditions can be met with proper software. It only needs a random key generator owned by both sender and receiver and initialized with common data so that the sequence of random key is the same under both computers.
That's how stream ciphers work.

I must admit that I'm quite impressed.... Just for my information, how much time did it take to decrypt it???
I don't know exactly, but judging from the time of my posts it didn't take more than 30 minutes.

fullmetalcoder

25th February 2007, 18:12

That's how stream ciphers work.
:confused:...

Maybe you could be a little more precise... Any link?

wysota

25th February 2007, 18:14

The first two conditions can be met with proper software. It only needs a random key generator owned by both sender and receiver and initialized with common data so that the sequence of random key is the same under both computers.
You see... there is no software that can generate real random numbers. As for using the key once - if you use ECB mode, you violate that rule on the very beginning.

About the third condition : a key with the exact same length as the message sounds quite weird and even though Shannon used this assumption to prove th unbreakability I'm pretty sure that as long as the key is not fully-repeated (i.e. key.size() > message.size() / 2) the unbreakability remain or at least we're very close to it....
If at least part of the key repeats, it means more than one message is encrypted with the same key, which may allow you to extract part of the original message, which might lead to compromising the whole message (often one only needs some part of the message).

Besides, if the two first conditions are met, I don't think a mismatch of the third on make the encryption so weak.
Actually, it's the most important of the three... The first two only make sure you end up with a good key (with an additional assumption that the cipher doesn't have weak keys).

And, as wysota pointed out, the encrypted data doesn't need to be encrypted forever...

That doesn't make a cipher strong. It only means the message is (or is not) protect "well enough for a particular need".

Next Sunday? Oh... sorry, I haven't noticed "next". :P
I was wondering why it takes you so long to answer the thread ;)

The key is: ?10w?weak?

Seemed obvious the key would be 80 bit long :) "Just in case"...

Just to theorise - I don't know exactly how Jacek broke the key, but I can make a guess at possible ways of dealing with the situation.

Knowing the key is a multiple of 8bits, one had to checks ten possible lengths. Now we know that statistically the character "e" is the most common in english language, so it is most likely to receive collisions on it. Unfortunately use of C++ may change those statistics, so we have two possible ways of overcoming this.
1. gather own statistics of C++ code by counting the number of occurences of each character in a large C++ code (like Qt sources, although it lacks comments).
2. try to guess most popular characters - I'd say they are " " (space), ":" (colon), ";" (semicolon) and "{","}" (braces).

The longer the text, the more collisions one may expect. Furthermore if the message is a C++ sourcecode, it most probably begins with a hash (#) - like in "#ifndef ..." or with a slash (/) - marking the beginnning of a comment. This gives additional insights, allowing one to try to guess the first character of the key right away. Furthermore if the following characters are asterisks (in case of the comment), then we might guess the beginning of the message is "/********************". That gives us more than the key length, so xoring the beginning of the ciphertext with assumed plaintext we receive the key. The rest is just about verifying the key by decrypting the rest of the message.

So as you see, you might have compromised the key yourself here. Again, I don't know if that was the path Jacek took to decode the message, but it's a possible strategy.

Another approach would be to try to find collisions at each 10th character of the data - that's where the amount of text for analysis is vital.

jacek

25th February 2007, 18:31

Maybe you could be a little more precise... Any link?
http://en.wikipedia.org/wiki/Stream_cipher

Also check those books I've mentioned earlier.

fullmetalcoder

25th February 2007, 19:07

If at least part of the key repeats, it means more than one message is encrypted with the same key, which may allow you to extract part of the original message, which might lead to compromising the whole message (often one only needs some part of the message).

Actually, it's the most important of the three... The first two only make sure you end up with a good key (with an additional assumption that the cipher doesn't have weak keys).

OK. Does this mean that extending the key to make it as long as the message would do? I mean reusing the same key but shifted or semi-randomly bitmasked or anything else but in a way that's know to both encryption and decryption algorithm?

Seemed obvious the key would be 80 bit long :) "Just in case"...
I don't quite get you... :confused:

Just to theorise - I don't know exactly how Jacek broke the key, but I can make a guess at possible ways of dealing with the situation.

Knowing the key is a multiple of 8bits, one had to checks ten possible lengths. Now we know that statistically the character "e" is the most common in english language, so it is most likely to receive collisions on it. Unfortunately use of C++ may change those statistics, so we have two possible ways of overcoming this.
1. gather own statistics of C++ code by counting the number of occurences of each character in a large C++ code (like Qt sources, although it lacks comments).
2. try to guess most popular characters - I'd say they are " " (space), ":" (colon), ";" (semicolon) and "{","}" (braces).

The longer the text, the more collisions one may expect. Furthermore if the message is a C++ sourcecode, it most probably begins with a hash (#) - like in "#ifndef ..." or with a slash (/) - marking the beginnning of a comment. This gives additional insights, allowing one to try to guess the first character of the key right away. Furthermore if the following characters are asterisks (in case of the comment), then we might guess the beginning of the message is "/********************". That gives us more than the key length, so xoring the beginning of the ciphertext with assumed plaintext we receive the key. The rest is just about verifying the key by decrypting the rest of the message.

So as you see, you might have compromised the key yourself here. Again, I don't know if that was the path Jacek took to decode the message, but it's a possible strategy.

Another approach would be to try to find collisions at each 10th character of the data - that's where the amount of text for analysis is vital.
I know realize that the data itself was not very suited for XOR encoding... :o
Honestly Jacek, would you have done it so fast without knowing that it was C++ code???

jacek

25th February 2007, 19:20

I mean reusing the same key but shifted or semi-randomly bitmasked or anything else but in a way that's know to both encryption and decryption algorithm?
It could be enough.

I know realize that the data itself was not very suited for XOR encoding...
The problem is not with data, but with the method.

Honestly Jacek, would you have done it so fast without knowing that it was C++ code???
No.

fullmetalcoder

25th February 2007, 19:21

It could be enough.
I'll try this out then... :)

The problem is not with data, but with the method.

No.
Sounds like you're being a bit contradictory... :p

jacek

25th February 2007, 19:27

Sounds like you're being a bit contradictory... :p
Good cipher works on any data and it can withstand any kind of attack, including the known-plaintext one.

Brandybuck

25th February 2007, 19:36

The cardinal rule of inventing new cryptographic algorithms is: Never "roll your own".

wysota

25th February 2007, 20:20

OK. Does this mean that extending the key to make it as long as the message would do? I mean reusing the same key but shifted or semi-randomly bitmasked or anything else but in a way that's know to both encryption and decryption algorithm?

If the message was long enough, that wouldn't have helped much. Knowing the pattern of masking bits or shifting one could still be able to find collisions, i.e. if you shift, you'll eventually get the original key again and it can happer faster when you use weak keys. For example if you rotated the key one bit to the right in each iteration, a weak key could be 1111 0000 1111 0000 - it repeats every half of its length (0101 0101 0101 0101 is even weaker).

So to be safe you need two things (at once):
- a good key
- a good cipher algorithm

wysota

25th February 2007, 20:27

@Fullmetacoder: Try breaking this one, it should take no longer than a few minutes to do it even if I don't tell you what kind of cipher it uses. The message is an abstract from this site.

NZ ZPRZXI VRRAMXVIMDOH VOY XDCRDOZOIH IVTMOU RVGI MO ILZ XDOIZHI ID LVKZ HDCZ FDGC DF XDCRAZIZOZHH - OD ADDHZ ZOYH, HIJWH VOY ODI NDGTMOU CZILDYH. MF ILZ FMOVA XDCRDOZOI MH AMCMIZY XDCRVGZY ID NLVI LVH WZZO YZXAVGZY MO ILZ RGDEZXI, ZPIGV XDYZ MGGZAZKVOI ID ILZ FJOXIMDOVAMIQ NLMXL MH RGDKMYZY MO ILZ FMOVA HDAJIMDO LVH ID WZ GZCDKZY.

ILZ XDYZ IVTMOU RVGI MO ILZ XDCRZIMIMDO HLDJAY WZ NZAA YZHMUOZY. QDJ HLDJAY RVQ VIIZOIMDO ID XAZVO MOIZGFVXZH, XDYZ GZJHZ, UZOZGMX HDAJIMDOH, YZHMUO RVIIZGOH VOY VAA DILZG ILMOUH ILVI CVTZ QDJG RGDEZXI AZHH ZGGDG-RGDOZ VOY ZVHMZG ID CVMOIVMO.

NZ ZPRZXI ILZ VRRAMXVIMDO DG XDCRDOZOI ID WZ WVHZY DO BI IZXLODADUQ. AMCMIMOU ZPIZGOVA AMWGVGMZH VOY VRRAMXVIMDOH OZZYZY ID JHZ ILZ RGDUGVC MH VYKMHZY, NLZGZ VRRAMXVWAZ (M.Z. NZ YDO'I ZPRZXI VO MYZ ID WZ V FJAAQ HZAF-XDOIVMOZY VRRAMXVIMDO, WJI NZ ZPRZXI V IZPI ZYMIDG ID YMHRAVQ IZPI JHMOU BI XAVHHZH VOY ODI ILMGY RVGIQ XDCRDOZOIH).

fullmetalcoder

26th February 2007, 11:39

If the message was long enough, that wouldn't have helped much. Knowing the pattern of masking bits or shifting one could still be able to find collisions, i.e. if you shift, you'll eventually get the original key again and it can happer faster when you use weak keys. For example if you rotated the key one bit to the right in each iteration, a weak key could be 1111 0000 1111 0000 - it repeats every half of its length (0101 0101 0101 0101 is even weaker).

When I mentioned predefined extension of the key to make it virtually as long as the message I had no precise idea but I'm now thinking about this concept : "the message is the key". My first approach is to keep the XOR block encryption but to avoid repeating the key by using the message as a key. The first block is encrypted with the key and then used as key to encrypt the next block and so on... This is not of course unbreakable but the first block can't be decrypted in any other way than brute-force search which makes the algorithm much stronger than the first one I proposed.

void encrypt(QByteArray& data, QByteArray& pass)
{
int i = 0, e = data.size(), x = 0, m = pass.count();
char c, *d = data.data(), *p = pass.data(), *b = new char [m];

while ( i < e )
{
b[x] = d[i];
d[i] ^= p[x];

if ( !(x = (++i % m)) )
qSwap(p, b);
}
}

void decrypt(QByteArray& data, QByteArray& pass)
{
int i = 0, e = data.size(), x = 0, m = pass.count();
char c, *d = data.data(), *p = pass.data(), *b = new char [m];

while ( i < e )
{
d[i] ^= p[x];
b[x] = d[i];

if ( !(x = (++i % m)) )
qSwap(p, b);
}
}

Further enhancement of this code could be :

Compressing the data on-the-fly to reduce redundancy without increasing complexity (a simple RLE would be enough I think)
Using tricks to make encryption a little more random. For instance XOR'ing with 0xff (one's complement) on some special occasions (data byte is even, or odd, or multiple of 3 or whatever...)

wysota

26th February 2007, 13:18

My first approach is to keep the XOR block encryption but to avoid repeating the key by using the message as a key. The first block is encrypted with the key and then used as key to encrypt the next block and so on...
Correct me if I'm wrong, but if the first ciphertext block is the key to the next, then all I have to do to break it is to use the previous block of ciphertext as the key for the next one. I won't be able to break the first block, but surely the rest will be deciphered in no time.

fullmetalcoder

26th February 2007, 14:57

Correct me if I'm wrong, but if the first ciphertext block is the key to the next, then all I have to do to break it is to use the previous block of ciphertext as the key for the next one. I won't be able to break the first block, but surely the rest will be deciphered in no time.

block1 ^ key => cypherblock1
block2 ^ block1 => cypherblock2
...

if, as you suggest, you do :
cypherblock2 ^ cypherblock1
<=> (block1 ^ key) ^ (block2 ^ block1)
<=> block2 ^ key

I don't see what is decrypted by doing so but maybe I missed something...:)

wysota

26th February 2007, 15:07

I thought you meant:
block1 ^ key => ciphertext1
block2 ^ ciphertext1 => ciphertext2

BTW. Your cipher is still weak. If you try to encrypt the sourcefile mentioned earlier, you'll notice that you expose some of the message (that in turn exposes the remaining part of the message, as the message itself is used as a key).

Known plaintext attack will expose the key immediately.

fullmetalcoder

26th February 2007, 15:25

BTW. Your cipher is still weak. If you try to encrypt the sourcefile mentioned earlier, you'll notice that you expose some of the message (that in turn exposes the remaining part of the message, as the message itself is used as a key).

That's why :

The key is used only once and a new key is generated by the compression algorithm (though not in the code I posted...)
It is highly recommended to use compression to prevent such exposition of the message that might endanger the key.Also note that if you didn't have any hint about the content of the previous file the second remark would not be that crucial. To determine whether it is strong enough for my purpose (if I have to talk about cryptography in my paper it's better to present a decent algorithm...;) ) I'll put a new encrypted file on the same FTP ASAP so that you guys can crack it and tell me how hard it is...:)

Known plaintext attack will expose the key immediately.
I don't claim this method is perfect. As far as I'm concerned the risk of a plain text attack is about null because my data is randomly formatted (can be plain text, code, compressed odt, pictures, ...) and it is highly improbable that people wanting to decrypt it have more than a general idea of what it contains...

wysota

26th February 2007, 15:48

fullmetalcoder

26th February 2007, 15:57

Try this one:

uint keygen(uint prevkey){
int sum = 0;
uint t = prevkey;
for(int i=0;i<31;i++){
if(t & 0x1) sum++;
t >>= 1;
}
prevkey >>= 1;
prevkey |= (sum & 0x1) << 31;
return prevkey;
}

This generates iteration key. The cipher algorithm is the first you suggested - xoring plaintext with the key. The code above is a pseudo-random number generator that will generate pseudorandom keys based on the original key. You can obtain the same by doing srand(key) and then calling rand() - you'll also receive a stream of pseudorandom keys. The trick is that if you initialise the generator with the same seed, you'll receive the same result.

This should get you a decent cipher.
There are many ways of generating pseudorandom integers and the one you suggest is probably as good as any other but the thing is that my algorithm uses passphrase which are sequences of ascii bytes longer than 4 bytes...

Another problem, appearing only when with using srtand() then rand() is that srand() might not lead to the same sequence of numbers, even if initialized with the same seed, under various computers (it's a matter of C runtime...)

wysota

26th February 2007, 16:15

but the thing is that my algorithm uses passphrase which are sequences of ascii bytes longer than 4 bytes...
But this is not a problem. You can extend the method to as many bytes as you want. You calculate the sum by iterating words, then you shift each word one bit to the right and set the oldest bit to the value of the youngest bit of the next older word. The oldest bit of the key is set based on the sum. For 32bit key I get about 2M different keys with this algorithm, allowing to safely encrypt about 8MB of data using distinct keys. Again, the cipher is vulnerable to the known plaintext attack if you're able to guess the beginning of the message (to get the first key used, so that you can generate the rest of the key stream). You can make it harder by using every n-th key from the stream (so "n" is also part of the key).

Another problem, appearing only when with using srtand() then rand() is that srand() might not lead to the same sequence of numbers, even if initialized with the same seed, under various computers (it's a matter of C runtime...)

It's not a matter of C runtime. rand() is always a pseudo-random generator. The same rand() implementation has to return the same stream for the same seed. Different implementations might return different streams though. I just wanted to show the principle, so that doesn't matter.

fullmetalcoder

27th February 2007, 11:32

But this is not a problem. You can extend the method to as many bytes as you want. You calculate the sum by iterating words, then you shift each word one bit to the right and set the oldest bit to the value of the youngest bit of the next older word. The oldest bit of the key is set based on the sum. For 32bit key I get about 2M different keys with this algorithm, allowing to safely encrypt about 8MB of data using distinct keys. Again, the cipher is vulnerable to the known plaintext attack if you're able to guess the beginning of the message (to get the first key used, so that you can generate the rest of the key stream).
Interesting. There's a thing I always wanted to know about shifts in C++... What asm instructions are used? In z80 for example (whic is the only assembly language I know) there are many different shift/rotate instructions. Some that preserve the content of the byte whatever happens, some that set "new" bits to the content of the "carry flag" which is affected by other operations, including shifts...

You can make it harder by using every n-th key from the stream (so "n" is also part of the key).
:confused:

It's not a matter of C runtime. rand() is always a pseudo-random generator. The same rand() implementation has to return the same stream for the same seed. Different implementations might return different streams though. I just wanted to show the principle, so that doesn't matter.
Yes but each C run time (VS / MinGW / GCC / ..) may have a different generator... If so the rand() function is not affordable...:(

wysota

27th February 2007, 11:47

Some that preserve the content of the byte whatever happens, some that set "new" bits to the content of the "carry flag" which is affected by other operations, including shifts...
Bit shifting operator from C/C++ fills the oldest/youngest bits with zeros. In assembly it also depends on little/big endianness and some other things (like the register used).

:confused:
If you use subsequent keys, compromising one of them allows the attacker to generate all the remaining keys. If you use every nth key instead (and hide the "n" from the attacker), then it makes it harder for the attacker to break the key (he/she needs to break two blocks). If you extend the idea by using some S-block to pseudo-randomise the "n", you'll make the key even safer.

Yes but each C run time (VS / MinGW / GCC / ..) may have a different generator... If so the rand() function is not affordable...:(
I didn't say to actually use rand() in a cipher, I said it performs the same task as the function I implemented.

fullmetalcoder

27th February 2007, 11:55

Bit shifting operator from C/C++ fills the oldest/youngest bits with zeros. In assembly it also depends on little/big endianness and some other things (like the register used).
Good to know...:)

If you use subsequent keys, compromising one of them allows the attacker to generate all the remaining keys. If you use every nth key instead (and hide the "n" from the attacker), then it makes it harder for the attacker to break the key (he/she needs to break two blocks). If you extend the idea by using some S-block to pseudo-randomise the "n", you'll make the key even safer.
It's a good idea but how do we compute the "n"??? If it's generated from the first key or the first unencrypted block or anything like that it does not provide any security enhancement except if you consider that the attacker ignore the algorithm used but in this case he doesn't stand a chance to break the message... BTW what is a "S-block" ???

wysota

27th February 2007, 12:09

It's a good idea but how do we compute the "n"???
S-box will handle that.

If it's generated from the first key or the first unencrypted block or anything like that it does not provide any security enhancement except if you consider that the attacker ignore the algorithm used but in this case he doesn't stand a chance to break the message...
No, it's not "generated" from key, it's part of the key. Consider the key to be made of two parts - one is the S-box number used and the other is the seed for the key stream generator.

BTW what is a "S-block" ???

Eem... sorry, should be S-box (http://en.wikipedia.org/wiki/Substitution_box) :)

Generally the idea is that n changes every block and you don't know which S-box is used, so you can't follow the pattern - you have to check every S-box. The larger the number of S-boxes, the better the security. Additionally you can introduce another level - switch S-boxes during computation.

Of course all that makes the cipher quite computation intensive and hard to implement in hardware.

jacek

27th February 2007, 19:44

Bit shifting operator from C/C++ fills the oldest/youngest bits with zeros.
Are you sure? ;)

#include <iostream>
#include <iomanip>
#include <limits>

template< class T >
void print( const char * l, T v )
{
std::cout << l
<< std::hex
<< std::showbase
<< std::internal
<< std::setw(10)
<< std::setfill('0')
<< v
<< std::endl;
}

int main()
{
int a = 1;
unsigned int b = 1;
int c = -1;
unsigned int d = std::numeric_limits< unsigned int >::max();

print( "a = ", a );
print( "a' = ", a << 1 );

print( "b = ", b );
print( "b' = ", b << 1 );

print( "c = ", c );
print( "c' = ", c >> 1 );

print( "d = ", d );
print( "d' = ", d >> 1 );

return 0;
}
Output:
$ ./a.out
a = 0x00000001
a' = 0x00000002
b = 0x00000001
b' = 0x00000002
c = 0xffffffff
c' = 0xffffffff
d = 0xffffffff
d' = 0x7fffffff
There are two right shift operators: logical and arithmetical (the latter replicates the sign bit).

wysota

27th February 2007, 19:49

I was shifting an unsigned integer, so it didn't matter :) But yes, you're right - the sign has to prevail.

fullmetalcoder

28th February 2007, 16:44

No, it's not "generated" from key, it's part of the key. Consider the key to be made of two parts - one is the S-box number used and the other is the seed for the key stream generator.

Generally the idea is that n changes every block and you don't know which S-box is used, so you can't follow the pattern - you have to check every S-box. The larger the number of S-boxes, the better the security. Additionally you can introduce another level - switch S-boxes during computation.

Of course all that makes the cipher quite computation intensive and hard to implement in hardware.
My goal here is not to obtain the most secure algorithm ever... I need something simple and fast enough to be describe as an appendix of my paper... I can't afford a long and complicated stuff because it is not a paper on computer science in particular but on math in general and even if I was free to choose any math-related topic there are some limits... :(

So here comes the fun. The link below points to a XOR encrypted version of my paper in its current state. I use the last version of the algorithm I proposed so all you have to do is to break the first block. However there are some difficulties :
AFAIK the only way to break it is a brute-force attack or a stunning luck... ;)
The key is rather big say between 15 and 25 chars => 120 and 200 bits => quite a number of possibilities...
The input data is compressed due to the file format used (a "all-in one" document with text formatting and embedded images...)However you have some insights : it's written in english, it deals with "algorithms", "computational complexity", "cryptography", ...

The encrypted file! (http://edyuk.tuxfamily.org/misc/Algorithms.xor)

Good luck to the ones who won't be able to resist and attempt to crack it! :D

P.S : if you managed to break the code may you take a look at it and report mistakes in the content or the expression (english is not my native language...:o)

jacek

28th February 2007, 22:36

Good luck to the ones who won't be able to resist and attempt to crack it! :D
How much time do we have this time? ;)

report mistakes in the content or the expression
p. 1
... through agriculture...
pregnant need ?
...how big were their herds...
... when it has to do with mathematics...
...however long mathematicians will keep on arguing... - this is a bit wierd, but I'm not sure why

p. 2
...processes can be found in earlier... - in what? something is missing here

The rest looks OK.

jacek

28th February 2007, 23:14

it is not a paper on computer science in particular but on math in general and even if I was free to choose any math-related topic there are some limits...
Since you write about mathematics and cyptography, you could mention the first mathematicians that were dealing with cryptography for real - Marian Rejewski (http://en.wikipedia.org/wiki/Marian_Rejewski) and his team, who broke the Enigma code just before WW2 (before them cryptography was more about linguistics and statistics of natural languages then math).

wysota

28th February 2007, 23:34

Don't forget Alan Turing (http://en.wikipedia.org/wiki/Alan_Turing)

fullmetalcoder

1st March 2007, 07:51

How much time do we have this time?
That easily??? :crying: Well I got to know : what's the trick??? :confused:

p. 1
... through agriculture...
...how big were their herds...
... when it has to do with mathematics...

p. 2
...processes can be found in earlier... - in what? something is missing here

Thanks a lot! I indeed missed them...

pregnant need ?
What's wrong with this one? Doesn't it exist???:confused:

...however long mathematicians will keep on arguing... - this is a bit wierd, but I'm not sure why
What does sound weird to you?

Since you write about mathematics and cyptography, you could mention the first mathematicians that were dealing with cryptography for real - Marian Rejewski (http://en.wikipedia.org/wiki/Marian_Rejewski) and his team, who broke the Enigma code just before WW2 (before them cryptography was more about linguistics and statistics of natural languages then math).I could but see, the main topic is algorithmic with a focus on computer science. Cryptography in my paper is just an appendix I choose to illustrate my paper with "real" but simple enough examples...

Don't forget Alan Turing (http://en.wikipedia.org/wiki/Alan_Turing)Well, I've heard about him and read a few things already but I don't think I will cover the Turing machine in my paper. It's complex enough for multiple papers to be exclusively dedicated to it and mine shouldn't be longer than 12-15pages and I got 10 already... Besides it relies on notions that I haven't studied yet so teachers won't expect it (honestly I'm pretty sure they wouldn't fully understand it ;)).

wysota

1st March 2007, 10:34

Known plaintext attack (with a known key length) again ;)

What is a "pregnant need"? :)

About "however long" - I'd suggest "nevertheless", "however" or something simmilar.

I didn't mean Turing machine. Turing was part of the british cryptology team during WW2. His team built the first Enigma decypher for the military Enigma (Polish team broke the civil one a few years earlier, as Jacek pointed out).

My suggestion is - if you want to write something on cryptography, then don't use XOR. XOR is fine as an auxillary step in encryption, but it not so good as a standalone cipher. I'd even call it "encoding" instead of "encrypting" if you catch the difference :)

TheRonin

1st March 2007, 16:41

a straight up XOR encryption can at most be used for stream ciphers but as we've seen during the course of this discussion, they're pretty difficult to make secure. XOR as a step in a larger encryption process is quite common; I believe some commonly used block cyphers use XOR in their S-boxes.

In my opinion you can't really write a paper about encryption without talking about RSA (public/private-key crypto) and some form of block cypher (DES or Rijndael(AES) for example). The public/private-key cryptos are especially interesting from a mathematical point of view. You've got a lot of advanced finite-field stuff relating to abstract algebra for instance.

Also, stream cyphers are not entirely without merrit. They're VERY quick and easy to implement in hardware. RC4 has historically been used quite a bit: WEP and the encryption on your GPRS-phone are two concrete examples where stream ciphers have been used.

fullmetalcoder

1st March 2007, 17:41

Known plaintext attack (with a known key length) again ;)

How so?

What is a "pregnant need"?
Well I thought it meant a very urging need but maybe it doesn't exist... ;)

I didn't mean Turing machine. Turing was part of the british cryptology team during WW2. His team built the first Enigma decypher for the military Enigma (Polish team broke the civil one a few years earlier, as Jacek pointed out).
Alright.

My suggestion is - if you want to write something on cryptography, then don't use XOR. XOR is fine as an auxillary step in encryption, but it not so good as a standalone cipher. I'd even call it "encoding" instead of "encrypting" if you catch the difference :)

In my opinion you can't really write a paper about encryption without talking about RSA (public/private-key crypto) and some form of block cypher (DES or Rijndael(AES) for example). The public/private-key cryptos are especially interesting from a mathematical point of view. You've got a lot of advanced finite-field stuff relating to abstract algebra for instance.

I perfectly catch the difference now that you'v broken everything. However, my paper is on algorithms and cryptography was just an example. RSA is interesting indeed but not as an appendix. If I was writing a paper on cryptography only I would certainly give it many space because its concept was revolutionnary when it appeared. If I can't find any efficient encryption algorithm that's simple enough to implement and explain I'll fallback on another field...

jacek

1st March 2007, 20:24

How so?
Every .zip file starts with a header and its fields have predictable values (not mentioning the 4-byte magic number at the beginning). Also all Open Document files contain "mimetypeapplication/vnd.oasis.opendocument.text" string. The key was predictable too (after I got several bytes).

The key lenght was easy to guess, because your algorithm leaves some parts unencrypted. When I saw "PK" at 21st byte, I just knew that the key must have 20 bytes.

wysota

1st March 2007, 20:27

How so?
Well, I personally haven't seen the file, but I've been talking to Jacek after he broke your code and he said it was an OpenOffice compatible document (sorry, don't remember the mime type now). This is really a ZIP file with a regular (known) ZIP header. Additionally the mime type is standing out (take two such documents and look at them using your favourite hex editor (if you don't have one, use (k)hexedit), you'll notice simmilarities). As the constant part is longer than your xor key, this can lead to breaking the "cipher" using a known plaintext attack.

Well I thought it meant a very urging need but maybe it doesn't exist...
I've never heard of such a phrase :)

I perfectly catch the difference now that you'v broken everything.
Jacek did, I just know the theory :) For your information, I'm part of the team handling the cryptography course at my University, I just have to know some things ;)

However, my paper is on algorithms and cryptography was just an example. RSA is interesting indeed but not as an appendix.
RSA is a very simple algorithm. And a very nice one, too. But if you want to describe a cryptography algorithm, I'd suggest DES - it's more complicated and has some "steps", whereas RSA really only has a single step.

If I can't find any efficient encryption algorithm that's simple enough to implement and explain I'll fallback on another field...

Use RC4 as already mentioned or explain some hashing function, like MD5 or SHA1. These are simple and very widespread (and they use XOR :rolleyes: ).

I even implemented RC4 as a QIODevice subclass during my cryptography course.