PDA

View Full Version : String encryption - suggestions asked



yop
12th February 2007, 12:21
Hi guys,
I have a string of fixed length that is made up of digits only (ascii 0x30 to 0x39) and has to be encrypted / scrambled (no fancy stuff required just some mangling would be enough but the fancy stuff are welcome as well) and the output should also be digits only and the length should be preserved (here's the catch).
Do you have any suggestions on well tested solutions for this situation? Custom solutions are not an option, it must be something well defined.
Thanks for your time...

TheRonin
12th February 2007, 14:42
well, there are about a million different ways of mangling a sequence of digits. I guess the first question ought to be what your intended use is for the encrypted string. Do you want the actual integer values to be hidden 'securely' such that it becomes infeasible to deduce the string from the encrypted one? -- like a password hash, used to compare a stored value with some user data without transmitting or storing the user data itself. Or do you want the operation to be reversible such that given the correct parameters, the original values can be retrieved? Also, what is the length of the string? You said that the fixed length must be preserved which implies that there is some given length that all your strings must be. So, what, like 4 bytes? 32 bytes? 128 bytes? depending on your answer, some standard hash function or crypto algorithm might be a good solution. If the length can be anything, and the security requirements are less than paramount, then something simple and ad hoc may be in order.

One such ad hoc solution may be to raise each number to some power, divide by some upper bounds for your integer values and return the remainder:



int pseudoEncryptInt(int myNumber){
int secretExponent = 37;
int upperBound = 59;
return pow(oldNumber, secretExponent) % upperBound;
}


but like i said, it all depends on what you want to do with the returned number. For example, I wouldn't use the above scheme to do anything important like handle user data...well, not without ensuring that the secret exponent and upperbound values are chosen correctly. Read up on how RSA public-private-key encryption works for more information.

jacek
12th February 2007, 15:28
Do you have any suggestions on well tested solutions for this situation? Custom solutions are not an option, it must be something well defined.
Take a look at stream ciphers and one-time pad algorithm. You can also use any block cipher in Cipher Feedback, Output Feedback or Counter mode.

TheRonin
12th February 2007, 15:32
Custom solutions are not an option...

haha, missed the 'NOT', my bad. :D

yop
12th February 2007, 17:06
Or do you want the operation to be reversible such that given the correct parameters, the original values can be retrieved?This one :)
32 bytes? This is enough
Take a look at stream ciphers and one-time pad algorithm. You can also use any block cipher in Cipher Feedback, Output Feedback or Counter mode.OK I 'll start with wikipedia and move on... Are these available in any of the well known toolkits? (crypto++ or qca or ?)
haha, missed the 'NOT', my bad. :DI have had the same problem so many times, but who's counting anyway :) (now that I think about it my Q&A does :mad: )

jacek
12th February 2007, 18:22
Are these available in any of the well known toolkits? (crypto++ or qca or ?)
No, but you can reuse some parts of those libraries to construct one.

wysota
12th February 2007, 22:12
There is also a question like would you like to use a key which one has to know to retrieve the original message (encryption) or not (encoding). For example, if you want the latter, then the simplest solution is to just mix the order of digits, for instance 1234 could yield 4132. Of course this is by no means encrypting or whatsoever.

For encryption you can use any encrypting algorithm and then encode the result to receive digits again (I think that was your goal). To retrieve the original, first decode and then decipher using the key.

yop
13th February 2007, 09:24
...and then encode the result to receive digits again... Your solution is very appealing but this step is kind of confusing me. Let me give an example (the encryption algorithm was just one of the many let's not argue about it):
Go to http://www.cs.eku.edu/faculty/styer/460/Encrypt/JS-AES.html
Input: 1234567890123456 (ascii)
Key: 0f1571c947d9e8590cb7add6af7f6798 (hex)
Output: 2f 7d 76 42 5e bb 85 e4 f2 e7 b0 08 68 bf 0f ce (hex)
Could you elaborate on the "encode result" part? It's obviously not digits. If this is RTFM material, do you have a FM link handy? Sorry, I 've obviously never used/handled/bothered with encryption before, but it seems that I will be needing something like what I 've described in the near future, so it's better to be prepared.

TheRonin
13th February 2007, 12:58
Your solution is very appealing but this step is kind of confusing me. Let me give an example (the encryption algorithm was just one of the many let's not argue about it):
Go to http://www.cs.eku.edu/faculty/styer/...pt/JS-AES.html
Input: 1234567890123456 (ascii)
Key: 0f1571c947d9e8590cb7add6af7f6798 (hex)
Output: 2f 7d 76 42 5e bb 85 e4 f2 e7 b0 08 68 bf 0f ce (hex)
Could you elaborate on the "encode result" part? It's obviously not digits. If this is RTFM material, do you have a FM link handy? Sorry, I 've obviously never used/handled/bothered with encryption before, but it seems that I will be needing something like what I 've described in the near future, so it's better to be prepared.


I believe that when wysota says encode, he means changing the structure of the data without losing information and without requiring a password. For instance, you could take your hex-data, and reencode the hex to decimal form, thereby getting only integers.

2f => 47, 7d => 125, etc.

So, i still don't quite understand what you are doing that requires the encrypted data be as long as the input and only integers. Those requirements sort of limit your options since the input data is treated as ascii, where 1 is 49, 2 is 50, etc. The act of encrypting ascii-characters will result in new ascii characters that don't necessarily fall within the range of values that correspond to the integer characters, 48-57. If it truly is a requirement of your application, then i'm sure we can find a good solution, otherwise i would suggest supporting either ascii-characters in the output or output that can be longer than the input...here's what i mean by that: (NOTE: all ascii values are in decimal form, in aggrement with the integer-requirement)

"157" => 49,53,55 (ascii values) => [some encryption routine] =>(hypothetical ascii output) 98,33,125 => "b!}". We can encode the output string to integers like so=>"9833125", but it's longer than the input.

I hope that this helps and that i'm not answering the wrong question again :D :o

jacek
13th February 2007, 20:51
Output: 2f 7d 76 42 5e bb 85 e4 f2 e7 b0 08 68 bf 0f ce (hex)
Could you elaborate on the "encode result" part? It's obviously not digits.
That's why I've suggested one-time pad, but there are three requirements that the key must meet: it can be used only once,
it must be of the same length as the message,
it must be Random(tm).

One-time pad works like this[1]:
QString cipherText( plainText.size(), QChar( 'x' ) );
for( int i = 0; i < plainText.size(); ++i ) {
cipherText[i] = QChar( '0' + (plainText[i].digitValue() + key[i].digitValue() % 10) );
}

The problem is that when somebody has both plainText and cipherText, he can do this:
for( int i = 0; i < plainText.size(); ++i ) {
key[i] = QChar( '0' + (cipherText[i].digitValue() - plainText[i].digitValue() % 10) );
}
That's why it's very important not to reuse the key. In other words, if you will reuse the key --- don't use it.

You can implement one-time pad using a good random generator (in such case the seed would be the real key) and that's what stream ciphers usually do.

You can also consider using RSA with "n" having the same number of digits as the input string, but if those strings have less than few hundred digits it might be easily broken.

[1] Of course it's usually implemented on bit level, so it uses XOR not addition modulo 10.

wysota
13th February 2007, 22:10
I think the first question we should ask here is "what do you need it for?"

yop
14th February 2007, 22:57
I think the first question we should ask here is "what do you need it for?"

Clients: Have a string of digits < 32 bytes
Server: Has the same string of digits

Expected outcomes:

1) Client encodes/encrypts the string in a way that it is difficult (NOT impossible) for the server to regenerate it. Any other client must be able to regenerate the string with no problem. The resulting string should be digits as well with the same strlen as the initial string.
2) Client produces a (strictly) 16 bit check sum that is impossible for the server to regenerate. Any other client can produce the same checksum.

In any case we can have "things" that the client only known but not the server. All the clients can share this "secret".

I just need some rough guidelines and then I can do my homework ;)

jacek
14th February 2007, 23:55
Client encodes/encrypts the string in a way that it is difficult (NOT impossible) for the server to regenerate it. Any other client must be able to regenerate the string with no problem.
If the server will only know the encrypted string, you could use one-time pad, since it won't be able to reproduce the key easily[1]. The only problem is how to seed the random generator --- you might use some pre-shared seed combined with current time (if the clients are synchronized).

Other solution you might try is to devise n string mangling schemes. Like simple arithmetical operations, transpositions and so on. Then clients can decide which scheme they're using currently. All the server will be able to do is to guess which scheme was used and the chance for the right guess is 1/n.

You might want to dig around zero-knowledge protocols.

[1] Note that if the server knows a (+) k and b (+) k it also knows a (-) b. Where (+) is digit-wise sum modulo 10.

wysota
15th February 2007, 00:47
I think a simple permutation of digits satisfies the scheme. If the clients know the scheme of permutations used, they can reverse the process and the server won't have any knowledge about anything if it doesn't know the scheme. I think such thing is used by a security mechanism used by banks (and other) known as the token. I mean the "key change" scheme, the permutation is just a way to scramble input.

TheRonin
15th February 2007, 14:01
One way of doing the scrambling is to use a modified version of the RC4 stream cypher. The RC4 uses an array of bytes from 0 to 255. This array is then 'scrambled' with a password. The resulting scrambled array, we'll call it the s-box, is then used to encrypt the data by a similar scrambling process. The resulting cyphertext will be as long as the input-text.

My proposal is that you instead of using an array from 0 to 255, go from 0 to 9. Then the scrambling and encryption will only occur over those 10 digits. I.e., you're restricting your finite field to 10 instead of 256.

The reason for modifying RC4 instead of just making something up is that it's a known algorithm that you can find source code for online to alter AND it's *very* quick and simple. Also, it means that each client can have a list of keys instead of different procedures for scrambling data.

So the clients can share keys and decide on one to use for one communication session. They then encrypt their digits with the session key and send that. Only those that know the session key can decrypt the data.

With this scheme, you can make sure all the clients share a checksum-key. This key can be used to perform an encryption of the data that only the clients can reproduce - to create your checksum.

What thinks? :)

jacek
15th February 2007, 14:30
The reason for modifying RC4 instead of just making something up is that it's a known algorithm that you can find source code for online to alter AND it's *very* quick and simple. Also, it means that each client can have a list of keys instead of different procedures for scrambling data.
RC4 has a well-known weakness, so one should at least skip the initial bytes it generates.

The idea of several scrambling schemes comes from a method for electronic cash verification.

TheRonin
15th February 2007, 16:37
RC4 has a well-known weakness, so one should at least skip the initial bytes it generates.

Indeed, statistical analysis of the first few bytes can reveal a lot about the status of the s-box, indirectly revealing information about the key (the mentioned analysis requires a known plaintext to work). I made the assumption that our friend yop didn't require any extreme encryption scheme for his application. If that were the case, a stream cypher would not be recommended at all. Also, there are methods of patching the mentioned statistical weakness, such as you suggest, discarding the initial output.

I would think that the security in using different schemes instead of one scheme with keys would be less because of the reduced keyspace. If i were to use one of say, 100 schemes, i only have 100 possible combinations. Compare this to using a key of just 8 bits that gives you 256 possible combinations. You get greater security with less work since you don't need to create a bunch of different scrambling methods.

Perhaps they combine the two to get added security, but i would think that doing so would produce more potential security problems since some scrambling schemes may be weaker and more prone to attack than others. I believe using a strong method of encryption along with a stable framework for key distribution would be the wise choice, in the field. But i digress...Again, i'm assuming yop is not working on some critical system with enormous security requirements. If he is, i appologize for leading him astray with the less than optimal RC4..just trying to be helpful :o :D

jacek
15th February 2007, 17:03
If i were to use one of say, 100 schemes, i only have 100 possible combinations. Compare this to using a key of just 8 bits that gives you 256 possible combinations. You get greater security with less work since you don't need to create a bunch of different scrambling methods.
What about 20922789888000 schemes? And all you need is to write a single method that permutes 16-bit value. ;) As long as the server can't tell the difference between a valid string and scrambled one (either by using data it has at hand or by asking one of the clients), you're protected by plain probability.


I believe using a strong method of encryption along with a stable framework for key distribution would be the wise choice, in the field.
Yes, you're right, but it's hard to meet those same-length and digits-only requirements.