mirror of
https://github.com/sigmasternchen/libsfuid
synced 2025-03-15 16:09:02 +00:00
# libsfuid
Semantic-Free Unique ID En-/Decoder Lib

## What is this?

This is a lib that encodes enumerated IDs into a seemingly random string of characters.

Possible use cases include obscuring internal IDs in a system for user-generated content, so that no information about
the existence of other postings is leaked to attackers.

## So this is a library for hashing?

Not quite, although there are certain similarities.
The biggest difference is that a hash is in general not reversible. The encoded IDs of this library are. This is useful
because no redundant data (such as a lookup table from encoded ID to internal ID) has to be stored.
By extension this means that there are no collisions in the generated IDs: every ID maps to its own fixed-length string.
Thus the input range of the encoder is limited.

## How is this useful if the input range is limited?

Well, by default the lib provides about 576 quadrillion unique IDs (2^59). If more are needed, the settings can be tuned to get
about 18 quintillion (2^64) unique IDs, at the expense of not using the output space efficiently.

If you really need more, message me and I'll try to upgrade to 128 bit.

## What about timing?

I think I got it working pretty fast. On my laptop (i7-4600U) encoding takes about 500 ns. Decoding is a bit slower and takes
about 2 µs. I provided the benchmark script I used in case you want to check on your own machine.

## Does this work everywhere?

Probably not. The lib needs a 128-bit integer type, so I don't think it works on any 32-bit machines. Also, the type
definition I used is specific to GCC. If you get it to work on another compiler, please tell me.

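For reference, here is a minimal sketch of how a 128-bit type can be detected and used with GCC. The names `u128_t` and `mul_high` are made up for this example and are not part of the lib. `__SIZEOF_INT128__` is the macro GCC (and also Clang) defines on targets where `unsigned __int128` is available.

```c
#include <assert.h>
#include <stdint.h>

/* GCC and Clang define __SIZEOF_INT128__ on targets where __int128 exists
 * (typically 64-bit ones). Fail early with a clear message otherwise. */
#ifndef __SIZEOF_INT128__
#error "this sketch needs a compiler and target with __int128 support"
#endif

/* Hypothetical type name, chosen for this example only. */
typedef unsigned __int128 u128_t;

static_assert(sizeof(u128_t) == 16, "u128_t must be 128 bits wide");

/* Upper 64 bits of the full 128-bit product a * b.
 * A nonzero result means the product would not have fit into uint64_t. */
static uint64_t mul_high(uint64_t a, uint64_t b)
{
    return (uint64_t)(((u128_t)a * b) >> 64);
}
```

A product of two 64-bit values can need up to 128 bits, which is presumably why the wide type is needed. Here `mul_high(3, 4)` is 0, while `mul_high(1ULL << 63, 4)` is 2 because 2^65 no longer fits into 64 bits.
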
## How can I use it?

Oh, that's pretty simple.

```
sfuid_settings_t settings = sfuid_default_settings;
sfuid_init(settings);

// Now we can use it.

// Encode: the buffer holds settings.length characters plus one extra byte.
char* string = malloc(settings.length + 1);
sfuid_encode(42, string);

// Decode: result is 42 again.
uint64_t result;
sfuid_decode(string, &result);

free(string);
```

## How does this work?

It's basically a kind of MCG (multiplicative congruential generator), but instead of using the previous entry in the series as the input, I use the current ID. That means the result is not as random, because the distance between consecutive IDs is basically constant. The reason I did this is to save computation time (exponentiation would work, but 128 bit won't be enough for numbers that size).

So the formula to get a specific ID is:
s(i) := p\*i mod m

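As a rough illustration of the formula, the sketch below evaluates s(i) with toy parameters (p = 5, m = 16). Both values and the function name `mcg_step` are picked for this example and are not taken from the lib.

```c
#include <assert.h>
#include <stdint.h>

/* s(i) := p*i mod m. The product is taken in 128 bits (GCC/Clang
 * extension) so that it cannot overflow for 64-bit p and i. */
static uint64_t mcg_step(uint64_t p, uint64_t i, uint64_t m)
{
    assert(m != 0 && i < m);
    return (uint64_t)(((unsigned __int128)p * i) % m);
}
```

With p = 5 and m = 16, the IDs 0, 1, 2, 3, 4 map to 0, 5, 10, 15, 4. Consecutive results always differ by p modulo m, which is the constant distance between consecutive IDs, and no value repeats before i reaches m.
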
The condition for p and m is that their GCD is 1; that makes the mapping from i to s(i) a bijection on 0..m-1, so it can be reversed. I chose m to be a power of 2, so its prime factors are only 2s. For p I chose a prime that is about 25 % of m. For that I hardcoded a list of candidate primes into the program.

To make it more efficient, the program calculates m to be the greatest power of 2 that fits in the output space.
For example: let the character set be "0-9" (10 characters) and the length of the result be 4; the output space is then 10^4. The biggest power of 2 in that space is 2^13 = 8192. That's m. The p would be 2039 (about 25 % of m).

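The arithmetic of this example can be checked with a short sketch. The helper names (`output_space`, `greatest_pow2_le`, `is_prime`) are hypothetical and only mirror the description; how the lib itself derives m and picks p from its prime list is not shown here.

```c
#include <assert.h>
#include <stdint.h>

/* Size of the output space: charset_len^length.
 * This sketch asserts instead of handling overflow. */
static uint64_t output_space(uint64_t charset_len, unsigned length)
{
    assert(charset_len > 0);
    uint64_t space = 1;
    for (unsigned k = 0; k < length; k++) {
        assert(space <= UINT64_MAX / charset_len);
        space *= charset_len;
    }
    return space;
}

/* Greatest power of 2 that is <= space (space must be >= 1). */
static uint64_t greatest_pow2_le(uint64_t space)
{
    assert(space >= 1);
    uint64_t m = 1;
    while (m <= space / 2)
        m *= 2;
    return m;
}

/* Plain trial division; fine for small candidates like the one here. */
static int is_prime(uint64_t n)
{
    if (n < 2)
        return 0;
    for (uint64_t d = 2; d <= n / d; d++)
        if (n % d == 0)
            return 0;
    return 1;
}
```

For the charset "0-9" and length 4 this gives a space of 10000 and m = 8192, so about 82 % of the possible strings are actually used. 2039 passes the primality check and sits just below m/4 = 2048.
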
(Note to myself: Maybe just using a big Mersenne prime (like 2^61-1) as p would also work. That would maximize the usable output space, because the output space has size x^y (charset length x, string length y) and a prime p that doesn't divide x can't divide x^y either, thus m can be the size of the output space.)

The result of the MCG is then converted into the string by treating it as a number with the length of the charset as its base.

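A minimal sketch of such a base conversion follows. The function name `to_charset_string`, the digit order (most significant digit first) and the padding with the first charset character are assumptions made for this example; the lib may order or pad its output differently.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Write value as exactly `length` digits in base strlen(charset),
 * most significant digit first, followed by '\0'.
 * out must have room for length + 1 bytes. */
static void to_charset_string(uint64_t value, const char *charset,
                              size_t length, char *out)
{
    size_t base = strlen(charset);
    assert(base >= 2);
    for (size_t k = length; k > 0; k--) {
        out[k - 1] = charset[value % base];
        value /= base;
    }
    assert(value == 0); /* value must fit into `length` digits */
    out[length] = '\0';
}
```

With the charset "0123456789" and length 4, the value 42 comes out as "0042"; with the charset "ab" the value 5 becomes "abab".
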
As for the decoding, we have to solve the congruence equation x == p * i (mod m) for i. We can do that by multiplying x by the multiplicative inverse of p modulo m.
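
One common way to obtain that inverse is the extended Euclidean algorithm. The sketch below uses it with the example values p = 2039 and m = 2^13; this is not necessarily how the lib computes the inverse, and the names `mod_inverse` and `mulmod` are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* a * b mod m, with the product taken in 128 bits (GCC/Clang extension). */
static uint64_t mulmod(uint64_t a, uint64_t b, uint64_t m)
{
    assert(m != 0);
    return (uint64_t)(((unsigned __int128)a * b) % m);
}

/* Multiplicative inverse of p modulo m via the extended Euclidean
 * algorithm. Requires m > 1 and gcd(p, m) == 1. */
static uint64_t mod_inverse(uint64_t p, uint64_t m)
{
    assert(m > 1);
    __int128 r0 = m, r1 = p % m; /* remainders */
    __int128 t0 = 0, t1 = 1;     /* invariant: t * p == r (mod m) */
    while (r1 != 0) {
        __int128 q = r0 / r1;
        __int128 r2 = r0 - q * r1;
        __int128 t2 = t0 - q * t1;
        r0 = r1; r1 = r2;
        t0 = t1; t1 = t2;
    }
    assert(r0 == 1);             /* otherwise p has no inverse mod m */
    if (t0 < 0)
        t0 += m;                 /* bring the result into 0..m-1 */
    return (uint64_t)t0;
}
```

For p = 2039 and m = 8192 the inverse is 2503, since 2039 * 2503 = 5103617 = 623 * 8192 + 1. Encoding i = 42 gives x = 3718, and `mulmod(3718, 2503, 8192)` returns 42 again.
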