Hashing Just Random Numbers

My recent project had some issues with hashing some 10 million numbers. To analyze the matter I wrote a small test program, see numberhash.c.

I wanted to know which influence the following factors play:

  1. hashing just numbers (no alphabetic characters)
  2. ASCII vs. EBCIDC
  3. choice of hash function
  4. load factor
  5. distribution of collisions

Continue reading

Advertisements

Very simple SHA1 test program written in C

Here is a simple test program to call SHA1 hashing routine from OpenSSL.

#include <stdio.h>
#include <string.h>
#include <openssl/sha.h>

int main (int argc, char *argv[]) {
        unsigned long i, n;
        unsigned char md[1024];

        if (argc <= 1) return 0;

        n = strlen(argv[1]);
        SHA1((unsigned char*)argv[1],n,md);

        for (i=0; i<SHA_DIGEST_LENGTH; ++i)
                printf("%02x",md[i]);
        puts("");

        return 0;
}

Compile with

cc -Wall sha1tst.c -o sha1tst -lcrypto

It is important to give the -l flag after -o.

Some tests:

$ ./sha1tst ABCabc
135488ccc0c5e5a3d0ac437aac1821bba9347b3d
$ printf "ABCabc" | sha1sum
135488ccc0c5e5a3d0ac437aac1821bba9347b3d  -

In Ubuntu the openssl development libraries are in libssl-dev.

Hash functions: An empirical comparison — article by Peter Kankowski

Peter Kankowski wrote a very interesting article on hashing functions. It compares a number of current hash functions and conducts some performance benchmarks.

  1. iSCSI CRC
  2. Meiyan
  3. Murmur2
  4. XXHfast32
  5. SBox (dead link: http://home.comcast.net/~bretm/hash/10.html)
  6. Larson
  7. XXHstrong32
  8. Sedgewick
  9. Novak unrolled
  10. CRC-32
  11. Murmur3
  12. x65599
  13. FNV (Fowler–Noll–Vo) hash
  14. Murmur2A
  15. Fletcher
  16. Kernighan & Ritchie
  17. Paul Hsieh
  18. Bernstein
  19. x17 unrolled
  20. lookup3
  21. MaPrime2c
  22. Ramakrishna
  23. One At Time
  24. Arash Partow
  25. Weinberger
  26. Hanson

A more complicated algorithm does not necessarily mean better performance. So the classical Kernighan & Ritchie hash still performs quite well.