Hashing Just Random Numbers

My recent project had some issues with hashing some 10 million numbers. To analyze the matter I wrote a small test program, see numberhash.c.

I wanted to know which influence the following factors play:

  1. hashing just numbers (no alphabetic characters)
  3. choice of hash function
  4. load factor
  5. distribution of collisions

Continue reading

Very simple SHA1 test program written in C

Here is a simple test program to call SHA1 hashing routine from OpenSSL.

#include <stdio.h>
#include <string.h>
#include <openssl/sha.h>

int main (int argc, char *argv[]) {
        unsigned long i, n;
        unsigned char md[1024];

        if (argc <= 1) return 0;

        n = strlen(argv[1]);
        SHA1((unsigned char*)argv[1],n,md);

        for (i=0; i<SHA_DIGEST_LENGTH; ++i)

        return 0;

Compile with

cc -Wall sha1tst.c -o sha1tst -lcrypto

It is important to give the -l flag after -o.

Some tests:

$ ./sha1tst ABCabc
$ printf "ABCabc" | sha1sum
135488ccc0c5e5a3d0ac437aac1821bba9347b3d  -

In Ubuntu the openssl development libraries are in libssl-dev.

Hash functions: An empirical comparison — article by Peter Kankowski

Peter Kankowski wrote a very interesting article on hashing functions. It compares a number of current hash functions and conducts some performance benchmarks.

  1. iSCSI CRC
  2. Meiyan
  3. Murmur2
  4. XXHfast32
  5. SBox (dead link: http://home.comcast.net/~bretm/hash/10.html)
  6. Larson
  7. XXHstrong32
  8. Sedgewick
  9. Novak unrolled
  10. CRC-32
  11. Murmur3
  12. x65599
  13. FNV (Fowler–Noll–Vo) hash
  14. Murmur2A
  15. Fletcher
  16. Kernighan & Ritchie
  17. Paul Hsieh
  18. Bernstein
  19. x17 unrolled
  20. lookup3
  21. MaPrime2c
  22. Ramakrishna
  23. One At Time
  24. Arash Partow
  25. Weinberger
  26. Hanson

A more complicated algorithm does not necessarily mean better performance. So the classical Kernighan & Ritchie hash still performs quite well.