Hashing Just Random Numbers

In my recent project I ran into problems hashing some 10 million numbers. To analyze the matter I wrote a small test program, numberhash.c.

I wanted to know what influence the following factors have:

  1. hashing just numbers (no alphabetic characters)
  2. ASCII vs. EBCDIC
  3. choice of hash function
  4. load factor
  5. distribution of collisions
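The numberhash.c program itself is not reproduced here. To illustrate the kind of measurement involved (hash function, load factor, distribution of collisions), here is a minimal sketch. It assumes the keys are the decimal strings of 0 to n-1 and uses FNV-1a (one of the functions in Kankowski's list below) as the hash; the names number_bucket_stats and struct bucket_stats are hypothetical, not taken from numberhash.c.

```c
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>

/* FNV-1a, 32-bit: offset basis 2166136261, prime 16777619. */
uint32_t fnv1a32(const char *s)
{
    uint32_t h = 2166136261u;

    for (; *s != '\0'; ++s) {
        h ^= (unsigned char)*s;
        h *= 16777619u;
    }
    return h;
}

/* Hypothetical summary of one run. */
struct bucket_stats {
    double load_factor;        /* n / m */
    unsigned long empty;       /* buckets that received no key */
    unsigned long collisions;  /* keys landing in an already used bucket */
    unsigned long max_chain;   /* longest chain, i.e. worst-case lookup */
};

/* Hash the decimal strings "0", "1", ..., "n-1" into m buckets
   (separate chaining; only the chain lengths are kept) and summarize
   the distribution. Returns 0 on success, -1 if m is 0 or memory runs out. */
int number_bucket_stats(unsigned long n, unsigned long m,
                        struct bucket_stats *st)
{
    unsigned long *count;
    unsigned long i;
    char buf[32];

    if (m == 0)
        return -1;
    count = calloc(m, sizeof *count);
    if (count == NULL)
        return -1;

    for (i = 0; i < n; ++i) {
        snprintf(buf, sizeof buf, "%lu", i);  /* digits only, no letters */
        count[fnv1a32(buf) % m]++;
    }

    st->load_factor = (double)n / (double)m;
    st->empty = st->collisions = st->max_chain = 0;
    for (i = 0; i < m; ++i) {
        if (count[i] == 0)
            st->empty++;
        else
            st->collisions += count[i] - 1;
        if (count[i] > st->max_chain)
            st->max_chain = count[i];
    }
    free(count);
    return 0;
}
```

With n equal to m (load factor 1.0), an ideally uniform hash leaves about 1/e, roughly 37%, of the buckets empty, so comparing st.empty against that figure is a quick sanity check for purely numeric keys. The keys here are ASCII digits; on an EBCDIC system the digits are the bytes 0xF0 to 0xF9, so the same numbers produce different hash values.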



Very simple SHA1 test program written in C

Here is a simple test program that calls the SHA1 hashing routine from OpenSSL.

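#include <stdio.h>
#include <string.h>
#include <openssl/sha.h>

int main (int argc, char *argv[]) {
        size_t i, n;
        unsigned char md[SHA_DIGEST_LENGTH];    /* 20-byte SHA-1 digest */

        if (argc <= 1) {
                fprintf(stderr, "usage: %s string\n", argv[0]);
                return 1;
        }

        /* hash the first argument, without its terminating NUL */
        n = strlen(argv[1]);
        SHA1((const unsigned char *)argv[1], n, md);

        /* print the digest as 40 lowercase hex digits, like sha1sum */
        for (i = 0; i < SHA_DIGEST_LENGTH; ++i)
                printf("%02x", md[i]);
        puts("");

        return 0;
}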

Compile with

cc -Wall sha1tst.c -o sha1tst -lcrypto

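It is important to put the -lcrypto flag after the source file: the linker searches libraries in command-line order, so a library must come after the files that reference it.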

Some tests:

$ ./sha1tst ABCabc
135488ccc0c5e5a3d0ac437aac1821bba9347b3d
$ printf "ABCabc" | sha1sum
135488ccc0c5e5a3d0ac437aac1821bba9347b3d  -

On Ubuntu, the OpenSSL development headers and libraries are in the package libssl-dev.

Hash functions: An empirical comparison — article by Peter Kankowski

Peter Kankowski wrote a very interesting article on hash functions. It compares a number of current hash functions and benchmarks their performance:

  1. iSCSI CRC
  2. Meiyan
  3. Murmur2
  4. XXHfast32
  5. SBox (dead link: http://home.comcast.net/~bretm/hash/10.html)
  6. Larson
  7. XXHstrong32
  8. Sedgewick
  9. Novak unrolled
  10. CRC-32
  11. Murmur3
  12. x65599
  13. FNV (Fowler–Noll–Vo) hash
  14. Murmur2A
  15. Fletcher
  16. Kernighan & Ritchie
  17. Paul Hsieh
  18. Bernstein
  19. x17 unrolled
  20. lookup3
  21. MaPrime2c
  22. Ramakrishna
  23. One At Time
  24. Arash Partow
  25. Weinberger
  26. Hanson

A more complicated algorithm does not necessarily perform better: the classic Kernighan & Ritchie hash, for instance, still does quite well.
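For reference, here is a sketch of the hash function from Kernighan & Ritchie, The C Programming Language (2nd ed., section 6.6, Table Lookup), lightly adapted: the table size is a parameter instead of the book's HASHSIZE constant of 101, and characters are read as unsigned char so the result does not depend on whether plain char is signed. The name kr_hash is my own.

```c
/* K&R hash: multiply the running value by 31 and add the next
   character, then reduce modulo the table size. */
unsigned kr_hash(const char *s, unsigned hashsize)
{
    unsigned hashval;

    for (hashval = 0; *s != '\0'; s++)
        hashval = (unsigned char)*s + 31 * hashval;
    return hashval % hashsize;
}
```

Its whole state is one multiply and one add per character, which is a large part of why it stays competitive with more elaborate functions on short keys.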

GNU C: Extensions to the C Language Family

Good to know.

memset's blog

Hi. Today I’ll talk about the extensions to the C language family introduced by GNU C.
GNU C provides several language features not found in ANSI standard C. These extensions are available in C, and most of them also in C++. The '-pedantic' option directs GNU CC to print a warning message if any of these features is used.
The list of these features is very long, and we often use them without noticing. I will show you only those I consider most useful and “strange”:

The rest of the original post (279 more words) is on memset's blog.
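To give an idea of what such extensions look like, here is a small sketch using three of them: statement expressions, __typeof__ and case ranges. The selection is mine and not necessarily the one from memset's post; the names MAX and char_class are illustrative. It compiles with gcc (and with clang, which accepts these extensions); adding -pedantic produces the warnings mentioned above.

```c
/* Statement expression ({ ... }) plus __typeof__: each argument is
   evaluated exactly once, unlike the classic ((a) > (b) ? (a) : (b)).
   __typeof__ also works in strict -std=c99 mode, where plain typeof
   is not a keyword. */
#define MAX(a, b) ({                \
        __typeof__(a) _a = (a);     \
        __typeof__(b) _b = (b);     \
        _a > _b ? _a : _b; })

/* Case ranges: "case low ... high:" matches every value in the range.
   The letter ranges assume ASCII; in EBCDIC the letters are not
   contiguous, so 'a' ... 'z' would also match some non-letters. */
const char *char_class(int c)
{
    switch (c) {
    case '0' ... '9':
        return "digit";
    case 'a' ... 'z':
    case 'A' ... 'Z':
        return "letter";
    default:
        return "other";
    }
}
```

With this MAX, an argument such as x++ is incremented only once, which the plain ternary macro cannot guarantee.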

Memory Limitations with IBM Enterprise COBOL Compiler

Recently I learned the hard way that the IBM Enterprise COBOL compiler cannot generate 8-byte POINTER variables, only 4-byte ones. This means that you cannot address more than 2 GB from COBOL on a mainframe (31-bit addressing: 2^31 bytes), i.e., you cannot make use of AMODE=64 with COBOL. You can run in AMODE=64, but you cannot exploit it. And this is the year 2013! So: no big data on the mainframe with COBOL.

IBM Assembler and C/C++ can fully exploit AMODE=64, i.e., they can use 8-byte pointers.

Generators are now in PHP 5.5

Generators (and therefore coroutines) are now part of PHP 5.5, released on 20-Jun-2013. Here is an example:

<?php
function xrange($start, $end) {   // yields $start, ..., $end inclusive
    for ($i = $start; $i <= $end; ++$i)
        yield $i;
}
foreach (xrange(1, 5) as $n)
    echo $n, "\n";

The Icon programming language was one of the first languages in which generators are completely general and may occur in any computation. Icon is goal-directed in the sense that the evaluation mechanism attempts to produce at least one result for every expression. PHP’s yield is analogous to Icon’s suspend.

Icon can limit the number of results a generator produces; PHP apparently cannot. Icon uses

expr \ i

for this limitation: expr then produces at most i results.
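C has no yield, but the mechanism can be emulated by keeping all of the generator's state in a struct and "resuming" it through a next function. The sketch below mirrors the inclusive PHP xrange and adds an Icon-style limitation; struct xrange, xrange_next and xrange_limit_next are hypothetical names, not from any library.

```c
#include <stdbool.h>

/* Everything a generator would keep between yields lives here. */
struct xrange {
    long cur, end;   /* next value to produce; last value (inclusive) */
    bool done;       /* set once end has been produced, avoids overflow */
};

void xrange_init(struct xrange *g, long start, long end)
{
    g->cur = start;
    g->end = end;
    g->done = start > end;   /* empty range: nothing to produce */
}

/* Resume the generator: store the next value in *out and return true,
   or return false once the range is exhausted. */
bool xrange_next(struct xrange *g, long *out)
{
    if (g->done)
        return false;
    *out = g->cur;
    if (g->cur == g->end)
        g->done = true;
    else
        g->cur++;
    return true;
}

/* Icon-style limitation "expr \ i": pass through at most *remaining
   results of the underlying generator, then stop. */
bool xrange_limit_next(struct xrange *g, int *remaining, long *out)
{
    if (*remaining <= 0 || !xrange_next(g, out))
        return false;
    --*remaining;
    return true;
}
```

A loop such as while (xrange_limit_next(&g, &lim, &v)) then behaves like iterating over xrange(1,100) \ 3 in Icon: it stops after three values even though the range is longer.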

The Python programming language also provides generators. A simple example (note that, unlike the PHP version, range(a,b) excludes b):

def xrange(a,b):
    for i in range(a,b):
        yield i

WordPress Market Share 2011

I read a short note on Heise saying that in 2011 WordPress was one of the most popular CMSs. The report compared

  1. Alfresco WCM
  2. CMSMadeSimple
  3. Concrete5
  4. DotNetNuke
  5. Drupal
  6. e107
  7. eZ Publish
  8. Joomla!
  9. Liferay
  10. MODx
  11. Movable Type
  12. OpenCms
  13. Plone
  14. SilverStripe
  15. Textpattern
  16. Tiki Wiki CMS Groupware
  17. Typo3
  18. Umbraco
  19. WordPress
  20. Xoops

The underlying document is the Water & Stone report, which analyzes

  1. Downloads
  2. Installations
  3. Developer Support
  4. Books in Print
  5. Search Engine Visibility
  6. Google Page Rank
  7. Reputation

Unfortunately, there seems to be no more recent report on this at Water & Stone.