Hashing Just Random Numbers

A recent project of mine ran into problems when hashing some 10 million numbers. To analyze the matter I wrote a small test program, see numberhash.c.

I wanted to know how much influence the following factors have:

  1. hashing just numbers (no alphabetic characters)
  2. ASCII vs. EBCDIC
  3. choice of hash function
  4. load factor
  5. distribution of collisions
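
The factors above can be explored with a small experiment. The following is a minimal Python sketch, not the numberhash.c program mentioned above. It assumes a simple multiplicative string hash (h = 31*h + byte) and uses Python's cp037 codec as a stand-in for EBCDIC; the names mult_hash and chain_lengths are made up for illustration.

```python
from collections import Counter

def mult_hash(key, encoding="ascii"):
    """Multiplicative hash h = 31*h + byte, truncated to 32 bits.

    An assumed example function, chosen only for illustration.
    """
    h = 0
    for byte in key.encode(encoding):
        h = (31 * h + byte) & 0xFFFFFFFF
    return h

def chain_lengths(keys, table_size, encoding="ascii"):
    """Map chain length -> number of buckets holding that many keys."""
    per_bucket = Counter(mult_hash(k, encoding) % table_size for k in keys)
    dist = Counter(per_bucket.values())
    dist[0] = table_size - len(per_bucket)  # buckets that stayed empty
    return dist

if __name__ == "__main__":
    keys = [str(i) for i in range(100000)]  # digits only, no letters
    for size in (131072, 262144):           # two different load factors
        for enc in ("ascii", "cp037"):      # ASCII vs. an EBCDIC code page
            dist = chain_lengths(keys, size, enc)
            print(size, enc, sorted(dist.items()))
```

Each printed line lists how many buckets ended up with 0, 1, 2, ... keys, which shows the distribution of collisions for one combination of load factor and character encoding. Swapping in another hash function only requires replacing mult_hash.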

Very simple SHA1 test program written in C

Here is a simple test program that calls the SHA1 hashing routine from OpenSSL.

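#include <stdio.h>
#include <string.h>
#include <openssl/sha.h>

/* Print the SHA1 digest of the first command-line argument in hex. */
int main (int argc, char *argv[]) {
        size_t i, n;
        unsigned char md[SHA_DIGEST_LENGTH];    /* SHA1 yields 20 bytes */

        if (argc <= 1) return 0;        /* nothing to hash */

        n = strlen(argv[1]);
        SHA1((unsigned char*)argv[1],n,md);

        /* two hex digits per digest byte */
        for (i=0; i<SHA_DIGEST_LENGTH; ++i)
                printf("%02x",md[i]);
        puts("");

        return 0;
}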

Compile with

cc -Wall sha1tst.c -o sha1tst -lcrypto

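It is important to give the -lcrypto flag after the source file: the linker resolves symbols from left to right, so a library named before the code that uses it may be skipped.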

Some tests:

$ ./sha1tst ABCabc
135488ccc0c5e5a3d0ac437aac1821bba9347b3d
$ printf "ABCabc" | sha1sum
135488ccc0c5e5a3d0ac437aac1821bba9347b3d  -
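
As a cross-check that does not depend on the OpenSSL C API, the same digest can be computed with Python's standard hashlib module. This is only a small sketch; the helper name sha1_hex and the file name sha1tst.py are made up here.

```python
import hashlib
import sys

def sha1_hex(text):
    """Return the SHA1 digest of text (encoded as UTF-8) as 40 hex characters."""
    return hashlib.sha1(text.encode("utf-8")).hexdigest()

if __name__ == "__main__" and len(sys.argv) > 1:
    # Hash the first command-line argument, like the C program does.
    print(sha1_hex(sys.argv[1]))
```

Saved as sha1tst.py and run as python3 sha1tst.py ABCabc, it should print the same 40 hex digits as ./sha1tst for ASCII input.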

On Ubuntu the OpenSSL development libraries are in the package libssl-dev.

Hash functions: An empirical comparison — article by Peter Kankowski

Peter Kankowski wrote a very interesting article on hashing functions. It compares a number of current hash functions and conducts some performance benchmarks.

  1. iSCSI CRC
  2. Meiyan
  3. Murmur2
  4. XXHfast32
  5. SBox (dead link: http://home.comcast.net/~bretm/hash/10.html)
  6. Larson
  7. XXHstrong32
  8. Sedgewick
  9. Novak unrolled
  10. CRC-32
  11. Murmur3
  12. x65599
  13. FNV (Fowler–Noll–Vo) hash
  14. Murmur2A
  15. Fletcher
  16. Kernighan & Ritchie
  17. Paul Hsieh
  18. Bernstein
  19. x17 unrolled
  20. lookup3
  21. MaPrime2c
  22. Ramakrishna
  23. One At Time
  24. Arash Partow
  25. Weinberger
  26. Hanson

A more complicated algorithm does not necessarily mean better performance. So the classical Kernighan & Ritchie hash still performs quite well.

GNU C: Extensions to the C Language Family

Good to know.

memset's blog

Hi. Today I’ll talk about the extensions to the C language family introduced by GNU C.
GNU C provides several language features not found in ANSI standard C. These extensions are available in both C and C++. The -pedantic option directs GNU CC to print a warning message if any of these features is used.
The list of these features is very long, and we often use them implicitly. I will show you only those I consider most useful and “strange”:

Memory Limitations with IBM Enterprise COBOL Compiler

Recently I learned the hard way that the IBM Enterprise COBOL compiler for z/OS (mainframe) cannot generate 8-byte POINTER variables, only 4-byte pointers. Since z/OS uses only 31 bits of a 4-byte address, this means you cannot use more than 2GB in COBOL on a mainframe. In other words, you cannot make use of AMODE=64 with COBOL on the mainframe: you can run in AMODE=64, but you cannot exploit it. By the way, this is the year 2013. So, no big data on the mainframe with COBOL.

IBM Assembler and C/C++ can fully exploit AMODE=64, i.e., they can use 8-byte pointers.

Added 16-May-2020:

  1. Enterprise COBOL for z/OS, Version 4.2.0: No AMODE=64
  2. Enterprise COBOL for z/OS, Version 5.2.0: No AMODE=64
  3. Enterprise COBOL for z/OS, Version 6.1.0: No AMODE=64
  4. Enterprise COBOL for z/OS, Version 6.2.0: No AMODE=64
  5. Enterprise COBOL for z/OS, Version 6.3.0: Hurray, now finally AMODE=64 support since September 2019

For a list of versions see IBM Enterprise COBOL for z/OS.

Generators are now in PHP 5.5

Generators (and therefore coroutines) are now part of PHP 5.5, as of 20-Jun-2013. Here is an example:

function xrange($start,$end) {
    for ($i = $start; $i<=$end; ++$i)
        yield($i);
}

The Icon programming language was one of the first computer languages in which generators are completely general and may occur in any computation. Icon is goal-directed in the sense that the evaluation mechanism attempts to produce at least one result for all expressions. PHP's yield is analogous to Icon’s suspend.

Icon can limit the number of results a generator produces; PHP apparently has no built-in syntax for this. Icon uses

expr \ i

for this limitation.

The Python programming language also provides generators. A simple example is

def xrange(a,b):
    for i in range(a,b):
        yield i
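
Python has no limitation operator like Icon's expr \ i either, but the standard library covers the case: itertools.islice takes at most a given number of results from any generator. A minimal sketch, reusing the xrange generator from above:

```python
from itertools import islice

def xrange(a, b):
    """Yield the integers a, a+1, ..., b-1 one at a time."""
    for i in range(a, b):
        yield i

# Take at most 3 results; the generator is never run past the third value.
first_three = list(islice(xrange(10, 1000), 3))
print(first_three)  # prints [10, 11, 12]
```

Because islice pulls values lazily, the remaining 987 values of the generator are never computed.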

WordPress Market Share 2011

I read a short note at Heise that in 2011 WordPress was one of the most popular CMSs. The report compared

  1. Alfresco WCM
  2. CMSMadeSimple
  3. Concrete5
  4. DotNetNuke
  5. Drupal
  6. e107
  7. eZ Publish
  8. Joomla!
  9. Liferay
  10. MODx
  11. Movable Type
  12. OpenCms
  13. Plone
  14. SilverStripe
  15. Textpattern
  16. Tiki Wiki CMS Groupware
  17. Typo3
  18. Umbraco
  19. WordPress
  20. Xoops

The corresponding document is the Water & Stone report. This report analyzes

  1. Downloads
  2. Installations
  3. Developer Support
  4. Books in Print
  5. Search Engine Visibility
  6. Google Page Rank
  7. Reputation

Unfortunately, there seems to be no more recent report on this at Water & Stone.

Kepler’s Hypothesis explained by Brian Koberlein

Here I quote Brian Koberlein’s explanation on Google+ of the history of Kepler’s laws.

Kepler’s first two rules, that the orbit of a planet is an ellipse, and that a line drawn from the Sun to a planet sweeps out area at a constant rate were proposed in 1609. While these rules allowed for a more accurate description of observed planetary motion, they weren’t perfect. For one thing, the planets don’t actually move in exact ellipses, nor is Kepler’s “constant area” rule exact.

Running CPU/GPU Intensive Jobs on Titan Supercomputer

There is an INCITE program (HPC Call for Proposals) where one can apply for computer time for CPU/GPU-intensive jobs; the link is INCITE.

From the FAQ: The INCITE program is open to US- and non-US-based researchers and research organizations needing large allocations of computer time, supporting resources, and data storage to pursue transformational advances in science and engineering.

The machines in question: Mira and Titan.

Positive Experiences with Unitymedia

So far my experiences with the company Unitymedia have been exclusively positive. They are worth noting down briefly.

On 20-Aug-2009 I ordered internet (20 MBit/s) and telephone service from Unitymedia. The next day a Mr. Niclas Roth from Computer-Füchse was at the door and set up the cable modem and the telephone. Everything worked right away.

At the beginning of 2012 I upgraded from 20 MBit/s to 50 MBit/s. Unitymedia sent me the new cable modem required for this. It went into operation without any trouble.

On 13-Jun-2013 both internet and telephone failed. The next day Unitymedia sent a technician, who fixed the problem.

When I think of my experiences in similar situations with Telekom or 1&1, the difference is like night and day.

Cisco Cablemodem Signal to Noise Ratio

For my internet connection I use a Cisco EPC3208 cable modem, which was supplied by the cable provider, Unitymedia in my case.

The modem's status page at 192.168.100.1 shows the following signal-to-noise measurements:

Signal to Noise

More information on this modem can be found here: Cisco EPC3208. According to Cisco modem, the user name and password are admin/atlanta.