Hashing with Iterator in C

Many scripting languages, like Awk, Perl, Python, Lua, and so on, have hash-tables already built-in. I needed hashing functionality similar to Perl’s hash-variables. In particular I needed lookup, insert, and iterating over all entries.

I wrote on hashing in Hashing Just Random Numbers and Hash functions: An empirical comparison — article by Peter Kankowski. See also Revisiting hash table performance, or TommyDS: a C library of hashtables and tries.

1. Structure. Each hash element contains a key and a value. Additionally there is an indicator whether storage allocation using malloc() for key and value are done during hash insert, or are done elsewhere, e.g., constant strings. All hash elements having the same hash value, see below, are inserted into a singly linked list. Additionally all elements are part of singly linked list.

struct hashElem {
        int dup;        // memory ownership: 0, 0x01, 0x02, 0x03
        char *key, *val;
        struct hashElem *next;  // next hash element with same key or NULL
        struct hashElem *nextAll;       // for iterator over all hash entries

There is a hash header which contains the start of the list, and the actual table.

typedef struct {
        int size, collisions, load;
        const char *name;       // only for diagnostic and debugging
        struct hashElem **tab;  // hash table with size times (hashElem*)
        struct hashElem *first; // first element in hash
} hash_t;

Variable load is the number of entries in the hash table. So

        if (zzr01Hash->load > 0) ...

is equivalent to the following Perl code

        if (keys %zzr01Hash > 0) ...

The C program uses 7 hash tables. Each of these is initialized by

// Initialize hash with given size, return NULL on error
hash_t *hashInit (int size, const char *name) {
        hash_t *h = (hash_t*)malloc(sizeof(hash_t));
        if (h == NULL) return NULL;
        // allocate size x pointer to hashElem, calloc clears memory
        h->tab = (struct hashElem**)calloc(sizeof(struct hashElem*),size);
        if (h->tab == NULL) return NULL;

        h->size = size;
        h->collisions = 0;
        h->load = 0;
        h->name = name;
        h->first = NULL;
        return h;

So the C code

        pgmHash = hashInit(ZZ_HASHSIZE,"pgmHash");
        greenHash = hashInit(GREEN_HASHSIZE,"greenHash");
        zzHash = hashInit(ZZ_HASHSIZE,"zzHash");
        decommHash = hashInit(CORIA_HASHSIZE,"decommHash");
        dynHash = hashInit(CALL_HASHSIZE,"dynHash");
        callHash = hashInit(CALL_HASHSIZE,"callHash");
        zzr01Hash = hashInit(CALL_HASHSIZE,"zzr01Hash");

is equivalent to the following Perl code

        my (%pgmHash, %greenHash, %zzHash, %decommHash, %dynHash, %callHash, %zzr01Hash);

Clearly, the Perl version is much more terse and easier to write.

2. Lookup. For lookup I used a slight variation of Kernighan & Ritchie‘s hash function given in their seminal work The C Programming Language. Instead of adding up the characters I used exclusive-or instead. Initial value is 5381 from Dan Bernstein, instead of zero.

// Hashing function according Kernighan & Ritchie,
// The C Programming Language, chapter 6.6
unsigned int hashKR (hash_t *h, const char *s) {
        unsigned int hv = 5381;
        for (; *s; ++s) {
                hv = 31 * hv ^ *s;
        return hv % h->size;

Lookup is now

// Find key in hash
const char *hashLookup (hash_t *h, const char *key) {
        unsigned int hv = hashKR(h,key);
        struct hashElem *e = h->tab[hv];

        while (e) {
                if (strcmp(e->key,key) == 0) return e->val;
                e = e->next;
        return NULL;

So the C code

        if (hashLookup(pgmHash,token)) ...

is equivalent to the following Perl code

        if (defined($pgmHash{$token}) ...

3. Insertion. Inserting new key/value pair is

// Insert key into hash h
// Use strdup() for key/val if dup nonzero
// possible values for dup: 0x00 no strdup(), 0x01: key ony, 0x02: val only, 0x03: key+val
int hashInsert (hash_t *h, const char *key, const char *val, int dup) {
        unsigned int hv = hashKR(h,key);
        struct hashElem *e = h->tab[hv];

        while (e) {
                if (strcmp(e->key,key) == 0) {
                        e->dup = dup;   // overwrite dup
                        // check if exact same key/val pair already present
                        if (strcmp(e->val,val) == 0) return 0;
                        if (dup & 0x02) free(e->val);   // overwrite val
                        return (e->val = dup & 0x02 ? strdup(val) : (char*)val) == NULL;
                e = e->next;
                h->collisions += 1;
        // key not found, i.e., e==NULL || collisions > 0
        h->load += 1;   // new element
        e = (struct hashElem*)malloc(sizeof(struct hashElem));
        if (e == NULL) return 2;
        e->dup = dup;
        e->key = dup & 0x01 ? strdup(key) : (char*)key;
        e->val = dup & 0x02 ? strdup(val) : (char*)val;
        if (e->key == NULL || e->val == NULL) return 3;

        // elements with same keys are all members of a singly linked list
        e->next = h->tab[hv];   // prepend one-self at head of list
        h->tab[hv] = e;

        // maintain 2nd singly linked list for iterator over all hash elements
        e->nextAll = h->first;
        h->first = e;
        return 0;       // all is well

Depending on the duplicate variable it either calls strdup() oder just copies the pointer. Function strdup() calls malloc().
The C code


is equivalent to the Perl code

        $dynHash{$token1} = $tokenJ;

Three hash tables are re-used over and over again. So there must be a function to clear the hash table.

// Free all key/value pairs in hash, but hash-table memory h->tab[] is not freed.
// Hash table can then be reused for new key/value pairs.
void hashClear (hash_t *h) {
        int i;
        struct hashElem *e, *sv;

        for (i=0; i < h->size; ++i) {
                for (e=h->tab[i]; e; ) {
                        if (e->dup & 0x01) free(e->key);
                        if (e->dup & 0x02) free(e->val);
                        e->nextAll = NULL;
                        sv = e; // save this e
                        e = e->next;
                h->tab[i] = NULL;       // clear slot in tab[]

        h->collisions = 0;
        h->load = 0;
        h->first = NULL;

So the C code


is equivalent to the following Perl code

        (%dynHash, %callHash, %zzr01Hash) = ((), (), ());

4. Statistics. For debugging and informational purposes it is advantageous to have information on the distribution of key/value pairs in our hash-tables.

Function hashStat() populates an array with the distribution of key/value pairs. Above a certain threshold, given in distrMax, all collisions are grouped collectively.

// Statistics on hash table
// distr[0] = number of empty slots
// distr[1] = number of entries with exactly one element
// distr[2] = number of entries with exactly two elements in linked list
// and so on
// last distr[distrMax] lumps all the rest
void hashStat (hash_t *h, int distr[], int distrMax) {
        int i, cnt;
        struct hashElem *e;

        if (distrMax < 0) return;
        for (i=0; i<=distrMax; ++i) distr[i] = 0;       // clear statistics

        for (i=0; i < h->size; ++i) {
                cnt = 0;
                for (e=h->tab[i]; e; e=e->next)
                distr[cnt<=distrMax?cnt:distrMax] += 1;

Printing this distribution is now:

void hashStatPrt (hash_t *h) {
        enum { DISTRMAX = 10 };
        int i, distr[DISTRMAX];
        printf("\t%s: size=%d, load=%d, load/size=%7.4f, collisions=%d\n",
                h->name, h->size, h->load, h->load / (double)(h->size), h->collisions);
        for (i=0; i<DISTRMAX; ++i)

Example output for distribution using



        pgmHash: size=97007, load=5393, load/size= 0.0556, collisions=145
                distr[ 0]=91758
                distr[ 1]=5106
                distr[ 2]=142
                distr[ 3]=1
                distr[ 4]=0
                distr[ 5]=0
                distr[ 6]=0
                distr[ 7]=0
                distr[ 8]=0
                distr[ 9]=0
        greenHash: size=73, load=13, load/size= 0.1781, collisions=0
                distr[ 0]=60
                distr[ 1]=13
                distr[ 2]=0
                distr[ 3]=0
                distr[ 4]=0
                distr[ 5]=0
                distr[ 6]=0
                distr[ 7]=0
                distr[ 8]=0
                distr[ 9]=0
        zzHash: size=97007, load=43767, load/size= 0.4512, collisions=10211
                distr[ 0]=62029
                distr[ 1]=27466
                distr[ 2]=6367
                distr[ 3]=1026
                distr[ 4]=106
                distr[ 5]=13
                distr[ 6]=0
                distr[ 7]=0
                distr[ 8]=0
                distr[ 9]=0

Just for debugging, printing all entries in the hash-table is simple:

void hashPrt (hash_t *h) {
        int i=0;
        struct hashElem *e;
        for (e=h->first; e; e=e->nextAll)
                printf("\t\t%d key=|%s|, val=|%s|\n",++i,e->key,e->val);

Filippo Mantovani: ARM for HPC

On 23-Oct-2017 Filippo Mantovani held a talk in Darmstadt on “Mobile technology for production-ready high-performance computing systems: The path of the Mont-Blanc project”. Unfortunately I was unable to attend, but Mr. Mantovani sent me his Darmstadt Seminar slides. As his slides and documents are very interesting to people using or intending to use ARM in HPC, I copy these documents here, so they are easily available. I also copied a report on “MB3_D6.4 Report on application tuning and optimization on ARM platform“.

Optimal Product Portfolio

1. Prerequisites

Assume one owns n>0 different products, n\in\mathbb{N}, usually n\approx 100. Each product has a cost c_i>0, c_i\in\mathbb{R}, associated with this ownership, i=1,\ldots, n. Let N=\left\{1,\ldots,n\right\}, and \mathbb{P}(N) be the powerset of N, i.e., the set of all subsets. The powerset \mathbb{P}(N) has 2^n elements.

There are two real-valued functions f and g operating on \mathbb{P}(N), i.e., f,g\colon\mathbb{P}(N)\to\mathbb{R^+}. Function g(I) denotes the sum of the cost of the product set given by I\in\mathbb{P}(N), g is therefore easy to evaluate. I.e.,

\displaystyle{      g(I) = \sum_{i\in I} c_i  }

Function f(I) denotes the buying cost if one can dispose the set of products given by I\in\mathbb{P}(N). This means, it costs money to get rid of the products given by set I. As the products have interrelationships, f is more costly to evaluate than g and is given by some table lookup. Function f does not depend on c_i.

Some notes on f: Assume a non-negative matrix A=(a_{ij}), where a_{ij} denotes the cost to decouple the i-th product from the j-th. Then

\displaystyle{      f(I) = \sum_{i\in I} \sum_{j\in N} a_{ij}  }

Usually a_{ij}=0, if i<j, i.e., A is upper triangular, because decoupling product i from product j does not need to be done again for decoupling the other direction from j to i. More zeros in above sum are necessary if decoupling product i is sufficient if done once and once only. In practical application cases f depends on a real-valued parameter \gamma\in[0,1] giving different solution scenarios.

Example: For three products let c_1=8, c_2=4, c_3=5, thus cost of ownership is g(\emptyset)=0, g(\left\{1\right\})=8, g(\left\{1,2\right\})=8+4=12, g(\left\{1,3\right\})=8+5=13, g(\left\{1,2,3\right\})=8+4+5=17, and so on for the rest of 2^3-5=3 sets. Function f gives positive values in a similar fashion, although, as mentioned before, these values are not related to c_i.

2. Problem statement

Find the set I of products which gives the least cost, i.e., find I for

\displaystyle{      \min_{I\in\mathbb{P}(N)}\left(f(I)-g(I)\right)  }

Rationale: One invests f(I) to get rid of the products in set I but saves ownership costs given by g(I).

I does not need to be unique. If f depends on \gamma then I will likely also depend on \gamma.

3. Challenges

As n can be “large”, evaluating all possible combinations becomes prohibitive on todays computers. Maybe one of our gentle readers knows a method to find the optimum, or maybe just a nearby solution to above problem, or a probabilistic approach. Any comment is welcome.

Statistics of this Blog: Crossed 60.000 Views

This blog was viewed more than 60.000 times since its inception and had more than 45.000 visitors. I wrote about the development of this blog as follows:

  1. 2014/05/07: 5,000 Views
  2. 2014/09/10: 10,000 Views
  3. 2014/12/27: 15,000 Views
  4. 2015/04/23: 20,000 Views
  5. 2016/01/24: 30,000 Views
  6. 2016/11/12: 40,000 Views
  7. 2017/05/18: 50,000 views

The averages per month are:

Between 1,000 and 2,000 views per month in bar chart form:

As Countries, USA, Germany, and India at the top:

Referrers, Google by far the most important referrer:

The most popular blog posts are:

C Pointer Surprises

An article by Krister Walfridsson on C pointers are not hardware pointers demonstrated that even adjacent integer variables having the same hardware address may compare unequal regarding C pointers.

See the following C program:

#include <stdio.h>

int main(int argc, char *argv[]) {
        int x, y;
        int *p = &x + 1;
        int *q = &y;
        printf("%p %p %d\n", (void*)p, (void*)q, p == q);
        return 0;

You have to compile with optimization enabled, e.g., cc -O3. Otherwise gcc adds some stuff between variables. On AMD/Intel/ARM CPUs the output looks something like this:

0xbe849afc 0xbe849afc 0

I.e., the pointers point to the same address, but the pointer comparison gives “false”.

Added 06-Aug-2017: As hinted by the comment given by Ashwin Nanjappa below, the compiler actually does not generate compare instructions, but rather just adds 0=false.


$ cc -Wall -O3 -c ptrcomp.c
$ objdump -d ptrcomp.o


ptrcomp.o:     file format elf64-x86-64

Disassembly of section .text.startup:

0000000000000000 <main>:
   0:   48 83 ec 18             sub    $0x18,%rsp
   4:   48 8d 3d 00 00 00 00    lea    0x0(%rip),%rdi        # b <main+0xb>
   b:   31 c9                   xor    %ecx,%ecx
   d:   48 8d 54 24 04          lea    0x4(%rsp),%rdx
  12:   64 48 8b 04 25 28 00    mov    %fs:0x28,%rax
  19:   00 00
  1b:   48 89 44 24 08          mov    %rax,0x8(%rsp)
  20:   31 c0                   xor    %eax,%eax
  22:   48 89 d6                mov    %rdx,%rsi
  25:   e8 00 00 00 00          callq  2a <main+0x2a>
  2a:   48 8b 4c 24 08          mov    0x8(%rsp),%rcx
  2f:   64 48 33 0c 25 28 00    xor    %fs:0x28,%rcx
  36:   00 00
  38:   75 07                   jne    41 <main+0x41>
  3a:   31 c0                   xor    %eax,%eax
  3c:   48 83 c4 18             add    $0x18,%rsp
  40:   c3                      retq
  41:   e8 00 00 00 00          callq  46 <main+0x46>

Xoring oneself gives zero.

Text Analysis using Concordance

When analyzing longer text, especially if this text was written by oneself, it helps to read the text in a different way, here using a concordance.

Assume your text is provided as PDF. Convert PDF to text using pdftotext, which part of package poppler. Replace line breaks in text file with spaces using below C program (called linebreak.c):

#include <stdio.h>

int main(int argc, char *argv[]) {
        int c, flag=0;
        FILE *fp;

        if (argc >= 2) {
                if ((fp = fopen(argv[1],"rb")) == NULL)
                        return 1;
        } else {
                fp = stdin;

        while ((c = fgetc(fp)) != EOF) {
                if (c == '\n') {
                        flag += 1;
                        if (flag > 1) { putchar(c); flag = 0; }
                        else putchar(' ');
                } else {
                        flag = 0;

        return 0;

Then generate a list of (single) words with below Perl program:

#!/bin/perl -W
# Print word concordances

use strict;

my (%H,@F);

while (<>) {
        s/\s+$//;       # rtrim
        @F = split;
        foreach my $w (@F) {
                $w =~ s/^\s+//; # ltrim
                $w =~ s/\s+$//; # rtrim
                $H{$w} += 1;

foreach my $w (sort keys %H) {

To print all word pairs replace above loop with

while (<>) {
        s/\s+$//;       # rtrim
        @F = split;
        for(my $i=0; $i<$#F; ++$i) {
                $F[$i] =~ s/^\s+//;     # ltrim
                $F[$i] =~ s/\s+$//;     # rtrim
                $F[$i+1] =~ s/^\s+//;   # ltrim
                $F[$i+1] =~ s/\s+$//;   # rtrim
                $H{$F[$i] . " " . $F[$i+1]} += 1;

Similar, for word triples replace the loop with

while (<>) {
        s/\s+$//;       # rtrim
        @F = split;
        for(my $i=0; $i+1<$#F; ++$i) {
                $F[$i] =~ s/^\s+//;     # ltrim
                $F[$i] =~ s/\s+$//;     # rtrim
                $F[$i+1] =~ s/^\s+//;   # ltrim
                $F[$i+1] =~ s/\s+$//;   # rtrim
                $F[$i+2] =~ s/^\s+//;   # ltrim
                $F[$i+2] =~ s/\s+$//;   # rtrim
                $H{$F[$i] . " " . $F[$i+1] . " " . $F[$i+2]} += 1;

Printing concordances using Perl hashes is very simple, as one can see.

Here is an example from the man-page of expect using below sequence of commands:

( TERM=dumb; man expect ) | linebreak | word3concord | sort -r

Truncated result is

            16  For example, the
            13  example, the following
            12  the current process.
             9  the end of
             8  using Expectk, this
             8  this option is
             8  sent to the
             8  flag causes the
             8  body is executed
             8  Expectk, this option
             8  (When using Expectk,
             7  to the current
             7  the spawn id
             7  the most recent
             7  the current process
             7  the corresponding body
             7  option is specified
             7  is specified as
             7  corresponding body is
             7  by Don Libes,
             7  be used to
             6  set for the
             6  of the current
             6  is set for
             6  is an alias

Contributing to Hugo Static

The discussion forum for Hugo contains a description: Hugo development – how to contribute code. Also see Contributing to Hugo.

1. Preparation

First set GOPATH as

export GOPATH=$HOME/tmp/H



Fetch source with go get

time go get -u -v github.com/spf13/hugo

takes around 1-2 minutes as it has to download almost 200MB.

Now change to the Hugo source code and compile

cd src/github.com/spf13/hugo/
time make hugo

Compilation from scratch takes roughly 1-2 minutes. Recompiling a single file usually takes less than 10 seconds.

In the same directory, run test-cases with

time make check

which takes less than a minute.

All timings are on an AMD FX(tm)-8120 Eight-Core Processor clocked with 3.1 GHz running Linux 4.11.3, and using Go 1.8.3.

2. Fork in Github, git branch and pull-request

Fork https://github.com/spf13/hugo by pressing the “Fork” icon:

Move original Git repository out of your way, clone the new fork, add or modify files as required, add, and commit them:

cd $GOPATH/src/github.com/spf13/
mv hugo hugo.original
time git clone git@github.com:eklausme/hugo.git

cd hugo
git branch YOURNAME
git checkout YOURNAME

go fmt
git add YOURFILE
git commit

A git clone of hugo alone takes less than 10 seconds. Watch out to run go fmt before git add.

Contributors are asked to provide single commits. In case you have multiple, then squash them into one, i.e., git rebase -i and git push -f.

Finally press the pull-request button in Github:

Be prepared to wait weeks or even months before your pull-request will be accepted or even rejected, so patience is required.