Performance Comparison of mmap() versus read() versus fread()

I recently read in Computers are *fast*! by Julia Evans about a comparison between fread() and mmap(), suggesting that both calls deliver roughly the same performance. Unfortunately, the programs referenced there, bytesum.c for fread() and bytesum_mmap.c for mmap(), do not really compare the same thing: the first adds up size_t values, the second adds up uint8_t values. On my computer the two programs behave differently and therefore show different performance.

I reprogrammed the comparison, adding read() to fread() and mmap(). The code is on GitHub. Compile with

cc -Wall -O3 tbytesum1.c -o tbytesum1

For this program the results are as follows:



Speeding-Up Software Builds: Parallelizing Make and Compiler Cache

1. Problem statement

Compiling source code usually involves the make command, which keeps track of dependencies. In addition, GNU make can parallelize the build using the -j option. Often you also want a so-called clean build, i.e., you compile all source files again, just in case make missed some files when recompiling. Instead of throwing away all previous effort, one can use a cache of previous compilations.

I had two questions where I wanted quantitative answers:

  1. What is the best j for parallel make, i.e., how many parallel jobs should make start?
  2. What effect does a compiler cache have?


Effect of Optimizer in gcc on Intel/AMD and Power8

What effect can the optimizer have for gcc?

On Intel/AMD I ran my intpoly program (with -n0) once with and once without the optimizer. It showed a speed-up by a factor of about 3.5.

  1. no optimizer: 7.84s
  2. -O3: 2.25s

gcc for Intel/AMD is version 4.8.2.

On Power8 I again ran intpoly (with -n0). There the speed-up factor is more than 8.

Georg Hager’s Blog: Intel vs. GCC for the OpenMP vector triad: Barrier shootout!

Georg Hager’s Blog posted an illustrative article on icc versus g++ performance w.r.t. OpenMP. Dr. Georg Hager is one of the authors of Introduction to High Performance Computing for Scientists and Engineers.

The measured loop is the vector triad:

double precision, dimension(N) :: a,b,c,d
! initialization etc. omitted
s = walltime()
!$omp parallel private(R,i)
do R=1,NITER
!$omp do
  do i=1,N
    a(i) = b(i) + c(i) * d(i)
  enddo
!$omp end do
enddo
!$omp end parallel
e = walltime()
MFlops = R*N/(e-s)/1.e6


icc versus g++

Very simple SHA1 test program written in C

Here is a simple test program that calls the SHA1 hashing routine from OpenSSL.

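#include <stdio.h>
#include <string.h>
#include <openssl/sha.h>

int main (int argc, char *argv[]) {
        unsigned long i, n;
        unsigned char md[1024];

        if (argc <= 1) return 0;

        /* hash the first command-line argument */
        n = strlen(argv[1]);
        SHA1((unsigned char*)argv[1],n,md);

        /* print the digest as lowercase hex, like sha1sum does */
        for (i=0; i<SHA_DIGEST_LENGTH; ++i)
                printf("%02x",md[i]);
        putchar('\n');

        return 0;
}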

Compile with

cc -Wall sha1tst.c -o sha1tst -lcrypto

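It is important to give the -lcrypto flag after the source file: the linker processes its arguments in order, and a library listed before the code that uses it may not be searched for the missing symbols.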

Some tests:

$ ./sha1tst ABCabc
$ printf "ABCabc" | sha1sum
135488ccc0c5e5a3d0ac437aac1821bba9347b3d  -

On Ubuntu the OpenSSL development libraries are in the package libssl-dev.