GCC 6.1 Compiler Optimization Level Benchmarks

In Effect of Optimizer in gcc on Intel/AMD and Power8 I measured speed ratios between optimized and non-optimized C code: a factor of three on Intel/AMD and a factor of eight on Power8 (PowerPC) for integer calculations. For floating-point calculations the factors were two and three, respectively.

Michael Larabel, in GCC 6.1 Compiler Optimization Level Benchmarks: -O0 To -Ofast + FLTO, measured the effect of various optimization flags of the newest GCC.

For a Poisson solver the speed ratio between optimized and non-optimized code was five.

[Chart: Himeno benchmark results for GCC 6.1 optimization levels]

Convert ASCII to Hex and vice versa in C and Excel VBA

In Downloading Binary Data, for example Boost C++ Library, I already complained about some company policies regarding the transfer of binary data. If the openssl command is available on the receiving end, then things are pretty straightforward, as the aforementioned link shows; in particular you then have Base64 encoding at your disposal. If that is not the case, but you have a C compiler, or at least Excel, then you can work around it.

The C program ascii2hex.c converts arbitrary data to hex, and vice versa. The Excel VBA (Visual Basic for Applications) file ascii2hex.xls converts from hex back to arbitrary data.

To convert from arbitrary data to a hex representation:

ascii2hex -h yourBinary outputInHex

Back from hex to ASCII:

ascii2hex -a inHex outputInBinary
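
The core of such a conversion is short. Below is a minimal, hypothetical C sketch of the two directions, reading from stdin and writing to stdout for brevity; it is not the actual ascii2hex.c, which takes file arguments as shown above.

#include <stdio.h>

/* Illustration only: -a converts hex on stdin back to binary on stdout,
   anything else converts stdin to hex on stdout (two hex digits per byte). */
int main (int argc, char *argv[]) {
        int c;
        if (argc > 1 && argv[1][0] == '-' && argv[1][1] == 'a') {
                unsigned x;
                while (scanf("%2x", &x) == 1)   /* read two hex digits at a time */
                        putchar((int)x);
        } else {
                while ((c = getchar()) != EOF)
                        printf("%02x", c);
        }
        return 0;
}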

Continue reading

Performance Comparison C vs. Lua vs. LuaJIT vs. Java

Ico Doornekamp on 20-Dec-2011 asked why a C version of a Lua program ran more slowly than the Lua program. I could not reproduce the mentioned discrepancy on either an AMD FX-8120 or an Intel i5-4250U processor. Generally a C program is expected to be faster than an equivalent Lua program.

Here is the Lua program called lua_perf.lua:

local N = 4000
local S = 1000

local t = {}

for i = 0, N do
        t[i] = {
                a = 0,
                b = 1,
                f = i * 0.25
        }
end

for j = 0, S-1 do
        for i = 0, N-1 do
                t[i].a = t[i].a + t[i].b * t[i].f
                t[i].b = t[i].b - t[i].a * t[i].f
        end
        print(string.format("%.6f", t[1].a))
end

It computes values for a circle.
[Figure: lua_perf]
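
For comparison, a straightforward C translation of the same computation might look roughly as follows; this is an assumed sketch, not necessarily the exact C program used in the measurements.

#include <stdio.h>

#define N 4000
#define S 1000

/* Same recurrence as the Lua program above */
struct rec { double a, b, f; };
static struct rec t[N+1];

int main (void) {
        int i, j;
        for (i = 0; i <= N; ++i) {
                t[i].a = 0;
                t[i].b = 1;
                t[i].f = i * 0.25;
        }
        for (j = 0; j < S; ++j) {
                for (i = 0; i < N; ++i) {
                        t[i].a += t[i].b * t[i].f;
                        t[i].b -= t[i].a * t[i].f;
                }
                printf("%.6f\n", t[1].a);
        }
        return 0;
}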

Continue reading

Performance Comparison of mmap() versus read() versus fread()

I recently read in Computers are *fast*! by Julia Evans about a comparison between fread() and mmap() suggesting that both calls deliver roughly the same performance. Unfortunately the programs mentioned there, bytesum.c for fread() and bytesum_mmap.c for mmap(), do not really compare the same thing: the first sums size_t values, the second sums uint8_t values. On my machine these programs indeed behave differently and therefore give different performance.

I reprogrammed the comparison, adding read() alongside fread() and mmap(). The code is on GitHub. Compile with

cc -Wall -O3 tbytesum1.c -o tbytesum1
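
The essence of the mmap() variant is to map the whole file and sum its bytes. A minimal sketch of that idea, assumed here for illustration and not the actual tbytesum1.c, looks like this:

#include <stdio.h>
#include <stdint.h>
#include <fcntl.h>
#include <unistd.h>
#include <sys/mman.h>
#include <sys/stat.h>

/* Sum all bytes of a file via mmap(); illustration only */
int main (int argc, char *argv[]) {
        if (argc < 2) return 1;
        int fd = open(argv[1], O_RDONLY);
        if (fd < 0) { perror("open"); return 2; }
        struct stat st;
        if (fstat(fd, &st) != 0) { perror("fstat"); return 3; }
        uint8_t *p = mmap(NULL, st.st_size, PROT_READ, MAP_PRIVATE, fd, 0);
        if (p == MAP_FAILED) { perror("mmap"); return 4; }
        uint64_t sum = 0;
        for (off_t i = 0; i < st.st_size; ++i)
                sum += p[i];
        printf("%llu\n", (unsigned long long)sum);
        munmap(p, st.st_size);
        close(fd);
        return 0;
}

The read() and fread() variants differ only in that they pull the file into a buffer chunk by chunk before summing.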

For this program the results are as follows:

Continue reading

Speeding-Up Software Builds: Parallelizing Make and Compiler Cache

1. Problem statement

Compiling source code with a compiler usually employs the make command, which keeps track of dependencies. Additionally, GNU make can parallelize your build using the -j parameter. Often you also want a so-called clean build, i.e., compile all source code files, just in case make missed some files when recompiling. Instead of discarding all previous work one can use a cache of previous compilations.
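
For example, assuming ccache is installed and the makefile uses the standard CC variable, a plain parallel build with eight jobs, and the same build routed through the compiler cache, could be started like this:

make -j8
make CC="ccache gcc" -j8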

I had two questions where I wanted quantitative answers:

  1. What is the best j for parallel make, i.e., how many parallel jobs should one start?
  2. What effect does a compiler cache have?

Continue reading