Calling C from Julia

Two ways to compute the error function or Bessel function in Julia.

1. Calling C. On UNIX libm provides erf() and j0(). So calling them goes like this:


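In this case one can omit the reference to the library. Watch out for the funny-looking (Float64,): the argument types are passed as a tuple, and a one-element tuple needs the trailing comma.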
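For comparison, the same kind of foreign call can be sketched in Python with ctypes. This is only an illustration and not part of the Julia solution: the helper name c_function is made up, and the fallback library name libm.so.6 is an assumption that holds on glibc-based Linux. Note the one-element tuple (ctypes.c_double,), which plays the same role as (Float64,) above.

```python
import ctypes
import ctypes.util

# Locate libm; the fallback name "libm.so.6" is an assumption for glibc-based Linux.
libm = ctypes.CDLL(ctypes.util.find_library("m") or "libm.so.6")

def c_function(name):
    """Return the libm function `name` declared as double f(double).
    (Hypothetical helper, introduced only for this sketch.)"""
    f = getattr(libm, name)
    f.restype = ctypes.c_double
    f.argtypes = (ctypes.c_double,)   # one-element tuple: mind the trailing comma
    return f

erf = c_function("erf")   # error function
j0 = c_function("j0")     # Bessel function of the first kind, order 0

if __name__ == "__main__":
    print(erf(0.1), j0(3.0))
```

Without restype and argtypes, ctypes would treat the argument and the return value as C int, which gives wrong results for floating point functions.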

2. Using Julia. SpecialFunctions.jl provides erf() and besselj0().

import Pkg
import SpecialFunctions

Splitting and anti-merging vCard files

Sometimes vCard files need to be split into smaller files, or the file needs to be protected against merging in another application.

1. Splitting. The Perl script below splits the input file into as many files as required. Output files are named adr.1.vcf, adr.2.vcf, etc. The command line argument “-n” specifies the number of card records per file. The script is available as palmadrsplit on GitHub:

use Getopt::Std;

my %opts;
getopts('n:', \%opts);  # -n <number of records per file>
my ($i,$k,$n) = (1,0,950);
$n = ( defined($opts{'n'}) ? $opts{'n'} : 950 );

open(F,">adr.$i.vcf") || die("Cannot open adr.$i.vcf for writing");
while (<>) {
        if (/BEGIN:VCARD/) {
                if (++$k % $n == 0) {   # every n-th address record starts a new file
                        close(F) || die("Cannot close adr.$i.vcf");
                        ++$i;   # next file number
                        open(F,">adr.$i.vcf") || die("Cannot open adr.$i.vcf for writing");
                }
        }
        print F $_;
}
close(F) || die("Cannot close adr.$i.vcf");

Splitting is required for Google Contacts, as Google does not allow importing more than 1,000 records per day, see Quotas for Google Services.
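The splitting rule can also be sketched in a few lines of Python. This is only an illustration, not the palmadrsplit script: the function name split_vcards is made up, it returns the chunks as lists instead of writing adr.N.vcf files, and it starts a new chunk after exactly n cards.

```python
def split_vcards(lines, n=950):
    """Group vCard lines into chunks of at most n cards each.
    (Hypothetical helper; chunks are returned in memory, nothing is written.)"""
    chunks = [[]]
    cards = 0                      # number of BEGIN:VCARD lines seen so far
    for line in lines:
        if "BEGIN:VCARD" in line:
            if cards > 0 and cards % n == 0:
                chunks.append([])  # current chunk is full, start the next one
            cards += 1
        chunks[-1].append(line)
    return chunks

if __name__ == "__main__":
    card = ["BEGIN:VCARD\n", "N:Doe;John\n", "END:VCARD\n"]
    for number, chunk in enumerate(split_vcards(card * 5, n=2), start=1):
        print(number, len(chunk), "lines")
```

With the default of 950 cards per chunk each chunk stays below the limit of 1,000 records mentioned above.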

2. Anti-Merge. A script that inhibits the annoying merging is in file palmantimerge on GitHub. The overall logic is as follows: Read the entire vCard file and put each card, delimited by BEGIN:VCARD and END:VCARD, into a hash map. Each hash map entry is a list of vCards. The hash key is the N: entry, i.e., the concatenation of last name and first name. Once everything is hashed, walk through the hash. Entries whose list contains just one card are output as is. Entries whose list contains more than one card would otherwise be merged, so their N: part is modified using the ORG: field.

use strict;
my @singleCard = ();    # all info between BEGIN:VCARD and END:VCARD
my ($name) = "";        # N: part, i.e., lastname semicolon firstname
my ($clashes,$line,$org) = (0,"","");
my %allCards = ();      # each entry is list of single cards belonging to same first and lastname, so hash of array of array

while (<>) {
        if (/BEGIN:VCARD/) {
                ($name,@singleCard) = ("", ());
                push @singleCard, $_;
        } elsif (/END:VCARD/) {
                push @singleCard, $_;
                push @{ $allCards{$name} }, [ @singleCard ];
        } else {
                push @singleCard, $_;
                $name = $_ if (/^N:/);
        }
}

for $name (keys %allCards) {
        $clashes = $#{$allCards{$name}};        # last index, 0 means the name is unique
        for my $sglCrd (@{$allCards{$name}}) {
                if ($clashes == 0) {
                        for $line (@{$sglCrd}) { print $line; }
                } else {
                        $org = "";
                        for $line (@{$sglCrd}) {
                                $org = $1 if ($line =~ /^ORG:([ \-\+\w]+)/);
                        }
                        for $line (@{$sglCrd}) {
                                $line =~ s/;/ \/${org}\/;/ if ($line =~ /^N:/);
                                print $line;
                        }
                }
        }
}

Every last name gets “/organization/” appended if the combination of first name and last name is not unique. For example, two records for Peter Miller, one in ABC-Corp and one in XYZ-Corp, will be written as N:Miller /ABC-Corp/;Peter and N:Miller /XYZ-Corp/;Peter.
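The anti-merge logic can be sketched in Python as follows. This is an illustrative port, not the palmantimerge script itself: the function name anti_merge is made up, and it returns the output lines as a list instead of printing them.

```python
import re
from collections import defaultdict

def anti_merge(lines):
    """Append ' /ORG/' to the last name of cards sharing the same N: line.
    (Hypothetical helper mirroring the hashing logic described above.)"""
    cards_by_name = defaultdict(list)   # N: line -> list of cards (lists of lines)
    card, name = [], ""
    for line in lines:
        if "BEGIN:VCARD" in line:
            card, name = [line], ""
        elif "END:VCARD" in line:
            card.append(line)
            cards_by_name[name].append(card)
        else:
            card.append(line)
            if line.startswith("N:"):
                name = line
    out = []
    for group in cards_by_name.values():
        if len(group) == 1:             # unique name: output the card as is
            out.extend(group[0])
            continue
        for card in group:              # clash: tag the last name with the organization
            org = ""
            for line in card:
                m = re.match(r"ORG:([ \-+\w]+)", line)
                if m:
                    org = m.group(1)
            for line in card:
                if line.startswith("N:"):
                    line = line.replace(";", f" /{org}/;", 1)
                out.append(line)
    return out

if __name__ == "__main__":
    demo = ["BEGIN:VCARD\n", "N:Miller;Peter\n", "ORG:ABC-Corp\n", "END:VCARD\n",
            "BEGIN:VCARD\n", "N:Miller;Peter\n", "ORG:XYZ-Corp\n", "END:VCARD\n"]
    print("".join(anti_merge(demo)), end="")
```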

This way Simple Mobile Tools Contacts will not merge records that it should not merge. The corresponding issue #446 is on GitHub.

Performance Comparison Pallene vs. Lua 5.1, 5.2, 5.3, 5.4 vs. C

Installing Pallene is described in the previous post: Installing Pallene Compiler. In this post we test the performance of Pallene versus C, Lua 5.4, and LuaJIT.

1. Array Access. I tested a program similar to the one in Performance Comparison C vs. Lua vs. LuaJIT vs. Java.

function lua_perf(N:integer, S:integer)
        local t:{ {a:float, b:float, f:float} } = {}

        for i = 1, N do
                t[i] = {
                        a = 0.0,
                        b = 1.0,
                        f = i * 0.25
                }
        end

        for j = 1, S-1 do
                for i = 1, N-1 do
                        t[i].a = t[i].a + t[i].b * t[i].f
                        t[i].b = t[i].b - t[i].a * t[i].f
                end
                --io_write( t[1].a )
        end
end

This program, which does no I/O at all, runs in 0.14s and is therefore two times slower than LuaJIT, which finishes in 0.07s. This is somewhat disappointing. Lua 5.4, as shipped with Pallene, needs 0.75s, so Pallene is roughly five times faster than Lua.

Installing Pallene Compiler

Pallene is a Lua-based language. In contrast to Lua, which is dynamically typed, Pallene is statically typed. A good paper on Pallene is “Pallene: A companion language for Lua”, by Hugo Musso Gualandi and Roberto Ierusalimschy.

From the above paper:

The compiler itself is quite conventional. After a standard parsing step, it converts the program to a high-level intermediate form and from that it emits C code, which is then fed into a C compiler such as gcc.

From “A gradually typed subset of a scripting language can be simple and efficient”:

Pallene was designed for performance, and one fundamental part of that is that its compiler generates efficient machine code. To simplify the implementation, and for portability, Pallene generates C source code instead of directly generating assembly language.

So, very generally, this idea is similar to f2c (Fortran to C), cobc (Cobol compiler), or Lush (Lisp Universal SHell).

The whole Pallene compiler is implemented in less than 7 kLines of Lua, and less than 1 kLines of C source code for the runtime.

To install the Pallene compiler you need git, gcc, lua, and luarocks. The description is for Linux; macOS is very similar.

1. Source. Fetch source code via git clone.

$ git clone 
Cloning into 'pallene'...
$ cd pallene

2. Rocks. Fetch required Lua rocks via luarocks command.

$ luarocks install --local --only-deps pallene-dev-1.rockspec
Missing dependencies for pallene dev-1:
   lpeglabel >= 1.5.0 (not installed)
   inspect >= 3.1.0 (not installed)
   argparse >= 0.7.0 (not installed)
   luafilesystem >= 1.7.0 (not installed)
   chronos >= 0.2 (not installed)
pallene dev-1 depends on lua ~> 5.3 (5.3-1 provided by VM)
pallene dev-1 depends on lpeglabel >= 1.5.0 (not installed)
lpeglabel 1.6.0-1 depends on lua >= 5.1 (5.3-1 provided by VM)
gcc -O2 -fPIC -I/usr/include -c lpcap.c -o lpcap.o
gcc -O2 -fPIC -I/usr/include -c lpcode.c -o lpcode.o
gcc -O2 -fPIC -I/usr/include -c lpprint.c -o lpprint.o
gcc -O2 -fPIC -I/usr/include -c lptree.c -o lptree.o
gcc -O2 -fPIC -I/usr/include -c lpvm.c -o lpvm.o
gcc -shared -o lpcap.o lpcode.o lpprint.o lptree.o lpvm.o
No existing manifest. Attempting to rebuild...
lpeglabel 1.6.0-1 is now installed in /home/klm/.luarocks (license: MIT/X11) 

pallene dev-1 depends on inspect >= 3.1.0 (not installed)

inspect 3.1.1-0 depends on lua >= 5.1 (5.3-1 provided by VM)
inspect 3.1.1-0 is now installed in /home/klm/.luarocks (license: MIT <>)

pallene dev-1 depends on argparse >= 0.7.0 (not installed)

argparse 0.7.0-1 depends on lua >= 5.1, < 5.4 (5.3-1 provided by VM)
argparse 0.7.0-1 is now installed in /home/klm/.luarocks (license: MIT)

pallene dev-1 depends on luafilesystem >= 1.7.0 (not installed)

luafilesystem 1.8.0-1 depends on lua >= 5.1 (5.3-1 provided by VM)
gcc -O2 -fPIC -I/usr/include -c src/lfs.c -o src/lfs.o
gcc -shared -o src/lfs.o
luafilesystem 1.8.0-1 is now installed in /home/klm/.luarocks (license: MIT/X11)

pallene dev-1 depends on chronos >= 0.2 (not installed)

chronos 0.2-4 depends on lua >= 5.1 (5.3-1 provided by VM)
gcc -O2 -fPIC -I/usr/include -c src/chronos.c -o src/chronos.o -I/usr/include
gcc -shared -o src/chronos.o -L/usr/lib -Wl,-rpath,/usr/lib -lrt
chronos 0.2-4 is now installed in /home/klm/.luarocks (license: MIT/X11)

Stopping after installing dependencies for pallene dev-1

3. Environment variables. Make sure that you source the environment variables given by

luarocks path

For example:

export LUA_PATH='/usr/share/lua/5.3/?.lua;/usr/share/lua/5.3/?/init.lua;/usr/lib/lua/5.3/?.lua;/usr/lib/lua/5.3/?/init.lua;./?.lua;./?/init.lua;/home/klm/.luarocks/share/lua/5.3/?.lua;/home/klm/.luarocks/share/lua/5.3/?/init.lua'
export LUA_CPATH='/usr/lib/lua/5.3/?.so;/usr/lib/lua/5.3/;./?.so;/home/klm/.luarocks/lib/lua/5.3/?.so'
export PATH='/home/klm/.luarocks/bin:/usr/bin:/home/klm/bin:...:.'

4. Build Lua and runtime. Build Lua and the Pallene runtime (you are still in the pallene directory):

make linux-readline

Some warnings will show up for Lua, but they can be ignored for now.

5. Run compiler. Now you can run pallenec, provided you are still in the directory where you built Pallene.

$ ./pallenec   
Usage: pallenec [-h] [--emit-c] [--emit-asm] [--compile-c]
       [--dump {parser,checker,ir,uninitialized,constant_propagation}]

Error: missing argument 'source_file'

6. Run example. Now check one of the examples.

$ pallenec examples/factorial/factorial.pln
$ ./lua/src/lua -l factorial examples/factorial/main.lua 
The factorial of 5 is 120.

The most common error is to use the system-wide lua command instead of lua/src/lua from Pallene.

You can compile all examples and benchmarks:

for i in examples/*/*.pln; do pallenec $i; done
for i in benchmark/*/*.pln; do pallenec $i; done

Things to note in Pallene:

  1. Array indexes must start at one
  2. Pallene source code must be within a function, except for type definitions, record definitions, and variable definitions
  3. Pallene offers no goto statement. The goto statement was added in Lua 5.2.

Performance Comparison in Computing Exponential Function

If your computation is dominated by exponential function evaluations, then it makes a significant difference whether you evaluate the exponential function exp() in single precision or in double precision. You can reduce your computing time by roughly 25% when moving from double precision (double) to single precision (float). Evaluation with long double, labelled quadruple precision here, is more than six times more expensive than evaluation in double precision.

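Changing from double precision to single precision also halves the amount of storage needed. On x86_64 Linux float usually occupies 4 bytes, double occupies 8 bytes, and long double needs 16 bytes. Note that long double on x86_64 is the 80-bit x87 extended format padded to 16 bytes, not IEEE quadruple precision.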
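The sizes on a given machine can be checked quickly with Python's ctypes, which reports the sizes of the platform's C types. This is just a convenience sketch; the helper name storage_bytes is made up, and the array length 65000 is borrowed from the test program only for illustration.

```python
import ctypes

# Sizes in bytes of the C floating point types on the current platform.
sizes = {
    "float": ctypes.sizeof(ctypes.c_float),
    "double": ctypes.sizeof(ctypes.c_double),
    "long double": ctypes.sizeof(ctypes.c_longdouble),
}

def storage_bytes(n, type_name):
    """Bytes needed for an array of n values of the given C type (hypothetical helper)."""
    return n * sizes[type_name]

if __name__ == "__main__":
    for type_name in sizes:
        print(type_name, sizes[type_name], storage_bytes(65000, type_name))
```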

1. Result. Here are the runtime numbers of a test program.

  1. Single precision (float): 2.44s
  2. Double precision (double): 3.32s
  3. Quadruple precision (long double): 22.88s

These numbers are dependent on CPU-internal scheduling, see CPU Usage Time Is Dependant on Load.
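The percentages quoted in the introduction follow directly from these three timings. A small sketch of the arithmetic, with made-up helper names saving and slowdown:

```python
# Runtimes in seconds, taken from the list above.
timings = {"float": 2.44, "double": 3.32, "long double": 22.88}

def saving(slow, fast):
    """Fraction of runtime saved when going from `slow` to `fast`."""
    return 1.0 - fast / slow

def slowdown(slow, fast):
    """How many times longer `slow` takes compared to `fast`."""
    return slow / fast

if __name__ == "__main__":
    print(f"double -> float saves {saving(timings['double'], timings['float']):.1%}")
    print(f"long double is {slowdown(timings['long double'], timings['double']):.2f}x slower than double")
```

So the saving from double to float is about 26.5%, and long double is about 6.9 times slower than double.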

2. Test program. The test program is essentially as below:

long i, rep=1024, n=65000;
int c, precision='d';
float sf = 0;
double sd = 0;
long double sq = 0;
switch(precision) {
case 'd':
        while (rep-- > 0)
                for (i=0; i<n; ++i)
                        sd += exp(i % 53) - exp((i+1) % 43) - exp((i+2) % 47) - exp((i+3) % 37);
        printf("sd = %f\n",sd);
        break;
case 'f':
        while (rep-- > 0)
                for (i=0; i<n; ++i)
                        sf += expf(i % 53) - expf((i+1) % 43) - expf((i+2) % 47) - expf((i+3) % 37);
        printf("sf = %f\n",sf);
        break;
case 'q':
        while (rep-- > 0)
                for (i=0; i<n; ++i)
                        sq += expl(i % 53) - expl((i+1) % 43) - expl((i+2) % 47) - expl((i+3) % 37);
        printf("sq = %Lf\n",sq);
        break;
}

Full source code is on GitHub; the file in question is called exptst.c.

3. Environment. AMD Bulldozer FX-8120, 3.1 GHz, Arch Linux 5.6.8, gcc version 9.3.0. The code was compiled with -O3 -march=native.

J-Pilot Plugin For SQLite Export

In SQL Datamodel For J-Pilot I described the SQLite datamodel. I wrote a J-Pilot plugin which can export the below entities and write them to an SQLite database file. The direction is one-way: from J-Pilot to SQLite.

  1. Address
  2. Datebook
  3. Memo
  4. To-Do
  5. Expense
  6. Various categories for above entities

Adding more entities is pretty easy. For example, if people need the Calendar Palm database exported, this can be implemented quickly. We use the usual SQLite API with sqlite3_exec(), sqlite3_prepare(), the sqlite3_bind_*() family, sqlite3_step(), and finally sqlite3_finalize().

The general mechanics of a J-Pilot plugin are described by Judd Montgomery, the author of J-Pilot, in this document. I took the Expense/expense.c source code from the Expense plugin as a guide.

The plugin provides the following functionality:

  1. Create new database from scratch, it is called jptables.db
  2. Export above mentioned entities
  3. In debug mode you can use J-Pilot's search to search in the SQLite database

If you call jpilot -d then debug-mode is activated.


Installation:

  1. Compile the single source code file jpsqlite.c
  2. Copy the library (.so file) into the plugin directory ($HOME/.jpilot/plugins)
  3. Copy the datamodel SQL file jptables.sql into the plugin directory

Compilation is with below command:

gcc `pkg-config --cflags-only-I gtk+-2.0` -I <J-Pilot src dir> -s -fPIC -shared jpsqlite.c -o <output .so file> -lsqlite3

For this to work you need the Pilot-Link header files and the J-Pilot (AUR) source code at hand.

Running the plugin: go to the plugins menu via the main menu or a function key (F7 in my case), then press the SQL button. All previous data in the database is completely erased, then all data is written to the database within a single transaction.
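The erase-then-rewrite step inside a single transaction can be sketched with Python's built-in sqlite3 module. This is only an illustration of the pattern; the plugin itself uses the C API, and the table Memo with its columns Id and Text as well as the function name export_all are hypothetical, not taken from jptables.sql.

```python
import sqlite3

def export_all(conn, memos):
    """Replace the whole content of the (hypothetical) Memo table in one transaction."""
    with conn:  # commits on success, rolls back if an exception is raised
        conn.execute("DELETE FROM Memo")
        conn.executemany("INSERT INTO Memo (Id, Text) VALUES (?, ?)", memos)

if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE Memo (Id INTEGER PRIMARY KEY, Text TEXT)")
    export_all(conn, [(1, "first memo"), (2, "second memo")])
    export_all(conn, [(3, "only memo left")])   # earlier rows are gone afterwards
    print(conn.execute("SELECT Id, Text FROM Memo").fetchall())
```

Doing the delete and all inserts in one transaction means a failed export leaves the previous content untouched, and it is also much faster than committing row by row.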

In debug mode and in debug mode only, the J-Pilot search also searches through all entities in the SQLite database.

The long-term goal is that SQLite is the internal data structure for J-Pilot, thereby abandoning the binary files entirely.

java.sql.SQLRecoverableException: IO Error: Connection reset by peer, Authentication lapse

I encountered the following error when I wanted to connect to an Oracle v12 database with Java 1.8.0_192-b26:

java.sql.SQLRecoverableException: IO Error: Connection reset by peer, Authentication lapse 321631 ms.

This was unexpected, as the same program ran absolutely fine on another Linux machine. The program in question is:

import java.sql.Connection;
import java.sql.SQLException;

import oracle.jdbc.pool.OracleDataSource;

public class OraSample1 {

        public static void main (String argv[]) {
                try {
                        OracleDataSource ds = new OracleDataSource();
                        // the call that sets the JDBC URL on ds is not shown in this listing
                        Connection conn=ds.getConnection("c##klm","klmOpRisk");
                } catch (SQLException e) {
                        e.printStackTrace();
                }
        }
}


Solution: Add the following property setting to the command line:

java OraSample1

Also see “java.sql.SQLException: I/O Error: Connection reset” in linux server [duplicate] on Stackoverflow.

Passing HashMap from Java to Java Nashorn

Nashorn is the JavaScript engine shipped with Java 8. It was deprecated in Java 11 and removed in Java 15, so you can use JavaScript this way on Java 8 through 14. Java 8 also has a standalone interpreter, called jjs.

It is possible to create a Java HashMap and use this structure directly in JavaScript. Here is the code:

import java.io.*;
import java.util.*;
import javax.script.*;

public class HashMapDemo {

        public static void main(String[] args) {
                HashMap<String,Double> hm = new HashMap<String,Double>();

                hm.put("A", new Double(3434.34));
                hm.put("B", new Double(123.22));
                hm.put("C", new Double(1200.34));
                hm.put("D", new Double(99.34));
                hm.put("E", new Double(-19.34));

                for( String name: hm.keySet() )
                        System.out.println(name + ": "+ hm.get(name));

                // Increase A's balance by 1000
                double balance = ((Double)hm.get("A")).doubleValue();
                hm.put("A", new Double(balance + 1000));
                System.out.println("A's new account balance : " + hm.get("A"));

                // Call JavaScript from Java
                try {
                        ScriptEngine engine = new ScriptEngineManager().getEngineByName("nashorn");
                        engine.eval("print('Hello World');");
                        engine.eval(new FileReader("example.js"));
                        Invocable invocable = (Invocable) engine;
                        Object result = invocable.invokeFunction("sayHello", "John Doe");
                        System.out.println(result);             // value returned by sayHello
                        System.out.println(result.getClass());  // its Java type

                        result = invocable.invokeFunction("prtHash", hm);
                } catch (FileNotFoundException | NoSuchMethodException | ScriptException e) {
                        e.printStackTrace();
                }
        }
}


And here is the corresponding JavaScript file example.js:

var sayHello = function(name) {
        print('Hello, ' + name + '!');
        return 'hello from javascript';
};

var prtHash = function(h) {
        print('h.A = ' + h.A);
        print('h.B = ' + h["B"]);
        print('h.C = ' + h.C);
        print('h.D = ' + h["D"]);
        print('h.E = ' + h.E);
};

Output is:

$ java HashMapDemo
A: 3434.34
B: 123.22
C: 1200.34
D: 99.34
E: -19.34
A's new account balance : 4434.34
Hello World
Hello, John Doe!
hello from javascript
class java.lang.String
h.A = 4434.34
h.B = 123.22
h.C = 1200.34
h.D = 99.34
h.E = -19.34

Above example uses sample code from

  1. Riding the Nashorn: Programming JavaScript on the JVM
  2. Simple example for Java HashMap
  3. Nashorn: Run JavaScript on the JVM

Decisive was the following statement:

Java objects can be passed without loosing any type information on the javascript side. Since the script runs natively on the JVM we can utilize the full power of the Java API or external libraries on nashorn.

The above program works the same if one declares the map as HashMap<String,Object> and populates it accordingly, e.g.:

                HashMap<String,Object> hm = new HashMap<String,Object>();

                hm.put("A", new Double(3434.34));
                hm.put("B", new String("Test"));
                hm.put("C", new Date(5000));
                hm.put("D", new Integer(99));
                hm.put("E", new Boolean(Boolean.TRUE));

Output from JavaScript would be

h.A = 4434.34
h.B = Test
h.C = Thu Jan 01 01:00:05 CET 1970
h.D = 99
h.E = true

Entries changed in JavaScript can be passed back to Java. Assume the JavaScript program changes values:

var prtHash = function(h,hret) {
        hret.U = 57;
        hret.V = "Some text";
        hret.W = false;
};

Then these changed arguments can be used back in Java program:

HashMap<String,Object> hret = new HashMap<String,Object>();

result = invocable.invokeFunction("prtHash", hm, hret);
System.out.println("hret.U = " + hret.get("U"));
System.out.println("hret.V = " + hret.get("V"));
System.out.println("hret.W = " + hret.get("W"));

Output is then

hret.U = 57
hret.V = Some text
hret.W = false

Using Scooter Software Beyond Compare

Beyond Compare is a graphical file comparison tool sold by Scooter Software. Its open-source competitors are mainly vimdiff, and kdiff3. Its advantage is ease-of-use. While comparing files they can be edited instantly. You can diff complete directory trees.

It is written in Delphi Object Pascal, and the source code is not open. It runs on Windows, x86 Linux, and OS X. It does not run on ARM boards such as Raspberry Pi or Odroid, see support for arm processors – like the raspberry pi. The “Standard Edition” costs $30, the “Pro Edition” costs $60. The software is in AUR.

1. Root User Problem. When using it as root-user you must use:


Beyond Compare can also serve as the diff program for pacdiff:

DIFFPROG=bcompare pacdiff

2. Git Usage. To use Beyond Compare with git difftool you have to do two things: First you must create an alias bc3 for bcompare.

[root /bin]# ln -s bcompare bc3

Second add the following lines to your ~/.gitconfig file:

[diff]
        tool = bc3
[difftool]
        prompt = false
[difftool "bc3"]
        trustExitCode = true
[merge]
        tool = bc3
[mergetool "bc3"]
        trustExitCode = true

Alternatively to above changes in the ~/.gitconfig file, use the following commands:

git config --global diff.tool bc3
git config --global difftool.bc3.trustExitCode true
git config --global merge.tool bc3
git config --global mergetool.bc3.trustExitCode true

Towards web-based delta synchronization for cloud storage systems

Very interesting article.

Some remarkable excerpts:

To isolate performance issues to the JavaScript VM, the authors rebuilt the client side of WebRsync using the Chrome native client support and C++. It’s much faster.

Replacing MD5 with SipHash reduces computation complexity by almost 5x. As a fail-safe mechanism in case of hash collisions, WebRsync+ also uses a lightweight full content hash check. If this check fails then the sync will be re-started using MD5 chunk fingerprinting instead.

The client side of WebR2sync+ is 1700 lines of JavaScript. The server side is based on node.js (about 500 loc) and a set of C processing modules (a further 1000 loc).

the morning paper

Towards web-based delta synchronization for cloud storage systems Xiao et al., FAST’18

If you use Dropbox (or an equivalent service) to synchronise file between your Mac or PC and the cloud, then it uses an efficient delta-sync (rsync) protocol to only upload the parts of a file that have changed. If you use a web interface to synchronise the same files though, the entire file will be uploaded. This situation seems to hold across a wide range of popular services:

Given the universal presence of the web browser, why can’t we have efficient delta syncing for web clients? That’s the question Xiao et al. set out to investigate: they built an rsync implementation for the web, and found out it performed terribly. Having tried everything to improve the performance within the original rsync design parameters, then they resorted to a redesign which moved more of the heavy lifting back to…


Unix Command comm: Compare Two Files

One lesser known Unix command is comm; it is far less known than diff. comm needs two already sorted files, FILE1 and FILE2, and prints three columns, each of which can be suppressed with an option:

  • -1 suppress column 1 (lines unique to FILE1)
  • -2 suppress column 2 (lines unique to FILE2)
  • -3 suppress column 3 (lines that appear in both files)

For example, comm -12 F1 F2 prints all common lines in files F1 and F2.

I thought that comm had a bug, so I wrote a short Perl script to simulate the behaviour of comm. Of course, there was no bug; I had just failed to notice that the records in the two files did not match due to white space.

#!/bin/perl -W
use strict;

use Getopt::Std;
my %opts = ('d' => 0, 's' => 0);
getopts('ds:', \%opts); # -d: debug output, -s <bitmask>: set to print
my $debug = ($opts{'d'} != 0);
my $member = defined($opts{'s'}) ? $opts{'s'} : 0;

my ($set,$prev) = (1,"");
my %H;

while (<>) {
        chomp;  # printf below adds the newline again
        $prev = $ARGV if ($prev eq "");
        if ($ARGV ne $prev) {
                $set *= 2;      # next file gets the next bit
                $prev = $ARGV;
        }
        $H{$_} |= $set;
        printf("\t>>\t%s: %s -> %d\n",$ARGV,$_,$H{$_}) if ($debug);
}

$member = 2*$set - 1 if ($member == 0); # default: records present in all files
printf("\t>>\tmember = %d\n",$member) if ($debug);
for my $i (sort keys %H) {
        printf("%s\n",$i) if ($H{$i} == $member);
}

The above Perl script does not need sorted input files, as it stores all records of the files in memory, in a hash. It uses a bitmask as a set: the first file sets bit 1, the second file sets bit 2, and so on. For example, mycomm -s2 F1 F2 prints only those records which are in file F2 but not in F1.
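The bitmask idea can be sketched in Python as well. This is an illustrative port under assumptions: the function names classify and select are made up, and the inputs are lists of lines rather than file names.

```python
def classify(files):
    """Map each record to a bitmask: bit k is set if the record occurs in file k."""
    seen = {}
    for index, lines in enumerate(files):
        bit = 1 << index                  # 1 for the first file, 2 for the second, ...
        for line in lines:
            record = line.rstrip("\n")
            seen[record] = seen.get(record, 0) | bit
    return seen

def select(files, member=0):
    """Return sorted records whose bitmask equals `member` (0 means: in all files)."""
    if member == 0:
        member = (1 << len(files)) - 1    # all bits set
    return sorted(r for r, mask in classify(files).items() if mask == member)

if __name__ == "__main__":
    f1 = ["a\n", "b\n", "c\n"]
    f2 = ["b\n", "c\n", "d\n"]
    print(select([f1, f2]))       # records in both files
    print(select([f1, f2], 2))    # records only in the second file
```

Because membership is an exact match on the bitmask, the same function also handles three or more files, for example member=5 selects records that are in the first and third file but not in the second.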

Parallelization and CPU Cache Overflow

In the post Rewriting Perl to plain C the runtimes of the serial runs were reported. As expected, the C program was a lot faster than the Perl script. Running the programs in parallel showed two unexpected behaviours: (1) more parallelization can degrade runtime, and (2) running unoptimized programs can be faster.

See also CPU Usage Time Is Dependant on Load.

In the following we use the C program siriusDynCall and the Perl script siriusDynUpro, which were described in the above-mentioned post. The program or script reads roughly 3GB of data. Before starting the program or script, all this data has already been read into memory by using something like wc or grep.

1. AMD Processor. Running 8 parallel instances, s=size=8, p=partition=1(1)8:

for i in 1 2 3 4 5 6 7 8; do time siriusDynCall -p$i -s8 * > ../resultCp$i & done
real 50.85s
user 50.01s
sys 0

Merging the results with the sort command takes a negligible amount of time:

sort -m -t, -k3.1 resultCp* > resultCmerged

Best results are obtained when running just s=4 instances in parallel:

$ for i in 1 2 3 4 ; do /bin/time -p siriusDynCall -p$i -s4 * > ../dyn4413c1p$i & done
real 33.68
user 32.48
sys 1.18
