Rewriting Perl to plain C

A Perl script was running too slowly. Rewriting it in C made it 20 times faster.

1. Problem statement. Analyze call-tree dependencies in COBOL programs. There are 77 million lines of COBOL code in ca. 30,000 files. These 30,000 COBOL programs could potentially include 74,000 COPY-books comprising 10 million lines of additional code. COBOL COPY-books are analogous to C header files. So in total there are around 88 million lines of COBOL code. Just for comparison: the Linux kernel has ca. 20 million lines of code.

COBOL program analysis started with a simple Perl script. This Perl script is less than 200 lines, including comments. This script produced the desired dependency information.
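
To illustrate the kind of work such a dependency scan does, here is a minimal sketch in C that pulls the COPY-book name out of one fixed-format COBOL source line. It is not the original script: the function name `copy_name` is hypothetical, and the sketch assumes uppercase source, a COPY statement that starts the line's code area, and no REPLACING clause or continuation lines.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical helper, not taken from the original script.
   If the fixed-format COBOL line starts its code area with a COPY
   statement, store the COPY-book name in out and return 1.
   Otherwise return 0. */
static int copy_name(const char *line, char *out, size_t outsz)
{
    size_t len = strlen(line);
    if (len < 8 || outsz == 0)
        return 0;                        /* nothing after the indicator column */
    if (line[6] == '*' || line[6] == '/')
        return 0;                        /* column 7 marks a comment line */

    const char *p = line + 7;            /* code area starts in column 8 */
    while (*p == ' ')
        p++;
    if (strncmp(p, "COPY", 4) != 0 || p[4] != ' ')
        return 0;                        /* not a COPY statement */
    p += 4;
    while (*p == ' ')
        p++;

    /* The name ends at a blank, a period, or the end of the line. */
    size_t n = 0;
    while (*p && *p != ' ' && *p != '.' && *p != '\n' && n + 1 < outsz)
        out[n++] = *p++;
    out[n] = '\0';
    return n > 0;
}
```

Calling `copy_name` on every line of a program and collecting the names gives the set of COPY-books that program depends on. CALL statements could be handled along the same lines.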

Wading through all this COBOL code took up to 25 minutes in serial mode, and 13 minutes using 4 cores on an HP EliteBook notebook with an Intel Skylake i7-6600U clocked at 2.8 GHz. It took 36 minutes on an AMD FX-8120 clocked at 3.1 GHz. This execution time was deemed too long: after changing something in the Perl script, one had to wait that long to see the effect on the output. All runs are on Arch Linux 4.14.11-1 SMP PREEMPT.

2. Result. Rewriting the Perl script in C resulted in a speed improvement of factor 20 when run in serial mode, i.e., run times are now 110s on one core. It runs in 32s when using 8 cores on an AMD FX-8120. The C program uses tailor-made hashing routines.


Hashing Just Random Numbers

My recent project had some issues with hashing some 10 million numbers. To analyze the matter I wrote a small test program, see numberhash.c.

I wanted to know what influence the following factors have:

  1. hashing just numbers (no alphabetic characters)
  3. choice of hash function
  4. load factor
  5. distribution of collisions
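
An experiment along these lines can be sketched in a few lines of C. This is not the code from numberhash.c: the hash function (32-bit FNV-1a), the names `fnv1a`, `chain_histogram` and `MAXCHAIN`, and the use of sequential instead of random numbers (for reproducibility) are all assumptions made for illustration. The sketch hashes the decimal strings of the numbers into a table with a given number of slots, returns the load factor, and counts how many slots received 0, 1, 2, ... keys.

```c
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* 32-bit FNV-1a: one possible choice of hash function (an assumption,
   not necessarily one of the functions tested in numberhash.c). */
static uint32_t fnv1a(const char *s)
{
    uint32_t h = 2166136261u;            /* FNV offset basis */
    while (*s) {
        h ^= (unsigned char)*s++;
        h *= 16777619u;                  /* FNV prime */
    }
    return h;
}

#define MAXCHAIN 8   /* hist[MAXCHAIN-1] counts slots with MAXCHAIN-1 or more keys */

/* Hash the decimal strings "0", "1", ..., nkeys-1 into nslots slots.
   On return hist[c] is the number of slots holding c keys.
   Returns the load factor nkeys/nslots, or -1.0 on error. */
static double chain_histogram(uint32_t nkeys, uint32_t nslots,
                              uint32_t hist[MAXCHAIN])
{
    if (nslots == 0)
        return -1.0;
    uint32_t *count = calloc(nslots, sizeof *count);
    if (count == NULL)
        return -1.0;

    char key[16];
    for (uint32_t i = 0; i < nkeys; i++) {
        snprintf(key, sizeof key, "%lu", (unsigned long)i);
        count[fnv1a(key) % nslots]++;    /* count keys per slot */
    }

    memset(hist, 0, MAXCHAIN * sizeof hist[0]);
    for (uint32_t s = 0; s < nslots; s++) {
        uint32_t c = count[s] < MAXCHAIN ? count[s] : MAXCHAIN - 1;
        hist[c]++;
    }
    free(count);
    return (double)nkeys / nslots;
}
```

Running `chain_histogram` with the same number of keys but different slot counts shows how the load factor shifts the distribution of collisions. Swapping `fnv1a` for another function allows a comparison of hash functions on purely numeric keys.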


Memory Limitations with IBM Enterprise COBOL Compiler

Recently I learned the hard way that the IBM Enterprise COBOL compiler cannot generate 8-byte POINTER variables, but only 4-byte pointers, meaning that you cannot use more than 2GB in COBOL on a mainframe. I.e., you cannot make use of AMODE=64 with COBOL on the mainframe. You can run in AMODE=64, but you cannot exploit it. BTW, this is the year 2013. So, no big data on the mainframe with COBOL.

IBM Assembler and C/C++ can fully exploit AMODE=64, i.e., can use 8-byte long pointers.

Mainframe Rehosting: Cost Reduction, Hardware Sizing, Tools, and Methodology

Mainframe rehosting is about replacing the whole mainframe with one or multiple Linux boxes, or at times, moving portions of the application landscape from the mainframe to Linux. In doing so you basically keep much of the existing development and runtime environment, like COBOL, DB2, IMS, CICS, etc. The goal in mainframe rehosting is to dramatically reduce costs. If you are dauntless you can also move to Windows.

The point is that you do not rewrite your applications but rather just move them, i.e., you recompile your applications on new hardware. So all accumulated experience with the software is fully preserved.

In 2008 I started writing a paper on mainframe rehosting, see Mainframe Rehosting.

Between 2008 and 2011 the paper was revised somewhat. The paper is in German.