Rewriting Perl to plain C

A Perl script was running too slowly. Rewriting it in C made it 20 times faster.

1. Problem statement. Analyze call-tree dependencies in COBOL programs. There are 77 million lines of COBOL code in ca. 30,000 files. These 30,000 COBOL programs could potentially include 74,000 COPY-books comprising 10 million lines of additional code. COBOL COPY-books are analogous to C header files. So in total there are around 88 million lines of COBOL code. Just for comparison: the Linux kernel has ca. 20 million lines of code.

COBOL program analysis started with a simple Perl script. This Perl script is less than 200 lines, including comments. This script produced the desired dependency information.

Wading through all this COBOL code took up to 25 minutes in serial mode, and 13 minutes using 4 cores, on an HP EliteBook notebook with an Intel Skylake i7-6600U clocked at 2.8 GHz. It took 36 minutes on an AMD FX-8120 clocked at 3.1 GHz. This was deemed too long to wait just to see how a change in the Perl script affected the output. All runs were on Arch Linux 4.14.11-1 SMP PREEMPT.

2. Result. Rewriting the Perl script in C resulted in a speedup by a factor of 20 in serial mode, i.e., the run time is now 110s on one core. It runs in 32s when using 8 cores on an AMD FX-8120. The C program uses tailor-made hashing routines.
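
The excerpt does not show the hashing routines themselves. As a rough illustration of what a tailor-made replacement for a Perl hash might look like in C, here is a minimal sketch of a fixed-size, open-addressing table keyed by program name. Everything in it is an assumption made for the example: the names `lookup`, `fnv1a`, `TABLE_SIZE` and `KEY_MAX`, the FNV-1a hash function, and linear probing. It is not the code from the actual program.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define TABLE_SIZE 1024u   /* hypothetical capacity, must be a power of two */
#define KEY_MAX    32      /* hypothetical maximum name length incl. '\0' */

struct entry {
    char key[KEY_MAX];     /* an empty string marks a free slot */
    int  count;            /* e.g. how often this name was referenced */
};

static struct entry table[TABLE_SIZE];   /* zero-initialized: all slots free */

/* FNV-1a, 32 bit. Chosen for illustration only, not taken from the post. */
static uint32_t fnv1a(const char *s)
{
    uint32_t h = 2166136261u;
    while (*s) {
        h ^= (unsigned char)*s++;
        h *= 16777619u;
    }
    return h;
}

/* Return the slot for key, inserting it if absent.
 * Returns NULL if the key is empty, too long, or the table is full. */
static struct entry *lookup(const char *key)
{
    assert(key != NULL);
    size_t len = strlen(key);
    if (len == 0 || len >= KEY_MAX)
        return NULL;

    uint32_t i = fnv1a(key) & (TABLE_SIZE - 1);
    for (uint32_t probes = 0; probes < TABLE_SIZE; probes++) {
        struct entry *e = &table[i];
        if (e->key[0] == '\0') {          /* free slot: insert here */
            memcpy(e->key, key, len + 1);
            return e;
        }
        if (strcmp(e->key, key) == 0)     /* already present */
            return e;
        i = (i + 1) & (TABLE_SIZE - 1);   /* linear probing */
    }
    return NULL;
}
```

A call such as `lookup("PROG001")->count++` would then play the role of `$count{"PROG001"}++` in Perl, without any allocation per key.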

Possible Enhancements to J-Pilot

Here are some thoughts about possible enhancements for J-Pilot.

  1. Convert pdb’s and pc3’s to sqlite. This makes it easier to analyze data according to some criteria, e.g., find how many addresses have the same telephone number, how many entries in the datebook contain the same substring, etc.
  2. Convert and transform pdb’s and pc3’s to Google GData. The Google id, which is returned after transmitting data, is then possibly stored in pdb/pc3.
  3. Use mmap() instead of all the fread() and malloc() calls inside pilot-link and J-Pilot.
  4. When J-Pilot searches for strings case-insensitively, it copies all elements and calls malloc() for each element. Instead, just use a home-brewed strstr() that ignores case.
  5. Provide ncurses interfaces instead of Gtk. See for example calcurse.
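
To make item 4 more concrete, here is a minimal sketch of such a home-brewed, case-ignoring strstr(). The name `strstr_nocase` is hypothetical, and the function only handles single-byte characters via tolower(); it is meant as an illustration of the idea, not as a patch against J-Pilot.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>

/* Hypothetical helper: like strstr(), but compares characters
 * after tolower(), so no lowercase copy of either string is needed. */
static const char *strstr_nocase(const char *haystack, const char *needle)
{
    assert(haystack != NULL && needle != NULL);
    if (*needle == '\0')
        return haystack;              /* same convention as strstr() */

    for (; *haystack; haystack++) {
        const char *h = haystack;
        const char *n = needle;
        /* Advance while both strings agree, ignoring case. */
        while (*n && tolower((unsigned char)*h) == tolower((unsigned char)*n)) {
            h++;
            n++;
        }
        if (*n == '\0')
            return haystack;          /* whole needle matched */
    }
    return NULL;                      /* no match */
}
```

With this, a search like `strstr_nocase(record_text, "smith")` works directly on the stored record and needs no malloc() per element. On glibc the GNU extension strcasestr() does much the same, but it is not part of standard C.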