Parallelization and CPU Cache Overflow

In the post Rewriting Perl to plain C the runtime of the serial runs were reported. As expected the C program was a lot faster than the Perl script. Now running programs in parallel showed two unexpected behaviours: (1) more parallelizations can degrade runtime, and (2) running unoptimized programs can be faster.

See also CPU Usage Time Is Dependant on Load.

In the following we use the C program siriusDynCall and the Perl script siriusDynUpro which was described in above mentioned post. The program or scripts reads roughly 3GB of data. Before starting the program or script all this data has been already read into memory by using something like wc or grep.

1. AMD Processor. Running 8 parallel instances, s=size=8, p=partition=1(1)8:

for i in 1 2 3 4 5 6 7 8; do time siriusDynCall -p$i -s8 * > ../resultCp$i & done
        real 50.85s
        user 50.01s
        sys 0

Merging the results with the sort command takes a negligible amount of time

sort -m -t, -k3.1 resultCp* > resultCmerged

Best results are obtained when running just s=4 instances in parallel:

$ for i in 1 2 3 4 ; do /bin/time -p siriusDynCall -p$i -s4 * > ../dyn4413c1p$i & done
        real 33.68
        user 32.48
        sys 1.18

Continue reading


Operation Costs Measured in CPU Clock Cycles

I have written on the merits of 64-bit vs 32-bit, or on the AMD Bulldozer CPU Architecture Overview. An article in gave a very good overview about the relative performance of various CPU operations. Below is the graphic:

Not all CPU operations are created equal

Also see the often quoted Agner Fog: Instruction Tables.

Running CPU/GPU Intensive Jobs on Titan Supercomputer

There is a an INCITE program (HPC Call for Proposals), where one can apply for CPU/GPU intensive jobs, the link is INCITE.

From the FAQ: The INCITE program is open to US- and non-US-based researchers and research organizations needing large allocations of computer time, supporting resources, and data storage to pursue transformational advances in science and engineering.

The machines in question: Mira and Titan.