In the post Rewriting Perl to plain C, the runtimes of the serial runs were reported. As expected, the C program was a lot faster than the Perl script. Running the programs in parallel, however, showed two unexpected behaviours: (1) more parallel instances can degrade runtime, and (2) unoptimized programs can run faster than optimized ones.
See also CPU Usage Time Is Dependant on Load.
In the following we use the C program siriusDynCall and the Perl script siriusDynUpro, which were described in the above-mentioned post. The program or script reads roughly 3 GB of data. Before the program or script is started, all this data has already been read into memory by using something like wc or grep.
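Reading the input once before the measurement can be sketched as follows. The function name warm_cache is made up for illustration, and whether the data actually stays in memory depends on how much free RAM the machine has.

```shell
# Hypothetical helper, not from the original post.
# Reads every given file once and prints the number of bytes read,
# so that later runs can be served from memory instead of disk.
warm_cache() {
    # Piping through cat forces the data to be read in full.
    cat -- "$@" | wc -c
}

# Assumed usage, run in the directory holding the input files:
#   warm_cache *
```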
1. AMD Processor. Running 8 parallel instances, s=size=8, p=partition=1(1)8:
for i in 1 2 3 4 5 6 7 8; do time siriusDynCall -p$i -s8 * > ../resultCp$i & done
real 50.85s user 50.01s sys 0
Merging the results with the sort command takes a negligible amount of time:
sort -m -t, -k3.1 resultCp* > resultCmerged
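The merge is cheap because sort -m does not sort anything: it only interleaves inputs that are already sorted on the given key. A small sketch with made-up sample data (the real record layout is not shown in this post) illustrates this, assuming comma-separated records ordered by the third field:

```shell
# Made-up sample data; only the file names follow the post.
export LC_ALL=C                      # byte-wise ordering, independent of locale
tmp=$(mktemp -d)
printf 'a,1,apple\nc,3,cherry\n' > "$tmp/resultCp1"
printf 'b,2,banana\nd,4,date\n'  > "$tmp/resultCp2"

# -m merges pre-sorted inputs; -t, sets the field separator;
# -k3.1 starts the key at the first character of field 3.
sort -m -t, -k3.1 "$tmp"/resultCp* > "$tmp/resultCmerged"

# sort -c only checks the order and exits non-zero if it is violated.
sort -c -t, -k3.1 "$tmp/resultCmerged" && echo "merged output is sorted"
```

If a partial result were not sorted on that key, sort -m would still run but the merged output would be out of order, which the sort -c check would catch.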
Best results are obtained when running just s=4 instances in parallel:
$ for i in 1 2 3 4 ; do /bin/time -p siriusDynCall -p$i -s4 * > ../dyn4413c1p$i & done
real 33.68 user 32.48 sys 1.18
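To compare different values of s more systematically, the loops above could be wrapped in a small helper that starts all partitions, waits for the slowest one, and prints the total wall-clock time. This is only a sketch: the function name run_partitions, the OUTDIR variable and the part$i output names are assumptions for illustration, and date +%s gives only whole-second resolution.

```shell
# Hypothetical helper, not from the original post.
# usage: run_partitions S CMD [FILE...]
# Starts S background instances of CMD with -p<i> -s<S>, one per partition.
run_partitions() {
    s=$1; cmd=$2; shift 2
    out=${OUTDIR:-..}                # results go one directory up, as in the post
    start=$(date +%s)
    i=1
    while [ "$i" -le "$s" ]; do
        "$cmd" -p"$i" -s"$s" "$@" > "$out/part$i" &
        i=$((i + 1))
    done
    wait                             # block until every instance has finished
    end=$(date +%s)
    echo "s=$s wall=$((end - start))s"
}

# Assumed usage, run in the directory holding the input files:
#   for s in 1 2 4 8; do run_partitions "$s" siriusDynCall *; done
```

Unlike the per-instance time calls above, this measures how long the whole batch takes, which is the number that matters when deciding how many instances to run.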