Last year I used nsort from Ordinal Technology, a drop-in replacement for the ordinary Linux sort command. nsort is free but not open-source. One thing is clear, however: it is very fast. nsort was written by Chris Nyberg.
The motivation for looking for a faster sort was as follows. I had to drop all duplicate records from a single Oracle database table that held more than 800 million records. It turned out later, after I already had the solution, that only 3% of the initial records would remain, i.e., 97% of the records were duplicates. The solution was basically to extract all data from the table with a small C program, sort the extracted data with sort -u, and load the result into the database table again. Using nsort instead of plain sort cut the runtime to one-third. In my case overall runtime went down from 60 minutes to 20 minutes.
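The deduplication step itself can be sketched with plain sort -u on a tiny sample. This is only an illustration: the file names and the pipe-delimited record layout are made up, the C extraction program and the database reload are not shown, and LC_ALL=C is my own addition to get byte-wise comparison rather than something taken from the original job.

```shell
# Hypothetical scratch directory and file names, for illustration only.
workdir=$(mktemp -d)

# Stand-in for the data extracted from the table: one record per line,
# with several exact duplicates.
printf '%s\n' 'b|2' 'a|1' 'b|2' 'c|3' 'a|1' > "$workdir/extract.txt"

# sort -u sorts the lines and keeps one copy of each distinct line.
# LC_ALL=C (an assumption, not from the original setup) compares raw bytes.
LC_ALL=C sort -u "$workdir/extract.txt" > "$workdir/unique.txt"

# Show the surviving records; this file is what would be loaded back.
cat "$workdir/unique.txt"
```

Since nsort is a drop-in replacement, the same step should only need the command name swapped, but check the Nsort user guide for the exact options before relying on that.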
The Nsort user guide is a very readable user's guide to nsort. Sort benchmark results can be found at sortbenchmark.org.