Dramatically Faster Sorting in Linux Using Nsort

Last year I used nsort from Ordinal Technology, a drop-in replacement for the ordinary Linux sort command. Ordinal’s nsort is free but not open source. One thing is clear, however: it is very fast. nsort was written by Chris Nyberg.

The motivation for looking for a faster sort was as follows. I had to drop all duplicate records from a single Oracle database table that held more than 800 million records. As it turned out only after I already had the solution, 97% of the records were duplicates, so only 3% of the initial records would remain. The solution was to extract all data from the table with a small C program, sort the extracted data with duplicates removed (sort -u), and load the result back into the database table.
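To make the deduplication step concrete, here is a minimal Python sketch of what sort -u does conceptually: sort the records so that equal ones become adjacent, then keep one record per run. It is only an illustration, not the C extraction program I used and not how nsort works internally. The function name sort_unique and the sample records are made up for this example, and the comparison is plain string order, which may differ from what sort gives under your locale.

```python
from itertools import groupby
from typing import Iterable, Iterator


def sort_unique(records: Iterable[str]) -> Iterator[str]:
    """Yield each distinct record once, in sorted order (similar to `sort -u`)."""
    # After sorting, equal records sit next to each other, so groupby
    # collapses every run of duplicates into a single key.
    for record, _duplicates in groupby(sorted(records)):
        yield record


if __name__ == "__main__":
    # Hypothetical extracted rows, one record per string.
    extracted = ["42|alice", "17|bob", "42|alice", "17|bob", "99|carol"]
    for record in sort_unique(extracted):
        print(record)
```

This version holds everything in memory, so it would not be practical for 800 million records. That is exactly where an external sort such as sort or nsort earns its keep.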

With nsort instead of plain sort, the runtime dropped to one-third: in my case, overall runtime went down from 60 minutes to 20 minutes.

The Nsort user guide is very readable.

Benchmarks involving nsort can be found at sortbenchmark.org.
