Performance of the Java sorting algorithm
Generally, Java's standard sorting method (Arrays.sort() and its derivative Collections.sort()) does
what you would expect from a library routine: it is general-purpose,
correct, well-tested and provides
adequate performance in many cases. The purpose of this page is therefore not
to advocate that everyone goes off and writes their own sorting routine. However, sometimes it is useful
to know what performance you can expect from the library routine, and what the few specialised cases are
where it's worth replacing it.
Java's sorting algorithm
The sorting algorithm used (in Sun's implementation at least) is a version of mergesort1.
Mergesort is based on a "divide and conquer" strategy: we recursively divide the list into two halves and sort
each half, then merge the two sorted halves back together. As we break the data down into smaller and smaller
parts, we eventually reach a critical size where it's not worth dividing any more, and we sort in situ.
Java uses an insertion sort for these 'atomic' sorts. Together, this gives Java's sort routine the following
characteristics:
- The sort is stable: i.e. if two elements are equal, their order won't be
swapped after sorting (with some other algorithms, the order of equal elements is effectively random).
- Because of the requirement to merge lists together into a new list, the algorithm
operates on a copy of the data: when you call Arrays.sort(), the first thing
the method does is take a clone of the array.
Taking a clone of the array means taking a copy of the list of references,
not of the actual data. The cost is generally negligible for large-ish arrays, but relatively
considerable for sorting very small sets of data.
- The underlying insertion sort has a worst case where the input data is in reverse order; thus
in principle, it is possible to hit an unlucky case consisting of a series of sub-lists in reverse order.
- In terms of recursion, this hybrid algorithm improves on some "textbook" cases of recursing right
down until the sublist size is 1 or 2 (but at the expense of a less constant sort time as mentioned
in the previous point).
- In most practical cases, doubling the size of the list more or less doubles2
the time taken to sort.
You might have read the last point and thought "well, duh!". But it turns out that
more-or-less doubling the sort time as the data size is doubled is actually good going. Various
less efficient sorting algorithms (including the insertion sort on its own) don't manage this, or
are more likely to hit "worse cases", where the time effectively quadruples as the
list size doubles.
Sort performance in practice
Holding this in our head, on the next page we look at some observed measurements of
performance of Java's sorting algorithm.
1. I understand that a different algorithm will be used in Java 7. Even more reason not
to rewrite your code on the basis of the particular algorithm used in today's version of
the library.
2. As we'll see, mergesort is usually analysed as an O n log n sort. In practical
terms, this means that when we double the number of elements, the sort time is actually
"slightly worse than double" (multiplying by 2.2 gives a good guide in many cases).
If you enjoy this Java programming article, please share with friends and colleagues. Follow the author on Twitter for the latest news and rants.
Editorial page content written by Neil Coffey. Copyright © Javamex UK 2021. All rights reserved.