# Quicksort in Haskell

Quicksort is a beautiful algorithm that is often used to introduce the elegance and simplicity of functional programming. Compared with a typical C/C++ implementation, the Haskell code below is short and elegant:

```
quicksort :: Ord a => [a] -> [a]
quicksort [] = []
quicksort (x:xs) = quicksort (filter (< x) xs) ++ [x] ++ quicksort (filter (>= x) xs)
```

However, this much beloved and fairly typical example is not the genuine quicksort algorithm!

## Where does it fall short?

The paper *The Genuine Sieve of Eratosthenes* by Melissa E. O’Neill teaches us that seemingly straightforward code can hide a critical loss of performance. Let’s analyze the complexity of the code above, in both time and space.

Assume sorting a list of length $n$ takes $T(n)$ time. First we need $O(n)$ time to do the partitioning; then, in the average case, we spend $T(\frac{n}{2})$ sorting the elements less than the pivot and $T(\frac{n}{2})$ sorting the elements greater than the pivot, plus $O(\frac{n}{2})$ time concatenating the sorted sub-lists. This gives the recurrence \(T(n) = O(n) + 2T(\frac{n}{2}) + O(\frac{n}{2})\). According to the master theorem, \(T(n) = \Theta(n\log{n})\).

The time complexity doesn’t look so bad, which is different from the case described in *The Genuine Sieve of Eratosthenes*.
So, where is the real problem?

Let’s look at the original quicksort algorithm. The real quicksort has to be imperative because it relies on destructive update. It uses a clever partitioning strategy that works like this: scan from the left for an element greater than the pivot, then scan from the right for an element smaller than the pivot, and swap them; repeat this until no such pair can be found, at which point all elements have been partitioned. Described in imperative pseudo code (this is the Lomuto variant from Wikipedia, which scans from the left only):

```
algorithm partition(A, lo, hi) is
    pivot := A[hi]
    i := lo            // place for swapping
    for j := lo to hi – 1 do
        if A[j] <= pivot then
            swap A[i] with A[j]
            i := i + 1
    swap A[i] with A[hi]
    return i
```

But in Haskell, as in many other functional programming languages, data is immutable! When we partition the given list and concatenate the sorted sub-lists, many new lists are created. Obviously this is not an actual in-place algorithm! Although Haskell can express the algorithm easily, the academics have addressed Haskell’s failure by bastardizing the algorithm. It completely fails to capture the essence of the real quicksort presented in Tony Hoare’s original paper, specifically the in-place partitioning using swaps, which is what makes it so efficient.

## In-place quicksort in Haskell

There’s just one array type, namely `Array`, in the Haskell ’98 standard, which provides immutable boxed arrays. The main Haskell compiler, GHC, ships libraries containing a new implementation of arrays with far more features, backward compatible with the Haskell ’98 one. `STArray` is one such mutable array, built on the `MutableArray#` primitive defined in `GHC.Prim` and powered by the `ST` monad.

```
data STArray s i e
  = STArray !i                  -- the lower bound, l
            !i                  -- the upper bound, u
            !Int                -- a cache of (rangeSize (l, u)),
                                -- used to make sure an index is
                                -- really in range
            (MutableArray# s e) -- the actual elements
-- No Ix context for STArray. They are stupid,
-- and force an Ix context on the equality instance.
```
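As a small, hypothetical illustration (not from the original post) of how `STArray` is used: mutate an array inside the `ST` monad, then freeze it into an ordinary immutable `Array` with `runSTArray`:

```
import Data.Array (Array, elems)
import Data.Array.ST (runSTArray, newArray, writeArray)

-- build an immutable Array by filling an STArray in place
squares :: Array Int Int
squares = runSTArray $ do
  arr <- newArray (0, 4) 0                        -- all zeros initially
  mapM_ (\i -> writeArray arr i (i * i)) [0 .. 4] -- destructive updates
  return arr

main :: IO ()
main = print (elems squares)  -- [0,1,4,9,16]
```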

Now we can translate the imperative pseudo code of the quicksort algorithm from Wikipedia directly into Haskell. First, we need a function to represent the `for` control flow of imperative languages:

```
import Control.Monad (foldM)

foreach :: (Monad m, Foldable t) => t a -> b -> (b -> a -> m b) -> m b
foreach xs v f = foldM f v xs
```
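For intuition, here is a small self-contained usage sketch (the example values are hypothetical): `foreach` threads an accumulator through a monadic loop body, much like an imperative `for` loop over a mutable variable:

```
import Control.Monad (foldM)

foreach :: (Monad m, Foldable t) => t a -> b -> (b -> a -> m b) -> m b
foreach xs v f = foldM f v xs

main :: IO ()
main = do
  -- sum 1..5, starting the accumulator at 0
  total <- foreach [1 .. 5 :: Int] 0 $ \acc x -> return (acc + x)
  print total  -- 15
```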

And an auxiliary function to swap the elements at two positions in an array:

```
import Data.Array.MArray (MArray, readArray, writeArray)
import Data.Ix (Ix)

swap :: (Ix i, MArray arr e m) => arr i e -> i -> i -> m ()
swap arr ia ib = do
  a <- readArray arr ia
  b <- readArray arr ib
  writeArray arr ia b
  writeArray arr ib a
```
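A quick self-contained check of `swap` (a hypothetical example, repeating the definition so it compiles standalone), exchanging the first and last elements of a small unboxed array:

```
import Control.Monad.ST (ST, runST)
import Data.Array.ST (STUArray, newListArray, readArray, writeArray, getElems)
import Data.Array.MArray (MArray)
import Data.Ix (Ix)

swap :: (Ix i, MArray arr e m) => arr i e -> i -> i -> m ()
swap arr ia ib = do
  a <- readArray arr ia
  b <- readArray arr ib
  writeArray arr ia b
  writeArray arr ib a

main :: IO ()
main = print $ runST $ do
  arr <- newListArray (0, 2) [10, 20, 30] :: ST s (STUArray s Int Int)
  swap arr 0 2          -- exchange first and last elements
  getElems arr          -- [30,20,10]
```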

Then translate the functions `partition` and `quicksort` into Haskell (the imperative pseudo code can be found at Quicksort - Wikipedia):

```
import Control.Monad (when)

qsortImpl :: (Ord e, MArray arr e m) => arr Int e -> Int -> Int -> m ()
qsortImpl arr l r = when (r > l) $ do
  let mid = l + (r - l) `div` 2
  nmid <- partition arr l r mid
  qsortImpl arr l (nmid - 1)
  qsortImpl arr (nmid + 1) r

partition :: (Ord e, MArray arr e m) => arr Int e -> Int -> Int -> Int -> m Int
partition arr l r mid = do
  pivot <- readArray arr mid
  swap arr mid r                          -- move the pivot out of the way
  slot <- foreach [l .. r - 1] l (\slot i -> do
    val <- readArray arr i
    if val < pivot
      then swap arr i slot >> return (slot + 1)
      else return slot)
  swap arr slot r >> return slot          -- put the pivot in its final place
```

The most commonly used and most convenient data structure for sequences in Haskell is the list from `Data.List`, not the mutable array. So we also need an abstraction layer that lets us use the efficient genuine quicksort implementation on ordinary lists.

```
import Control.Monad.ST (ST, runST)
import Data.Array.ST (STUArray, newListArray, getElems)

qsort :: [Int] -> [Int]
qsort xs = runST $ do
  let len = length xs - 1
  arr <- newListArray (0, len) xs :: ST s (STUArray s Int Int) -- or `ST s (STArray s Int Int)`
  qsortImpl arr 0 len >> getElems arr
```
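Putting the pieces together, here is a self-contained end-to-end check (a hypothetical example, repeating the definitions above so it compiles standalone):

```
import Control.Monad (foldM, when)
import Control.Monad.ST (ST, runST)
import Data.Array.ST (STUArray, newListArray, readArray, writeArray, getElems)
import Data.Array.MArray (MArray)
import Data.Ix (Ix)

foreach :: (Monad m, Foldable t) => t a -> b -> (b -> a -> m b) -> m b
foreach xs v f = foldM f v xs

swap :: (Ix i, MArray arr e m) => arr i e -> i -> i -> m ()
swap arr ia ib = do
  a <- readArray arr ia
  b <- readArray arr ib
  writeArray arr ia b
  writeArray arr ib a

qsortImpl :: (Ord e, MArray arr e m) => arr Int e -> Int -> Int -> m ()
qsortImpl arr l r = when (r > l) $ do
  let mid = l + (r - l) `div` 2
  nmid <- partition arr l r mid
  qsortImpl arr l (nmid - 1)
  qsortImpl arr (nmid + 1) r

partition :: (Ord e, MArray arr e m) => arr Int e -> Int -> Int -> Int -> m Int
partition arr l r mid = do
  pivot <- readArray arr mid
  swap arr mid r
  slot <- foreach [l .. r - 1] l (\slot i -> do
    val <- readArray arr i
    if val < pivot
      then swap arr i slot >> return (slot + 1)
      else return slot)
  swap arr slot r >> return slot

qsort :: [Int] -> [Int]
qsort xs = runST $ do
  let len = length xs - 1
  arr <- newListArray (0, len) xs :: ST s (STUArray s Int Int)
  qsortImpl arr 0 len >> getElems arr

main :: IO ()
main = print (qsort [3, 1, 4, 1, 5, 9, 2, 6])  -- [1,1,2,3,4,5,6,9]
```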

You can obtain the complete source code here.

## Parallelism and concurrency

Quicksort is a divide-and-conquer algorithm: it divides a large array into two sub-arrays, one containing the elements less than the pivot and one containing the elements greater than the pivot, then sorts the two sub-arrays recursively with quicksort; finally, all elements in the whole array are in order. The two sub-array sorts are independent tasks, so it’s easy to run them in parallel.

Thanks to the `rpar` strategy provided in the parallel package, it’s convenient to refactor the code above into a parallel version.

```
withStrategy rpar $ qsortImpl arr l (nmid - 1)
qsortImpl arr (nmid + 1) r
```

This solution uses Haskell’s parallel “strategies”. This concept was introduced to give Haskell programmers more control over parallelization and to help translate sequential code into a parallel version at minimal cost. `rpar` sparks its argument for evaluation in parallel.

Another way to improve performance along similar lines is concurrency. It’s also not hard to use `forkIO` to run the tasks in multiple threads; Haskell’s runtime supports multiple cores well.

```
(qsortImpl arr l (nmid - 1)) `dualThread` (qsortImpl arr (nmid + 1) r)
  where
    dualThread fg bg = do
      wait <- background bg
      fg >> wait
    background task = do
      m <- newEmptyMVar
      forkIO (task >>= putMVar m)
      return $ takeMVar m
```
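The pattern can be tried in isolation. Below is a hypothetical standalone sketch of the same idea, generalized to return both results: run one action in a background thread, run the other in the current thread, then block on an `MVar` for the background result:

```
import Control.Concurrent (forkIO, newEmptyMVar, putMVar, takeMVar)

-- run bg in a forked thread, fg in the current thread, then wait for bg
dualThread :: IO a -> IO b -> IO (a, b)
dualThread fg bg = do
  wait <- background bg
  a <- fg
  b <- wait
  return (a, b)
  where
    background task = do
      m <- newEmptyMVar
      _ <- forkIO (task >>= putMVar m)
      return (takeMVar m)   -- the "join" action

main :: IO ()
main = do
  r <- dualThread (return (product [1 .. 5 :: Int]))
                  (return (sum [1 .. 100 :: Int]))
  print r  -- (120,5050)
```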

Here is the complete Haskell source code, including an efficient parallel-strategy implementation and a concurrency implementation of the famous quicksort algorithm.

## Further work

A thorough benchmark is needed to analyze the performance of our ST-powered quicksort and the parallel quicksort, and to compare them with the sorting algorithm in Haskell’s library, `Data.List.sort`. The function `qsort` should be made polymorphic for reusability, so it can be used more widely. More optimizations could be added for better performance, such as switching to bubble sort or another simple sort when the sub-arrays are small enough; and a better strategy for selecting the pivot would also help avoid the worst-case $O(n^2)$ time complexity of quicksort.
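As one example of better pivot selection (a hypothetical sketch, not from the original), the classic median-of-three heuristic takes the median of the first, middle, and last elements, which defends against already-sorted inputs:

```
-- median of three values, a common pivot-selection heuristic
medianOfThree :: Ord a => a -> a -> a -> a
medianOfThree a b c = max (min a b) (min (max a b) c)

main :: IO ()
main = do
  print (medianOfThree 3 1 2)  -- 2
  print (medianOfThree 9 9 1)  -- 9
```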