by Zoran Horvat

Apr 27, 2013

Given an unsorted array of N integers, write a function that returns Kth smallest number in the array, for zero-based K.

Example: Suppose that the following array with 5 numbers is given: 3, 1, 7, 5, 9. For K=3, the result is 7. This is because, when arry is sorted into 1, 3, 5, 7, 9, the element at zero-based position 3 is number 7.

Keywords: Kth smallest, array, median of medians, selection problem.

We could start solvong the problem by simply sorting the array and returning the element at position K. This solution produces the result in time proportional to NlogN. But is there a more efficient solution? If we stick to the sorting idea, we might ask why should we sort the whole array if only the Kth element in the sort order is requested? Let's examine the possibilities that arise when QUICKSORT algorithm is applied. Remember that this algorithm partitions the array into two disjoint sections delimited by some pivot element. Once the partitioning step is completed, the algorithm is run on each of the partitions recursively. These are the steps executed by the sorting algorithm:

```
1. Pick an element within current segment and call it the pivot
2. Move all elements smaller than or equal to the pivot
near the beginning of the current segment
3. Move all elements larger than the pivot near the end
of the current segment
4. Place the pivot on the remaining position in between
5. Recursively call the sorting algorithm on the segment
to the left from the pivot
6. Recursively call the sorting algorithm on the segment
to the right from the pivot
```

The way in which this algorithm works seems to guarantee high sorting performance. Values in the array are divided up into two, hopefully equal, sub-arrays. All values in the left partition are strictly smaller than all elements in the right partition. Consequently, the two partitions can be sorted independently from each other. The "hopefully equal" part is the tricky one, though. Depending on how the pivot is selected, we could get quite different results in terms of relative sizes of the two partitions. If we really get out of luck, pivot will be the smallest of all values in the segment (or largest, with same consequences). The segment will then be divided into an empty left segment, the pivot, and the right segment that contains all the values but the pivot. So in the worst case, the next step will be the same as the current step, only the segment will be by one element shorter. Not much of a reduction, with overall result that worst case runs in quadratic time, i.e. time proportional to square of the number of elements.

Now let's turn back to the Kth smallest element problem. We could approach it in the same way in which QUICKSORT algorithm approaches the sorting problem. First pick the pivot, partition the array and then, instead of sorting the left and the right partition further, test their sizes in order to decide which partition contains the Kth element that is being sought. If we're lucky enough, the Kth element would be the pivot - when things slide in this direction the search is over. Otherwise, Kth element must fall either into the left or into the right partition. If former is the case, then we can recursively run the search for the Kth element inside the left partition, completely ignoring the right partition. Otherwise, the element falls into the right partition and then we run the algorithm recursively only this time on the right partition. The only trick is that in the latter case we do not search for the Kth element in the following iterations any more. Instead, index of the requested element must be reduced by the size of the left partition and by one more, standing for the pivot. Below is the modified algorithm.

```
1. Pick an element within current segment
and call it the pivot
2. Count elements that are smaller and
elements that are larger than the pivot
3. If number of elements smaller than the pivot
is larger than K, then move those elements
to the beginning of the array and run
the algorithm recursively only on that part of the array.
4. Otherwise, if number of elements smaller than the pivot
plus number of elements equal to the pivot is larger
than K, then Kth element is equal to pivot
so just return the pivot and finish.
5. Otherwise, move all elements larger than the pivot
to the beginning of the array and run the algorithm
recursively only on that part of the array.
```

Once again, if we are lucky enough with the pivot selection, this algorithm will cut the array in half in every step, converging quickly towards the solution. But once again, poor pivot selection leads to very slow convergence. In the worst case, only the pivot will be removed for the next iteration, leading to grim prediction in which the whole algorithm will have to be repeated as many times through the array as many elements there are in the whole array - hardly that we could devise any algorithm slower than that. Success of the whole idea depends on good pivot selection. That is what we are going to discuss next.

There are many enhancements to the QUICKSORT algorithm intended to improve worst case behavior. Basically, should the pivot element be the median, we would rest assured that the array will be divided into equal halves in each iteration. One of the enhancements has actually lead to a completely new algorithm by turning the question the other way around: What is the most efficient way to discover the median of an array? The idea went like this. First we divide the array into N/C columns, each column containing C consecutive elements of the array. Note that C is a relatively small odd number (many authors suggest five elements per column). Next, we sort each column separately and pick its exact median (this is why it's important for C to be odd). After this step, we have obtained N/C medians and then we can simply run the algorithm recursively on those values only. At some point there will be only one value remaining (called median of medians), it is selected as the pivot. Further down the line, the algorithm goes the same as given above - pivot is used to split the array into values that are smaller and values that are larger than the pivot.

The only thing to worry about is what is the worst case in this setting, i.e. what is the smallest number of values that are discarded in each iteration of the algorithm. It has been proven that in each iteration of the algorithm the pivot element selected in this way discards at least one quarter of all elements still in focus. Consequently, we are never going to get into an unfair situation in which array hardly reduces between the two iterations. On the contrary, this is the proof that the array will converge towards the Kth element quite quickly - to be perfectly clear about the time requirements, it has been proven that number of comparisons required to isolate the Kth element is proportional to number of elements in the array N. This algorithm has been proposed in the early 1970s and ever since it has remained the most efficient algorithm with guaranteed linear worst case running time. Compared to quadratic worst time required by algorithms derived from QUICKSORT, it comes as no surprise that this algorithm is still in use. Below is the whole algorithm - it will sound quite simple when compared to pondering that lead to its discovery.

```
1. Divide the array into N/C columns of elements,
for small odd C.
2. Find the median of each column by sorting it.
3. Take only the medians and repeat steps 1-2 recursively
until only one value remains. That value is picked as the pivot.
4. Iterate through the array and count number of elements
strictly smaller than the pivot (S), larger than the pivot (L)
and equal to the pivot (E=N-S-L).
5. If N>K, move all values smaller than the pivot
to the beginning of the array and recursively run
the whole algorithm on that sub-array.
6. If N+E>K, conclude that Kth element is equal
to the current pivot so return the pivot value
and terminate the algorithm.
7. Otherwise, move all values larger than the pivot
to the beginning of the array and recursively run
the whole algorithm on that sub-array.
```

In the following section we are going to provide implementation of this algorithm in C#. The only thing that should be discussed is step 2 of the algorithm, in which medians for columns are extracted. The algorithm is designed so that simple sorting with quadratic running time is applied to each column. Performance of this step is guaranteed by the small value of C. There comes the suggestion to take C=5 - this advice was derived from the fact that five values can be stored in processor's registers, making the sort operation extremely fast in practice. Moreover, columns could be sorted only until the middle element is sorted, since any element past that position cannot be any smaller. But that optimization may cost more than it saves for small C. For C=5, it would save only one comparison to sort the last two elements in the column. In solution presented below, this last optimization is not applied.

In this section we are providing the complete source code for selecting Kth smallest integer value from an unsorted array. This implementation takes C=5 as assumed parameter. Generally, this value could also be passed to the function as an argument.

```
int FindKthSmallest(int[] a, int k)
{
int value = 0;
int n = a.Length;
int c = 5; // Constant used to divide the array into columns
while (true)
{
// Extract median of medians and take it as the pivot
int pivot = FindPivot(a, n, c);
// Now count how many smaller and larger elements are there
int smallerCount = 0;
int largerCount = 0;
CountElements(a, n, pivot, out smallerCount, out largerCount);
// Finally, partition the array
if (k < smallerCount)
{
Partition(a, ref n, pivot, true);
}
else if (k < n - largerCount)
{
value = pivot;
break;
}
else
{
k -= n - largerCount;
Partition(a, ref n, pivot, false);
}
}
return value;
}
int FindPivot(int[] a, int n, int c)
{
while (n > 1)
{
int pos = 0;
int tmp = 0;
for (int start = 0; start < n; start += c)
{
int end = start + c;
if (end > n) // Last column may have
end = n; // less than c elements
// Sort the column
for (int i = start; i < end - 1; i++)
for (int j = i + 1; j < end; j++)
if (a[j] < a[i])
{
tmp = a[i];
a[i] = a[j];
a[j] = tmp;
}
// Pick the column's median and promote it
// to the beginning of the array
end = (start + end) / 2; // Median position
tmp = a[end];
a[end] = a[pos];
a[pos++] = tmp;
}
n = pos; // Reduce the array and repeat recursively
}
return a[0]; // Last median of medians is the pivot
}
void Partition(int[] a, ref int n, int pivot, bool extractSmaller)
{
int pos = 0;
for (int i = 0; i < n; i++)
if ((extractSmaller && a[i] < pivot) ||
(!extractSmaller && a[i] > pivot))
{
int tmp = a[i];
a[i] = a[pos];
a[pos++] = tmp;
}
n = pos;
}
```

In this section we are going to present a short console application that demonstrates the use of the FindKthSmallest function. Here is the complete source code of the console application:

```
using System;
namespace KthSmallest
{
public class Program
{
static Random _rnd = new Random();
static int[] GenerateArray(int n)
{
int[] array = new int[(Int64)n];
for (int i = 0; i < n; i++)
array[i] = _rnd.Next(100);
// Now randomize the array
for (int i = array.Length - 1; i > 0; i--)
{
int index = _rnd.Next(i);
int tmp = array[index];
array[index] = array[i];
array[i] = tmp;
}
return array;
}
static void PrintArray(int[] array)
{
for (int i = 0; i < array.Length; i++)
Console.Write("{0,4}", array[i]);
Console.WriteLine();
}
static int FindKthSmallest(int[] a, int k) { ... }
static int FindPivot(int[] a, int n, int c) { ... }
static void CountElements(int[] a, int n, int pivot,
out int smallerCount,
out int largerCount) { ... }
static void Partition(int[] a, ref int n, int pivot,
bool extractSmaller) { ... }
static void Main(string[] args)
{
while (true)
{
Console.Write("n=");
int n = int.Parse(Console.ReadLine());
if (n <= 0)
break;
Console.Write("k=");
int k = int.Parse(Console.ReadLine());
int[] a = GenerateArray(n);
int value = FindKthSmallest(a, k);
PrintArray(a);
Console.WriteLine("{0}. smallest value is {1}", k, value);
}
Console.Write("Press ENTER to continue... ");
Console.ReadLine();
}
}
}
```

And this is one possible output produced by this demonstration application:

```
n=10
k=4
37 36 0 27 7 38 65 98 43 90
4. smallest value is 37
n=9
k=3
48 48 4 63 73 43 83 81 92
3. smallest value is 48
n=7
k=6
93 38 13 43 73 90 19
6. smallest value is 93
n=11
k=0
3 17 41 45 52 59 64 60 64 92 82
0. smallest value is 3
n=0
Press ENTER to continue...
```

Readers are suggested to try these extended tasks:

- Apply the function to very large arrays containing random data (millions to hundreds of millions of integers). Measure how time required to produce the result grows with growing size of the array.
- Try to solve the problem by using Array.Sort and picking the Kth element from the fully sorted array. Repeat the performance test to measure how sorting time increases with growing size of the array.
- Compare absolute times obtained in previous tests and discuss the differences.

Zoran Horvat is the Principal Consultant at Coding Helmet, speaker and author of 100+ articles, and independent trainer on .NET technology stack. He can often be found speaking at conferences and user groups, promoting object-oriented development style and clean coding practices and techniques that improve longevity of complex business applications.

- What Makes while Loop a Poor Choice in Programming
- How to Wrap System.Random Into an Infinite IEnumerable<int> Sequence
- Substituting the Builder with the Sequence of Factory Methods
- Custom Implementation of the Option/Maybe Type in C#
- Pros and Cons of Multiple Constructors
- Defensive Design: An Experiment
- More...

- Making Your C# Code More Object-oriented
- Making Your C# Code More Functional
- Writing Purely Functional Code in C#
- Tactical Design Patterns in .NET: Creating Objects
- Tactical Design Patterns in .NET: Control Flow
- Tactical Design Patterns in .NET: Managing Responsibilities
- Advanced Defensive Programming Techniques
- Writing Highly Maintainable Unit Tests
- Improving Testability Through Design