Exercise #8: Finding a Value in a Sorted Array by Zoran Horvat @zoranh75
Problem Statement
Given a sorted array of integer numbers, write a function which returns
zerobased position on which the specified value is located. Function should
return negative value if requested number cannot be found in the array. If
value occurs more than once, function should return position of the first
occurrence.
Example: Suppose that array consists of numbers 2, 3, 5, 6, 6, 8, 9. If value 6
is searched, then result is 3, because that is the zerobased position of the
first occurrence of number 6 in the array. If number 7 is searched, then result
is a negative value (e.g. 1), which indicates that value 7 cannot be found in
the array.
Keywords: Array, sorted, search, partitioning.
Problem Analysis
For a moment forget that array might contain the requested number more than
once. If we reach a satisfactory algorithm that efficiently finds any
occurrence of a number, then we will already be close to the solution. With a
simple tweak we will then ensure that algorithm discovers the first occurrence
of the number.
The simplest way to see if an array contains a requested value is to iterate
through the array and stop as soon as the value is located. Alternatively,
search fails if end of array is reached. This algorithm, with some
enhancements, is demonstrated in one of the previous exercises (Finding a Value in an Unsorted Array). But this approach does not take advantage of the fact that array is sorted.
In order to make the searching function faster, we will have to collect
additional information about values in the array from the sort order itself.
If we take the array from example above and try to find number 6 in it, then we
might inspect the first value (2) and conclude that search should continue
forward because this value is less than the number we are looking for. But we
are not constrained to iterate through the array too stringently. We could jump
two positions ahead, where number 5 is located  still too small. Should we
jump three steps ahead next time, we would find value 8, which is too large 
signal to step back. And there we go  one position back and value 6 is found.
Searching tactic just demonstrated is not the most elegant one that we could
come up with. But it hits the point in searching through the sorted array. We
are free to jump back and forth through array as long as searching space
shrinks with each step. After the first two comparisons we were sure that
number 6 does not appear on any of the three leading positions. One step later,
we already knew that number we are looking for doesn't appear on the last two
positions either. In only three comparisons we have eliminated five out of
seven elements of the array. This analysis shows that searching through the
array can be much more efficient than the simple iterative approach when array
is known to be sorted. Next thing to do now is to discover a better method of
eliminating unwanted elements.
Suppose that we use an arbitrary method to pick a position in the array and to
compare a number on that position, whichever it is, with the target number. If
the two numbers are equal, then the search is over. Otherwise, we must decide
on which side of the current position could the requested number be located. If
current element is greater than the requested number, then searching should
obviously continue on the lefthand part of the array. Otherwise, we continue
searching on the righthand part of the array. Algorithm continues recursively
on the selected subarray, completely discarding all the values on the other
side of the element which was tested. Algorithm terminates either when element
under test is equal to the number we are looking for, or when remaining section
of the array becomes empty, in which case we simply conclude that the number
was not located.
Now we are ready to ask the central question. At any given step we have a
certain range within the array which we are investigating. All elements to the
left and to the right have already been discarded. Without any additional
information about values in the remaining section, which is the most convenient
element we might choose for comparison to maximally reduce the search section?
Since requested value could be located anywhere within the array section (if it
is there, in the first place), then the best we could do is to pick an element
in the middle of the array, cutting the segment into equal halves (off by one,
at most). That partitioning method guarantees that one half of the values will
be gone once the comparison is done, whichever the comparison result.
Here is the pseudocode which finds any occurrence of the requested value
within the array:
function FindAnySorted(a, n, x)
a  sorted array
n  length of the array
x  value searched in the array
begin
left = 0
right = n – 1
while left <= right
begin
middle = (left + right) / 2
if a[middle] = x then
return middle
if a[middle] > x then
right = middle  1
else
left = middle + 1
end
return 1  value was not found
end
Observe how the remaining part of the array reduces by half plus the middle
element in each iteration of the algorithm. This leads to a conclusion that
number of comparisons needed to find a value in the sorted array with N
elements is proportional to logN in the worst case.
Now we should return to the original problem statement and try to modify the
solution so that it returns position of the first occurrence of a value within
the array in cases when that value occurs more than once. Modified partitioning
algorithm would be to search for the boundary, a location between two elements,
rather than a particular element, such that all numbers to the left of the
boundary are strictly less than the requested number and all elements to the
right are greater or equal to the requested number. With the example array from
the problem statement (2, 3, 5, 6, 6, 8, 9), when searching for value 6, this
boundary would fall between positions 2 and 3.
In the simple partitioning algorithm, we tracked positions left and right, which indicated the first and the last element in the remaining segment of
the array, respectively. That way of tracking the elements was natural in that
case, because array had to be nonempty in each step. Should the array become
empty, the search would fail. In the modified solution, however, we are forcing
the algorithm to run until remaining segment is empty. Its left and right
border would then be the last element less than and the first element greater
or equal than the requested element. This implies that initial left and right indices are 1 and N for zerobased array. In other words, initial boundaries
will be outside the array. To picture this situation, we might imagine that
there exists one infinitely small element to the left of the array occupying
location 1, and one infinitely large element to the right of the array
occupying location N. These two fake elements will certainly be the smallest
and the largest elements in the array, and array will remain sorted. Note that left and right indicator in this setting now point outside of the current array segment.
Algorithm will terminate when right indicator is by one position to the right
of the left indicator.
The following picture demonstrates finding the first occurrence of number 6
within an array.
Observe how left and right indicators gradually converge towards each other. Once they meet, the cutting
line between them divides the array – all values to the left are strictly less
than 6; all values to the right are greater or equal to 6. Now the only thing
remaining is to test whether there is any element to the right (cutting line
could easily fall at the end of the array) and, if there is such an element, to
test whether it is equal to 6. Since righthand elements are greater or equal
to 6, the first element to the right might as well be strictly greater than 6,
in which case we simply conclude that searched number does not appear in the
array.
Now that we have outlined the way in which bounds of the current array segment
are tracked, we are ready to produce the final algorithm. Here is the
pseudocode:
function FindFirstSorted(a, n, x)
a  sorted array
n  length of the array
x  value searched in the array
begin
left = 1
right = n
while right > left + 1
begin
middle = (left + right) / 2
if a[middle] < x then
left = middle
else
right = middle
end
if right < n and a[right] = x then
return right
return 1
end
There are a couple of interesting details in this solution. First of all, we
might ask is the while loop going to terminate in all cases? We are supposed to
prove that middle is strictly greater than left and strictly less than right. If so, then in either way one of the boundaries is going to change and array
segment being searched is going to shrink in every step. Proof for this claim
is very simple if we start from the condition in the while statement: right > left + 1. Now we can compare the value of the middle variable with both left and right:
From this analysis we conclude that:
This proves that remaining part of the array is going to be shorter with every
step of the algorithm. Middle position is strictly greater than left and
strictly less than right boundary. As a result, either the left boundary is
going to move to the right or the right boundary is going to move to the left.
In either case, number of elements in between will reduce. Off course, once
there are no more elements between the boundaries, the remaining part of the
array will become empty and while loop will terminate.
Now that we know the function is going to exit at some point in time, we will
examine the final part, in which return value is decided. Namely, the whole
purpose of the loop was to find a dividing point in the array, a point standing
in between the two elements, such that all numbers to the left are strictly
less than the requested element and all numbers to the right are greater or
equal to the requested element. The point to note here is that, once the loop
terminates, variable right points to the first element of the array that is greater or equal to the
searched number. Now there are two pitfalls out there. First one is that
variable right might point outside the array (right = N). This happens if right remains with its initial value, which in turn happens when all values in the
array are strictly less than the searched value. The second issue is if value
at position right is not equal to the requested number. In that case, the loop has established a
cutting point where smaller values end and larger values begin. There is
nothing wrong with the loop, only there are no values equal to the searched
number. Therefore, algorithm terminates with success only if the following
combined criterion is met: right < n and a[right] = x. Otherwise, function returns negative value, indicating that searched value
was not found in the array.
Implementation
With all the previous analysis, implementation of the function that finds first
occurrence of a number in the sorted array is very simple. Here is the source
code:
int FindFirstSorted(int[] a, int x)
{
int left = 1;
int right = a.Length;
while (right > left + 1)
{
int middle = (left + right) / 2;
if (a[middle] >= x)
right = middle;
else
left = middle;
}
if (right < a.Length && a[right] == x)
return right;
return 1;
}
Demonstration
Below is the source code of a simple console application which demonstrates use
of the FindFirstSorted function.
using System;
namespace FindSortedDemo
{
public class Program
{
static Random _rnd = new Random();
static int[] GenerateArray(int n)
{
int[] a = new int[n];
for (int i = 0; i < n; i++)
a[i] = _rnd.Next(10);
Array.Sort(a);
return a;
}
static void PrintArray(int[] a)
{
for (int i = 0; i < a.Length; i++)
Console.Write("{0,3}", a[i]);
Console.WriteLine();
}
static int FindFirstSorted(int[] a, int x) { ... }
static void Main(string[] args)
{
while (true)
{
Console.Write("n=");
int n = int.Parse(Console.ReadLine());
if (n <= 0)
break;
int[] a = GenerateArray(n);
PrintArray(a);
Console.Write("x=");
int x = int.Parse(Console.ReadLine());
int pos = FindFirstSorted(a, x);
if (pos < 0)
Console.WriteLine("Value not found.");
else
Console.WriteLine("Value found at position {0}.", pos);
}
Console.Write("Press ENTER to continue... ");
Console.ReadLine();
}
}
}
When application is run, it produces output like this:
n=10
0 0 1 3 3 5 6 7 8 8
x=5
Value found at position 5.
n=7
0 2 3 3 7 8 9
x=1
Value not found.
n=9
0 1 2 4 4 5 6 8 8
x=9
Value not found.
n=8
2 2 2 2 3 4 8 9
x=2
Value found at position 0.
n=10
1 2 4 4 5 5 6 6 8 9
x=9
Value found at position 9.
n=0
Press ENTER to continue...
FollowUp Exercises
Starting from the solution from this exercise, try to solve the following
problems:
 Write the function which returns last occurrence of a number within the array.
 Write the function which simultaneously returns first and last occurrence of a
number within the array. Try to optimize the search so that the function does
not operate twice on the whole array.
See also:
Published: May 14, 2013
ZORAN HORVAT Zoran is software architect dedicated to clean design and CTO in a growing software company. Since 2014 Zoran is an author at Pluralsight where he is preparing a series of courses on design patterns, writing unit and integration tests and applying methods to improve code design and longterm maintainability. Follow him on Twitter @zoranh75 to receive updates and links to new articles. Watch Zoran's video courses at pluralsight.com (requires registration): Tactical Design Patterns in .NET: Managing Responsibilities Applying a design pattern to a realworld problem is not as straightforward as literature implicitly tells us. It is a more engaged process. This course gives an insight into tactical decisions we need to make when applying design patterns that have to do with separating and implementing class responsibilities. More... Tactical Design Patterns in .NET: Control Flow Improve your skills in writing simpler and safer code by applying coding practices and design patterns that are affecting control flow. More... Improving Testability Through Design This course tackles the issues of designing a complex application so that it can be covered with high quality tests. More... Share this article
