Counting Intersection of Two Unsorted Arrays

by Zoran Horvat

Problem Statement

Given two unsorted arrays of integer numbers, write a function which returns number of elements that exist in both arrays. Elements may be repeated in any of the arrays, and all common appearances should be counted.

Example: Suppose that the first array is 3, 1, 4, 1, 2 and the second array is 5, 1, 2, 4, 2, 1. Common elements of the two arrays are 1, 1 and 2, and therefore the function should return result 3.

Problem Analysis

In this problem, we have to traverse the two arrays and identify elements that are in common. This exercise is closely connected with Finding Intersection of Two Unsorted Arrays , in which we were actually constructing the array containing the common elements.

In that exercise, we have discussed several approaches:

  • Indexing solution – Pass through one array and put all its elements into a hash table. Then iterate through the other array and see if any element can be found in the hash table. If so, remove it from the hash table (to avoid future hits) and push it to the output.
  • Brute force solution – Compare any two elements of both arrays, while ensuring that elements that take part in any matching are not considered again.
  • Sorting solution – Sort both arrays and then progress through them and extract common elements.
  • Partial sorting solution – Divide and conquer both arrays around the same pivot values and then push common pivot occurrences to the output.

For more details about each of these methods, please refer to Finding Intersection of Two Unsorted Arrays exercise. In this article, we will provide solution which is based on partial sorting.

Partial Sorting Solution

As already indicated, this solution is based on divide-and-conquer idea. We would find a pivot in the first array and then partition the array around that pivot value. Then we would partition the second array around the same pivot value. In this way, we would split both arrays around the same value. The whole idea boils down to this sequence of steps:

1. Partition both arrays around the same pivot value

2. Count how many occurrences of the pivot value they have in common

3. Repeat recursively on left and on right partitions of both arrays, if non-empty

Once recursion unfolds, all calls will report how many pivot values they have in common. Overall result is then the sum of all partial results.

Below is the pseudo-code which implements this idea.

function CountIntersection(a, b, n, m)
    -- a - array
    -- b - array
    -- n - length of a
    -- m - length of b
begin
    return CountIntersectionRecursive(a, b, 1, n, 1, m)
end

function CountIntersectionRecursive(a, b, aStart, aEnd, bStart, bEnd)
    -- a - array
    -- b - array
    -- aStart - first position in a to intersect
    -- aEnd - last position in a to intersect
    -- bStart - first position in b to intersect
    -- bEnd - last position in b to intersect
begin

    pivot = MedianOfMedians(a, aStart, aEnd);

    Partition(a, aStart, aEnd, pivot, out aPivotStart, out aPivotLen)
    Partition(b, bStart, bEnd, pivot, out bPivotStart, out bPivotLen)

    count = min(aPivotLen, bPivotLen)

    -- Recursively solve left partitions
    if aPivotStart > aStart AND bPivotStart > bStart then
        count = count +
            CountIntersectionRecursive(a, b, aStart, aPivotStart - 1,
                                       bStart, bPivotStart - 1)

    -- Recursively solve right partitions
    if aPivotStrat + aPivotLen < aEnd AND bPivotStart + bPivotLen < bEnd then
        count = count +
            CountIntersectionRecursive(a, b, aPivotStart + aPivotLen, aEnd,
                                       bPivotStart + bPivotLen, bEnd)
    return count

end

Partition is a utility function which partitions the array around specified pivot value. MedianOfMedians is another utility function which calculates convenient pivot value using median of medians algorithm. You can read more about this algorithm in exercise Finding Kth Smallest Element in an Unsorted Array .

Implementation

Below is the complete Console application in C# which implements the intersection counting function.

using System;

namespace IntersectionCount
{

    public class Program
    {

        static int IntersectionCount(int[] a, int[] b)
        {
            return IntersectionCountRecursive(a, b,
                                              0, a.Length - 1,
                                              0, b.Length - 1);

        }

        static int IntersectionCountRecursive(int[] a, int[] b,
                                         int aStart, int aEnd,
                                         int bStart, int bEnd)
        {

            int pivot = MedianOfMedians(a, aStart, aEnd);

            int aPivotStart;
            int aPivotLen;
            int bPivotStart;
            int bPivotLen;

            Partition(a, aStart, aEnd, pivot,
                      out aPivotStart, out aPivotLen);

            Partition(b, bStart, bEnd, pivot,
                      out bPivotStart, out bPivotLen);

            int count = aPivotLen;
            if (bPivotLen < count)
                count = bPivotLen;

            // Recursively solve left partitions
            if (aPivotStart > aStart && bPivotStart > bStart)
                count += IntersectionCountRecursive(a, b,
                                                    aStart, aPivotStart - 1,
                                                    bStart, bPivotStart - 1);

            // Recursively solve right partitions
            if (aPivotStart + aPivotLen - 1 < aEnd &&
                bPivotStart + bPivotLen - 1 < bEnd)
                count += IntersectionCountRecursive(a, b,
                                                    aPivotStart + aPivotLen, aEnd,
                                                    bPivotStart + bPivotLen, bEnd);

            return count;

        }

        static void Partition(int[] a, int start, int end, int pivot,
                              out int pivotStart, out int pivotLen)
        {

            int i = start;

            while (i <= end)
            {

                if (a[i] < pivot)
                {
                    a[start++] = a[i++];
                }
                else if (a[i] == pivot)
                {
                    i++;
                }
                else
                {
                    int tmp = a[end];
                    a[end] = a[i];
                    a[i] = tmp;
                    end--;
                }

            }

            for (i = start; i <= end; i++)
                a[i] = pivot;

            pivotStart = start;
            pivotLen = end - start + 1;

        }

        static int MedianOfMedians(int[] a, int start, int end)
        {

            while (end > start + 1)
            {

                int pos = start;

                for (int segment = start; segment < end; segment += 5)
                {

                    int tmp;

                    int segmentEnd = segment + 5;
                    if (segmentEnd > end)
                        segmentEnd = end;

                    for (int i = segment; i < segmentEnd - 1; i++)
                        for (int j = i + 1; j < segmentEnd; j++)
                            if (a[i] > a[j])
                            {
                                tmp = a[i];
                                a[i] = a[j];
                                a[j] = tmp;
                            }

                    int middlePos = (segment + segmentEnd - 1)  2;
                    tmp = a[pos];
                    a[pos] = a[middlePos];
                    a[middlePos] = tmp;
                    pos++;

                }

                end = pos;

            }

            return a[start];

        }

        static int[] ReadArray()
        {

            Console.Write("Enter array elements (ENTER to exit): ");

            string[] elements =
                Console
                .ReadLine()
                .Split(new char[] { ' ' },
                       StringSplitOptions.RemoveEmptyEntries);

            int[] a = null;

            if (elements.Length > 0)
            {
                a = new int[elements.Length];
                for (int i = 0; i < elements.Length; i++)
                    a[i] = int.Parse(elements[i]);
            }

            return a;

        }

        static void Main(string[] args)
        {

            while (true)
            {

                int[] a = ReadArray();
                if (a == null)
                    break;

                int[] b = ReadArray();
                if (b == null)
                    break;

                int intersectionCount = IntersectionCount(a, b);

                Console.WriteLine("Arrays share {0} element(s).\n",
                                  intersectionCount);

            }

        }

    }
}

Demonstration

When Console application shown above is run and tried on a few arrays, it produces the following output:

            
Enter array elements (ENTER to exit): 3 1 4 1 2
Enter array elements (ENTER to exit): 5 1 2 4 2
Arrays share 3 element(s).

Enter array elements (ENTER to exit): 2 5 1 2 7 3
Enter array elements (ENTER to exit): 9 8 7 6 5 4
Arrays share 2 element(s).

Enter array elements (ENTER to exit): 5 5 5 5 5
Enter array elements (ENTER to exit): 4 4 4 4 4
Arrays share 0 element(s).

Enter array elements (ENTER to exit):
                
    

Discussion

Good thing about solution presented in this article is that analysis of one partition of an array terminates if the corresponding partition in the other array is empty. Searching for new common elements between the two arrays progresses only as long as there is hope to find more of them. This measure makes positive effect on the efficiency of the solution.


If you wish to learn more, please watch my latest video courses

About

Zoran Horvat

Zoran Horvat is the Principal Consultant at Coding Helmet, speaker and author of 100+ articles, and independent trainer on .NET technology stack. He can often be found speaking at conferences and user groups, promoting object-oriented and functional development style and clean coding practices and techniques that improve longevity of complex business applications.

  1. Pluralsight
  2. Udemy
  3. Twitter
  4. YouTube
  5. LinkedIn
  6. GitHub