Testing If Two Strings are Anagrams

by Zoran Horvat
December 18, 2013

Problem Statement

Two strings are given, each representing one or more words. There are no punctuation characters in these strings. Words are separated by single whitespace character. The task is to write a function which tests whether these strings are anagrams. Two strings are anagrams if they are not equal, but use exactly the same rearranged set of letters. Letter casing and blank space characters should be ignored. Function should return value true if two strings are anagrams and value false otherwise.

Examples: Words "license" and "silence" are anagrams and function should return true. "William Shakespeare" is anagram with "I am a weakish speller".

Problem Analysis

According to problem statement, two strings are anagrams if they use the same set (and count) of letters, only in different order. First conclusion that we can make is that two strings are not anagrams if they are equal (case ignored). So the core of the solution might look like this:

function AreAnagrams(s1, s2)
begin
    if TOUPPER(s1) = TOUPPER(s2) then
        return false
    else
        ...
end

The helper function TOUPPER converts all letters in the string to upper case.

Further on, we need to know that each letter used in one string appears exactly the same number of times in the other string. We can count occurrences of a letter by constructing a table with as many entries as there are different letters. If these letters were ASCII characters only (which is not the case in this problem), that would make 26 distinct letters. For the William Shakespeare (I am a weakish speller) anagram, histogram tables for the two strings would look like this:

William Shakespeare		I am a weakish speller
Letter	Count	Letter	Count
A	3	A	3
E	3	E	3
H	1	H	1
I	2	I	2
K	1	K	1
L	2	L	2
M	1	M	1
P	1	P	1
R	1	R	1
S	2	S	2
W	1	W	1

Note that remaining rows, those that would contain count zero, are omitted for clarity. In real counting these rows would still exist and would all contain values zero should we use the table as the underlying data structure. Now, the process of comparing two potential anagram strings might look like this:

function AreAnagrams(s1, s2)
begin

    if TOUPPER(s1) = TOUPPER(s2) then
        begin
            return false
        end
    else
        begin

            for letter in s1
                if letter <> ' ' then
                    table1[TOUPPER(letter)] = table1[TOUPPER(letter)] + 1

            for letter in s2
                if letter <> ' ' then
                    table2[TOUPPER(letter)] = table2[TOUPPER(letter)] + 1

            for i = 1 to 26
                if table1[i] <> table2[i] then
                    return false

            return true

        end

end

In this function, we are first testing whether two strings are exactly equal. If strings are equal, then they are not anagrams. Otherwise, we iterate through the two strings to construct two tables with counts. We could even pull the table construction to a helper function:

function ConstructHistogram(table, string s)
begin
    for letter in s
        if letter <> ' ' then
            table[TOUPPER(letter)] = table[TOUPPER(letter)] + 1
end

function AreAnagrams(s1, s2)
begin

    if TOUPPER(s1) = TOUPPER(s2) then
        begin
            return false
        end
    else
        begin

            ConstructHistogram(table1, s1)
            ConstructHistogram(table2, s2)

            for i = 1 to 26
                if table1[i] <> table2[i] then
                    return false

            return true

        end

end

With this helper function in place, we could move the solution one step further. Observe the way in which we are testing two tables for equality. Instead of comparing entries in two tables (table1[i] <> table2[i]), we could subtract their entries and compare it against zero (table1[i] – table2[i] <> 0). This leads to the idea to use one table rather than two: First string would cause values in the table to be incremented by one, while the second string would cause values to be decremented. Once done, we would only need to verify that all values in the table are zero. Here is the pseudocode:

function ConstructHistogram(table, s, increment)
begin
    for letter in s
        if letter <> ' '
            table[letter] = table[letter] + increment
end

function AreAnagrams(s1, s2)
begin

    if TOUPPER(s1) = TOUPPER(s2) then
        begin
            return false
        end
    else
        begin

            ConstructHistogram(table, s1, 1)
            ConstructHistogram(table, s2, -1)

            for i = 1 to 26
                if table[i] <> 0 then
                    return false
                return true

        end

end

Now observe that we are not constrained to 26 English alphabet letters. This makes the table very large, leading to high complexity of the comparing loop at the end of the function. This loop would easily take several orders of magnitude more size than combined lengths of the two strings we are comparing. So the table idea must be dropped in favor of a more space-efficient solution. Note, however, that 26-rows long table remains quite an efficient solution for ASCII anagrams comparison.

Instead of table with predetermined size, we might use a different data structure which would allow us to add and remove letters accompanied with their counts. If one letter appears for the first time, we want to add it to the data structure with count one. Next time the same letter appears, we would increase its count to two. When the second string is processed, counts would be decremented by one each time a letter is encountered. If at any time count for a given letter reaches zero, the whole entry is removed from the data structure. Finally, if two strings are anagrams, data structure containing letter appearances should be empty because two strings should negate each other completely.

Appropriate data structure for this task is hashtable. It has space complexity O(n), i.e. its size grows linearly with number of items added to it. Its time complexity is O(1) for add, query and delete operations. These properties make hashtable perfect candidate for the anagrams problem. Here is the pseudocode which relies on hashtables:

function ConstructHistogram(hashtable, s, increment)
begin
    for letter in s
        begin
            if letter <> ' ' then
                begin

                    item = TOUPPER(letter)

                    if not hashtable.Contains(item) then
                        hashtable.Add(item, increment)
                    else if hashtable.GetValue(item) = -increment then
                        hashtable.Delete(item)
                    else
                        hashtable.SetValue(item, hashtable.GetValue(item) + increment)

                end
        end
end

function AreAnagrams(s1, s2)
begin

    if TOUPPER(s1) = TOUPPER(s2) then
        begin
            return false
        end
    else
        begin
            ConstructHistogram(hashtable, s1, 1)
            ConstructHistogram(hashtable, s2, -1)
            return hashtable.IsEmpty
        end

end

Space complexity of this solution depends only on length of the longer of the two strings. If lengths are m and n, then space complexity is O(max(m, n)). Similarly, time complexity of the ConstructHistogram function is proportional to length of the string submitted, because all hashtable operations performed in the function require O(1) time. This means that overall time complexity of the function is O(m + n), because ConstructHistogram function is invoked independently for the two strings.

Implementation

Below is the implementation of a console application in C# which lets user enter two strings and then tests whether they are anagrams.

using System;
using System.Collections;

namespace Anagrams
{

    public class Program
    {

        static void ConstructHistogram(Hashtable table, string s, int increment)
        {
            foreach (char c in s)
                if (c != ' ')
                {

                    char letter = new string(new char[] { c }).ToUpper()[0]; // Convert character to upper case

                    if (!table.Contains(letter))
                        table.Add(letter, increment);
                    else if ((int)table[letter] == -increment)
                        table.Remove(letter);
                    else
                        table[letter] = (int)table[letter] + increment;

                }
        }

        static bool AreAnagrams(string s1, string s2)
        {
            if (string.Compare(s1, s2, true) == 0)   // Ignore case
            {
                return false;   // Equal strings are not anagrams
            }
            else
            {

                Hashtable table = new Hashtable();

                ConstructHistogram(table, s1, 1);
                ConstructHistogram(table, s2, -1);

                return table.Count == 0;

            }
        }

        static void Main(string[] args)
        {

            while (true)
            {

                Console.Write("Enter first string (empty to exit): ");
                string s1 = Console.ReadLine();

                if (s1 == string.Empty)
                    break;

                Console.Write("Enter second string: ");
                string s2 = Console.ReadLine();

                if (AreAnagrams(s1, s2))
                    Console.WriteLine("These strings are anagrams.");
                else
                    Console.WriteLine("These strings are not anagrams.");

                Console.WriteLine();

            }
        }

    }
}

Demonstration

When this application is run, it produces the following output:

            
Enter first string (empty to exit): silence
Enter second string: license
These strings are anagrams.

Enter first string (empty to exit): licenses
Enter second string: silence
These strings are not anagrams.

Enter first string (empty to exit): William Shakespeare
Enter second string: I am a weakish speller
These strings are anagrams.

Enter first string (empty to exit):

Beginning Object-oriented Programming with C#

In this course, you will learn the basic principles of object-oriented programming, and then learn how to apply those principles to construct an operational and correct code using the C# programming language and .NET.

As the course progresses, you will learn such programming concepts as objects, method resolution, polymorphism, object composition, class inheritance, object substitution, etc., but also the basic principles of object-oriented design and even project management, such as abstraction, dependency injection, open-closed principle, tell don't ask principle, the principles of agile software development and many more.

More...

Design Patterns in C# Made Simple

In this course, you will learn how design patterns can be applied to make code better: flexible, short, readable.

You will learn how to decide when and which pattern to apply by formally analyzing the need to flex around specific axis.

More...

Refactoring to Design Patterns

This course begins with examination of a realistic application, which is poorly factored and doesn't incorporate design patterns. It is nearly impossible to maintain and develop this application further, due to its poor structure and design.

As demonstration after demonstration will unfold, we will refactor this entire application, fitting many design patterns into place almost without effort. By the end of the course, you will know how code refactoring and design patterns can operate together, and help each other create great design.

More...

Mastering Iterative Object-oriented Development in C#

In four and a half hours of this course, you will learn how to control design of classes, design of complex algorithms, and how to recognize and implement data structures.

After completing this course, you will know how to develop a large and complex domain model, which you will be able to maintain and extend further. And, not to forget, the model you develop in this way will be correct and free of bugs.

More...

About

Zoran Horvat

Zoran Horvat is the Principal Consultant at Coding Helmet, speaker and author of 100+ articles, and independent trainer on .NET technology stack. He can often be found speaking at conferences and user groups, promoting object-oriented and functional development style and clean coding practices and techniques that improve longevity of complex business applications.

Beginning Object-oriented Programming with C#

Design Patterns in C# Made Simple

Refactoring to Design Patterns

Mastering Iterative Object-oriented Development in C#

Testing If Two Strings are Anagrams

Problem Statement

Problem Analysis

Implementation

Demonstration

See Also:

About

Elsewhere

Video Courses

Articles

Exercises