http://www.codinghelmet.com/  

Wear a helmet. Even when coding.

exercises > anagram-testing

Exercise #19: Testing If Two Strings are Anagrams
by Zoran Horvat @zoranh75

Problem Statement

Two strings are given, each representing one or more words. There are no punctuation characters in these strings. Words are separated by single whitespace character. The task is to write a function which tests whether these strings are anagrams. Two strings are anagrams if they are not equal, but use exactly the same rearranged set of letters. Letter casing and blank space characters should be ignored. Function should return value true if two strings are anagrams and value false otherwise.

Examples: Words "license" and "silence" are anagrams and function should return true. "William Shakespeare" is anagram with "I am a weakish speller".

Problem Analysis

According to problem statement, two strings are anagrams if they use the same set (and count) of letters, only in different order. First conclusion that we can make is that two strings are not anagrams if they are equal (case ignored). So the core of the solution might look like this:

function AreAnagrams(s1, s2)
begin
    if TOUPPER(s1) = TOUPPER(s2) then
        return false
    else
        ...
end

The helper function TOUPPER converts all letters in the string to upper case.

Further on, we need to know that each letter used in one string appears exactly the same number of times in the other string. We can count occurrences of a letter by constructing a table with as many entries as there are different letters. If these letters were ASCII characters only (which is not the case in this problem), that would make 26 distinct letters. For the William Shakespeare (I am a weakish speller) anagram, histogram tables for the two strings would look like this:

William Shakespeare I am a weakish speller
Letter Count Letter Count
A 3 A 3
E 3 E 3
H 1 H 1
I 2 I 2
K 1 K 1
L 2 L 2
M 1 M 1
P 1 P 1
R 1 R 1
S 2 S 2
W 1 W 1

Note that remaining rows, those that would contain count zero, are omitted for clarity. In real counting these rows would still exist and would all contain values zero should we use the table as the underlying data structure. Now, the process of comparing two potential anagram strings might look like this:

function AreAnagrams(s1, s2)
begin

    if TOUPPER(s1) = TOUPPER(s2) then
        begin
            return false
        end
    else
        begin

            for letter in s1
                if letter <> ' ' then
                    table1[TOUPPER(letter)] = table1[TOUPPER(letter)] + 1

            for letter in s2
                if letter <> ' ' then
                    table2[TOUPPER(letter)] = table2[TOUPPER(letter)] + 1

            for i = 1 to 26
                if table1[i] <> table2[i] then
                    return false

            return true

        end

end

In this function, we are first testing whether two strings are exactly equal. If strings are equal, then they are not anagrams. Otherwise, we iterate through the two strings to construct two tables with counts. We could even pull the table construction to a helper function:

function ConstructHistogram(table, string s)
begin
    for letter in s
        if letter <> ' ' then
            table[TOUPPER(letter)] = table[TOUPPER(letter)] + 1
end

function AreAnagrams(s1, s2)
begin

    if TOUPPER(s1) = TOUPPER(s2) then
        begin
            return false
        end
    else
        begin

            ConstructHistogram(table1, s1)
            ConstructHistogram(table2, s2)

            for i = 1 to 26
                if table1[i] <> table2[i] then
                    return false

            return true

        end

end

With this helper function in place, we could move the solution one step further. Observe the way in which we are testing two tables for equality. Instead of comparing entries in two tables (table1[i] <> table2[i]), we could subtract their entries and compare it against zero (table1[i] – table2[i] <> 0). This leads to the idea to use one table rather than two: First string would cause values in the table to be incremented by one, while the second string would cause values to be decremented. Once done, we would only need to verify that all values in the table are zero. Here is the pseudocode:

function ConstructHistogram(table, s, increment)
begin
    for letter in s
        if letter <> ' '
            table[letter] = table[letter] + increment
end

function AreAnagrams(s1, s2)
begin

    if TOUPPER(s1) = TOUPPER(s2) then
        begin
            return false
        end
    else
        begin

            ConstructHistogram(table, s1, 1)
            ConstructHistogram(table, s2, -1)

            for i = 1 to 26
                if table[i] <> 0 then
                    return false
                return true

        end

end

Now observe that we are not constrained to 26 English alphabet letters. This makes the table very large, leading to high complexity of the comparing loop at the end of the function. This loop would easily take several orders of magnitude more size than combined lengths of the two strings we are comparing. So the table idea must be dropped in favor of a more space-efficient solution. Note, however, that 26-rows long table remains quite an efficient solution for ASCII anagrams comparison.

Instead of table with predetermined size, we might use a different data structure which would allow us to add and remove letters accompanied with their counts. If one letter appears for the first time, we want to add it to the data structure with count one. Next time the same letter appears, we would increase its count to two. When the second string is processed, counts would be decremented by one each time a letter is encountered. If at any time count for a given letter reaches zero, the whole entry is removed from the data structure. Finally, if two strings are anagrams, data structure containing letter appearances should be empty because two strings should negate each other completely.

Appropriate data structure for this task is hashtable. It has space complexity O(n), i.e. its size grows linearly with number of items added to it. Its time complexity is O(1) for add, query and delete operations. These properties make hashtable perfect candidate for the anagrams problem. Here is the pseudocode which relies on hashtables:

function ConstructHistogram(hashtable, s, increment)
begin
    for letter in s
        begin
            if letter <> ' ' then
                begin

                    item = TOUPPER(letter)

                    if not hashtable.Contains(item) then
                        hashtable.Add(item, increment)
                    else if hashtable.GetValue(item) = -increment then
                        hashtable.Delete(item)
                    else
                        hashtable.SetValue(item, hashtable.GetValue(item) + increment)

                end
        end
end

function AreAnagrams(s1, s2)
begin

    if TOUPPER(s1) = TOUPPER(s2) then
        begin
            return false
        end
    else
        begin
            ConstructHistogram(hashtable, s1, 1)
            ConstructHistogram(hashtable, s2, -1)
            return hashtable.IsEmpty
        end

end

Space complexity of this solution depends only on length of the longer of the two strings. If lengths are m and n, then space complexity is O(max(m, n)). Similarly, time complexity of the ConstructHistogram function is proportional to length of the string submitted, because all hashtable operations performed in the function require O(1) time. This means that overall time complexity of the function is O(m + n), because ConstructHistogram function is invoked independently for the two strings.

Implementation

Below is the implementation of a console application in C# which lets user enter two strings and then tests whether they are anagrams.

using System;
using System.Collections;

namespace Anagrams
{

    public class Program
    {

        static void ConstructHistogram(Hashtable table, string s, int increment)
        {
            foreach (char c in s)
                if (c != ' ')
                {

                    char letter = new string(new char[] { c }).ToUpper()[0]; // Convert character to upper case

                    if (!table.Contains(letter))
                        table.Add(letter, increment);
                    else if ((int)table[letter] == -increment)
                        table.Remove(letter);
                    else
                        table[letter] = (int)table[letter] + increment;

                }
        }

        static bool AreAnagrams(string s1, string s2)
        {
            if (string.Compare(s1, s2, true) == 0)   // Ignore case
            {
                return false;   // Equal strings are not anagrams
            }
            else
            {

                Hashtable table = new Hashtable();

                ConstructHistogram(table, s1, 1);
                ConstructHistogram(table, s2, -1);

                return table.Count == 0;

            }
        }

        static void Main(string[] args)
        {

            while (true)
            {

                Console.Write("Enter first string (empty to exit): ");
                string s1 = Console.ReadLine();

                if (s1 == string.Empty)
                    break;

                Console.Write("Enter second string: ");
                string s2 = Console.ReadLine();

                if (AreAnagrams(s1, s2))
                    Console.WriteLine("These strings are anagrams.");
                else
                    Console.WriteLine("These strings are not anagrams.");

                Console.WriteLine();

            }
        }

    }
}

Demonstration

When this application is run, it produces the following output:

Enter first string (empty to exit): silence
Enter second string: license
These strings are anagrams.

Enter first string (empty to exit): licenses
Enter second string: silence
These strings are not anagrams.

Enter first string (empty to exit): William Shakespeare
Enter second string: I am a weakish speller
These strings are anagrams.

Enter first string (empty to exit):

See also:

Published: Dec 18, 2013

ZORAN HORVAT

Zoran is software architect dedicated to clean design and CTO in a growing software company. Since 2014 Zoran is an author at Pluralsight where he is preparing a series of courses on design patterns, writing unit and integration tests and applying methods to improve code design and long-term maintainability.

Follow him on Twitter @zoranh75 to receive updates and links to new articles.

Watch Zoran's video courses at pluralsight.com (requires registration):

Tactical Design Patterns in .NET: Managing Responsibilities

Applying a design pattern to a real-world problem is not as straightforward as literature implicitly tells us. It is a more engaged process. This course gives an insight into tactical decisions we need to make when applying design patterns that have to do with separating and implementing class responsibilities. More...

Tactical Design Patterns in .NET: Control Flow

Improve your skills in writing simpler and safer code by applying coding practices and design patterns that are affecting control flow. More...

Improving Testability Through Design

This course tackles the issues of designing a complex application so that it can be covered with high quality tests. More...

Share this article

webmasters