# Testing If Two Strings are Anagrams

by Zoran Horvat
Dec 18, 2013

## Problem Statement

Two strings are given, each representing one or more words. There are no punctuation characters in these strings. Words are separated by single whitespace character. The task is to write a function which tests whether these strings are anagrams. Two strings are anagrams if they are not equal, but use exactly the same rearranged set of letters. Letter casing and blank space characters should be ignored. Function should return value true if two strings are anagrams and value false otherwise.

Examples: Words "license" and "silence" are anagrams and function should return true. "William Shakespeare" is anagram with "I am a weakish speller".

## Problem Analysis

According to problem statement, two strings are anagrams if they use the same set (and count) of letters, only in different order. First conclusion that we can make is that two strings are not anagrams if they are equal (case ignored). So the core of the solution might look like this:

``````function AreAnagrams(s1, s2)
begin
if TOUPPER(s1) = TOUPPER(s2) then
return false
else
...
end
``````

The helper function TOUPPER converts all letters in the string to upper case.

Further on, we need to know that each letter used in one string appears exactly the same number of times in the other string. We can count occurrences of a letter by constructing a table with as many entries as there are different letters. If these letters were ASCII characters only (which is not the case in this problem), that would make 26 distinct letters. For the William Shakespeare (I am a weakish speller) anagram, histogram tables for the two strings would look like this:

William Shakespeare I am a weakish speller
Letter Count Letter Count
A 3 A 3
E 3 E 3
H 1 H 1
I 2 I 2
K 1 K 1
L 2 L 2
M 1 M 1
P 1 P 1
R 1 R 1
S 2 S 2
W 1 W 1

Note that remaining rows, those that would contain count zero, are omitted for clarity. In real counting these rows would still exist and would all contain values zero should we use the table as the underlying data structure. Now, the process of comparing two potential anagram strings might look like this:

``````function AreAnagrams(s1, s2)
begin

if TOUPPER(s1) = TOUPPER(s2) then
begin
return false
end
else
begin

for letter in s1
if letter <> ' ' then
table1[TOUPPER(letter)] = table1[TOUPPER(letter)] + 1

for letter in s2
if letter <> ' ' then
table2[TOUPPER(letter)] = table2[TOUPPER(letter)] + 1

for i = 1 to 26
if table1[i] <> table2[i] then
return false

return true

end

end
``````

In this function, we are first testing whether two strings are exactly equal. If strings are equal, then they are not anagrams. Otherwise, we iterate through the two strings to construct two tables with counts. We could even pull the table construction to a helper function:

``````function ConstructHistogram(table, string s)
begin
for letter in s
if letter <> ' ' then
table[TOUPPER(letter)] = table[TOUPPER(letter)] + 1
end

function AreAnagrams(s1, s2)
begin

if TOUPPER(s1) = TOUPPER(s2) then
begin
return false
end
else
begin

ConstructHistogram(table1, s1)
ConstructHistogram(table2, s2)

for i = 1 to 26
if table1[i] <> table2[i] then
return false

return true

end

end
``````

With this helper function in place, we could move the solution one step further. Observe the way in which we are testing two tables for equality. Instead of comparing entries in two tables (table1[i] <> table2[i]), we could subtract their entries and compare it against zero (table1[i] – table2[i] <> 0). This leads to the idea to use one table rather than two: First string would cause values in the table to be incremented by one, while the second string would cause values to be decremented. Once done, we would only need to verify that all values in the table are zero. Here is the pseudocode:

``````function ConstructHistogram(table, s, increment)
begin
for letter in s
if letter <> ' '
table[letter] = table[letter] + increment
end

function AreAnagrams(s1, s2)
begin

if TOUPPER(s1) = TOUPPER(s2) then
begin
return false
end
else
begin

ConstructHistogram(table, s1, 1)
ConstructHistogram(table, s2, -1)

for i = 1 to 26
if table[i] <> 0 then
return false
return true

end

end
``````

Now observe that we are not constrained to 26 English alphabet letters. This makes the table very large, leading to high complexity of the comparing loop at the end of the function. This loop would easily take several orders of magnitude more size than combined lengths of the two strings we are comparing. So the table idea must be dropped in favor of a more space-efficient solution. Note, however, that 26-rows long table remains quite an efficient solution for ASCII anagrams comparison.

Instead of table with predetermined size, we might use a different data structure which would allow us to add and remove letters accompanied with their counts. If one letter appears for the first time, we want to add it to the data structure with count one. Next time the same letter appears, we would increase its count to two. When the second string is processed, counts would be decremented by one each time a letter is encountered. If at any time count for a given letter reaches zero, the whole entry is removed from the data structure. Finally, if two strings are anagrams, data structure containing letter appearances should be empty because two strings should negate each other completely.

Appropriate data structure for this task is hashtable. It has space complexity O(n), i.e. its size grows linearly with number of items added to it. Its time complexity is O(1) for add, query and delete operations. These properties make hashtable perfect candidate for the anagrams problem. Here is the pseudocode which relies on hashtables:

``````function ConstructHistogram(hashtable, s, increment)
begin
for letter in s
begin
if letter <> ' ' then
begin

item = TOUPPER(letter)

if not hashtable.Contains(item) then
else if hashtable.GetValue(item) = -increment then
hashtable.Delete(item)
else
hashtable.SetValue(item, hashtable.GetValue(item) + increment)

end
end
end

function AreAnagrams(s1, s2)
begin

if TOUPPER(s1) = TOUPPER(s2) then
begin
return false
end
else
begin
ConstructHistogram(hashtable, s1, 1)
ConstructHistogram(hashtable, s2, -1)
return hashtable.IsEmpty
end

end
``````

Space complexity of this solution depends only on length of the longer of the two strings. If lengths are m and n, then space complexity is O(max(m, n)). Similarly, time complexity of the ConstructHistogram function is proportional to length of the string submitted, because all hashtable operations performed in the function require O(1) time. This means that overall time complexity of the function is O(m + n), because ConstructHistogram function is invoked independently for the two strings.

## Implementation

Below is the implementation of a console application in C# which lets user enter two strings and then tests whether they are anagrams.

``````using System;
using System.Collections;

namespace Anagrams
{

public class Program
{

static void ConstructHistogram(Hashtable table, string s, int increment)
{
foreach (char c in s)
if (c != ' ')
{

char letter = new string(new char[] { c }).ToUpper(); // Convert character to upper case

if (!table.Contains(letter))
else if ((int)table[letter] == -increment)
table.Remove(letter);
else
table[letter] = (int)table[letter] + increment;

}
}

static bool AreAnagrams(string s1, string s2)
{
if (string.Compare(s1, s2, true) == 0)   // Ignore case
{
return false;   // Equal strings are not anagrams
}
else
{

Hashtable table = new Hashtable();

ConstructHistogram(table, s1, 1);
ConstructHistogram(table, s2, -1);

return table.Count == 0;

}
}

static void Main(string[] args)
{

while (true)
{

Console.Write("Enter first string (empty to exit): ");

if (s1 == string.Empty)
break;

Console.Write("Enter second string: ");

if (AreAnagrams(s1, s2))
Console.WriteLine("These strings are anagrams.");
else
Console.WriteLine("These strings are not anagrams.");

Console.WriteLine();

}
}

}
}
``````

## Demonstration

When this application is run, it produces the following output:

``````Enter first string (empty to exit): silence
These strings are anagrams.

Enter first string (empty to exit): licenses
Enter second string: silence
These strings are not anagrams.

Enter first string (empty to exit): William Shakespeare
Enter second string: I am a weakish speller
These strings are anagrams.

Enter first string (empty to exit):
`````` 