by Zoran Horvat
Two strings are given, each representing one or more words. There are no punctuation characters in these strings. Words are separated by single whitespace character. The task is to write a function which tests whether these strings are anagrams. Two strings are anagrams if they are not equal, but use exactly the same rearranged set of letters. Letter casing and blank space characters should be ignored. Function should return value true if two strings are anagrams and value false otherwise.
Examples: Words "license" and "silence" are anagrams and function should return true. "William Shakespeare" is anagram with "I am a weakish speller".
According to problem statement, two strings are anagrams if they use the same set (and count) of letters, only in different order. First conclusion that we can make is that two strings are not anagrams if they are equal (case ignored). So the core of the solution might look like this:
function AreAnagrams(s1, s2)
begin
if TOUPPER(s1) = TOUPPER(s2) then
return false
else
...
end
The helper function TOUPPER converts all letters in the string to upper case.
Further on, we need to know that each letter used in one string appears exactly the same number of times in the other string. We can count occurrences of a letter by constructing a table with as many entries as there are different letters. If these letters were ASCII characters only (which is not the case in this problem), that would make 26 distinct letters. For the William Shakespeare (I am a weakish speller) anagram, histogram tables for the two strings would look like this:
William Shakespeare | I am a weakish speller | ||
---|---|---|---|
Letter | Count | Letter | Count |
A | 3 | A | 3 |
E | 3 | E | 3 |
H | 1 | H | 1 |
I | 2 | I | 2 |
K | 1 | K | 1 |
L | 2 | L | 2 |
M | 1 | M | 1 |
P | 1 | P | 1 |
R | 1 | R | 1 |
S | 2 | S | 2 |
W | 1 | W | 1 |
Note that remaining rows, those that would contain count zero, are omitted for clarity. In real counting these rows would still exist and would all contain values zero should we use the table as the underlying data structure. Now, the process of comparing two potential anagram strings might look like this:
function AreAnagrams(s1, s2)
begin
if TOUPPER(s1) = TOUPPER(s2) then
begin
return false
end
else
begin
for letter in s1
if letter <> ' ' then
table1[TOUPPER(letter)] = table1[TOUPPER(letter)] + 1
for letter in s2
if letter <> ' ' then
table2[TOUPPER(letter)] = table2[TOUPPER(letter)] + 1
for i = 1 to 26
if table1[i] <> table2[i] then
return false
return true
end
end
In this function, we are first testing whether two strings are exactly equal. If strings are equal, then they are not anagrams. Otherwise, we iterate through the two strings to construct two tables with counts. We could even pull the table construction to a helper function:
function ConstructHistogram(table, string s)
begin
for letter in s
if letter <> ' ' then
table[TOUPPER(letter)] = table[TOUPPER(letter)] + 1
end
function AreAnagrams(s1, s2)
begin
if TOUPPER(s1) = TOUPPER(s2) then
begin
return false
end
else
begin
ConstructHistogram(table1, s1)
ConstructHistogram(table2, s2)
for i = 1 to 26
if table1[i] <> table2[i] then
return false
return true
end
end
With this helper function in place, we could move the solution one step further. Observe the way in which we are testing two tables for equality. Instead of comparing entries in two tables (table1[i] <> table2[i]), we could subtract their entries and compare it against zero (table1[i] – table2[i] <> 0). This leads to the idea to use one table rather than two: First string would cause values in the table to be incremented by one, while the second string would cause values to be decremented. Once done, we would only need to verify that all values in the table are zero. Here is the pseudocode:
function ConstructHistogram(table, s, increment)
begin
for letter in s
if letter <> ' '
table[letter] = table[letter] + increment
end
function AreAnagrams(s1, s2)
begin
if TOUPPER(s1) = TOUPPER(s2) then
begin
return false
end
else
begin
ConstructHistogram(table, s1, 1)
ConstructHistogram(table, s2, -1)
for i = 1 to 26
if table[i] <> 0 then
return false
return true
end
end
Now observe that we are not constrained to 26 English alphabet letters. This makes the table very large, leading to high complexity of the comparing loop at the end of the function. This loop would easily take several orders of magnitude more size than combined lengths of the two strings we are comparing. So the table idea must be dropped in favor of a more space-efficient solution. Note, however, that 26-rows long table remains quite an efficient solution for ASCII anagrams comparison.
Instead of table with predetermined size, we might use a different data structure which would allow us to add and remove letters accompanied with their counts. If one letter appears for the first time, we want to add it to the data structure with count one. Next time the same letter appears, we would increase its count to two. When the second string is processed, counts would be decremented by one each time a letter is encountered. If at any time count for a given letter reaches zero, the whole entry is removed from the data structure. Finally, if two strings are anagrams, data structure containing letter appearances should be empty because two strings should negate each other completely.
Appropriate data structure for this task is hashtable. It has space complexity O(n), i.e. its size grows linearly with number of items added to it. Its time complexity is O(1) for add, query and delete operations. These properties make hashtable perfect candidate for the anagrams problem. Here is the pseudocode which relies on hashtables:
function ConstructHistogram(hashtable, s, increment)
begin
for letter in s
begin
if letter <> ' ' then
begin
item = TOUPPER(letter)
if not hashtable.Contains(item) then
hashtable.Add(item, increment)
else if hashtable.GetValue(item) = -increment then
hashtable.Delete(item)
else
hashtable.SetValue(item, hashtable.GetValue(item) + increment)
end
end
end
function AreAnagrams(s1, s2)
begin
if TOUPPER(s1) = TOUPPER(s2) then
begin
return false
end
else
begin
ConstructHistogram(hashtable, s1, 1)
ConstructHistogram(hashtable, s2, -1)
return hashtable.IsEmpty
end
end
Space complexity of this solution depends only on length of the longer of the two strings. If lengths are m and n, then space complexity is O(max(m, n)). Similarly, time complexity of the ConstructHistogram function is proportional to length of the string submitted, because all hashtable operations performed in the function require O(1) time. This means that overall time complexity of the function is O(m + n), because ConstructHistogram function is invoked independently for the two strings.
Below is the implementation of a console application in C# which lets user enter two strings and then tests whether they are anagrams.
using System;
using System.Collections;
namespace Anagrams
{
public class Program
{
static void ConstructHistogram(Hashtable table, string s, int increment)
{
foreach (char c in s)
if (c != ' ')
{
char letter = new string(new char[] { c }).ToUpper()[0]; // Convert character to upper case
if (!table.Contains(letter))
table.Add(letter, increment);
else if ((int)table[letter] == -increment)
table.Remove(letter);
else
table[letter] = (int)table[letter] + increment;
}
}
static bool AreAnagrams(string s1, string s2)
{
if (string.Compare(s1, s2, true) == 0) // Ignore case
{
return false; // Equal strings are not anagrams
}
else
{
Hashtable table = new Hashtable();
ConstructHistogram(table, s1, 1);
ConstructHistogram(table, s2, -1);
return table.Count == 0;
}
}
static void Main(string[] args)
{
while (true)
{
Console.Write("Enter first string (empty to exit): ");
string s1 = Console.ReadLine();
if (s1 == string.Empty)
break;
Console.Write("Enter second string: ");
string s2 = Console.ReadLine();
if (AreAnagrams(s1, s2))
Console.WriteLine("These strings are anagrams.");
else
Console.WriteLine("These strings are not anagrams.");
Console.WriteLine();
}
}
}
}
When this application is run, it produces the following output:
Enter first string (empty to exit): silence
Enter second string: license
These strings are anagrams.
Enter first string (empty to exit): licenses
Enter second string: silence
These strings are not anagrams.
Enter first string (empty to exit): William Shakespeare
Enter second string: I am a weakish speller
These strings are anagrams.
Enter first string (empty to exit):
If you wish to learn more, please watch my latest video courses
In this course, you will learn the basic principles of object-oriented programming, and then learn how to apply those principles to construct an operational and correct code using the C# programming language and .NET.
As the course progresses, you will learn such programming concepts as objects, method resolution, polymorphism, object composition, class inheritance, object substitution, etc., but also the basic principles of object-oriented design and even project management, such as abstraction, dependency injection, open-closed principle, tell don't ask principle, the principles of agile software development and many more.
More...
In this course, you will learn how design patterns can be applied to make code better: flexible, short, readable.
You will learn how to decide when and which pattern to apply by formally analyzing the need to flex around specific axis.
More...
This course begins with examination of a realistic application, which is poorly factored and doesn't incorporate design patterns. It is nearly impossible to maintain and develop this application further, due to its poor structure and design.
As demonstration after demonstration will unfold, we will refactor this entire application, fitting many design patterns into place almost without effort. By the end of the course, you will know how code refactoring and design patterns can operate together, and help each other create great design.
More...
In four and a half hours of this course, you will learn how to control design of classes, design of complex algorithms, and how to recognize and implement data structures.
After completing this course, you will know how to develop a large and complex domain model, which you will be able to maintain and extend further. And, not to forget, the model you develop in this way will be correct and free of bugs.
More...
Zoran Horvat is the Principal Consultant at Coding Helmet, speaker and author of 100+ articles, and independent trainer on .NET technology stack. He can often be found speaking at conferences and user groups, promoting object-oriented and functional development style and clean coding practices and techniques that improve longevity of complex business applications.