by Zoran Horvat
We often need to block unwanted messages from passing through a system. Elaborate solutions exist, but very often they are overkill. Most unwanted messages can be removed by a simple blacklist filter: a list of forbidden words.
If we are not willing to invest in a large solution, we can implement a simple LINQ expression which detects blacklisted words in a block of text:
bool IsSpam(string text, IEnumerable<string> wordBlacklist)
{
    string pattern = @"\b[\p{L}]+\b";
    return
        Regex.Matches(text, pattern)
            .Cast<Match>()                                // Extract matches
            .Select(match => match.Value.ToLower())       // Convert to lower case
            .Where(word => wordBlacklist.Contains(word))  // Find in blacklist
            .Any();                                       // Stop when first match found
}
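This implementation is based on a regular expression which detects words in plain text: \b[\p{L}]+\b matches a run of one or more Unicode letters delimited by word boundaries. Because each word is converted to lower case before the lookup, the blacklist entries must be lower case as well. The expression can be changed to fit different needs. Please refer to "Regex and LINQ Query to Split Text into Distinct Words" for more options.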
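To see what the pattern actually extracts, the following sketch lists the words it finds in a short sample and compares them with the result of a variant pattern. The `ExtractWords` helper, the sample sentence, and the second pattern (which also accepts digits and apostrophes inside a word) are illustrative assumptions, not part of the original implementation.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text.RegularExpressions;

public static class WordPatternDemo
{
    // Hypothetical helper: returns the lower-cased words matched by the given pattern.
    public static List<string> ExtractWords(string text, string pattern)
    {
        return Regex.Matches(text, pattern)
            .Cast<Match>()
            .Select(match => match.Value.ToLowerInvariant())
            .ToList();
    }

    public static void Main()
    {
        string sample = "Don't buy 2 purses!";

        // Pattern from the article: runs of Unicode letters only.
        string lettersOnly = @"\b[\p{L}]+\b";

        // Assumed variant: letters or digits, with apostrophes allowed inside a word.
        string withDigitsAndApostrophes = @"[\p{L}\p{Nd}]+(?:'[\p{L}\p{Nd}]+)*";

        // prints: don, t, buy, purses
        Console.WriteLine(string.Join(", ", ExtractWords(sample, lettersOnly)));

        // prints: don't, buy, 2, purses
        Console.WriteLine(string.Join(", ", ExtractWords(sample, withDigitsAndApostrophes)));
    }
}
```

Note that neither pattern turns "purses" into "purse": the filter compares whole words, so inflected forms have to be listed in the blacklist explicitly if they should be caught.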
Let’s try this implementation on a text segment taken from Ernest Hemingway’s "The Old Man and the Sea". In this demonstration, we assume that messages containing the words "purse", "masculine" or "buy" are spam and should be eliminated.
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text.RegularExpressions;

namespace MinWeightPath
{
    class Program
    {
        // Returns true if the text contains at least one blacklisted word.
        static bool IsSpam(string text, IEnumerable<string> wordBlacklist)
        {
            string pattern = @"\b[\p{L}]+\b";
            return
                Regex.Matches(text, pattern)
                    .Cast<Match>()                                // Extract matches
                    .Select(match => match.Value.ToLower())       // Convert to lower case
                    .Where(word => wordBlacklist.Contains(word))  // Find in blacklist
                    .Any();                                       // Stop when first match found
        }

        static void Main(string[] args)
        {
            string text =
                "He always thought of the sea as 'la mar'\n" +
                "which is what people call her in Spanish\n" +
                "when they love her. Sometimes those who\n" +
                "love her say bad things of her but they\n" +
                "are always said as though she were a woman.\n" +
                "Some of the younger fishermen, those who\n" +
                "used buoys as floats for their lines and\n" +
                "had motorboats, bought when the shark\n" +
                "livers had brought much money, spoke of\n" +
                "her as 'el mar' which is masculine.\n" +
                "They spoke of her as a contestant or a\n" +
                "place or even an enemy. But the old man\n" +
                "always thought of her as feminine and\n" +
                "as something that gave or withheld\n" +
                "great favours, and if she did wild or\n" +
                "wicked things it was because she\n" +
                "could not help them. The moon affects\n" +
                "her as it does a woman, he thought.";

            // Blacklist entries must be lower-case string literals.
            string[] blacklist = { "purse", "masculine", "buy" };

            if (IsSpam(text, blacklist))
                Console.WriteLine("Ernest Hemingway is marked as spammer.");

            Console.ReadLine();
        }
    }
}
When this code is run, it produces the following output:
Ernest Hemingway is marked as spammer.
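The passage is flagged because it contains the word "masculine". The words "buoys" and "bought" do not trigger the "buy" entry, since the filter compares complete words rather than substrings. The sketch below makes this visible by reporting which blacklisted words were found, using a few lines of the same passage. The `FindBlacklistedWords` helper and the use of a case-insensitive `HashSet<string>` for lookups are assumptions added for illustration, not part of the original code.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text.RegularExpressions;

public static class BlacklistReport
{
    // Hypothetical helper (not in the original article): returns the distinct
    // blacklisted words that occur in the text as whole words.
    public static List<string> FindBlacklistedWords(string text, IEnumerable<string> wordBlacklist)
    {
        // A case-insensitive set gives constant-time lookups instead of a linear scan.
        var blacklist = new HashSet<string>(wordBlacklist, StringComparer.OrdinalIgnoreCase);

        return Regex.Matches(text, @"\b[\p{L}]+\b")
            .Cast<Match>()
            .Select(match => match.Value.ToLowerInvariant())
            .Where(word => blacklist.Contains(word))
            .Distinct()
            .ToList();
    }

    public static void Main()
    {
        string text =
            "used buoys as floats for their lines and\n" +
            "had motorboats, bought when the shark\n" +
            "livers had brought much money, spoke of\n" +
            "her as 'el mar' which is masculine.\n";

        string[] blacklist = { "purse", "masculine", "buy" };

        // "buoys" and "bought" are not reported: only whole words are compared.
        // prints: masculine
        Console.WriteLine(string.Join(", ", FindBlacklistedWords(text, blacklist)));
    }
}
```

Returning the matched words instead of a Boolean gives up the early exit that `Any()` provides, so this form fits diagnostics better than high-volume filtering.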
Zoran Horvat is the Principal Consultant at Coding Helmet, a speaker, the author of 100+ articles, and an independent trainer on the .NET technology stack. He can often be found speaking at conferences and user groups, promoting object-oriented and functional development style and clean coding practices and techniques that improve the longevity of complex business applications.