How to Validate String Properties in Business Objects

by Zoran Horvat

In a typical multi-tier application, model classes are separated from the data access layer (DAL). The business layer operates in terms of model objects and then passes those objects to the DAL to be stored in the database. Problems begin when model objects are not quite valid.

In this article we are going to demonstrate problems that can hit an application when model validation is neglected. The application will run on a SQL Server database through an Entity Framework data model. We will identify and rectify problems in the business logic layer, so that invalid objects cannot reach the DAL in the first place. That is the most efficient, reliable and flexible solution. It is efficient because invalid data do not travel all the way to the database only to be rejected by mechanisms built into the database. It is reliable because the application's correctness does not depend on protective measures being present in the database; protection is built into the business layer itself. Finally, it is flexible, because the application can observe model inconsistencies directly, rather than coping with often cryptic messages wrapped in database exceptions.

In the following sections we are going to design a very small application that demonstrates basic validation techniques that can be applied to string properties in business objects.

Example Model

Observe the following database create script:

CREATE DATABASE ValidationTest
GO

USE ValidationTest
GO

CREATE TABLE SomeData
(
    SomeDataID INT NOT NULL IDENTITY PRIMARY KEY,
    InternationalValue NVARCHAR(10) NULL,
    LocalValue VARCHAR(10) NOT NULL
)
GO

These statements create a very simple database with only one table, named SomeData. In particular, we are interested in storing two strings: one encoded in Unicode (NVARCHAR), the other in a non-Unicode, single-byte code page (VARCHAR), which we will treat as plain ASCII. It is important to notice the encoding difference early on, because it will shape some of our decisions when we come to the model implementation.

At this point, we can generate an Entity Framework data model from the database:

[Figure: Entity Model]

The next step is to write the business layer. SomeData entities will be represented by the corresponding model class:

namespace ValidationDemo
{
    public class SomeDataModel
    {
        public string InternationalValue { get; set; }
        public string LocalValue { get; set; }
    }
}

Now we are ready to write some demonstration code. Below is a very simple console application which repeatedly prompts the user to enter the content of a model object (the Main function plays the role of the business logic layer) and then passes the model to the SaveObject method, which conveniently plays the role of the DAL.

using System;
using System.Collections.Generic;
using System.Linq;
using System.Text;   // Encoding, used for console input/output

namespace ValidationDemo
{
    class Program
    {
        // Plays the role of the DAL: copies the model into an entity and saves it.
        static void SaveObject(SomeDataModel model)
        {
            try
            {
                using (ValidationTestEntities ent = new ValidationTestEntities())
                {
                    SomeData sd = ent.SomeData.CreateObject();

                    sd.InternationalValue = model.InternationalValue;
                    sd.LocalValue = model.LocalValue;

                    ent.SomeData.AddObject(sd);
                    ent.SaveChanges();
                }
            }
            catch (Exception ex)
            {
                // Print the whole chain of messages; the database error
                // is usually buried in an inner exception.
                Console.WriteLine("Error saving model object:");
                while (ex != null)
                {
                    Console.WriteLine(ex.Message);
                    ex = ex.InnerException;
                }
            }
        }

        static void PrintDatabaseContent()
        {
            using (ValidationTestEntities ent = new ValidationTestEntities())
            {
                IEnumerable<SomeData> data =
                    from sd in ent.SomeData
                    orderby sd.InternationalValue
                    select sd;

                Console.WriteLine();
                Console.WriteLine("Database content:");
                foreach (SomeData sd in data)
                    Console.WriteLine("{0,10} {1,10}", sd.InternationalValue, sd.LocalValue);
                Console.WriteLine(new string('-', 21));
                Console.WriteLine();
            }
        }

        // Plays the role of the business layer: reads model content from the user.
        static void Main(string[] args)
        {
            Console.InputEncoding = Encoding.Unicode;
            Console.OutputEncoding = Encoding.Unicode;

            while (true)
            {
                SomeDataModel model = new SomeDataModel();

                Console.Write("Enter international value (empty to exit): ");
                model.InternationalValue = Console.ReadLine();

                if (string.IsNullOrEmpty(model.InternationalValue))
                    break;

                Console.Write("                        Enter local value: ");
                model.LocalValue = Console.ReadLine();

                SaveObject(model);
                PrintDatabaseContent();
            }

            Console.Write("Press ENTER to continue... ");
            Console.ReadLine();
        }
    }
}

Although this code might look well built, we can quickly get into trouble:

            
Enter international value (empty to exit): Something
                        Enter local value: again

Database content:
 Something      again
---------------------

Enter international value (empty to exit): Something else
                        Enter local value: And one more
Error saving model object:
An error occurred while updating the entries. See the inner exception for details.
String or binary data would be truncated.
The statement has been terminated.

Database content:
 Something      again
---------------------

Enter international value (empty to exit):
Press ENTER to continue...
                
    

The first model object was successfully stored in the database, as the database content listing shows. But the second attempt fails miserably, with an exception thrown right from the database. Our model object, and the business layer as a whole, failed to notice that the user had entered strings longer than the corresponding database fields. No surprise that the database simply threw back the infamous "String or binary data would be truncated" error.

Limiting String Length

If we take a closer look at how the error was raised in the previous example, it becomes apparent that the root cause is that the model object merrily accepts strings of any length, even though each database field has a fixed maximum length and anything longer can never be stored in it.

To rectify the issue, we must change the model class. The first step in limiting string data is to apply the StringLengthAttribute from the System.ComponentModel.DataAnnotations namespace:

using System.ComponentModel.DataAnnotations;

namespace ValidationDemo
{
    public class SomeDataModel
    {

        [StringLength(10, ErrorMessage=
                      "InternationalValue cannot have more than 10 characters in length.")]
        public string InternationalValue { get; set; }

        [StringLength(10, ErrorMessage=
                      "LocalValue cannot have more than 10 characters in length.")]
        public string LocalValue { get; set; }

    }
}

But this attribute alone does not make any difference. It is just a declaration of the limit, not the limit itself. To make any use of the attribute, we must actually validate the model object. This is most conveniently done with the Validator utility class from the System.ComponentModel.DataAnnotations namespace. Here is the method (which naturally belongs to the business layer) that validates the model object before it is sent to the DAL:


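// Requires: using System.Collections.Generic;
//           using System.ComponentModel.DataAnnotations;
static bool ValidateObject(SomeDataModel model)
{
    List<ValidationResult> errors = new List<ValidationResult>();
    ValidationContext context = new ValidationContext(model, null, null);

    // The last argument (validateAllProperties = true) makes the Validator
    // check every ValidationAttribute on every property; with false it
    // would check only [Required].
    if (!Validator.TryValidateObject(model, context, errors, true))
    {
        Console.WriteLine("Cannot save data:");
        foreach (ValidationResult e in errors)
            Console.WriteLine(e.ErrorMessage);
        return false;
    }

    return true;
}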

Instead of simply calling the SaveObject method with the model object at hand, the business layer is now required to validate the model first:

if (ValidateObject(model))
    SaveObject(model);

When the modified application is run, the output looks quite different:

            
Enter international value (empty to exit): Something
                        Enter local value: again

Database content:
 Something      again
---------------------

Enter international value (empty to exit): Something quite new
                        Enter local value: and something else
Cannot save data:
InternationalValue cannot have more than 10 characters in length.
LocalValue cannot have more than 10 characters in length.

Database content:
 Something      again
---------------------

Enter international value (empty to exit):
Press ENTER to continue...
                
    

As you can see, there are no exceptions this time. Model validation has been performed by the business layer using the Validator class. This class provides utility methods, such as TryValidateObject, that inspect the supplied object and verify that every attribute derived from ValidationAttribute (System.ComponentModel.DataAnnotations namespace) applied to the object and its properties reports the content as valid (when its validateAllProperties argument is true, as in ValidateObject above). Only when all ValidationAttributes are satisfied do we pass the model object on for further processing, free of worries about its fate.
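The role of the last argument passed to TryValidateObject is easy to miss. According to the .NET documentation, when validateAllProperties is false the Validator checks only RequiredAttribute on properties, so a StringLengthAttribute violation goes unnoticed. The minimal sketch below shows the difference; the Sample and ValidationSketch classes are hypothetical stand-ins invented for this illustration, while Validator, ValidationContext and ValidationResult come from System.ComponentModel.DataAnnotations.

```csharp
using System;
using System.Collections.Generic;
using System.ComponentModel.DataAnnotations;

// Hypothetical stand-in for SomeDataModel, used only in this sketch.
public class Sample
{
    [StringLength(10, ErrorMessage = "Value cannot have more than 10 characters in length.")]
    public string Value { get; set; }
}

// Hypothetical helper class for this illustration.
public static class ValidationSketch
{
    // Runs Validator.TryValidateObject and returns the collected error messages.
    public static List<string> Errors(object model, bool validateAllProperties)
    {
        var results = new List<ValidationResult>();
        var context = new ValidationContext(model, null, null);
        Validator.TryValidateObject(model, context, results, validateAllProperties);

        var messages = new List<string>();
        foreach (ValidationResult r in results)
            messages.Add(r.ErrorMessage);
        return messages;
    }

    public static void Main()
    {
        var tooLong = new Sample { Value = "Something quite new" };

        // false: only [Required] is checked, so [StringLength] is ignored.
        Console.WriteLine(Errors(tooLong, false).Count); // 0
        // true: every ValidationAttribute on every property is checked.
        Console.WriteLine(Errors(tooLong, true).Count);  // 1
    }
}
```

This is why ValidateObject passes true as the last argument: with false, the length limits declared on the model would never be enforced.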

Supporting ASCII Validation

By applying the StringLengthAttribute to string properties, we have enforced a maximum length on their values, so validation fails if the user supplies strings that are too long. However, this solution has one more problem. Let's run the program again and see what happens when we enter a word in, say, Russian:

            
Enter international value (empty to exit): Хорошо
                        Enter local value: Something

Database content:
    Хорошо  Something
---------------------

Enter international value (empty to exit): Something
                        Enter local value: Хорошо

Database content:
 Something     ??????
    Хорошо  Something
---------------------

Enter international value (empty to exit):
Press ENTER to continue...
                
    

The first pass went quite well. This should not surprise us, because the InternationalValue property is stored in an NVARCHAR field, i.e. a field with UTF-16 text encoding. However, the second round went terribly wrong when the same word was sent into the non-Unicode VARCHAR field (LocalValue). There were no validation errors and no exceptions, but our data still went into the database as question marks, indicating that conversion into the target encoding had silently failed.
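The question marks come from a lossy conversion into the column's non-Unicode code page, which happens outside our code. The same kind of silent loss can be reproduced in .NET with Encoding.ASCII, whose default fallback replaces every character it cannot encode with '?' instead of throwing. The sketch below is only an analogy for what happened on the way into the database; the LossyConversion class is hypothetical.

```csharp
using System;
using System.Text;

// Hypothetical helper class for this illustration.
public static class LossyConversion
{
    // Encodes to ASCII and decodes back. Encoding.ASCII uses a replacement
    // fallback, so characters outside 0x00-0x7F silently become '?'.
    public static string RoundTripAscii(string text)
    {
        byte[] bytes = Encoding.ASCII.GetBytes(text);
        return Encoding.ASCII.GetString(bytes);
    }

    public static void Main()
    {
        Console.WriteLine(RoundTripAscii("Something")); // Something
        Console.WriteLine(RoundTripAscii("Хорошо"));    // ??????
    }
}
```

Nothing in this round trip signals an error, which is exactly why the problem has to be caught by validation before the data leaves the business layer.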

There is no built-in ValidationAttribute related to character encoding, but there is a simple trick to work around the problem: use the RegularExpressionAttribute with the pattern [\x00-\x7F]*. The character class covers exactly the range of character codes defined by ASCII (0 to 127). Although the pattern is not anchored, RegularExpressionAttribute accepts a value only if the match spans the entire string, so a single non-ASCII character makes validation fail. Here is the final model class, which protects us from any attempt to send non-ASCII characters through the LocalValue property:

using System.ComponentModel.DataAnnotations;

namespace ValidationDemo
{
    public class SomeDataModel
    {

        [StringLength(10, ErrorMessage=
            "InternationalValue cannot have more than 10 characters in length.")]
        public string InternationalValue { get; set; }

        [StringLength(10, ErrorMessage=
            "LocalValue cannot have more than 10 characters in length.")]
        [RegularExpression(@"[\x00-\x7F]*", ErrorMessage=
            "LocalValue can only contain ASCII characters.")]
        public string LocalValue { get; set; }

    }
}

And here is the demonstration:

            
Enter international value (empty to exit): Это хорошо
                        Enter local value: Something

Database content:
Это хорошо  Something
---------------------

Enter international value (empty to exit): Something
                        Enter local value: Это хорошо
Cannot save data:
LocalValue can only contain ASCII characters.

Database content:
Это хорошо  Something
---------------------

Enter international value (empty to exit):
Press ENTER to continue...
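A quick way to see how the attribute treats different inputs is to call its IsValid method directly. The sketch relies on two documented behaviors of RegularExpressionAttribute: a value passes only when the match starts at the first character and covers the whole string, and null or empty values are treated as valid (leaving that check to RequiredAttribute). The AsciiCheck class is a hypothetical name used only here.

```csharp
using System;
using System.ComponentModel.DataAnnotations;

// Hypothetical helper class for this illustration.
public static class AsciiCheck
{
    // Same pattern as on SomeDataModel.LocalValue.
    private static readonly RegularExpressionAttribute Ascii =
        new RegularExpressionAttribute(@"[\x00-\x7F]*");

    public static bool IsValid(string value) => Ascii.IsValid(value);

    public static void Main()
    {
        Console.WriteLine(IsValid("Something")); // True
        Console.WriteLine(IsValid("Хорошо"));    // False: only an empty match at index 0
        Console.WriteLine(IsValid("Take Это"));  // False: the match stops before "Это"
        Console.WriteLine(IsValid(""));          // True: empty values are not checked
    }
}
```

Without the whole-string rule, [\x00-\x7F]* would match the empty prefix of any string and accept everything.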
                
    

Supporting UTF-16 Surrogate Pairs

Many authors complain about the lack of support for UTF-16 surrogate pairs in production code. A surrogate pair is a pair of 16-bit UTF-16 code units that together represent a single character. Surrogate pairs exist because a single 16-bit code unit can take only 65,536 distinct values, which is not enough to represent every Unicode character. Single code units cover the most frequently used characters around the globe, a range called the Basic Multilingual Plane (BMP). Rarer characters from the so-called supplementary planes, which fall outside the BMP, are encoded as two consecutive 16-bit code units, called a surrogate pair. The problem with supporting supplementary planes is that programmers largely forget to test their code against characters such as this one: 𤨇. (Some Web browsers even fail to render this character correctly.)
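Here is a short sketch of how a supplementary character looks from .NET's point of view. It uses U+20000 (the first ideograph of CJK Extension B) rather than the character above, written as a \U escape so that the source file encoding does not matter; any character outside the BMP behaves the same way. The SurrogateSketch class is hypothetical.

```csharp
using System;

// Hypothetical helper class for this illustration.
public static class SurrogateSketch
{
    // Returns the code point starting at index 0, combining a surrogate
    // pair into a single value when necessary.
    public static int FirstCodePoint(string s) => char.ConvertToUtf32(s, 0);

    public static void Main()
    {
        // U+20000 lies outside the BMP, so UTF-16 stores it as two code units.
        string ch = "\U00020000";

        Console.WriteLine(ch.Length);                          // 2
        Console.WriteLine(char.IsHighSurrogate(ch[0]));        // True
        Console.WriteLine(char.IsLowSurrogate(ch[1]));         // True
        Console.WriteLine(char.IsSurrogatePair(ch[0], ch[1])); // True
        Console.WriteLine(FirstCodePoint(ch).ToString("X"));   // 20000
    }
}
```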

Luckily, the .NET Framework saves us from thinking about this problem. Let's try our demonstration code with a surrogate pair:

            
Enter international value (empty to exit): Это хорошо
                        Enter local value: Take this!

Database content:
Это хорошо Take this!
---------------------

Enter international value (empty to exit): Take𤨇this!
                        Enter local value: Take this!
Cannot save data:
InternationalValue cannot have more than 10 characters in length.

Database content:
Это хорошо Take this!
---------------------

Enter international value (empty to exit):
Press ENTER to continue...
                
    

The first attempt went fine. All the Russian letters used in the string come from the Basic Multilingual Plane, and each requires only one UTF-16 code unit. The second attempt, however, fails because the InternationalValue string, although it displays as ten characters, internally requires eleven 16-bit code units to represent them. To demonstrate, try this simple code:

string s = "Take𤨇this!";
Console.WriteLine(s.Length);

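When these two lines are executed, the value 11 is printed. This simple experiment explains why validation failed: StringLengthAttribute relies on the String.Length property, which counts UTF-16 code units rather than displayed characters, so each surrogate pair contributes 2 to the length. That is also the unit in which SQL Server measures NVARCHAR(10), whose limit is expressed in byte-pairs, so the validator correctly rejects a value that would not fit into the database field.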
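The gap between what the user sees and what the validator counts can be made explicit. The sketch below compares String.Length (UTF-16 code units, the measure StringLengthAttribute uses), StringInfo.LengthInTextElements from System.Globalization (user-perceived characters) and the UTF-16 byte count. As before, U+20000 stands in for the character used in the example, and the LengthSketch class is hypothetical.

```csharp
using System;
using System.ComponentModel.DataAnnotations;
using System.Globalization;
using System.Text;

// Hypothetical helper class for this illustration.
public static class LengthSketch
{
    // User-perceived characters (text elements).
    public static int TextElements(string s) => new StringInfo(s).LengthInTextElements;

    // The same check the model applies: at most 10 UTF-16 code units.
    public static bool FitsTen(string s) => new StringLengthAttribute(10).IsValid(s);

    public static void Main()
    {
        // "Take", one supplementary character (U+20000), "this!":
        // ten characters on screen.
        string s = "Take\U00020000this!";

        Console.WriteLine(s.Length);                         // 11 UTF-16 code units
        Console.WriteLine(TextElements(s));                  // 10 text elements
        Console.WriteLine(Encoding.Unicode.GetByteCount(s)); // 22 bytes = 11 byte-pairs
        Console.WriteLine(FitsTen(s));                       // False
    }
}
```

If a business rule were meant to limit displayed characters, StringInfo would be the right measure; for protecting an NVARCHAR column, the code-unit count used by StringLengthAttribute is the one that matters.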

Restricting User Input

There are two more tricks we wish to demonstrate about string validation. The first one deals with putting a lower limit on string length. Suppose that we wish to use LocalValue as a unique object identifier (e.g. a username). It would be normal to ask users to enter at least a couple of characters in this field. To enforce a minimum length on the string property, we use the MinimumLength property of the StringLengthAttribute.

But this doesn't protect us from entries consisting entirely of white space characters. To protect further, we can add the RequiredAttribute, which by default treats strings consisting only of white space as empty and rejects them; it also rejects null, which StringLengthAttribute lets through. Here is the model class decorated with all the attributes mentioned in this article:

using System.ComponentModel.DataAnnotations;

namespace ValidationDemo
{
    public class SomeDataModel
    {

        [StringLength(10, ErrorMessage=
                      "InternationalValue cannot have more than 10 characters in length.")]
        public string InternationalValue { get; set; }

        [StringLength(10, MinimumLength=3, ErrorMessage=
                      "LocalValue must be between 3 and 10 characters in length.")]
        [RegularExpression(@"[\x00-\x7F]*", ErrorMessage=
                           "LocalValue can only contain ASCII characters.")]
        [Required]
        public string LocalValue { get; set; }

    }
}
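To see the combined attributes at work, here is a minimal sketch that validates a few candidate LocalValue entries. It mirrors the final SomeDataModel class so that it compiles on its own, and FinalModelSketch.Errors is a hypothetical helper. The expected outcomes follow from each attribute's documented behavior: StringLength rejects "ab", RequiredAttribute rejects the all-spaces value (which StringLength and the ASCII pattern both accept) and null, and the regular expression rejects the Cyrillic value.

```csharp
using System;
using System.Collections.Generic;
using System.ComponentModel.DataAnnotations;

namespace ValidationDemo
{
    // Mirrors the final model class above.
    public class SomeDataModel
    {
        [StringLength(10, ErrorMessage =
            "InternationalValue cannot have more than 10 characters in length.")]
        public string InternationalValue { get; set; }

        [StringLength(10, MinimumLength = 3, ErrorMessage =
            "LocalValue must be between 3 and 10 characters in length.")]
        [RegularExpression(@"[\x00-\x7F]*", ErrorMessage =
            "LocalValue can only contain ASCII characters.")]
        [Required]
        public string LocalValue { get; set; }
    }

    // Hypothetical helper class for this illustration.
    public static class FinalModelSketch
    {
        // Validates a model with the given LocalValue and returns the error messages.
        public static List<string> Errors(string localValue)
        {
            var model = new SomeDataModel { InternationalValue = "Something", LocalValue = localValue };
            var results = new List<ValidationResult>();
            Validator.TryValidateObject(model, new ValidationContext(model, null, null), results, true);
            return results.ConvertAll(r => r.ErrorMessage);
        }

        public static void Main()
        {
            // "abc" passes; each of the other candidates fails exactly one rule.
            foreach (string candidate in new[] { "abc", "ab", "   ", "Хорошо", null })
                Console.WriteLine("[{0}] -> {1}", candidate, string.Join("; ", Errors(candidate)));
        }
    }
}
```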

Note that this declaration still doesn't prevent a LocalValue that begins or ends with white space. Where that is not allowed, we could modify the RegularExpressionAttribute. For example, to let only letters, digits and underscores pass, we can use this regular expression:

[RegularExpression("[a-zA-Z0-9_]+", ErrorMessage=
                   "LocalValue can only contain letters, digits and underscores.")]

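Because RegularExpressionAttribute requires the match to cover the whole value, this pattern rejects a leading or trailing space without explicit ^ and $ anchors. The quick check below calls the attribute's IsValid method directly; the IdentifierCheck class is an illustrative name only. Note that an empty string still passes, because the attribute skips null and empty values, so RequiredAttribute remains necessary.

```csharp
using System;
using System.ComponentModel.DataAnnotations;

// Hypothetical helper class for this illustration.
public static class IdentifierCheck
{
    private static readonly RegularExpressionAttribute Identifier =
        new RegularExpressionAttribute("[a-zA-Z0-9_]+");

    public static bool IsValid(string value) => Identifier.IsValid(value);

    public static void Main()
    {
        Console.WriteLine(IsValid("user_01")); // True
        Console.WriteLine(IsValid(" user"));   // False: match does not start at index 0
        Console.WriteLine(IsValid("user "));   // False: match does not reach the end
        Console.WriteLine(IsValid(""));        // True: empty values are left to [Required]
    }
}
```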
Other Cases

There is one special attribute that can cover corner cases not handled by the other, more specific attributes: the CustomValidationAttribute, which also derives from ValidationAttribute. This attribute names a public static method, returning a ValidationResult, that will be invoked to validate the property or object. Once validation is requested, the Validator class invokes the specified method and passes its result back. This validation attribute is rarely required in practice, but it still provides a safeguard for cases that cannot be handled using other attributes.
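As a sketch of the mechanism, the following hypothetical rule rejects values with leading or trailing white space. The method referenced by CustomValidationAttribute must be public and static and return ValidationResult; it may take the value alone, or the value plus a ValidationContext as here. The LocalValueRules, Account and CustomRuleDemo classes and the NoSurroundingWhitespace method are invented for this illustration.

```csharp
using System;
using System.Collections.Generic;
using System.ComponentModel.DataAnnotations;

// Hypothetical rule class; the method has the shape CustomValidationAttribute expects.
public static class LocalValueRules
{
    public static ValidationResult NoSurroundingWhitespace(string value, ValidationContext context)
    {
        // null is left to [Required]; an already trimmed value is fine.
        if (value == null || value == value.Trim())
            return ValidationResult.Success;

        return new ValidationResult(
            context.DisplayName + " cannot begin or end with white space.",
            new[] { context.MemberName });
    }
}

// Hypothetical model using the custom rule.
public class Account
{
    [CustomValidation(typeof(LocalValueRules), "NoSurroundingWhitespace")]
    public string LocalValue { get; set; }
}

public static class CustomRuleDemo
{
    public static List<ValidationResult> Validate(Account account)
    {
        var results = new List<ValidationResult>();
        Validator.TryValidateObject(account, new ValidationContext(account, null, null), results, true);
        return results;
    }

    public static void Main()
    {
        foreach (ValidationResult r in Validate(new Account { LocalValue = " padded " }))
            Console.WriteLine(r.ErrorMessage); // LocalValue cannot begin or end with white space.
        Console.WriteLine(Validate(new Account { LocalValue = "tidy" }).Count); // 0
    }
}
```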

Conclusion

In this article we have demonstrated how simple it is to protect the data access layer from receiving invalid model objects and passing them on to the database. Production code can use validation attributes to place declarative limits on model content. Such model objects can then be validated before they are sent to any part of the system, especially before being sent to the DAL to be stored in the database.



About

Zoran Horvat

Zoran Horvat is the Principal Consultant at Coding Helmet, speaker and author of 100+ articles, and independent trainer on the .NET technology stack. He can often be found speaking at conferences and user groups, promoting object-oriented and functional development styles and clean coding practices and techniques that improve the longevity of complex business applications.
