How to Write a Test to Prove that a Collection Was Loaded Eagerly?

by Zoran Horvat

Suppose that we are writing some service class, which exposes a method returning a sequence or a collection of objects. Take this sales service class as an example:

public class SalesService
{
    public IEnumerable<Article> GetArticles(string nameContains)
    {
        ...
    }
}

The GetArticles method returns a sequence of Article objects whose names contain a given substring. We have already written all the code, and the time has come to consume the sales service class somewhere, for example like this:

foreach (Article article in sales.GetArticles("e"))
{
    // Do something with article...
}

Now, as we all know pretty well, the underlying infrastructure may support lazy loading. If your infrastructure layer is implemented using Entity Framework or NHibernate, then you can bet that lazy loading will kick in at some point during execution of the GetArticles method.

We may try to write an automated test which proves that the sequence of articles returned from the GetArticles service method has been fully populated, because we fear that the caller might cause multiple queries to be issued on the underlying database for the same set of data.

The question is: How do we write an automated test which proves that a collection returned from a method is eagerly loaded?

To answer that question, we will have to get a better understanding of the processes we want to constrain with tests.

Understanding Lazy Loading

Lazy loading means that the result of the call is not fully materialized when the call is made. The result is rather returned as an empty shell. Its content, actual objects, will only be materialized when the caller attempts to access them.

Lazy loading makes sense when working with collections of data. Having an entire collection materialized in memory might be inefficient, especially if it eventually turns out that the caller didn’t even access objects in the collection. That happens surprisingly often, resulting in needless use of CPU cycles, needless pressure on the underlying storage, needless memory consumption, and so on.

We can improve overall performance by not loading the data until the consumer really attempts to iterate through the collection that contains them. Only when the collection is touched through a call to the GetEnumerator method, which clearly indicates the decision to iterate through the objects, are the data actually loaded from the underlying storage.

However, lazy loading comes with a twist. There are different implementations of it. In some of them, once the collection has been materialized, it is never materialized again. In other implementations, the collection is not cached and it is materialized over and over again whenever the client accesses it. That is exactly how queries in Entity Framework behave: if we repeat the iteration, the query is repeated on the underlying data store. Every iteration through the results of a query issues a brand new query on the underlying database.
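
The two flavors of laziness can be sketched without any database. The code below is only an illustration under stated assumptions: QueryStorage is a hypothetical stand-in for a storage query, StorageReads is a counter added for the demonstration, and Lazy<T> is used as one possible way to cache the materialized result.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class LazinessFlavorsDemo
{
    // Counts how many times the hypothetical "storage" has been read.
    public static int StorageReads;

    // Hypothetical stand-in for a storage query.
    // The body runs each time the returned sequence is iterated.
    private static IEnumerable<string> QueryStorage()
    {
        StorageReads++;
        yield return "one";
        yield return "two";
        yield return "three";
    }

    // Flavor 1: materialize on first access, then reuse the cached list.
    public static int CountReadsWithCaching()
    {
        StorageReads = 0;
        Lazy<List<string>> cached =
            new Lazy<List<string>>(() => QueryStorage().ToList());
        _ = cached.Value.Count;   // first access materializes the list
        _ = cached.Value.Count;   // second access reuses it
        return StorageReads;
    }

    // Flavor 2: nothing is cached, so each iteration reads storage again.
    public static int CountReadsWithoutCaching()
    {
        StorageReads = 0;
        IEnumerable<string> uncached = QueryStorage();
        _ = uncached.Count();     // first iteration
        _ = uncached.Count();     // second iteration starts over
        return StorageReads;
    }
}
```

In this sketch, CountReadsWithCaching returns 1 and CountReadsWithoutCaching returns 2. The second flavor is the one that the rest of this article is concerned with.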

Changing the Question

And so we come to the question – what is the right question to ask? It doesn’t seem reasonable to test whether the collection has been eagerly loaded. Nobody really cares about that. We care about something else, and that is the performance of the underlying data store.

A better question would be: How do we write a test which proves that a call to the service method has resulted in just a single query executed on the underlying storage?

Now we are getting somewhere, because we are focusing on a precise criterion that could be covered by a test. However, counting database queries looks too rigid. There should be a criterion that lies somewhere in between the two extremes we have outlined here.

And this is the final question we want to answer: How do we write a test which proves that a sequence of objects has been materialized before it was returned from a service method?

This concrete question is general enough to allow infinitely many implementations of the query, while at the same time requiring the service method to produce a final result for the caller. That is what we normally expect from automated tests. They must allow the implementation to vary, while at the same time restricting the observable effect.

Observing the System under Test (SUT)

When we write automated tests, we have to know the element against which the test is constructed. So here is a primitive implementation of the classes we are interested in. The service class might use some repository class to get hold of the data:

public class SalesService
{
    public IEnumerable<Article> GetArticles(string nameContains)
    {
        Repository repo = new Repository();
        return
            repo.GetAll()
            .Where(article => article.Name.Contains(nameContains));
    }
}

The repository implementation is not relevant here. That class is opaque as far as we are concerned and its implementation may freely change. One part of the art of testing is to know what to leave undefined in tests. In this example, our test should leave the underlying implementation of the Repository class unknown and focus on the effects of calling the repository’s GetAll method. That is where all the danger for our system lies, because GetAll might, and probably will, return a lazy collection of articles.

The Where method, which is applied to the result of the GetAll method, comes from the LINQ library. It can operate on a lazy collection, and its own result is evaluated lazily as well. So right now, we have a tangible problem: the GetArticles method of the SalesService class may or may not return a populated collection.

In order to test the SalesService class in the first place, we would have to make sure that it is testable. That means that the class must not be this tightly bound to the repository implementation. We could come up with a repository interface and a factory method, which would be fed through the constructor of the class under test:

public interface IRepository
{
    IEnumerable<Article> GetAll();
}

public class SalesService
{
    private Func<IRepository> RepoFactory { get; }

    public SalesService(Func<IRepository> repoFactory)
    {
        this.RepoFactory = repoFactory;
    }

    public IEnumerable<Article> GetArticles(string nameContains)
    {
        IRepository repo = this.RepoFactory();
        return
            repo.GetAll()
            .Where(article => article.Name.Contains(nameContains));
    }
}

In this way, we have prepared the SalesService class for testing. That is a good thing to know. Now we could start with a simple functionality test. We might come up with a mock implementation of the repository, which would be useful in testing. Such a mock would implement the IRepository interface and make it possible to write a functionality test for the SalesService class:

[TestClass]
public class SalesServiceTests
{
    class RepositoryMock : IRepository
    {
        public IEnumerable<Article> GetAll() =>
            new Article[]
            {
                new Article() { Name = "one" },
                new Article() { Name = "two" },
                new Article() { Name = "three" }
            };
    }

    [TestMethod]
    public void GetArticles_ReceivesLetterE_ReturnsArticlesWithEInName()
    {
        SalesService sales = new SalesService(() => new RepositoryMock());
        IEnumerable<Article> articles = sales.GetArticles("e");
        Assert.AreEqual(2, articles.Count());
        Assert.IsTrue(articles.Any(art => art.Name == "one"));
        Assert.IsTrue(articles.Any(art => art.Name == "three"));
    }
}

This simple test proves that the GetArticles method returns exactly two articles when the letter “e” is passed as the filter. It also asserts that the articles named “one” and “three” are in the result.

But this test doesn’t prove that the returned collection will not be materialized many times. If the GetArticles method returns a lazy loaded collection, then, depending on the concrete implementation of laziness, the three assertions from the test might cause the entire collection to be loaded three times. That is what we want to rule out. And the question is: How?

Needless to say, this test passes when run.

[Figure: Passing functionality test]
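
The repeated loading that the functionality test might cause can be made visible with a counting test double. The following is a minimal sketch, not part of the original code: CountingRepository and RepeatedLoadDemo are hypothetical names, and the Article class is assumed to expose just a Name property. SalesService repeats the factory-based version shown earlier.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Assumed minimal shape of the Article class.
public class Article
{
    public string Name { get; set; }
}

public interface IRepository
{
    IEnumerable<Article> GetAll();
}

// Hypothetical test double: a lazy repository which counts
// how many times its data have been pulled.
public class CountingRepository : IRepository
{
    public int Loads { get; private set; }

    public IEnumerable<Article> GetAll()
    {
        this.Loads++;   // runs when an iteration starts, not when GetAll is called
        yield return new Article() { Name = "one" };
        yield return new Article() { Name = "two" };
        yield return new Article() { Name = "three" };
    }
}

public class SalesService
{
    private Func<IRepository> RepoFactory { get; }

    public SalesService(Func<IRepository> repoFactory)
    {
        this.RepoFactory = repoFactory;
    }

    public IEnumerable<Article> GetArticles(string nameContains)
    {
        IRepository repo = this.RepoFactory();
        return
            repo.GetAll()
            .Where(article => article.Name.Contains(nameContains));
    }
}

public static class RepeatedLoadDemo
{
    public static int CountLoadsCausedByAssertions()
    {
        CountingRepository repo = new CountingRepository();
        SalesService sales = new SalesService(() => repo);
        IEnumerable<Article> articles = sales.GetArticles("e");

        // The same three calls the functionality test makes in its assertions.
        _ = articles.Count();
        _ = articles.Any(art => art.Name == "one");
        _ = articles.Any(art => art.Name == "three");

        return repo.Loads;
    }
}
```

Under these assumptions, CountLoadsCausedByAssertions returns 3: every Count or Any call starts a new iteration, and every iteration pulls the data from the repository again.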

The Problem Is in the Data Source

I will tell you the secret right away. The answer to the question of how we prove that a collection has been materialized before it is returned from the method is: We don’t.

All this time, I have been viewing the system in the wrong way. And the time has come to correct that mistake. The mistake is that I was constantly trying to constrain the implementation of the SalesService class from its consumer’s point of view. But the consumer doesn’t need to know how many times the collection has been materialized.

I hope this statement is clear. The consumer won’t fail if data have been fetched multiple times from the underlying storage. The storage will fail, due to performance issues, if that happens. Worse yet, all performance tests might pass, because it doesn’t really take much more time to execute a query three times than to execute it once. That is the difficulty we are fighting here.

Therefore, we don’t have to test anything from the consumer’s point of view. All testing must end between the service class and the repository. And that conclusion gives us an idea of how to proceed.

How can we force the service class to materialize the data before returning them to the caller? One simple solution is to close the data source for further operations beyond a certain point. The data source, which is represented by the object implementing IRepository in this case, would have to give the data out and then shut down. And, guess what, the .NET Framework offers precisely that kind of functionality.

You may have guessed it already: It is the IDisposable interface. We can render an object useless by disposing it. In terms of the repository pattern, the repository would have to be disposable, and the service class would have to dispose it before returning:

public interface IRepository : IDisposable
{
    IEnumerable<Article> GetAll();
}

public class SalesService
{
    private Func<IRepository> RepoFactory { get; }

    public SalesService(Func<IRepository> repoFactory)
    {
        this.RepoFactory = repoFactory;
    }

    public IEnumerable<Article> GetArticles(string nameContains)
    {
        using (IRepository repo = this.RepoFactory())
        {
            return
                repo.GetAll()
                    .Where(article => article.Name.Contains(nameContains));
        }
    }
}

The change here is that the IRepository interface now extends the IDisposable interface. This has an immediate consequence in the GetArticles method implementation, where construction and use of the concrete repository object are wrapped in a using block.

Now, how does this affect tests? Let’s go to the testing class and make the repository mock truly lazy:

[TestClass]
public class SalesServiceTests
{
    class LazyRepositoryMock : IRepository
    {
        private Action AssertNotDisposed { get; set; } = () => { };

        public IEnumerable<Article> GetAll()
        {
            this.AssertNotDisposed();
            yield return new Article() {Name = "one"};
            yield return new Article() {Name = "two"};
            yield return new Article() {Name = "three"};
        }

        public void Dispose()
        {
            this.AssertNotDisposed =
                () => { throw new ObjectDisposedException("Repository"); };
        }
    }

    [TestMethod]
    public void GetArticles_ReceivesLetterE_ReturnsArticlesWithEInName()
    {
        SalesService sales = new SalesService(() => new LazyRepositoryMock());
        IEnumerable<Article> articles = sales.GetArticles("e");
        Assert.AreEqual(2, articles.Count());
        Assert.IsTrue(articles.Any(art => art.Name == "one"));
        Assert.IsTrue(articles.Any(art => art.Name == "three"));
    }
}

This time, the GetAll method of the repository mock is really lazy. Instead of returning a fixed array of objects, it now yield-returns them one at a time. The actual sequence of objects will be evaluated again and again, each time the result of the GetAll method is iterated.

That is precisely what we wanted to rule out with a test, right? On a related note, observe that I have changed the name of the repository implementation. It now reads LazyRepositoryMock, to communicate its purpose more clearly. The functional test, however, has remained exactly the same. I still want to see two articles, named “one” and “three”, returned from the SalesService class.

But if I run the test again, this time it will fail, as shown in the following picture.

[Figure: Failing functionality test]

Where is the problem? The problem is that the articles collection was not materialized before the first Assert instruction was invoked. The call to the GetArticles method didn’t really pull the data from the mocked repository implementation, but it did dispose the repository. When the time came to really pull the data, in the middle of the first Assert call, the mocked repository threw ObjectDisposedException, because by that time it had already been disposed.

In other words, there is no reason to write a test which proves that the result was materialized before it was returned from a service method. It only takes a disposable underlying data source, and the data source itself will make the test fail if the data were not materialized before returning to the caller.

That is why I said that you don’t write such a test.

And now the final question: is there any other test we should write instead? The answer is: Yes. We should write a unit test which proves that the service method is disposing the underlying data source. With that feature in place, we can be sure that all other tests, those that deal with data, will actually fail if the data were not properly materialized before the consuming code got hold of them.

Below is the unit test which proves that Dispose has been called on the repository when the GetArticles method executes. I have used the Moq framework to implement this simple unit test. You can use whatever mocking method you prefer, including manual mocks, in your code. That will not affect the solution much, especially not on such a small scale as this single test:

[TestClass]
public class SalesServiceTests
{
    ...
    [TestMethod]
    public void GetArticles_DisposesRepository()
    {
        Mock<IRepository> repoMock = new Mock<IRepository>();
        repoMock.Setup(repo => repo.Dispose()).Verifiable();

        SalesService svc = new SalesService(() => repoMock.Object);
        svc.GetArticles("e");

        repoMock.Verify(repo => repo.Dispose());
    }
}
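
For readers who prefer manual mocks, the same check can be sketched with a hand-written spy instead of Moq. This is only an illustration: DisposeSpyRepository and DisposeSpyDemo are hypothetical names, the Article class is assumed to expose just a Name property, and SalesService repeats the disposing version shown above.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Assumed minimal shape of the Article class.
public class Article
{
    public string Name { get; set; }
}

public interface IRepository : IDisposable
{
    IEnumerable<Article> GetAll();
}

public class SalesService
{
    private Func<IRepository> RepoFactory { get; }

    public SalesService(Func<IRepository> repoFactory)
    {
        this.RepoFactory = repoFactory;
    }

    public IEnumerable<Article> GetArticles(string nameContains)
    {
        using (IRepository repo = this.RepoFactory())
        {
            return
                repo.GetAll()
                    .Where(article => article.Name.Contains(nameContains));
        }
    }
}

// Hypothetical hand-written spy: it only remembers whether Dispose was called.
public class DisposeSpyRepository : IRepository
{
    public bool WasDisposed { get; private set; }

    // The data are irrelevant for this check, so an empty sequence is enough.
    public IEnumerable<Article> GetAll() => Enumerable.Empty<Article>();

    public void Dispose()
    {
        this.WasDisposed = true;
    }
}

public static class DisposeSpyDemo
{
    public static bool GetArticlesDisposesRepository()
    {
        DisposeSpyRepository spy = new DisposeSpyRepository();
        SalesService svc = new SalesService(() => spy);
        svc.GetArticles("e");
        return spy.WasDisposed;
    }
}
```

Inside a test method, the value returned here would simply be passed to Assert.IsTrue. The spy is a little more code than the Moq setup, but it avoids the dependency on a mocking library.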

If I run the tests again, this new test will pass, because SalesService is really disposing the repository, just as expected. On the other hand, the data test will keep failing, because I haven’t forced the collection to be materialized before returning it from the GetArticles method. In order to make both tests pass, I will have to do precisely that and materialize the collection before returning it:

public class SalesService
{
    ...
    public IEnumerable<Article> GetArticles(string nameContains)
    {
        using (IRepository repo = this.RepoFactory())
        {
            return
                repo.GetAll()
                    .Where(article => article.Name.Contains(nameContains))
                    .ToList();
        }
    }
}

The only difference compared to the previous implementation is the call to the ToList method at the end. This call makes sure that the data are actually pulled from the repository and stored in an in-memory list. All iterations through the data will run through this single list, and the underlying repository will never be invoked again for these data.

This time, if I run the tests, they will both pass, as the following picture shows.

[Figure: Passing functionality and disposing tests]
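
The whole argument can also be condensed into a sketch that runs without a test framework. It is an illustration under stated assumptions rather than the article's code: LazyDisposableRepository is modeled on LazyRepositoryMock, MaterializationDemo and its methods are hypothetical names, and the Article class is assumed to expose just a Name property.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Assumed minimal shape of the Article class.
public class Article
{
    public string Name { get; set; }
}

public interface IRepository : IDisposable
{
    IEnumerable<Article> GetAll();
}

// Lazy and disposable: data can only be pulled before Dispose is called.
public class LazyDisposableRepository : IRepository
{
    private bool disposed;

    public IEnumerable<Article> GetAll()
    {
        // This check runs when iteration starts, not when GetAll is called.
        if (this.disposed)
            throw new ObjectDisposedException("Repository");
        yield return new Article() { Name = "one" };
        yield return new Article() { Name = "two" };
        yield return new Article() { Name = "three" };
    }

    public void Dispose()
    {
        this.disposed = true;
    }
}

public static class MaterializationDemo
{
    // Returns the lazy query itself; the repository is disposed before any data are read.
    public static IEnumerable<Article> GetArticlesLazily(string nameContains)
    {
        using (IRepository repo = new LazyDisposableRepository())
        {
            return repo.GetAll()
                .Where(article => article.Name.Contains(nameContains));
        }
    }

    // Materializes the result with ToList while the repository is still usable.
    public static IEnumerable<Article> GetArticlesEagerly(string nameContains)
    {
        using (IRepository repo = new LazyDisposableRepository())
        {
            return repo.GetAll()
                .Where(article => article.Name.Contains(nameContains))
                .ToList();
        }
    }

    public static bool LazyVersionThrows()
    {
        try
        {
            GetArticlesLazily("e").Count();
            return false;
        }
        catch (ObjectDisposedException)
        {
            return true;
        }
    }

    public static int EagerVersionCount() => GetArticlesEagerly("e").Count();
}
```

In this sketch, LazyVersionThrows returns true, because the first iteration happens only after the repository has been disposed, while EagerVersionCount returns 2, because ToList pulled the data inside the using block.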

Summary

In this article we have tackled one common requirement in business applications. We normally want to know that the infrastructure is invoked exactly once to fetch the data, no matter what we plan to do with the data.

When it comes to writing automated tests, it may be difficult to understand what exactly we are testing in a situation like that.

We have shown that there is nothing special to test about eager materialization of objects. Instead, the problem is solved at the language level. We ensure that sensitive infrastructure objects are disposable. And then we ensure that they are indeed disposed just before returning the desired data.

By following the disposable pattern, we are sure that no consumer will ever be in a position to trigger data materialization on the infrastructure, because infrastructure objects are already disposed when consumers get hold of the data.

In order to test such an implementation, we only need two kinds of tests. One kind proves that the data are correct. The other kind proves that the Dispose method has been invoked on infrastructure objects as expected.

If the implementation fails to follow the disposable pattern, then one or both kinds of tests will fail. In that simple and clean way, passing tests can be used as proof that the underlying infrastructure is used efficiently.

