What Makes List<T> So Efficient in .NET?

by Zoran Horvat

In this article, we will talk about the List<T> generic type – about what makes it so efficient, what makes it the most widely used collection in .NET. You will learn the working principle of List<T> and compare it to the working principle of the array, which is an exceptionally fast collection, but not so flexible like the List<T> is.

This article is related to video course which published at Pluralsight, titled Collections and Generics in C#. In that video course, you can learn the use of collections in realistic business applications, as well as watch an in-depth coverage of generics, especially on using and constructing custom covariant and contravariant types. The course ends in designing and implementing your custom collections when the existing collections fail.

So, in this article, we will cover the topic of adding objects to the list. We will analyze the performance of the List<T> class and compare it to the array, making conclusions in the end.

Basic Operations on the List

Let us begin with code, then. What if we had some data - pretend that we don't know how many items there are. There is a sequence of items coming in, and we need to put the items into some collection for further analysis.

IEnumerable<int> data = ...

List<int> list = new();
foreach (int x in data) list.Add(x);
list.ForEach(x => Console.Write($"{x,3}"));
Console.WriteLine();

We are instantiating a list, and then traversing the data, adding every item to the list, before iterating through the list in a subsequent operation.

The question is how does this work and why does it work? We also know from experience that the list is doing this very efficiently, so the question is what magic lies behind the list that makes adding unknown number of items into it so efficient?

Of course, don't forget iterating through those data. Both operations are very efficient! In the following experiment, we will implement our own working prototype of a self-expanding collection, and then analyze its efficiency.

Identifying the Shortcoming of a Common Array

To shed some light on the working principle behind the List<T> class, we might try to mimic it using just an array. Pretend that we have no list available, and we need to populate a collection with those same data.

int[] array = new int[16];
int count = 0;
foreach (int x in data)
{
  array[count++] = x;
}

We are initializing the array to some number of items. This random guess will turn out wrong, but we must start somewhere. We are also tracking how much of the array has been populated so far. Initially, the number of used elements in the array – the count variable – is zero. Then we iterate through the data, move the counter, increment it in every iteration, and set the element of the array to the incoming item.

The problem is that this implementation will fail, because it will step beyond the end of the array and cause IndexOutOfRangeException.

This is precisely the problem the List<T> is solving for us – automatically extending its underlying data structure to accept more items when its capacity is full. That is the case we must cover, and we will improve the code above in that direction.

Implementing the Auto-expanding Array

Here is the case we must cover: When count is equal to the length of the array, then trying to add another item would cause an exception. One way to address this issue is just to double the array capacity.

We instantiate a new array with twice as many elements in it, then copy the existing content into that new array, leaving half of the new array unused, and then substitute the new array.

int[] array = new int[16];
int count = 0;
foreach (int x in data)
{
  if (count == array.Length)
  {
    int[] newArray = new int[array.Length * 2];
    Array.Copy(array, newArray, array.Length);
    array = newArray;
  }
  array[count++] = x;
}

There will be no IndexOutOfRangeException this time. When this code is run, the array with adaptive length will accept all the input elements. But now we must ask: Is this method of reallocating and copying content efficient? Wouldn't it be copying the same item many times over?

Think of the initial few items that were put into the first copy of the array. If we reallocated the array one time, two times, three times, and so on, wouldn't those initial elements be copied that many times? Yes, they will. Isn't that too much of copying? Well, before we jump to a conclusion, that is the question that requires calculation to come to a valid answer.

Analyzing Performance of the Self-expanding Array

If I told you that the code segment above is outlining the working principle of the List<T> class – first to store items inside a common array, and then to double its size every time when its capacity is full, then that already indicates that the cost of repeated copying is not prohibitive. That brings the next question up: How many memory writes must we make to populate a list, including all those reallocations and copying of its content?

The calculation that follows is the common step when designing a data structure. We calculate the cost of using the data structure and entail the proof of its performance.

Since we are doubling the size of the array at each step, we can assume that there are 2^n items after n expansions. We need 2^n writes just to put every item in, so we cannot escape having that many memory writes. But we had also reallocated the array and moved the existing content with every expansion that preceded.

What is the accumulated cost of all reallocations that brought us to the array of size 2^n? We had to copy one 1 to make the array of size 2, then copy 2 items to make the array of size 4 then copy 4 items, and so on.

In the end we copied 2^(n-1), after which the list has assumed its final capacity of 2^n. How many memory writes is that, then?

There is a nice little trick to derive the closed form of this sum.

image1.png

The conclusion: Every location will be written two times on average.

The resizable list makes twice as many writes as the array of the same final size. That is the cost of using the List<T> versus the cost of using the plain array. On the other hand, you will get the benefit which the array cannot offer, and that is automatic resizing of the underlying collection so that you can put as many items as you like, at least until the amount of available memory is exceeded.

So, if anyone asks you what makes the list so efficient and so useful, this is the answer. it is reallocating the underlying data structure at the expense of writing every item in memory two times instead of only once.

Summary

In this article, we have effectively reimplemented the underlying data structure of a common generic list in .NET. We have entailed its performance, concluding that, on average, every item in the list will be written two times. At times, an entire list’s content would be copied to the new array. That may cause intermittent delays when adding many items to the list using its Add() method. But on the average, list will perform exceptionally well, justifying its place as the most frequently used data collection in .NET.


If you wish to learn more, please watch my latest video courses

About

Zoran Horvat

Zoran Horvat is the Principal Consultant at Coding Helmet, speaker and author of 100+ articles, and independent trainer on .NET technology stack. He can often be found speaking at conferences and user groups, promoting object-oriented and functional development style and clean coding practices and techniques that improve longevity of complex business applications.

  1. Pluralsight
  2. Udemy
  3. Twitter
  4. YouTube
  5. LinkedIn
  6. GitHub