.NET Nullable Types and DBNull Expose Design Flaws

November 17, 2005 - 4 minute read -

.NET has had DBNull since the beginning. With the release of .NET 2.0 it has also gained the Nullable type. DBNull is used to represent a null value returned from a database or a DataSet. The Nullable type was added to give value types the ability to hold a null. You might ask, "What's going on here? We can already have a null reference to a class instance."
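
To see what the new type gives you, here is a minimal sketch of the 2.0 syntax (int? is the C# shorthand for the generic Nullable type):

    using System;

    class NullableDemo {
        static void Main() {
            int? count = null;                      // a value type that can now hold null
            Console.Out.WriteLine(count.HasValue);  // False
            count = 5;
            Console.Out.WriteLine(count.Value);     // 5
        }
    }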

Background

.NET has two kinds of types: classes and structs. Classes are known as "reference types" and their instances are stored in an area of memory called the heap. Structs are known as "value types" and are stored in an area of memory known as the stack. What's the difference? For one, the stack is a faster form of memory allocation, and reaching a value doesn't require dereferencing a pointer to an object on the heap. The second difference is that stack-based value types cannot be null.
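
A quick sketch of that last point: the same assignment that is fine for a reference type will not even compile for a value type.

    using System;

    class NullDemo {
        static void Main() {
            string s = null;                   // fine: a reference type can be null
            // int i = null;                   // compile error: a value type cannot be null
            Console.Out.WriteLine(s == null);  // True
        }
    }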

Value types are a performance optimization, but I think they were a bad choice.

Rationale

Since we can't have a null value type, you can see why DBNull was added: .NET needed a way to represent a NULL database value. The bad part is that DBNull.Value != null, so you have to check for DBNull in many cases. It is also a problem because setting null into an ADO.NET parameter will cause an error. This is a really terrible semantic for the language. Couldn't they have hidden DBNull inside the ADO.NET classes and converted null to DBNull there? Yes, if it were possible to have a null value type.
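
Here is a sketch of the hoop-jumping this forces (assuming the SQL Server provider; the column name is made up): you have to translate between null and DBNull.Value yourself on both sides of the database call.

    using System;
    using System.Data;
    using System.Data.SqlClient;

    class DbNullDemo {
        static void SetParameter(SqlCommand cmd, string middleName) {
            // Passing null directly would cause an error; substitute DBNull.Value instead.
            cmd.Parameters.Add("@MiddleName", SqlDbType.NVarChar).Value =
                (middleName == null) ? (object)DBNull.Value : middleName;
        }

        static string ReadColumn(IDataReader reader) {
            // And coming back out, you have to test for DBNull, not null.
            object value = reader["MiddleName"];
            return (value == DBNull.Value) ? null : (string)value;
        }
    }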

This also wouldn't be a problem if every value type had a corresponding reference type to fall back on when you need a null. string is a reference type, so a null string is easy; but int is just an alias for the Int32 struct, and DateTime is likewise a struct (a value type) with no reference counterpart. So you can't have a null DateTime; its "empty" value is represented by the sentinel DateTime.MinValue, and you have to compare against that.
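
In practice that leads to code like this, where DateTime.MinValue quietly becomes a stand-in for "no value" (with 2.0 you can at least switch to a nullable DateTime):

    using System;

    class DateDemo {
        static void Main() {
            DateTime shipped = DateTime.MinValue;       // sentinel standing in for "no value"
            if (shipped == DateTime.MinValue) {
                Console.Out.WriteLine("Not shipped yet");
            }

            DateTime? shipped2 = null;                  // the 2.0 alternative: a real null
            Console.Out.WriteLine(shipped2.HasValue);   // False
        }
    }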

What else? Value types cannot be inherited from and cannot inherit from other classes or structs. They can implement interfaces, but you no longer get basic OO facilities like inheritance. What were they thinking?
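
A small sketch of those limits (the types here are made up for illustration): a struct can implement an interface, but it cannot be used as a base type and cannot derive from a class or another struct.

    using System;

    interface IDescribable {
        string Describe();
    }

    struct Point : IDescribable {             // implementing an interface is allowed
        public int X;
        public int Y;
        public string Describe() { return X + "," + Y; }
    }

    // struct Point3D : Point { }             // does not compile: a struct can't be inherited from
    // struct NamedPoint : SomeClass { }      // does not compile: a struct can't derive from a class

    class StructDemo {
        static void Main() {
            Point p = new Point();
            p.X = 1;
            p.Y = 2;
            Console.Out.WriteLine(p.Describe());   // 1,2
        }
    }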

Yet another problem with this optimization is that it is only half an optimization. Methods in .NET are pass-by-value by default, which means a copy of the value is passed to the method. (In the case of a class instance, what gets copied is the pointer to the instance, so only a small value is copied.) With value types there is no pointer: the variable itself is the entire memory for the struct's fields, so when a struct is passed to a method the whole thing is copied to a new location. This increases the amount of memory used. So you gain some speed and lose some memory. Is that a good tradeoff? It depends a lot on where your constraints are.

In addition, because of that pass-by-value behavior, reference types and value types behave differently when a method modifies the argument it was given.

    using System;

    class Program {
        class MyClass {                 // a reference type
            public string SomeValue;
        }

        struct ValueType {              // a value type
            public string SomeValue;
        }

        static void Main(string[] args) {
            MyClass mc = new MyClass();
            mc.SomeValue = "MyClass: out of method";
            Console.Out.WriteLine(mc.SomeValue);
            Change(mc);                              // the method sees the same instance
            Console.Out.WriteLine(mc.SomeValue);

            ValueType vt = new ValueType();
            vt.SomeValue = "ValueType: out of method";
            Console.Out.WriteLine(vt.SomeValue);
            Change(vt);                              // the method gets a copy
            Console.Out.WriteLine(vt.SomeValue);
        }

        private static void Change(MyClass cl) {
            cl.SomeValue = "MyClass: in method";     // changes the caller's instance
        }

        private static void Change(ValueType vt) {
            vt.SomeValue = "ValueType: in method";   // changes only the local copy
        }
    }

What's the output?

    MyClass: out of method
    MyClass: in method
    ValueType: out of method
    ValueType: out of method

Surprised? I was, once I realized that was what would happen. These kinds of hidden semantic differences cause a lot of surprise, and surprise like that is not good in programming.

Conclusion

Non-nullable types, surprising and inconsistent behaviors, half-done performance optimizations, and extra hoops like DBNull to jump through. These kinds of things lead to developer frustration. All said, I don't think the performance gain is enough to warrant this kind of trouble. I would have preferred to see a better design, one in which these kinds of optimizations were implemented in the compiler. Optimize developer productivity, not just runtime performance.