.NET Nullable Types and DBNull Expose Design Flaws

by Geoff Lane on November 17th, 2005

.NET has had DBNull since the beginning. With the release of .NET 2.0 it has also gained the Nullable type. DBNull is used to represent a null value returned from a database or a dataset. The Nullable type was added to give the ability to have null types for integral types. You might ask, “What’s going on here? We can already have a null reference to class instance.”

Background

.NET has 2 kinds of types. It has Classes and it has Structs. Classes are known as “reference types” and stored in an area of memory called the heap. Structs are known as value types and stored in an area of memory known as the stack. What’s the difference? Well for one, the stack is a faster form of memory allocation because it doesn’t require dereferencing a pointer to get to an object type. The second difference is that stack based, value types can not be null.

This value type is a performance optimzation, but I think it was a bad choice.

Rationale

As you can see, we can’t have a null value type. So, you can see why the DBNull was added, because .NET needed a way to represent a NULL database value. The bad part is that DBNull.Value != null so you have to check for DBNull in many cases. It is also a problem because setting null into a ADO.NET parameter will cause an error. This is a really terrible semantic for the language. Wouldn’t they be able to hide DBNull within the ADO.NET classes and convert null to DBNulls within the code? Yes, IF it was possible to have a null value type.

This also wouldn’t be a problem if all of the Value types had a corresponding Reference type. int has Int32, string has String, etc. But what about DateTime? DateTime is a struct (a value type) with no corresponding reference type. So you can’t have a null DateTime. Its default value is represented as DateTime.MinValue so you have to compare with that.

What else? Well Value types can not be inherited and can’t inherit from other Classes or Structs. They can implement Interfaces, but you no longer get little OO things like inheritance. What where they thinking?

Yet another problem with this optimization is that it is only a half optimization. Methods in .NET are by default pass-by-value which means that a copy of a value is passed to the method. (In the case of an Object/Class instance passed to a method, a copy of the pointer to the instance is passed, so it’s a small value that is copied.) With value types, because the pointer represents all of the memory space for the values contained in the Struct, when a Struct/Value Object is passed to a method, the entire memory space is copied to a new location. This increases the amount of memory used. So you gain some speed, and lose some memory. Is that a good tradeoff? Depends a lot on where your constraints are

In addition, because of the differences in that pass-by-value behavior, you get differences in the behavior of calling methods, or setting parameters from within a method.

    class Program {
 
        class MyClass {
            public string SomeValue;
        }
 
        struct ValueType {
            public string SomeValue;
        }
 
        static void Main(string[] args) {
            MyClass mc = new MyClass();
            mc.SomeValue = "MyClass: out of method";
            Console.Out.WriteLine(mc.SomeValue);
            Change(mc);
            Console.Out.WriteLine(mc.SomeValue);
 
            ValueType vt = new ValueType();
            vt.SomeValue = "ValueType: out of method";
            Console.Out.WriteLine(vt.SomeValue);
            Change(vt);
            Console.Out.WriteLine(vt.SomeValue);
        }
 
        private static void Change(MyClass cl) {
            cl.SomeValue = "MyClass: in method";
        }
 
        private static void Change(ValueType vt) {
            vt.SomeValue = "ValueType: in method";
        }
    }

What’s the output?:

MyClass: out of method
MyClass: in method
ValueType: out of method
ValueType: out of method

Surprised? I was once I realized that was what would happen. These kinds of hidden semantic differences cause a lot of surprise. Surprise like that is not good in programming.

Conclusion

Non-nullable, surprising and different behaviors, inconsistent performance optmizations, extra hoops to jump through to do things like DBNulls. These kinds of things lead to developer frustration. All said, I don’t think that the performance gain is enough to warrant this kind of trouble. I would have preferred to see a better design, one in which these kinds of optimizations were implemented in the compiler. Optimize developer productivity, not just runtime performance.

From → .NET, Code

5 Comments
  1. So far everything I have done with .NET 2.0 has been with a Stored Procedures in SQL Server 2000. And I always wrap my columns which could be null with COALESCE and give it a default value, even if it is an empty String. I did not want to even touch the situation with nulls. My policy with objects is to prevent null values from even being possible. I always construct my variables to some default and the Properties sometimes prevent a null value from replacing the non-null value. It depends on each case. I never want to check if a value returned from a method is a null. Like when I return collections, if there are no values I just return an empty collection, so I can call that function from a foreach loop without checking for a null. For strings I return String.Empty, etc. My code has worked quite well, and it is quite fast. I also never use string, and instead use String. I do not care about optimizations through nuances with value types. I find I get greater performance gains by taking my time to design a caching mechanism which cuts down on the db calls. That is where 95% of your latency will be anyway.

  2. It’s possible in many cases to “program defensively” in such a way that null values are basically never returned. But there are some applications where null values (especially in a database) are required.

    I also completely agree with you about performance. Good design is the place to start, worry about optimizing only when you find problem areas.

    I built a data collection and reporting tool where we needed to be able to represent a NULL value in the database to the user as an unanswered question. Of course without NULL value types, you have to either represent it in the database as a “magic number” or create some sort of wrappers around all of your integral values and have a “magic number” in the code. (Hint, doing it in the database makes reporting off the data VERY difficult) Of course Nullable negates the necessity of having this “magic number” hack in place, so it’s valuable in that sense.

    I just don’t think it’s necessary to require any of these Nullable/DBNull/Wrapper Object workarounds. If .NET provided a corollary reference type for all of the value types and would allow null references of these reference types, we could handle it ourselves. Minimzing/removing the use or value types because of the difference in semantics for copying, calling methods, etc has the side benefit of decreasing surprise.

  3. Brad permalink

    An interesting read and I agree with many things you say but I thought I would correct one small thing. Int32 is not a reference type that corresponds to int as a value type. Similarly, your claim of the difference between string and String is flawed. int and string are language-specific aliases for the value type Int32 and the reference type String correspondingly.

    int i;

    and

    Int32 i;

    will compile as identical code and become

    System.Int32 i;

    In VB.Net the aliases are Integer and String for these two types and in both languages, these aliases serve little purpose beyond provide familiarity to people migrating from non-.Net languages.

    Strings are not value types at all regardless of the upper or lowercase s.

  4. You are incorrect in saying “As you can see, we can’t have a null value type. So, you can see why the DBNull was added, because .NET needed a way to represent a NULL database value.” No, DBNull was not needed. DataRow["column"] returns an object (a reference type), therefore it could hold null. I have no idea why somebody at Microsoft invented DBNull, because the regular null value would have worked just fine. My hypothesis: it was an idiot.

  5. You’re right, the DBNull != null thing is awful. But since .NET 2.0 there are “nullable value types”

    e.g.

    // C#.NET
    int? myInt = null;
    DateTime? myDateTime = null;

    I assume that there exists something equivalent in VB.NET?

    I would’ expect for typed datasets that int? getter/setter is generated for nullable ints and int for not nullable ones …

    @David Piepgrass: Yes and no. DBNull.Value is OK when it would be compageable with null but: DBNull.Value != null (strange, I agree ;-))

Leave a Reply

Note: XHTML is allowed. Your email address will never be published.

Subscribe to this comment feed via RSS