Unit testing an Entity Framework DAL part 1: Just hit the database
As almost anyone who tries to unit test a database application will quickly discover, databases present a huge problem for unit testing. Strictly speaking, if you are testing your C# or VB code and you actually hit a real database, then it isn’t really a unit test. It is actually an integration test. However, I have found that it doesn’t really matter what you call it, the end result is that your tests are much more useful if they actually hit a real database. You don’t have to worry about whether the test failed because you screwed up your mock object or if the actual application is buggy and you get better code coverage because even broken SQL will lead to a failed test.
There are several methods that can be used to prevent your unit tests from actually using a real SQL Server database, but they all have their problems:
- Using an in-memory provider like SQLite
There is an Entity Framework provider for SQLite that allows you to interact with a database without using a network or even going to your file system. This could certainly increase the execution speed of your unit tests and makes it easy to prevent cross contamination in your tests, but they are still integration tests. The only difference is that you are now testing whether your code works on SQLite, rather than the DBMS that you will actually use in production. The problem is that all database systems have different behaviors and feature sets, so your tests are no longer valid if you use a different DBMS for testing. There is also currently no system in place to automatically generate the SQLite schema from your entity data model, so you will need to find your own way of doing that, or you have to manually maintain a separate SQLite schema. Gross. If you are going to use another provider, it needs to be specially designed to behave exactly like your production database (ie: a mock SQL Server provider) but to my knowledge no such providers exists (if I’m wrong, please let me know!).
- Mock the Entity Framework ObjectContext
If all you want to do is read data, then this works well and is easy to implement. Unfortunately, in the vast majority of cases, we also need to write data and that’s where this method gets tricky. Your mock ObjectContext needs to be able to track changes and save them to an in-memory repository. And again, you have to make sure that it behaves exactly like your production database. Because this method often involves either a huge wrapper or major alterations to auto-generated code (which means you also need to make your own generator or you’ll lose maintainability) the mock object itself is extremely complicated, leaving a high likelihood that it will have errors. Since the mock is so complicated one could argue that you are again doing integration tests, not unit tests. But this time instead of testing your code and the database, you are testing your code and the mock ObjectContext. Just like the SQLite example, this is much worse because you are testing whether your code integrates with something you will not use in production. If you are going to do integration tests anyways, then you might as well integrate with the real thing. This method could lead to faster executing tests, but don’t forget that a local SQL Server instance is actually extremely fast and might be just as good.
- Encapsulate your data access layer and then mock it
I see this response on message boards all the time. Whenever someone asks how they unit test their data access code someone will respond “You’re doing it wrong, put all of your data access code into a separate module that you can mock”. There are a couple problems with this. First of all, you still need to test the code in the data access layer. If you have a function in your DAL that executes a complicated LINQ to Entites query, then you want to test that query. Without using one of the techniques mentioned above, this requires hitting the database. Secondly, making your client code completely unaware of the data access layer’s implementation leads to some issues. Let’s pretend that my data access layer looks like this:
Public Interface IUsersModel Function GetUsers() As IEnumerable(Of Users) Sub Save() End Interface Public Class UsersModel Implements IUsersModel Private _context As New DataTestEntities Public Function GetUsers() As IEnumerable(Of Users) Implements IUsersModel.GetUsers Return _context.Users End Function Public Sub Save() Implements IUsersModel.Save _context.SaveChanges() End Sub End Class
It’s pretty simple, the code just allows you to get a collection of users and save any changes you make. UsersModel correctly implements the interface using the Entity Framework. Then we also have a controller that accesses the DAL. It looks like this:
Public Class UsersController Private _usersModel As IUsersModel Public Sub New(ByVal usersModel As IUsersModel) _usersModel = usersModel End Sub Public Sub ChangeFirstUserNameToFoobar() _usersModel.GetUsers().First.userName = "foobar" _usersModel.Save() End Sub End Class
UsersController has a dependency on IUsersModel, so when unit testing the ChangeFirstUserNameToFoobar method, we pass in a mock implementation of IUsersModel, but we cannot simply verify that Save() was called, we also need to know what is going to happen when Save is called. Specifically, we need some way of checking that the first user’s username was changed to “foobar”. This means that a mocking framework like RhinoMocks or Moq will not be sufficient. There must be a fake implementation of IUsersModel that keeps track of the changes that have been made. Now we are getting back into “mock the ObjectContext” territory because that’s basically what we will have done.
There is a definite trend here: each of the above methods is complicated enough that you lose the benefits of isolating your tests from the database. They are all integration tests. In every case you are testing your client code, plus the repository. Since you have to test a repository, it might as well be the real one. Of course, this presents its own challenges. You will want to use a local instance of SQL Server (or whatever DBMS you use) to keep the tests fast (and isolated from other developers) and you will need to roll back changes after each test. In subsequent articles I will look at how to deal with these issues.
Update: I have posted the second article: Unit testing an Enitity Framework DAL part 2: Rolling back the test database
[...] couple months ago I wrote this article explaining why I think it is reasonable for unit tests to hit a real database. [...]
I don’t think its possible to unit test Entity Framework data access with an in memory database like SqLite when the entity model is generated for another DBMS. This is not even possible in Entity Framework V2 (.NET 4.0). The generated SSDL section of the edmx file contains provider information to the specific DBMS used in production code. I.e when using SqlServer the provider will be “System.Data.SqlClient” but SQLite demands the System.Data.SQLite provider. Do you know another solution or workaround for this? Was this statement theoretical or have you managed to get it to work?
It was theoretical. You might have to make two entity models (one for each provider) and then make both object contexts implement the same interface and resolve instances with an IOC framework. Probably isn’t worth the effort though.
Ok, thanks for your answer. I agree, that is not worth the effort. And it will also require POCO.
you’re missing the point. the reason why you don’t hit the actual database is that you’re testing logic and not connection. if, for whatever reason, you get a timeout loading the data the test will fail even if your logic is not faulty and that produces a false negative.
you database connection and reading / writing should be done in an integration test. logic should be tested in its own unit test. the only way they should feature together is in a test sequence that is rolling up multiple tests together.
furthermore: in reality you may have a _lot_ of unit tests and hitting a database is slower than a mock object. unit tests are meant to be run often and any speed increase will save frustration.
admittedly there should be tests that hit the db and test all functionality but they’re not unit tests and should run as often as your unit tests run.
I have used an in memory database with SQLite together with NHibernate. It was extremely quick. We used the repository pattern so all the queries could be unit tested. I see what you mean about them beeing integration tests, but I don’t fully agree. They test the queries (criterias in Nhibernate, could be linq) of the repositories but that doesnt mean it will work with the real DBMS. Additional integration tests against the real DBMS should also exist to guarante the functionality.
zen: I acknowledged that it is technically an integration test in the article:
“it isn’t really a unit test. It is actually an integration test”
My point was that is some cases it makes more sense to only do integration tests. Maybe I should have made it more clear in the title.
Marcus: I’m glad to see that is working well. The only thing I would be worried about is that NHibernate may behave differently with each DBMS in some cases. However, it the behaviour is consistent then that is probably a great solution.
Firstly thanks for this contribution I think this is a critically important aspect of TDD, Agile Development and Automated Testing which is not adequately covered in most books, articles and blogs.
Although your blog is specific to Entity Framework, I think it is worth looking at other Solutions and seeing how these could be adapted to Entity Framework or how Entity Framework could be adapted to make testing without actually hitting the DB simpler.
We all accept that as part of your test pack there are going to be integration tests and Acceptance tests that actually hit the Database.
In my experience there are a whole range of tests (Call them integration tests if you want) that need to persist some objects and load them again. Projects with complex processing logic etc.
In all these cases you either need a database or you need to be able to Fake a data access layer.
If you are using a framework such as Habanero or NHibernate this should be relatively simple since these pretty much ensure that database differences are accounted for.
Habanero uses an InMemoryDatabase which is a Fake it behaves exactly like a real database as far as Habanero is concerns and implements all the same constraints etc.
I have used this on my last few projects and found it to be incredibly simple, usable and resulted in incredibly fast tests with no test interactions etc. It worked so well that when I touch an old project I automatically change to using the InMemory
The In Memory Database just stores the objects. You don’t need to propogate schema changes to it etc etc.
It was obviously a bit of work to create but the Habanero guys have already done it.
The unintended benefits of using the In Memory database for Agile developement is that we have found that we are doing large chunks of development using just Business Objects and the In Memory database (No adding removing columns/Table etc from the Database). Even the visual testing by the developer is done using an In Memory Database. When you go into stabilisation i.e. testing by testers before deployment we generate the DB Schema and run our Database integration and acceptance tests and then do visual testing.
Sorry for the long reply but I have found this feature incredibly usefull and very supportive of Agile development and I would like to see it incorporated into other frameworks.
Shanti Dev