An introduction to Entity Framework

Introduction

This article isn’t a hands-on guide to getting started with Entity Framework (EF). Instead it aims to give you an understanding of what EF is, whether it’s for you, and if so, which of its options apply best to you.

In the next article I will do the hands-on stuff, where I walk you through getting started with EF – although I will be concentrating on only one of the many options.

There is loads of information about EF in books and elsewhere on the web. I’m not trying to reproduce it all here. Instead, I’m trying to give you a high-level view that I hope will mean that other EF stuff you read will make more sense.

What is Entity Framework?

Entity Framework is an Object-Relational Mapper (ORM) from Microsoft for use with .NET programs, most commonly ones written in C#. There are alternative ORMs for C#, such as Dapper and NHibernate.

What is an Object-Relational Mapper?

An ORM is a bridge between the world of objects (your C# code) and the world of relations (tables in a database such as SQL Server or Oracle). If you don’t use a database or something similar, then you don’t need an ORM. If your code does talk to a database, then an ORM is well worth considering.

In the world of objects:

a class defines a set of similar things;
an object represents a member of a given set;
the details of a member of a set are held in properties of an object;
there can be a relationship between things by one of the properties of object A being object B, or a collection of objects C – B and C could be the same class as A (as in a tree structure) or a different class to A.

In the world of relations:

a table defines a set of similar things;
a row represents a member of a given set;
the details of a member of a set are held in the columns of a row;
there can be a relationship between things by row A holding a unique identifier of row B (often the primary key of B) – B could be a row in the same table as A or in a different table.

You can see that the worlds are different but correspond to each other. You might have created a new object and want to store it as a row in a table. Or you might have a set of tables linked by foreign key relationships, and you want to read one row from one table and all its child rows in other tables and have them all represented as a graph of objects in memory.

An ORM helps with this translation of information between one world and the other.
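
As a rough illustration of the correspondence described above, here is how the object side might look in C#. The Person and Address classes, and the table layouts mentioned in the comments, are made up for this sketch; they aren't from any real schema.

```csharp
using System.Collections.Generic;

// Hypothetical table: Address(Id PRIMARY KEY, City)
public class Address
{
    public int Id { get; set; }       // column Id (the primary key)
    public string City { get; set; }  // column City
}

// Hypothetical table: Person(Id PRIMARY KEY, Name,
//   AddressId REFERENCES Address(Id), ManagerId REFERENCES Person(Id))
public class Person
{
    public int Id { get; set; }
    public string Name { get; set; }

    // In the table this is the AddressId foreign key column;
    // in the object world it is a reference to another object.
    public Address Address { get; set; }

    // A relationship to the same class (a tree structure). In the table,
    // each report's row holds its manager's Id in the ManagerId column.
    public List<Person> Reports { get; } = new List<Person>();
}
```

Note that the direction of the link can differ between the two worlds: the Person object holds a collection of its reports, whereas in the table it is each report's row that points back at its manager.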

Why should I use an Object-Relational Mapper?

You don’t need to use an ORM, but in my experience it’s much easier to do so. The alternative is to hand-craft things yourself: for example, the SQL insert statement that adds an object’s data to the database, or the SQL and constructor calls that select one or more rows from the database and then correctly populate one or more objects from the returned data.

The translation between worlds is both repetitive and a place where details matter. By that I mean the basic process of turning rows in table A into objects of class X is the same as turning rows in table B into objects of class Y; it’s just the specifics that differ. Also, it’s easy to make copy/paste/edit errors such that, for example, the data from one column in a select ends up in the wrong property of the created object.
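
To give a feel for the hand-crafted alternative, here is a minimal sketch of mapping rows to objects by hand, using only the System.Data interfaces that ship with .NET. The Person class, its column names and the PersonMapper helper are all hypothetical. Every new table would need another mapper much like this one, and swapping "Name" and "Email" by accident would compile without complaint.

```csharp
using System.Collections.Generic;
using System.Data;

public class Person
{
    public int Id { get; set; }
    public string Name { get; set; }
    public string Email { get; set; }
}

public static class PersonMapper
{
    // Hand-written translation from one row to one object.
    // Each column has to be matched to the right property by hand.
    public static Person FromRecord(IDataRecord record)
    {
        return new Person
        {
            Id = record.GetInt32(record.GetOrdinal("Id")),
            Name = record.GetString(record.GetOrdinal("Name")),
            Email = record.GetString(record.GetOrdinal("Email")),
        };
    }

    // Walks every row the reader returns and builds one object per row.
    public static List<Person> ReadAll(IDataReader reader)
    {
        var people = new List<Person>();
        while (reader.Read())
        {
            people.Add(FromRecord(reader));
        }
        return people;
    }
}
```

In real code the reader would come from executing a select statement against the database; for trying the mapper out without a database, a DataTable's CreateDataReader() gives you a reader over rows held in memory.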

It’s the kind of work that a machine can do better than a human, and also the kind of work that humans often find boring. So, delegate the work to the ORM so that you can concentrate on the more valuable and interesting stuff that’s closer to people and their problems.

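An ORM can also help with security, for example by sending values to the database as query parameters rather than pasting them into SQL strings, which guards against SQL injection. However, regardless of which tool you use, and how much help it gives you with security, it’s your responsibility to arm your code against attacks.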

Why people don’t use ORMs

Some people don’t like to use ORMs in general because they believe they’re slow, or because they don’t let you have complete freedom to do whatever you like when interacting with the database. These are valid concerns, but it’s worth doing some homework before rejecting an ORM. Also, not all ORMs are the same – you might have had a bad experience in the past with a different ORM, but that doesn’t necessarily mean that EF isn’t a good choice for you.

Is your code actually slow? (Who defines what slow means in your situation, and how is it measured?) If so, is it the code that talks to the database that’s causing the problem, or is some other code slow, e.g. an API call? If it is the database code that’s slow, is it because the basic structure of the query is slow, e.g. you query parent and child rows via separate queries, or you have the N+1 problem (one query to get N parent rows, then one more query per parent to get its children)? Or the query might be structurally sound but still result in a slow execution plan because an index is missing from the database. These are all issues where the ORM isn’t the problem.

It could also be that there is some unusual query that you need to do that doesn’t fit into the mould required by EF. However, it’s worth seeing what proportion of your queries are unusual like this – if 99% of the queries are straightforward and fit into EF, that’s a different situation from one where you are doing only unusual queries.

The solution to both problems – poor performance and unusual queries – is likely to be to use EF to run arbitrary SQL strings against the database. My suggestion is that you default to using EF (or some other ORM) and then fall back on custom SQL only when you need to. You will get the benefits of an ORM (fewer bugs, higher productivity, clearer code) where you can, and avoid its costs (performance and inflexibility) when they occur.
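
Here is a sketch of what "default to EF, fall back on SQL" could look like. It assumes EF Core (the Microsoft.EntityFrameworkCore package plus a relational database provider), and the Person class, AppDbContext and PersonQueries helper are hypothetical names invented for the example. EF 6 has a different API for raw SQL, so treat this as an illustration of the idea rather than something to paste into an EF 6 project.

```csharp
using System.Collections.Generic;
using System.Linq;
using Microsoft.EntityFrameworkCore;

public class Person
{
    public int Id { get; set; }
    public string Name { get; set; }
}

public class AppDbContext : DbContext
{
    // The caller decides which database provider the options point at.
    public AppDbContext(DbContextOptions<AppDbContext> options) : base(options) { }

    public DbSet<Person> People { get; set; }
}

public static class PersonQueries
{
    // The default: write LINQ and let EF translate it into SQL.
    public static List<Person> FindByName(AppDbContext context, string name)
    {
        return context.People.Where(p => p.Name == name).ToList();
    }

    // The fallback: hand-written SQL, with EF still turning rows into Person objects.
    // FromSqlInterpolated sends {name} as a query parameter instead of
    // pasting its value into the SQL text.
    // Assumption: the table is called People, which is EF Core's default
    // naming for this DbSet.
    public static List<Person> FindByNameWithSql(AppDbContext context, string name)
    {
        return context.People
            .FromSqlInterpolated($"SELECT * FROM People WHERE Name = {name}")
            .ToList();
    }
}
```

Both methods return the same kind of objects, so the calling code doesn't need to know or care which route was taken.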

What do I need to decide on before I start with Entity Framework?

There are a few decisions to do with EF where you need to choose between option A and option B – you can’t do both. Before you start coding using EF, you need to work through these decisions. I will list them here and then expand on them below.

Which version of EF?
Which persistence scenario?
Lazy or eager loading?
Which workflow?

Which version of Entity Framework?

There are two versions of EF – the original EF (the latest version of which is EF 6) and EF Core. Which you use is partly determined by whether you’re using .NET Framework or .NET Core, and partly by the fact that the features supported by the two aren’t identical. You need to decide whether EF or EF Core offers a set of features that’s a better fit for you.

Which persistence scenario?

There are two alternatives here:

Connected
Disconnected

They relate to how EF knows what changes need to be made to the database, based on the state of the objects currently in memory.

In the connected scenario, there is one long-running instance of the DbContext class, which is what EF uses to keep track of the state of objects. Therefore, if you read an object from the database, then change the object, all you need to do for the change to affect the database is to tell EF to save all outstanding changes.

The DbContext instance is also what EF uses to connect to the database, and so it means that there is a long-running database connection. If your C# code doesn’t need to share the database with anything else (including other threads executing the same code), and the database is local, then this could be OK. If you have a remote and/or shared database, e.g. in a web application, then the cost of the connected scenario is probably too high.

If you are using the disconnected scenario, then you need to tell EF explicitly about what changes it needs to make before you tell it to save changes. Basically, this is passing an updated object to EF to tell it that the corresponding bit of the database needs updating.

The update case is the only one where this difference occurs. For create and delete, the way to use the context is the same:

	Connected	Disconnected
Create	context.MySet.Add(newObject)	context.MySet.Add(newObject)
Update	No call to EF	context.MySet.Update(newVersionOfObject)
Delete	context.MySet.Remove(objectToDelete)	context.MySet.Remove(objectToDelete)
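
The update row of the table can be sketched in code as follows. This assumes EF Core, where DbSet has an Update method; as far as I know EF 6 has no such method and you would instead mark the entity as modified through context.Entry(...). The Person class, AppDbContext and UpdateExamples helper are hypothetical names for this example.

```csharp
using Microsoft.EntityFrameworkCore;

public class Person
{
    public int Id { get; set; }
    public string Name { get; set; }
}

public class AppDbContext : DbContext
{
    public AppDbContext(DbContextOptions<AppDbContext> options) : base(options) { }

    public DbSet<Person> People { get; set; }
}

public static class UpdateExamples
{
    // Connected: the context that loaded the object is still tracking it,
    // so SaveChanges() notices the changed property without being told.
    public static void RenameConnected(AppDbContext context, int id, string newName)
    {
        var person = context.People.Find(id);
        if (person == null) return; // nothing with that key

        person.Name = newName;
        context.SaveChanges();
    }

    // Disconnected: the object arrives from outside (e.g. a web request) and
    // this context has never seen it, so we have to tell EF explicitly that
    // it is a new version of an existing row.
    public static void SaveDisconnected(AppDbContext context, Person newVersionOfPerson)
    {
        context.People.Update(newVersionOfPerson);
        context.SaveChanges();
    }
}
```

In the disconnected version EF has no record of which properties changed, so it has to treat the whole object as modified.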

Lazy or eager loading?

This is not to be confused with the lazy or eager loading happening because of LINQ; this is laziness or eagerness within EF. (The two can combine, which is a bit confusing.)

If you have table A that has a foreign key to table B, then that will be represented as class X that contains a property of class Y. For instance, the Person table/class and the Address table/class. It’s possible to write a query to retrieve just data from Person, or you could write a different query to join Person data with the corresponding Address data and so get data from both tables with the one query.

With eager EF loading, you will always load the Address data along with the Person data. With lazy EF loading, you will load the Person data and then the Address data for a given Person will be loaded invisibly in the background the first time you try to access their Address. So, if you never access the Address property, that data will never be retrieved from the database. (Note that lazily loading the Address is via a separate query to the database.)

The lazy loading due to LINQ (usually called deferred execution) is separate from the EF lazy loading. With LINQ lazy loading, the code will appear to construct and execute a C# query that you expect will be getting data from the database. This is not what happens – the query is built up, but will only be run the first time some code needs the results of the query, e.g. by doing a ToList() on the results, or some calculation in memory.
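
LINQ laziness can be seen without a database at all. The sketch below uses plain LINQ to Objects, with a made-up ReadRows method standing in for a database table; it logs each time a "row" is actually read, so you can see that nothing is read until ToList() asks for the results. An EF query behaves in a similar way, in that the SQL isn't sent until the results are needed.

```csharp
using System.Collections.Generic;
using System.Linq;

public static class DeferredExecutionDemo
{
    public static List<string> Run()
    {
        var log = new List<string>();

        // Stand-in for a database table: logs whenever a row is really read.
        IEnumerable<int> ReadRows()
        {
            foreach (var id in new[] { 1, 2, 3 })
            {
                log.Add($"read row {id}");
                yield return id;
            }
        }

        // Building the query reads nothing yet.
        var query = ReadRows().Where(id => id > 1);
        log.Add("query built");

        // ToList() needs the results, so this is where the rows get read.
        var results = query.ToList();
        log.Add($"got {results.Count} results");

        return log;
    }
}
```

The log returned by Run() starts with "query built", and only after that come the three "read row" entries, followed by "got 2 results".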

So, going back to the Person/Address example, the C# code will build up and appear to execute a query to retrieve data. Due to LINQ laziness, this won’t actually trigger a query to the database until the data is used somewhere. At this point, the query that’s executed will pull data from:

Just the Person table, if EF is using lazy loading;
Both the Person and Address tables, if EF is using eager loading. (If EF is using lazy loading and the code later accesses the Address object, the Address data is fetched at that point by a separate query, as noted above.)

The classes used for lazy and eager loading will be the same apart from the properties that are classes that relate to other tables. In eager loading they look like this:

public Address Address {get; set;}

but in lazy loading they are marked virtual, like this:

public virtual Address Address {get; set;}

Lazy loading will result in more database traffic than eager loading if you access the data in the child objects (as a separate query is needed to get the data). However, if you don’t access the child objects’ data there will be the same number of queries as in eager loading, but these will return fewer columns than in eager loading (as the queries aren’t joining to child tables and pulling back their columns).
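
On the query side, eager loading is normally something you ask for in the query itself, via Include. Here is a sketch assuming EF Core; the Person and Address classes follow the example above, while AppDbContext and LoadingExamples are hypothetical names. Note that in EF Core, making a property virtual isn't enough by itself to get lazy loading; it also has to be switched on (for example via the separate proxies package), which this sketch doesn't do.

```csharp
using System.Linq;
using Microsoft.EntityFrameworkCore;

public class Address
{
    public int Id { get; set; }
    public string City { get; set; }
}

public class Person
{
    public int Id { get; set; }
    public string Name { get; set; }
    public Address Address { get; set; }
}

public class AppDbContext : DbContext
{
    public AppDbContext(DbContextOptions<AppDbContext> options) : base(options) { }

    public DbSet<Person> People { get; set; }
}

public static class LoadingExamples
{
    // Person data only. With lazy loading switched off (as here), the
    // Address property is simply left null.
    public static Person LoadPersonOnly(AppDbContext context, int id)
    {
        return context.People.SingleOrDefault(p => p.Id == id);
    }

    // Eager loading: Include asks EF to fetch the Address along with the
    // Person as part of the same query.
    public static Person LoadPersonWithAddress(AppDbContext context, int id)
    {
        return context.People
            .Include(p => p.Address)
            .SingleOrDefault(p => p.Id == id);
    }
}
```

With this approach the choice is made per query, so the same classes can be loaded with or without their child objects depending on what the calling code needs.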

Which workflow?

Workflow here refers to the way that you work, not so much the way that the code works (although it influences that in some cases). As I said above, an ORM is a bridge between worlds. You, as a programmer, can build that bridge in three different directions:

Build from existing tables in the database to new classes in the code (database first)
Build from existing classes in the code to new tables in the database (code first)
Build a free-standing bridge by itself (in the form of a model) and then use this to define new classes in the code and new tables in the database (model first).

With model first you use a designer tool to create a model, which is stored as a .edmx file.

With database first you have two options:

You can reverse-engineer a .edmx file from the database, and use that to generate classes;
You can have no .edmx file in your project, and hand-craft the classes.

With code first there is no .edmx file, but you can reverse engineer one from your classes.

This is an article that might help you decide which workflow to use.

Summary

I hope that this has given you a better understanding of what EF is, and which (if any) flavour of working with it fits your needs. I hope that as a result, random answers on Stack Overflow or random blog posts will make a bit more sense. In the next article I will walk through some example code in what I consider the simplest approach to EF: EF 6, database-first, using hand-crafted POCO entities and eager loading.