Reading JSON in SQL Server 2016

Introduction

There’s a lot of this that I won’t have time or space to cover, so for more information, I suggest you look at Microsoft’s blog on SQL Server + JSON.  What I will cover is the OPENJSON table-valued function, which lets you turn JSON text into a result set, i.e. rows and columns.  One use of it would be to take JSON returned by an external API and get it ready for inserting into a table.

The things that make it really nice are:

  1. It can cope with a very big lump of JSON – at least 600,000 characters;
  2. Missing values are just treated as nulls in the output;
  3. You can specify the output format simply, in a very similar way to defining a table;
  4. You can skip down the input JSON’s hierarchy before starting the conversion, thereby ignoring upper levels in the JSON you don’t want;
  5. It can cope with more than one hierarchy level in the part of the JSON that gets turned into a single output row.

It won’t be too interesting to show you the first two, but I will explain the others in the next few sections.

Simple example – specifying the output format

Imagine the input JSON looks like this, stored in a variable @JSONtext:

{[
{"colour": "Green", "region": "East", "quantity": 156},
{"colour": "Orange", "region": "North", "quantity": 82}
]}

You can process that with some SQL like this:

select * from OPENJSON(@JSONtext) WITH (Colour VARCHAR(200), Quantity INT)

Note that everything is nullable, so the WITH clause has only name and type but not nullity.  Also note that region is ignored – that’s fine, but it won’t be in the output.

Skipping upper bits of input JSON

In the previous example, the JSON is a close match to what we want.  Other than the region property, we want all of it.  JSON is a tree-like structure, and there might be things at the top of the JSON’s tree that we don’t want.

If you look at the example on the JSON-API website, the comment data (the part showing ids) is buried quite deeply in the JSON.  It is Root.data[0].relationships.comments.data – you want the 0th child of the top-level data element etc.  You can extract that using the following SQL:

select * from OPENJSON(@JSONtext, '$.data[0].relationships.comments.data') WITH (Type VARCHAR(200), Id INT)

This will return just the two comments, and skip all the rest of the JSON’s tree.

Many hierarchy levels in the input JSON

In the same example as above, imagine you wanted the information about articles.  There are two top-level bits of information (ignoring comments etc.): id and title.  The repeating bit of the JSON we want is Root.data[0], Root.data[1] etc.  However, within one repeat, the two bits of information aren’t at the same level:

  1. Id is Root.data[0].id
  2. Title is Root.data[0].attributes.title

Fortunately this still works, by extending the select statement at bit:

select * from OPENJSON(@JSONtext, '$.data') WITH (Id INT '$.id', Title VARCHAR(200) '$.attributes.title')

Inside the WITH clause the $ is the current thing, i.e. element in the array selected by the ‘$.data’ inside the OPENJSON.

JSON in SSIS

This is all very nice (and there are other things such as exporting JSON that are also nice but I won’t cover here).  Where it comes to SSIS the nice-ness is surprisingly missing.  You can work around some of it, but that still leaves things being a bit clunky.

If you wanted to connect to an external API, fetch some JSON from it, and then turn it into a result set ready for normal SSIS processing you will need three things:

  1. A script task that contains C# to connect to the API (handling any credentials etc, error messages etc.) and write the result into an SSIS variable;
  2. An SQL task that contains an OPENJSON as above, to turn the JSON in the SSIS variable into a result set;
  3. Other bits of SSIS after that to do the rest of the processing.

The SQL task could insert into a table, which could then be read by the next step.  If you don’t fancy the extra write and read, you will need to explicitly tell SSIS about the columns that the OPENJSON will produce.

BONUS: More than one result set from a stored procedure

(Added 11th March 2018)

Phil Factor uses OPENJSON to let a stored procedure return 2+ result sets.  He uses an OUTPUT parameter per result set, returning each one as a JSON string to be turned back into the usual form via OPENJSON.

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out /  Change )

Facebook photo

You are commenting using your Facebook account. Log Out /  Change )

Connecting to %s