Tuesday, September 30, 2008

Let's talk about BLOBs

So far we have seen how to edit table data using the IRowsetChange interface. The provided sample managed to edit a simple table where all columns had limited size. Table edits are performed in a single step, where all the data is read or written at once, and all the data items are sequentially stored on the same data buffer. This scheme works very well but does not support BLOBs.

A BLOB is a different beast because it can store a very large amount of data. In SQL Compact there are only two types of BLOBs: the image (variable length byte array) and the ntext types (variable length character array). These can be really large and the maximum theoretical size of a BLOB field is 4 GB in SQL Compact (the same size limit for the whole database).Contrary to smaller data types, you may not be able to store the whole BLOB data in the device memory, so we need a different approach to handling these types.

The OLE DB standard allows two different ways to bind BLOB data to your accessor: by reference or by value. SQL Compact does not allow by reference binding, so we are limited to binding these by value? What are the differences between both types of data binding?
  • When binding by reference, the OLE DB provider allocates memory for each bound BLOB field and fills it with data. A pointer to this data is stored in the accessor buffer along with its status and size. Column data is accessed through an extra pointer indirection and our code must make sure that the BLOB data buffer is released when the data is not needed anymore.
  • When binding by value, we can either specify (force) a fixed column size or bind through a storage interface. The first approach allows your BLOB data to be manipulated just like any other column but forces you to put a limit on how much data you will handle per BLOB column and this defeats the whole purpose of using them. Alternatively (and most commonly) your code will request that the provider creates a storage object to handle the BLOB data (when reading - when writing data your code must do this). When data is read ftom the table, the provider stores a pointer to either an ISequentialStream or an ILockBytes interface and then your code can read the data through these. When writing, your code must create one of these objects per BLOB column.
The last approach - binding by value through storage objects - is the recommended approach, but it becomes harder to implement when your code is handling more than one BLOB per row. In fact, SQL Compact will only manage (read from or write to) one storage object at a time, and if you ask it to instantiate more than one you will get an error. So how do you read from a table (or query) that returns more than one BLOB column? And how do you write the data? Well, you do so one at a time and this means that your code cannot read nor write the whole data in one single step.

The first change we must make to the rowset code is to add more accessor handles. The first accessor handle will bind the non-BLOB columns and these can be read and written to just like we are doing right now. For each BLOB column the rowset must allocate an additional accessor handle and make a note of it.

When data is read, the non-BLOB columns are all read, but none of the BLOB columns are. When your client code requests data from a BLOB column, the rowset must know that this is a special case, retrieve the colsumn's accessor handle and call IRowset::GetData on that handle. On the data buffer, the OLE DB provider will write a pointer to one of the above-mentioned storage objects (it is chosen by the rowset code when the data binding is created) and your client code must read through it and release it when no longer needed.

When you need to write data back to the table, your client code will have to create one storage object per BLOB column, store the respective pointers in the data buffer and engage in a multi-step update (new row insertion or existing row update). This means that your code will store the non-BLOB columns first and then store the BLOB columns one at a time, but to be able to achieve this the rowset must be in delayed update mode. You specify this mode by setting the DBPROP_IRowsetUpdate property to VARIANT_TRUE and the rowset now exposes the IRowsetUpdate interface. When this interface is exposed, you can call IRowsetChange::SetData on the allocated accessor handles without having to worry about having the row in an inconsisten state (due to table constraints, for instace) while you are updating data. When your code is done setting data through all the accessors, just call IRowsetUpdate::Update to update all the changes at once.

This will be easier to understand through a sample, which I will publish on my next post.

No comments: