Thursday, May 17, 2007

The dangers of not terminating your strings

After a bit of a long break, I'm here to tell you about my latest wild bug hunt. A few months ago I was contacted by a French company to develop a high-speed text-to-SQL CE converter. This component lets the user send plain or compressed text files to a device and process inserts, updates and deletes locally. By its very nature, the code had to accept either UTF-8 or UTF-16 character encodings.

My first approach was to use the C runtime text conversion functions (wcstombs and mbstowcs). This way I needed only one set of value conversion functions (I mean the ones converting text to a GUID, to a DBTIMESTAMP and so forth). After the first tests the customer complained about stability: the data was written to the database, but the component DLL crashed almost every time.

Back in the eVC4 debugger (now running on my Vista box in a Parallels virtual machine), I was confronted with an ugly beast. The code would run successfully, and then the whole thing would blast into hyperspace when closing the data source object (yes, that's the last thing you close). Hunting the bug forced me to look everywhere (even in the tested ATL OLE DB code), but it was nowhere to be found. Oftentimes I had to hard reset the debug device to get it back on its feet. Not a nice thing...

Bugs like these "smell" like memory corruption: write through a wild pointer and you may destroy some very fundamental data structures in your application. Needless to say, you are in for some unexpected outcomes.

I did not kill the bug directly. Instead, I rewrote the code so as to avoid text conversions, and the first run of the new code behaved correctly. So should we avoid the text conversion functions altogether? Absolutely not! Just be wiser than me when you use them and make sure the converted strings are terminated. An unterminated string may cause exactly this kind of problem: mbstowcs and wcstombs only store the terminating null if it fits within the element count you pass, so the next string operation on the result can run past the end of the buffer. So beware: you can either add one to the source string length, so that the terminator gets converted as well, or add one to the target buffer size and make sure the buffer is zeroed before the conversion. Or you can do both.
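Here is a minimal sketch in standard C of the two fixes, plus the same idea applied to wcstombs. It assumes a null-terminated source string in the current locale and a CRT that follows the standard C semantics for these functions; the helper names (to_wide_counted, to_wide_zeroed, to_narrow_counted) are hypothetical and not taken from the original component. The wide buffer is sized from strlen(src), which is always enough because a multibyte string never yields more wide characters than it has bytes.

```c
#include <stdlib.h>
#include <string.h>
#include <wchar.h>

/* Fix 1 (hypothetical helper): count the terminator in the conversion.
 * strlen(src) + 1 slots always hold the result plus L'\0', and passing
 * n = len + 1 lets mbstowcs reach and store the terminator itself. */
wchar_t *to_wide_counted(const char *src)
{
    size_t len = strlen(src);
    wchar_t *dst = malloc((len + 1) * sizeof *dst);
    if (dst == NULL)
        return NULL;
    if (mbstowcs(dst, src, len + 1) == (size_t)-1) {  /* invalid sequence */
        free(dst);
        return NULL;
    }
    return dst;
}

/* Fix 2 (hypothetical helper): zero the buffer and keep one spare slot.
 * Even if mbstowcs stops after len characters without storing L'\0',
 * dst[len] is still zero from calloc. */
wchar_t *to_wide_zeroed(const char *src)
{
    size_t len = strlen(src);
    wchar_t *dst = calloc(len + 1, sizeof *dst);
    if (dst == NULL)
        return NULL;
    if (mbstowcs(dst, src, len) == (size_t)-1) {  /* invalid sequence */
        free(dst);
        return NULL;
    }
    return dst;
}

/* Same idea for wcstombs (hypothetical helper): size the target for the
 * worst case (MB_CUR_MAX bytes per wide character) plus one byte for '\0',
 * and pass that full size as n so the terminator is stored. */
char *to_narrow_counted(const wchar_t *src)
{
    size_t size = wcslen(src) * MB_CUR_MAX + 1;
    char *dst = malloc(size);
    if (dst == NULL)
        return NULL;
    if (wcstombs(dst, src, size) == (size_t)-1) {  /* unrepresentable char */
        free(dst);
        return NULL;
    }
    return dst;
}
```

In all three helpers the caller owns the returned buffer and must free() it. Doing "both" simply means combining calloc with n = len + 1; it costs nothing and guards against miscounting either length.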
