Storing binary data in MySQL with AOLserver

This week at work, Michael Andrews asked how to round-trip binary data into MySQL from AOLserver. Since AOLserver’s ns_db interface is still defined using Tcl_CreateCommand which isn’t Tcl_Obj’ified, it’s sensitive to embedded NULL bytes. In other words, values (such as SQL strings) with embedded NULLs (which can appear in binary data) will result in the value being truncated at the first NULL. So, we have to escape them, and luckily MySQL understand this.

Here’s a brief walk-through at the control port demonstrating AOLserver’s ability to safely round-trip binary data with embedded NULLs to and from MySQL. I’ve left out a lot of prerequisites such as installing MySQL, configuring a database pool in AOLserver, configuring and connecting to the control port, etc. Those kind of things should be documented elsewhere, already.

First, we get a database handle:

nscp 1> set db [ns_db gethandle mysqldb]
nsdb0

For the sake of this demonstration, lets show what version of MySQL I’m testing against:

nscp 2> ns_db select $db "show variables like 'version'"
t0

nscp 3> ns_db getrow $db t0
1

nscp 4> ns_set array t0
Variable_name version Value 4.1.11-Debian_4sarge2-log

nscp 5> ns_set free t0

Okay, so I’m using MySQL 4.1.11 as shipped with Debian Linux, which is what I’m using. YMMV if you’re using a different version of MySQL, of course. Next, lets set up our test data. We’ll create a test value with all possible 8-bit bytes from 0 through 255, repeated once. The resulting value should be 512 bytes long:

nscp 6> set value ""

nscp 7> for {set i 0} {$i <= 255} {incr i} { append value [format "%c" $i] }

nscp 8> append value $value; return

nscp 9> string length $value
512

That little “; return” on line 9 is there to prevent the control port from returning the result of the “append value $value” which would have new returned the new value, or 512 bytes of binary data, back to our terminal, which we really don’t care to see. I’ll use that little trick throughout this tutorial where it makes sense to squelch a noisy command’s output.

At this point, we’ve got a 512-byte long binary value in $value, with two NULLs in it, one at the very beginning, and one 257 bytes into it. Now, in order to send this through to MySQL, we’ll need to escape several bytes: the NULL (also expressed as “\0“), the single-quote or “'“, the Control-Z (byte 26 or 0x1A in hex representation) and the backspace itself. We have to escape the single quote because it will prematurely end the value we’re trying to pass in the SQL statement. This is how SQL injection attacks work, by taking advantage of sloppy programmers who don’t escape these single quotes when using dynamic SQL. Database APIs that use “prepared statements” and “bind variables” generally aren’t as susceptible to these injection attacks (but they still can be, if done improperly), but that’s a topic for another discussion. In the meantime, Tom Kyte wrote a great article in Oracle Magazine January/February 2005 on this.

Next, we’ll define two helper procs that will escape and unescape values suitable for using in our dynamic SQL statements. I’ve named the procs mysql.quotevalue and mysql.unquotevalue. They both use Tcl’s string map to perform the transformation. To quote a NULL byte we replace it with the string “\0” and to quote a single-quote we prepend a backslash so it gets replaced with the string “\'“. We do the same operation in reverse to unquote the values:

nscp 10> proc mysql.quotevalue {value} { string map {\0 \\0 ' \\' \x1A \\Z \\ \\\\} $value }

nscp 11> proc mysql.unquotevalue {value} { string map {\\0 \0 \\' ' \\Z \x1A \\\\ \\} $value }

Now that we have these two procs defined, we can create a temporary table for our test and start INSERT‘ing the 512-byte long binary value into it. I’ve named the table binarytest with just one BLOB column named value:

nscp 12> ns_db dml $db "CREATE TABLE binarytest (value BLOB)"

nscp 13> ns_db dml $db "INSERT INTO binarytest (value) VALUES (_utf8'[mysql.quotevalue $value]')"

Because Tcl stores its values inside of Tcl_Obj’s as UTF-8 encoded data but we’re pushing it naively through the ns_db interface, MySQL will unfortunately be receiving UTF-8 encoded data from us. So, we need to tell it that, in case it needs to do any behind-the-scenes transcoding based on the server or database’s default charsets. In MySQL, if you prefix a value with underscore followed by a character set name, it makes note that the value is encoded in that character set, which is exactly what you see above.

Now, lets retrieve the data back out, using the MySQL function QUOTE() to ask the DB to quote the response before returning it to us, so embedded NULLs in the response don’t cause premature truncation:

nscp 18> ns_db select $db "SELECT QUOTE(value) FROM binarytest"

nscp 19> ns_db getrow $db t0
1

nscp 20> set result [ns_set get t0 "QUOTE(value)"]; return

nscp 21> ns_set free t0

Now, if you read the documentation for QUOTE(), you’d notice that MySQL will surround the value with single-quotes unless the value is a NULL database value. Since Tcl has no explicit way of expressing a NULL value, we’ll reduce it to an empty string — while I don’t necessarily think this is the right thing to do, this is also a topic for another time. So, lets go to work:

nscp 22> set newvalue ""

nscp 23> if {[regexp {^'(.*)'} $result -> newvalue]} { set newvalue [mysql.unquotevalue $newvalue] }; return

So, now we should have our original data in $value and our newly retrieved and unescaped value in $newvalue. Lets ask Tcl to compare the two and tell us if they are indeed equal or not:

nscp 24> string length $newvalue
512

nscp 24> string equal $value $newvalue
1

Yay! We got back exactly what we put in. Perfect!

Some caveats with this approach include the fact that we end up storing our binary data encoded as UTF-8 in the database, not as actual binary. If you plan to connect to this database with another application other than AOLserver through the ns_db API, be very aware of this. Other applications will need to transcode the data from UTF-8 back to plain binary, and will need to transcode to UTF-8 before inserting/updating the values in the database so that AOLserver can pull them out correctly. Also, due to the storage in UTF-8, this also means that the storage requirement in the database will be close to two times the actual binary data being stored, so be aware of this when doing your capacity planning.

I hope this little walk-through tutorial has been helpful to you, or at least has given you some new things to think about with respect to how to use AOLserver, Tcl and MySQL. If you have any questions, don’t hesitate to leave a comment below or email me privately.

Comments

  1. First time I’ve heard of AOLServer. Do you like it better than Apache?

  2. Do I like AOLserver better than Apache? Sure, I guess, but that’s not really a fair comparison. Apache is just a web server. AOLserver is a web server tightly integrated with the Tcl scripting language. Do I prefer AOLserver and Tcl over Apache and PHP or Perl or Ruby or Python? Absolutely, because I really enjoy working in Tcl.

    I don’t mind working in PHP, Perl or Ruby (I have done next to nothing in Python), but I find Tcl really enjoyable.

Speak Your Mind

*