Warning: sem_get() [function.sem-get]: failed for key 0x152b: Permission denied in /home/dossy/htdocs/wp/wp-content/plugins/wp-cache/wp-cache-phase2.php on line 98
Dossy's Blog: Tcl

Archive for the 'Tcl' Category

tcl-mysql-udf 0.2 and Win32 DLL binary

Friday, August 31st, 2007

Last week, I released some code that would enable MySQL to evaluate Tcl scripts as a stored function, which I called tcl-mysql-udf. My friend Steve asked if I could prepare a Win32 DLL binary for him, so I worked on that tonight and am releasing version 0.2, along with the DLL:

In order to play with this, you’ll need the following prerequisites installed:

  • MySQL (I’m testing on 5.1.21-beta)
  • Tcl (I’m testing on 8.4.15.0)

Inside the tcl-mysql-udf-0.2-dll.zip will be the file tcl-mysql-udf.dll. On MySQL 5.0.x, it expects it to reside in the Program Files\MySQL\MySQL Server 5.0\bin directory, so copy it there. On MySQL 5.1.x, however, it expects it to be in the Program Files\MySQL\MySQL Server 5.1\lib directory. Pay close attention to what version of MySQL you’re using and the correct directory to copy the DLL into.

Once you’ve got everything installed and copied to the right locations so far, connect to your MySQL database with your favorite MySQL client, and issue the following command (you only type what’s in bold):

mysql> CREATE FUNCTION TCL RETURNS STRING SONAME 'tcl-mysql-udf.dll';
Query OK, 0 rows affected (0.10 sec)

We can check what version of Tcl we’ve loaded this way:

mysql> SELECT TCL('info patchlevel') AS script;
+--------+
| script |
+--------+
| 8.4.15 |
+--------+
1 row in set (0.04 sec)

Here’s a goofy example of storing Tcl scripts in the database and having MySQL evaluate them:

mysql> CREATE TABLE code (
n INT NOT NULL AUTO_INCREMENT,
script TEXT NOT NULL,
PRIMARY KEY pk_code (n)
) ENGINE=MyISAM;

Query OK, 0 rows affected (0.20 sec)

mysql> INSERT INTO code (script) VALUES
('set x 123'),
('expr {$x + 432}'),
('clock format [clock seconds]'),
('incr x [clock seconds]'),
('expr {$x * rand()}');

Query OK, 5 rows affected (0.04 sec)
Records: 5  Duplicates: 0  Warnings: 0

So, we now have a table with five rows in it, each row containing a Tcl script. We can have MySQL evaluate those Tcl scripts like this:

mysql> SELECT n, TCL(script)
FROM code
ORDER BY n;

+---+---------------------------------------------------+
| n | TCL(script)                                       |
+---+---------------------------------------------------+
| 1 | 123                                               |
| 2 | 555                                               |
| 3 | Fri Aug 31 12:41:25 AM Eastern Daylight Time 2007 |
| 4 | 1188535408                                        |
| 5 | 6935485.39171                                     |
+---+---------------------------------------------------+
5 rows in set (0.05 sec)

So what, right? How about fetching HTML documents via HTTP, right from within MySQL?

mysql> CREATE TABLE urls (
n INT NOT NULL AUTO_INCREMENT,
url VARCHAR(255) NOT NULL,
PRIMARY KEY pk_urls (n)
) ENGINE=MyISAM;

Query OK, 0 rows affected (0.03 sec)

mysql> INSERT INTO urls (url) VALUES
('http://dossy.org/'),
('http://aolserver.com/'),
('http://njgeeks.org/');

Query OK, 3 rows affected (0.00 sec)
Records: 3  Duplicates: 0  Warnings: 0

mysql> SELECT n, url, TCL(
'package require http;'
'set url [lindex $args 0];'
'set token [http::geturl $url];'
'set data [http::data $token];'
'http::cleanup $token;'
'regexp -all -inline {<title>.*?</title>} $data',
url) AS script
FROM urls
ORDER BY n \G

*************************** 1. row ***************************
     n: 1
   url: http://dossy.org/
script: {<title>Dossy's Blog</title>}
*************************** 2. row ***************************
     n: 2
   url: http://aolserver.com/
script: <title>AOLserver</title>
*************************** 3. row ***************************
     n: 3
   url: http://njgeeks.org/
script: {<title>NJ Geeks | - New Jersey's IT Community</title>}
3 rows in set (1.54 sec)

This query uses the Tcl “http” package to fetch the documents specified by the url column and plucks out the <title> tag using a regular expression. The TCL() stored function takes a variable number of arguments, the first being the Tcl script to evaluate, followed by zero or more arguments that are placed into the $args Tcl variable. In this case, we pass in “url” so that on each row, we execute our Tcl script with the value from that row.

Be aware that this stored function is a security issue: allowing database users to execute arbitrary code has obvious risks, especially since that code will be executed as the user that the MySQL server is running as.

Hopefully this is enough to get you started and might even give you an idea as to how this could be useful to you in some way. If you have any questions, just leave them in the comments below.

Tags: , , ,

Evaluate Tcl scripts as a UDF in MySQL

Saturday, August 25th, 2007

In the previous blog entry, I mentioned evaluating Tcl scripts from within MySQL using the User-Defined Function (UDF) capability already in MySQL. For example, it would allow you to do this:

CREATE FUNCTION TCL RETURNS STRING SONAME 'libtcl-mysql-udf.so';

SELECT TCL('expr {2 + 2}');

Well, after a few hours of productive coding and hacking, I want to share tcl-mysql-udf-0.1.tar.gz with you.

I’ve done very limited testing on this and on very little sleep, so I wholly expect there to be really stupid mistakes in there–but, it does “work” on my personal Debian 4.0 Linux and MySQL 5.1.20 setup with Tcl 8.4.12. If you want to play around with this, I strongly urge you to read over the README included in the tarball, as it should have all the relevant bits of information necessary to get you started.

A quick demonstration of what’s currently possible with this version:

mysql> SELECT TCL('info patchlevel') AS script;
+--------+
| script |
+--------+
| 8.4.12 |
+--------+
1 row in set (0.00 sec)

mysql> SELECT TCL('info commands') AS script \G
*************************** 1. row ***************************
script: tell socket subst open eof pwd glob list exec pid time eval lrange fblocked lsearch gets case lappend proc break variable llength return linsert error catch clock info split array if fconfigure concat join lreplace source fcopy global switch update close
1 row in set (0.00 sec)

Currently, Tcl errors result in the function returning NULL–I haven’t figured out how to propagate the error string back up, yet.

mysql> SELECT TCL('error foo') AS script;
+--------+
| script |
+--------+
| NULL   |
+--------+
1 row in set (0.01 sec)

Also, the maximum length string that the function can return is currently hard-coded to 255 characters.

mysql> SELECT LENGTH(TCL('string repeat x 1024')) AS script;
+--------+
| script |
+--------+
|    255 |
+--------+
1 row in set (0.07 sec)

I implemented the function as accepting a variable number of arguments, the first being the Tcl script, followed by values that get appended to the $args Tcl variable–as though it were a Tcl proc call, in a sense.

mysql> SELECT TCL('concat [llength $args] $args', 'abc', 123, NOW()) AS script;
+---------------------------------+
| script                          |
+---------------------------------+
| 3 abc 123 {2007-08-25 08:50:35} |
+---------------------------------+
1 row in set (0.05 sec)

If you’re familiar with Tcl and MySQL and aren’t afraid of compiling something and fooling around with it in your dev playground, I’d love to hear what you think of this code. Can you see any use for it? What else would it need to be able to do before it would be useful to you?

Tags: , , ,

Multiple threads opening the same Berkeley DB in Tcl

Monday, July 17th, 2006

Thanks to the Tcl thread extension, you can use threads within your Tcl scripts with a thread-enabled build and this extension. There’s also a Tcl binding for Berkeley DB (BDB) available, as well. But, what if you want to combine the two, and access the same Berkeley DB from multiple threads in Tcl? Since Berkeley DB is free-threaded (or thread-safe), this should be simple, right? You’re right, it should be but it was pretty non-obvious, to me at least.

It’s not a bug, it’s just an undocumented feature!

First off, the BDB documentation makes mention of DB_THREAD needing to be set on the environment handle to get the free-threaded behavior, but the berkdb env documentation lacks any mention of how to do this. Turns out, the parameter -thread is undocumented, but does set DB_THREAD as expected. Fantastic! That’s all you should need to know, right? Wrong. There’s a peculiar constraint (bug?) which I ran up against, scratching my head, reading (and re-reading) the BDB source, trying to understand. Here’s an example script I started with:

package require Thread

set t1 [thread::create]
thread::send $t1 {
  package require Db_tcl
  set e [berkdb env -create -home bdb -thread -cdb]
  set db [berkdb open -env $e -thread -create -hash tables.db test]
}
thread::send $t1 {$db put foo bar}
  # => 0
thread::send $t1 {$db get foo}
  # => {foo bar}

set t2 [thread::create]
thread::send $t2 {
  package require Db_tcl
  set e [berkdb env -create -home bdb -thread -cdb]
  set db [berkdb open -env $e -thread -create -hash tables.db test]
}
thread::send $t1 {$db get foo}
  # => error: NULL db info pointer

You’d think this would work, right?

Naturally, what’s happened here is that two threads have attempted to use the same environment, and when the second thread does it, it invalidates the environment for the first thread, resulting in the “NULL db info pointer” error. Fine, I wouldn’t expect this to work for just this reason, so lets try to create the environment once and reuse it in the child threads. We’ll use tsv’s (thread-shared variables) to share the environment handle across threads.

package require Db_tcl
package require Thread

tsv::set bdb env [berkdb env -create -home bdb -thread -cdb]
  # => env0

set t1 [thread::create]
thread::send $t1 {
  package require Db_tcl
  set e [tsv::get bdb env]
    # => env0
  set db [berkdb open -env $e -thread -create -hash tables.db test]
    # => error: db open: illegal environment
}

Illegal? I’ll show YOU illegal!

This really puzzled me, getting the “db open: illegal environment” error. I looked at the source for db-4.2.52, in file tcl_db_pkg.c around lines 1464–1473:

   1464         switch ((enum bdbenvopen)optindex) {
   1465         case TCL_DB_ENV0:
   1466             arg = Tcl_GetStringFromObj(objv[i], NULL);
   1467             envp = NAME_TO_ENV(arg);
   1468             if (envp == NULL) {
   1469                 Tcl_SetResult(interp,
   1470                     "db open: illegal environment", TCL_STATIC);
   1471                 return (TCL_ERROR);
   1472             }
   1473         }

I mean, look at that. NAME_TO_ENV() is a macro that’s defined as (DB_ENV *)_NameToPtr((name)) which simply fishes data out of DBTCL_GLOBAL __dbtcl_global … it shouldn’t be returning a NULL, here. What gives? Since we have to initialize each thread’s state on creation (it doesn’t inherit anything from the thread that created it), we have to package require Db_tcl and load the BDB module in each thread. I bet there’s something goofy going on there. Looking at Db_tcl_Init(), we see:

     72 int
     73 Db_tcl_Init(interp)
     74     Tcl_Interp *interp;     /* Interpreter in which the package is
     75                      * to be made available. */
...
     97     LIST_INIT(&__db_infohead);
     98     return (TCL_OK);
     99 }

Aha! The villain is revealed!

Oh, look, it initializes the global each time the package is loaded! So, the list that points to the environments gets wiped out once our newly created thread loads the Db_tcl package. While this is frustrating, there’s a work-around: create the environment after creating the threads and loading the Db_tcl package inside of them:

package require Db_tcl
package require Thread

## Create our threads, making sure Db_tcl is loaded.
set t1 [thread::create]
thread::send $t1 {
  package require Db_tcl
}
set t2 [thread::create]
thread::send $t2 {
  package require Db_tcl
}

## Now, we can create the environment once:
tsv::set bdb env [berkdb env -create -home bdb -thread -cdb]

## ... and then, use it in our threads:
thread::send $t1 {
  set e [tsv::get bdb env]
  set db [berkdb open -env $e -thread -create -hash tables.db test]
}
thread::send $t2 {
  set e [tsv::get bdb env]
  set db [berkdb open -env $e -thread -create -hash tables.db test]
}

## See:
thread::send $t1 {$db put foo bar}
  # => 0
thread::send $t1 {$db get foo}
  # => {foo bar}
thread::send $t2 {$db get foo}
  # => {foo bar}

All this, for what?

So, this is how the story ends. It is possible to use BDB across multiple threads in a Tcl application, with the restriction that all threads that will load the Db_tcl package must be created and initialized before you create environments and probably any other operation that relies on the __dbtcl_global structure, as it gets initialized on package load and the single global is shared across all threads. Presumably, this might be considered a bug, in that the structure should probably only be initialized once per process rather than every package load.

Can I do something about it now?

Here’s my proposed patch to fix this, although there’s a potential race
condition here:

$ diff -u tcl_db_pkg.c.orig tcl_db_pkg.c
--- tcl_db_pkg.c.orig   2006-07-17 11:22:20.000000000 -0700
+++ tcl_db_pkg.c        2006-07-17 11:23:40.000000000 -0700
@@ -78,6 +78,7 @@
 {
   int code;
   char pkg[12];
+    static int initialized = 0;

   snprintf(pkg, sizeof(pkg), "%d.%d", DB_VERSION_MAJOR, DB_VERSION_MINOR);
   code = Tcl_PkgProvide(interp, "Db_tcl", pkg);
@@ -98,7 +99,10 @@
   (void)Tcl_LinkVar(
       interp, "__debug_test", (char *)&__debug_test,
       TCL_LINK_INT);
-  LIST_INIT(&__db_infohead);
+    if (!initialized) {
+        initialized = 1;
+        LIST_INIT(&__db_infohead);
+    }
   return (TCL_OK);
 }

My recommendation would be to load the Db_tcl package in a single-threaded environment (before any additional threads are created) to ensure that only one thread initializes __db_infohead, then you can freely create threads and load the Db_tcl package in them without each package load re-initializing the __db_infohead.

If this was helpful, feel free to let me know in the comments below. Or, if you have any questions, ask those too!

UPDATE: I received a response from an Oracle engineer (who now owns Sleepycat) about this issue. Here’s it is:

[...] Although the fix you sent in the SR is one of the
ones needed to allow threaded access to BDB’s Tcl API, it is
not the only one. The __db_infohead is the beginning of a
global linked list and manipulation of that linked list is
not protected in the API. All manipulation of it would need
a mutex. That isn’t necessarily hard, but we have not had
customer demand for multi-threading the Tcl API.

I have fixed our Reference Guide Programming Notes on Tcl to
include a statement that says it does not support multi-threading.

So, if you’d like to see full threaded support in Tcl for Berkeley DB, leave a comment below and we’ll see what kind of demand really exists for it. In the meantime, I might try to work on it myself, just in case.

Tags:
,
,
,



Warning: sem_acquire(): supplied argument is not a valid SysV semaphore resource in /home/dossy/htdocs/wp/wp-content/plugins/wp-cache/wp-cache-phase2.php on line 107

Warning: sem_release(): supplied argument is not a valid SysV semaphore resource in /home/dossy/htdocs/wp/wp-content/plugins/wp-cache/wp-cache-phase2.php on line 116