Multiple threads opening the same Berkeley DB in Tcl

Thanks to the Tcl thread extension, you can use threads within your Tcl scripts with a thread-enabled build and this extension. There’s also a Tcl binding for Berkeley DB (BDB) available, as well. But, what if you want to combine the two, and access the same Berkeley DB from multiple threads in Tcl? Since Berkeley DB is free-threaded (or thread-safe), this should be simple, right? You’re right, it should be but it was pretty non-obvious, to me at least.

It’s not a bug, it’s just an undocumented feature!

First off, the BDB documentation makes mention of DB_THREAD needing to be set on the environment handle to get the free-threaded behavior, but the berkdb env documentation lacks any mention of how to do this. Turns out, the parameter -thread is undocumented, but does set DB_THREAD as expected. Fantastic! That’s all you should need to know, right? Wrong. There’s a peculiar constraint (bug?) which I ran up against, scratching my head, reading (and re-reading) the BDB source, trying to understand. Here’s an example script I started with:

package require Thread

set t1 [thread::create]
thread::send $t1 {
  package require Db_tcl
  set e [berkdb env -create -home bdb -thread -cdb]
  set db [berkdb open -env $e -thread -create -hash tables.db test]
}
thread::send $t1 {$db put foo bar}
  # => 0
thread::send $t1 {$db get foo}
  # => {foo bar}

set t2 [thread::create]
thread::send $t2 {
  package require Db_tcl
  set e [berkdb env -create -home bdb -thread -cdb]
  set db [berkdb open -env $e -thread -create -hash tables.db test]
}
thread::send $t1 {$db get foo}
  # => error: NULL db info pointer

You’d think this would work, right?

Naturally, what’s happened here is that two threads have attempted to use the same environment, and when the second thread does it, it invalidates the environment for the first thread, resulting in the “NULL db info pointer” error. Fine, I wouldn’t expect this to work for just this reason, so lets try to create the environment once and reuse it in the child threads. We’ll use tsv‘s (thread-shared variables) to share the environment handle across threads.

package require Db_tcl
package require Thread

tsv::set bdb env [berkdb env -create -home bdb -thread -cdb]
  # => env0

set t1 [thread::create]
thread::send $t1 {
  package require Db_tcl
  set e [tsv::get bdb env]
    # => env0
  set db [berkdb open -env $e -thread -create -hash tables.db test]
    # => error: db open: illegal environment
}

Illegal? I’ll show YOU illegal!

This really puzzled me, getting the “db open: illegal environment” error. I looked at the source for db-4.2.52, in file tcl_db_pkg.c around lines 1464–1473:

   1464         switch ((enum bdbenvopen)optindex) {
   1465         case TCL_DB_ENV0:
   1466             arg = Tcl_GetStringFromObj(objv[i], NULL);
   1467             envp = NAME_TO_ENV(arg);
   1468             if (envp == NULL) {
   1469                 Tcl_SetResult(interp,
   1470                     "db open: illegal environment", TCL_STATIC);
   1471                 return (TCL_ERROR);
   1472             }
   1473         }

I mean, look at that. NAME_TO_ENV() is a macro that’s defined as (DB_ENV *)_NameToPtr((name)) which simply fishes data out of DBTCL_GLOBAL __dbtcl_global … it shouldn’t be returning a NULL, here. What gives? Since we have to initialize each thread’s state on creation (it doesn’t inherit anything from the thread that created it), we have to package require Db_tcl and load the BDB module in each thread. I bet there’s something goofy going on there. Looking at Db_tcl_Init(), we see:

     72 int
     73 Db_tcl_Init(interp)
     74     Tcl_Interp *interp;     /* Interpreter in which the package is
     75                      * to be made available. */
...
     97     LIST_INIT(&__db_infohead);
     98     return (TCL_OK);
     99 }

Aha! The villain is revealed!

Oh, look, it initializes the global each time the package is loaded! So, the list that points to the environments gets wiped out once our newly created thread loads the Db_tcl package. While this is frustrating, there’s a work-around: create the environment after creating the threads and loading the Db_tcl package inside of them:

package require Db_tcl
package require Thread

## Create our threads, making sure Db_tcl is loaded.
set t1 [thread::create]
thread::send $t1 {
  package require Db_tcl
}
set t2 [thread::create]
thread::send $t2 {
  package require Db_tcl
}

## Now, we can create the environment once:
tsv::set bdb env [berkdb env -create -home bdb -thread -cdb]

## ... and then, use it in our threads:
thread::send $t1 {
  set e [tsv::get bdb env]
  set db [berkdb open -env $e -thread -create -hash tables.db test]
}
thread::send $t2 {
  set e [tsv::get bdb env]
  set db [berkdb open -env $e -thread -create -hash tables.db test]
}

## See:
thread::send $t1 {$db put foo bar}
  # => 0
thread::send $t1 {$db get foo}
  # => {foo bar}
thread::send $t2 {$db get foo}
  # => {foo bar}

All this, for what?

So, this is how the story ends. It is possible to use BDB across multiple threads in a Tcl application, with the restriction that all threads that will load the Db_tcl package must be created and initialized before you create environments and probably any other operation that relies on the __dbtcl_global structure, as it gets initialized on package load and the single global is shared across all threads. Presumably, this might be considered a bug, in that the structure should probably only be initialized once per process rather than every package load.

Can I do something about it now?

Here’s my proposed patch to fix this, although there’s a potential race
condition here:

$ diff -u tcl_db_pkg.c.orig tcl_db_pkg.c
--- tcl_db_pkg.c.orig   2006-07-17 11:22:20.000000000 -0700
+++ tcl_db_pkg.c        2006-07-17 11:23:40.000000000 -0700
@@ -78,6 +78,7 @@
 {
   int code;
   char pkg[12];
+    static int initialized = 0;

   snprintf(pkg, sizeof(pkg), "%d.%d", DB_VERSION_MAJOR, DB_VERSION_MINOR);
   code = Tcl_PkgProvide(interp, "Db_tcl", pkg);
@@ -98,7 +99,10 @@
   (void)Tcl_LinkVar(
       interp, "__debug_test", (char *)&__debug_test,
       TCL_LINK_INT);
-  LIST_INIT(&__db_infohead);
+    if (!initialized) {
+        initialized = 1;
+        LIST_INIT(&__db_infohead);
+    }
   return (TCL_OK);
 }

My recommendation would be to load the Db_tcl package in a single-threaded environment (before any additional threads are created) to ensure that only one thread initializes __db_infohead, then you can freely create threads and load the Db_tcl package in them without each package load re-initializing the __db_infohead.

If this was helpful, feel free to let me know in the comments below. Or, if you have any questions, ask those too!

UPDATE: I received a response from an Oracle engineer (who now owns Sleepycat) about this issue. Here’s it is:

[…] Although the fix you sent in the SR is one of the
ones needed to allow threaded access to BDB’s Tcl API, it is
not the only one. The __db_infohead is the beginning of a
global linked list and manipulation of that linked list is
not protected in the API. All manipulation of it would need
a mutex. That isn’t necessarily hard, but we have not had
customer demand for multi-threading the Tcl API.

I have fixed our Reference Guide Programming Notes on Tcl to
include a statement that says it does not support multi-threading.

So, if you’d like to see full threaded support in Tcl for Berkeley DB, leave a comment below and we’ll see what kind of demand really exists for it. In the meantime, I might try to work on it myself, just in case.

Tags:
,
,
,

Comments

  1. Alexey Pechnikov says

    It’s work from tcl library, but don’t work from adp pages (or we must create threads for each adp page). Do you know a better method?

  2. Alexey: The only solution that will be correct is to use Berkeley DB’s Tcl API in a single-threaded environment, at least until the day when they rework their implementation to be properly thread-safe.

  3. Alexey Pechnikov says

    Do exists the correct solution for use BDB in a single-threaded environment of AOL Server? I try any ideas but it’s bad work. Do you use BDB with AOL?

  4. Unfortunately, there’s no good example code. I would start with AOLserver 4.5 and nsproxy, and load the BDB Db_tcl module in the proxy and access it there–as proxies are single-threaded tclsh processes.

    When I get a chance, I’ll try to write-up a HOWTO on how to set up and use nsproxy.

  5. Alexey Pechnikov says

    This HOWTO will be very usefull.

  6. Alexey Pechnikov says

    Are you wrote a HOWTO on how to set up and use nsproxy? I want to try nsproxy in aolserver 4.5 on debian lenny (testing) but have no good examples.

  7. Sorry, Alexey – not yet. You may want to ask for help on the AOLserver mailing list – someone else may be able to help you in the meantime.

  8. I have tried Berkeley DB on Windows for a week. Works well with a single thread. But when I try to use two threads reading at the same time that share a Db handle (there is no writing), I get an “Invalid argument” exception after approximately 5 seconds after I created threads. I open the Db handle with DB_THREADS in main thread, and then create two worker threads that use this handle, this does not help. I followed the documentation which says that this is enough and no locking or serializing access to Db handle are necessary (as well as using an environment). So now I have two options: 1) Use a single thread (which is actually not an option) 2) Throw Berkeley DB away.
    Looks like yet another raw and poorly documented and poorly supported product not better than Microsoft’s.

Speak Your Mind

*