Why it might be a bad idea to fork/exec in a multi-threaded application

Recently, the question of doing a Tcl [exec] inside AOLserver came up and I suggested using nsproxy to do it outside the main AOLserver process. This spurred a thread on the mailing list asking exactly why doing a Tcl [exec] inside the main AOLserver process isn’t a good idea. Here’s what I eventually wrote, trying to explain my understanding of the problem:

Considering the activity of this thread, let me contribute what I see as
the most common “problem” for AOLserver and Tcl [exec] …

Traditionally, fork() creates a copy of the process which invoked it,
including the memory allocated to that process. exec() overlays a
process with a new image and begins executing it. Since a child that
calls exec() right after fork() barely writes to its memory space
before the exec(), making a full copy of the parent process’s memory
only to destroy it when it gets overlaid by the exec() is wasteful.
So, in modern Unixes, an optimization was introduced: the parent
process’s memory is shared with the child and pages of memory are only
copied when they are written to, commonly referred to as “copy-on-write”.
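
To make the fork()/exec() sequence concrete, here is a minimal sketch
in Python (not the code Tcl or AOLserver actually runs). The function
name fork_exec is made up for illustration; os.fork(), os.execvp() and
os.waitpid() are the standard Unix wrappers.

```python
import os
import sys


def fork_exec(argv):
    """Run argv in a child via fork() then exec(); return its exit code."""
    pid = os.fork()
    if pid == 0:
        # Child: until exec() replaces this image, its pages are still
        # shared copy-on-write with the parent.
        try:
            os.execvp(argv[0], argv)
        finally:
            # Only reached if exec() failed; exit without running cleanup
            # handlers inherited from the parent.
            os._exit(127)
    # Parent: wait for the child and decode its status.
    _, status = os.waitpid(pid, 0)
    if os.WIFEXITED(status):
        return os.WEXITSTATUS(status)
    return -os.WTERMSIG(status)


if __name__ == "__main__":
    print(fork_exec([sys.executable, "-c", "raise SystemExit(3)"]))
```

The window that matters for copy-on-write is everything the child does
between os.fork() returning 0 and os.execvp() taking effect.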

This optimization works well in single-threaded applications: after
the fork() but before the child calls exec(), the parent process can
only do so much (i.e., cause so many pages to be copied) before the
child releases the shared memory by performing the exec(). However, in
a multi-threaded application, every running thread in the parent
process can be writing to pages in memory during that window, causing
lots of copying to occur. In the worst case, this can degenerate into
almost 100% of the pages getting copied, which means a fork could
“cost” you up to 2x the memory of the original parent process,
depending on how many threads are active and what they’re doing.
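
As a back-of-the-envelope illustration of that “2x” figure, the sketch
below estimates the memory in use while the child still exists. It is
a simplification: the 4096-byte page size, the function name
worst_case_bytes and the idea of a single “dirtied fraction” are all
assumptions for illustration, and real processes also have read-only
and shared pages that are never copied.

```python
PAGE_SIZE = 4096  # assumed page size in bytes; real systems vary


def worst_case_bytes(parent_bytes, dirtied_fraction):
    """Estimate memory in use if a fraction of shared pages gets copied."""
    if not 0.0 <= dirtied_fraction <= 1.0:
        raise ValueError("dirtied_fraction must be between 0 and 1")
    shared_pages = parent_bytes // PAGE_SIZE
    # Each page written to before the child's exec() costs one extra copy.
    copied_pages = int(shared_pages * dirtied_fraction)
    return parent_bytes + copied_pages * PAGE_SIZE


if __name__ == "__main__":
    one_gib = 1 << 30
    for fraction in (0.0, 0.25, 1.0):
        print(fraction, worst_case_bytes(one_gib, fraction))
```

With a 1 GiB parent, nothing dirtied costs nothing extra, while every
page dirtied doubles the total to 2 GiB.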

You might think “but, if I fork() and immediately exec() in the child,
how much can the threads in the parent process do?” Well, in Tcl,
[exec] doesn’t do an immediate fork() and exec(). There’s a fair bit
of code that’s executed in between. I haven’t done serious profiling,
but there’s probably a non-trivial number of instructions being
executed, each one an opportunity for a context switch that lets the
threads in the parent process run. This problem is more visible on SMP
systems with many CPUs, where more threads in the parent process can be
executing while the child process is between its fork() and exec().

I’m not sure if this was “too technical” an explanation. If it was,
please don’t hesitate to ask questions. I want everyone to have a
decent understanding of the issue.

Also, to clarify: there’s no “danger” in executing [exec] from within
AOLserver. It “should” work, as long as you have sufficient free
memory for any pages that need to be copied, but the impact on
performance can be costly. This isn’t a great concern on low-traffic
sites, but it is certainly an issue when scaling.

This is one of the many reasons why nsproxy is good: it mitigates the
cost by doing all the fork/exec’ing in a single-threaded process that
has a small memory footprint, entirely outside the process space of the
main AOLserver process.
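
The general idea of handing fork/exec to a small helper process can be
sketched as follows. This is not nsproxy’s API or wire protocol; the
names ExecHelper and HELPER_SOURCE and the JSON-lines protocol are
hypothetical, chosen only to show the shape of the approach in Python.

```python
import json
import subprocess
import sys
import threading

# Source of the small, single-threaded helper: it reads one JSON argv per
# line, runs it, and writes back one JSON result per line.
HELPER_SOURCE = r"""
import json, subprocess, sys
for line in iter(sys.stdin.readline, ""):
    argv = json.loads(line)
    done = subprocess.run(argv, stdin=subprocess.DEVNULL,
                          capture_output=True, text=True)
    print(json.dumps({"rc": done.returncode, "out": done.stdout}), flush=True)
"""


class ExecHelper:
    """Start the helper once, then reuse it for every command."""

    def __init__(self):
        self._lock = threading.Lock()  # one request on the pipe at a time
        self._proc = subprocess.Popen(
            [sys.executable, "-c", HELPER_SOURCE],
            stdin=subprocess.PIPE, stdout=subprocess.PIPE, text=True)

    def run(self, argv):
        # The fork/exec for argv happens inside the helper process.
        with self._lock:
            self._proc.stdin.write(json.dumps(argv) + "\n")
            self._proc.stdin.flush()
            return json.loads(self._proc.stdout.readline())

    def close(self):
        self._proc.stdin.close()  # EOF ends the helper's read loop
        self._proc.stdout.close()
        self._proc.wait()


if __name__ == "__main__":
    helper = ExecHelper()
    print(helper.run([sys.executable, "-c", "print('hi')"]))
    helper.close()
```

The helper would be started early, before the main process grows large
and busy, so the only process that forks per command is the small one.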
