Normalize ISO-8859-1 to ASCII in Tcl

A co-worker, John, asked me, “How do I normalize ISO-8859-1 encoded text to ASCII in Tcl?” From the command line, you might use GNU recode like this:

$ echo "Mötley Crüe" | recode -f L1..BS | sed -e 's/.\x08//g'
Motley Crue
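
As the sed expression suggests, the L1..BS conversion appears to write each accented letter as an overstrike sequence, and sed then deletes every character that is immediately followed by a backspace (\x08). Here is a minimal Tcl sketch of just that stripping step. It assumes the overstrike form is mark, backspace, base letter, and the sample string is hand-built for illustration rather than captured recode output:

```tcl
# Hand-built sample (an assumption, not real recode output): a double
# quote standing in for the diaeresis, then a backspace (\b), then the
# base letter.
set overstruck "M\"\botley Cr\"\bue"

# Same substitution as sed 's/.\x08//g': remove any character that is
# immediately followed by a backspace, along with the backspace itself.
regsub -all {.\x08} $overstruck "" plain

puts $plain
```

This only mirrors the cleanup step; the actual transliteration is still done by recode.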

Fortunately, recode can dump its conversion table as source code, although it currently only emits C and Perl. After a little post-processing of that output, I generated this Tcl lookup table and proc:

set ::ISO_8859_1_to_ASCII [list \
    "" "\001" "\002" "\003" "\004" "\005" "\006" "\007" "\010" "\011" \
    "\012" "\013" "\014" "\015" "\016" "\017" "\020" "\021" "\022" "\023" \
    "\024" "\025" "\026" "\027" "\030" "\031" "\032" "\033" "\034" "\035" \
    "\036" "\037" " " "!" "\"" "#" "\$" "%" "&" "'" "(" ")" "*" "+" "," \
    "-" "." "/" "0" "1" "2" "3" "4" "5" "6" "7" "8" "9" ":" ";" "<" "=" \
    ">" "?" "@" "A" "B" "C" "D" "E" "F" "G" "H" "I" "J" "K" "L" "M" "N" \
    "O" "P" "Q" "R" "S" "T" "U" "V" "W" "X" "Y" "Z" "\[" "\\" "]" "^" "_" \
    "`" "a" "b" "c" "d" "e" "f" "g" "h" "i" "j" "k" "l" "m" "n" "o" "p" \
    "q" "r" "s" "t" "u" "v" "w" "x" "y" "z" "{" "|" "}" "~" "\177" "" \
    "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" \
    "" "" "" "" "" "" "" "" " " "" "" "" "" "" "" "" "" "" "" "\"" "" "" \
    "" "" "" "" "" "" "" "" "" "" "" "" "" "\"" "" "" "" "" "A" "A" "A" \
    "A" "A" "" "" "C" "E" "E" "E" "E" "I" "I" "I" "I" "" "N" "O" "O" "O" \
    "O" "O" "" "O" "U" "U" "U" "U" "Y" "" "s" "a" "a" "a" "a" "a" "" "" \
    "c" "e" "e" "e" "e" "i" "i" "i" "i" "" "n" "o" "o" "o" "o" "o" "" "o" \
    "u" "u" "u" "u" "y" "" "y" \
]

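# Map each character of $in through the lookup table above. Characters
# whose table entry is "" (and any code point above 255) are dropped.
proc ISO_8859_1_to_ASCII {in} {
    global ISO_8859_1_to_ASCII
    set out {}
    foreach c [split $in ""] {
        # [scan $c "%c"] yields the character's code point, used as the list index.
        append out [lindex $ISO_8859_1_to_ASCII [scan $c "%c"]]
    }
    return $out
}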

After sourcing this code into your Tcl script, you can call the proc like this:

% ISO_8859_1_to_ASCII "Mötley Crüe"
Motley Crue
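
An alternative worth considering is Tcl's built-in string map command, which applies all the substitutions in one call instead of looping character by character. The sketch below is not the code from this post: make_ascii_map and demo_table are hypothetical names, and the tiny stand-in table exists only so the example runs on its own. In a real script you would pass $::ISO_8859_1_to_ASCII instead.

```tcl
# Build a flat {from to from to ...} list for [string map] from a
# 256-entry table indexed by code point. Entries that map a character
# to itself are skipped, since [string map] leaves unmatched text alone.
proc make_ascii_map {table} {
    set map {}
    set i 0
    foreach replacement $table {
        set char [format %c $i]
        if {$char ne $replacement} {
            lappend map $char $replacement
        }
        incr i
    }
    return $map
}

# Hypothetical stand-in table: identity for every code point, with two
# overrides. Substitute $::ISO_8859_1_to_ASCII for real use.
set demo_table {}
for {set i 0} {$i < 256} {incr i} {
    lappend demo_table [format %c $i]
}
lset demo_table 246 "o"   ;# U+00F6, o with diaeresis
lset demo_table 252 "u"   ;# U+00FC, u with diaeresis

# Build the map once, then reuse it for every string.
set demo_map [make_ascii_map $demo_table]
puts [string map $demo_map "M\u00f6tley Cr\u00fce"]   ;# prints: Motley Crue
```

One behavioral difference to keep in mind: string map leaves characters above code point 255 untouched, whereas the lookup-table proc drops them.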

I hope this helps others, too.

Comments

  1. This one helped me a lot… not in the way proposed by the author but it helped 🙂

  2. soraver: I’m glad it was helpful. Now, I’m curious: how did it help?

  3. I needed to print a char that is not on the keyboard, so I figured that those \002 etc. would do the trick in Tcl… Unfortunately, I couldn't find an escape character for the char I want to use :S
    The idea that it can be done this way was very helpful, though.
    Thanks 🙂

  4. soraver: What character were you trying to produce? Can you describe it?
