A co-worker, John, asked me, “How do I normalize ISO-8859-1 encoded text to ASCII in Tcl?” From the command line, you might use GNU recode like this:
$ echo "Mötley Crüe" | recode -f L1..BS | sed -e 's/.\x08//g'
Motley Crue
To do the same thing in pure Tcl, you need the mapping itself. Fortunately, recode can dump its conversion table as source code, although it currently only knows C and Perl. After a little post-processing of that output, I generated this Tcl version:
# Lookup table indexed by ISO-8859-1 code point (0-255). Each entry is the
# ASCII replacement; an empty string means the character is dropped.
set ::ISO_8859_1_to_ASCII [list \
    "" "\001" "\002" "\003" "\004" "\005" "\006" "\007" "\010" "\011" \
    "\012" "\013" "\014" "\015" "\016" "\017" "\020" "\021" "\022" "\023" \
    "\024" "\025" "\026" "\027" "\030" "\031" "\032" "\033" "\034" "\035" \
    "\036" "\037" " " "!" "\"" "#" "\$" "%" "&" "'" "(" ")" "*" "+" "," \
    "-" "." "/" "0" "1" "2" "3" "4" "5" "6" "7" "8" "9" ":" ";" "<" "=" \
    ">" "?" "@" "A" "B" "C" "D" "E" "F" "G" "H" "I" "J" "K" "L" "M" "N" \
    "O" "P" "Q" "R" "S" "T" "U" "V" "W" "X" "Y" "Z" "\[" "\\" "]" "^" "_" \
    "`" "a" "b" "c" "d" "e" "f" "g" "h" "i" "j" "k" "l" "m" "n" "o" "p" \
    "q" "r" "s" "t" "u" "v" "w" "x" "y" "z" "{" "|" "}" "~" "\177" "" \
    "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" \
    "" "" "" "" "" "" "" "" " " "" "" "" "" "" "" "" "" "" "" "\"" "" "" \
    "" "" "" "" "" "" "" "" "" "" "" "" "" "\"" "" "" "" "" "A" "A" "A" \
    "A" "A" "" "" "C" "E" "E" "E" "E" "I" "I" "I" "I" "" "N" "O" "O" "O" \
    "O" "O" "" "O" "U" "U" "U" "U" "Y" "" "s" "a" "a" "a" "a" "a" "" "" \
    "c" "e" "e" "e" "e" "i" "i" "i" "i" "" "n" "o" "o" "o" "o" "o" "" "o" \
    "u" "u" "u" "u" "y" "" "y" \
]

proc ISO_8859_1_to_ASCII {in} {
    global ISO_8859_1_to_ASCII
    set out {}
    # Replace each character with the table entry at its code point.
    foreach c [split $in ""] {
        append out [lindex $ISO_8859_1_to_ASCII [scan $c "%c"]]
    }
    return $out
}
Once you have sourced this code into your Tcl script, you can call the proc like this:
% ISO_8859_1_to_ASCII "Mötley Crüe"
Motley Crue
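For comparison, the same kind of substitution could probably also be written with Tcl's built-in string map command instead of a per-character loop. The following is only a minimal sketch: the variable latin1Map and the proc latin1_to_ascii_sketch are hypothetical names that are not part of the code above, and the map is deliberately abbreviated to a handful of characters rather than the full 256-entry table.

```tcl
# Hypothetical, abbreviated map (NOT the full table from recode):
# pairs of "Latin-1 character" -> "ASCII replacement".
set ::latin1Map [list \
    \u00f6 o \
    \u00fc u \
    \u00e9 e \
    \u00df s \
]

# Hypothetical helper: applies every pair in the map in a single pass.
proc latin1_to_ascii_sketch {in} {
    global latin1Map
    return [string map $latin1Map $in]
}

puts [latin1_to_ascii_sketch "M\u00f6tley Cr\u00fce"]   ;# prints "Motley Crue"
```

Note one difference in behavior: with string map, characters that are missing from the map pass through unchanged, whereas the table-driven proc drops anything whose table entry is empty. A complete map would still be needed for full coverage.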
I hope this helps others, too.
Tags: Tcl, programming, character encoding, recode, ASCII