elb's Blog

Terminal Illnesses

The addition of finch to the Pidgin family has caused a lot of terminal woes to show up on IRC and in our XMPP MUC. Here are a few musings on the state of terminals in the Unix world today.

First of all, a plea to terminal authors and packagers: Please do not lie about your terminal's type or capabilities! Several popular terminals claim to be xterm but have incomplete xterm emulation (gnome-terminal, I'm looking at you), and some packagers change terminal definitions gratuitously (Gentoo, why is your screen advertised as screen.linux, which appears to be a broken terminfo definition just for the sake of brokenness?)[1]. The bottom line on this is that terminal definitions exist for a reason, and have meaning, and the definition being used needs to match the terminal being used. If you are using xterm, set TERM=xterm; if you are using rxvt-unicode, set TERM=rxvt-unicode; if you are using screen, set TERM=screen.

With that out of the way, we arrive at the next problem: locales and locale management. In our case, finch is a bit problematic in that it does not really obey locales itself; finch produces fixed UTF-8 output in many cases, regardless of what your locale claims its character type is. This really isn't good, and it's only exacerbated by locale problems. You really need to be using a UTF-8 locale (typically en_US.UTF-8 or en_US.utf8; replace en_US with your appropriate language and country code), and your terminal and emulated environment both need to agree on this. If your terminal emulator thinks you're using ISO-8859-15, but your programs are sending UTF-8, understandably bad things will happen. Generally speaking, this means that the environments both within and without your terminal need to agree. To further complicate matters, there exist terminals (such as xterm) which don't really obey the locale; to get a proper, working UTF-8 environment in an xterm, you need to provide the -u8 command line switch. PuTTY and other Windows-based ssh terminals seem to share this little quirk, probably because the Windows concept of locale is quite different from the Unix concept.
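To make the "both sides need to agree" point concrete, here is a rough Python sketch. The function names (effective_codeset, is_utf8_locale, as_seen_by) are made up for illustration. It only inspects the usual locale environment variables in POSIX precedence order (LC_ALL, then LC_CTYPE, then LANG), so it says nothing about what your terminal emulator itself has been configured to do.

```python
def effective_codeset(env):
    """Return the codeset part of the locale governing character type, or None."""
    # POSIX precedence for the character-type category:
    # LC_ALL overrides LC_CTYPE, which overrides LANG.
    for name in ("LC_ALL", "LC_CTYPE", "LANG"):
        value = env.get(name)
        if value:  # unset or empty variables are skipped
            break
    else:
        return None
    # Locale names look like language_TERRITORY.codeset@modifier;
    # names such as "C" or "en_US" carry no explicit codeset.
    if "." not in value:
        return None
    return value.split(".", 1)[1].split("@", 1)[0]


def is_utf8_locale(env):
    """True if the effective locale names a UTF-8 codeset (UTF-8, utf8, ...)."""
    codeset = effective_codeset(env)
    if codeset is None:
        return False
    return codeset.lower().replace("-", "") == "utf8"


def as_seen_by(text, terminal_charset):
    """Show how UTF-8 output looks to a terminal decoding another charset."""
    return text.encode("utf-8").decode(terminal_charset, errors="replace")
```

Calling is_utf8_locale(os.environ) (after import os) inside and outside the terminal is a quick way to spot a disagreement, and as_seen_by("café", "iso-8859-15") shows the familiar two-characters-for-one garbage you get when a program sends UTF-8 to a terminal expecting ISO-8859-15.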

GNU screen is just another wrench in the works, but fortunately it can be a useful wrench if you manage to get it unstuck from the gears. Screen, like finch and xterm, is kind of ignorant about locale encodings and has its own mechanism to deal with them. If you start screen as 'screen -U' or with 'defutf8 on', it will assume that all programs within the screen send UTF-8 regardless of locale; in addition, -U tells it that your external terminal is UTF-8. The neat trick here is that, for future attachments to the screen, the UTF-8 sent by internal applications will be translated to the external terminal's declared character set! This means that you can view finch's UTF-8-only renderings in an ASCII terminal, if you play all of your cards right. Now, you'll want to make sure (if you're a Gentoo user, particularly) that your TERM inside screen is 'screen' and not something broken like 'screen.linux'; if your arrow keys don't work, a broken TERM is the likely culprit.
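The translation trick is easier to picture with a toy version. The Python sketch below is not screen's actual code, and translate_for_terminal is a made-up name; it just shows the general idea of decoding what the inner program sent as UTF-8 and re-encoding it for whatever character set the attaching terminal declared. Characters the outer charset cannot represent come out as '?' here; real screen may well pick different substitutes.

```python
def translate_for_terminal(inner_bytes, outer_charset):
    """Re-encode UTF-8 program output for a terminal using outer_charset.

    Illustrative only: characters that outer_charset cannot represent
    are replaced with '?' by Python's 'replace' error handler.
    """
    # Inner side: treat everything the program wrote as UTF-8.
    text = inner_bytes.decode("utf-8", errors="replace")
    # Outer side: emit bytes in the attaching terminal's character set.
    return text.encode(outer_charset, errors="replace")
```

For example, translate_for_terminal("naïve".encode("utf-8"), "ascii") degrades the accented letter to a question mark, while passing "iso-8859-15" instead keeps it as a single Latin-9 byte.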

It's not clear to me why all these things aren't fixed by now. Some of them (like terminals claiming to be something they are not, and finch sending UTF-8 regardless of locale) are clearly willful perpetuation of brokenness, sometimes for good reasons, and sometimes not. Others are simply historical baggage (bizarre non-locale charset options) that might eventually go away, or at least be obviated. In the meantime, keep your terminal and locale ducks all in a row, and the maze can be navigated.

[1] Update (2007-08-08 14:25): In discussion with Derek Pomery, it became clear that the 'screen.linux' thing is not related to running screen on Gentoo, but running screen at the console -- we had simply only seen console users running Gentoo, I suppose. (Several of them were waiting for their preferred DE to compile.)