There are quite a number of different ways you can set up your Linux system to read and write Chinese. Most modern Linux distributions come with Chinese fonts by default (at least some GB2312 fonts), and browsing Chinese web sites is no longer an issue.
Here is one example of setting up Chinese input under Slackware 9.0.
Edit ~/.bashrc to have:
export XMODIFIERS=@im=fcitx
unset LC_ALL
Edit ~/.Xclients to have the following before your favorite window manager starts:
LC_ALL=zh_CN.GB2312 fcitx
LC_ALL=zh_CN.GB2312 mlterm
LC_ALL=zh_CN.GB2312 opera
LANG=zh_CN.GB2312 mozilla
Please note that all the LC_ALL and LANG stuff above is a bit hackish. On Redhat Linux 9, you simply need LANG=zh_CN.GB2312 mlterm to enable Chinese input via fcitx in mlterm. But somehow that doesn’t work in Slackware 9.0, so I’m forced to start mlterm with LC_ALL=zh_CN.GB2312 mlterm and unset LC_ALL in my .bashrc so that the usual date and locale settings are still English. It is also weird that opera must be started with LC_ALL to receive Chinese input, while mozilla can do with only the LANG setting. I also notice some weirdness of readline in bash under Redhat Linux 9: it no longer allows 8-bit clean input under its default locale setting.
In short, always try the LANG setting first, as it is the least intrusive. If that fails, try the LC_ALL setting. If you still prefer an English environment and only want Chinese input, you may unset LC_ALL in ~/.bashrc.
To input Chinese on the normal command line under bash, the simplest way is:
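The LANG-first, LC_ALL-second approach can be wrapped in two small shell helpers. This is only a sketch: the function names zh_lang and zh_all are made up for illustration, and the locale name is the one used above.

```shell
#!/bin/sh
# Hypothetical helpers (names invented for this sketch): start one
# program under the Chinese locale without changing the locale of the
# shell that launches it.

# Least intrusive: only LANG is set for the program.
zh_lang() {
    env LANG=zh_CN.GB2312 "$@"
}

# Fallback: LC_ALL overrides every locale category for the program.
zh_all() {
    env LC_ALL=zh_CN.GB2312 "$@"
}
```

For example, zh_lang mozilla & would be tried first, and zh_all opera & used for programs that ignore LANG. Note that LANG has no effect while LC_ALL is still set in the environment.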
export LC_ALL=zh_CN.GB2312
This will change how ls sorts its output too. If that doesn’t work for you, try this too:
cat > ~/.inputrc << END
set meta-flag On
set convert-meta Off
set output-meta On
END
This sets up readline to accept 8-bit characters. On Slackware 9.0, the ~/.inputrc file alone works fine for me, but it seems that on Redhat 9.0 the only way to do it is to set LC_ALL.
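The cat command above overwrites any existing ~/.inputrc. If you already have one, a more careful variant is to append only the settings that are missing. The following is a sketch; the function name add_8bit_settings is made up for illustration.

```shell
#!/bin/sh
# Hypothetical helper (not part of the original setup): add the three
# readline settings to an inputrc file without clobbering its contents.
add_8bit_settings() {
    inputrc=$1
    touch "$inputrc"
    for setting in "meta-flag On" "convert-meta Off" "output-meta On"; do
        # The variable name is everything before the first space.
        name=${setting%% *}
        # Leave alone any variable the file already sets.
        if ! grep -q "^set[[:space:]]\{1,\}$name[[:space:]]" "$inputrc"; then
            printf 'set %s\n' "$setting" >> "$inputrc"
        fi
    done
}
```

Run it as add_8bit_settings ~/.inputrc; running it a second time changes nothing.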
I used to have Cxterm as a Chinese environment, but no longer.
There is a CJK package for Latex, and you’ll need to set up Chinese fonts for Latex separately. Most guides I found on the web are overly complex, but it’s rather simple to set up once you understand what it needs. The following is done under tetex-3.0, but may also apply to other Latex environments.
For font installation, I use a tool called gbkfonts (not sure about its origin, but I got one from He Bo Liang’s homepage). I use the statically compiled version without going through the compilation.
Then you’ll need a set of Chinese Truetype fonts. I use the Microsoft YaHei font from Vista. Here is an example of what I do.
Download the latest CJK package. For example, I use cjk-4.7.0.tar.gz. Become root and do:
TEXDIR=/usr/share/texmf-local # or you can use /usr/share/texmf
mkdir -p $TEXDIR/tex/latex
tar zxf cjk-4.7.0.tar.gz
mv cjk-4.7.0/texinput $TEXDIR/tex/latex/CJK
Copy the gbkfonts binary and your Truetype fonts to a temporary directory, become root and do the following:
./gbkfonts msyh.ttf yahei # repeat it if you have more fonts
mkdir -p $TEXDIR/fonts/afm $TEXDIR/fonts/tfm $TEXDIR/fonts/type1 $TEXDIR/fonts/map/chinese
mv ./tex/latex/CJK/GB/* $TEXDIR/tex/latex/CJK/GB/
mv ./fonts/afm/chinese $TEXDIR/fonts/afm/
mv ./fonts/tfm/chinese $TEXDIR/fonts/tfm/
mv ./fonts/type1/chinese $TEXDIR/fonts/type1/
mv cjk.map $TEXDIR/fonts/map/chinese/
echo "Map cjk.map" >> /usr/share/texmf/web2c/updmap.cfg
The above installs the font metrics and the Type1 fonts with tetex and CJK, and then updates the map file setting to let it find them. You don’t even need the TTF file to be installed with tetex if you are not going to use dvipdfmx.
After the above setup, you may refresh tetex by:
texhash
updmap-sys # do this as root, or updmap as a user
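To confirm that tetex can now locate the new files, you can ask kpsewhich, the kpathsea lookup tool. This is a sketch: the helper name tex_can_find is made up, and --format=map assumes a kpathsea recent enough to know the font map format (as in tetex-3.0).

```shell
#!/bin/sh
# Hypothetical helper: report whether kpathsea can locate a file.
# All arguments are passed to kpsewhich unchanged.
tex_can_find() {
    if path=$(kpsewhich "$@") && [ -n "$path" ]; then
        echo "found: $path"
    else
        echo "missing: $*"
        return 1
    fi
}

# Only run the checks where kpsewhich is installed.
if command -v kpsewhich >/dev/null 2>&1; then
    tex_can_find CJK.sty || echo "CJK.sty not found; was texhash run?"
    tex_can_find --format=map cjk.map || echo "cjk.map not found; was texhash run?"
fi
```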
For a test, you may write a Tex document like this:
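A minimal test document can be generated as sketched below. The assumptions are mine: the font family is "yahei" (the name passed to gbkfonts above), the source must be GBK-encoded, the file name cjktest.tex is arbitrary, and iconv is available to convert the text from UTF-8.

```shell
#!/bin/sh
# Write a small GBK-encoded CJK test document into the current directory.
# Assumed: font family "yahei" as given to gbkfonts; iconv with GBK support.
cat << 'END' | iconv -f UTF-8 -t GBK > cjktest.tex
\documentclass{article}
\usepackage{CJK}
\begin{document}
\begin{CJK*}{GBK}{yahei}
中文测试。\textbf{粗体测试。}
\end{CJK*}
\end{document}
END
```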
Run pdflatex or latex over it to produce PDF or DVI file. The bold font would appear bold if you’ve installed a bold Type1 font.
There are more settings and macros to use to typeset Chinese documents, and I recommend the ctex site for further information.
If you are mostly making PDF files, then there is a good reason to embed Truetype fonts rather than Type1. Because Type1 fonts are limited to 256 glyphs per subfont, you’ll usually end up with quite a lot of subfonts embedded in a single PDF, and some applications, xpdf in particular, don’t handle them well.
First, you’ll need to install the Truetype font. Do the following from the same directory where you ran gbkfonts:
mkdir -p $TEXDIR/dvipdfm $TEXDIR/fonts/truetype/chinese
mv ./cid-x.map $TEXDIR/dvipdfm/
mv msyh.ttf $TEXDIR/fonts/truetype/chinese/
Second, you need to get dvipdfmx and install it like this:
tar zxf dvipdfmx-*.tar.gz
cd dvipdfmx
./configure --with-kpathsea=/usr/share/texmf/
make
cp data/config/glyphlist.txt data/config/dvipdfmx.cfg $TEXDIR/dvipdfm/
cp src/dvipdfmx /usr/share/texmf/bin/
Third, you also need the sfd file for GBK, UGBK.sfd, and the cmap files Adobe-GB1-UCS2, UniGB-UCS2-H, UniGB-UCS2-V, UniGB-UTF16-H and UniGB-UTF16-V, which are actually corrected versions from Adobe Acrobat Reader’s Chinese font resource. Install them like this:
mkdir $TEXDIR/fonts/sfd $TEXDIR/fonts/cmap
cp UGBK.sfd $TEXDIR/fonts/sfd
unzip -d $TEXDIR/fonts/cmap cmap.zip
Finally, you need to run texhash before using it. To use dvipdfmx, you may need to specify “dvipdfm” as an option for some packages. For instance, to make the geometry and hyperref packages generate the right DVI file for dvipdfmx, you need
\usepackage[dvipdfm,…]{geometry}
\usepackage[dvipdfm,…]{hyperref}
where the … represents other options you may use with these packages. Once you do this, pdflatex no longer works with the same source tex file, and you must run latex file.tex followed by dvipdfmx file.dvi to produce PDF files with embedded CID-indexed Truetype fonts, which are usually smaller than those with embedded Type1 fonts.
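The two-step run can be wrapped in a small shell function. This is only a sketch, and the name tex2pdf is made up for illustration.

```shell
#!/bin/sh
# Hypothetical wrapper: run latex, then dvipdfmx on the resulting DVI file.
tex2pdf() {
    src=$1
    # latex writes the DVI file into the current directory,
    # named after the source file.
    base=$(basename "$src" .tex)
    # Only convert to PDF if latex succeeded.
    latex "$src" && dvipdfmx "$base.dvi"
}
```

For example, tex2pdf file.tex runs latex file.tex and then dvipdfmx file.dvi.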
The best option is to use CJK Latex, but otherwise read on.
There is a Unicode text editor yudit that prints Chinese text files in PS format. It handles many other languages and encodings as well as code conversions between them. If you have some truetype fonts, it will produce a beautiful printout with anti-aliasing.
It is very sad that neither Opera nor Mozilla prints Chinese HTML correctly under Linux, at least not without some non-trivial hack. Previously I did a bit of work on improving the Unix printing of Netscape Communicator 4.0 for European languages. It is a difficult task, especially when there is no standard set of international fonts for PS printers.
But again, the yudit folks got it right by embedding a small set of custom typefaces into each PS page, and using the same truetype font for both displaying and printing purposes.
I just recently tried the UTF-8 support for mlterm, and it works wonderfully. On recent Linux distributions with zh_CN.utf8 support, type at the command line
LC_ALL=zh_CN.utf8 mlterm
On older Linux systems with no zh_CN.utf8 support, type at the command line
LC_ALL=zh_CN.GB2312 mlterm -E UTF-8
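If you are not sure which case applies, locale -a lists the locales the system provides. The sketch below picks between the two command lines above; the function name mlterm_utf8_command is made up for illustration.

```shell
#!/bin/sh
# Hypothetical helper: print the mlterm command line suited to this
# system, depending on whether a zh_CN UTF-8 locale is listed.
mlterm_utf8_command() {
    # Matches zh_CN.utf8 as well as zh_CN.UTF-8, ignoring case.
    if locale -a 2>/dev/null | grep -qix 'zh_CN\.utf-\{0,1\}8'; then
        echo "LC_ALL=zh_CN.utf8 mlterm"
    else
        echo "LC_ALL=zh_CN.GB2312 mlterm -E UTF-8"
    fi
}

mlterm_utf8_command
```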
Chinese input by Fcitx works in these UTF-8 terminals just like in other terminals. But free UTF-8 fonts are somewhat hard to find. Some of the truetype fonts with Unicode encoding can be used by mlterm with the aafont setting (run mlterm with the -A option; compilation with freetype is required), but I find it best to use bitmap fonts for terminals.
As a result of an afternoon hack, I’m glad to announce two new Unicode fonts (12 point and 14 point, in pcf.gz format, good for X11) for download, made by sampling and combining the glyphs from a free Chinese Truetype font and a fixed-size ASCII font. The copyright of these fonts belongs to their original authors. Take them as they are.
To set up these fonts, first you have to put them into some X11 font directory such as /usr/X11R6/lib/X11/fonts/misc and then run mkfontdir (probably as root) to update the fonts.dir file.
Secondly, edit the fonts.alias file in the same directory to have:
unifont12 -misc-unifont-medium-r-normal--12-120-75-75-c-120-iso10646-1
unifont14 -misc-unifont-medium-r-normal--14-140-75-75-c-140-iso10646-1
This will make two aliases for the fonts so that you can easily use them in mlterm. Now you can tell X11 to rehash its fonts with xset fp rehash so that these new fonts will be loaded.
Lastly, you have to tell mlterm to use them: edit a file named ~/.mlterm/font (or your system-wide mlterm configuration file) to have:
ISO10646_UCS2_1=unifont12;14,unifont14;
ISO10646_UCS2_1_BIWIDTH=unifont12;14,unifont14;
Because mlterm defaults to a font size of 16, you will want to add the option -w 14 or -w 12 when you use these fonts.
BTW, these fonts work superbly well in browsers like Netscape and Opera. You just set every foreign language to unifont, and enforce a minimum font size of 12 so that the browser will not try to shrink the already tiny font. This is a good trick for dealing with nasty sites that use a 9pt font, such as zaobao.com.
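If mlterm does not pick up the fonts, one way to check that the X server resolves the two aliases is to query it with xlsfonts. This is a sketch; the helper name x_font_known is made up, and it tests for non-empty output rather than relying on the exit status of xlsfonts.

```shell
#!/bin/sh
# Hypothetical helper: succeed if the X server lists at least one font
# matching the given name or pattern.
x_font_known() {
    [ -n "$(xlsfonts -fn "$1" 2>/dev/null)" ]
}

# Check both aliases defined in fonts.alias.
for name in unifont12 unifont14; do
    if x_font_known "$name"; then
        echo "$name: available"
    else
        echo "$name: not found"
    fi
done
```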
Earlier Mutt versions used to be 8-bit clean. Only in recent years, with the support for encoding conversion etc., does it often break down when you read email text written in GB2312 but marked with a US-ASCII or ISO-8859-1 MIME encoding. It seems that we should blame the sender of the email, but nevertheless, being able to read badly encoded text should be a feature too.
So if you are a Mutt addict just like me, and want to display badly encoded Chinese email text, you need to get the recent version, Mutt 1.4.2.1, and compile it yourself. Optionally you may apply a GB2312 line wrapping patch, mutt-1.4.2.1i.patch.gz, before you compile it like this:
./configure --without-wc-funcs --disable-iconv --prefix=<some directory>
make
Disabling iconv will stop mutt from doing encoding conversions, so that even badly encoded text is displayed in its original form, which is usually what we want.
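To check that the resulting binary really was built without iconv, you can look at the compile options that mutt -v prints. This is a sketch: the function name mutt_has_iconv is made up, and it assumes the option list marks enabled features with a leading + (as in +HAVE_ICONV).

```shell
#!/bin/sh
# Hypothetical helper: succeed if the given mutt binary (default: the
# mutt found in PATH) reports that it was compiled with iconv support.
mutt_has_iconv() {
    "${1:-mutt}" -v 2>/dev/null | grep -qF -- '+HAVE_ICONV'
}

if mutt_has_iconv; then
    echo "this mutt still converts encodings"
else
    echo "no iconv support reported"
fi
```

Pass the path of the freshly built binary as the first argument if it is not the one in your PATH.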