The following reply was made to PR docs/50211; it has been noted by GNATS.
From: Jeroen Ruigrok van der Werven <asmodai@in-nomine.org>
To: bug-followup@FreeBSD.org
Cc:
Subject: Re: docs/50211: [PATCH] doc.docbook.mk: fix textfile creation
Date: Sun, 13 May 2007 16:59:23 +0200
A long overdue update I guess.
Neither links nor elinks will help in the multibyte environments of Chinese,
Japanese, Korean, and the like. They simply do not understand encodings such
as EucJP, SJIS, GB18030, GB2312, EucKR, or UTF-8.
Using www/w3m-m17n I can at least view Japanese pages.
Running 'w3m -dump http://website > dump.txt' on an EucJP-encoded page
produces a UTF-8 encoded plain text file.
The same also works for (X-)SJIS (Japanese), GB2312 (Chinese/PRC), EucKR
(Korean), UTF-8, TIS-620 (Thai), Big5 (Taiwanese), VISCII (Vietnamese), and
KOI8-U (Russian).
I also tried some ISO-8859 dumps (8859-6 and 8859-7, for example), and they
all work fine.
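
For anyone who wants to repeat this kind of check, something like the sketch below should do. It is only an illustration, not what I ran: the file names are placeholders, is_utf8 is a made-up helper, and I am assuming the installed w3m accepts -I and -O to name the input and output encodings (as the m17n-enabled builds do). Forcing -O UTF-8 keeps the result independent of the local locale.

```shell
#!/bin/sh
# Sketch only: file names are placeholders, and -I/-O are assumed to be
# supported by the installed w3m (they belong to its m17n support).

# Hypothetical helper: succeeds if the file decodes cleanly as UTF-8.
is_utf8() {
    iconv -f UTF-8 -t UTF-8 "$1" > /dev/null 2>&1
}

# Write a tiny EUC-JP test page; the octal escapes are the EUC-JP bytes
# for the Japanese word "nihongo" (three kanji, two bytes each).
printf '<html><body><p>\306\374\313\334\270\354</p></body></html>\n' \
    > sample-eucjp.html

if command -v w3m > /dev/null 2>&1; then
    # -T sets the content type, -I the input and -O the output encoding.
    w3m -dump -T text/html -I EUC-JP -O UTF-8 sample-eucjp.html > dump.txt
    if is_utf8 dump.txt; then
        echo "dump.txt is valid UTF-8"
    else
        echo "dump.txt is NOT valid UTF-8"
    fi
else
    echo "w3m not found; skipping the dump"
fi
```

The iconv round trip is just a cheap validity test: it fails on any byte sequence that is not well-formed UTF-8, which is enough to tell a converted dump from raw EucJP bytes passed through untouched.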
So my suggestion is to change HTML2TXT to use w3m from w3m-m17n.
--
Jeroen Ruigrok van der Werven <asmodai(-at-)in-nomine.org> / asmodai
http://www.in-nomine.org/ | http://www.rangaku.org/
Reality is an illusion, grimmer. The dreamlands are like masks within
masks, and Time has no dominion beyond the Shroud...