The following reply was made to PR docs/50211; it has been noted by GNATS.

From: Jeroen Ruigrok van der Werven <asmodai@in-nomine.org>
To: bug-followup@FreeBSD.org
Cc:
Subject: Re: docs/50211: [PATCH] doc.docbook.mk: fix textfile creation
Date: Sun, 13 May 2007 16:59:23 +0200

A long overdue update, I guess.

Neither links nor elinks will help in the multibyte environments of
Chinese, Japanese, Korean, and the like. They simply do not understand
encodings such as EucJP, SJIS, GB18030, GB2312, EucKR, or UTF-8.

Using www/w3m-m17n I can at least view Japanese pages. Running
'w3m -dump http://website > dump.txt' on an EucJP encoded page produces
a UTF-8 encoded plain text file.

The same also works for (X-)SJIS (Japanese), GB2312 (Chinese/PRC), EucKR
(Korean), UTF-8, TIS-620 (Thai), Big5 (Taiwanese), VISCII (Vietnamese),
and KOI8-U (Russian). I tried some ISO-8859 dumps as well (8859-6 and
8859-7, for example) and they all work fine.

So my suggestion is to change HTML2TXT to use w3m from w3m-m17n.

-- 
Jeroen Ruigrok van der Werven <asmodai(-at-)in-nomine.org> / asmodai
http://www.in-nomine.org/ | http://www.rangaku.org/
Reality is an illusion, grimmer. The dreamlands are like masks within
masks, and Time has no dominion beyond the Shroud...
_______________________________________________
freebsd-doc@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-doc
To unsubscribe, send any mail to "freebsd-doc-unsubscribe@freebsd.org"
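
For illustration only, here is a minimal sketch of the kind of w3m
invocation the suggestion above implies for a local HTML file. The helper
name w3m_dump_cmd and the file name article.html are hypothetical and do
not come from the PR or from doc.docbook.mk. The sketch assumes a w3m
built with m17n support, where -I names the input charset and -O the
output charset. The function only prints the command line, so it can be
inspected without w3m installed.

```shell
#!/bin/sh
# Hypothetical helper (not part of doc.docbook.mk): print the w3m command
# line that would dump an HTML file as plain text.
#   $1 = input HTML file
#   $2 = charset of the input, e.g. EUC-JP
#   $3 = charset of the output (defaults to UTF-8)
w3m_dump_cmd() {
    printf 'w3m -dump -T text/html -I %s -O %s %s\n' \
        "$2" "${3:-UTF-8}" "$1"
}

# Show the command for an EUC-JP encoded page (assumed file name).
w3m_dump_cmd article.html EUC-JP
```

Redirecting the output of the printed command to a file, as in the
'w3m -dump ... > dump.txt' example in the mail, would give the text
version. Whether explicit -I and -O options are needed, or w3m's charset
detection is enough, is left open here.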