mirror of
https://github.com/Zeckmathederg/glfs.git
synced 2025-01-25 07:42:13 +08:00
6473e745fe
git-svn-id: svn://svn.linuxfromscratch.org/BLFS/trunk/BOOK@5726 af4574ff-66df-0310-9fd7-8a98e5e911e0
144 lines
6.2 KiB
XML
<?xml version="1.0" encoding="ISO-8859-1"?>
<!DOCTYPE sect1 PUBLIC "-//OASIS//DTD DocBook XML V4.4//EN"
"http://www.oasis-open.org/docbook/xml/4.4/docbookx.dtd" [
<!ENTITY % general-entities SYSTEM "../../general.ent">
%general-entities;
]>

<sect1 id="locale-issues" xreflabel="Locale Related Issues">
<?dbhtml filename="locale-issues.html"?>

<sect1info>
<othername>$LastChangedBy$</othername>
<date>$Date$</date>
</sect1info>

<title>Locale Related Issues</title>

<para>This page contains information about locale-related problems and
issues. This paragraph is intended to give a generic overview of things that
can come up when configuring your system for various locales. The previous
sentence and the remainder of this paragraph must still be
revised/completed.</para>

<sect2>

<title>Package Specific Locale Issues</title>

<para>For package-specific issues, find the relevant package in the list
below and follow the link to view the available information. If a package
is not listed here, that does not mean it has no known locale-specific
issues or problems. It only means that this page has not yet been updated
with the locale-specific information for that package. Please refer to the
BLFS Wiki page for a particular package for any additional locale-specific
information.</para>

<itemizedlist>

<title>List of Packages with Locale Related Issues</title>

<listitem>
<para><xref linkend="locale-mc"/></para>
</listitem>
<listitem>
<para><xref linkend="locale-unzip"/></para>
</listitem>

</itemizedlist>

<sect3 id="locale-mc" xreflabel="MC-&mc-version;">

<title><xref linkend="mc"/></title>

<para>This package assumes that <quote>characters</quote>
and <quote>bytes</quote> are the same thing. This is not true in UTF-8
based locales, where a single character may occupy several bytes. Because
of this assumption, <application>MC</application> positions characters
incorrectly on the screen. After the cursor has been moved a little, the
screen becomes totally unreadable, as illustrated in
<ulink url="&files-anduin;/mc-bad.png">this
screenshot</ulink> (taken in a ru_RU.UTF-8 locale). Additionally, input
of non-ASCII characters in the editor is impossible, even after selecting
the <quote>Other 8-bit</quote> encoding from the menu.</para>
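<para>As a minimal illustration of why the character/byte assumption breaks
in UTF-8, the following Python snippet compares the two counts for a
six-letter Russian word. It is only a sketch and is not taken from the
<application>MC</application> sources; the sample word and the variable
names are assumptions chosen for this example.</para>

```python
# Characters versus bytes in UTF-8 (illustrative only, not MC code).
# The escapes spell a six-letter Russian word; they keep this listing ASCII.
text = "\u043f\u0440\u0438\u0432\u0435\u0442"

# The byte sequence a UTF-8 terminal or file would actually contain.
encoded = text.encode("utf-8")

char_count = len(text)      # number of characters
byte_count = len(encoded)   # number of bytes (two per Cyrillic letter)

# A program that advances its cursor one column per byte misjudges
# the position by this many columns after printing the word.
column_error = byte_count - char_count

print(char_count, byte_count, column_error)
```

<para>In an 8-bit encoding such as KOI8-R the two counts coincide, which is
why the problem only shows up in UTF-8 based locales.</para>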

</sect3>

<sect3 id="locale-unzip" xreflabel="UnZip-&unzip-version;">

<title><xref linkend="unzip"/></title>

<note>
<para>Use of <application>UnZip</application> in the
<application>JDK</application>, <application>Mozilla</application>,
<application>DocBook</application> or any other BLFS package
installation is not a problem, as BLFS instructions never use
<application>UnZip</application> to extract a file with non-ASCII
characters in the file's name.</para>
</note>

<para>The <application>UnZip</application> package assumes that filenames
stored in ZIP archives created on non-Unix systems are encoded in
CP850, and that they should be converted to ISO-8859-1 when writing files
onto the filesystem. Such assumptions are not always valid. In fact,
inside the ZIP archive, filenames are encoded in the DOS codepage that is
in use in the relevant country, and the filenames on disk should be in
the locale encoding. In MS Windows, the OemToChar() C function (from
<filename>User32.DLL</filename>) does the correct conversion (which is
indeed the conversion from CP850 to a superset of ISO-8859-1 if MS
Windows is set up to use the US English language), but there is no
equivalent in Linux.</para>
<para>When <command>unzip</command> unpacks a ZIP archive containing
non-ASCII filenames, the filenames are damaged whenever any of its
encoding assumptions is incorrect, because <command>unzip</command> then
applies the wrong conversion. For example, in the ru_RU.KOI8-R
locale, conversion of filenames from CP866 to KOI8-R is required, but
conversion from CP850 to ISO-8859-1 is done instead, which produces
filenames consisting of undecipherable characters instead of words (the
closest equivalent understandable example for English-only users is rot13).
There are several ways around this limitation:</para>
<para>1) To unpack ZIP archives with filenames containing non-ASCII
characters, use <ulink url="http://www.winzip.com/">WinZip</ulink> under
the <ulink url="http://www.winehq.com/">Wine</ulink> Windows
compatibility layer.</para>
<para>2) After running <command>unzip</command>, repair the damaged
filenames using the <command>convmv</command> tool
(<ulink url="http://j3e.de/linux/convmv/"/>). The following is an example
for the ru_RU.KOI8-R locale:</para>
<blockquote>
<para>Step 1. Undo the conversion done by
<command>unzip</command>:</para>

<screen><userinput>convmv -f iso-8859-1 -t cp850 -r --nosmart --notest \
<replaceable>[/path/to/unzipped/files]</replaceable></userinput></screen>

<para>Step 2. Do the correct conversion instead:</para>

<screen><userinput>convmv -f cp866 -t koi8-r -r --nosmart --notest \
<replaceable>[/path/to/unzipped/files]</replaceable></userinput></screen>
</blockquote>
<para>3) Apply this patch to unzip:
<ulink url="https://bugzilla.altlinux.ru/attachment.cgi?id=532"/></para>

<para>The patch allows you to specify the assumed filename encoding in the
ZIP archive using the <option>-O charset_name</option> option and the
on-disk filename encoding using the <option>-I charset_name</option>
option. By default, the on-disk filename encoding is the locale encoding,
and the encoding inside the ZIP archive is guessed from a built-in
table based on the locale encoding. For US English users, this still
means that <command>unzip</command> converts from CP850 to ISO-8859-1 by
default.</para>
<para>Caveat: this method works only with 8-bit locale encodings, not
with UTF-8. Attempting to use a patched <command>unzip</command> in UTF-8
locales may result in a segmentation fault and is probably a security
risk.</para>
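<para>The damage and the two <command>convmv</command> steps from
workaround 2 can be traced on in-memory bytes with Python's standard
codecs. This is only a sketch under stated assumptions: the file name is
hypothetical, the archive is assumed to have been created under CP866, and
<command>unzip</command> is assumed to have applied the CP850 to ISO-8859-1
conversion described above. It does not touch any files on disk.</para>

```python
# Hypothetical file name; the escapes spell a six-letter Russian word
# and keep this listing ASCII.
name = "\u043f\u0440\u0438\u0432\u0435\u0442.txt"

# Bytes as stored in a ZIP archive created under the CP866 DOS codepage.
stored = name.encode("cp866")

# The conversion unzip is described as doing: read as CP850, write ISO-8859-1.
damaged = stored.decode("cp850").encode("iso-8859-1")

# What a ru_RU.KOI8-R system expects to find on disk.
expected = name.encode("koi8-r")

# Step 1: undo the conversion (ISO-8859-1 back to CP850).
undone = damaged.decode("iso-8859-1").encode("cp850")

# Step 2: do the correct conversion instead (CP866 to KOI8-R).
repaired = undone.decode("cp866").encode("koi8-r")

print("damaged name matches expected:", damaged == expected)
print("repaired name matches expected:", repaired == expected)
```

<para>Step 1 only works because every character produced by the wrong
conversion in this example exists in both CP850 and ISO-8859-1. A name
containing a byte whose CP850 character has no ISO-8859-1 equivalent could
not be round-tripped this way.</para>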

</sect3>

</sect2>

</sect1>