Try to set LC_CTYPE to something UTF-8 capable (#8031)

* Try to set LC_CTYPE to something UTF-8 capable

When fish is started with LC_CTYPE=C (even just effectively, often via
LC_ALL=C!), it's basically broken. There's no way to handle non-ASCII
characters with a C locale unless we want to write our
locale-independent replacements for all of the system functions.

Since we're not going to do that, let's try to find *some locale* for
LC_CTYPE.

We already do that in __fish_setlocale, but that's

- a bit of a weird thing that reads unstandardized system
  configuration files
- allows setting locale to C explicitly

So it's still easily possible to end up in a broken configuration.

Now, the issue with this is that there is (AFAICT) no portable way to
get a list of all allowed locales and C.UTF-8 is not standardized, so
we have no one locale to fall back on and are forced to try a few. The
list we have here is quite arbitrary, but it's a start.

Python does something similar and only tries C.UTF-8, C.utf8 and
"UTF-8".

Once C.UTF-8 is (hopefully) standardized, that will just start
working (tm).

Note that we do not *export* the fixed LC_CTYPE variable, so external
programs still have to deal with the C locale, but we have no real
business messing with the user's environment.

To turn it off: $fish_allow_singlebyte_locale, if set to something true (like "1"),
will re-run the locale initialization and skip the bit where we force
LC_CTYPE to be utf8-capable.

This is mainly used in our tests, but might also be useful if people
are trying to do something weird.
This commit is contained in:
Fabian Homborg
2021-06-06 09:28:32 +02:00
committed by GitHub
parent e57c998d4c
commit 046db09f90
3 changed files with 36 additions and 1 deletions

View File

@@ -1409,6 +1409,7 @@ The locale variables are: ``LANG``, ``LC_ALL``, ``LC_COLLATE``, ``LC_CTYPE``, ``
The most common way to set the locale to use a command like ``set -gx LANG en_GB.utf8``, which sets the current locale to be the English language, as used in Great Britain, using the UTF-8 character set. That way any program that requires one setting differently can easily override just that and doesn't have to resort to LC_ALL. For a list of available locales on your system, try ``locale -a``.
Because it needs to handle output that might include multibyte characters (like e.g. emojis), fish will try to set its own internal LC_CTYPE to one that is UTF8-capable even if given an effective LC_CTYPE of "C" (the default). This prevents issues with e.g. filenames given in autosuggestions even if the user started fish with LC_ALL=C. To turn this handling off, set ``fish_allow_singlebyte_locale`` to "1".
.. _builtin-overview: