fish-shell/build_tools/update_translations.fish

158 lines
5.6 KiB
Fish

#!/usr/bin/env fish
# Updates the files used for gettext translations.
Switch to builtin gettext implementation This completely removes our runtime dependency on gettext. As a replacement, we have our own code for runtime localization in `src/wutil/gettext.rs`. It considers the relevant locale variables to decide which message catalogs to take localizations from. The use of locale variables is mostly the same as in gettext, with the notable exception that we do not support "default dialects". If `LANGUAGE=ll` is set and we don't have a `ll` catalog but a `ll_CC` catalog, we will use the catalog with the country code suffix. If multiple such catalogs exist, we use an arbitrary one. (At the moment we have at most one catalog per language, so this is not particularly relevant.) By using an `EnvStack` to pass variables to gettext at runtime, we now respect locale variables which are not exported. For early output, we don't have an `EnvStack` to pass, so we add an initialization function which constructs an `EnvStack` containing the relevant locale variables from the corresponding Environment variables. Treat `LANGUAGE` as path variable. This add automatic colon-splitting. The sourcing of catalogs is completely reworked. Instead of looking for MO files at runtime, we create catalogs as Rust maps at build time, by converting PO files into MO data, which is not stored, but immediately parsed to extract the mappings. From the mappings, we create Rust source code as a build artifact, which is then macro-included in the crate's library, i.e. `crates/gettext-maps/src/lib.rs`. The code in `src/wutil/gettext.rs` includes the message catalogs from this library, resulting in the message catalogs being built into the executable. The `localize-messages` feature can now be used to control whether to build with gettext support. By default, it is enabled. 
If `msgfmt` is not available at build time, and `gettext` is enabled, a warning will be emitted and fish is built with gettext support, but without any message catalogs, so localization will not work then. As a performance optimization, for each language we cache a separate Rust source file containing its catalog as a map. This allows us to reuse parsing results if the corresponding PO files have not changed since we cached the parsing result. Note that this approach does not eliminate our build-time dependency on gettext. The process for generating PO files (which uses `msguniq` and `msgmerge`) is unchanged, and we still need `msgfmt` to translate from PO to MO. We could parse PO files directly, but these are significantly more complex to parse, so we use `msgfmt` to do it for us and parse the resulting MO data. Advantages of the new approach: - We have no runtime dependency on gettext anymore. - The implementation has the same behavior everywhere. - Our implementation is significantly simpler than GNU gettext. - We can have localization in cargo-only builds by embedding localizations into the code. Previously, localization in such builds could only work reliably as long as the binary was not moved from the build directory. - We no longer have to take care of building and installing MO files in build systems; everything we need for localization to work happens automatically when building fish. - Reduced overhead when disabling localization, both in compilation time and binary size. Disadvantages of this approach: - Our own runtime implementation of gettext needs to be maintained. - The implementation has a more limited feature set (but I don't think it lacks any features which have been in use by fish). Part of #11726 Closes #11583 Closes #11725 Closes #11683
2025-08-22 20:03:45 +02:00
# By default, the whole xgettext + msgmerge pipeline runs,
# which extracts the messages from the source files into $template_file,
# and updates the PO files for each language from that.
#
# Use cases:
# For developers:
# - Run with no args to update all PO files after making changes to Rust/fish sources.
# For translators:
# - Specify the language you want to work on as an argument, which must be a file in the po/
# directory. You can specify a language which does not have translations yet by specifying the
# name of a file which does not yet exist. Make sure to follow the naming convention.
# For testing:
# - Specify `--dry-run` to see if any updates to the PO files would be applied by this script.
# If this flag is specified, the script will exit with an error if there are outstanding
# changes, and will display the diff. Do not specify other flags if `--dry-run` is specified.
#
# Specify `--use-existing-template=FILE` to prevent running cargo for extracting an up-to-date
# version of the localized strings. This flag is intended for testing setups which make it
# inconvenient to run cargo here, but run it in an earlier step to ensure up-to-date values.
# This argument is passed on to the `fish_xgettext.fish` script and has no other uses.
# `FILE` must be the path to a gettext template file generated from our compilation process.
# It can be obtained by running:
#     set -l FILE (mktemp)
#     FISH_GETTEXT_EXTRACTION_FILE=$FILE cargo check --features=gettext-extract
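#
# Example invocations, as a sketch of the use cases above. They assume the script is run from
# the repository root; `po/de.po` is only an illustrative path, so substitute the language file
# you are working on:
#     build_tools/update_translations.fish
#     build_tools/update_translations.fish po/de.po
#     build_tools/update_translations.fish --dry-run
#     build_tools/update_translations.fish --use-existing-template=$FILE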
# The sort utility is locale-sensitive.
# Ensure that sorting output is consistent by setting LC_ALL here.
set -gx LC_ALL C.UTF-8
set -l build_tools (status dirname)
set -l po_dir $build_tools/../po
set -l extract
set -l po
argparse dry-run use-existing-template= -- $argv
or exit $status
if test -z "$argv[1]"
    # Update everything if not specified otherwise.
    set -g po_files $po_dir/*.po
else
    # Compare device and inode numbers so that any path spelling of po/ is accepted.
    set -l po_dir_id (stat --format='%d:%i' -- $po_dir)
    for arg in $argv
        set -l arg_dir_id (stat --format='%d:%i' -- (dirname $arg) 2>/dev/null)
        if test $po_dir_id != "$arg_dir_id"
            echo "Argument $arg is not a file in the directory $(realpath $po_dir)."
            echo "Non-option arguments must specify paths to files in this directory."
            echo ""
            echo "If you want to add a new language to the translations, note the following:"
            echo "The filename must identify a language, with a two-letter ISO 639-1 language code of the target language (e.g. 'pt' for Portuguese), and use the file extension '.po'."
            echo "Optionally, you can specify a regional variant (e.g. 'pt_BR')."
            echo "So valid filenames are of the shape 'll.po' or 'll_CC.po'."
            exit 1
        end
        if not basename $arg | grep -qE '^[a-z]{2,3}(_[A-Z]{2})?\.po$'
            echo "Filename does not match the expected format ('ll.po' or 'll_CC.po')."
            exit 1
        end
    end
    set -g po_files $argv
end
set -g template_file (mktemp)
# Protect from externally set $tmpdir leaking into this script.
set -g tmpdir
function cleanup_exit
    set -l exit_status $status
    rm $template_file
    if set -g --query tmpdir[1]
        rm -r $tmpdir
    end
    exit $exit_status
end
if set -l --query extract
    set -l xgettext_args
    if set -l --query _flag_use_existing_template
        set xgettext_args --use-existing-template=$_flag_use_existing_template
    end
    $build_tools/fish_xgettext.fish $xgettext_args >$template_file
    or cleanup_exit
end
if set -l --query _flag_dry_run
    # On a dry run, we do not modify po/ but write to a temporary directory instead and check if
    # there is a difference between po/ and the tmpdir after re-generating the PO files.
    set -g tmpdir (mktemp -d)
    # Ensure tmpdir has the same initial state as the po dir.
    cp -r $po_dir/* $tmpdir
end
# This is used to identify lines which should be set here via $header_lines.
# Make sure that this prefix does not appear elsewhere in the file and only contains characters
# without special meaning in a sed pattern.
set -g header_prefix "# fish-note-sections: "
function print_header
    set -l header_lines \
        "Translations are divided into sections, each starting with a fish-section-* pseudo-message." \
        "The first few sections are more important." \
        "Ignore the tier3 sections unless you have a lot of time."
    for line in $header_lines
        printf '%s%s\n' $header_prefix $line
    end
end
function merge_po_files --argument-names template_file po_file
    msgmerge --no-wrap --update --no-fuzzy-matching --backup=none --quiet \
        $po_file $template_file
    or cleanup_exit
    set -l new_po_file (mktemp) # TODO Remove on failure.
    # Remove obsolete messages instead of keeping them as #~ entries.
    and msgattrib --no-wrap --no-obsolete -o $new_po_file $po_file
    or cleanup_exit

    begin
        print_header
        # Paste PO file without old header lines.
        sed '/^'$header_prefix'/d' $new_po_file
    end >$po_file
    rm $new_po_file
end
for po_file in $po_files
    if set --query tmpdir[1]
        set po_file $tmpdir/(basename $po_file)
    end
    if set -l --query po
        if test -e $po_file
            merge_po_files $template_file $po_file
        else
            begin
                print_header
                cat $template_file
            end >$po_file
        end
    end
end
if set -g --query tmpdir[1]
    diff -ur $po_dir $tmpdir
    or begin
        echo ERROR: translations in ./po/ are stale. Try running build_tools/update_translations.fish
        cleanup_exit
    end
end
cleanup_exit