This allows having the proc macro crate as an optional dependency and speeds up
compilation in situations where `FISH_GETTEXT_EXTRACTION_FILE` changes, such as
the `build_tools/check.sh` script. Because we don't need to recompile on changes
to the environment variable when the feature is disabled, cargo can reuse
earlier compilation results instead of recompiling everything.
This speeds up the compilation work in `build_tools/check.sh` when no changes
were made which necessitate recompilation.
For such runs of `build_tools/check.sh`, these changes reduce the runtime on my
system by about 10 seconds, from 70 to 60, approximately.
The difference comes from the following two commands recompiling code without
the changes in this commit, but not with them:
- `cargo test --doc --workspace`
- `cargo doc --workspace`
This is intended to allow translation updates in contexts where building within
the `fish_xgettext.fish` script is undesirable.
Specifically, this allows checking for PO file updates in the tests run by
`test_driver.py`. Because these use a tmpdir for `$HOME`, building within such a
test requires installing the entire Rust toolchain and doing a clean build,
which is a waste of resources.
With this argument, it is possible to build the template before running the
tests and passing the file path into the script.
This new wrapper type can be constructed via macros which invoke the
`gettext_extract` proc macro to extract the string literals for PO file
generation.
The type checking enabled by this wrapper should prevent trying to obtain
translations for a string for which none exist.
Because some strings (e.g. for completions) are not defined in Rust, but rather
in fish scripts, the `LocalizableString` type can also be constructed from
non-literals, in which case no extraction happens.
In such cases, it is the programmer's responsibility to only construct the type
for strings which are available for localization.
This approach is a replacement for the `cargo-expand`-based extraction.
When building with the `FISH_GETTEXT_EXTRACTION_FILE` environment variable set,
the `gettext_extract` proc macro will write the messages marked for extraction
to a file in the directory specified by the variable.
Updates to the po files:
- This is the result of running the `update_translations.fish` script using the
new proc_macro extraction. It finds additional messages compared to the
`cargo-expand` based approach.
- Messages IDs corresponding to paths are removed. The do not have localizations
in any language and localizing paths would not make sense. I have not
investigated how they made it into the po files in the first place.
- Some messages are reordered due to `msguniq` sorting differing from `sort`.
Remove docs about installing `cargo-expand`
These are no longer needed due to the switch to our extraction macro.
This allows msgfmt to detect issues with translations of format strings.
The detection used here is very simple. It just checks if a string contains '%',
and if it does, the entry in the po file is preceded by '#, c-format'.
Any entries with this marker are checked by msgfmt in our tests, so if an issue
arises, we will notice before it is merged.
This simplifies the logic a bit and performs a better.
Performance improvements for extract_fish_script_messages (time in
microseconds):
- explicit regex: from 128241 to 83471 (speedup 1.5)
- implicit regex: from 682203 to 463635 (speedup 1.5)
The replaces the `strs` list by a corresponding file, which eliminates the need
for looping over the list.
Use sed to transform strings into gettext po format entries.
Format the file with fish_indent and use more expressive variable name for the
file cargo expand writes to.
Performance improvements (in microseconds):
- sort+format rust strings: from 21750 to 11096 (speedup 2.0)
The fish builtin string functions are significantly slower than grep + sed.
The final replacement of \' to ' also does not make any sense here, because
single quotes appear unescaped in Rust strings.
Performance improvement: from 404880 to 44843 (speedup 9.0)
Profiling details (from separate runs):
Time (μs) Sum (μs) Command
174 404880 > set -a strs (string match -rv 'BUILD_VERSION:|PACKAGE_NAME' <$tmpfile |
string match -rg 'const [A-Z_]*: &str = "(.*)"' | string replace -a "\'" "'")
404706 404706 -> string match -rv 'BUILD_VERSION:|PACKAGE_NAME' <$tmpfile |
string match -rg 'const [A-Z_]*: &str = "(.*)"' | string replace -a "\'" "'"
202 44843 > set -a strs (grep -Ev 'BUILD_VERSION:|PACKAGE_NAME' <$tmpfile |
grep -E 'const [A-Z_]*: &str = "(.*)"' |
sed -E -e 's/^.*const [A-Z_]*: &str = "(.*)".*$/\1/' -e "s_\\\'_'_g")
4952 44641 -> grep -Ev 'BUILD_VERSION:|PACKAGE_NAME' <$tmpfile |
grep -E 'const [A-Z_]*: &str = "(.*)"' |
sed -E -e 's/^.*const [A-Z_]*: &str = "(.*)".*$/\1/' -e "s_\\\'_'_g"
28716 28716 --> command grep --color=auto $argv
10973 10973 --> command grep --color=auto $argv
Using a file is significantly faster.
Profiling overview (times in microseconds):
- cargo expand: from 4959320 to 4503409 (speedup 1.1)
- gettext call pipeline: from 436996 to 13536 (speedup 32.3)
- static string pipeline: from 477429 to 404880 (speedup 1.18)
Source locations (file name and line number) where a string originates is not
required by gettext tooling. It can help translators to identify context,
but the value of this is reduced by our lack of context support, meaning that
all occurrences of a string will receive the same translation.
Translators can use `rg` or similar tools to find the source locations.
For further details see this thread:
https://github.com/fish-shell/fish-shell/pull/11463#discussion_r2079378627
The main advantage is that updates to the PO files are now only necessary when
the source strings change, which greatly reduces the diff noise.
A secondary benefit is that the string extraction logic is simplified.
We can now directly extract the strings from fish scripts,
and several issues are fixed alongside, mostly related to quoting.
The regex for extracting implicit messages from fish scripts has been tweaked to
ignore commented-out lines, and properly support lines starting with `and`/`or`.
The old regex has the problem that it does not handle lines containing any
non-space characters in front of ` complete` (or ` function`), which results in
`string replace` leaving this part in the resulting string.
For example,
`and complete -d "foo"`
would turn into
`andN_ foo`
if passed to
`string replace --regex $regex 'N_ $1'` (where `$regex` is the `$implicit_regex`) variable.
Another issue are commented-out lines.
This greatly reduces the number of changes necessary to the PO files when the
Rust/fish source files are updated. (Changes to the line number can be applied
automatically, but this adds a lot of noise to the git history.)
Due to the way we have been extracting Rust strings, differentiation between
the same source string in different contexts has not been possible regardless
of the change.
It seems that duplicate msgid entries are not permitted in PO files, so since we
do not use context to distinguish the strings we extract, there is no way to
have context-/location-dependent translations, so we might as well reduce the
git noise by eliminating line numbers.
Including source locations helps translators with understanding context.
Because we do not distinguish between contexts for a given source string,
this is of limited utility, but keeping file names at least allows to open the
relevant files and search them for the string. This might also be helpful to
identify translations which do not make sense in all context in which they are
used. (Although without adding context support, the only remedy would be to
remove the translation altogether, as far as I can tell.)
For extraction from Rust, additional issues are fixed:
- File name extraction from the grep results now works properly. Previously,
lines not starting with whitespace resulted in missing or corrupted matches.
(missing if the source line contains no colon followed by a whitespace,
corrupted if it does, then the match included the part of the line in front of
the colon, instead of just the location)
- Only a single source location per string was supported (`head -n1`). The new
approach using sed does not have this limitation.
This should prevent occurrences of the search string from being found in other
locations (e.g. in a comment).
The whole approach of string extraction from Rust sources is sketchy,
but this at least prevents producing garbage when the content of a string
appears somewhere else unquoted.
The previous version generates files which do not preserve the line number from
the original fish script file, resulting in translation not working.
The new approach is quite ugly, and might have some issues,
but at least it seems to work in some cases.
Extracting explicit and implicit messages works essentially the same way, which
is also reflected in the code being identical, except for the regex.
Extract the duplicated code into a function.
This is absolutely disgusting code, but it works out okay-ish.
The problem is xgettext has no rust support (it's stuck in review
limbo). So we use cargo-expand to extract all invocations of
gettext, and massage all that to generate a
messages.pot ourselves.
We also assume any string constant could be translated.
Instead of using /tmp/fish as a temporary directory for this operation,
which could lead to clobbering user files, use mktemp to create an
actual temporary directory.
This change does several things. First, it works around a quirk of the
`xgetttext` command that only recognizes description strings in even
numbered position on the command. Second, it allows descriptions
introduced by the `-d` short flag to be recognized.
More importantly, it normalizes the strings so that `xgettext` correctly
extracts them into the *.po file. Prior to this change many fish script
strings were ignored due to how they were written (e.g., single versus
double quotes).
Fixes#4073