Remove line numbers from translation strings

This greatly reduces the number of changes necessary to the PO files when the
Rust/fish source files are updated. (Changes to the line number can be applied
automatically, but this adds a lot of noise to the git history.)

Because of the way we have been extracting Rust strings, it was already
impossible to distinguish occurrences of the same source string in different
contexts, so this change loses nothing in that respect.

It seems that duplicate msgid entries are not permitted in PO files. Because we
do not use context to distinguish the strings we extract, there is no way to
have context- or location-dependent translations, so we might as well reduce
the git noise by eliminating line numbers.

Including source locations helps translators understand context.
Because we do not distinguish between contexts for a given source string,
this is of limited utility, but keeping file names at least lets translators
open the relevant files and search them for the string. This might also help
to identify translations which do not make sense in all contexts in which they
are used. (Although without adding context support, the only remedy would be to
remove the translation altogether, as far as I can tell.)

For extraction from Rust, additional issues are fixed:
- File name extraction from the grep results now works properly. Previously,
  matched source lines not starting with whitespace resulted in missing or
  corrupted locations: missing if the line contained no colon followed by
  whitespace, corrupted if it did, because the result then included the part
  of the line in front of that colon instead of just the location.
- Only a single source location per string was supported (`head -n1`). The new
  approach using sed does not have this limitation.
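
As a rough illustration of the new extraction step, the sketch below feeds
made-up `grep -PH -r` output (the file names and source lines are hypothetical,
not taken from the fish sources) through the same sed and sort commands:

```shell
# Hypothetical `grep -PH -r` output: file name, a colon, then the matching line.
# The second line does not start with whitespace and contains ": " itself.
matches='src/builtins/set.rs:    wgettext!("Example: %s")
src/builtins/set.rs:wgettext!("Example: %s")
src/parser.rs:        wgettext!("Example: %s")'

# Keep only the text before the first colon, prefix it with '#: ',
# then drop duplicate locations.
refs=$(printf '%s\n' "$matches" | sed -E 's/^([^:]*):.*$/#: \1/' | sort -u)
printf '%s\n' "$refs"
```

Every matching file is listed once, whether or not the matched line starts
with whitespace. A file name that itself contains a colon would be cut off at
that colon.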
Daniel Rainer
2025-05-07 22:41:50 +02:00
committed by Johannes Altmanninger
parent df591a2e0f
commit 2d58cfe4cb

@@ -11,7 +11,7 @@ or exit 1
 # This is a gigantic crime.
 # xgettext still does not support rust *at all*, so we use cargo-expand to get all our wgettext invocations.
-set -l expanded (cargo expand --lib; for f in fish{,_indent,_key_reader}; cargo expand --bin $f; end)
+set -l expanded (cargo expand --lib; for f in fish fish_indent fish_key_reader; cargo expand --bin $f; end)
 # Extract any gettext call
 set -l strs (printf '%s\n' $expanded | grep -A1 wgettext_static_str |
@@ -28,10 +28,13 @@ set -a strs (string match -rv 'BUILD_VERSION:|PACKAGE_NAME' -- $expanded |
 # The escaping so far works out okay.
 for str in $strs
 # grep -P needed for string escape to be compatible (PCRE-style),
-# -H gives the filename, -n the line number.
+# -H gives the filename.
 # If you want to run this on non-GNU grep: Don't.
-echo "#:" (grep -PHn -r -- \"(string escape --style=regex -- $str)\" src/ |
-head -n1 | string replace -r ':\s.*' '')
+# The sed command extracts just the filename from the matches grep finds,
+# and prepends the '#: ' prefix, marking the line as a source reference.
+# sort -u just gets rid of duplicates.
+grep -PH -r -- \"(string escape --style=regex -- $str)\" src/ |
+sed -E 's/^([^:]*):.*$/#: \1/' | sort -u
 echo "msgid \"$str\""
 echo 'msgstr ""'
 end >messages.pot
@@ -74,7 +77,7 @@ extract_fish_script_messages implicit $implicit_regex
 set -l explicit_regex '.*\( *_ (([\'"]).+?(?<!\\\\)\\2) *\).*'
 extract_fish_script_messages explicit $explicit_regex
-xgettext -j -k -kN_ -LShell --from-code=UTF-8 -cDescription --no-wrap -o messages.pot $tmpdir/{ex,im}plicit/share/*/*.fish
+xgettext -j -k -kN_ -LShell --from-code=UTF-8 -cDescription --no-wrap --add-location=file -o messages.pot $tmpdir/{ex,im}plicit/share/*/*.fish
 # Remove the tmpdir from the location to avoid churn
 sed -i 's_^#: /.*/share/_#: share/_' messages.pot
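
The effect of the final sed command can be sketched on a made-up entry. The
temporary directory name and the message are hypothetical. The reference line
has the file-only form that `--add-location=file` is expected to produce:

```shell
# Hypothetical reference comment without a line number; the temporary
# directory name is invented for this example.
pot='#: /tmp/fish-xgettext.XXXX/explicit/share/functions/ls.fish
msgid "List contents of directory"
msgstr ""'

# Same substitution as in the script, applied to stdin instead of in place.
# '_' is used as the sed delimiter so the slashes need no escaping.
cleaned=$(printf '%s\n' "$pot" | sed 's_^#: /.*/share/_#: share/_')
printf '%s\n' "$cleaned"
```

Because `.*` is greedy, everything up to the last `/share/` in the path is
replaced, so the reference no longer depends on where the temporary directory
was created.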