Remove line numbers from translation strings

This greatly reduces the number of changes to the PO files that are necessary when the Rust/fish source files are updated. (Changes to the line numbers can be applied automatically, but this adds a lot of noise to the git history.)

Due to the way we have been extracting Rust strings, differentiating between the same source string in different contexts was already impossible before this change. It seems that duplicate msgid entries are not permitted in PO files, so since we do not use context to distinguish the strings we extract, there is no way to have context-/location-dependent translations. We might as well reduce the git noise by eliminating line numbers.

Including source locations helps translators understand context. Because we do not distinguish between contexts for a given source string, this is of limited utility, but keeping file names at least allows translators to open the relevant files and search them for the string. This might also help to identify translations which do not make sense in all contexts in which they are used. (Although without adding context support, the only remedy would be to remove the translation altogether, as far as I can tell.)

For extraction from Rust, additional issues are fixed:

- File name extraction from the grep results now works properly. Previously, lines not starting with whitespace resulted in missing or corrupted matches: missing if the source line contains no colon followed by whitespace, corrupted if it does, in which case the match included the part of the line in front of that colon instead of just the location.
- Only a single source location per string was supported (`head -n1`). The new approach using sed does not have this limitation.
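As an illustration of the sed-based approach, the following sketch runs the same sed expression and `sort -u` on made-up input. The file paths and the matched source lines are hypothetical stand-ins for `grep -PH -r` output, not taken from the repository.

```shell
# Hypothetical `grep -PH -r` output: "file:matched line", one line per match.
# The second entry has no leading whitespace after the file name, and every
# matched line contains a colon followed by a space.
matches='src/builtins/set.rs:    wgettext!("Example: message")
src/builtins/set.rs:wgettext!("Example: message")
src/env/mod.rs:    wgettext!("Example: message")'

# Keep only the text before the first colon (the file name), prefix it
# with '#: ', then drop duplicate locations.
locations=$(printf '%s\n' "$matches" |
    sed -E 's/^([^:]*):.*$/#: \1/' | sort -u)

printf '%s\n' "$locations"
```

Because `[^:]*` stops at the first colon, the result depends only on the file name, not on whitespace or colons in the matched line. This assumes that file names under `src/` contain no colons.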
2026-04-24 19:51:14 -03:00 · 2025-05-07 22:41:50 +02:00
parent df591a2e0f
commit 2d58cfe4cb
1 changed files with 8 additions and 5 deletions
--- a/build_tools/fish_xgettext.fish
+++ b/build_tools/fish_xgettext.fish
@@ -11,7 +11,7 @@ or exit 1

 # This is a gigantic crime.
 # xgettext still does not support rust *at all*, so we use cargo-expand to get all our wgettext invocations.
-set -l expanded (cargo expand --lib; for f in fish{,_indent,_key_reader}; cargo expand --bin $f; end)
+set -l expanded (cargo expand --lib; for f in fish fish_indent fish_key_reader; cargo expand --bin $f; end)

 # Extract any gettext call
 set -l strs (printf '%s\n' $expanded | grep -A1 wgettext_static_str |
@@ -28,10 +28,13 @@ set -a strs (string match -rv 'BUILD_VERSION:|PACKAGE_NAME' -- $expanded |
 # The escaping so far works out okay.
 for str in $strs
    # grep -P needed for string escape to be compatible (PCRE-style),
-    # -H gives the filename, -n the line number.
+    # -H gives the filename.
    # If you want to run this on non-GNU grep: Don't.
-    echo "#:" (grep -PHn -r -- \"(string escape --style=regex -- $str)\" src/ |
-    head -n1 | string replace -r ':\s.*' '')
+    # The sed command extracts just the filename from the matches grep finds,
+    # and prepends the '#: ' prefix, marking the line as a source reference.
+    # sort -u just gets rid of duplicates.
+    grep -PH -r -- \"(string escape --style=regex -- $str)\" src/ |
+       sed -E 's/^([^:]*):.*$/#: \1/' | sort -u
    echo "msgid \"$str\""
    echo 'msgstr ""'
 end >messages.pot
@@ -74,7 +77,7 @@ extract_fish_script_messages implicit $implicit_regex
 set -l explicit_regex '.*\( *_ (([\'"]).+?(?<!\\\\)\\2) *\).*'
 extract_fish_script_messages explicit $explicit_regex

-xgettext -j -k -kN_ -LShell --from-code=UTF-8 -cDescription --no-wrap -o messages.pot $tmpdir/{ex,im}plicit/share/*/*.fish
+xgettext -j -k -kN_ -LShell --from-code=UTF-8 -cDescription --no-wrap --add-location=file -o messages.pot $tmpdir/{ex,im}plicit/share/*/*.fish

 # Remove the tmpdir from the location to avoid churn
 sed -i 's_^#: /.*/share/_#: share/_' messages.pot