This removes parent back-pointers from ast nodes. Nodes no longer store
references to their parents, resulting in memory size reductions for the ast
(__fish_complete_gpg.fish goes from 508 KB to 198 KB in memory usage).
These depths are used in calculating indents for ast pretty-printing.
This moves away from parent back-pointers, so that we can ultimately remove
them.
Prior to this commit, Traversal was a convenient way to walk the nodes of
an ast in pre-order. This worked simply: it kept a stack of Nodes, and then
when a Node was visited, it was popped off and its children added.
Enhance Traversal to track whether each node on the Stack is `NeedsVisit`
or `Visited`. The idea is that, when a Node is yielded from next(), we can
reconstruct its parents as those Visited nodes on the Traversal stack.
This will allow clients to get the parents of Nodes as they are traversed
without each Node needing to explicitly save its parent.
This function hid a bug! It converted two Nodes to `*const ()` which discards
the vtable pointer, using only the data pointer; but two nodes can and do have
the same data pointer (e.g. if one node is the first item in another).
Add the (statically dispatched) is_same_node function with a warning comment,
and adopt that instead.
Now that we have more confidence in our pointers, we can allocate directly
more often, instead of always through Box. This recovers the performance
lost from the previous commit.
Prior to this commit, lists of items (e.g. an argument list to a command) would
each be Boxed, i.e. we had effectively Vec<Box<Item>>. The rationale here is
that we had raw pointers and pointer stability was important to enforce.
But we have fewer raw pointers now - only the parent pointers - and we can be
confident that the Ast will not change or move after construction. So remove
this intermediate Box, simplifying some logic and reducing ast size by ~5%.
This slows down Ast construction because we're still constructing
the Box and moving things in and out of it - that will be addressed in
subsequent commits.
This saves a decent amount of memory, both because we no longer have excess
capacity sitting around that we'll never use, but also because we no longer need
to store the "capacity" value.
Indicate the units of the durations (microseconds).
Right-align the durations for better readability.
Use `format!` instead of `fprintf` for more flexible formatting.
Write to `File` instead of raw fd.
Closes#11396
As mentioned in
https://github.com/fish-shell/fish-shell/pull/9688#discussion_r1155089596,
commit b77d1d0e2b (Stop crashing on invalid Unicode input, 2024-02-27), Rust's
char type doesn't support arbitrary 32-bit values. Out-of-range Unicode
codepoints would cause crashes. That commit addressed this by converting
the encoded bytes (e.g. UTF-8) to special private-use-area characters that
fish knows about. It didn't bother to update the code path in builtin read
that relies on mbrtowc as well.
Fix that. Move and rename parse_codepoint() and rename/reorder its input/output
parameters.
Fixes#11383
(cherry picked from commit d9ba27f58f)
This also changes the single-byte locale code path to treat keyboard input
like "\x1ba" as alt-a instead of "escape,a". I can't off-hand reproduce
a problem with "LC_ALL=C fish_key_reader", I guess we always use a UTF-8
locale if available?
(cherry picked from commit b061178606)
Two issues:
1. typing the codepoint 0x123456 into fish_key_reader:
$ fish_key_reader -cV
# decoded from: \xf4\xa3\x91
bind \xf4 'do something'
# decoded from:
bind \xa3 'do something'
# decoded from:
bind \x91 'do something'
The invalid codepoint is represented in its original encoding, which leaks
to the UI. This was more or less intentionally added by b77d1d0e2b (Stop
crashing on invalid Unicode input, 2024-02-27). That commit rendered it
as replacement byte, but that was removed for other reasons in e25a1358e6
(Work around broken rendering of pasted multibyte chars in non-UTF-8-ish
locale, 2024-08-03).
We no longer insert such (PUA) codepoints into the commandline. The "bind"
comes above would work however. I don't think this is something we want
to support. Discard invalid codepoints in the reader, so they can't be
bound and fish_key_reader shows nothing.
2. builtin read silently drops invalid encodings This builtin is not really
suited to read binary data (#11383 is an error scenario), but I guess it can
be bent to do that. Some of its code paths use str2wcstring which passes
through e.g. invalid UTF-8. The read-one-char-at-a-time code path doesn't.
Fix this.
As mentioned in
https://github.com/fish-shell/fish-shell/pull/9688#discussion_r1155089596,
commit b77d1d0e2b (Stop crashing on invalid Unicode input, 2024-02-27), Rust's
char type doesn't support arbitrary 32-bit values. Out-of-range Unicode
codepoints would cause crashes. That commit addressed this by converting
the encoded bytes (e.g. UTF-8) to special private-use-area characters that
fish knows about. It didn't bother to update the code path in builtin read
that relies on mbrtowc as well.
Fix that. Move and rename parse_codepoint() and rename/reorder its input/output
parameters.
Note that the behavior is still wrong if builtin read can't decode the
input; see the next commit.
Fixes#11383
This also changes the single-byte locale code path to treat keyboard input
like "\x1ba" as alt-a instead of "escape,a". I can't off-hand reproduce
a problem with "LC_ALL=C fish_key_reader", I guess we always use a UTF-8
locale if available?
As reported in
https://matrix.to/#/!YLTeaulxSDauOOxBoR:matrix.org/$BDVmBtBgtKCj45dVfS36rP7Y6Fo7E4uBg1vcH9IIIQg
tmux new-session -c "" fish -C 'echo $PWD'
prints
/home/fishuser/
This is because our path canonicalization function only
removes trailing slashes if there were duplicate slashes in the string.
Else (for the case above), we end up with "trailing == len"
which means we ignore trailing slashes.
I don't think this was intended by 24f1da7f30 (Add a fancy new
paths_are_equivalent function to test for equivalent paths instead of merely
equal ones, 2013-08-27). Fix it.
While at it, also document the command to reset the shape to the default,
which we should probably use. See foot commit 49034bb7 (csi: let CSI 0 q mean
"switch to user configured cursor style", 2019-07-22).
As of today, the XTerm documentation is a not clear on this; in
XTerm itself, CSI 0 q may actually change the cursor because they have an
additional default cursor is configured differently..
Things like branch and tag name can take up a lot of space on the screen. The
empty status may be useful but we're still looking for evidence. For now let's
keep only the conflict status, which is fairly familiar from the Git prompt.
See also #11183