fish-shell/src
Fabian Boehm 7988cff6bd Increase the string chunk size to increase performance
This is a *tiny* commit code-wise, but the explanation is a bit
longer.

When I made string read in chunks, I picked a chunk size from bash's
read, under the assumption that they had picked a good one.

It turns out, on the (Linux) systems I've tested, that's simply not
true.

My tests show that a bigger chunk size of up to 4096 is better *across
the board*:

- It's better with very large inputs
- It's equal-to-slightly-better with small inputs
- It's equal-to-slightly-better even if we quit early

My test setup:

0. Create various fish builds with various sizes for
STRING_CHUNK_SIZE, name them "fish-$CHUNKSIZE".
1. Download the npm package names from
https://github.com/nice-registry/all-the-package-names/blob/master/names.json (I
used commit 87451ea77562a0b1b32550124e3ab4a657bf166c, so it's 46.8MB)
2. Extract the names so we get a line-based version:

```fish
jq '.[]' names.json | string trim -c '"' >/tmp/all
```

3. Create various sizes of random extracts:

```fish
for f in 10000 1000 500 50
    shuf /tmp/all | head -n $f > /tmp/$f
end
```

(the idea here is to defeat any form of pattern in the input).

4. Run benchmarks:

```fish
hyperfine -w 3 ./fish-{128,512,1024,2048,4096}"
    -c 'for i in (seq 1000)
            string match -re foot < $f
        end; true'"
```

(reduce the seq size for the larger files so you don't have to wait
for hours - the idea here is to have some time running string and not
just fish startup time)
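
As a rough sketch, step 4 could be repeated over every extract like this. The `bench_cmds` helper is hypothetical, and the loop counts per file are illustrative guesses rather than the values used for the numbers in this commit; it assumes the `fish-$CHUNKSIZE` builds from step 0 sit in the current directory.

```fish
# Hypothetical helper: print one hyperfine command string per chunk-size build.
# $argv[1] is the input file, $argv[2] is the loop count.
function bench_cmds
    set -l f $argv[1]
    set -l n $argv[2]
    # Brace expansion combined with the quoted suffix yields five strings.
    # $n and $f expand inside the double quotes; "(seq ...)" stays literal.
    printf '%s\n' ./fish-{128,512,1024,2048,4096}" -c 'for i in (seq $n); string match -re foot < $f; end; true'"
end

# Assumed loop counts: fewer iterations for the larger inputs.
set files /tmp/50 /tmp/500 /tmp/1000 /tmp/10000 /tmp/all
set counts 1000 1000 500 100 2

if type -q hyperfine
    for idx in (seq (count $files))
        # Command substitution splits on newlines: one argument per build.
        hyperfine -w 3 (bench_cmds $files[$idx] $counts[$idx])
    end
else
    echo "hyperfine not found; install it to run the benchmarks" >&2
end
```

Keeping the command construction in a function makes it easy to print and inspect the exact strings before spending time on a long run.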

This shows results pretty much like

```
Summary
'./fish-2048     -c 'for i in (seq 1000)
          string match -re foot < /tmp/500
      end; true'' ran
  1.01 ± 0.02 times faster than './fish-4096     -c 'for i in (seq 1000)
          string match -re foot < /tmp/500
      end; true''
  1.02 ± 0.03 times faster than './fish-1024     -c 'for i in (seq 1000)
          string match -re foot < /tmp/500
      end; true''
  1.08 ± 0.03 times faster than './fish-512     -c 'for i in (seq 1000)
          string match -re foot < /tmp/500
      end; true''
  1.47 ± 0.07 times faster than './fish-128     -c 'for i in (seq 1000)
          string match -re foot < /tmp/500
      end; true''
```

So we see that up to 1024 there's a clear difference, and after that the
returns are marginal. So we settle on 1024 because of the memory
trade-off.

----

Fun extra:

Comparisons with `grep` (GNU grep 3.7) are *weird*, because you get
both

```
'./fish-4096 -c 'for i in (seq 100); string match -re foot < /tmp/500; end; true'' ran
11.65 ± 0.23 times faster than 'fish -c 'for i in (seq 100); command grep foot /tmp/500; end''
```

and

```
'fish -c 'for i in (seq 2); command grep foot /tmp/all; end'' ran
66.34 ± 3.00 times faster than './fish-4096 -c 'for i in (seq 2);
string match -re foot < /tmp/all; end; true''
100.05 ± 4.31 times faster than './fish-128 -c 'for i in (seq 2);
string match -re foot < /tmp/all; end; true''
```

Basically, if you *can* give grep a lot of work at once (~40MB in this
case), it'll churn through it like butter. But if you have to call it
a lot, string beats it by virtue of cheating: it's a builtin, so there's
no new process to spawn on every call.
2022-08-15 20:16:12 +02:00