author | Taylor Blau <me@ttaylorr.com> | 2023-04-13 00:20:33 +0200 |
---|---|---|
committer | Junio C Hamano <gitster@pobox.com> | 2023-04-13 16:55:46 +0200 |
commit | a8dd7e05b1c033917b900ea3d930e79eea3ff9a4 (patch) | |
tree | 67073c8dcb5dcb3d77d4f9f12112591b92165d46 | |
parent | pack-revindex: introduce `pack.readReverseIndex` (diff) | |
config: enable `pack.writeReverseIndex` by default
Back in e37d0b8730 (builtin/index-pack.c: write reverse indexes,
2021-01-25), Git learned how to read and write a pack's reverse index
from a file instead of in-memory.
A pack's reverse index is a mapping from pack position (that is, the
order that objects appear together in a ".pack") to their position in
lexical order (that is, the order that objects are listed in an ".idx"
file).
Reverse indexes are consulted often during pack-objects, as well as
during auxiliary operations that require mapping between pack offsets,
pack order, and index order.
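As an illustration of the mapping described above, here is a minimal sketch of how a reverse index can be derived in memory from the offsets stored in the ".idx". It is not Git's implementation: the `entry` struct and `build_reverse_index()` are hypothetical names, and a plain `qsort()` is used for brevity.

```c
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical helper type: one object as seen from the ".idx" side. */
struct entry {
	uint64_t offset;  /* byte offset of the object in the ".pack" */
	uint32_t idx_pos; /* position in lexical (".idx") order */
};

static int cmp_offset(const void *a, const void *b)
{
	const struct entry *x = a, *y = b;
	return (x->offset > y->offset) - (x->offset < y->offset);
}

/*
 * Given offsets[idx_pos] for each of the nr objects, fill
 * rev[pack_pos] = idx_pos. Returns 0 on success, -1 if allocation fails.
 */
int build_reverse_index(const uint64_t *offsets, uint32_t nr, uint32_t *rev)
{
	struct entry *e;
	uint32_t i;

	if (!nr)
		return 0;
	e = malloc(nr * sizeof(*e));
	if (!e)
		return -1;

	for (i = 0; i < nr; i++) {
		e[i].offset = offsets[i];
		e[i].idx_pos = i;
	}
	/* Sorting by offset yields the objects in pack order. */
	qsort(e, nr, sizeof(*e), cmp_offset);
	for (i = 0; i < nr; i++)
		rev[i] = e[i].idx_pos;

	free(e);
	return 0;
}
```

The sort touches every object in the pack, which is the cost an on-disk ".rev" file lets a reader skip.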
They are useful in GitHub's infrastructure, where we have seen a
dramatic increase in performance when writing ".rev" files[1]. In
particular:
- an ~80% reduction in the time it takes to serve fetches on a popular
repository, Homebrew/homebrew-core.
- a ~60% reduction in the peak memory usage to serve fetches on that
same repository.
- a collective savings of ~35% in CPU time across all pack-objects
invocations serving fetches across all repositories in a single
datacenter.
Reverse indexes are beneficial to end-users as well as forges. For
example, generating a pack containing the objects for the 10 most
recent commits in linux.git (representing a typical push) is
significantly faster when on-disk reverse indexes are available:
$ { git rev-parse HEAD && printf '^' && git rev-parse HEAD~10; } >in
$ hyperfine -L v false,true 'git.compile -c pack.readReverseIndex={v} pack-objects --delta-base-offset --revs --stdout <in >/dev/null'
Benchmark 1: git.compile -c pack.readReverseIndex=false pack-objects --delta-base-offset --revs --stdout <in >/dev/null
Time (mean ± σ): 543.0 ms ± 20.3 ms [User: 616.2 ms, System: 58.8 ms]
Range (min … max): 521.0 ms … 577.9 ms 10 runs
Benchmark 2: git.compile -c pack.readReverseIndex=true pack-objects --delta-base-offset --revs --stdout <in >/dev/null
Time (mean ± σ): 245.0 ms ± 11.4 ms [User: 335.6 ms, System: 31.3 ms]
Range (min … max): 226.0 ms … 259.6 ms 13 runs
Summary
'git.compile -c pack.readReverseIndex=true pack-objects --delta-base-offset --revs --stdout <in >/dev/null' ran
2.22 ± 0.13 times faster than 'git.compile -c pack.readReverseIndex=false pack-objects --delta-base-offset --revs --stdout <in >/dev/null'
The same is true of writing a pack containing the objects for the 30
most-recent commits:
$ { git rev-parse HEAD && printf '^' && git rev-parse HEAD~30; } >in
$ hyperfine -L v false,true 'git.compile -c pack.readReverseIndex={v} pack-objects --delta-base-offset --revs --stdout <in >/dev/null'
Benchmark 1: git.compile -c pack.readReverseIndex=false pack-objects --delta-base-offset --revs --stdout <in >/dev/null
Time (mean ± σ): 866.5 ms ± 16.2 ms [User: 1414.5 ms, System: 97.0 ms]
Range (min … max): 839.3 ms … 886.9 ms 10 runs
Benchmark 2: git.compile -c pack.readReverseIndex=true pack-objects --delta-base-offset --revs --stdout <in >/dev/null
Time (mean ± σ): 581.6 ms ± 10.2 ms [User: 1181.7 ms, System: 62.6 ms]
Range (min … max): 567.5 ms … 599.3 ms 10 runs
Summary
'git.compile -c pack.readReverseIndex=true pack-objects --delta-base-offset --revs --stdout <in >/dev/null' ran
1.49 ± 0.04 times faster than 'git.compile -c pack.readReverseIndex=false pack-objects --delta-base-offset --revs --stdout <in >/dev/null'
...and savings on trivial operations like computing the on-disk size of
a single (packed) object are even more dramatic:
$ git rev-parse HEAD >in
$ hyperfine -L v false,true 'git.compile -c pack.readReverseIndex={v} cat-file --batch-check="%(objectsize:disk)" <in'
Benchmark 1: git.compile -c pack.readReverseIndex=false cat-file --batch-check="%(objectsize:disk)" <in
Time (mean ± σ): 305.8 ms ± 11.4 ms [User: 264.2 ms, System: 41.4 ms]
Range (min … max): 290.3 ms … 331.1 ms 10 runs
Benchmark 2: git.compile -c pack.readReverseIndex=true cat-file --batch-check="%(objectsize:disk)" <in
Time (mean ± σ): 4.0 ms ± 0.3 ms [User: 1.7 ms, System: 2.3 ms]
Range (min … max): 1.6 ms … 4.6 ms 1155 runs
Summary
'git.compile -c pack.readReverseIndex=true cat-file --batch-check="%(objectsize:disk)" <in' ran
76.96 ± 6.25 times faster than 'git.compile -c pack.readReverseIndex=false cat-file --batch-check="%(objectsize:disk)" <in'
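To see why this lookup depends on the reverse index: roughly speaking, an object's on-disk size is the distance from its own offset to the offset of the next object in pack order. The sketch below assumes a reverse index `rev` (pack position to index position) is already available; `offset_to_pos()`, `disk_size()` and the `pack_end` parameter are hypothetical names, not Git's API.

```c
#include <stdint.h>

/*
 * rev[pack_pos] = idx_pos; offsets[idx_pos] = byte offset in the pack.
 * Binary-search for the pack position of the object starting at `ofs`.
 * Returns 0 and sets *pack_pos on success, -1 if nothing starts there.
 */
static int offset_to_pos(const uint32_t *rev, const uint64_t *offsets,
			 uint32_t nr, uint64_t ofs, uint32_t *pack_pos)
{
	uint32_t lo = 0, hi = nr;

	while (lo < hi) {
		uint32_t mi = lo + (hi - lo) / 2;
		uint64_t got = offsets[rev[mi]];

		if (got == ofs) {
			*pack_pos = mi;
			return 0;
		}
		if (got < ofs)
			lo = mi + 1;
		else
			hi = mi;
	}
	return -1;
}

/*
 * On-disk size of the object at lexical position idx_pos: the gap to the
 * next object in pack order, or to pack_end (assumed to be the offset
 * just past the last object) for the final one. Returns 0 if not found.
 */
uint64_t disk_size(const uint32_t *rev, const uint64_t *offsets,
		   uint32_t nr, uint64_t pack_end, uint32_t idx_pos)
{
	uint32_t pos;
	uint64_t next;

	if (idx_pos >= nr ||
	    offset_to_pos(rev, offsets, nr, offsets[idx_pos], &pos))
		return 0;
	next = (pos + 1 < nr) ? offsets[rev[pos + 1]] : pack_end;
	return next - offsets[idx_pos];
}
```

With `rev` read from disk, a single query costs one binary search. Without it, the whole mapping has to be built in memory first, which is consistent with the gap in the benchmark above.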
In the more than two years since e37d0b8730 was merged, Git's
implementation of on-disk reverse indexes has been thoroughly tested,
both by users enabling `pack.writeReverseIndex`, and by GitHub's
deployment of the feature. The latter has been running without incident
for more than two years.
This patch changes Git's behavior to write on-disk reverse indexes by
default when indexing a pack, which should make the above operations
faster for everybody's Git installation after a repack.
(The previous commit explains some potential drawbacks of using on-disk
reverse indexes in certain limited circumstances that essentially boil
down to a trade-off between time to generate and time to access. For
those limited cases, the `pack.readReverseIndex` escape hatch can be
used.)
[1]: https://github.blog/2021-04-29-scaling-monorepo-maintenance/#reverse-indexes
Signed-off-by: Taylor Blau <me@ttaylorr.com>
Acked-by: Derrick Stolee <derrickstolee@github.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
-rw-r--r-- | Documentation/config/pack.txt | 2 |
-rw-r--r-- | builtin/index-pack.c | 1 |
-rw-r--r-- | builtin/pack-objects.c | 1 |
-rwxr-xr-x | t/perf/p5312-pack-bitmaps-revs.sh | 3 |
-rwxr-xr-x | t/t5325-reverse-index.sh | 1 |
5 files changed, 5 insertions, 3 deletions
diff --git a/Documentation/config/pack.txt b/Documentation/config/pack.txt
index 7db7fed466..d4c7c9d4e4 100644
--- a/Documentation/config/pack.txt
+++ b/Documentation/config/pack.txt
@@ -182,4 +182,4 @@ pack.writeReverseIndex::
 	linkgit:gitformat-pack[5])
 	for each new packfile that it writes in all places except for
 	linkgit:git-fast-import[1] and in the bulk checkin mechanism.
-	Defaults to false.
+	Defaults to true.
diff --git a/builtin/index-pack.c b/builtin/index-pack.c
index b17e79cd40..323c063f9d 100644
--- a/builtin/index-pack.c
+++ b/builtin/index-pack.c
@@ -1753,6 +1753,7 @@ int cmd_index_pack(int argc, const char **argv, const char *prefix)
 	fsck_options.walk = mark_link;
 
 	reset_pack_idx_option(&opts);
+	opts.flags |= WRITE_REV;
 	git_config(git_index_pack_config, &opts);
 	if (prefix && chdir(prefix))
 		die(_("Cannot come back to cwd"));
diff --git a/builtin/pack-objects.c b/builtin/pack-objects.c
index 77d88f85b0..dbaa04482f 100644
--- a/builtin/pack-objects.c
+++ b/builtin/pack-objects.c
@@ -4293,6 +4293,7 @@ int cmd_pack_objects(int argc, const char **argv, const char *prefix)
 	}
 
 	reset_pack_idx_option(&pack_idx_opts);
+	pack_idx_opts.flags |= WRITE_REV;
 	git_config(git_pack_config, NULL);
 	if (git_env_bool(GIT_TEST_WRITE_REV_INDEX, 0))
 		pack_idx_opts.flags |= WRITE_REV;
diff --git a/t/perf/p5312-pack-bitmaps-revs.sh b/t/perf/p5312-pack-bitmaps-revs.sh
index 0684b690af..ceec60656b 100755
--- a/t/perf/p5312-pack-bitmaps-revs.sh
+++ b/t/perf/p5312-pack-bitmaps-revs.sh
@@ -12,8 +12,7 @@ test_lookup_pack_bitmap () {
 test_perf_large_repo
 
 test_expect_success 'setup bitmap config' '
-	git config pack.writebitmaps true &&
-	git config pack.writeReverseIndex true
+	git config pack.writebitmaps true
 '
 
 # we need to create the tag up front such that it is covered by the repack and
diff --git a/t/t5325-reverse-index.sh b/t/t5325-reverse-index.sh
index 66171c1d67..149dcf5193 100755
--- a/t/t5325-reverse-index.sh
+++ b/t/t5325-reverse-index.sh
@@ -14,6 +14,7 @@ packdir=.git/objects/pack
 test_expect_success 'setup' '
 	test_commit base &&
 
+	test_config pack.writeReverseIndex false &&
 	pack=$(git pack-objects --all $packdir/pack) &&
 	rev=$packdir/pack-$pack.rev &&
 
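The two C hunks above set WRITE_REV before the configuration is read, so an explicit `pack.writeReverseIndex=false` can still clear it. The following standalone sketch shows that default-then-override pattern. Everything in it is an assumption made for illustration: the `idx_opts` struct, the bit value of `WRITE_REV`, and the `apply_config()` and `effective_flags()` helpers are hypothetical, and only the literal string "true" is treated as true, unlike Git's real boolean parsing.

```c
#include <string.h>

#define WRITE_REV (1u << 0) /* hypothetical bit value */

struct idx_opts {
	unsigned flags;
};

/* Hypothetical stand-in for a config callback handling a single key. */
static void apply_config(struct idx_opts *opts, const char *key,
			 const char *value)
{
	if (strcmp(key, "pack.writereverseindex"))
		return; /* not our key: leave the default untouched */
	if (!strcmp(value, "true"))
		opts->flags |= WRITE_REV;
	else
		opts->flags &= ~WRITE_REV;
}

/*
 * Compute the effective flags for one optional key/value pair.
 * Pass NULL for both to model "nothing configured".
 */
unsigned effective_flags(const char *key, const char *value)
{
	struct idx_opts opts = { 0 };

	opts.flags |= WRITE_REV; /* new default, set before config is read */
	if (key && value)
		apply_config(&opts, key, value);
	return opts.flags;
}
```

Setting the default first keeps the change small: the config callback itself does not need to know what the default is.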