summaryrefslogtreecommitdiffstats
path: root/t/t7900-maintenance.sh
diff options
context:
space:
mode:
authorDerrick Stolee <dstolee@microsoft.com>2020-09-25 14:33:31 +0200
committerJunio C Hamano <gitster@pobox.com>2020-09-25 19:53:04 +0200
commit28cb5e66ddaed47aa8789fa326fec0339831b80b (patch)
treeb0b78b789b6660fd6a31587f913c345757ef51b9 /t/t7900-maintenance.sh
parentmaintenance: add trace2 regions for task execution (diff)
downloadgit-28cb5e66ddaed47aa8789fa326fec0339831b80b.tar.xz
git-28cb5e66ddaed47aa8789fa326fec0339831b80b.zip
maintenance: add prefetch task
When working with very large repositories, an incremental 'git fetch' command can download a large amount of data. If there are many other users pushing to a common repo, then this data can rival the initial pack-file size of a 'git clone' of a medium-size repo. Users may want to keep the data on their local repos as close as possible to the data on the remote repos by fetching periodically in the background. This can break up a large daily fetch into several smaller hourly fetches. The task is called "prefetch" because it is work done in advance of a foreground fetch to make that 'git fetch' command much faster. However, if we simply ran 'git fetch <remote>' in the background, then the user running a foreground 'git fetch <remote>' would lose some important feedback when a new branch appears or an existing branch updates. This is especially true if a remote branch is force-updated and this isn't noticed by the user because it occurred in the background. Further, the functionality of 'git push --force-with-lease' becomes suspect. When running 'git fetch <remote> <options>' in the background, use the following options for careful updating: 1. --no-tags prevents getting a new tag when a user wants to see the new tags appear in their foreground fetches. 2. --refmap= removes the configured refspec which usually updates refs/remotes/<remote>/* with the refs advertised by the remote. While this looks confusing, this was documented and tested by b40a50264ac (fetch: document and test --refmap="", 2020-01-21), including this sentence in the documentation: Providing an empty `<refspec>` to the `--refmap` option causes Git to ignore the configured refspecs and rely entirely on the refspecs supplied as command-line arguments. 3. By adding a new refspec "+refs/heads/*:refs/prefetch/<remote>/*" we can ensure that we actually load the new values somewhere in our refspace while not updating refs/heads or refs/remotes. By storing these refs here, the commit-graph job will update the commit-graph with the commits from these hidden refs. 4. --prune will delete the refs/prefetch/<remote> refs that no longer appear on the remote. 5. --no-write-fetch-head prevents updating FETCH_HEAD. We've been using this step as a critical background job in Scalar [1] (and VFS for Git). This solved a pain point that was showing up in user reports: fetching was a pain! Users do not like waiting to download the data that was created while they were away from their machines. After implementing background fetch, the foreground fetch commands sped up significantly because they mostly just update refs and download a small amount of new data. The effect is especially dramatic when paried with --no-show-forced-udpates (through fetch.showForcedUpdates=false). [1] https://github.com/microsoft/scalar/blob/master/Scalar.Common/Maintenance/FetchStep.cs Signed-off-by: Derrick Stolee <dstolee@microsoft.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
Diffstat (limited to '')
-rwxr-xr-xt/t7900-maintenance.sh26
1 files changed, 26 insertions, 0 deletions
diff --git a/t/t7900-maintenance.sh b/t/t7900-maintenance.sh
index 53c883531e..045524e6ad 100755
--- a/t/t7900-maintenance.sh
+++ b/t/t7900-maintenance.sh
@@ -62,4 +62,30 @@ test_expect_success 'run --task duplicate' '
test_i18ngrep "cannot be selected multiple times" err
'
+test_expect_success 'run --task=prefetch with no remotes' '
+ git maintenance run --task=prefetch 2>err &&
+ test_must_be_empty err
+'
+
+test_expect_success 'prefetch multiple remotes' '
+ git clone . clone1 &&
+ git clone . clone2 &&
+ git remote add remote1 "file://$(pwd)/clone1" &&
+ git remote add remote2 "file://$(pwd)/clone2" &&
+ git -C clone1 switch -c one &&
+ git -C clone2 switch -c two &&
+ test_commit -C clone1 one &&
+ test_commit -C clone2 two &&
+ GIT_TRACE2_EVENT="$(pwd)/run-prefetch.txt" git maintenance run --task=prefetch 2>/dev/null &&
+ fetchargs="--prune --no-tags --no-write-fetch-head --recurse-submodules=no --refmap= --quiet" &&
+ test_subcommand git fetch remote1 $fetchargs +refs/heads/\\*:refs/prefetch/remote1/\\* <run-prefetch.txt &&
+ test_subcommand git fetch remote2 $fetchargs +refs/heads/\\*:refs/prefetch/remote2/\\* <run-prefetch.txt &&
+ test_path_is_missing .git/refs/remotes &&
+ git log prefetch/remote1/one &&
+ git log prefetch/remote2/two &&
+ git fetch --all &&
+ test_cmp_rev refs/remotes/remote1/one refs/prefetch/remote1/one &&
+ test_cmp_rev refs/remotes/remote2/two refs/prefetch/remote2/two
+'
+
test_done