summaryrefslogtreecommitdiffstats
path: root/bloom.c
diff options
context:
space:
mode:
authorGarima Singh <garima.singh@microsoft.com>2020-04-06 18:59:52 +0200
committerJunio C Hamano <gitster@pobox.com>2020-04-06 20:08:37 +0200
commita56b9464cd0a49317fafde080ae4e73c5430ac9b (patch)
tree73a69c869e0dc5dabc5655940479510fafe09426 /bloom.c
parentcommit-graph: add --changed-paths option to write subcommand (diff)
downloadgit-a56b9464cd0a49317fafde080ae4e73c5430ac9b.tar.xz
git-a56b9464cd0a49317fafde080ae4e73c5430ac9b.zip
revision.c: use Bloom filters to speed up path based revision walks
Revision walk will now use Bloom filters for commits to speed up revision walks for a particular path (for computing history for that path), if they are present in the commit-graph file. We load the Bloom filters during the prepare_revision_walk step, currently only when dealing with a single pathspec. Extending it to work with multiple pathspecs can be explored and built on top of this series in the future. While comparing trees in rev_compare_trees(), if the Bloom filter says that the file is not different between the two trees, we don't need to compute the expensive diff. This is where we get our performance gains. The other response of the Bloom filter is '`:maybe', in which case we fall back to the full diff calculation to determine if the path was changed in the commit. We do not try to use Bloom filters when the '--walk-reflogs' option is specified. The '--walk-reflogs' option does not walk the commit ancestry chain like the rest of the options. Incorporating the performance gains when walking reflog entries would add more complexity, and can be explored in a later series. Performance Gains: We tested the performance of `git log -- <path>` on the git repo, the linux and some internal large repos, with a variety of paths of varying depths. On the git and linux repos: - we observed a 2x to 5x speed up. On a large internal repo with files seated 6-10 levels deep in the tree: - we observed 10x to 20x speed ups, with some paths going up to 28 times faster. Helped-by: Derrick Stolee <dstolee@microsoft.com Helped-by: SZEDER Gábor <szeder.dev@gmail.com> Helped-by: Jonathan Tan <jonathantanmy@google.com> Signed-off-by: Garima Singh <garima.singh@microsoft.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
Diffstat (limited to 'bloom.c')
-rw-r--r--bloom.c20
1 files changed, 20 insertions, 0 deletions
diff --git a/bloom.c b/bloom.c
index 0f714dd76a..c5b461d1cf 100644
--- a/bloom.c
+++ b/bloom.c
@@ -253,3 +253,23 @@ struct bloom_filter *get_bloom_filter(struct repository *r,
return filter;
}
+
+int bloom_filter_contains(const struct bloom_filter *filter,
+ const struct bloom_key *key,
+ const struct bloom_filter_settings *settings)
+{
+ int i;
+ uint64_t mod = filter->len * BITS_PER_WORD;
+
+ if (!mod)
+ return -1;
+
+ for (i = 0; i < settings->num_hashes; i++) {
+ uint64_t hash_mod = key->hashes[i] % mod;
+ uint64_t block_pos = hash_mod / BITS_PER_WORD;
+ if (!(filter->data[block_pos] & get_bitmask(hash_mod)))
+ return 0;
+ }
+
+ return 1;
+} \ No newline at end of file