author | Patrick Steinhardt <ps@pks.im> | 2024-03-04 11:49:13 +0100 |
---|---|---|
committer | Junio C Hamano <gitster@pobox.com> | 2024-03-04 19:19:40 +0100 |
commit | f8c1a8e2e18f71b46549ba722fc5e4a62db0864b | |
tree | 8eab6179d8972154a41a3d35e268af6269030c9c /reftable/merged.c | |
parent | reftable/merged: handle subiter cleanup on close only | |
reftable/merged: circumvent pqueue with single subiter
The merged iterator uses a priority queue to order records so that we
can yield them in the expected order. This priority queue of course
comes with some overhead, as we need to add, compare and remove entries
in it.
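To make that overhead concrete, here is a minimal sketch of a heap-driven k-way merge in C. It is not the reftable code: each "subiter" is modeled as a sorted array of `int` keys, and the names `struct subiter`, `struct pqueue`, `pq_add`, `pq_remove` and `merge_all` are hypothetical. Deduplication of equal keys across tables, which a real merged iterator has to handle, is left out.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical model, not the reftable API: a "subiter" is a sorted int array. */
struct subiter {
	const int *keys;
	size_t len, pos;
};

/* Heap entry: the head key of a subiter plus the index of that subiter. */
struct pq_entry {
	int key;
	size_t index;
};

#define PQ_CAP 16

struct pqueue {
	struct pq_entry e[PQ_CAP];
	size_t n;
};

/* Order by key; ties are broken by subiter index to keep the order total. */
static int entry_less(const struct pq_entry *a, const struct pq_entry *b)
{
	return a->key < b->key || (a->key == b->key && a->index < b->index);
}

static void pq_swap(struct pqueue *pq, size_t i, size_t j)
{
	struct pq_entry tmp = pq->e[i];
	pq->e[i] = pq->e[j];
	pq->e[j] = tmp;
}

/* O(log k): append, then sift up while smaller than the parent. */
static void pq_add(struct pqueue *pq, struct pq_entry ent)
{
	size_t i;

	assert(pq->n < PQ_CAP);
	i = pq->n++;
	pq->e[i] = ent;
	while (i > 0 && entry_less(&pq->e[i], &pq->e[(i - 1) / 2])) {
		pq_swap(pq, i, (i - 1) / 2);
		i = (i - 1) / 2;
	}
}

/* O(log k): take the root, move the last entry up, then sift it down. */
static struct pq_entry pq_remove(struct pqueue *pq)
{
	struct pq_entry top;
	size_t i = 0;

	assert(pq->n > 0);
	top = pq->e[0];
	pq->e[0] = pq->e[--pq->n];
	for (;;) {
		size_t l = 2 * i + 1, r = l + 1, min = i;
		if (l < pq->n && entry_less(&pq->e[l], &pq->e[min]))
			min = l;
		if (r < pq->n && entry_less(&pq->e[r], &pq->e[min]))
			min = r;
		if (min == i)
			break;
		pq_swap(pq, i, min);
		i = min;
	}
	return top;
}

/*
 * Merge k sorted subiters into out[], which must have room for every key.
 * Each key costs one pq_add and one pq_remove: the per-record overhead.
 */
static size_t merge_all(struct subiter *its, size_t k, int *out)
{
	struct pqueue pq = { .n = 0 };
	size_t i, n = 0;

	/* Seed the queue with the first key of every non-empty subiter. */
	for (i = 0; i < k; i++)
		if (its[i].pos < its[i].len)
			pq_add(&pq, (struct pq_entry){ its[i].keys[its[i].pos++], i });

	while (pq.n > 0) {
		struct pq_entry top = pq_remove(&pq);
		struct subiter *it = &its[top.index];

		out[n++] = top.key;
		/* Refill from the subiter that just produced the smallest key. */
		if (it->pos < it->len)
			pq_add(&pq, (struct pq_entry){ it->keys[it->pos++], top.index });
	}
	return n;
}
```

For example, merging `{1, 4, 7}`, `{2, 5}` and `{3, 6, 8, 9}` with `merge_all` writes the keys 1 through 9 in order, and every one of those nine keys passes through the heap once on the way in and once on the way out.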
In the general case, that overhead cannot really be avoided. But when
only a single subiter is left, there is no need to use the priority
queue anymore, because that subiter already returns its records in
exactly the order we need.
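The special case can be sketched with a similar toy model that mirrors the structure of the patch: the subiter that produced the last record is only pushed back into the queue on the next call (`advance_index`), so if the queue is empty at that point, that subiter is the only one that can still yield records and can be read directly. This is again a hypothetical model, not reftable's API: subiters are sorted `int` arrays, a small sorted array stands in for a heap-based priority queue for brevity, and the `fast_path` flag and `pq_ops` counter exist only for illustration.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical model, not the reftable API: each subiter is a sorted int array. */
struct subiter {
	const int *keys;
	size_t len, pos;
};

struct pq_entry {
	int key;
	size_t index; /* which subiter the key came from */
};

#define PQ_CAP 16

struct merged_iter {
	struct subiter *subiters;
	struct pq_entry pq[PQ_CAP]; /* sorted descending: minimum at pq[pq_len - 1] */
	size_t pq_len;
	long advance_index; /* subiter to refill before the next pop, or -1 */
	size_t pq_ops;      /* pqueue adds + removes, counted for illustration */
};

/* Next key of a single subiter: 0 on success, 1 once it is exhausted. */
static int subiter_next(struct subiter *it, int *key)
{
	if (it->pos >= it->len)
		return 1;
	*key = it->keys[it->pos++];
	return 0;
}

static int entry_less(const struct pq_entry *a, const struct pq_entry *b)
{
	return a->key < b->key || (a->key == b->key && a->index < b->index);
}

/* Insertion into a sorted array stands in for a heap, for brevity. */
static void pq_add(struct merged_iter *mi, struct pq_entry ent)
{
	size_t i;

	assert(mi->pq_len < PQ_CAP);
	mi->pq_ops++;
	for (i = mi->pq_len; i > 0 && entry_less(&mi->pq[i - 1], &ent); i--)
		mi->pq[i] = mi->pq[i - 1];
	mi->pq[i] = ent;
	mi->pq_len++;
}

static struct pq_entry pq_remove(struct merged_iter *mi)
{
	assert(mi->pq_len > 0);
	mi->pq_ops++;
	return mi->pq[--mi->pq_len];
}

/* Queue the next key of subiter idx; returns 1 if that subiter is exhausted. */
static int advance_subiter(struct merged_iter *mi, size_t idx)
{
	struct pq_entry ent = { 0, idx };
	if (subiter_next(&mi->subiters[idx], &ent.key))
		return 1;
	pq_add(mi, ent);
	return 0;
}

static void merged_init(struct merged_iter *mi, struct subiter *its, size_t k)
{
	mi->subiters = its;
	mi->pq_len = 0;
	mi->advance_index = -1;
	mi->pq_ops = 0;
	for (size_t i = 0; i < k; i++)
		advance_subiter(mi, i);
}

/* 0 with *key set on success, 1 at the end. fast_path toggles the shortcut. */
static int merged_next(struct merged_iter *mi, int *key, int fast_path)
{
	struct pq_entry top;

	if (mi->advance_index >= 0) {
		/* Empty queue: no other subiter has records left, so skip the queue. */
		if (fast_path && mi->pq_len == 0)
			return subiter_next(&mi->subiters[mi->advance_index], key);
		advance_subiter(mi, (size_t)mi->advance_index);
		mi->advance_index = -1;
	}
	if (mi->pq_len == 0)
		return 1;
	top = pq_remove(mi);
	mi->advance_index = (long)top.index;
	*key = top.key;
	return 0;
}
```

Merging `{1, 2, 10, 11, 12, 13}` with `{3, 4}` yields the same eight keys either way, but with `fast_path` set the last three keys bypass the queue entirely, so `pq_ops` drops from 16 to 10.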
While having a single subiter may sound like an edge case, it happens
more frequently than one might think. In the most common scenario, you
can expect a repository to have a single large table that contains most
of the records and then a set of smaller tables which contain later
additions to the reftable stack. In this case it is quite likely that we
exhaust the subiters of those smaller tables before exhausting the large
table.
Special-case this and return records directly from the remaining
subiter. This results in a speedup of about 7% when iterating over
1 million refs in a repository with a single table:
Benchmark 1: show-ref: single matching ref (revision = HEAD~)
Time (mean ± σ): 135.4 ms ± 4.4 ms [User: 132.5 ms, System: 2.8 ms]
Range (min … max): 131.0 ms … 166.3 ms 1000 runs
Benchmark 2: show-ref: single matching ref (revision = HEAD)
Time (mean ± σ): 126.3 ms ± 3.9 ms [User: 123.3 ms, System: 2.8 ms]
Range (min … max): 122.7 ms … 157.0 ms 1000 runs
Summary
show-ref: single matching ref (revision = HEAD) ran
1.07 ± 0.05 times faster than show-ref: single matching ref (revision = HEAD~)
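The summary figures can be checked by hand. Writing t_before for the mean at revision HEAD~ and t_after for the mean at HEAD, and assuming the ± value is the usual first-order propagation of relative standard deviations for a quotient (the output format appears to be hyperfine's):

```latex
\begin{aligned}
r &= \frac{\bar{t}_{\mathrm{before}}}{\bar{t}_{\mathrm{after}}}
   = \frac{135.4\,\mathrm{ms}}{126.3\,\mathrm{ms}} \approx 1.072 \\
\sigma_r &= r\sqrt{\left(\frac{4.4}{135.4}\right)^2 + \left(\frac{3.9}{126.3}\right)^2}
   \approx 1.072\sqrt{0.0325^2 + 0.0309^2} \approx 1.072 \times 0.0448 \approx 0.048 \\
\Delta\bar{t} &= 135.4\,\mathrm{ms} - 126.3\,\mathrm{ms} = 9.1\,\mathrm{ms}
   \approx 6.7\,\% \text{ of } 135.4\,\mathrm{ms}
\end{aligned}
```

Rounded, this gives the reported 1.07 ± 0.05.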
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Diffstat
-rw-r--r-- | reftable/merged.c | 24 |
1 file changed, 22 insertions, 2 deletions
diff --git a/reftable/merged.c b/reftable/merged.c
index d9ed4a19dd..29161a32cf 100644
--- a/reftable/merged.c
+++ b/reftable/merged.c
@@ -87,16 +87,36 @@ static int merged_iter_next_entry(struct merged_iter *mi,
 				  struct reftable_record *rec)
 {
 	struct pq_entry entry = { 0 };
-	int err = 0;
+	int err = 0, empty;
+
+	empty = merged_iter_pqueue_is_empty(mi->pq);
 
 	if (mi->advance_index >= 0) {
+		/*
+		 * When there are no pqueue entries then we only have a single
+		 * subiter left. There is no need to use the pqueue in that
+		 * case anymore as we know that the subiter will return entries
+		 * in the correct order already.
+		 *
+		 * While this may sound like a very specific edge case, it may
+		 * happen more frequently than you think. Most repositories
+		 * will end up having a single large base table that contains
+		 * most of the refs. It's thus likely that we exhaust all
+		 * subiters but the one from that base ref.
+		 */
+		if (empty)
+			return iterator_next(&mi->subiters[mi->advance_index].iter,
+					     rec);
+
 		err = merged_iter_advance_subiter(mi, mi->advance_index);
 		if (err < 0)
 			return err;
+		if (!err)
+			empty = 0;
 		mi->advance_index = -1;
 	}
 
-	if (merged_iter_pqueue_is_empty(mi->pq))
+	if (empty)
 		return 1;
 
 	entry = merged_iter_pqueue_remove(&mi->pq);