summaryrefslogtreecommitdiffstats
path: root/bloom.c
diff options
context:
space:
mode:
authorTaylor Blau <me@ttaylorr.com>2022-07-13 01:10:33 +0200
committerJunio C Hamano <gitster@pobox.com>2022-07-16 01:51:39 +0200
commit9550f6c16a8be18bd4868909d4d5e29d05bd9733 (patch)
tree6e85f0857a23e6c9dc8061fb3e22781a817470f3 /bloom.c
parentcommit-graph: introduce `repo_find_commit_pos_in_graph()` (diff)
downloadgit-9550f6c16a8be18bd4868909d4d5e29d05bd9733.tar.xz
git-9550f6c16a8be18bd4868909d4d5e29d05bd9733.zip
commit-graph: fix corrupt upgrade from generation v1 to v2
The previous commit demonstrates a bug where a commit-graph using generation v2 could enter a state where one of the GDA2 values has its most-significant bit set (indicating that its value should be read from the extended offset table in the GDO2 chunk) without having a GDO2 chunk to read from. This results in the following error message being displayed to the caller: fatal: commit-graph requires overflow generation data but has none This bug arises in the following scenario: - We decide to write a commit-graph using generation number v2, and decide (correctly) that no GDO2 chunk is necessary (e.g., because all of the commiter date offsets are no larger than 2^31-1). - The v2 generation numbers are stored in the `->generation` member of the commit slab holding `struct commit_graph_data`'s. - Later on, `load_commit_graph_info()` is called, overwriting the v2 generation data in the aforementioned slab with any existing v1 generation data. Then, when the commit-graph code goes to write the GDA2 chunk via `write_graph_chunk_generation_data()`, we use the overwritten generation v1 data in a place where we expect to use a v2 generation number: offset = commit_graph_data_at(c)->generation - c->date; ...because `commit_graph_data_at(c)->generation` used to hold the v2 generation data, but it was overwritten to contain the v1 generation number via `load_commit_graph_info()`. If the `offset` computation above overflows the v2 generation number max, then `write_graph_chunk_generation_data()` will update its count of large offsets and write the marker accordingly: if (offset > GENERATION_NUMBER_V2_OFFSET_MAX) { offset = CORRECTED_COMMIT_DATE_OFFSET_OVERFLOW | num_generation_data_overflows; num_generation_data_overflows++; } and reads will look for the GDO2 chunk containing the overflowing v2 generation number, *after* the commit-graph code decided that no such chunk was necessary. The main problem is that the slab containing `struct commit_graph_data` has a dual purpose. It is used to hold data that we are about to write to disk while generating a commit-graph, as well as hold data that was read from an existing commit-graph. When the two mix, namely when the result of reading the commit-graph has a side-effect that mixes poorly with an in-progress commit-graph write, we end up with corrupt data. A complete fix might be to introduce a new slab that is used exclusively for writing, and gate access between the two slabs based on context provided by the caller (e.g., whether this computation is part of a "read" or "write" operation). But a more minimal fix addresses the only known path which overwrites the slab data, which is `compute_bloom_filters()` -> `get_or_compute_bloom_filter()` -> `load_commit_graph_info()` -> `fill_commit_graph_info()` by avoiding the last call which clobbers the data altogether. This path only needs to learn the graph position of a given commit so that it can be used in `load_bloom_filter_from_graph()`. By replacing the last steps of the above with one that records the graph position into a temporary variable which is then used to load the existing Bloom data, we eliminate the clobbering, removing the corruption. Signed-off-by: Taylor Blau <me@ttaylorr.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
Diffstat (limited to 'bloom.c')
-rw-r--r--bloom.c10
1 files changed, 5 insertions, 5 deletions
diff --git a/bloom.c b/bloom.c
index 5e297038bb..816f063dca 100644
--- a/bloom.c
+++ b/bloom.c
@@ -30,10 +30,9 @@ static inline unsigned char get_bitmask(uint32_t pos)
static int load_bloom_filter_from_graph(struct commit_graph *g,
struct bloom_filter *filter,
- struct commit *c)
+ uint32_t graph_pos)
{
uint32_t lex_pos, start_index, end_index;
- uint32_t graph_pos = commit_graph_position(c);
while (graph_pos < g->num_commits_in_base)
g = g->base_graph;
@@ -203,9 +202,10 @@ struct bloom_filter *get_or_compute_bloom_filter(struct repository *r,
filter = bloom_filter_slab_at(&bloom_filters, c);
if (!filter->data) {
- load_commit_graph_info(r, c);
- if (commit_graph_position(c) != COMMIT_NOT_FROM_GRAPH)
- load_bloom_filter_from_graph(r->objects->commit_graph, filter, c);
+ uint32_t graph_pos;
+ if (repo_find_commit_pos_in_graph(r, c, &graph_pos))
+ load_bloom_filter_from_graph(r->objects->commit_graph,
+ filter, graph_pos);
}
if (filter->data && filter->len)