| Commit message (Collapse) | Author | Age | Files | Lines |
|\
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| | |
Building with "-Werror -Wwrite-strings" is now supported.
* ps/no-writable-strings: (27 commits)
config.mak.dev: enable `-Wwrite-strings` warning
builtin/merge: always store allocated strings in `pull_twohead`
builtin/rebase: always store allocated string in `options.strategy`
builtin/rebase: do not assign default backend to non-constant field
imap-send: fix leaking memory in `imap_server_conf`
imap-send: drop global `imap_server_conf` variable
mailmap: always store allocated strings in mailmap blob
revision: always store allocated strings in output encoding
remote-curl: avoid assigning string constant to non-const variable
send-pack: always allocate receive status
parse-options: cast long name for OPTION_ALIAS
http: do not assign string constant to non-const field
compat/win32: fix const-correctness with string constants
pretty: add casts for decoration option pointers
object-file: make `buf` parameter of `index_mem()` a constant
object-file: mark cached object buffers as const
ident: add casts for fallback name and GECOS
entry: refactor how we remove items for delayed checkouts
line-log: always allocate the output prefix
line-log: stop assigning string constant to file parent buffer
...
|
| |
| |
| |
| |
| |
| |
| |
| |
| | |
We're about to enable `-Wwrite-strings`, which changes the type of
string constants to `const char[]`. Fix various sites where we assign
such constants to non-const variables.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|\ \
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | | |
A new command has been added to migrate a repository that uses the
files backend for its ref storage to use the reftable backend, with
limitations.
* ps/ref-storage-migration:
builtin/refs: new command to migrate ref storage formats
refs: implement logic to migrate between ref storage formats
refs: implement removal of ref storages
worktree: don't store main worktree twice
reftable: inline `merged_table_release()`
refs/files: fix NULL pointer deref when releasing ref store
refs/files: extract function to iterate through root refs
refs/files: refactor `add_pseudoref_and_head_entries()`
refs: allow to skip creation of reflog entries
refs: pass storage format to `ref_store_init()` explicitly
refs: convert ref storage format to an enum
setup: unset ref storage when reinitializing repository version
|
| |/
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| | |
The function `merged_table_release()` releases a merged table, whereas
`reftable_merged_table_free()` releases a merged table and then also
free's its pointer. But all callsites of `merged_table_release()` are in
fact followed by `reftable_merged_table_free()`, which is redundant.
Inline `merged_table_release()` into `reftable_merged_table_free()` to
get rid of this redundance.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|\ \
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | | |
Code clean-up to make the reftable iterator closer to be reusable.
* ps/reftable-reusable-iterator:
reftable/merged: adapt interface to allow reuse of iterators
reftable/stack: provide convenience functions to create iterators
reftable/reader: adapt interface to allow reuse of iterators
reftable/generic: adapt interface to allow reuse of iterators
reftable/generic: move seeking of records into the iterator
reftable/merged: simplify indices for subiterators
reftable/merged: split up initialization and seeking of records
reftable/reader: set up the reader when initializing table iterator
reftable/reader: inline `reader_seek_internal()`
reftable/reader: separate concerns of table iter and reftable reader
reftable/reader: unify indexed and linear seeking
reftable/reader: avoid copying index iterator
reftable/block: use `size_t` to track restart point index
|
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | | |
Refactor the interfaces exposed by `struct reftable_merged_table` and
`struct merged_iter` such that they support iterator reuse. This is done
by separating initialization of the iterator and seeking on it.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
| |/
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| | |
There exist a bunch of call sites in the reftable backend that want to
create iterators for a reftable stack. This is rather convoluted right
now, where you always have to go via the merged table. And it is about
to become even more convoluted when we split up iterator initialization
and seeking in the next commit.
Introduce convenience functions that allow the caller to create an
iterator from a reftable stack directly without going through the merged
table. Adapt callers accordingly.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| | |
When auto-compacting, the reftable library packs references such that
the sizes of the tables form a geometric sequence. The factor for this
geometric sequence is hardcoded to 2 right now. We're about to expose
this as a config option though, so let's expose the factor via write
options.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| | |
We sometimes pass the refatble write options as value and sometimes as a
pointer. This is quite confusing and makes the reader wonder whether the
options get modified sometimes.
In fact, `reftable_new_writer()` does cause the caller-provided options
to get updated when some values aren't set up. This is quite unexpected,
but didn't cause any harm until now.
Adapt the code so that we do not modify the caller-provided values
anymore. While at it, refactor the code to code to consistently pass the
options as a constant pointer to clarify that the caller-provided opts
will not ever get modified.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|/
|
|
|
|
|
|
|
|
|
|
| |
Throughout the reftable library the `reftable_write_options` are
sometimes referred to as `cfg` and sometimes as `opts`. Unify these to
consistently use `opts` to avoid confusion.
While at it, touch up the coding style a bit by removing unneeded braces
around one-line statements and newlines between variable declarations.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|\
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| | |
Code to write out reftable has seen some optimization and
simplification.
* ps/reftable-write-optim:
reftable/block: reuse compressed array
reftable/block: reuse zstream when writing log blocks
reftable/writer: reset `last_key` instead of releasing it
reftable/writer: unify releasing memory
reftable/writer: refactorings for `writer_flush_nonempty_block()`
reftable/writer: refactorings for `writer_add_record()`
refs/reftable: don't recompute committer ident
reftable: remove name checks
refs/reftable: skip duplicate name checks
refs/reftable: perform explicit D/F check when writing symrefs
refs/reftable: fix D/F conflict error message on ref copy
|
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| | |
In the preceding commit we have disabled name checks in the "reftable"
backend. These checks were responsible for verifying multiple things
when writing records to the reftable stack:
- Detecting file/directory conflicts. Starting with the preceding
commits this is now handled by the reftable backend itself via
`refs_verify_refname_available()`.
- Validating refnames. This is handled by `check_refname_format()` in
the generic ref transacton layer.
The code in the reftable library is thus not used anymore and likely to
bitrot over time. Remove it.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| | |
To reduce the number of on-disk reftables, compaction is performed.
Contiguous tables with the same binary log value of size are grouped
into segments. The segment that has both the lowest binary log value and
contains more than one table is set as the starting point when
identifying the compaction segment.
Since segments containing a single table are not initially considered
for compaction, if the table appended to the list does not match the
previous table log value, no compaction occurs for the new table. It is
therefore possible for unbounded growth of the table list. This can be
demonstrated by repeating the following sequence:
git branch -f foo
git branch -d foo
Each operation results in a new table being written with no compaction
occurring until a separate operation produces a table matching the
previous table log value.
Instead, to avoid unbounded growth of the table list, the compaction
strategy is updated to ensure tables follow a geometric sequence after
each operation by individually evaluating each table in reverse index
order. This strategy results in a much simpler and more robust algorithm
compared to the previous one while also maintaining a minimal ordered
set of tables on-disk.
When creating 10 thousand references, the new strategy has no
performance impact:
Benchmark 1: update-ref: create refs sequentially (revision = HEAD~)
Time (mean ± σ): 26.516 s ± 0.047 s [User: 17.864 s, System: 8.491 s]
Range (min … max): 26.447 s … 26.569 s 10 runs
Benchmark 2: update-ref: create refs sequentially (revision = HEAD)
Time (mean ± σ): 26.417 s ± 0.028 s [User: 17.738 s, System: 8.500 s]
Range (min … max): 26.366 s … 26.444 s 10 runs
Summary
update-ref: create refs sequentially (revision = HEAD) ran
1.00 ± 0.00 times faster than update-ref: create refs sequentially (revision = HEAD~)
Some tests in `t0610-reftable-basics.sh` assert the on-disk state of
tables and are therefore updated to specify the correct new table count.
Since compaction is more aggressive in ensuring tables maintain a
geometric sequence, the expected table count is reduced in these tests.
In `reftable/stack_test.c` tests related to `sizes_to_segments()` are
removed because the function is no longer needed. Also, the
`test_suggest_compaction_segment()` test is updated to better showcase
and reflect the new geometric compaction behavior.
Signed-off-by: Justin Tobler <jltobler@gmail.com>
Acked-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| | |
The reftable stack already has a variable to configure whether or not to
run auto-compaction, but it is inaccessible to users of the library.
There exist use cases where a caller may want to have more control over
auto-compaction.
Move the `disable_auto_compact` option into `reftable_write_options` to
allow external callers to disable auto-compaction. This will be used in
a subsequent commit.
Signed-off-by: Justin Tobler <jltobler@gmail.com>
Acked-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| | |
Whenever we commit a new table to the reftable stack we will end up
invoking auto-compaction of the stack to keep the total number of tables
at bay. This auto-compaction may fail though in case at least one of the
tables which we are about to compact is locked. This is indicated by the
compaction function returning `REFTABLE_LOCK_ERROR`. We do not handle
this case though, and thus bubble that return value up the calling
chain, which will ultimately cause a failure.
Fix this bug by ignoring `REFTABLE_LOCK_ERROR`.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| | |
Compaction of a reftable stack may fail gracefully when there is a
concurrent process that writes to the reftable stack and which has thus
locked either the "tables.list" file or one of the tables. This is
expected and can be handled gracefully by some of the callers which
invoke compaction. Thus, to indicate this situation to our callers, we
return a positive return code from `stack_compact_range()` and bubble it
up to the caller.
This kind of error handling is somewhat awkward though as many callers
in the call chain never even think of handling positive return values.
Thus, the result is either that such errors are swallowed by accident,
or that we abort operations with an unhelpful error message.
Make the code more robust by always using negative error codes when
compaction fails, with `REFTABLE_LOCK_ERROR` for the described benign
error case.
Note that only a single callsite knew to handle positive error codes
gracefully in the first place. Subsequent commits will touch up some of
the other sites to handle those errors better.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| | |
We currently throw two different errors into a similar-but-different
error code:
- Errors when trying to lock the reftable stack.
- Errors when trying to write to the reftable stack which has been
modified concurrently.
This results in unclear error handling and user-visible error messages.
Create a new `REFTABLE_OUTDATED_ERROR` so that those error conditions
can be clearly told apart from each other. Adjust users of the old
`REFTABLE_LOCK_ERROR` to use the new error code as required.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|/
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
In `reftable_stack_init_addition()` we call `stack_uptodate()` after
having created the lockfile to check whether the stack was modified
concurrently, which is indicated by a positive return code from the
latter function. If so, we return a `REFTABLE_LOCK_ERROR` to the caller
and abort the addition.
The error handling has an off-by-one though because we check whether the
error code is `> 1` instead of `> 0`. Thus, instead of returning the
locking error, we would return a positive value. One of the callers of
`reftable_stack_init_addition()` works around this bug by repeating the
error code check without the off-by-one. But other callers are subtly
broken by this bug.
Fix this by checking for `err > 0` instead. This has the consequence
that `reftable_stack_init_addition()` won't ever return a positive error
code anymore, but will instead return `REFTABLE_LOCK_ERROR` now. Thus,
we can drop the check for a positive error code in `stack_try_add()`
now.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
|
|
|
|
|
|
|
|
|
| |
We do not register tables resulting from stack compaction with the
tempfile API. Those tables will thus not be deleted in case Git gets
killed.
Refactor the code to register compacted tables as tempfiles.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
|
|
|
|
|
|
|
|
|
| |
We do not register any of the locks we acquire when compacting the
reftable stack via our lockfiles interfaces. These locks will thus not
be released when Git gets killed.
Refactor the code to register locks as lockfiles.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
|
|
|
|
|
|
|
|
|
| |
We do not register new tables which we're about to add to the stack with
the tempfile API. Those tables will thus not be deleted in case Git gets
killed.
Refactor the code to register tables as tempfiles.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|\
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| | |
Code clean-up in various reftable code paths.
* ps/reftable-styles:
reftable/record: improve semantics when initializing records
reftable/merged: refactor initialization of iterators
reftable/merged: refactor seeking of records
reftable/stack: use `size_t` to track stack length
reftable/stack: use `size_t` to track stack slices during compaction
reftable/stack: index segments with `size_t`
reftable/stack: fix parameter validation when compacting range
reftable: introduce macros to allocate arrays
reftable: introduce macros to grow arrays
|
| |
| |
| |
| |
| |
| |
| |
| |
| | |
While the stack length is already stored as `size_t`, we frequently use
`int`s to refer to those stacks throughout the reftable library. Convert
those cases to use `size_t` instead to make things consistent.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
| |
| |
| |
| |
| |
| |
| |
| |
| | |
We use `int`s to track reftable slices when compacting the reftable
stack, which is considered to be a code smell in the Git project.
Convert the code to use `size_t` instead.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
| |
| |
| |
| |
| |
| |
| |
| |
| | |
We use `int`s to index into arrays of segments and track the length of
them, which is considered to be a code smell in the Git project. Convert
the code to use `size_t` instead.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| | |
The `stack_compact_range()` function receives a "first" and "last" index
that indicates which tables of the reftable stack should be compacted.
Naturally, "first" must be smaller than "last" in order to identify a
proper range of tables to compress, which we indeed also assert in the
function. But the validations happens after we have already allocated
arrays with a size of `last - first + 1`, leading to an underflow and
thus an invalid allocation size.
Fix this by reordering the array allocations to happen after we have
validated parameters. While at it, convert the array allocations to use
the newly introduced macros.
Note that the relevant variables pointing into arrays should also be
converted to use `size_t` instead of `int`. This is left for a later
commit in this series.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| | |
Similar to the preceding commit, let's carry over macros to allocate
arrays with `REFTABLE_ALLOC_ARRAY()` and `REFTABLE_CALLOC_ARRAY()`. This
requires us to change the signature of `reftable_calloc()`, which only
takes a single argument right now and thus puts the burden on the caller
to calculate the final array's size. This is a net improvement though as
it means that we can now provide proper overflow checks when multiplying
the array size with the member size.
Convert callsites of `reftable_calloc()` to the new signature and start
using the new macros where possible.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| | |
Throughout the reftable library we have many cases where we need to grow
arrays. In order to avoid too many reallocations, we roughly double the
capacity of the array on each iteration. The resulting code pattern is
duplicated across many sites.
We have similar patterns in our main codebase, which is why we have
eventually introduced an `ALLOC_GROW()` macro to abstract it away and
avoid some code duplication. We cannot easily reuse this macro here
though because `ALLOC_GROW()` uses `REALLOC_ARRAY()`, which in turn will
call realloc(3P) to grow the array. The reftable code is structured as a
library though (even if the boundaries are fuzzy), and one property this
brings with it is that it is possible to plug in your own allocators. So
instead of using realloc(3P), we need to use `reftable_realloc()` that
knows to use the user-provided implementation.
So let's introduce two new macros `REFTABLE_REALLOC_ARRAY()` and
`REFTABLE_ALLOC_GROW()` that mirror what we do in our main codebase,
with two modifications:
- They use `reftable_realloc()`, as explained above.
- They use a different growth factor of `2 * cap + 1` instead of `(cap
+ 16) * 3 / 2`.
The second change is because we know a bit more about the allocation
patterns in the reftable library. In most cases, we end up only having a
handful of items in the array and don't end up growing them. The initial
capacity that our normal growth factor uses (which is 24) would thus end
up over-allocating in a lot of code paths. This effect is measurable:
- Before change:
HEAP SUMMARY:
in use at exit: 671,983 bytes in 152 blocks
total heap usage: 3,843,446 allocs, 3,843,294 frees, 223,761,402 bytes allocated
- After change with a growth factor of `(2 * alloc + 1)`:
HEAP SUMMARY:
in use at exit: 671,983 bytes in 152 blocks
total heap usage: 3,843,446 allocs, 3,843,294 frees, 223,761,410 bytes allocated
- After change with a growth factor of `(alloc + 16)* 2 / 3`:
HEAP SUMMARY:
in use at exit: 671,983 bytes in 152 blocks
total heap usage: 3,833,673 allocs, 3,833,521 frees, 4,728,251,742 bytes allocated
While the total heap usage is roughly the same, we do end up allocating
significantly more bytes with our usual growth factor (in fact, roughly
21 times as many).
Convert the reftable library to use these new macros.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|\ \
| | |
| | |
| | |
| | |
| | |
| | | |
Reftable bugfix.
* ps/reftable-compacted-tables-permission-fix:
reftable/stack: adjust permissions of compacted tables
|
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | | |
When creating a new compacted table from a range of preexisting ones we
don't set the default permissions on the resulting table when specified
by the user. This has the effect that the "core.sharedRepository" config
will not be honored correctly.
Fix this bug and add a test to catch this issue. Note that we only test
on non-Windows platforms because Windows does not use POSIX permissions
natively.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|\ \ \
| |_|/
|/| |
| | |
| | |
| | |
| | |
| | |
| | | |
The write codepath for the reftable data learned to honor
core.fsync configuration.
* jc/reftable-core-fsync:
reftable/stack: fsync "tables.list" during compaction
reftable: honor core.fsync
|
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | | |
In 1df18a1c9a (reftable: honor core.fsync, 2024-01-23), we have added
code to fsync both newly written reftables as well as "tables.list" to
disk. But there are two code paths where "tables.list" is being written:
- When appending a new table due to a normal ref update.
- When compacting a range of tables during compaction.
We have only addressed the former code path, but do not yet sync the new
"tables.list" file in the latter. Fix this omission.
Note that we are not yet adding any tests. These tests will be added
once the "reftable" backend has been upstreamed.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
| |/
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| | |
While the reffiles backend honors configured fsync settings, the
reftable backend does not. Address this by fsyncing reftable files using
the write-or-die api's fsync_component() in two places: when we
add additional entries into the table, and when we close the reftable
writer.
This commits adds a flush function pointer as a new member of
reftable_writer because we are not sure that the first argument to the
*write function pointer always contains a file descriptor. In the case of
strbuf_add_void, the first argument is a buffer. This way, we can pass
in a corresponding flush function that knows how to flush depending on
which writer is being used.
This patch does not contain tests as they will need to wait for another
patch to start to exercise the reftable backend. At that point, the
tests will be added to observe that fsyncs are happening when the
reftable is in use.
Signed-off-by: John Cai <johncai86@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|\ \
| |/
|/|
| |
| |
| |
| |
| |
| |
| |
| |
| |
| | |
Low-level I/O optimization for reftable.
* ps/reftable-optimize-io:
reftable/stack: fix race in up-to-date check
reftable/stack: unconditionally reload stack after commit
reftable/blocksource: use mmap to read tables
reftable/blocksource: refactor code to match our coding style
reftable/stack: use stat info to avoid re-reading stack list
reftable/stack: refactor reloading to use file descriptor
reftable/stack: refactor stack reloading to have common exit path
|
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| | |
In 6fdfaf15a0 (reftable/stack: use stat info to avoid re-reading stack
list, 2024-01-11) we have introduced a new mechanism to avoid re-reading
the table list in case stat(3P) figures out that the stack didn't change
since the last time we read it.
While this change significantly improved performance when writing many
refs, it can unfortunately lead to false negatives in very specific
scenarios. Given two processes A and B, there is a feasible sequence of
events that cause us to accidentally treat the table list as up-to-date
even though it changed:
1. A reads the reftable stack and caches its stat info.
2. B updates the stack, appending a new table to "tables.list". This
will both use a new inode and result in a different file size, thus
invalidating A's cache in theory.
3. B decides to auto-compact the stack and merges two tables. The file
size now matches what A has cached again. Furthermore, the
filesystem may decide to recycle the inode number of the file we
have replaced in (2) because it is not in use anymore.
4. A reloads the reftable stack. Neither the inode number nor the
file size changed. If the timestamps did not change either then we
think the cached copy of our stack is up-to-date.
In fact, the commit introduced three related issues:
- Non-POSIX compliant systems may not report proper `st_dev` and
`st_ino` values in stat(3P), which made us rely solely on the
file's potentially coarse-grained mtime and ctime.
- `stat_validity_check()` and friends may end up not comparing
`st_dev` and `st_ino` depending on the "core.checkstat" config,
again reducing the signal to the mtime and ctime.
- `st_ino` can be recycled, rendering the check moot even on
POSIX-compliant systems.
Given that POSIX defines that "The st_ino and st_dev fields taken
together uniquely identify the file within the system", these issues led
to the most important signal to establish file identity to be ignored or
become useless in some cases.
Refactor the code to stop using `stat_validity_check()`. Instead, we
manually stat(3P) the file descriptors to make relevant information
available. On Windows and MSYS2 the result will have both `st_dev` and
`st_ino` set to 0, which allows us to address the first issue by not
using the stat-based cache in that case. It also allows us to make sure
that we always compare `st_dev` and `st_ino`, addressing the second
issue.
The third issue of inode recycling can be addressed by keeping the file
descriptor of "files.list" open during the lifetime of the reftable
stack. As the file will still exist on disk even though it has been
unlinked it is impossible for its inode to be recycled as long as the
file descriptor is still open.
This should address the race in a POSIX-compliant way. The only real
downside is that this mechanism cannot be used on non-POSIX-compliant
systems like Windows. But we at least have the second-level caching
mechanism in place that compares contents of "files.list" with the
currently loaded list of tables.
This new mechanism performs roughly the same as the previous one that
relied on `stat_validity_check()`:
Benchmark 1: update-ref: create many refs (HEAD~)
Time (mean ± σ): 4.754 s ± 0.026 s [User: 2.204 s, System: 2.549 s]
Range (min … max): 4.694 s … 4.802 s 20 runs
Benchmark 2: update-ref: create many refs (HEAD)
Time (mean ± σ): 4.721 s ± 0.020 s [User: 2.194 s, System: 2.527 s]
Range (min … max): 4.691 s … 4.753 s 20 runs
Summary
update-ref: create many refs (HEAD~) ran
1.01 ± 0.01 times faster than update-ref: create many refs (HEAD)
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| | |
After we have committed an addition to the reftable stack we call
`reftable_stack_reload()` to reload the stack and thus reflect the
changes that were just added. This function will only conditionally
reload the stack in case `stack_uptodate()` tells us that the stack
needs reloading. This check is wasteful though because we already know
that the stack needs reloading.
Call `reftable_stack_reload_maybe_reuse()` instead, which will
unconditionally reload the stack. This is merely a conceptual fix, the
code in question was not found to cause any problems in practice.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| | |
Whenever we call into the refs interfaces we potentially have to reload
refs in case they have been concurrently modified, either in-process or
externally. While this happens somewhat automatically for loose refs
because we simply try to re-read the files, the "packed" backend will
reload its snapshot of the packed-refs file in case its stat info has
changed since last reading it.
In the reftable backend we have a similar mechanism that is provided by
`reftable_stack_reload()`. This function will read the list of stacks
from "tables.list" and, if they have changed from the currently stored
list, reload the stacks. This is heavily inefficient though, as we have
to check whether the stack is up-to-date on basically every read and
thus keep on re-reading the file all the time even if it didn't change
at all.
We can do better and use the same stat(3P)-based mechanism that the
"packed" backend uses. Instead of reading the file, we will only open
the file descriptor, fstat(3P) it, and then compare the info against the
cached value from the last time we have updated the stack. This should
always work alright because "tables.list" is updated atomically via a
rename, so even if the ctime or mtime wasn't granular enough to identify
a change, at least the inode number or file size should have changed.
This change significantly speeds up operations where many refs are read,
like when using git-update-ref(1). The following benchmark creates N
refs in an otherwise-empty repository via `git update-ref --stdin`:
Benchmark 1: update-ref: create many refs (refcount = 1, revision = HEAD~)
Time (mean ± σ): 5.1 ms ± 0.2 ms [User: 2.4 ms, System: 2.6 ms]
Range (min … max): 4.8 ms … 7.2 ms 109 runs
Benchmark 2: update-ref: create many refs (refcount = 100, revision = HEAD~)
Time (mean ± σ): 19.1 ms ± 0.9 ms [User: 8.9 ms, System: 9.9 ms]
Range (min … max): 18.4 ms … 26.7 ms 72 runs
Benchmark 3: update-ref: create many refs (refcount = 10000, revision = HEAD~)
Time (mean ± σ): 1.336 s ± 0.018 s [User: 0.590 s, System: 0.724 s]
Range (min … max): 1.314 s … 1.373 s 10 runs
Benchmark 4: update-ref: create many refs (refcount = 1, revision = HEAD)
Time (mean ± σ): 5.1 ms ± 0.2 ms [User: 2.4 ms, System: 2.6 ms]
Range (min … max): 4.8 ms … 7.2 ms 109 runs
Benchmark 5: update-ref: create many refs (refcount = 100, revision = HEAD)
Time (mean ± σ): 14.8 ms ± 0.2 ms [User: 7.1 ms, System: 7.5 ms]
Range (min … max): 14.2 ms … 15.2 ms 82 runs
Benchmark 6: update-ref: create many refs (refcount = 10000, revision = HEAD)
Time (mean ± σ): 927.6 ms ± 5.3 ms [User: 437.8 ms, System: 489.5 ms]
Range (min … max): 919.4 ms … 936.4 ms 10 runs
Summary
update-ref: create many refs (refcount = 1, revision = HEAD) ran
1.00 ± 0.07 times faster than update-ref: create many refs (refcount = 1, revision = HEAD~)
2.89 ± 0.14 times faster than update-ref: create many refs (refcount = 100, revision = HEAD)
3.74 ± 0.25 times faster than update-ref: create many refs (refcount = 100, revision = HEAD~)
181.26 ± 8.30 times faster than update-ref: create many refs (refcount = 10000, revision = HEAD)
261.01 ± 12.35 times faster than update-ref: create many refs (refcount = 10000, revision = HEAD~)
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| | |
We're about to introduce a stat(3P)-based caching mechanism to reload
the list of stacks only when it has changed. In order to avoid race
conditions this requires us to have a file descriptor available that we
can use to call fstat(3P) on.
Prepare for this by converting the code to use `fd_read_lines()` so that
we have the file descriptor readily available.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| | |
The `reftable_stack_reload_maybe_reuse()` function is responsible for
reloading the reftable list from disk. The function is quite hard to
follow though because it has a bunch of different exit paths, many of
which have to free the same set of resources.
Refactor the function to have a common exit path. While at it, touch up
the style of this function a bit to match our usual coding style better.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| | |
In 5c086453ff (reftable/stack: perform auto-compaction with
transactional interface, 2023-12-11), we fixed a bug where the
transactional interface to add changes to a reftable stack did not
perform auto-compaction by calling `reftable_stack_auto_compact()` in
`reftable_stack_addition_commit()`. While correct, this change may now
cause us to perform auto-compaction twice in the non-transactional
interface `reftable_stack_add()`:
- It performs auto-compaction by itself.
- It now transitively performs auto-compaction via the transactional
interface.
Remove the first instance so that we only end up doing auto-compaction
once.
Reported-by: Han-Wen Nienhuys <hanwenn@gmail.com>
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|/
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
In order to compact multiple stacks we iterate through the merged ref
and log records. When there is any error either when reading the records
from the old merged table or when writing the records to the new table
then we break out of the respective loops. When breaking out of the loop
for the ref records though the error code will be overwritten, which may
cause us to inadvertently skip over bad ref records. In the worst case,
this can lead to a compacted stack that is missing records.
Fix the code by using `goto done` instead so that any potential error
codes are properly returned to the caller.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
When writing a new reftable stack, Git will first create the stack with
a random suffix so that concurrent updates will not try to write to the
same file. This random suffix is computed via a call to rand(3P). But we
never seed the function via srand(3P), which means that the suffix is in
fact always the same.
Fix this bug by using `git_rand()` instead, which does not need to be
initialized. While this function is likely going to be slower depending
on the platform, this slowness should not matter in practice as we only
use it when writing a new reftable stack.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
When starting a transaction via `reftable_stack_init_addition()`, we
create a lockfile for the reftable stack itself which we'll write the
new list of tables to. But if we terminate abnormally e.g. via a call to
`die()`, then we do not remove the lockfile. Subsequent executions of
Git which try to modify references will thus fail with an out-of-date
error.
Fix this bug by registering the lock as a `struct tempfile`, which
ensures automatic cleanup for us.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
In `reftable_stack_reload_once()` we iterate over all the tables added
to the stack in order to figure out whether any of the tables needs to
be reloaded. We use a set of buffers in this context to compute the
paths of these tables, but discard those buffers on every iteration.
This is quite wasteful given that we do not need to transfer ownership
of the allocated buffer outside of the loop.
Refactor the code to instead reuse the buffers to reduce the number of
allocations we need to do. Note that we do not have to manually reset
the buffer because `stack_filename()` does this for us already.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Whenever updating references or reflog entries in the reftable stack, we
need to add a new table to the stack, thus growing the stack's length by
one. The stack can grow to become quite long rather quickly, leading to
performance issues when trying to read records. But besides performance
issues, this can also lead to exhaustion of file descriptors very
rapidly as every single table requires a separate descriptor when
opening the stack.
While git-pack-refs(1) fixes this issue for us by merging the tables, it
runs too irregularly to keep the length of the stack within reasonable
limits. This is why the reftable stack has an auto-compaction mechanism:
`reftable_stack_add()` will call `reftable_stack_auto_compact()` after
its added the new table, which will auto-compact the stack as required.
But while this logic works alright for `reftable_stack_add()`, we do not
do the same in `reftable_addition_commit()`, which is the transactional
equivalent to the former function that allows us to write multiple
updates to the stack atomically. Consequentially, we will easily run
into file descriptor exhaustion in code paths that use many separate
transactions like e.g. non-atomic fetches.
Fix this issue by calling `reftable_stack_auto_compact()`.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
|
|
|
|
|
|
| |
There are calls to write(3P) where we don't properly handle interrupts.
Convert them to use `write_in_full()`.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
|
|
|
|
|
|
|
| |
There are calls to pread(3P) and read(3P) where we don't properly handle
interrupts. Convert them to use `pread_in_full()` and `read_in_full()`,
respectively.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|\
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| | |
Problems identified by Coverity in the reftable code have been
corrected.
* hn/reftable-coverity-fixes:
reftable: add print functions to the record types
reftable: make reftable_record a tagged union
reftable: remove outdated file reftable.c
reftable: implement record equality generically
reftable: make reftable-record.h function signatures const correct
reftable: handle null refnames in reftable_ref_record_equal
reftable: drop stray printf in readwrite_test
reftable: order unittests by complexity
reftable: all xxx_free() functions accept NULL arguments
reftable: fix resource warning
reftable: ignore remove() return value in stack_test.c
reftable: check reftable_stack_auto_compact() return value
reftable: fix resource leak blocksource.c
reftable: fix resource leak in block.c error path
reftable: fix OOB stack write in print functions
|
| |
| |
| |
| |
| |
| |
| |
| | |
This would trigger in the unlikely event that we are compacting, and
the next available file handle is 0.
Signed-off-by: Han-Wen Nienhuys <hanwen@google.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|/
|
|
|
|
|
|
| |
Create files with mode 0666, so umask works as intended. Provides an override,
which is useful to support shared repos (test t1301-shared-repo.sh).
Signed-off-by: Han-Wen Nienhuys <hanwen@google.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|