git - git

	Commit message (Collapse)	Author	Files	Lines
2022-10-28	The eighth batch	Junio C Hamano	1	-15/+15
	Signed-off-by: Junio C Hamano <gitster@pobox.com>
2022-10-28	The seventh batch	Junio C Hamano	1	-0/+19
	Signed-off-by: Junio C Hamano <gitster@pobox.com>
2022-10-28	Downmerge a bit more for 2.38.2	Junio C Hamano	1	-0/+13
	Signed-off-by: Junio C Hamano <gitster@pobox.com>
2022-10-28	Makefile: force -O0 when compiling with SANITIZE=leak	Jeff King	1	-0/+1
	Cherry pick commit d3775de0 (Makefile: force -O0 when compiling with SANITIZE=leak, 2022-10-18), as otherwise the leak checker at GitHub Actions CI seems to fail with a false positive. Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2022-10-26	branch: error code with --edit-description	Rubén Justo	2	-2/+2
	Since c2d17ba3db0d (branch --edit-description: protect against mistyped branch name, 2012-02-05) we return -1 on error editing the branch description. Let's change to 1, which follows the established convention and it is better for portability reasons. Signed-off-by: Rubén Justo <rjusto@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2022-10-26	branch: error copying or renaming a detached HEAD	Rubén Justo	2	-20/+19
	In c847f53712 (Detached HEAD (experimental), 2007-01-01) an error condition was introduced in rename_branch() to prevent renaming, later also copying, a detached HEAD. The condition used was checking for NULL in oldname, the source branch to rename/copy. That condition cannot be satisfied because if no source branch is specified, HEAD is going to be used in the call. The error issued instead is: fatal: Invalid branch name: 'HEAD' Let's remove the condition in copy_or_rename_branch() (the current function name) and check for HEAD before calling it, dying with the original intended error if we're in a detached HEAD. Signed-off-by: Rubén Justo <rjusto@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2022-10-26	negotiator/skipping: avoid stack overflow	Jonathan Tan	1	-12/+17
	mark_common() in negotiator/skipping.c may overflow the stack due to recursive function calls. Avoid this by instead recursing using a heap-allocated data structure. Signed-off-by: Jonathan Tan <jonathantanmy@google.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2022-10-26	The sixth batch	Junio C Hamano	1	-0/+14
	Signed-off-by: Junio C Hamano <gitster@pobox.com>
2022-10-26	Downmerge a handful of topics for 2.38.2	Junio C Hamano	2	-1/+48

2022-10-26	Documentation: add lint-fsck-msgids	Junio C Hamano	2	-0/+81
	During the initial development of the fsck-msgids.txt feature, it has become apparent that it is very much error prone to make sure the description in the documentation file are sorted and correctly match what is in the fsck.h header file. Add a quick-and-dirty Perl script and doc-lint target to sanity check that the fsck-msgids.txt is consistent with the error type list in the fsck.h header file. Signed-off-by: Junio C Hamano <gitster@pobox.com>
2022-10-26	fsck: document msg-id	John Cai	4	-0/+183
	The documentation lacks mention of specific <msg-id> that are supported. While git-help --config will display a list of these options, often developers' first instinct is to consult the git docs to find valid config values. Add a list of fsck error messages, and link to it from the git-fsck documentation. Signed-off-by: John Cai <johncai86@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2022-10-26	fsck: remove the unused MISSING_TREE_OBJECT	Junio C Hamano	1	-1/+0
	This error type has never been used since it was introduced at 159e7b08 (fsck: detect gitmodules files, 2018-05-02). Signed-off-by: Junio C Hamano <gitster@pobox.com>
2022-10-26	fsck: remove the unused BAD_TAG_OBJECT	John Cai	1	-1/+0
	2175a0c6 (fsck: stop checking tag->tagged, 2019-10-18) stopped checking the tagged object referred to by a tag object, which is what the error message BAD_TAG_OBJECT was for. Since then the BAD_TAG_OBJECT message is no longer used anywhere. Remove the BAD_TAG_OBJECT msg-id. Signed-off-by: John Cai <johncai86@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2022-10-26	apply: reject patches larger than ~1 GiB	Taylor Blau	2	-1/+34
	The apply code is not prepared to handle extremely large files. It uses "int" in some places, and "unsigned long" in others. This combination leads to unfortunate problems when switching between the two types. Using "int" prevents us from handling large files, since large offsets will wrap around and spill into small negative values, which can result in wrong behavior (like accessing the patch buffer with a negative offset). Converting from "unsigned long" to "int" also has truncation problems even on LLP64 platforms where "long" is the same size as "int", since the former is unsigned but the latter is not. To avoid potential overflow and truncation issues in `git apply`, apply similar treatment as in dcd1742e56 (xdiff: reject files larger than ~1GB, 2015-09-24), where the xdiff code was taught to reject large files for similar reasons. The maximum size was chosen somewhat arbitrarily, but picking a value just shy of a gigabyte allows us to double it without overflowing 2^31-1 (after which point our value would wrap around to a negative number). To give ourselves a bit of extra margin, the maximum patch size is a MiB smaller than a full GiB, which gives us some slop in case we allocate "(records + 1) * sizeof(int)" or similar. Luckily, the security implications of these conversion issues are relatively uninteresting, because a victim needs to be convinced to apply a malicious patch. Reported-by: 정재우 <thebound7@gmail.com> Suggested-by: Johannes Schindelin <Johannes.Schindelin@gmx.de> Signed-off-by: Taylor Blau <me@ttaylorr.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2022-10-25	embargoed releases: also describe the git-security list and the process	Julia Ramer	1	-25/+140
	With the recent turnover on the git-security list, questions came up how things are usually run. Rather than answering questions individually, extend Git's existing documentation about security vulnerabilities to describe the git-security mailing list, how things are run on that list, and what to expect throughout the process from the time a security bug is reported all the way to the time when a fix is released. Helped-by: Junio C Hamano <gitster@pobox.com> Helped-by: Taylor Blau <me@ttaylorr.com> Signed-off-by: Julia Ramer <gitprplr@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2022-10-25	builtin: patch-id: remove unused diff-tree prefix	Jerry Zhang	1	-2/+2
	The last git version that had "diff-tree" in the header text of "git diff-tree" output was v1.3.0 from 2006. The header text was changed from "diff-tree" to "commit" in 91539833 ("Log message printout cleanups"). Given how long ago this change was made, it is highly unlikely that anyone is still feeding in outputs from that git version. Remove the handling of the "diff-tree" prefix and document the source of the other prefixes so that the overall functionality is more clear. Signed-off-by: Jerry Zhang <Jerry@skydio.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2022-10-25	builtin: patch-id: add --verbatim as a command mode	Jerry Zhang	3	-39/+124
	There are situations where the user might not want the default setting where patch-id strips all whitespace. They might be working in a language where white space is syntactically important, or they might have CI testing that enforces strict whitespace linting. In these cases, a whitespace change would result in the patch fundamentally changing, and thus deserving of a different id. Add a new mode that is exclusive of --stable and --unstable called --verbatim. It also corresponds to the config patchid.verbatim = true. In this mode, the stable algorithm is used and whitespace is not stripped from the patch text. Users of --unstable mainly care about compatibility with old git versions, which unstripping the whitespace would break. Thus there isn't a usecase for the combination of --verbatim and --unstable, and we don't expose this so as to not add maintainence burden. Signed-off-by: Jerry Zhang <jerry@skydio.com> fixes https://github.com/Skydio/revup/issues/2 Signed-off-by: Junio C Hamano <gitster@pobox.com>
2022-10-25	patch-id: fix patch-id for mode changes	Jerry Zhang	2	-1/+35
	Currently patch-id as used in rebase and cherry-pick does not account for file modes if the file is modified. One consequence of this is that if you have a local patch that changes modes, but upstream has applied an outdated version of the patch that doesn't include that mode change, "git rebase" will drop your local version of the patch along with your mode changes. It also means that internal patch-id doesn't produce the same output as the builtin, which does account for mode changes due to them being part of diff output. Fix by adding mode to the patch-id if it has changed, in the same format that would be produced by diff, so that it is compatible with builtin patch-id. Signed-off-by: Jerry Zhang <Jerry@skydio.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2022-10-25	builtin: patch-id: fix patch-id with binary diffs	Jerry Zhang	2	-3/+62
	"git patch-id" currently doesn't produce correct output if the incoming diff has any binary files. Add logic to get_one_patchid to handle the different possible styles of binary diff. This attempts to keep resulting patch-ids identical to what would be produced by the counterpart logic in diff.c, that is it produces the id by hashing the a and b oids in succession. In general we handle binary diffs by first caching the object ids from the "index" line and using those if we then find an indication that the diff is binary. The input could contain patches generated with "git diff --binary". This currently breaks the parse logic and results in multiple patch-ids output for a single commit. Here we have to skip the contents of the patch itself since those do not go into the patch id. --binary implies --full-index so the object ids are always available. When the diff is generated with --full-index there is no patch content to skip over. When a diff is generated without --full-index or --binary, it will contain abbreviated object ids. This will still result in a sufficiently unique patch-id when hashed, but does not match internal patch id output. We'll call this ok for now as we already need specialized arguments to diff in order to match internal patch id (namely -U3). Signed-off-by: Jerry Zhang <Jerry@skydio.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2022-10-25	patch-id: use stable patch-id for rebases	Jerry Zhang	5	-16/+12
	Git doesn't persist patch-ids during the rebase process, so there is no need to specifically invoke the unstable variant. Use the stable logic for all internal patch-id calculations to minimize the number of code paths and improve test coverage. Signed-off-by: Jerry Zhang <jerry@skydio.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2022-10-25	patch-id: fix stable patch id for binary / header-only	Jerry Zhang	2	-39/+53
	Patch-ids for binary patches are found by hashing the object ids of the before and after objects in succession. However in the --stable case, there is a bug where hunks are not flushed for binary and header-only patch ids, which would always result in a patch-id of 0000. The --unstable case is currently correct. Reorder the logic to branch into 3 cases for populating the patch body: header-only which populates nothing, binary which populates the object ids, and normal which populates the text diff. All branches will end up flushing the hunk. Don't populate the ---a/ and +++b/ lines for binary diffs, to correspond to those lines not being present in the "git diff" text output. This is necessary because we advertise that the patch-id calculated internally and used in format-patch is the same that what the builtin "git patch-id" would produce when piped from a diff. Update the test to run on both binary and normal files. Signed-off-by: Jerry Zhang <jerry@skydio.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2022-10-24	shortlog: implement `--group=committer` in terms of `--group=<format>`	Taylor Blau	1	-11/+3
	In the same spirit as the previous commit, reimplement `--group=committer` as a special case of `--group=<format>`, too. Signed-off-by: Taylor Blau <me@ttaylorr.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2022-10-24	shortlog: implement `--group=author` in terms of `--group=<format>`	Taylor Blau	1	-9/+4
	Instead of handling SHORTLOG_GROUP_AUTHOR separately, reimplement it as a special case of the new `--group=<format>` mode, where the author mode is a shorthand for `--group='%aN <%aE>'. Note that we still need to keep the SHORTLOG_GROUP_AUTHOR enum since it has a different meaning in `read_from_stdin()`, where it is still used for a different purpose. Signed-off-by: Taylor Blau <me@ttaylorr.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2022-10-24	shortlog: extract `shortlog_finish_setup()`	Taylor Blau	3	-1/+8
	Extract a function which finishes setting up the shortlog struct for use. The caller in `make_cover_letter()` does not care about trailer sorting, so it isn't strictly necessary to add a call there in this patch. But the next patch will add additional functionality to the new `shortlog_finish_setup()` function, which the caller in `make_cover_letter()` will care about. Signed-off-by: Taylor Blau <me@ttaylorr.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2022-10-24	shortlog: support arbitrary commit format `--group`s	Taylor Blau	4	-2/+78
	In addition to generating a shortlog based on committer, author, or the identity in one or more specified trailers, it can be useful to generate a shortlog based on an arbitrary commit format. This can be used, for example, to generate a distribution of commit activity over time, like so: $ git shortlog --group='%cd' --date='format:%Y-%m' -s v2.37.0.. 117 2022-06 274 2022-07 324 2022-08 263 2022-09 7 2022-10 Arbitrary commit formats can be used. In fact, `git shortlog`'s default behavior (to count by commit authors) can be emulated as follows: $ git shortlog --group='%aN <%aE>' ... and future patches will make the default behavior (as well as `--committer`, and `--group=trailer:<trailer>`) special cases of the more flexible `--group` option. Note also that the SHORTLOG_GROUP_FORMAT enum value is used only to designate that `--group:<format>` is in use when in stdin mode to declare that the combination is invalid. Signed-off-by: Taylor Blau <me@ttaylorr.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2022-10-24	shortlog: extract `--group` fragment for translation	Taylor Blau	1	-1/+1
	The subsequent commit will add another unhandled case in `read_from_stdin()` which will want to use the same message as with `--group=trailer`. Extract the "--group=trailer" part from this message so the same translation key can be used for both cases. Signed-off-by: Taylor Blau <me@ttaylorr.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2022-10-24	shortlog: make trailer insertion a noop when appropriate	Taylor Blau	1	-3/+4
	When there are no trailers to insert, it is natural that insert_records_from_trailers() should return without having done any work. But instead we guard this call unnecessarily by first checking whether `log->groups` has the `SHORTLOG_GROUP_TRAILER` bit set. Prepare to match a similar pattern in the future where a function which inserts records of a certain type does no work when no specifiers matching that type are given. Signed-off-by: Taylor Blau <me@ttaylorr.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2022-10-24	shortlog: accept `--date`-related options	Jeff King	4	-1/+16
	Prepare for a future patch which will introduce arbitrary pretty formats via the `--group` argument. To allow additional customizability (for example, to support something like `git shortlog -s --group='%aD' --date='format:%Y-%m' ...` (which groups commits by the datestring 'YYYY-mm' according to author date), we must store off the `--date` parsed from calling `parse_revision_opt()`. Note that this also affects custom output `--format` strings in `git shortlog`. Though this is a behavior change, this is arguably fixing a long-standing bug (ie., that `--format` strings are not affected by `--date` specifiers as they should be). Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Taylor Blau <me@ttaylorr.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2022-10-24	trace2: add global counter mechanism	Jeff Hostetler	13	-7/+517
	Add global counters mechanism to Trace2. The Trace2 counters mechanism adds the ability to create a set of global counter variables and an API to increment them efficiently. Counters can optionally report per-thread usage in addition to the sum across all threads. Counter events are emitted to the Trace2 logs when a thread exits and at process exit. Counters are an alternative to `data` and `data_json` events. Counters are useful when you want to measure something across the life of the process, when you don't want per-measurement events for performance reasons, when the data does not fit conveniently within a region, or when your control flow does not easily let you write the final total. For example, you might use this to report the number of calls to unzip() or the number of de-delta steps during a checkout. Signed-off-by: Jeff Hostetler <jeffhost@microsoft.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2022-10-24	trace2: add stopwatch timers	Jeff Hostetler	15	-0/+786
	Add stopwatch timer mechanism to Trace2. Timers are an alternative to Trace2 Regions. Regions are useful for measuring the time spent in various computation phases, such as the time to read the index, time to scan for unstaged files, time to scan for untracked files, and etc. However, regions are not appropriate in all places. For example, during a checkout, it would be very inefficient to use regions to measure the total time spent inflating objects from the ODB from across the entire lifetime of the process; a per-unzip() region would flood the output and significantly slow the command; and some form of post-processing would be requried to compute the time spent in unzip(). Timers can be used to measure a series of timer intervals and emit a single summary event (at thread and/or process exit). Signed-off-by: Jeff Hostetler <jeffhost@microsoft.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2022-10-24	trace2: convert ctx.thread_name from strbuf to pointer	Jeff Hostetler	4	-10/+12
	Convert the `tr2tls_thread_ctx.thread_name` field from a `strbuf` to a "const char*" pointer. The `thread_name` field is a constant string that is constructed when the context is created. Using a (non-const) `strbuf` structure for it caused some confusion in the past because it implied that someone could rename a thread after it was created. That usage was not intended. Change it to a const pointer to make the intent more clear. Signed-off-by: Jeff Hostetler <jeffhost@microsoft.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2022-10-24	trace2: improve thread-name documentation in the thread-context	Jeff Hostetler	1	-6/+9
	Improve the documentation of the tr2tls_thread_ctx.thread_name field and its relation to the tr2tls_thread_ctx.thread_id field. Signed-off-by: Jeff Hostetler <jeffhost@microsoft.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2022-10-24	trace2: rename the thread_name argument to trace2_thread_start	Jeff Hostetler	4	-11/+12
	Rename the `thread_name` argument in `tr2tls_create_self()` and `trace2_thread_start()` to be `thread_base_name` to make it clearer that the passed argument is a component used in the construction of the actual `struct tr2tls_thread_ctx.thread_name` variable. The base name will be used along with the thread id to create a unique thread name. This commit does not change how the `thread_name` field is allocated or stored within the `tr2tls_thread_ctx` structure. Signed-off-by: Jeff Hostetler <jeffhost@microsoft.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2022-10-24	api-trace2.txt: elminate section describing the public trace2 API	Jeff Hostetler	1	-54/+7
	Eliminate the mostly obsolete `Public API` sub-section from the `Trace2 API` section in the documentation. Strengthen the referral to `trace2.h`. Most of the technical information in this sub-section was moved to `trace2.h` in 6c51cb525d (trace2: move doc to trace2.h, 2019-11-17) to be adjacent to the function prototypes. The remaining text wasn't that useful by itself. Furthermore, the text would need a bit of overhaul to add routines that do not immediately generate a message, such as stopwatch timers. So it seemed simpler to just get rid of it. Signed-off-by: Jeff Hostetler <jeffhost@microsoft.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2022-10-24	tr2tls: clarify TLS terminology	Jeff Hostetler	5	-20/+24
	Reduce or eliminate use of the term "TLS" in the Trace2 code. The term "TLS" has two popular meanings: "thread-local storage" and "transport layer security". In the Trace2 source, the term is associated with the former. There was concern on the mailing list about it refering to the latter. Update the source and documentation to eliminate the use of the "TLS" term or replace it with the phrase "thread-local storage" to reduce ambiguity. Signed-off-by: Jeff Hostetler <jeffhost@microsoft.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2022-10-24	trace2: use size_t alloc,nr_open_regions in tr2tls_thread_ctx	Jeff Hostetler	1	-2/+2
	Use "size_t" rather than "int" for the "alloc" and "nr_open_regions" fields in the "tr2tls_thread_ctx". These are used by ALLOC_GROW(). Signed-off-by: Jeff Hostetler <jeffhost@microsoft.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2022-10-23	submodule: use strvec_pushf() for --super-prefix	René Scharfe	1	-9/+3
	absorb_git_dir_into_superproject() uses a strbuf and strvec_pushl() to build and add the --super-prefix option and its argument. Use a single strvec_pushf() call to add the stuck form instead, which reduces the code size and avoids a strbuf allocation and release. The same is already done in submodule_reset_index() and submodule_move_head(). Signed-off-by: René Scharfe <l.s.r@web.de> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2022-10-23	t7700: annotate cruft-pack failure with ok=sigpipe	Jeff King	1	-1/+1
	One of our tests intentionally causes the cruft-pack generation phase of repack to fail, in order to stimulate an exit from repack at the desired moment. It does so by feeding a bogus option argument to pack-objects. This is a simple and reliable way to get pack-objects to fail, but it has one downside: pack-objects will die before reading its stdin, which means the caller repack may racily get SIGPIPE writing to it. For the purposes of this test, that's OK. We are checking whether repack cleans up already-created .tmp files, and it will do so whether it exits or dies by signal (because the tempfile API hooks both). But we have to tell test_must_fail that either outcome is OK, or it complains about the signal. Arguably this is a workaround (compared to fixing repack), as repack dying to SIGPIPE means that it loses the opportunity to give a more detailed message. But we don't actually write such a message anyway; we rely on pack-objects to have written something useful to stderr, and it does. In either case (signal or exit), that is the main thing the user will see. Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2022-10-23	merge-tree: support multiple batched merges with --stdin	Elijah Newren	3	-4/+109
	Add an option, --stdin, to merge-tree which will accept lines of input with two branches to merge per line, and which will perform all the merges and give output for each in turn. This option implies -z, and modifies the output to also include a merge status since the exit code of the program can no longer convey that information now that multiple merges are involved. This could be useful, for example, by Git hosting providers. When one branch is updated, one may want to check whether all code reviews targetting that branch can still cleanly merge. Avoiding the overhead of starting up a separate process for each of those code reviews might provide significant savings in a repository with many code reviews. Signed-off-by: Elijah Newren <newren@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2022-10-23	merge-tree: update documentation for differences in -z output	Elijah Newren	1	-7/+39
	The Informational Messages was updated in de90581141 ("merge-ort: optionally produce machine-readable output", 2022-06-18) to provide more detailed and machine parseable output when `-z` is passed, but the Documentation was not updated to reflect these changes. Update it now. Signed-off-by: Elijah Newren <newren@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2022-10-23	Git.pm: trust rev-parse to find bare repositories	Jeff King	3	-20/+32
	When initializing a repository object, we run "git rev-parse --git-dir" to let the C version of Git find the correct directory. But curiously, if this fails we don't automatically say "not a git repository". Instead, we do our own pure-perl check to see if we're in a bare repository. This makes little sense, as rev-parse will report both bare and non-bare directories. This logic comes from d5c7721d58 (Git.pm: Add support for subdirectories inside of working copies, 2006-06-24), but I don't see any reason given why we can't just rely on rev-parse. Worse, because we treat any non-error response from rev-parse as a non-bare repository, we'll erroneously set the object's WorkingCopy, even in a bare repository. But it gets worse. Since 8959555cee (setup_git_directory(): add an owner check for the top-level directory, 2022-03-02), it's actively wrong (and dangerous). The perl code doesn't implement the same ownership checks. And worse, after "finding" the bare repository, it sets GIT_DIR in the environment, which tells any subsequent Git commands that we've confirmed the directory is OK, and to trust us. I.e., it re-opens the vulnerability plugged by 8959555cee when using Git.pm's repository discovery code. We can fix this by just relying on rev-parse to tell us when we're not in a repository, which fixes the vulnerability. Furthermore, we'll ask its --is-bare-repository function to tell us if we're bare or not, and rely on that. Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2022-10-23	merge-ort: fix bug with dir rename vs change dir to symlink	Elijah Newren	2	-2/+90
	When changing a directory to a symlink on one side of history, and renaming the parent of that directory to a different directory name on the other side, e.g. with this kind of setup: Base commit: Has a file named dir/subdir/file Side1: Rename dir/ -> renamed-dir/ Side2: delete dir/subdir/file, add dir/subdir as symlink Then merge-ort was running into an assertion failure: git: merge-ort.c:2622: apply_directory_rename_modifications: Assertion `ci->dirmask == 0' failed merge-recursive did not have as obvious an issue handling this case, likely because we never fixed it to handle the case from commit 902c521a35 ("t6423: more involved directory rename test", 2020-10-15) where we need to be careful about nested renames when a directory rename occurs (dir/ -> renamed-dir/ implies dir/subdir/ -> renamed-dir/subdir/). However, merge-recursive does have multiple problems with this testcase: * Incorrect stages for the file: merge-recursive omits the stage in the index corresponding to the base stage, making `git status` report "added by us" for renamed-dir/subdir/file instead of the expected "deleted by them". * Poor directory/file conflict handling: For the renamed-dir/subdir symlink, instead of reporting a file/directory conflict as expected, it reports "Error: Refusing to lose untracked file at renamed-dir/subdir". This is a lie because there is no untracked file at that location. It then does the normal suboptimal merge-recursive thing of having the symlink be tracked in the index at a location where it can't be written due to D/F conflicts (namely, renamed-dir/subdir), but writes it to the working tree at a different location as a new untracked file (namely, renamed-dir/subdir~B^0) Technically, these problems don't prevent the user from resolving the merge if they can figure out to ignore the confusion, but because both pieces of output are quite confusing I don't want to modify the test to claim the recursive also passes it even if it doesn't have the bug that ort did. So, fix the bug in ort by splitting the conflict_info for "dir/subdir" into two, one for the directory part, one for the file (i.e. symlink) part, since the symlink is being renamed by directory rename detection. The directory part is needed for proper nesting, since there are still conflict_info fields for files underneath it (though those are marked as is_null, they are still present until the entries are processed, and the entry processing wants every non-toplevel entry to have a parent directory). Reported-by: Stefano Rivera <stefano@rivera.za.net> Signed-off-by: Elijah Newren <newren@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2022-10-22	repack: drop remove_temporary_files()	Jeff King	2	-32/+8
	After we've successfully finished the repack, we call remove_temporary_files(), which looks for and removes any files matching ".tmp-$$-pack-", where $$ is the pid of the current process. But this is pointless. If we make it this far in the process, we've already renamed these tempfiles into place, and there is nothing left to delete. Nor is there a point in trying to call it to clean up when we _aren't_ successful. It's not safe for using in a signal handler, and the previous commit already handed that job over to the tempfile API. It might seem like it would be useful to clean up stray .tmp files left by other invocations of git-repack. But it won't clean those files; it only matches ones with its pid, and leaves the rest. Fortunately, those are cleaned up naturally by successive calls to git-repack; we'll consider .tmp-.pack the same as normal packfiles, so "repack -ad", etc, will roll up their contents and eventually delete them. The one case that could matter is if pack-objects generates an extension we don't know about, like ".tmp-pack-$$-$hash.some-new-ext". The current code will quietly delete such a file, while after this patch we'd leave it in place. In practice this doesn't happen, and would be indicative of a bug. Leaving the file as cruft is arguably a better behavior, as it means somebody is more likely to eventually notice and fix the bug. If we really wanted to be paranoid, we could scan for and warn about such files, but that seems like overkill. There's nothing to test with regard to the removal of this function. It was doing nothing, so the behavior should be the same. However, we can verify (and protect) our assumption that "repack -ad" will eventually remove stray files by adding a test for that. Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2022-10-22	repack: use tempfiles for signal cleanup	Jeff King	2	-18/+16
	When git-repack exits due to a signal, it tries to clean up by calling its remove_temporary_files() function, which walks through the packs dir looking for ".tmp-$$-pack-" files to delete (where "$$" is the pid of the current process). The biggest problem here is that remove_temporary_files() is not safe to call in a signal handler. It uses opendir(), which isn't on the POSIX async-signal-safe list. The details will be platform-specific, but a likely issue is that it needs to allocate memory; if we receive a signal while inside malloc(), etc, we'll conflict on the allocator lock and deadlock with ourselves. We can fix this by just cleaning up the files directly, without walking the directory. We already know the complete list of .tmp- files that were generated, because we recorded them via populate_pack_exts(). When we find files there, we can use register_tempfile() to record the filenames. If we receive a signal, then the tempfile API will clean them up for us, and it's async-safe and pretty battle-tested. Note that this is slightly racier than the existing scheme. We don't record the filenames until pack-objects tells us the hash over stdout. So during the period between it generating the file and reporting the hash, we'd fail to clean up. However, that period is very small. During most of the pack generation process pack-objects is using its own internal tempfiles. It's only at the very end that it moves them into the names git-repack expects, and then it immediately reports the name to us. Given that cleanup like this is best effort (after all, we may get SIGKILL), this level of race is acceptable. When we register the tempfiles, we'll record them locally and use the result to call rename_tempfile(), rather than renaming by hand. This isn't strictly necessary, as once we've renamed the files they're gone, and the tempfile API's cleanup unlink() would simply become a pointless noop. But managing the lifetimes of the tempfile objects is the cleanest thing to do, and the tempfile pointers naturally fill the same role as the old booleans. This patch also fixes another small problem. We only hook signals, and don't set up an atexit handler. So if we see an error that causes us to die(), we'll leave the .tmp-* files in place. But since the tempfile API handles this for us, this is now fixed for free. The new test covers this by stimulating a failure of pack-objects when generating a cruft pack. Before this patch, the .tmp-* file for the main pack would have been left, but now we correctly clean it up. Two small subtleties on the implementation: - in the renaming loop, we can stop re-constructing fname_old; we only use it when we have a tempfile to rename, so we can just ask the tempfile for its path (which, barring bugs, should be identical) - when renaming fails, our error message mentions fname_old. But since a failed rename_tempfile() invalidates the tempfile struct, we'll lose access to that string. Instead, let's mention the destination filename, which is what most other callers do. Reported-by: Jan Pokorný <poki@fnusa.cz> Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2022-10-22	repack: expand error message for missing pack files	Jeff King	1	-1/+2
	If pack-objects tells us it generated pack $hash, we expect to find .tmp-$$-pack-$hash.pack, .idx, .rev, and so on. Some of these files are optional, but others are not. For the required ones, we'll bail with an error if any of them is missing. The error message is just "missing required file", which is a bit vague. We should be more clear that it is not the user's fault, but rather that the sub-pgoram we called is not operating as expected. In practice, nobody should ever see this message, as it would generally only be caused by a bug in Git. It probably doesn't make sense to convert this to a BUG(), though, as there are other (unlikely) possibilities, such as somebody else racily deleting the files, filesystem errors causing stat() to fail, and so on. A nice side effect here is that we stop relying on fname_old in this code path, which will let us deal with it only in the first part of the conditional. Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2022-10-22	repack: populate extension bits incrementally	Jeff King	1	-6/+9
	After generating the main pack and then any additional cruft packs, we iterate over the "names" list (which contains hashes of packs generated by pack-objects), and call populate_pack_exts() for each. There's one small problem with this. In repack_promisor_objects(), we may add entries to "names" and call populate_pack_exts() for them. Calling it again is mostly just wasteful, as we'll stat() the filename with each possible extension, get the same result, and just overwrite our bits. So we could drop the call there, and leave the final loop to populate all of the bits. But instead, this patch does the reverse: drops the final loop, and teaches the other two sites to populate the bits as they add entries. This makes the code easier to reason about, as you never have to worry about when the util field is valid; it is always valid for each entry. It also serves my ulterior purpose: recording the generated filenames as soon as possible will make it easier for a future patch to use them for cleaning up from a failed operation. Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2022-10-22	repack: convert "names" util bitfield to array	Jeff King	1	-8/+14
	We keep a string_list "names" containing the hashes of packs generated on our behalf by pack-objects. The util field of each item is treated as a bitfield that tells us which extensions (.pack, .idx, .rev, etc) are present for each name. Let's switch this to allocating a real array. That will give us room in a future patch to store more data than just a single bit per extension. And it makes the code a little easier to read, as we avoid casting back and forth between uintptr_t and a void pointer. Since the only thing we're storing is an array, we could just allocate it directly. But instead I've put it into a named struct here. That further increases readability around the casts, and in particular helps differentiate us from other string_lists in the same file which use their util field differently. E.g., the existing_*_packs lists still do bit-twiddling, but their bits have different meaning than the ones in "names". This makes it hard to grep around the code to see how the util fields are used; now you can look for "generated_pack_data". Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2022-10-22	diff: leave NEEDWORK notes in show_stats() function	Junio C Hamano	1	-0/+15
	The previous step made an attempt to correctly compute display columns allocated and padded different parts of diffstat output. There are at least two known codepaths in the function that still mixes up display widths and byte length that need to be fixed. Signed-off-by: Junio C Hamano <gitster@pobox.com>
2022-10-21	subtree: fix split after annotated tag was squashed merged	Philippe Blain	3	-9/+36
	The previous commit fixed a failure in 'git subtree merge --squash' when the previous squash-merge merged an annotated tag of the subtree repository which is missing locally. The same failure happens in 'git subtree split', either directly or when called by 'git subtree push', under the same circumstances: 'cmd_split' invokes 'find_existing_splits', which loops through previous commits and invokes 'git rev-parse' (via 'process_subtree_split_trailer') on the value of any 'git subtree-split' trailer it finds. This fails if this value is the hash of an annotated tag which is missing locally. Add a new optional argument 'repository' to 'cmd_split' and 'find_existing_splits', and invoke 'cmd_split' with that argument from 'cmd_push'. This allows 'process_subtree_split_trailer' to try to fetch the missing tag from the 'repository' if it's not available locally, mirroring the new behaviour of 'git subtree pull' and 'git subtree merge'. Signed-off-by: Philippe Blain <levraiphilippeblain@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2022-10-21	subtree: fix squash merging after annotated tag was squashed merged	Philippe Blain	3	-15/+86
	When 'git subtree merge --squash $ref' is invoked, either directly or through 'git subtree pull --squash $repo $ref', the code looks for the latest squash merge of the subtree in order to create the new merge commit as a child of the previous squash merge. This search is done in function 'process_subtree_split_trailer', invoked by 'find_latest_squash', which looks for the most recent commit with a 'git-subtree-split' trailer; that trailer's value is the object name in the subtree repository of the ref that was last squash-merged. The function verifies that this object is present locally with 'git rev-parse', and aborts if it's not. The hash referenced by the 'git-subtree-split' trailer is guaranteed to correspond to a commit since it is the result of running 'git rev-parse -q --verify "$1^{commit}"' on the first argument of 'cmd_merge' (this corresponds to 'rev' in 'cmd_merge' which is passed through to 'new_squash_commit' and 'squash_msg'). But this is only the case since e4f8baa88a (subtree: parse revs in individual cmd_ functions, 2021-04-27), which went into Git 2.32. Before that commit, 'cmd_merge' verified the revision it was given using 'git rev-parse --revs-only "$@"'. Such an invocation, when fed the name of an annotated tag, would return the hash of the tag, not of the commit referenced by the tag. This leads to a failure in 'find_latest_squash' when squash-merging if the most recent squash-merge merged an annotated tag of the subtree repository, using a pre-2.32 version of 'git subtree', unless that previous annotated tag is present locally (which is not usually the case). We can fix this by fetching the object directly by its hash in 'process_subtree_split_trailer' when 'git rev-parse' fails, but in order to do so we need to know the name or URL of the subtree repository. This is not possible in general for 'git subtree merge', but is easy when it is invoked through 'git subtree pull' since in that case the subtree repository is passed by the user at the command line. Allow the 'git subtree pull' scenario to work out-of-the-box by adding an optional 'repository' argument to functions 'cmd_merge', 'find_latest_squash' and 'process_subtree_split_trailer', and invoke 'cmd_merge' with that 'repository' argument in 'cmd_pull'. If 'repository' is absent in 'process_subtree_split_trailer', instruct the user to try fetching the missing object directly. Signed-off-by: Philippe Blain <levraiphilippeblain@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>