git - git

	Commit message	Author	Files	Lines
2023-10-29	The twenty-second batch	Junio C Hamano	1	-0/+19
	Signed-off-by: Junio C Hamano <gitster@pobox.com>
2023-10-29	reflog: fix expire --single-worktree	René Scharfe	2	-3/+26
	33d7bdd645 (builtin/reflog.c: use parse-options api for expire, delete subcommands, 2022-01-06) broke the option --single-worktree of git reflog expire and added a non-printable short flag for it, presumably by accident. While before it set the variable "all_worktrees" to 0, now it sets it to 1, its default value. --no-single-worktree is required now to set it to 0. Fix it by replacing the variable with one that has the opposite meaning, to avoid the negation and its potential for confusion. The new variable "single_worktree" directly captures whether --single-worktree was given. Also remove the unprintable short flag SOH (start of heading) because it is undocumented, hard to use and is likely to have been added by mistake in connection with the negation bug above. Signed-off-by: René Scharfe <l.s.r@web.de> Signed-off-by: Junio C Hamano <gitster@pobox.com>
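The negation bug can be illustrated with a minimal C sketch. It assumes a simplified boolean option handler, `parse_bool_option()`, which is hypothetical and not git's parse-options API. In this handler, `--NAME` stores 1 and `--no-NAME` stores 0. Binding `--single-worktree` to a variable that already defaults to 1 turns the option into a no-op. A variable whose meaning matches the option name behaves as documented.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical boolean option: "--NAME" stores 1, "--no-NAME" stores 0.
 * Returns 1 if ARG matched NAME in either form, 0 otherwise. */
static int parse_bool_option(const char *arg, const char *name, int *var)
{
	if (strncmp(arg, "--", 2) != 0)
		return 0;
	arg += 2;
	if (!strcmp(arg, name)) {
		*var = 1;
		return 1;
	}
	if (!strncmp(arg, "no-", 3) && !strcmp(arg + 3, name)) {
		*var = 0;
		return 1;
	}
	return 0;
}

/* Buggy binding: the option stores 1 into a variable whose default is 1. */
static int buggy_all_worktrees(const char *arg)
{
	int all_worktrees = 1;
	parse_bool_option(arg, "single-worktree", &all_worktrees);
	return all_worktrees;
}

/* Fixed binding: the variable directly records whether the option was given. */
static int fixed_single_worktree(const char *arg)
{
	int single_worktree = 0;
	parse_bool_option(arg, "single-worktree", &single_worktree);
	return single_worktree;
}
```

With the buggy binding, only `--no-single-worktree` restricts the expiry to one worktree. This matches the behavior the commit message describes.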
2023-10-29	am, rebase: fix arghelp syntax of --empty	René Scharfe	3	-4/+4
	Use parentheses and pipes to present alternatives in the argument help for the --empty options of git am and git rebase, like in the rest of the documentation. While at it remove a stray use of the enum empty_action value STOP_ON_EMPTY_COMMIT to indicate that no short option is present. While it has a value of 0 and thus there is no user-visible change, that enum is not meant to hold short option characters. Hard-code 0, like we do for other options without a short option. Signed-off-by: René Scharfe <l.s.r@web.de> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2023-10-29	am: simplify --show-current-patch handling	René Scharfe	1	-68/+44
	Let the parse-options code detect and handle the use of options that are incompatible with --show-current-patch. This requires exposing the distinction between the "raw" and "diff" sub-modes. Do that by splitting the mode RESUME_SHOW_PATCH into RESUME_SHOW_PATCH_RAW and RESUME_SHOW_PATCH_DIFF and stop tracking sub-modes in a separate struct. The result is a simpler callback function and more precise error messages. The original reports a spurious argument or a NULL pointer: $ git am --show-current-patch --show-current-patch=diff error: options '--show-current-patch=diff' and '--show-current-patch=raw' cannot be used together $ git am --show-current-patch=diff --show-current-patch error: options '--show-current-patch=(null)' and '--show-current-patch=diff' cannot be used together With this patch we get the more precise: $ git am --show-current-patch --show-current-patch=diff error: --show-current-patch=diff is incompatible with --show-current-patch $ git am --show-current-patch=diff --show-current-patch error: --show-current-patch is incompatible with --show-current-patch=diff Signed-off-by: René Scharfe <l.s.r@web.de> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2023-10-29	parse-options: make CMDMODE errors more precise	René Scharfe	4	-57/+139
	Only a single PARSE_OPT_CMDMODE option can be specified for the same variable at the same time. This is enforced by get_value(), but the error messages are imprecise in three ways: 1. If a non-PARSE_OPT_CMDMODE option changes the value variable of a PARSE_OPT_CMDMODE option then an ominously vague message is shown: $ t/helper/test-tool parse-options --set23 --mode1 error: option `mode1' : incompatible with something else Worse: If the order of options is reversed then no error is reported at all: $ t/helper/test-tool parse-options --mode1 --set23 boolean: 0 integer: 23 magnitude: 0 timestamp: 0 string: (not set) abbrev: 7 verbose: -1 quiet: 0 dry run: no file: (not set) Fortunately this can currently only happen in the test helper; actual Git commands don't share the same variable for the value of options with and without the flag PARSE_OPT_CMDMODE. 2. If there are multiple options with the same value (synonyms), then the one that is defined first is shown rather than the one actually given on the command line, which is confusing: $ git am --resolved --quit error: option `quit' is incompatible with --continue 3. Arguments of PARSE_OPT_CMDMODE options are not handled by the parse-option machinery. This is left to the callback function. We currently only have a single affected option, --show-current-patch of git am. Errors for it can show an argument that was not actually given on the command line: $ git am --show-current-patch --show-current-patch=diff error: options '--show-current-patch=diff' and '--show-current-patch=raw' cannot be used together The options --show-current-patch and --show-current-patch=raw are synonyms, but the error accuses the user of input they did not actually make.
Or it can awkwardly print a NULL pointer: $ git am --show-current-patch=diff --show-current-patch error: options '--show-current-patch=(null)' and '--show-current-patch=diff' cannot be used together The reason for these shortcomings is that the current code checks incompatibility only when encountering a PARSE_OPT_CMDMODE option at the command line, and that it searches for the previous incompatible option by value. Fix the first two points by checking all PARSE_OPT_CMDMODE variables after parsing each option and by storing all relevant details if their value changed. Do that whether or not the changing option has the flag PARSE_OPT_CMDMODE set. Report an incompatibility only if two options change the variable to different values and at least one of them is a PARSE_OPT_CMDMODE option. This changes the output of the first three examples above to: $ t/helper/test-tool parse-options --set23 --mode1 error: --mode1 is incompatible with --set23 $ t/helper/test-tool parse-options --mode1 --set23 error: --set23 is incompatible with --mode1 $ git am --resolved --quit error: --quit is incompatible with --resolved Store the argument of PARSE_OPT_CMDMODE options of type OPTION_CALLBACK as well to allow taking over the responsibility for compatibility checking from the callback function. The next patch will use this capability to fix the messages for git am --show-current-patch. Use a linked list for storing the PARSE_OPT_CMDMODE variables. This somewhat outdated data structure is simple and suffices, as the number of elements per command is currently only zero or one. We do support multiple different command mode variables per command, but I don't expect that we'd ever use a significant number of them. Once we do, we can switch to a hashmap. Since we no longer need to search for the conflicting option, the all_opts parameter of get_value() is no longer used. Remove it. Extend the tests to check for both conflicting option names, but don't insist on a particular order.
Signed-off-by: René Scharfe <l.s.r@web.de> Signed-off-by: Junio C Hamano <gitster@pobox.com>
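The checking strategy can be sketched in C as a simplified model. It is not git's parse-options.c, and it rests on several assumptions:

- The `option_sketch` and `cmdmode_state` types and all function names are hypothetical.
- Every option simply stores a fixed integer.
- Only variables that have at least one CMDMODE option are tracked in the linked list.

After each option is applied, every tracked variable is compared with its last recorded value. A conflict is reported only when two options set it to different values and at least one of them is a CMDMODE option.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical option: when given, store `value` into *value_ptr. */
struct option_sketch {
	const char *name;
	int *value_ptr;
	int value;
	int is_cmdmode;
};

/* One entry per variable guarded by at least one CMDMODE option. */
struct cmdmode_state {
	int *value_ptr;
	int value;            /* value recorded after the last change */
	const char *opt_name; /* option that made that change, or NULL */
	int opt_is_cmdmode;
	struct cmdmode_state *next;
};

static struct cmdmode_state *build_cmdmode_list(const struct option_sketch *opts,
						size_t n)
{
	struct cmdmode_state *list = NULL;

	for (size_t i = 0; i < n; i++) {
		struct cmdmode_state *e;

		if (!opts[i].is_cmdmode)
			continue;
		for (e = list; e; e = e->next)
			if (e->value_ptr == opts[i].value_ptr)
				break;
		if (e)
			continue; /* variable already tracked */
		e = calloc(1, sizeof(*e));
		if (!e)
			abort();
		e->value_ptr = opts[i].value_ptr;
		e->value = *opts[i].value_ptr;
		e->next = list;
		list = e;
	}
	return list;
}

/* Called after `opt` has been applied; returns -1 and fills `err` on conflict. */
static int check_cmdmode_vars(struct cmdmode_state *list,
			      const struct option_sketch *opt,
			      char *err, size_t errlen)
{
	for (struct cmdmode_state *e = list; e; e = e->next) {
		if (*e->value_ptr == e->value)
			continue; /* unchanged, or set to the same value again */
		if (e->opt_name && (e->opt_is_cmdmode || opt->is_cmdmode)) {
			snprintf(err, errlen, "--%s is incompatible with --%s",
				 opt->name, e->opt_name);
			return -1;
		}
		e->value = *e->value_ptr;
		e->opt_name = opt->name;
		e->opt_is_cmdmode = opt->is_cmdmode;
	}
	return 0;
}

/* Apply the options at indexes given[0..ngiven) in command-line order. */
static int parse_sketch(const struct option_sketch *opts, size_t n,
			const size_t *given, size_t ngiven,
			char *err, size_t errlen)
{
	struct cmdmode_state *list = build_cmdmode_list(opts, n);
	int ret = 0;

	for (size_t i = 0; i < ngiven && !ret; i++) {
		const struct option_sketch *opt = &opts[given[i]];

		*opt->value_ptr = opt->value;
		ret = check_cmdmode_vars(list, opt, err, errlen);
	}
	while (list) {
		struct cmdmode_state *next = list->next;
		free(list);
		list = next;
	}
	return ret;
}
```

Consider `--mode1` (CMDMODE, stores 1) and `--set23` (plain, stores 23) acting on the same variable. In this model, either order yields an error naming both options. Because the recorded name is the option actually given, `--resolved --quit` reports `--resolved` rather than its synonym `--continue`.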
2023-10-26	send-email: move validation code below process_address_list	Michael Strawbridge	2	-28/+51
	Move validation logic below processing of email address lists so that email validation gets the proper email addresses. As a side effect, some initialization needed to be moved down. In order for validation and the actual email sending to have the same initial state, the initialized variables that get modified by pre_process_file are encapsulated in a new function. This fixes email address validation errors when the optional perl module Email::Valid is installed and multiple addresses are passed in on a single to/cc argument like --to=foo@example.com,bar@example.com. A new test was added to t9001 to expose failures with this case in the future. Reported-by: Bagas Sanjaya <bagasdotme@gmail.com> Signed-off-by: Michael Strawbridge <michael.strawbridge@amd.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2023-10-25	SubmittingPatches: call gitk's command "Copy commit reference"	Andrei Rybak	1	-1/+1
	Documentation/SubmittingPatches informs the contributor that gitk's context menu command "Copy commit summary" can be used to obtain the conventional format of referencing existing commits. This command in gitk was renamed to "Copy commit reference" in commit [1], following implementation of Git's "reference" pretty format in [2]. Update mention of this gitk command in Documentation/SubmittingPatches to its new name. [1] b8b60957ce (gitk: rename "commit summary" to "commit reference", 2019-12-12) [2] commit 1f0fc1d (pretty: implement 'reference' format, 2019-11-20) Signed-off-by: Andrei Rybak <rybak.a.v@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2023-10-23	The twenty-first batch	Junio C Hamano	1	-0/+17
	Signed-off-by: Junio C Hamano <gitster@pobox.com>
2023-10-23	doc/git-bisect: clarify `git bisect run` syntax	Javier Mora	2	-4/+4
	The description of the `git bisect run` command syntax at the beginning of the manpage is `git bisect run <cmd>...`, which isn't quite clear about what `<cmd>` is or what the `...` mean; one could think that it is the whole (quoted) command line with all arguments in a single string, or that it supports multiple commands, or that it doesn't accept commands with arguments at all. Change to `git bisect run <cmd> [<arg>...]` to clarify the syntax, in both the manpage and the `git bisect -h` command output. Additionally, change `--term-{new,bad}` et al to `--term-(new|bad)` for consistency with the synopsis syntax conventions. Signed-off-by: Javier Mora <cousteaulecommandant@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2023-10-23	builtin/branch.c: adjust error messages to coding guidelines	Isoken June Ibizugbe	4	-47/+47
	As per the CodingGuidelines document, it is recommended that messages given to die(), error() and warning() start with a lowercase letter and not end with a period. This patch adjusts the messages accordingly and updates the tests to match. Signed-off-by: Isoken June Ibizugbe <isokenjune@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2023-10-22	merge-ort.c: fix typo 'neeed' to 'needed'	王常新	1	-1/+1
	Signed-off-by: 王常新 <wchangxin824@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2023-10-21	The twentieth batch	Junio C Hamano	1	-0/+12
	Signed-off-by: Junio C Hamano <gitster@pobox.com>
2023-10-21	git-push doc: more visibility for -q option	Michal Suchanek	1	-1/+1
	The "-v" option is shown in the SYNOPSIS section near the top, but "-q" is not shown anywhere there. List "-q" alongside "-v". Signed-off-by: Michal Suchanek <msuchanek@suse.de> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2023-10-20	rebase: move parse_opt_keep_empty() down	Oswald Buddenhagen	1	-13/+12
	This moves it right next to parse_opt_empty(), which is a much more logical place. As a side effect, this removes the need for a forward declaration of imply_merge(). Signed-off-by: Oswald Buddenhagen <oswald.buddenhagen@gmx.de> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2023-10-20	rebase: handle --strategy via imply_merge() as well	Oswald Buddenhagen	1	-12/+1
	At least after the successive trimming of enum rebase_type mentioned in the previous commit, this code did exactly what imply_merge() does, so just call it instead. Suggested-by: Junio C Hamano <gitster@pobox.com> Signed-off-by: Oswald Buddenhagen <oswald.buddenhagen@gmx.de> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2023-10-20	rebase: simplify code related to imply_merge()	Oswald Buddenhagen	1	-5/+1
	The code's evolution left in some bits surrounding enum rebase_type that don't really make sense any more. In particular, it makes no sense to invoke imply_merge() if the type is already known not to be REBASE_APPLY, and it makes no sense to assign the type after calling imply_merge(). enum rebase_type had more values until commit a74b35081c ("rebase: drop support for `--preserve-merges`") and commit 10cdb9f38a ("rebase: rename the two primary rebase backends"). The latter commit also renamed imply_interactive() to imply_merge(). Signed-off-by: Oswald Buddenhagen <oswald.buddenhagen@gmx.de> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2023-10-20	send-email: handle to/cc/bcc from --compose message	Jeff King	3	-12/+31
	If the user writes a message via --compose, send-email will pick up various headers like "From", "Subject", etc., and use them for other patches as if they were specified on the command-line. But we don't handle "To", "Cc", or "Bcc" this way; we just tell the user "those aren't interpreted yet" and ignore them. But it seems like an obvious thing to want, especially as the same feature exists when the cover letter is generated separately by format-patch. There it is gated behind the --to-cover option, but I don't think we'd need the same control here; since we generate the --compose template ourselves based on the existing input, if the user leaves the lines unchanged then the behavior remains the same. So let's fill in the implementation; like those other headers we already handle, we just need to assign to the initial_* variables. The only difference in this case is that they are arrays, so we'll feed them through parse_address_line() to split them (just like we would when reading a single string via prompting). Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2023-10-20	Revert "send-email: extract email-parsing code into a subroutine"	Jeff King	2	-80/+75
	This reverts commit b6049542b97e7b135e0e82bf996084d461224d32. Prior to that commit, we read the results of the user editing the "--compose" message in a loop, picking out parts we cared about, and streaming the result out to a ".final" file. That commit split the reading/interpreting into two phases; we'd now read into a hash, and then pick things out of the hash. The goal was making the code more readable. And in some ways it did, because the ugly regexes are confined to the reading phase. But it also introduced several bugs, because now the two phases need to match each other. In particular: - we pick out headers like "Subject: foo" with a case-insensitive regex, and then use the user-provided header name as the key in a case-sensitive hash. So if the user wrote "subject: foo", we'd no longer recognize it as a subject. - the namespace for the hash keys conflates header names with meta information like "body". If you put "body: foo" in your message, it would be misinterpreted as the actual message body (nobody is likely to do that in practice, but it seems like an unnecessary danger). - the handling for to/cc/bcc is totally broken. The behavior before that commit is to recognize and skip those headers, with a note to the user that they are not yet handled. Not great, but OK. But after the patch, the reading side now splits the addresses into a perl array-ref. But the interpreting side doesn't handle this at all, and blindly prints the stringified array-ref value. This leads to garbage like: (mbox) Adding to: ARRAY(0x555b4345c428) from line 'To: ARRAY(0x555b4345c428)' error: unable to extract a valid address from: ARRAY(0x555b4345c428) What to do with this address? ([q]uit|[d]rop|[e]dit): Probably not a huge deal, since nobody should even try to use those headers in the first place (since they were not implemented). But the new behavior is worse, and indicative of the sorts of problems that come from having the two layers.
The revert had a few conflicts, due to later work in this area from 15dc3b9161 (send-email: rename variable for clarity, 2018-03-04) and d11c943c78 (send-email: support separate Reply-To address, 2018-03-04). I've ported the changes from those commits over as part of the conflict resolution. The new tests show the bugs. Note the use of GIT_SEND_EMAIL_NOTTY in the second one. Without it, the test is happy to reach outside the test harness to the developer's actual terminal (when run with the buggy state before this patch). Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2023-10-20	doc/send-email: mention handling of "reply-to" with --compose	Jeff King	1	-5/+5
	The documentation for git-send-email lists the headers handled specially by --compose in a way that implies that this is the complete set of headers that are special. But one more was added by d11c943c78 (send-email: support separate Reply-To address, 2018-03-04) and never documented. Let's add it, and reword the documentation slightly to avoid having to specify the list of headers twice (as it is growing and will continue to do so as we add new features). If you read the code, you may notice that we also handle MIME-Version specially, in that we'll avoid over-writing user-provided MIME headers. I don't think this is worth mentioning, as it's what you'd expect to happen (as opposed to the other headers, which are picked up to be used in later emails). And certainly this feature existed when the documentation was expanded in 01d3861217 (git-send-email.txt: describe --compose better, 2009-03-16), and we chose not to mention it then. Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2023-10-20	grep: die gracefully when outside repository	Kristoffer Haugsbakk	2	-1/+33
	Die gracefully when `git grep --no-index` is run outside of a Git repository and the path is outside the directory tree. If you are not in a Git repository and say: git grep --no-index search .. you trigger a `BUG`: BUG: environment.c:213: git environment hasn't been setup Aborted (core dumped) This happens because `..` is a valid path, which is treated as a pathspec, and the pathspec code then figures out that it is not in the current directory tree. The `BUG` is triggered when the pathspec code tries to advise the user that the path is not in the current (non-existent) repository. Reported-by: ks1322 ks1322 <ks1322@gmail.com> Helped-by: Junio C Hamano <gitster@pobox.com> Signed-off-by: Kristoffer Haugsbakk <code@khaugsbakk.name> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2023-10-19	git-p4 shouldn't attempt to store symlinks in LFS	Matthew McClain	1	-0/+4
	git-p4.py would attempt to put a symlink in LFS if its file extension matched git-p4.largeFileExtensions. Git LFS doesn't store symlinks because smudge/clean filters don't handle symlinks. Symlinks never get passed to either the filter process or the smudge/clean filters, nor could that happen without a change to the protocol or command-line interface. Unless Git learned how to send them to the filters, Git LFS would have a hard time using them in any useful way. Git LFS's goal is to move large files out of the repository history, and symlinks are functionally limited to 4 KiB or a similar size on most systems. Signed-off-by: Matthew McClain <mmcclain@noprivs.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2023-10-19	t7601: use "test_path_is_file" etc. instead of "test -f"	Dorcas AnonoLitunya	1	-12/+12
	Some tests in t7601 use "test -f" and "test ! -f" to see if a path exists or is missing. Use the test_path_is_file and test_path_is_missing helper functions to make these tests clearer. This especially matters for the "missing" case because "test ! -f F" will be happy if "F" exists as a directory, but the intent of the test is that "F" should not exist, even as a directory. The updated code expresses this better. Signed-off-by: Dorcas AnonoLitunya <anonolitunya@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
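The difference between the two checks can be sketched in C with stat(2). The function names are hypothetical. The sketch models the shell tests `test -f` and `test ! -e` rather than git's test-lib helpers. For a directory such as ".", `!is_regular_file()` is true, so a `test ! -f` style check would pass. By contrast, `path_is_missing()` correctly reports that something exists there.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <sys/stat.h>

/* Like `test -f PATH`: true only for an existing regular file. */
static int is_regular_file(const char *path)
{
	struct stat st;
	return stat(path, &st) == 0 && S_ISREG(st.st_mode);
}

/* Like `test ! -e PATH`: true only if nothing at all exists at PATH. */
static int path_is_missing(const char *path)
{
	struct stat st;
	return stat(path, &st) != 0;
}
```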
2023-10-19	am: align placeholder for --whitespace option with apply	Junio C Hamano	1	-2/+2
	`git am` passes the value given to its `--whitespace` option through to the underlying `git apply`, and the value is called <action> over there. Fix the documentation for the command that calls the value <option> to say <action> instead. Note that the option help given by `git am -h` already calls the value <action>, so there is no need to make a matching change there. Signed-off-by: Junio C Hamano <gitster@pobox.com>
2023-10-18	The nineteenth batch	Junio C Hamano	1	-1/+9
	Signed-off-by: Junio C Hamano <gitster@pobox.com>
2023-10-18	commit: do not use cryptic "new_index" in end-user facing messages	Junio C Hamano	1	-4/+4
	These error messages say "new_index" as if that spelling has some significance to the end users (e.g. the file "$GIT_DIR/new_index" has some issues), but that is not the case at all. The i18n folks were made to include the word literally in the translated messages, which was not a good idea at all. Spell it "new index", as we are just telling the users that we failed to create a new index file. The term is expected to be translated to the end-users' languages, not left as if it were a literal file name. This dates all the way back to the first reimplementation of the "git commit" command in C (the scripted version did not have such wording in its error messages), in f5bbc322 (Port git commit to C., 2007-11-08). Signed-off-by: Junio C Hamano <gitster@pobox.com>
2023-10-17	builtin/add.c: clean up die() messages	Naomi Ibe	1	-5/+5
	As described in the CodingGuidelines document, a single-line message given to die() and its friends should not capitalize its first word, and should not add a full stop at the end. Signed-off-by: Naomi Ibe <naomi.ibeh69@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2023-10-16	doc/git-repack: don't mention nonexistent "--unpacked" option	Patrick Steinhardt	1	-5/+2
	The documentation for geometric repacking mentions a "--unpacked" option that supposedly changes how loose objects are rolled up. This option has never existed, and the implied behaviour, namely to include all unpacked objects into the resulting packfile, is in fact the default behaviour. Correct the documentation to not mention this option. Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2023-10-16	doc/git-repack: fix syntax for `-g` shorthand option	Patrick Steinhardt	1	-1/+1
	The `-g` switch is a shorthand for `--geometric=` and allows the user to specify the geometric factor. The documentation is wrong, though, and indicates that the syntax for the shorthand is `-g=<factor>`. In fact, the option must be specified without the equals sign via `-g<factor>`. Fix the syntax accordingly. Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2023-10-14	t5319: make corrupted large-offset test more robust	Jeff King	1	-2/+4
	The test t5319.88 ("reader bounds-checks large offset table") can fail intermittently. The failure mode looks like this: 1. An earlier test sets up "objects64", a directory that can be used to produce a midx with a corrupted large-offsets table. To get the large offsets, it corrupts the normal ".idx" file to have a fake large offset, and then builds a midx from that. That midx now has a large offset table, which is what we want. But we also have a .idx on disk that has a corrupted entry. We'll call the object with the corrupted large-offset "X". 2. In t5319.88, we further corrupt the midx by reducing the size of the large-offset chunk (because our goal is to make sure we do not do an out-of-bounds read on it). 3. We then enumerate all of the objects with "cat-file --batch-check --batch-all-objects", expecting to see a complaint when we try to show object X. We use --batch-all-objects because our objects64 repo doesn't actually have any refs (but if we check them all, one of them will be the failing one). The default batch-check format includes %(objecttype) and %(objectsize), both of which require us to access the actual pack data (and thus require looking at the offset). 4a. Usually, this succeeds. We try to output object X, look it up via the midx to get its type/size, and run into the corrupt large-offset table. 4b. But sometimes we hit a different error. If another object points to X as a delta base, then trying to find the type of that object requires walking the delta chain to the base entry (since only the base has the concrete type; deltas themselves are either OFS_DELTA or REF_DELTA). Normally this would not require separate offset lookups at all, as deltas are usually stored as OFS_DELTA, specifying the relative offset to the base. But the corrupt idx created in step 1 is done directly with "git pack-objects" and does not pass the --delta-base-offset option, meaning we have REF_DELTA entries!
Those do have to consult an index to find the location of the base object, and they use the pack .idx to do this. The same pack .idx that we know is corrupted from step 1! Git does notice the error, but it does so by seeing the corrupt .idx file, not the corrupt midx file, and the error it reports is different, causing the test to fail. The set of objects created in the test is deterministic. But the delta selection seems not to be (which is not too surprising, as it is multi-threaded). I have seen the failure in Windows CI but haven't reproduced it locally (not even with --stress). Re-running a failed Windows CI job tends to work. But when I download and examine the trash directory from a failed run, it shows a different set of deltas than I get locally. But the exact source of non-determinism isn't that important; our test should be robust against any order. There are a few options to fix this: a. It would be OK for the "objects64" setup to "unbreak" the .idx file after generating the midx. But then it would be hard for subsequent tests to reuse it, since it is the corrupted idx that forces the midx to have a large offset table. b. The "objects64" setup could use --delta-base-offset. This would fix our problem, but earlier tests have many hard-coded offsets. Using OFS_DELTA would change the locations of objects in the pack (this might even be OK because I think most of the offsets are within the .idx file, but it seems brittle and I'm afraid to touch it). c. Our cat-file output is in oid order by default. Since we store bases before deltas, if we went in pack order (using the "--unordered" flag), we'd always see our corrupt X before any delta which depends on it. But using "--unordered" means we skip the midx entirely. That makes sense, since it is just enumerating all of the packs, using the offsets found in their .idx files directly. So it doesn't work for our test. d. We could ask directly about object X, rather than enumerating all of them. 
But that requires further hard-coding of the oid (both sha1 and sha256) of object X. I'd prefer not to introduce more brittleness. e. We can use a --batch-check format that looks at the pack data, but doesn't have to chase deltas. The problem in this case is %(objecttype), which has to walk to the base. But %(objectsize) does not; we can get the value directly from the delta itself. Another option would be %(deltabase), where we report the REF_DELTA name but don't look at its data. I've gone with option (e) here. It's kind of subtle, but it's simple and has no side effects. Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2023-10-13	The eighteenth batch	Junio C Hamano	1	-0/+21
	Signed-off-by: Junio C Hamano <gitster@pobox.com>
2023-10-13	Prevent git from rehashing 4GiB files	Jason Hatton	2	-2/+34
	The index stores file sizes using a uint32_t. This causes any file whose size is a multiple of 2^32 to have a cached file size of zero. Zero is a special value used to mark racily clean entries. This causes git to rehash every file whose size is a multiple of 2^32 every time git status or git commit is run. This patch mitigates the problem by making all files whose size is a multiple of 2^32 appear to have a size of 1<<31 instead of zero. The value of 1<<31 is chosen to keep it as far away from zero as possible to help prevent things getting mixed up with unpatched versions of git. An example would be to have a 2^32 sized file in the index of patched git. Patched git would save the file as 2^31 in the cache. An unpatched git would see that the file's size has changed and force a rehash of the file, which is safe. The file would have to grow or shrink by exactly 2^31 and retain all of its ctime, mtime, and other attributes for old git to not notice the change. This patch does not change the behavior of any file whose size is not an exact multiple of 2^32. Signed-off-by: Jason D. Hatton <jhatton@globalfinishing.com> Signed-off-by: brian m. carlson <bk2204@github.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
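The size mapping can be sketched in C. The function name `cached_size_sketch()` is hypothetical. The sketch assumes the index keeps only the low 32 bits of the on-disk size, and that 1<<31 is substituted when those bits are zero for a non-empty file.

```c
#include <assert.h>
#include <stdint.h>

/* Map a 64-bit file size to the 32-bit value cached in the index
 * (hypothetical helper). Truncation keeps the size modulo 2^32; a
 * non-empty file whose size is a multiple of 2^32 would otherwise be
 * cached as 0, so it is steered to 1 << 31 instead. */
static uint32_t cached_size_sketch(uint64_t st_size)
{
	uint32_t sd_size = (uint32_t)st_size;

	if (!sd_size && st_size)
		return UINT32_C(1) << 31;
	return sd_size;
}
```

Sizes that are not exact multiples of 2^32 pass through the plain truncation unchanged. A truly empty file still maps to 0.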
2023-10-13	t: add a test helper to truncate files	brian m. carlson	4	-0/+28
	In a future commit, we're going to work with some large files which will be at least 4 GiB in size. To take advantage of the sparseness functionality on most Unix systems and avoid running the system out of disk space, it would be convenient to use truncate(2) to simply create a sparse file of sufficient size. However, the GNU truncate(1) utility isn't portable, so let's write a tiny test helper that does the work for us. Signed-off-by: brian m. carlson <bk2204@github.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
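A helper of this kind can be sketched with POSIX open(2) and ftruncate(2). This is a hypothetical illustration, not git's actual test-tool code. Extending a file with ftruncate writes no data. On filesystems that support sparse files, the hole occupies no disk blocks.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <fcntl.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <unistd.h>

/* Create PATH (if needed) and set its length to exactly SIZE bytes
 * without writing any data. Returns 0 on success, -1 on error.
 * Hypothetical helper for illustration only. */
static int make_sized_file(const char *path, off_t size)
{
	int fd = open(path, O_WRONLY | O_CREAT, 0666);

	if (fd < 0)
		return -1;
	if (ftruncate(fd, size) < 0) {
		close(fd);
		return -1;
	}
	return close(fd);
}
```

On 32-bit systems, building with `-D_FILE_OFFSET_BITS=64` is typically needed so that `off_t` can represent sizes of 4 GiB and above.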
2023-10-13	attr: add attr.tree for setting the treeish to read attributes from	John Cai	6	-0/+95
	44451a2 (attr: teach "--attr-source=<tree>" global option to "git", 2023-05-06) provided the ability to pass in a treeish as the attr source. In the context of serving Git repositories as bare repos like we do at GitLab however, it would be easier to point --attr-source to HEAD for all commands by setting it once. Add a new config attr.tree that allows this. Signed-off-by: John Cai <johncai86@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2023-10-13	attr: read attributes from HEAD when bare repo	John Cai	3	-2/+23
	The motivation for 44451a2e5e (attr: teach "--attr-source=<tree>" global option to "git", 2023-05-06) was to make it possible to use gitattributes with bare repositories. To make it easier to read gitattributes in bare repositories, however, let's just make HEAD:.gitattributes the default. This is in line with how mailmap works; see 8c473cecfd (mailmap: default mailmap.blob in bare repositories, 2012-12-13). Signed-off-by: John Cai <johncai86@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2023-10-12	The seventeenth batch	Junio C Hamano	1	-0/+4
	Signed-off-by: Junio C Hamano <gitster@pobox.com>
2023-10-12	mailmap: change primary address for Derrick Stolee	Derrick Stolee	1	-3/+3
	The previous primary address is no longer valid. Signed-off-by: Derrick Stolee <stolee@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2023-10-12	grep: -f <path> is relative to $cwd	Junio C Hamano	2	-2/+24
	Just like OPT_FILENAME() does, "git grep -f <path>" should treat the <path> relative to the original $cwd by paying attention to the prefix the command is given. Signed-off-by: Junio C Hamano <gitster@pobox.com>
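Assume, as is usual for git builtins, that the command has already moved to the top of the worktree and received the original subdirectory as a prefix string. Under that assumption, treating `-f <path>` relative to the original $cwd amounts to prepending the prefix to relative paths. The helper name below is hypothetical, and the sketch is POSIX-only: it does not handle Windows drive letters.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/*
 * Hypothetical helper: interpret a user-supplied PATH relative to the
 * directory the command was started in. PREFIX is that directory relative
 * to the top of the worktree ("sub/dir/", with a trailing slash) or NULL
 * at the top level. Absolute paths are returned unchanged.
 * The caller frees the result.
 */
static char *resolve_against_prefix(const char *prefix, const char *path)
{
	size_t plen = (prefix && path[0] != '/') ? strlen(prefix) : 0;
	size_t len = strlen(path);
	char *out = malloc(plen + len + 1);

	if (!out)
		return NULL;
	if (plen)
		memcpy(out, prefix, plen);
	memcpy(out + plen, path, len + 1); /* includes the terminating NUL */
	return out;
}
```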
2023-10-12	stash: be careful what we store	Junio C Hamano	2	-0/+10
	"git stash store" is meant to store what "git stash create" produces, as these two are implementation details of the end-user facing "git stash save" command. Even though it is clearly documented as such, users would try silly things like "git stash store HEAD", rendering their stash unusable. Worse yet, because "git stash drop" does not allow such a stash entry to be removed, "git stash clear" would be the only way to recover from such a mishap. Reuse the logic that allows "drop" to refrain from working on such a stash entry to teach "store" to avoid storing an object that is not a stash entry in the first place. Signed-off-by: Junio C Hamano <gitster@pobox.com>
2023-10-11	merge: introduce {copy|clear}_merge_options()	Junio C Hamano	3	-1/+22
	When mostly the same set of options is used to perform multiple merges, it is convenient to prepare one template instance of the merge_options structure and give each merge its own copy of it. We saw such a use recently in "git merge-tree". Let's make the pattern official by introducing copy_merge_options() as a supported way to make a copy of the structure, and also provide clear_merge_options() to release any resources held by a copied instance. Currently we only make a shallow copy, so the former is a mere structure assignment while the latter is a no-op, but this may change in the future as the members of the merge_options structure evolve. Suggested-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>
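The copy-from-template pattern can be sketched in C. The struct is a hypothetical, heavily reduced stand-in for merge_options. The `_sketch` functions only mirror the described semantics: a shallow copy by structure assignment, and a clear that currently releases nothing.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical, much-reduced stand-in for struct merge_options. */
struct merge_options_sketch {
	const char *branch1;
	const char *branch2;
	int verbosity;
};

/* Shallow copy: for now just a structure assignment. */
static void copy_merge_options_sketch(struct merge_options_sketch *dst,
				      const struct merge_options_sketch *src)
{
	*dst = *src;
}

/* Release resources owned by a copy; a no-op while copies are shallow. */
static void clear_merge_options_sketch(struct merge_options_sketch *opt)
{
	(void)opt;
}
```

Each merge copies the template, adjusts its own instance, and clears it afterwards. The template itself is left untouched. Callers that follow this pattern should keep working unchanged if the copy later becomes deep.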
2023-10-10	The sixteenth batch	Junio C Hamano	1	-0/+7
	Signed-off-by: Junio C Hamano <gitster@pobox.com>
2023-10-10	doc/git-worktree: mention "refs/rewritten" as per-worktree refs	Patrick Steinhardt	1	-3/+4
	Some references are special in the context of worktrees as they are considered to be per-worktree instead of shared across all of the worktrees. Most importantly, this includes "refs/worktree/" that have explicitly been designed such that users can create per-worktree refs. But there are also special references that have an associated meaning like "refs/bisect/", which is used to track state of git-bisect(1). These special per-worktree references are documented in git-worktree(1), but one instance is missing. In a9be29c9817 (sequencer: make refs generated by the `label` command worktree-local, 2018-04-25), we have converted "refs/rewritten/" to be a per-worktree reference as well. These references are used by our sequencer infrastructure to generate labels for rebased commits. So in order to allow for multiple concurrent rebases to happen in different worktrees, these references need to be tracked per worktree. We forgot to update our documentation to mention these new per-worktree references, which is fixed by this patch. Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>
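To illustrate how such namespaces can be recognized by prefix, here is a hypothetical helper `is_per_worktree_ref()` covering only the three namespaces named in this entry. It is not git's implementation and deliberately ignores `HEAD` and other pseudorefs.

```c
#include <stdbool.h>
#include <string.h>

/* Per-worktree namespaces named in this entry (illustrative list only). */
static const char *const per_worktree_prefixes[] = {
    "refs/worktree/",
    "refs/bisect/",
    "refs/rewritten/",
};

static bool has_prefix(const char *s, const char *prefix)
{
    return strncmp(s, prefix, strlen(prefix)) == 0;
}

/* Hypothetical: true if refname lives in one of the namespaces above. */
static bool is_per_worktree_ref(const char *refname)
{
    size_t i;
    size_t n = sizeof(per_worktree_prefixes) / sizeof(per_worktree_prefixes[0]);

    for (i = 0; i < n; i++)
        if (has_prefix(refname, per_worktree_prefixes[i]))
            return true;
    return false;
}
```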
2023-10-10	chunk-format: drop pair_chunk_unsafe()	Jeff King	2	-21/+0
	There are no callers left, and we don't want anybody to add new ones (they should use the not-unsafe version instead). So let's drop the function. Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2023-10-10	commit-graph: detect out-of-order BIDX offsets	Jeff King	2	-0/+23
	The BIDX chunk tells us the offsets at which each commit's Bloom filters can be found in the BDAT chunk. We compute the length of each filter by checking the offsets of neighbors and subtracting them. If the offsets are out of order, then we'll get a negative length, which we then store as a very large unsigned value. This can cause us to read out-of-bounds memory, as we access the hash data modulo "filter->len * BITS_PER_WORD". We can easily detect this case when loading the individual filters. Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>
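The out-of-order detection can be sketched as follows. It assumes BIDX holds, for each commit, the cumulative end offset of its filter (already converted from on-disk byte order), so a filter's length is its end offset minus the previous commit's. `filter_length()` is a hypothetical helper.

```c
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical sketch: compute the byte length of filter `pos` from the
 * cumulative end offsets in a BIDX-like table. Returns -1 when the
 * offsets are out of order instead of letting end - start wrap around
 * to a huge unsigned value.
 */
static int filter_length(const uint32_t *ends, size_t pos, uint32_t *len_out)
{
    uint32_t start = pos ? ends[pos - 1] : 0;
    uint32_t end = ends[pos];

    if (end < start)
        return -1; /* out-of-order offsets: treat the file as corrupt */
    *len_out = end - start;
    return 0;
}
```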
2023-10-10	commit-graph: check bounds when accessing BIDX chunk	Jeff King	2	-2/+23
	We load the bloom_filter_indexes chunk using pair_chunk(), so we have no idea how big it is. This can lead to out-of-bounds reads if it is smaller than expected, since we index it based on the number of commits found elsewhere in the graph file. We can check the chunk size up front, like we do for CDAT and other chunks with one fixed-size record per commit. The test case demonstrates the problem. It actually won't segfault, because we end up reading random data from the follow-on chunk (BDAT in this case), and the bounds checks added in the previous patch complain. But this is by no means assured, and you can craft a commit-graph file with BIDX at the end (or a smaller BDAT) that does segfault. Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>
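One way to express the up-front size check for a chunk with one fixed-size record per commit (4 bytes per commit for BIDX) is sketched below. `fixed_record_chunk_ok()` is hypothetical. Dividing the chunk size, rather than multiplying the commit count, avoids any overflow in computing the expected size.

```c
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical check: does a chunk of chunk_size bytes hold exactly
 * num_commits records of record_size bytes each? Equivalent to
 * chunk_size == num_commits * record_size, without the multiplication.
 */
static int fixed_record_chunk_ok(size_t chunk_size, uint32_t num_commits,
                                 size_t record_size)
{
    if (!num_commits)
        return chunk_size == 0;
    return chunk_size / num_commits == record_size &&
           chunk_size % num_commits == 0;
}
```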
2023-10-10	commit-graph: check bounds when accessing BDAT chunk	Jeff King	4	-0/+63
	When loading Bloom filters from a commit-graph file, we use the offset values in the BIDX chunk to index into the memory mapped for the BDAT chunk. But since we don't record how big the BDAT chunk is, we just trust that the BIDX offsets won't cause us to read outside of the chunk memory. A corrupted or malicious commit-graph file will cause us to segfault (in practice this isn't a very interesting attack, since commit-graph files are local-only, and the worst case is an out-of-bounds read). We can't fix this by checking the chunk size during parsing, since the data in the BDAT chunk doesn't have a fixed size (that's why we need the BIDX in the first place). So we'll fix it in two parts: 1. Record the BDAT chunk size during parsing, and then later check that the BIDX offsets we look up are within bounds. 2. Because the offsets are relative to the end of the BDAT header, we must also make sure that the BDAT chunk is at least as large as the expected header size. Otherwise, we overflow when trying to move past the header, even for an offset of "0". We can check this early, during the parsing stage. The error messages are rather verbose, but since this is not something you'd expect to see outside of severe bugs or corruption, it makes sense to err on the side of too many details. Sadly we can't mention the filename during the chunk-parsing stage, as we haven't set g->filename at this point, nor passed it down through the stack. Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>
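The two checks can be sketched like this. The sketch assumes the BDAT header is three 32-bit words (hash version, number of hashes, bits per entry), as in the commit-graph format documentation. The helper names are hypothetical. The header check would run once at parse time, and the range check on every filter lookup.

```c
#include <stddef.h>
#include <stdint.h>

/* Assumed BDAT header: three 4-byte values. */
#define DEMO_BDAT_HEADER_SIZE (3 * sizeof(uint32_t))

/* Parse time: a chunk smaller than its header would make
 * "data + header" point past the chunk even for offset 0. */
static int bdat_header_ok(size_t bdat_size)
{
    return bdat_size >= DEMO_BDAT_HEADER_SIZE;
}

/* Lookup time: [start, end) are BIDX offsets relative to the end of
 * the header. Assumes bdat_header_ok() already passed. */
static int bdat_range_ok(size_t bdat_size, uint32_t start, uint32_t end)
{
    size_t payload = bdat_size - DEMO_BDAT_HEADER_SIZE;
    return start <= end && end <= payload;
}
```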
2023-10-10	commit-graph: bounds-check generation overflow chunk	Jeff King	3	-3/+18
	If the generation entry in a commit-graph doesn't fit, we instead insert an offset into a generation overflow chunk. But since we don't record the size of the chunk, we may read outside the chunk if the offset we find on disk is malicious or corrupted. We can't check the size of the chunk up-front; it will vary based on how many entries need overflow. So instead, we'll do a bounds-check before accessing the chunk memory. Unfortunately there is no error-return from this function, so we'll just have to die(), which is what it does for other forms of corruption. As with other cases, we can drop the st_mult() call, since we know our bounds-checked value will fit within a size_t. Before this patch, the test here actually "works" because we read garbage data from the next chunk. And since that garbage data happens not to provide a generation number which changes the output, it appears to work. We could construct a case that actually segfaults or produces wrong output, but it would be a bit tricky. For our purposes it's sufficient to check that we've detected the bounds error. Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>
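A hedged sketch of the bounds check follows. It assumes that a generation entry with its most-significant bit set stores, in the remaining 31 bits, the index of an 8-byte big-endian value in the overflow chunk. The helpers are hypothetical. Unlike the patch, which has to die(), this version returns -1 so the failure is observable.

```c
#include <stddef.h>
#include <stdint.h>

#define DEMO_OVERFLOW_BIT ((uint32_t)1 << 31)

/* Decode an 8-byte big-endian value. */
static uint64_t demo_get_be64(const unsigned char *p)
{
    uint64_t v = 0;
    int i;

    for (i = 0; i < 8; i++)
        v = (v << 8) | p[i];
    return v;
}

/* Returns 0 and sets *offset_out, or -1 if the overflow index points
 * past the end of the overflow chunk. */
static int generation_offset(uint32_t entry, const unsigned char *overflow,
                             size_t overflow_size, uint64_t *offset_out)
{
    uint32_t idx;

    if (!(entry & DEMO_OVERFLOW_BIT)) {
        *offset_out = entry; /* value fits inline, no overflow lookup */
        return 0;
    }
    idx = entry & ~DEMO_OVERFLOW_BIT;
    if (idx >= overflow_size / sizeof(uint64_t))
        return -1; /* bounds error: corrupt or malicious file */
    *offset_out = demo_get_be64(overflow + (size_t)idx * sizeof(uint64_t));
    return 0;
}
```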
2023-10-10	commit-graph: check size of generations chunk	Jeff King	2	-2/+20
	We neither check nor record the size of the generations chunk we parse from a commit-graph file. This should have one uint32_t for each commit in the file; if it is smaller (due to corruption, etc), we may read outside the mapped memory. The included test segfaults without this patch, as it shrinks the size considerably (and the chunk is near the end of the file, so we read off the end of the array rather than accidentally reading another chunk). We can fix this by checking the size up front (like we do for other fixed-size chunks, like CDAT). Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2023-10-10	commit-graph: bounds-check base graphs chunk	Jeff King	3	-1/+22
	When we are loading a commit-graph chain, we check that each slice of the chain points to the appropriate set of base graphs via its BASE chunk. But since we don't record the size of the chunk, we may access out-of-bounds memory if the file is corrupted. Since we know the number of entries we expect to find (based on the position within the commit-graph-chain file), we can just check the size up front. In theory this would also let us drop the st_mult() call a few lines later when we actually access the memory, since we know that the computed offset will fit in a size_t. But because the operands "g->hash_len" and "n" have types "unsigned char" and "int", we'd have to cast to size_t first. Leaving the st_mult() does that cast, and makes it more obvious that we don't have an overflow problem. Note that the test does not actually segfault before this patch, since it just reads garbage from the chunk after BASE (and indeed, it even rejects the file because that garbage does not have the expected hash value). You could construct a file with BASE at the end that did segfault, but corrupting the existing one is easy, and we can check stderr for the expected message. Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>
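The size check, and the reason for casting to size_t before multiplying, can be illustrated as follows. `checked_mul()` only mimics the spirit of st_mult() (it returns an error instead of dying), and `base_chunk_ok()` is hypothetical. The sketch assumes a slice with `n` base graphs needs at least `n * hash_len` bytes of BASE data.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical overflow-checked size_t multiply. */
static int checked_mul(size_t a, size_t b, size_t *out)
{
    if (a && b > SIZE_MAX / a)
        return -1;
    *out = a * b;
    return 0;
}

/* A slice with n base graphs must carry at least n hashes in BASE. */
static int base_chunk_ok(size_t chunk_size, unsigned char hash_len, int n)
{
    size_t expect;

    if (n < 0)
        return 0;
    /* Widen both operands to size_t first; "hash_len * n" on its own
     * would be evaluated in int arithmetic. */
    if (checked_mul((size_t)hash_len, (size_t)n, &expect))
        return 0;
    return chunk_size >= expect;
}
```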
2023-10-10	commit-graph: detect out-of-bounds extra-edges pointers	Jeff King	3	-6/+23
	If an entry in a commit-graph file has more than 2 parents, the fixed-size parent fields instead point to an offset within an "extra edges" chunk. We blindly follow these, assuming that the chunk is present and sufficiently large; this can lead to an out-of-bounds read for a corrupt or malicious file. We can fix this by recording the size of the chunk and adding a bounds-check in fill_commit_in_graph(). There are a few tricky bits: 1. We'll switch from working with a pointer to an offset. This makes some corner cases just fall out naturally: a. If we did not find an EDGE chunk at all, our size will correctly be zero (so everything is "out of bounds"). b. Comparing "size / 4" lets us make sure we have at least 4 bytes to read, and we never compute a pointer more than one element past the end of the array (computing a larger pointer is probably OK in practice, but is technically undefined behavior). c. The current code casts to "uint32_t *". Replacing it with an offset avoids any comparison between different types of pointer (since the chunk is stored as "unsigned char *"). 2. This is the first case in which fill_commit_in_graph() may return anything but success. We need to make sure to roll back the "parsed" flag (and any parents we might have added before running out of buffer) so that the caller can cleanly fall back to loading the commit object itself. It's a little non-trivial to do this, and we might benefit from factoring it out. But we can wait on that until we actually see a second case where we return an error. As a bonus, this lets us drop the st_mult() call. Since we've already done a bounds check, we know there won't be any integer overflow (it would imply our buffer is larger than a size_t can hold). The included test does not actually segfault before this patch (though you could construct a case where it does). Instead, it reads garbage from the next chunk which results in it complaining about a bogus parent id.
This is sufficient for our needs, though (we care that the fallback succeeds, and that stderr mentions the out-of-bounds read). Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>
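The offset-based walk from point 1 can be sketched as below. It assumes each EDGE entry is a 4-byte big-endian parent position and that the final entry of a list is marked by the most-significant bit. `read_extra_edges()` and its error convention are hypothetical. Working with an index means an absent chunk (size 0) rejects every lookup, and comparing against `size / 4` never forms a pointer past the end of the array.

```c
#include <stddef.h>
#include <stdint.h>

#define DEMO_LAST_EDGE ((uint32_t)1 << 31)

static uint32_t demo_get_be32(const unsigned char *p)
{
    return ((uint32_t)p[0] << 24) | ((uint32_t)p[1] << 16) |
           ((uint32_t)p[2] << 8) | (uint32_t)p[3];
}

/*
 * Walk the extra-edges list starting at entry index `pos`, storing parent
 * positions in parents[]. Returns the number of parents read, or -1 on
 * an out-of-bounds entry or when more than max_parents are found.
 */
static int read_extra_edges(const unsigned char *edges, size_t edges_size,
                            uint32_t pos, uint32_t *parents, int max_parents)
{
    int n = 0;

    for (;;) {
        uint32_t v;

        if (pos >= edges_size / 4 || n >= max_parents)
            return -1;
        v = demo_get_be32(edges + (size_t)pos * 4);
        parents[n++] = v & ~DEMO_LAST_EDGE;
        if (v & DEMO_LAST_EDGE)
            return n; /* end-of-list marker reached */
        pos++;
    }
}
```

On a -1 return, a caller in the position of fill_commit_in_graph() would undo any partial state (the "parsed" flag, parents already added) before falling back to reading the commit object.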
2023-10-10	commit-graph: check size of commit data chunk	Jeff King	2	-1/+20
	We expect a commit-graph file to have a fixed-size data record for each commit in the file (and we know the number of commits to expect from the size of the lookup table). If we encounter a file where this is too small, we'll look past the end of the chunk (and possibly even off the mapped memory). We can fix this by checking the size up front when we record the pointer. The included test doesn't segfault, since it ends up reading bytes from another chunk. But it produces nonsense results, since the values it reads are garbage. Our test notices this by comparing the output to a non-corrupted run of the same command (and of course we also check that the expected error is printed to stderr). Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>
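A sketch of the up-front check plus a record accessor follows. It assumes each commit-data record is the root-tree hash followed by two 4-byte parent positions and 8 bytes of generation/commit-time data, so `hash_len + 16` bytes. All names are hypothetical. The point is that once the size is validated against the commit count derived from the lookup table, per-record reads need no further checks.

```c
#include <stddef.h>
#include <stdint.h>

/* Assumed record: root-tree hash, parent1, parent2, 8 bytes of date data. */
static size_t cdat_record_width(size_t hash_len)
{
    return hash_len + 16;
}

/* Parse time: the chunk must hold num_commits complete records. */
static int cdat_size_ok(size_t chunk_size, uint32_t num_commits, size_t hash_len)
{
    return chunk_size / cdat_record_width(hash_len) >= num_commits;
}

/* After cdat_size_ok() passed, reading commit i's first parent
 * (i < num_commits) stays inside the chunk. */
static uint32_t cdat_parent1(const unsigned char *cdat, size_t hash_len, uint32_t i)
{
    const unsigned char *p = cdat + (size_t)i * cdat_record_width(hash_len) + hash_len;

    return ((uint32_t)p[0] << 24) | ((uint32_t)p[1] << 16) |
           ((uint32_t)p[2] << 8) | (uint32_t)p[3];
}
```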