Date | Commit message | Author | Files | Lines
2022-08-25 | The fifteenth batch | Junio C Hamano | 1 | -0/+16
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2022-08-18 | The fourteenth batch | Junio C Hamano | 1 | -0/+24
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2022-08-18 | merge-ort: provide helpful submodule update message when possible | Elijah Newren | 1 | -11/+7
In commit 4057523a40 ("submodule merge: update conflict error message", 2022-08-04), a more detailed message was provided when submodules conflict, in order to help users know how to resolve those conflicts. There were a couple situations for which a different message would be more appropriate, but that commit left handling those for future work. Unfortunately, that commit would check if any submodules were of the type that it didn't know how to explain, and, if so, would avoid providing the more detailed explanation even for the submodules it did know how to explain. Change this to have the code print the helpful messages for the subset of submodules it knows how to explain. Signed-off-by: Elijah Newren <newren@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2022-08-18 | merge-ort: avoid surprise with new sub_flag variable | Elijah Newren | 1 | -1/+1
Commit 4057523a40 ("submodule merge: update conflict error message", 2022-08-04) added a sub_flag variable that is used to store a value from enum conflict_and_info_types, but initializes it with a value of -1 that does not correspond to any of the conflict_and_info_types. The code may never set it to a valid value and yet still use it, which can be surprising when reading over the code at first. Initialize it instead to the generic CONFLICT_SUBMODULE_FAILED_TO_MERGE value, which is still distinct from the two values we need to special case. Signed-off-by: Elijah Newren <newren@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2022-08-18 | merge-ort: remove translator lego in new "submodule conflict suggestion" | Elijah Newren | 1 | -60/+28
In commit 4057523a40 ("submodule merge: update conflict error message", 2022-08-04), the new "submodule conflict suggestion" code was translating 6 different pieces of the new message and then used carefully crafted logic to allow stitching it back together with special formatting. Keep the components of the message together as much as possible, so that:
* we reduce the number of things translators have to translate
* translators have more control over the format of the output
* the code is much easier for developers to understand too
Also, reformat some comments running beyond the 80th column while at it.
Signed-off-by: Elijah Newren <newren@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2022-08-17 | pipe_command(): mark stdin descriptor as non-blocking | Jeff King | 2 | -0/+23
Our pipe_command() helper lets you both write to and read from a child process on its stdin/stdout. It's supposed to work without deadlocks because we use poll() to check when descriptors are ready for reading or writing. But there's a bug: if both the data to be written and the data to be read back exceed the pipe buffer, we'll deadlock.
The issue is that the code assumes that if you have, say, a 2MB buffer to write and poll() tells you that the pipe descriptor is ready for writing, that calling:
    write(cmd->in, buf, 2*1024*1024);
will do a partial write, filling the pipe buffer and then returning what it did write. And that is what it would do on a socket, but not for a pipe. When writing to a pipe, at least on Linux, it will block waiting for the child process to read() more. And now we have a potential deadlock, because the child may be writing back to us, waiting for us to read() ourselves.
An easy way to trigger this is:
    git -c add.interactive.useBuiltin=true \
        -c interactive.diffFilter=cat \
        checkout -p HEAD~200
The diff against HEAD~200 will be big, and the filter wants to write all of it back to us (obviously this is a dummy filter, but in the real world something like diff-highlight would similarly stream back a big output).
If you set add.interactive.useBuiltin to false, the problem goes away, because now we're not using pipe_command() anymore (instead, that part happens in perl). But this isn't a bug in the interactive code at all. It's the underlying pipe_command() code which is broken, and has been all along.
We presumably didn't notice because most calls only do input _or_ output, not both. And the few that do both, like gpg calls, may have large inputs or outputs, but never both at the same time (e.g., consider signing, which has a large payload but a small signature comes back).
The obvious fix is to put the descriptor into non-blocking mode, and indeed, that makes the problem go away. Callers shouldn't need to care, because they never see the descriptor (they hand us a buffer to feed into it).
The included test fails reliably on Linux without this patch. Curiously, it doesn't fail in our Windows CI environment, but has been reported to do so for individual developers. It should pass in any environment after this patch (courtesy of the compat/ layers added in the last few commits).
Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>
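To make the partial-write behaviour concrete, here is a minimal sketch (not the code from this patch) of writing to a pipe whose write end has been put into non-blocking mode. The helper names set_nonblocking() and write_until_full() are made up for illustration, and the sketch assumes a POSIX system where O_NONBLOCK is available.

```c
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <stddef.h>
#include <unistd.h>

/* Hypothetical helper: put fd into non-blocking mode (0 on success, -1 on error). */
static int set_nonblocking(int fd)
{
    int flags = fcntl(fd, F_GETFL);

    if (flags < 0)
        return -1;
    return fcntl(fd, F_SETFL, flags | O_NONBLOCK);
}

/*
 * Hypothetical helper: write as much of buf as the pipe accepts right now.
 * Returns the number of bytes written (possibly fewer than len); it stops
 * at EAGAIN instead of blocking. Returns -1 on any other error.
 */
static ssize_t write_until_full(int fd, const char *buf, size_t len)
{
    size_t done = 0;

    assert(buf != NULL);
    while (done < len) {
        ssize_t n = write(fd, buf + done, len - done);

        if (n < 0) {
            if (errno == EINTR)
                continue; /* interrupted, just retry */
            if (errno == EAGAIN || errno == EWOULDBLOCK)
                break; /* pipe buffer is full; caller should poll() again */
            return -1;
        }
        done += (size_t)n;
    }
    return (ssize_t)done;
}
```

With a blocking descriptor the same write() would instead wait for the reader, which is the situation that can deadlock when the child is itself blocked writing back to us.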
2022-08-17 | pipe_command(): handle ENOSPC when writing to a pipe | Jeff King | 1 | -1/+2
When write() to a non-blocking pipe fails because the buffer is full, POSIX says we should see EAGAIN. But our mingw_write() compat layer on Windows actually returns ENOSPC for this case. This is probably something we want to correct, but given that we don't plan to use non-blocking descriptors in a lot of places, we can work around it by just catching ENOSPC alongside EAGAIN. If we ever do fix mingw_write(), then this patch can be reverted. We don't actually use a non-blocking pipe yet, so this is still just preparation. Helped-by: René Scharfe <l.s.r@web.de> Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>
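As an illustration of the workaround described above, a write attempt can treat ENOSPC the same way as EAGAIN. This is only a sketch: pipe_is_full_errno() and try_pipe_write() are hypothetical names, not functions from the Git tree, and the classification is only meaningful for pipe descriptors (on a regular file ENOSPC is a real error).

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <unistd.h>

/* Hypothetical helper: does this errno mean "pipe buffer full, try again later"? */
static int pipe_is_full_errno(int err)
{
    /* ENOSPC is what the commit message says mingw_write() reports for a full pipe. */
    return err == EAGAIN || err == EWOULDBLOCK || err == ENOSPC;
}

/*
 * Hypothetical helper: make one write attempt on a pipe.
 * Returns the number of bytes written, 0 if the pipe is currently full,
 * or -1 on a real error (errno is left as set by write()).
 */
static ssize_t try_pipe_write(int fd, const void *buf, size_t len)
{
    ssize_t n;

    assert(buf != NULL);
    n = write(fd, buf, len);
    if (n < 0 && pipe_is_full_errno(errno))
        return 0;
    return n;
}
```

A caller that gets 0 back would go on servicing its other descriptors and retry the write after the next poll().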
2022-08-17 | pipe_command(): avoid xwrite() for writing to pipe | Jeff King | 1 | -5/+17
If xwrite() sees an EAGAIN response, it will loop forever until the write succeeds (or encounters a real error). This is due to ef1cf0167a (xwrite: poll on non-blocking FDs, 2016-06-26), with the idea that we won't be surprised by a descriptor unexpectedly set as non-blocking.
But that will make things awkward when we do want a non-blocking descriptor, and a future patch will switch pipe_command() to using one. In that case, looping on EAGAIN is bad, because the process on the other end of the pipe may be waiting on us before doing another read() on the pipe, which would mean we deadlock.
In practice we're not supposed to ever see EAGAIN here, since poll() will have just told us the descriptor is ready for writing. But our Windows emulation of poll() will always return "ready" for writing to a pipe descriptor! This is due to 94f4d01932 (mingw: workaround for hangs when sending STDIN, 2020-02-17).
Our best bet in that case is to keep handling other descriptors, as any read() we do may allow the child command to make forward progress (i.e., its write() finishes, and then it read()s from its stdin, freeing up space in the pipe buffer). This means we might busy-loop between poll() and write() on Windows if the child command is slow to read our input, but it's much better than the alternative of deadlocking.
In practice, this busy-looping should be rare:
- for small inputs, we'll just write the whole thing in a single write() anyway, non-blocking or not
- for larger inputs where the child reads input and then processes it before writing (e.g., gpg verifying a signature), we may make a few extra write() calls that get EAGAIN during the initial write, but once it has taken in the whole input, we'll correctly block waiting to read back the data.
- for larger inputs where the child process is streaming output back (like a diff filter), we'll likewise see some extra EAGAINs, but most of them will be followed immediately by a read(), which will let the child command make forward progress.
Of course it won't happen at all for now, since we don't yet use a non-blocking pipe. This is just preparation for when we do.
Helped-by: René Scharfe <l.s.r@web.de> Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2022-08-17 | git-compat-util: make MAX_IO_SIZE define globally available | Jeff King | 2 | -22/+22
We define MAX_IO_SIZE within wrapper.c, but it's useful for any code that wants to do a raw write() for whatever reason (say, because they want different EAGAIN handling). Let's make it available everywhere. The alternative would be adding xwrite_foo() variants to give callers more options. But there's really no reason MAX_IO_SIZE needs to be abstracted away, so this gives callers the most flexibility. Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>
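The kind of caller this enables might look like the sketch below: a raw write() that clamps its request size itself instead of going through xwrite(). EXAMPLE_MAX_IO_SIZE and its 8 MiB value are stand-ins chosen for illustration; the real MAX_IO_SIZE definition lives in git-compat-util.h and may differ per platform.

```c
#include <assert.h>
#include <stddef.h>
#include <unistd.h>

/* Illustrative stand-in for MAX_IO_SIZE; the value is an assumption. */
#define EXAMPLE_MAX_IO_SIZE ((size_t)8 * 1024 * 1024)

/* Hypothetical helper: limit a request to the maximum single-call I/O size. */
static size_t clamp_io_size(size_t len)
{
    return len > EXAMPLE_MAX_IO_SIZE ? EXAMPLE_MAX_IO_SIZE : len;
}

/*
 * Hypothetical helper: one raw write() with the size clamped, leaving
 * EAGAIN and other error handling entirely to the caller.
 */
static ssize_t raw_write_clamped(int fd, const void *buf, size_t len)
{
    assert(buf != NULL || len == 0);
    return write(fd, buf, clamp_io_size(len));
}
```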
2022-08-17 | nonblock: support Windows | René Scharfe | 1 | -0/+27
Implement enable_pipe_nonblock() using the Windows API. This works only for pipes, but that is sufficient for this limited interface. Despite the API calls used, it handles both "named" and anonymous pipes from our pipe() emulation. Signed-off-by: René Scharfe <l.s.r@web.de> Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2022-08-17 | compat: add function to enable nonblocking pipes | Jeff King | 3 | -0/+33
We'd like to be able to make some of our pipes nonblocking so that poll() can be used effectively, but O_NONBLOCK isn't portable. Let's introduce a compat wrapper so this can be abstracted for each platform. The interface is as narrow as possible to let platforms do what's natural there (rather than having to implement fcntl() and a fake O_NONBLOCK for example, or having to handle other types of descriptors). The next commit will add Windows support, at which point we should be covering all platforms in practice. But if we do find some other platform without O_NONBLOCK, we'll return ENOSYS. Arguably we could just trigger a build-time #error in this case, which would catch the problem earlier. But since we're not planning to use this compat wrapper in many code paths, a seldom-seen runtime error may be friendlier for such a platform than blocking compilation completely. Our test suite would still notice it. Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>
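A narrow wrapper of the kind described could be sketched as follows. This is an approximation written from the description above, not a copy of the compat code; the function name enable_pipe_nonblock_sketch() is made up to make that clear.

```c
#include <assert.h>
#include <errno.h>
#include <fcntl.h>

/*
 * Sketch of a compat wrapper: enable non-blocking mode on a pipe
 * descriptor. Returns 0 on success and -1 on failure; platforms
 * without O_NONBLOCK report ENOSYS at runtime rather than failing
 * the build.
 */
static int enable_pipe_nonblock_sketch(int fd)
{
#ifdef O_NONBLOCK
    int flags = fcntl(fd, F_GETFL);

    if (flags < 0)
        return -1;
    return fcntl(fd, F_SETFL, flags | O_NONBLOCK);
#else
    (void)fd;
    errno = ENOSYS;
    return -1;
#endif
}
```

Keeping the interface to a single "make this pipe non-blocking" call is what lets a platform such as Windows implement it natively instead of emulating fcntl().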
2022-08-15 | fetch-pack: add tracing for negotiation rounds | Josh Steadmon | 5 | -7/+75
Currently, negotiation for V0/V1/V2 fetch has trace2 regions covering the entire negotiation process. However, we'd like additional data, such as timing for each round of negotiation or the number of "haves" in each round. Additionally, "independent negotiation" (AKA push negotiation) has no tracing at all. Having this data would allow us to compare the performance of the various negotiation implementations, and to debug unexpectedly slow fetch & push sessions. Add per-round trace2 regions for all negotiation implementations (V0+V1, V2, and independent negotiation), as well as an overall region for independent negotiation. Add trace2 data logging for the number of haves and "in vain" objects for each round, and for the total number of rounds once negotiation completes. Finally, add a few checks into various tests to verify that the number of rounds is logged as expected. Signed-off-by: Josh Steadmon <steadmon@google.com> Acked-by: Jeff Hostetler <jeffhost@microsoft.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2022-08-15 | The thirteenth batch | Junio C Hamano | 1 | -0/+12
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2022-08-15 | is_promisor_object(): fix use-after-free of tree buffer | Jeff King | 2 | -2/+20
Since commit fcc07e980b (is_promisor_object(): free tree buffer after parsing, 2021-04-13), we'll always free the buffers attached to a "struct tree" after searching them for promisor links. But there's an important case where we don't want to do so: if somebody else is already using the tree!
This can happen during a "rev-list --missing=allow-promisor" traversal in a partial clone that is missing one or more trees or blobs. The backtrace for the free looks like this:
    #1 free_tree_buffer tree.c:147
    #2 add_promisor_object packfile.c:2250
    #3 for_each_object_in_pack packfile.c:2190
    #4 for_each_packed_object packfile.c:2215
    #5 is_promisor_object packfile.c:2272
    #6 finish_object__ma builtin/rev-list.c:245
    #7 finish_object builtin/rev-list.c:261
    #8 show_object builtin/rev-list.c:274
    #9 process_blob list-objects.c:63
    #10 process_tree_contents list-objects.c:145
    #11 process_tree list-objects.c:201
    #12 traverse_trees_and_blobs list-objects.c:344
    [...]
We're in the middle of walking through the entries of a tree object via process_tree_contents(). We see a blob (or it could even be another tree entry) that we don't have, so we call is_promisor_object() to check it. That function loops over all of the objects in the promisor packfile, including the tree we're currently walking. When we're done with it there, we free the tree buffer. But as we return to the walk in process_tree_contents(), it's still holding on to a pointer to that buffer, via its tree_desc iterator, and it accesses the freed memory.
Even a trivial use of "--missing=allow-promisor" triggers this problem, as the included test demonstrates (it's just a vanilla --blob:none clone).
We can detect this case by only freeing the tree buffer if it was allocated on our behalf. This is a little tricky since that happens inside parse_object(), and it doesn't tell us whether the object was already parsed, or whether it allocated the buffer itself. But by checking for an already-parsed tree beforehand, we can distinguish the two cases.
That feels a little hacky, and does incur an extra lookup in the object-hash table. But that cost is fairly minimal compared to actually loading objects (and since we're iterating the whole pack here, we're likely to be loading most objects, rather than reusing cached results).
It may also be a good direction for this function in general, as there are other possible optimizations that rely on doing some analysis before parsing:
- we could detect blobs and avoid reading their contents; they can't link to other objects, but parse_object() doesn't know that we don't care about checking their hashes.
- we could avoid allocating object structs entirely for most objects (since we really only need them in the oidset), which would save some memory.
- promisor commits could use the commit-graph rather than loading the object from disk
This commit doesn't do any of those optimizations, but I think it argues that this direction is reasonable, rather than relying on parse_object() and trying to teach it to give us more information about whether it parsed.
The included test fails reliably under SANITIZE=address just when running "rev-list --missing=allow-promisor". Checking the output isn't strictly necessary to detect the bug, but it seems like a reasonable addition given the general lack of coverage for "allow-promisor" in the test suite.
Reported-by: Andrew Olsen <andrew.olsen@koordinates.com> Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>
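The "only free what we allocated" idea can be shown with a toy model. Everything below (struct toy_tree, toy_parse(), toy_scan()) is hypothetical and far simpler than Git's object code; it only illustrates checking the parsed state before parsing, so that a buffer someone else is still walking is left alone.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Toy stand-in for a tree object with a lazily loaded buffer. */
struct toy_tree {
    int parsed;   /* non-zero once buffer has been loaded */
    char *buffer; /* owned by whoever parsed the tree first */
};

/* Load the buffer if it is not loaded yet. Returns 0 on success, -1 on failure. */
static int toy_parse(struct toy_tree *t)
{
    assert(t != NULL);
    if (t->parsed)
        return 0;
    t->buffer = malloc(16);
    if (!t->buffer)
        return -1;
    strcpy(t->buffer, "tree contents");
    t->parsed = 1;
    return 0;
}

static void toy_free_buffer(struct toy_tree *t)
{
    free(t->buffer);
    t->buffer = NULL;
    t->parsed = 0;
}

/*
 * Inspect a tree, freeing its buffer afterwards only if this call is
 * the one that loaded it. A tree that was already parsed may be in
 * use by a caller further up the stack, so its buffer is kept.
 */
static int toy_scan(struct toy_tree *t)
{
    int was_parsed = t->parsed;

    if (toy_parse(t) < 0)
        return -1;
    /* ... look through t->buffer here ... */
    if (!was_parsed)
        toy_free_buffer(t);
    return 0;
}
```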
2022-08-12 | scalar: update technical doc roadmap | Victoria Dye | 1 | -6/+3
Update the Scalar roadmap to reflect the completion of generalizing 'scalar diagnose' into 'git diagnose' and 'git bugreport --diagnose'. Signed-off-by: Victoria Dye <vdye@github.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2022-08-12 | scalar-diagnose: use 'git diagnose --mode=all' | Victoria Dye | 1 | -22/+6
Replace implementation of 'scalar diagnose' with an internal invocation of 'git diagnose --mode=all'. This simplifies the implementation of 'cmd_diagnose' by making it a direct alias of 'git diagnose' and removes some code in 'scalar.c' that is duplicated in 'builtin/diagnose.c'. The simplicity of the alias also sets up a clean deprecation path for 'scalar diagnose' (in favor of 'git diagnose'), if that is desired in the future. This introduces one minor change to the output of 'scalar diagnose', which is that the prefix of the created zip archive is changed from 'scalar_' to 'git-diagnostics-'. Helped-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com> Signed-off-by: Victoria Dye <vdye@github.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2022-08-12 | builtin/bugreport.c: create '--diagnose' option | Victoria Dye | 3 | -3/+90
Create a '--diagnose' option for 'git bugreport' to collect additional information about the repository and write it to a zipped archive. The '--diagnose' option behaves effectively as an alias for simultaneously running 'git bugreport' and 'git diagnose'. In the documentation, users are explicitly recommended to attach the diagnostics alongside a bug report to provide additional context to readers, ideally reducing some back-and-forth between reporters and those debugging the issue. Note that '--diagnose' may take an optional string arg (either 'stats' or 'all'). If specified without the arg, the behavior corresponds to running 'git diagnose' without '--mode'. As with 'git diagnose', this default is intended to help reduce unintentional leaking of sensitive information. Users can also explicitly specify '--diagnose=(stats|all)' to generate the respective archive created by 'git diagnose --mode=(stats|all)'. Suggested-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com> Helped-by: Derrick Stolee <derrickstolee@github.com> Signed-off-by: Victoria Dye <vdye@github.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2022-08-12 | builtin/diagnose.c: add '--mode' option | Victoria Dye | 5 | -4/+84
Create '--mode=<mode>' option in 'git diagnose' to allow users to optionally select non-default diagnostic information to include in the output archive. Additionally, document the currently-available modes, emphasizing the importance of not sharing a '--mode=all' archive publicly due to the presence of sensitive information. Note that the option parsing callback - 'option_parse_diagnose()' - is added to 'diagnose.c' rather than 'builtin/diagnose.c' so that it may be reused in future callers configuring a diagnostics archive. Helped-by: Derrick Stolee <derrickstolee@github.com> Signed-off-by: Victoria Dye <vdye@github.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2022-08-12 | builtin/diagnose.c: create 'git diagnose' builtin | Victoria Dye | 7 | -0/+143
Create a 'git diagnose' builtin to generate a standalone zip archive of repository diagnostics. The "diagnose" functionality was originally implemented for Scalar in aa5c79a331 (scalar: implement `scalar diagnose`, 2022-05-28). However, the diagnostics gathered are not specific to Scalar-cloned repositories and can be useful when diagnosing issues in any Git repository. Helped-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com> Helped-by: Derrick Stolee <derrickstolee@github.com> Signed-off-by: Victoria Dye <vdye@github.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2022-08-12 | diagnose.c: add option to configure archive contents | Victoria Dye | 3 | -10/+39
Update 'create_diagnostics_archive()' to take an argument 'mode'. When archiving diagnostics for a repository, 'mode' is used to selectively include/exclude information based on its value. The initial options for 'mode' are:
* DIAGNOSE_NONE: do not collect any diagnostics or create an archive (no-op).
* DIAGNOSE_STATS: collect basic repository metadata (Git version, repo path, filesystem available space) as well as sizing and count statistics for the repository's objects and packfiles.
* DIAGNOSE_ALL: collect basic repository metadata, sizing/count statistics, and copies of the '.git', '.git/hooks', '.git/info', '.git/logs', and '.git/objects/info' directories.
These modes are introduced to provide users the option to collect diagnostics without the sensitive information included in copies of '.git' dir contents. At the moment, only 'scalar diagnose' uses 'create_diagnostics_archive()' (with a hardcoded 'DIAGNOSE_ALL' mode to match existing functionality), but more callers will be introduced in subsequent patches.
Finally, refactor from a hardcoded set of 'add_directory_to_archiver()' calls to iterative invocations gated by 'DIAGNOSE_ALL'. This allows for easier future modification of the set of directories to archive and improves error reporting when 'add_directory_to_archiver()' fails.
Helped-by: Derrick Stolee <derrickstolee@github.com> Signed-off-by: Victoria Dye <vdye@github.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
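The gating described above can be pictured with a small sketch. The enum member names and the directory list come from the commit message; their numeric order, the plan_archive() function, and its output format are assumptions made purely for illustration and do not reflect the real 'create_diagnostics_archive()' body.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/* Mode names from the commit message; the ordering is an assumption. */
enum diagnose_mode {
    DIAGNOSE_NONE,
    DIAGNOSE_STATS,
    DIAGNOSE_ALL
};

/* Directories copied only in DIAGNOSE_ALL mode, per the commit message. */
static const char *const archive_dirs[] = {
    ".git", ".git/hooks", ".git/info", ".git/logs", ".git/objects/info"
};

/*
 * Hypothetical planner: print what would be collected for a mode and
 * return the number of items. The directory copies are driven by a
 * loop over a table rather than a hardcoded series of calls.
 */
static int plan_archive(enum diagnose_mode mode, FILE *out)
{
    int items = 0;
    size_t i;

    assert(out != NULL);
    if (mode == DIAGNOSE_NONE)
        return 0; /* no-op: nothing collected, no archive */

    fprintf(out, "stats: repository metadata and object/packfile counts\n");
    items++;

    if (mode == DIAGNOSE_ALL) {
        for (i = 0; i < sizeof(archive_dirs) / sizeof(archive_dirs[0]); i++) {
            fprintf(out, "dir: %s\n", archive_dirs[i]);
            items++;
        }
    }
    return items;
}
```

Driving the copies from a table means adding or removing a directory is a one-line change, and a failure can name the directory that caused it.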
2022-08-12 | scalar-diagnose: move functionality to common location | Victoria Dye | 4 | -200/+227
Move the core functionality of 'scalar diagnose' into a new 'diagnose.[c,h]' library to prepare for new callers in the main Git tree generating diagnostic archives. These callers will be introduced in subsequent patches. While this patch appears large, it is mostly made up of moving code out of 'scalar.c' and into 'diagnose.c'. Specifically, the functions
- dir_file_stats_objects()
- dir_file_stats()
- count_files()
- loose_objs_stats()
- add_directory_to_archiver()
are all copied verbatim from 'scalar.c'. The 'create_diagnostics_archive()' function is a mostly identical (partial) copy of 'cmd_diagnose()', with the primary changes being that 'zip_path' is an input and "Enlistment root" is corrected to "Repository root" in the archiver log.
Helped-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com> Signed-off-by: Victoria Dye <vdye@github.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2022-08-12 | scalar-diagnose: move 'get_disk_info()' to 'compat/' | Victoria Dye | 3 | -52/+58
Move 'get_disk_info()' function into 'compat/'. Although Scalar-specific code is generally not part of the main Git tree, 'get_disk_info()' will be used in subsequent patches by additional callers beyond 'scalar diagnose'. This patch prepares for that change, at which point this platform-specific code should be part of 'compat/' as a matter of convention. The function is copied *mostly* verbatim, with two exceptions:
* '#ifdef WIN32' is replaced with '#ifdef GIT_WINDOWS_NATIVE' to allow 'statvfs' to be used with Cygwin.
* the 'struct strbuf buf' and 'int res' (as well as their corresponding cleanup & return) are moved outside of the '#ifdef' block.
Signed-off-by: Victoria Dye <vdye@github.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2022-08-12 | scalar-diagnose: add directory to archiver more gently | Victoria Dye | 1 | -2/+8
If a directory added to the 'scalar diagnose' archiver does not exist, warn and return 0 from 'add_directory_to_archiver()' rather than failing with a fatal error. This handles a failure edge case where '.git/logs' has not yet been created when running 'scalar diagnose', but extends to any situation where a directory may be missing in the '.git' dir. Now, when a directory is missing, a warning is captured in the diagnostic logs. This provides a user with more complete information than if 'scalar diagnose' simply failed with an error. Helped-by: Junio C Hamano <gitster@pobox.com> Signed-off-by: Victoria Dye <vdye@github.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2022-08-12 | scalar-diagnose: avoid 32-bit overflow of size_t | Victoria Dye | 1 | -1/+1
Avoid 32-bit size_t overflow when reporting the available disk space in 'get_disk_info' by casting the block size and available block count to 'off_t' before multiplying them. Without this change, 'st_mult' would (correctly) report a size_t overflow on 32-bit systems at or exceeding 2^32 bytes of available space. Note that 'off_t' is a 64-bit integer even on 32-bit systems due to the inclusion of '#define _FILE_OFFSET_BITS 64' in 'git-compat-util.h' (see b97e911643 (Support for large files on 32bit systems., 2007-02-17)). Helped-by: Junio C Hamano <gitster@pobox.com> Signed-off-by: Victoria Dye <vdye@github.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
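The arithmetic behind this fix can be reproduced with fixed-width types. In the sketch below, uint32_t plays the role of a 32-bit size_t and int64_t the role of a 64-bit off_t; both function names are hypothetical. Note that the plain 32-bit multiply silently wraps, whereas 'st_mult' would detect the overflow and report it.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Multiply in 32 bits, as size_t arithmetic would on a 32-bit system.
 * The product wraps modulo 2^32 when it does not fit.
 */
static uint32_t avail_bytes_32(uint32_t block_size, uint32_t blocks)
{
    return block_size * blocks;
}

/*
 * Widen both operands before multiplying, mirroring the cast to a
 * 64-bit off_t described in the commit message.
 */
static int64_t avail_bytes_64(uint32_t block_size, uint32_t blocks)
{
    return (int64_t)block_size * (int64_t)blocks;
}
```

For example, 4096-byte blocks times 2097152 free blocks is 2^33 bytes (8 GiB): the 32-bit product wraps to 0, while the widened product is 8589934592.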
2022-08-12 | scalar-diagnose: use "$GIT_UNZIP" in test | Victoria Dye | 1 | -4/+4
Use the "$GIT_UNZIP" test variable rather than verbatim 'unzip' to unzip the 'scalar diagnose' archive. Using "$GIT_UNZIP" is needed to run the Scalar tests on systems where 'unzip' is not in the system path. Signed-off-by: Victoria Dye <vdye@github.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2022-08-12 | The twelfth batch | Junio C Hamano | 1 | -0/+4
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2022-08-11 | rev-list: support human-readable output for `--disk-usage` | Li Linchao | 3 | -4/+57
The '--disk-usage' option for git-rev-list was introduced in 16950f8384 (rev-list: add --disk-usage option for calculating disk usage, 2021-02-09). This is very useful for people inspecting the object usage information of their Git repositories, but the resulting number is quite hard for a human to read. Teach git rev-list to output a human-readable result when using '--disk-usage'. Signed-off-by: Li Linchao <lilinchao@oschina.cn> Signed-off-by: Junio C Hamano <gitster@pobox.com>
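What "human readable" means for a byte count can be sketched as below. This is an illustrative formatter only: humanise_bytes_sketch(), its unit thresholds, and its two-decimal output are assumptions, and the exact format produced by git rev-list may differ.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical formatter: render a byte count using binary units.
 * Writes a NUL-terminated string into out (truncating if needed).
 */
static void humanise_bytes_sketch(uint64_t bytes, char *out, size_t outlen)
{
    const double kib = 1024.0;

    assert(out != NULL && outlen > 0);
    if (bytes >= 1024ULL * 1024 * 1024)
        snprintf(out, outlen, "%.2f GiB", bytes / (kib * kib * kib));
    else if (bytes >= 1024ULL * 1024)
        snprintf(out, outlen, "%.2f MiB", bytes / (kib * kib));
    else if (bytes >= 1024)
        snprintf(out, outlen, "%.2f KiB", bytes / kib);
    else
        snprintf(out, outlen, "%llu bytes", (unsigned long long)bytes);
}
```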
2022-08-11 | Git 2.37.2 (tag: v2.37.2) | Junio C Hamano | 2 | -1/+25
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2022-08-10 | fsck: downgrade tree badFilemode to "info" | Jeff King | 2 | -1/+18
The previous commit un-broke the "badFileMode" check; before then it was literally testing nothing. And as far as I can tell, it has been so since the very initial version of fsck. The current severity of "badFileMode" is just "warning". But in the --strict mode used by transfer.fsckObjects, that is elevated to an error. This will potentially cause hassle for users, because historical objects with bad modes will suddenly start causing pushes to many server operators to be rejected. At the same time, these bogus modes aren't actually a big risk. Because we canonicalize them everywhere besides fsck, they can't cause too much mischief in the real world. The worst thing you can do is end up with two almost-identical trees that have different hashes but are interpreted the same. That will generally cause things to be inefficient rather than wrong, and is a bug somebody working on a Git implementation would want to fix, but probably not worth inconveniencing users by refusing to push or fetch. So let's downgrade this to "info" by default, which is our setting for "mention this when fscking, but don't ever reject, even under strict mode". If somebody really wants to be paranoid, they can still adjust the level using config. Suggested-by: Xavier Morel <xavier.morel@masklinn.net> Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2022-08-10 | fsck: actually detect bad file modes in trees | Jeff King | 2 | -1/+15
We use the normal tree_desc code to iterate over trees in fsck, meaning we only see the canonicalized modes it returns. And hence we'd never see anything unexpected, since it will coerce literally any garbage into one of our normal and accepted modes. We can use the new RAW_MODES flag to see the real modes, and then use the existing code to actually analyze them. The existing code is written as allow-known-good, so there's not much point in testing a variety of breakages. The one tested here should be S_IFREG but with nonsense permissions. Do note that the error-reporting here isn't great. We don't mention the specific bad mode, but just that the tree has one or more broken modes. But when you go to look at it with "git ls-tree", we'll report the canonicalized mode! This isn't ideal, but given that this should come up rarely, and that any number of other tree corruptions might force you into looking at the binary bytes via "cat-file", it's not the end of the world. And it's something we can improve on top later if we choose. Reported-by: Xavier Morel <xavier.morel@masklinn.net> Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2022-08-10 | tree-walk: add a mechanism for getting non-canonicalized modes | Jeff King | 4 | -9/+19
When using init_tree_desc() and tree_entry() to iterate over a tree, we always canonicalize the modes coming out of the tree. This is a good thing to prevent bugs or oddities in normal code paths, but it's counter-productive for tools like fsck that want to see the exact contents. We can address this by adding an option to avoid the extra canonicalization. A few notes on the implementation:
- I've attached the new option to the tree_desc struct itself. The actual code change is in decode_tree_entry(), which is in turn called by the public update_tree_entry(), tree_entry(), and init_tree_desc() functions, plus their "gently" counterparts. By letting it ride along in the struct, we can avoid changing the signature of those functions, which are called many times. Plus it's conceptually simpler: you really want a particular iteration of a tree to be "raw" or not, rather than individual calls.
- We still have to set the new option somewhere. The struct is initialized by init_tree_desc(). I added the new flags field only to the "gently" version. That avoids disturbing the much more numerous non-gentle callers, and it makes sense that anybody being careful about looking at raw modes would also be careful about bogus trees (i.e., the caller will be something like fsck in the first place).
Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>
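To see why canonicalization hides bad modes from fsck, consider the toy model below. The TOY_* names, struct toy_tree_desc, and both functions are hypothetical; the canonicalization rules shown (regular files become 0644 or 0755, anything unrecognized is treated as a gitlink) are a simplified approximation of what Git does, not its actual code.

```c
#include <assert.h>
#include <stddef.h>

/* Octal type bits as stored in tree entries (toy copies of the usual values). */
#define TOY_IFMT     0170000u
#define TOY_IFREG    0100000u
#define TOY_IFLNK    0120000u
#define TOY_IFDIR    0040000u
#define TOY_IFGITLNK 0160000u

/* Hypothetical flag, standing in for the "raw modes" option. */
#define TOY_RAW_MODES 1u

/* Toy iterator state: the flag rides along in the struct. */
struct toy_tree_desc {
    unsigned int flags;
};

/* Coerce any mode into one of the few accepted values. */
static unsigned int toy_canon_mode(unsigned int mode)
{
    switch (mode & TOY_IFMT) {
    case TOY_IFREG:
        return TOY_IFREG | ((mode & 0100) ? 0755 : 0644);
    case TOY_IFLNK:
        return TOY_IFLNK;
    case TOY_IFDIR:
        return TOY_IFDIR;
    default:
        return TOY_IFGITLNK;
    }
}

/* Return the mode an iteration would report for a raw on-disk value. */
static unsigned int toy_entry_mode(const struct toy_tree_desc *desc,
                                   unsigned int raw)
{
    assert(desc != NULL);
    return (desc->flags & TOY_RAW_MODES) ? raw : toy_canon_mode(raw);
}
```

A regular file stored with the nonsense mode 0100664 comes back as 0100644 from a normal iteration, so a checker never sees the problem; with the raw flag set on the iterator, the same entry reports 0100664 and can be flagged.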
2022-08-10 | bundle-uri: add example bundle organization | Derrick Stolee | 1 | -0/+105
The previous change introduced the bundle URI design document. It creates a flexible set of options that allow bundle providers many ways to organize Git object data and speed up clones and fetches. It is particularly important that we have flexibility so we can apply future advancements as new ideas for efficiently organizing Git data are discovered. However, the design document does not provide even an example of how bundles could be organized, and that makes it difficult to envision how the feature should work at the end of the implementation plan. Add a section that details how a bundle provider could work, including using the Git server advertisement for multiple geo-distributed servers. This organization is based on the GVFS Cache Servers which have successfully used similar ideas to provide fast object access and reduced server load for very large repositories. Signed-off-by: Derrick Stolee <derrickstolee@github.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2022-08-10 | docs: document bundle URI standard | Derrick Stolee | 2 | -0/+469
Introduce the idea of bundle URIs to the Git codebase through an aspirational design document. This document includes the full design intended to include the feature in its fully-implemented form. This will take several steps as detailed in the Implementation Plan section. By committing this document now, it can be used to motivate changes necessary to reach these final goals. The design can still be altered as new information is discovered. Signed-off-by: Derrick Stolee <derrickstolee@github.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2022-08-10 | mergetools: vimdiff: simplify tabfirst | Felipe Contreras | 1 | -25/+19
If we wrap the tabdo command there's no need for a separate command call. Signed-off-by: Felipe Contreras <felipe.contreras@gmail.com> Reviewed-by: Fernando Ramos <greenfoo@u92.eu> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2022-08-10 | mergetools: vimdiff: fix single window layouts | Felipe Contreras | 1 | -12/+8
Layouts with a single window other than "MERGED" do not work (e.g. "LOCAL" or "MERGED+LOCAL"). This is because, as the documentation of bufdo says:
    The last buffer (or where an error occurred) becomes the current buffer.
And we always do bufdo at the end. Additionally, we do it only once, when it should be per tab. Fix this by doing it once per tab right after it's created and before any buffer is switched.
Cc: Fernando Ramos <greenfoo@u92.eu> Signed-off-by: Felipe Contreras <felipe.contreras@gmail.com> Reviewed-by: Fernando Ramos <greenfoo@u92.eu> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2022-08-10 | mergetools: vimdiff: rework tab logic | Felipe Contreras | 1 | -28/+22
If we treat tabs specially, the logic becomes much simpler. Cc: Fernando Ramos <greenfoo@u92.eu> Signed-off-by: Felipe Contreras <felipe.contreras@gmail.com> Reviewed-by: Fernando Ramos <greenfoo@u92.eu> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2022-08-10mergetools: vimdiff: fix for diffoptFelipe Contreras1-18/+18
When diffopt has hiddenoff set and there's only one window (as is the case in the single window mode) the diff mode is turned off. We don't want that, so turn that option off. Cc: Fernando Ramos <greenfoo@u92.eu> Signed-off-by: Felipe Contreras <felipe.contreras@gmail.com> Reviewed-by: Fernando Ramos <greenfoo@u92.eu> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2022-08-10mergetools: vimdiff: silence annoying messagesFelipe Contreras1-2/+2
When using the single window mode we are greeted with the following warning: "./content_LOCAL_8975" 6L, 28B "./content_BASE_8975" 6 lines, 29 bytes "./content_REMOTE_8975" 6 lines, 29 bytes "content" 16 lines, 115 bytes Press ENTER or type command to continue every time. Silence that. Suggested-by: Fernando Ramos <greenfoo@u92.eu> Signed-off-by: Felipe Contreras <felipe.contreras@gmail.com> Reviewed-by: Fernando Ramos <greenfoo@u92.eu> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2022-08-10mergetools: vimdiff: make vimdiff3 actually workFelipe Contreras1-18/+18
When vimdiff3 was added in 7c147b77d3 (mergetools: add vimdiff3 mode, 2014-04-20), the description made clear the intention: It's similar to the default, except that the other windows are hidden. This ensures that removed/added colors are still visible on the main merge window, but the other windows not visible. However, in 0041797449 (vimdiff: new implementation with layout support, 2022-03-30) this was broken by generating a command that never creates windows, and therefore vim never shows the diff. The layout support implementation broke the whole purpose of vimdiff3, and simply shows MERGED, which is no different from simply opening the file with vim. In order to show the diff, the windows need to be created first, and then when they are hidden the diff remains (if hiddenoff isn't set), but by setting the `hidden` option the initial buffers are marked as hidden thus making the feature work. Suggested-by: Fernando Ramos <greenfoo@u92.eu> Signed-off-by: Felipe Contreras <felipe.contreras@gmail.com> Reviewed-by: Fernando Ramos <greenfoo@u92.eu> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2022-08-10mergetools: vimdiff: fix commentFelipe Contreras1-2/+2
The name of the variable is wrong, and it can be set to anything, like 1. Signed-off-by: Felipe Contreras <felipe.contreras@gmail.com> Reviewed-by: Fernando Ramos <greenfoo@u92.eu> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2022-08-10doc add: renormalize is not idempotent for CRCRLFPhilip Oakley1-1/+3
Bug report https://lore.kernel.org/git/AM0PR02MB56357CC96B702244F3271014E8DC9@AM0PR02MB5635.eurprd02.prod.outlook.com/ noted that a file containing \r\r\n needed renormalising twice. This is by design. Lone CR characters, not paired with an LF, are left unchanged. Note this limitation of the "clean" filter in the documentation. Renormalize was introduced at 9472935d81e (add: introduce "--renormalize", Torsten Bögershausen, 2017-11-16) Signed-off-by: Philip Oakley <philipoakley@iee.email> Reviewed-by: Torsten Bögershausen <tboegi@web.de> Signed-off-by: Junio C Hamano <gitster@pobox.com>
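The two-pass behavior can be illustrated with a toy model. This is only a sketch, not Git's conversion code: `to_lf` is a hypothetical helper that assumes the "clean" step rewrites a CR only when it is immediately followed by an LF and leaves lone CRs untouched.

```python
def to_lf(data: bytes) -> bytes:
    """Toy stand-in for a CRLF -> LF "clean" pass (hypothetical helper).

    A CR directly followed by LF is dropped; a lone CR is kept as-is.
    """
    return data.replace(b"\r\n", b"\n")


# A line ending in CR CR LF: the first pass consumes only the inner CRLF,
# leaving a new CRLF behind for the next pass.
once = to_lf(b"line\r\r\n")
twice = to_lf(once)
```

Under that assumption the first pass still leaves a CRLF in the data, so a second pass is needed before the content stops changing.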
2022-08-08rm: integrate with sparse-indexShaoxuan Yuan3-3/+20
Enable the sparse index within the `git-rm` command. The `p2000` tests demonstrate a ~92% execution time reduction for 'git rm' using a sparse index.

Test                              HEAD~1            HEAD
--------------------------------------------------------------------------
2000.74: git rm ... (full-v3)     0.41(0.37+0.05)   0.43(0.36+0.07) +4.9%
2000.75: git rm ... (full-v4)     0.38(0.34+0.05)   0.39(0.35+0.05) +2.6%
2000.76: git rm ... (sparse-v3)   0.57(0.56+0.01)   0.05(0.05+0.00) -91.2%
2000.77: git rm ... (sparse-v4)   0.57(0.55+0.02)   0.03(0.03+0.00) -94.7%

----

Also, normalize a behavioral difference of `git-rm` under sparse-index. See related discussion [1]. Running `git rm` on a sparse-directory entry within a sparse-index enabled repo behaves differently from running it on a sparse directory within a sparse-checkout enabled repo. For example, in a sparse-index repo, where 'folder1' is a sparse-directory entry, `git rm -r --sparse folder1` provides this:

rm 'folder1/'

Whereas in a sparse-checkout repo *without* sparse-index, doing so provides this:

rm 'folder1/0/0/0'
rm 'folder1/0/1'
rm 'folder1/a'

Because running `git rm` on a sparse-directory entry does not need to expand the index, we should accept the current behavior, which is faster than expanding the sparse-directory entry to match the sparse-checkout situation. Modify a previous test so that this difference is not treated as an error.

[1] https://github.com/ffyuanda/git/pull/6#discussion_r934861398

Helped-by: Victoria Dye <vdye@github.com> Helped-by: Derrick Stolee <derrickstolee@github.com> Signed-off-by: Shaoxuan Yuan <shaoxuan.yuan02@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
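The percentages in the timing table can be reproduced from the elapsed times it reports. The sketch below assumes the last column is the relative change of the HEAD time against the HEAD~1 time; `percent_change` and the `timings` dictionary are illustrative names, not part of the perf test suite.

```python
def percent_change(before: float, after: float) -> float:
    """Relative change from `before` to `after`, in percent."""
    return (after - before) / before * 100.0


# Elapsed seconds (HEAD~1, HEAD) copied from the p2000 results above.
timings = {
    "full-v3": (0.41, 0.43),
    "full-v4": (0.38, 0.39),
    "sparse-v3": (0.57, 0.05),
    "sparse-v4": (0.57, 0.03),
}

for name, (before, after) in timings.items():
    print(f"{name}: {percent_change(before, after):+.1f}%")
```

With these inputs the two sparse-index rows come out at -91.2% and -94.7%, matching the table.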
2022-08-08rm: expand the index only when necessaryShaoxuan Yuan2-4/+28
Remove the unconditional `ensure_full_index()` call so `git-rm` does not expand the index when the expansion is unnecessary, i.e. when <pathspec> cannot match anything outside of the sparse-checkout definition. Expand the index when the <pathspec> needs an expanded index, i.e. the <pathspec> contains a wildcard that may need a full index or the <pathspec> is simply outside of the sparse-checkout definition. Notice that the test 'rm pathspec expands index when necessary' in t1092 *is* testing this code change behavior, though it will be marked as 'test_expect_success' only in the next patch, where we officially mark `command_requires_full_index = 0`, so the index does not expand unless we tell it to do so. Notice that because we also want `ensure_not_expanded` to record the stdout and stderr from the Git command, a corresponding modification is also included in this patch. The reason we want "sparse-index-out" and "sparse-index-err" is that we need to make sure there is no error from the Git command itself, so we can rely on the `test_region` result and determine if the index is expanded or not. Helped-by: Victoria Dye <vdye@github.com> Helped-by: Derrick Stolee <derrickstolee@github.com> Signed-off-by: Shaoxuan Yuan <shaoxuan.yuan02@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2022-08-08pathspec.h: move pathspec_needs_expanded_index() from reset.c to hereShaoxuan Yuan3-83/+102
Method pathspec_needs_expanded_index() in reset.c from 4d1cfc1351 (reset: make --mixed sparse-aware, 2021-11-29) is reusable when we need to verify if the index needs to be expanded when the command is utilizing a pathspec rather than a literal path. Move it to pathspec.h for reusability. Add a few items to the function so it can better serve its purpose as a standalone public function: * Add a check in front so if the index is not sparse, return early since no expansion is needed. * It now takes an arbitrary 'struct index_state' pointer instead of using `the_index` and `active_cache`. * Add documentation to the function. Helped-by: Victoria Dye <vdye@github.com> Helped-by: Derrick Stolee <derrickstolee@github.com> Signed-off-by: Shaoxuan Yuan <shaoxuan.yuan02@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
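The decision this function makes can be pictured with a toy model. The sketch below is a deliberate simplification and not the C implementation: `needs_expanded_index`, `literal_prefix`, and the representation of the index as a list of collapsed directory names such as 'folder1/' are all assumptions made for illustration. It returns early for a non-sparse index, and otherwise asks for expansion when a literal path points inside a collapsed directory or when a wildcard could match something inside one.

```python
WILDCARDS = "*?["


def literal_prefix(spec: str) -> str:
    """Part of the pathspec before its first wildcard character."""
    for i, ch in enumerate(spec):
        if ch in WILDCARDS:
            return spec[:i]
    return spec


def needs_expanded_index(index_is_sparse, sparse_dirs, pathspecs):
    """Toy model: sparse_dirs holds collapsed entries such as 'folder1/'."""
    if not index_is_sparse:
        return False  # early return: nothing is collapsed, nothing to expand
    for spec in pathspecs:
        prefix = literal_prefix(spec)
        has_wildcard = prefix != spec
        for sparse_dir in sparse_dirs:
            if has_wildcard:
                # The wildcard may match a path hidden in the collapsed dir.
                if sparse_dir.startswith(prefix) or prefix.startswith(sparse_dir):
                    return True
            elif spec.startswith(sparse_dir):
                # Literal path strictly inside a collapsed directory.
                return True
    return False
```

In this model a pathspec naming the collapsed directory itself ('folder1') does not trigger expansion, while 'folder1/a' or '*.txt' does.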
2022-08-08t1092: add tests for `git-rm`Shaoxuan Yuan1-0/+57
Add tests for `git-rm` to make sure it behaves as expected when <pathspec> is either inside or outside of the sparse-checkout definition. Helped-by: Victoria Dye <vdye@github.com> Helped-by: Derrick Stolee <derrickstolee@github.com> Signed-off-by: Shaoxuan Yuan <shaoxuan.yuan02@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2022-08-08unpack-trees: unpack new trees as sparse directoriesVictoria Dye2-10/+113
If 'unpack_single_entry()' is unpacking a new directory tree (that is, one not already present in the index) into a sparse index, unpack the tree as a sparse directory rather than traversing its contents and unpacking each file individually. This helps keep the sparse index as collapsed as possible in cases such as 'git reset --hard' restoring an outside-of-cone directory removed with 'git rm -r --sparse'. Without this patch, 'unpack_single_entry()' will only unpack a directory into the index as a sparse directory (rather than traversing into it and unpacking its files one-by-one) if an entry with the same name already exists in the index. This patch allows sparse directory unpacking without a matching index entry when the following conditions are met: 1. the directory's path is outside the sparse cone, and 2. there are no children of the directory in the index. If a directory meets these requirements (as determined by 'is_new_sparse_dir()'), 'unpack_single_entry()' unpacks the sparse directory index entry and propagates the decision back up to 'unpack_callback()' to prevent unnecessary tree traversal into the unpacked directory. Reported-by: Shaoxuan Yuan <shaoxuan.yuan02@gmail.com> Signed-off-by: Victoria Dye <vdye@github.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
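The two conditions can be written down as a small predicate. This is a toy model rather than the C code: the function below only borrows the name 'is_new_sparse_dir' for readability, and its arguments (a list of cone directories and a flat list of index paths) are assumptions. It also assumes a directory counts as inside the cone when it lies under an included directory or is an ancestor of one.

```python
def is_new_sparse_dir(dir_path, cone_dirs, index_paths):
    """Toy check: outside the sparse cone AND no children in the index."""
    prefix = dir_path.rstrip("/") + "/"

    def inside_cone():
        for cone in cone_dirs:
            cone_prefix = cone.rstrip("/") + "/"
            # Under an included directory, or an ancestor of one.
            if prefix.startswith(cone_prefix) or cone_prefix.startswith(prefix):
                return True
        return False

    has_children = any(path.startswith(prefix) for path in index_paths)
    return not inside_cone() and not has_children
```

For example, with cone directory 'deep' and an index holding only 'a' and 'deep/a', the directory 'folder1' qualifies; it stops qualifying as soon as 'folder1/a' is present in the index.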
2022-08-08cache.h: create 'index_name_pos_sparse()'Victoria Dye2-0/+14
Add 'index_name_pos_sparse()', which behaves the same as 'index_name_pos()', except that it does not expand a sparse index to search for an entry inside a sparse directory. 'index_entry_exists()' was originally implemented in 20ec2d034c (reset: make sparse-aware (except --mixed), 2021-11-29) as an alternative to 'index_name_pos()' to allow callers to search for an index entry without expanding a sparse index. However, that particular use case only required knowing whether the requested entry existed, so 'index_entry_exists()' does not return the index positioning information provided by 'index_name_pos()'. This patch implements 'index_name_pos_sparse()' to accommodate callers that need the positioning information of 'index_name_pos()', but do not want to expand the index. Signed-off-by: Victoria Dye <vdye@github.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
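The difference between an existence check and a position lookup can be sketched on a sorted list of names. This is an illustration only: `name_pos` and `entry_exists` are hypothetical stand-ins, and the sketch assumes the convention that a missing name yields the negative value -1 - pos, where pos is the insertion point. It does not model sparse directories or index expansion.

```python
from bisect import bisect_left


def name_pos(sorted_names, name):
    """Position of `name` if present, otherwise -1 - insertion_point."""
    pos = bisect_left(sorted_names, name)
    if pos < len(sorted_names) and sorted_names[pos] == name:
        return pos
    return -pos - 1


def entry_exists(sorted_names, name):
    """Existence only: the insertion point is discarded."""
    return name_pos(sorted_names, name) >= 0


names = ["a", "deep/a", "folder1/"]
found = name_pos(names, "deep/a")   # index of an existing entry
missing = name_pos(names, "b")      # negative, encodes where "b" would go
```

A caller that needs to know where a missing entry would be inserted can recover that from the negative return value, which a boolean existence check cannot provide.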
2022-08-08oneway_diff: handle removed sparse directoriesVictoria Dye1-0/+5
Update 'do_oneway_diff()' to perform a 'diff_tree_oid()' on removed sparse directories, as it does for added or modified sparse directories (see 9eb00af562 (diff-lib: handle index diffs with sparse dirs, 2021-07-14)). At the moment, this update is unreachable code because 'unpack_trees()' (currently the only way 'oneway_diff()' can be called, via 'diff_cache()') will always traverse trees down to the individual removed files of a deleted sparse directory. A subsequent patch will change this to better preserve a sparse index in other uses of 'unpack_trees()', e.g. 'git reset --hard'. However, making that change without this patch would result in (among other issues) 'git status' printing only the name of a deleted sparse directory, not its contents. To avoid introducing that bug, 'do_oneway_diff()' is updated before modifying 'unpack_trees()'. Signed-off-by: Victoria Dye <vdye@github.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2022-08-08checkout: fix nested sparse directory diff in sparse indexVictoria Dye2-0/+9
Add the 'recursive' diff flag to the local changes reporting done by 'git checkout' in 'show_local_changes()'. Without the flag enabled, unexpanded sparse directories will not be recursed into to report the diff of each file's contents, resulting in the reported local changes including "modified" sparse directories. The same issue was found and fixed for 'git status' in 2c521b0e49 (status: fix nested sparse directory diff in sparse index, 2022-03-01). Signed-off-by: Victoria Dye <vdye@github.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2022-08-08The eleventh batchJunio C Hamano1-0/+20
Signed-off-by: Junio C Hamano <gitster@pobox.com>