summaryrefslogtreecommitdiffstats
path: root/advice.c (unfollow)
Commit message (Collapse)AuthorFilesLines
2018-05-22Git 2.17.1v2.17.1Junio C Hamano3-2/+18
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2018-05-22Git 2.16.4v2.16.4Junio C Hamano3-2/+7
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2018-05-22Git 2.15.2v2.15.2Junio C Hamano2-1/+4
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2018-05-22Git 2.14.4v2.14.4Junio C Hamano3-2/+7
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2018-05-22Git 2.13.7v2.13.7Junio C Hamano3-2/+22
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2018-05-22fsck: complain when .gitmodules is a symlinkJeff King2-2/+38
We've recently forbidden .gitmodules to be a symlink in verify_path(). And it's an easy way to circumvent our fsck checks for .gitmodules content. So let's complain when we see it. Signed-off-by: Jeff King <peff@peff.net>
2018-05-22index-pack: check .gitmodules files with --strictJeff King3-0/+60
Now that the internal fsck code has all of the plumbing we need, we can start checking incoming .gitmodules files. Naively, it seems like we would just need to add a call to fsck_finish() after we've processed all of the objects. And that would be enough to cover the initial test included here. But there are two extra bits: 1. We currently don't bother calling fsck_object() at all for blobs, since it has traditionally been a noop. We'd actually catch these blobs in fsck_finish() at the end, but it's more efficient to check them when we already have the object loaded in memory. 2. The second pass done by fsck_finish() needs to access the objects, but we're actually indexing the pack in this process. In theory we could give the fsck code a special callback for accessing the in-pack data, but it's actually quite tricky: a. We don't have an internal efficient index mapping oids to packfile offsets. We only generate it on the fly as part of writing out the .idx file. b. We'd still have to reconstruct deltas, which means we'd basically have to replicate all of the reading logic in packfile.c. Instead, let's avoid running fsck_finish() until after we've written out the .idx file, and then just add it to our internal packed_git list. This does mean that the objects are "in the repository" before we finish our fsck checks. But unpack-objects already exhibits this same behavior, and it's an acceptable tradeoff here for the same reason: the quarantine mechanism means that pushes will be fully protected. In addition to a basic push test in t7415, we add a sneaky pack that reverses the usual object order in the pack, requiring that index-pack access the tree and blob during the "finish" step. This already works for unpack-objects (since it will have written out loose objects), but we'll check it with this sneaky pack for good measure. Signed-off-by: Jeff King <peff@peff.net>
2018-05-22unpack-objects: call fsck_finish() after fscking objectsJeff King2-1/+11
As with the previous commit, we must call fsck's "finish" function in order to catch any queued objects for .gitmodules checks. This second pass will be able to access any incoming objects, because we will have exploded them to loose objects by now. This isn't quite ideal, because it means that bad objects may have been written to the object database (and a subsequent operation could then reference them, even if the other side doesn't send the objects again). However, this is sufficient when used with receive.fsckObjects, since those loose objects will all be placed in a temporary quarantine area that will get wiped if we find any problems. Signed-off-by: Jeff King <peff@peff.net>
2018-05-22fsck: call fsck_finish() after fscking objectsJeff King2-0/+7
Now that the internal fsck code is capable of checking .gitmodules files, we just need to teach its callers to use the "finish" function to check any queued objects. With this, we can now catch the malicious case in t7415 with git-fsck. Signed-off-by: Jeff King <peff@peff.net>
2018-05-22fsck: check .gitmodules contentJeff King1-1/+59
This patch detects and blocks submodule names which do not match the policy set forth in submodule-config. These should already be caught by the submodule code itself, but putting the check here means that newer versions of Git can protect older ones from malicious entries (e.g., a server with receive.fsckObjects will block the objects, protecting clients which fetch from it). As a side effect, this means fsck will also complain about .gitmodules files that cannot be parsed (or were larger than core.bigFileThreshold). Signed-off-by: Jeff King <peff@peff.net>
2018-05-22fsck: handle promisor objects in .gitmodules checkJeff King1-0/+3
If we have a tree that points to a .gitmodules blob but don't have that blob, we can't check its contents. This produces an fsck error when we encounter it. But in the case of a promisor object, this absence is expected, and we must not complain. Note that this can technically circumvent our transfer.fsckObjects check. Imagine a client fetches a tree, but not the matching .gitmodules blob. An fsck of the incoming objects will show that we don't have enough information. Later, we do fetch the actual blob. But we have no idea that it's a .gitmodules file. The only ways to get around this would be to re-scan all of the existing trees whenever new ones enter (which is expensive), or to somehow persist the gitmodules_found set between fsck runs (which is complicated). In practice, it's probably OK to ignore the problem. Any repository which has all of the objects (including the one serving the promisor packs) can perform the checks. Since promisor packs are inherently about a hierarchical topology in which clients rely on upstream repositories, those upstream repositories can protect all of their downstream clients from broken objects. Signed-off-by: Jeff King <peff@peff.net>
2018-05-22fsck: detect gitmodules filesJeff King2-0/+65
In preparation for performing fsck checks on .gitmodules files, this commit plumbs in the actual detection of the files. Note that unlike most other fsck checks, this cannot be a property of a single object: we must know that the object is found at a ".gitmodules" path at the root tree of a commit. Since the fsck code only sees one object at a time, we have to mark the related objects to fit the puzzle together. When we see a commit we mark its tree as a root tree, and when we see a root tree with a .gitmodules file, we mark the corresponding blob to be checked. In an ideal world, we'd check the objects in topological order: commits followed by trees followed by blobs. In that case we can avoid ever loading an object twice, since all markings would be complete by the time we get to the marked objects. And indeed, if we are checking a single packfile, this is the order in which Git will generally write the objects. But we can't count on that: 1. git-fsck may show us the objects in arbitrary order (loose objects are fed in sha1 order, but we may also have multiple packs, and we process each pack fully in sequence). 2. The type ordering is just what git-pack-objects happens to write now. The pack format does not require a specific order, and it's possible that future versions of Git (or a custom version trying to fool official Git's fsck checks!) may order it differently. 3. We may not even be fscking all of the relevant objects at once. Consider pushing with transfer.fsckObjects, where one push adds a blob at path "foo", and then a second push adds the same blob at path ".gitmodules". The blob is not part of the second push at all, but we need to mark and check it. So in the general case, we need to make up to three passes over the objects: once to make sure we've seen all commits, then once to cover any trees we might have missed, and then a final pass to cover any .gitmodules blobs we found in the second pass. We can simplify things a bit by loosening the requirement that we find .gitmodules only at root trees. Technically a file like "subdir/.gitmodules" is not parsed by Git, but it's not unreasonable for us to declare that Git is aware of all ".gitmodules" files and make them eligible for checking. That lets us drop the root-tree requirement, which eliminates one pass entirely. And it makes our worst case much better: instead of potentially queueing every root tree to be re-examined, the worst case is that we queue each unique .gitmodules blob for a second look. This patch just adds the boilerplate to find .gitmodules files. The actual content checks will come in a subsequent commit. Signed-off-by: Jeff King <peff@peff.net>
2018-05-22fsck: actually fsck blob dataJeff King3-24/+28
Because fscking a blob has always been a noop, we didn't bother passing around the blob data. In preparation for content-level checks, let's fix up a few things: 1. The fsck_object() function just returns success for any blob. Let's a noop fsck_blob(), which we can fill in with actual logic later. 2. The fsck_loose() function in builtin/fsck.c just threw away blob content after loading it. Let's hold onto it until after we've called fsck_object(). The easiest way to do this is to just drop the parse_loose_object() helper entirely. Incidentally, this also fixes a memory leak: if we successfully loaded the object data but did not parse it, we would have left the function without freeing it. 3. When fsck_loose() loads the object data, it does so with a custom read_loose_object() helper. This function streams any blobs, regardless of size, under the assumption that we're only checking the sha1. Instead, let's actually load blobs smaller than big_file_threshold, as the normal object-reading code-paths would do. This lets us fsck small files, and a NULL return is an indication that the blob was so big that it needed to be streamed, and we can pass that information along to fsck_blob(). Signed-off-by: Jeff King <peff@peff.net>
2018-05-22fsck: simplify ".git" checkJeff King1-3/+1
There's no need for us to manually check for ".git"; it's a subset of the other filesystem-specific tests. Dropping it makes our code slightly shorter. More importantly, the existing code may make a reader wonder why ".GIT" is not covered here, and whether that is a bug (it isn't, as it's also covered in the filesystem-specific tests). Signed-off-by: Jeff King <peff@peff.net>
2018-05-22index-pack: make fsck error message more specificJeff King2-2/+2
If fsck reports an error, we say only "Error in object". This isn't quite as bad as it might seem, since the fsck code would have dumped some errors to stderr already. But it might help to give a little more context. The earlier output would not have even mentioned "fsck", and that may be a clue that the "fsck.*" or "*.fsckObjects" config may be relevant. Signed-off-by: Jeff King <peff@peff.net>
2018-05-22verify_path: disallow symlinks in .gitmodulesJeff King4-15/+37
There are a few reasons it's not a good idea to make .gitmodules a symlink, including: 1. It won't be portable to systems without symlinks. 2. It may behave inconsistently, since Git may look at this file in the index or a tree without bothering to resolve any symbolic links. We don't do this _yet_, but the config infrastructure is there and it's planned for the future. With some clever code, we could make (2) work. And some people may not care about (1) if they only work on one platform. But there are a few security reasons to simply disallow it: a. A symlinked .gitmodules file may circumvent any fsck checks of the content. b. Git may read and write from the on-disk file without sanity checking the symlink target. So for example, if you link ".gitmodules" to "../oops" and run "git submodule add", we'll write to the file "oops" outside the repository. Again, both of those are problems that _could_ be solved with sufficient code, but given the complications in (1) and (2), we're better off just outlawing it explicitly. Note the slightly tricky call to verify_path() in update-index's update_one(). There we may not have a mode if we're not updating from the filesystem (e.g., we might just be removing the file). Passing "0" as the mode there works fine; since it's not a symlink, we'll just skip the extra checks. Signed-off-by: Jeff King <peff@peff.net>
2018-05-22update-index: stat updated files earlierJeff King1-8/+17
In the update_one(), we check verify_path() on the proposed path before doing anything else. In preparation for having verify_path() look at the file mode, let's stat the file earlier, so we can check the mode accurately. This is made a bit trickier by the fact that this function only does an lstat in a few code paths (the ones that flow down through process_path()). So we can speculatively do the lstat() here and pass the results down, and just use a dummy mode for cases where we won't actually be updating the index from the filesystem. Signed-off-by: Jeff King <peff@peff.net>
2018-05-22verify_dotfile: mention case-insensitivity in commentJeff King1-1/+4
We're more restrictive than we need to be in matching ".GIT" on case-sensitive filesystems; let's make a note that this is intentional. Signed-off-by: Jeff King <peff@peff.net>
2018-05-22verify_path: drop clever fallthroughJeff King1-4/+4
We check ".git" and ".." in the same switch statement, and fall through the cases to share the end-of-component check. While this saves us a line or two, it makes modifying the function much harder. Let's just write it out. Signed-off-by: Jeff King <peff@peff.net>
2018-05-22skip_prefix: add case-insensitive variantJeff King1-0/+17
We have the convenient skip_prefix() helper, but if you want to do case-insensitive matching, you're stuck doing it by hand. We could add an extra parameter to the function to let callers ask for this, but the function is small and somewhat performance-critical. Let's just re-implement it for the case-insensitive version. Signed-off-by: Jeff King <peff@peff.net>
2018-05-22is_{hfs,ntfs}_dotgitmodules: add testsJohannes Schindelin2-0/+106
This tests primarily for NTFS issues, but also adds one example of an HFS+ issue. Thanks go to Congyi Wu for coming up with the list of examples where NTFS would possibly equate the filename with `.gitmodules`. Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de> Signed-off-by: Jeff King <peff@peff.net>
2018-05-22is_ntfs_dotgit: match other .git filesJohannes Schindelin2-1/+93
When we started to catch NTFS short names that clash with .git, we only looked for GIT~1. This is sufficient because we only ever clone into an empty directory, so .git is guaranteed to be the first subdirectory or file in that directory. However, even with a fresh clone, .gitmodules is *not* necessarily the first file to be written that would want the NTFS short name GITMOD~1: a malicious repository can add .gitmodul0000 and friends, which sorts before `.gitmodules` and is therefore checked out *first*. For that reason, we have to test not only for ~1 short names, but for others, too. It's hard to just adapt the existing checks in is_ntfs_dotgit(): since Windows 2000 (i.e., in all Windows versions still supported by Git), NTFS short names are only generated in the <prefix>~<number> form up to number 4. After that, a *different* prefix is used, calculated from the long file name using an undocumented, but stable algorithm. For example, the short name of .gitmodules would be GITMOD~1, but if it is taken, and all of ~2, ~3 and ~4 are taken, too, the short name GI7EBA~1 will be used. From there, collisions are handled by incrementing the number, shortening the prefix as needed (until ~9999999 is reached, in which case NTFS will not allow the file to be created). We'd also want to handle .gitignore and .gitattributes, which suffer from a similar problem, using the fall-back short names GI250A~1 and GI7D29~1, respectively. To accommodate for that, we could reimplement the hashing algorithm, but it is just safer and simpler to provide the known prefixes. This algorithm has been reverse-engineered and described at https://usn.pw/blog/gen/2015/06/09/filenames/, which is defunct but still available via https://web.archive.org/. These can be recomputed by running the following Perl script: -- snip -- use warnings; use strict; sub compute_short_name_hash ($) { my $checksum = 0; foreach (split('', $_[0])) { $checksum = ($checksum * 0x25 + ord($_)) & 0xffff; } $checksum = ($checksum * 314159269) & 0xffffffff; $checksum = 1 + (~$checksum & 0x7fffffff) if ($checksum & 0x80000000); $checksum -= (($checksum * 1152921497) >> 60) * 1000000007; return scalar reverse sprintf("%x", $checksum & 0xffff); } print compute_short_name_hash($ARGV[0]); -- snap -- E.g., running that with the argument ".gitignore" will result in "250a" (which then becomes "gi250a" in the code). Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de> Signed-off-by: Jeff King <peff@peff.net>
2018-05-22is_hfs_dotgit: match other .git filesJeff King2-12/+51
Both verify_path() and fsck match ".git", ".GIT", and other variants specific to HFS+. Let's allow matching other special files like ".gitmodules", which we'll later use to enforce extra restrictions via verify_path() and fsck. Signed-off-by: Jeff King <peff@peff.net>
2018-05-22is_ntfs_dotgit: use a size_t for traversing stringJeff King1-1/+1
We walk through the "name" string using an int, which can wrap to a negative value and cause us to read random memory before our array (e.g., by creating a tree with a name >2GB, since "int" is still 32 bits even on most 64-bit platforms). Worse, this is easy to trigger during the fsck_tree() check, which is supposed to be protecting us from malicious garbage. Note one bit of trickiness in the existing code: we sometimes assign -1 to "len" at the end of the loop, and then rely on the "len++" in the for-loop's increment to take it back to 0. This is still legal with a size_t, since assigning -1 will turn into SIZE_MAX, which then wraps around to 0 on increment. Signed-off-by: Jeff King <peff@peff.net>
2018-05-22submodule-config: verify submodule names as pathsJeff King5-0/+143
Submodule "names" come from the untrusted .gitmodules file, but we blindly append them to $GIT_DIR/modules to create our on-disk repo paths. This means you can do bad things by putting "../" into the name (among other things). Let's sanity-check these names to avoid building a path that can be exploited. There are two main decisions: 1. What should the allowed syntax be? It's tempting to reuse verify_path(), since submodule names typically come from in-repo paths. But there are two reasons not to: a. It's technically more strict than what we need, as we really care only about breaking out of the $GIT_DIR/modules/ hierarchy. E.g., having a submodule named "foo/.git" isn't actually dangerous, and it's possible that somebody has manually given such a funny name. b. Since we'll eventually use this checking logic in fsck to prevent downstream repositories, it should be consistent across platforms. Because verify_path() relies on is_dir_sep(), it wouldn't block "foo\..\bar" on a non-Windows machine. 2. Where should we enforce it? These days most of the .gitmodules reads go through submodule-config.c, so I've put it there in the reading step. That should cover all of the C code. We also construct the name for "git submodule add" inside the git-submodule.sh script. This is probably not a big deal for security since the name is coming from the user anyway, but it would be polite to remind them if the name they pick is invalid (and we need to expose the name-checker to the shell anyway for our test scripts). This patch issues a warning when reading .gitmodules and just ignores the related config entry completely. This will generally end up producing a sensible error, as it works the same as a .gitmodules file which is missing a submodule entry (so "submodule update" will barf, but "git clone --recurse-submodules" will print an error but not abort the clone. There is one minor oddity, which is that we print the warning once per malformed config key (since that's how the config subsystem gives us the entries). So in the new test, for example, the user would see three warnings. That's OK, since the intent is that this case should never come up outside of malicious repositories (and then it might even benefit the user to see the message multiple times). Credit for finding this vulnerability and the proof of concept from which the test script was adapted goes to Etienne Stalmans. Signed-off-by: Jeff King <peff@peff.net>
2018-05-22submodule: add --dissociate option to add/update commandsCasey Fitzpatrick4-5/+48
Add --dissociate option to add and update commands, both clone helper commands that already have the --reference option --dissociate pairs with. Signed-off-by: Casey Fitzpatrick <kcghost@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2018-05-22submodule: add --progress option to add commandCasey Fitzpatrick3-1/+27
The '--progress' was introduced in 72c5f88311d (clone: pass --progress decision to recursive submodules, 2016-09-22) to fix the progress reporting of the clone command. Also add the progress option to the 'submodule add' command. The update command already supports the progress flag, but it is not documented. Signed-off-by: Casey Fitzpatrick <kcghost@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2018-05-22submodule: clean up substitutions in scriptCasey Fitzpatrick1-4/+4
'recommend_shallow' and 'jobs' variables do not need quotes. They only hold a single token value, and even if they were multi-token it is likely we would want them split at IFS rather than pass a single string. 'progress' is a boolean value. Treat it like the other boolean values in the script by using a substitution. Signed-off-by: Casey Fitzpatrick <kcghost@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2018-05-21travis-ci: run gcc-8 on linux-gcc jobsNguyễn Thái Ngọc Duy2-0/+6
Switch from gcc-4.8 to gcc-8. Newer compilers come with more warning checks (usually in -Wextra). Since -Wextra is enabled in developer mode (which is also enabled in travis), this lets travis report more warnings before other people do it. Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2018-05-21get_main_ref_store: BUG() when outside a repositoryJeff King1-0/+3
If we don't have a repository, then we can't initialize the ref store. Prior to 64a741619d (refs: store the main ref store inside the repository struct, 2018-04-11), we'd try to access get_git_dir(), and outside a repository that would trigger a BUG(). After that commit, though, we directly use the_repository->git_dir; if it's NULL we'll just segfault. Let's catch this case and restore the BUG() behavior. Obviously we don't ever want to hit this code, but a BUG() is a lot more helpful than a segfault if we do. Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2018-05-21t9902-completion: exercise __git_complete_index_file() directlySZEDER Gábor1-107/+118
The tests added in 2f271cd9cf (t9902-completion: add tests demonstrating issues with quoted pathnames, 2018-05-08) and in 2ab6eab4fe (completion: improve handling quoted paths in 'git ls-files's output, 2018-03-28) have a few shortcomings: - All these tests use the 'test_completion' helper function, thus they are exercising the whole completion machinery, although they are only interested in how git-aware path completion, specifically the __git_complete_index_file() function deals with unusual characters in pathnames and on the command line. - These tests can't satisfactorily test the case of pathnames containing spaces, because 'test_completion' gets the words on the command line as a single argument and it uses space as word separator. - Some of the tests are protected by different FUNNYNAMES_* prereqs depending on whether they put backslashes and double quotes or separator characters (FS, GS, RS, US) in pathnames, although a filesystem not allowing one likely doesn't allow the others either. - One of the tests operates on paths containing '|' and '&' characters without being protected by a FUNNYNAMES prereq, but some filesystems (notably on Windows) don't allow these characters in pathnames, either. Replace these tests with basically equivalent, more focused tests that call __git_complete_index_file() directly. Since this function only looks at the current word to be completed, i.e. the $cur variable, we can easily include pathnames containing spaces in the tests, so use such pathnames instead of pathnames containing '|' and '&'. Finally, use only a single FUNNYNAMES prereq for all kinds of special characters. Signed-off-by: SZEDER Gábor <szeder.dev@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2018-05-21completion: don't return with error from __gitcomp_file_direct()SZEDER Gábor1-2/+4
In __gitcomp_file_direct() we tell Bash that it should handle our possible completion words as filenames with the following piece of cleverness: # use a hack to enable file mode in bash < 4 compopt -o filenames +o nospace 2>/dev/null || compgen -f /non-existing-dir/ > /dev/null Unfortunately, this makes this function always return with error when it is not invoked in real completion, but e.g. in tests of 't9902-completion.sh': - First the 'compopt' line errors out - either because in Bash v3.x there is no such command, - or because in Bash v4.x it complains about "not currently executing completion function", - then 'compgen' just silently returns with error because of the non-existing directory. Since __gitcomp_file_direct() is now the last command executed in __git_complete_index_file(), that function returns with error as well, which prevents it from being invoked in tests directly as is, and would require extra steps in test to hide its error code. So let's make sure that __gitcomp_file_direct() doesn't return with error, because in the tests coming in the following patch we do want to exercise __git_complete_index_file() directly, __gitcomp_file() contains the same construct, and thus it, too, always returns with error. Update that function accordingly as well. While at it, also remove the space from between the redirection operator and the filename in both functions. Signed-off-by: SZEDER Gábor <szeder.dev@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2018-05-18object.c: clear replace map before freeing itStefan Beller1-0/+2
Signed-off-by: Stefan Beller <sbeller@google.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2018-05-15t7005-editor: get rid of the SPACES_IN_FILENAMES prereqSZEDER Gábor1-9/+3
The last two tests 'editor with a space' and 'core.editor with a space' in 't7005-editor.sh' need the SPACES_IN_FILENAMES prereq to ensure that they are only run on filesystems that allow, well, spaces in filenames. However, we have been putting a space in the name of the trash directory for just over a decade now, so we wouldn't be able to run any of our tests on such a filesystem in the first place. This prereq is therefore unnecessary, remove it. Signed-off-by: SZEDER Gábor <szeder.dev@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2018-05-14t5512: run git fetch inside testRené Scharfe1-1/+1
Do the preparatory fetch inside the test of ls-remote --symref to avoid cluttering the test output and to be able to catch unexpected fetch failures. Signed-off-by: Rene Scharfe <l.s.r@web.de> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2018-05-13mark_parents_uninteresting(): avoid most allocationJeff King1-19/+25
Commit 941ba8db57 (Eliminate recursion in setting/clearing marks in commit list, 2012-01-14) used a clever double-loop to avoid allocations for single-parent chains of history. However, it did so only when following parents of parents (which was an uncommon case), and _always_ incurred at least one allocation to populate the list of pending parents in the first place. We can turn this into zero-allocation in the common case by iterating directly over the initial parent list, and then following up on any pending items we might have discovered. Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2018-05-13mark_parents_uninteresting(): replace list with stackJeff King1-24/+43
The mark_parents_uninteresting() function uses two loops: an outer one to process our queue of pending parents, and an inner one to process first-parent chains. This is a clever optimization from 941ba8db57 (Eliminate recursion in setting/clearing marks in commit list, 2012-01-14) to limit the number of linked-list allocations when following single-parent chains. Unfortunately, this makes the result a little hard to read. Let's replace the list with a stack. Then we don't have to worry about doing this double-loop optimization, as we'll just reuse the top element of the stack as we pop/push. Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2018-05-13mark_parents_uninteresting(): drop missing object checkJeff King1-12/+0
We allow UNINTERESTING objects in a traversal to be unavailable. As part of this, mark_parents_uninteresting() checks whether we have a particular uninteresting parent; if not, we will mark it as "parsed" so that later code skips it. This code is redundant and even a little bit harmful, so let's drop it. It's redundant because when our parse_object() call in add_parents_to_list() fails, we already quietly skip UNINTERESTING parents. This redundancy is a historical artifact. The mark_parents_uninteresting() protection is from 454fbbcde3 (git-rev-list: allow missing objects when the parent is marked UNINTERESTING, 2005-07-10). Much later, aeeae1b771 (revision traversal: allow UNINTERESTING objects to be missing, 2009-01-27) covered more cases by making the actual parse more gentle. As an aside, even if this weren't redundant, it would be insufficient. The gentle parsing handles both missing and corrupted objects, whereas the has_object_file() check we're getting rid of covers only missing ones. And the code we're dropping is harmful for two reasons: 1. We spend extra time on the object lookup, even though we don't actually need the information at this point (and will just repeat that lookup later when we parse for the common case that we _do_ have the object). 2. It "lies" about the commit by setting the parsed flag, even though we didn't load any useful data into the struct. This shouldn't matter for the UNINTERESTING case, but we may later clear our flags and do another traversal in the same process. While pretty unlikely, it's possible that we could then look at the same commit without the UNINTERESTING flag, in which case we'd produce the wrong result (we'd think it's a commit with no parents, when in fact we should probably die due to the missing object). Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2018-05-13mark_tree_contents_uninteresting(): drop missing object checkJeff King1-4/+1
It's generally acceptable for UNINTERESTING objects in a traversal to be unavailable (e.g., see aeeae1b771). When marking trees UNINTERESTING, we access the object database twice: once to check if the object is missing (and return quietly if it is), and then again to actually parse it. We can instead just try to parse; if that fails, we can then return quietly. That halves the effort we spend on locating the object. Note that this isn't _exactly_ the same as the original behavior, as the parse failure could be due to other problems than a missing object: it could be corrupted, in which case the original code would have died. But the new behavior is arguably better, as it covers the object being unavailable for any reason. We'll also still issue a warning to stderr in such a case. Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2018-05-13commit.h: rearrange 'index' to shrink struct commitNguyễn Thái Ngọc Duy1-1/+1
On linux 64-bit architecture, pahole finds that there's a 4 bytes padding after 'index'. Moving it to the end reduces this struct's size from 72 to 64 bytes (because of another 4 bytes padding after graph_pos). On linux 32-bit, the struct size remains 52 bytes like before. Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2018-05-13add status config and command line options for rename detectionBen Peart8-2/+194
After performing a merge that has conflicts git status will, by default, attempt to detect renames which causes many objects to be examined. In a virtualized repo, those objects do not exist locally so the rename logic triggers them to be fetched from the server. This results in the status call taking hours to complete on very large repos vs seconds with this patch. Add a new config status.renames setting to enable turning off rename detection during status and commit. This setting will default to the value of diff.renames. Add a new config status.renamelimit setting to to enable bounding the time spent finding out inexact renames during status and commit. This setting will default to the value of diff.renamelimit. Add --no-renames command line option to status that enables overriding the config setting from the command line. Add --find-renames[=<n>] command line option to status that enables detecting renames and optionally setting the similarity index. Reviewed-by: Elijah Newren <newren@gmail.com> Original-Patch-by: Alejandro Pauly <alpauly@microsoft.com> Signed-off-by: Ben Peart <Ben.Peart@microsoft.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2018-05-13column: fix off-by-one default widthNguyễn Thái Ngọc Duy2-2/+1
By default we want to fill the whole screen if possible, but we do not want to use up _all_ terminal columns because the last character is going hit the border, push the cursor over and wrap. Keep it at default value zero, which will make print_columns() set the width at term_columns() - 1. This affects the test in t7004 because effective column width before was 40 but now 39 so we need to compensate it by one or the output at 39 columns has a different layout. Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2018-05-13pager: set COLUMNS to term_columns()Jeff King1-3/+8
After we invoke the pager, our stdout goes to a pipe, not the terminal, meaning we can no longer use an ioctl to get the terminal width. For that reason, ad6c3739a3 (pager: find out the terminal width before spawning the pager, 2012-02-12) started caching the terminal width. But that cache is only an in-process variable. Any programs we spawn will also not be able to run that ioctl, but won't have access to our cache. They'll end up falling back to our 80-column default. You can see the problem with: git tag --column=row Since git-tag spawns a pager these days, its spawned git-column helper will see neither the terminal on stdout nor a useful COLUMNS value (assuming you do not export it from your shell already). And you'll end up with 80-column output in the pager, regardless of your terminal size. We can fix this by setting COLUMNS right before spawning the pager. That fixes this case, as well as any more complicated ones (e.g., a paged program spawns another script which then generates columnized output). Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2018-05-13refs: handle zero oid for pseudorefsMartin Ågren2-5/+15
According to the documentation, it is possible to "specify 40 '0' or an empty string as <oldvalue> to make sure that the ref you are creating does not exist." But in the code for pseudorefs, we do not implement this, as demonstrated by the failing tests added in the previous commit. If we fail to read the old ref, we immediately die. But a failure to read would actually be a good thing if we have been given the zero oid. With the zero oid, allow -- and even require -- the ref-reading to fail. This implements the "make sure that the ref ... does not exist" part of the documentation and fixes both failing tests from the previous commit. Since we have a `strbuf err` for collecting errors, let's use it and signal an error to the caller instead of dying hard. Reported-by: Rafael Ascensão <rafa.almas@gmail.com> Helped-by: Rafael Ascensão <rafa.almas@gmail.com> Signed-off-by: Martin Ågren <martin.agren@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2018-05-13t1400: add tests around adding/deleting pseudorefsMartin Ågren1-0/+60
I have not been able to find any tests around adding pseudorefs using `git update-ref`. Add some as outlined in this table (original design by Michael Haggerty; modified and extended by me): Pre-update value | ref-update old OID | Expected result -------------------|----------------------|---------------- missing | value | reject missing | none given | accept set | none given | accept set | correct value | accept set | wrong value | reject missing | zero | accept * set | zero | reject * The tests marked with a * currently fail, despite git-update-ref(1) claiming that it is possible to "specify 40 '0' or an empty string as <oldvalue> to make sure that the ref you are creating does not exist." These failing tests will be fixed in the next commit. It is only natural to test deletion as well. Test deletion without an old OID, with a correct one and with an incorrect one. Suggested-by: Michael Haggerty <mhagger@alum.mit.edu> Signed-off-by: Martin Ågren <martin.agren@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2018-05-13refs.c: refer to "object ID", not "sha1", in error messagesMartin Ågren1-2/+4
We have two error messages that complain about the "sha1". Because we are about to touch one of these sites and add some tests, let's first modernize the messages to say "object ID" instead. While at it, make the second one use `error()` instead of `warning()`. After printing the message, we do not continue, but actually drop the lock and return -1 without deleting the pseudoref. Signed-off-by: Martin Ågren <martin.agren@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2018-05-13pack-format.txt: more details on pack file formatNguyễn Thái Ngọc Duy2-0/+97
The current document mentions OBJ_* constants without their actual values. A git developer would know these are from cache.h but that's not very friendly to a person who wants to read this file to implement a pack file parser. Similarly, the deltified representation is not documented at all (the "document" is basically patch-delta.c). Translate that C code to English with a bit more about what ofs-delta and ref-delta mean. Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2018-05-11apply: clarify "-p" documentationJeff King1-2/+4
We're not really removing slashes, but slash-separated path components. Let's make that more clear. Reported-by: kelly elton <its.the.doc@gmail.com> Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2018-05-11t5516-fetch-push: fix broken &&-chainSZEDER Gábor1-1/+1
b2dc968e60 (t5516: refactor oddball tests, 2008-11-07) accidentaly broke the &&-chain in the test 'push does not update local refs on failure', but since it was in a subshell, chain-lint couldn't notice it. Signed-off-by: SZEDER Gábor <szeder.dev@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2018-05-11t5516-fetch-push: fix 'push with dry-run' testSZEDER Gábor1-4/+4
In a while-at-it cleanup replacing a 'cd dir && <...> && cd ..' with a subshell, commit 28391a80a9 (receive-pack: allow deletion of corrupt refs, 2007-11-29) also moved the assignment of the $old_commit variable to that subshell. This variable, however, is used outside of that subshell as a parameter of check_push_result(), to check that a ref still points to the commit where it is supposed to. With the variable remaining unset outside the subshell check_push_result() doesn't perform that check at all. Use 'git -C <dir> cmd...', so we don't need to change directory, and thus don't need the subshell either when setting $old_commit. Furthermore, change check_push_result() to require at least three parameters (the repo, the oid, and at least one ref), so it will catch similar issues earlier should they ever arise. Signed-off-by: SZEDER Gábor <szeder.dev@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>