summaryrefslogtreecommitdiffstats
path: root/tree-diff.c (unfollow)
Commit message (Collapse)AuthorFilesLines
2018-10-22t1300: extract and use test_cmp_config()Nguyễn Thái Ngọc Duy2-60/+42
In many config-related tests it's common to check if a config variable has expected value and we want to print the differences when the test fails. Doing it the normal way is three lines of shell code. Let's add a function do to all this (and a little more). This function has uses outside t1300 as well but I'm not going to convert them all. And it will be used in the next commit where per-worktree config feature is introduced. Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2018-10-19Fifth batch for 2.20Junio C Hamano1-0/+82
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2018-10-16Fourth batch for 2.20Junio C Hamano1-0/+50
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2018-10-11read-cache: load cache entries on worker threadsBen Peart1-37/+193
This patch helps address the CPU cost of loading the index by utilizing the Index Entry Offset Table (IEOT) to divide loading and conversion of the cache entries across multiple threads in parallel. I used p0002-read-cache.sh to generate some performance data: Test w/100,000 files reduced the time by 32.24% Test w/1,000,000 files reduced the time by -4.77% Note that on the 1,000,000 files case, multi-threading the cache entry parsing does not yield a performance win. This is because the cost to parse the index extensions in this repo, far outweigh the cost of loading the cache entries. The high cost of parsing the index extensions is driven by the cache tree and the untracked cache extensions. As this is currently the longest pole, any reduction in this time will reduce the overall index load times so is worth further investigation in another patch series. Signed-off-by: Ben Peart <benpeart@microsoft.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2018-10-11ieot: add Index Entry Offset Table (IEOT) extensionBen Peart2-3/+211
This patch enables addressing the CPU cost of loading the index by adding additional data to the index that will allow us to efficiently multi- thread the loading and conversion of cache entries. It accomplishes this by adding an (optional) index extension that is a table of offsets to blocks of cache entries in the index file. To make this work for V4 indexes, when writing the cache entries, it periodically "resets" the prefix-compression by encoding the current entry as if the path name for the previous entry is completely different and saves the offset of that entry in the IEOT. Basically, with V4 indexes, it generates offsets into blocks of prefix-compressed entries. Signed-off-by: Ben Peart <benpeart@microsoft.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2018-10-11read-cache: load cache extensions on a worker threadBen Peart1-16/+79
This patch helps address the CPU cost of loading the index by loading the cache extensions on a worker thread in parallel with loading the cache entries. In some cases, loading the extensions takes longer than loading the cache entries so this patch utilizes the new EOIE to start the thread to load the extensions before loading all the cache entries in parallel. This is possible because the current extensions don't access the cache entries in the index_state structure so are OK that they don't all exist yet. The CACHE_EXT_TREE, CACHE_EXT_RESOLVE_UNDO, and CACHE_EXT_UNTRACKED extensions don't even get a pointer to the index so don't have access to the cache entries. CACHE_EXT_LINK only uses the index_state to initialize the split index. CACHE_EXT_FSMONITOR only uses the index_state to save the fsmonitor last update and dirty flags. I used p0002-read-cache.sh to generate some performance data: Test w/100,000 files reduced the time by 0.53% Test w/1,000,000 files reduced the time by 27.78% Signed-off-by: Ben Peart <benpeart@microsoft.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2018-10-11config: add new index.threads config settingBen Peart5-0/+36
Add support for a new index.threads config setting which will be used to control the threading code in do_read_index(). A value of 0 will tell the index code to automatically determine the correct number of threads to use. A value of 1 will make the code single threaded. A value greater than 1 will set the maximum number of threads to use. For testing purposes, this setting can be overwritten by setting the GIT_TEST_INDEX_THREADS=<n> environment variable to a value greater than 0. Signed-off-by: Ben Peart <benpeart@microsoft.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2018-10-11eoie: add End of Index Entry (EOIE) extensionBen Peart3-12/+177
The End of Index Entry (EOIE) is used to locate the end of the variable length index entries and the beginning of the extensions. Code can take advantage of this to quickly locate the index extensions without having to parse through all of the index entries. The EOIE extension is always written out to the index file including to the shared index when using the split index feature. Because it is always written out, the SHA checksums in t/t1700-split-index.sh were updated to reflect its inclusion. It is written as an optional extension to ensure compatibility with other git implementations that do not yet support it. It is always written out to ensure it is available as often as possible to speed up index operations. Because it must be able to be loaded before the variable length cache entries and other index extensions, this extension must be written last. The signature for this extension is { 'E', 'O', 'I', 'E' }. The extension consists of: - 32-bit offset to the end of the index entries - 160-bit SHA-1 over the extension types and their sizes (but not their contents). E.g. if we have "TREE" extension that is N-bytes long, "REUC" extension that is M-bytes long, followed by "EOIE", then the hash would be: SHA-1("TREE" + <binary representation of N> + "REUC" + <binary representation of M>) Signed-off-by: Ben Peart <benpeart@microsoft.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2018-10-11read-cache: clean up casting and byte decodingBen Peart1-12/+11
This patch does a clean up pass to minimize the casting required to work with the memory mapped index (mmap). It also makes the decoding of network byte order more consistent by using get_be32() where possible. Signed-off-by: Ben Peart <benpeart@microsoft.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2018-10-10Third batch for 2.20Junio C Hamano1-0/+16
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2018-10-10cache-tree: skip some blob checks in partial cloneJonathan Tan2-1/+38
In a partial clone, whenever a sparse checkout occurs, the existence of all blobs in the index is verified, whether they are included or excluded by the .git/info/sparse-checkout specification. This significantly degrades performance because a lazy fetch occurs whenever the existence of a missing blob is checked. This is because cache_tree_update() checks the existence of all objects in the index, whether or not CE_SKIP_WORKTREE is set on them. Teach cache_tree_update() to skip checking CE_SKIP_WORKTREE objects when the repository is a partial clone. This improves performance for sparse checkout and also other operations that use cache_tree_update(). Instead of completely removing the check, an argument could be made that the check should instead be replaced by a check that the blob is promised, but for performance reasons, I decided not to do this. If the user needs to verify the repository, it can be done using fsck (which will notify if a tree points to a missing and non-promised blob, whether the blob is included or excluded by the sparse-checkout specification). Signed-off-by: Jonathan Tan <jonathantanmy@google.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2018-10-10Declare that the next one will be named 2.20Junio C Hamano2-167/+168
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2018-10-09transport.c: introduce core.alternateRefsPrefixesTaylor Blau3-0/+20
The recently-introduced "core.alternateRefsCommand" allows callers to specify with high flexibility the tips that they wish to advertise from alternates. This flexibility comes at the cost of some inconvenience when the caller only wishes to limit the advertisement to one or more prefixes. For example, to advertise only tags, a caller using 'core.alternateRefsCommand' would have to do: $ git config core.alternateRefsCommand ' \ f() { git -C "$1" for-each-ref \ refs/tags --format="%(objectname)" }; f "$@"' The above is cumbersome to write, so let's introduce a "core.alternateRefsPrefixes" to address this common case. Instead, the caller can run: $ git config core.alternateRefsPrefixes 'refs/tags' Which will behave identically to the longer example using "core.alternateRefsCommand". Since the value of "core.alternateRefsPrefixes" is appended to 'git for-each-ref' and then executed, include a "--" before taking the configured value to avoid misinterpreting arguments as flags to 'git for-each-ref'. In the case that the caller wishes to specify multiple prefixes, they may separate them by whitespace. If "core.alternateRefsCommand" is set, it will take precedence over "core.alternateRefsPrefixes". Signed-off-by: Taylor Blau <me@ttaylorr.com> Acked-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2018-10-09transport.c: introduce core.alternateRefsCommandTaylor Blau3-4/+59
When in a repository containing one or more alternates, Git would sometimes like to list references from those alternates. For example, 'git receive-pack' lists the "tips" pointed to by references in those alternates as special ".have" references. Listing ".have" references is designed to make pushing changes from upstream to a fork a lightweight operation, by advertising to the pusher that the fork already has the objects (via its alternate). Thus, the client can avoid sending them. However, when the alternate (upstream, in the previous example) has a pathologically large number of references, the initial advertisement is too expensive. In fact, it can dominate any such optimization where the pusher avoids sending certain objects. Introduce "core.alternateRefsCommand" in order to provide a facility to limit or filter alternate references. This can be used, for example, to filter out references the alternate does not wish to send (for space concerns, or otherwise) during the initial advertisement. Let the repository that has alternates configure this command to avoid trusting the alternate to provide us a safe command to run in the shell. To find the alternate, pass its absolute path as the first argument. Signed-off-by: Taylor Blau <me@ttaylorr.com> Acked-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2018-10-09transport.c: extract 'fill_alternate_refs_command'Taylor Blau1-6/+12
To list alternate references, 'read_alternate_refs' creates a child process running 'git for-each-ref' in the alternate's Git directory. Prepare to run other commands besides 'git for-each-ref' by introducing and moving the relevant code from 'read_alternate_refs' to 'fill_alternate_refs_command'. Signed-off-by: Taylor Blau <me@ttaylorr.com> Acked-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2018-10-09transport: drop refnames from for_each_alternate_refJeff King4-8/+6
None of the current callers use the refname parameter we pass to their callbacks. In theory somebody _could_ do so, but it's actually quite weird if you think about it: it's a ref in somebody else's repository. So the name has no meaning locally, and in fact there may be duplicates if there are multiple alternates. The users of this interface really only care about seeing some ref tips, since that promises that the alternate has the full commit graph reachable from there. So let's keep the information we pass back to the bare minimum. Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Taylor Blau <me@ttaylorr.com> Acked-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2018-10-07docs: typo: s/isimilar/similar/Michael Witten1-1/+1
Signed-off-by: Michael Witten <mfwitten@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2018-10-07docs: graph: remove unnecessary `graph_update()' callMichael Witten1-1/+0
The sample code calls `get_revision()' followed by `graph_update()', but the documentation and source code indicate that `get_revision()' already calls `graph_update()' for you. Signed-off-by: Michael Witten <mfwitten@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2018-10-07docs: typo: s/go/to/Michael Witten1-1/+1
Signed-off-by: Michael Witten <mfwitten@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2018-10-07git-completion.bash: add completion for stash listSteven Fernandez1-0/+3
Since stash list accepts git-log options, add the following useful options that make sense in the context of the `git stash list` command: --name-status --oneline --patch-with-stat Signed-off-by: Steven Fernandez <steve@lonetwin.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2018-10-07fetch: do not list refs if fetching only hashesJonathan Tan3-6/+54
If only hash literals are given on a "git fetch" command-line, tag following is not requested, and the fetch is done using protocol v2, a list of refs is not required from the remote. Therefore, optimize by invoking transport_get_remote_refs() only if we need the refs. Signed-off-by: Jonathan Tan <jonathantanmy@google.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2018-10-07transport: list refs before fetch if necessaryJonathan Tan3-0/+19
The built-in bundle transport and the transport helper interface do not work when transport_fetch_refs() is called immediately after transport creation. This will be needed in a subsequent patch, so fix this. Evidence: fetch_refs_from_bundle() relies on data->header being initialized in get_refs_from_bundle(), and fetch() in transport-helper.c relies on either data->fetch or data->import being set by get_helper(), but neither transport_helper_init() nor fetch() calls get_helper(). Up until the introduction of the partial clone feature, this has not been a problem, because transport_fetch_refs() is always called after transport_get_remote_refs(). With the introduction of the partial clone feature, which involves calling transport_fetch_refs() (to fetch objects by their OIDs) without transport_get_remote_refs(), this is still not a problem, but only coincidentally - we do not support partially cloning a bundle, and as for cloning using a transport-helper-using protocol, it so happens that before transport_fetch_refs() is called, fetch_refs() in fetch-object.c calls transport_set_option(), which means that the aforementioned get_helper() is invoked through set_helper_option() in transport-helper.c. This could be fixed by fixing the transports themselves, but it doesn't seem like a good idea to me to open up previously untested code paths; also, there may be transport helpers in the wild that assume that "list" is always called before "fetch". Instead, fix this by having transport_fetch_refs() call transport_get_remote_refs() to ensure that the latter is always called at least once, unless the transport explicitly states that it supports fetching without listing refs. Signed-off-by: Jonathan Tan <jonathantanmy@google.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2018-10-07transport: do not list refs if possibleJonathan Tan3-3/+16
When all refs to be fetched are exact OIDs, it is possible to perform a fetch without requiring the remote to list refs if protocol v2 is used. Teach Git to do this. This currently has an effect only for lazy fetches done from partial clones. The change necessary to likewise optimize "git fetch <remote> <sha-1>" will be done in a subsequent patch. Signed-off-by: Jonathan Tan <jonathantanmy@google.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2018-10-07transport: allow skipping of ref listingJonathan Tan1-4/+25
The get_refs_via_connect() function both performs the handshake (including determining the protocol version) and obtaining the list of remote refs. However, the fetch protocol v2 supports fetching objects without the listing of refs, so make it possible for the user to skip the listing by creating a new handshake() function. This will be used in a subsequent commit. Signed-off-by: Junio C Hamano <gitster@pobox.com>
2018-10-07tests: order arguments to git-rev-list properlyMatthew DeVore3-29/+44
It is a common mistake to put positional arguments before flags when invoking git-rev-list. Order the positional arguments last. This patch skips git-rev-list invocations which include the --not flag, since the ordering of flags and positional arguments affects the behavior. This patch also skips invocations of git-rev-list that occur in command substitution in which the exit code is discarded, since fixing those properly will require a more involved cleanup. Signed-off-by: Matthew DeVore <matvore@google.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2018-10-07t9109: don't swallow Git errors upstream of pipesMatthew DeVore1-13/+21
'git ... | foo' will mask any errors or crashes in git, so split up such pipes in this file. One testcase uses several separate pipe sequences in a row which are awkward to split up. Wrap the split-up pipe in a function so the awkwardness is not repeated. Also change that testcase's surrounding quotes from double to single to avoid premature string interpolation. Signed-off-by: Matthew DeVore <matvore@google.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2018-10-07tests: don't swallow Git errors upstream of pipesMatthew DeVore3-132/+141
Some pipes in tests lose the exit code of git processes, which can mask unexpected behavior like crashes. Split these pipes up so that git commands are only at the end of pipes rather than the beginning or middle. The violations fixed in this patch were found in the process of fixing pipe placement in a prior patch. Signed-off-by: Matthew DeVore <matvore@google.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2018-10-07t/*: fix ordering of expected/observed argumentsMatthew DeVore32-112/+112
Fix various places where the ordering was obviously wrong, meaning it was easy to find with grep. Signed-off-by: Matthew DeVore <matvore@google.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2018-10-07tests: standardize pipe placementMatthew DeVore7-250/+344
Instead of using a line-continuation and pipe on the second line, take advantage of the shell's implicit line continuation after a pipe character. So for example, instead of some long line \ | next line use some long line | next line And add a blank line before and after the pipe where it aids readability (it usually does). This better matches the coding style documented in Documentation/CodingGuidelines and used in shell scripts elsewhere in the tree. Signed-off-by: Matthew DeVore <matvore@google.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2018-10-07Documentation: add shell guidelinesMatthew DeVore2-0/+45
Add the following guideline to Documentation/CodingGuidelines: Break overlong lines after "&&", "||", and "|", not before them; that way the command can continue to subsequent lines without backslash at the end. And the following to t/README (since it is specific to writing tests): Pipes and $(git ...) should be avoided when they swallow exit codes of Git processes Signed-off-by: Matthew DeVore <matvore@google.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2018-10-07t/README: reformat Do, Don't, Keep in mind listsMatthew DeVore1-22/+20
The list of Don'ts for test writing has grown large such that it is hard to see at a glance which section an item is in. In other words, if I ignore a little bit of surrounding context, the "don'ts" look like "do's." To make the list more readable, prefix "Don't" in front of every first sentence in the items. Also, the "Keep in mind" list is out of place and awkward, because it was a very short "list" beneath two very long ones, and it seemed easy to miss under the list of "don'ts," and it only had one item. So move this item to the list of "do's" and phrase as "Remember..." Signed-off-by: Matthew DeVore <matvore@google.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2018-10-07commit-graph: reduce initial oid allocationDerrick Stolee1-1/+1
While writing a commit-graph file, we store the full list of commits in a flat list. We use this list for sorting and ensuring we are closed under reachability. The initial allocation assumed that (at most) one in four objects is a commit. This is a dramatic over-count for many repos, especially large ones. Since we grow the repo dynamically, reduce this count by a factor of eight. We still set it to a minimum of 1024 before allocating. Signed-off-by: Derrick Stolee <dstolee@microsoft.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2018-10-07builtin/commit-graph.c: UNLEAK variablesMartin Ågren1-5/+6
`graph_verify()`, `graph_read()` and `graph_write()` do the hard work of `cmd_commit_graph()`. As soon as these return, so does `cmd_commit_graph()`. `strbuf_getline()` may allocate memory in the strbuf, yet return EOF. We need to release the strbuf or UNLEAK it. Go for the latter since we are close to returning from `graph_write()`. `graph_write()` also fails to free the strings in the string list. They have been added to the list with `strdup_strings` set to 0. We could flip `strdup_strings` before clearing the list, which is our usual hack in situations like this. But since we are about to exit, let's just UNLEAK the whole string list instead. UNLEAK `graph` in `graph_verify`. While at it, and for consistency, UNLEAK in `graph_read()` as well, and remove an unnecessary UNLEAK just before dying. Signed-off-by: Martin Ågren <martin.agren@gmail.com> Signed-off-by: Derrick Stolee <dstolee@microsoft.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2018-10-07commit-graph: clean up leaked memory during writeDerrick Stolee1-5/+9
The write_commit_graph() method in commit-graph.c leaks some lits and strings during execution. In addition, a list of strings is leaked in write_commit_graph_reachable(). Clean these up so our memory checking is cleaner. Further, if we use a list of pack-files to find the commits, we can leak the packed_git structs after scanning them for commits. Running the following commands demonstrates the leak before and the fix after: * valgrind --leak-check=full ./git commit-graph write --reachable * valgrind --leak-check=full ./git commit-graph write --stdin-packs Signed-off-by: Martin Ågren <martin.agren@gmail.com> Signed-off-by: Derrick Stolee <dstolee@microsoft.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2018-10-05diff --color-moved: fix a memory leakPhillip Wood1-2/+2
Free the hashmap items as well as the hashmap itself. This was found with asan. Signed-off-by: Phillip Wood <phillip.wood@dunelm.org.uk> Reviewed-by: Stefan Beller <sbeller@google.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2018-10-05diff --color-moved-ws: fix another memory leakPhillip Wood1-0/+2
This is obvious in retrospect, it was found with asan. Signed-off-by: Phillip Wood <phillip.wood@dunelm.org.uk> Reviewed-by: Stefan Beller <sbeller@google.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2018-10-05diff --color-moved-ws: fix a memory leakPhillip Wood1-1/+4
Don't duplicate the indentation string if we're not going to use it. This was found with asan. Signed-off-by: Phillip Wood <phillip.wood@dunelm.org.uk> Reviewed-by: Stefan Beller <sbeller@google.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2018-10-05diff --color-moved-ws: fix out of bounds string accessPhillip Wood1-1/+1
When adjusting the start of the string to take account of the change in indentation the code was not checking that the string being adjusted was in fact longer than the indentation change. This was detected by asan. Signed-off-by: Phillip Wood <phillip.wood@dunelm.org.uk> Reviewed-by: Stefan Beller <sbeller@google.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2018-10-05diff --color-moved-ws: fix double free crashPhillip Wood1-37/+45
Running git diff --color-moved-ws=allow-indentation-change v2.18.0 v2.19.0 results in a crash due to a double free. This happens when two potential moved blocks start with consecutive lines. As pmb_advance_or_null_multi_match() advances it copies the ws_delta from the last matching line to the next. When the first of our consecutive lines is advanced its ws_delta well be copied to the second, overwriting the ws_delta of the block containing the second line. Then when the second line is advanced it will copy the new ws_delta to the line below it and so on. Eventually one of these blocks will stop matching and the ws_delta will be freed. From then on the other block is in a use-after-free state and when it stops matching it will try to free the ws_delta that has already been freed by the other block. The solution is to store the ws_delta in the array of potential moved blocks rather than with the lines. This means that it no longer needs to be copied around and one block cannot overwrite the ws_delta of another. Additionally it saves some malloc/free calls as we don't keep allocating and freeing ws_deltas. Signed-off-by: Phillip Wood <phillip.wood@dunelm.org.uk> Reviewed-by: Stefan Beller <sbeller@google.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2018-10-04oidset: uninline oidset_init()René Scharfe2-6/+14
There is no need to inline oidset_init(), as it's typically only called twice in the lifetime of an oidset (once at the beginning and at the end by oidset_clear()) and kh_resize_* is quite big, so move its definition to oidset.c. Document it while we're at it. Signed-off-by: Rene Scharfe <l.s.r@web.de> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2018-10-04oidset: use khashRené Scharfe2-30/+40
Reimplement oidset using khash.h in order to reduce its memory footprint and make it faster. Performance of a command that mainly checks for duplicate objects using an oidset, with master and Clang 6.0.1: $ cmd="./git-cat-file --batch-all-objects --unordered --buffer --batch-check='%(objectname)'" $ /usr/bin/time $cmd >/dev/null 0.22user 0.03system 0:00.25elapsed 99%CPU (0avgtext+0avgdata 48484maxresident)k 0inputs+0outputs (0major+11204minor)pagefaults 0swaps $ hyperfine "$cmd" Benchmark #1: ./git-cat-file --batch-all-objects --unordered --buffer --batch-check='%(objectname)' Time (mean ± σ): 250.0 ms ± 6.0 ms [User: 225.9 ms, System: 23.6 ms] Range (min … max): 242.0 ms … 261.1 ms And with this patch: $ /usr/bin/time $cmd >/dev/null 0.14user 0.00system 0:00.15elapsed 100%CPU (0avgtext+0avgdata 41396maxresident)k 0inputs+0outputs (0major+8318minor)pagefaults 0swaps $ hyperfine "$cmd" Benchmark #1: ./git-cat-file --batch-all-objects --unordered --buffer --batch-check='%(objectname)' Time (mean ± σ): 151.9 ms ± 4.9 ms [User: 130.5 ms, System: 21.2 ms] Range (min … max): 148.2 ms … 170.4 ms Initial-patch-by: Jeff King <peff@peff.net> Signed-off-by: Rene Scharfe <l.s.r@web.de> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2018-10-04khash: factor out kh_release_*René Scharfe1-2/+7
Add a function for releasing the khash-internal allocations, but not the khash structure itself. It can be used with on-stack khash structs. Signed-off-by: Rene Scharfe <l.s.r@web.de> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2018-10-04fetch-pack: load tip_oids eagerly iff neededRené Scharfe1-21/+15
tip_oids_contain() lazily loads refs into an oidset at its first call. It abuses the internal (sub)member .map.tablesize of that oidset to check if it has done that already. Determine if the oidset needs to be populated upfront and then do that instead. This duplicates a loop, but simplifies the existing one by separating concerns between the two. Signed-off-by: Rene Scharfe <l.s.r@web.de> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2018-10-04fetch-pack: factor out is_unmatched_ref()René Scharfe1-8/+11
Move the code to determine if a request is unmatched to its own little helper. This allows us to reuse it in a subsequent patch. Signed-off-by: Rene Scharfe <l.s.r@web.de> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2018-10-04fetch-pack: exclude blobs when lazy-fetching treesJonathan Tan3-0/+62
A partial clone with missing trees can be obtained using "git clone --filter=tree:none <repo>". In such a repository, when a tree needs to be lazily fetched, any tree or blob it directly or indirectly references is fetched as well, regardless of whether the original command required those objects, or if the local repository already had some of them. This is because the fetch protocol, which the lazy fetch uses, does not allow clients to request that only the wanted objects be sent, which would be the ideal solution. This patch implements a partial solution: specify the "blob:none" filter, somewhat reducing the fetch payload. This change has no effect when lazily fetching blobs (due to how filters work). And if lazily fetching a commit (such repositories are difficult to construct and is not a use case we support very well, but it is possible), referenced commits and trees are still fetched - only the blobs are not fetched. The necessary code change is done in fetch_pack() instead of somewhere closer to where the "filter" instruction is written to the wire so that only one part of the code needs to be changed in order for users of all protocol versions to benefit from this optimization. Signed-off-by: Jonathan Tan <jonathantanmy@google.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2018-10-04fetch-pack: avoid object flags if no_dependentsJonathan Tan1-42/+59
When fetch_pack() is invoked as part of another Git command (due to a lazy fetch from a partial clone, for example), it uses object flags that may already be used by the outer Git command. The commit that introduced the lazy fetch feature (88e2f9ed8e ("introduce fetch-object: fetch one promisor object", 2017-12-05)) tried to avoid this overlap, but it did not avoid it totally. It was successful in avoiding writing COMPLETE, but did not avoid reading COMPLETE, and did not avoid writing and reading ALTERNATE. Ensure that no flags are written or read by fetch_pack() in the case where it is used to perform a lazy fetch. To do this, it is sufficient to avoid checking completeness of wanted refs (unnecessary in the case of lazy fetches), and to avoid negotiation-related work (in the current implementation, already, no negotiation is performed). After that was done, the lack of overlap was verified by checking all direct and indirect usages of COMPLETE and ALTERNATE - that they are read or written only if no_dependents is false. There are other possible solutions to this issue: (1) Split fetch-pack.{c,h} into a flag-using part and a non-flag-using part, and whenever no_dependents is set, only use the non-flag-using part. (2) Make fetch_pack() be able to be used with arbitrary repository objects. fetch_pack() should then create its own repository object based on the given repository object, with its own object hashtable, so that the flags do not conflict. (1) is possible but invasive - some functions would need to be split; and such invasiveness would potentially be unnecessary if we ever were to need (2) anyway. (2) would be useful if we were to support, say, submodules that were partial clones themselves, but I don't know when or if the Git project plans to support those. Signed-off-by: Jonathan Tan <jonathantanmy@google.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2018-10-04mingw: bump the minimum Windows version to VistaJohannes Schindelin2-5/+1
Quite some time ago, a last plea to the XP users out there who want to see Windows XP support in Git for Windows, asking them to get engaged and help, vanished into the depths of the universe. We tried for a long time to play nice with the last remaining XP users who somehow manage to build Git from source, but a recent update of mingw-w64 (7.0.0.5233.e0c09544 -> 7.0.0.5245.edf66197) finally dropped the last sign of XP support, and Git for Windows' SDK is no longer able to build core Git's `master` branch as a consequence. (Git for Windows' `master` branch already bumped the minimum Windows version to Vista a while ago, so it is fine.) It is time to require Windows Vista or later to build Git from source. This, incidentally, lets us use quite a few nice new APIs. It also means that we no longer need the inet_pton() and inet_ntop() emulation, which is nice. Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2018-10-04mingw: set _WIN32_WINNT explicitly for Git for WindowsJohannes Schindelin1-1/+1
Previously, we only ever declared a target Windows version if compiling with Visual C. Which meant that we were relying on the MinGW headers to guess which Windows version we want to target... Let's be explicit about it, in particular because we actually want to bump the target Windows version to Vista (which we will do in the next commit). Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2018-10-04compat/poll: prepare for targeting Windows VistaJohannes Schindelin2-3/+18
Windows Vista (and later) actually have a working poll(), but we still cannot use it because it only works on sockets. So let's detect when we are targeting Windows Vista and undefine those constants, and define `pollfd` so that we can declare our own pollfd struct. We also need to make sure that we override those constants *after* `winsock2.h` has been `#include`d (otherwise we would not really override those constants). Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2018-10-04more oideq/hasheq conversionsJeff King4-7/+8
We added faster equality-comparison functions for hashes in 14438c4497 (introduce hasheq() and oideq(), 2018-08-28). A few topics were in-flight at the time, and can now be converted. This covers all spots found by "make coccicheck" in master (the coccicheck results were tweaked by hand for style). Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>