summaryrefslogtreecommitdiffstats
path: root/http-walker.c (follow)
Commit message (Collapse)AuthorAgeFilesLines
* global: mark code units that generate warnings with `-Wsign-compare`Patrick Steinhardt2024-12-061-0/+1
| | | | | | | | | Mark code units that generate warnings with `-Wsign-compare`. This allows for a structured approach to get rid of all such warnings over time in a way that can be easily measured. Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>
* packfile: convert find_sha1_pack() to use object_idJeff King2024-10-251-1/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The find_sha1_pack() function has a few problems: - it's badly named, since it works with any object hash - it takes the hash as a bare pointer rather than an object_id struct We can fix both of these easily, as all callers actually have a real object_id anyway. I also found the existence of this function somewhat confusing, as it is about looking in an arbitrary set of linked packed_git structs. It's good for things like dumb-http which are looking in downloaded remote packs, and not our local packs. But despite the name, it is not a good way to find the pack which contains a local object (it skips the use of the midx, the pack mru list, and so on). So let's also add an explanatory comment above the declaration that may point people in the right direction. I suspect the calls in fast-import.c, which use the packed_git list from the repository struct, could actually just be using find_pack_entry(). But since we'd need to keep it anyway for dumb-http, I didn't dig further there. If we eventually drop dumb-http support, then it might be worth examining them to see if we can get rid of the function entirely. Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Taylor Blau <me@ttaylorr.com>
* http-walker: use object_id instead of bare hashJeff King2024-10-251-12/+13
| | | | | | | | | | | | | | | | | | | | | We long ago switched most code to using object_id structs instead of bare "unsigned char *" hashes. This gives us more type safety from the compiler, and generally makes it easier to understand what we expect in each parameter. But the dumb-http code has lagged behind. And indeed, the whole "walker" subsystem interface has the same problem, though http-walker is the only user left. So let's update the walker interface to pass object_id structs (which we already have anyway at all call sites!), and likewise use those within the http-walker methods that it calls. This cleans up the dumb-http code a bit, but will also let us fix a few more commonly used helper functions. Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Taylor Blau <me@ttaylorr.com>
* http-walker: free fake packed_git listJeff King2024-09-251-0/+10
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The dumb-http walker code creates a "fake" packed_git list representing packs we've downloaded from the remote (I call it "fake" because generally that struct is only used and managed by the local repository struct). But during our cleanup phase we don't touch those at all, causing a leak. There's no support here from the rest of the object-database API, as these structs are not meant to be freed, except when closing the object store completely. But we can see that raw_object_store_clear() just calls free() on them, and that's enough here to fix the leak. I also added a call to close_pack() before each. In the regular code this happens via close_object_store(), which we do as part of raw_object_store_clear(). This is necessary to prevent leaking mmap'd data (like the pack idx) or descriptors. The leak-checker won't catch either of these itself, but I did confirm with some hacky warning() calls and running t5550 that it's easy to leak at least index data. This is all much more intimate with the packed_git struct than I'd like, but I think fixing it would be a pretty big refactor. And it's just not worth it for dumb-http code which is rarely used these days. If we can silence the leak-checker without creating too much hassle, we should just do that. This lets us mark t5550 as leak-free. Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>
* http: fix leak of http_object_request structJeff King2024-09-251-4/+4
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The new_http_object_request() function allocates a struct on the heap, along with some fields inside the struct. But the matching function to clean it up, release_http_object_request(), only frees the interior fields without freeing the struct itself, causing a leak. The related http_pack_request new/release pair gets this right, and at first glance we should be able to do the same thing and just add a single free() call. But there's a catch. These http_object_request structs are typically embedded in the object_request struct of http-walker.c. And when we clean up that parent struct, it sanity-checks the embedded struct to make sure we are not leaking descriptors. Which means a use-after-free if we simply free() the embedded struct. I have no idea how valuable that sanity-check is, or whether it can simply be deleted. This all goes back to 5424bc557f (http*: add helper methods for fetching objects (loose), 2009-06-06). But the obvious way to make it all work is to be sure we set the pointer to NULL after freeing it (and our freeing process closes the descriptor, so we know there is no leak). To make sure we do that consistently, we'll switch the pointer we take in release_http_object_request() to a pointer-to-pointer, and we'll set it to NULL ourselves. And then the compiler can help us find each caller which needs to be updated. Most cases will just pass "&obj_req->req", which will obviously do the right thing. In a few cases, like http-push's finish_request(), we are working with a copy of the pointer, so we don't NULL the original. But it's OK because the next step is to free the struct containing the original pointer anyway. This lets us mark t5551 as leak-free. Ironically this is the "smart" http test, and the leak here only affects dumb http. But there's a single dumb-http invocation in there. The full dumb tests are in t5550, which still has some more leaks. This also makes t5559 leak-free, as it's just an HTTP/2 variant of t5551. But we don't need to mark it as such, since it inherits the flag from t5551. Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>
* global: introduce `USE_THE_REPOSITORY_VARIABLE` macroPatrick Steinhardt2024-06-141-0/+2
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Use of the `the_repository` variable is deprecated nowadays, and we slowly but steadily convert the codebase to not use it anymore. Instead, callers should be passing down the repository to work on via parameters. It is hard though to prove that a given code unit does not use this variable anymore. The most trivial case, merely demonstrating that there is no direct use of `the_repository`, is already a bit of a pain during code reviews as the reviewer needs to manually verify claims made by the patch author. The bigger problem though is that we have many interfaces that implicitly rely on `the_repository`. Introduce a new `USE_THE_REPOSITORY_VARIABLE` macro that allows code units to opt into usage of `the_repository`. The intent of this macro is to demonstrate that a certain code unit does not use this variable anymore, and to keep it from new dependencies on it in future changes, be it explicit or implicit For now, the macro only guards `the_repository` itself as well as `the_hash_algo`. There are many more known interfaces where we have an implicit dependency on `the_repository`, but those are not guarded at the current point in time. Over time though, we should start to add guards as required (or even better, just remove them). Define the macro as required in our code units. As expected, most of our code still relies on the global variable. Nearly all of our builtins rely on the variable as there is no way yet to pass `the_repository` to their entry point. For now, declare the macro in "biultin.h" to keep the required changes at least a little bit more contained. Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>
* hash: require hash algorithm in `oidread()` and `oidclr()`Patrick Steinhardt2024-06-141-1/+1
| | | | | | | | | Both `oidread()` and `oidclr()` use `the_repository` to derive the hash function that shall be used. Require callers to pass in the hash algorithm to get rid of this implicit dependency. Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>
* hash: require hash algorithm in `hasheq()`, `hashcmp()` and `hashclr()`Patrick Steinhardt2024-06-141-1/+1
| | | | | | | | | | | | | | | | | | | | | Many of our hash functions have two variants, one receiving a `struct git_hash_algo` and one that derives it via `the_repository`. Adapt all of those functions to always require the hash algorithm as input and drop the variants that do not accept one. As those functions are now independent of `the_repository`, we can move them from "hash.h" to "hash-ll.h". Note that both in this and subsequent commits in this series we always just pass `the_repository->hash_algo` as input even if it is obvious that there is a repository in the context that we should be using the hash from instead. This is done to be on the safe side and not introduce any regressions. All callsites should eventually be amended to use a repo passed via parameters, but this is outside the scope of this patch series. Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>
* treewide: remove unnecessary includes in source filesElijah Newren2023-12-261-1/+0
| | | | | | | | | | | | | | | | | | Each of these were checked with gcc -E -I. ${SOURCE_FILE} | grep ${HEADER_FILE} to ensure that removing the direct inclusion of the header actually resulted in that header no longer being included at all (i.e. that no other header pulled it in transitively). ...except for a few cases where we verified that although the header was brought in transitively, nothing from it was directly used in that source file. These cases were: * builtin/credential-cache.c * builtin/pull.c * builtin/send-pack.c Signed-off-by: Elijah Newren <newren@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
* object-store-ll.h: split this header out of object-store.hElijah Newren2023-06-211-1/+1
| | | | | | | | | | | | | | | | | The vast majority of files including object-store.h did not need dir.h nor khash.h. Split the header into two files, and let most just depend upon object-store-ll.h, while letting the two callers that need it depend on the full object-store.h. After this patch: $ git grep -h include..object-store | sort | uniq -c 2 #include "object-store.h" 129 #include "object-store-ll.h" Diff best viewed with `--color-moved`. Signed-off-by: Elijah Newren <newren@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
* Merge branch 'en/header-split-cache-h'Junio C Hamano2023-04-251-1/+1
|\ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Header clean-up. * en/header-split-cache-h: (24 commits) protocol.h: move definition of DEFAULT_GIT_PORT from cache.h mailmap, quote: move declarations of global vars to correct unit treewide: reduce includes of cache.h in other headers treewide: remove double forward declaration of read_in_full cache.h: remove unnecessary includes treewide: remove cache.h inclusion due to pager.h changes pager.h: move declarations for pager.c functions from cache.h treewide: remove cache.h inclusion due to editor.h changes editor: move editor-related functions and declarations into common file treewide: remove cache.h inclusion due to object.h changes object.h: move some inline functions and defines from cache.h treewide: remove cache.h inclusion due to object-file.h changes object-file.h: move declarations for object-file.c functions from cache.h treewide: remove cache.h inclusion due to git-zlib changes git-zlib: move declarations for git-zlib functions from cache.h treewide: remove cache.h inclusion due to object-name.h changes object-name.h: move declarations for object-name.c functions from cache.h treewide: remove unnecessary cache.h inclusion treewide: be explicit about dependence on mem-pool.h treewide: be explicit about dependence on oid-array.h ...
| * treewide: remove cache.h inclusion due to object-file.h changesElijah Newren2023-04-111-1/+1
| | | | | | | | | | | | Signed-off-by: Elijah Newren <newren@gmail.com> Acked-by: Calvin Wan <calvinwan@google.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
| * Merge branch 'ab/remove-implicit-use-of-the-repository' into ↵Junio C Hamano2023-04-041-2/+2
| |\ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | en/header-split-cache-h * ab/remove-implicit-use-of-the-repository: libs: use "struct repository *" argument, not "the_repository" post-cocci: adjust comments for recent repo_* migration cocci: apply the "revision.h" part of "the_repository.pending" cocci: apply the "rerere.h" part of "the_repository.pending" cocci: apply the "refs.h" part of "the_repository.pending" cocci: apply the "promisor-remote.h" part of "the_repository.pending" cocci: apply the "packfile.h" part of "the_repository.pending" cocci: apply the "pretty.h" part of "the_repository.pending" cocci: apply the "object-store.h" part of "the_repository.pending" cocci: apply the "diff.h" part of "the_repository.pending" cocci: apply the "commit.h" part of "the_repository.pending" cocci: apply the "commit-reach.h" part of "the_repository.pending" cocci: apply the "cache.h" part of "the_repository.pending" cocci: add missing "the_repository" macros to "pending" cocci: sort "the_repository" rules by header cocci: fix incorrect & verbose "the_repository" rules cocci: remove dead rule from "the_repository.pending.cocci"
* | \ Merge branch 'ab/remove-implicit-use-of-the-repository'Junio C Hamano2023-04-061-2/+2
|\ \ \ | | |/ | |/| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Code clean-up around the use of the_repository. * ab/remove-implicit-use-of-the-repository: libs: use "struct repository *" argument, not "the_repository" post-cocci: adjust comments for recent repo_* migration cocci: apply the "revision.h" part of "the_repository.pending" cocci: apply the "rerere.h" part of "the_repository.pending" cocci: apply the "refs.h" part of "the_repository.pending" cocci: apply the "promisor-remote.h" part of "the_repository.pending" cocci: apply the "packfile.h" part of "the_repository.pending" cocci: apply the "pretty.h" part of "the_repository.pending" cocci: apply the "object-store.h" part of "the_repository.pending" cocci: apply the "diff.h" part of "the_repository.pending" cocci: apply the "commit.h" part of "the_repository.pending" cocci: apply the "commit-reach.h" part of "the_repository.pending" cocci: apply the "cache.h" part of "the_repository.pending" cocci: add missing "the_repository" macros to "pending" cocci: sort "the_repository" rules by header cocci: fix incorrect & verbose "the_repository" rules cocci: remove dead rule from "the_repository.pending.cocci"
| * | cocci: apply the "object-store.h" part of "the_repository.pending"Ævar Arnfjörð Bjarmason2023-03-281-2/+2
| | | | | | | | | | | | | | | | | | | | | | | | Apply the part of "the_repository.pending.cocci" pertaining to "object-store.h". Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
* | | http: mark unused parameter in fill_active_slot() callbacksJeff King2023-03-171-2/+2
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | We have a generic "fill" function that is used by both the dumb http push and fetch code paths. It takes a void parameter in case the caller wants to pass along extra data, but (since the previous commit) neither does so. So we could simply drop the extra parameter. But since it's good practice to provide a void pointer for in callback functions, we'll leave it here for the future, and just annotate it as unused (to appease -Wunused-parameter). While we're marking it, let's also fix the type in http-walker's function to have the correct "void" type. The original had to cast the function pointer and was technically undefined behavior (though generally OK in practice). Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>
* | | http: drop unused parameter from start_object_request()Jeff King2023-03-171-4/+3
| |/ |/| | | | | | | | | | | | | | | | | | | | | | | We take a "walker" parameter for the request, but don't actually look at it. This is due to 5424bc557f (http*: add helper methods for fetching objects (loose), 2009-06-06). Before then, we consulted the "walker" struct to tell us if we should be verbose, but now those messages are printed elsewhere. Let's drop the unused parameter to make -Wunused-parameter happy. Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>
* | cache.h: remove dependence on hex.h; make other files include it explicitlyElijah Newren2023-02-241-0/+1
|/ | | | | Signed-off-by: Elijah Newren <newren@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
* tree-wide: apply equals-null.cocciJunio C Hamano2022-05-021-7/+7
| | | | Signed-off-by: Junio C Hamano <gitster@pobox.com>
* http: rename CURLOPT_FILE to CURLOPT_WRITEDATAÆvar Arnfjörð Bjarmason2021-07-311-1/+1
| | | | | | | | | | | The CURLOPT_FILE name is an alias for CURLOPT_WRITEDATA, the CURLOPT_WRITEDATA name has been preferred since curl 7.9.7, released in May 2002[1]. 1. https://curl.se/libcurl/c/CURLOPT_WRITEDATA.html Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
* http: drop support for curl < 7.16.0Jeff King2021-07-301-12/+0
| | | | | | | | | | | | | | | | | | | | | | | | | | | In the last commit we dropped support for curl < 7.11.1, let's continue that and drop support for versions older than 7.16.0. This allows us to get rid of some now-obsolete #ifdefs. Choosing 7.16.0 is a somewhat arbitrary cutoff: 1. It came out in October of 2006, almost 15 years ago. Besides being a nice round number, around 10 years is a common end-of-life support period, even for conservative distributions. 2. That version introduced the curl_multi interface, which gives us a lot of bang for the buck in removing #ifdefs RHEL 5 came with curl 7.15.5[1] (released in August 2006). RHEL 5's extended life cycle program ended on 2020-11-30[1]. RHEL 6 comes with curl 7.19.7 (released in November 2009), and RHEL 7 comes with 7.29.0 (released in February 2013). 1. http://lore.kernel.org/git/873e1f31-2a96-5b72-2f20-a5816cad1b51@jupiterrise.com Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
* Always use oidread to read into struct object_idbrian m. carlson2021-04-271-1/+1
| | | | | | | | | | In the future, we'll want oidread to automatically set the hash algorithm member for an object ID we read into it, so ensure we use oidread instead of hashcpy everywhere we're copying a hash value into a struct object_id. Signed-off-by: brian m. carlson <sandals@crustytoothpaste.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>
* http: refactor finish_http_pack_request()Jonathan Tan2020-06-111-2/+3
| | | | | | | | | | | | | | | | | | | | finish_http_pack_request() does multiple tasks, including some housekeeping on a struct packed_git - (1) closing its index, (2) removing it from a list, and (3) installing it. These concerns are independent of fetching a pack through HTTP: they are there only because (1) the calling code opens the pack's index before deciding to fetch it, (2) the calling code maintains a list of packfiles that can be fetched, and (3) the calling code fetches it in order to make use of its objects in the same process. In preparation for a subsequent commit, which adds a feature that does not need any of this housekeeping, remove (1), (2), and (3) from finish_http_pack_request(). (2) and (3) are now done by a helper function, and (1) is the responsibility of the caller (in this patch, done closer to the point where the pack index is opened). Signed-off-by: Jonathan Tan <jonathantanmy@google.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
* Merge branch 'bc/hash-transition-16'Junio C Hamano2019-04-251-9/+9
|\ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Conversion from unsigned char[20] to struct object_id continues. * bc/hash-transition-16: (35 commits) gitweb: make hash size independent Git.pm: make hash size independent read-cache: read data in a hash-independent way dir: make untracked cache extension hash size independent builtin/difftool: use parse_oid_hex refspec: make hash size independent archive: convert struct archiver_args to object_id builtin/get-tar-commit-id: make hash size independent get-tar-commit-id: parse comment record hash: add a function to lookup hash algorithm by length remote-curl: make hash size independent http: replace sha1_to_hex http: compute hash of downloaded objects using the_hash_algo http: replace hard-coded constant with the_hash_algo http-walker: replace sha1_to_hex http-push: remove remaining uses of sha1_to_hex http-backend: allow 64-character hex names http-push: convert to use the_hash_algo builtin/pull: make hash-size independent builtin/am: make hash size independent ...
| * http-walker: replace sha1_to_hexbrian m. carlson2019-04-011-9/+9
| | | | | | | | | | | | | | | | | | Since sha1_to_hex is limited to SHA-1, replace the uses of it in this file with hash_to_hex. Rename several variables accordingly to reflect that they are no longer limited to SHA-1. Signed-off-by: brian m. carlson <sandals@crustytoothpaste.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>
| * object-store: rename and expand packed_git's sha1 memberbrian m. carlson2019-04-011-1/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This member is used to represent the pack checksum of the pack in question. Expand this member to be GIT_MAX_RAWSZ bytes in length so it works with longer hashes and rename it to be "hash" instead of "sha1". This transformation was made with a change to the definition and the following semantic patch: @@ struct packed_git *E1; @@ - E1->sha1 + E1->hash @@ struct packed_git E1; @@ - E1.sha1 + E1.hash Signed-off-by: brian m. carlson <sandals@crustytoothpaste.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>
* | http: use normalize_curl_result() instead of manual conversionJeff King2019-03-241-11/+2
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | When we switched off CURLOPT_FAILONERROR in 17966c0a63 (http: avoid disconnecting on 404s for loose objects, 2016-07-11), the fetch_object() function started manually handling 404's. Since we now have normalize_curl_result() for use elsewhere, we can use it here as well, shortening the code. Note that we lose the check for http/https in the URL here. None of the other result-normalizing code paths bother with this. Looking at missing_target(), which checks specifically for an FTP-specific CURLcode and "http" code 550, it seems likely that git-over-ftp has been subtly broken since 17966c0a63. This patch does nothing to fix that, but nor should it make anything worse (in fact, it may be slightly better because we'll actually recognize an error as such, rather than assuming CURLE_OK means we actually got some data). Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>
* | http: normalize curl results for dumb loose and alternates fetchesJeff King2019-03-241-0/+8
|/ | | | | | | | | | | | | | | | | | | | | | | | | | | | | If the dumb-http walker encounters a 404 when fetching a loose object, it then looks at any http-alternates for the object. The 404 check is implemented by missing_target(), which checks not only the http code, but also that we got an http error from the CURLcode. That broke when we stopped using CURLOPT_FAILONERROR in 17966c0a63 (http: avoid disconnecting on 404s for loose objects, 2016-07-11), since our CURLcode will now be CURLE_OK. As a result, fetching over dumb-http from a repository with alternates could result in Git printing "Unable to find abcd1234..." and aborting. We could probably fix this just by loosening missing_target(). However, there's other code which looks at the curl result, and it would have to be tweaked as well. Instead, let's just normalize the result the same way the smart-http code does. There's a similar case in processing the alternates (where we failover from "info/http-alternates" to "info/alternates"). We'll give it the same treatment. After this patch, we should be hitting all code paths that need this normalization (notably absent here is the http_pack_request path, but it does not use FAILONERROR, nor missing_target()). Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>
* convert has_sha1_file() callers to has_object_file()Jeff King2019-01-081-2/+2
| | | | | | | | | | | | The only remaining callers of has_sha1_file() actually have an object_id already. They can use the "object" variant, rather than dereferencing the hash themselves. The code changes here were completely generated by the included coccinelle patch. Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>
* sha1-file: modernize loose object file functionsJeff King2019-01-081-1/+1
| | | | | | | | | | | | | | | | | | | | | | | | The loose object access code in sha1-file.c is some of the oldest in Git, and could use some modernizing. It mostly uses "unsigned char *" for object ids, which these days should be "struct object_id". It also uses the term "sha1_file" in many functions, which is confusing. The term "loose_objects" is much better. It clearly distinguishes them from packed objects (which didn't even exist back when the name "sha1_file" came into being). And it also distinguishes it from the checksummed-file concept in csum-file.c (which until recently was actually called "struct sha1file"!). This patch converts the functions {open,close,map,stat}_sha1_file() into open_loose_object(), etc, and switches their sha1 arguments for object_id structs. Similarly, path functions like fill_sha1_path() become fill_loose_path() and use object_ids. The function sha1_loose_object_info() already says "loose", so we can just drop the "sha1" (and teach it to use object_id). Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>
* http: use struct object_id instead of bare sha1Jeff King2019-01-081-3/+3
| | | | | | | | | | | | | | The dumb-http walker code still passes around and stores object ids as "unsigned char *sha1". Let's modernize it. There's probably still more work to be done to handle dumb-http fetches with a new, larger hash. But that can wait; this is enough that we can now convert some of the low-level object routines that we call into from here (and in fact, some of the "oid.hash" references added here will be further improved in the next patch). Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>
* sha1_file_name(): overwrite buffer instead of appendingJeff King2018-11-131-1/+1
| | | | | | | | | | | | | | | | | The sha1_file_name() function is used to generate the path to a loose object in the object directory. It doesn't make much sense for it to append, since the the path we write may be absolute (i.e., you cannot reliably build up a path with it). Because many callers use it with a static buffer, they have to strbuf_reset() manually before each call (and the other callers always use an empty buffer, so they don't care either way). Let's handle this automatically. Since we're changing the semantics, let's take the opportunity to give it a more hash-neutral name (which will also catch any callers from topics in flight). Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>
* convert "hashcmp() != 0" to "!hasheq()"Jeff King2018-08-291-1/+1
| | | | | | | | | | | | This rounds out the previous three patches, covering the inequality logic for the "hash" variant of the functions. As with the previous three, the accompanying code changes are the mechanical result of applying the coccinelle patch; see those patches for more discussion. Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>
* convert "hashcmp() == 0" to hasheq()Jeff King2018-08-291-1/+1
| | | | | | | | | | | | | | | | | | | | | | | | This is the partner patch to the previous one, but covering the "hash" variants instead of "oid". Note that our coccinelle rule is slightly more complex to avoid triggering the call in hasheq(). I didn't bother to add a new rule to convert: - hasheq(E1->hash, E2->hash) + oideq(E1, E2) Since these are new functions, there won't be any such existing callers. And since most of the code is already using oideq, we're not likely to introduce new ones. We might still see "!hashcmp(E1->hash, E2->hash)" from topics in flight. But because our new rule comes after the existing ones, that should first get converted to "!oidcmp(E1, E2)" and then to "oideq(E1, E2)". Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>
* Merge branch 'sb/object-store'Junio C Hamano2018-04-111-1/+3
|\ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Refactoring the internal global data structure to make it possible to open multiple repositories, work with and then close them. Rerolled by Duy on top of a separate preliminary clean-up topic. The resulting structure of the topics looked very sensible. * sb/object-store: (27 commits) sha1_file: allow sha1_loose_object_info to handle arbitrary repositories sha1_file: allow map_sha1_file to handle arbitrary repositories sha1_file: allow map_sha1_file_1 to handle arbitrary repositories sha1_file: allow open_sha1_file to handle arbitrary repositories sha1_file: allow stat_sha1_file to handle arbitrary repositories sha1_file: allow sha1_file_name to handle arbitrary repositories sha1_file: add repository argument to sha1_loose_object_info sha1_file: add repository argument to map_sha1_file sha1_file: add repository argument to map_sha1_file_1 sha1_file: add repository argument to open_sha1_file sha1_file: add repository argument to stat_sha1_file sha1_file: add repository argument to sha1_file_name sha1_file: allow prepare_alt_odb to handle arbitrary repositories sha1_file: allow link_alt_odb_entries to handle arbitrary repositories sha1_file: add repository argument to prepare_alt_odb sha1_file: add repository argument to link_alt_odb_entries sha1_file: add repository argument to read_info_alternates sha1_file: add repository argument to link_alt_odb_entry sha1_file: add raw_object_store argument to alt_odb_usable pack: move approximate object count to object store ...
| * sha1_file: add repository argument to sha1_file_nameStefan Beller2018-03-261-1/+2
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Add a repository argument to allow sha1_file_name callers to be more specific about which repository to handle. This is a small mechanical change; it doesn't change the implementation to handle repositories other than the_repository yet. As with the previous commits, use a macro to catch callers passing a repository other than the_repository at compile time. While at it, move the declaration to object-store.h, where it should be easier to find. Signed-off-by: Stefan Beller <sbeller@google.com> Signed-off-by: Jonathan Nieder <jrnieder@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com> Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
| * object-store: move packed_git and packed_git_mru to object storeStefan Beller2018-03-261-0/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | In a process with multiple repositories open, packfile accessors should be associated to a single repository and not shared globally. Move packed_git and packed_git_mru into the_repository and adjust callers to reflect this. [nd: while at there, wrap access to these two fields in get_packed_git() and get_packed_git_mru(). This allows us to lazily initialize these fields without caller doing that explicitly] Signed-off-by: Stefan Beller <sbeller@google.com> Signed-off-by: Jonathan Nieder <jrnieder@gmail.com> Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
* | http-walker: convert struct object_request to use struct object_idbrian m. carlson2018-03-141-8/+8
|/ | | | | | | | | | | | | | | | | | | | | Convert struct object_request to use struct object_id by updating the definition and applying the following semantic patch, plus the standard object_id transforms: @@ struct object_request E1; @@ - E1.sha1 + E1.oid.hash @@ struct object_request *E1; @@ - E1->sha1 + E1->oid.hash Signed-off-by: brian m. carlson <sandals@crustytoothpaste.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>
* sha1_file: remove static strbuf from sha1_file_name()Christian Couder2018-01-171-2/+4
| | | | | | | | | | | | | | Using a static buffer in sha1_file_name() is error prone and the performance improvements it gives are not needed in many of the callers. So let's get rid of this static buffer and, if necessary or helpful, let's use one in the caller. Suggested-by: Jeff Hostetler <git@jeffhostetler.com> Helped-by: Kevin Daudt <me@ikke.info> Signed-off-by: Christian Couder <chriscool@tuxfamily.org> Signed-off-by: Junio C Hamano <gitster@pobox.com>
* pack: move find_sha1_pack()Jonathan Tan2017-08-241-0/+1
| | | | | Signed-off-by: Jonathan Tan <jonathantanmy@google.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
* Merge branch 'jk/http-walker-buffer-underflow-fix'Junio C Hamano2017-03-171-4/+7
|\ | | | | | | | | | | | | | | "Dumb http" transport used to misparse a nonsense http-alternates response, which has been fixed. * jk/http-walker-buffer-underflow-fix: http-walker: fix buffer underflow processing remote alternates
| * http-walker: fix buffer underflow processing remote alternatesJeff King2017-03-131-4/+7
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | If we parse a remote alternates (or http-alternates), we expect relative lines like: ../../foo.git/objects which we convert into "$URL/../foo.git/" (and then use that as a base for fetching more objects). But if the remote feeds us nonsense like just: ../ we will try to blindly strip the last 7 characters, assuming they contain the string "objects". Since we don't _have_ 7 characters at all, this results in feeding a small negative value to strbuf_add(), which converts it to a size_t, resulting in a big positive value. This should consistently fail (since we can't generall allocate the max size_t minus 7 bytes), so there shouldn't be any security implications. Let's fix this by using strbuf_strip_suffix() to drop the characters we want. If they're not present, we'll ignore the alternate (in theory we could use it as-is, but the rest of the http-walker code unconditionally tacks "objects/" back on, so it is it not prepared to handle such a case). Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>
* | http: release strbuf on disabled alternatesEric Wong2017-03-061-0/+2
| | | | | | | | | | | | | | | | | | | | This likely has no real-world impact on memory usage, but it is cleaner for future readers. Fixes: abcbdc03895f ("http: respect protocol.*.allow=user for http-alternates") Signed-off-by: Eric Wong <e@80x24.org> Reviewed-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>
* | http: inform about alternates-as-redirects behaviorEric Wong2017-03-061-3/+5
|/ | | | | | | | | | | | | | It is disconcerting for users to not notice the behavior change in handling alternates from commit cb4d2d35c4622ec2 ("http: treat http-alternates like redirects") Give the user a hint about the config option so they can see the URL and decide whether or not they want to enable http.followRedirects in their config. Signed-off-by: Eric Wong <e@80x24.org> Reviewed-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>
* http: respect protocol.*.allow=user for http-alternatesJeff King2016-12-151-11/+41
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The http-walker may fetch the http-alternates (or alternates) file from a remote in order to find more objects. This should count as a "not from the user" use of the protocol. But because we implement the redirection ourselves and feed the new URL to curl, it will use the CURLOPT_PROTOCOLS rules, not the more restrictive CURLOPT_REDIR_PROTOCOLS. The ideal solution would be for each curl request we make to know whether or not is directly from the user or part of an alternates redirect, and then set CURLOPT_PROTOCOLS as appropriate. However, that would require plumbing that information through all of the various layers of the http code. Instead, let's check the protocol at the source: when we are parsing the remote http-alternates file. The only downside is that if there's any mismatch between what protocol we think it is versus what curl thinks it is, it could violate the policy. To address this, we'll make the parsing err on the picky side, and only allow protocols that it can parse definitively. So for example, you can't elude the "http" policy by asking for "HTTP://", even though curl might handle it; we would reject it as unknown. The only unsafe case would be if you have a URL that starts with "http://" but curl interprets as another protocol. That seems like an unlikely failure mode (and we are still protected by our base CURLOPT_PROTOCOL setting, so the worst you could do is trigger one of https, ftp, or ftps). Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Brandon Williams <bmwill@google.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
* http-walker: complain about non-404 loose object errorsJeff King2016-12-061-2/+5
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Since commit 17966c0a6 (http: avoid disconnecting on 404s for loose objects, 2016-07-11), we turn off curl's FAILONERROR option and instead manually deal with failing HTTP codes. However, the logic to do so only recognizes HTTP 404 as a failure. This is probably the most common result, but if we were to get another code, the curl result remains CURLE_OK, and we treat it as success. We still end up detecting the failure when we try to zlib-inflate the object (which will fail), but instead of reporting the HTTP error, we just claim that the object is corrupt. Instead, let's catch anything in the 300's or above as an error (300's are redirects which are not an error at the HTTP level, but are an indication that we've explicitly disabled redirects, so we should treat them as such; we certainly don't have the resulting object content). Note that we also fill in req->errorstr, which we didn't do before. Without FAILONERROR, curl will not have filled this in, and it will remain a blank string. This never mattered for the 404 case, because in the logic below we hit the "missing_target()" branch and print nothing. But for other errors, we'd want to say _something_, if only to fill in the blank slot in the error message. Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>
* Merge branch 'ew/http-walker' into jk/http-walker-limit-redirectJunio C Hamano2016-12-061-29/+26
|\ | | | | | | | | | | | | | | * ew/http-walker: list: avoid incompatibility with *BSD sys/queue.h http-walker: reduce O(n) ops with doubly-linked list http: avoid disconnecting on 404s for loose objects http-walker: remove unused parameter from fetch_object
| * http-walker: reduce O(n) ops with doubly-linked listEric Wong2016-07-131-27/+15
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Using the a Linux-kernel-derived doubly-linked list implementation from the Userspace RCU library allows us to enqueue and delete items from the object request queue in constant time. This change reduces enqueue times in the prefetch() function where object request queue could grow to several thousand objects. I left out the list_for_each_entry* family macros from list.h which relied on the __typeof__ operator as we support platforms without it. Thus, list_entry (aka "container_of") needs to be called explicitly inside macro-wrapped for loops. The downside is this costs us an additional pointer per object request, but this is offset by reduced overhead on queue operations leading to improved performance and shorter queue depths. Signed-off-by: Eric Wong <e@80x24.org> Signed-off-by: Junio C Hamano <gitster@pobox.com>
| * http: avoid disconnecting on 404s for loose objectsEric Wong2016-07-131-0/+9
| | | | | | | | | | | | | | | | | | 404s are common when fetching loose objects on static HTTP servers, and reestablishing a connection for every single 404 adds additional latency. Signed-off-by: Eric Wong <e@80x24.org> Signed-off-by: Junio C Hamano <gitster@pobox.com>
| * http-walker: remove unused parameter from fetch_objectEric Wong2016-07-131-2/+2
| | | | | | | | | | | | | | | | This parameter has not been used since commit 1d389ab65dc6 ("Add support for parallel HTTP transfers") back in 2005 Signed-off-by: Eric Wong <e@80x24.org> Signed-off-by: Junio C Hamano <gitster@pobox.com>