summaryrefslogtreecommitdiffstats
Commit message (Collapse)AuthorAgeFilesLines
* mailinfo: libifyJunio C Hamano2015-10-224-1048/+1062
| | | | | | | | | | Move the bulk of the code from builtin/mailinfo.c to mailinfo.c so that new callers can start calling mailinfo() directly. Note that a few calls to exit() and die() need to be cleaned up for the API to be truly useful, which will come in later steps. Signed-off-by: Junio C Hamano <gitster@pobox.com>
* mailinfo: keep the parsed log message in a strbufJunio C Hamano2015-10-221-12/+16
| | | | | | | | | | | | | When mailinfo() is eventually libified, the calling "git am" still will have to write out the log message in the "msg" file for hooks and other users of the information, but it does not have to reopen and reread what it wrote earlier if the function kept it in a strbuf. This also removes the need for seeking and truncating the output file when we see a scissors mark in the input, which in turn allows us to lose two callsites of die_errno(). Signed-off-by: Junio C Hamano <gitster@pobox.com>
* mailinfo: handle_commit_msg() shouldn't be called after finding patchbreakJunio C Hamano2015-10-221-2/+1
| | | | | | | | | | | | | | | | | | | There is a strange "if (!mi->cmitmsg) return 0" at the very beginning of handle_commit_msg(), but the condition should never trigger, because: * The only place cmitmsg is set to NULL is after this function sees a patch break, closes the FILE * to write the commit log message and returns 1. This function returns non-zero only from that codepath. * The caller of this function, upon seeing a non-zero return, increments filter_stage, starts treating the input as patch text and will never call handle_commit_msg() again. Replace it with an assert(!mi->filter_stage) to ensure the above observation will stay to be true. Signed-off-by: Junio C Hamano <gitster@pobox.com>
* mailinfo: move content/content_top to struct mailinfoJunio C Hamano2015-10-221-20/+26
| | | | Signed-off-by: Junio C Hamano <gitster@pobox.com>
* mailinfo: move [ps]_hdr_data to struct mailinfoJunio C Hamano2015-10-221-14/+23
| | | | Signed-off-by: Junio C Hamano <gitster@pobox.com>
* mailinfo: move cmitmsg and patchfile to struct mailinfoJunio C Hamano2015-10-221-16/+16
| | | | Signed-off-by: Junio C Hamano <gitster@pobox.com>
* mailinfo: move charset to struct mailinfoJunio C Hamano2015-10-221-7/+8
| | | | Signed-off-by: Junio C Hamano <gitster@pobox.com>
* mailinfo: move transfer_encoding to struct mailinfoJunio C Hamano2015-10-221-13/+14
| | | | Signed-off-by: Junio C Hamano <gitster@pobox.com>
* mailinfo: move check for metainfo_charset to convert_to_utf8()Junio C Hamano2015-10-221-5/+3
| | | | | | | | | All callers of this function refrain from calling it when mi->metainfo_charset is NULL; move the check to the callee, as it already has a few conditions at its beginning to turn it into a no-op. Signed-off-by: Junio C Hamano <gitster@pobox.com>
* mailinfo: move metainfo_charset to struct mailinfoJunio C Hamano2015-10-221-19/+19
| | | | | | | This requires us to pass the struct down to decode_header() and convert_to_utf8() callchain. Signed-off-by: Junio C Hamano <gitster@pobox.com>
* mailinfo: move use_scissors and use_inbody_headers to struct mailinfoJunio C Hamano2015-10-221-10/+13
| | | | Signed-off-by: Junio C Hamano <gitster@pobox.com>
* mailinfo: move add_message_id and message_id to struct mailinfoJunio C Hamano2015-10-221-14/+17
| | | | | | This requires us to pass the structure into check_header() codepath. Signed-off-by: Junio C Hamano <gitster@pobox.com>
* mailinfo: move patch_lines to struct mailinfoJunio C Hamano2015-10-221-5/+5
| | | | | | | This one is trivial thanks to previous steps that started passing the structure throughout the input codepaths. Signed-off-by: Junio C Hamano <gitster@pobox.com>
* mailinfo: move filter/header stage to struct mailinfoJunio C Hamano2015-10-221-20/+21
| | | | | | | | | | Earlier we got rid of two function-scope static variables that kept track of the states of helper functions by making them extra arguments that are passed throughout the callchain. Now we have a convenient place to store and pass them around in the form of "struct mailinfo", change them into two fields in the struct. Signed-off-by: Junio C Hamano <gitster@pobox.com>
* mailinfo: move global "FILE *fin, *fout" to struct mailinfoJunio C Hamano2015-10-221-26/+28
| | | | | | | | | This requires us to pass "struct mailinfo" to more functions throughout the codepath that read input lines. Incidentally, later steps are helped by this patch passing the struct to more callchains. Signed-off-by: Junio C Hamano <gitster@pobox.com>
* mailinfo: move keep_subject & keep_non_patch_bracket to struct mailinfoJunio C Hamano2015-10-221-8/+8
| | | | | | | These two are the only easy ones that do not require passing the structure around to deep corners of the callchain. Signed-off-by: Junio C Hamano <gitster@pobox.com>
* mailinfo: introduce "struct mailinfo" to hold globalsJunio C Hamano2015-10-221-24/+47
| | | | | | | | | In this first step, move only 'email' and 'name' fields in there and remove the corresponding globals. In subsequent patches, more globals will be moved to this and the structure will be passed around as a new parameter to more functions. Signed-off-by: Junio C Hamano <gitster@pobox.com>
* mailinfo: move global "line" into mailinfo() functionJunio C Hamano2015-10-221-2/+3
| | | | | | | | With the previous steps, it becomes clear that the mailinfo() function is the only one that wants the "line" to be directly touchable. Move it to the function scope of this function. Signed-off-by: Junio C Hamano <gitster@pobox.com>
* mailinfo: do not let find_boundary() touch global "line" directlyJunio C Hamano2015-10-221-5/+5
| | | | | | | | | | | | | | With the previous two commits, we established that the local variable "line" in handle_body() and handle_boundary() functions always refer to the global "line" that is used as the common and shared "current line from the input". They are the only callers of the last function that refers to the global line directly, i.e. find_boundary(). Pass "line" as a parameter to this leaf function to complete the clean-up. Now the only function that directly refers to the global "line" is the caller of handle_body() at the very beginning of this whole callchain. Signed-off-by: Junio C Hamano <gitster@pobox.com>
* mailinfo: do not let handle_boundary() touch global "line" directlyJunio C Hamano2015-10-221-8/+8
| | | | | | | | | | | | | | | | | | This function has a single caller, and called with the global "line" holding the multi-part boundary line the caller saw while processing the e-mail body. The function then goes into a loop to process each line of the input, and fills the same global "line" variable from the input as it needs to read more lines to process the multi-part headers. Let the caller explicitly pass a pointer to this global "line" variable as an argument, and have the function itself use that strbuf throughout, instead of referring to the global "line" itself. There still is a helper function that this function calls that still touches the global directly; it will be updated as the series progresses. Signed-off-by: Junio C Hamano <gitster@pobox.com>
* mailinfo: do not let handle_body() touch global "line" directlyJunio C Hamano2015-10-221-8/+8
| | | | | | | | | | | | | | | | | | This function has a single caller, and called with the global "line" holding the first line of the e-mail body after the caller finished processing the e-mail headers. The function then goes into a loop to process each line of the input, starting from what was given by its caller, and fills the same global "line" variable from the input as it needs to process more lines. Let the caller explicitly pass a pointer to this global "line" variable as an argument, and have the function itself use that strbuf throughout, instead of referring to the global "line" itself. There are helper functions that this function calls that still touch the global directly; they will be updated as the series progresses. Signed-off-by: Junio C Hamano <gitster@pobox.com>
* mailinfo: get rid of function-local static statesJunio C Hamano2015-10-221-22/+19
| | | | | | | | | | Two helper functions use "static int" in their scope to keep track of the state while repeatedly getting called once for each input line. Move these state variables to their ultimate caller and pass down pointers to them along the callchain, as a small step in preparation for making this entire callchain more reentrant. Signed-off-by: Junio C Hamano <gitster@pobox.com>
* mailinfo: move definition of MAX_HDR_PARSED closer to its useJunio C Hamano2015-10-221-1/+1
| | | | Signed-off-by: Junio C Hamano <gitster@pobox.com>
* mailinfo: move cleanup_space() before its usersJunio C Hamano2015-10-221-14/+11
| | | | Signed-off-by: Junio C Hamano <gitster@pobox.com>
* mailinfo: move check_header() after the helpers it usesJunio C Hamano2015-10-221-68/+67
| | | | | | This way, we can lose a forward decl for decode_header(). Signed-off-by: Junio C Hamano <gitster@pobox.com>
* mailinfo: move read_one_header_line() closer to its callersJunio C Hamano2015-10-221-68/+68
| | | | Signed-off-by: Junio C Hamano <gitster@pobox.com>
* mailinfo: move handle_boundary() lowerJunio C Hamano2015-10-221-58/+56
| | | | | | | | This function wants to call find_boundary() and is called only from one place without any recursing, so it becomes easier to read if it appears after the called function. Signed-off-by: Junio C Hamano <gitster@pobox.com>
* mailinfo: plug strbuf leak during continuation line handlingJunio C Hamano2015-10-221-1/+3
| | | | | | | | Whether this loop is left via EOF/break or upon finding a non-continuation line, the storage used for the contination line handling is left behind. Signed-off-by: Junio C Hamano <gitster@pobox.com>
* mailinfo: explicitly close file handle to the patch outputJunio C Hamano2015-10-191-0/+2
| | | | | | | | | | | This does not make a difference within the context of "git mailinfo" that runs once and exits, as flushing and closing would happen upon process termination. It however will matter when we eventually make it callable as an API function. Besides, cleaning after yourself once you are done is a good hygiene. Signed-off-by: Junio C Hamano <gitster@pobox.com>
* mailinfo: fix an off-by-one error in the boundary stackJunio C Hamano2015-10-191-1/+1
| | | | | | | | We pre-increment the pointer that we will use to store something at, so the pointer is already beyond the end of the array if it points at content[MAX_BOUNDARIES]. Signed-off-by: Junio C Hamano <gitster@pobox.com>
* mailinfo: fold decode_header_bq() into decode_header()Junio C Hamano2015-10-191-16/+7
| | | | | | | | | In olden days we might have wanted to behave differently in decode_header() if the header line was encoded with RFC2047, but we apparently do not do so, hence this helper function can go, together with its return value. Signed-off-by: Junio C Hamano <gitster@pobox.com>
* mailinfo: remove a no-op call convert_to_utf8(it, "")Junio C Hamano2015-10-191-5/+0
| | | | | | | | The called function checks if the second parameter is either a NULL or an empty string at the very beginning and returns without doing anything. Remove the useless call. Signed-off-by: Junio C Hamano <gitster@pobox.com>
* Git 2.6.1v2.6.1Junio C Hamano2015-09-294-3/+22
| | | | Signed-off-by: Junio C Hamano <gitster@pobox.com>
* Sync with v2.5.4Junio C Hamano2015-09-2929-30/+486
|\
| * Git 2.5.4v2.5.4Junio C Hamano2015-09-294-3/+22
| | | | | | | | Signed-off-by: Junio C Hamano <gitster@pobox.com>
| * Sync with 2.4.10Junio C Hamano2015-09-2928-29/+466
| |\
| | * Git 2.4.10v2.4.10Junio C Hamano2015-09-294-3/+22
| | | | | | | | | | | | Signed-off-by: Junio C Hamano <gitster@pobox.com>
| | * Sync with 2.3.10Junio C Hamano2015-09-2927-28/+446
| | |\
| | | * Git 2.3.10v2.3.10Junio C Hamano2015-09-294-3/+22
| | | | | | | | | | | | | | | | Signed-off-by: Junio C Hamano <gitster@pobox.com>
| | | * Merge branch 'jk/xdiff-memory-limits' into maint-2.3Junio C Hamano2015-09-2811-26/+57
| | | |\
| | | | * merge-file: enforce MAX_XDIFF_SIZE on incoming filesJeff King2015-09-281-1/+2
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The previous commit enforces MAX_XDIFF_SIZE at the interfaces to xdiff: xdi_diff (which calls xdl_diff) and ll_xdl_merge (which calls xdl_merge). But we have another direct call to xdl_merge in merge-file.c. If it were written today, this probably would just use the ll_merge machinery. But it predates that code, and uses slightly different options to xdl_merge (e.g., ZEALOUS_ALNUM). We could try to abstract out an xdi_merge to match the existing xdi_diff, but even that is difficult. Rather than simply report error, we try to treat large files as binary, and that distinction would happen outside of xdi_merge. The simplest fix is to just replicate the MAX_XDIFF_SIZE check in merge-file.c. Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>
| | | | * xdiff: reject files larger than ~1GBJeff King2015-09-283-1/+14
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The xdiff code is not prepared to handle extremely large files. It uses "int" in many places, which can overflow if we have a very large number of lines or even bytes in our input files. This can cause us to produce incorrect diffs, with no indication that the output is wrong. Or worse, we may even underallocate a buffer whose size is the result of an overflowing addition. We're much better off to tell the user that we cannot diff or merge such a large file. This patch covers both cases, but in slightly different ways: 1. For merging, we notice the large file and cleanly fall back to a binary merge (which is effectively "we cannot merge this"). 2. For diffing, we make the binary/text distinction much earlier, and in many different places. For this case, we'll use the xdi_diff as our choke point, and reject any diff there before it hits the xdiff code. This means in most cases we'll die() immediately after. That's not ideal, but in practice we shouldn't generally hit this code path unless the user is trying to do something tricky. We already consider files larger than core.bigfilethreshold to be binary, so this code would only kick in when that is circumvented (either by bumping that value, or by using a .gitattribute to mark a file as diffable). In other words, we can avoid being "nice" here, because there is already nice code that tries to do the right thing. We are adding the suspenders to the nice code's belt, so notice when it has been worked around (both to protect the user from malicious inputs, and because it is better to die() than generate bogus output). The maximum size was chosen after experimenting with feeding large files to the xdiff code. It's just under a gigabyte, which leaves room for two obvious cases: - a diff3 merge conflict result on files of maximum size X could be 3*X plus the size of the markers, which would still be only about 3G, which fits in a 32-bit int. - some of the diff code allocates arrays of one int per record. Even if each file consists only of blank lines, then a file smaller than 1G will have fewer than 1G records, and therefore the int array will fit in 4G. Since the limit is arbitrary anyway, I chose to go under a gigabyte, to leave a safety margin (e.g., we would not want to overflow by allocating "(records + 1) * sizeof(int)" or similar. Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>
| | | | * react to errors in xdi_diffJeff King2015-09-287-24/+41
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | When we call into xdiff to perform a diff, we generally lose the return code completely. Typically by ignoring the return of our xdi_diff wrapper, but sometimes we even propagate that return value up and then ignore it later. This can lead to us silently producing incorrect diffs (e.g., "git log" might produce no output at all, not even a diff header, for a content-level diff). In practice this does not happen very often, because the typical reason for xdiff to report failure is that it malloc() failed (it uses straight malloc, and not our xmalloc wrapper). But it could also happen when xdiff triggers one our callbacks, which returns an error (e.g., outf() in builtin/rerere.c tries to report a write failure in this way). And the next patch also plans to add more failure modes. Let's notice an error return from xdiff and react appropriately. In most of the diff.c code, we can simply die(), which matches the surrounding code (e.g., that is what we do if we fail to load a file for diffing in the first place). This is not that elegant, but we are probably better off dying to let the user know there was a problem, rather than simply generating bogus output. We could also just die() directly in xdi_diff, but the callers typically have a bit more context, and can provide a better message (and if we do later decide to pass errors up, we're one step closer to doing so). There is one interesting case, which is in diff_grep(). Here if we cannot generate the diff, there is nothing to match, and we silently return "no hits". This is actually what the existing code does already, but we make it a little more explicit. Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>
| | | * | Merge branch 'jk/transfer-limit-redirection' into maint-2.3Junio C Hamano2015-09-286-15/+78
| | | |\ \
| | | | * | http: limit redirection depthBlake Burkhart2015-09-263-0/+8
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | By default, libcurl will follow circular http redirects forever. Let's put a cap on this so that somebody who can trigger an automated fetch of an arbitrary repository (e.g., for CI) cannot convince git to loop infinitely. The value chosen is 20, which is the same default that Firefox uses. Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>
| | | | * | http: limit redirection to protocol-whitelistBlake Burkhart2015-09-264-5/+27
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Previously, libcurl would follow redirection to any protocol it was compiled for support with. This is desirable to allow redirection from HTTP to HTTPS. However, it would even successfully allow redirection from HTTP to SFTP, a protocol that git does not otherwise support at all. Furthermore git's new protocol-whitelisting could be bypassed by following a redirect within the remote helper, as it was only enforced at transport selection time. This patch limits redirects within libcurl to HTTP, HTTPS, FTP and FTPS. If there is a protocol-whitelist present, this list is limited to those also allowed by the whitelist. As redirection happens from within libcurl, it is impossible for an HTTP redirect to a protocol implemented within another remote helper. When the curl version git was compiled with is too old to support restrictions on protocol redirection, we warn the user if GIT_ALLOW_PROTOCOL restrictions were requested. This is a little inaccurate, as even without that variable in the environment, we would still restrict SFTP, etc, and we do not warn in that case. But anything else means we would literally warn every time git accesses an http remote. This commit includes a test, but it is not as robust as we would hope. It redirects an http request to ftp, and checks that curl complained about the protocol, which means that we are relying on curl's specific error message to know what happened. Ideally we would redirect to a working ftp server and confirm that we can clone without protocol restrictions, and not with them. But we do not have a portable way of providing an ftp server, nor any other protocol that curl supports (https is the closest, but we would have to deal with certificates). [jk: added test and version warning] Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>
| | | | * | transport: refactor protocol whitelist codeJeff King2015-09-262-10/+43
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The current callers only want to die when their transport is prohibited. But future callers want to query the mechanism without dying. Let's break out a few query functions, and also save the results in a static list so we don't have to re-parse for each query. Based-on-a-patch-by: Blake Burkhart <bburky@bburky.com> Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>
| | | * | | Merge branch 'jk/transfer-limit-protocol' into maint-2.3Junio C Hamano2015-09-2813-1/+306
| | | |\| | | | | | |/ | | | |/|
| | | | * submodule: allow only certain protocols for submodule fetchesJeff King2015-09-232-0/+52
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Some protocols (like git-remote-ext) can execute arbitrary code found in the URL. The URLs that submodules use may come from arbitrary sources (e.g., .gitmodules files in a remote repository). Let's restrict submodules to fetching from a known-good subset of protocols. Note that we apply this restriction to all submodule commands, whether the URL comes from .gitmodules or not. This is more restrictive than we need to be; for example, in the tests we run: git submodule add ext::... which should be trusted, as the URL comes directly from the command line provided by the user. But doing it this way is simpler, and makes it much less likely that we would miss a case. And since such protocols should be an exception (especially because nobody who clones from them will be able to update the submodules!), it's not likely to inconvenience anyone in practice. Reported-by: Blake Burkhart <bburky@bburky.com> Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>
| | | | * transport: add a protocol-whitelist environment variableJeff King2015-09-2311-1/+254
| | | |/ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | If we are cloning an untrusted remote repository into a sandbox, we may also want to fetch remote submodules in order to get the complete view as intended by the other side. However, that opens us up to attacks where a malicious user gets us to clone something they would not otherwise have access to (this is not necessarily a problem by itself, but we may then act on the cloned contents in a way that exposes them to the attacker). Ideally such a setup would sandbox git entirely away from high-value items, but this is not always practical or easy to set up (e.g., OS network controls may block multiple protocols, and we would want to enable some but not others). We can help this case by providing a way to restrict particular protocols. We use a whitelist in the environment. This is more annoying to set up than a blacklist, but defaults to safety if the set of protocols git supports grows). If no whitelist is specified, we continue to default to allowing all protocols (this is an "unsafe" default, but since the minority of users will want this sandboxing effect, it is the only sensible one). A note on the tests: ideally these would all be in a single test file, but the git-daemon and httpd test infrastructure is an all-or-nothing proposition rather than a test-by-test prerequisite. By putting them all together, we would be unable to test the file-local code on machines without apache. Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>