git - git

	Commit message (Collapse)	Author	Files	Lines
2022-07-20	The fifth batch	Junio C Hamano	1	-0/+24
	Signed-off-by: Junio C Hamano <gitster@pobox.com>
2022-07-19	transport.c: avoid "whitelist"	Derrick Stolee	1	-4/+4
	The word "whitelist" has cultural implications that are not inclusive. Thankfully, it is not difficult to reword and avoid its use. The GIT_ALLOW_PROTOCOL environment variable was referred to as a "whitelist", but the word "allow" is already part of the variable. Replace "whitelist" with "allow_list" in these cases to demonstrate that we are processing a list of allowed protocols. Signed-off-by: Derrick Stolee <derrickstolee@github.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2022-07-19	t: avoid "whitelist"	Derrick Stolee	4	-8/+7
	The word "whitelist" has cultural implications that are not inclusive. Thankfully, it is not difficult to reword and avoid its use. Focus on changes in the test scripts, since most of the changes are in comments and test names. The renamed test_allow_var helper is only used once inside the widely-used test_proto helper. Signed-off-by: Derrick Stolee <derrickstolee@github.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2022-07-19	git.txt: remove redundant language	Derrick Stolee	1	-3/+1
	The documentation for GIT_ALLOW_PROTOCOL has a sentence that adds no value, since it repeats the meaning from the previous sentence (twice!). The word "whitelist" has cultural implications that are not inclusive, which brought attention to this sentence. Helped-by: Jeff King <peff@peff.net> Helped-by: Junio C Hamano <gitster@pobox.com> Signed-off-by: Derrick Stolee <derrickstolee@github.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2022-07-19	git-cvsserver: clarify directory list	Derrick Stolee	3	-11/+12
	The documentation and error messages for git-cvsserver include some references to a "whitelist" that is not otherwise included in the documentation. When different parts of the documentation do not use common language, this can lead to confusion as to how things are meant to operate. Further, the word "whitelist" has cultural implications that make its use non-inclusive. Thankfully, we can remove it while increasing clarity. Update Documentation/git-cvsserver.txt in a similar way to the previous change to Documentation/git-daemon.txt. The optional '<directory>...' list can specify a list of allowed directories. We refer to that list directly inside of the documentation for the GIT_CVSSERVER_ROOT environment variable. While modifying this documentation, update the environment variables to use a list format. We use the modern way of tabbing the description of each variable in this section. We do _not_ update the description of '<directory>...' to use tabs this way since the rest of the items in the OPTIONS list do not use this modern formatting. A single error message in the actual git-cvsserver.perl code refers to the whitelist during argument parsing. Instead, refer to the directory list that has been clarified in the documentation. Signed-off-by: Derrick Stolee <derrickstolee@github.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2022-07-19	daemon: clarify directory arguments	Derrick Stolee	2	-14/+15
	The undecorated arguments to the 'git-daemon' command provide a list of directories. When at least one directory is specified, then 'git-daemon' only serves requests that are within that directory list. The boolean '--strict-paths' option makes the list more explicit in that subdirectories are no longer included. The existing documentation and error messages around this directory list refer to it and its behavior as a "whitelist". The word "whitelist" has cultural implications that are not inclusive. Thankfully, it is not difficult to reword and avoid its use. In the process, we can define the purpose of this directory list directly. In Documentation/git-daemon.txt, rewrite the OPTIONS section around the '<directory>' option. Add additional clarity to the other options that refer to these directories. Some error messages can also be improved in daemon.c. The '--strict-paths' option requires '<directory>' arguments, so refer to that section of the documentation directly. A logerror() call points out that a requested directory is not in the specified directory list. We can use "list" here without any loss of information. Signed-off-by: Derrick Stolee <derrickstolee@github.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2022-07-18	The fourth batch	Junio C Hamano	1	-0/+34
	Signed-off-by: Junio C Hamano <gitster@pobox.com>
2022-07-18	pack-bitmap.c: continue looping when first MIDX bitmap is found	Teng Long	1	-2/+3
	In "open_midx_bitmap()", we do a loop with the MIDX(es) in repo, when the first one has been found, then will break out by a "return" directly. But actually, it's better to continue the loop until we have visited both the MIDX in our repository, as well as any alternates (along with _their_ alternates, recursively). The reason for this is, there may exist more than one MIDX file in a repo. The "multi_pack_index" struct is actually designed as a singly linked list, and if a MIDX file has been already opened successfully, then the other MIDX files will be skipped and left with a warning "ignoring extra bitmap file." to the output. The discussion link of community: https://public-inbox.org/git/YjzCTLLDCby+kJrZ@nand.local/ Helped-by: Taylor Blau <me@ttaylorr.com> Signed-off-by: Teng Long <dyroneteng@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2022-07-18	pack-bitmap.c: using error() instead of silently returning -1	Teng Long	1	-1/+5
	In "open_pack_bitmap_1()" and "open_midx_bitmap_1()", it's better to return error() instead of "-1" when some unexpected error occurs like "stat bitmap file failed", "bitmap header is invalid" or "checksum mismatch", etc. There are places where we do not replace, such as when the bitmap does not exist (no bitmap in repository is allowed) or when another bitmap has already been opened (in which case it should be a warning rather than an error). Signed-off-by: Teng Long <dyroneteng@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2022-07-18	pack-bitmap.c: do not ignore error when opening a bitmap file	Teng Long	1	-5/+12
	Calls to git_open() to open the pack bitmap file and multi-pack bitmap file do not report any error when they fail. These files are optional and it is not an error if open failed due to ENOENT, but we shouldn't be ignoring other kinds of errors. Signed-off-by: Teng Long <dyroneteng@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2022-07-18	pack-bitmap.c: rename "idx_name" to "bitmap_name"	Teng Long	1	-7/+7
	In "open_pack_bitmap_1()" and "open_midx_bitmap_1()" we use a var named "idx_name" to represent the bitmap filename which is computed by "midx_bitmap_filename()" or "pack_bitmap_filename()" before we open it. There may bring some confusion in this "idx_name" naming, which might lead us to think of ".idx "or" multi-pack-index" files, although bitmap is essentially can be understood as a kind of index, let's define this name a little more accurate here. Signed-off-by: Teng Long <dyroneteng@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2022-07-18	pack-bitmap.c: mark more strings for translations	Teng Long	1	-24/+24
	In pack-bitmap.c, some printed texts are translated, some are not. Let's support the translations of the bitmap related output. Signed-off-by: Teng Long <dyroneteng@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2022-07-18	pack-bitmap.c: fix formatting of error messages	Teng Long	1	-23/+24
	There are some text output issues in 'pack-bitmap.c', they exist in die(), error() etc. This includes issues with capitalization the first letter, newlines, error() instead of BUG(), and substitution that don't have quotes around them. Signed-off-by: Teng Long <dyroneteng@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2022-07-18	scalar: convert README.md into a technical design doc	Victoria Dye	2	-82/+127
	Adapt the content from 'contrib/scalar/README.md' into a design document in 'Documentation/technical/'. In addition to reformatting for asciidoc, elaborate on the background, purpose, and design choices that went into Scalar. Most of this document will persist in the 'Documentation/technical/' after Scalar has been moved out of 'contrib/' and into the root of Git. Until that time, it will also contain a temporary "Roadmap" section detailing the remaining series needed to finish the initial version of Scalar. The section will be removed once Scalar is moved to the repo root, but in the meantime serves as a guide for readers to keep up with progress on the feature. Signed-off-by: Victoria Dye <vdye@github.com> Acked-by: Derrick Stolee <derrickstolee@github.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2022-07-18	scalar: reword command documentation to clarify purpose	Victoria Dye	1	-5/+4
	Rephrase documentation to describe scalar as a "large repo management tool" rather than an "opinionated management tool". The new description is intended to more directly reflect the utility of scalar to better guide users in preparation for scalar being built and installed as part of Git. Signed-off-by: Victoria Dye <vdye@github.com> Acked-by: Derrick Stolee <derrickstolee@github.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2022-07-18	t4200: drop irrelevant code	Martin Ågren	1	-3/+0
	While setting up an unresolved merge for `git rerere`, we run `git rev-parse` and `git fmt-merge-msg` to create a variable `$fifth` and a commit-message file `msg`, which we then never actually use. This has been like that since these tests were added in 672d1b789b ("rerere: migrate to parse-options API", 2010-08-05). This does exercise `git rev-parse` and `git fmt-merge-msg`, but doesn't contribute to testing `git rerere`. Drop these lines. Reported-by: Eric Sunshine <sunshine@sunshineco.com> Signed-off-by: Martin Ågren <martin.agren@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2022-07-18	trace2: only include "fsync" events if we git_fsync()	Ævar Arnfjörð Bjarmason	3	-9/+42
	Fix the overly verbose trace2 logging added in 9a4987677d3 (trace2: add stats for fsync operations, 2022-03-30) (first released with v2.36.0). Since that change every single "git" command invocation has included these "data" events, even though we'll only make use of these with core.fsyncMethod=batch, and even then only have non-zero values if we're writing object data to disk. See c0f4752ed2f (core.fsyncmethod: batched disk flushes for loose-objects, 2022-04-04) for that feature. As we're needing to indent the trace2_data_intmax() lines let's introduce helper variables to ensure that our resulting lines (which were already too) don't exceed the recommendations of the CodingGuidelines. Doing that requires either wrapping them twice, or introducing short throwaway variable names, let's do the latter. The result was that e.g. "git version" would previously emit a total of 6 trace2 events with the GIT_TRACE2_EVENT target (version, start, cmd_ancestry, cmd_name, exit, atexit), but afterwards would emit 8. We'd emit 2 "data" events before the "exit" event. The reason we didn't catch this was that the trace2 unit tests added in a15860dca3f (trace2: t/helper/test-trace2, t0210.sh, t0211.sh, t0212.sh, 2019-02-22) would omit any "data" events that weren't the ones it cared about. Before this change to the C code 6/7 of our "t/t0212-trace2-event.sh" tests would fail if this change was applied to "t/t0212/parse_events.perl". Let's make the trace2 testing more strict, and further append any new events types we don't know about in "t/t0212/parse_events.perl". Since we only invoke the "test-tool trace2" there's no guarantee that we'll catch other overly verbose events in the future, but we'll at least notice if we start emitting new events that are issues every time we log anything with trace2's JSON target. We exclude the "data_json" event type, we'd otherwise would fail on both "win test" and "win+VS test" CI due to the logging added in 353d3d77f4f (trace2: collect Windows-specific process information, 2019-02-22). It looks like that logging should really be using trace2_cmd_ancestry() instead, which was introduced later in 2f732bf15e6 (tr2: log parent process name, 2021-07-21), but let's leave it for now. The fix-up to aaf81223f48 (unpack-objects: use stream_loose_object() to unpack large objects, 2022-06-11) is needed because we're changing the behavior of these events as discussed above. Since we'd always emit a "hardware-flush" event the test added in aaf81223f48 wasn't testing anything except that this trace2 data was unconditionally logged. Even if "core.fsyncMethod" wasn't set to "batch" we'd pass the test. Now we'll check the expected number of "writeout" v.s. "flush" calls under "core.fsyncMethod=batch", but note that this doesn't actually test if we carried out the sync using that method, on a platform where we'd have to fall back to fsync() each of those "writeout" would really be a "flush" (i.e. a full fsync()). But in this case what we're testing is that the logic in "unpack-objects" behaves as expected, not the OS-specific question of whether we actually were able to use the "bulk" method. Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2022-07-18	config/core.txt: fix minor issues for `core.sparseCheckoutCone`	Martin Ågren	1	-2/+2
	The sparse checkout feature can be used in "cone mode" or "non-cone mode". In this one instance in the documentation, we refer to the latter as "non cone mode" with whitespace rather than a hyphen. Align this with the rest of our documentation. A few words later in the same paragraph, there's mention of "a more flexible patterns". Drop that leading "a" to fix the grammar. Signed-off-by: Martin Ågren <martin.agren@gmail.com> Acked-by: Derrick Stolee <derrickstolee@github.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2022-07-18	index-format.txt: remove outdated list of supported extensions	SZEDER Gábor	1	-2/+0
	The first section of 'Documentation/technical/index-format.txt' mentions that "Git currently supports cache tree and resolve undo extensions", but then goes on, and in the "Extensions" section describes not only these two, but six other extensions [1]. Remove this sentence, as it's misleading about the status of all those other extensions. Alternatively we could keep that sentence and update the list of extensions, but that might well lead to a recurring issue, because apparently this list is never updated when a new index extension is added. [1] Split index, untracked cache, FS monitor cache, end of index entry, index entry offset table and sparse directory entries. Signed-off-by: SZEDER Gábor <szeder.dev@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2022-07-17	config.txt: document include, includeIf	Manuel Boni	3	-0/+21
	Git config's tab completion does not yet know about the "include" and "includeIf" sections, nor the related "path" variable. Add a description for these two sections in 'Documentation/config/includeif.txt', which points to git-config's documentation, specifically the "Includes" and "Conditional Includes" subsections. As a side effect, tab completion can successfully complete the 'include', 'includeIf', and 'include.add' expressions. This effect is tested by two new ad-hoc tests. Variable completion only works for "include" for now. Credit for the ideas behind this patch goes to Ævar Arnfjörð Bjarmason. Helped-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com> Signed-off-by: Manuel Boni <ziosombrero@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2022-07-15	mingw: avoid mktemp() in mkstemp() implementation	René Scharfe	1	-4/+1
	The implementation of mkstemp() for MinGW uses mktemp() and open() without the flag O_EXCL, which is racy. It's not a security problem for now because all of its callers only create files within the repository (incl. worktrees). Replace it with a call to our more secure internal function, git_mkstemp_mode(), to prevent possible future issues. Signed-off-by: René Scharfe <l.s.r@web.de> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2022-07-15	commit-graph: pass repo_settings instead of repository	Taylor Blau	5	-10/+33
	The parse_commit_graph() function takes a 'struct repository ' pointer, but it only ever accesses config settings (either directly or through the .settings field of the repo struct). Move all relevant config settings into the repo_settings struct, and update parse_commit_graph() and its existing callers so that it takes 'struct repo_settings ' instead. Callers of parse_commit_graph() will now need to call prepare_repo_settings() themselves, or initialize a 'struct repo_settings' directly. Prior to ab14d0676c (commit-graph: pass a 'struct repository *' in more places, 2020-09-09), parsing a commit-graph was a pure function depending only on the contents of the commit-graph itself. Commit ab14d0676c introduced a dependency on a `struct repository` pointer, and later commits such as b66d84756f (commit-graph: respect 'commitGraph.readChangedPaths', 2020-09-09) added dependencies on config settings, which were accessed through the `settings` field of the repository pointer. This field was initialized via a call to `prepare_repo_settings()`. Additionally, this fixes an issue in fuzz-commit-graph: In 44c7e62 (2021-12-06, repo-settings:prepare_repo_settings only in git repos), prepare_repo_settings was changed to issue a BUG() if it is called by a process whose CWD is not a Git repository. The combination of commits mentioned above broke fuzz-commit-graph, which attempts to parse arbitrary fuzzing-engine-provided bytes as a commit graph file. Prior to this change, parse_commit_graph() called prepare_repo_settings(), but since we run the fuzz tests without a valid repository, we are hitting the BUG() from 44c7e62 for every test case. Signed-off-by: Taylor Blau <me@ttaylorr.com> Signed-off-by: Josh Steadmon <steadmon@google.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2022-07-15	setup.c: create `safe.bareRepository`	Glen Choo	3	-1/+129
	There is a known social engineering attack that takes advantage of the fact that a working tree can include an entire bare repository, including a config file. A user could run a Git command inside the bare repository thinking that the config file of the 'outer' repository would be used, but in reality, the bare repository's config file (which is attacker-controlled) is used, which may result in arbitrary code execution. See [1] for a fuller description and deeper discussion. A simple mitigation is to forbid bare repositories unless specified via `--git-dir` or `GIT_DIR`. In environments that don't use bare repositories, this would be minimally disruptive. Create a config variable, `safe.bareRepository`, that tells Git whether or not to die() when working with a bare repository. This config is an enum of: - "all": allow all bare repositories (this is the default) - "explicit": only allow bare repositories specified via --git-dir or GIT_DIR. If we want to protect users from such attacks by default, neither value will suffice - "all" provides no protection, but "explicit" is impractical for bare repository users. A more usable default would be to allow only non-embedded bare repositories ([2] contains one such proposal), but detecting if a repository is embedded is potentially non-trivial, so this work is not implemented in this series. [1]: https://lore.kernel.org/git/kl6lsfqpygsj.fsf@chooglen-macbookpro.roam.corp.google.com [2]: https://lore.kernel.org/git/5b969c5e-e802-c447-ad25-6acc0b784582@github.com Signed-off-by: Glen Choo <chooglen@google.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2022-07-15	safe.directory: use git_protected_config()	Glen Choo	3	-18/+14
	Use git_protected_config() to read `safe.directory` instead of read_very_early_config(), making it 'protected configuration only'. As a result, `safe.directory` now respects "-c", so update the tests and docs accordingly. It used to ignore "-c" due to how it was implemented, not because of security or correctness concerns [1]. [1] https://lore.kernel.org/git/xmqqlevabcsu.fsf@gitster.g/ Signed-off-by: Glen Choo <chooglen@google.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2022-07-15	config: learn `git_protected_config()`	Glen Choo	4	-11/+82
	`uploadpack.packObjectsHook` is the only 'protected configuration only' variable today, but we've noted that `safe.directory` and the upcoming `safe.bareRepository` should also be 'protected configuration only'. So, for consistency, we'd like to have a single implementation for protected configuration. The primary constraints are: 1. Reading from protected configuration should be fast. Nearly all "git" commands inside a bare repository will read both `safe.directory` and `safe.bareRepository`, so we cannot afford to be slow. 2. Protected configuration must be readable when the gitdir is not known. `safe.directory` and `safe.bareRepository` both affect repository discovery and the gitdir is not known at that point [1]. The chosen implementation in this commit is to read protected configuration and cache the values in a global configset. This is similar to the caching behavior we get with the_repository->config. Introduce git_protected_config(), which reads protected configuration and caches them in the global configset protected_config. Then, refactor `uploadpack.packObjectsHook` to use git_protected_config(). The protected configuration functions are named similarly to their non-protected counterparts, e.g. git_protected_config_check_init() vs git_config_check_init(). In light of constraint 1, this implementation can still be improved. git_protected_config() iterates through every variable in protected_config, which is wasteful, but it makes the conversion simple because it matches existing patterns. We will likely implement constant time lookup functions for protected configuration in a future series (such functions already exist for non-protected configuration, i.e. repo_config_get_*()). An alternative that avoids introducing another configset is to continue to read all config using git_config(), but only accept values that have the correct config scope [2]. This technically fulfills constraint 2, because git_config() simply ignores the local and worktree config when the gitdir is not known. However, this would read incomplete config into the_repository->config, which would need to be reset when the gitdir is known and git_config() needs to read the local and worktree config. Resetting the_repository->config might be reasonable while we only have these 'protected configuration only' variables, but it's not clear whether this extends well to future variables. [1] In this case, we do have a candidate gitdir though, so with a little refactoring, it might be possible to provide a gitdir. [2] This is how `uploadpack.packObjectsHook` was implemented prior to this commit. Signed-off-by: Glen Choo <chooglen@google.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2022-07-15	Documentation: define protected configuration	Glen Choo	2	-3/+16
	For security reasons, there are config variables that are only trusted when they are specified in certain configuration scopes, which are sometimes referred to on-list as 'protected configuration' [1]. A future commit will introduce another such variable, so let's define our terms so that we can have consistent documentation and implementation. In our documentation, define 'protected configuration' as the system, global and command config scopes. As a shorthand, I will refer to variables that are only respected in protected configuration as 'protected configuration only', but this term is not used in the documentation. This definition of protected configuration is based on whether or not Git can reasonably protect the user by ignoring the configuration scope: - System, global and command line config are considered protected because an attacker who has control over any of those can do plenty of harm without Git, so we gain very little by ignoring those scopes. - On the other hand, local (and similarly, worktree) config are not considered protected because it is relatively easy for an attacker to control local config, e.g.: - On some shared user environments, a non-admin attacker can create a repository high up the directory hierarchy (e.g. C:\.git on Windows), and a user may accidentally use it when their PS1 automatically invokes "git" commands. `safe.directory` prevents attacks of this form by making sure that the user intended to use the shared repository. It obviously shouldn't be read from the repository, because that would end up trusting the repository that Git was supposed to reject. - "git upload-pack" is expected to run in repositories that may not be controlled by the user. We cannot ignore all config in that repository (because "git upload-pack" would fail), but we can limit the risks by ignoring `uploadpack.packObjectsHook`. Only `uploadpack.packObjectsHook` is 'protected configuration only'. The following variables are intentionally excluded: - `safe.directory` should be 'protected configuration only', but it does not technically fit the definition because it is not respected in the "command" scope. A future commit will fix this. - `trace2.` happens to read the same scopes as `safe.directory` because they share an implementation. However, this is not for security reasons; it is because we want to start tracing so early that repository-level config and "-c" are not available [2]. This requirement is unique to `trace2.`, so it does not makes sense for protected configuration to be subject to the same constraints. [1] For example, https://lore.kernel.org/git/6af83767-576b-75c4-c778-0284344a8fe7@github.com/ [2] https://lore.kernel.org/git/a0c89d0d-669e-bf56-25d2-cbb09b012e70@jeffhostetler.com/ Signed-off-by: Glen Choo <chooglen@google.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2022-07-15	Documentation/git-config.txt: add SCOPES section	Glen Choo	1	-23/+59
	In a subsequent commit, we will introduce "protected configuration", which is easiest to describe in terms of configuration scopes (i.e. it's the union of the 'system', 'global', and 'command' scopes). This description is fine for ML discussions, but it's inadequate for end users because we don't provide a good description of "configuration scopes" in the public docs. 145d59f482 (config: add '--show-scope' to print the scope of a config value, 2020-02-10) introduced the word "scope" to our public docs, but that only enumerates the scopes and assumes the user can figure out what those values mean. Add a SCOPES section to Documentation/git-config.txt that describes the configuration scopes, their corresponding CLI options, and mentions that some configuration options are only respected in certain scopes. Then, use the word "scope" to simplify the FILES section and change some confusing wording. Signed-off-by: Glen Choo <chooglen@google.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2022-07-15	The third batch	Junio C Hamano	1	-2/+22
	Signed-off-by: Junio C Hamano <gitster@pobox.com>
2022-07-14	shortlog: use a stable sort	Johannes Schindelin	1	-1/+1
	When sorting the output of `git shortlog` by count, a list of authors in alphabetical order is then sorted by contribution count. Obviously, the idea is to maintain the alphabetical order for items with identical contribution count. At the moment, this job is performed by `qsort()`. As that function is not guaranteed to implement a stable sort algorithm, this can lead to inconsistent and/or surprising behavior: items with identical contribution count could lose their alphabetical sub-order. The `qsort()` in MS Visual C's runtime does _not_ implement a stable sort algorithm, and under certain circumstances this even causes a test failure in t4201.21 "shortlog can match multiple groups", where two authors both are listed with 2 contributions, and are listed in inverse alphabetical order. Let's instead use the stable sort provided by `git_stable_qsort()` to avoid this inconsistency. This is a companion to 2049b8dc65 (diffcore_rename(): use a stable sort, 2019-09-30). Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2022-07-14	mergetool(vimdiff): allow paths to contain spaces again	Johannes Schindelin	1	-4/+35
	In 0041797449d (vimdiff: new implementation with layout support, 2022-03-30), we introduced a completely new implementation of the `vimdiff` backend for `git mergetool`. In this implementation, we no longer call `vim` directly but we accumulate in the variable `FINAL_CMD` an arbitrary number of commands for `vim` to execute, which necessitates the use of `eval` to split the commands properly into multiple command-line arguments. That same `eval` command also needs to pass the paths to `vim`, and while it looks as if they are quoted correctly, that quoting only reaches the `eval` instruction and is lost after that, therefore paths that contain whitespace characters (or other characters that are interpreted by the POSIX shell) are handled incorrectly. This is a simple reproducer: git init -b main bam-merge-fail cd bam-merge-fail echo a>"a file.txt" git add "a file.txt" git commit -m "added 'a file.txt'" echo b>"a file.txt" git add "a file.txt" git commit -m "diverged b 'a file.txt'" git checkout -b c HEAD~ echo c>"a file.txt" git add "a file.txt" git commit -m "diverged c 'a file.txt'" git checkout main git merge c git mergetool --tool=vimdiff With Git v2.37.0/v2.37.1, this will open 7 buffers, not four, and not display the correct contents at all. To fix this, let's not expand the variables containing the path parameters before passing them to the `eval` command, but let that command expand the variables instead. This fixes https://github.com/git-for-windows/git/issues/3945 Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2022-07-14	tests: fix incorrect --write-junit-xml code	Johannes Schindelin	1	-5/+5
	In 78d5e4cfb4b (tests: refactor --write-junit-xml code, 2022-05-21), this developer refactored the `--write-junit-xml` code a bit, including the part where the current test case's title was used in a `set` invocation, but failed to account for the fact that some test cases' titles start with a long option, which the `set` misinterprets as being intended for parsing. Let's fix this by using the `set -- <...>` form. Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2022-07-13	The second batch	Junio C Hamano	1	-0/+35
	Signed-off-by: Junio C Hamano <gitster@pobox.com>
2022-07-12	t5330: remove run_with_limited_processses()	Han Xin	1	-24/+1
	run_with_limited_processses() is used to end the loop faster when an infinite loop happen. But "ulimit" is tied to the entire development station, and the test will fail due to too many other processes or using "--stress". Without run_with_limited_processses() the infinite loop can also be stopped due to global configrations or quotas, and the verification still works fine. So let's remove run_with_limited_processses(). Signed-off-by: Han Xin <hanxin.hx@bytedance.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2022-07-12	diff-files: move misplaced cleanup label	Jeff King	1	-1/+1
	Commit 0139c58ab9 (revisions API users: add "goto cleanup" for release_revisions(), 2022-04-13) converted an early return in cmd_diff_files() into a goto. But it put the cleanup label too early: if read_cache_preload() returns an error, we'll set result to "-1", but then jump to calling run_diff_files(), overwriting our result. We should jump past the call to run_diff_files(). Likewise, we should go past diff_result_code(), which is expecting to see a code from an actual diff, not a negative error code. In practice, I suspect this bug cannot actually be triggered, because read_cache_preload() does not seem to ever return an error. Its return value (eventually) comes from do_read_index(), which gives the number of cache entries found, and calls die() on error. Still, it makes sense to fix the inadvertent change from 0139c58ab9 first, and we can look into the overall error handling of read_cache() separately (which is present in many other callsites). Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2022-07-12	fsck: do not dereference NULL while checking resolve-undo data	Junio C Hamano	1	-0/+1
	When we found an invalid object recorded in the resolve-undo data, we would have ended up dereferencing NULL while fsck. Reporting the problem and going on to the next object is the right thing to do here. Noticed by SZEDER Gábor. Signed-off-by: Junio C Hamano <gitster@pobox.com>
2022-07-12	The first batch after Git 2.37	Junio C Hamano	1	-0/+31
	Signed-off-by: Junio C Hamano <gitster@pobox.com>
2022-07-11	ref-filter: disable save_commit_buffer while traversing	Jeff King	1	-0/+5
	Various ref-filter options like "--contains" or "--merged" may cause us to traverse large segments of the history graph. It's counter-productive to have save_commit_buffer turned on, as that will instruct the commit code to cache in-memory the object contents for each commit we traverse. This increases the amount of heap memory used while providing little or no benefit, since we're not actually planning to display those commits (which is the usual reason that tools like git-log want to keep them around). We can easily disable this feature while ref-filter is running. This lowers peak heap (as measured by massif) for running: git tag --contains 1da177e4c3 in linux.git from ~100MB to ~20MB. It also seems to improve runtime by 4-5% (600ms vs 630ms). A few points to note: - it should be safe to temporarily disable save_commit_buffer like this. The saved buffers are accessed through get_commit_buffer(), which treats the saved ones like a cache, and loads on-demand from the object database on a cache miss. So any code that was using this would not be wrong, it might just incur an extra object lookup for some objects. But... - I don't think any ref-filter related code is using the cache. While it's true that an option like "--format=%(contents:subject)" or "--sort=authordate" will need to look at the commit contents, ref-filter doesn't use get_commit_buffer() to do so! It always reads the objects directly via read_object_file(), though it does avoid re-reading objects if the format can be satisfied without them. Timing "git tag --format=%(*authordate)" shows that we're the same before and after, as expected. - Note that all of this assumes you don't have a commit-graph file. if you do, then the heap usage is even lower, and the runtime is 10x faster. So in that sense this is not urgent, as there's a much better solution. But since it's such an obvious and easy win for fallback cases (including commits which aren't yet in the graph file), there's no reason not to. Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2022-07-11	clone: move unborn head creation to update_head()	Jeff King	1	-12/+15
	Prior to 4f37d45706 (clone: respect remote unborn HEAD, 2021-02-05), creation of the local HEAD was always done in update_head(). That commit added code to handle an unborn head in an empty repository, and just did all symref creation and config setup there. This makes the code flow a little bit confusing, especially as new corner cases have been covered (like the previous commit to match our default branch name to a non-HEAD remote branch). Let's move the creation of the unborn symref into update_head(). This matches the other HEAD-creation cases, and now the logic is consistently separated: the main cmd_clone() function only examines the situation and sets variables based on what it finds, and update_head() actually performs the update. Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2022-07-11	remote-curl: send Accept-Language header to server	Li Linchao	6	-8/+51
	Git server end's ability to accept Accept-Language header was introduced in f18604bbf2 (http: add Accept-Language header if possible, 2015-01-28), but this is only used by very early phase of the transfer, which is HTTP GET request to discover references. For other phases, like POST request in the smart HTTP, the server does not know what language the client speaks. Teach git client to learn end-user's preferred language and throw accept-language header to the server side. Once the server gets this header, it has the ability to talk to end-user with language they understand. This would be very helpful for many non-English speakers. Helped-by: Junio C Hamano <gitster@pobox.com> Signed-off-by: Li Linchao <lilinchao@oschina.cn> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2022-07-11	gpg-interface: add function for converting trust level to string	Jaydeep Das	3	-23/+31
	Add new helper function `gpg_trust_level_to_str()` which will convert a given member of `enum signature_trust_level` to its corresponding string (in lowercase). For example, `TRUST_ULTIMATE` will yield the string "ultimate". This will abstract out some code in `pretty.c` relating to gpg signature trust levels. Mentored-by: Christian Couder <chriscool@tuxfamily.org> Mentored-by: Hariom Verma <hariom18599@gmail.com> Signed-off-by: Jaydeep Das <jaydeepjd.8914@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2022-07-10	multi-pack-index: simplify handling of unknown --options	SZEDER Gábor	1	-4/+4
	Although parse_options() can handle unknown --options just fine, none of 'git multi-pack-index's subcommands rely on it, but do it on their own: they invoke parse_options() with the PARSE_OPT_KEEP_UNKNOWN flag, then check whether there are any unparsed arguments left, and print usage and quit if necessary. Drop that PARSE_OPT_KEEP_UNKNOWN flag to let parse_options() handle unknown options instead, which has the additional benefit that it prints not only the usage but an "error: unknown option `foo'" message as well. Do leave the unparsed arguments check to catch any unexpected non-option arguments, though, e.g. 'git multi-pack-index write foo'. Signed-off-by: SZEDER Gábor <szeder.dev@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2022-07-10	cocci: avoid normalization rules for memcpy	René Scharfe	1	-42/+40
	Some of the rules for using COPY_ARRAY instead of memcpy with sizeof are intended to reduce the number of sizeof variants to deal with. They can have unintended side effects if only they match, but not the one for the COPY_ARRAY conversion at the end. Avoid these side effects by instead using a self-contained rule for each combination of array and pointer for source and destination which lists all sizeof variants inline. This lets "make contrib/coccinelle/array.cocci.patch" take 15% longer on my machine, but gives peace of mind that no incomplete transformation will be generated. Suggested-by: Junio C Hamano <gitster@pobox.com> Signed-off-by: René Scharfe <l.s.r@web.de> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2022-07-10	sha256: add support for Nettle	brian m. carlson	3	-1/+44
	For SHA-256, we currently have support for OpenSSL and libgcrypt because these two libraries contain optimized implementations that can take advantage of native processor instructions. However, OpenSSL is not suitable for linking against for Linux distros due to licensing incompatibilities with the GPLv2, and libgcrypt has been less favored by cryptographers due to some security-related implementation issues, which, while not affecting our use of hash algorithms, has affected its reputation. Let's add another option that's compatible with the GPLv2, which is Nettle. This is an option which is generally better than libgcrypt because on many distros GnuTLS (which uses Nettle) is used for HTTPS and therefore as a practical matter it will be available on most systems. As a result, prefer it over libgcrypt and our built-in implementation. Nettle also has recently gained support for Intel's SHA-NI instructions, which compare very favorably to other implementations, as well as assembly implementations for when SHA-NI is not available. A git gc on git.git sees a 12% performance improvement with Nettle over our block SHA-256 implementation due to general assembly improvements. With SHA-NI, the performance of raw SHA-256 on a 2 GiB file goes from 7.296 seconds with block SHA-256 to 1.523 seconds with Nettle. Signed-off-by: brian m. carlson <sandals@crustytoothpaste.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2022-07-10	builtin/mv.c: use the MOVE_ARRAY() macro instead of memmove()	Junio C Hamano	1	-9/+7
	The variables 'source', 'destination', and 'submodule_gitfile' are all of type "const char *", and an element of such an array is of "type const char ", but these memmove() calls were written as if these variables are of type "char *". Once these memmove() calls are fixed to use the correct type to compute the number of bytes to be moved, e.g. - memmove(source + i, source + i + 1, n sizeof(char )); + memmove(source + i, source + i + 1, n sizeof(const char *)); existing contrib/coccinelle/array.cocci rules can recognize them as candidates for turning into MOVE_ARRAY(). While at it, use CALLOC_ARRAY() instead of xcalloc() to allocate the modes[] array that is involved in the change. Signed-off-by: Junio C Hamano <gitster@pobox.com>
2022-07-08	vimdiff: make layout engine more robust against user vim settings	Fernando Ramos	1	-18/+18
	'vim' has two configuration options ('splitbelow' and 'splitright') that change the way the 'split' command behaves. When they are set, the commands that the layout engine generates no longer work as expected. In order to fix this we can append special keyword 'leftabove' to each 'split' and 'vertical split' subcommand found inside the command string generated by the layout engine. This works because whatever comes after 'leftabove' will temporally ignore settings 'splitbelow' and 'splitright'. Reported-by: Matthew Klein <mklein994@gmail.com> Signed-off-by: Fernando Ramos <greenfoo@u92.eu> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2022-07-08	clone: use remote branch if it matches default HEAD	Jeff King	3	-6/+48
	Usually clone tries to use the same local HEAD as the remote (unless the user has given --branch explicitly). Even if the remote HEAD is detached or unborn, we can detect those situations with modern versions of Git. If the remote is too old to support the "unborn" extension (or it has been disabled via config), then we can't know the name of the remote's unborn HEAD, and we fall back whatever the local default branch name is configured to be. But that leads to one weird corner case. It's rare because it needs a number of factors: - the remote has an unborn HEAD - the remote is too old to support "unborn", or has disabled it - the remote has another branch "foo" - the local default branch name is "foo" In that case you end up with a local clone on an unborn "foo" branch, disconnected completely from the remote's "foo". This is rare in practice, but the result is quite confusing. When choosing "foo", we can double check whether the remote has such a name, and if so, start our local "foo" at the same spot, rather than making it unborn. Note that this causes a test failure in t5605, which is cloning from a bundle that doesn't contain HEAD (so it behaves like a remote that doesn't support "unborn"), but has a single "main" branch. That test expects that we end up in the weird "unborn main" case, where we don't actually check out the remote branch of the same name. Even though we have to update the test, this seems like an argument in favor of this patch: checking out main is what I'd expect from such a bundle. So this patch updates the test for the new behavior and adds an adjacent one that checks what the original was going for: if there's no HEAD and the bundle _doesn't_ have a branch that matches our local default name, then we end up with nothing checked out. Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2022-07-08	clone: propagate empty remote HEAD even with other branches	Jeff King	2	-22/+55
	Unless "--branch" was given, clone generally tries to match the local HEAD to the remote one. For most repositories, this is easy: the remote tells us which branch HEAD was pointing to, and we call our local checkout() function on that branch. When cloning an empty repository, it's a little more tricky: we have special code that checks the transport's "unborn" extension, or falls back to our local idea of what the default branch should be. In either case, we point the new HEAD to that, and set up the branch.* config. But that leaves one case unhandled: when the remote repository _isn't_ empty, but its HEAD is unborn. The checkout() function is smart enough to realize we didn't fetch the remote HEAD and it bails with a warning. But we'll have ignored any information the remote gave us via the unborn extension. This leads to nonsense outcomes: - If the remote has its HEAD pointing to an unborn "foo" and contains another branch "bar", cloning will get branch "bar" but leave the local HEAD pointing at "master" (or whatever our local default is), which is useless. The project does not use "master" as a branch. - Worse, if the other branch "bar" is instead called "master" (but again, the remote HEAD is not pointing to it), then we end up with a local unborn branch "master", which is not connected to the remote "master" (it shares no history, and there's no branch.* config). Instead, we should try to use the remote's HEAD, even if its unborn, to be consistent with the other cases. The reason this case was missed is that cmd_clone() handles empty and non-empty repositories on two different sides of a conditional: if (we have any refs) { fetch refs; check for --branch; otherwise, try to point our head at remote head; otherwise, our head is NULL; } else { check for --branch; otherwise, try to use "unborn" extension; otherwise, fall back to our default name name; } So the smallest change would be to repeat the "unborn" logic at the end of the first block. But we can note some other overlaps and inconsistencies: - both sides have to handle --branch (though note that it's always an error for the empty repo case, since an empty repo by definition does not have a matching branch) - the fall back to the default name is much more explicit in the empty-repo case. The non-empty case eventually ends up bailing from checkout() with a warning, which produces a similar result, but fails to set up the branch config we do in the empty case. So let's pull the HEAD setup out of this conditional entirely. This de-duplicates some of the code and the result is easy to follow, because helper functions like find_ref_by_name() do the right thing even in the empty-repo case (i.e., by returning NULL). There are two subtleties: - for a remote with a detached HEAD, it will advertise an oid for HEAD (which we store in our "remote_head" variable), but we won't find a matching refname (so our "remote_head_points_at" is NULL). In this case we make a local detached HEAD to match. Right now this happens implicitly by reaching update_head() with a non-NULL remote_head (since we skip all of the unborn-fallback). We'll now need to account for it explicitly before doing the fallback. - for an empty repo, we issue a warning to the user that they've cloned an empty repo. The text of that warning doesn't make sense for a non-empty repo with an unborn HEAD, so we'll have to differentiate the two cases there. We could just use different text, but instead let's allow the code to continue down to checkout(), which will issue an appropriate warning, like: remote HEAD refers to nonexistent ref, unable to checkout Continuing down to checkout() will make it easier to do more fixes on top (see below). Note that this patch fixes the case where the other side reports an unborn head to us using the protocol extension. It _doesn't_ fix the case where the other side doesn't tell us, we locally guess "master", and the other side happens to have a "master" which its HEAD doesn't point. But it doesn't make anything worse there, and it should actually make it easier to fix that problem on top. Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2022-07-08	clone: drop extra newline from warning message	Jeff King	1	-1/+1
	We don't need to put a "\n" in calls to warning(), since it adds one itself (and the user sees an extra blank line). Drop it, and while we're here, drop the full-stop from the message, which goes against our guidelines. This bug dates all the way back to 8434c2f1af (Build in clone, 2008-04-27), but presumably nobody noticed because it's hard to trigger: you have to clone a repository whose HEAD is unborn, but which is not otherwise empty. Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2022-07-06	cocci: generalize "unused" rule to cover more than "strbuf"	Ævar Arnfjörð Bjarmason	4	-6/+57
	Generalize the newly added "unused.cocci" rule to find more than just "struct strbuf", let's have it find the same unused patterns for "struct string_list", as well as other code that uses similar-looking _{release,clear,free}() and {release,clear,free}_() functions. We're intentionally loose in accepting e.g. a "strbuf_init(&sb)" followed by a "string_list_clear(&sb, 0)". It's assumed that the compiler will catch any such invalid code, i.e. that our constructors/destructors don't take a "void *". See [1] for example of code that would be covered by the "get_worktrees()" part of this rule. We'd still need work that the series is based on (we were passing "worktrees" to a function), but could now do the change in [1] automatically. 1. https://lore.kernel.org/git/Yq6eJFUPPTv%2Fzc0o@coredump.intra.peff.net/ Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2022-07-06	cocci: add and apply a rule to find "unused" strbufs	Ævar Arnfjörð Bjarmason	7	-10/+119
	Add a coccinelle rule to remove "struct strbuf" initialization followed by calling "strbuf_release()" function, without any uses of the strbuf in the same function. See the tests in contrib/coccinelle/tests/unused.{c,res} for what it's intended to find and replace. The inclusion of "contrib/scalar/scalar.c" is because "spatch" was manually run on it (we don't usually run spatch on contrib). Per the "buggy code" comment we also match a strbuf_init() before the xmalloc(), but we're not seeking to be so strict as to make checks that the compiler will catch for us redundant. Saying we'll match either "init" or "xmalloc" lines makes the rule simpler. Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>