summaryrefslogtreecommitdiffstats
path: root/dir.h (follow)
Commit message (Collapse)AuthorAgeFilesLines
* win32: override `fspathcmp()` with a directory separator-aware versionJohannes Schindelin2024-07-141-2/+2
| | | | | | | | | | | | | | | | | | | | On Windows, the backslash is the directory separator, even if the forward slash can be used, too, at least since Windows NT. This means that the paths `a/b` and `a\b` are equivalent, and `fspathcmp()` needs to be made aware of that fact. Note that we have to override both `fspathcmp()` and `fspathncmp()`, and the former cannot be a mere pre-processor constant that transforms calls to `fspathcmp(a, b)` into `fspathncmp(a, b, (size_t)-1)` because the function `report_collided_checkout()` in `unpack-trees.c` wants to assign `list.cmp = fspathcmp`. Also note that `fspatheq()` does _not_ need to be overridden because it calls `fspathcmp()` internally. Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de> Signed-off-by: Junio C Hamano <gitster@pobox.com>
* hash-ll: merge with "hash.h"Patrick Steinhardt2024-06-141-1/+1
| | | | | | | | | | | | | | | The "hash-ll.h" header was introduced via d1cbe1e6d8 (hash-ll.h: split out of hash.h to remove dependency on repository.h, 2023-04-22) to make explicit the split between hash-related functions that rely on the global `the_repository`, and those that don't. This split is no longer necessary now that we we have removed the reliance on `the_repository`. Merge "hash-ll.h" back into "hash.h". This causes some code units to not include "repository.h" anymore, which requires us to add some forward declarations. Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>
* dir.c: always copy input to add_pattern()Jeff King2024-06-051-4/+2
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The add_pattern() function has a subtle and undocumented gotcha: the pattern string you pass in must remain valid as long as the pattern_list is in use (and nor do we take ownership of it). This is easy to get wrong, causing either subtle bugs (because you free or reuse the string buffer) or leaks (because you copy the string, but don't track ownership separately). All of this "pattern" code was originally the "exclude" mechanism. So this _usually_ works OK because you add entries in one of two ways: 1. From the command-line (e.g., "--exclude"), in which case we're pointing to an argv entry which remains valid for the lifetime of the program. 2. From a file (e.g., ".gitignore"), in which case we read the whole file into a buffer, attach it to the pattern_list's "filebuf" entry, then parse the buffer in-place (adding NULs). The strings point into the filebuf, which is cleaned up when the whole pattern_list goes away. But other code, like sparse-checkout, reads individual lines from stdin and passes them one by one to add_pattern(), leaking each. We could fix this by refactoring it to take in the whole buffer at once, like (2) above, and stuff it in "filebuf". But given how subtle the interface is, let's just fix it to always copy the string. That seems at first like we'd be wasting extra memory, but we can mitigate that: a. The path_pattern struct already uses a FLEXPTR, since we sometimes make a copy (when we see "foo/", we strip off the trailing slash, requiring a modifiable copy of the string). Since we'll now always embed the string inside the struct, we can switch to the regular FLEX_ARRAY pattern, saving us 8 bytes of pointer. So patterns with a trailing slash and ones under 8 bytes actually get smaller. b. Now that we don't need the original string to hang around, we can get rid of the "filebuf" mechanism entirely, and just free the file contents after parsing. Since files are the sources we'd expect to have the largest pattern sets, we should mostly break even on stuffing the same data into the individual structs. This patch just adjusts the add_pattern() interface; it doesn't fix any leaky callers yet. Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>
* Sync with 2.44.1Johannes Schindelin2024-04-291-0/+7
|\ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | * maint-2.44: (41 commits) Git 2.44.1 Git 2.43.4 Git 2.42.2 Git 2.41.1 Git 2.40.2 Git 2.39.4 fsck: warn about symlink pointing inside a gitdir core.hooksPath: add some protection while cloning init.templateDir: consider this config setting protected clone: prevent hooks from running during a clone Add a helper function to compare file contents init: refactor the template directory discovery into its own function find_hook(): refactor the `STRIP_EXTENSION` logic clone: when symbolic links collide with directories, keep the latter entry: report more colliding paths t5510: verify that D/F confusion cannot lead to an RCE submodule: require the submodule path to contain directories only clone_submodule: avoid using `access()` on directories submodules: submodule paths must not contain symlinks clone: prevent clashing git dirs when cloning submodule in parallel ...
| * Sync with 2.42.2Johannes Schindelin2024-04-191-0/+7
| |\ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | * maint-2.42: (39 commits) Git 2.42.2 Git 2.41.1 Git 2.40.2 Git 2.39.4 fsck: warn about symlink pointing inside a gitdir core.hooksPath: add some protection while cloning init.templateDir: consider this config setting protected clone: prevent hooks from running during a clone Add a helper function to compare file contents init: refactor the template directory discovery into its own function find_hook(): refactor the `STRIP_EXTENSION` logic clone: when symbolic links collide with directories, keep the latter entry: report more colliding paths t5510: verify that D/F confusion cannot lead to an RCE submodule: require the submodule path to contain directories only clone_submodule: avoid using `access()` on directories submodules: submodule paths must not contain symlinks clone: prevent clashing git dirs when cloning submodule in parallel t7423: add tests for symlinked submodule directories has_dir_name(): do not get confused by characters < '/' ...
| | * Sync with 2.41.1Johannes Schindelin2024-04-191-0/+7
| | |\ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | * maint-2.41: (38 commits) Git 2.41.1 Git 2.40.2 Git 2.39.4 fsck: warn about symlink pointing inside a gitdir core.hooksPath: add some protection while cloning init.templateDir: consider this config setting protected clone: prevent hooks from running during a clone Add a helper function to compare file contents init: refactor the template directory discovery into its own function find_hook(): refactor the `STRIP_EXTENSION` logic clone: when symbolic links collide with directories, keep the latter entry: report more colliding paths t5510: verify that D/F confusion cannot lead to an RCE submodule: require the submodule path to contain directories only clone_submodule: avoid using `access()` on directories submodules: submodule paths must not contain symlinks clone: prevent clashing git dirs when cloning submodule in parallel t7423: add tests for symlinked submodule directories has_dir_name(): do not get confused by characters < '/' docs: document security issues around untrusted .git dirs ...
| | | * Sync with 2.40.2Johannes Schindelin2024-04-191-0/+7
| | | |\ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | * maint-2.40: (39 commits) Git 2.40.2 Git 2.39.4 fsck: warn about symlink pointing inside a gitdir core.hooksPath: add some protection while cloning init.templateDir: consider this config setting protected clone: prevent hooks from running during a clone Add a helper function to compare file contents init: refactor the template directory discovery into its own function find_hook(): refactor the `STRIP_EXTENSION` logic clone: when symbolic links collide with directories, keep the latter entry: report more colliding paths t5510: verify that D/F confusion cannot lead to an RCE submodule: require the submodule path to contain directories only clone_submodule: avoid using `access()` on directories submodules: submodule paths must not contain symlinks clone: prevent clashing git dirs when cloning submodule in parallel t7423: add tests for symlinked submodule directories has_dir_name(): do not get confused by characters < '/' docs: document security issues around untrusted .git dirs upload-pack: disable lazy-fetching by default ...
| | | | * Sync with 2.39.4Johannes Schindelin2024-04-191-0/+7
| | | | |\ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | * maint-2.39: (38 commits) Git 2.39.4 fsck: warn about symlink pointing inside a gitdir core.hooksPath: add some protection while cloning init.templateDir: consider this config setting protected clone: prevent hooks from running during a clone Add a helper function to compare file contents init: refactor the template directory discovery into its own function find_hook(): refactor the `STRIP_EXTENSION` logic clone: when symbolic links collide with directories, keep the latter entry: report more colliding paths t5510: verify that D/F confusion cannot lead to an RCE submodule: require the submodule path to contain directories only clone_submodule: avoid using `access()` on directories submodules: submodule paths must not contain symlinks clone: prevent clashing git dirs when cloning submodule in parallel t7423: add tests for symlinked submodule directories has_dir_name(): do not get confused by characters < '/' docs: document security issues around untrusted .git dirs upload-pack: disable lazy-fetching by default fetch/clone: detect dubious ownership of local repositories ...
| | | | | * entry: report more colliding pathsJohannes Schindelin2024-04-171-0/+7
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | In b878579ae7 (clone: report duplicate entries on case-insensitive filesystems, 2018-08-17) code was added to warn about index entries that resolve to the same file system entity (usually the cause is a case-insensitive filesystem). In Git for Windows, where inodes are not trusted (because of a performance trade-off, inodes are equal to 0 by default), that check does not compare inode numbers but the verbatim path. This logic works well when index entries' paths differ only in case. However, for file/directory conflicts only the file's path was reported, leaving the user puzzled with what that path collides. Let's try ot catch colliding paths even if one path is the prefix of the other. We do this also in setups where the file system is case-sensitive because the inode check would not be able to catch those collisions. While not a complete solution (for example, on macOS, Unicode normalization could also lead to file/directory conflicts but be missed by this logic), it is at least another defensive layer on top of what the previous commits added. Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
* | | | | | dir: create untracked_cache_invalidate_trimmed_path()Jeff Hostetler2024-02-271-0/+7
|/ / / / / | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Create a wrapper function for untracked_cache_invalidate_path() that silently trims a trailing slash, if present, before calling the wrapped function. The untracked cache expects to be called with a pathname that does not contain a trailing slash. This can make it inconvenient for callers that have a directory path. Lets hide this complexity. This will be used by a later commit in the FSMonitor code which may receive directory pathnames from an FSEvent. Signed-off-by: Jeff Hostetler <jeffhostetler@github.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
* | | | | dir.[ch]: add 'follow_symlink' arg to 'get_dtype'Victoria Dye2023-10-101-1/+6
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Add a 'follow_symlink' boolean option to 'get_type()'. If 'follow_symlink' is enabled, DT_LNK (in addition to DT_UNKNOWN) d_types triggers the stat-based d_type resolution, using 'stat' instead of 'lstat' to get the type of the followed symlink. Note that symlinks are not followed recursively, so a symlink pointing to another symlink will still resolve to DT_LNK. Update callers in 'diagnose.c' to specify 'follow_symlink = 0' to preserve current behavior. Signed-off-by: Victoria Dye <vdye@github.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
* | | | | dir.[ch]: expose 'get_dtype'Victoria Dye2023-10-101-0/+11
|/ / / / | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Move 'get_dtype()' from 'diagnose.c' to 'dir.c' and add its declaration to 'dir.h' so that it is accessible to callers in other files. The function and its documentation are moved verbatim except for a small addition to the description clarifying what the 'path' arg represents. Signed-off-by: Victoria Dye <vdye@github.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
* | | | Merge branch 'cw/strbuf-cleanup'Junio C Hamano2023-07-061-0/+4
|\ \ \ \ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Move functions that are not about pure string manipulation out of strbuf.[ch] * cw/strbuf-cleanup: strbuf: remove global variable path: move related function to path object-name: move related functions to object-name credential-store: move related functions to credential-store file abspath: move related functions to abspath strbuf: clarify dependency strbuf: clarify API boundary
| * | | | object-name: move related functions to object-nameCalvin Wan2023-06-121-0/+2
| |/ / / | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Move object-name-related functions from strbuf.[ch] to object-name.[ch] so that strbuf is focused on string manipulation routines with minimal dependencies. dir.h relied on the forward declration of the repository struct in strbuf.h. Since that is removed in this patch, add the forward declaration to dir.h. Signed-off-by: Calvin Wan <calvinwan@google.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
* | | | Merge branch 'en/header-split-cache-h-part-3'Junio C Hamano2023-06-301-0/+1
|\ \ \ \ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Header files cleanup. * en/header-split-cache-h-part-3: (28 commits) fsmonitor-ll.h: split this header out of fsmonitor.h hash-ll, hashmap: move oidhash() to hash-ll object-store-ll.h: split this header out of object-store.h khash: name the structs that khash declares merge-ll: rename from ll-merge git-compat-util.h: remove unneccessary include of wildmatch.h builtin.h: remove unneccessary includes list-objects-filter-options.h: remove unneccessary include diff.h: remove unnecessary include of oidset.h repository: remove unnecessary include of path.h log-tree: replace include of revision.h with simple forward declaration cache.h: remove this no-longer-used header read-cache*.h: move declarations for read-cache.c functions from cache.h repository.h: move declaration of the_index from cache.h merge.h: move declarations for merge.c from cache.h diff.h: move declaration for global in diff.c from cache.h preload-index.h: move declarations for preload-index.c from elsewhere sparse-index.h: move declarations for sparse-index.c from cache.h name-hash.h: move declarations for name-hash.c from cache.h run-command.h: move declarations for run-command.c from cache.h ...
| * | | | hash-ll, hashmap: move oidhash() to hash-llElijah Newren2023-06-211-0/+1
| |/ / / | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | oidhash() was used by both hashmap and khash, which makes sense. However, the location of this function in hashmap.[ch] meant that khash.h had to depend upon hashmap.h, making people unfamiliar with khash think that it was built upon hashmap. (Or at least, I personally was confused for a while about this in the past.) Move this function to hash-ll, so that khash.h can stop depending upon hashmap.h. This has another benefit as well: it allows us to remove hashmap.h's dependency on hash-ll.h. While some callers of hashmap.h were making use of oidhash, most were not, so this change provides another way to reduce the number of includes. Diff best viewed with `--color-moved`. Signed-off-by: Elijah Newren <newren@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
* / / / statinfo.h: move DTYPE defines from dir.hAlejandro R. Sedeño2023-06-121-14/+0
|/ / / | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | 592fc5b3 (dir.h: move DTYPE defines from cache.h, 2023-04-22) moved DTYPE macros from cache.h to dir.h, but they are still used by cache.h to implement ce_to_dtype(); cache.h cannot include dir.h because that would cause name-hash.c to have two different and conflicting definitions of `struct dir_entry`. (That should be separately fixed.) Both dir.h and cache.h include statinfo.h, and this seems a reasonable place for these definitions. This change fixes a broken build issue on old SunOS. Signed-off-by: Alejandro R. Sedeño <asedeno@mit.edu> Signed-off-by: Alejandro R Sedeño <asedeno@google.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
* | | dir.h: move DTYPE defines from cache.hElijah Newren2023-04-241-0/+15
| | | | | | | | | | | | | | | Signed-off-by: Elijah Newren <newren@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
* | | Merge branch 'en/header-cleanup'Junio C Hamano2023-03-171-14/+2
|\ \ \ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Code clean-up to clarify the rule that "git-compat-util.h" must be the first to be included. * en/header-cleanup: diff.h: remove unnecessary include of object.h Remove unnecessary includes of builtin.h treewide: replace cache.h with more direct headers, where possible replace-object.h: move read_replace_refs declaration from cache.h to here object-store.h: move struct object_info from cache.h dir.h: refactor to no longer need to include cache.h object.h: stop depending on cache.h; make cache.h depend on object.h ident.h: move ident-related declarations out of cache.h pretty.h: move has_non_ascii() declaration from commit.h cache.h: remove dependence on hex.h; make other files include it explicitly hex.h: move some hex-related declarations from cache.h hash.h: move some oid-related declarations from cache.h alloc.h: move ALLOC_GROW() functions from cache.h treewide: remove unnecessary cache.h includes in source files treewide: remove unnecessary cache.h includes treewide: remove unnecessary git-compat-util.h includes in headers treewide: ensure one of the appropriate headers is sourced first
| * | | dir.h: refactor to no longer need to include cache.hElijah Newren2023-02-241-14/+2
| |/ / | | | | | | | | | | | | | | | | | | | | | | | | Moving a few functions around allows us to make dir.h no longer need to include cache.h. This commit is best viewed with: git log -1 -p --color-moved Signed-off-by: Elijah Newren <newren@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
* | | dir: mark output only fields of dir_struct as suchElijah Newren2023-02-271-8/+8
| | | | | | | | | | | | | | | | | | | | | While at it, also group these fields together for convenience. Signed-off-by: Elijah Newren <newren@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
* | | dir: add a usage note to exclude_per_dirElijah Newren2023-02-271-2/+6
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | As evidenced by the fix a couple commits ago, places in the code using exclude_per_dir are likely buggy and should be adapted to call setup_standard_excludes() instead. Unfortunately, the usage of exclude_per_dir has been hardcoded into the arguments ls-files accepts, so we cannot actually remove it. Add a note that it is deprecated and no other callers should use it directly. Signed-off-by: Elijah Newren <newren@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
* | | dir: separate public from internal portion of dir_structElijah Newren2023-02-271-40/+46
|/ / | | | | | | | | | | | | | | | | | | In order to make it clearer to callers what portions of dir_struct are public API, and avoid errors from them setting fields that are meant as internal API, split the fields used for internal implementation reasons into a separate embedded struct. Signed-off-by: Elijah Newren <newren@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
* / dir.c: free "ident" and "exclude_per_dir" in "struct untracked_cache"Ævar Arnfjörð Bjarmason2022-11-211-0/+1
|/ | | | | | | | | | | | | | | | | | | | | | When the "ident" member of the structure was added in 1e8fef609e7 (untracked cache: guard and disable on system changes, 2015-03-08) this function wasn't updated to free it. Let's do so. Let's also free the "exclude_per_dir" memory we've been leaking since[1], while making sure not to free() the constant ".gitignore" string we add by default[2]. As we now have three struct members we're freeing let's change free_untracked_cache() to return early if "uc" isn't defined. We won't hand it to free() now, but that was just for convenience, once we're dealing with >=2 struct members this pattern is more convenient. 1. f9e6c649589 (untracked cache: load from UNTR index extension, 2015-03-08) 2. 039bc64e886 (core.excludesfile clean-up, 2007-11-14) Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com> Signed-off-by: Taylor Blau <me@ttaylorr.com>
* match_pathname(): drop unused "flags" parameterJeff King2022-08-191-1/+1
| | | | | | | | | | | | | | | | | This field has not been used since the function was introduced in b559263216 (exclude: split pathname matching code into a separate function, 2012-10-15), though there was a brief period where it was erroneously used and then reverted in ed4958477b (dir: fix pattern matching on dirs, 2021-09-24) and 5ceb663e92 (dir: fix directory-matching bug, 2021-11-02). It's possible we'd eventually add a flag that makes it useful here, but there are only a handful of callers. It would be easy to add back if necessary, and in the meantime this makes the function interface less misleading. Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>
* dir API: add a generalized path_match_flags() functionÆvar Arnfjörð Bjarmason2022-05-171-0/+63
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Add a path_match_flags() function and have the two sets of starts_with_dot_{,dot_}slash() functions added in 63e95beb085 (submodule: port resolve_relative_url from shell to C, 2016-04-15) and a2b26ffb1a8 (fsck: convert gitmodules url to URL passed to curl, 2020-04-18) be thin wrappers for it. As the latter of those notes the fsck version was copied from the initial builtin/submodule--helper.c version. Since the code added in a2b26ffb1a8 was doing really doing the same as win32_is_dir_sep() added in 1cadad6f658 (git clone <url> C:\cygwin\home\USER\repo' is working (again), 2018-12-15) let's move the latter to git-compat-util.h is a is_xplatform_dir_sep(). We can then call either it or the platform-specific is_dir_sep() from this new function. Let's likewise change code in various other places that was hardcoding checks for "'/' || '\\'" with the new is_xplatform_dir_sep(). As can be seen in those callers some of them still concern themselves with ':' (Mac OS classic?), but let's leave the question of whether that should be consolidated for some other time. As we expect to make wider use of the "native" case in the future, define and use two starts_with_dot_{,dot_}slash_native() convenience wrappers. This makes the diff in builtin/submodule--helper.c much smaller. Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com> Signed-off-by: Derrick Stolee <derrickstolee@github.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
* dir: new flag to remove_dir_recurse() to spare the original_cwdElijah Newren2021-12-091-0/+3
| | | | | | | | | | | | | | | | | | remove_dir_recurse(), and its non-static wrapper called remove_dir_recursively(), both take flags for modifying its behavior. As with the previous commits, we would generally like to protect the original_cwd, but we want to forced user commands (e.g. 'git rm -rf ...') or other special cases to remove it. Add a flag for this purpose. After reading through every caller of remove_dir_recursively() in the current codebase, there was only one that should be adjusted and that one only in a very unusual circumstance. Add a pair of new testcases to highlight that very specific case involving submodules && --git-dir && --work-tree. Acked-by: Derrick Stolee <stolee@gmail.com> Acked-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com> Signed-off-by: Elijah Newren <newren@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
* dir: avoid incidentally removing the original_cwd in remove_path()Elijah Newren2021-12-091-1/+5
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Modern git often tries to avoid leaving empty directories around when removing files. Originally, it did not bother. This behavior started with commit 80e21a9ed809 (merge-recursive::removeFile: remove empty directories, 2005-11-19), stating the reason simply as: When the last file in a directory is removed as the result of a merge, try to rmdir the now-empty directory. This was reimplemented in C and renamed to remove_path() in commit e1b3a2cad7 ("Build-in merge-recursive", 2008-02-07), but was still internal to merge-recursive. This trend towards removing leading empty directories continued with commit d9b814cc97f1 (Add builtin "git rm" command, 2006-05-19), which stated the reasoning as: The other question is what to do with leading directories. The old "git rm" script didn't do anything, which is somewhat inconsistent. This one will actually clean up directories that have become empty as a result of removing the last file, but maybe we want to have a flag to decide the behaviour? remove_path() in dir.c was added in 4a92d1bfb784 (Add remove_path: a function to remove as much as possible of a path, 2008-09-27), because it was noted that we had two separate implementations of the same idea AND both were buggy. It described the purpose of the function as a function to remove as much as possible of a path Why remove as much as possible? Well, at the time we probably would have said something like: * removing leading directories makes things feel tidy * removing leading directories doesn't hurt anything so long as they had no files in them. But I don't believe those reasons hold when the empty directory happens to be the current working directory we inherited from our parent process. Leaving the parent process in a deleted directory can cause user confusion when subsequent processes fail: any git command, for example, will immediately fail with fatal: Unable to read current working directory: No such file or directory Other commands may similarly get confused. Modify remove_path() so that the empty leading directories it also deletes does not include the current working directory we inherited from our parent process. I have looked through every caller of remove_path() in the current codebase to make sure that all should take this change. Acked-by: Derrick Stolee <stolee@gmail.com> Acked-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com> Signed-off-by: Elijah Newren <newren@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
* Merge branch 'ds/sparse-index-ignored-files'Junio C Hamano2021-09-211-0/+8
|\ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | In cone mode, the sparse-index code path learned to remove ignored files (like build artifacts) outside the sparse cone, allowing the entire directory outside the sparse cone to be removed, which is especially useful when the sparse patterns change. * ds/sparse-index-ignored-files: sparse-checkout: clear tracked sparse dirs sparse-index: add SPARSE_INDEX_MEMORY_ONLY flag attr: be careful about sparse directories sparse-checkout: create helper methods sparse-index: use WRITE_TREE_MISSING_OK sparse-index: silently return when cache tree fails unpack-trees: fix nested sparse-dir search sparse-index: silently return when not using cone-mode patterns t7519: rewrite sparse index test
| * sparse-checkout: create helper methodsDerrick Stolee2021-09-081-0/+8
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | As we integrate the sparse index into more builtins, we occasionally need to check the sparse-checkout patterns to see if a path is within the sparse-checkout cone. Create some helper methods that help initialize the patterns and check for pattern matching to make this easier. The existing callers of commands like get_sparse_checkout_patterns() use a custom 'struct pattern_list' that is not necessarily the one in the 'struct index_state', so there are not many previous uses that could adopt these helpers. There are just two in builtin/add.c and sparse-index.c that can use path_in_sparse_checkout(). We add a path_in_cone_mode_sparse_checkout() as well that will only return false if the path is outside of the sparse-checkout definition _and_ the sparse-checkout patterns are in cone mode. Signed-off-by: Derrick Stolee <dstolee@microsoft.com> Reviewed-by: Elijah Newren <newren@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
* | dir: libify and export helper functions from clone.cAtharva Raykar2021-08-101-0/+11
|/ | | | | | | | | | These functions can be useful to other parts of Git. Let's move them to dir.c, while renaming them to be make their functionality more explicit. Signed-off-by: Atharva Raykar <raykar.ath@gmail.com> Mentored-by: Christian Couder <christian.couder@gmail.com> Mentored-by: Shourya Shukla <periperidip@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
* Merge branch 'ew/many-alternate-optim'Junio C Hamano2021-07-281-0/+2
|\ | | | | | | | | | | | | | | | | | | | | Optimization for repositories with many alternate object store. * ew/many-alternate-optim: oidtree: a crit-bit tree for odb_loose_cache oidcpy_with_padding: constify `src' arg make object_directory.loose_objects_subdir_seen a bitmap avoid strlen via strbuf_addstr in link_alt_odb_entry speed up alt_odb_usable() with many alternates
| * speed up alt_odb_usable() with many alternatesEric Wong2021-07-081-0/+2
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | With many alternates, the duplicate check in alt_odb_usable() wastes many cycles doing repeated fspathcmp() on every existing alternate. Use a khash to speed up lookups by odb->path. Since the kh_put_* API uses the supplied key without duplicating it, we also take advantage of it to replace both xstrdup() and strbuf_release() in link_alt_odb_entry() with strbuf_detach() to avoid the allocation and copy. In a test repository with 50K alternates and each of those 50K alternates having one alternate each (for a total of 100K total alternates); this speeds up lookup of a non-existent blob from over 16 minutes to roughly 2.7 seconds on my busy workstation. Note: all underlying git object directories were small and unpacked with only loose objects and no packs. Having to load packs increases times significantly. Signed-off-by: Eric Wong <e@80x24.org> Signed-off-by: Junio C Hamano <gitster@pobox.com>
* | dir.[ch]: replace dir_init() with DIR_INITÆvar Arnfjörð Bjarmason2021-07-011-2/+2
|/ | | | | | | | | | | | | | | | | Remove the dir_init() function and replace it with a DIR_INIT macro. In many cases in the codebase we need to initialize things with a function for good reasons, e.g. needing to call another function on initialization. The "dir_init()" function was not one such case, and could trivially be replaced with a more idiomatic macro initialization pattern. The only place where we made use of its use of memset() was in dir_clear() itself, which resets the contents of an an existing struct pointer. Let's use the new "memcpy() a 'blank' struct on the stack" idiom to do that reset. Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
* Merge branch 'en/dir-traversal'Junio C Hamano2021-05-201-0/+6
|\ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | "git clean" and "git ls-files -i" had confusion around working on or showing ignored paths inside an ignored directory, which has been corrected. * en/dir-traversal: dir: introduce readdir_skip_dot_and_dotdot() helper dir: update stale description of treat_directory() dir: traverse into untracked directories if they may have ignored subfiles dir: avoid unnecessary traversal into ignored directory t3001, t7300: add testcase showcasing missed directory traversal t7300: add testcase showing unnecessary traversal into ignored directory ls-files: error out on -i unless -o or -c are specified dir: report number of visited directories and paths with trace2 dir: convert trace calls to trace2 equivalents
| * dir: introduce readdir_skip_dot_and_dotdot() helperElijah Newren2021-05-131-0/+2
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Many places in the code were doing while ((d = readdir(dir)) != NULL) { if (is_dot_or_dotdot(d->d_name)) continue; ...process d... } Introduce a readdir_skip_dot_and_dotdot() helper to make that a one-liner: while ((d = readdir_skip_dot_and_dotdot(dir)) != NULL) { ...process d... } This helper particularly simplifies checks for empty directories. Also use this helper in read_cached_dir() so that our statistics are consistent across platforms. (In other words, read_cached_dir() should have been using is_dot_or_dotdot() and skipping such entries, but did not and left it to treat_path() to detect and mark such entries as path_none.) Signed-off-by: Elijah Newren <newren@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
| * dir: report number of visited directories and paths with trace2Elijah Newren2021-05-131-0/+4
| | | | | | | | | | | | | | | | | | | | Provide more statistics in trace2 output that include the number of directories and total paths visited by the directory traversal logic. Subsequent patches will take advantage of this to ensure we do not unnecessarily traverse into ignored directories. Signed-off-by: Elijah Newren <newren@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
* | Merge branch 'ds/sparse-index-protections'Junio C Hamano2021-04-301-4/+4
|\ \ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Builds on top of the sparse-index infrastructure to mark operations that are not ready to mark with the sparse index, causing them to fall back on fully-populated index that they always have worked with. * ds/sparse-index-protections: (47 commits) name-hash: use expand_to_path() sparse-index: expand_to_path() name-hash: don't add directories to name_hash revision: ensure full index resolve-undo: ensure full index read-cache: ensure full index pathspec: ensure full index merge-recursive: ensure full index entry: ensure full index dir: ensure full index update-index: ensure full index stash: ensure full index rm: ensure full index merge-index: ensure full index ls-files: ensure full index grep: ensure full index fsck: ensure full index difftool: ensure full index commit: ensure full index checkout: ensure full index ...
| * | *: remove 'const' qualifier for struct index_stateDerrick Stolee2021-04-141-4/+4
| |/ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Several methods specify that they take a 'struct index_state' pointer with the 'const' qualifier because they intend to only query the data, not change it. However, we will be introducing a step very low in the method stack that might modify a sparse-index to become a full index in the case that our queries venture inside a sparse-directory entry. This change only removes the 'const' qualifiers that are necessary for the following change which will actually modify the implementation of index_name_stage_pos(). Signed-off-by: Derrick Stolee <dstolee@microsoft.com> Reviewed-by: Elijah Newren <newren@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
* / exclude: add flags parameter to add_patterns()Jeff King2021-02-161-1/+2
|/ | | | | | | | | | There are a number of callers of add_patterns() and its sibling functions. Let's give them a "flags" parameter for adding new options without having to touch each caller. We'll use this in a future patch to add O_NOFOLLOW support. But for now each caller just passes 0. Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>
* sparse-checkout: load sparse-checkout patternsDerrick Stolee2021-01-241-0/+2
| | | | | | | | | | | | | | | | | | | | A future feature will want to load the sparse-checkout patterns into a pattern_list, but the current mechanism to do so is a bit complicated. This is made difficult due to needing to find the sparse-checkout file in different ways throughout the codebase. The logic implemented in the new get_sparse_checkout_patterns() was duplicated in populate_from_existing_patterns() in unpack-trees.c. Use the new method instead, keeping the logic around handling the struct unpack_trees_options. The callers to get_sparse_checkout_filename() in builtin/sparse-checkout.c manipulate the sparse-checkout file directly, so it is not appropriate to replace logic in that file with get_sparse_checkout_patterns(). Signed-off-by: Derrick Stolee <dstolee@microsoft.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
* dir: fix problematic API to avoid memory leaksElijah Newren2020-08-191-10/+11
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The dir structure seemed to have a number of leaks and problems around it. First I noticed that parent_hashmap and recursive_hashmap were being leaked (though Peff noticed and submitted fixes before me). Then I noticed in the previous commit that clear_directory() was only taking responsibility for a subset of fields within dir_struct, despite the fact that entries[] and ignored[] we allocated internally to dir.c. That, of course, resulted in many callers either leaking or haphazardly trying to free these arrays and their contents. Digging further, I found that despite the pretty clear documentation near the top of dir.h that folks were supposed to call clear_directory() when the user no longer needed the dir_struct, there were four callers that didn't bother doing that at all. However, two of them clearly thought about leaks since they had an UNLEAK(dir) directive, which to me suggests that the method to free the data was too unclear. I suspect the non-obviousness of the API and its holes led folks to avoid it, which then snowballed into further problems with the entries[], ignored[], parent_hashmap, and recursive_hashmap problems. Rename clear_directory() to dir_clear() to be more in line with other data structures in git, and introduce a dir_init() to handle the suggested memsetting of dir_struct to all zeroes. I hope that a name like "dir_clear()" is more clear, and that the presence of dir_init() will provide a hint to those looking at the code that they need to look for either a dir_clear() or a dir_free() and lead them to find dir_clear(). Signed-off-by: Elijah Newren <newren@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
* dir: make clear_directory() free all relevant memoryElijah Newren2020-08-191-1/+1
| | | | | | | | | | | | | | | | | | | The calling convention for the dir API is supposed to end with a call to clear_directory() to free up no longer needed memory. However, clear_directory() didn't free dir->entries or dir->ignored. I believe this was an oversight, but a number of callers noticed memory leaks and started free'ing these. Unfortunately, they did so somewhat haphazardly (sometimes freeing the entries in the arrays, and sometimes only free'ing the arrays themselves). This suggests the callers weren't trying to make sure any possible memory used might be free'd, but just the memory they noticed their usecase definitely had allocated. Fix this mess by moving all the duplicated free'ing logic into clear_directory(). End by resetting dir to a pristine state so it could be reused if desired. Signed-off-by: Elijah Newren <newren@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
* Merge branch 'ds/sparse-cone'Junio C Hamano2019-12-251-0/+36
|\ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Management of sparsely checked-out working tree has gained a dedicated "sparse-checkout" command. * ds/sparse-cone: (21 commits) sparse-checkout: improve OS ls compatibility sparse-checkout: respect core.ignoreCase in cone mode sparse-checkout: check for dirty status sparse-checkout: update working directory in-process for 'init' sparse-checkout: cone mode should not interact with .gitignore sparse-checkout: write using lockfile sparse-checkout: use in-process update for disable subcommand sparse-checkout: update working directory in-process sparse-checkout: sanitize for nested folders unpack-trees: add progress to clear_ce_flags() unpack-trees: hash less in cone mode sparse-checkout: init and set in cone mode sparse-checkout: use hashmaps for cone patterns sparse-checkout: add 'cone' mode trace2: add region in clear_ce_flags sparse-checkout: create 'disable' subcommand sparse-checkout: add '--stdin' option to set subcommand sparse-checkout: 'set' subcommand clone: add --sparse mode sparse-checkout: create 'init' subcommand ...
| * unpack-trees: hash less in cone modeDerrick Stolee2019-11-221-0/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | The sparse-checkout feature in "cone mode" can use the fact that the recursive patterns are "connected" to the root via parent patterns to decide if a directory is entirely contained in the sparse-checkout or entirely removed. In these cases, we can skip hashing the paths within those directories and simply set the skipworktree bit to the correct value. Signed-off-by: Derrick Stolee <dstolee@microsoft.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
| * sparse-checkout: init and set in cone modeDerrick Stolee2019-11-221-0/+4
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | To make the cone pattern set easy to use, update the behavior of 'git sparse-checkout (init|set)'. Add '--cone' flag to 'git sparse-checkout init' to set the config option 'core.sparseCheckoutCone=true'. When running 'git sparse-checkout set' in cone mode, a user only needs to supply a list of recursive folder matches. Git will automatically add the necessary parent matches for the leading directories. When testing 'git sparse-checkout set' in cone mode, check the error stream to ensure we do not see any errors. Specifically, we want to avoid the warning that the patterns do not match the cone-mode patterns. Helped-by: Eric Wong <e@80x24.org> Helped-by: Johannes Schindelin <Johannes.Schindelin@gmx.de> Signed-off-by: Derrick Stolee <dstolee@microsoft.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
| * sparse-checkout: use hashmaps for cone patternsDerrick Stolee2019-11-221-0/+31
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The parent and recursive patterns allowed by the "cone mode" option in sparse-checkout are restrictive enough that we can avoid using the regex parsing. Everything is based on prefix matches, so we can use hashsets to store the prefixes from the sparse-checkout file. When checking a path, we can strip path entries from the path and check the hashset for an exact match. As a test, I created a cone-mode sparse-checkout file for the Linux repository that actually includes every file. This was constructed by taking every folder in the Linux repo and creating the pattern pairs here: /$folder/ !/$folder/*/ This resulted in a sparse-checkout file sith 8,296 patterns. Running 'git read-tree -mu HEAD' on this file had the following performance: core.sparseCheckout=false: 0.21 s (0.00 s) core.sparseCheckout=true: 3.75 s (3.50 s) core.sparseCheckoutCone=true: 0.23 s (0.01 s) The times in parentheses above correspond to the time spent in the first clear_ce_flags() call, according to the trace2 performance traces. While this example is contrived, it demonstrates how these patterns can slow the sparse-checkout feature. Helped-by: Eric Wong <e@80x24.org> Helped-by: Johannes Schindelin <Johannes.Schindelin@gmx.de> Signed-off-by: Derrick Stolee <dstolee@microsoft.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
* | dir: move doc to dir.hHeba Waly2019-11-181-5/+114
|/ | | | | | | | | | | | | Move the documentation from Documentation/technical/api-directory-listing.txt to dir.h as it's easier for the developers to find the usage information beside the code instead of looking for it in another doc file. Also documentation/technical/api-directory-listing.txt is removed because the information it has is now redundant and it'll be hard to keep it up to date and synchronized with the documentation in the header files. Signed-off-by: Heba Waly <heba.waly@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
* Merge branch 'en/clean-nested-with-ignored'Junio C Hamano2019-10-111-3/+5
|\ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | "git clean" fixes. * en/clean-nested-with-ignored: dir: special case check for the possibility that pathspec is NULL clean: fix theoretical path corruption clean: rewrap overly long line clean: avoid removing untracked files in a nested git repository clean: disambiguate the definition of -d git-clean.txt: do not claim we will delete files with -n/--dry-run dir: add commentary explaining match_pathspec_item's return value dir: if our pathspec might match files under a dir, recurse into it dir: make the DO_MATCH_SUBMODULE code reusable for a non-submodule case dir: also check directories for matching pathspecs dir: fix off-by-one error in match_pathspec_item dir: fix typo in comment t7300: add testcases showing failure to clean specified pathspecs
| * clean: avoid removing untracked files in a nested git repositoryElijah Newren2019-09-171-1/+2
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Users expect files in a nested git repository to be left alone unless sufficiently forced (with two -f's). Unfortunately, in certain circumstances, git would delete both tracked (and possibly dirty) files and untracked files within a nested repository. To explain how this happens, let's contrast a couple cases. First, take the following example setup (which assumes we are already within a git repo): git init nested cd nested >tracked git add tracked git commit -m init >untracked cd .. In this setup, everything works as expected; running 'git clean -fd' will result in fill_directory() returning the following paths: nested/ nested/tracked nested/untracked and then correct_untracked_entries() would notice this can be compressed to nested/ and then since "nested/" is a directory, we would call remove_dirs("nested/", ...), which would check is_nonbare_repository_dir() and then decide to skip it. However, if someone also creates an ignored file: >nested/ignored then running 'git clean -fd' would result in fill_directory() returning the same paths: nested/ nested/tracked nested/untracked but correct_untracked_entries() will notice that we had ignored entries under nested/ and thus simplify this list to nested/tracked nested/untracked Since these are not directories, we do not call remove_dirs() which was the only place that had the is_nonbare_repository_dir() safety check -- resulting in us deleting both the untracked file and the tracked (and possibly dirty) file. One possible fix for this issue would be walking the parent directories of each path and checking if they represent nonbare repositories, but that would be wasteful. Even if we added caching of some sort, it's still a waste because we should have been able to check that "nested/" represented a nonbare repository before even descending into it in the first place. Add a DIR_SKIP_NESTED_GIT flag to dir_struct.flags and use it to prevent fill_directory() and friends from descending into nested git repos. With this change, we also modify two regression tests added in commit 91479b9c72f1 ("t7300: add tests to document behavior of clean and nested git", 2015-06-15). That commit, nor its series, nor the six previous iterations of that series on the mailing list discussed why those tests coded the expectation they did. In fact, it appears their purpose was simply to test _existing_ behavior to make sure that the performance changes didn't change the behavior. However, these two tests directly contradicted the manpage's claims that two -f's were required to delete files/directories under a nested git repository. While one could argue that the user gave an explicit path which matched files/directories that were within a nested repository, there's a slippery slope that becomes very difficult for users to understand once you go down that route (e.g. what if they specified "git clean -f -d '*.c'"?) It would also be hard to explain what the exact behavior was; avoid such problems by making it really simple. Also, clean up some grammar errors describing this functionality in the git-clean manpage. Finally, there are still a couple bugs with -ffd not cleaning out enough (e.g. missing the nested .git) and with -ffdX possibly cleaning out the wrong files (paying attention to outer .gitignore instead of inner). This patch does not address these cases at all (and does not change the behavior relative to those flags), it only fixes the handling when given a single -f. See https://public-inbox.org/git/20190905212043.GC32087@szeder.dev/ for more discussion of the -ffd[X?] bugs. Signed-off-by: Elijah Newren <newren@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>