summaryrefslogtreecommitdiffstats
path: root/archive-tar.c (follow)
Commit message (Collapse)AuthorAgeFilesLines
* archive: support compression levels beyond 9René Scharfe2020-11-091-1/+2
| | | | | | | | | | | | | | | | | | | | | | | Compression programs like zip, gzip, bzip2 and xz allow to adjust the trade-off between CPU cost and size gain with numerical options from -1 for fast compression and -9 for high compression ratio. zip also accepts -0 for storing files verbatim. git archive directly support these single-digit compression levels for ZIP output and passes them to filters like gzip. Zstandard additionally supports compression level options -10 to -19, or up to -22 with --ultra. This *seems* to work with git archive in most cases, e.g. it will produce an archive with -19 without complaining, but since it only supports single-digit compression level options this is the same as -1 -9 and thus -9. Allow git archive to accept multi-digit compression levels to support the full range supported by zstd. Explicitly reject them for the ZIP format, as otherwise deflateInit2() would just fail with a somewhat cryptic "stream consistency error". Signed-off-by: René Scharfe <l.s.r@web.de> Signed-off-by: Junio C Hamano <gitster@pobox.com>
* archive: read short blobs in archive.c::write_archive_entry()René Scharfe2020-09-201-19/+3
| | | | | | | | | | | | | Centralize reading of symlink destinations and the contents of regular files that are too small to be streamed. This reduces code duplication and allows future patches to add support for adding non-tracked files to archives. The backends are expected to stream blobs if buffer is NULL. object_file_to_archive() is only called from archive.c and thus no longer exported. Signed-off-by: René Scharfe <l.s.r@web.de> Signed-off-by: Junio C Hamano <gitster@pobox.com>
* parse_config_key(): return subsection len as size_tJeff King2020-04-101-2/+2
| | | | | | | | | | | | | | | | | | | | | | We return the length to a subset of a string using an "int *" out-parameter. This is fine most of the time, as we'd expect config keys to be relatively short, but it could behave oddly if we had a gigantic config key. A more appropriate type is size_t. Let's switch over, which lets our callers use size_t as appropriate (they are bound by our type because they must pass the out-parameter as a pointer). This is mostly just a cleanup to make it clear this code handles long strings correctly. In practice, our config parser already chokes on long key names (because of a similar int/size_t mixup!). When doing an int/size_t conversion, we have to be careful that nobody was trying to assign a negative value to the variable. I manually confirmed that for each case here. They tend to just feed the result to xmemdupz() or similar; in a few cases I adjusted the parameter types for helper functions to make sure the size_t is preserved. Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>
* streaming: allow open_istream() to handle any repoMatheus Tavares2020-01-311-3/+3
| | | | | | | | | | | | | | | Some callers of open_istream() at archive-tar.c and archive-zip.c are capable of working on arbitrary repositories but the repo struct is not passed down to open_istream(), which uses the_repository internally. For now, that's not a problem since the said callers are only being called with the_repository. But to be consistent and avoid future problems, let's allow open_istream() to receive a struct repository and use that instead of the_repository. This parameter addition will also be used in a future patch to make sha1-file.c:check_object_signature() be able to work on arbitrary repos. Signed-off-by: Matheus Tavares <matheus.bernardino@usp.br> Signed-off-by: Junio C Hamano <gitster@pobox.com>
* Merge branch 'rs/pax-extended-header-length-fix'Junio C Hamano2019-09-091-4/+10
|\ | | | | | | | | | | | | | | | | | | | | "git archive" recorded incorrect length in extended pax header in some corner cases, which has been corrected. * rs/pax-extended-header-length-fix: archive-tar: turn length miscalculation warning into BUG archive-tar: use size_t in strbuf_append_ext_header() archive-tar: fix pax extended header length calculation archive-tar: report wrong pax extended header length
| * archive-tar: turn length miscalculation warning into BUGRené Scharfe2019-08-191-3/+3
| | | | | | | | | | | | | | | | | | | | Now that we're confident our pax extended header calculation is correct, turn the criticality of the assertion up to the maximum, from warning right up to BUG. Simplify the test, as the stderr comparison step would not be reached in case the BUG message is triggered. Signed-off-by: René Scharfe <l.s.r@web.de> Signed-off-by: Junio C Hamano <gitster@pobox.com>
| * archive-tar: use size_t in strbuf_append_ext_header()René Scharfe2019-08-191-5/+5
| | | | | | | | | | | | | | | | One of its callers already passes in a size_t value. Use it consistently in this function. Signed-off-by: René Scharfe <l.s.r@web.de> Signed-off-by: Junio C Hamano <gitster@pobox.com>
| * archive-tar: fix pax extended header length calculationRené Scharfe2019-08-191-1/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | A pax extended header record starts with a decimal number. Its value is the length of the whole record, including its own length. The calculation of that number in strbuf_append_ext_header() is off by one in case the length of the rest is close to a higher order of magnitude. This affects paths and link targets a bit shorter than 1000, 10000, 100000 etc. characters -- paths with a length of up to 100 fit into the tar header and don't need a pax extended header. The mistake has been present since the function was added by ae64bbc18c ("tar-tree: Introduce write_entry()", 2006-03-25). Account for digits added to len during the loop and keep incrementing until we have enough space for len and the rest. The crucial change is to check against the current value of len before each iteration, instead of against its value before the loop. Signed-off-by: René Scharfe <l.s.r@web.de> Signed-off-by: Junio C Hamano <gitster@pobox.com>
| * archive-tar: report wrong pax extended header lengthRené Scharfe2019-08-191-0/+6
| | | | | | | | | | | | | | | | | | | | Extended header entries contain a length value that is a bit tricky to calculate because it includes its own length (number of decimal digits) as well. We get it wrong in corner cases. Add a check, report wrong results as a warning and add a test for exercising it. Signed-off-by: René Scharfe <l.s.r@web.de> Signed-off-by: Junio C Hamano <gitster@pobox.com>
* | archive: convert struct archiver_args to object_idbrian m. carlson2019-04-011-3/+4
|/ | | | | | | | | Change the commit_sha1 member to be called "commit_oid" and change it to be a pointer to struct object_id. Additionally, update some uses of GIT_SHA1_HEXSZ and hard-coded values to use the_hash_algo instead. Signed-off-by: brian m. carlson <sandals@crustytoothpaste.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>
* Indent code with TABsNguyễn Thái Ngọc Duy2018-12-091-1/+1
| | | | | | | | | | | | | | We indent with TABs and sometimes for fine alignment, TABs followed by spaces, but never all spaces (unless the indentation is less than 8 columns). Indenting with spaces slips through in some places. Fix them. Imported code and compat/ are left alone on purpose. The former should remain as close as upstream as possible. The latter pretty much has separate maintainers, it's up to them to decide. Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
* Upcast size_t variables to uintmax_t when printingTorsten Bögershausen2018-11-121-1/+1
| | | | | | | | | | | | | | | When printing variables which contain a size, today "unsigned long" is used at many places. In order to be able to change the type from "unsigned long" into size_t some day in the future, we need to have a way to print 64 bit variables on a system that has "unsigned long" defined to be 32 bit, like Win64. Upcast all those variables into uintmax_t before they are printed. This is to prepare for a bigger change, when "unsigned long" will be converted into size_t for variables which may be > 4Gib. Signed-off-by: Torsten Bögershausen <tboegi@web.de> Signed-off-by: Junio C Hamano <gitster@pobox.com>
* Merge branch 'nd/no-the-index'Junio C Hamano2018-08-201-1/+1
|\ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The more library-ish parts of the codebase learned to work on the in-core index-state instance that is passed in by their callers, instead of always working on the singleton "the_index" instance. * nd/no-the-index: (24 commits) blame.c: remove implicit dependency on the_index apply.c: remove implicit dependency on the_index apply.c: make init_apply_state() take a struct repository apply.c: pass struct apply_state to more functions resolve-undo.c: use the right index instead of the_index archive-*.c: use the right repository archive.c: avoid access to the_index grep: use the right index instead of the_index attr: remove index from git_attr_set_direction() entry.c: use the right index instead of the_index submodule.c: use the right index instead of the_index pathspec.c: use the right index instead of the_index unpack-trees: avoid the_index in verify_absent() unpack-trees: convert clear_ce_flags* to avoid the_index unpack-trees: don't shadow global var the_index unpack-trees: add a note about path invalidation unpack-trees: remove 'extern' on function declaration ls-files: correct index argument to get_convert_attr_ascii() preload-index.c: use the right index instead of the_index dir.c: remove an implicit dependency on the_index in pathspec code ...
| * archive-*.c: use the right repositoryNguyễn Thái Ngọc Duy2018-08-131-1/+1
| | | | | | | | | | | | | | | | With 'struct archive_args' gaining new repository pointer, we don't have to assume the_repository in the archive backends anymore. Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
* | Merge branch 'nd/i18n'Junio C Hamano2018-08-161-6/+6
|\ \ | |/ |/| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Many more strings are prepared for l10n. * nd/i18n: (23 commits) transport-helper.c: mark more strings for translation transport.c: mark more strings for translation sha1-file.c: mark more strings for translation sequencer.c: mark more strings for translation replace-object.c: mark more strings for translation refspec.c: mark more strings for translation refs.c: mark more strings for translation pkt-line.c: mark more strings for translation object.c: mark more strings for translation exec-cmd.c: mark more strings for translation environment.c: mark more strings for translation dir.c: mark more strings for translation convert.c: mark more strings for translation connect.c: mark more strings for translation config.c: mark more strings for translation commit-graph.c: mark more strings for translation builtin/replace.c: mark more strings for translation builtin/pack-objects.c: mark more strings for translation builtin/grep.c: mark strings for translation builtin/config.c: mark more strings for translation ...
| * archive-tar.c: mark more strings for translationNguyễn Thái Ngọc Duy2018-07-231-6/+6
| | | | | | | | | | Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
* | Merge branch 'sb/object-store-grafts'Junio C Hamano2018-07-181-0/+1
|\ \ | |/ |/| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The conversion to pass "the_repository" and then "a_repository" throughout the object access API continues. * sb/object-store-grafts: commit: allow lookup_commit_graft to handle arbitrary repositories commit: allow prepare_commit_graft to handle arbitrary repositories shallow: migrate shallow information into the object parser path.c: migrate global git_path_* to take a repository argument cache: convert get_graft_file to handle arbitrary repositories commit: convert read_graft_file to handle arbitrary repositories commit: convert register_commit_graft to handle arbitrary repositories commit: convert commit_graft_pos() to handle arbitrary repositories shallow: add repository argument to is_repository_shallow shallow: add repository argument to check_shallow_file_for_update shallow: add repository argument to register_shallow shallow: add repository argument to set_alternate_shallow_file commit: add repository argument to lookup_commit_graft commit: add repository argument to prepare_commit_graft commit: add repository argument to read_graft_file commit: add repository argument to register_commit_graft commit: add repository argument to commit_graft_pos object: move grafts to object parser object-store: move object access functions to object-store.h
| * object-store: move object access functions to object-store.hStefan Beller2018-05-161-0/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This should make these functions easier to find and cache.h less overwhelming to read. In particular, this moves: - read_object_file - oid_object_info - write_object_file As a result, most of the codebase needs to #include object-store.h. In this patch the #include is only added to files that would fail to compile otherwise. It would be better to #include wherever identifiers from the header are used. That can happen later when we have better tooling for it. Signed-off-by: Stefan Beller <sbeller@google.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
* | Merge branch 'js/use-bug-macro'Junio C Hamano2018-05-301-1/+1
|\ \ | |/ |/| | | | | | | | | | | | | | | | | | | Developer support update, by using BUG() macro instead of die() to mark codepaths that should not happen more clearly. * js/use-bug-macro: BUG_exit_code: fix sparse "symbol not declared" warning Convert remaining die*(BUG) messages Replace all die("BUG: ...") calls by BUG() ones run-command: use BUG() to report bugs, not die() test-tool: help verifying BUG() code paths
| * Replace all die("BUG: ...") calls by BUG() onesJohannes Schindelin2018-05-061-1/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | In d8193743e08 (usage.c: add BUG() function, 2017-05-12), a new macro was introduced to use for reporting bugs instead of die(). It was then subsequently used to convert one single caller in 588a538ae55 (setup_git_env: convert die("BUG") to BUG(), 2017-05-12). The cover letter of the patch series containing this patch (cf 20170513032414.mfrwabt4hovujde2@sigill.intra.peff.net) is not terribly clear why only one call site was converted, or what the plan is for other, similar calls to die() to report bugs. Let's just convert all remaining ones in one fell swoop. This trick was performed by this invocation: sed -i 's/die("BUG: /BUG("/g' $(git grep -l 'die("BUG' \*.c) Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de> Signed-off-by: Junio C Hamano <gitster@pobox.com>
* | cache.h: add repository argument to oid_object_infoStefan Beller2018-04-261-1/+1
|/ | | | | | | | | | | | | | Add a repository argument to allow the callers of oid_object_info to be more specific about which repository to handle. This is a small mechanical change; it doesn't change the implementation to handle repositories other than the_repository yet. As with the previous commits, use a macro to catch callers passing a repository other than the_repository at compile time. Signed-off-by: Stefan Beller <sbeller@google.com> Reviewed-by: Jonathan Tan <jonathantanmy@google.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
* sha1_file: convert sha1_object_info* to object_idbrian m. carlson2018-03-141-1/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Convert sha1_object_info and sha1_object_info_extended to take pointers to struct object_id and rename them to use "oid" instead of "sha1" in their names. Update the declaration and definition and apply the following semantic patch, plus the standard object_id transforms: @@ expression E1, E2; @@ - sha1_object_info(E1.hash, E2) + oid_object_info(&E1, E2) @@ expression E1, E2; @@ - sha1_object_info(E1->hash, E2) + oid_object_info(E1, E2) @@ expression E1, E2, E3; @@ - sha1_object_info_extended(E1.hash, E2, E3) + oid_object_info_extended(&E1, E2, E3) @@ expression E1, E2, E3; @@ - sha1_object_info_extended(E1->hash, E2, E3) + oid_object_info_extended(E1, E2, E3) Signed-off-by: Junio C Hamano <gitster@pobox.com>
* streaming: convert open_istream to use struct object_idbrian m. carlson2018-03-141-1/+1
| | | | | Signed-off-by: brian m. carlson <sandals@crustytoothpaste.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>
* archive: convert sha1_file_to_archive to struct object_idbrian m. carlson2018-03-141-1/+1
| | | | | | | | Convert this function to take a pointer to struct object_id and rename it object_file_to_archive. Signed-off-by: brian m. carlson <sandals@crustytoothpaste.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>
* archive: convert write_archive_entry_fn_t to object_idbrian m. carlson2018-03-141-14/+14
| | | | | | | | | Convert the write_archive_entry_fn_t type to use a pointer to struct object_id. Convert various static functions in the tar and zip archivers also. Signed-off-by: brian m. carlson <sandals@crustytoothpaste.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>
* Merge branch 'bw/config-h'Junio C Hamano2017-06-241-0/+1
|\ | | | | | | | | | | | | | | | | | | | | | | | | | | Fix configuration codepath to pay proper attention to commondir that is used in multi-worktree situation, and isolate config API into its own header file. * bw/config-h: config: don't implicitly use gitdir or commondir config: respect commondir setup: teach discover_git_directory to respect the commondir config: don't include config.h by default config: remove git_config_iter config: create config.h
| * config: don't include config.h by defaultBrandon Williams2017-06-151-0/+1
| | | | | | | | | | | | | | | | Stop including config.h by default in cache.h. Instead only include config.h in those files which require use of the config system. Signed-off-by: Brandon Williams <bmwill@google.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
* | archive-tar: fix a sparse 'constant too large' warningRamsay Jones2017-05-091-2/+2
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Commit dddbad728c ("timestamp_t: a new data type for timestamps", 26-04-2017) introduced a new typedef 'timestamp_t', as a synonym for an unsigned long, which was used at the time to represent timestamps in git. A later commit 28f4aee3fb ("use uintmax_t for timestamps", 26-04-2017) changed the typedef to use an 'uintmax_t' for the timestamp representation type. When building on a 32-bit Linux system, sparse complains that a constant (USTAR_MAX_MTIME) used to detect a 'far-future mtime' timestamp, is too large; 'warning: constant 077777777777UL is so big it is unsigned long long' on lines 335 and 338 of archive-tar.c. Note that both gcc and clang only issue a warning if this constant is used in a context that requires an 'unsigned long' (rather than an uintmax_t). (Since TIME_MAX is no longer equal to 0xFFFFFFFF, even on a 32-bit system, the macro USTAR_MAX_MTIME is set to 077777777777UL, which cannot be represented as an 'unsigned long' constant). In order to suppress the warning, change the definition of the macro constant USTAR_MAX_MTIME to use an 'ULL' type suffix. In a similar vein, on systems which use a 64-bit representation of the 'unsigned long' type, the USTAR_MAX_SIZE constant macro is defined with the value 077777777777ULL. Although this does not cause any warning messages to be issued, it would be more appropriate for this constant to use an 'UL' type suffix rather than 'ULL'. Signed-off-by: Ramsay Jones <ramsay@ramsayjones.plus.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
* | timestamp_t: a new data type for timestampsJohannes Schindelin2017-04-271-2/+5
|/ | | | | | | | | | | | | | | | | | | | Git's source code assumes that unsigned long is at least as precise as time_t. Which is incorrect, and causes a lot of problems, in particular where unsigned long is only 32-bit (notably on Windows, even in 64-bit versions). So let's just use a more appropriate data type instead. In preparation for this, we introduce the new `timestamp_t` data type. By necessity, this is a very, very large patch, as it has to replace all timestamps' data type in one go. As we will use a data type that is not necessarily identical to `time_t`, we need to be very careful to use `time_t` whenever we interact with the system functions, and `timestamp_t` everywhere else. Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de> Signed-off-by: Junio C Hamano <gitster@pobox.com>
* Merge branch 'jk/big-and-future-archive-tar'Junio C Hamano2016-08-121-10/+5
|\ | | | | | | | | | | | | A small code clean-up. * jk/big-and-future-archive-tar: archive-tar: make write_extended_header() void
| * archive-tar: make write_extended_header() voidRené Scharfe2016-08-061-10/+5
| | | | | | | | | | | | | | | | | | The function write_extended_header() only ever returns 0. Simplify it and its caller by dropping its return value, like we did with write_global_extended_header() earlier. Signed-off-by: Rene Scharfe <l.s.r@web.de> Signed-off-by: Junio C Hamano <gitster@pobox.com>
* | archive-tar: huge offset and future timestamps would not work on 32-bitJunio C Hamano2016-07-151-0/+5
|/ | | | | | | | | | As we are not yet moving everything to size_t but still using ulong internally when talking about the size of object, platforms with 32-bit long will not be able to produce tar archive with 4GB+ file, and cannot grok 077777777777UL as a constant. Disable the extended header feature and do not test it on them. Signed-off-by: Junio C Hamano <gitster@pobox.com>
* archive-tar: drop return valueJeff King2016-07-011-7/+4
| | | | | | | | We never do any error checks, and so never return anything but "0". Let's just drop this to simplify the code. Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>
* archive-tar: write extended headers for far-future mtimeJeff King2016-07-011-3/+16
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The ustar format represents timestamps as seconds since the epoch, but only has room to store 11 octal digits. To express anything larger, we need to use an extended header. This is exactly the same case we fixed for the size field in the previous commit, and the solution here follows the same pattern. This is even mentioned as an issue in f2f0267 (archive-tar: use xsnprintf for trivial formatting, 2015-09-24), but since it only affected things far in the future, it wasn't deemed worth dealing with. But note that my calculations claiming thousands of years were off there; because our xsnprintf produces a NUL byte, we only have until the year 2242 to fix this. Given that this is just around the corner (geologically speaking, anyway), and because it's easy to fix, let's just make it work. Unlike the previous fix for "size", where we had to write an individual extended header for each file, we can write one global header (since we have only one mtime for the whole archive). There's a slight bit of trickiness there. We may already be writing a global header with a "comment" field for the commit sha1. So we need to write our new field into the same header. To do this, we push the decision of whether to write such a header down into write_global_extended_header(), which will now assemble the header as it sees fit, and will return early if we have nothing to write (in practice, we'll only have a large mtime if it comes from a commit, but this makes it also work if you set your system clock ahead such that time() returns a huge value). Note that we don't (and never did) handle negative timestamps (i.e., before 1970). This would probably not be too hard to support in the same way, but since git does not support negative timestamps at all, I didn't bother here. After writing the extended header, we munge the timestamp in the ustar headers to the maximum-allowable size. This is wrong, but it's the least-wrong thing we can provide to a tar implementation that doesn't understand pax headers (it's also what GNU tar does). Helped-by: René Scharfe <l.s.r@web.de> Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>
* archive-tar: write extended headers for file sizes >= 8GBJeff King2016-07-011-2/+29
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The ustar format has a fixed-length field for the size of each file entry which is supposed to contain up to 11 bytes of octal-formatted data plus a NUL or space terminator. These means that the largest size we can represent is 077777777777, or 1 byte short of 8GB. The correct solution for a larger file, according to POSIX.1-2001, is to add an extended pax header, similar to how we handle long filenames. This patch does that, and writes zero for the size field in the ustar header (the last bit is not mentioned by POSIX, but it matches how GNU tar behaves with --format=pax). This should be a strict improvement over the current behavior, which is to die in xsnprintf with a "BUG". However, there's some interesting history here. Prior to f2f0267 (archive-tar: use xsnprintf for trivial formatting, 2015-09-24), we silently overflowed the "size" field. The extra bytes ended up in the "mtime" field of the header, which was then immediately written itself, overwriting our extra bytes. What that means depends on how many bytes we wrote. If the size was 64GB or greater, then we actually overflowed digits into the mtime field, meaning our value was effectively right-shifted by those lost octal digits. And this patch is again a strict improvement over that. But if the size was between 8GB and 64GB, then our 12-byte field held all of the actual digits, and only our NUL terminator overflowed. According to POSIX, there should be a NUL or space at the end of the field. However, GNU tar seems to be lenient here, and will correctly parse a size up 64GB (minus one) from the field. So sizes in this range might have just worked, depending on the implementation reading the tarfile. This patch is mostly still an improvement there, as the 8GB limit is specifically mentioned in POSIX as the correct limit. But it's possible that it could be a regression (versus the pre-f2f0267 state) if all of the following are true: 1. You have a file between 8GB and 64GB. 2. Your tar implementation _doesn't_ know about pax extended headers. 3. Your tar implementation _does_ parse 12-byte sizes from the ustar header without a delimiter. It's probably not worth worrying about such an obscure set of conditions, but I'm documenting it here just in case. Helped-by: René Scharfe <l.s.r@web.de> Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>
* archive-tar: convert snprintf to xsnprintfJeff King2016-05-261-1/+1
| | | | | | | | | | | | | | | | | | Commit f2f0267 (archive-tar: use xsnprintf for trivial formatting, 2015-09-24) converted cases of "sprintf" to "xsnprintf", but accidentally left one as just "snprintf". This meant that we could silently truncate the resulting buffer instead of flagging an error. In practice, this is impossible to achieve, as we are formatting a ustar checksum, which can be at most 7 characters. But the point of xsnprintf is to document and check for "should be impossible" conditions; this site was just accidentally mis-converted during f2f0267. Noticed-by: Paul Green <Paul.Green@stratus.com> Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>
* archive-tar: use xsnprintf for trivial formattingJeff King2015-09-251-13/+13
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | When we generate tar headers, we sprintf() values directly into a struct with the fixed-size header values. For the most part this is fine, as we are formatting small values (e.g., the octal format of "mode & 0x7777" is of fixed length). But it's still a good idea to use xsnprintf here. It communicates to readers what our expectation is, and it provides a run-time check that we are not overflowing the buffers. The one exception here is the mtime, which comes from the epoch time of the commit we are archiving. For sane values, this fits into the 12-byte value allocated in the header. But since git can handle 64-bit times, if I claim to be a visitor from the year 10,000 AD, I can overflow the buffer. This turns out to be harmless, as we simply overflow into the chksum field, which is then overwritten. This case is also best as an xsnprintf. It should never come up, short of extremely malformed dates, and in that case we are probably better off dying than silently truncating the date value (and we cannot expand the size of the buffer, since it is dictated by the ustar format). Our friends in the year 5138 (when we legitimately flip to a 12-digit epoch) can deal with that problem then. Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>
* convert trivial sprintf / strcpy calls to xsnprintfJeff King2015-09-251-1/+1
| | | | | | | | | | | | | | | | | | We sometimes sprintf into fixed-size buffers when we know that the buffer is large enough to fit the input (either because it's a constant, or because it's numeric input that is bounded in size). Likewise with strcpy of constant strings. However, these sites make it hard to audit sprintf and strcpy calls for buffer overflows, as a reader has to cross-reference the size of the array with the input. Let's use xsnprintf instead, which communicates to a reader that we don't expect this to overflow (and catches the mistake in case we do). Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>
* archive-tar: fix minor indentation violationJeff King2015-09-251-1/+1
| | | | | | | | This looks like a simple omission from 8539070 (archive-tar: unindent write_tar_entry by one level, 2012-05-03). Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>
* Revert "archive: honor tar.umask even for pax headers"Junio C Hamano2014-10-201-2/+2
| | | | | | | This reverts commit 10f343ea814f5c18a0913997904ee11cd9b7da24, whose output is no longer bit-for-bit equivalent from the older versions of Git, which the infrastructure to (pretend to) upload tarballs kernel.org uses depends on.
* Merge branch 'rs/child-process-init'Junio C Hamano2014-09-111-2/+1
|\ | | | | | | | | | | | | | | | | | | Code clean-up. * rs/child-process-init: run-command: inline prepare_run_command_v_opt() run-command: call run_command_v_opt_cd_env() instead of duplicating it run-command: introduce child_process_init() run-command: introduce CHILD_PROCESS_INIT
| * run-command: introduce CHILD_PROCESS_INITRené Scharfe2014-08-201-2/+1
| | | | | | | | | | | | | | | | | | | | | | | | Most struct child_process variables are cleared using memset first after declaration. Provide a macro, CHILD_PROCESS_INIT, that can be used to initialize them statically instead. That's shorter, doesn't require a function call and is slightly more readable (especially given that we already have STRBUF_INIT, ARGV_ARRAY_INIT etc.). Helped-by: Johannes Sixt <j6t@kdbg.org> Signed-off-by: Rene Scharfe <l.s.r@web.de> Signed-off-by: Junio C Hamano <gitster@pobox.com>
* | archive: honor tar.umask even for pax headersbrian m. carlson2014-08-041-2/+2
|/ | | | | | | | | | | git archive's tar format uses extended pax headers to encode metadata into the archive. Most tar implementations correctly treat these as metadata, but some that do not understand the pax format extract these as files instead. Apply the tar.umask setting to these entries to prevent tampering by other users. Signed-off-by: brian m. carlson <sandals@crustytoothpaste.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>
* archive-tar: use parse_config_key when parsing configJeff King2013-01-231-9/+1
| | | | | | | | | | | | | | | | | This is fewer lines of code, but more importantly, fixes a bogus pointer offset. We are looking for "tar." in the section, but later assume that the dot we found is at offset 9, not 3. This is a holdover from an earlier iteration of 767cf45 which called the section "tarfilter". As a result, we could erroneously reject some filters with dots in their name, as well as read uninitialized memory. Reported by (and test by) René Scharfe. Signed-off-by: Jeff King <peff@peff.net> Reviewed-by: Jonathan Nieder <jrnieder@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
* Merge branch 'rs/leave-base-name-in-name-field-of-tar'Junio C Hamano2013-01-101-0/+2
|\ | | | | | | | | | | | | | | | | Improve compatibility with implementations of "tar" that do not like empty name field in header (with the additional prefix field holding everything). * rs/leave-base-name-in-name-field-of-tar: archive-tar: split long paths more carefully
| * archive-tar: split long paths more carefullyRené Scharfe2013-01-061-0/+2
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The name field of a tar header has a size of 100 characters. This limit was extended long ago in a backward compatible way by providing the additional prefix field, which can hold 155 additional characters. The actual path is constructed at extraction time by concatenating the prefix field, a slash and the name field. get_path_prefix() is used to determine which slash in the path is used as the cutting point and thus which part of it is placed into the field prefix and which into the field name. It tries to cram as much into the prefix field as possible. (And only if we can't fit a path into the provided 255 characters we use a pax extended header to store it.) If a path is longer than 100 but shorter than 156 characters and ends with a slash (i.e. is for a directory) then get_path_prefix() puts the whole path in the prefix field and leaves the name field empty. GNU tar reconstructs the path without complaint, but the tar included with NetBSD 6 does not: It reports the header to be invalid. For compatibility with this version of tar, make sure to never leave the name field empty. In order to do that, trim the trailing slash from the part considered as possible prefix, if it exists -- that way the last path component (or more, but not less) will end up in the name field. Signed-off-by: Rene Scharfe <rene.scharfe@lsrfire.ath.cx> Signed-off-by: Junio C Hamano <gitster@pobox.com>
* | archive: ustar header checksum is computed unsignedJunio C Hamano2012-06-131-3/+3
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | POSIX.1 (pax) is pretty clear on this: The chksum field shall be the ISO/IEC 646:1991 standard IRV representation of the octal value of the simple sum of all octets in the header logical record. Each octet in the header shall be treated as an unsigned value. These values shall be added to an unsigned integer, initialized to zero, the precision of which is not less than 17 bits. When calculating the checksum, the chksum field is treated as if it were all <space> characters. so is GNU: http://www.gnu.org/software/tar/manual/html_node/Checksumming.html Found by 7zip folks and reported by Rafał Mużyło. Signed-off-by: Junio C Hamano <gitster@pobox.com>
* | Merge branch 'rs/archive-tree-in-tip-simplify'Junio C Hamano2012-05-231-2/+2
|\ \ | | | | | | | | | | | | | | | | | | By René Scharfe * rs/archive-tree-in-tip-simplify: archive-tar: keep const in checksum calculation archive: simplify refname handling
| * | archive-tar: keep const in checksum calculationRené Scharfe2012-05-181-2/+2
| |/ | | | | | | | | | | | | For correctness, don't needlessly drop the const qualifier when casting. Signed-off-by: Rene Scharfe <rene.scharfe@lsrfire.ath.cx> Signed-off-by: Junio C Hamano <gitster@pobox.com>
* | archive-tar: stream large blobs to tar fileNguyễn Thái Ngọc Duy2012-05-031-5/+51
| | | | | | | | | | | | | | | | t5000 verifies output while t1050 makes sure the command always respects core.bigfilethreshold Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>