summaryrefslogtreecommitdiffstats
path: root/diffcore-delta.c (follow)
Commit message (Collapse)AuthorAgeFilesLines
* Merge branch 'en/diffcore-delta-final-line-fix'Junio C Hamano2024-01-301-0/+4
|\ | | | | | | | | | | | | | | Rename detection logic ignored the final line of a file if it is an incomplete line. * en/diffcore-delta-final-line-fix: diffcore-delta: avoid ignoring final 'line' of file
| * diffcore-delta: avoid ignoring final 'line' of fileElijah Newren2024-01-191-0/+4
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | hash_chars() would hash lines to integers, and store them in a spanhash, but cut lines at 64 characters. Thus, whenever it reached 64 characters or a newline, it would create a new spanhash. The problem is, the final part of the file might not end 64 characters after the previous 'line' and might not end with a newline. This could, for example, cause an 85-byte file with 12 lines and only the first character in the file differing to appear merely 23% similar rather than the expected 97%. Ensure the last line is included, and add a testcase that would have caught this problem. Signed-off-by: Elijah Newren <newren@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
* | treewide: remove unnecessary includes in source filesElijah Newren2023-12-261-1/+0
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Each of these were checked with gcc -E -I. ${SOURCE_FILE} | grep ${HEADER_FILE} to ensure that removing the direct inclusion of the header actually resulted in that header no longer being included at all (i.e. that no other header pulled it in transitively). ...except for a few cases where we verified that although the header was brought in transitively, nothing from it was directly used in that source file. These cases were: * builtin/credential-cache.c * builtin/pull.c * builtin/send-pack.c Signed-off-by: Elijah Newren <newren@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
* | object.h: stop depending on cache.h; make cache.h depend on object.hElijah Newren2023-02-241-1/+1
|/ | | | | | | | | | | | | | | | | Things should be able to depend on object.h without pulling in all of cache.h. Move an enum to allow this. Note that a couple files previously depended on things brought in through cache.h indirectly (revision.h -> commit.h -> object.h -> cache.h). As such, this change requires making existing dependencies more explicit in half a dozen files. The inclusion of strbuf.h in some headers if of particular note: these headers directly embedded a strbuf in some new structs, meaning they should have been including strbuf.h all along but were indirectly getting the necessary definitions. Signed-off-by: Elijah Newren <newren@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
* diffcore-delta.c: LLP64 compatibility, upcast unity for left shiftPhilip Oakley2021-12-011-3/+3
| | | | | | | | | | Visual Studio reports C4334 "was 64-bit shift intended" warning because of size miss-match. Promote unity to the matching type to fit with its subsequent operation. Signed-off-by: Philip Oakley <philipoakley@iee.email> Signed-off-by: Junio C Hamano <gitster@pobox.com>
* diff.c: reduce implicit dependency on the_indexNguyễn Thái Ngọc Duy2018-09-211-5/+7
| | | | | | | | | | diff and textconv code has so widespread use that it's hard to simply update their api and all call sites at once because it would result in a big patch. For now reduce the_index references to two places: diff_setup() and fill_textconv(). Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
* diffcore-delta: rename 'new' variablesBrandon Williams2018-02-221-8/+8
| | | | | | | | Rename C++ keyword in order to bring the codebase closer to being able to be compiled with a C++ compiler. Signed-off-by: Brandon Williams <bmwill@google.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
* Merge branch 'tk/diffcore-delta-remove-unused'Junio C Hamano2016-11-171-1/+0
|\ | | | | | | | | | | | | Code cleanup. * tk/diffcore-delta-remove-unused: diffcore-delta: remove unused parameter to diffcore_count_changes()
| * diffcore-delta: remove unused parameter to diffcore_count_changes()Tobias Klauser2016-11-141-1/+0
| | | | | | | | | | | | | | | | | | | | | | The delta_limit parameter to diffcore_count_changes() has been unused since commit ba23bbc8e ("diffcore-delta: make change counter to byte oriented again.", 2006-03-04). Remove the parameter and adjust all callers. Signed-off-by: Tobias Klauser <tklauser@distanz.ch> Signed-off-by: Junio C Hamano <gitster@pobox.com>
* | use QSORTRené Scharfe2016-09-301-4/+1
|/ | | | | | | | | Apply the semantic patch contrib/coccinelle/qsort.cocci to the code base, replacing calls of qsort(3) with QSORT. The resulting code is shorter and supports empty arrays with NULL pointers. Signed-off-by: Rene Scharfe <l.s.r@web.de> Signed-off-by: Junio C Hamano <gitster@pobox.com>
* use st_add and st_mult for allocation size computationJeff King2016-02-221-2/+4
| | | | | | | | | | | | | If our size computation overflows size_t, we may allocate a much smaller buffer than we expected and overflow it. It's probably impossible to trigger an overflow in most of these sites in practice, but it is easy enough convert their additions and multiplications into overflow-checking variants. This may be fixing real bugs, and it makes auditing the code easier. Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>
* Fix diff -B/--dirstat miscounting of newly added contentsLinus Torvalds2009-12-051-1/+10
| | | | | | | | | | | | | | | | | | | | | | | What used to happen is that diffcore_count_changes() simply ignored any hashes in the destination that didn't match hashes in the source. EXCEPT if the source hash didn't exist at all, in which case it would count _one_ destination hash that happened to have the "next" hash value. As a consequence, newly added material was often undercounted, making output from --dirstat and "complete rewrite" detection used by -B unrelialble. This changes it so that: - whenever it bypasses a destination hash (because it doesn't match a source), it counts the bytes associated with that as "literal added" - at the end (once we have used up all the source hashes), we do the same thing with the remaining destination hashes. - when hashes do match, and we use the difference in counts as a value, we also use up that destination hash entry (the 'd++'). Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Junio C Hamano <gitster@pobox.com>
* optimize diffcore-delta by sorting hash entries.Linus Torvalds2007-10-041-24/+30
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Here's a test-patch. I don't guarantee anything, except that when I did the timings I also did a "wc" on the result, and they matched.. Before: [torvalds@woody linux]$ time git diff -l0 --stat -C v2.6.22.. | wc 7104 28574 438020 real 0m10.526s user 0m10.401s sys 0m0.136s After: [torvalds@woody linux]$ time ~/git/git diff -l0 --stat -C v2.6.22.. | wc 7104 28574 438020 real 0m8.876s user 0m8.761s sys 0m0.128s but the diff is fairly simple, so if somebody will go over it and say whether it's likely to be *correct* too, that 15% may well be worth it. [ Side note, without rename detection, that diff takes just under three seconds for me, so in that sense the improvement to the rename detection itself is larger than the overall 15% - it brings the cost of just rename detection from 7.5s to 5.9s, which would be on the order of just over a 20% performance improvement. ] Hmm. The patch depends on half-way subtle issues like the fact that the hashtables are guaranteed to not be full => we're guaranteed to have zero counts at the end => we don't need to do any steenking iterator count in the loop. A few comments might in order. Linus
* Introduce diff_filespec_is_binary()Junio C Hamano2007-07-061-1/+1
| | | | | | | | | | | This replaces an explicit initialization of filespec->is_binary field used for rename/break followed by direct access to that field with a wrapper function that lazily iniaitlizes and accesses the field. We would add more attribute accesses for the use of diff routines, and it would be better to make this abstraction earlier. Signed-off-by: Junio C Hamano <gitster@pobox.com>
* diffcore-delta.c: Ignore CR in CRLF for text filesJunio C Hamano2007-07-011-3/+11
| | | | | | | | | | | | This ignores CR byte in CRLF sequence in text file when computing similarity of two blobs. Usually this should not matter as nobody sane would be checking in a file with CRLF line endings to the repository (they would use autocrlf so that the repository copy would have LF line endings). Signed-off-by: Junio C Hamano <gitster@pobox.com>
* diffcore-delta.c: update the comment on the algorithm.Junio C Hamano2007-07-011-12/+9
| | | | | | | | | | The comment at the top of the file described an old algorithm that was neutral to text/binary differences (it hashed sliding window of N-byte sequences and counted overlaps), but long time ago we switched to a new heuristics that are more suitable for line oriented (read: text) files that are much faster. Signed-off-by: Junio C Hamano <gitster@pobox.com>
* diffcore_count_changes: pass diffcore_filespecJunio C Hamano2007-07-011-4/+4
| | | | | | | | | | We may want to use richer information on the data we are dealing with in this function, so instead of passing a buffer address and length, just pass the diffcore_filespec structure. Existing callers always call this function with parameters taken from a filespec anyway, so there is no functionality changes. Signed-off-by: Junio C Hamano <gitster@pobox.com>
* diffcore-delta: 64-byte-or-EOL ultrafast replacement (hash fix).Linus Torvalds2006-03-151-9/+10
| | | | | | | The rotating 64-bit number was not really rotating, and worse yet ulong was longer than 64-bit on 64-bit architectures X-<. Signed-off-by: Junio C Hamano <junkio@cox.net>
* diffcore-delta: 64-byte-or-EOL ultrafast replacement.Linus Torvalds2006-03-151-18/+16
| | | | Signed-off-by: Junio C Hamano <junkio@cox.net>
* diffcore-delta: tweak hashbase value.Junio C Hamano2006-03-131-1/+8
| | | | | | | | | This tweaks the maximum hashvalue we use to hash the string into without making the maximum size of the hashtable can grow from the current limit. With this, the renames detected becomes a bit more precise without incurring additional paging cost. Signed-off-by: Junio C Hamano <junkio@cox.net>
* diffcore-delta: make the hash a bit denser.Junio C Hamano2006-03-131-4/+9
| | | | | | | | To reduce wasted memory, wait until the hash fills up more densely before we rehash. This reduces the working set size a bit further. Signed-off-by: Junio C Hamano <junkio@cox.net>
* diffcore-rename: somewhat optimized.Junio C Hamano2006-03-121-21/+140
| | | | | | | | | This changes diffcore-rename to reuse statistics information gathered during similarity estimation, and updates the hashtable implementation used to keep track of the statistics to be denser. This seems to give better performance. Signed-off-by: Junio C Hamano <junkio@cox.net>
* diffcore-delta: make change counter to byte oriented again.Junio C Hamano2006-03-041-28/+68
| | | | | | | | The textual line oriented change counter was fun but was not very effective. It tended to overcount the changes. This one changes it to a simple N-letter substring based implementation. Signed-off-by: Junio C Hamano <junkio@cox.net>
* diffcore-rename: split out the delta counting code.Junio C Hamano2006-03-011-0/+43
This is to rework diffcore break/rename/copy detection code so that it does not affected when deltifier code gets improved. Signed-off-by: Junio C Hamano <junkio@cox.net>