| Commit message (Collapse) | Author | Files | Lines |
|
The config name is "writeBitmaps", so the internal variable
missing the plural is unnecessarily confusing to write.
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
The config option to turn on bitmaps is read all the way
down in the plumbing of pack-objects. This makes it hard for
other options in the porcelain of repack to make decisions
based on the bitmap setting. For example,
repack.packKeptObjects tries to kick in by default only when
bitmaps are turned on. But it can't do so reliably because
it doesn't yet know whether we are using bitmaps.
This patch teaches repack to respect pack.writebitmaps. It
means we pass a redundant command-line flag to pack-objects,
but that's OK; it shouldn't affect the outcome.
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
Commit ee34a2b (repack: add `repack.packKeptObjects` config
var, 2014-03-03) added a flag which could duplicate kept
objects, but did not mean to turn it on by default. Instead,
the option is tied by default to the decision to write
bitmaps, like:
if (pack_kept_objects < 0)
pack_kept_objects = write_bitmap;
after which we expect pack_kept_objects to be a boolean 0 or
1. However, that assignment neglects that write_bitmap is
_also_ a tri-state with "-1" as the default, and with
neither option given, we accidentally turn the option on.
This patch is the minimal fix to restore the desired
behavior for the default state. Further patches will fix the
more complicated cases.
Note the update to t7700. It failed to turn on bitmaps,
meaning we were actually confirming the wrong behavior!
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
The git-repack command always passes `--honor-pack-keep`
to pack-objects. This has traditionally been a good thing,
as we do not want to duplicate those objects in a new pack,
and we are not going to delete the old pack.
However, when bitmaps are in use, it is important for a full
repack to include all reachable objects, even if they may be
duplicated in a .keep pack. Otherwise, we cannot generate
the bitmaps, as the on-disk format requires the set of
objects in the pack to be fully closed.
Even if the repository does not generally have .keep files,
a simultaneous push could cause a race condition in which a
.keep file exists at the moment of a repack. The repack may
try to include those objects in one of two situations:
1. The pushed .keep pack contains objects that were
already in the repository (e.g., blobs due to a revert of
an old commit).
2. Receive-pack updates the refs, making the objects
reachable, but before it removes the .keep file, the
repack runs.
In either case, we may prefer to duplicate some objects in
the new, full pack, and let the next repack (after the .keep
file is cleaned up) take care of removing them.
This patch introduces both a command-line and config option
to disable the `--honor-pack-keep` option. By default, it
is triggered when pack.writeBitmaps (or `--write-bitmap-index`
is turned on), but specifying it explicitly can override the
behavior (e.g., in cases where you prefer .keep files to
bitmaps, but only when they are present).
Note that this option just disables the pack-objects
behavior. We still leave packs with a .keep in place, as we
do not necessarily know that we have duplicated all of their
objects.
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
Commit a201c20 tried to optimize out a loop like:
for (i = 0; i < len; i++)
data[i] = ntohll(data[i]);
in the big-endian case, because we know that ntohll is a
noop, and we do not need to pay the cost of the loop at all.
However, it mistakenly assumed that __BYTE_ORDER was always
defined, whereas it may not be on systems which do not
define it by default, and where we did not need to define it
to set up the ntohll macro. This includes OS X and Windows.
We could muck with the ordering in compat/bswap.h to make
sure it is defined unconditionally, but it is simpler to
still to just execute the loop unconditionally. That avoids
the application code knowing anything about these magic
macros, and lets it depend only on having ntohll defined.
And since the resulting loop looks like (on a big-endian
system):
for (i = 0; i < len; i++)
data[i] = data[i];
any decent compiler can probably optimize it out.
Original report and analysis by Brian Gernhardt.
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
The caller may hand us an unaligned buffer (e.g., because it
is an mmap of a file with many ewah bitmaps). On some
platforms (like SPARC) this can cause a bus error. We can
fix it with a combination of get_be32 and moving the data
into an aligned buffer (which we would do anyway, but we can
move it before fixing the endianness).
Signed-off-by: Vicent Marti <tanoku@gmail.com>
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
Commit d60c49c (read-cache.c: allow unaligned mapping of the
index file, 2012-04-03) introduced helpers to access
unaligned data. However, we already have get_be32, which has
a few advantages:
1. It's already written, so we avoid duplication.
2. It's probably faster, since it does the endian
conversion and the alignment fix at the same time.
3. The get_be32 code is well-tested, having been in
block-sha1 for a long time. By contrast, our custom
helpers were probably almost never used, since the user
needed to manually define a macro to enable them.
We have to add a get_be16 implementation to the existing
get_be32, but that is very simple to do.
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
The BLK_SHA1 code has optimized wrappers for doing endian
conversions on memory that may not be aligned. Let's pull
them out so that we can use them elsewhere, especially the
time-tested list of platforms that prefer each strategy.
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
When an object lookup fails, we re-read the objects/pack
directory to pick up any new packfiles that may have been
created since our last read. We also discard any pack
revindex structs we've allocated.
The discarding is a problem for the pack-bitmap code, which keeps
a pointer to the revindex for the bitmapped pack. After the
discard, the pointer is invalid, and we may read free()d
memory.
Other revindex users do not keep a bare pointer to the
revindex; instead, they always access it through
revindex_for_pack(), which lazily builds the revindex. So
one solution is to teach the pack-bitmap code a similar
trick. It would be slightly less efficient, but probably not
all that noticeable.
However, it turns out this discarding is not actually
necessary. When we call reprepare_packed_git, we do not
throw away our old pack list. We keep the existing entries,
and only add in new ones. So there is no safety problem; we
will still have the pack struct that matches each revindex.
The packfile itself may go away, of course, but we are
already prepared to handle that, and it may happen outside
of reprepare_packed_git anyway.
Throwing away the revindex may save some RAM if the pack
never gets reused (about 12 bytes per object). But it also
wastes some CPU time (to regenerate the index) if the pack
does get reused. It's hard to say which is more valuable,
but in either case, it happens very rarely (only when we
race with a simultaneous repack). Just leaving the revindex
in place is simple and safe both for current and future
code.
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|
|
When we use pack bitmaps rather than walking the object
graph, we end up with the list of objects to include in the
packfile, but we do not know the path at which any tree or
blob objects would be found.
In a recently packed repository, this is fine. A fetch would
use the paths only as a heuristic in the delta compression
phase, and a fully packed repository should not need to do
much delta compression.
As time passes, though, we may acquire more objects on top
of our large bitmapped pack. If clients fetch frequently,
then they never even look at the bitmapped history, and all
works as usual. However, a client who has not fetched since
the last bitmap repack will have "have" tips in the
bitmapped history, but "want" newer objects.
The bitmaps themselves degrade gracefully in this
circumstance. We manually walk the more recent bits of
history, and then use bitmaps when we hit them.
But we would also like to perform delta compression between
the newer objects and the bitmapped objects (both to delta
against what we know the user already has, but also between
"new" and "old" objects that the user is fetching). The lack
of pathnames makes our delta heuristics much less effective.
This patch adds an optional cache of the 32-bit name_hash
values to the end of the bitmap file. If present, a reader
can use it to match bitmapped and non-bitmapped names during
delta compression.
Here are perf results for p5310:
Test origin/master HEAD^ HEAD
-------------------------------------------------------------------------------------------------
5310.2: repack to disk 36.81(37.82+1.43) 47.70(48.74+1.41) +29.6% 47.75(48.70+1.51) +29.7%
5310.3: simulated clone 30.78(29.70+2.14) 1.08(0.97+0.10) -96.5% 1.07(0.94+0.12) -96.5%
5310.4: simulated fetch 3.16(6.10+0.08) 3.54(10.65+0.06) +12.0% 1.70(3.07+0.06) -46.2%
5310.6: partial bitmap 36.76(43.19+1.81) 6.71(11.25+0.76) -81.7% 4.08(6.26+0.46) -88.9%
You can see that the time spent on an incremental fetch goes
down, as our delta heuristics are able to do their work.
And we save time on the partial bitmap clone for the same
reason.
Signed-off-by: Vicent Marti <tanoku@gmail.com>
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
|