Continuation of "setup.c" refactoring to drop remaining global state
(`git_work_tree_cfg`, `is_bare_repository_cfg`). The most notable
outcome is that `is_bare_repository()` has been updated to no longer
implicitly rely on `the_repository`.
* ps/setup-drop-global-state:
treewide: drop USE_THE_REPOSITORY_VARIABLE
environment: stop using `the_repository` in `is_bare_repository()`
environment: split up concerns of `is_bare_repository_cfg`
builtin/init: stop modifying `is_bare_repository_cfg`
setup: remove global `git_work_tree_cfg` variable
builtin/init: simplify logic to configure worktree
builtin/init: stop modifying global `git_work_tree_cfg` variable
The global configuration variables protect_hfs and protect_ntfs have
been migrated into struct repo_config_values to tie them to
per-repository configuration state.
* ty/move-protect-hfs-ntfs:
environment: use 'repo->initialized' for repo_protect_hfs() and repo_protect_ntfs()
environment: move 'protect_hfs' and 'protect_ntfs' into 'repo_config_values'
The handling of promisor-remote protocol capability has been
loosened to allow the other side to add to the list of promisor
remotes via the promisor.acceptFromServerURL configuration
variable.
* cc/promisor-auto-config-url-more:
doc: promisor: improve acceptFromServer entry
promisor-remote: auto-configure unknown remotes
promisor-remote: trust known remotes matching acceptFromServerUrl
promisor-remote: introduce promisor.acceptFromServerUrl
promisor-remote: add 'local_name' to 'struct promisor_info'
urlmatch: add url_normalize_pattern() helper
urlmatch: change 'allow_globs' arg to bool
t5710: simplify 'mkdir X' followed by 'git -C X init'
Advice shown by "git status" when the local branch is behind or has
diverged from its push branch has been updated to suggest "git pull
<remote> <branch>".
* hn/status-pull-advice-qualified:
remote: qualify "git pull" advice for non-upstream compareBranches
When there is rebase in progress "git status" displays the last couple
of completed and the next couple of pending commands from the todo
list. When it does this it tries to abbreviate the object ids of
the commits to be picked. Unfortunately it does not abbreviate the
object ids when the line starts with "fixup -C" or "merge -C". It
also mistakenly replaces the refname in "reset main" and "update-ref
refs/heads/main" with the object id that the ref points to.
Fix this by using the function added in the last commit to parse the
command name and only try to abbreviate the argument for commands that
take an object id. If a command accepts a label then try to resolve the
object name as a label first and only if that fails try to resolve it
as an object_id. When trying to abbreviate an object id, only replace
the object name if it starts with the abbreviated object id so that
tag or branch names that contain only hex digits are left unchanged.
Comments are now processed after stripping any leading
whitespace from the line. This matches what the sequencer does in
parse_insn_line(). The existing test cases are updated to test a
wider variety of commands. Only the pending commands in the tests
are changed to avoid removing existing coverage.
Helped-by: Elijah Newren <newren@gmail.com>
Signed-off-by: Phillip Wood <phillip.wood@dunelm.org.uk>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Move the code that parses todo commands into a separate function so
that it can be shared with "git status" in the next commit. As we
know the input is NUL terminated we do not pass a pointer to the end
of the line and instead test for a blank line by looking for NUL, CR
LF, or LF. We use starts_with() instead of starts_with_mem() for the
same reason. This results in slightly different behavior when there
a CR at the start of the line that is not followed by LF. Previously
such a line was treated as a comment rather than an invalid line.
Signed-off-by: Phillip Wood <phillip.wood@dunelm.org.uk>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Xcode 15 and later has a linker set to complain when the same library
archive is listed twice on the command line. Squelch the annoyance.
* hn/macos-linker-warning:
config.mak.uname: avoid macOS dup-library warning
Wean the Windows builds in GitLab CI procedure away from
(unfortunately unreliable) Chocolatey to install dependencies.
* ps/gitlab-ci-windows:
gitlab-ci: migrate Windows builds away from Chocolatey
The Git project is not exactly the easiest project to get started in:
it's written in C and POSIX shell, with bits of Perl, Rust and other
languages sprinkled into it. On top of that, the project has grown
somewhat organically over time, making the codebase hard to navigate.
These are problems that we're aware of, and there have been and still
are efforts to clean up some of the technical debt that is natural to
exist an a project that is more than 20 years old. Furthermore, we
provide resources to newcomers that help them out like our coding
guidelines, code of conduct or "MyFirstContribution.adoc".
But there is a rather practical problem: finding your way around in our
project's tree is not easy. Doing a directory listing in the top-level
directory will present you with more than 550 files, which makes it
extremely hard for a newcomer to figure out what files they are even
supposed to look at. This makes the onboarding experience somewhat
harder than it really needs to be. This isn't only a problem for
newcomers though, as I myself struggle to find the files I am looking
for because of the sheer number of files.
Besides the problem of discoverability it also creates a problem of
structure. It is not obvious at all which files are part of "libgit.a"
and which files are only linked into our final executables. So while we
have this split in our build systems, that split is not evident at all
in our tree.
Introduce a new "lib/" directory and move all of our sources for
"libgit.a" into it to fix these issues. It makes the split we have
evident and reduces the number of files in our top-level tree from 550
files to ~80 files.
This is still a lot of files, but it's significantly easier to navigate
already. Furthermore, we can further iterate after this step and think
about introducing a better structure for remaining files, as well.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
In the next commit we're about to introduce a new "lib/" directory and
move all of our files into it. With this split the compiler won't be
able to find one of the includes in "test-example-tap.c" anymore. Adjust
it to a relative include to prepare for this change.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
# By Patrick Steinhardt
* ps/odb-source-packed:
odb/source-packed: drop pointer to "files" parent source
midx: refactor interfaces to work on "packed" source
odb/source-packed: stub out remaining functions
odb/source-packed: wire up `freshen_object()` callback
odb/source-packed: wire up `find_abbrev_len()` callback
odb/source-packed: wire up `count_objects()` callback
odb/source-packed: wire up `for_each_object()` callback
odb/source-packed: wire up `read_object_stream()` callback
odb/source-packed: wire up `read_object_info()` callback
packfile: use higher-level interface to implement `has_object_pack()`
odb/source-packed: wire up `reprepare()` callback
odb/source-packed: wire up `close()` callback
odb/source-packed: start converting to a proper `struct odb_source`
odb/source-packed: store pointer to "files" instead of generic source
packfile: move packed source into "odb/" subsystem
packfile: split out packfile list logic
packfile: rename `struct packfile_store` to `odb_source_packed`
# Conflicts:
# packfile.h
When performing connectivity checks we have to figure out whether any of
the new objects are promisor objects, as we cannot assume full
connectivity if so.
This check is performed by iterating through all packfiles in the
repository and searching each of them for the given object. Of course,
this mechanism is quite specific to implementation details of the object
database, as we assume that it uses packfiles in the first place.
Refactor the logic so that we instead use `odb_for_each_object_ext()`
with an object prefix filter and the `ODB_FOR_EACH_OBJECT_PROMISOR_ONLY`
flag. This will yield all objects that have the exact object name and
that are part of a promisor pack in a generic way.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Callers of `odb_for_each_object()` can specify an optional object name
prefix so that we only yield objects that match it. This is incompatible
though with passing flags at the same time, as we don't yet know to
handle them.
Loosen this restriction by calling `should_exclude_pack()`.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The caller can pass flags that allow them to filter out specific kinds
of objects when iterating objects via `odb_for_each_object()`. This only
works for "normal" iteration though, as we `BUG()` when the user passes
flags and specifies an object prefix.
This limitation will be lifted in the next commit. Prepare for this by
extracting the logic that skips certain kinds of packs so that we can
easily reuse it.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
* ps/odb-source-packed:
odb/source-packed: drop pointer to "files" parent source
midx: refactor interfaces to work on "packed" source
odb/source-packed: stub out remaining functions
odb/source-packed: wire up `freshen_object()` callback
odb/source-packed: wire up `find_abbrev_len()` callback
odb/source-packed: wire up `count_objects()` callback
odb/source-packed: wire up `for_each_object()` callback
odb/source-packed: wire up `read_object_stream()` callback
odb/source-packed: wire up `read_object_info()` callback
packfile: use higher-level interface to implement `has_object_pack()`
odb/source-packed: wire up `reprepare()` callback
odb/source-packed: wire up `close()` callback
odb/source-packed: start converting to a proper `struct odb_source`
odb/source-packed: store pointer to "files" instead of generic source
packfile: move packed source into "odb/" subsystem
packfile: split out packfile list logic
packfile: rename `struct packfile_store` to `odb_source_packed`
Introduce `odb_prepare()` as a simple wrapper to prepare alternates and
then prepare each individual source. Adapt git-grep(1) to use it.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The `reprepare()` callback function can be used to flush caches of a
given object source and then prepare it anew. This is for example used
when a concurrent process may have written new objects. Ultimately, this
can be seen as doing two separate steps:
1. We drop any caches.
2. We prepare the source.
We have one callsite in git-grep(1) though that really only want to do
(2). This is done by reaching into the "files" backend directly and then
calling `odb_source_packed_prepare()`, which of course may not work with
alternate backends.
We could in theory just call `reprepare()` here, and that would likely
not have any significant downside. But this would certainly feel like a
code smell.
Instead, generalize the `reprepare()` callback to `prepare()` with a
flag that optionally instructs the backend to also flush the caches,
which allows us to drop the external `odb_source_packed_prepare()`
declaration.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
* ps/odb-source-packed:
odb/source-packed: drop pointer to "files" parent source
midx: refactor interfaces to work on "packed" source
odb/source-packed: stub out remaining functions
odb/source-packed: wire up `freshen_object()` callback
odb/source-packed: wire up `find_abbrev_len()` callback
odb/source-packed: wire up `count_objects()` callback
odb/source-packed: wire up `for_each_object()` callback
odb/source-packed: wire up `read_object_stream()` callback
odb/source-packed: wire up `read_object_info()` callback
packfile: use higher-level interface to implement `has_object_pack()`
odb/source-packed: wire up `reprepare()` callback
odb/source-packed: wire up `close()` callback
odb/source-packed: start converting to a proper `struct odb_source`
odb/source-packed: store pointer to "files" instead of generic source
packfile: move packed source into "odb/" subsystem
packfile: split out packfile list logic
packfile: rename `struct packfile_store` to `odb_source_packed`
In the preceding commits we have fixed recursion when creating the
reference backends due to a chicken-and-egg situation with "onbranch"
conditions. Unfortunately, this issue has existed for a while, and we
didn't really have a good mechanism to detect this recursion.
Improve the status quo by detecting the recursion when creating the main
reference store.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Same as with the "files" backend, the "reftable" backend also has a
chicken-and-egg problem with "onbranch" conditions. Fix this issue the
same as we did with the "files" backend by lazy-loading configuration.
Now that both the "files" and the "reftable" backend handle this
properly, add a generic test to t1400 that verifies that the user can
configure "core.logAllRefUpdates" via an "onbranch" condition. This is
mostly a nonsensical thing to do in the first place, but it serves as a
good sanity chekc.
Note that we had to move `should_write_log()` around so that it can
access the new `reftable_be_write_options()` function.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
When initializing the reftable stack the caller may optionally pass some
write options. These write options mix up two different concerns though:
- Of course, they allow the caller to configure how new reftables are
being written.
- But they also allow the caller to configure the stack itself, like
its hash ID and the `on_reload` callback.
This is somewhat awkward, as it doesn't easily give the caller the
flexibility to for example write multiple reftables with different
options. Furthermore, this requires us to eagerly parse relevant
configuration when initializing the reftable backend.
Refactor the code by splitting out those options that configure the
stack itself. Creating a new stack will thus only require this limited
set of options, whereas the caller is expected to pass write options to
all functions that end up writing tables.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
When initializing the "files" reference backend we read the repository's
config to parse "core.preferSymlinkRefs" and "core.logAllRefUpdates".
This results in a chicken-and-egg problem though, because parsing the
configuration may require us to have access to the reference store
already when an "onbranch" condition exists.
Luckily, all the configuration that we honor only relates to writing
references. Consequently, we don't strictly need that configuration to
be readily available at initialization time, and we can easiliy defer
parsing it to a later point in time.
Implement this fix and add tests that verify that we can indeed properly
parse these config knobs via an "onbranch" condition.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
In cc42c88945 (refs: extract out reflog config to generic layer,
2026-05-04) we have refactored how we parse "core.logAllRefUpdates" so
that it happens in the generic layer. Unfortunately, this has worsened a
preexisting issue where we may recurse when creating the reference store
because of a chicken-and-egg problem between parsing the configuration
and evaluating "onbranch" conditions.
Prepare for a fix by essentially reverting that change so that we handle
this setting in the respective backends again. The backends are already
parsing other configuration anyway, so by moving the logic back in there
we can ensure that all backend configuration is parsed the same way.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
While we release worktree and submodule reference databases when
clearing a repository, we don't ever release the main reference
database. This memory leak went unnoticed because its pointer is
kept alive by the "chdir_notify" subsystem.
Fix the memory leak.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
With the preceding commit we've removed all callers of
`chdir_notify_reparent()`, so the function is unused now. Drop it.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
When creating reference stores we register them with the "chdir_notify"
subsystem. This is required because some of the paths we track may be
relative paths, so we have to reparent them in case the current working
directory changes.
But while we register the reference stores, we never unregister them.
This can have multiple outcomes:
- For a repository's main reference database we essentially keep the
pointer alive. We never free that database, either, and our leak
checker doesn't notice because it's still registered.
- For submodule and worktree reference databases we do eventually free
them in `repo_clear()`, so we may keep pointers to free'd memory
registered. We never notice though as we don't tend to chdir around
in the middle of the process.
We never noticed either of these symptoms, but they are obviously bad.
Partially fix those issues by unregistering the reference stores when
releasing them. The leak of the main reference database will be fixed in
a subsequent commit.
Note that this requires us to use `chdir_notify_register()` instead of
`chdir_notify_reparent()`, as there is no infrastructure to unregister the
latter.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
When discovering a repository we eventually also apply the
"GIT_REFERENCE_BACKEND" environment variable to the repository. There's
two problems with that:
- We do this unconditionally, which is rather pointless: we really
only have to configure the repository when we have found one.
- We have already applied the repository format at that point in time,
so we need to manually reapply it.
Move the logic around so that we only apply the environment variable
when a repository was discovered. This also allows us to drop the
explcit call to `repo_set_ref_storage_format()` because we now adjust
the format before we apply it via `apply_repository_format()`.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
When discovering the repository in "setup.c" we apply the final
repository format multiple times:
- Once via `repository_format_configure()`, where we apply the hash
algorithm and ref storage format to both `struct repository_format`
and `struct repository`.
- And once via `apply_repository_format()`, where we apply these two
settings from `struct repository_format` to `struct repository`.
With the current flow both of these are in fact necessary. But this is
only because we call `repository_format_configure()` after we have
called `apply_repository_format()`. Consequently, if we only changed the
repository format in `repository_format_configure()` it would never
propagate to the repository.
Refactor the code so that we first configure the repository format
before applying it to the repository so that we can stop setting the
hash and reference storage format multiple times.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
We have two callsites of `check_and_apply_repository_format()`. In a
subsequent commit we'll want to adapt one of those callsites to change
the order in which we read and apply the repository format, at which
point the helper function will not really be a good fit for us anymore.
Inline the function to both of the callsites.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
One of the stated goals of git-replay(1) is to allow implementing the
git-rebase(1) functionality on the server side.
The default mode of git-rebase(1) is to act as if `--no-rebase-merges`
was given. This mode drops merge commits instead of replaying them, and
linearizes the commit history into a sequence of the
regular (single-parent) commits.
Add option `--linearize` to git-replay(1) to do the same.
Co-authored-by: Toon Claes <toon@iotcl.com>
Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
Signed-off-by: Toon Claes <toon@iotcl.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The function replay_revisions() in replay.c is rather lengthy. Extract
the logic to put a commit entry into mapped_commits into a helper
function put_mapped_commit().
While at it, rename mapped_commit() to get_mapped_commit() to pair with
this new function.
Signed-off-by: Toon Claes <toon@iotcl.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
In 2760ee4983 (replay: add --revert mode to reverse commit changes,
2026-03-26) the enum `replay_mode` was introduced. This has two possible
values:
- The value `REPLAY_MODE_REVERT` is used when option `--revert` is
passed to git-replay(1). When using this value the commits are
processed in reverse order and the inverse of the changes are
applied.
- The value `REPLAY_MODE_PICK` is used when either option `--onto` or
`--advance` is used. In both cases the commits are processed in
normal order, and the changes are applied as-is.
Since there are only two possible values of this enum, simplify the code
by converting the enum into a bool. This avoids adding code paths that
check for invalid values of the enum, and shortens code where the value
is checked with a ternary operator.
Signed-off-by: Toon Claes <toon@iotcl.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The `__MINGW64__` constant is defined, surprise, surprise, only when
building for a 64-bit CPU architecture.
Therefore using it as a guard to define `_POSIX_C_SOURCE` (so that
`localtime_r()` is declared, among other functions) is not enough, we
also need to check `__MINGW32__`.
Technically, the latter constant is defined even for 64-bit builds. But
let's make things a bit easier to understand by testing for both
constants.
Making it so fixes this compile warning (turned error in GCC v14.1):
archive-zip.c: In function 'dos_time':
archive-zip.c:612:9: error: implicit declaration of function 'localtime_r';
did you mean 'localtime_s'? [-Wimplicit-function-declaration]
612 | localtime_r(&time, &tm);
| ^~~~~~~~~~~
| localtime_s
Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
With --dry-run, --delete-merged prints the local branches it would
delete, one "Would delete branch <name>" line each, and exits
without touching any ref. The same filtering applies, so the output
is exactly the set that the real run would delete.
--dry-run is only meaningful together with --delete-merged and is
rejected otherwise.
Signed-off-by: Harald Nordgren <haraldnordgren@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Setting branch.<name>.deleteMerged=false exempts that branch from
"git branch --delete-merged", which is useful for a topic you want
to keep developing after an early round of it has been merged
upstream. Unless --quiet is given, each skip is reported so the
user knows why their topic was kept.
Explicit deletion with "git branch -d" still uses the normal merge
check and ignores this setting.
Signed-off-by: Harald Nordgren <haraldnordgren@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
git branch --delete-merged <branch>...
deletes the local branches that "--forked <branch>" would list,
keeping only those whose tip is reachable from their configured
upstream. The work has already landed on the upstream they track,
so the local copy is no longer needed.
Three kinds of branches are not deleted:
* any branch checked out in any worktree
* any branch whose upstream remote-tracking branch no longer
exists, since a missing upstream is not by itself a sign of
integration
* any branch whose push destination equals its upstream
(<branch>@{push} is the same as <branch>@{upstream}), such as
a local "main" that tracks and pushes to "origin/main". Right
after a pull it just looks "fully merged", so it is kept. Only
branches that push somewhere other than their upstream,
typically topics in a fork workflow, are candidates.
A branch whose work is not yet merged into its upstream is silently
skipped, so one unmerged topic does not abort the whole sweep.
A branch that another, surviving branch tracks as its upstream is
also kept, so a branch is never deleted out from under one stacked
on top of it. Sparing such a base can in turn protect its own
upstream, so the check repeats until the set stops changing.
Signed-off-by: Harald Nordgren <haraldnordgren@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Teach delete_branches() two new modes for the upcoming
--delete-merged: one that asks only whether a branch is merged into
its upstream, without falling back to HEAD when there is no
upstream, and one that rehearses the deletions without removing any
ref. Existing callers keep their current behavior.
Signed-off-by: Harald Nordgren <haraldnordgren@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Add a skip-unmerged mode to delete_branches() and check_branch_commit()
so a bulk caller can silently skip branches that are not fully merged
and carry on, rather than erroring with the "use 'git branch -D'"
advice that the plain "git branch -d" path emits. Existing callers are
unaffected.
Signed-off-by: Harald Nordgren <haraldnordgren@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
delete_branches() and check_branch_commit() take a pair of int
booleans (force and quiet) that the next commits would grow further.
Replace them with a single "unsigned int flags" argument and an
enum, splitting the bits back into named bool locals so the body
keeps reading the same named values.
No change in behavior.
Signed-off-by: Harald Nordgren <haraldnordgren@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Add a --forked option to "git branch" list mode that lists only
branches whose configured upstream matches <branch>. The argument
can be a ref (e.g. "origin/main", "master"), a remote name like
"origin" for the branch its origin/HEAD points at, or a shell glob
(e.g. "origin/*"), and may be repeated to widen the filter.
It is an ordinary list filter, so it combines with the others:
git branch --merged origin/main --forked 'origin/*'
lists branches forked from origin that are already merged into
origin/main, and --no-merged inverts the question.
This is the building block for --delete-merged, which deletes the
listed branches once they have landed on their upstream.
Signed-off-by: Harald Nordgren <haraldnordgren@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Have a repo with a subtree merge, do a 'git log --follow prefix/test.c',
the output only contains history in the outer repo, not commits that
were merged via a subtree merge.
What happens is that 'git log --follow' stores the followed path only in
opt->diffopt.pathspec, so in case the commit history is non-linear, and
multiple parents have renames to the followed path, then the end result
isn't really defined: the first commit that happens to be visited in one
of the parents update opt->diffopt.pathspec, and from that point, only
that updated path is visited.
Fix the problem by introducing a commit -> path map
(follow_pathspec_slab) that stores what will be a path to follow when
visiting that parent. At the top of log_tree_commit(), if the slab has
an entry for this commit, we replace opt->diffopt.pathspec with a path
from this entry, so the correct path is followed, even if an unrelated
sub-tree changed the path to be followed to something else. After
log_tree_diff() runs, we record each parent's path in the slab. As a
result, the walk order doesn't matter, which was exactly the source of
problems previously.
This helps with subtree merges (rename happens inside the merge commit),
but also fixes the general case when the rename happens in the history
of parents, not in the merge commit itself.
Signed-off-by: Miklos Vajna <vmiklos@collabora.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
* js/objects-larger-than-4gb-on-windows-more:
odb: use size_t for object_info.sizep and the size APIs
packfile,delta: drop the `cast_size_t_to_ulong()` wrappers
pack-objects: use size_t for in-core object sizes
packfile: widen unpack_entry()'s size out-parameter to size_t
pack-objects(check_pack_inflate()): use size_t instead of unsigned long
patch-delta: use size_t for sizes
compat/msvc: use _chsize_s for ftruncate
Since the inception of `--path-walk`, this option has had a documented
incompatibility with `--delta-islands`.
When discussing those original patches on the list, a message from
Stolee in [1] noted the following:
this could be remedied by [...] doing a separate walk to identify
islands using the normal method
In a related portion of the thread, Peff explains[2]:
The delta islands code already does its own tree walk to propagate
the bits down (it does rely on the base walk's show_commit() to
propagate through the commits).
Once each object has its island bitmaps, I think however you
choose to come up with delta candidates [...] you should be able
to use it. It's fundamentally just answering the question of "am
I allowed to delta between these two objects".
That is similar to what this patch does, and it turns out the cheaper
option is sufficient: perform the same island side effects from the
path-walk callback rather than doing a second walk.
Recall how delta-islands are computed during a normal repack:
- `show_commit()` calls `propagate_island_marks()` for each commit,
which merges the commit's island bitset onto its root tree object and
onto each of its parent commits.
- `show_object()` for a tree records the tree's depth derived from the
slash-separated pathname. Subsequent `resolve_tree_islands()` uses
that depth to walk trees in increasing-depth order, propagating each
tree's marks to its children.
- At delta-search time, `in_same_island()` enforces that a delta
target's island bitmap is a subset of its base's: every island that
reaches the target must also reach the base.
Path-walk's enumeration callback is `add_objects_by_path()`. It already
adds objects to `to_pack`, but until now did not perform the
island-related side effects. Two things are needed:
- For each commit batch, call `propagate_island_marks()` on commits,
exactly as `show_commit()` does.
We have to be careful about the order in which we call this function,
and we must see a commit before its parents in order to have
island marks to propagate.
The path-walk batch preserves that order. Path-walk appends commits
to its `OBJ_COMMIT` batch as they come back from the same
`get_revision()` loop the regular traversal uses, and
`add_objects_by_path()` iterates the batch in array order. So every
commit reaches `propagate_island_marks()` in the same sequence that
`show_commit()` would have seen it, and the descendant-first chain
that the algorithm relies on is intact.
Skip island propagation for excluded commits to match the regular
traversal, whose `show_commit()` callback is only invoked for
interesting commits. Boundary commits may still be present in
path-walk's callback so they can serve as thin-pack bases, but they
should not contribute island marks.
- For each tree batch, record the tree's depth from the path. Use the
`record_tree_depth()` helper from the previous commit so both
callbacks behave identically, including the max-depth-wins behavior
when a tree is reached via more than one path. The helper accepts
both the `show_object()` path shape ("foo", "foo/bar") and the
path-walk shape with a trailing slash ("foo/", "foo/bar/"), so depths
recorded from either traversal mode are directly comparable.
This is implicit in the implementation sketch from Peff above.
`resolve_tree_islands()` sorts trees by `oe->tree_depth` in
increasing-depth order before propagating marks down, so that a
parent tree's marks are finalized before its children inherit them.
Without recording the depth at path-walk time, every
path-walk-discovered tree would land at depth 0 in `to_pack`, the
sort would lose its ordering, and children could inherit marks from
parents whose own contributions had not yet been merged in.
With those two pieces in place, `resolve_tree_islands()` receives the
same island inputs from path-walk as it would from the regular
traversal, so the existing island checks can be reused unchanged.
Drop the documented incompatibility between `--path-walk` and
`--delta-islands`, and add t5320 coverage for path-walk island repacks
with and without bitmap writing, as well as the same-island case where a
delta remains allowed.
[1]: https://lore.kernel.org/git/9aa2471b-0850-4707-9733-d3b33609f5f2@gmail.com/
[2]: https://lore.kernel.org/git/20240911063203.GA1538586@coredump.intra.peff.net/
Signed-off-by: Taylor Blau <me@ttaylorr.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Prepare for a subsequent change that needs to record tree depths from a
second call site by factoring the delta-islands tree-depth bookkeeping
out of `show_object()` and into a helper, `record_tree_depth()`.
The helper looks up the object in `to_pack`, returns early when the
object was not added there, computes the depth from the slash count in
the supplied name, and preserves the existing max-depth-wins behavior
when a tree is reached by more than one path.
Signed-off-by: Taylor Blau <me@ttaylorr.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
When 'pack-objects' is invoked with '--path-walk', it prevents us from
using reachability bitmaps.
This behavior dates back to 70664d2865 (pack-objects: add --path-walk
option, 2025-05-16), which included a comment in the relevant portion of
the command-line arguments handling that read as follows:
/*
* We must disable the bitmaps because we are removing
* the --objects / --objects-edge[-aggressive] options.
*/
In fb2c309b7d3 (pack-objects: pass --objects with --path-walk,
2026-05-02), path-walk learned to pass '--objects' again, but still
kept bitmap traversal disabled. That leaves two useful cases
unsupported:
* A path-walk repack that writes bitmaps does not give the bitmap
selector any commits, because path-walk reveals commits through
`add_objects_by_path()` rather than through `show_commit()`, where
`index_commit_for_bitmap()` is normally called.
* An invocation like "git pack-objects --use-bitmap-index --path-walk"
never tries an existing bitmap, even when one is available and could
answer the request.
Fortunately for us, neither restriction is required.
* On the writing side: teach the path-walk object callback to call
`index_commit_for_bitmap()` for commits that it adds to the pack.
That gives the bitmap selector the commit candidates it would have
seen from the regular traversal.
* For bitmap reading, keep passing '--objects' to the internal rev_list
machinery, but stop clearing `use_bitmap_index`. If an existing
bitmap can answer the request, use it; otherwise fall back to
path-walk's own enumeration.
As a result, we can see significantly reduced pack generation times from
p5311 (with our `GIT_PERF_REPO` set to a recent clone of the fluentui
repository) before this commit:
Test HEAD^ HEAD
----------------------------------------------------------------------------------------
5311.40: server (1 days, --path-walk) 1.43(1.39+0.04) 0.01(0.01+0.00) -99.3%
5311.41: size (1 days, --path-walk) 139.6K 139.7K +0.0%
5311.42: client (1 days, --path-walk) 0.02(0.02+0.00) 0.02(0.02+0.00) +0.0%
5311.44: server (2 days, --path-walk) 1.43(1.39+0.04) 0.01(0.00+0.00) -99.3%
5311.45: size (2 days, --path-walk) 139.6K 139.7K +0.0%
5311.46: client (2 days, --path-walk) 0.02(0.02+0.00) 0.02(0.02+0.00) +0.0%
5311.48: server (4 days, --path-walk) 1.44(1.39+0.04) 0.01(0.01+0.00) -99.3%
5311.49: size (4 days, --path-walk) 238.1K 238.1K +0.0%
5311.50: client (4 days, --path-walk) 0.03(0.03+0.00) 0.03(0.03+0.00) +0.0%
5311.52: server (8 days, --path-walk) 1.43(1.39+0.03) 0.01(0.00+0.00) -99.3%
5311.53: size (8 days, --path-walk) 344.9K 344.9K +0.0%
5311.54: client (8 days, --path-walk) 0.07(0.07+0.00) 0.07(0.08+0.00) +0.0%
5311.56: server (16 days, --path-walk) 1.47(1.44+0.03) 0.10(0.08+0.01) -93.2%
5311.57: size (16 days, --path-walk) 844.0K 844.0K +0.0%
5311.58: client (16 days, --path-walk) 0.09(0.09+0.00) 0.09(0.09+0.00) +0.0%
5311.60: server (32 days, --path-walk) 1.52(1.50+0.05) 0.14(0.15+0.02) -90.8%
5311.61: size (32 days, --path-walk) 4.2M 4.2M +0.1%
5311.62: client (32 days, --path-walk) 0.34(0.48+0.02) 0.34(0.45+0.05) +0.0%
5311.64: server (64 days, --path-walk) 1.55(1.52+0.06) 0.15(0.15+0.04) -90.3%
5311.65: size (64 days, --path-walk) 6.4M 6.4M -0.0%
5311.66: client (64 days, --path-walk) 0.51(0.79+0.05) 0.51(0.80+0.06) +0.0%
5311.68: server (128 days, --path-walk) 1.59(1.57+0.06) 0.16(0.21+0.01) -89.9%
5311.69: size (128 days, --path-walk) 8.4M 8.4M -0.0%
5311.70: client (128 days, --path-walk) 0.72(1.44+0.08) 0.71(1.47+0.09) -1.4%
We get the same size of output pack, but this commit allows us to do so
in a significantly shorter amount of time. Intuitively, we're generating
the same pack (hence the unchanged 'test_size' output from run to run),
but varying how we get there. Before this commit, pack-objects prefers
'--path-walk' to '--use-bitmap-index', so we generate the output pack by
performing a normal '--path-walk' traversal. With this commit, we are
operating over a *repacked* state (that itself was done with a
'--path-walk' traversal), but are able to perform pack-reuse on that
repacked state via bitmaps.
When comparing the size of the repacked pack with/without '--path-walk'
on the previous commit versus this one, we see that (a) the repacked size
improves significantly with '--path-walk', and that (b) writing bitmaps
during repacking does not regress this improvement:
Test HEAD^ HEAD
----------------------------------------------------------------------------------------
5311.3: size of bitmapped pack 558.4M 558.5M +0.0%
5311.38: size of bitmapped pack (--path-walk) 164.4M 164.4M +0.0%
(Note that to observe an improvement here, we must repack with '-F' in
order to avoid reusing non-'--path-walk' deltas, which would otherwise
skew our results.)
There is one wrinkle when it comes to '--boundary', which we must not
pass into the bitmap walk in the presence of both '--path-walk' and
'--use-bitmap-index'. Path-walk needs boundary commits when it performs
its own traversal, in order to discover bases for thin packs, but the
bitmap traversal does not expect this. Work around this by setting
`revs->boundary` as late as possible within the '--path-walk' traversal,
after any bitmap attempt has either succeeded or declined to answer the
request.
Signed-off-by: Taylor Blau <me@ttaylorr.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
p5311 measures the cost of serving a fetch from a bitmapped pack and
indexing the resulting pack on the client. Since 761416ef91
(bitmap-lookup-table: add performance tests for lookup table,
2022-08-14), p5311 effectively runs itself twice: once with the bitmap's
lookup table extension enabled, and again with it disabled.
This comparison has served its useful purpose, as the lookup table is
almost four years old, and the de-facto default in server-side Git
deployments.
A following commit will want to test a different combination (repacking
with and without '--path-walk' instead of the lookup table). Instead of
multiplying the current test count by two again to produce four
variations of `test_fetch_bitmaps()`, drop the lookup table option to
reduce the number of perf tests we run. Retain `test_fetch_bitmaps()`
itself, since we will use this in the future for the new
parameterization.
(As an aside, a future commit outside of this series will adjust the
default value of 'pack.writeBitmapLookupTable' to "true", matching the
de-facto norm for deployments where the existence of bitmap lookup
tables is meaningful. Punt on that to a later series and instead make
the minimal change for now.)
Suggested-by: Derrick Stolee <stolee@gmail.com>
Signed-off-by: Taylor Blau <me@ttaylorr.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>