author Junio C Hamano <gitster@pobox.com> 2022-07-27 18:16:54 +0200
committer Junio C Hamano <gitster@pobox.com> 2022-07-27 18:16:54 +0200
commit 3a03633812cb60bf2f59c77a485a283141e7e199
tree c5e102bb0920a25b435fcffc7222bfd8b644947e
parent Merge branch 'ds/doc-wo-whitelist'
parent scalar: convert README.md into a technical design doc
Merge branch 'vd/scalar-doc'

Doc update.

* vd/scalar-doc:
  scalar: convert README.md into a technical design doc
  scalar: reword command documentation to clarify purpose
-rw-r--r-- Documentation/technical/scalar.txt 127
-rw-r--r-- contrib/scalar/README.md 82
-rw-r--r-- contrib/scalar/scalar.txt 9
3 files changed, 131 insertions, 87 deletions
diff --git a/Documentation/technical/scalar.txt b/Documentation/technical/scalar.txt
new file mode 100644
index 0000000000..08bc09c225
--- /dev/null
+++ b/Documentation/technical/scalar.txt
@@ -0,0 +1,127 @@
+Scalar
+======
+
+Scalar is a repository management tool that optimizes Git for use in large
+repositories. It accomplishes this by helping users to take advantage of
+advanced performance features in Git. Unlike most other Git built-in commands,
+Scalar is not executed as a subcommand of 'git'; rather, it is built as a
+separate executable containing its own series of subcommands.
+
+Background
+----------
+
+Scalar was originally designed as an add-on to Git and implemented as a .NET
+Core application. It was created based on the learnings from the VFS for Git
+project (another application aimed at improving the experience of working with
+large repositories). As part of its initial implementation, Scalar relied on
+custom features in the Microsoft fork of Git that have since been integrated
+into core Git:
+
+* partial clone,
+* commit graphs,
+* multi-pack index,
+* sparse checkout (cone mode),
+* scheduled background maintenance,
+* etc
+
+With the requisite Git functionality in place and a desire to bring the benefits
+of Scalar to the larger Git community, the Scalar application itself was ported
+from C# to C and integrated upstream.
+
+Features
+--------
+
+Scalar is comprised of two major pieces of functionality: automatically
+configuring built-in Git performance features and managing repository
+enlistments.
+
+The Git performance features configured by Scalar (see "Background" for
+examples) confer substantial performance benefits to large repositories, but are
+either too experimental to enable for all of Git yet, or only benefit large
+repositories. As new features are introduced, Scalar should be updated
+accordingly to incorporate them. This will prevent the tool from becoming stale
+while also providing a path for more easily bringing features to the appropriate
+users.
+
+Enlistments are how Scalar knows which repositories on a user's system should
+utilize Scalar-configured features. This allows it to update performance
+settings when new ones are added to the tool, as well as centrally manage
+repository maintenance. The enlistment structure - a root directory with a
+`src/` subdirectory containing the cloned repository itself - is designed to
+encourage users to route build outputs outside of the repository to avoid the
+performance-limiting overhead of ignoring those files in Git.
+
+Design
+------
+
+Scalar is implemented in C and interacts with Git via a mix of child process
+invocations of Git and direct usage of `libgit.a`. Internally, it is structured
+much like other built-ins with subcommands (e.g., `git stash`), containing a
+`cmd_<subcommand>()` function for each subcommand, routed through a `cmd_main()`
+function. Most options are unique to each subcommand, with `scalar` respecting
+some "global" `git` options (e.g., `-c` and `-C`).
+
+Because `scalar` is not invoked as a Git subcommand (like `git scalar`), it is
+built and installed as its own executable in the `bin/` directory, alongside
+`git`, `git-gui`, etc.
+
+Roadmap
+-------
+
+NOTE: this section will be removed once the remaining tasks outlined in this
+roadmap are complete.
+
+Scalar is a large enough project that it is being upstreamed incrementally,
+living in `contrib/` until it is feature-complete. So far, the following patch
+series have been accepted:
+
+- `scalar-the-beginning`: The initial patch series which sets up
+ `contrib/scalar/` and populates it with a minimal `scalar` command that
+ demonstrates the fundamental ideas.
+
+- `scalar-c-and-C`: The `scalar` command learns about two options that can be
+ specified before the command, `-c <key>=<value>` and `-C <directory>`.
+
+- `scalar-diagnose`: The `scalar` command is taught the `diagnose` subcommand.
+
+Roughly speaking (and subject to change), the following series are needed to
+"finish" this initial version of Scalar:
+
+- Finish Scalar features: Enable the built-in FSMonitor in Scalar enlistments
+ and implement `scalar help`. At the end of this series, Scalar should be
+ feature-complete from the perspective of a user.
+
+- Generalize features not specific to Scalar: In the spirit of making Scalar
+ configure only what is needed for large repo performance, move common
+ utilities into other parts of Git. Some of this will be internal-only, but one
+ major change will be generalizing `scalar diagnose` for use with any Git
+ repository.
+
+- Move Scalar to toplevel: Move Scalar out of `contrib/` and into the root of
+ `git`, including updates to build and install it with the rest of Git. This
+ change will incorporate Scalar into the Git CI and test framework, as well as
+ expand regression and performance testing to ensure the tool is stable.
+
+Finally, there are two additional patch series that exist in Microsoft's fork of
+Git, but there is no current plan to upstream them. There are some interesting
+ideas there, but the implementation is too specific to Azure Repos and/or VFS
+for Git to be of much help in general.
+
+These still exist mainly because the GVFS protocol is what Azure Repos has
+instead of partial clone, while Git is focused on improving partial clone:
+
+- `scalar-with-gvfs`: The primary purpose of this patch series is to support
+ existing Scalar users whose repositories are hosted in Azure Repos (which does
+ not support Git's partial clones, but supports its predecessor, the GVFS
+ protocol, which is used by Scalar to emulate the partial clone).
+
+ Since the GVFS protocol will never be supported by core Git, this patch series
+ will remain in Microsoft's fork of Git.
+
+- `run-scalar-functional-tests`: The Scalar project developed a quite
+ comprehensive set of integration tests (or, "Functional Tests"). They are the
+ sole remaining part of the original C#-based Scalar project, and this patch
+ adds a GitHub workflow that runs them all.
+
+ Since the tests partially depend on features that are only provided in the
+ `scalar-with-gvfs` patch series, this patch cannot be upstreamed.
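The "Design" section of the new `Documentation/technical/scalar.txt` describes a `cmd_main()` that consumes the "global" `-c` and `-C` options and then routes to one `cmd_<subcommand>()` function per subcommand. The following is a minimal, self-contained C sketch of that shape under stated assumptions; it is not the code in `contrib/scalar/scalar.c`. It does not link against `libgit.a`, and the `builtins[]` table, the `config_overrides` array and the two stand-in subcommands are hypothetical. As in Git, `main()` is assumed to live elsewhere (Git's `common-main.c` calls `cmd_main()`).

```c
#include <stdio.h>
#include <string.h>
#include <unistd.h>

#define MAX_CONFIG 16

/* Hypothetical store for "-c <key>=<value>" pairs. Git itself hands these
 * to its configuration machinery (and to child processes through the
 * environment) rather than keeping a static array. */
static const char *config_overrides[MAX_CONFIG];
static int config_nr;

/* Hypothetical stand-in subcommands; each real one parses its own options. */
static int cmd_list(int argc, const char **argv)
{
	(void)argc;
	(void)argv;
	return 0;
}

static int cmd_register(int argc, const char **argv)
{
	/* argv[0] is "register"; this sketch requires exactly one argument. */
	(void)argv;
	return argc == 2 ? 0 : 1;
}

/* Hypothetical dispatch table mapping subcommand names to handlers. */
static const struct {
	const char *name;
	int (*fn)(int argc, const char **argv);
} builtins[] = {
	{ "list", cmd_list },
	{ "register", cmd_register },
	{ NULL, NULL },
};

int cmd_main(int argc, const char **argv)
{
	/* Skip the program name. */
	argc--;
	argv++;

	/* Consume "global" options that precede the subcommand. */
	while (argc >= 2 && argv[0][0] == '-') {
		if (!strcmp(argv[0], "-C")) {
			if (chdir(argv[1]) < 0) {
				perror(argv[1]);
				return 1;
			}
		} else if (!strcmp(argv[0], "-c")) {
			if (!strchr(argv[1], '=') || config_nr >= MAX_CONFIG)
				return 1;
			config_overrides[config_nr++] = argv[1];
		} else {
			break;
		}
		argc -= 2;
		argv += 2;
	}

	if (argc < 1) {
		fprintf(stderr, "usage: scalar [-C <directory>] [-c <key>=<value>] <command> [<options>]\n");
		return 1;
	}

	/* Route to the matching cmd_<subcommand>() function. */
	for (int i = 0; builtins[i].name; i++)
		if (!strcmp(argv[0], builtins[i].name))
			return builtins[i].fn(argc, argv);

	fprintf(stderr, "scalar: '%s' is not a scalar command\n", argv[0]);
	return 1;
}
```

Keeping the global options in `cmd_main()` means every subcommand sees the same working directory and configuration overrides without repeating that parsing, which mirrors how the doc says `scalar` respects some `git` options while leaving the rest to each subcommand.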
diff --git a/contrib/scalar/README.md b/contrib/scalar/README.md
deleted file mode 100644
index 634b5771ed..0000000000
--- a/contrib/scalar/README.md
+++ /dev/null
@@ -1,82 +0,0 @@
-# Scalar - an opinionated repository management tool
-
-Scalar is an add-on to Git that helps users take advantage of advanced
-performance features in Git. Originally implemented in C# using .NET Core,
-based on the learnings from the VFS for Git project, most of the techniques
-developed by the Scalar project have been integrated into core Git already:
-
-* partial clone,
-* commit graphs,
-* multi-pack index,
-* sparse checkout (cone mode),
-* scheduled background maintenance,
-* etc
-
-This directory contains the remaining parts of Scalar that are not (yet) in
-core Git.
-
-## Roadmap
-
-The idea is to populate this directory via incremental patch series and
-eventually move to a top-level directory next to `gitk-git/` and to `git-gui/`. The
-current plan involves the following patch series:
-
-- `scalar-the-beginning`: The initial patch series which sets up
- `contrib/scalar/` and populates it with a minimal `scalar` command that
- demonstrates the fundamental ideas.
-
-- `scalar-c-and-C`: The `scalar` command learns about two options that can be
- specified before the command, `-c <key>=<value>` and `-C <directory>`.
-
-- `scalar-diagnose`: The `scalar` command is taught the `diagnose` subcommand.
-
-- `scalar-and-builtin-fsmonitor`: The built-in FSMonitor is enabled in `scalar
- register` and in `scalar clone`, for an enormous performance boost when
- working in large worktrees. This patch series necessarily depends on Jeff
- Hostetler's FSMonitor patch series to be integrated into Git.
-
-- `scalar-gentler-config-locking`: Scalar enlistments are registered in the
- user's Git config. This usually does not represent any problem because it is
- rare for a user to register an enlistment. However, in Scalar's functional
- tests, Scalar enlistments are created galore, and in parallel, which can lead
- to lock contention. This patch series works around that problem by re-trying
- to lock the config file in a gentle fashion.
-
-- `scalar-extra-docs`: Add some extensive documentation that has been written
- in the original Scalar project (all subject to discussion, of course).
-
-- `optionally-install-scalar`: Now that Scalar is feature (and documentation)
- complete and is verified in CI builds, let's offer to install it.
-
-- `move-scalar-to-toplevel`: Now that Scalar is complete, let's move it next to
- `gitk-git/` and to `git-gui/`, making it a top-level command.
-
-The following two patch series exist in Microsoft's fork of Git and are
-publicly available. There is no current plan to upstream them, not because I
-want to withhold these patches, but because I don't think the Git community is
-interested in these patches.
-
-There are some interesting ideas there, but the implementation is too specific
-to Azure Repos and/or VFS for Git to be of much help in general (and also: my
-colleagues tried to upstream some patches already and the enthusiasm for
-integrating things related to Azure Repos and VFS for Git can be summarized in
-very, very few words).
-
-These still exist mainly because the GVFS protocol is what Azure Repos has
-instead of partial clone, while Git is focused on improving partial clone:
-
-- `scalar-with-gvfs`: The primary purpose of this patch series is to support
- existing Scalar users whose repositories are hosted in Azure Repos (which
- does not support Git's partial clones, but supports its predecessor, the GVFS
- protocol, which is used by Scalar to emulate the partial clone).
-
- Since the GVFS protocol will never be supported by core Git, this patch
- series will remain in Microsoft's fork of Git.
-
-- `run-scalar-functional-tests`: The Scalar project developed a quite
- comprehensive set of integration tests (or, "Functional Tests"). They are the
- sole remaining part of the original C#-based Scalar project, and this patch
- adds a GitHub workflow that runs them all.
-
- Since the tests partially depend on features that are only provided in the
- `scalar-with-gvfs` patch series, this patch cannot be upstreamed.
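The new technical doc describes an enlistment as a root directory with a `src/` subdirectory holding the clone, so that build outputs can be routed outside the repository. A small C sketch of that layout follows, assuming Unix-style `/` separators; `enlistment_src_dir()` and `path_in_worktree()` are hypothetical helpers for illustration, not functions from Scalar or `libgit.a`.

```c
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical helper: given an enlistment root such as "/work/big",
 * write the path of the cloned repository ("/work/big/src") into buf.
 * Returns 0 on success, -1 if buf is too small.
 */
static int enlistment_src_dir(const char *root, char *buf, size_t size)
{
	size_t len = strlen(root);

	/* Drop one trailing slash so "/work/big/" and "/work/big" agree. */
	if (len > 1 && root[len - 1] == '/')
		len--;
	int n = snprintf(buf, size, "%.*s/src", (int)len, root);
	return (n < 0 || (size_t)n >= size) ? -1 : 0;
}

/*
 * Hypothetical check: does `path` fall inside the enlistment's src/
 * worktree? Build outputs for which this returns 0 live outside the
 * repository, so Git never has to scan or ignore them.
 */
static int path_in_worktree(const char *root, const char *path)
{
	char src[4096];
	size_t n;

	if (enlistment_src_dir(root, src, sizeof(src)) < 0)
		return 0;
	n = strlen(src);
	return !strncmp(path, src, n) && (path[n] == '\0' || path[n] == '/');
}
```

For example, with root `/work/big`, a source file at `/work/big/src/main.c` is inside the worktree while an object file at `/work/big/out/main.o` is not; the second kind of path is what the layout is meant to encourage for build outputs.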
diff --git a/contrib/scalar/scalar.txt b/contrib/scalar/scalar.txt
index c0425e0653..1a12dc4507 100644
--- a/contrib/scalar/scalar.txt
+++ b/contrib/scalar/scalar.txt
@@ -3,7 +3,7 @@ scalar(1)
NAME
----
-scalar - an opinionated repository management tool
+scalar - A tool for managing large Git repositories
SYNOPSIS
--------
@@ -20,10 +20,9 @@ scalar delete <enlistment>
DESCRIPTION
-----------
-Scalar is an opinionated repository management tool. By creating new
-repositories or registering existing repositories with Scalar, your Git
-experience will speed up. Scalar sets advanced Git config settings,
-maintains your repositories in the background, and helps reduce data sent
+Scalar is a repository management tool that optimizes Git for use in large
+repositories. Scalar improves performance by configuring advanced Git settings,
+maintaining repositories in the background, and helping to reduce data sent
across the network.
An important Scalar concept is the enlistment: this is the top-level directory