diff options
Diffstat (limited to 'Documentation/technical/scalar.txt')
-rw-r--r-- | Documentation/technical/scalar.txt | 127 |
1 files changed, 127 insertions, 0 deletions
diff --git a/Documentation/technical/scalar.txt b/Documentation/technical/scalar.txt new file mode 100644 index 0000000000..08bc09c225 --- /dev/null +++ b/Documentation/technical/scalar.txt @@ -0,0 +1,127 @@ +Scalar +====== + +Scalar is a repository management tool that optimizes Git for use in large +repositories. It accomplishes this by helping users to take advantage of +advanced performance features in Git. Unlike most other Git built-in commands, +Scalar is not executed as a subcommand of 'git'; rather, it is built as a +separate executable containing its own series of subcommands. + +Background +---------- + +Scalar was originally designed as an add-on to Git and implemented as a .NET +Core application. It was created based on the learnings from the VFS for Git +project (another application aimed at improving the experience of working with +large repositories). As part of its initial implementation, Scalar relied on +custom features in the Microsoft fork of Git that have since been integrated +into core Git: + +* partial clone, +* commit graphs, +* multi-pack index, +* sparse checkout (cone mode), +* scheduled background maintenance, +* etc + +With the requisite Git functionality in place and a desire to bring the benefits +of Scalar to the larger Git community, the Scalar application itself was ported +from C# to C and integrated upstream. + +Features +-------- + +Scalar is comprised of two major pieces of functionality: automatically +configuring built-in Git performance features and managing repository +enlistments. + +The Git performance features configured by Scalar (see "Background" for +examples) confer substantial performance benefits to large repositories, but are +either too experimental to enable for all of Git yet, or only benefit large +repositories. As new features are introduced, Scalar should be updated +accordingly to incorporate them. This will prevent the tool from becoming stale +while also providing a path for more easily bringing features to the appropriate +users. + +Enlistments are how Scalar knows which repositories on a user's system should +utilize Scalar-configured features. This allows it to update performance +settings when new ones are added to the tool, as well as centrally manage +repository maintenance. The enlistment structure - a root directory with a +`src/` subdirectory containing the cloned repository itself - is designed to +encourage users to route build outputs outside of the repository to avoid the +performance-limiting overhead of ignoring those files in Git. + +Design +------ + +Scalar is implemented in C and interacts with Git via a mix of child process +invocations of Git and direct usage of `libgit.a`. Internally, it is structured +much like other built-ins with subcommands (e.g., `git stash`), containing a +`cmd_<subcommand>()` function for each subcommand, routed through a `cmd_main()` +function. Most options are unique to each subcommand, with `scalar` respecting +some "global" `git` options (e.g., `-c` and `-C`). + +Because `scalar` is not invoked as a Git subcommand (like `git scalar`), it is +built and installed as its own executable in the `bin/` directory, alongside +`git`, `git-gui`, etc. + +Roadmap +------- + +NOTE: this section will be removed once the remaining tasks outlined in this +roadmap are complete. + +Scalar is a large enough project that it is being upstreamed incrementally, +living in `contrib/` until it is feature-complete. So far, the following patch +series have been accepted: + +- `scalar-the-beginning`: The initial patch series which sets up + `contrib/scalar/` and populates it with a minimal `scalar` command that + demonstrates the fundamental ideas. + +- `scalar-c-and-C`: The `scalar` command learns about two options that can be + specified before the command, `-c <key>=<value>` and `-C <directory>`. + +- `scalar-diagnose`: The `scalar` command is taught the `diagnose` subcommand. + +Roughly speaking (and subject to change), the following series are needed to +"finish" this initial version of Scalar: + +- Finish Scalar features: Enable the built-in FSMonitor in Scalar enlistments + and implement `scalar help`. At the end of this series, Scalar should be + feature-complete from the perspective of a user. + +- Generalize features not specific to Scalar: In the spirit of making Scalar + configure only what is needed for large repo performance, move common + utilities into other parts of Git. Some of this will be internal-only, but one + major change will be generalizing `scalar diagnose` for use with any Git + repository. + +- Move Scalar to toplevel: Move Scalar out of `contrib/` and into the root of + `git`, including updates to build and install it with the rest of Git. This + change will incorporate Scalar into the Git CI and test framework, as well as + expand regression and performance testing to ensure the tool is stable. + +Finally, there are two additional patch series that exist in Microsoft's fork of +Git, but there is no current plan to upstream them. There are some interesting +ideas there, but the implementation is too specific to Azure Repos and/or VFS +for Git to be of much help in general. + +These still exist mainly because the GVFS protocol is what Azure Repos has +instead of partial clone, while Git is focused on improving partial clone: + +- `scalar-with-gvfs`: The primary purpose of this patch series is to support + existing Scalar users whose repositories are hosted in Azure Repos (which does + not support Git's partial clones, but supports its predecessor, the GVFS + protocol, which is used by Scalar to emulate the partial clone). + + Since the GVFS protocol will never be supported by core Git, this patch series + will remain in Microsoft's fork of Git. + +- `run-scalar-functional-tests`: The Scalar project developed a quite + comprehensive set of integration tests (or, "Functional Tests"). They are the + sole remaining part of the original C#-based Scalar project, and this patch + adds a GitHub workflow that runs them all. + + Since the tests partially depend on features that are only provided in the + `scalar-with-gvfs` patch series, this patch cannot be upstreamed. |