diff --git a/text/3529-cargo-path-bases.md b/text/3529-cargo-path-bases.md new file mode 100644 index 00000000000..eb9c9793006 --- /dev/null +++ b/text/3529-cargo-path-bases.md @@ -0,0 +1,479 @@ +- Feature Name: `path_bases` +- Start Date: 2023-11-13 +- RFC PR: [rust-lang/rfcs#3529](https://github.com/rust-lang/rfcs/pull/3529) +- Rust Issue: [rust-lang/cargo#14355](https://github.com/rust-lang/cargo/issues/14355) + +# Summary +[summary]: #summary + +Introduce a table of path "bases" in Cargo configuration files that can be used +to prefix the path of `path` dependencies and `patch` entries. + +To avoid additional design complexity, this feature will not support declaring +path bases in manifest files, though this may be added in the future. + +# Motivation +[motivation]: #motivation + +As a project grows in size, it becomes necessary to split it into smaller +sub-projects, architected into layers with well-defined boundaries. + +One way to enforce these boundaries is to use different Git repos (aka +"multi-repo"). Cargo has good support for multi-repo projects: developers can +use `git` dependencies, or they can use private registries if they want to +explicitly publish code or need to preprocess their sub-projects (e.g., +generating code) before they can be consumed. + +If all of the code is kept in a single Git repo (aka "mono-repo"), then these +boundaries must be enforced a different way: either leveraging tooling during +the build to check layering, or requiring that sub-projects explicitly publish +and consume from some intermediate directory. Cargo has poor support for +mono-repos: the only viable mechanism is `path` dependencies, but these require +relative paths (which makes refactoring and moving sub-projects very difficult) +and don't work at all if the mono-repo requires publishing and consuming from an +intermediate directory (as this may vary per host, or per target being built). 
+ +This RFC proposes a mechanism to specify path bases in `config.toml` files which +can be used to prefix `path` dependencies. This allows mono-repos to specify +dependencies relative to their root directory, which allows the consuming +project to be moved freely (no relative paths to update) and a simple +find-and-replace to handle a producing project being moved. Additionally, a +host-specific or target-specific intermediate directory may be specified as a +`base`, allowing code to be consumed from there using `path` dependencies. + +### Example + +If we had a sub-project that depends on three others: + +* `foo` which is in a different layer of the mono-repo. +* `bar_with_generated` that must be consumed from an intermediate directory +because it contains target-specific generated code. +* `baz` which is in the current layer. + +We may have a `Cargo.toml` snippet that looks like this: + +```toml +[dependencies] +foo = { path = "../../../other_layer/foo" } +bar_with_generated = { path = "../../../../intermediates/x86_64/Debug/third_layer/bar_with_generated" } +baz = { path = "../baz" } +``` + +This has many issues: + +* Moving the current sub-project may require changing all of these relative +paths. +* `bar_with_generated` will only work if we're building x86_64 Debug. +* `bar_with_generated` assumes that the `intermediates` directory is a sibling +to our source directory, and not somewhere else completely (e.g., a different +drive for performance reasons). +* Moving `foo` or `baz` requires searching the code for each possible relative +path (e.g., `../../../other_layer/foo` and `../foo`) and may be error-prone if +there is some other sub-project in a directory with the same name. 
+ +Instead, we could specify these common paths as path bases in a `config.toml` +(which may be generated by an external build system which in turn invokes Cargo): + +```toml +[path-bases] +sources = "/home/user/dev/src" +intermediates = "/home/user/dev/intermediates/x86_64/Debug" +``` + +Then the `Cargo.toml` can use those path bases and avoid relative paths: + +```toml +[dependencies] +foo = { path = "other_layer/foo", base = "sources" } +bar_with_generated = { path = "third_layer/bar_with_generated", base = "intermediates" } +baz = { path = "this_layer/baz", base = "sources" } +``` + +This resolves the issues we previously had: + +* The current project can be moved without modifying the `Cargo.toml` at all. +* `bar_with_generated` works for all targets (assuming the `config.toml` is +generated). +* The `intermediates` directory can be placed anywhere. +* Moving `foo` or `baz` only requires searching for the canonical form relative +to the path base. + +## Other uses + +The ability to use path bases for `path` dependencies is convenient for +developers who are using a large number of `path` dependencies within the same +root directory. Instead of repeating the same path fragment many times in their +`Cargo.toml`, they can specify it once in a `config.toml` as a path +base, then use that path base in each of their `path` dependencies. + +Cargo will also provide built-in path bases, for example `workspace` to point to +the root directory of the workspace. This allows workspace members to reference +each other without first needing to `../` their way back to the workspace root. + +# Guide-level explanation +[guide-level-explanation]: #guide-level-explanation + +If you often use multiple path dependencies that have a common parent directory, +or if you want to avoid putting long paths in your `Cargo.toml`, you can +define path _base directories_ in your +[configuration](https://doc.rust-lang.org/cargo/reference/config.html). 
+Your path dependencies can then be specified relative to those base +directories. + +For example, say you have a number of projects checked out in +`/home/user/dev/rust/libraries/`. Rather than use that path in your +`Cargo.toml` files, you can define it as a "base" path in +`~/.cargo/config.toml`: + +```toml +[path-bases] +dev = "/home/user/dev/rust/libraries/" +``` + +Now, you can specify a path dependency on a library `foo` in that +directory in your `Cargo.toml` using: + +```toml +[dependencies] +foo = { path = "foo", base = "dev" } +``` + +As with other path dependencies, keep in mind that both the base _and_ +the path must exist on any other host where you want to use the same +`Cargo.toml` to build your project. + +You can also use `base` along with `path` when specifying a `[patch]`. +Specifying a `path` and `base` on a `[patch]` is equivalent to specifying just a +`path` containing the full path including the prepended base. + +# Reference-level explanation +[reference-level-explanation]: #reference-level-explanation + +## Specifying Dependencies + +### Path Bases + +A `path` dependency may optionally specify a base by setting the `base` key to +the name of a path base, either one declared in the `[path-bases]` table of the +[configuration](https://doc.rust-lang.org/cargo/reference/config.html#path-bases) +or one of the [built-in path bases](#built-in-path-bases). The value of that +path base is prepended to the `path` value (along with a path separator if +necessary) to produce the actual location where Cargo will look for the +dependency. + +For example, if the `Cargo.toml` contains: + +```toml +[dependencies] +foo = { path = "foo", base = "dev" } +``` + +and the `[path-bases]` table in the configuration contains: + +```toml +[path-bases] +dev = "/home/user/dev/rust/libraries/" +``` + +then this will produce a `path` dependency `foo` located at +`/home/user/dev/rust/libraries/foo`. + +Path bases can be either absolute or relative. 
Relative path bases are relative +to the parent directory of the configuration file that declared that path base. + +The name of a path base must use only [alphanumeric](https://doc.rust-lang.org/std/primitive.char.html#method.is_alphanumeric) +characters or `-` or `_`, must start with an [alphabetic](https://doc.rust-lang.org/std/primitive.char.html#method.is_alphabetic) +character, and must not be empty. + +If the name of the path base used in a dependency is neither in the configuration +nor one of the built-in path bases, then Cargo will raise an error. + +#### Built-in path bases + +Cargo provides implicit path bases that can be used without the need to specify +them in a `[path-bases]` table. + +* `workspace` - If a project is [a workspace or workspace member](https://doc.rust-lang.org/cargo/reference/workspaces.html) +then this path base is defined as the parent directory of the root `Cargo.toml` +of the workspace. + +If a built-in path base name is also declared in the configuration, then Cargo +will prefer the value in the configuration. This allows Cargo to add new built-in +path bases without compatibility issues (as existing uses will shadow the +built-in name). + +## Configuration + +`[path-bases]` + +* Type: string +* Default: see below +* Environment: `CARGO_PATH_BASES_` + +The `[path-bases]` table defines a set of path prefixes that can be used to +prefix the locations of `path` dependencies. See the +[specifying dependencies](https://doc.rust-lang.org/cargo/reference/specifying-dependencies.html#path-bases) +documentation for more information. + +## cargo add + +### Synopsis + +`cargo add` *[options]* `--path` *path* [`--base` *base*] + +### Options + +#### Source options + +`--base` *base* + +The [path base](https://doc.rust-lang.org/cargo/reference/specifying-dependencies.html#path-bases) +to use when adding from a local crate. + +## Workspaces + +Path bases can be used in a workspace's `[dependencies]` table. 
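+
+A minimal sketch of a workspace root `Cargo.toml` that uses a path base,
+assuming that the workspace's `[dependencies]` table mentioned above refers to
+`[workspace.dependencies]`. The member name `app` is hypothetical, and the
+`sources` path base is assumed to be declared in the configuration as in the
+Motivation example:
+
+```toml
+[workspace]
+members = ["app"]
+
+[workspace.dependencies]
+# Located relative to the `sources` path base, not to this manifest's directory.
+foo = { path = "other_layer/foo", base = "sources" }
+```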
+ +If a member is inheriting a dependency (i.e., using `workspace = true`) then the +`base` key cannot also be specified for that dependency in the member manifest. +That is, the member will use the `path` dependency as specified in the workspace +manifest and has no ability to override the base path being used (if any). + +# Drawbacks +[drawbacks]: #drawbacks + +1. There is now an additional way to specify a dependency in + `Cargo.toml` that may not be accessible when others try to build the + same project. Specifically, it may now be that the other host has a + `path` dependency available at the same relative path to `Cargo.toml` + as the author of the `Cargo.toml` entry, but does not have the path base + defined (or has it defined as some other value). + + At the same time, this might make path dependencies _more_ re-usable + across hosts, since developers can dictate only which _bases_ need to + exist, rather than which _paths_ need to exist. This would allow + different developers to host their path dependencies in different + locations from the original author. +2. Developers still need to know the path _within_ each path base. We + could instead define path "aliases", though at that point the whole + thing looks more like a special kind of "local path registry". +3. This introduces yet another mechanism for grouping local + dependencies. We already have [local registries, directory + registries](https://doc.rust-lang.org/cargo/reference/source-replacement.html), + and the [`[paths]` + override](https://doc.rust-lang.org/cargo/reference/overriding-dependencies.html#paths-overrides). + However, those are all intended for immutable local copies of + dependencies where versioning is enforced, rather than as mutable + path dependencies. 
+ +# Rationale and alternatives +[rationale-and-alternatives]: #rationale-and-alternatives + +This design was primarily chosen for its simplicity: it adds very +little to what we have today both in terms of API surface and mechanism. +However, other approaches exist. + +Developers could have their `path` dependencies point to symlinks in the +current directory, which other developers would then be told to set up +to point to the appropriate place on their system. This approach has two +main drawbacks: symlinks are harder to use on Windows as they [require +special privileges](https://docs.microsoft.com/en-us/windows/security/threat-protection/security-policy-settings/create-symbolic-links), +and they pollute the user's project directory. + +For the build-system case, the build system could place vendored +dependencies directly into the source directory at well-known locations, +though this would mean that if the source of those dependencies were to +change, the user would have to re-run the build system (rather than just +run `cargo`) to refresh the vendored dependency. And this approach too +would end up polluting the user's source directory. + +An earlier iteration of the design avoided adding a new field to +dependencies, and instead inlined the base name into the path using +`path = "base::relative/path"`. This has the advantage of not +introducing another special keyword in `Cargo.toml`, but comes at the +cost of making `::` illegal in paths, which was deemed too great. + +Alternatively, we could add support for interpolating environment +variables (or arbitrary configuration values?) in `Cargo.toml` values. +That way, the path could be given as `path = +"${base.name}/relative/path"`. While that works, it's not trivially +backwards compatible, may be confusing when users try to interpolate +random other configuration variables in their paths, and _seems_ like a +possible Pandora's box of corner-cases. 
+ +The [`[paths]` +feature](https://doc.rust-lang.org/cargo/reference/overriding-dependencies.html#paths-overrides) +could be updated to lift its current limitations around adding +dependencies and requiring that the dependencies be available on +crates.io. This would allow users to avoid `path` dependencies in more +cases, but makes the replacement more implicit than explicit. That +change is also more likely to break existing users, and to involve +significant refactoring of the existing mechanism. + +We could add another type of local registry that is explicitly declared +in `Cargo.toml`, and from which local dependencies could then be drawn. +Something like: + +```toml +[registry.local] +path = "/path/to/path/registry" +``` + +This would make specifying the dependencies somewhat nicer (`version = +"1", registry = "local"`), and would ensure a standard layout for the +locations of the local dependencies. However, using local dependencies +in this manner would require more set-up to arrange for the right +registry layout, and we would be introducing what is effectively a +mutable registry, which Cargo has avoided thus far. + +Even with such an approach, there are benefits to being able to not put +complex paths into `Cargo.toml` as they may differ on other build hosts. +So, a mechanism for indirecting through a path name may still be +desirable. + +Ultimately, by not having a mechanism to name paths that lives outside +of `Cargo.toml`, we are forcing developers to coordinate their file +system layouts without giving them a mechanism for doing so. Or to work +around the lack of a mechanism by requiring developers to add symlinks +in strategic locations, cluttering their directories. The proposed +mechanism is simple to understand and to use, and still covers a wide +variety of use-cases. + +## Support for declaring path bases in the manifest + +Currently path bases only support being declared in the configuration, and not +the manifest. 
While it would be possible to add support for declaring path bases +in the manifest in the future (which would require specifying whether the +declaration in the manifest or the configuration is preferred, and how workspace +versus member declarations work), it is hard to justify the additional +complexity of adding this capability to the initial implementation of the +feature. + +An argument could be made that specifying path bases in the manifest is a +convenience feature, allowing a common path where multiple local dependencies +exist to be specified as a path base so that the individual path dependencies +would be shorter. However, it would be just as easy to add a configuration file +to some parent directory of the dependent, and this would be more useful as it is +likely that those dependencies will also be used in other local packages, which +avoids duplicating the path bases table in multiple manifests. + +It could also be argued that specifying path bases in the manifest would be a +way to set "default values" for path dependencies (e.g., to a submodule) that a +developer could override in their local configuration file. While this may be +useful, this scenario is already taken care of by the `patch` feature in Cargo. + +# Prior art +[prior-art]: #prior-art + +Python searches for dependencies by walking `sys.path` in definition +order, which [is pulled +from](https://docs.python.org/3/tutorial/modules.html#the-module-search-path) +the current directory, `PYTHONPATH`, and a list of system-wide library +directories. All imports are thus "relative" to every directory in +`sys.path`. This makes it easy to inject local development dependencies +simply by injecting a path early in `sys.path`. The path dependency is +never made explicit anywhere in Python. We _could_ adopt a similar +approach by declaring an environment variable `CARGO_PATHS`, where every +`path` is considered relative to each path in `CARGO_PATHS` until a path +that exists is found. 
However, this introduces additional possibilities +for user confusion if, say, `foo` exists in multiple paths in +`CARGO_PATHS` and the first one is picked (though maybe that could be a +warning?). + +Node.js (with npm) is very similar to Python, except that dependencies +can also be +[specified](https://nodejs.org/api/modules.html#modules_all_together) +using relative paths like Cargo's `path` dependencies. For non-path +dependencies, it searches in [`node_modules/` in every parent +directory](https://nodejs.org/api/modules.html#modules_loading_from_node_modules_folders), +as well as in the [`NODE_PATH` search +path](https://nodejs.org/api/modules.html#modules_loading_from_the_global_folders). +There does not exist a standard mechanism to specify a path dependency +relative to a path named elsewhere. With CommonJS modules, JavaScript +developers are able to interpolate variables directly into their +`require` arguments, and can thus implement custom schemes for getting +customizable paths. + +Ruby's `Gemfile` [path +dependencies](https://bundler.io/man/gemfile.5.html#PATH) are only ever +absolute paths or paths relative to the `Gemfile`'s location, and so are +similar to Cargo's current `path` dependencies. + +The same is the case for Go's `go.mod` [replacement +dependencies](https://golang.org/doc/modules/managing-dependencies#tmp_10), +which only allow absolute or relative paths. + +From this, it's clear that other major languages do not have a feature +quite like this. This is likely because path dependencies are assumed +to be short-lived and local, and thus having them be host-specific is +often good enough. However, as the motivation section of this RFC +outlines, there are still use-cases where a simple name-indirection +could help. + +# Unresolved questions +[unresolved-questions]: #unresolved-questions + +* What exact names should we use for the table (`path-bases`) and field +(`base`)? +* What other built-in path bases could be useful? 
+ * `package` or `current-dir` for the directory of the current project? + * `home` or `user_home` for the user's home directory? + * `sysroot` for the current rustc sysroot? + +# Future possibilities +[future-possibilities]: #future-possibilities + +## Add support for declaring path bases in the manifest + +As mentioned [above](#support-for-declaring-path-bases-in-the-manifest), +declaring path bases is only supported in the configuration. + +Support could be added to declare path bases in the manifest, but the following +design questions need to be answered: + +* Is `[path-bases]` a package or a workspace field? +* If it is a package field, would it support workspace inheritance? Or would we +introduce a new mechanism (e.g., one version of the RFC introduced a "search +order" such that Cargo would search for a path base in the package manifest, +then the workspace manifest, then the configuration and finally the built-in +list)? +* Would a relative path base in the workspace manifest be relative to that +manifest, or to the package that uses it? +* If using inheritance, should path bases be implicitly or explicitly inherited? +(e.g., requiring `[path-bases] workspace = true`) + +## Path bases relative to other path bases + +We could allow defining a path base relative to another path base: + +```toml +[path-bases] +base1 = "/dev/me" +base2 = { base = "base1", path = "some_subdir" } # /dev/me/some_subdir +``` + +## Path dependency with just a base + +We could allow defining a path dependency with *just* `base`, making +`cratename = { base = "thebase" }` equivalent to +`cratename = { base = "thebase", path = "cratename" }`. This would simplify many +common cases, where crates appear within the base in a directory named for the +crate. 
+ +## Git dependencies + +It seems reasonable to extend path bases to `git` dependencies, with something +like: + +```toml +[path-bases] +gh = "https://github.com/jonhoo" +``` + +```toml +[dependencies] +foo = { git = "foo.git", base = "gh" } +``` + +However, this may get complicated if someone specifies `git`, `path`, _and_ +`base`.