-
Notifications
You must be signed in to change notification settings - Fork 88
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
fix: resolve tilda paths and validate datadir config #4485
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Overall feedback, we need to slow down on logging. We just reduce our logging here #4469, and this PR seems like a regression.
Also I am not sure what the problem the PR is trying to solve. I see you are linking an issue, but that issue also doesn't have a description on why data dir with ~
is a problem
pkg/config/config.go
Outdated
// allow the users to set datadir to a path like ~/.bacalhau or ~/something/idk/whatever | ||
// and expand the path for them | ||
dataDirPath := c.base.GetString(types.DataDirKey) | ||
if dataDirPath[0] == '~' { | ||
log.Info().Msgf("configuration field 'DataDir' contains '~': (%s). Attempting to expand to the home directory...", dataDirPath) | ||
expanded, err := homedir.Expand(dataDirPath) | ||
if err == nil { | ||
dataDirPath = expanded | ||
c.base.Set(types.DataDirKey, dataDirPath) | ||
log.Info().Msgf("successfully expanded data directory to %s", dataDirPath) | ||
} | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I am missing the purpose of expanding ~
. Also this is going to generate a lot of unnecessary logs for users. In most cases when I define a dataDir I use ~
and I believe most of our users do as we keep saying ~/.bacalahu
is the default repo
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I am missing the purpose of expanding ~
see description here: #4485 (comment)
Also this is going to generate a lot of unnecessary logs for users
Okay, I will remove the logs.
In most cases when I define a dataDir I use ~ and I believe most of our users do as we keep saying ~/.bacalahu is the default repo
Thats the intention, but we need to expand it to a valid path before using it internally for filesystem operations.
pkg/config/config.go
Outdated
// validate the config | ||
var cfg types.Bacalhau | ||
if err := c.base.Unmarshal(&cfg); err != nil { | ||
return nil, fmt.Errorf("failed to unmarshal config: %w", err) | ||
} | ||
if err := cfg.Validate(); err != nil { | ||
return nil, fmt.Errorf("config invalid: %w", err) | ||
} | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this seems a little weird to unmarshal types.Bacalhau
just for validation purposes, drop it, and then unmarshal again when setting up cli commands to get types.Bacalhau
again
pkg/config/types/default_config.go
Outdated
if repoDir, set := os.LookupEnv("BACALHAU_DIR"); set && repoDir != "" { | ||
return repoDir | ||
} else if set { | ||
log.Warn().Msg("BACALHAU_DIR environment variable is set but empty. Falling back to default directories.") | ||
} else { | ||
log.Debug().Msg("BACALHAU_DIR environment variable is not set. Trying to use $HOME.") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
These are a noisy and unnecessary logs. We now log the config path and repo path that we ended up using, and that should be enough
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
okay I will delete these
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
lets not forget about this
pkg/config/types/default_config.go
Outdated
// Fallback: attempt to use the absolute path of the default directory | ||
path, err := filepath.Abs(defaultBacalhauDir) | ||
if err == nil { | ||
log.Info().Str("Directory", path).Msg("Bacalhau will initialize in current working directory.") | ||
return path |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think we should. If no data directory was explicitly provided and user has no home, then we should just fail with a clear error message.
Meaning this method should just return an empty data dir and let the config validation take care of the rest.
To be honest, and I think I've asked this multipe times before, I don't know why we are still looking into BACALHAU_DIR
to get the default repo path. Default is ~/.bacalhau
, and then viper should take of the rest and replacing it with BACALHAU_DIR
if it was provided
Sorry, I should have provided a more verbose description in the issue - at a high level we need to expand
As an example, More generally this PR is:
Yes, this is the intention, the goal is for this to work as expected, right now it doesn't. #4457 was actually caused by this issue, for example try this: Terminal 1
Terminal 2
Observe this is the error returned:
Since |
I see what you mean that this change is needed. Though something is weird with the Using
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
looks good overall. still some comments not addressed
pkg/config/config.go
Outdated
dataDirPath := c.base.GetString(types.DataDirKey) | ||
if dataDirPath[0] == '~' { | ||
expanded, err := homedir.Expand(dataDirPath) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
we are not checking if dataDirPath is empty before fetching the first char. Lets just call homedir.Expand(dataDirPath)
without this condition as it is handling it internally
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
done.
pkg/config/config.go
Outdated
// validate the config | ||
var cfg types.Bacalhau | ||
if err := c.base.Unmarshal(&cfg); err != nil { | ||
return nil, fmt.Errorf("failed to unmarshal config: %w", err) | ||
} | ||
if err := cfg.Validate(); err != nil { | ||
return nil, fmt.Errorf("config invalid: %w", err) | ||
} | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
as discussed yesterday, lets move the validation to unmarshall
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yup, done.
pkg/config/types/default_config.go
Outdated
if repoDir, set := os.LookupEnv("BACALHAU_DIR"); set && repoDir != "" { | ||
return repoDir | ||
} else if set { | ||
log.Warn().Msg("BACALHAU_DIR environment variable is set but empty. Falling back to default directories.") | ||
} else { | ||
log.Debug().Msg("BACALHAU_DIR environment variable is not set. Trying to use $HOME.") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
lets not forget about this
- include method to validate lib for paths
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
few examples that should work, but aren't:
→ bacalhau serve --orchestrator --compute --data-dir=./here
failed to setup config: path must be absolute (start with '/')
DataDir ("./here") must be an absolute path
→ bacalhau serve --orchestrator --compute --data-dir here
failed to setup config: path must be absolute (start with '/')
DataDir ("here") must be an absolute path
# this works, which maybe it shouldn't
→ bacalhau config set --config config.yaml datadir ./here
10:46:06.327 | INF ../ProtocolLabs/workspace/bacalhau/cmd/cli/config/set.go:94 > Writing config to /Users/walid/bacalhau-versions/config.yaml
# but trying to fix it fail
→ bacalhau config set --config config.yaml datadir $(PWD)/here
failed to setup config: path must be absolute (start with '/')
DataDir ("./here") must be an absolute path
First, please do more through manual testing from your end and document them in the PR's description
Second, the error message path must be absolute (start with '/')
might be specific to unix and won't play nicely with windows
Third, you need to differentiate between setting the data-dir from config file and from cli flags as it makes sense if users define relative paths as cli flags, but I'll that to product to decide @aronchick
Lastly, since we are validating when unmarshalling the config, then we need to also validate before persisting config changes using config set
. Otherwise we will corrupt the config file and only manual intervention will be required to fix it. I might even say you should only validate after writing when using config set
and shouldn't validate before
Right now this PR enforces the $ bacalhau config set datadir $(pwd)/here
07:25:50.478 | INF pkg/repo/fs.go:91 > Initializing repo at /home/frrist/.bacalhau
07:25:50.479 | INF cmd/cli/config/set.go:94 > Writing config to /home/frrist/.bacalhau/config.yaml
$ cat ~/.bacalhau/config.yaml
datadir: /home/frrist/workspace/src/github.com/bacalhau-project/bacalhau/here
The tests are fairly thorough: https://github.com/bacalhau-project/bacalhau/pull/4485/files#diff-29cea1b5b831c8655c7155f43e2367cec73e66d1e338f8b3c7877a2f339b8811R19 is there a specific case you have in mind that is not covered here? Or it this comment more related to the nature of absolute vs relative paths?
Good point, I'll modify this instead say:
For the case when paths are passed via flags, I can modify this PR such that it attempts to expand them to absolute paths before accepting them as config. If they cannot be expanded then we should fail and not accept them as configuration.
Ooo yeah, good catch - will add validation to set before persisting the values to the file. |
You don't think unit tests for the config are enough, right? We keep finding issues with each config change, so obviously we can do better. As I mentioned several times before, try to break the feature, think of edge cases, and document the scenarios you ran. I am more than happy with manual tests for now, and we can migrate those to bashtub after launch.
The message still has duplication
Yeah that sounds right |
@wdbaruni can you review this, please? Needed for 1.5.0 |
i don't understand this. We are now allowing relative paths even in config files where the original goal was to no longer support relative paths, and expand them only if provided in cli flags as you mentioned below
relative paths in config files result in unpredictable behaviour as the data-dir will depend on where you are running the binary from and not where the config file is. The reason it makes sense to support them in cli flags is because the user will provide a path relative to where they are running the binary and that make sense. Here is an exmaple of unpredictable behavour:
|
- prevent config set from modifying the data dir path within the bacalhau repp - prevent invalid paths from being set via the config set command
I see, this is addressed by 56e7fd5. It seems we needed to add a couple things:
Here is the new behavior from your commands above with this change:
Additionally, removing the foot gun of modifying the data dir path of the config in the default repo location:
Ensure paths provided to
Ensure config files cannot be loaded with invalid paths already present:
Works with several config files:
The --config flags still accept relative paths, and expand them:
|
pkg/config/config.go
Outdated
// log the resolved config paths | ||
absoluteConfigPaths := make([]string, len(c.paths)) | ||
for i, path := range c.paths { | ||
absoluteConfigPaths[i], err = filepath.Abs(path) | ||
if err != nil { | ||
absoluteConfigPaths[i] = path | ||
} | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think this is needed as we are replacing paths with abs paths in 119
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
whoops, I think this is residual from a merge, will remove this.
pkg/config/config.go
Outdated
// allow the users to set datadir to a path like ~/.bacalhau or ~/something/idk/whatever | ||
// and expand the path for them | ||
dataDirPath := c.base.GetString(types.DataDirKey) | ||
if dataDirPath != "" { | ||
dataDirPath, err := ExpandPath(dataDirPath) | ||
if err != nil { | ||
return nil, fmt.Errorf("%s invalid: %w", types.DataDirKey, err) | ||
} | ||
c.base.Set(types.DataDirKey, dataDirPath) | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
do we need to expand? we do validate in line 138
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
138 is validating the contents of the config file(s) passed, ensuring iff repo dir is present in the file its an absolute path.
The logic here is expanding values provided via flags, envvars, and --config flags, expanding them on the users behalf.
pkg/config/types/default_config.go
Outdated
func DefaultDataDir() string { | ||
// this method runs before root.go, so we set the level to info for these calls, then return it to previous value | ||
currentLevel := zerolog.GlobalLevel() | ||
zerolog.SetGlobalLevel(zerolog.InfoLevel) | ||
defer zerolog.SetGlobalLevel(currentLevel) | ||
|
||
// Check if the BACALHAU_DIR environment variable is set |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
my concern with the warning logs you introduced here is there is no way for the user to avoid them, except by setting the env variable. If they run --data-dir <path>
they will still get warnings if they don't have home directory which doesn't make sense.
This method is about figuring out the default path, and it shouldn't fail if it couldn't. The config validation should fail later
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We only warn if the user set the env var to an empty value, a rare case, but possible. Would you prefer we remove the log line?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not true. You warn when you can't find the home directory and when you can't figure out a default repo
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
My previous comment was in the context of the line you commented on. True, for other cases we log a warning: If we cannot determine a users home directory, and thus the default location where bacalhau initializes/opens it repo this seems worth a warning. But it sounds like you rather we remove all these log lines, is that correct?
1. only --data-dir and -c data.dir allow relative paths 2. all configurations allow ~ tilda paths 3. centralize expanding the tilda path in one place 4. validate once after resolving the configuration that the resolved path is abs and not empty 5. simplify default path logic
Testing done
Default
Env Var
Cli Flag
Config Value
Config File
Config Set
|
Running the command
bacalhau --repo=~/some/path <command>
results in a config with aDataDir
value of~/some/path
rather than an expanded path, e.g./home/$USER/some/path
. We must handle tilda paths gracefully by expanding them to an absolute path before accepting them as configuration.Further, we must either dis-allow relative paths, or expand them to absolute paths, before accepting them as configuration. In this PR we have chosen the latter. This is because the
DataDir
field is used as the root path for our hard-coded config paths, and said path must be absolute when provided to docker as a volume mount point, else we get errors like: #4506This PR adds path expansion when
DataDir
is a tilda path, and validation to enforcesDataDir
be an absolute path before accepting it as configuration.Below are some examples of valid, and invalid paths - enforced via unit tests:
Valid
Invalid
--repo
and--data-dir
flag don't expand paths #4484