Problem with analysis specific command line parameters #500
Comments
I think configuration files (JSON or YAML...) should be used instead of options.
Yes, it would be elegant, but it requires rewriting a number of things, both in the scripts and on the Java side: the input for Java would then be a file name, and we would have to parse the file, marshal it to Java objects, handle the irrelevant parameters, etc. What about documenting the parameters at the script level? Validation at the script level? These are some of the questions I can think of now. But whether we use options or a config file, I think the first step would be the separation of general and specific parameters. So for K10plus it would look like this:
general:
- schemaType: PICA
- marcFormat: PICA_NORMALIZED
- emptyLargeCollectors: true
- groupBy: 001@\$0
- groupListFile: src/main/resources/k10plus-libraries-by-unique-iln.txt
- ignorableFields: 001@,001E,001L,001U,001U,001X,001X,002V,003C,003G,003Z,008G,017N,020F,027D,031B,037I,039V,042@,046G,046T,101@,101E,101U,102D,201E,201U,202D,1...,2...
- allowableRecords: '[email protected] !~ "^L" && [email protected] !~ "^..[iktN]" && ([email protected] !~ "^.v" || 021A.a?)'
- solrForScoresUrl: http://localhost:8983/solr/k10plus_pica_grouped_validation
index:
- indexWithTokenizedField: true
- indexFieldCounts: true
I need to think over this option.
The current configuration format is a set of parameters passed to script
As we are free to choose, I'd prefer TOML. Here is an example:
# general configuration
# can be overridden by same name arguments to qa-catalogue script
# can also be read from .env file (this should then be removed when proper configuration files exist)
name = "MyCat"
mask = "*.dat"
schema = "PICA"
# some general configuration is currently set by --params
ignorableFields = ["001@","001E","001L","001U","001U","001X","001X","002V","003C","003G","003Z",
"008G","017N","020F","027D","031B","037I","039V","042@","046G","046T",
"101@","101E","101U","102D","201E","201U","202D","1...","2..."]
# additional configuration for individual analysis tasks
[validate]
indexWithTokenizedField = true
indexFieldCounts = true
In any case it's important to keep the configuration names used by
@nichtich I like this kind of configuration. The only open question I see now is the following: there is a list of TOML parsers at https://github.com/toml-lang/toml/wiki, and it includes a parser usable from bash as well: stoml. So we would have to include it as a dependency. I do not see a Linux package that can be installed easily, so we would have to fetch it from the release section of the project's GitHub page.
With the current approach of CLI scripts there is a problem. In the specific analysis shell scripts (such as ./validate) we list all the allowed parameters as short and long options. But when we run commands such as "all", we put all parameters into a TYPE_PARAMS variable. When validate parses the parameters with getopt, it removes the unknown parameter names but keeps their values. For example:
the parameters ($@) are
and after getopt parsing they become
--unknown_param3 is removed, but its value, value3, remains there!
We have two types of parameters: i) those that take a value, ii) those that don't. It is not a problem for type ii, but for type i the values of the unknown parameters end up in the file list, and that causes errors.
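The behaviour described above can be reproduced in isolation. The following is a minimal sketch, assuming the util-linux (GNU) implementation of getopt; the option names, values and file name are hypothetical and not taken from the actual scripts.

```bash
#!/usr/bin/env bash
# Minimal reproduction, assuming util-linux getopt.
# All option names and values below are hypothetical.

# Arguments as they might arrive in an analysis script via TYPE_PARAMS.
args=(--known_param1 value1 --unknown_param3 value3 file.mrc)

# This script only declares --known_param1, which takes a value.
# -q silences the "unrecognized option" message; getopt still exits
# with a non-zero status, hence the "|| true".
parsed=$(getopt -q -o '' --long known_param1: -- "${args[@]}") || true

# Everything after "--" in the output is treated as a file name.
echo "$parsed"
```

With util-linux getopt, the unknown option name disappears from the output while value3 is listed after the `--` separator, next to file.mrc, so the script would treat it as an input file.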
How to solve this issue?
I could imagine the following solutions:
./common-script does it in a limited fashion.
./common-script and in ./qa-catalogue we handle those. It is the most elegant solution, but not backward compatible: the current scripts should be revised to check the usage of TYPE_PARAMS (those that call ./common-script directly) and --params (those that call ./qa-catalogue).
getopt, allow every parameter, but handle the unknown parameters on the Java side (I am not sure if it is possible).
@nichtich What do you think?
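One way to handle parameters that belong to other analyses before getopt sees them is to filter them out together with their values. The sketch below is only an illustration under stated assumptions: the registry ALL_OPTIONS and the function filter_params are hypothetical names, not part of the existing scripts, and only the `--name value` form is handled (not `--name=value`).

```bash
#!/usr/bin/env bash
# Hypothetical registry of every option any analysis accepts.
# As in getopt, a trailing ":" marks an option that takes a value.
ALL_OPTIONS="groupBy: solrForScoresUrl: indexFieldCounts indexWithTokenizedField"

# filter_params "ALLOWED NAMES" ARGS...
# Keeps options this analysis allows, drops other registered options
# together with their values, and passes everything else through
# (file names, or unregistered options for getopt to report).
# The result is stored in the global array FILTERED.
filter_params() {
  local allowed=" $1 "
  shift
  FILTERED=()
  local arg name takes_value registered
  while [ $# -gt 0 ]; do
    arg=$1
    shift
    case $arg in
      --*) name=${arg#--} ;;
      *)   FILTERED+=("$arg"); continue ;;
    esac
    takes_value=0
    registered=0
    case " $ALL_OPTIONS " in
      *" $name: "*) registered=1; takes_value=1 ;;
      *" $name "*)  registered=1 ;;
    esac
    case $allowed in
      *" $name "*)
        FILTERED+=("$arg")
        if [ "$takes_value" -eq 1 ] && [ $# -gt 0 ]; then
          FILTERED+=("$1")
          shift
        fi
        ;;
      *)
        if [ "$registered" -eq 1 ]; then
          # Option of another analysis: drop it and its value.
          if [ "$takes_value" -eq 1 ] && [ $# -gt 0 ]; then
            shift
          fi
        else
          FILTERED+=("$arg")
        fi
        ;;
    esac
  done
}

# Usage: an analysis that only allows groupBy and indexFieldCounts.
filter_params "groupBy indexFieldCounts" \
  --groupBy '001@$0' --solrForScoresUrl http://localhost:8983/solr/x \
  --indexFieldCounts file.dat
printf '%s\n' "${FILTERED[@]}"
```

The design choice here is that knowledge about which options take a value lives in one shared place, so an individual analysis script no longer has to guess whether the token after an unknown option is a value or a file name.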