Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

DRY/Refactor hard coded config file keys #1646

Open
dliappis opened this issue Jan 4, 2023 · 8 comments
Open

DRY/Refactor hard coded config file keys #1646

dliappis opened this issue Jan 4, 2023 · 8 comments
Labels
good first issue Small, contained changes that are good for newcomers

Comments

@dliappis
Copy link
Contributor

dliappis commented Jan 4, 2023

The motivation of this issue is to prevent bugs like #1645.

The Config object uses various keys for storing configuration properties. Currently many of those get populated during argument parsing (e.g.

rally/esrally/rally.py

Lines 1024 to 1029 in 4c7141a

cfg.add(config.Scope.applicationOverride, "system", "list.config.option", args.configuration)
cfg.add(config.Scope.applicationOverride, "system", "list.max_results", args.limit)
cfg.add(config.Scope.applicationOverride, "system", "admin.track", args.track)
cfg.add(config.Scope.applicationOverride, "system", "list.races.benchmark_name", args.benchmark_name)
cfg.add(config.Scope.applicationOverride, "system", "list.from_date", args.from_date)
cfg.add(config.Scope.applicationOverride, "system", "list.races.to_date", args.to_date)
), and the same keys are referenced, again as strings, in other code locations e.g. in metrics.py.

This approach is brittle and error prone. We should DRY things up and instead reference the config options via meaningful variable names in some module (e.g. config.py) such that a typo becomes a real error.

@dliappis dliappis added the good first issue Small, contained changes that are good for newcomers label Jan 4, 2023
@MadhuMPandurangi
Copy link

Hi @dliappis, I'm interested in working on this issue, can you please assign me this.
Thank you

@dliappis
Copy link
Contributor Author

@MadhuMPandurangi thank you for your interest. I've assigned it to you.

@sakurai-youhei
Copy link
Member

@dliappis I'm also interested. How about this way? I feel like I may reinvent the wheel somewhere else, though.

# erally/const.py
from collections import UserDict


class Node(UserDict):
    def __init__(self):
        super().__init__(self._iterattrs())

    def _iterattrs(self):
        attributes = vars(type(self))
        for k in attributes:
            if k.startswith('_'):
                continue
            yield k, attributes[k]


class Section(Node):
    def _iterattrs(self):
        for _, v in super()._iterattrs():
            yield v[1], v


class SectionA(Section):
    _section = "section_a"
    snake_case_key = (_section, "snake_case_key")
    dot_notation_key = (_section, "dot.notation.key")


class Root(Node):
    section_a: SectionA = SectionA()


CONFIG_PATH = Root()

With the above, (section, key) will be accessible like this:

from esrally.const import CONFIG_PATH as CFG

CFG.section_a.snake_case_key
CFG.section_a.dot_notation_key
CFG.section_a["dot.notation.key"]

So, the example code will be refactored in the following with some bonuses, such as nice code completion, pprint capability thanks to an inheritance from dict, the same line length between before and after the change, etc. In addition to those, if you prefer, we can capitalize sections and keys (or inject more magics for auto-conversion) to be able to write like *CFG.SYSTEM.LIST_CONFIG_OPTION.

 cfg.add(config.Scope.applicationOverride, *CFG.system.list_config_option, args.configuration) 
 cfg.add(config.Scope.applicationOverride, *CFG.system.list_max_results, args.limit) 
 cfg.add(config.Scope.applicationOverride, *CFG.system.admin_track, args.track) 
 cfg.add(config.Scope.applicationOverride, *CFG.system.list_races_benchmark_name, args.benchmark_name) 
 cfg.add(config.Scope.applicationOverride, *CFG.system.list_from_date, args.from_date) 
 cfg.add(config.Scope.applicationOverride, *CFG.system.list_races_to_date, args.to_date) 
--- a.py        2023-10-26 19:08:49.359047597 +0900
+++ b.py        2023-10-26 19:09:01.960758334 +0900
@@ -1 +1 @@
- cfg.add(config.Scope.applicationOverride, "system", "list.config.option", args.configuration)
+ cfg.add(config.Scope.applicationOverride, *CFG.system.list_config_option, args.configuration)

There'd be other ways, like using dataclass or some non-standard libraries, but I thought this is probably good enough not to overkill.

@dliappis
Copy link
Contributor Author

cc @elastic/es-perf

@gbanasiak
Copy link
Contributor

gbanasiak commented Oct 27, 2023

@sakurai-youhei Many thanks for your interest in making Rally better, feel free to assign this issue to yourself and contribute.

I've looked at the proposal based on UserDict. My understanding is the benefit of UserDict is the following usage: CFG.section_a["dot.notation.key"] but is this a requirement at all? I think all we need is to refer to config keys as class attributes, so perhaps there's a room for making it simpler? To be honest it took me a while to figure out what your code is doing.

I like the idea of returning tuples and unpacking them in cfg API calls. I don't like the _section repetition in the section class code.

I was thinking about something as simple as this:

class Section:
    def __init__(self, name):
        self._name = name

    def __getattribute__(self, item):
        # object.__getattribute__ required instead of self.__getattribute__ to prevent an infinite loop
        return object.__getattribute__(self, "_name"), object.__getattribute__(self, item)


class System(Section):
    def __init__(self):
        super().__init__("system")

    # description ...
    list_config_option = "list.config.option"
    # description ...
    list_max_results = "list.max_results"


class Root:
    system: System = System()


CFG = Root()
print(CFG.system.list_config_option)
print(CFG.system.list_max_results)
#print(CFG.system.does_not_exist) <--- error

with a simple path to . removal (breaking change with config file version increment):

class Section:
    def __init__(self, name):
        self._name = name

    def __getattribute__(self, item):
        # object.__getattribute__ required instead of self.__getattribute__ to prevent an infinite loop
        return object.__getattribute__(self, "_name"), item


class System(Section):
    def __init__(self):
        super().__init__("system")

    # description ...
    list_config_option = None
    # description ...
    list_max_results = None

I'm also curious what was your idea with the dataclass?

To move forward I would suggest to start working on a single section (the smallest, the better) to see how it goes.

Edit: Changed super() to object in __getattribute__ and added a comment.

@sakurai-youhei
Copy link
Member

@gbanasiak Thank you for your comment.

I have played my idea, your idea, dataclass, etc., more to see what would be the best, and I find enum can be used to express dot-notation straightforwardly like the PR #1795.

I hope you like it, as I have also incorporated the concept of avoiding repetition and giving descriptions.

@sakurai-youhei
Copy link
Member

P.S. I have come to the conclusion that the literal check like this PR #1798 should come first before DRY, refactoring, auto-completion, etc.

https://github.com/elastic/rally/actions/runs/6773039463/job/18406946647?pr=1798#step:5:202
image

@gbanasiak
Copy link
Contributor

@sakurai-youhei I like this approach. I'll leave more detailed comments in each PR.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
good first issue Small, contained changes that are good for newcomers
Projects
None yet
Development

Successfully merging a pull request may close this issue.

4 participants