-
Notifications
You must be signed in to change notification settings - Fork 10
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
WIP Add reports generation as ActiveJob #3620
base: main
Are you sure you want to change the base?
Conversation
This is a first pass at creating the era audit reports through an active job. This first approach seems too long as it generates all the reports in the same job. We also iterate through all Item and Thesis multiple times. Should we split the report generation through multiple jobs?
Gemfile
Outdated
@@ -38,6 +38,7 @@ gem 'bcrypt', '>= 3.1.13' | |||
gem 'omniauth', '~> 2.1' | |||
gem 'omniauth-rails_csrf_protection', '~> 1.0' | |||
gem 'omniauth-saml', '~> 2.1' | |||
gem 'omniauth_openid_connect', '~> 0.8' |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This should not have been added, I will remove the changes in the Gemfile at a later push.
# Report 1: Metadata only records | ||
|
||
def report_metadata_only_records | ||
[Item, Thesis].each do |klass| |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We see this iteration for most reports, which is not ideal.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I agree this repetition feels smelly/inefficient... but I'm not sure we do anything about it at this point.
I was thinking about pulling out my Design Patterns book because maybe there is a pattern (Strategy, Composite?) that makes sense to solve this problem.
I was also thinking about seeing if reek had feedback that could improve or DRY this code. My second thought was that maybe we don't want to optimize for computational efficiencies and preference human readbility in this case. Here's what reek had to say if you're interested.
$ reek app/jobs/generate_reports_job.rb
Inspecting 1 file(s):
S
app/jobs/generate_reports_job.rb -- 28 warnings:
[46, 46]:DuplicateMethodCall: GenerateReportsJob#report_metadata_only_records calls 'klass.rdf_annotation_for_attr(key)' 2 times [https://github.com/troessner/reek/blob/v6.1.4/docs/Duplicate-Method-Call.md]
[132, 134]:DuplicateMethodCall: GenerateReportsJob#report_multifile_records calls 'entity.files' 2 times [https://github.com/troessner/reek/blob/v6.1.4/docs/Duplicate-Method-Call.md]
[126, 126]:DuplicateMethodCall: GenerateReportsJob#report_multifile_records calls 'klass.rdf_annotation_for_attr(key)' 2 times [https://github.com/troessner/reek/blob/v6.1.4/docs/Duplicate-Method-Call.md]
[97, 97]:DuplicateMethodCall: GenerateReportsJob#report_records_with_compressed_files calls 'klass.rdf_annotation_for_attr(key)' 2 times [https://github.com/troessner/reek/blob/v6.1.4/docs/Duplicate-Method-Call.md]
[68, 68, 69, 75]:FeatureEnvy: GenerateReportsJob#report_file_types refers to 'entity_file_types' more than self (maybe move it to another class?) [https://github.com/troessner/reek/blob/v6.1.4/docs/Feature-Envy.md]
[43, 44, 46, 46, 50]:FeatureEnvy: GenerateReportsJob#report_metadata_only_records refers to 'klass' more than self (maybe move it to another class?) [https://github.com/troessner/reek/blob/v6.1.4/docs/Feature-Envy.md]
[123, 124, 126, 126, 131]:FeatureEnvy: GenerateReportsJob#report_multifile_records refers to 'klass' more than self (maybe move it to another class?) [https://github.com/troessner/reek/blob/v6.1.4/docs/Feature-Envy.md]
[94, 95, 97, 97, 103]:FeatureEnvy: GenerateReportsJob#report_records_with_compressed_files refers to 'klass' more than self (maybe move it to another class?) [https://github.com/troessner/reek/blob/v6.1.4/docs/Feature-Envy.md]
[1]:InstanceVariableAssumption: GenerateReportsJob assumes too much for instance variable '@root_directory' [https://github.com/troessner/reek/blob/v6.1.4/docs/Instance-Variable-Assumption.md]
[1]:InstanceVariableAssumption: GenerateReportsJob assumes too much for instance variable '@time_of_start' [https://github.com/troessner/reek/blob/v6.1.4/docs/Instance-Variable-Assumption.md]
[1]:IrresponsibleModule: GenerateReportsJob has no descriptive comment [https://github.com/troessner/reek/blob/v6.1.4/docs/Irresponsible-Module.md]
[75]:NestedIterators: GenerateReportsJob#report_file_types contains iterators nested 2 deep [https://github.com/troessner/reek/blob/v6.1.4/docs/Nested-Iterators.md]
[66]:NestedIterators: GenerateReportsJob#report_file_types contains iterators nested 3 deep [https://github.com/troessner/reek/blob/v6.1.4/docs/Nested-Iterators.md]
[45]:NestedIterators: GenerateReportsJob#report_metadata_only_records contains iterators nested 2 deep [https://github.com/troessner/reek/blob/v6.1.4/docs/Nested-Iterators.md]
[50]:NestedIterators: GenerateReportsJob#report_metadata_only_records contains iterators nested 3 deep [https://github.com/troessner/reek/blob/v6.1.4/docs/Nested-Iterators.md]
[125]:NestedIterators: GenerateReportsJob#report_multifile_records contains iterators nested 2 deep [https://github.com/troessner/reek/blob/v6.1.4/docs/Nested-Iterators.md]
[134]:NestedIterators: GenerateReportsJob#report_multifile_records contains iterators nested 4 deep [https://github.com/troessner/reek/blob/v6.1.4/docs/Nested-Iterators.md]
[96]:NestedIterators: GenerateReportsJob#report_records_with_compressed_files contains iterators nested 2 deep [https://github.com/troessner/reek/blob/v6.1.4/docs/Nested-Iterators.md]
[106]:NestedIterators: GenerateReportsJob#report_records_with_compressed_files contains iterators nested 4 deep [https://github.com/troessner/reek/blob/v6.1.4/docs/Nested-Iterators.md]
[46, 97, 126]:RepeatedConditional: GenerateReportsJob tests 'klass.rdf_annotation_for_attr(key).present?' at least 3 times [https://github.com/troessner/reek/blob/v6.1.4/docs/Repeated-Conditional.md]
[59]:TooManyStatements: GenerateReportsJob#report_file_types has approx 11 statements [https://github.com/troessner/reek/blob/v6.1.4/docs/Too-Many-Statements.md]
[41]:TooManyStatements: GenerateReportsJob#report_metadata_only_records has approx 10 statements [https://github.com/troessner/reek/blob/v6.1.4/docs/Too-Many-Statements.md]
[121]:TooManyStatements: GenerateReportsJob#report_multifile_records has approx 13 statements [https://github.com/troessner/reek/blob/v6.1.4/docs/Too-Many-Statements.md]
[82]:TooManyStatements: GenerateReportsJob#report_records_with_compressed_files has approx 15 statements [https://github.com/troessner/reek/blob/v6.1.4/docs/Too-Many-Statements.md]
[5]:UnusedParameters: GenerateReportsJob#perform has unused parameter 'args' [https://github.com/troessner/reek/blob/v6.1.4/docs/Unused-Parameters.md]
[27]:UtilityFunction: GenerateReportsJob#get_collection_url doesn't depend on instance state (maybe move it to another class?) [https://github.com/troessner/reek/blob/v6.1.4/docs/Utility-Function.md]
[22]:UtilityFunction: GenerateReportsJob#get_community_url doesn't depend on instance state (maybe move it to another class?) [https://github.com/troessner/reek/blob/v6.1.4/docs/Utility-Function.md]
[17]:UtilityFunction: GenerateReportsJob#get_entity_url doesn't depend on instance state (maybe move it to another class?) [https://github.com/troessner/reek/blob/v6.1.4/docs/Utility-Function.md]
I think the answer depends on how we see it being used. If we anticipate the need to only run the whole set, this approach makes sense. If we anticipate rerunning specific reports, then breaking the reports into different jobs should be the priority. Testing and error handling/recovery might be simpler in separate jobs. I think we don't overthink or prematurely optimize it too much at this point. I'd recommend continuing to convert the scripts to a single job. We can always iterate if we find it's not working well based on lived experience. If it works or we only use it once, then we've saved ourselves a bunch of time. |
Context
We are generating reports for the entities inside era and we have been doing this with external scripts. We are interested in automating them.
Having the report generation inside the code base will also detect changes that break these steps.
What's New
At this point I am looking for feedback on weather to break up the report generation into different jobs, keep as is in a single large job, or other options.