Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

WIP Add reports generation as ActiveJob #3620

Open
wants to merge 3 commits into
base: main
Choose a base branch
from
Open

WIP Add reports generation as ActiveJob #3620

wants to merge 3 commits into from

Conversation

lagoan
Copy link
Contributor

@lagoan lagoan commented Oct 30, 2024

Context

We are generating reports for the entities inside era and we have been doing this with external scripts. We are interested in automating them.

Having the report generation inside the code base will also detect changes that break these steps.

What's New

At this point I am looking for feedback on weather to break up the report generation into different jobs, keep as is in a single large job, or other options.

This is a first pass at creating the era audit reports through an active
job.

This first approach seems too long as it generates all the reports in
the same job. We also iterate through all Item and Thesis multiple
times.

Should we split the report generation through multiple jobs?
Gemfile Outdated
@@ -38,6 +38,7 @@ gem 'bcrypt', '>= 3.1.13'
gem 'omniauth', '~> 2.1'
gem 'omniauth-rails_csrf_protection', '~> 1.0'
gem 'omniauth-saml', '~> 2.1'
gem 'omniauth_openid_connect', '~> 0.8'
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This should not have been added, I will remove the changes in the Gemfile at a later push.

# Report 1: Metadata only records

def report_metadata_only_records
[Item, Thesis].each do |klass|
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We see this iteration for most reports, which is not ideal.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree this repetition feels smelly/inefficient... but I'm not sure we do anything about it at this point.

I was thinking about pulling out my Design Patterns book because maybe there is a pattern (Strategy, Composite?) that makes sense to solve this problem.

I was also thinking about seeing if reek had feedback that could improve or DRY this code. My second thought was that maybe we don't want to optimize for computational efficiencies and preference human readbility in this case. Here's what reek had to say if you're interested.

$ reek app/jobs/generate_reports_job.rb 
Inspecting 1 file(s):
S

app/jobs/generate_reports_job.rb -- 28 warnings:
  [46, 46]:DuplicateMethodCall: GenerateReportsJob#report_metadata_only_records calls 'klass.rdf_annotation_for_attr(key)' 2 times [https://github.com/troessner/reek/blob/v6.1.4/docs/Duplicate-Method-Call.md]
  [132, 134]:DuplicateMethodCall: GenerateReportsJob#report_multifile_records calls 'entity.files' 2 times [https://github.com/troessner/reek/blob/v6.1.4/docs/Duplicate-Method-Call.md]
  [126, 126]:DuplicateMethodCall: GenerateReportsJob#report_multifile_records calls 'klass.rdf_annotation_for_attr(key)' 2 times [https://github.com/troessner/reek/blob/v6.1.4/docs/Duplicate-Method-Call.md]
  [97, 97]:DuplicateMethodCall: GenerateReportsJob#report_records_with_compressed_files calls 'klass.rdf_annotation_for_attr(key)' 2 times [https://github.com/troessner/reek/blob/v6.1.4/docs/Duplicate-Method-Call.md]
  [68, 68, 69, 75]:FeatureEnvy: GenerateReportsJob#report_file_types refers to 'entity_file_types' more than self (maybe move it to another class?) [https://github.com/troessner/reek/blob/v6.1.4/docs/Feature-Envy.md]
  [43, 44, 46, 46, 50]:FeatureEnvy: GenerateReportsJob#report_metadata_only_records refers to 'klass' more than self (maybe move it to another class?) [https://github.com/troessner/reek/blob/v6.1.4/docs/Feature-Envy.md]
  [123, 124, 126, 126, 131]:FeatureEnvy: GenerateReportsJob#report_multifile_records refers to 'klass' more than self (maybe move it to another class?) [https://github.com/troessner/reek/blob/v6.1.4/docs/Feature-Envy.md]
  [94, 95, 97, 97, 103]:FeatureEnvy: GenerateReportsJob#report_records_with_compressed_files refers to 'klass' more than self (maybe move it to another class?) [https://github.com/troessner/reek/blob/v6.1.4/docs/Feature-Envy.md]
  [1]:InstanceVariableAssumption: GenerateReportsJob assumes too much for instance variable '@root_directory' [https://github.com/troessner/reek/blob/v6.1.4/docs/Instance-Variable-Assumption.md]
  [1]:InstanceVariableAssumption: GenerateReportsJob assumes too much for instance variable '@time_of_start' [https://github.com/troessner/reek/blob/v6.1.4/docs/Instance-Variable-Assumption.md]
  [1]:IrresponsibleModule: GenerateReportsJob has no descriptive comment [https://github.com/troessner/reek/blob/v6.1.4/docs/Irresponsible-Module.md]
  [75]:NestedIterators: GenerateReportsJob#report_file_types contains iterators nested 2 deep [https://github.com/troessner/reek/blob/v6.1.4/docs/Nested-Iterators.md]
  [66]:NestedIterators: GenerateReportsJob#report_file_types contains iterators nested 3 deep [https://github.com/troessner/reek/blob/v6.1.4/docs/Nested-Iterators.md]
  [45]:NestedIterators: GenerateReportsJob#report_metadata_only_records contains iterators nested 2 deep [https://github.com/troessner/reek/blob/v6.1.4/docs/Nested-Iterators.md]
  [50]:NestedIterators: GenerateReportsJob#report_metadata_only_records contains iterators nested 3 deep [https://github.com/troessner/reek/blob/v6.1.4/docs/Nested-Iterators.md]
  [125]:NestedIterators: GenerateReportsJob#report_multifile_records contains iterators nested 2 deep [https://github.com/troessner/reek/blob/v6.1.4/docs/Nested-Iterators.md]
  [134]:NestedIterators: GenerateReportsJob#report_multifile_records contains iterators nested 4 deep [https://github.com/troessner/reek/blob/v6.1.4/docs/Nested-Iterators.md]
  [96]:NestedIterators: GenerateReportsJob#report_records_with_compressed_files contains iterators nested 2 deep [https://github.com/troessner/reek/blob/v6.1.4/docs/Nested-Iterators.md]
  [106]:NestedIterators: GenerateReportsJob#report_records_with_compressed_files contains iterators nested 4 deep [https://github.com/troessner/reek/blob/v6.1.4/docs/Nested-Iterators.md]
  [46, 97, 126]:RepeatedConditional: GenerateReportsJob tests 'klass.rdf_annotation_for_attr(key).present?' at least 3 times [https://github.com/troessner/reek/blob/v6.1.4/docs/Repeated-Conditional.md]
  [59]:TooManyStatements: GenerateReportsJob#report_file_types has approx 11 statements [https://github.com/troessner/reek/blob/v6.1.4/docs/Too-Many-Statements.md]
  [41]:TooManyStatements: GenerateReportsJob#report_metadata_only_records has approx 10 statements [https://github.com/troessner/reek/blob/v6.1.4/docs/Too-Many-Statements.md]
  [121]:TooManyStatements: GenerateReportsJob#report_multifile_records has approx 13 statements [https://github.com/troessner/reek/blob/v6.1.4/docs/Too-Many-Statements.md]
  [82]:TooManyStatements: GenerateReportsJob#report_records_with_compressed_files has approx 15 statements [https://github.com/troessner/reek/blob/v6.1.4/docs/Too-Many-Statements.md]
  [5]:UnusedParameters: GenerateReportsJob#perform has unused parameter 'args' [https://github.com/troessner/reek/blob/v6.1.4/docs/Unused-Parameters.md]
  [27]:UtilityFunction: GenerateReportsJob#get_collection_url doesn't depend on instance state (maybe move it to another class?) [https://github.com/troessner/reek/blob/v6.1.4/docs/Utility-Function.md]
  [22]:UtilityFunction: GenerateReportsJob#get_community_url doesn't depend on instance state (maybe move it to another class?) [https://github.com/troessner/reek/blob/v6.1.4/docs/Utility-Function.md]
  [17]:UtilityFunction: GenerateReportsJob#get_entity_url doesn't depend on instance state (maybe move it to another class?) [https://github.com/troessner/reek/blob/v6.1.4/docs/Utility-Function.md]

@pgwillia
Copy link
Member

At this point I am looking for feedback on weather to break up the report generation into different jobs, keep as is in a single large job, or other options.

I think the answer depends on how we see it being used. If we anticipate the need to only run the whole set, this approach makes sense. If we anticipate rerunning specific reports, then breaking the reports into different jobs should be the priority. Testing and error handling/recovery might be simpler in separate jobs.

I think we don't overthink or prematurely optimize it too much at this point. I'd recommend continuing to convert the scripts to a single job. We can always iterate if we find it's not working well based on lived experience. If it works or we only use it once, then we've saved ourselves a bunch of time.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants