Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

(torchx/scheduler)(aws) add instance type for aws_batch_scheduler multinode jobs #781

Merged
merged 1 commit into from
Nov 3, 2023

Conversation

clumsy
Copy link
Contributor

@clumsy clumsy commented Oct 19, 2023

add instance type for aws_batch_scheduler multinode jobs

Test plan:

  • added two unit tests: one for singlenode and one for multinode job

add instance type for aws_batch_scheduler multinode jobs
@facebook-github-bot facebook-github-bot added the CLA Signed This label is managed by the Facebook bot. Authors need to sign the CLA before a PR can be reviewed. label Oct 19, 2023
@codecov
Copy link

codecov bot commented Oct 19, 2023

Codecov Report

Merging #781 (09ba107) into main (a711634) will decrease coverage by 0.01%.
Report is 1 commits behind head on main.
The diff coverage is 97.36%.

@@            Coverage Diff             @@
##             main     #781      +/-   ##
==========================================
- Coverage   92.88%   92.88%   -0.01%     
==========================================
  Files          96       96              
  Lines        6099     6112      +13     
==========================================
+ Hits         5665     5677      +12     
- Misses        434      435       +1     
Flag Coverage Δ
unittests 92.88% <97.36%> (-0.01%) ⬇️

Flags with carried forward coverage won't be shown. Click here to find out more.

Files Coverage Δ
torchx/schedulers/aws_batch_scheduler.py 90.44% <100.00%> (+0.13%) ⬆️
torchx/specs/named_resources_aws.py 98.50% <97.05%> (-1.50%) ⬇️

📣 We’re building smart automated test selection to slash your CI/CD build times. Learn more

@clumsy clumsy force-pushed the feature/780 branch 2 times, most recently from c8b83c5 to d5c4f7c Compare October 19, 2023 19:03
@clumsy clumsy force-pushed the feature/780 branch 4 times, most recently from 69a4653 to 744b37d Compare October 23, 2023 21:29
@clumsy
Copy link
Contributor Author

clumsy commented Oct 23, 2023

@kiukchung please find an updated commit based on our discussion.

@clumsy clumsy force-pushed the feature/780 branch 2 times, most recently from db3d484 to 09ba107 Compare October 24, 2023 17:20
@clumsy
Copy link
Contributor Author

clumsy commented Oct 24, 2023

btw lint failure seems unrelated to my change, @kiukchung

@clumsy clumsy mentioned this pull request Oct 30, 2023
@schmidt-ai
Copy link
Contributor

Excited for this!

@kiukchung
Copy link
Collaborator

@clumsy let me know if you're blocked on anything here.

@kiukchung kiukchung changed the title fixes #780 (torchx/scheduler)(aws) add instance type for aws_batch_scheduler multinode jobs Nov 2, 2023
@clumsy
Copy link
Contributor Author

clumsy commented Nov 2, 2023

@kiukchung just curious if you agree it's better to introduce an AWS resource subclass to provide (self-documented) instance_type attribute instead of relying on K8S_TYPE behavior that needs to be documented separately. Asking because I'd like to use instance_type for my upcoming SageMaker scheduler contribution.
Basically If you're OK with the current PR - I'll address the failures and submit a new version.

@kiukchung
Copy link
Collaborator

kiukchung commented Nov 2, 2023

@kiukchung just curious if you agree it's better to introduce an AWS resource subclass to provide (self-documented) instance_type attribute instead of relying on K8S_TYPE behavior that needs to be documented separately. Asking because I'd like to use instance_type for my upcoming SageMaker scheduler contribution. Basically If you're OK with the current PR - I'll address the failures and submit a new version.

I'd like to avoid introducing subclasses for Resource to keep the specs API easier to serialize (e.g. to JSON, YAML, Proto). We can use K8S_TYPE (and document that this will be used by the batch and SM schedulers to set instance type params in the respective scheduler's request).

Edit: what would be useful is to have a util method that both batch and sm schedulers use that returns the aws instance type given a Resource (it would pull this info from K8S_TYPE key from the capabilities map). This method would also throw an error with a useful message when K8S_TYPE is not found, perhaps prompting the user to add one to their resource specs for self-service fix.

@clumsy
Copy link
Contributor Author

clumsy commented Nov 3, 2023

@kiukchung I've made the changes as discussed, the docs build failure does not seem to be caused by my changes.

@kiukchung
Copy link
Collaborator

LGTM thanks

@kiukchung kiukchung merged commit b1e56b2 into pytorch:main Nov 3, 2023
21 of 22 checks passed
@clumsy clumsy deleted the feature/780 branch November 3, 2023 19:43
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
CLA Signed This label is managed by the Facebook bot. Authors need to sign the CLA before a PR can be reviewed.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants