[investigate/prototype] speech_to_text_generation_service approach 2: Explore AWS SageMaker #4

Open
3 tasks
Tracked by #1 ...
jmartin-sul opened this issue Sep 12, 2024 · 2 comments

Comments

@jmartin-sul
Member

jmartin-sul commented Sep 12, 2024

Keep in mind the guidelines at the top of #1, in particular the ones about ultimately using Terraform for deployment and about access to model configuration.

Note: some naming might change depending on terminology decisions; see https://github.com/orgs/sul-dlss/projects/65/views/1?pane=issue&itemId=79627337

SageMaker is "a fully managed machine learning service". Whisper is a model that can be used with it. see:

Questions to determine whether this is a viable option

  • Is it available via Cardinal Cloud? Is there a clear setting or API parameter or similar that we trust will prevent SageMaker from using our data as training data?
  • Does it expose the Whisper configuration parameters we care about?
  • What does it cost relative to, e.g., a Docker container that we run in ECS? Is it acceptably affordable?
jmartin-sul changed the title: Possible approach 2: Explore ("a fully managed machine learning service"). Whisper is one of its available models. → transcript_generation_service 2: Explore ("a fully managed machine learning service"). Whisper is one of its available models. (Sep 12, 2024)
jmartin-sul changed the title: transcript_generation_service 2: Explore ("a fully managed machine learning service"). Whisper is one of its available models. → transcript_generation_service approach 2: Explore ("a fully managed machine learning service"). Whisper is one of its available models. (Sep 12, 2024)
jmartin-sul changed the title: transcript_generation_service approach 2: Explore ("a fully managed machine learning service"). Whisper is one of its available models. → transcript_generation_service approach 2: Explore AWS SageMaker (Sep 12, 2024)
jmartin-sul changed the title: transcript_generation_service approach 2: Explore AWS SageMaker → [investigate/prototype] transcript_generation_service approach 2: Explore AWS SageMaker (Sep 12, 2024)
peetucket self-assigned this (Sep 12, 2024)
jmartin-sul changed the title: [investigate/prototype] transcript_generation_service approach 2: Explore AWS SageMaker → [investigate/prototype] speech_to_text_generation_service approach 2: Explore AWS SageMaker (Sep 13, 2024)
peetucket removed their assignment (Sep 18, 2024)
@jmartin-sul
Member Author

blocked till we can get permissions issues straightened out

@jmartin-sul
Member Author

SageMaker may be more for building and tuning models, and less for deploying them, but we'd like to confirm this.

This was blocked due to AWS permissions issues, but we think it is actionable now.
