Large File Support: Use I/O, not filepath wherever possible #162

Open
atz opened this issue Jun 5, 2017 · 0 comments

atz commented Jun 5, 2017

It is costly to pull down files and write them to disk unnecessarily. For sufficiently large files, this will break the ingest/derivative pipeline. This is made worse by attempts at job parallelization, where each job (potentially serviced on a different worker box) incurs this cost. But it is possible to avoid this problem.

Even though we shell out to many of the non-Ruby derivative processors, we should avoid forcing the input (and ideally the output) to be literal files on the filesystem when there is no legitimate need for that.
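As a rough sketch of what this could look like for a shelled-out processor, the input can be streamed to the command's stdin and its stdout streamed to a destination IO, with no temp file on either side. The helper name `pipe_through` is hypothetical and not part of any existing codebase, and the sketch assumes the external command can read from stdin and write to stdout.

```ruby
require 'open3'
require 'stringio'

# Hypothetical helper (not an existing API): stream `input` (any IO-like
# object) into `command`'s stdin and copy its stdout into `output`,
# without writing either side to a file on disk.
def pipe_through(command, input, output)
  Open3.popen2(*command) do |stdin, stdout, wait_thr|
    # Drain stdout in a separate thread so that neither pipe fills up
    # and blocks while we are still feeding stdin.
    reader = Thread.new { IO.copy_stream(stdout, output) }
    begin
      IO.copy_stream(input, stdin)
    rescue Errno::EPIPE
      # The command stopped reading before the input ended. Whether that
      # is acceptable depends on the processor; here it is tolerated.
    ensure
      stdin.close
    end
    reader.join
    wait_thr.value # Process::Status of the external command
  end
end
```

Any object that responds to `read` can serve as `input` here, so the same call would work for a local file handle, a `StringIO`, or a streaming download, assuming the storage client exposes one.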

This also allows optimizations for processors that don't use the bulk of a large file (e.g., only the metadata and first 2 minutes of, say, a 6-hour video). They can read until satisfied and then reset/close the IO. Most of the GBs are never pulled down, never put in memory, and never written to disk.
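A minimal sketch of the "read until satisfied" idea, assuming the source is exposed as an IO-like object that supports `read(length)` and `close`. The helper name `read_until_satisfied` and its parameters are hypothetical. The caller's block decides when it has seen enough, and the IO is closed without the rest being read.

```ruby
require 'stringio'

# Hypothetical helper (not an existing API): read `io` in chunks, up to
# `limit` bytes in total, and stop as soon as the block returns a truthy
# value. Returns the number of bytes actually consumed.
def read_until_satisfied(io, limit:, chunk_size: 16 * 1024)
  consumed = 0
  while consumed < limit
    chunk = io.read([chunk_size, limit - consumed].min)
    break if chunk.nil? || chunk.empty? # end of stream
    consumed += chunk.bytesize
    break if yield(chunk)               # caller has what it needs
  end
  consumed
ensure
  # Close without draining, so the remaining bytes are never read.
  io.close
end
```

A metadata extractor, for example, could accumulate chunks in the block and return true once the header parses. Whether the unread remainder is truly never transferred depends on the underlying source (for instance, a ranged or streaming download).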

With a cloud-based platform like Hyku, it is quite plausible that this derivatives code is the tightest bottleneck in supporting large files.
