-
Notifications
You must be signed in to change notification settings - Fork 11
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Changes required to work with latest htsjdk #107
Conversation
final Container container = containerFactory.buildContainer(cramRecords); | ||
for (final Slice slice : container.slices) { | ||
final Container container = containerFactory.buildContainer(cramRecords, offset); | ||
for (final Slice slice : container.getSlices()) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@tomwhite I think you can remove CRAMContainerStreamWriter2
entirely, and use CRAMContainerStreamWriter` directly now that samtools/htsjdk#1351 is merged ?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, of course! I'd forgotten about that change...
d1aa585
to
ef7dcc7
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
One request, otherwise looks good to me.
@@ -125,6 +125,7 @@ public void registerClasses(final Kryo kryo) { | |||
kryo.register(htsjdk.variant.variantcontext.VariantContext.Type.class); | |||
|
|||
// htsjdk.variant.vcf | |||
kryo.register(htsjdk.variant.vcf.VCFAltHeaderLine.class); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@tomwhite The htsjdk PR that introduced this new header line class also added a couple of others that should probably be registered for serialization as well: VCFSampleHeaderLine, VCFMetaHeaderLine, VCFPedigreeHeaderLine.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Are those VCF header line changes binary compatible with htsjdk 2.18.x? At some point we need to start thinking about how htsjdk and Disq releases synchronize with each other.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@huermh Yes, the new header line classes are VCFHeaderLine-derived, so they should be binary compatible with previous VCFHeader/VCFHeaderLine consumers.
25e2d9c
to
6ce7fce
Compare
Remove CRAMContainerStreamWriter2 entirely Use htsjdk snapshot for testing Register VCFAltHeaderLine Register more header line classes Co-authored-by: Tom White <[email protected]> Co-authored-by: Louis Bergelson <[email protected]>
6ce7fce
to
ce2ecf8
Compare
@heuermh Could you take a look at the final form of this? I'd like to do a release with these changes in the near future if there aren't any objections. |
@lbergelson Sure, I'll run some testing against this branch this afternoon. I ran into #112 and #116 running benchmarks in preparing for merging Disq into ADAM, I'll also try to see if there is anything actionable on those. |
Bumping htsjdk to 2.20.0 and disq to 0.3.3-SNAPSHOT from this branch causes this ADAM unit test to fail
AlignmentRecordDatasetSuite.scala:1871 Perhaps the way the sequence name is parsed has changed since 2.19.0? |
Strange... I don't remember any deliberate changes to how we parse sequence names. |
Other than that, I haven't found any runtime issues in my benchmarking scripts, so I'm a tentative +1 |
@heuermh I'm going to merge this and do release then so we can do a downstream gatk release on Tuesday. Let me know if you have any progress on that weird read name issue. Something seems very odd there. |
Sounds good to me |
@lbergelson I've created an ADAM pull request at bigdatagenomics/adam#2195 that demonstrates the read name issue with htsjdk versions 2.20.0 and 2.20.1. Very strange that only the first read name is truncated. |
This is in preparation for the next htsjdk release (and can't be merged until then) which has several changes to the CRAM API. It passes all tests locally when run against htsjdk master.
cc @lbergelson @cmnbroad