Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Changes required to work with latest htsjdk #107

Merged
merged 1 commit into from
Aug 1, 2019

Conversation

tomwhite
Copy link
Member

This is in preparation for the next htsjdk release (and can't be merged until then) which has several changes to the CRAM API. It passes all tests locally when run against htsjdk master.

cc @lbergelson @cmnbroad

final Container container = containerFactory.buildContainer(cramRecords);
for (final Slice slice : container.slices) {
final Container container = containerFactory.buildContainer(cramRecords, offset);
for (final Slice slice : container.getSlices()) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@tomwhite I think you can remove CRAMContainerStreamWriter2 entirely, and use CRAMContainerStreamWriter` directly now that samtools/htsjdk#1351 is merged ?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, of course! I'd forgotten about that change...

@tomwhite tomwhite force-pushed the latest-htsjdk-changes branch 2 times, most recently from d1aa585 to ef7dcc7 Compare July 2, 2019 09:42
Copy link
Contributor

@cmnbroad cmnbroad left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

One request, otherwise looks good to me.

@@ -125,6 +125,7 @@ public void registerClasses(final Kryo kryo) {
kryo.register(htsjdk.variant.variantcontext.VariantContext.Type.class);

// htsjdk.variant.vcf
kryo.register(htsjdk.variant.vcf.VCFAltHeaderLine.class);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@tomwhite The htsjdk PR that introduced this new header line class also added a couple of others that should probably be registered for serialization as well: VCFSampleHeaderLine, VCFMetaHeaderLine, VCFPedigreeHeaderLine.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Are those VCF header line changes binary compatible with htsjdk 2.18.x? At some point we need to start thinking about how htsjdk and Disq releases synchronize with each other.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@huermh Yes, the new header line classes are VCFHeaderLine-derived, so they should be binary compatible with previous VCFHeader/VCFHeaderLine consumers.

Remove CRAMContainerStreamWriter2 entirely
Use htsjdk snapshot for testing
Register VCFAltHeaderLine
Register more header line classes

Co-authored-by: Tom White <[email protected]>
Co-authored-by: Louis Bergelson <[email protected]>
@lbergelson lbergelson changed the title Changes required to work with latest htsjdk (not yet released) Changes required to work with latest htsjdk Jul 29, 2019
@lbergelson
Copy link
Contributor

@heuermh Could you take a look at the final form of this? I'd like to do a release with these changes in the near future if there aren't any objections.

@heuermh
Copy link
Contributor

heuermh commented Jul 29, 2019

@lbergelson Sure, I'll run some testing against this branch this afternoon. I ran into #112 and #116 running benchmarks in preparing for merging Disq into ADAM, I'll also try to see if there is anything actionable on those.

@heuermh
Copy link
Contributor

heuermh commented Jul 31, 2019

Bumping htsjdk to 2.20.0 and disq to 0.3.3-SNAPSHOT from this branch causes this ADAM unit test to fail

- convert alignments to reads *** FAILED ***
  "[6472783]:false" did not equal "[simread:1:101556378]:false"

AlignmentRecordDatasetSuite.scala:1871

Perhaps the way the sequence name is parsed has changed since 2.19.0?

@lbergelson
Copy link
Contributor

Strange... I don't remember any deliberate changes to how we parse sequence names.

@heuermh
Copy link
Contributor

heuermh commented Jul 31, 2019

Other than that, I haven't found any runtime issues in my benchmarking scripts, so I'm a tentative +1

@lbergelson
Copy link
Contributor

@heuermh I'm going to merge this and do release then so we can do a downstream gatk release on Tuesday. Let me know if you have any progress on that weird read name issue. Something seems very odd there.

@heuermh
Copy link
Contributor

heuermh commented Aug 1, 2019

Sounds good to me

@lbergelson lbergelson merged commit 7370ba6 into disq-bio:master Aug 1, 2019
@heuermh
Copy link
Contributor

heuermh commented Aug 10, 2019

@lbergelson I've created an ADAM pull request at bigdatagenomics/adam#2195 that demonstrates the read name issue with htsjdk versions 2.20.0 and 2.20.1. Very strange that only the first read name is truncated.

@heuermh heuermh added this to the 0.4.0 milestone Sep 9, 2019
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants