-
Notifications
You must be signed in to change notification settings - Fork 17
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
'cpgplot' input sequence not recognized #47
Comments
Marek,
Please show us some code, then we can take a look.
But I am noting that you said that when you use “asis:” your script works, presumably you are providing a string when you use “asis", like:
…-sequencea => "asis:$sequence_str"
If you do not provide a string then your script should supply a Sequence object, like:
-sequencea => $seq_object
This “asis” business is built in to EMBOSS, so you can run a command this way from the command line:
water -asequence=asis:gatc -bsequence=asis:gatc
Otherwise, the EMBOSS applications expect file names, not strings, at the command line.
Brian O.
On Feb 24, 2017, at 2:15 PM, piatekmj ***@***.***> wrote:
Session info:
perl = v5.22.0
Mac OS = 10.11.6
BioPerl-Run-1.007001
From Bio::Factory::EMBOSS when trying to provide input sequence for cpgplot as a variable, I do get an error message saying:
Died: ajResourceinNewDrcat 'AGGT ... (abbreviated for clarity) ... GGAG' failed to find drcat
This seems to be composition dependent as I do not get that for every sequence provided. It seems this error only shows up for sequences that do have CpG island identified.
Also occasionally I do get error message saying:
Uncaught exception: Formatting Overflow, raised at ajfmt.c:1309
The above only applies for some cases not for all, I only get those messages for some sequences.
When I, however provide the input sequence with asis prefix like so: sequence => "asis:$sequence" program completes with output.
As long as we are on cpgplot, I would like to also mention that with plot => 'Y' provided, I do get another error message for sequences for which CpG islands have been identified:
*** PLPLOT ERROR ***
Unable to either (1) open/find or (2) allocate memory for the font file
Program aborted
This can be turned off by providing plot => 'N' if one is not interested in graphical output.
I do have plplot installed with my package manager.
Any assistance would be appreciated.
—
You are receiving this because you are subscribed to this thread.
Reply to this email directly, view it on GitHub <#47>, or mute the thread <https://github.com/notifications/unsubscribe-auth/AAQ99FI0NeyTblkRXdAYEvrMafQg9f3sks5rfyw2gaJpZM4MLkmS>.
|
Hi Brian, The code I'm testing is as follows:
The above produces output that is different than what I obtain from any EMBOSS webserver using the same settings. The output file from webserver contains CpG islands for every fasta entry in my 5MB input file containing 12 fasta sequences, while my script outputs results only for 6 of them. I have to have the With With the The concerns I have are regarding 1) the errors/warnings related to PLPLOT; 2) why is there a difference between Could there be something wrong with the installation maybe? Thanks, Additional session info: |
Marek,
I don’t have your input file but I suspect that using a long string with returns in it is not working, as “-sequence". Try using Sequence objects, rather than strings:
use Bio::Factory::EMBOSS;
use Bio::SeqIO;
my $genome = 'genome.fa';
my $inseq = Bio::SeqIO->new(
-file => $genome,
-format => 'fasta',
);
my %assembly;
while (my $seq = $inseq->next_seq) {
$assembly{$seq->id} = $seq;
}
my $factory = new Bio::Factory::EMBOSS;
my $prog = $factory -> program('cpgplot');
foreach my $frag (keys %assembly){
print "Processing $frag\n";
my %input = (
-sequence => $assembly{$frag},
-outfile => "${frag}.cpgplot",
-window => 100,
-minlen => 200,
-minoe => 0.6,
-minpc => 50,
-plot => 'N',
);
print "$frag\tCheck 1\t";
$prog->run(\%input);
print "Check 2\n";
}
Brian O.
… On Feb 27, 2017, at 9:56 AM, piatekmj ***@***.***> wrote:
Hi Brian,
The code I'm testing is as follows:
use Bio::Factory::EMBOSS;
use Bio::SeqIO;
my $genome = 'genome.fa';
my $inseq = Bio::SeqIO->new(
-file => $genome,
-format => 'fasta',
);
my %assembly;
while (my $seq = $inseq->next_seq) {
$assembly{$seq->id} = $seq->seq;
}
my $factory = new Bio::Factory::EMBOSS;
my $prog = $factory -> program('cpgplot');
foreach my $frag (keys %assembly){
print "Processing $frag\n";
my %input = (
-sequence => "asis:$assembly{$frag}",
-outfile => "${frag}.cpgplot",
-window => 100,
-minlen => 200,
-minoe => 0.6,
-minpc => 50,
-plot => 'N',
);
print "$frag\tCheck 1\t";
$prog->run(\%input);
print "Check 2\n";
}
The above produces output that is different than what I obtain from any EMBOSS webserver using the same settings. The output file from webserver <http://www.bioinformatics.nl/cgi-bin/emboss/cpgplot> contains CpG islands for every fasta entry in my 5MB input file containing 12 fasta sequences, while my script outputs results only for 6 of them.
I have to have the -plot => 'N', switch otherwise I get errors related to PLPLOT.
With -sequence => $assembly{$frag}, I get some failures reported:
Died: ajResourceinNewDrcat 'AGGT ... (abbreviated for clarity) ... GGAG' failed to find drcat
or
Check 1 Uncaught exception: Formatting Overflow, raised at ajfmt.c:1309 Check 2
With the -sequence => "asis:$assembly{$frag}", I get no errors but results differ from a webserver even when used with the same settings.
The concerns I have are regarding 1) the errors/warnings related to PLPLOT; 2) why is there a difference between -sequence => "asis:$assembly{$frag}" and just -sequence => "$assembly{$frag}" where the variable holds just a DNA sequence 3) the output being different between a webserver and local script using the same settings using webserver1 <http://www.bioinformatics.nl/cgi-bin/emboss/cpgplot> or webserver2 <http://www.ebi.ac.uk/Tools/seqstats/emboss_cpgplot/> (EBI's cpgplot doesn't accept files larger than 1MB). (I'm not able to check what version of EMBOSS is being used by any of these two)
Could there be something wrong with the installation maybe?
Thanks,
Marek
Additional session info:
print $factory->version(); # prints 6.5.7.0
print $factory->location(); # prints local
—
You are receiving this because you commented.
Reply to this email directly, view it on GitHub <#47 (comment)>, or mute the thread <https://github.com/notifications/unsubscribe-auth/AAQ99Bqc9Uh6NYpUW6iHX5mML-xp38yrks5rguQDgaJpZM4MLkmS>.
|
Marek,
OK, I see you tried using Sequence objects. How long are the sequences in “genome.fa”?
Brian O.
… On Feb 27, 2017, at 9:56 AM, piatekmj ***@***.***> wrote:
Hi Brian,
The code I'm testing is as follows:
use Bio::Factory::EMBOSS;
use Bio::SeqIO;
my $genome = 'genome.fa';
my $inseq = Bio::SeqIO->new(
-file => $genome,
-format => 'fasta',
);
my %assembly;
while (my $seq = $inseq->next_seq) {
$assembly{$seq->id} = $seq->seq;
}
my $factory = new Bio::Factory::EMBOSS;
my $prog = $factory -> program('cpgplot');
foreach my $frag (keys %assembly){
print "Processing $frag\n";
my %input = (
-sequence => "asis:$assembly{$frag}",
-outfile => "${frag}.cpgplot",
-window => 100,
-minlen => 200,
-minoe => 0.6,
-minpc => 50,
-plot => 'N',
);
print "$frag\tCheck 1\t";
$prog->run(\%input);
print "Check 2\n";
}
The above produces output that is different than what I obtain from any EMBOSS webserver using the same settings. The output file from webserver <http://www.bioinformatics.nl/cgi-bin/emboss/cpgplot> contains CpG islands for every fasta entry in my 5MB input file containing 12 fasta sequences, while my script outputs results only for 6 of them.
I have to have the -plot => 'N', switch otherwise I get errors related to PLPLOT.
With -sequence => $assembly{$frag}, I get some failures reported:
Died: ajResourceinNewDrcat 'AGGT ... (abbreviated for clarity) ... GGAG' failed to find drcat
or
Check 1 Uncaught exception: Formatting Overflow, raised at ajfmt.c:1309 Check 2
With the -sequence => "asis:$assembly{$frag}", I get no errors but results differ from a webserver even when used with the same settings.
The concerns I have are regarding 1) the errors/warnings related to PLPLOT; 2) why is there a difference between -sequence => "asis:$assembly{$frag}" and just -sequence => "$assembly{$frag}" where the variable holds just a DNA sequence 3) the output being different between a webserver and local script using the same settings using webserver1 <http://www.bioinformatics.nl/cgi-bin/emboss/cpgplot> or webserver2 <http://www.ebi.ac.uk/Tools/seqstats/emboss_cpgplot/> (EBI's cpgplot doesn't accept files larger than 1MB). (I'm not able to check what version of EMBOSS is being used by any of these two)
Could there be something wrong with the installation maybe?
Thanks,
Marek
Additional session info:
print $factory->version(); # prints 6.5.7.0
print $factory->location(); # prints local
—
You are receiving this because you commented.
Reply to this email directly, view it on GitHub <#47 (comment)>, or mute the thread <https://github.com/notifications/unsubscribe-auth/AAQ99Bqc9Uh6NYpUW6iHX5mML-xp38yrks5rguQDgaJpZM4MLkmS>.
|
They are not very long. See the lengths of the sequences below: Interestingly (or not), I only get reported results from 'unitig_11' down. Is there a sequence length limits imposed on cpgplot and other EMBOSS scripts? |
Brian, it seems I was able to overcome the problem by using your suggestion of using sequence objects.
Then later on using It seems the major issue is now solved, leaving me wonder why was the message so cryptic? |
Marek,
Excellent. Regarding plplot, are you sure that the executable is in your PATH? I think it’s called pltek.
Brian O.
… On Feb 27, 2017, at 12:05 PM, piatekmj ***@***.***> wrote:
Brian, it seems I was able to overcome the problem by using your suggestion of using sequence objects.
while (my $seq = $inseq->next_seq) {
$assembly{$seq->id} = $seq;
}
Then later on using -sequence => $assembly{$frag} produces the exact output as from example webserver, without any error if used in conjunction with -plot => 'N'.
It seems the major issue is now solved, leaving me wonder why was the message so cryptic?
If you would be able to provide any assistance with regards to the PLPLOT issue, it would be great.
—
You are receiving this because you commented.
Reply to this email directly, view it on GitHub <#47 (comment)>, or mute the thread <https://github.com/notifications/unsubscribe-auth/AAQ99IGcHzVTQwcEqrxfK3VJDlYER4wMks5rgwJtgaJpZM4MLkmS>.
|
pltek is in my PATH indeed. If you instructed me, I could provide some more details regarding that. |
Marek,
I don’t know much about how plotting works in EMBOSS, but this seems to work for me:
my %input = (
-sequence => $assembly{$frag},
-outfile => "${frag}.cpgplot",
-window => 100,
-minlen => 200,
-minoe => 0.6,
-minpc => 50,
-plot => 'Y',
-graph => 'pdf'
);
Brian O.
… On Feb 27, 2017, at 3:10 PM, piatekmj ***@***.***> wrote:
pltek is in my PATH indeed. If you instructed me, I could provide some more details regarding that.
—
You are receiving this because you commented.
Reply to this email directly, view it on GitHub <#47 (comment)>, or mute the thread <https://github.com/notifications/unsubscribe-auth/AAQ99Md6PnA3uo-3pTw65sKbJsDcM7cnks5rgy3DgaJpZM4MLkmS>.
|
Session info:
perl = v5.22.0
Mac OS = 10.11.6
BioPerl-Run-1.007001
From Bio::Factory::EMBOSS when trying to provide input sequence for
cpgplot
as a variable, I do get an error message saying:Died: ajResourceinNewDrcat 'AGGT ... (abbreviated for clarity) ... GGAG' failed to find drcat
This seems to be composition dependent as I do not get that for every sequence provided. It seems this error only shows up for sequences that do have CpG island identified.
Also occasionally I do get error message saying:
Uncaught exception: Formatting Overflow, raised at ajfmt.c:1309
The above only applies for some cases not for all, I only get those messages for some sequences.
When I, however provide the input sequence with
asis
prefix like so:sequence => "asis:$sequence"
program completes with output.As long as we are on cpgplot, I would like to also mention that with
plot => 'Y'
provided, I do get another error message for sequences for which CpG islands have been identified:This can be turned off by providing
plot => 'N'
if one is not interested in graphical output.I do have plplot installed with my package manager.
Any assistance would be appreciated.
The text was updated successfully, but these errors were encountered: