I know that the optimization work is yet to come, but I'm wondering if this could be caused by something else. I have two versions of RegexRedux and, on my M1, processing the full 50 MB text file takes roughly 9 seconds with NSRegularExpression and over 3 minutes with StringProcessing:
import Foundation

extension String {
    func countMatches(of pattern: String) -> Int {
        let regex = try! NSRegularExpression(pattern: pattern)
        let range = NSRange(location: 0, length: self.count)
        return regex.numberOfMatches(in: self, range: range)
    }
}

let input = String(data: FileHandle.standardInput.readDataToEndOfFile(), encoding: .utf8)!

let sequence = input.replacingOccurrences(of: #">[^\n]*\n|\n"#, with: "", options: .regularExpression)

let resultLength = Task.detached {
    [
        (regex: "tHa[Nt]", replacement: "<4>"),
        (regex: "aND|caN|Ha[DS]|WaS", replacement: "<3>"),
        (regex: "a[NSt]|BY", replacement: "<2>"),
        (regex: "<[^>]*>", replacement: "|"),
        (regex: "\\|[^|][^|]*\\|", replacement: "-")
    ].reduce(sequence) { buffer, iub in
        return buffer.replacingOccurrences(of: iub.regex, with: iub.replacement, options: .regularExpression)
    }.count
}

let variants = [
    "agggtaaa|tttaccct",
    "[cgt]gggtaaa|tttaccc[acg]",
    "a[act]ggtaaa|tttacc[agt]t",
    "ag[act]gtaaa|tttac[agt]ct",
    "agg[act]taaa|ttta[agt]cct",
    "aggg[acg]aaa|ttt[cgt]ccct",
    "agggt[cgt]aa|tt[acg]accct",
    "agggta[cgt]a|t[acg]taccct",
    "agggtaa[cgt]|[acg]ttaccct"
]

await withTaskGroup(of: (variant: String, count: Int).self) { group in
    for variant in variants {
        group.addTask { (variant, sequence.countMatches(of: variant)) }
    }
    let counts = await group.reduce(into: [:]) { $0[$1.variant] = $1.count }
    for variant in variants {
        print(variant, counts[variant] ?? 0)
    }
}

print("", input.count, sequence.count, await resultLength.value, separator: "\n")
Except for the countMatches extension on String, I've tried to keep both programs roughly the same.
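A side note on `countMatches`: `NSRange` lengths are measured in UTF-16 code units, whereas `String.count` counts Characters. The two agree for the ASCII-only input used here, so the benchmark result is unaffected, but they can differ for other text. A minimal sketch of a variant that builds the range from string indices instead (the name `countMatchesUTF16` is hypothetical and used only for this illustration):

```swift
import Foundation

extension String {
    // Hypothetical variant of countMatches; the name is illustrative only.
    func countMatchesUTF16(of pattern: String) -> Int {
        let regex = try! NSRegularExpression(pattern: pattern)
        // NSRange(_:in:) converts a String.Index range into UTF-16 offsets,
        // so the range covers the whole string even for non-ASCII content.
        let range = NSRange(startIndex..<endIndex, in: self)
        return regex.numberOfMatches(in: self, range: range)
    }
}

// ASCII input: Character count and UTF-16 length agree.
print("agggtaaa-tttaccct".countMatchesUTF16(of: "agggtaaa|tttaccct"))
// Non-ASCII prefix: the emoji is 1 Character but 2 UTF-16 code units.
print("👍agggtaaa".countMatchesUTF16(of: "agggtaaa"))
```

As a bonus, this also avoids walking the string to compute `count`, which is linear in the string's length.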
import Foundation

let input = String(data: FileHandle.standardInput.readDataToEndOfFile(), encoding: .utf8)!

let sequence = input.replacing(try! Regex(">[^\n]*\n|\n"), with: "")

let resultLength = Task.detached {
    [
        (regex: "tHa[Nt]", replacement: "<4>"),
        (regex: "aND|caN|Ha[DS]|WaS", replacement: "<3>"),
        (regex: "a[NSt]|BY", replacement: "<2>"),
        (regex: "<[^>]*>", replacement: "|"),
        (regex: "\\|[^|][^|]*\\|", replacement: "-")
    ].reduce(sequence) { buffer, iub in
        return buffer.replacing(try! Regex(iub.regex), with: iub.replacement)
    }.count
}

let variants = [
    "agggtaaa|tttaccct",
    "[cgt]gggtaaa|tttaccc[acg]",
    "a[act]ggtaaa|tttacc[agt]t",
    "ag[act]gtaaa|tttac[agt]ct",
    "agg[act]taaa|ttta[agt]cct",
    "aggg[acg]aaa|ttt[cgt]ccct",
    "agggt[cgt]aa|tt[acg]accct",
    "agggta[cgt]a|t[acg]taccct",
    "agggtaa[cgt]|[acg]ttaccct"
]

await withTaskGroup(of: (variant: String, count: Int).self) { group in
    for variant in variants {
        group.addTask { (variant, sequence.matches(of: try! Regex(variant)).count) }
    }
    let counts = await group.reduce(into: [:]) { $0[$1.variant] = $1.count }
    for variant in variants {
        print(variant, counts[variant] ?? 0)
    }
}

print("", input.count, sequence.count, await resultLength.value, separator: "\n")
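To rule out the task scheduling as a factor, a single-threaded comparison of the two engines on a small synthetic input may help. The following is only a sketch: the `measure` helper, the synthetic input, and its size are assumptions made for illustration and are not part of the programs above. It also assumes a toolchain and deployment target where `Regex` is available (Swift 5.7 or later).

```swift
import Foundation

// Hypothetical helper: runs `body` once and returns its result
// together with the elapsed wall-clock time in seconds.
func measure<T>(_ body: () throws -> T) rethrows -> (result: T, seconds: Double) {
    let start = Date()
    let result = try body()
    return (result, Date().timeIntervalSince(start))
}

// Synthetic stand-in for the real input; content and size are arbitrary.
let sample = String(repeating: "agggtaaacgtacgtttaccctgattaca", count: 10_000)
let pattern = "agggtaaa|tttaccct"

// Count matches with NSRegularExpression.
let ns = try measure { () -> Int in
    let regex = try NSRegularExpression(pattern: pattern)
    let range = NSRange(sample.startIndex..., in: sample)
    return regex.numberOfMatches(in: sample, range: range)
}

// Count matches with the StringProcessing Regex type.
let sp = try measure { () -> Int in
    try sample.matches(of: Regex(pattern)).count
}

print("NSRegularExpression:", ns.result, "matches in", ns.seconds, "s")
print("StringProcessing:", sp.result, "matches in", sp.seconds, "s")
```

Both engines should report the same number of matches; only the timings are expected to differ. A single run with `Date` is coarse, so repeating the measurement a few times gives a more reliable picture.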
Hopefully I'm using the right compiler flags:
Thanks for all your hard work on this project.