Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Is CustomConsumingRegexComponent incapable of declaring the start of a match? #765

Open
DandyLyons opened this issue Sep 24, 2024 · 1 comment
Labels
bug Something isn't working

Comments

@DandyLyons
Copy link

Description

I believe I have found a missing capability in CustomConsumingRegexComponent 2 and I would like to confirm that I'm not simply using it wrong.

To implement CustomConsumingRegexComponent you must implement consuming(_:startingAt:in:) which returns (upperBound: String.Index, output: Self.RegexOutput)?. Notice how the tuple only contains an upperBound. There is no lowerBound and there is no Range. As far as I know there is no way for a CustomConsumingRegexComponent to declare where the start of the match is.

In the short term this missing capability should be documented.
In the long term, it would be beneficial to pitch an implementation of this to Swift Evolution.

Reproduction

Consider the following test data:

let dateStringHaystacks = [
  "2023-04-15 some other text",
  "some other text 2023-04-15",
  "some 2023-04-15 other text"
]

All three of these strings pass the following test:

@Test(arguments: dateStringHaystacks)
func dateRegex(string: String) throws {
  // Create a Regex using RegexBuilder for YYYY/MM/DD
  let dateRegex = Regex {

    Repeat(OneOrMore(.digit), count: 4)  // Matches exactly 4 digits for the year (YYYY)
    "-"
    Repeat(OneOrMore(.digit), count: 2)   // Matches exactly 2 digits for the month (MM)
    "-"
    Repeat(OneOrMore(.digit), count: 2)   // Matches exactly 2 digits for the day (DD)
  }
  
  let ranges = string.ranges(of: dateRegex)
  let range = try #require(ranges.first)
  let foundString = String(string[range])
  #expect(foundString == "2023-04-15")
}

which yields the following results:

// input: "2023-04-15 some other text" ✅ test passes
// input: "some other text 2023-04-15" ✅ test passes
// input: "some 2023-04-15 other text" ✅ test passes

I then implemented a CustomConsumingRegexComponent wrapper around NSDataDetector:

public struct DateDataDetector: CustomConsumingRegexComponent {
  public typealias RegexOutput = Date
  
  public init() {}
  
  public func consuming(
    _ input: String,
    startingAt index: String.Index,
    in bounds: Range<String.Index>
  ) throws -> (upperBound: String.Index, output: Date)? {
    let detector = try NSDataDetector(types: NSTextCheckingResult.CheckingType.date.rawValue)
    let range = NSRange(index..<bounds.upperBound, in: input)
    guard let match = detector.firstMatch(in: input, options: [], range: range),
          let date = match.date else {
      return nil
    }
    let upperBound = input.index(
      input.startIndex,
      offsetBy: match.range.upperBound
    )
    return (upperBound: upperBound, output: date)
  }
}

This custom regex component correctly identifies dates however, the ranges are wrong. The start of the range is always the beginning of the string, even if that is not where the beginning of the match was. Furthermore, because there is no upperBound in the tuple, I can see no way to implement declaring the beginning of the range when a match is found.

The following tests fail, but only fail when the date is not found at the beginning of the string. (They pass even if there is text after the match.)

  @Test(arguments: dateStringHaystacks)
  func rangesOf_valid(string: String) throws {
    let regex = Regex {
      DateDataDetector()
      
    }
    let ranges = string.ranges(of: regex)
    #expect(ranges.count == 1)
    
    let range = try #require(ranges.first)
    let foundString = String(string[range])
    #expect(foundString.starts(with: "2023-04-15"))
    #expect(foundString == "2023-04-15")
  }
  
  @Test(arguments: dateStringHaystacks)
  func firstRangeOf_valid(string: String) throws {
    let regex = Regex {
      DateDataDetector()
      
    }
    let range = try #require(string.firstRange(of: regex))
    
    let foundString = String(string[range])
    #expect(foundString.starts(with: "2023-04-15"))
    #expect(foundString == "2023-04-15")
  }
  
  @Test(arguments: dateStringHaystacks)
  func matches_valid(string: String) throws {
    let regex = Regex {
      OneOrMore(DateDataDetector())
      
    }
    let matches = string.matches(of: regex)
    #expect(matches.count == 1)
    let match = try #require(matches.first)
    #expect(match.output == "2023-04-15")
  }

which yields the following results:

// input: "2023-04-15 some other text", ✅ test passes
// input: "some other text 2023-04-15" ❌ match.output = "some other text 2023-04-15"
// input:  "some 2023-04-15 other text" ❌ match.output = "some 2023-04-15"

As you can see the match behavior is different for RegexBuilder than it is for CustomConsumingRegexComponent. The RegexBuilder begins the match at the beginning of the match, but the CustomConsumingRegexComponent matches at the beginning of the string, no matter what. It's highly likely that I'm "using it wrong". However, if that is the case, I simply do not know how to tell the Regex system where the beginning of the match is. I can reliably tell it where the end of the match is (the upperBound) but there is no way to return the beginning of the match (a lowerBound).

Expected behavior

See reproduction.

Environment

swift-driver version: 1.115 Apple Swift version 6.0 (swiftlang-6.0.0.9.10 clang-1600.0.26.2)
Target: arm64-apple-macosx15.0

Additional information

I also posted about this in the Swift forums here to try to figure out if this is missing or not.

wes1 helpfully pointed to evidence that suggests that CustomConsumingRegexComponent was only meant to implement prefixMatch by design. If CustomConsumingRegexComponent is not currently up to feature parity with any other RegexComponent then this should be documented.

@AnthonyLatsis
Copy link

@hamishknight I think this needs a transfer.

@hamishknight hamishknight transferred this issue from swiftlang/swift Sep 25, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working
Projects
None yet
Development

No branches or pull requests

2 participants