Regex Nightmare

    I am a big fan of the Swift language, and I am excited about where it's heading. There's not a lot to complain about besides some rough edges. It's still young, after all. Dealing with regular expressions is the rough edge that annoys me the most.

    Lots of love for String. Great, but what about regex?

    The Swift team has given String lots of love in Swift 4. One thing that is missing for me is that the regex part remains untouched. In my opinion, regex is an essential part of string manipulation in any language. Ruby, JavaScript, Python ... you name it. They all have built-in, easy-to-use regex APIs. NSRegularExpression is the worst regex API I have used across all those languages (and I use a LOT of languages).

    The current problems

    There are two main issues here.

    NSRegularExpression is not defined on the Linux version

    Swift is dropping the NS prefix for all the new types, but there is still no RegularExpression. And on the Linux version, you can't find NSRegularExpression. So here is the workaround I used.
      import Foundation

      #if !os(Linux)
          typealias RegularExpression = NSRegularExpression
          typealias TextCheckingResult = NSTextCheckingResult
          extension TextCheckingResult {
              // yeah, you have to deal with this
              func rangeAt(_ idx: Int) -> NSRange {
                  return range(at: idx)
              }
          }
      #endif

    It works with NSRange instead of Range<String.Index>

    The issue above is just annoying, and we can live with it. This one is a little bit tricky. With a String, all the functions that accept or return a range give you a Range<String.Index>. NSRegularExpression only accepts an NSRange for a range. The tricky part is that it accepts a String instead of an NSString for the string. The following code demonstrates the issue.
      var str = "Hello😀"
      var nsStr = str as NSString
      str.characters.count // 6
      nsStr.length // 7
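    Yes, you can freely cast between NSString and String with minimal performance penalty. But the fact that the two report different lengths causes big problems: characters.count counts user-visible characters, while NSString's length counts UTF-16 code units, and the emoji takes two of those. Consider the following code: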
      var str = "Hello😀World"
      str.characters.count     // 11
      (str as NSString).length // 12
      var regex = try! NSRegularExpression(pattern: "World", options: [])
      let match = regex.firstMatch(
        in: str,
        options: [],
        range: NSMakeRange(0, str.characters.count))
      // the match would be nil.
    The most confusing part of this code is that NSRegularExpression accepts a String as input but does not respect its characters.count. The workaround would be:
      let match = regex.firstMatch(
        in: str,
        options: [],
        range: NSMakeRange(0, (str as NSString).length))
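    The same workaround can be written without the cast, because NSString's length is the number of UTF-16 code units, which is also what utf16.count reports on a String. What follows is only a minimal sketch: the utf16Range property is a hypothetical helper of mine, not part of Foundation, and the snippet assumes Swift 4 or later.

```swift
import Foundation

extension String {
    // Hypothetical helper (not part of Foundation): the NSRange covering the
    // whole string, measured in UTF-16 code units like NSString.length.
    var utf16Range: NSRange {
        return NSRange(location: 0, length: utf16.count)
    }
}

let text = "Hello😀World"
let pattern = try! NSRegularExpression(pattern: "World", options: [])

// Using the character count cuts the search range one code unit short,
// so the final "d" falls outside the range and nothing matches.
let tooShort = pattern.firstMatch(
    in: text,
    options: [],
    range: NSRange(location: 0, length: text.count))

// Using the UTF-16 count covers the whole string.
let found = pattern.firstMatch(in: text, options: [], range: text.utf16Range)
```

    Keeping the length calculation in one place at least means the characters-versus-UTF-16 decision is made once, rather than at every call site.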
    I really hoped that would be the end of the story, and that all we need to do is put up with the ugly code until the Swift team fixes this for us. Unfortunately, it is not. Well, it can end here: the only thing you need to do is remember to convert String to NSString every time you use NSRegularExpression or an NSRange. But that is error-prone, since the majority of functions accept and return String, even NSRegularExpression itself.
      let nsRange = match!.range // got the range from a regex match
      let substring = ( str as NSString ).substring(with: nsRange)
      // "World"
    If you plan on using String all the time, which you should, then here is how to convert an NSRange to a Range<String.Index>.
      extension String {
          func range(from nsRange: NSRange) -> Range<String.Index>? {
              guard
                  let from16 = utf16.index(
                    utf16.startIndex,
                    offsetBy: nsRange.location,
                    limitedBy: utf16.endIndex),
                  let to16 = utf16.index(
                    from16,
                    offsetBy: nsRange.length,
                    limitedBy: utf16.endIndex),
                  let from = from16.samePosition(in: self),
                  let to = to16.samePosition(in: self)
              else { return nil }
              return from ..< to
          }
      }
      // range(from:) returns an optional, so unwrap it before use
      let substring = str.substring(with: str.range(from: nsRange)!)
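    The opposite direction comes up just as often, for example when you have a Range<String.Index> from a String search and need to pass it to NSRegularExpression. Here is a rough sketch of that conversion. The nsRange(from:) name is a hypothetical helper of mine, and the code assumes Swift 4 or later, where the String and its utf16 view share the same index type.

```swift
import Foundation

extension String {
    // Hypothetical helper: convert a Range<String.Index> back to an NSRange
    // by measuring distances in the UTF-16 view.
    func nsRange(from range: Range<String.Index>) -> NSRange {
        let location = utf16.distance(from: utf16.startIndex, to: range.lowerBound)
        let length = utf16.distance(from: range.lowerBound, to: range.upperBound)
        return NSRange(location: location, length: length)
    }
}

let sample = "Hello😀World"
let worldRange = sample.range(of: "World")!        // Range<String.Index>
let converted = sample.nsRange(from: worldRange)   // measured in UTF-16 code units
let roundTrip = (sample as NSString).substring(with: converted)
```

    Both directions walk the utf16 view, which is the offset work discussed next.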
    From the look of the code, it is very compiler-optimizable. That means if this were taken care of by a built-in framework (like a real RegularExpression type), it could be optimized down to nothing, because technically you can access the raw value of the index directly instead of doing all the offset work. I know that by the nature of the offset behavior it's not going to be computationally intensive. But when you have a massive number of conversions between NSRange and Range<String.Index> in your code, it really adds up.

    Looking Forward

    It is a hard problem to solve, especially since Apple added native emoji support to the String type. Regex was not designed to handle this kind of issue. But I am really looking forward to seeing how the smart people on the Swift team crack this.