Regex Trouble



Viewing 5 posts - 1 through 5 (of 5 total)
  • #25370
    Reid Sweatman
    Participant

    Having an odd problem with regexes in ME2008. I can’t make any regex that contains alternations work with either Find_Text() or FindProgramText(). It doesn’t appear to matter which kind of quote, double or single, I use for the regex string (although obviously, with single quotes an alternation would be interpreted as an attempt to embed a decimal code), nor does it make a difference if I escape the alternations, either by hand or with Make_Literal_X(). These are regexes that work fine in a manual search. This seems like something that would have long since been dealt with.
    Any ideas, anyone? I think this is a problem with ME’s string handling, not with PCRE (incidentally, I’ve read the man pages for that package, and have a version of ME’s pop-up menus for regexes that adds a few things we left out back when).

    #25372
    Clay Martin
    Keymaster

    Hi Reid,
    A couple of questions: what type of regex are you using? Do you get the same results with the S_And_R base function? Can you provide an example of the search string (and the actual text you are looking for)?

    Thanks,
    Clay

    #25377
    Reid Sweatman
    Participant

    Here’s one example I’m working with. It’s a regex to find CMac functions. It’s a bit messier than the standard one in the REAlias list, but it will find some things that one won’t, and has fewer false positives. It’s a Perl expression (well, PCRE), and that’s all I work with since Dan faired PCRE in back in version 9.10. You can cut this down to just a simple alternation on two return types, and the problem is still there.

    ^(import\s+)?(?>(int|str|void|real|macro)\s+([A-Za-z_]\w*)\s*)(\()?(?(-1)([A-Za-z_]\w*)?\))(?>\s*(trans2|trans|no_break|dump))?(\s+{)?
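As a sanity check that the pattern itself is fine once it reaches the engine intact, here is a minimal sketch of the cut-down version Reid mentions, using Python's `re` module rather than Multi-Edit or PCRE. It keeps the optional `import`, the return-type alternation, and the identifier, but drops the PCRE-specific parts (the atomic groups and the relative `(?(-1)...)` conditional). It also only matches headers that have a parameter list. The sample CMac lines are made up for illustration.

```python
import re

# Reduced form of the thread's regex: optional "import", a return type chosen
# by alternation, an identifier, then an opening paren. Atomic groups and the
# relative (?(-1)...) conditional are omitted; Python's re has no
# relative-reference conditionals.
FUNC_HEADER = re.compile(
    r"^(import\s+)?(int|str|void|real|macro)\s+([A-Za-z_]\w*)\s*\(",
    re.MULTILINE,  # let ^ match at the start of every line
)

# Hypothetical CMac source used only to exercise the pattern.
SAMPLE = """\
int Count_Lines(str text) {
import void Show_Msg(str msg);
real ratio = 0.5;
macro Do_Search trans {
"""


def find_function_names(source: str) -> list[str]:
    """Return the identifier from every line that looks like a function header."""
    return [m.group(3) for m in FUNC_HEADER.finditer(source)]


if __name__ == "__main__":
    print(find_function_names(SAMPLE))  # ['Count_Lines', 'Show_Msg']
```

The variable declaration and the parameterless `macro` line are skipped because this reduced pattern requires a `(` after the name.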

    I haven’t tried S_And_R(), on the grounds that it relies on Find_Text() itself, although I have looked through it to see if it handles the input string in any special way. Looks to me, though, like the damage is done just by declaring the string. I’ve put debug dumps right after declaring the string, and they show nothing from the alternation position onward, so that the only match that could happen would be against the first alternative, and anything before that.

    #25379
    Clay Martin
    Keymaster

    Hi Reid,
    Must admit PCRE is not my strong point. If you have not already, try declaring a string with the first part, then a second string with the second part, then combining the two strings into a third that will be used in the search.
    Clay

    #25408
    Reid Sweatman
    Participant

    Okay, got it. It turns out that if you have to put a regex in a string, rather than entering it from a dialog, the compiler interprets it using the different rules for single- and double-quoted strings. The easiest fix is to use single quotes and double the alternation operator (| becomes ||) wherever it occurs in the expression, since a double-quoted string potentially has more characters that need fixing. The form of the regex I asked about that works is:
    '^(import\s+)?(?>(int||str||void||real||macro)\s+([A-Za-z_]\w*)\s*)(\()?(?(-1)([A-Za-z_]\w*)?\))(?>\s*(trans2||trans||no_break||dump))?(\s+{)?'
    I think I collided with this because I was switching back and forth between the two string types, which confused me as to what was going on, since something different was failing in each. On top of that, there are a number of system functions or macros that “interpret” strings, and I was experimenting with all of them, as well as a couple that I wrote. The trick is to make sure that the regex arrives intact after whatever interpretation is going on. Thanks for the suggestions.
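The doubling rule and the "arrives intact" check can be sketched as a small model. This is a hypothetical Python model, not Multi-Edit's actual string handling: it assumes only what the thread states, namely that a single `|` in a single-quoted literal starts a decimal character code and that `||` yields one literal `|`. The function names are made up for illustration. Encoding doubles every pipe; decoding the result must give back exactly the regex you meant.

```python
# Hypothetical model of single-quoted literal interpretation, based only on
# the thread: "|" introduces a decimal character code (e.g. |13), and "||"
# yields a literal "|". The real CMac rules may differ (assumption).

THREAD_REGEX = r"^(import\s+)?(?>(int|str|void|real|macro)\s+([A-Za-z_]\w*)\s*)(\()?(?(-1)([A-Za-z_]\w*)?\))(?>\s*(trans2|trans|no_break|dump))?(\s+{)?"


def decode_single_quoted(body: str) -> str:
    """Interpret a single-quoted literal body under the assumed rules."""
    out = []
    i = 0
    while i < len(body):
        ch = body[i]
        if ch != "|":
            out.append(ch)
            i += 1
            continue
        # "||" collapses to one literal pipe.
        if i + 1 < len(body) and body[i + 1] == "|":
            out.append("|")
            i += 2
            continue
        # "|" followed by digits is a decimal character code.
        j = i + 1
        while j < len(body) and body[j].isdigit():
            j += 1
        if j == i + 1:
            # A bare "|" before a non-digit: this model rejects it. The thread
            # reports the real compiler lost everything from this point on.
            raise ValueError(f"bare '|' at position {i}")
        out.append(chr(int(body[i + 1:j])))
        i = j
    return "".join(out)


def encode_single_quoted(regex: str) -> str:
    """Double every '|' so the regex survives interpretation intact."""
    return regex.replace("|", "||")


if __name__ == "__main__":
    literal = encode_single_quoted(THREAD_REGEX)
    print("'" + literal + "'")
    # Round trip: the interpreted literal must equal the regex we meant.
    assert decode_single_quoted(literal) == THREAD_REGEX
```

Under these assumptions the printed literal is the working single-quoted form from the post above, and feeding the undoubled regex to `decode_single_quoted` fails at the first alternation, which is the position where Reid's debug dumps went blank.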
