Reply To: Regular expressions and $ (EOL)

#3910
ReidSweatman

I’m not sure why you’re seeing any termination characters, unless you’ve opened your file in binary mode. Multi-Edit strips EOL termination on load and restores it on save, so it’s never really there in the data while the file is loaded. Searches on the BOL or EOL metacharacters therefore depend on buffer boundaries and character counts, and no EOL characters should be appearing.
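
To make the load/save behavior concrete, here is a rough Python model of it. This is only a sketch: the `load` and `save` helpers are hypothetical names, and nothing here reflects how Multi-Edit is actually implemented. It just shows lines held without terminators, with `$` matching at the end of each line's data.

```python
import re

def load(raw):
    """Split raw text into terminator-free lines and remember the terminator."""
    eol = "\r\n" if "\r\n" in raw else "\n"
    return raw.split(eol), eol

def save(lines, eol):
    """Put the remembered terminator back between the lines."""
    return eol.join(lines)

raw = "alpha\r\nbeta\r\ngamma"
lines, eol = load(raw)

# While "loaded", no line contains a CR or LF character.
has_terminators = any("\r" in line or "\n" in line for line in lines)

# `$` matches at the boundary of each line's data, not at a stored character.
ends_with_ta = [re.search(r"ta$", line) is not None for line in lines]

# Saving restores the original bytes.
round_trip_ok = save(lines, eol) == raw
```

Since no terminator is ever stored in a line, a search that finds one would have to be looking at something other than the loaded text.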

$ loses its meaning as a metacharacter inside a character class because there is no reasonable meaning it could have there. This is common behavior in most varieties of regular expression; Perl regular expressions, for example, work this way. You can escape it or not, as you choose, but in that context it isn’t necessary (leaving it unescaped in Perl itself can fail, though, since that language also uses the symbol for another purpose).
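
As a quick illustration, Python's `re` module (used here as a stand-in for a Perl-style engine, not for Multi-Edit) treats `$` the same way inside a class. The sample text and variable names are made up for the example.

```python
import re

text = "price: $5"

# Inside a character class, `$` is an ordinary character, escaped or not.
unescaped = re.search(r"[$]", text)
escaped = re.search(r"[\$]", text)

# Outside a class, `$` is still the end-of-line anchor.
anchored = re.search(r"5$", text)

positions = (unescaped.start(), escaped.start(), anchored.start())
```

Both class forms find the literal dollar sign at the same position, while the bare `$` only anchors the match at the end of the text.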

0xFF won’t match EOL because it isn’t an EOL character; it’s the value we use internally to represent “virtual space,” which is the padding following a tab character. Such space is only present during editing, and is removed when a file is saved to disk. Since it is, effectively, part of a tab in Multi-Edit’s representation, we treat it as whitespace. That’s why the regular expression for a whitespace character in Multi-Edit is [ \t\xFF]; it’s a class that can match a space, a tab, or virtual space. A more explicit way to write it would be [\x20\x09\xFF].
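
The two spellings of the class can be checked against each other with Python's `re` module on byte strings. This is a sketch only; it uses Python's engine, not Multi-Edit's, and the candidate list is invented for the example.

```python
import re

# The short form and the all-hex form of the same three-member class.
ws_short = re.compile(rb"[ \t\xFF]")
ws_hex = re.compile(rb"[\x20\x09\xFF]")

candidates = [b" ", b"\t", b"\xff", b"a", b"\n"]

short_hits = [c for c in candidates if ws_short.fullmatch(c)]
hex_hits = [c for c in candidates if ws_hex.fullmatch(c)]
```

Both classes accept the space, the tab, and the 0xFF byte, and both reject an ordinary letter and a line feed, so neither one can stand in for an EOL match.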

There are some long-standing problems with using multiple-match metacharacters on BOL and EOL markers, and it’s difficult to predict exactly which combinations can be made to work. For instance, a commonly used Multi-Edit regular expression is (.@$.@)@.@, or “multi-line match.” When used between two other expressions, it matches anything between the first adjacent occurrences of the two, no matter how many lines intervene (though it can be made to break with certain patterns of text in the middle). This is obviously a more complex expression than you’d probably first invent, but it’s exactly as complex as it has to be, because of the issue I mentioned above. In any event, I wouldn’t expect, off the top of my head, an expression like ($^)@ to work. I’m not even sure it would work in Perl. If it does, I’m sure someone will tell me very shortly. :)
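
For comparison, here is roughly what the “multi-line match” idea looks like in Python's `re` module. It is only an analogue of the behavior described above, not a translation of the Multi-Edit expression: Python keeps line terminators in the text, so a non-greedy `.*?` with `re.DOTALL` is enough. The helper name `first_span` and the sample markers are hypothetical.

```python
import re

def first_span(start, end, text):
    """Return the text between the first `start` and the nearest following `end`."""
    pattern = re.compile(
        re.escape(start) + r"(.*?)" + re.escape(end),
        re.DOTALL,  # let `.` cross line breaks
    )
    match = pattern.search(text)
    return match.group(1) if match else None

sample = "x BEGIN one\ntwo\nthree END y END"
inner = first_span("BEGIN", "END", sample)
missing = first_span("BEGIN", "STOP", sample)
```

The non-greedy quantifier is what stops the match at the nearest closing marker instead of the last one.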

As for why you’re getting extra characters, I’m not sure. I would suspect either one of the issues above or a flaw in the Classic regex engine. Strangely, I don’t believe there’s anyone in the company who uses the Classic forms any more, as most of us learned regular expressions first in a UNIX setting. As an experiment, try translating your expression into the UNIX form, and see if you get the same results.

One last comment of general utility: whenever you start getting strange results from search and replace in Multi-Edit, delete <TMP>MeFind0.tmp and <TMP>MeFind1.tmp (there may be only one of the two), and see if things clear up. Sometimes, especially after long use, they can become corrupt, causing the search functionality to malfunction.
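
If you would rather script that cleanup, something like the following would do it. It is a minimal sketch: the function name is made up, and the directory is passed in explicitly because where <TMP> points on your system is not something this code can know. Close Multi-Edit before running it.

```python
from pathlib import Path

# The two search temp files named above; either one may be absent.
STALE_NAMES = ("MeFind0.tmp", "MeFind1.tmp")

def remove_search_temp_files(tmp_dir):
    """Delete the search temp files in tmp_dir and return the names removed."""
    removed = []
    for name in STALE_NAMES:
        path = Path(tmp_dir) / name
        if path.is_file():
            path.unlink()
            removed.append(name)
    return removed
```

Call it with whatever directory <TMP> resolves to in your installation; it leaves every other file in that directory alone.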

And yes, the new regular expression engine will function exactly as the Perl regular expression docs specify.