This is a live mirror of the Perl 5 development currently hosted at https://github.com/perl/perl5
regexec.c: Don't give up on fold matching early
author Karl Williamson <public@khwilliamson.com>
Sun, 7 Nov 2010 22:25:31 +0000 (15:25 -0700)
committer Father Chrysostomos <sprout@cpan.org>
Mon, 8 Nov 2010 05:42:42 +0000 (21:42 -0800)
commit 2726813d9af5d50f1451663cd931317e7172da50
tree 12ffa4ce7951e688df59ceceb9a061ab67d606de
parent a85c03da46d77cd5b9f4e0ba809245cf000962ad

As noted in the code comments, "a" =~ /[A]/i doesn't currently work by
itself. regcomp.c knows about the ASCII characters and corrects for it,
but not always; for example, it misses cases like "a" =~ /\p{Upper}/i.
This patch catches all of those.

It works by computing a list of all characters that (singly) fold to
another one, and then checking each of those.  The maximum length of
the list is 3 in the current Unicode standard.
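
As a rough illustration of the idea above (not the code in regexec.c),
the C sketch below builds the list of characters that share a simple
fold with the input and tries each one against a class predicate. The
toy fold table, the function names, and the ASCII-only stand-in for
\p{Upper} are all assumptions made for this example.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical toy table of simple (single-character) case folds.
 * Perl's real data comes from the Unicode database, not from this. */
typedef struct { uint32_t from; uint32_t to; } fold_pair;

static const fold_pair toy_folds[] = {
    { 0x0041, 0x0061 },  /* A           -> a */
    { 0x004B, 0x006B },  /* K           -> k */
    { 0x212A, 0x006B },  /* KELVIN SIGN -> k */
    { 0x03A3, 0x03C3 },  /* Σ           -> σ */
    { 0x03C2, 0x03C3 },  /* ς           -> σ */
};

#define TOY_FOLD_COUNT (sizeof toy_folds / sizeof toy_folds[0])
#define MAX_CLOSURE 3    /* upper bound on list length quoted in the text */

/* Return the simple fold of cp, or cp itself if the table has none. */
uint32_t simple_fold(uint32_t cp)
{
    for (size_t i = 0; i < TOY_FOLD_COUNT; i++)
        if (toy_folds[i].from == cp)
            return toy_folds[i].to;
    return cp;
}

/* Collect every other character that folds to the same thing as cp.
 * Returns the number of entries written to out. */
size_t fold_closure(uint32_t cp, uint32_t out[MAX_CLOSURE])
{
    uint32_t target = simple_fold(cp);
    size_t n = 0;

    if (target != cp)
        out[n++] = target;
    for (size_t i = 0; i < TOY_FOLD_COUNT; i++) {
        if (toy_folds[i].to == target && toy_folds[i].from != cp) {
            assert(n < MAX_CLOSURE);
            out[n++] = toy_folds[i].from;
        }
    }
    return n;
}

/* Stand-in for a class such as \p{Upper}, restricted to ASCII. */
bool is_ascii_upper(uint32_t cp)
{
    return cp >= 'A' && cp <= 'Z';
}

/* Case-insensitive class match: try cp itself first, and only then
 * fall back to each member of its fold closure. */
bool matches_folded(uint32_t cp, bool (*in_class)(uint32_t))
{
    uint32_t others[MAX_CLOSURE];
    size_t n;

    if (in_class(cp))
        return true;
    n = fold_closure(cp, others);
    for (size_t i = 0; i < n; i++)
        if (in_class(others[i]))
            return true;
    return false;
}
```

With this table, matches_folded('a', is_ascii_upper) succeeds because
'A' is in the closure of 'a', which mirrors the "a" =~ /\p{Upper}/i
case from the message.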

I believe that a better long-term solution is to do this at compile
rather than execution time, by generating a closure of everything
matched.  But this can't be done now because the data structure would
need to be extensively revamped to list all non-byte characters, and
user-defined \p{} matches are not known at compile-time.

This patch doesn't handle multi-character folds; there is a separate
ticket for those.
embedvar.h
intrpvar.h
perl.c
regexec.c
sv.c
t/re/reg_fold.t