perl5.git.perl.org Git - perl5.git/commit

author	Karl Williamson <khw@cpan.org>
	Fri, 29 Dec 2017 20:35:00 +0000 (13:35 -0700)
committer	Karl Williamson <khw@cpan.org>
	Sat, 30 Dec 2017 05:45:14 +0000 (22:45 -0700)
commit	aff4cafe362e55c7722ba12952e287a7d1770cb9
tree	e2b958277f68493fce42e719a40185cca03910a8
parent	39d242201ac8d2ef276274c2fd8a03ab861a0916

Use new regnodes for /[[:ascii:]]/

Prior to this commit, the ASCII Posix class was treated as any other
Posix class under /a.  It turns out that by making separate nodes for it
(and its complement), the performance gains can be astronomical on ASCII
platforms.  This is mainly due to the fact that scanning can be done
word-at-a-time, but also because various conditionals can be skipped.
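To illustrate what word-at-a-time scanning means here, below is a minimal
sketch in C of finding the first ASCII byte in a buffer.  It is not the code
added by this commit: the name find_first_ascii, the fixed 64-bit word, and
the use of memcpy for the load are all assumptions made for the example.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper, not the function added by this commit: return the
 * offset of the first ASCII byte (value < 0x80) in s[0..len), or len if
 * there is none. */
static size_t find_first_ascii(const unsigned char *s, size_t len)
{
    const uint64_t high_bits = UINT64_C(0x8080808080808080);
    size_t i = 0;

    assert(s != NULL || len == 0);

    /* Word-at-a-time: if every byte of the 8-byte word has its high bit
     * set, the word holds no ASCII byte and is skipped with one test. */
    while (len - i >= sizeof(uint64_t)) {
        uint64_t word;
        memcpy(&word, s + i, sizeof word);  /* safe unaligned load */
        if ((word & high_bits) != high_bits)
            break;                          /* some byte here is ASCII */
        i += sizeof word;
    }

    /* Byte-at-a-time for the word that stopped the loop and for the tail. */
    while (i < len && (s[i] & 0x80))
        i++;

    return i;
}
```

The test only asks whether all eight high bits are set, so the sketch does
not depend on byte order.  A 32-bit build would use a 4-byte word and so
would skip half as many bytes per test.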

Below are some measurements from Porting/bench.pl on a 64-bit Linux g++
-O2 system.  The numbers are for very long strings; with shorter ones,
the delta due solely to this change is masked by the general overhead of
pattern matching.  Each case measures finding an ASCII character at the
end of a very long string of non-ASCII ones.

All 1-byte (non-utf8) characters improvement ratio
    Ir      1990.8
    Dr      2995.4
    Dw      17296.7
  COND      786.0

All 2-byte characters improvement ratio
    Ir      242.2
    Dr      232.9
    Dw      17237.8
  COND      100.0

The numbers for three and more bytes per character are essentially the
same as for two bytes.

As an example, the Dw for strings consisting entirely of 2-byte
characters (before the terminal ASCII one) now takes 584 instructions,
where it previously took 100,582.

The gain would be less for a 32-bit system.

bench.pl returns other measurements which I omitted above, because they
either have unchanged performance or involve a trivial number of total
instructions.

embed.fnc
embed.h
inline.h
proto.h
regcomp.c
regexec.c