This is a live mirror of the Perl 5 development currently hosted at https://github.com/perl/perl5
Use dfa to speed up translating UTF-8 into code point
This dfa is available on the internet and has the reputation of being
the fastest general translator. This commit uses it at the beginning of
our translator, modified slightly to accept surrogates and all 4-byte
Perl-extended sequences. When necessary, it drops down into our
translator to handle errors, warnings, and Perl-extended sequences.
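A minimal sketch of the general approach, in C, may help picture the fast path. It is not the dfa this commit adopts: the tables are hand-written for illustration, the byte classifier is a chain of comparisons rather than a 256-entry lookup table, and it is strict, rejecting surrogates and anything above 0x10FFFF, whereas the commit's dfa is modified to accept them. All names (classify, lead_mask, next_state, dfa_decode) are hypothetical. A return value of 0 stands for "drop down into the full translator".

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Byte classes (hypothetical names, chosen for this sketch). */
enum { C_ASCII, C_80_8F, C_90_9F, C_A0_BF, C_LEAD2, C_E0, C_LEAD3,
       C_ED, C_F0, C_LEAD4, C_F4, C_BAD, N_CLASSES };

/* States. S_REJECT is 0 so that unlisted table entries reject. */
enum { S_REJECT, S_ACCEPT, S_NEED1, S_NEED2, S_NEED3,
       S_E0, S_ED, S_F0, S_F4, N_STATES };

/* Map a byte to its class. A real dfa would use a 256-entry table. */
static unsigned classify(uint8_t b)
{
    if (b <= 0x7F) return C_ASCII;
    if (b <= 0x8F) return C_80_8F;
    if (b <= 0x9F) return C_90_9F;
    if (b <= 0xBF) return C_A0_BF;
    if (b <= 0xC1) return C_BAD;      /* C0, C1: always overlong */
    if (b <= 0xDF) return C_LEAD2;
    if (b == 0xE0) return C_E0;
    if (b == 0xED) return C_ED;
    if (b <= 0xEF) return C_LEAD3;
    if (b == 0xF0) return C_F0;
    if (b <= 0xF3) return C_LEAD4;
    if (b == 0xF4) return C_F4;
    return C_BAD;                     /* F5..FF */
}

/* Payload bits carried by a start byte of each class. */
static const uint8_t lead_mask[N_CLASSES] = {
    [C_ASCII] = 0x7F, [C_LEAD2] = 0x1F,
    [C_E0] = 0x0F, [C_LEAD3] = 0x0F, [C_ED] = 0x0F,
    [C_F0] = 0x07, [C_LEAD4] = 0x07, [C_F4] = 0x07
};

/* Transition table: next_state[current state][byte class]. */
static const uint8_t next_state[N_STATES][N_CLASSES] = {
    [S_ACCEPT] = { [C_ASCII] = S_ACCEPT, [C_LEAD2] = S_NEED1,
                   [C_E0] = S_E0, [C_LEAD3] = S_NEED2, [C_ED] = S_ED,
                   [C_F0] = S_F0, [C_LEAD4] = S_NEED3, [C_F4] = S_F4 },
    [S_NEED1]  = { [C_80_8F] = S_ACCEPT, [C_90_9F] = S_ACCEPT,
                   [C_A0_BF] = S_ACCEPT },
    [S_NEED2]  = { [C_80_8F] = S_NEED1, [C_90_9F] = S_NEED1,
                   [C_A0_BF] = S_NEED1 },
    [S_NEED3]  = { [C_80_8F] = S_NEED2, [C_90_9F] = S_NEED2,
                   [C_A0_BF] = S_NEED2 },
    [S_E0]     = { [C_A0_BF] = S_NEED1 },                 /* no overlongs */
    [S_ED]     = { [C_80_8F] = S_NEED1, [C_90_9F] = S_NEED1 }, /* no surrogates */
    [S_F0]     = { [C_90_9F] = S_NEED2, [C_A0_BF] = S_NEED2 }, /* no overlongs */
    [S_F4]     = { [C_80_8F] = S_NEED2 }                  /* <= 0x10FFFF */
};

/* Decode one code point from s[0..len). Returns the number of bytes
 * consumed, or 0 if the fast path cannot handle the input and the
 * caller should fall back to a slower, fully checking translator. */
static size_t dfa_decode(const uint8_t *s, size_t len, uint32_t *cp)
{
    unsigned state = S_ACCEPT;
    uint32_t value = 0;

    assert(s != NULL && cp != NULL);
    for (size_t i = 0; i < len; i++) {
        unsigned cls = classify(s[i]);
        value = (state == S_ACCEPT)
              ? (uint32_t)(s[i] & lead_mask[cls])     /* start byte */
              : (value << 6) | (uint32_t)(s[i] & 0x3F); /* continuation */
        state = next_state[state][cls];
        if (state == S_ACCEPT) { *cp = value; return i + 1; }
        if (state == S_REJECT) break;
    }
    return 0;  /* malformed or truncated: leave it to the slow path */
}
```

The point of the design is that the common, well-formed case costs one classification and one table lookup per byte, with no per-length branching; everything unusual is deferred to the slower code that can produce proper errors and warnings.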
It shows some improvement over our base translation:
Key:
    Ir      Instruction read
    Dr      Data read
    Dw      Data write
    COND    conditional branches
    IND     indirect branches
    _m      branch predict miss
    -       indeterminate percentage (e.g. 1/0)
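The numbers represent raw counts per loop iteration. Ratio % is the
blead count divided by the dfa count, so values above 100 mean the dfa
version did less work.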
unicode::utf8n_to_uvchr_0x007f
ord(X)

          blead    dfa  Ratio %
          -----  -----  -------
Ir        359.0  359.0    100.0
Dr        111.0  111.0    100.0
Dw         64.0   64.0    100.0
COND       42.0   42.0    100.0
IND         5.0    5.0    100.0
COND_m      2.0    0.0      Inf
IND_m       5.0    5.0    100.0

unicode::utf8n_to_uvchr_0x07ff
ord(X)

          blead    dfa  Ratio %
          -----  -----  -------
Ir        478.0  467.0    102.4
Dr        132.0  133.0     99.2
Dw         79.0   78.0    101.3
COND       63.0   57.0    110.5
IND         5.0    5.0    100.0
COND_m      1.0    0.0      Inf
IND_m       5.0    5.0    100.0

unicode::utf8n_to_uvchr_0xfffd
ord(X)

          blead    dfa  Ratio %
          -----  -----  -------
Ir        494.0  486.0    101.6
Dr        134.0  136.0     98.5
Dw         79.0   78.0    101.3
COND       67.0   61.0    109.8
IND         5.0    5.0    100.0
COND_m      2.0    0.0      Inf
IND_m       5.0    5.0    100.0

unicode::utf8n_to_uvchr_0x1fffd
ord(X)

          blead    dfa  Ratio %
          -----  -----  -------
Ir        508.0  505.0    100.6
Dr        135.0  139.0     97.1
Dw         79.0   78.0    101.3
COND       70.0   65.0    107.7
IND         5.0    5.0    100.0
COND_m      2.0    1.0    200.0
IND_m       5.0    5.0    100.0

unicode::utf8n_to_uvchr_0x10fffd
ord(X)

          blead    dfa  Ratio %
          -----  -----  -------
Ir        508.0  505.0    100.6
Dr        135.0  139.0     97.1
Dw         79.0   78.0    101.3
COND       70.0   65.0    107.7
IND         5.0    5.0    100.0
COND_m      2.0    1.0    200.0
IND_m       5.0    5.0    100.0
Each successive code point requires one more byte in its UTF-8
representation than the previous one, except for the last two:
0x1fffd and 0x10fffd both take 4 bytes.
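For reference, the byte counts of the five test code points can be checked with a small helper. This is only an illustrative sketch in C for standard UTF-8 (up to 0x10FFFF); the name utf8_encoded_length is hypothetical and is not a function from the Perl source.

```c
#include <assert.h>
#include <stdint.h>

/* Number of bytes standard UTF-8 needs for a code point,
 * or 0 if the value is above 0x10FFFF (hypothetical helper). */
static int utf8_encoded_length(uint32_t cp)
{
    if (cp < 0x80)      return 1;   /* 7 payload bits  */
    if (cp < 0x800)     return 2;   /* 11 payload bits */
    if (cp < 0x10000)   return 3;   /* 16 payload bits */
    if (cp <= 0x10FFFF) return 4;   /* up to the Unicode maximum */
    return 0;
}
```

With this helper, 0x7f, 0x7ff, 0xfffd, 0x1fffd and 0x10fffd come out as 1, 2, 3, 4 and 4 bytes respectively, which matches the last two benchmarks reporting identical counts.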