This is a live mirror of the Perl 5 development currently hosted at https://github.com/perl/perl5
Add Perl_bytes_cmp_utf8() to compare character sequences in different encodings
author Nicholas Clark <nick@ccl4.org>
Thu, 11 Nov 2010 16:08:43 +0000 (16:08 +0000)
committer Nicholas Clark <nick@ccl4.org>
Thu, 11 Nov 2010 16:08:43 +0000 (16:08 +0000)
commit fed3ba5d6b9222e6e73844680734b059e616c86b
tree c8a449308b28520170011d015883c39c887fb9e8
parent 08a6f934b8306af074a22b05f6de14f564a9da18
Convert sv_eq_flags() and sv_cmp_flags() to use it.

Previously, to compare two strings of characters, where one was in UTF-8 and
one was not, you had to either:

1: Upgrade the second to UTF-8
2: Compare the resulting octet sequence
3: Free the temporary UTF-8 string

or:

1: Attempt to downgrade the first to bytes. If it can't be, they aren't equal
2: Else compare the resulting octet sequence
3: Free the temporary byte string

In the general case, either route involves a malloc()/free() pair and at least
two O(n) scans per comparison.
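
The first route (upgrade, compare, free) can be pictured roughly as follows.
This is only an illustrative sketch in plain C, not the code that sv.c used:
the function name eq_by_upgrade_sketch is hypothetical, and it assumes the
non-UTF-8 string holds one code point per byte (0x00 to 0xFF).

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical illustration of the "upgrade, compare, free" route.
 * b is a byte string with one code point per byte (assumption);
 * u is a UTF-8 string.
 * Returns 1 if equal, 0 if not, -1 if the allocation fails. */
int eq_by_upgrade_sketch(const unsigned char *b, size_t blen,
                         const unsigned char *u, size_t ulen)
{
    /* Each byte needs at most two UTF-8 octets. */
    unsigned char *tmp = malloc(blen ? 2 * blen : 1);
    size_t i, n = 0;
    int equal;

    if (tmp == NULL)
        return -1;

    /* Scan 1: build the temporary UTF-8 copy. */
    for (i = 0; i < blen; i++) {
        if (b[i] < 0x80) {
            tmp[n++] = b[i];
        } else {
            tmp[n++] = (unsigned char)(0xC0 | (b[i] >> 6));
            tmp[n++] = (unsigned char)(0x80 | (b[i] & 0x3F));
        }
    }

    /* Scan 2: compare the resulting octet sequences. */
    equal = (n == ulen) && memcmp(tmp, u, n) == 0;

    /* Free the temporary UTF-8 string. */
    free(tmp);
    return equal;
}
```

The allocation and both scans happen even when the strings differ in their
first character.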

This approach, by contrast, needs no allocation and only a single O(n) scan,
which terminates as early as the best case for the second approach.
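
A minimal sketch of the single-scan idea, assuming the byte string holds one
code point per byte (0x00 to 0xFF) and the UTF-8 string is well formed. The
name bytes_cmp_utf8_sketch and its memcmp-style return convention are
hypothetical; the actual Perl_bytes_cmp_utf8() in utf8.c may differ in
signature, return values, and handling of malformed input.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical sketch, not Perl's implementation.
 * Compares byte string b (one code point per byte) with well-formed
 * UTF-8 string u, code point by code point, without allocating.
 * Returns <0, 0 or >0, in the manner of memcmp. */
int bytes_cmp_utf8_sketch(const unsigned char *b, size_t blen,
                          const unsigned char *u, size_t ulen)
{
    const unsigned char *bend, *uend;

    assert(b != NULL && u != NULL);
    bend = b + blen;
    uend = u + ulen;

    while (b < bend && u < uend) {
        unsigned int cb = *b++;
        unsigned int cu = *u++;

        if (cu >= 0x80) {
            /* A lead octet above 0xC3 starts a code point above 0xFF,
             * which no single byte can equal. Truncated input is
             * treated the same way in this sketch. */
            if (cu > 0xC3 || u == uend)
                return -1;
            /* Two-octet sequence: decode a code point in 0x80..0xFF. */
            cu = ((cu & 0x1F) << 6) | (*u++ & 0x3F);
        }
        if (cb != cu)
            return cb < cu ? -1 : 1;   /* stop at the first difference */
    }

    if (b < bend)
        return 1;    /* u is a proper prefix of b */
    if (u < uend)
        return -1;   /* b is a proper prefix of u */
    return 0;
}
```

The loop decodes the UTF-8 side on the fly, so there is no temporary copy to
allocate or free, and it returns as soon as one character differs.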
MANIFEST
embed.fnc
embed.h
ext/XS-APItest/APItest.xs
ext/XS-APItest/t/utf8.t [new file with mode: 0644]
global.sym
proto.h
sv.c
t/porting/diag.t
utf8.c