From 565b06906fd882efd902930e86ffef5430b3ea96 Mon Sep 17 00:00:00 2001
From: Aaron Crane
Date: Sun, 5 Mar 2017 17:19:31 +0000
Subject: [PATCH 1/1] perlfunc: fix documentation for UTF-8 vec()

The documentation previously claimed that the internal UTF-8 buffer is used
even if the string is downgradeable. But the current behaviour is to
downgrade the buffer to the single-byte representation, and use the UTF-8
behaviour only if that fails.

That's been the case since commit 33b454808819084359e76a3f223a41b842c180b7,
from 7th September 2000. There was also a period of a few days before that
when a failed downgrade yielded an exception; see commit
246fae53ea6ae12991e7653f136a0f797ce002d4.
---
 pod/perlfunc.pod | 9 +++++----
 1 file changed, 5 insertions(+), 4 deletions(-)

diff --git a/pod/perlfunc.pod b/pod/perlfunc.pod
index 3854acf..5e9e61d 100644
--- a/pod/perlfunc.pod
+++ b/pod/perlfunc.pod
@@ -9553,10 +9553,11 @@ extend the string with sufficiently many zero bytes. It is an error
 to try to write off the beginning of the string (i.e., negative OFFSET).
 
 If the string happens to be encoded as UTF-8 internally (and thus has
-the UTF8 flag set), this is ignored by L<C<vec>|/vec EXPR,OFFSET,BITS>,
-and it operates on the
-internal byte string, not the conceptual character string, even if you
-only have characters with values less than 256.
+the UTF8 flag set), L<C<vec>|/vec EXPR,OFFSET,BITS> tries to convert it
+to use a one-byte-per-character internal representation. However, if the
+string contains characters with values of 256 or higher, that conversion
+will fail. In that situation, C<vec> will operate on the underlying buffer
+regardless, in its internal UTF-8 representation.
 
 Strings created with L<C<vec>|/vec EXPR,OFFSET,BITS> can also be
 manipulated with the logical
-- 
1.8.3.1
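
For illustration only, and not part of the patch: the behaviour the new wording describes can be checked with a short script along these lines. It is a minimal sketch. The variable names are invented for the example, and the handling of the second case assumes that some perl releases newer than this patch may refuse the call rather than read the UTF-8 buffer, so that call is wrapped in eval and no particular result is claimed for it.

```perl
use strict;
use warnings;

# Case 1: every character is below 256, but the string is stored as UTF-8.
my $latin1 = "\xE9";          # U+00E9, a single character
utf8::upgrade($latin1);       # sets the UTF8 flag; the buffer is now C3 A9
my $byte = vec($latin1, 0, 8);
# vec() downgrades the string first, so it sees the byte E9, not C3.
printf "downgradable: 0x%02X\n", $byte;

# Case 2: a character of 256 or higher cannot be downgraded.
my $wide  = "\x{100}";        # encoded in UTF-8 as the two bytes C4 80
my $first = eval { vec($wide, 0, 8) };
if (defined $first) {
    # A perl matching the patched documentation reads the raw UTF-8 buffer.
    printf "wide: 0x%02X\n", $first;
}
else {
    # Assumption: this perl rejects vec() on such strings instead.
    print "wide: vec() refused the string: $@";
}
```

The first case prints 0xE9, which shows the downgrade: without it, the first byte of the internal buffer would have been 0xC3.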