- * in UTF-8, so the whole result needs to be in UTF-8. So,
- * here we are somewhere in the middle of processing a
- * non-UTF-8 string, and realize that we will have to convert
- * the whole thing to UTF-8. What to do? There are
- * several possibilities. The simplest to code is to
- * convert what we have so far, set a flag, and continue on
- * in the loop. The flag would be tested each time through
- * the loop, and if set, the next character would be
- * converted to UTF-8 and stored. But, I (khw) didn't want
- * to slow down the mainstream case at all for this fairly
- * rare case, so I didn't want to add a test that didn't
- * absolutely have to be there in the loop, besides the
- * possibility that it would get too complicated for
- * optimizers to deal with. Another possibility is to just
- * give up, convert the source to UTF-8, and restart the
- * function that way. Another possibility is to convert
- * both what has already been processed and what is yet to
- * come separately to UTF-8, then jump into the loop that
- * handles UTF-8. But the most efficient time-wise of the
- * ones I could think of is what follows, and turned out to
- * not require much extra code. */
-
- /* Convert what we have so far into UTF-8, telling the
+ * in UTF-8, so the whole result needs to be in UTF-8.
+ *
+ * So, here we are somewhere in the middle of processing a
+ * non-UTF-8 string, and realize that we will have to
+ * convert the whole thing to UTF-8. What to do? There
+ * are several possibilities. The simplest to code is to
+ * convert what we have so far, set a flag, and continue on
+ * in the loop. The flag would be tested each time through
+ * the loop, and if set, the next character would be
+ * converted to UTF-8 and stored. But, I (khw) didn't want
+ * to slow down the mainstream case at all for this fairly
+ * rare case, so I didn't want to add a test that didn't
+ * absolutely have to be there in the loop, besides the
+ * possibility that it would get too complicated for
+ * optimizers to deal with. Another possibility is to just
+ * give up, convert the source to UTF-8, and restart the
+ * function that way. Another possibility is to convert
+ * both what has already been processed and what is yet to
+ * come separately to UTF-8, then jump into the loop that
+ * handles UTF-8. But the most efficient time-wise of the
+ * ones I could think of is what follows, and turned out to
+ * not require much extra code.
+ *
+ * First, calculate the extra space needed for the
+ * remainder of the source needing to be in UTF-8. Except
+ * for the 'i' in Turkic locales, in UTF-8 strings, the
+ * uppercase of a character below 256 occupies the same
+ * number of bytes as the original. Therefore, the space
+ * needed is the remaining byte count plus the number of
+ * characters that become two bytes when converted to UTF-8,
+ * plus, in Turkic locales, the number of 'i's. */
+
+ extra = send - s + variant_under_utf8_count(s, send);
+
+#ifdef USE_LOCALE_CTYPE
+
+ if (UNLIKELY(*s == 'i')) { /* We wouldn't get an 'i' here
+ unless we are in a Turkic
+ locale */
+ const U8 * s_peek = s;
+
+ do {
+ extra++;
+
+ s_peek = (U8 *) memchr(s_peek + 1, 'i',
+ send - (s_peek + 1));
+ } while (s_peek != NULL);
+ }
+#endif
+
+ /* Convert what we have so far into UTF-8, telling the