Tokeniser debugging

diff --git a/Todo-5.6 b/Todo-5.6
index 30c7cc0..8dcb9be 100644
--- a/Todo-5.6
+++ b/Todo-5.6
@@ -16,6 +16,20 @@ Unicode support
         to work similarly to Unicode tech reports and Java
         notation \uXXXX (and already existing \x{XXXX))?
         more than four hexdigits? make also \U+XXXX work?
+    overloadable regex assertions? e.g. in Thai \b cannot
+        be deduced by any simple character class boundary rules,
+        word boundaries must algorithmically computed
+
+    see ext/Encode/Todo for notes and references about proper detection
+    of malformed UTF-8
+
+    SCSU?          http://www.unicode.org/unicode/reports/tr6/
+    Collation?     http://www.unicode.org/unicode/reports/tr10/
+    Normalization? http://www.unicode.org/unicode/reports/tr15/
+    EBCDIC?        http://www.unicode.org/unicode/reports/tr16/
+    Regexes?       http://www.unicode.org/unicode/reports/tr18/
+    Case Mappings? http://www.unicode.org/unicode/reports/tr21/
+
      See also "Locales", "Regexen", and "Miscellaneous".
  
  Multi-threading
@@ -151,7 +165,6 @@ Miscellaneous
         C is too high-level...
      replace pod2html with new PodtoHtml? (requires other modules from CPAN)
      automate testing with large parts of CPAN
-    Unicode collation? http://www.unicode.org/unicode/reports/tr10/
      turn Cwd into an XS module?  (Configure already probes for getcwd())
      mmap for speeding up input? (Configure already probes for the mmap family)
      sendmsg, recvmsg? (Configure doesn't probe for these but the units exist)