lib/Memoize/TODO

   1 # Version 0.05 alpha $Revision: 1.5 $ $Date: 1999/09/17 14:57:55 $
   2
   3 =head1 TO DO
   4
   5 =over 4
   6
   7 =item *
   8
   9 LIST_CACHE doesn't work with ties to most DBM implementations, because
  10 Memouze tries to save a listref, and DB_File etc. can only store
  11 strings.  This should at least be documented.  Maybe Memoize could
  12 detect the problem at TIE time and throw a fatal error.
  13
  14 20010623 This was added sometime prior to 20001025.
  15
  16 Try out MLDBM here and document it if it works.
  17
  18 =item *
  19
  20 We should extend the benchmarking module to allow
  21
  22         timethis(main, { MEMOIZED => [ suba, subb ] })
  23
  24 What would this do?  It would time C<main> three times, once with
  25 C<suba> and C<subb> unmemoized, twice with them memoized.
  26
  27 Why would you want to do this?  By the third set of runs, the memo
  28 tables would be fully populated, so all calls by C<main> to C<suba>
  29 and C<subb> would return immediately.  You would be able to see how
  30 much of C<main>'s running time was due to time spent computing in
  31 C<suba> and C<subb>.  If that was just a little time, you would know
  32 that optimizing or improving C<suba> and C<subb> would not have a
  33 large effect on the performance of C<main>.  But if there was a big
  34 difference, you would know that C<suba> or C<subb> was a good
  35 candidate for optimization if you needed to make C<main> go faster.
  36
  37 Done.
  38
  39 =item *
  40
  41 Perhaps C<memoize> should return a reference to the original function
  42 as well as one to the memoized version?  But the programmer could
  43 always construct such a reference themselves, so perhaps it's not
  44 necessary.  We save such a reference anyway, so a new package method
  45 could return it on demand even if it wasn't provided by C<memoize>.
  46 We could even bless the new function reference so that it could have
  47 accessor methods for getting to the original function, the options,
  48 the memo table, etc.
  49
  50 Naah.
  51
  52 =item *
  53
  54 The TODISK feature is not ready yet.  It will have to be rather
  55 complicated, providing options for which disk method to use (GDBM?
  56 DB_File?  Flat file?  Storable?  User-supplied?) and which stringizing
  57 method to use (FreezeThaw?  Marshal?  User-supplied?)
  58
  59 Done!
  60
  61 =item *
  62
  63 Maybe an option for automatic expiration of cache values?  (`After one
  64 day,' `After five uses,' etc.)  Also possibly an option to limit the
  65 number of active entries with automatic LRU expiration.
  66
  67 You have a long note to Mike Cariaso that outlines a good approach
  68 that you sent on 9 April 1999.
  69
  70 What's the timeout stuff going to look like?
  71
  72         EXPIRE_TIME => time_in_sec
  73         EXPIRE_USES => num_uses
  74         MAXENTRIES => n
  75
  76 perhaps?  Is EXPIRE_USES actually useful?
  77
  78 19990916: Memoize::Expire does EXPIRE_TIME and EXPIRE_USES.
  79 MAXENTRIES can come later as a separate module.
  80
  81 =item *
  82
  83 Put in a better example than C<fibo>.  Show an example of a
  84 nonrecursive function that simply takes a long time to run.
  85 C<getpwuid> for example?  But this exposes the bug that you can't say
  86 C<memoize('getpwuid')>, so perhaps it's not a very good example.
  87
  88 Well, I did add the ColorToRGB example, but it's still not so good.
  89 These examples need a lot of work.  C<factorial> might be a better
  90 example than C<fibo>.
  91
  92 =item *
  93
  94 Add more regression tests for normalizers.
  95
  96 =item *
  97
  98 Maybe resolve normalizer function to code-ref at memoize time instead
  99 of at function call time for efficiency?  I think there was some
 100 reason not to do this, but I can't remember what it was.
 101
 102 =item *
 103
 104 Add more array value tests to the test suite.
 105
 106 Does it need more now?
 107
 108 =item *
 109
 110 Fix that `Subroutine u redefined ... line 484' message.
 111
 112 Fixed, I think.
 113
 114 =item *
 115
 116 Get rid of any remaining *{$ref}{CODE} or similar magic hashes.
 117
 118 =item *
 119
 120 There should be an option to dump out the memoized values or to
 121 otherwise traverse them.
 122
 123 What for?
 124
 125 Maybe the tied hash interface taskes care of this anyway?
 126
 127 =item *
 128
 129 Include an example that caches DNS lookups.
 130
 131 =item *
 132
 133 Make tie for Storable (Memoize::Storable)
 134
 135 A prototype of Memoize::Storable is finished.  Test it and add to the
 136 test suite.
 137
 138 Done.
 139
 140 =item *
 141
 142 Make tie for DBI  (Memoize::DBI)
 143
 144 =item *
 145
 146 I think there's a bug.  See `###BUG'.
 147
 148 =item *
 149
 150 Storable probably can't be done, because it doesn't allow updating.
 151 Maybe a different interface that supports readonly caches fronted by a
 152 writable in-memory cache?  A generic tied hash maybe?
 153
 154         FETCH {
 155           if (it's in the memory hash) {
 156             return it
 157           } elsif (it's in the readonly disk hash) {
 158             return it
 159           } else {
 160             not-there
 161           }
 162         }
 163
 164         STORE {
 165           put it into the in-memory hash
 166         }
 167
 168 Maybe `save' and `restore' methods?
 169
 170 It isn't working right because the destructor doesn't get called at
 171 the right time.
 172
 173 This is fixed.  `use strict vars' would have caught it immediately.  Duh.
 174
 175 =item *
 176
 177 Don't forget about generic interface to Storable-like packages
 178
 179 20010627 It would appear that you put this into 0.51.
 180
 181 =item *
 182
 183 Maybe add in TODISK after all, with TODISK => 'filename' equivalent to
 184
 185         SCALAR_CACHE => [TIE, Memoize::SDBM_File, $filename, O_RDWR|O_CREAT, 0666],
 186         LIST_CACHE => MERGE
 187
 188 =item *
 189
 190 Maybe the default for LIST_CACHE should be MERGE anyway.
 191
 192 =item *
 193
 194 There's some terrible bug probably related to use under threaded perl,
 195 possibly connected with line 56:
 196
 197     my $wrapper = eval "sub { unshift \@_, qq{$cref}; goto &_memoizer; }";
 198
 199 I think becayse C<@_> is lexically scoped in threadperl, the effect of
 200 C<unshift> never makes it into C<_memoizer>.  That's probably a bug in
 201 Perl, but maybe I should work around it.   Can anyone provide more
 202 information here, or lend me a machine with threaded Perl where I can
 203 test this theory?  Line 59, currently commented out, may fix the
 204 problem.
 205
 206 20010623 Working around this in 0.65, but it still blows.
 207
 208 =item *
 209
 210 Maybe if the original function has a prototype, the module can use
 211 that to select the most appropriate default normalizer.  For example,
 212 if the prototype was C<($)>, there's no reason to use `join'.  If it's
 213 C<(\@)> then it can use C<join $;,@$_[0];> instead of C<join $;,@_;>.
 214
 215 =item *
 216
 217 Ariel Scolnikov suggests using the change counting problem as an
 218 example.  (How many ways to make change of a dollar?)
 219
 220 =item *
 221
 222 Jonathan Roy found a use for `unmemoize'.  If you're using the
 223 Storable glue, and your program gets SIGINT, you find that the cache
 224 data is not in the cache, because Perl normally writes it all out at
 225 once from a DESTROY method, and signals skip DESTROY processing.  So
 226 you could add
 227
 228         $sig{INT} = sub { unmemoize ... };
 229
 230
 231 =item *
 232
 233 This means it would be useful to have a method to return references to
 234 all the currently-memoized functions so that you could say
 235
 236         $sig{INT} = sub { for $f (Memoize->all_memoized) {
 237                             unmemoize $f;
 238                           }
 239                         }
 240
 241
 242 =item *
 243
 244 19990917 There should be a call you can make to get back the cache
 245 itself.  If there were, then you could delete stuff from it to
 246 manually expire data items.
 247
 248 =item *
 249
 250 19990925 Randal says that the docs for Memoize;:Expire should make it
 251 clear that the expired entries are never flushed all at once.  He
 252 asked if you would need to do that manually.  I said:
 253
 254   Right, if that's what you want.  If you have EXISTS return false,
 255   it'll throw away the old cached item and replace it in the cache
 256   with a new item.  But if you want the cache to actually get smaller,
 257   you have to do that yourself.
 258
 259   I was planning to build an Expire module that implemented an LRU
 260   queue and kept the cache at a constant fixed size, but I didn't get
 261   to it yet.  It's not clear to me that the automatic exptynig-out
 262   behavior is very useful anyway.  The whole point of a cache is to
 263   trade space for time, so why bother going through the cache to throw
 264   away old items before you need to?
 265
 266 Randal then pointed out that it could discard expired items at DESTRoY
 267 or TIEHASH time, which seemed like a good idea, because if the cache
 268 is on disk you might like to keep it as small as possible.
 269
 270 =item *
 271
 272 19991219 Philip Gwyn suggests this technique:  You have a load_file
 273 function that memoizes the file contexts.  But then if the file
 274 changes you get the old contents.  So add a normalizer that does
 275
 276         return join $;, (stat($_[0])[9]), $_[0];
 277
 278 Now when the modification date changes, the true key returned by the
 279 normalizer is different, so you get a cache miss and it loads the new
 280 contents.   Disadvantage:  The old contents are still in the cache.  I
 281 think it makes more sense to have a special expiration manager for
 282 this.  Make one up and bundle it.
 283
 284 19991220 I have one written: Memoize::ExpireFile.  But how can you
 285 make this work when the function might have several arguments, of
 286 which some are filenames and some aren't?
 287
 288 =item *
 289
 290 19991219 There should be an inheritable TIEHASH method that does the
 291 argument processing properly.
 292
 293 19991220 Philip Gwyn contributed a patch for this.
 294
 295 20001231 You should really put this in.  Jonathan Roy uncovered a
 296 problem that it will be needed to solve.  Here's the problem:  He has:
 297
 298         memoize "get_items",
 299         LIST_CACHE => ["TIE", "Memoize::Expire",
 300                 LIFETIME => 86400,
 301                 TIE => ["DB_File", "debug.db", O_CREAT|O_RDWR, 0666]
 302         ];
 303
 304 This won't work, because memoize is trying to store listrefs in a
 305 DB_File.    He owuld have gotten a fatal error if he had done this:
 306
 307         memoize "get_items",
 308           LIST_CACHE => ["TIE", "DB_File", "debug.db", O_CREAT|O_RDWR, 0666]'
 309
 310
 311 But in this case, he tied the cache to Memoize::Expire, which is *not*
 312 scalar-only, and the check for scalar-only ties is missing from
 313 Memoize::Expire.  The inheritable method can take care of this.
 314
 315 20010623 I decided not to put it in.  Instead, we avoid the problem by
 316 getting rid of TIE.  The HASH option does the same thing, and HASH is
 317 so simple to support that a module is superfluous.
 318
 319 =item *
 320
 321 20001130 Custom cache manager that checks to make sure the function
 322 return values actually match the memoized values.
 323
 324 =item *
 325
 326 20001231 Expiration manager that watches cache performance and
 327 accumulates statistics.  Variation:  Have it automatically unmemoize
 328 the function if performance is bad.
 329
 330 =item *
 331
 332 20010517 Option to have normalizer *modify* @_ for use by memoized
 333 function.  This would save code and time in cases like the one in the
 334 manual under 'NORMALIZER', where both f() and normalize_f() do the
 335 same analysis and make the same adjustments to the hash.  If the
 336 normalizer could make the adjustments and save the changes in @_, you
 337 wouldn't have to do it twice.
 338
 339 =item*
 340 20010623 Add CLEAR methods to tied hash modules.
 341
 342 =item*
 343 20010623 You get a warning if you try to use DB_File as LIST_CACHE,
 344 because it won't store lists.  But if you use it as the underlying
 345 cache with an expiration manager in the middle, no warning---the
 346 expiration manager doesn't know it's managing a list cache, and
 347 memoize doesn't know that DB_File is underlying.  Is this fixable?
 348 Probably not, but think about it.
 349
 350 =item *
 351 There was probably some other stuff that I forgot.
 352
 353
 354
 355 =back