[perl5.git] / lib / Memoize / TODO

# Version 0.05 alpha $Revision: 1.6 $ $Date: 2001/06/24 17:11:26 $

=head1 TO DO

=over 4

=item * 

LIST_CACHE doesn't work with ties to most DBM implementations, because
Memouze tries to save a listref, and DB_File etc. can only store
strings.  This should at least be documented.  Maybe Memoize could
detect the problem at TIE time and throw a fatal error.

20010623 This was added sometime prior to 20001025.

Try out MLDBM here and document it if it works.

=item *

We should extend the benchmarking module to allow

	timethis(main, { MEMOIZED => [ suba, subb ] })

What would this do?  It would time C<main> three times, once with
C<suba> and C<subb> unmemoized, twice with them memoized.

Why would you want to do this?  By the third set of runs, the memo
tables would be fully populated, so all calls by C<main> to C<suba>
and C<subb> would return immediately.  You would be able to see how
much of C<main>'s running time was due to time spent computing in
C<suba> and C<subb>.  If that was just a little time, you would know
that optimizing or improving C<suba> and C<subb> would not have a
large effect on the performance of C<main>.  But if there was a big
difference, you would know that C<suba> or C<subb> was a good
candidate for optimization if you needed to make C<main> go faster.

Done.

=item * 

Perhaps C<memoize> should return a reference to the original function
as well as one to the memoized version?  But the programmer could
always construct such a reference themselves, so perhaps it's not
necessary.  We save such a reference anyway, so a new package method
could return it on demand even if it wasn't provided by C<memoize>.
We could even bless the new function reference so that it could have
accessor methods for getting to the original function, the options,
the memo table, etc.

Naah.

=item *

The TODISK feature is not ready yet.  It will have to be rather
complicated, providing options for which disk method to use (GDBM?
DB_File?  Flat file?  Storable?  User-supplied?) and which stringizing
method to use (FreezeThaw?  Marshal?  User-supplied?)

Done!

=item *

Maybe an option for automatic expiration of cache values?  (`After one
day,' `After five uses,' etc.)  Also possibly an option to limit the
number of active entries with automatic LRU expiration.

You have a long note to Mike Cariaso that outlines a good approach
that you sent on 9 April 1999.

What's the timeout stuff going to look like?

	EXPIRE_TIME => time_in_sec
	EXPIRE_USES => num_uses
	MAXENTRIES => n

perhaps?  Is EXPIRE_USES actually useful?

19990916: Memoize::Expire does EXPIRE_TIME and EXPIRE_USES.
MAXENTRIES can come later as a separate module.

=item *

Put in a better example than C<fibo>.  Show an example of a
nonrecursive function that simply takes a long time to run.
C<getpwuid> for example?  But this exposes the bug that you can't say
C<memoize('getpwuid')>, so perhaps it's not a very good example.

Well, I did add the ColorToRGB example, but it's still not so good.
These examples need a lot of work.  C<factorial> might be a better
example than C<fibo>.  

=item *

Add more regression tests for normalizers.

=item *

Maybe resolve normalizer function to code-ref at memoize time instead
of at function call time for efficiency?  I think there was some
reason not to do this, but I can't remember what it was.

=item *

Add more array value tests to the test suite.

Does it need more now?

=item *

Fix that `Subroutine u redefined ... line 484' message.

Fixed, I think.

=item *

Get rid of any remaining *{$ref}{CODE} or similar magic hashes. 

=item *

There should be an option to dump out the memoized values or to
otherwise traverse them.

What for?

Maybe the tied hash interface taskes care of this anyway?

=item *

Include an example that caches DNS lookups.

=item *

Make tie for Storable (Memoize::Storable)

A prototype of Memoize::Storable is finished.  Test it and add to the
test suite.

Done.

=item *

Make tie for DBI  (Memoize::DBI)

=item *

I think there's a bug.  See `###BUG'.

=item * 

Storable probably can't be done, because it doesn't allow updating.
Maybe a different interface that supports readonly caches fronted by a
writable in-memory cache?  A generic tied hash maybe?

	FETCH {
	  if (it's in the memory hash) {
	    return it
	  } elsif (it's in the readonly disk hash) {
	    return it
	  } else { 
	    not-there
	  }
	}

	STORE {
	  put it into the in-memory hash
	}

Maybe `save' and `restore' methods?

It isn't working right because the destructor doesn't get called at
the right time.

This is fixed.  `use strict vars' would have caught it immediately.  Duh.

=item *

Don't forget about generic interface to Storable-like packages

20010627 It would appear that you put this into 0.51.

=item * 

Maybe add in TODISK after all, with TODISK => 'filename' equivalent to

	SCALAR_CACHE => [TIE, Memoize::SDBM_File, $filename, O_RDWR|O_CREAT, 0666],
	LIST_CACHE => MERGE

=item *

Maybe the default for LIST_CACHE should be MERGE anyway.

=item *

There's some terrible bug probably related to use under threaded perl,
possibly connected with line 56:

    my $wrapper = eval "sub { unshift \@_, qq{$cref}; goto &_memoizer; }";

I think becayse C<@_> is lexically scoped in threadperl, the effect of
C<unshift> never makes it into C<_memoizer>.  That's probably a bug in
Perl, but maybe I should work around it.   Can anyone provide more
information here, or lend me a machine with threaded Perl where I can
test this theory?  Line 59, currently commented out, may fix the
problem.  

20010623 Working around this in 0.65, but it still blows.

=item * 

Maybe if the original function has a prototype, the module can use
that to select the most appropriate default normalizer.  For example,
if the prototype was C<($)>, there's no reason to use `join'.  If it's
C<(\@)> then it can use C<join $;,@$_[0];> instead of C<join $;,@_;>.

=item *

Ariel Scolnikov suggests using the change counting problem as an
example.  (How many ways to make change of a dollar?)

=item * 

Jonathan Roy found a use for `unmemoize'.  If you're using the
Storable glue, and your program gets SIGINT, you find that the cache
data is not in the cache, because Perl normally writes it all out at
once from a DESTROY method, and signals skip DESTROY processing.  So
you could add

	$sig{INT} = sub { unmemoize ... };


=item *

This means it would be useful to have a method to return references to
all the currently-memoized functions so that you could say

	$sig{INT} = sub { for $f (Memoize->all_memoized) {
	                    unmemoize $f;
	                  }
	                }


=item *

19990917 There should be a call you can make to get back the cache
itself.  If there were, then you could delete stuff from it to
manually expire data items.

=item *

19990925 Randal says that the docs for Memoize;:Expire should make it
clear that the expired entries are never flushed all at once.  He
asked if you would need to do that manually.  I said:

  Right, if that's what you want.  If you have EXISTS return false,
  it'll throw away the old cached item and replace it in the cache
  with a new item.  But if you want the cache to actually get smaller,
  you have to do that yourself.

  I was planning to build an Expire module that implemented an LRU
  queue and kept the cache at a constant fixed size, but I didn't get
  to it yet.  It's not clear to me that the automatic exptynig-out
  behavior is very useful anyway.  The whole point of a cache is to
  trade space for time, so why bother going through the cache to throw
  away old items before you need to?

Randal then pointed out that it could discard expired items at DESTRoY
or TIEHASH time, which seemed like a good idea, because if the cache
is on disk you might like to keep it as small as possible.

=item *

19991219 Philip Gwyn suggests this technique:  You have a load_file
function that memoizes the file contexts.  But then if the file
changes you get the old contents.  So add a normalizer that does

	return join $;, (stat($_[0])[9]), $_[0];

Now when the modification date changes, the true key returned by the
normalizer is different, so you get a cache miss and it loads the new
contents.   Disadvantage:  The old contents are still in the cache.  I
think it makes more sense to have a special expiration manager for
this.  Make one up and bundle it.

19991220 I have one written: Memoize::ExpireFile.  But how can you
make this work when the function might have several arguments, of
which some are filenames and some aren't?

=item *

19991219 There should be an inheritable TIEHASH method that does the
argument processing properly.

19991220 Philip Gwyn contributed a patch for this.

20001231 You should really put this in.  Jonathan Roy uncovered a
problem that it will be needed to solve.  Here's the problem:  He has:

        memoize "get_items",
        LIST_CACHE => ["TIE", "Memoize::Expire",
                LIFETIME => 86400,
                TIE => ["DB_File", "debug.db", O_CREAT|O_RDWR, 0666]
        ];

This won't work, because memoize is trying to store listrefs in a
DB_File.    He owuld have gotten a fatal error if he had done this:

        memoize "get_items",
          LIST_CACHE => ["TIE", "DB_File", "debug.db", O_CREAT|O_RDWR, 0666]'


But in this case, he tied the cache to Memoize::Expire, which is *not*
scalar-only, and the check for scalar-only ties is missing from
Memoize::Expire.  The inheritable method can take care of this.

20010623 I decided not to put it in.  Instead, we avoid the problem by
getting rid of TIE.  The HASH option does the same thing, and HASH is
so simple to support that a module is superfluous.

=item *

20001130 Custom cache manager that checks to make sure the function
return values actually match the memoized values.

=item *

20001231 Expiration manager that watches cache performance and
accumulates statistics.  Variation:  Have it automatically unmemoize
the function if performance is bad.

=item *

20010517 Option to have normalizer *modify* @_ for use by memoized
function.  This would save code and time in cases like the one in the
manual under 'NORMALIZER', where both f() and normalize_f() do the
same analysis and make the same adjustments to the hash.  If the
normalizer could make the adjustments and save the changes in @_, you
wouldn't have to do it twice. 

=item *
20010623 Add CLEAR methods to tied hash modules.

=item *
20010623 You get a warning if you try to use DB_File as LIST_CACHE,
because it won't store lists.  But if you use it as the underlying
cache with an expiration manager in the middle, no warning---the
expiration manager doesn't know it's managing a list cache, and
memoize doesn't know that DB_File is underlying.  Is this fixable?
Probably not, but think about it.

=item *
There was probably some other stuff that I forgot.


=back
Commit	Line	Data
3d4a255c	1	# Version 0.05 alpha $Revision: 1.6 $ $Date: 2001/06/24 17:11:26 $
a0cb3900 JH	2
	3	=head1 TO DO
	4
	5	=over 4
	6
	7	=item *
	8
	9	LIST_CACHE doesn't work with ties to most DBM implementations, because
	10	Memouze tries to save a listref, and DB_File etc. can only store
	11	strings. This should at least be documented. Maybe Memoize could
	12	detect the problem at TIE time and throw a fatal error.
	13
899dc88a JH	14	20010623 This was added sometime prior to 20001025.
899dc88a JH	15
a0cb3900 JH	16	Try out MLDBM here and document it if it works.
	17
	18	=item *
	19
	20	We should extend the benchmarking module to allow
	21
	22	timethis(main, { MEMOIZED => [ suba, subb ] })
	23
	24	What would this do? It would time C<main> three times, once with
	25	C<suba> and C<subb> unmemoized, twice with them memoized.
	26
	27	Why would you want to do this? By the third set of runs, the memo
	28	tables would be fully populated, so all calls by C<main> to C<suba>
	29	and C<subb> would return immediately. You would be able to see how
	30	much of C<main>'s running time was due to time spent computing in
	31	C<suba> and C<subb>. If that was just a little time, you would know
	32	that optimizing or improving C<suba> and C<subb> would not have a
	33	large effect on the performance of C<main>. But if there was a big
	34	difference, you would know that C<suba> or C<subb> was a good
	35	candidate for optimization if you needed to make C<main> go faster.
	36
	37	Done.
	38
	39	=item *
	40
	41	Perhaps C<memoize> should return a reference to the original function
	42	as well as one to the memoized version? But the programmer could
	43	always construct such a reference themselves, so perhaps it's not
	44	necessary. We save such a reference anyway, so a new package method
	45	could return it on demand even if it wasn't provided by C<memoize>.
	46	We could even bless the new function reference so that it could have
	47	accessor methods for getting to the original function, the options,
	48	the memo table, etc.
	49
	50	Naah.
	51
	52	=item *
	53
	54	The TODISK feature is not ready yet. It will have to be rather
	55	complicated, providing options for which disk method to use (GDBM?
	56	DB_File? Flat file? Storable? User-supplied?) and which stringizing
	57	method to use (FreezeThaw? Marshal? User-supplied?)
	58
	59	Done!
	60
	61	=item *
	62
	63	Maybe an option for automatic expiration of cache values? (`After one
	64	day,' `After five uses,' etc.) Also possibly an option to limit the
	65	number of active entries with automatic LRU expiration.
	66
	67	You have a long note to Mike Cariaso that outlines a good approach
	68	that you sent on 9 April 1999.
	69
	70	What's the timeout stuff going to look like?
	71
	72	EXPIRE_TIME => time_in_sec
	73	EXPIRE_USES => num_uses
	74	MAXENTRIES => n
	75
	76	perhaps? Is EXPIRE_USES actually useful?
	77
	78	19990916: Memoize::Expire does EXPIRE_TIME and EXPIRE_USES.
	79	MAXENTRIES can come later as a separate module.
80
81	=item *
82
83	Put in a better example than C<fibo>. Show an example of a
84	nonrecursive function that simply takes a long time to run.
85	C<getpwuid> for example? But this exposes the bug that you can't say
86	C<memoize('getpwuid')>, so perhaps it's not a very good example.
87
88	Well, I did add the ColorToRGB example, but it's still not so good.
89	These examples need a lot of work. C<factorial> might be a better
90	example than C<fibo>.
91
92	=item *
93
94	Add more regression tests for normalizers.
95
96	=item *
97
98	Maybe resolve normalizer function to code-ref at memoize time instead
99	of at function call time for efficiency? I think there was some
100	reason not to do this, but I can't remember what it was.
101
102	=item *
103
104	Add more array value tests to the test suite.
105
106	Does it need more now?
107
108	=item *
109
110	Fix that `Subroutine u redefined ... line 484' message.
111
112	Fixed, I think.
113
114	=item *
115
116	Get rid of any remaining *{$ref}{CODE} or similar magic hashes.
117
118	=item *
119
120	There should be an option to dump out the memoized values or to
121	otherwise traverse them.
122
123	What for?
124
125	Maybe the tied hash interface taskes care of this anyway?
126
127	=item *
128
129	Include an example that caches DNS lookups.
130
131	=item *
132
133	Make tie for Storable (Memoize::Storable)
134
135	A prototype of Memoize::Storable is finished. Test it and add to the
136	test suite.
137
138	Done.
139
140	=item *
141
142	Make tie for DBI (Memoize::DBI)
143
144	=item *
145
146	I think there's a bug. See `###BUG'.
147
148	=item *
149
150	Storable probably can't be done, because it doesn't allow updating.
151	Maybe a different interface that supports readonly caches fronted by a
152	writable in-memory cache? A generic tied hash maybe?
153
154	FETCH {
155	if (it's in the memory hash) {
156	return it
157	} elsif (it's in the readonly disk hash) {
158	return it
159	} else {
160	not-there
161	}
162	}
163
164	STORE {
165	put it into the in-memory hash
166	}
167
168	Maybe `save' and `restore' methods?
169
170	It isn't working right because the destructor doesn't get called at
171	the right time.
172
173	This is fixed. `use strict vars' would have caught it immediately. Duh.
174
175	=item *
176
177	Don't forget about generic interface to Storable-like packages
178
899dc88a	179	20010627 It would appear that you put this into 0.51.
a0cb3900	180
899dc88a	181	=item *
a0cb3900 JH	182
	183	Maybe add in TODISK after all, with TODISK => 'filename' equivalent to
	184
	185	SCALAR_CACHE => [TIE, Memoize::SDBM_File, $filename, O_RDWR\|O_CREAT, 0666],
	186	LIST_CACHE => MERGE
	187
	188	=item *
	189
	190	Maybe the default for LIST_CACHE should be MERGE anyway.
	191
	192	=item *
	193
	194	There's some terrible bug probably related to use under threaded perl,
	195	possibly connected with line 56:
	196
	197	my $wrapper = eval "sub { unshift \@_, qq{$cref}; goto &_memoizer; }";
	198
	199	I think becayse C<@_> is lexically scoped in threadperl, the effect of
	200	C<unshift> never makes it into C<_memoizer>. That's probably a bug in
	201	Perl, but maybe I should work around it. Can anyone provide more
	202	information here, or lend me a machine with threaded Perl where I can
	203	test this theory? Line 59, currently commented out, may fix the
	204	problem.
	205
899dc88a JH	206	20010623 Working around this in 0.65, but it still blows.
899dc88a JH	207
a0cb3900 JH	208	=item *
	209
	210	Maybe if the original function has a prototype, the module can use
	211	that to select the most appropriate default normalizer. For example,
	212	if the prototype was C<($)>, there's no reason to use `join'. If it's
	213	C<(\@)> then it can use C<join $;,@$_[0];> instead of C<join $;,@_;>.
	214
	215	=item *
	216
	217	Ariel Scolnikov suggests using the change counting problem as an
	218	example. (How many ways to make change of a dollar?)
	219
	220	=item *
	221
899dc88a JH	222	Jonathan Roy found a use for `unmemoize'. If you're using the
	223	Storable glue, and your program gets SIGINT, you find that the cache
	224	data is not in the cache, because Perl normally writes it all out at
	225	once from a DESTROY method, and signals skip DESTROY processing. So
	226	you could add
a0cb3900 JH	227
	228	$sig{INT} = sub { unmemoize ... };
	229
a0cb3900 JH	230
	231	=item *
	232
	233	This means it would be useful to have a method to return references to
	234	all the currently-memoized functions so that you could say
	235
	236	$sig{INT} = sub { for $f (Memoize->all_memoized) {
	237	unmemoize $f;
	238	}
	239	}
	240
	241
	242	=item *
	243
	244	19990917 There should be a call you can make to get back the cache
	245	itself. If there were, then you could delete stuff from it to
	246	manually expire data items.
	247
	248	=item *
	249
	250	19990925 Randal says that the docs for Memoize;:Expire should make it
	251	clear that the expired entries are never flushed all at once. He
	252	asked if you would need to do that manually. I said:
	253
	254	Right, if that's what you want. If you have EXISTS return false,
	255	it'll throw away the old cached item and replace it in the cache
	256	with a new item. But if you want the cache to actually get smaller,
	257	you have to do that yourself.
	258
	259	I was planning to build an Expire module that implemented an LRU
	260	queue and kept the cache at a constant fixed size, but I didn't get
	261	to it yet. It's not clear to me that the automatic exptynig-out
	262	behavior is very useful anyway. The whole point of a cache is to
	263	trade space for time, so why bother going through the cache to throw
	264	away old items before you need to?
	265
	266	Randal then pointed out that it could discard expired items at DESTRoY
	267	or TIEHASH time, which seemed like a good idea, because if the cache
	268	is on disk you might like to keep it as small as possible.
	269
	270	=item *
	271
	272	19991219 Philip Gwyn suggests this technique: You have a load_file
	273	function that memoizes the file contexts. But then if the file
	274	changes you get the old contents. So add a normalizer that does
	275
	276	return join $;, (stat($_[0])[9]), $_[0];
	277
	278	Now when the modification date changes, the true key returned by the
	279	normalizer is different, so you get a cache miss and it loads the new
	280	contents. Disadvantage: The old contents are still in the cache. I
	281	think it makes more sense to have a special expiration manager for
	282	this. Make one up and bundle it.
	283
	284	19991220 I have one written: Memoize::ExpireFile. But how can you
	285	make this work when the function might have several arguments, of
	286	which some are filenames and some aren't?
	287
	288	=item *
	289
	290	19991219 There should be an inheritable TIEHASH method that does the
	291	argument processing properly.
	292
	293	19991220 Philip Gwyn contributed a patch for this.
294
295	20001231 You should really put this in. Jonathan Roy uncovered a
296	problem that it will be needed to solve. Here's the problem: He has:
297
298	memoize "get_items",
299	LIST_CACHE => ["TIE", "Memoize::Expire",
300	LIFETIME => 86400,
301	TIE => ["DB_File", "debug.db", O_CREAT\|O_RDWR, 0666]
302	];
303
304	This won't work, because memoize is trying to store listrefs in a
305	DB_File. He owuld have gotten a fatal error if he had done this:
306
307	memoize "get_items",
308	LIST_CACHE => ["TIE", "DB_File", "debug.db", O_CREAT\|O_RDWR, 0666]'
309
310
311	But in this case, he tied the cache to Memoize::Expire, which is not
312	scalar-only, and the check for scalar-only ties is missing from
313	Memoize::Expire. The inheritable method can take care of this.
314
899dc88a JH	315	20010623 I decided not to put it in. Instead, we avoid the problem by
	316	getting rid of TIE. The HASH option does the same thing, and HASH is
	317	so simple to support that a module is superfluous.
	318
a0cb3900 JH	319	=item *
	320
	321	20001130 Custom cache manager that checks to make sure the function
	322	return values actually match the memoized values.
	323
	324	=item *
	325
	326	20001231 Expiration manager that watches cache performance and
	327	accumulates statistics. Variation: Have it automatically unmemoize
	328	the function if performance is bad.
	329
	330	=item *
	331
	332	20010517 Option to have normalizer modify @_ for use by memoized
	333	function. This would save code and time in cases like the one in the
	334	manual under 'NORMALIZER', where both f() and normalize_f() do the
	335	same analysis and make the same adjustments to the hash. If the
	336	normalizer could make the adjustments and save the changes in @_, you
	337	wouldn't have to do it twice.
	338
3d4a255c	339	=item *
899dc88a JH	340	20010623 Add CLEAR methods to tied hash modules.
899dc88a JH	341
3d4a255c	342	=item *
899dc88a JH	343	20010623 You get a warning if you try to use DB_File as LIST_CACHE,
	344	because it won't store lists. But if you use it as the underlying
	345	cache with an expiration manager in the middle, no warning---the
	346	expiration manager doesn't know it's managing a list cache, and
	347	memoize doesn't know that DB_File is underlying. Is this fixable?
	348	Probably not, but think about it.
	349
a0cb3900 JH	350	=item *
	351	There was probably some other stuff that I forgot.
	352
	353
	354
	355	=back