Commit | Line | Data |
---|---|---|
3d4a255c | 1 | # Version 0.05 alpha $Revision: 1.6 $ $Date: 2001/06/24 17:11:26 $ |
a0cb3900 JH |
2 | |
3 | =head1 TO DO | |
4 | ||
5 | =over 4 | |
6 | ||
7 | =item * | |
8 | ||
9 | LIST_CACHE doesn't work with ties to most DBM implementations, because | |
10 | Memouze tries to save a listref, and DB_File etc. can only store | |
11 | strings. This should at least be documented. Maybe Memoize could | |
12 | detect the problem at TIE time and throw a fatal error. | |
13 | ||
899dc88a JH |
14 | 20010623 This was added sometime prior to 20001025. |
15 | ||
a0cb3900 JH |
16 | Try out MLDBM here and document it if it works. |
17 | ||
18 | =item * | |
19 | ||
20 | We should extend the benchmarking module to allow | |
21 | ||
22 | timethis(main, { MEMOIZED => [ suba, subb ] }) | |
23 | ||
24 | What would this do? It would time C<main> three times, once with | |
25 | C<suba> and C<subb> unmemoized, twice with them memoized. | |
26 | ||
27 | Why would you want to do this? By the third set of runs, the memo | |
28 | tables would be fully populated, so all calls by C<main> to C<suba> | |
29 | and C<subb> would return immediately. You would be able to see how | |
30 | much of C<main>'s running time was due to time spent computing in | |
31 | C<suba> and C<subb>. If that was just a little time, you would know | |
32 | that optimizing or improving C<suba> and C<subb> would not have a | |
33 | large effect on the performance of C<main>. But if there was a big | |
34 | difference, you would know that C<suba> or C<subb> was a good | |
35 | candidate for optimization if you needed to make C<main> go faster. | |
36 | ||
37 | Done. | |
38 | ||
39 | =item * | |
40 | ||
41 | Perhaps C<memoize> should return a reference to the original function | |
42 | as well as one to the memoized version? But the programmer could | |
43 | always construct such a reference themselves, so perhaps it's not | |
44 | necessary. We save such a reference anyway, so a new package method | |
45 | could return it on demand even if it wasn't provided by C<memoize>. | |
46 | We could even bless the new function reference so that it could have | |
47 | accessor methods for getting to the original function, the options, | |
48 | the memo table, etc. | |
49 | ||
50 | Naah. | |
51 | ||
52 | =item * | |
53 | ||
54 | The TODISK feature is not ready yet. It will have to be rather | |
55 | complicated, providing options for which disk method to use (GDBM? | |
56 | DB_File? Flat file? Storable? User-supplied?) and which stringizing | |
57 | method to use (FreezeThaw? Marshal? User-supplied?) | |
58 | ||
59 | Done! | |
60 | ||
61 | =item * | |
62 | ||
63 | Maybe an option for automatic expiration of cache values? (`After one | |
64 | day,' `After five uses,' etc.) Also possibly an option to limit the | |
65 | number of active entries with automatic LRU expiration. | |
66 | ||
67 | You have a long note to Mike Cariaso that outlines a good approach | |
68 | that you sent on 9 April 1999. | |
69 | ||
70 | What's the timeout stuff going to look like? | |
71 | ||
72 | EXPIRE_TIME => time_in_sec | |
73 | EXPIRE_USES => num_uses | |
74 | MAXENTRIES => n | |
75 | ||
76 | perhaps? Is EXPIRE_USES actually useful? | |
77 | ||
78 | 19990916: Memoize::Expire does EXPIRE_TIME and EXPIRE_USES. | |
79 | MAXENTRIES can come later as a separate module. | |
80 | ||
81 | =item * | |
82 | ||
83 | Put in a better example than C<fibo>. Show an example of a | |
84 | nonrecursive function that simply takes a long time to run. | |
85 | C<getpwuid> for example? But this exposes the bug that you can't say | |
86 | C<memoize('getpwuid')>, so perhaps it's not a very good example. | |
87 | ||
88 | Well, I did add the ColorToRGB example, but it's still not so good. | |
89 | These examples need a lot of work. C<factorial> might be a better | |
90 | example than C<fibo>. | |
91 | ||
92 | =item * | |
93 | ||
94 | Add more regression tests for normalizers. | |
95 | ||
96 | =item * | |
97 | ||
98 | Maybe resolve normalizer function to code-ref at memoize time instead | |
99 | of at function call time for efficiency? I think there was some | |
100 | reason not to do this, but I can't remember what it was. | |
101 | ||
102 | =item * | |
103 | ||
104 | Add more array value tests to the test suite. | |
105 | ||
106 | Does it need more now? | |
107 | ||
108 | =item * | |
109 | ||
110 | Fix that `Subroutine u redefined ... line 484' message. | |
111 | ||
112 | Fixed, I think. | |
113 | ||
114 | =item * | |
115 | ||
116 | Get rid of any remaining *{$ref}{CODE} or similar magic hashes. | |
117 | ||
118 | =item * | |
119 | ||
120 | There should be an option to dump out the memoized values or to | |
121 | otherwise traverse them. | |
122 | ||
123 | What for? | |
124 | ||
125 | Maybe the tied hash interface taskes care of this anyway? | |
126 | ||
127 | =item * | |
128 | ||
129 | Include an example that caches DNS lookups. | |
130 | ||
131 | =item * | |
132 | ||
133 | Make tie for Storable (Memoize::Storable) | |
134 | ||
135 | A prototype of Memoize::Storable is finished. Test it and add to the | |
136 | test suite. | |
137 | ||
138 | Done. | |
139 | ||
140 | =item * | |
141 | ||
142 | Make tie for DBI (Memoize::DBI) | |
143 | ||
144 | =item * | |
145 | ||
146 | I think there's a bug. See `###BUG'. | |
147 | ||
148 | =item * | |
149 | ||
150 | Storable probably can't be done, because it doesn't allow updating. | |
151 | Maybe a different interface that supports readonly caches fronted by a | |
152 | writable in-memory cache? A generic tied hash maybe? | |
153 | ||
154 | FETCH { | |
155 | if (it's in the memory hash) { | |
156 | return it | |
157 | } elsif (it's in the readonly disk hash) { | |
158 | return it | |
159 | } else { | |
160 | not-there | |
161 | } | |
162 | } | |
163 | ||
164 | STORE { | |
165 | put it into the in-memory hash | |
166 | } | |
167 | ||
168 | Maybe `save' and `restore' methods? | |
169 | ||
170 | It isn't working right because the destructor doesn't get called at | |
171 | the right time. | |
172 | ||
173 | This is fixed. `use strict vars' would have caught it immediately. Duh. | |
174 | ||
175 | =item * | |
176 | ||
177 | Don't forget about generic interface to Storable-like packages | |
178 | ||
899dc88a | 179 | 20010627 It would appear that you put this into 0.51. |
a0cb3900 | 180 | |
899dc88a | 181 | =item * |
a0cb3900 JH |
182 | |
183 | Maybe add in TODISK after all, with TODISK => 'filename' equivalent to | |
184 | ||
185 | SCALAR_CACHE => [TIE, Memoize::SDBM_File, $filename, O_RDWR|O_CREAT, 0666], | |
186 | LIST_CACHE => MERGE | |
187 | ||
188 | =item * | |
189 | ||
190 | Maybe the default for LIST_CACHE should be MERGE anyway. | |
191 | ||
192 | =item * | |
193 | ||
194 | There's some terrible bug probably related to use under threaded perl, | |
195 | possibly connected with line 56: | |
196 | ||
197 | my $wrapper = eval "sub { unshift \@_, qq{$cref}; goto &_memoizer; }"; | |
198 | ||
199 | I think becayse C<@_> is lexically scoped in threadperl, the effect of | |
200 | C<unshift> never makes it into C<_memoizer>. That's probably a bug in | |
201 | Perl, but maybe I should work around it. Can anyone provide more | |
202 | information here, or lend me a machine with threaded Perl where I can | |
203 | test this theory? Line 59, currently commented out, may fix the | |
204 | problem. | |
205 | ||
899dc88a JH |
206 | 20010623 Working around this in 0.65, but it still blows. |
207 | ||
a0cb3900 JH |
208 | =item * |
209 | ||
210 | Maybe if the original function has a prototype, the module can use | |
211 | that to select the most appropriate default normalizer. For example, | |
212 | if the prototype was C<($)>, there's no reason to use `join'. If it's | |
213 | C<(\@)> then it can use C<join $;,@$_[0];> instead of C<join $;,@_;>. | |
214 | ||
215 | =item * | |
216 | ||
217 | Ariel Scolnikov suggests using the change counting problem as an | |
218 | example. (How many ways to make change of a dollar?) | |
219 | ||
220 | =item * | |
221 | ||
899dc88a JH |
222 | Jonathan Roy found a use for `unmemoize'. If you're using the |
223 | Storable glue, and your program gets SIGINT, you find that the cache | |
224 | data is not in the cache, because Perl normally writes it all out at | |
225 | once from a DESTROY method, and signals skip DESTROY processing. So | |
226 | you could add | |
a0cb3900 JH |
227 | |
228 | $sig{INT} = sub { unmemoize ... }; | |
229 | ||
a0cb3900 JH |
230 | |
231 | =item * | |
232 | ||
233 | This means it would be useful to have a method to return references to | |
234 | all the currently-memoized functions so that you could say | |
235 | ||
236 | $sig{INT} = sub { for $f (Memoize->all_memoized) { | |
237 | unmemoize $f; | |
238 | } | |
239 | } | |
240 | ||
241 | ||
242 | =item * | |
243 | ||
244 | 19990917 There should be a call you can make to get back the cache | |
245 | itself. If there were, then you could delete stuff from it to | |
246 | manually expire data items. | |
247 | ||
248 | =item * | |
249 | ||
250 | 19990925 Randal says that the docs for Memoize;:Expire should make it | |
251 | clear that the expired entries are never flushed all at once. He | |
252 | asked if you would need to do that manually. I said: | |
253 | ||
254 | Right, if that's what you want. If you have EXISTS return false, | |
255 | it'll throw away the old cached item and replace it in the cache | |
256 | with a new item. But if you want the cache to actually get smaller, | |
257 | you have to do that yourself. | |
258 | ||
259 | I was planning to build an Expire module that implemented an LRU | |
260 | queue and kept the cache at a constant fixed size, but I didn't get | |
261 | to it yet. It's not clear to me that the automatic exptynig-out | |
262 | behavior is very useful anyway. The whole point of a cache is to | |
263 | trade space for time, so why bother going through the cache to throw | |
264 | away old items before you need to? | |
265 | ||
266 | Randal then pointed out that it could discard expired items at DESTRoY | |
267 | or TIEHASH time, which seemed like a good idea, because if the cache | |
268 | is on disk you might like to keep it as small as possible. | |
269 | ||
270 | =item * | |
271 | ||
272 | 19991219 Philip Gwyn suggests this technique: You have a load_file | |
273 | function that memoizes the file contexts. But then if the file | |
274 | changes you get the old contents. So add a normalizer that does | |
275 | ||
276 | return join $;, (stat($_[0])[9]), $_[0]; | |
277 | ||
278 | Now when the modification date changes, the true key returned by the | |
279 | normalizer is different, so you get a cache miss and it loads the new | |
280 | contents. Disadvantage: The old contents are still in the cache. I | |
281 | think it makes more sense to have a special expiration manager for | |
282 | this. Make one up and bundle it. | |
283 | ||
284 | 19991220 I have one written: Memoize::ExpireFile. But how can you | |
285 | make this work when the function might have several arguments, of | |
286 | which some are filenames and some aren't? | |
287 | ||
288 | =item * | |
289 | ||
290 | 19991219 There should be an inheritable TIEHASH method that does the | |
291 | argument processing properly. | |
292 | ||
293 | 19991220 Philip Gwyn contributed a patch for this. | |
294 | ||
295 | 20001231 You should really put this in. Jonathan Roy uncovered a | |
296 | problem that it will be needed to solve. Here's the problem: He has: | |
297 | ||
298 | memoize "get_items", | |
299 | LIST_CACHE => ["TIE", "Memoize::Expire", | |
300 | LIFETIME => 86400, | |
301 | TIE => ["DB_File", "debug.db", O_CREAT|O_RDWR, 0666] | |
302 | ]; | |
303 | ||
304 | This won't work, because memoize is trying to store listrefs in a | |
305 | DB_File. He owuld have gotten a fatal error if he had done this: | |
306 | ||
307 | memoize "get_items", | |
308 | LIST_CACHE => ["TIE", "DB_File", "debug.db", O_CREAT|O_RDWR, 0666]' | |
309 | ||
310 | ||
311 | But in this case, he tied the cache to Memoize::Expire, which is *not* | |
312 | scalar-only, and the check for scalar-only ties is missing from | |
313 | Memoize::Expire. The inheritable method can take care of this. | |
314 | ||
899dc88a JH |
315 | 20010623 I decided not to put it in. Instead, we avoid the problem by |
316 | getting rid of TIE. The HASH option does the same thing, and HASH is | |
317 | so simple to support that a module is superfluous. | |
318 | ||
a0cb3900 JH |
319 | =item * |
320 | ||
321 | 20001130 Custom cache manager that checks to make sure the function | |
322 | return values actually match the memoized values. | |
323 | ||
324 | =item * | |
325 | ||
326 | 20001231 Expiration manager that watches cache performance and | |
327 | accumulates statistics. Variation: Have it automatically unmemoize | |
328 | the function if performance is bad. | |
329 | ||
330 | =item * | |
331 | ||
332 | 20010517 Option to have normalizer *modify* @_ for use by memoized | |
333 | function. This would save code and time in cases like the one in the | |
334 | manual under 'NORMALIZER', where both f() and normalize_f() do the | |
335 | same analysis and make the same adjustments to the hash. If the | |
336 | normalizer could make the adjustments and save the changes in @_, you | |
337 | wouldn't have to do it twice. | |
338 | ||
3d4a255c | 339 | =item * |
899dc88a JH |
340 | 20010623 Add CLEAR methods to tied hash modules. |
341 | ||
3d4a255c | 342 | =item * |
899dc88a JH |
343 | 20010623 You get a warning if you try to use DB_File as LIST_CACHE, |
344 | because it won't store lists. But if you use it as the underlying | |
345 | cache with an expiration manager in the middle, no warning---the | |
346 | expiration manager doesn't know it's managing a list cache, and | |
347 | memoize doesn't know that DB_File is underlying. Is this fixable? | |
348 | Probably not, but think about it. | |
349 | ||
a0cb3900 JH |
350 | =item * |
351 | There was probably some other stuff that I forgot. | |
352 | ||
353 | ||
354 | ||
355 | =back |