# Version 0.05 alpha $Revision: 1.5 $ $Date: 1999/09/17 14:57:55 $

=head1 TO DO

=over 4

=item *

LIST_CACHE doesn't work with ties to most DBM implementations, because
Memoize tries to save a listref, and DB_File etc. can only store
strings.  This should at least be documented.  Maybe Memoize could
detect the problem at TIE time and throw a fatal error.

Try out MLDBM here and document it if it works.
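
An untested sketch of the MLDBM idea (the C<lookup_hosts> function and
the C<hosts.db> filename are made up): MLDBM serializes each listref
with Storable before handing it to DB_File, which is exactly what
LIST_CACHE needs.

```perl
use strict;
use Fcntl;
use Memoize;
use MLDBM qw(DB_File Storable);   # DB_File back end, Storable serializer

# hypothetical list-returning function to be memoized
my $calls = 0;
sub lookup_hosts { $calls++; return ($_[0], "$_[0].example.com") }

tie my %cache, 'MLDBM', 'hosts.db', O_CREAT | O_RDWR, 0666
    or die "Couldn't tie hosts.db: $!";

memoize('lookup_hosts', LIST_CACHE => [HASH => \%cache]);
```

If this works, the memoized list values survive in C<hosts.db> across
runs; the scalar cache stays in memory unless it is tied similarly.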

=item *

We should extend the benchmarking module to allow

    timethis(main, { MEMOIZED => [ suba, subb ] })

What would this do?  It would time C<main> three times, once with
C<suba> and C<subb> unmemoized, twice with them memoized.

Why would you want to do this?  By the third set of runs, the memo
tables would be fully populated, so all calls by C<main> to C<suba>
and C<subb> would return immediately.  You would be able to see how
much of C<main>'s running time was due to time spent computing in
C<suba> and C<subb>.  If that was just a little time, you would know
that optimizing or improving C<suba> and C<subb> would not have a
large effect on the performance of C<main>.  But if there was a big
difference, you would know that C<suba> or C<subb> was a good
candidate for optimization if you needed to make C<main> go faster.

Done.
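
The stock Benchmark module doesn't take a MEMOIZED option, but the
three-run comparison can be approximated by hand; C<fib> and the
iteration counts here are made up for illustration.

```perl
use strict;
use Benchmark qw(timethese);
use Memoize;

sub fib { my $n = shift; $n < 2 ? $n : fib($n - 1) + fib($n - 2) }

# time the plain version first
timethese(10, { plain => sub { fib(18) } });

# then memoize and time it twice: the first batch populates the memo
# table, the second batch runs against a fully populated table
memoize('fib');
timethese(10, { memoized_cold => sub { fib(18) } });
timethese(10, { memoized_warm => sub { fib(18) } });
```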

=item *

Perhaps C<memoize> should return a reference to the original function
as well as one to the memoized version?  But the programmer could
always construct such a reference themselves, so perhaps it's not
necessary.  We save such a reference anyway, so a new package method
could return it on demand even if it wasn't provided by C<memoize>.
We could even bless the new function reference so that it could have
accessor methods for getting to the original function, the options,
the memo table, etc.

Naah.

=item *

The TODISK feature is not ready yet.  It will have to be rather
complicated, providing options for which disk method to use (GDBM?
DB_File?  Flat file?  Storable?  User-supplied?) and which stringizing
method to use (FreezeThaw?  Marshal?  User-supplied?)

Done!
=item *

Maybe an option for automatic expiration of cache values?  (`After one
day,' `After five uses,' etc.)  Also possibly an option to limit the
number of active entries with automatic LRU expiration.

You have a long note to Mike Cariaso that outlines a good approach
that you sent on 9 April 1999.

What's the timeout stuff going to look like?

    EXPIRE_TIME => time_in_sec
    EXPIRE_USES => num_uses
    MAXENTRIES => n

perhaps?  Is EXPIRE_USES actually useful?

19990916: Memoize::Expire does EXPIRE_TIME and EXPIRE_USES.
MAXENTRIES can come later as a separate module.
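
A sketch of how this looks through Memoize::Expire's tied-hash
interface; in the released module the options are spelled LIFETIME
(seconds) and NUM_USES, and the C<lookup> function here is made up.

```perl
use strict;
use Memoize;
use Memoize::Expire;

my $calls = 0;
sub lookup { $calls++; return $_[0] * 10 }   # stand-in for an expensive call

# entries expire after an hour or after 100 uses, whichever comes first
tie my %cache => 'Memoize::Expire', LIFETIME => 3600, NUM_USES => 100;
memoize('lookup', SCALAR_CACHE => [HASH => \%cache]);
```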

=item *

Put in a better example than C<fibo>.  Show an example of a
nonrecursive function that simply takes a long time to run.
C<getpwuid> for example?  But this exposes the bug that you can't say
C<memoize('getpwuid')>, so perhaps it's not a very good example.

Well, I did add the ColorToRGB example, but it's still not so good.
These examples need a lot of work.  C<factorial> might be a better
example than C<fibo>.

=item *

Add more regression tests for normalizers.

=item *

Maybe resolve the normalizer function to a code ref at memoize time
instead of at function-call time, for efficiency?  I think there was
some reason not to do this, but I can't remember what it was.

=item *

Add more array value tests to the test suite.

Does it need more now?

=item *

Fix that `Subroutine u redefined ... line 484' message.

Fixed, I think.

=item *

Get rid of any remaining *{$ref}{CODE} or similar magic hashes.

=item *

There should be an option to dump out the memoized values or to
otherwise traverse them.

What for?

Maybe the tied hash interface takes care of this anyway?

=item *

Include an example that caches DNS lookups.

=item *

Make tie for Storable (Memoize::Storable)

A prototype of Memoize::Storable is finished.  Test it and add it to
the test suite.

Done.

=item *

Make tie for DBI (Memoize::DBI)

=item *

I think there's a bug.  See `###BUG'.

=item *

Storable probably can't be done, because it doesn't allow updating.
Maybe a different interface that supports readonly caches fronted by a
writable in-memory cache?  A generic tied hash maybe?

    sub FETCH {
        my ($self, $key) = @_;
        if (exists $self->{memory}{$key}) {      # it's in the memory hash
            return $self->{memory}{$key};
        } elsif (exists $self->{disk}{$key}) {   # it's in the readonly disk hash
            return $self->{disk}{$key};
        } else {
            return;                              # not there
        }
    }

    sub STORE {
        my ($self, $key, $value) = @_;
        $self->{memory}{$key} = $value;          # put it into the in-memory hash
    }

Maybe `save' and `restore' methods?

It isn't working right because the destructor doesn't get called at
the right time.

This is fixed.  `use strict vars' would have caught it immediately.  Duh.

=item *

Don't forget about generic interface to Storable-like packages.

=item *

Maybe add in TODISK after all, with TODISK => 'filename' equivalent to

    SCALAR_CACHE => [TIE, Memoize::SDBM_File, $filename, O_RDWR|O_CREAT, 0666],
    LIST_CACHE => MERGE

=item *

Maybe the default for LIST_CACHE should be MERGE anyway.

=item *

There's some terrible bug, probably related to use under threaded
perl, possibly connected with line 56:

    my $wrapper = eval "sub { unshift \@_, qq{$cref}; goto &_memoizer; }";

I think because C<@_> is lexically scoped in threaded perl, the effect
of C<unshift> never makes it into C<_memoizer>.  That's probably a bug
in Perl, but maybe I should work around it.  Can anyone provide more
information here, or lend me a machine with threaded Perl where I can
test this theory?  Line 59, currently commented out, may fix the
problem.

=item *

Maybe if the original function has a prototype, the module can use
that to select the most appropriate default normalizer.  For example,
if the prototype was C<($)>, there's no reason to use `join'.  If it's
C<(\@)> then it can use C<join $;, @{$_[0]};> instead of C<join $;, @_;>.
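
Until anything prototype-driven exists, the same effect is available
today with an explicit NORMALIZER.  This sketch keys the cache on the
contents of the array ref rather than on its address; C<sum_list> is a
made-up example function.

```perl
use strict;
use Memoize;

my $calls = 0;
sub sum_list {
    my ($aref) = @_;
    $calls++;
    my $total = 0;
    $total += $_ for @$aref;
    return $total;
}

# key on the list contents; the default join($;, @_) would stringify
# the reference itself, so equal lists would never share a cache entry
memoize('sum_list', NORMALIZER => sub { join $;, @{$_[0]} });
```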

=item *

Ariel Scolnikov suggests using the change-counting problem as an
example.  (How many ways are there to make change for a dollar?)
215 | =item * | |
216 | ||
217 | I found a use for `unmemoize'. If you're using the Storable glue, and | |
218 | your program gets SIGINT, you find that the cache data is not in the | |
219 | cache, because Perl normally writes it all out at once from a | |
220 | DESTROY method, and signals skip DESTROY processing. So you could add | |
221 | ||
222 | $sig{INT} = sub { unmemoize ... }; | |
223 | ||
224 | (Jonathan Roy pointed this out) | |
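
Spelled out as a runnable sketch (C<get_items> is a made-up stand-in
for the expensive function): C<unmemoize> dismantles the wrapper, and
destroying the cache gives a Storable-backed tie its chance to write
the file out before the program exits.

```perl
use strict;
use Memoize qw(memoize unmemoize);

my $calls = 0;
sub get_items { $calls++; return $_[0] * 2 }   # made-up expensive function
memoize('get_items');

# On SIGINT, dismantle the memoization before exiting, so a tied
# on-disk cache gets flushed even though DESTROY is normally skipped.
$SIG{INT} = sub { unmemoize('get_items'); exit 1 };
```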

=item *

This means it would be useful to have a method to return references to
all the currently-memoized functions so that you could say

    $SIG{INT} = sub {
        for $f (Memoize->all_memoized) {
            unmemoize $f;
        }
    };

=item *

19990917 There should be a call you can make to get back the cache
itself.  If there were, then you could delete stuff from it to
manually expire data items.
242 | ||
=item *

19990925 Randal says that the docs for Memoize::Expire should make it
clear that the expired entries are never flushed all at once.  He
asked if you would need to do that manually.  I said:

    Right, if that's what you want.  If you have EXISTS return false,
    it'll throw away the old cached item and replace it in the cache
    with a new item.  But if you want the cache to actually get
    smaller, you have to do that yourself.

    I was planning to build an Expire module that implemented an LRU
    queue and kept the cache at a constant fixed size, but I didn't
    get to it yet.  It's not clear to me that the automatic
    emptying-out behavior is very useful anyway.  The whole point of a
    cache is to trade space for time, so why bother going through the
    cache to throw away old items before you need to?

Randal then pointed out that it could discard expired items at DESTROY
or TIEHASH time, which seemed like a good idea, because if the cache
is on disk you might like to keep it as small as possible.
264 | ||
=item *

19991219 Philip Gwyn suggests this technique: You have a load_file
function that memoizes the file contents.  But then if the file
changes you get the old contents.  So add a normalizer that does

    return join $;, (stat($_[0]))[9], $_[0];

Now when the modification date changes, the true key returned by the
normalizer is different, so you get a cache miss and it loads the new
contents.  Disadvantage: the old contents are still in the cache.  I
think it makes more sense to have a special expiration manager for
this.  Make one up and bundle it.

19991220 I have one written: Memoize::ExpireFile.  But how can you
make this work when the function might have several arguments, of
which some are filenames and some aren't?
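
Philip Gwyn's normalizer, spelled out as a runnable sketch; the
C<load_file> body here is a minimal made-up reader.

```perl
use strict;
use Memoize;

my $loads = 0;
sub load_file {
    my ($file) = @_;
    $loads++;
    open my $fh, '<', $file or die "can't open $file: $!";
    local $/;                      # slurp mode
    return scalar <$fh>;
}

# key the cache on (mtime, filename); a changed file means a cache miss
memoize('load_file',
        NORMALIZER => sub { join $;, (stat $_[0])[9], $_[0] });
```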

=item *

19991219 There should be an inheritable TIEHASH method that does the
argument processing properly.

19991220 Philip Gwyn contributed a patch for this.

20001231 You should really put this in.  Jonathan Roy uncovered a
problem that it will be needed to solve.  Here's the problem: He has:

    memoize "get_items",
        LIST_CACHE => ["TIE", "Memoize::Expire",
                       LIFETIME => 86400,
                       TIE => ["DB_File", "debug.db", O_CREAT|O_RDWR, 0666]
                      ];

This won't work, because memoize is trying to store listrefs in a
DB_File.  He would have gotten a fatal error if he had done this:

    memoize "get_items",
        LIST_CACHE => ["TIE", "DB_File", "debug.db", O_CREAT|O_RDWR, 0666];

But in this case, he tied the cache to Memoize::Expire, which is *not*
scalar-only, and the check for scalar-only ties is missing from
Memoize::Expire.  The inheritable method can take care of this.
309 | ||
=item *

20001130 Custom cache manager that checks to make sure the function
return values actually match the memoized values.

=item *

20001231 Expiration manager that watches cache performance and
accumulates statistics.  Variation: Have it automatically unmemoize
the function if performance is bad.

=item *

20010517 Option to have normalizer *modify* @_ for use by memoized
function.  This would save code and time in cases like the one in the
manual under 'NORMALIZER', where both f() and normalize_f() do the
same analysis and make the same adjustments to the hash.  If the
normalizer could make the adjustments and save the changes in @_, you
wouldn't have to do it twice.

=item *

There was probably some other stuff that I forgot.

=back