Commit | Line | Data |
---|---|---|
463ee0b2 LW |
1 | |
2 | ||
3 | ||
4 | ||
5 | ||
6 | ||
7 | sdbm - Substitute DBM | |
8 | or | |
9 | Berkeley ndbm for Every UN*X[1] Made Simple | |
10 | ||
11 | Ozan (oz) Yigit | |
12 | ||
13 | The Guild of PD Software Toolmakers | |
14 | Toronto - Canada | |
15 | ||
16 | oz@nexus.yorku.ca | |
17 | ||
18 | ||
19 | ||
20 | Implementation is the sincerest form of flattery. - L. Peter | |
21 | Deutsch | |
22 | ||
23 | A The Clone of the ndbm library | |
24 | ||
25 | The sources accompanying this notice - sdbm - consti- | |
26 | tute the first public release (Dec. 1990) of a complete | |
27 | clone of the Berkeley UN*X ndbm library. The sdbm library is | |
28 | meant to clone the proven functionality of ndbm as closely | |
29 | as possible, including a few improvements. It is practical, | |
30 | easy to understand, and compatible. The sdbm library is not | |
31 | derived from any licensed, proprietary or copyrighted | |
32 | software. | |
33 | ||
34 | The sdbm implementation is based on a 1978 algorithm | |
882ce583 | 35 | [Lar78] by P.-A. (Paul) Larson known as "Dynamic Hashing". |
463ee0b2 LW |
36 | In the course of searching for a substitute for ndbm, I pro- |
37 | totyped three different external-hashing algorithms [Lar78, | |
38 | Fag79, Lit80] and ultimately chose Larson's algorithm as a | |
39 | basis of the sdbm implementation. The Bell Labs dbm (and | |
40 | therefore ndbm) is based on an algorithm invented by Ken | |
41 | Thompson, [Tho90, Tor87] and predates Larson's work. | |
42 | ||
43 | The sdbm programming interface is totally compatible | |
44 | with ndbm and includes a slight improvement in database ini- | |
45 | tialization. It is also expected to be binary-compatible | |
46 | under most UN*X versions that support the ndbm library. | |
47 | ||
48 | The sdbm implementation shares the shortcomings of the | |
49 | ndbm library, as a side effect of various simplifications to | |
50 | the original Larson algorithm. It does produce holes in the | |
51 | page file as it writes pages past the end of file. (Larson's | |
52 | paper include a clever solution to this problem that is a | |
53 | result of using the hash value directly as a block address.) | |
54 | On the other hand, extensive tests seem to indicate that | |
55 | sdbm creates fewer holes in general, and the resulting page- | |
56 | files are smaller. The sdbm implementation is also faster | |
57 | than ndbm in database creation. Unlike the ndbm, the sdbm | |
58 | _________________________ | |
59 | ||
60 | [1] UN*X is not a trademark of any (dis)organization. | |
61 | ||
62 | ||
63 | ||
64 | ||
65 | ||
66 | ||
67 | ||
68 | ||
69 | ||
70 | - 2 - | |
71 | ||
72 | ||
882ce583 | 73 | store operation will not "wander away" trying to split its |
463ee0b2 LW |
74 | data pages to insert a datum that cannot (due to elaborate |
75 | worst-case situations) be inserted. (It will fail after a | |
76 | pre-defined number of attempts.) | |
77 | ||
78 | Important Compatibility Warning | |
79 | ||
80 | The sdbm and ndbm libraries cannot share databases: one | |
81 | cannot read the (dir/pag) database created by the other. | |
82 | This is due to the differences between the ndbm and sdbm | |
83 | algorithms[2], and the hash functions used. It is easy to | |
84 | convert between the dbm/ndbm databases and sdbm by ignoring | |
85 | the index completely: see dbd, dbu etc. | |
86 | ||
87 | ||
88 | Notice of Intellectual Property | |
89 | ||
90 | The entire sdbm library package, as authored by me, Ozan S. | |
91 | Yigit, is hereby placed in the public domain. As such, the | |
92 | author is not responsible for the consequences of use of | |
93 | this software, no matter how awful, even if they arise from | |
94 | defects in it. There is no expressed or implied warranty for | |
95 | the sdbm library. | |
96 | ||
97 | Since the sdbm library package is in the public domain, | |
98 | this original release or any additional public-domain | |
99 | releases of the modified original cannot possibly (by defin- | |
100 | ition) be withheld from you. Also by definition, You (singu- | |
101 | lar) have all the rights to this code (including the right | |
102 | to sell without permission, the right to hoard[3] and the | |
103 | right to do other icky things as you see fit) but those | |
104 | rights are also granted to everyone else. | |
105 | ||
106 | Please note that all previous distributions of this | |
107 | software contained a copyright (which is now dropped) to | |
108 | protect its origins and its current public domain status | |
109 | against any possible claims and/or challenges. | |
110 | ||
111 | Acknowledgments | |
112 | ||
113 | Many people have been very helpful and supportive. A | |
114 | partial list would necessarily include Rayan Zacherissen | |
115 | (who contributed the man page, and also hacked a MMAP | |
116 | _________________________ | |
117 | ||
118 | [2] Torek's discussion [Tor87] indicates that | |
119 | dbm/ndbm implementations use the hash value to traverse | |
120 | the radix trie differently than sdbm and as a result, | |
121 | the page indexes are generated in different order. For | |
122 | more information, send e-mail to the author. | |
123 | [3] You cannot really hoard something that is avail- | |
124 | able to the public at large, but try if it makes you | |
125 | feel any better. | |
126 | ||
127 | ||
128 | ||
129 | ||
130 | ||
131 | ||
132 | ||
133 | ||
134 | ||
135 | ||
136 | - 3 - | |
137 | ||
138 | ||
139 | version of sdbm), Arnold Robbins, Chris Lewis, Bill David- | |
140 | sen, Henry Spencer, Geoff Collyer, Rich Salz (who got me | |
141 | started in the first place), Johannes Ruschein (who did the | |
142 | minix port) and David Tilbrook. I thank you all. | |
143 | ||
144 | Distribution Manifest and Notes | |
145 | ||
146 | This distribution of sdbm includes (at least) the following: | |
147 | ||
148 | CHANGES change log | |
149 | README this file. | |
150 | biblio a small bibliography on external hashing | |
151 | dba.c a crude (n/s)dbm page file analyzer | |
152 | dbd.c a crude (n/s)dbm page file dumper (for conversion) | |
153 | dbe.1 man page for dbe.c | |
154 | dbe.c Janick's database editor | |
155 | dbm.c a dbm library emulation wrapper for ndbm/sdbm | |
156 | dbm.h header file for the above | |
157 | dbu.c a crude db management utility | |
158 | hash.c hashing function | |
159 | makefile guess. | |
160 | pair.c page-level routines (posted earlier) | |
161 | pair.h header file for the above | |
162 | readme.ms troff source for the README file | |
163 | sdbm.3 man page | |
164 | sdbm.c the real thing | |
165 | sdbm.h header file for the above | |
166 | tune.h place for tuning & portability thingies | |
167 | util.c miscellaneous | |
168 | ||
169 | dbu is a simple database manipulation program[4] that | |
170 | tries to look like Bell Labs' cbt utility. It is currently | |
171 | incomplete in functionality. I use dbu to test out the rou- | |
172 | tines: it takes (from stdin) tab separated key/value pairs | |
173 | for commands like build or insert or takes keys for commands | |
174 | like delete or look. | |
175 | ||
176 | dbu <build|creat|look|insert|cat|delete> dbmfile | |
177 | ||
178 | dba is a crude analyzer of dbm/sdbm/ndbm page files. It | |
179 | scans the entire page file, reporting page level statistics, | |
180 | and totals at the end. | |
181 | ||
182 | dbd is a crude dump program for dbm/ndbm/sdbm data- | |
183 | bases. It ignores the bitmap, and dumps the data pages in | |
184 | sequence. It can be used to create input for the dbu util- | |
185 | ity. Note that dbd will skip any NULLs in the key and data | |
186 | fields, thus is unsuitable to convert some peculiar | |
187 | _________________________ | |
188 | ||
189 | [4] The dbd, dba, dbu utilities are quick hacks and | |
190 | are not fit for production use. They were developed | |
191 | late one night, just to test out sdbm, and convert some | |
192 | databases. | |
193 | ||
194 | ||
195 | ||
196 | ||
197 | ||
198 | ||
199 | ||
200 | ||
201 | ||
202 | - 4 - | |
203 | ||
204 | ||
205 | databases that insist in including the terminating null. | |
206 | ||
207 | I have also included a copy of the dbe (ndbm DataBase | |
208 | Editor) by Janick Bergeron [janick@bnr.ca] for your pleas- | |
209 | ure. You may find it more useful than the little dbu util- | |
210 | ity. | |
211 | ||
212 | dbm.[ch] is a dbm library emulation on top of ndbm (and | |
213 | hence suitable for sdbm). Written by Robert Elz. | |
214 | ||
215 | The sdbm library has been around in beta test for quite | |
216 | a long time, and from whatever little feedback I received | |
217 | (maybe no news is good news), I believe it has been func- | |
218 | tioning without any significant problems. I would, of | |
219 | course, appreciate all fixes and/or improvements. Portabil- | |
220 | ity enhancements would especially be useful. | |
221 | ||
222 | Implementation Issues | |
223 | ||
224 | Hash functions: The algorithm behind sdbm implementa- | |
225 | tion needs a good bit-scrambling hash function to be effec- | |
226 | tive. I ran into a set of constants for a simple hash func- | |
227 | tion that seem to help sdbm perform better than ndbm for | |
228 | various inputs: | |
229 | ||
230 | /* | |
231 | * polynomial conversion ignoring overflows | |
232 | * 65599 nice. 65587 even better. | |
233 | */ | |
234 | long | |
235 | dbm_hash(char *str, int len) { | |
5aaab254 | 236 | unsigned long n = 0; |
463ee0b2 LW |
237 | |
238 | while (len--) | |
239 | n = n * 65599 + *str++; | |
240 | return n; | |
241 | } | |
242 | ||
243 | There may be better hash functions for the purposes of | |
244 | dynamic hashing. Try your favorite, and check the pagefile. | |
245 | If it contains too many pages with too many holes, (in rela- | |
246 | tion to this one for example) or if sdbm simply stops work- | |
247 | ing (fails after SPLTMAX attempts to split) when you feed | |
248 | your NEWS history file to it, you probably do not have a | |
249 | good hashing function. If you do better (for different | |
250 | types of input), I would like to know about the function you | |
251 | use. | |
252 | ||
253 | Block sizes: It seems (from various tests on a few | |
254 | machines) that a page file block size PBLKSIZ of 1024 is by | |
255 | far the best for performance, but this also happens to limit | |
256 | the size of a key/value pair. Depending on your needs, you | |
257 | may wish to increase the page size, and also adjust PAIRMAX | |
258 | (the maximum size of a key/value pair allowed: should always | |
259 | ||
260 | ||
261 | ||
262 | ||
263 | ||
264 | ||
265 | ||
266 | ||
267 | ||
268 | - 5 - | |
269 | ||
270 | ||
271 | be at least three words smaller than PBLKSIZ.) accordingly. | |
272 | The system-wide version of the library should probably be | |
273 | configured with 1024 (distribution default), as this appears | |
274 | to be sufficient for most common uses of sdbm. | |
275 | ||
276 | Portability | |
277 | ||
278 | This package has been tested in many different UN*Xes | |
279 | even including minix, and appears to be reasonably portable. | |
280 | This does not mean it will port easily to non-UN*X systems. | |
281 | ||
282 | Notes and Miscellaneous | |
283 | ||
284 | The sdbm is not a very complicated package, at least | |
285 | not after you familiarize yourself with the literature on | |
286 | external hashing. There are other interesting algorithms in | |
287 | existence that ensure (approximately) single-read access to | |
288 | a data value associated with any key. These are directory- | |
289 | less schemes such as linear hashing [Lit80] (+ Larson varia- | |
290 | tions), spiral storage [Mar79] or directory schemes such as | |
291 | extensible hashing [Fag79] by Fagin et al. I do hope these | |
292 | sources provide a reasonable playground for experimentation | |
293 | with other algorithms. See the June 1988 issue of ACM Com- | |
294 | puting Surveys [Enb88] for an excellent overview of the | |
295 | field. | |
296 | ||
297 | References | |
298 | ||
299 | ||
300 | [Lar78] | |
882ce583 | 301 | P.-A. Larson, "Dynamic Hashing", BIT, vol. 18, pp. |
463ee0b2 LW |
302 | 184-201, 1978. |
303 | ||
304 | [Tho90] | |
305 | Ken Thompson, private communication, Nov. 1990 | |
306 | ||
307 | [Lit80] | |
6b0bcbb1 JK |
308 | W. Litwin, "Linear Hashing: A new tool for file and |
309 | table addressing", Proceedings of the 6th Conference on | |
463ee0b2 LW |
310 | Very Large Dabatases (Montreal), pp. 212-223, Very |
311 | Large Database Foundation, Saratoga, Calif., 1980. | |
312 | ||
313 | [Fag79] | |
314 | R. Fagin, J. Nievergelt, N. Pippinger, and H. R. | |
6b0bcbb1 JK |
315 | Strong, "Extendible Hashing - A Fast Access Method for |
316 | Dynamic Files", ACM Trans. Database Syst., vol. 4, | |
463ee0b2 LW |
317 | no.3, pp. 315-344, Sept. 1979. |
318 | ||
319 | [Wal84] | |
6b0bcbb1 | 320 | Rich Wales, "Discussion of 'dbm' data base system", |
463ee0b2 LW |
321 | USENET newsgroup unix.wizards, Jan. 1984. |
322 | ||
323 | [Tor87] | |
882ce583 | 324 | Chris Torek, "Re: dbm.a and ndbm.a archives", |
463ee0b2 LW |
325 | |
326 | ||
327 | ||
328 | ||
329 | ||
330 | ||
331 | ||
332 | ||
333 | ||
334 | - 6 - | |
335 | ||
336 | ||
337 | USENET newsgroup comp.unix, 1987. | |
338 | ||
339 | [Mar79] | |
6b0bcbb1 JK |
340 | G. N. Martin, "Spiral Storage: Incrementally Augment- |
341 | able Hash Addressed Storage", Technical Report #27, | |
463ee0b2 LW |
342 | University of Varwick, Coventry, U.K., 1979. |
343 | ||
344 | [Enb88] | |
6b0bcbb1 JK |
345 | R. J. Enbody and H. C. Du, "Dynamic Hashing |
346 | Schemes",ACM Computing Surveys, vol. 20, no. 2, pp. | |
463ee0b2 LW |
347 | 85-113, June 1988. |
348 | ||
349 | ||
350 | ||
351 | ||
352 | ||
353 | ||
354 | ||
355 | ||
356 | ||
357 | ||
358 | ||
359 | ||
360 | ||
361 | ||
362 | ||
363 | ||
364 | ||
365 | ||
366 | ||
367 | ||
368 | ||
369 | ||
370 | ||
371 | ||
372 | ||
373 | ||
374 | ||
375 | ||
376 | ||
377 | ||
378 | ||
379 | ||
380 | ||
381 | ||
382 | ||
383 | ||
384 | ||
385 | ||
386 | ||
387 | ||
388 | ||
389 | ||
390 | ||
391 | ||
392 | ||
393 | ||
394 | ||
395 | ||
396 |