Commit | Line | Data |
---|---|---|
463ee0b2 LW |
1 | .\" tbl | readme.ms | [tn]roff -ms | ... |
2 | .\" note the "C" (courier) and "CB" fonts: you will probably have to | |
3 | .\" change these. | |
4 | .\" $Id: readme.ms,v 1.1 90/12/13 13:09:15 oz Exp Locker: oz $ | |
5 | ||
6 | .de P1 | |
7 | .br | |
8 | .nr dT 4 | |
9 | .nf | |
10 | .ft C | |
11 | .sp .5 | |
12 | .nr t \\n(dT*\\w'x'u | |
13 | .ta 1u*\\ntu 2u*\\ntu 3u*\\ntu 4u*\\ntu 5u*\\ntu 6u*\\ntu 7u*\\ntu 8u*\\ntu 9u*\\ntu 10u*\\ntu 11u*\\ntu 12u*\\ntu 13u*\\ntu 14u*\\ntu | |
14 | .. | |
15 | .de P2 | |
16 | .br | |
17 | .ft 1 | |
18 | .br | |
19 | .sp .5 | |
20 | .br | |
21 | .fi | |
22 | .. | |
23 | .\" CW uses the typewriter/courier font. | |
24 | .de CW | |
25 | \fC\\$1\\fP\\$2 | |
26 | .. | |
27 | ||
28 | .\" Footnote numbering [by Henry Spencer] | |
29 | .\" <text>\*f for a footnote number.. | |
30 | .\" .FS | |
31 | .\" \*F <footnote text> | |
32 | .\" .FE | |
33 | .\" | |
34 | .ds f \\u\\s-2\\n+f\\s+2\\d | |
35 | .nr f 0 1 | |
36 | .ds F \\n+F. | |
37 | .nr F 0 1 | |
38 | ||
39 | .ND | |
40 | .LP | |
41 | .TL | |
42 | \fIsdbm\fP \(em Substitute DBM | |
43 | .br | |
44 | or | |
45 | .br | |
46 | Berkeley \fIndbm\fP for Every UN*X\** Made Simple | |
47 | .AU | |
48 | Ozan (oz) Yigit | |
49 | .AI | |
50 | The Guild of PD Software Toolmakers | |
51 | Toronto - Canada | |
52 | .sp | |
53 | oz@nexus.yorku.ca | |
54 | .LP | |
55 | .FS | |
56 | UN*X is not a trademark of any (dis)organization. | |
57 | .FE | |
58 | .sp 2 | |
59 | \fIImplementation is the sincerest form of flattery. \(em L. Peter Deutsch\fP | |
60 | .SH | |
61 | A The Clone of the \fIndbm\fP library | |
62 | .PP | |
63 | The sources accompanying this notice \(em \fIsdbm\fP \(em constitute | |
64 | the first public release (Dec. 1990) of a complete clone of | |
65 | the Berkeley UN*X \fIndbm\fP library. The \fIsdbm\fP library is meant to | |
66 | clone the proven functionality of \fIndbm\fP as closely as possible, | |
67 | including a few improvements. It is practical, easy to understand, and | |
68 | compatible. | |
69 | The \fIsdbm\fP library is not derived from any licensed, proprietary or | |
70 | copyrighted software. | |
71 | .PP | |
72 | The \fIsdbm\fP implementation is based on a 1978 algorithm | |
882ce583 | 73 | [Lar78] by P.-A. (Paul) Larson known as "Dynamic Hashing". |
463ee0b2 LW |
74 | In the course of searching for a substitute for \fIndbm\fP, I |
75 | prototyped three different external-hashing algorithms [Lar78, Fag79, Lit80] | |
76 | and ultimately chose Larson's algorithm as a basis of the \fIsdbm\fP | |
77 | implementation. The Bell Labs | |
78 | \fIdbm\fP (and therefore \fIndbm\fP) is based on an algorithm invented by | |
79 | Ken Thompson, [Tho90, Tor87] and predates Larson's work. | |
80 | .PP | |
81 | The \fIsdbm\fR programming interface is totally compatible | |
82 | with \fIndbm\fP and includes a slight improvement in database initialization. | |
83 | It is also expected to be binary-compatible under most UN*X versions that | |
84 | support the \fIndbm\fP library. | |
85 | .PP | |
86 | The \fIsdbm\fP implementation shares the shortcomings of the \fIndbm\fP | |
87 | library, as a side effect of various simplifications to the original Larson | |
88 | algorithm. It does produce \fIholes\fP in the page file as it writes | |
89 | pages past the end of file. (Larson's paper include a clever solution to | |
90 | this problem that is a result of using the hash value directly as a block | |
91 | address.) On the other hand, extensive tests seem to indicate that \fIsdbm\fP | |
92 | creates fewer holes in general, and the resulting pagefiles are | |
93 | smaller. The \fIsdbm\fP implementation is also faster than \fIndbm\fP | |
94 | in database creation. | |
95 | Unlike the \fIndbm\fP, the \fIsdbm\fP | |
96 | .CW store | |
882ce583 | 97 | operation will not "wander away" trying to split its |
463ee0b2 LW |
98 | data pages to insert a datum that \fIcannot\fP (due to elaborate worst-case |
99 | situations) be inserted. (It will fail after a pre-defined number of attempts.) | |
100 | .SH | |
101 | Important Compatibility Warning | |
102 | .PP | |
103 | The \fIsdbm\fP and \fIndbm\fP | |
104 | libraries \fIcannot\fP share databases: one cannot read the (dir/pag) | |
105 | database created by the other. This is due to the differences | |
106 | between the \fIndbm\fP and \fIsdbm\fP algorithms\**, | |
107 | .FS | |
108 | Torek's discussion [Tor87] | |
109 | indicates that \fIdbm/ndbm\fP implementations use the hash | |
110 | value to traverse the radix trie differently than \fIsdbm\fP | |
111 | and as a result, the page indexes are generated in \fIdifferent\fP order. | |
112 | For more information, send e-mail to the author. | |
113 | .FE | |
114 | and the hash functions | |
115 | used. | |
116 | It is easy to convert between the \fIdbm/ndbm\fP databases and \fIsdbm\fP | |
117 | by ignoring the index completely: see | |
118 | .CW dbd , | |
119 | .CW dbu | |
120 | etc. | |
121 | .R | |
122 | .LP | |
123 | .SH | |
124 | Notice of Intellectual Property | |
125 | .LP | |
126 | \fIThe entire\fP sdbm \fIlibrary package, as authored by me,\fP Ozan S. Yigit, | |
127 | \fIis hereby placed in the public domain.\fP As such, the author is not | |
128 | responsible for the consequences of use of this software, no matter how | |
129 | awful, even if they arise from defects in it. There is no expressed or | |
130 | implied warranty for the \fIsdbm\fP library. | |
131 | .PP | |
132 | Since the \fIsdbm\fP | |
133 | library package is in the public domain, this \fIoriginal\fP | |
134 | release or any additional public-domain releases of the modified original | |
135 | cannot possibly (by definition) be withheld from you. Also by definition, | |
136 | You (singular) have all the rights to this code (including the right to | |
137 | sell without permission, the right to hoard\** | |
138 | .FS | |
139 | You cannot really hoard something that is available to the public at | |
140 | large, but try if it makes you feel any better. | |
141 | .FE | |
142 | and the right to do other icky things as | |
143 | you see fit) but those rights are also granted to everyone else. | |
144 | .PP | |
145 | Please note that all previous distributions of this software contained | |
146 | a copyright (which is now dropped) to protect its | |
147 | origins and its current public domain status against any possible claims | |
148 | and/or challenges. | |
149 | .SH | |
150 | Acknowledgments | |
151 | .PP | |
152 | Many people have been very helpful and supportive. A partial list would | |
153 | necessarily include Rayan Zacherissen (who contributed the man page, | |
154 | and also hacked a MMAP version of \fIsdbm\fP), | |
155 | Arnold Robbins, Chris Lewis, | |
156 | Bill Davidsen, Henry Spencer, Geoff Collyer, Rich Salz (who got me started | |
157 | in the first place), Johannes Ruschein | |
158 | (who did the minix port) and David Tilbrook. I thank you all. | |
159 | .SH | |
160 | Distribution Manifest and Notes | |
161 | .LP | |
162 | This distribution of \fIsdbm\fP includes (at least) the following: | |
163 | .P1 | |
164 | CHANGES change log | |
165 | README this file. | |
166 | biblio a small bibliography on external hashing | |
167 | dba.c a crude (n/s)dbm page file analyzer | |
168 | dbd.c a crude (n/s)dbm page file dumper (for conversion) | |
169 | dbe.1 man page for dbe.c | |
170 | dbe.c Janick's database editor | |
171 | dbm.c a dbm library emulation wrapper for ndbm/sdbm | |
172 | dbm.h header file for the above | |
173 | dbu.c a crude db management utility | |
174 | hash.c hashing function | |
175 | makefile guess. | |
176 | pair.c page-level routines (posted earlier) | |
177 | pair.h header file for the above | |
178 | readme.ms troff source for the README file | |
179 | sdbm.3 man page | |
180 | sdbm.c the real thing | |
181 | sdbm.h header file for the above | |
182 | tune.h place for tuning & portability thingies | |
183 | util.c miscellaneous | |
184 | .P2 | |
185 | .PP | |
186 | .CW dbu | |
187 | is a simple database manipulation program\** that tries to look | |
188 | .FS | |
189 | The | |
190 | .CW dbd , | |
191 | .CW dba , | |
192 | .CW dbu | |
193 | utilities are quick hacks and are not fit for production use. They were | |
194 | developed late one night, just to test out \fIsdbm\fP, and convert some | |
195 | databases. | |
196 | .FE | |
197 | like Bell Labs' | |
198 | .CW cbt | |
199 | utility. It is currently incomplete in functionality. | |
200 | I use | |
201 | .CW dbu | |
202 | to test out the routines: it takes (from stdin) tab separated | |
203 | key/value pairs for commands like | |
204 | .CW build | |
205 | or | |
206 | .CW insert | |
207 | or takes keys for | |
208 | commands like | |
209 | .CW delete | |
210 | or | |
211 | .CW look . | |
212 | .P1 | |
213 | dbu <build|creat|look|insert|cat|delete> dbmfile | |
214 | .P2 | |
215 | .PP | |
216 | .CW dba | |
217 | is a crude analyzer of \fIdbm/sdbm/ndbm\fP | |
218 | page files. It scans the entire | |
219 | page file, reporting page level statistics, and totals at the end. | |
220 | .PP | |
221 | .CW dbd | |
222 | is a crude dump program for \fIdbm/ndbm/sdbm\fP | |
223 | databases. It ignores the | |
224 | bitmap, and dumps the data pages in sequence. It can be used to create | |
225 | input for the | |
226 | .CW dbu | |
227 | utility. | |
228 | Note that | |
229 | .CW dbd | |
230 | will skip any NULLs in the key and data | |
231 | fields, thus is unsuitable to convert some peculiar databases that | |
232 | insist in including the terminating null. | |
233 | .PP | |
234 | I have also included a copy of the | |
235 | .CW dbe | |
236 | (\fIndbm\fP DataBase Editor) by Janick Bergeron [janick@bnr.ca] for | |
237 | your pleasure. You may find it more useful than the little | |
238 | .CW dbu | |
239 | utility. | |
240 | .PP | |
241 | .CW dbm.[ch] | |
242 | is a \fIdbm\fP library emulation on top of \fIndbm\fP | |
243 | (and hence suitable for \fIsdbm\fP). Written by Robert Elz. | |
244 | .PP | |
245 | The \fIsdbm\fP | |
246 | library has been around in beta test for quite a long time, and from whatever | |
247 | little feedback I received (maybe no news is good news), I believe it has been | |
248 | functioning without any significant problems. I would, of course, appreciate | |
249 | all fixes and/or improvements. Portability enhancements would especially be | |
250 | useful. | |
251 | .SH | |
252 | Implementation Issues | |
253 | .PP | |
254 | Hash functions: | |
255 | The algorithm behind \fIsdbm\fP implementation needs a good bit-scrambling | |
256 | hash function to be effective. I ran into a set of constants for a simple | |
257 | hash function that seem to help \fIsdbm\fP perform better than \fIndbm\fP | |
258 | for various inputs: | |
259 | .P1 | |
260 | /* | |
261 | * polynomial conversion ignoring overflows | |
262 | * 65599 nice. 65587 even better. | |
263 | */ | |
264 | long | |
265 | dbm_hash(char *str, int len) { | |
5aaab254 | 266 | unsigned long n = 0; |
463ee0b2 LW |
267 | |
268 | while (len--) | |
269 | n = n * 65599 + *str++; | |
270 | return n; | |
271 | } | |
272 | .P2 | |
273 | .PP | |
274 | There may be better hash functions for the purposes of dynamic hashing. | |
275 | Try your favorite, and check the pagefile. If it contains too many pages | |
276 | with too many holes, (in relation to this one for example) or if | |
277 | \fIsdbm\fP | |
278 | simply stops working (fails after | |
279 | .CW SPLTMAX | |
280 | attempts to split) when you feed your | |
281 | NEWS | |
282 | .CW history | |
283 | file to it, you probably do not have a good hashing function. | |
284 | If you do better (for different types of input), I would like to know | |
285 | about the function you use. | |
286 | .PP | |
287 | Block sizes: It seems (from various tests on a few machines) that a page | |
288 | file block size | |
289 | .CW PBLKSIZ | |
290 | of 1024 is by far the best for performance, but | |
291 | this also happens to limit the size of a key/value pair. Depending on your | |
292 | needs, you may wish to increase the page size, and also adjust | |
293 | .CW PAIRMAX | |
294 | (the maximum size of a key/value pair allowed: should always be at least | |
295 | three words smaller than | |
296 | .CW PBLKSIZ .) | |
297 | accordingly. The system-wide version of the library | |
298 | should probably be | |
299 | configured with 1024 (distribution default), as this appears to be sufficient | |
300 | for most common uses of \fIsdbm\fP. | |
301 | .SH | |
302 | Portability | |
303 | .PP | |
304 | This package has been tested in many different UN*Xes even including minix, | |
305 | and appears to be reasonably portable. This does not mean it will port | |
306 | easily to non-UN*X systems. | |
307 | .SH | |
308 | Notes and Miscellaneous | |
309 | .PP | |
310 | The \fIsdbm\fP is not a very complicated package, at least not after you | |
311 | familiarize yourself with the literature on external hashing. There are | |
312 | other interesting algorithms in existence that ensure (approximately) | |
313 | single-read access to a data value associated with any key. These are | |
314 | directory-less schemes such as \fIlinear hashing\fP [Lit80] (+ Larson | |
315 | variations), \fIspiral storage\fP [Mar79] or directory schemes such as | |
316 | \fIextensible hashing\fP [Fag79] by Fagin et al. I do hope these sources | |
317 | provide a reasonable playground for experimentation with other algorithms. | |
318 | See the June 1988 issue of ACM Computing Surveys [Enb88] for an | |
319 | excellent overview of the field. | |
320 | .PG | |
321 | .SH | |
322 | References | |
323 | .LP | |
324 | .IP [Lar78] 4m | |
325 | P.-A. Larson, | |
882ce583 | 326 | "Dynamic Hashing", \fIBIT\fP, vol. 18, pp. 184-201, 1978. |
463ee0b2 LW |
327 | .IP [Tho90] 4m |
328 | Ken Thompson, \fIprivate communication\fP, Nov. 1990 | |
329 | .IP [Lit80] 4m | |
330 | W. Litwin, | |
6b0bcbb1 | 331 | "Linear Hashing: A new tool for file and table addressing", |
463ee0b2 LW |
332 | \fIProceedings of the 6th Conference on Very Large Dabatases (Montreal)\fP, |
333 | pp. 212-223, Very Large Database Foundation, Saratoga, Calif., 1980. | |
334 | .IP [Fag79] 4m | |
335 | R. Fagin, J. Nievergelt, N. Pippinger, and H. R. Strong, | |
882ce583 | 336 | "Extendible Hashing - A Fast Access Method for Dynamic Files", |
463ee0b2 LW |
337 | \fIACM Trans. Database Syst.\fP, vol. 4, no.3, pp. 315-344, Sept. 1979. |
338 | .IP [Wal84] 4m | |
339 | Rich Wales, | |
6b0bcbb1 | 340 | "Discussion of 'dbm' data base system", \fIUSENET newsgroup unix.wizards\fP, |
463ee0b2 LW |
341 | Jan. 1984. |
342 | .IP [Tor87] 4m | |
343 | Chris Torek, | |
882ce583 | 344 | "Re: dbm.a and ndbm.a archives", \fIUSENET newsgroup comp.unix\fP, |
463ee0b2 LW |
345 | 1987. |
346 | .IP [Mar79] 4m | |
347 | G. N. Martin, | |
882ce583 | 348 | "Spiral Storage: Incrementally Augmentable Hash Addressed Storage", |
463ee0b2 LW |
349 | \fITechnical Report #27\fP, University of Varwick, Coventry, U.K., 1979. |
350 | .IP [Enb88] 4m | |
351 | R. J. Enbody and H. C. Du, | |
882ce583 | 352 | "Dynamic Hashing Schemes",\fIACM Computing Surveys\fP, |
463ee0b2 | 353 | vol. 20, no. 2, pp. 85-113, June 1988. |