| 1 | .\" tbl | readme.ms | [tn]roff -ms | ... |
| 2 | .\" note the "C" (courier) and "CB" fonts: you will probably have to |
| 3 | .\" change these. |
| 4 | .\" $Id: readme.ms,v 1.1 90/12/13 13:09:15 oz Exp Locker: oz $ |
| 5 | |
| 6 | .de P1 |
| 7 | .br |
| 8 | .nr dT 4 |
| 9 | .nf |
| 10 | .ft C |
| 11 | .sp .5 |
| 12 | .nr t \\n(dT*\\w'x'u |
| 13 | .ta 1u*\\ntu 2u*\\ntu 3u*\\ntu 4u*\\ntu 5u*\\ntu 6u*\\ntu 7u*\\ntu 8u*\\ntu 9u*\\ntu 10u*\\ntu 11u*\\ntu 12u*\\ntu 13u*\\ntu 14u*\\ntu |
| 14 | .. |
| 15 | .de P2 |
| 16 | .br |
| 17 | .ft 1 |
| 18 | .br |
| 19 | .sp .5 |
| 20 | .br |
| 21 | .fi |
| 22 | .. |
| 23 | .\" CW uses the typewriter/courier font. |
| 24 | .de CW |
| 25 | \fC\\$1\\fP\\$2 |
| 26 | .. |
| 27 | |
| 28 | .\" Footnote numbering [by Henry Spencer] |
| 29 | .\" <text>\*f for a footnote number.. |
| 30 | .\" .FS |
| 31 | .\" \*F <footnote text> |
| 32 | .\" .FE |
| 33 | .\" |
| 34 | .ds f \\u\\s-2\\n+f\\s+2\\d |
| 35 | .nr f 0 1 |
| 36 | .ds F \\n+F. |
| 37 | .nr F 0 1 |
| 38 | |
| 39 | .ND |
| 40 | .LP |
| 41 | .TL |
| 42 | \fIsdbm\fP \(em Substitute DBM |
| 43 | .br |
| 44 | or |
| 45 | .br |
| 46 | Berkeley \fIndbm\fP for Every UN*X\** Made Simple |
| 47 | .AU |
| 48 | Ozan (oz) Yigit |
| 49 | .AI |
| 50 | The Guild of PD Software Toolmakers |
| 51 | Toronto - Canada |
| 52 | .sp |
| 53 | oz@nexus.yorku.ca |
| 54 | .LP |
| 55 | .FS |
| 56 | UN*X is not a trademark of any (dis)organization. |
| 57 | .FE |
| 58 | .sp 2 |
| 59 | \fIImplementation is the sincerest form of flattery. \(em L. Peter Deutsch\fP |
| 60 | .SH |
| 61 | A The Clone of the \fIndbm\fP library |
| 62 | .PP |
| 63 | The sources accompanying this notice \(em \fIsdbm\fP \(em constitute |
| 64 | the first public release (Dec. 1990) of a complete clone of |
| 65 | the Berkeley UN*X \fIndbm\fP library. The \fIsdbm\fP library is meant to |
| 66 | clone the proven functionality of \fIndbm\fP as closely as possible, |
| 67 | including a few improvements. It is practical, easy to understand, and |
| 68 | compatible. |
| 69 | The \fIsdbm\fP library is not derived from any licensed, proprietary or |
| 70 | copyrighted software. |
| 71 | .PP |
| 72 | The \fIsdbm\fP implementation is based on a 1978 algorithm |
| 73 | [Lar78] by P.-A. (Paul) Larson known as "Dynamic Hashing". |
| 74 | In the course of searching for a substitute for \fIndbm\fP, I |
| 75 | prototyped three different external-hashing algorithms [Lar78, Fag79, Lit80] |
| 76 | and ultimately chose Larson's algorithm as a basis of the \fIsdbm\fP |
| 77 | implementation. The Bell Labs |
| 78 | \fIdbm\fP (and therefore \fIndbm\fP) is based on an algorithm invented by |
| 79 | Ken Thompson, [Tho90, Tor87] and predates Larson's work. |
| 80 | .PP |
| 81 | The \fIsdbm\fR programming interface is totally compatible |
| 82 | with \fIndbm\fP and includes a slight improvement in database initialization. |
| 83 | It is also expected to be binary-compatible under most UN*X versions that |
| 84 | support the \fIndbm\fP library. |
| 85 | .PP |
| 86 | The \fIsdbm\fP implementation shares the shortcomings of the \fIndbm\fP |
| 87 | library, as a side effect of various simplifications to the original Larson |
| 88 | algorithm. It does produce \fIholes\fP in the page file as it writes |
| 89 | pages past the end of file. (Larson's paper include a clever solution to |
| 90 | this problem that is a result of using the hash value directly as a block |
| 91 | address.) On the other hand, extensive tests seem to indicate that \fIsdbm\fP |
| 92 | creates fewer holes in general, and the resulting pagefiles are |
| 93 | smaller. The \fIsdbm\fP implementation is also faster than \fIndbm\fP |
| 94 | in database creation. |
| 95 | Unlike the \fIndbm\fP, the \fIsdbm\fP |
| 96 | .CW store |
| 97 | operation will not "wander away" trying to split its |
| 98 | data pages to insert a datum that \fIcannot\fP (due to elaborate worst-case |
| 99 | situations) be inserted. (It will fail after a pre-defined number of attempts.) |
| 100 | .SH |
| 101 | Important Compatibility Warning |
| 102 | .PP |
| 103 | The \fIsdbm\fP and \fIndbm\fP |
| 104 | libraries \fIcannot\fP share databases: one cannot read the (dir/pag) |
| 105 | database created by the other. This is due to the differences |
| 106 | between the \fIndbm\fP and \fIsdbm\fP algorithms\**, |
| 107 | .FS |
| 108 | Torek's discussion [Tor87] |
| 109 | indicates that \fIdbm/ndbm\fP implementations use the hash |
| 110 | value to traverse the radix trie differently than \fIsdbm\fP |
| 111 | and as a result, the page indexes are generated in \fIdifferent\fP order. |
| 112 | For more information, send e-mail to the author. |
| 113 | .FE |
| 114 | and the hash functions |
| 115 | used. |
| 116 | It is easy to convert between the \fIdbm/ndbm\fP databases and \fIsdbm\fP |
| 117 | by ignoring the index completely: see |
| 118 | .CW dbd , |
| 119 | .CW dbu |
| 120 | etc. |
| 121 | .R |
| 122 | .LP |
| 123 | .SH |
| 124 | Notice of Intellectual Property |
| 125 | .LP |
| 126 | \fIThe entire\fP sdbm \fIlibrary package, as authored by me,\fP Ozan S. Yigit, |
| 127 | \fIis hereby placed in the public domain.\fP As such, the author is not |
| 128 | responsible for the consequences of use of this software, no matter how |
| 129 | awful, even if they arise from defects in it. There is no expressed or |
| 130 | implied warranty for the \fIsdbm\fP library. |
| 131 | .PP |
| 132 | Since the \fIsdbm\fP |
| 133 | library package is in the public domain, this \fIoriginal\fP |
| 134 | release or any additional public-domain releases of the modified original |
| 135 | cannot possibly (by definition) be withheld from you. Also by definition, |
| 136 | You (singular) have all the rights to this code (including the right to |
| 137 | sell without permission, the right to hoard\** |
| 138 | .FS |
| 139 | You cannot really hoard something that is available to the public at |
| 140 | large, but try if it makes you feel any better. |
| 141 | .FE |
| 142 | and the right to do other icky things as |
| 143 | you see fit) but those rights are also granted to everyone else. |
| 144 | .PP |
| 145 | Please note that all previous distributions of this software contained |
| 146 | a copyright (which is now dropped) to protect its |
| 147 | origins and its current public domain status against any possible claims |
| 148 | and/or challenges. |
| 149 | .SH |
| 150 | Acknowledgments |
| 151 | .PP |
| 152 | Many people have been very helpful and supportive. A partial list would |
| 153 | necessarily include Rayan Zacherissen (who contributed the man page, |
| 154 | and also hacked a MMAP version of \fIsdbm\fP), |
| 155 | Arnold Robbins, Chris Lewis, |
| 156 | Bill Davidsen, Henry Spencer, Geoff Collyer, Rich Salz (who got me started |
| 157 | in the first place), Johannes Ruschein |
| 158 | (who did the minix port) and David Tilbrook. I thank you all. |
| 159 | .SH |
| 160 | Distribution Manifest and Notes |
| 161 | .LP |
| 162 | This distribution of \fIsdbm\fP includes (at least) the following: |
| 163 | .P1 |
| 164 | CHANGES change log |
| 165 | README this file. |
| 166 | biblio a small bibliography on external hashing |
| 167 | dba.c a crude (n/s)dbm page file analyzer |
| 168 | dbd.c a crude (n/s)dbm page file dumper (for conversion) |
| 169 | dbe.1 man page for dbe.c |
| 170 | dbe.c Janick's database editor |
| 171 | dbm.c a dbm library emulation wrapper for ndbm/sdbm |
| 172 | dbm.h header file for the above |
| 173 | dbu.c a crude db management utility |
| 174 | hash.c hashing function |
| 175 | makefile guess. |
| 176 | pair.c page-level routines (posted earlier) |
| 177 | pair.h header file for the above |
| 178 | readme.ms troff source for the README file |
| 179 | sdbm.3 man page |
| 180 | sdbm.c the real thing |
| 181 | sdbm.h header file for the above |
| 182 | tune.h place for tuning & portability thingies |
| 183 | util.c miscellaneous |
| 184 | .P2 |
| 185 | .PP |
| 186 | .CW dbu |
| 187 | is a simple database manipulation program\** that tries to look |
| 188 | .FS |
| 189 | The |
| 190 | .CW dbd , |
| 191 | .CW dba , |
| 192 | .CW dbu |
| 193 | utilities are quick hacks and are not fit for production use. They were |
| 194 | developed late one night, just to test out \fIsdbm\fP, and convert some |
| 195 | databases. |
| 196 | .FE |
| 197 | like Bell Labs' |
| 198 | .CW cbt |
| 199 | utility. It is currently incomplete in functionality. |
| 200 | I use |
| 201 | .CW dbu |
| 202 | to test out the routines: it takes (from stdin) tab separated |
| 203 | key/value pairs for commands like |
| 204 | .CW build |
| 205 | or |
| 206 | .CW insert |
| 207 | or takes keys for |
| 208 | commands like |
| 209 | .CW delete |
| 210 | or |
| 211 | .CW look . |
| 212 | .P1 |
| 213 | dbu <build|creat|look|insert|cat|delete> dbmfile |
| 214 | .P2 |
| 215 | .PP |
| 216 | .CW dba |
| 217 | is a crude analyzer of \fIdbm/sdbm/ndbm\fP |
| 218 | page files. It scans the entire |
| 219 | page file, reporting page level statistics, and totals at the end. |
| 220 | .PP |
| 221 | .CW dbd |
| 222 | is a crude dump program for \fIdbm/ndbm/sdbm\fP |
| 223 | databases. It ignores the |
| 224 | bitmap, and dumps the data pages in sequence. It can be used to create |
| 225 | input for the |
| 226 | .CW dbu |
| 227 | utility. |
| 228 | Note that |
| 229 | .CW dbd |
| 230 | will skip any NULLs in the key and data |
| 231 | fields, thus is unsuitable to convert some peculiar databases that |
| 232 | insist in including the terminating null. |
| 233 | .PP |
| 234 | I have also included a copy of the |
| 235 | .CW dbe |
| 236 | (\fIndbm\fP DataBase Editor) by Janick Bergeron [janick@bnr.ca] for |
| 237 | your pleasure. You may find it more useful than the little |
| 238 | .CW dbu |
| 239 | utility. |
| 240 | .PP |
| 241 | .CW dbm.[ch] |
| 242 | is a \fIdbm\fP library emulation on top of \fIndbm\fP |
| 243 | (and hence suitable for \fIsdbm\fP). Written by Robert Elz. |
| 244 | .PP |
| 245 | The \fIsdbm\fP |
| 246 | library has been around in beta test for quite a long time, and from whatever |
| 247 | little feedback I received (maybe no news is good news), I believe it has been |
| 248 | functioning without any significant problems. I would, of course, appreciate |
| 249 | all fixes and/or improvements. Portability enhancements would especially be |
| 250 | useful. |
| 251 | .SH |
| 252 | Implementation Issues |
| 253 | .PP |
| 254 | Hash functions: |
| 255 | The algorithm behind \fIsdbm\fP implementation needs a good bit-scrambling |
| 256 | hash function to be effective. I ran into a set of constants for a simple |
| 257 | hash function that seem to help \fIsdbm\fP perform better than \fIndbm\fP |
| 258 | for various inputs: |
| 259 | .P1 |
| 260 | /* |
| 261 | * polynomial conversion ignoring overflows |
| 262 | * 65599 nice. 65587 even better. |
| 263 | */ |
| 264 | long |
| 265 | dbm_hash(char *str, int len) { |
| 266 | unsigned long n = 0; |
| 267 | |
| 268 | while (len--) |
| 269 | n = n * 65599 + *str++; |
| 270 | return n; |
| 271 | } |
| 272 | .P2 |
| 273 | .PP |
| 274 | There may be better hash functions for the purposes of dynamic hashing. |
| 275 | Try your favorite, and check the pagefile. If it contains too many pages |
| 276 | with too many holes, (in relation to this one for example) or if |
| 277 | \fIsdbm\fP |
| 278 | simply stops working (fails after |
| 279 | .CW SPLTMAX |
| 280 | attempts to split) when you feed your |
| 281 | NEWS |
| 282 | .CW history |
| 283 | file to it, you probably do not have a good hashing function. |
| 284 | If you do better (for different types of input), I would like to know |
| 285 | about the function you use. |
| 286 | .PP |
| 287 | Block sizes: It seems (from various tests on a few machines) that a page |
| 288 | file block size |
| 289 | .CW PBLKSIZ |
| 290 | of 1024 is by far the best for performance, but |
| 291 | this also happens to limit the size of a key/value pair. Depending on your |
| 292 | needs, you may wish to increase the page size, and also adjust |
| 293 | .CW PAIRMAX |
| 294 | (the maximum size of a key/value pair allowed: should always be at least |
| 295 | three words smaller than |
| 296 | .CW PBLKSIZ .) |
| 297 | accordingly. The system-wide version of the library |
| 298 | should probably be |
| 299 | configured with 1024 (distribution default), as this appears to be sufficient |
| 300 | for most common uses of \fIsdbm\fP. |
| 301 | .SH |
| 302 | Portability |
| 303 | .PP |
| 304 | This package has been tested in many different UN*Xes even including minix, |
| 305 | and appears to be reasonably portable. This does not mean it will port |
| 306 | easily to non-UN*X systems. |
| 307 | .SH |
| 308 | Notes and Miscellaneous |
| 309 | .PP |
| 310 | The \fIsdbm\fP is not a very complicated package, at least not after you |
| 311 | familiarize yourself with the literature on external hashing. There are |
| 312 | other interesting algorithms in existence that ensure (approximately) |
| 313 | single-read access to a data value associated with any key. These are |
| 314 | directory-less schemes such as \fIlinear hashing\fP [Lit80] (+ Larson |
| 315 | variations), \fIspiral storage\fP [Mar79] or directory schemes such as |
| 316 | \fIextensible hashing\fP [Fag79] by Fagin et al. I do hope these sources |
| 317 | provide a reasonable playground for experimentation with other algorithms. |
| 318 | See the June 1988 issue of ACM Computing Surveys [Enb88] for an |
| 319 | excellent overview of the field. |
| 320 | .PG |
| 321 | .SH |
| 322 | References |
| 323 | .LP |
| 324 | .IP [Lar78] 4m |
| 325 | P.-A. Larson, |
| 326 | "Dynamic Hashing", \fIBIT\fP, vol. 18, pp. 184-201, 1978. |
| 327 | .IP [Tho90] 4m |
| 328 | Ken Thompson, \fIprivate communication\fP, Nov. 1990 |
| 329 | .IP [Lit80] 4m |
| 330 | W. Litwin, |
| 331 | "Linear Hashing: A new tool for file and table addressing", |
| 332 | \fIProceedings of the 6th Conference on Very Large Dabatases (Montreal)\fP, |
| 333 | pp. 212-223, Very Large Database Foundation, Saratoga, Calif., 1980. |
| 334 | .IP [Fag79] 4m |
| 335 | R. Fagin, J. Nievergelt, N. Pippinger, and H. R. Strong, |
| 336 | "Extendible Hashing - A Fast Access Method for Dynamic Files", |
| 337 | \fIACM Trans. Database Syst.\fP, vol. 4, no.3, pp. 315-344, Sept. 1979. |
| 338 | .IP [Wal84] 4m |
| 339 | Rich Wales, |
| 340 | "Discussion of 'dbm' data base system", \fIUSENET newsgroup unix.wizards\fP, |
| 341 | Jan. 1984. |
| 342 | .IP [Tor87] 4m |
| 343 | Chris Torek, |
| 344 | "Re: dbm.a and ndbm.a archives", \fIUSENET newsgroup comp.unix\fP, |
| 345 | 1987. |
| 346 | .IP [Mar79] 4m |
| 347 | G. N. Martin, |
| 348 | "Spiral Storage: Incrementally Augmentable Hash Addressed Storage", |
| 349 | \fITechnical Report #27\fP, University of Varwick, Coventry, U.K., 1979. |
| 350 | .IP [Enb88] 4m |
| 351 | R. J. Enbody and H. C. Du, |
| 352 | "Dynamic Hashing Schemes",\fIACM Computing Surveys\fP, |
| 353 | vol. 20, no. 2, pp. 85-113, June 1988. |