ext/SDBM_File/sdbm/readme.ms

   1 .\" tbl | readme.ms | [tn]roff -ms | ...
   2 .\" note the "C" (courier) and "CB" fonts: you will probably have to
   3 .\" change these.
   4 .\" $Id: readme.ms,v 1.1 90/12/13 13:09:15 oz Exp Locker: oz $
   5
   6 .de P1
   7 .br
   8 .nr dT 4
   9 .nf
  10 .ft C
  11 .sp .5
  12 .nr t \\n(dT*\\w'x'u
  13 .ta 1u*\\ntu 2u*\\ntu 3u*\\ntu 4u*\\ntu 5u*\\ntu 6u*\\ntu 7u*\\ntu 8u*\\ntu 9u*\\ntu 10u*\\ntu 11u*\\ntu 12u*\\ntu 13u*\\ntu 14u*\\ntu
  14 ..
  15 .de P2
  16 .br
  17 .ft 1
  18 .br
  19 .sp .5
  20 .br
  21 .fi
  22 ..
  23 .\" CW uses the typewriter/courier font.
  24 .de CW
  25 \fC\\$1\\fP\\$2
  26 ..
  27
  28 .\" Footnote numbering [by Henry Spencer]
  29 .\" <text>\*f for a footnote number..
  30 .\" .FS
  31 .\" \*F <footnote text>
  32 .\" .FE
  33 .\"
  34 .ds f \\u\\s-2\\n+f\\s+2\\d
  35 .nr f 0 1
  36 .ds F \\n+F.
  37 .nr F 0 1
  38
  39 .ND
  40 .LP
  41 .TL
  42 \fIsdbm\fP \(em Substitute DBM
  43 .br
  44 or
  45 .br
  46 Berkeley \fIndbm\fP for Every UN*X\** Made Simple
  47 .AU
  48 Ozan (oz) Yigit
  49 .AI
  50 The Guild of PD Software Toolmakers
  51 Toronto - Canada
  52 .sp
  53 oz@nexus.yorku.ca
  54 .LP
  55 .FS
  56 UN*X is not a trademark of any (dis)organization.
  57 .FE
  58 .sp 2
  59 \fIImplementation is the sincerest form of flattery. \(em L. Peter Deutsch\fP
  60 .SH
  61 A The Clone of the \fIndbm\fP library
  62 .PP
  63 The sources accompanying this notice \(em \fIsdbm\fP \(em constitute
  64 the first public release (Dec. 1990) of a complete clone of
  65 the Berkeley UN*X \fIndbm\fP library. The \fIsdbm\fP library is meant to
  66 clone the proven functionality of \fIndbm\fP as closely as possible,
  67 including a few improvements. It is practical, easy to understand, and
  68 compatible.
  69 The \fIsdbm\fP library is not derived from any licensed, proprietary or
  70 copyrighted software.
  71 .PP
  72 The \fIsdbm\fP implementation is based on a 1978 algorithm
  73 [Lar78] by P.-A. (Paul) Larson known as ``Dynamic Hashing''.
  74 In the course of searching for a substitute for \fIndbm\fP, I
  75 prototyped three different external-hashing algorithms [Lar78, Fag79, Lit80]
  76 and ultimately chose Larson's algorithm as a basis of the \fIsdbm\fP
  77 implementation. The Bell Labs
  78 \fIdbm\fP (and therefore \fIndbm\fP) is based on an algorithm invented by
  79 Ken Thompson, [Tho90, Tor87] and predates Larson's work.
  80 .PP
  81 The \fIsdbm\fR programming interface is totally compatible
  82 with \fIndbm\fP and includes a slight improvement in database initialization.
  83 It is also expected to be binary-compatible under most UN*X versions that
  84 support the \fIndbm\fP library.
  85 .PP
  86 The \fIsdbm\fP implementation shares the shortcomings of the \fIndbm\fP
  87 library, as a side effect of various simplifications to the original Larson
  88 algorithm. It does produce \fIholes\fP in the page file as it writes
  89 pages past the end of file. (Larson's paper include a clever solution to
  90 this problem that is a result of using the hash value directly as a block
  91 address.) On the other hand, extensive tests seem to indicate that \fIsdbm\fP
  92 creates fewer holes in general, and the resulting pagefiles are
  93 smaller. The \fIsdbm\fP implementation is also faster than \fIndbm\fP
  94 in database creation.
  95 Unlike the \fIndbm\fP, the \fIsdbm\fP
  96 .CW store
  97 operation will not ``wander away'' trying to split its
  98 data pages to insert a datum that \fIcannot\fP (due to elaborate worst-case
  99 situations) be inserted. (It will fail after a pre-defined number of attempts.)
 100 .SH
 101 Important Compatibility Warning
 102 .PP
 103 The \fIsdbm\fP and \fIndbm\fP
 104 libraries \fIcannot\fP share databases: one cannot read the (dir/pag)
 105 database created by the other. This is due to the differences
 106 between the \fIndbm\fP and \fIsdbm\fP algorithms\**,
 107 .FS
 108 Torek's discussion [Tor87]
 109 indicates that \fIdbm/ndbm\fP implementations use the hash
 110 value to traverse the radix trie differently than \fIsdbm\fP
 111 and as a result, the page indexes are generated in \fIdifferent\fP order.
 112 For more information, send e-mail to the author.
 113 .FE
 114 and the hash functions
 115 used.
 116 It is easy to convert between the \fIdbm/ndbm\fP databases and \fIsdbm\fP
 117 by ignoring the index completely: see
 118 .CW dbd ,
 119 .CW dbu
 120 etc.
 121 .R
 122 .LP
 123 .SH
 124 Notice of Intellectual Property
 125 .LP
 126 \fIThe entire\fP sdbm  \fIlibrary package, as authored by me,\fP Ozan S. Yigit,
 127 \fIis hereby placed in the public domain.\fP As such, the author is not
 128 responsible for the consequences of use of this software, no matter how
 129 awful, even if they arise from defects in it. There is no expressed or
 130 implied warranty for the \fIsdbm\fP library.
 131 .PP
 132 Since the \fIsdbm\fP
 133 library package is in the public domain, this \fIoriginal\fP
 134 release or any additional public-domain releases of the modified original
 135 cannot possibly (by definition) be withheld from you. Also by definition,
 136 You (singular) have all the rights to this code (including the right to
 137 sell without permission, the right to hoard\**
 138 .FS
 139 You cannot really hoard something that is available to the public at
 140 large, but try if it makes you feel any better.
 141 .FE
 142 and the right to do other icky things as
 143 you see fit) but those rights are also granted to everyone else.
 144 .PP
 145 Please note that all previous distributions of this software contained
 146 a copyright (which is now dropped) to protect its
 147 origins and its current public domain status against any possible claims
 148 and/or challenges.
 149 .SH
 150 Acknowledgments
 151 .PP
 152 Many people have been very helpful and supportive.  A partial list would
 153 necessarily include Rayan Zacherissen (who contributed the man page,
 154 and also hacked a MMAP version of \fIsdbm\fP),
 155 Arnold Robbins, Chris Lewis,
 156 Bill Davidsen, Henry Spencer, Geoff Collyer, Rich Salz (who got me started
 157 in the first place), Johannes Ruschein
 158 (who did the minix port) and David Tilbrook. I thank you all.
 159 .SH
 160 Distribution Manifest and Notes
 161 .LP
 162 This distribution of \fIsdbm\fP includes (at least) the following:
 163 .P1
 164         CHANGES         change log
 165         README          this file.
 166         biblio          a small bibliography on external hashing
 167         dba.c           a crude (n/s)dbm page file analyzer
 168         dbd.c           a crude (n/s)dbm page file dumper (for conversion)
 169         dbe.1           man page for dbe.c
 170         dbe.c           Janick's database editor
 171         dbm.c           a dbm library emulation wrapper for ndbm/sdbm
 172         dbm.h           header file for the above
 173         dbu.c           a crude db management utility
 174         hash.c          hashing function
 175         makefile        guess.
 176         pair.c          page-level routines (posted earlier)
 177         pair.h          header file for the above
 178         readme.ms       troff source for the README file
 179         sdbm.3          man page
 180         sdbm.c          the real thing
 181         sdbm.h          header file for the above
 182         tune.h          place for tuning & portability thingies
 183         util.c          miscellaneous
 184 .P2
 185 .PP
 186 .CW dbu
 187 is a simple database manipulation program\** that tries to look
 188 .FS
 189 The
 190 .CW dbd ,
 191 .CW dba ,
 192 .CW dbu
 193 utilities are quick hacks and are not fit for production use. They were
 194 developed late one night, just to test out \fIsdbm\fP, and convert some
 195 databases.
 196 .FE
 197 like Bell Labs'
 198 .CW cbt
 199 utility. It is currently incomplete in functionality.
 200 I use
 201 .CW dbu
 202 to test out the routines: it takes (from stdin) tab separated
 203 key/value pairs for commands like
 204 .CW build
 205 or
 206 .CW insert
 207 or takes keys for
 208 commands like
 209 .CW delete
 210 or
 211 .CW look .
 212 .P1
 213         dbu <build|creat|look|insert|cat|delete> dbmfile
 214 .P2
 215 .PP
 216 .CW dba
 217 is a crude analyzer of \fIdbm/sdbm/ndbm\fP
 218 page files. It scans the entire
 219 page file, reporting page level statistics, and totals at the end.
 220 .PP
 221 .CW dbd
 222 is a crude dump program for \fIdbm/ndbm/sdbm\fP
 223 databases. It ignores the
 224 bitmap, and dumps the data pages in sequence. It can be used to create
 225 input for the
 226 .CW dbu
 227 utility.
 228 Note that
 229 .CW dbd
 230 will skip any NULLs in the key and data
 231 fields, thus is unsuitable to convert some peculiar databases that
 232 insist in including the terminating null.
 233 .PP
 234 I have also included a copy of the
 235 .CW dbe
 236 (\fIndbm\fP DataBase Editor) by Janick Bergeron [janick@bnr.ca] for
 237 your pleasure. You may find it more useful than the little
 238 .CW dbu
 239 utility.
 240 .PP
 241 .CW dbm.[ch]
 242 is a \fIdbm\fP library emulation on top of \fIndbm\fP
 243 (and hence suitable for \fIsdbm\fP). Written by Robert Elz.
 244 .PP
 245 The \fIsdbm\fP
 246 library has been around in beta test for quite a long time, and from whatever
 247 little feedback I received (maybe no news is good news), I believe it has been
 248 functioning without any significant problems. I would, of course, appreciate
 249 all fixes and/or improvements. Portability enhancements would especially be
 250 useful.
 251 .SH
 252 Implementation Issues
 253 .PP
 254 Hash functions:
 255 The algorithm behind \fIsdbm\fP implementation needs a good bit-scrambling
 256 hash function to be effective. I ran into a set of constants for a simple
 257 hash function that seem to help \fIsdbm\fP perform better than \fIndbm\fP
 258 for various inputs:
 259 .P1
 260         /*
 261          * polynomial conversion ignoring overflows
 262          * 65599 nice. 65587 even better.
 263          */
 264         long
 265         dbm_hash(char *str, int len) {
 266                 register unsigned long n = 0;
 267
 268                 while (len--)
 269                         n = n * 65599 + *str++;
 270                 return n;
 271         }
 272 .P2
 273 .PP
 274 There may be better hash functions for the purposes of dynamic hashing.
 275 Try your favorite, and check the pagefile. If it contains too many pages
 276 with too many holes, (in relation to this one for example) or if
 277 \fIsdbm\fP
 278 simply stops working (fails after
 279 .CW SPLTMAX
 280 attempts to split) when you feed your
 281 NEWS
 282 .CW history
 283 file to it, you probably do not have a good hashing function.
 284 If you do better (for different types of input), I would like to know
 285 about the function you use.
 286 .PP
 287 Block sizes: It seems (from various tests on a few machines) that a page
 288 file block size
 289 .CW PBLKSIZ
 290 of 1024 is by far the best for performance, but
 291 this also happens to limit the size of a key/value pair. Depending on your
 292 needs, you may wish to increase the page size, and also adjust
 293 .CW PAIRMAX
 294 (the maximum size of a key/value pair allowed: should always be at least
 295 three words smaller than
 296 .CW PBLKSIZ .)
 297 accordingly. The system-wide version of the library
 298 should probably be
 299 configured with 1024 (distribution default), as this appears to be sufficient
 300 for most common uses of \fIsdbm\fP.
 301 .SH
 302 Portability
 303 .PP
 304 This package has been tested in many different UN*Xes even including minix,
 305 and appears to be reasonably portable. This does not mean it will port
 306 easily to non-UN*X systems.
 307 .SH
 308 Notes and Miscellaneous
 309 .PP
 310 The \fIsdbm\fP is not a very complicated package, at least not after you
 311 familiarize yourself with the literature on external hashing. There are
 312 other interesting algorithms in existence that ensure (approximately)
 313 single-read access to a data value associated with any key. These are
 314 directory-less schemes such as \fIlinear hashing\fP [Lit80] (+ Larson
 315 variations), \fIspiral storage\fP [Mar79] or directory schemes such as
 316 \fIextensible hashing\fP [Fag79] by Fagin et al. I do hope these sources
 317 provide a reasonable playground for experimentation with other algorithms.
 318 See the June 1988 issue of ACM Computing Surveys [Enb88] for an
 319 excellent overview of the field.
 320 .PG
 321 .SH
 322 References
 323 .LP
 324 .IP [Lar78] 4m
 325 P.-A. Larson,
 326 ``Dynamic Hashing'', \fIBIT\fP, vol.  18,  pp. 184-201, 1978.
 327 .IP [Tho90] 4m
 328 Ken Thompson, \fIprivate communication\fP, Nov. 1990
 329 .IP [Lit80] 4m
 330 W. Litwin,
 331 `` Linear Hashing: A new tool  for  file  and table addressing'',
 332 \fIProceedings of the 6th Conference on Very Large  Dabatases  (Montreal)\fP,
 333 pp.  212-223,  Very Large Database Foundation, Saratoga, Calif., 1980.
 334 .IP [Fag79] 4m
 335 R. Fagin, J.  Nievergelt,  N.  Pippinger,  and  H.  R. Strong,
 336 ``Extendible Hashing - A Fast Access Method for Dynamic Files'',
 337 \fIACM Trans. Database Syst.\fP, vol. 4,  no.3, pp. 315-344, Sept. 1979.
 338 .IP [Wal84] 4m
 339 Rich Wales,
 340 ``Discussion of "dbm" data base system'', \fIUSENET newsgroup unix.wizards\fP,
 341 Jan. 1984.
 342 .IP [Tor87] 4m
 343 Chris Torek,
 344 ``Re:  dbm.a  and  ndbm.a  archives'', \fIUSENET newsgroup comp.unix\fP,
 345 1987.
 346 .IP [Mar79] 4m
 347 G. N. Martin,
 348 ``Spiral Storage: Incrementally  Augmentable  Hash  Addressed  Storage'',
 349 \fITechnical Report #27\fP, University of Varwick, Coventry, U.K., 1979.
 350 .IP [Enb88] 4m
 351 R. J. Enbody and H. C. Du,
 352 ``Dynamic Hashing  Schemes'',\fIACM Computing Surveys\fP,
 353 vol. 20, no. 2, pp. 85-113, June 1988.