This is a live mirror of the Perl 5 development currently hosted at https://github.com/perl/perl5
Remove another dead function in vms/vms.c.
[perl5.git] / ext / SDBM_File / readme.ms
CommitLineData
463ee0b2
LW
1.\" tbl | readme.ms | [tn]roff -ms | ...
2.\" note the "C" (courier) and "CB" fonts: you will probably have to
3.\" change these.
4.\" $Id: readme.ms,v 1.1 90/12/13 13:09:15 oz Exp Locker: oz $
5
6.de P1
7.br
8.nr dT 4
9.nf
10.ft C
11.sp .5
12.nr t \\n(dT*\\w'x'u
13.ta 1u*\\ntu 2u*\\ntu 3u*\\ntu 4u*\\ntu 5u*\\ntu 6u*\\ntu 7u*\\ntu 8u*\\ntu 9u*\\ntu 10u*\\ntu 11u*\\ntu 12u*\\ntu 13u*\\ntu 14u*\\ntu
14..
15.de P2
16.br
17.ft 1
18.br
19.sp .5
20.br
21.fi
22..
23.\" CW uses the typewriter/courier font.
24.de CW
25\fC\\$1\\fP\\$2
26..
27
28.\" Footnote numbering [by Henry Spencer]
29.\" <text>\*f for a footnote number..
30.\" .FS
31.\" \*F <footnote text>
32.\" .FE
33.\"
34.ds f \\u\\s-2\\n+f\\s+2\\d
35.nr f 0 1
36.ds F \\n+F.
37.nr F 0 1
38
39.ND
40.LP
41.TL
42\fIsdbm\fP \(em Substitute DBM
43.br
44or
45.br
46Berkeley \fIndbm\fP for Every UN*X\** Made Simple
47.AU
48Ozan (oz) Yigit
49.AI
50The Guild of PD Software Toolmakers
51Toronto - Canada
52.sp
53oz@nexus.yorku.ca
54.LP
55.FS
56UN*X is not a trademark of any (dis)organization.
57.FE
58.sp 2
59\fIImplementation is the sincerest form of flattery. \(em L. Peter Deutsch\fP
60.SH
61A The Clone of the \fIndbm\fP library
62.PP
63The sources accompanying this notice \(em \fIsdbm\fP \(em constitute
64the first public release (Dec. 1990) of a complete clone of
65the Berkeley UN*X \fIndbm\fP library. The \fIsdbm\fP library is meant to
66clone the proven functionality of \fIndbm\fP as closely as possible,
67including a few improvements. It is practical, easy to understand, and
68compatible.
69The \fIsdbm\fP library is not derived from any licensed, proprietary or
70copyrighted software.
71.PP
72The \fIsdbm\fP implementation is based on a 1978 algorithm
882ce583 73[Lar78] by P.-A. (Paul) Larson known as "Dynamic Hashing".
463ee0b2
LW
74In the course of searching for a substitute for \fIndbm\fP, I
75prototyped three different external-hashing algorithms [Lar78, Fag79, Lit80]
76and ultimately chose Larson's algorithm as a basis of the \fIsdbm\fP
77implementation. The Bell Labs
78\fIdbm\fP (and therefore \fIndbm\fP) is based on an algorithm invented by
79Ken Thompson, [Tho90, Tor87] and predates Larson's work.
80.PP
81The \fIsdbm\fR programming interface is totally compatible
82with \fIndbm\fP and includes a slight improvement in database initialization.
83It is also expected to be binary-compatible under most UN*X versions that
84support the \fIndbm\fP library.
85.PP
86The \fIsdbm\fP implementation shares the shortcomings of the \fIndbm\fP
87library, as a side effect of various simplifications to the original Larson
88algorithm. It does produce \fIholes\fP in the page file as it writes
89pages past the end of file. (Larson's paper include a clever solution to
90this problem that is a result of using the hash value directly as a block
91address.) On the other hand, extensive tests seem to indicate that \fIsdbm\fP
92creates fewer holes in general, and the resulting pagefiles are
93smaller. The \fIsdbm\fP implementation is also faster than \fIndbm\fP
94in database creation.
95Unlike the \fIndbm\fP, the \fIsdbm\fP
96.CW store
882ce583 97operation will not "wander away" trying to split its
463ee0b2
LW
98data pages to insert a datum that \fIcannot\fP (due to elaborate worst-case
99situations) be inserted. (It will fail after a pre-defined number of attempts.)
100.SH
101Important Compatibility Warning
102.PP
103The \fIsdbm\fP and \fIndbm\fP
104libraries \fIcannot\fP share databases: one cannot read the (dir/pag)
105database created by the other. This is due to the differences
106between the \fIndbm\fP and \fIsdbm\fP algorithms\**,
107.FS
108Torek's discussion [Tor87]
109indicates that \fIdbm/ndbm\fP implementations use the hash
110value to traverse the radix trie differently than \fIsdbm\fP
111and as a result, the page indexes are generated in \fIdifferent\fP order.
112For more information, send e-mail to the author.
113.FE
114and the hash functions
115used.
116It is easy to convert between the \fIdbm/ndbm\fP databases and \fIsdbm\fP
117by ignoring the index completely: see
118.CW dbd ,
119.CW dbu
120etc.
121.R
122.LP
123.SH
124Notice of Intellectual Property
125.LP
126\fIThe entire\fP sdbm \fIlibrary package, as authored by me,\fP Ozan S. Yigit,
127\fIis hereby placed in the public domain.\fP As such, the author is not
128responsible for the consequences of use of this software, no matter how
129awful, even if they arise from defects in it. There is no expressed or
130implied warranty for the \fIsdbm\fP library.
131.PP
132Since the \fIsdbm\fP
133library package is in the public domain, this \fIoriginal\fP
134release or any additional public-domain releases of the modified original
135cannot possibly (by definition) be withheld from you. Also by definition,
136You (singular) have all the rights to this code (including the right to
137sell without permission, the right to hoard\**
138.FS
139You cannot really hoard something that is available to the public at
140large, but try if it makes you feel any better.
141.FE
142and the right to do other icky things as
143you see fit) but those rights are also granted to everyone else.
144.PP
145Please note that all previous distributions of this software contained
146a copyright (which is now dropped) to protect its
147origins and its current public domain status against any possible claims
148and/or challenges.
149.SH
150Acknowledgments
151.PP
152Many people have been very helpful and supportive. A partial list would
153necessarily include Rayan Zacherissen (who contributed the man page,
154and also hacked a MMAP version of \fIsdbm\fP),
155Arnold Robbins, Chris Lewis,
156Bill Davidsen, Henry Spencer, Geoff Collyer, Rich Salz (who got me started
157in the first place), Johannes Ruschein
158(who did the minix port) and David Tilbrook. I thank you all.
159.SH
160Distribution Manifest and Notes
161.LP
162This distribution of \fIsdbm\fP includes (at least) the following:
163.P1
164 CHANGES change log
165 README this file.
166 biblio a small bibliography on external hashing
167 dba.c a crude (n/s)dbm page file analyzer
168 dbd.c a crude (n/s)dbm page file dumper (for conversion)
169 dbe.1 man page for dbe.c
170 dbe.c Janick's database editor
171 dbm.c a dbm library emulation wrapper for ndbm/sdbm
172 dbm.h header file for the above
173 dbu.c a crude db management utility
174 hash.c hashing function
175 makefile guess.
176 pair.c page-level routines (posted earlier)
177 pair.h header file for the above
178 readme.ms troff source for the README file
179 sdbm.3 man page
180 sdbm.c the real thing
181 sdbm.h header file for the above
182 tune.h place for tuning & portability thingies
183 util.c miscellaneous
184.P2
185.PP
186.CW dbu
187is a simple database manipulation program\** that tries to look
188.FS
189The
190.CW dbd ,
191.CW dba ,
192.CW dbu
193utilities are quick hacks and are not fit for production use. They were
194developed late one night, just to test out \fIsdbm\fP, and convert some
195databases.
196.FE
197like Bell Labs'
198.CW cbt
199utility. It is currently incomplete in functionality.
200I use
201.CW dbu
202to test out the routines: it takes (from stdin) tab separated
203key/value pairs for commands like
204.CW build
205or
206.CW insert
207or takes keys for
208commands like
209.CW delete
210or
211.CW look .
212.P1
213 dbu <build|creat|look|insert|cat|delete> dbmfile
214.P2
215.PP
216.CW dba
217is a crude analyzer of \fIdbm/sdbm/ndbm\fP
218page files. It scans the entire
219page file, reporting page level statistics, and totals at the end.
220.PP
221.CW dbd
222is a crude dump program for \fIdbm/ndbm/sdbm\fP
223databases. It ignores the
224bitmap, and dumps the data pages in sequence. It can be used to create
225input for the
226.CW dbu
227utility.
228Note that
229.CW dbd
230will skip any NULLs in the key and data
231fields, thus is unsuitable to convert some peculiar databases that
232insist in including the terminating null.
233.PP
234I have also included a copy of the
235.CW dbe
236(\fIndbm\fP DataBase Editor) by Janick Bergeron [janick@bnr.ca] for
237your pleasure. You may find it more useful than the little
238.CW dbu
239utility.
240.PP
241.CW dbm.[ch]
242is a \fIdbm\fP library emulation on top of \fIndbm\fP
243(and hence suitable for \fIsdbm\fP). Written by Robert Elz.
244.PP
245The \fIsdbm\fP
246library has been around in beta test for quite a long time, and from whatever
247little feedback I received (maybe no news is good news), I believe it has been
248functioning without any significant problems. I would, of course, appreciate
249all fixes and/or improvements. Portability enhancements would especially be
250useful.
251.SH
252Implementation Issues
253.PP
254Hash functions:
255The algorithm behind \fIsdbm\fP implementation needs a good bit-scrambling
256hash function to be effective. I ran into a set of constants for a simple
257hash function that seem to help \fIsdbm\fP perform better than \fIndbm\fP
258for various inputs:
259.P1
260 /*
261 * polynomial conversion ignoring overflows
262 * 65599 nice. 65587 even better.
263 */
264 long
265 dbm_hash(char *str, int len) {
5aaab254 266 unsigned long n = 0;
463ee0b2
LW
267
268 while (len--)
269 n = n * 65599 + *str++;
270 return n;
271 }
272.P2
273.PP
274There may be better hash functions for the purposes of dynamic hashing.
275Try your favorite, and check the pagefile. If it contains too many pages
276with too many holes, (in relation to this one for example) or if
277\fIsdbm\fP
278simply stops working (fails after
279.CW SPLTMAX
280attempts to split) when you feed your
281NEWS
282.CW history
283file to it, you probably do not have a good hashing function.
284If you do better (for different types of input), I would like to know
285about the function you use.
286.PP
287Block sizes: It seems (from various tests on a few machines) that a page
288file block size
289.CW PBLKSIZ
290of 1024 is by far the best for performance, but
291this also happens to limit the size of a key/value pair. Depending on your
292needs, you may wish to increase the page size, and also adjust
293.CW PAIRMAX
294(the maximum size of a key/value pair allowed: should always be at least
295three words smaller than
296.CW PBLKSIZ .)
297accordingly. The system-wide version of the library
298should probably be
299configured with 1024 (distribution default), as this appears to be sufficient
300for most common uses of \fIsdbm\fP.
301.SH
302Portability
303.PP
304This package has been tested in many different UN*Xes even including minix,
305and appears to be reasonably portable. This does not mean it will port
306easily to non-UN*X systems.
307.SH
308Notes and Miscellaneous
309.PP
310The \fIsdbm\fP is not a very complicated package, at least not after you
311familiarize yourself with the literature on external hashing. There are
312other interesting algorithms in existence that ensure (approximately)
313single-read access to a data value associated with any key. These are
314directory-less schemes such as \fIlinear hashing\fP [Lit80] (+ Larson
315variations), \fIspiral storage\fP [Mar79] or directory schemes such as
316\fIextensible hashing\fP [Fag79] by Fagin et al. I do hope these sources
317provide a reasonable playground for experimentation with other algorithms.
318See the June 1988 issue of ACM Computing Surveys [Enb88] for an
319excellent overview of the field.
320.PG
321.SH
322References
323.LP
324.IP [Lar78] 4m
325P.-A. Larson,
882ce583 326"Dynamic Hashing", \fIBIT\fP, vol. 18, pp. 184-201, 1978.
463ee0b2
LW
327.IP [Tho90] 4m
328Ken Thompson, \fIprivate communication\fP, Nov. 1990
329.IP [Lit80] 4m
330W. Litwin,
6b0bcbb1 331"Linear Hashing: A new tool for file and table addressing",
463ee0b2
LW
332\fIProceedings of the 6th Conference on Very Large Dabatases (Montreal)\fP,
333pp. 212-223, Very Large Database Foundation, Saratoga, Calif., 1980.
334.IP [Fag79] 4m
335R. Fagin, J. Nievergelt, N. Pippinger, and H. R. Strong,
882ce583 336"Extendible Hashing - A Fast Access Method for Dynamic Files",
463ee0b2
LW
337\fIACM Trans. Database Syst.\fP, vol. 4, no.3, pp. 315-344, Sept. 1979.
338.IP [Wal84] 4m
339Rich Wales,
6b0bcbb1 340"Discussion of 'dbm' data base system", \fIUSENET newsgroup unix.wizards\fP,
463ee0b2
LW
341Jan. 1984.
342.IP [Tor87] 4m
343Chris Torek,
882ce583 344"Re: dbm.a and ndbm.a archives", \fIUSENET newsgroup comp.unix\fP,
463ee0b2
LW
3451987.
346.IP [Mar79] 4m
347G. N. Martin,
882ce583 348"Spiral Storage: Incrementally Augmentable Hash Addressed Storage",
463ee0b2
LW
349\fITechnical Report #27\fP, University of Varwick, Coventry, U.K., 1979.
350.IP [Enb88] 4m
351R. J. Enbody and H. C. Du,
882ce583 352"Dynamic Hashing Schemes",\fIACM Computing Surveys\fP,
463ee0b2 353vol. 20, no. 2, pp. 85-113, June 1988.