KB#00009-Comparison between MKEYED and DIRECT files
Comparison between MKEYED and DIRECT files
The questions seem to center around the pros and cons of DIRECT files with companion cross reference SORT files versus multi-keyed MKEYED files. We'd like to offer the following information, so that you may make the best decision for your application.
Functionally, the MKEYED file and a DIRECT file with multiple companion cross-reference SORT files can achieve the same purpose. The multiple keys associated with the single file in both cases make it possible to locate the same record using several different starting points. For example, in an airline reservation system using a multi-keyed file, the same passenger record could be reached by passenger name, flight number, date of flight, and so forth.
What Are They?
The BBx DIRECT file is something of a hybrid file type since it uses both a hash table and a single index file structure for one key. The single index file structure allows sequential access to the file. The hash table provides random access to the index structure which in turn leads to the desired record. The data records themselves are physically stored in an indeterminate order dependent on the add/delete sequence of the records in the file.
The BBx DIRECT file is functionally a combination of the BBx SORT and BBx INDEXED file types where the SORT file contains the ordered set of keys (and thus can be considered as being an index file) and the INDEXED file contains the data records. A DIRECT file can in fact be emulated using a SORT file and an INDEXED file, although the application must perform for itself the data management provided automatically by use of the DIRECT file.
BBx DIRECT/SORT file records can be accessed sequentially by record number, although use of this capability is not recommended in a production application since the position of a record number and its key in the file is indeterminate. DIRECT/SORT files have fixed file sizes which must be specified when the file is defined and disk space is pre-allocated.
The BBx MKEYED file is a dynamic multi-level indexed file which employs keys (identifiable fields from within a data record) stored in B-trees, one for each key, as the index data structures. The hierarchical B-tree data structures provide very rapid and efficient searches, insertions, and deletions of records in a file under most conditions. The data records themselves are stored in an indeterminate order in the file structure, i.e. without any particular organization. All data organization is supplied by the file indices.
BBx MKEYED file records are only accessible via their keys, files may grow dynamically, and up to 16 key segments per record may be specified.
DIRECT files are well suited to applications where the data attribute requirements are fairly stable, where the file volatility is moderate, and where the file data generally need be accessed only along the single dimension provided by the single key.
MKEYED files are appropriate when a data set is dynamic (has relatively frequent adding and deleting of records) and volatile (has relatively frequent updates to existing records) and when a files application requires the relatively high flexibility afforded by the ability to maintain multiple views of the same data set.
MKEYED files are relatively inefficient for sequential or exhaustive scans and may, although not necessarily, require substantial disk space, relative to their file size, for large data sets.
A good rule of thumb is: The more volatile the file or the greater the need for flexibility in its application, the more appropriate an MKEYED file is for the application. The more static the file, or if regular sequential or exhaustive access is needed, the more likely that a DIRECT/SORT file is the better choice.
Application Development And Maintenance Considerations
Defining a file to be a dynamic MKEYED file can save substantial amounts of development and maintenance time if the size of the file is likely to change. The fact that a DIRECT file's size is fixed at the time of its creation and definition can create unnecessary maintenance problems during the life of the application if careful thought is not given to the possible long term growth of the file.
Multiple views of the data in a DIRECT file can be accomplished by using multiple SORT files for different fields in the DIRECT file. The downside of this approach, however, is that the application software must absorb the overhead of maintaining the integrity of the data set comprised of the various SORT files and the DIRECT file. MKEYED files accomplish this overhead automatically, affording better application responsiveness and reducing the number of lines of code which must be written and maintained. In addition, the DIRECT/SORT file approach requires a file handle, file OPEN/CLOSE, etc. for each file, thus consuming more system resources than the single MKEYED file.
Unfortunately, file damage and corruption can and does happen. Moreover, according to most authoritative sources on probability, it can be expected to happen in accordance with one or more of the various corollaries to Murphy's Theorem, i.e., at the worst possible time, such as in the middle of the payroll run.
Damaged MKEYED files, because of their internal complexity, are harder to reconstruct from the smoking remains than damaged DIRECT/SORT files. A utility, "_fix", is available that can (usually, and if the structure of the damaged file is known, and if care is taken) repair a damaged MKEYED file with minimum record loss. It is available in the BASIS knowledge base for faxing.
Converting From DIRECT To MKEYED
Most DIRECT and SORT files can be converted to a single key MKEYED file without any modification. However, due to the nature of MKEYED files, IND= is illegal. These program references will need to be changed.
Converting from a DIRECT file with companion SORT cross references to an MKEYED file will take some program modification, but the gain in probability of maintaining data integrity, the reduced amount of code to develop and maintain, and the reduced disk space can make it worth the effort.
A single WRITE to an MKEYED file will create a record and all its cross reference keys (based on the definition of the MKEYED file). A single REMOVE will delete all of the related keys. A change to a field will correct all affected keys with a single WRITE to update the record. With MKEYED files, programs no longer need a 'KEY=' on the WRITE, as MKEYED files handle this by definition. The SORT file references must be changed to use the master file channel, and selection of the needed key chain is done with the 'KNUM=' clause. All those WRITES to the cross reference files can be eliminated (by
the way, that by itself will save a few cpu clock ticks).
When changing a DIRECT file with companion cross reference SORT files to an MKEYED file, consider adding alternate keys that can eliminate the sorts required to produce commonly used reports.
Hopefully, this has cleared up some of the confusion and will aid you in using one of the most powerful capabilities of BBx, the MKEYED file.
Q: What is the calculation to determine how much disk space will be used by a multi-keyed MKEYED file which has a fixed record size and a fixed number of records? As records are written to the file, the size increases, even though it was created with a fixed number of records (not 0, i.e., dynamic).
A: An MKEYED file with a fixed number of records will still allocate the key area dynamically. This is because it is impossible to predict how much disk space is necessary to accommodate a specific number of keys with any degree of accuracy.
Generally, the data area is approximately NUMBER_OF_RECORDS * RECORD_SIZE bytes. The key area is more complicated. Keys are allocated in 512-byte blocks; with BBx using 1024-byte blocks if any key in the file is >64 bytes long. Each key has an 8-byte overhead. A key block has an additional 5-bytes overhead. The largest number of keys that a block will contain must be an even number. At any given point in time, a key block will
be from 50% full to 100% full.
With this in mind, you may calculate the number of keys in a key block by using the number of bytes per key divided into the block size minus 5 bytes. Taking the integer result, round down to the next smaller even value if odd. This value is the maximum number of keys in each keyblock in the file.
To determine the minimum number of key blocks necessary to hold N keys you divide the number of records in the file by the maximum number of keys per block, derived above. This will give you the *minimum* number of blocks necessary to hold N keys. However, since a keyblock is only guaranteed to be at least 50% full, that could potentially double the number of blocks necessary to hold N keys (in the worst case).
Last Modified: 12/17/1997 Product: PRO/5 Operating System: All platforms
BASIS structures five components of their technology into the BBx Generations.