The Limited Additionmerge Sort

One way to build a sorted file is to limit the additions to the file to a small number of records. This filing technique illustrates the minimal disk 1-0 virtue. It is useful when only a small fraction of a file is out of order. The small number of records that are added to the file are kept in memory before they are added to the file. (In fact, the file need not be opened until the records to be added are accumulated and sorted.) After 10 or so records are accumulated in an array, they are sorted in memory and then merged into the file itself. It is easier to merge than to sort, because the position of the records being merged can be determined by one pass through the sorted array of additions and one read-pass through the file. The safest way to add records and to recreate the file is to create a sorted "scratch" file and then either to recopy the scratch file onto the original file space or to keep the scratch file as the new version of the sorted file. The programming sequence is summarized as follows..

1. Read the data to be added into an array A$.

2. Sort the array AS.

3. Open the sorted file.

4. Read the sorted file and the sorted array A$ one record at a time and file the lesser in a SCRATCH file.

5. Copy the SCRATCH file over the original file or rename the SCRATCH file and keep it as the newest version of the data.

This technique results in slow data entry, but it keeps the user from having to face the problem of sorting an entire file. To build the file, one might limit the number of additions to say 10 records or less. Even if the file was 250 bytes long, all 10 records could be kept in memory and sorted. When the first 10 records had been used to create the original file, the next 10 would be merged with the first 10. Each time, the new data is read into memory, sorted, and merged with the old data to form a longer file.

If desired, generations of files could be kept. For example, suppose that 10 records were used to create the first file, and it was named FILE1 (10 records). Then, the second, merged file might be named FILE2. FILE2 would contain 20 sorted records. When the next 10 records were ready to be added, FILE2 could be renamed as FILE1 (wiping out FILE1) and a new FILE2 created. Note that FILE2 will always be one generation later than FILE1 and contain the latest addition of records. (Generation data sets can consume a great deal of disk space.)

The advantage of this "limited addition" system for maintaining large, sorted files is that it is easy to implement. The program segments to do the sorting can even be written in BASIC with little sacrifice in sort time. The disadvantages of this system are that one has to enter a small number of records and wait, and there is ever a need to sort the entire file, this technique won't help much.

0 0

Post a comment