Hash codes simplified

For all models

T. R. Dettmann, associate editor

A hash code is a general form of table lookup procedure known as key transformation. The key variable contains the information used to recover the data record.

Why hash coding? There are several good reasons:

1. It can cut searching time to as little as a single look if we're lucky.

2. It can provide a simple access technique for complicated record keys.

3. It can reduce the size of a file required to store information.

In the September issue, a "To Do List" program was used as an example of a keyed file. The key used was the date due for a project. By putting this key into an array, the file could be searched quickly without having to look at each record.

To make the index reliable and efficient, it must be kept on disk. To do this without having to sort the index, it is necessary to change the key to another form. This process is called "key transformation". Come up with a routine which transforms a key to a number and use that number for the record that stores the data.;

If a key, such as "02/10/81", is entered and the output is the number five, then the data for the date "02/10/81" goes into record number five. This creates an immediate problem: What can be done with more than one record which has the same key? In that case, the basic hash technique breaks down and something else has to be done.

Another problem occurs when the routine generates the same record number for more than one key. This situation is quite possible and something which computer scientists have worked on for years. The problem may be more or less frequent, depending on how the routine works.

When two records have the same hash code, they collide.. What you do in this case depends on the file. If less than half of the index is used and the items are pretty well spread out, a simple solution is to start with the hashed record number and keep adding one until an empty record is located. This procedure could result in a serious problem.

Suppose the routine always produces a one, in which case, the first entry will go into position one, the second in position two and so on. When a particular record is wanted, you must start with record one and search sequentially through the file until it is found. This is known as a linear search. The "To Do list" program uses this method; each disk entry is examined instead of the index array, resulting in a real slowdown. However, if the same hash code for different keys is never generated, then a collision never takes place and the first record will be the one wanted.

With a reasonable amount of care in the design of a hash code, performance will be somewhere between the two extremes. To keep collisions to a minimum, the index table has to be kept larger than the number of items in it: too little space and there will be a large number of collisions; too much, and disk space is wasted. Therefore, the table should be approximately twice as large as you think will be needed.

To detect a collision, it is necessary that the full key be kept somewhere. It could be kept as part of the allocation table which may save some time in finding the record later. Another method is to keep the key in the disk record where it could be checked for correctness. You should choose the method which seems most effective for a particular application.

The hash code generator

The success of a hash coding system depends on the hash code generator. If the keys are "hashed" in such a way that they distribute themselves evenly through the table with a low probability of collision, you'll have a very efficient system. Ideally, the hash code should be unique for a given key.

Consider an inventory system, for example. While part numbers may run from one to 100,000

CAPTAIN 80's

MICRO-FANTASY (tm) MAGAZINE is a bi-monthly TAPEZINE which publishes magnetically the works of both famous and unknown authors. Such'published authors as Teri Li, Charles Forsythe, Don Boner, Jake Commander, Jamie Teitjen and James V. Nangano are just a few of those scheduled for the early issues.

MICRO FANTASY (tm) features the classic COMPU-NOVEL style of Adventure program as well as the recently-emerged TRAPMAZE(tm) genre which has taken off like a shot. Old friends are back, Startreks and Starwars and invaders galore plus DUNGEONS AND DRAGONS (tm) deep delving, treasure scarfing, monster stomping action for everyone.

VOLUME I, ISSUE I

Features as selection one ARCTIC ADVENTURE by fifteen-year-old Harry McCracken, his debut magnetically. Selection two is a classic of early pro-adventure, Teri Li's SPIDER MOUNTAIN. Selection three, a TRAPMAZE, Charles Forsythe's GAUNTLET OF DEATH. Finally, as a Micro-Fantasy Bonus, J ake Commander's acclaimed STAR TREK 4.0.

At commercial prices these programs alone would easily top $60.00 but they are all included as the first installment of MICRO-FANTASY(tm) MAGAZINE.

A SIX ISSUE SUBSCRIPTION TO MICRO-FAN T AS Y (tm) MAGAZINE IS ONLY $45.00 WITH BACK ISSUES AT $15.00 EACH. ADD $12.00 FOR SIX ISSUES OF MICRO-FANTASY ON DISK.

SEND $45.00 CHECK, MONEY ORDER, MASTERCARD OR VISA TO: MICRO-FANTASY MAGAZINE BOX 66

PETERBOROUGH, NEW HAMPSHIRE 03458

YES, rush the first issue of MICRO-FANTASY right away!

Address __

City_

Visa or Mastercard # Expiration Date_

State_Zip.

Circle # 42

there are only 250 different part types on hand at any one time. This is a perfect job for a hash coded system.

First, set up a table with 500 records. Next, look for a way to generate a unique number between one and 500. One method is to take the part number, divide it by 500, obtain the remainder and add one. The result will be a number between one and 500 and is called the modulus. The hash code generator formula is:

HC=IV-INT(IV/500)*500+l where HC is the hash code and IV is the inventory number. Using the hash code formula, the following table of sample hash codes can be derived.

Hash Codes Inventory # Hash Code

0 0

Post a comment