What is Hashing in DBMS?
In a huge data structure, It is next to impossible to search all the index values and reach to desired data, to overcome this problem, hashing is used.
In DBMS, hashing is a technique to directly search the location of desired data on the disk without using an index structure.
In this technique, data is stored at the data blocks whose address is generated by using the hashing function. The memory location where these records are stored is known as data bucket or data blocks.
- A hash function can choose any of the column values to generate the address.
- Mostly, the hash function uses the primary key to generate the address of the data block.
- A hash function is a simple mathematical function to any complex mathematical function.
- The primary key can also be used as the address of the data block. That means each row whose address will be the same as a primary key stored in the data block.
Terminologies using in Hashing
Here, are important terminologies which are used in Hashing:
|Data bucket||Data buckets are memory locations where the records are stored. It is also known as Unit Of Storage.|
|Key||A DBMS key is an attribute or set of an attribute which helps you to identify a row in a relation. This allows in finding the relationship between the two tables.|
|Hash function||A hash function is a mapping function that maps all the sets of search keys to the address where actual records are placed.|
|Linear Probing||Linear probing is a fixed interval between probes. In this method, the next available data block is used to enter the new record, instead of overwriting on the older record.|
|Quadratic probing||It helps you to determine the new bucket address. It helps you to add Interval between probes by adding the consecutive output of quadratic polynomial to starting value given by the original computation.|
|Hash index||It is an address of the data block. A hash function could be a simple mathematical function to even a complex mathematical function.|
|Double Hashing||Double hashing is a computer programming method used in hash tables to resolve the issues of has a collision.|
|Bucket Overflow||The condition of bucket-overflow is called a collision. This is a fatal stage for any static has to function.|
Types of Hashing
- Static Hashing
- Dynamic Hashing
In the static hashing, the resultant data bucket address will always remain the same. That means if we generate an address for Machine_No =103 using the hash function mod (10) then it will always result in the same bucket address 3. Here, there will be no change in the bucket address.
Hence in this static hashing, the number of data buckets in memory remains constant throughout. In this example, we will have five data buckets in the memory used to store the data.
Operations of Static Hashing
- Inserting a record: When a new record requires to be inserted into the table, you can generate an address for the new record using its hash key. When the address is generated, the record is automatically stored in that location.
- Searching: When you need to retrieve the record, the same hash function should be helpful to retrieve the address of the bucket where data should be stored.
- Delete a record: Using the hash function, you can first fetch the record which is you want to delete. Then you can remove the records for that address in memory.
- Update a Record: To update a record, we will first search it using a hash function, and then the data record is updated.
Static hashing are of two types .They are as follows
- Open hashing
- Close hashing
In Open hashing method, Instead of overwriting older one the next available data block is used to enter the new record, This method is also known as linear probing.
For example, N3 is a new record that you want to insert. The hash function generates the address as 5. But it is already occupied by some other value. That’s why the system looks for the next data bucket 6 and assigns N3 to it.
When buckets are full, then a new data bucket is allocated for the same hash result and is linked after the previous one. This mechanism is known as Overflow chaining.
For example: Suppose N3 is a new address that needs to be inserted into the table, the hash function generates the address as 5 for it. But this bucket is full to store the new data. In this case, a new bucket is inserted at the end of 5 buckets and is linked to it.
Dynamic hashing offers a way in which data buckets are added and removed dynamically and when need. In this hashing, the hash function helps you to create a large number of values.
- The dynamic hashing mechanism is used to overcome the problems of static hashing like bucket overflow.
- In this mechanism, data buckets grow or shrink as the records increases or decrease. This method is also known as the Extendable hashing method.
- This method makes hashing dynamic, i.e., it allows insertion or deletion without resulting in poor performance.
What is Collision?
Hash collision is a state when the resultant hashes from two or more data in the data set, wrongly map the same place in the hash table.
How to deal with Hashing Collision?
There are two technique which you can use to avoid a hash collision:
- Rehashing: This method, invokes a secondary hash function, which is applied continuously until an empty slot is found, where a record should be placed.
- Chaining: This method builds a Linked list of items whose key hashes to the same value. This method requires an extra link field to each table position.
If you found this article helpful to you then Help Others, Please Share. 🙂