主页 Swift UNIX C Assembly Go Plan 9 Web MCU Research Non-Tech

Differences between index storage and hash storage in detail

2022-09-03 | Research | #Words: 456

There are two kinds storage structures in data structures that are easily confused, namely index storage structure and hash storage structure (hash storage structure).

Index storage

You can find corresponding keyword according to address. It can be understood as a yellow pages. You can find someone’s phone number based on name. So index storage is also called direct addressing.

Array is a kind of index storage structure that can be directly accessed through subscripts (array’s subscripts is a variant address). You can say keywords and addresses are related.

Hash storage

The “hash” in the name of “hash storage” is calculated through an algorithm, such as MD5. In this storage format, the address is calculated into a hash value of the same length through a hash algorithm, and then the hash value is stored instead of directly storing the address. When accessing the keyword, the hash value will be decoded through operation and then accessed again, or directly use. You can say there is a certain mapping relationship between the node’s storage address and the keyword.

Difference

The difference between the two is an additional hash function operation process. But why do we need to do this extra step and “waste” the computer’s performance?

The answer is saving space.

Because the computer’s storage space is finite, if ignore the storage space limit, you can create a large dictionary, reserve positions for each possible keyword, and then directly address it through the index.

However, in some cases, the actual number of stored keywords is often much less than the possible number of keywords, resulting in a huge waste of space. If use hash storage instead of index storage at this case will save space.

For example, if the possible value is in array a to z, but most values are in array z. Ok, we need prepare all arrays, a to z, to save possible values, in fact, just z and some arrays be used. If we use the hash to store values, we just need one array! And saving much space.

Another benefit of hashing is excellent average-case performance, and it can also provide excellent worst-case performance when the set of keywords is static.

In terms of time complexity, perfect hashing and basic dictionary operations both require O(1) time, it means there is not much difference in time.

And sometimes the two need to be mixed, so the difference is not big.

However, some people need to take an exam, and relevant questions will appear in the exam. Just remember there is a certain mapping relationship between address and keywords using hash storage structure.

I hope these will help someone in need~