Demystifying Hashes,Maps and Hashing

When you dive into learning a new programming language, there are some standard concepts that tend to be very helpful. One of those is a data structure representing a list of key value pairs, where each key maps to one and only one value. However the value itself could be another data structure containing many values. This blog post will break down maps in Ruby and Javascript and how to use it in each.

Declaring and Initializing Ruby Hashes


hash_identifier =  { key1 => value1, key2 => value2 }

Here is an example of declaring and initializing a Ruby has using strings as keys.

attendance = {"Sarah" => 4, "Chris" => 6, "Alex" => 1}

puts attendance
# {:sarah=>4, :chris=>6, :alex=>1}

This is kind of like a shorthand for:

attendance = {:sarah => 4, :chris => 6, :alex => 1}

and BTW people like to call => a "hash rocket".

In Ruby if the keys are symbols (which are faster than using strings as keys) then you can use this JSON-style format.

attendance = {sarah: 4, chris: 6, alex: 1}

Accessing Ruby Hashes


Let's say that Alex attends another meeting and you wish to increment alex's value in the hash.

attendance[:alex] += 1

puts attendance
# {:sarah=>4, :chris=>6, :alex=>2}

Looping Over Ruby Hashes


Let's say that everyone attends another meeting and you wish to increment their values. We can use the Hash classes each method, which takes in two params a key identifier and a value identifier.

attendance.each { |key,value| attendance[key] += 1 }

For this purpose we only need the key, so instead we can use the each_key method for Hash and do the following.

attendance.each_key { |key| attendance[key] += 1 }

puts attendance
# {:sarah=>5, :chris=>7, :alex=>3}

Hashing Vs Hashes


A Ruby Hash is not the same thing as hashing or using a hash function. Most programming languages call an ordered key value pair data structure a "map" because it maps a key to a value.

On the other hand, hashing is the concept of using a function to produce an integer from an object. Here is how Java oracle) handles hashing:

Whenever it is invoked on the same object more than once during an execution of a Java application, the hashCode method must consistently return the same integer.

If two objects are equal according to the equals(Object) method, then calling the hashCode method on each of the two objects must produce the same integer result.

It is not required that if two objects are unequal according to the equals(java.lang.Object) method, then calling the hashCode method on each of the two objects must produce distinct integer results. However, the programmer should be aware that producing distinct integer results for unequal objects may improve the performance of hash tables.

This is useful because you can create what is called a "hash table" to store data. If you write a good enough hashing function for the data set that you wish to store, then each object will get hashed to its own "bucket". Your hash table will start out with a bunch of empty spots in it (null), and after populating it there will still be many nulls, but since a hash table is essentially an array where each index is the hashcode (the integer returned by the hash function) and the element at each index is the data element then this means you can access data in array runtime i.e. constant run time as opposed to a logN runtime for binary search trees.

The downside to hash tables is that they are slower than binary search trees (BST) for looping over and they also store a bunch of nulls so they use more memory than BSTs. Also if the data is difficult to hash well then there could be a lot of collisions (many objects hashing to the same bucket). Linked lists are one way to store the values in a hash table.

So why does any of this matter? I don't want you to get mixed up on the names of things with what hashing is and what a hash is in ruby. So why the name hash then if it is really a map structure?

Yukihiro Matsumoto then creator of Ruby said that the name has was inherited fro Perl.

Screen Shot 2021-05-09 at 11.57.30 AM.png

Ruby also has a hash method in the Object class. This is necessary to override if you override the eql? method while defining a class.

It is important that the keys of a map form a set. In Java you can actually call the keySet() method on a map and it returns a set of all keys. In ruby you can just call the keys method on a hash and it returns an array of the keys. The important fact of a set is that it contains no duplicates and how are duplicates identified? You would think if the eql? method returns the same thing then the objects are equal so Ruby would not add the objects. But look at this...

class User
  attr_accessor :user_name

  def eql?(other)
    self.class == other.class && 
    self.user_name== other.user_name
  end

  alias :== eql?
end

u1 = User.new
u2 = User.new

u1.user_name = "rubyrocks"
u2.user_name = "rubyrocks"

user_map = {u1 => "premium", u2 => "premium"}

puts u1.hash
puts u2.hash
puts u1 == u2
puts user_map

# 3821538715194388951
# 2153368322350425775
# true
# {#<User:0x00007f839011c4c8 @user_name="rubyrocks">=>"premium", #<User:0x00007f839011c450 @user_name="rubyrocks">=>"premium"}

This is bad that we have 2 of the "same" objects in our map.

From docs.ruby-lang.org,

Two objects refer to the same hash key when their hash value is identical and the two objects are eql? to each other.

A user-defined class may be used as a hash key if the hash and eql? methods are overridden to provide meaningful behavior. By default, separate instances refer to separate hash keys.

A typical implementation of hash is based on the object's data while eql? is usually aliased to the overridden == method:

So it says we need to override the hash method to use a class we defined as a key.

class User
  attr_accessor :user_name

  def eql?(other)
    self.class == other.class && 
    self.user_name== other.user_name
  end

  def hash
    user_name.hash
  end

  alias :== eql?
end

u1 = User.new
u2 = User.new

u1.user_name = "rubyrocks"
u2.user_name = "rubyrocks"

user_map = {u1 => "premium", u2 => "premium"}

puts u1.hash
puts u2.hash
puts u1 == u2
puts user_map

# -4215281874242322283
# -4215281874242322283
# true
# {#<User:0x00007fd6738ac338 @user_name="rubyrocks">=>"premium"}

Now you can see that we only have one user object in our hash and that the hash functions returned the same value.

JavaScript Maps


In JavaScript, the closest thing to a Ruby hash is called a "map". Objects in js are kind of similar but since they are not easy to iterate over, and you cannot call all kinds of useful things on objects like "size" I don't think they are a contender. Java also has maps but you can choose to use HashMaps or TreeMaps right out of the box.

Now let's see how we did all of those hashy Ruby things above but in JavaScript world

Construct and initialize a Map in JavaScript


According to Mozilla using the assignment operator with a map does not interact with the Map data structure. Thus we need to use Map methods.

map_identifier =  new Map();
attendance =  new Map();
attendance.set("Sarah", 4);
attendance.set("Chris", 6);
attendance.set("Alex", 1);

console.log(attendance);
// {"Sarah" => 4, "Chris" => 6, "Alex" => 1}

For a short cut you could do:

let attendance = new Map([ ['Sarah', 4],['Chris', 6],['Alex', 1]]);

Looping over a map in JavaScript


Let's up the attendance of each by 2 this time!

attendance.forEach((value, key, map) => {map.set(key , value + 2 )});
console.log(attendance);
// {"Sarah" => 6, "Chris" => 8, "Alex" => 3}

Check out Mozilla's official site to read more on Map's forEach method.

Leave a comment if you would like to demo this functionality in Python, or any other language you like!

Happy coding!

No Comments Yet