Securely storing your users' passwords

What we'll focus on

Password hashing functions
Salting
Password cracking

Pre-requisites

Programming concepts like functions, loops, etc.
Basic knowledge of Go
Knowledge about how the web works (HTTP, forms, HTTP verbs, passwords)

As we build user-facing software systems, we're always faced with the same problem of user identity management: how do we verify that the users intending to access certain data are the rightful owners?

Over the years, many web-based software systems, e.g. LinkedIn, have suffered breaches of user password data. How then have we not learnt how to properly store user passwords?

As gatherers of data, we owe our users proper protection of the data entrusted to us. Good intentions are not enough. We have to put proper mechanisms in place to make data security attainable.

Through this article, we'll explore the different password storage mechanisms, the weaknesses of some popular hashing algorithms and ways to overcome the said weaknesses.

Understanding hashing

Hashing is the process of transforming an input, such as plain text, into a fixed-size value called a hash (or digest). Instead of storing plain-text strings, e.g. passwords, in our database, we store their hashes.

Some popular hashing functions include MD5 (Message-Digest Algorithm 5) and the SHA-2 (Secure Hash Algorithm 2) family, to mention but a few.

For example:

Suppose my password is thisIsMyPassword and I'm using the MD5 hash function to hash it.

This would be the resulting string: 80d2f0dd3f1caa2e62bab686d6d1d140

Properties of hash functions

They are one way functions

Once an input has been hashed, there is no function you can run to reverse the process. Given only the hash, there is no practical way to compute the input that produced it.

Collisions are time consuming to find

It takes an impractical amount of time to find two inputs that hash to the same output (referred to as a collision). If you use a secure hash function, it would essentially take longer than the earth's age to find one. MD5 and SHA-1 no longer meet this bar: practical collisions have been demonstrated for both.

To put the above in context, it's estimated that it will take 36 trillion years to find a collision for the SHA-256 hash function. The universe is only 13.8 billion years old.

Fixed-size output is produced

Long and short streams of text will result in hashes of the same length. Why then should we insist on using long passwords?

It's because the longer the password, the more difficult it is to crack. Each additional character multiplies the number of candidates an attacker has to try before hitting the right one.
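The fixed-size property is easy to check for yourself. Here is a minimal sketch, assuming SHA-256 and a helper named hexDigest that exists only for this illustration:

```go
package main

import (
	"crypto/sha256"
	"fmt"
	"strings"
)

// hexDigest is a helper for this example only. It returns the SHA-256
// digest of s encoded as a hex string.
func hexDigest(s string) string {
	return fmt.Sprintf("%x", sha256.Sum256([]byte(s)))
}

func main() {
	short := "abc"
	long := strings.Repeat("correct horse battery staple ", 100)

	// Print the input length next to the digest length. The digest is
	// 32 bytes (64 hex characters) no matter how large the input is.
	fmt.Println(len(short), "->", len(hexDigest(short)))
	fmt.Println(len(long), "->", len(hexDigest(long)))
}
```

The digest length says nothing about how strong the password was; it only depends on the hash function.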

Anatomy of a hash function

I won't go into the inner workings of the most popular hash functions here. However, here's a list of resources that delve into the details:

MD5 - https://en.wikipedia.org/wiki/MD5
SHA-2 - https://en.wikipedia.org/wiki/SHA-2

Examples of how to hash a string using SHA-256, SHA-1 and MD5

Using SHA-256

package main

import (
	"crypto/sha256"
	"fmt"
)

func main() {
	fmt.Printf("%x\n", sha256.Sum256([]byte("password"))) // hex-encode the 32-byte digest
}

// Output: 5e884898da28047151d0e56f8dc6292773603d0d6aabbdd62a11ef721d1542d8

Using SHA-1

package main

import (
	"crypto/sha1"
	"fmt"
)

func main() {
	fmt.Printf("%x\n", sha1.Sum([]byte("password"))) // hex-encode the 20-byte digest
}

// Output: 5baa61e4c9b93f3f0682250b6cf8331b7ee68fd8

Using MD5

package main

import (
	"crypto/md5"
	"fmt"
)

func main() {
	fmt.Printf("%x\n", md5.Sum([]byte("password"))) // hex-encode the 16-byte digest
}

// Output: 5f4dcc3b5aa765d61d8327deb882cf99

Bad approach to password handling

Hash functions are one-way "streets", but they are also deterministic: the same input always produces the same output.

For example:

Suppose your system has over 100,000 users and about 1% of them are using password as their password.

Since you're using the same hashing function for everyone, e.g. MD5, at least 1,000 user accounts will have the same password hash in your database.

In case of a database breach, the attackers may use a rainbow table approach to compare the password hashes in your database with precomputed hashes thereby revealing the original passwords the users used.
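To make the precomputation idea concrete, here is a minimal sketch. It builds a plain lookup table rather than a true rainbow table (which uses hash chains to save space), but the effect on unsalted hashes is the same. The word list, the usernames and the helpers md5Hex and buildLookup are all made up for this illustration:

```go
package main

import (
	"crypto/md5"
	"fmt"
)

// md5Hex returns the MD5 digest of s as a hex string.
func md5Hex(s string) string {
	return fmt.Sprintf("%x", md5.Sum([]byte(s)))
}

// buildLookup precomputes a hash -> password entry for every candidate word.
func buildLookup(words []string) map[string]string {
	table := make(map[string]string, len(words))
	for _, w := range words {
		table[md5Hex(w)] = w
	}
	return table
}

func main() {
	// Hypothetical word list; real lists contain millions of entries.
	table := buildLookup([]string{"123456", "password", "qwerty"})

	// Hypothetical leaked rows: username -> unsalted MD5 hash.
	leaked := map[string]string{
		"alice": "5f4dcc3b5aa765d61d8327deb882cf99",
		"bob":   "5f4dcc3b5aa765d61d8327deb882cf99",
	}

	// One map lookup per row is enough to recover a known password.
	for user, hash := range leaked {
		if plain, ok := table[hash]; ok {
			fmt.Printf("%s: %s\n", user, plain)
		}
	}
}
```

The table is built once and can be reused against every leaked database that uses the same unsalted hash function, which is what makes this approach so cheap for the attacker.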

Password hash cracking demo

I'll use a tool called hashcat to recover the plain text behind a password hash. There are many other tools that can perform the same job.

See instructions on how to set up hashcat here

I'm using a Kali Linux VM and it comes with a password list stored in the /usr/share/wordlists directory.

$ hashcat -m 0 hashes.txt /usr/share/wordlists/rockyou.txt --show

-m selects the hash type (hash mode); the attack mode is a separate option, -a
0 tells hashcat that we're using raw MD5 hashes
hashes.txt is the file containing the password hash. It contains 5f4dcc3b5aa765d61d8327deb882cf99 as the only value
/usr/share/wordlists/rockyou.txt is the path to our word list
--show prints hashes that hashcat has already cracked, in the format password hash:plain text equivalent. Run the command once without it to perform the attack

Here is the output:

5f4dcc3b5aa765d61d8327deb882cf99:password

We can see that hashing user passwords with a fast, unsalted one-way hash function is not enough.
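Conceptually, a word list attack is a simple loop: hash each candidate and compare it with the leaked hash. The sketch below is not how hashcat is implemented; it only illustrates the idea, and the crack function and the tiny word list are made up for this example:

```go
package main

import (
	"crypto/md5"
	"fmt"
)

// crack hashes every candidate with MD5 and returns the first one whose
// hex digest equals target. The boolean reports whether a match was found.
func crack(target string, candidates []string) (string, bool) {
	for _, c := range candidates {
		if fmt.Sprintf("%x", md5.Sum([]byte(c))) == target {
			return c, true
		}
	}
	return "", false
}

func main() {
	// Hypothetical in-memory word list standing in for a word list file.
	words := []string{"123456", "qwerty", "password", "letmein"}

	if plain, ok := crack("5f4dcc3b5aa765d61d8327deb882cf99", words); ok {
		fmt.Println("recovered:", plain)
	} else {
		fmt.Println("not found")
	}
}
```

Because MD5 is fast, each guess costs the attacker almost nothing, so even very large word lists are cheap to run through.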

A better approach

Secure systems use password hashing algorithms that are deliberately slow and resource intensive to compute. This allows us to create password digests that are computationally expensive to produce on a large scale. The more expensive each calculation, the more difficult it is for an attacker to precompute hashes or to try a large number of guesses.

In Go, it's recommended that you use the bcrypt package (golang.org/x/crypto/bcrypt). It generates a random salt for every password and stores it inside the resulting hash, so two users with the same password end up with different hashes and precomputed tables stop working.
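To get a feel for what "computationally expensive" means, here is a toy illustration that uses only the standard library. It is not a real password hashing scheme and should not be used in production; the stretch function, the round counts and the 16-byte salt are assumptions made for this sketch:

```go
package main

import (
	"crypto/rand"
	"crypto/sha256"
	"fmt"
	"time"
)

// stretch hashes salt+password with SHA-256 and then re-hashes the result
// until `rounds` hash computations have been done in total. Every guess an
// attacker makes costs the same number of computations.
func stretch(password string, salt []byte, rounds int) [32]byte {
	digest := sha256.Sum256(append(append([]byte{}, salt...), password...))
	for i := 1; i < rounds; i++ {
		digest = sha256.Sum256(digest[:])
	}
	return digest
}

func main() {
	// A random salt makes the result differ even for identical passwords.
	salt := make([]byte, 16)
	if _, err := rand.Read(salt); err != nil {
		panic(err)
	}

	// Time the same password at increasing work factors.
	for _, rounds := range []int{1, 100000, 1000000} {
		start := time.Now()
		stretch("thisIsAPassword", salt, rounds)
		fmt.Printf("%8d rounds: %v\n", rounds, time.Since(start))
	}
}
```

A legitimate login pays this cost once per attempt, while an attacker pays it for every single guess. Purpose-built algorithms apply the same idea in a far more robust way.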

package main

import (
	"fmt"
	"log"

	"golang.org/x/crypto/bcrypt"
)

func main() {
	password1 := "thisIsAPassword"

	// Generate a salted hash from the password. DefaultCost sets the work factor.
	hash, err := bcrypt.GenerateFromPassword([]byte(password1), bcrypt.DefaultCost)
	if err != nil {
		log.Fatalln("error:", err)
	}

	fmt.Println("Hash to store:", string(hash))

	// Store this hash somewhere.
	// Later, a user wants to log in. Check the password they entered
	// against the hash you have in the database.
	password2 := "given-password"

	storedHash := hash
	if err := bcrypt.CompareHashAndPassword(storedHash, []byte(password2)); err != nil {
		// A non-nil error means the password does not match the stored hash.
		log.Println("error:", err)
		return
	}

	fmt.Println("Password is correct!")
}

Conclusion

I hope this article has made a contribution towards building secure software with user data security in mind.