• Sai Geetha M N

Making the Right Database Choice

#ArchitecturalDecision

If someone were to ask, should I use SQL or NoSQL database, the obvious answer is "it depends". Depends on what? It depends on various aspects and that is what I would like to share my views on, today


There are so many theories and so many databases out there that this is always a big decision point especially in green-field projects. Enterprises that have already invested in one or the other do not leave much of a choice many times.


But if you are starting fresh and need to make a choice, I would propose a simplified 4 step process which would work for most cases, though complex and nuanced cases may require greater thought and deeper considerations.


Step 1: Should you use an RDBMS or a NoSQL Database?


First consideration should be whether an RDBMS itself would serve the purpose. If yes, do not think twice about going for a NoSQL DB. There are a whole host of features taken for granted by using an RDBMS. They come in very handy and make the ease of use immense.


What is the fundamental difference between the two? One is an ACID Database and the other is a BASE data base



You can see a detailed definition here


Scale, is one of the most important features brought in by NoSQL DBs. if and only if you need this to the extent that RDBMS cannot provide, think of a NoSQL DB as it comes with its own backpack of nuances that need to be taken care of.


Go for an RDBMS, if you need:

  1. Consistent data with ACID (Atomicity, Consistency, Isolation and Durability) support.

  2. Complex dynamic queries requiring stored procedures or views

  3. Option to migrate to another database without significant change to existing application

  4. Data Warehouse, Analytics or BI use cases

Also be aware that now a days there are RDBMS like PostgreSQL which provide horizontal scalability too


Think of NoSQL DB in these cases:

  • If you are dealing with semi-structured or unstructured data

  • You need flexible schema

  • You have very limited ways of accessing data/querying data

  • You have to deal with very high velocity transactions

  • You have very large volume of data requiring quick and cheap scalability

  • You do not have complex queries, stored procedures or views

  • You do not have data warehouse needs, analytics or BI use cases


Step 2: Which NoSQL?

If you have decided on an RDBMS, I do not have to say much. But if you have a use case for NoSQL DB, then the question is which one should I choose?


For this, let's understand the 4 broad types of noSQL DBs that exist.

  1. Key-Value Stores - Data is stored as key-value pairs

  2. Document Stores - Data is stored as JSON, XML or BSON documents

  3. Column-Oriented Stores - Data is organised by columns

  4. Graph Databases - Data storage focuses on relationships between data elements

Here is a ready-reckoner check list for when to use what - a good place to start with:-

Key Value Datastore Checklist:

  • Simple Schema

  • High velocity read/write with no frequent updates

  • High performance and scalability

  • No complex queries involving multiple keys or join

Document DB Checklist:

  • Flexible Schema with complex querying

  • JSON/BSON or XML data formats

  • Leverage complex indexes (multikey, geospatial, full text search etc.)

  • High performance and balanced read-write ratio

Column-Oriented Database:

  • High volume of data

  • Slower writes but fast reads (with compressed storage)

  • Data extractions by columns using row keys

  • No ad-hoc query patters, complex indices or high level aggregations

Graph DB:

  • Need to maintain connections between data points

  • The links and relationships between data is the most important aspect

  • Typical for social networks and knowledge graphs

So, let's have a look at some of the example databases of each type:


Now, suppose I have narrowed down to one type, there are so many of the same type. Which within that should I choose? Say, Column-oriented DB is what I need. Should I go with Cassandra or HBase? If I need a document DB, should I go with Couchbase or MongoDB.

This is answered in Step 3:


Step 3: Apply the CAP theorem


What does the cap theorem say? A distributed database can only provide 2 out of 3 guarantees between:

  • Consistency - All nodes in the cluster have consistent data and a read request returns the most recent write from any node.

  • Availability - A non-failing node must always respond to requests in a reasonable time

  • Partition Tolerance - System continues to operate during network or node failures.

As you can see, within Document DB, if you want CP, you will go for MongoDB and if you want AP, you will go for Couchbase.


In Column oriented, if you want CP, you will go for HBase and if you want AP, you will go for Cassandra.


This is a very simplified view. Each of them are further tunable but I am not getting there. Would leave it for further research to you once you come up to this point.


Step 4: Other Considerations

Even if you have reached this far, you are not done. There are a few other considerations that look trivial to begin with but will matter a lot in the long run. So do not over look the support provided by these databases on these aspects:

  • Backup And Recovery

  • Replication of data: Is it Synchronous, Asynchronous or Quorum-based?

  • What are the Read/Write and indexing strategies?

  • Do they provide a Concurrency control?

  • Support for Security, access controls and encryption at rest

  • Query methods supported

  • DR Configuration and Uptime capabilities

Conclusion

If you have been able to apply all of the above 4 steps and its checklists and answer yourself based on the use case you are looking for, you would have nearly made the right decision 80 to 85% of the times.

Every use case will have certain very custom needs that you may have to look at beyond this generalised way.

501 views3 comments

Recent Posts

See All

You can add comments here using your facebook account too!