When you first come across the term NoSQL as a developer, you may have many questions: what is it, why would you use it, what types of NoSQL database exist, and so on. In this article I will give a brief overview of NoSQL databases that should answer most of those questions. So let's start with NoSQL databases.
What is NoSQL?
There is no formal definition of NoSQL, but NoSQL databases share a few characteristics, listed below, that describe them.
- They do not use the relational model: In the relational data model, data is organized into tables (relations) and rows (tuples).
- They generally run well on clusters.
- Schemaless: You can add a custom field to a database record dynamically, without changing any structure. In the relational world, to add a custom field to one row you first have to add a new column to the table (its structure), and that column is then present in every record even though you only wanted it on one. In a NoSQL database you can add the field to a single record, because the schema is not defined explicitly.
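The schemaless point can be illustrated with a minimal Python sketch. It is not tied to any particular NoSQL product: plain dictionaries stand in for stored records, and the `users` list and its field names are hypothetical.

```python
# Hypothetical records in a schemaless collection: plain dicts stand in
# for stored records, and no schema is declared anywhere.
users = [
    {"id": 1, "name": "Asha"},
    {"id": 2, "name": "Ravi"},
]

# Add a custom field to a single record only; the other record is untouched.
users[1]["twitter_handle"] = "@ravi"

# Code that reads the records must tolerate the field being absent.
handles = [user.get("twitter_handle") for user in users]
print(handles)  # [None, '@ravi']
```

The trade-off is visible in the last step: because no schema guarantees the field exists, the reading code has to handle its absence.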
Need for NoSQL
NoSQL databases came into the picture for several reasons.
- Impedance Mismatch Problem with the Relational Model: The relational model works well for tables whose rows hold simple values. Its limitation is that a value cannot itself be a structure such as a table, a nested record, or a list of values. In-memory data structures have no such limitation and can be richer than the relational model allows, so to store them in a relational database on disk you first have to translate them into relational form.
Consider an example where we want to store information about a software professional in a relational database. We would store their personal details in a USER_DETAILS table, their educational details in an EDUCATIONAL_DETAILS table, and their work experience in a WORK_EXPERIENCE table, and then add constraints such as foreign keys to tie the tables together. Splitting the data across multiple tables when storing it, and reassembling a single professional's data from those tables when retrieving it, was frustrating for developers. This gap between the in-memory structure and the relational representation is referred to as the Impedance Mismatch Problem.
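The translation step in this example can be sketched in Python. This is only an illustration under stated assumptions: the `professional` object, its field names, and the helper functions `to_rows` and `from_rows` are hypothetical, and lists of dictionaries stand in for the rows of the three tables.

```python
# Hypothetical in-memory representation of one software professional.
professional = {
    "user_id": 101,
    "name": "Asha",
    "education": [
        {"degree": "B.E.", "year": 2008},
        {"degree": "M.Tech", "year": 2010},
    ],
    "work_experience": [
        {"company": "Acme", "years": 3},
    ],
}


def to_rows(person):
    """Split one nested object into flat rows for three tables."""
    uid = person["user_id"]
    user_row = {"user_id": uid, "name": person["name"]}
    # Each child row carries user_id, playing the role of a foreign key.
    education_rows = [{"user_id": uid, **item} for item in person["education"]]
    work_rows = [{"user_id": uid, **item} for item in person["work_experience"]]
    return user_row, education_rows, work_rows


def from_rows(user_row, education_rows, work_rows):
    """Reassemble the nested object, much as a join across the tables would."""
    uid = user_row["user_id"]

    def strip_key(row):
        return {k: v for k, v in row.items() if k != "user_id"}

    return {
        **user_row,
        "education": [strip_key(r) for r in education_rows if r["user_id"] == uid],
        "work_experience": [strip_key(r) for r in work_rows if r["user_id"] == uid],
    }


# Stand-ins for USER_DETAILS, EDUCATIONAL_DETAILS and WORK_EXPERIENCE rows.
user_details, educational_details, work_experience = to_rows(professional)

# The round trip recovers the original object, at the cost of both translations.
assert from_rows(user_details, educational_details, work_experience) == professional
```

Both `to_rows` and `from_rows` exist only to bridge the two representations, which is the extra work the impedance mismatch imposes.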
- Demand for Clusters: As the Internet grew, so did the number of users accessing applications, which meant storing and retrieving large amounts of data. A single relational database could not handle this because of the size of the data and the performance required, which called for clusters. Traditional relational database vendors supported clustering through a shared-disk mechanism, which leaves the disk as a single point of failure. Running databases on a cluster also affected budgets, since clustered database environments cost more than a single relational database. Finally, relational databases were not designed to run efficiently on clusters.
- Movement away from using the database as an integration point: Earlier, when two applications needed to interact with each other, they used the same relational database as the point of integration, and this was one of the factors that tied applications to relational databases. Later there was a movement away from using the database as an integration point, toward encapsulating the database within an application and using services (SOA) as the integration point.
Types of NoSQL databases
- Document Store: Data is stored in units called documents, and each document is encoded in a format such as XML, YAML, or JSON, depending on the implementation you are using. Documents are addressed in the database via a unique key that represents the document. Another defining characteristic of a document-oriented database is that, in addition to the key lookup a key-value store performs, it offers an API or query language that retrieves documents based on their contents.
Examples: Apache CouchDB, IBM Domino, MongoDB, etc.
- Key-Value Store: It uses a map as its fundamental data model. Data is represented as a collection of key-value pairs in which each possible key appears at most once. Each value can hold a richer data structure.
Examples: Dynamo, MemcacheDB, Aerospike, Berkeley DB, OrientDB, etc.
- Column Family Store: It is a two-level aggregate structure. As with key-value stores, the first key is often described as a row identifier, which picks the aggregate of interest. The row aggregate is itself a map of more detailed values. These second-level values are referred to as columns, so you can pick out a particular column as well as a whole row. Column family databases organize their columns into column families: each column has to be part of a single column family, and the column acts as the unit of access.
Examples: Amazon SimpleDB, Cassandra, HBase, Hypertable.
- Graph Store: This kind of database is designed for data whose relationships are well represented as a graph: elements interconnected by a finite number of relations. Examples of such data include social relations, public transport links, road maps, and network topologies.
Examples: FlockDB, HyperGraphDB, InfiniteGraph, Neo4j, OrientDB.
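To make the four data models easier to compare, here is a rough Python sketch that mimics each one with built-in data structures. It does not use the API of any of the products listed above; every name in it (`key_value`, `documents`, `find_by_skill`, `column_family`, `edges`, `neighbours`) and all the sample data are hypothetical.

```python
# Key-value store: a value is looked up by its unique key only;
# the store does not query inside the value.
key_value = {"session:42": {"user": "u1", "cart": ["book", "laptop"]}}

# Document store: documents are addressed by key AND queryable by content.
documents = {
    "u1": {"name": "Asha", "skills": ["java", "python"]},
    "u2": {"name": "Ravi", "skills": ["go"]},
}


def find_by_skill(docs, skill):
    """Return the keys of documents whose content includes the skill."""
    return [key for key, doc in docs.items() if skill in doc.get("skills", [])]


# Column-family store: row key -> column family -> column -> value.
column_family = {
    "u1": {
        "profile": {"name": "Asha", "city": "Pune"},
        "orders": {"o-1001": "book", "o-1002": "laptop"},
    },
}

# Graph store: nodes connected by named relations (edges).
edges = [
    ("Asha", "FRIEND_OF", "Ravi"),
    ("Ravi", "FRIEND_OF", "Meera"),
]


def neighbours(graph_edges, node, relation):
    """Follow one relation type outward from a node."""
    return [dst for src, rel, dst in graph_edges if src == node and rel == relation]


print(key_value["session:42"]["cart"])         # ['book', 'laptop']
print(find_by_skill(documents, "python"))      # ['u1']
print(column_family["u1"]["profile"]["city"])  # Pune
print(neighbours(edges, "Asha", "FRIEND_OF"))  # ['Ravi']
```

The difference between the models lies in what each lets you address: a whole value by key, a document by its content, a single column within a row, or the relations between elements.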
The rise of NoSQL resulted in Polyglot Persistence, which can be defined as using different data stores for different circumstances, rather than choosing a relational database just because everyone else is using one. We need to understand the nature of the data we are storing and how we want to manipulate it. This results in a mix of data storage technologies, each suited to different circumstances.
Happy Learning !!!