Basics of Data Modeling

Abinaya Rajesh
4 min readDec 17, 2020

The data model is an abstraction that organizes the elements of data and how they are related to each other. Data modeling helps to organize the data in the databases. In the conceptual data modeling, the data are related based on the entities. It means the data is related based on the domain of the data. For example, if you have an online shop, the customer data is related to the products they buy. The logical data modeling is one step ahead of conceptual data modeling. Here, the customer might have an address, phone number, email and so on. The modeling of such relations is called logical modeling. This is where the tables and columns come in to picture. In physical modeling, these tables are created under a database and are related based on schemas. We use DDL to create tables so that the database understands how to create tables.

Normalization

During the data modeling, it is important to make the tables are in the normal form. The normalization process helps to reduce the redundancy and improve the data integrity with the databases. There are up 6 normal forms. In industries, three normal forms are generally used. The important thing to note is that the normalization starts with first normal form, then the second and then the third normal form.

  • In the first normal form, it is important to consider that every cell in the table has a single data values, the group of data that have different relationships are moved to different tables.
  • When the data is in the first normal form, you can try to make the data to move to the second normal form by making each row of data uniquely identify itself by creating primary keys. This constitutes the second normal form.
  • In the third normal form, the transitive relationship between the data is identified and moved to a different table to avoid duplicates or redundant data.

Denormalization

Denormalization is the process of improving the read process of the database at the expense of the write process by adding redundancy to the database. The denormalization comes after the normalization to facilitate the improvements to reduce heavy reads. It is very important to maintain data consistency. Hence, you have to update all the redundant data in all the tables during insert update and delete process. During de-normalization, check on the data that you would like to retrieve from the database. If the query becomes very complex and if it requires heavy read, you can do the denormalization. Try to add the columns to the tables to make the simpler data retrieval. In some cases, you can also create a different table with all the data you need for the query. This causes the redundant data but the read time is reduced thereby increasing your performance.

Fact Tables: Fact tables are tables of a database that has measurement (metrics) values especially the facts and figures that heps to do analytics of the business process. They are numeric and additive. Eg. Total amount of invoice that contributes to sales.

Dimension Tables: Dimension tables are the tables that have the dimensions that are not useful for the analytics. They are usually textual fields that describe the business and answer questions like who, what, where, why, etc. They are connected to the fact tables through the foreign keys. Eg. Date and time, Roles like customers and providers.

Star Schema: Star schema has a single fact table surrounded by several dimension tables. The benefits of star schema are faster aggregation because the data is denormalized. The drawback of this schema is that it reduces data integrity because of the presence of redundant data. Finally, one to many relationships is in a star schema. Hence, they are suitable for data marts.

Snowflake Schema: It has a centralised fact table with several dimension tables which are in turn are related to several other dimension tables. It supports many to many relationships. Hence, they are suitable for data warehousing. The data are in normalized form in a snowflake schema. They are either 1N or 2N normalized.

Tip: In order to insert and update a table in PostgreSQL (also called as upsert), use the below syntax:

INSERT INTO table_name(col1, col2)
VALUES
(007, 'Have a great day')
ON CONFLICT (col1)
DO UPDATE
SET col2 = EXCLUDED.col2;

--

--