Database Design Tips

Keep a clear separation between data fields

Each column should play a single, well-defined role; avoid mixing up information. For example, with an address it is better to have separate fields for town, zipcode, street and house number than one database field called address. Searching becomes easier, you can enforce constraint checks on the zipcode, and so on.
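As a sketch of this idea (using SQLite via Python's sqlite3; the table, column names and the zipcode check are illustrative):

```python
import sqlite3

con = sqlite3.connect(":memory:")
# Separate, well-defined columns instead of one free-form "address" field.
con.execute("""
    CREATE TABLE customer (
        id           INTEGER PRIMARY KEY,
        street       TEXT NOT NULL,
        house_number TEXT NOT NULL,
        town         TEXT NOT NULL,
        zipcode      TEXT NOT NULL CHECK (length(zipcode) = 5)
    )
""")
con.execute("INSERT INTO customer (street, house_number, town, zipcode) "
            "VALUES ('Main Street', '42', 'Springfield', '12345')")

# Searching by a single component is now trivial and index-friendly.
rows = con.execute("SELECT town FROM customer WHERE zipcode = '12345'").fetchall()
print(rows)  # [('Springfield',)]
```

With a single address column, the same zipcode search would need pattern matching, which is both slower and unreliable.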

Use short, meaningful names

Long, complicated names for tables and columns are hard to read. Simple names, read in the context of their table, are much easier to understand. In a Person table, for example, I find Name a better column name than PersonName. For numeric columns, some developers use '#' instead of 'number_of' ('#visits' instead of 'number_of_visits'), though such names usually have to be quoted.

Add comments to the schema

Good comments are crucial. Many schemas carry comments that are too short and poorly explained. Good comments with examples always help, both for the rest of the team and for your future self. It happens to me very often that I come back to a database and have forgotten what a certain column means.
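One way to keep comments next to the schema (a sketch using SQLite, where the original CREATE statement, including inline comments, is kept in sqlite_master; the columns are illustrative):

```python
import sqlite3

con = sqlite3.connect(":memory:")
# Inline comments in the DDL travel with the schema itself.
con.execute("""
    CREATE TABLE visit (
        id          INTEGER PRIMARY KEY,
        day_of_week INTEGER,  -- 1 = Monday ... 7 = Sunday
        duration_s  INTEGER   -- visit length in seconds, NULL if still open
    )
""")
ddl = con.execute(
    "SELECT sql FROM sqlite_master WHERE name = 'visit'"
).fetchone()[0]
print(ddl)
```

Other databases offer dedicated mechanisms, e.g. PostgreSQL's COMMENT ON COLUMN, which stores the text in the system catalogs.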

Use distinct design principles for large and small tables

For big tables (with many records), try to reduce the number of columns and the column sizes. They will then need much less space in memory and on disk, and all queries involving these tables will find the data faster, without having to read from the hard disk. This is one of the key factors for a fast database: keep as much of the data in memory as possible.

Use booleans and short character encodings

Assume a Status field with the values 'Active', 'Inactive' and 'Idle'. You could store the complete name, but this can be inefficient. Storing an enumeration or a char(1) with the possible values 'a', 'i' and 'd', for example, uses less space in the database. This matters most for large tables, where disk I/O is a crucial factor for performance.
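A minimal sketch of this pattern (SQLite via sqlite3; the status codes and mapping are illustrative): only the one-character code is stored, a CHECK constraint keeps out unknown codes, and the application maps codes back to display names.

```python
import sqlite3

# Hypothetical mapping kept on the application side.
STATUS = {"a": "Active", "i": "Inactive", "d": "Idle"}

con = sqlite3.connect(":memory:")
con.execute("""
    CREATE TABLE account (
        id     INTEGER PRIMARY KEY,
        status TEXT NOT NULL CHECK (status IN ('a', 'i', 'd'))
    )
""")
con.execute("INSERT INTO account (status) VALUES ('a')")

code = con.execute("SELECT status FROM account").fetchone()[0]
print(STATUS[code])  # Active
```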

Use column constraints as much as possible

With column constraints you can enforce rules on a specific column and ensure that no 'bad' data gets into your database. Imagine a day_of_week column stored as a number: by adding a constraint day_of_week BETWEEN 1 AND 7 you make sure the database never contains invalid values like 8, and you avoid unpleasant exceptions in your application.
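The day_of_week example can be sketched like this (SQLite via sqlite3; the table is illustrative) — the out-of-range insert is rejected by the database itself:

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("""
    CREATE TABLE visit (
        id          INTEGER PRIMARY KEY,
        day_of_week INTEGER NOT NULL
                    CHECK (day_of_week BETWEEN 1 AND 7)
    )
""")
con.execute("INSERT INTO visit (day_of_week) VALUES (3)")  # accepted

try:
    con.execute("INSERT INTO visit (day_of_week) VALUES (8)")  # out of range
    rejected = False
except sqlite3.IntegrityError:
    rejected = True
print(rejected)  # True
```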

Use normalization

Normalization is a key factor when designing databases: duplicated data is removed from a table and moved into a separate table of its own. For example, after normalization an Employee table stores a DepartmentId field, while all the department details live in a separate Department table. The Department table uses DepartmentId as its primary key, which is referenced by a foreign key from the Employee table.
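The Employee/Department example, sketched with SQLite via sqlite3 (the concrete columns and rows are illustrative):

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("PRAGMA foreign_keys = ON")  # SQLite enforces FKs only when enabled
con.execute("""
    CREATE TABLE department (
        department_id INTEGER PRIMARY KEY,
        name          TEXT NOT NULL
    )
""")
con.execute("""
    CREATE TABLE employee (
        employee_id   INTEGER PRIMARY KEY,
        name          TEXT NOT NULL,
        department_id INTEGER NOT NULL
                      REFERENCES department(department_id)
    )
""")
con.execute("INSERT INTO department VALUES (1, 'Engineering')")
con.execute("INSERT INTO employee VALUES (10, 'Ada', 1)")

# A join brings the department details back when needed.
row = con.execute("""
    SELECT e.name, d.name
      FROM employee e
      JOIN department d ON d.department_id = e.department_id
""").fetchone()
print(row)  # ('Ada', 'Engineering')
```

The department name is now stored once, so renaming a department is a single-row update instead of a change to every employee record.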

Use small numbers when possible

Here I have in mind a database schema where the IDs started from 900000 or higher, leaving the whole range below 900000 unused. Some databases store numbers in as few bytes as needed: values from 1 to 255 fit in one byte, values up to 65535 in two, and so on. If you use small numbers, the IDs require much less space on disk and joins run faster. If records are deleted, it can be a good idea to reuse the freed low ID ranges.

Use application caching

Caching on the application side can reduce the number of queries and the network traffic. Cache the small or medium-sized tables that are accessed very often, say more than a hundred times per second.
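A minimal read-through cache with a time-to-live could look like this (plain Python sketch; the loader function stands in for a real SELECT against a small, frequently read table):

```python
import time

class TableCache:
    """Caches the result of a table-loading query for ttl_seconds."""
    def __init__(self, loader, ttl_seconds=60):
        self._loader = loader        # callable that queries the database
        self._ttl = ttl_seconds
        self._data = None
        self._loaded_at = 0.0

    def get(self):
        if self._data is None or time.monotonic() - self._loaded_at > self._ttl:
            self._data = self._loader()      # one query instead of hundreds
            self._loaded_at = time.monotonic()
        return self._data

calls = 0
def load_departments():              # stands in for a real database query
    global calls
    calls += 1
    return {1: "Engineering", 2: "Sales"}

cache = TableCache(load_departments, ttl_seconds=60)
for _ in range(1000):                # 1000 reads, but only one "query"
    cache.get()
print(calls)  # 1
```

The trade-off is staleness: within the TTL window the application may serve data that has already changed in the database, which is acceptable for slowly changing lookup tables.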

Test queries with explain plan

If a query runs slowly, the question is whether an index is missing or the execution plan is wrong. You can check this with an explain plan.
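For example, SQLite exposes this as EXPLAIN QUERY PLAN (sketch via sqlite3; the table is illustrative) — before the index the plan is a full scan, afterwards it uses the index:

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE person (id INTEGER PRIMARY KEY, zipcode TEXT)")
con.executemany("INSERT INTO person (zipcode) VALUES (?)",
                [(str(10000 + i),) for i in range(1000)])

def plan(sql):
    # Each EXPLAIN QUERY PLAN row carries its description in the last column.
    return " ".join(row[3] for row in con.execute("EXPLAIN QUERY PLAN " + sql))

query = "SELECT * FROM person WHERE zipcode = '10500'"
before = plan(query)   # full table scan: no usable index yet
con.execute("CREATE INDEX idx_person_zipcode ON person(zipcode)")
after = plan(query)    # the index is picked up automatically
print(before)
print(after)
```

Other databases have their own equivalents (EXPLAIN in PostgreSQL and MySQL, EXPLAIN PLAN in Oracle), but the idea is the same: read the plan instead of guessing.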

Too many indexes may slow down the database

This applies to inserts, deletes and updates: besides the table itself, the indexes have to be modified too, which generates extra reads and writes.

Close transactions as soon as possible

Databases are 'consistent' in the sense that a query sees only the modifications that completed before the query started; modifications made at the same time are hidden from it. These changes are stored in 'rollback' areas. A long transaction keeps its rollback data locked, and the queries coming after it will be slowed down.
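In application code this means opening the transaction as late as possible and committing as soon as the writes are done. A sketch with sqlite3, whose connection context manager commits on success and rolls back on an exception (the accounts are illustrative):

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE account (id INTEGER PRIMARY KEY, balance INTEGER)")
con.execute("INSERT INTO account VALUES (1, 100), (2, 0)")

# Keep the transaction tight: only the two writes happen inside it;
# any slow computation or rendering belongs outside the 'with' block.
with con:
    con.execute("UPDATE account SET balance = balance - 30 WHERE id = 1")
    con.execute("UPDATE account SET balance = balance + 30 WHERE id = 2")

balances = con.execute("SELECT balance FROM account ORDER BY id").fetchall()
print(balances)  # [(70,), (30,)]
```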

Use LIMIT or not?

Some databases (like PostgreSQL) use LIMIT as a keyword for limiting the number of records; others (like Oracle) do not. Oracle expects you to close the result set as soon as you no longer need to read from it. Additionally, you can use the FIRST_ROWS hint, explained below.


Oracle has a few special hints: ALL_ROWS, FIRST_ROWS_10, FIRST_ROWS_100, and so on. With them you tell the database whether you need the complete result or will only read its first part, and for each case it chooses a different execution strategy. With ALL_ROWS it may take, say, 10 minutes to get the first row back, but all the others then come very fast; with FIRST_ROWS you may wait only 1 minute for the first row, but the following rows arrive with large delays between them, so reading all of them may take 20 minutes. Use the FIRST_ROWS hints for web pages that have to display a small number of records from a large result.
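On databases that do support LIMIT, the "read only the first part" intent can be stated directly in the query. A sketch using SQLite via sqlite3 (the table is illustrative; since 12c, Oracle offers FETCH FIRST n ROWS ONLY for the same purpose):

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE visit (id INTEGER PRIMARY KEY)")
con.executemany("INSERT INTO visit (id) VALUES (?)",
                [(i,) for i in range(1, 10001)])

# Page 1 of a large result: fetch only what the page will display.
page = con.execute(
    "SELECT id FROM visit ORDER BY id LIMIT 10 OFFSET 0"
).fetchall()
print(len(page))  # 10
```

The ORDER BY matters: without it, LIMIT returns an arbitrary subset, so paging would not be stable between requests.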