Did You Normalize Your Data?

Updated: Sep 18, 2020

An excerpt from my EBook "5 Reasons Why Your Spreadsheet Is Misbehaving". You can download a free copy from the Free Stuff page.

One of the best features of Excel is its flexibility. There is virtually no limit to the number of ways you can build a spreadsheet so it is customized to your particular needs. That’s why Excel is such a great, powerful tool to use in managing your business, figuring out your budget or finances, or keeping track of personal items or business inventory.

You can’t do whatever you want and still get the results you expect.

There are rules to follow. The good news is this: If you master a few basic, but essential concepts, you’ll spend much less time working on Excel and more time working on the activities that really matter.

If you want your spreadsheet to work correctly, you must build it correctly.

Have you ever taken time to look at a brick wall? Structures made of brick are very sturdy and are able to stand for many, many years – if they are properly constructed. Each brick wall must be perfectly vertical and each row of bricks must follow a specific pattern. Otherwise the structure will not stand for very long.

Excel spreadsheets, to function properly, must also be constructed properly. The first thing to remember is that there is a reason Excel is organized in rows and columns. The reason is that the row and column structure, called a flat database, is an excellent format for organizing and analyzing data.

What You Need To Know

A database is a collection of fields and records.

A field is a column in an Excel spreadsheet.

A record is a row in an Excel spreadsheet.

A field contains one unit of information in a record. For example, in a record of a customer’s information, there will likely be fields for such items as first name, last name, address, city, state, zip code.

A record contains all relevant information for one unique item in the database. Examples include sales transactions, customer information, and inventory specifications.

Picture 1 shows a flat database with 6 fields and 5 records. All of the information is fictional.

Picture 1: Example of a flat database

Notice how there is no duplication of data. Each field contains only one item of information. Every record is related to only one person.

What You Need To Know

Your data is not normalized if you have duplicate data.

Each record must be unique.

Let’s look at some examples!

A good example is the use of dates. If you want to have a record of sales for each month, most people set up one column for each month. The thinking is that each month is a separate field, so it needs its own column. So they end up with something like what you see in Picture 2 below.

This format is incorrect. The data is not normalized correctly because we only have one record that shows sales for all months.

Picture 2: Data not normalized correctly

What we actually want is 12 separate records. We need one record for each month. Each record should show only the total sales for that month.

Where is the error? In the field name. The field name should be “Month”, not the name of each individual month.

What is the correct way to organize our data? Well, in our example, we have two pieces of information: the month of the sale and the total of sales for the month. So we only need two fields (columns) to record our data – Month and Sales. The proper way to normalize the data is shown below in Picture 3.

Picture 3: Data normalized correctly

Now the data is normalized correctly. We have what we need, which is a list of 12 unique records, one for each month, with total sales for the month.

Always make sure you normalize your data correctly. When you do, you will find that creating and updating reports is faster and consistently more accurate.

7 views0 comments

Recent Posts

See All

How Many Data Sets Do You Need?

How many different spreadsheets do you regularly use to monitor business activity and manage your business? If you use more than one spreadsheet, then how do you validate the integrity of the data?