Cooking Up Success: A Recipe to Normalise Data
Remember your first kitchen and the thrill of having your own culinary space?
(Or if you don’t have one yet, then we totally understand – the economy is in shambles.)
Picture a fridge stocked with your favourite drinks, weekends reserved for gourmet experiments, and the most fragrant cleaning products to keep the grime at bay.
Yet, reality often falls short of our expectations. Organising turns into a nightmare, food spoils before it’s cooked, and no amount of cleaning products can hide the mundane reality of household chores.
It takes weeks, maybe months, to find a comfortable routine…and even then, is it the best you can do?
In real life, the answer is often elusive. Thankfully with data, it’s pretty straightforward.
Customer Information
What was missing from that picture? The hard work of just organising and planning things. Thinking of what to buy, where to put it, how to prep it for the week ahead, where you’re going to find the time and strength to do everything…
Now, one way to cut through that dreary workload is normalisation. In techspeak, this refers to creating standards for a particular dataset.
When it comes to normalising a dataset, we need to know what it is we want to normalise. The key fields that should come into play are:
- Company Name
- Address
- Suburb
- State
- Postcode
- Country
- First Name
- Last Name
- Phone (Phone2/Mobile)
- Email Address
Fields such as Company Name, First Name, Last Name and Email Address will each need to be handled a little differently.
For Company Name, proper case would be a great start, as would standardising PTY LTD across all the different iterations of it.
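To sketch what that might look like (the function and pattern names here are hypothetical, not a production-grade implementation), a Python pass could proper-case the name and collapse the usual spellings of PTY LTD in one go:

```python
import re

# Collapse the common spellings of "Pty Ltd" (PTY. LTD., Pty Limited, etc.)
# into one standard form.
PTY_LTD_PATTERN = re.compile(r"\bpty\.?\s+(ltd\.?|limited)\b", re.IGNORECASE)

def normalise_company(name: str) -> str:
    name = " ".join(name.split())  # collapse stray whitespace
    name = name.title()            # simple proper case
    return PTY_LTD_PATTERN.sub("Pty Ltd", name)

print(normalise_company("ACME WIDGETS PTY. LIMITED"))  # Acme Widgets Pty Ltd
```

Note that `str.title()` is a blunt instrument for company names (it will mangle acronyms like "BHP"), so treat this as a starting point to build exception rules on top of.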
First Name and Last Name would also be proper cased, but certain care needs to be taken when dealing with surnames such as McBride or O’Brien.
Simple proper casing of those two examples won’t work, so it’s crucial to implement specific rules that address unique capitalisation patterns.
One way is to develop an algorithm that recognises prefixes like “Mc” and “O’” and automatically capitalises the subsequent letter. Additionally, provide a manual review system to correct errors and a feedback mechanism to continuously refine the process.
This approach ensures that names are accurately and respectfully stored, enhancing data integrity and reducing errors.
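Such an algorithm might start out like the sketch below. The rule set is deliberately tiny (just the “Mc” case; Python’s `str.title()` already handles the apostrophe in O’Brien), and real data will need more rules for names like van der Berg or hyphenated surnames:

```python
import re

def proper_case_name(name: str) -> str:
    """Proper-case a surname, handling the "Mc" prefix.

    A sketch only: extend with rules for "van der", hyphens, etc.
    """
    name = name.strip().title()  # "mcbride" -> "Mcbride", "o'brien" -> "O'Brien"
    # str.title() already capitalises after an apostrophe, but not after "Mc",
    # so capitalise the letter following that prefix explicitly.
    name = re.sub(r"\bMc([a-z])", lambda m: "Mc" + m.group(1).upper(), name)
    return name

print(proper_case_name("mcbride"))  # McBride
print(proper_case_name("O'BRIEN"))  # O'Brien
```

Anything the rules can’t confidently handle is exactly what the manual review and feedback loop mentioned above is for.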
Email
Email is another troublesome field, but there are a couple of things that can be done. Firstly, you’ll want to ensure that it conforms to the email ‘standard’, that being [emailnamehere@provider.suffix].
Once this is done, you can then look at the domain of the email (the part after the @) to ensure that the domain name actually matches a real one.
There are of course simple ones to check, like gmail.com, which should NEVER have anything after the .com. If you start your focus on the top 100 domains, you’ll be able to build your logic as you go.
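A minimal sketch of those two checks might look like the following. The email pattern is a deliberate simplification (the real email grammar in RFC 5322 is far more permissive), and the domain list is a hand-seeded stand-in for the top-100 list you’d actually build:

```python
import re

# Loose conformance check for the [emailnamehere@provider.suffix] 'standard'.
EMAIL_PATTERN = re.compile(r"^[^@\s]+@[^@\s]+\.[^@\s]+$")

# Seed list only -- in practice, build this out to the top 100 domains.
KNOWN_DOMAINS = {"gmail.com", "outlook.com", "yahoo.com"}

def check_email(address: str) -> bool:
    address = address.strip().lower()
    if not EMAIL_PATTERN.match(address):
        return False
    domain = address.split("@", 1)[1]
    for known in KNOWN_DOMAINS:
        # e.g. gmail.com should NEVER have anything after the .com
        if domain != known and domain.startswith(known + "."):
            return False
    return True

print(check_email("jane@gmail.com"))     # True
print(check_email("jane@gmail.com.au"))  # False
```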
Addresses
Address, Suburb, State, Postcode and Country can all be taken care of in one simple process!
Or you could have one person slog through all that information. But if you’ve been there, you know that it’s a waste of money, resources, and morale. Nobody’s got the time for that!
You could opt for a tool that can quickly and effectively parse this data into its normalised elements. Whether you need this done in real time or in bulk is entirely up to you, and you’ll need to consider how best to implement it in your system (web API, point and click, or a scripted interface?).
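If you do roll your own rather than use a tool, a bare-bones sketch might assume one well-behaved layout (here, the common Australian “street, suburb STATE postcode” single-line form, an assumption on our part) and punt everything else to manual review:

```python
import re

# Assumes a single-line "street, suburb STATE postcode" layout;
# anything messier returns None for manual review.
ADDRESS_PATTERN = re.compile(
    r"^(?P<address>.+?),\s*(?P<suburb>.+?)\s+"
    r"(?P<state>NSW|VIC|QLD|SA|WA|TAS|NT|ACT)\s+(?P<postcode>\d{4})$",
    re.IGNORECASE,
)

def parse_address(line):
    match = ADDRESS_PATTERN.match(line.strip())
    if not match:
        return None  # couldn't parse -- flag for manual review
    parts = match.groupdict()
    parts["state"] = parts["state"].upper()
    return parts

print(parse_address("12 Example St, Richmond VIC 3121"))
```

The gap between this sketch and real-world address data (unit numbers, PO boxes, misspelt suburbs, missing commas) is precisely why a dedicated parsing tool earns its keep.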
Phone Numbers
Phone numbers are the most mismanaged data element in all the datasets we’ve ever had to deal with. We’ve seen it all. We’ve had clients specifically request that ( ) be included within the phone element!
But the most common issue is that the leading zero has been dropped from the data.
The root cause is usually that someone has opened a CSV/TXT file in Excel, made some changes and then saved the results. Excel drops the zero…never to be seen again.
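One way to undo the damage, assuming Australian-style numbers (10 digits starting with 0, which is an assumption; international data needs extra rules), is to strip out the decoration and re-attach the zero where it has clearly gone missing:

```python
def normalise_phone(raw: str) -> str:
    """Strip decoration and restore a leading zero lost to Excel.

    Sketch assumes Australian numbers: 10 digits starting with 0.
    """
    digits = "".join(ch for ch in raw if ch.isdigit())
    # A 9-digit number starting with an area-code or mobile digit
    # (2, 3, 4, 7, 8) has almost certainly lost its leading zero.
    if len(digits) == 9 and digits[0] in "23478":
        digits = "0" + digits
    return digits

print(normalise_phone("(02) 9876 5432"))  # 0298765432
print(normalise_phone("298765432"))       # 0298765432
```

Keeping only digits also quietly disposes of those ( ) characters, whether clients asked for them or not.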
Whatever your normalisation strategy, ensure that:
- it’s applied consistently across your entire database
- you have back-end systems in place to monitor and resolve data that your front-end can’t.
And that’s how you clean up your data!