Cleaning data in Excel is a vital step to ensure accuracy and consistency. Whether you’re working on small datasets or handling large volumes, this guide explains both manual and automated approaches for data cleaning to make your workflow seamless and professional.
1. Removing Duplicates
Scenario: A customer list contains duplicate records.
- Manual Method:
- Select your dataset.
- Go to the Data tab → Click Remove Duplicates.
- Choose the columns to check for duplicates and confirm.
- Automated Method:
Use a formula to identify duplicates: - =IF(COUNTIF(A:A, A2) > 1, “Duplicate”, “Unique”)
Apply filters to remove duplicates based on the results.
-In this case the manual Method is less work in removing duplicates compared to automated Method
2. Handling Missing Data
Scenario: A sales dataset has blanks in the Region and Sales columns.
- Manual Method:
- Highlight missing data with Conditional Formatting.
- Go to Home → Conditional Formatting → New Rule → Format only cells that are blank.
- Fill blanks using:
- Averages: =AVERAGE (range)
- Specific values: Type directly into the blank cells.
- Highlight missing data with Conditional Formatting.
- Automated Method:
Use Excel’s Go To Special:- Press Ctrl + G → Select Special → Choose Blanks.
- Enter the value or formula (e.g., =A1) and press Ctrl + Enter to apply across all blanks.
– To handle missing data, try either method…see which one you might like the most.
3. Standardizing Data Formats
Scenario: A column of dates appears in inconsistent formats.
- Manual Method:
- Select the date column → Right-click → Format Cells → Choose a consistent format like MM/DD/YYYY.
- Use functions for text:
- Capitalize text: =PROPER(A1)
- Convert to uppercase: =UPPER(A1)
- Automated Method:
Use Power Query to format columns:- Go to Data → Get & Transform Data → Launch Power Query.
- Select the column and apply transformations for text or date consistency.
– To standardize Data formats: Automated method is less work and simple compared to the manual method.
4. Splitting and Combining Data
Scenario: A Full Name column needs splitting into First Name and Last Name.
- Manual Method:
- Select the column → Go to Data tab → Text to Columns.
- Choose a delimiter (e.g., space) and split into multiple columns.
- Automated Method:
Use formulas:
- First Name: =LEFT (A1, FIND (” “, A1) – 1)
- Last Name: =RIGHT (A1, LEN(A1) – FIND (” “, A1))
– for data splitting and data combining, the manual Method is simpler as you don’t have to remember formulas.
5. Identifying Outliers
Scenario: A sales column contains unusually high values.
- Manual Method:
- Sort the column in descending order to spot anomalies.
- Highlight outliers using Conditional Formatting → Highlight Cell Rules → More Rules.
- Automated Method:
Use a formula to flag outliers: - =IF(ABS(A2-AVERAGE(A: A)) > 2*STDEV(A:A), “Outlier”, “Normal”)
-Identifying an outlier, the automated method might be easier as it might seem complicated when selecting the rules using manual method.
6. Removing Unnecessary Spaces
Scenario: Email addresses have leading or trailing spaces.
- Manual Method:
- Highlight affected cells.
- Manually delete spaces.
- Automated Method:
Use the TRIM () function: - =TRIM(A1)
– Trimming is a better way of removing unnecessary space.
7. Validating Data
Scenario: Ensure phone numbers contain only numeric values.
- Manual Method:
- Select the column → Go to Data tab → Click Data Validation.
- Set rules to allow only numbers.
- Automated Method:
Use formulas: - =IF(ISNUMBER(A1), “Valid”, “Invalid”)
- Automated method is easier to validate data. It is easier to remember after a couple of times of utilization.
Let’s know which method you prefer on each step of data cleaning!!!
Leave a comment