This project demonstrates a complete data cleaning workflow using MySQL. The goal was to transform a raw dataset of company layoffs into a clean and standardized dataset suitable for further analysis.
The cleaning process includes identifying duplicate records, standardizing inconsistent values, converting data types, and removing incomplete or unnecessary data.
The dataset contains information about layoffs across companies, including:
- Company name
- Location
- Industry
- Total layoffs
- Percentage of layoffs
- Date
- Company stage
- Country
- Funds raised
The following steps were performed using SQL:
-
Create a Staging Table
- A copy of the original dataset was created to perform cleaning operations safely without modifying the raw data.
-
Remove Duplicate Records
- Used
ROW_NUMBER()withPARTITION BYto identify duplicate rows. - Duplicate records were removed from the dataset.
- Used
-
Standardize Data
- Trimmed extra spaces in company names.
- Standardized industry names (e.g., converting variations of "Crypto" into a single category).
- Cleaned inconsistent country names (e.g., removing trailing punctuation from "United States").
-
Convert Data Types
- Converted the
datecolumn from text format to the properDATEformat usingSTR_TO_DATE().
- Converted the
-
Handle Missing Values
- Identified rows with missing key information.
- Removed rows where both
total_laid_offandpercentage_laid_offwere null.
-
Remove Unnecessary Columns
- Temporary columns used during cleaning (such as
row_num) were removed after the process was completed.
- Temporary columns used during cleaning (such as
- MySQL
- SQL Window Functions (
ROW_NUMBER) - Data Cleaning Techniques
- Data Cleaning
- SQL Window Functions
- Data Standardization
- Handling Missing Data
- Data Type Conversion
- Database Management
The final dataset is clean, standardized, and ready for data analysis and visualization.