thang0010/Data-Cleaning-in-MySQL

Data-Cleaning-in-MySQL

Project Overview

This project demonstrates a complete data cleaning workflow using MySQL. The goal was to transform a raw dataset of company layoffs into a clean and standardized dataset suitable for further analysis.

The cleaning process includes identifying duplicate records, standardizing inconsistent values, converting data types, and removing incomplete or unnecessary data.

Dataset

The dataset contains information about layoffs across companies, including:

  • Company name
  • Location
  • Industry
  • Total layoffs
  • Percentage of layoffs
  • Date
  • Company stage
  • Country
  • Funds raised
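For orientation, the fields above could map onto a raw table along these lines. This is only a sketch: the table name `layoffs`, most column names, and all column types are assumptions chosen for illustration. Only `total_laid_off` and `percentage_laid_off` are named in this README.

```sql
-- Hypothetical shape of the raw table before cleaning.
-- Names and types are illustrative guesses, not taken from the project.
CREATE TABLE layoffs (
    company               TEXT,
    location              TEXT,
    industry              TEXT,
    total_laid_off        INT,
    percentage_laid_off   TEXT,
    `date`                TEXT,   -- stored as text until the conversion step
    stage                 TEXT,
    country               TEXT,
    funds_raised_millions INT
);
```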

Data Cleaning Steps

The following steps were performed using SQL:

  1. Create a Staging Table

    • A copy of the original dataset was created to perform cleaning operations safely without modifying the raw data.
  2. Remove Duplicate Records

    • Used ROW_NUMBER() with PARTITION BY to identify duplicate rows.
    • Deleted every row numbered above 1 within its partition, so that one copy of each record remains.
  3. Standardize Data

    • Trimmed extra spaces in company names.
    • Standardized industry names (e.g., converting variations of "Crypto" into a single category).
    • Cleaned inconsistent country names (e.g., removing trailing punctuation from "United States").
  4. Convert Data Types

    • Parsed the text values in the date column with STR_TO_DATE() so the column could be stored as a proper DATE type.
  5. Handle Missing Values

    • Identified rows with missing key information.
    • Removed rows where both total_laid_off and percentage_laid_off were null.
  6. Remove Unnecessary Columns

    • Temporary columns used during cleaning (such as row_num) were removed after the process was completed.
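The six steps above can be sketched roughly as follows. This is a minimal illustration, not the project's actual script. It assumes MySQL 8.0 or later (needed for window functions), a raw table named `layoffs`, a staging table named `layoffs_staging`, and text dates in month/day/year form. Apart from `total_laid_off`, `percentage_laid_off`, and `row_num`, every column name is a guess. For brevity the sketch creates the staging copy and computes `row_num` in one statement.

```sql
-- Steps 1 and 2: copy the raw data into a staging table and number the
-- rows inside each group of identical values (assumed table/column names).
CREATE TABLE layoffs_staging AS
SELECT *,
       ROW_NUMBER() OVER (
           PARTITION BY company, location, industry, total_laid_off,
                        percentage_laid_off, `date`, stage, country,
                        funds_raised_millions
       ) AS row_num
FROM layoffs;

-- Any row numbered above 1 is a repeat of an earlier identical row.
DELETE FROM layoffs_staging
WHERE row_num > 1;

-- Step 3: standardize text values.
UPDATE layoffs_staging
SET company = TRIM(company);

UPDATE layoffs_staging
SET industry = 'Crypto'
WHERE industry LIKE 'Crypto%';

UPDATE layoffs_staging
SET country = TRIM(TRAILING '.' FROM country)
WHERE country LIKE 'United States%';

-- Step 4: parse the text dates, then change the column type.
-- The '%m/%d/%Y' format string is an assumption about the raw data.
UPDATE layoffs_staging
SET `date` = STR_TO_DATE(`date`, '%m/%d/%Y');

ALTER TABLE layoffs_staging
MODIFY COLUMN `date` DATE;

-- Step 5: drop rows that carry no layoff figures at all.
DELETE FROM layoffs_staging
WHERE total_laid_off IS NULL
  AND percentage_laid_off IS NULL;

-- Step 6: remove the helper column once cleaning is done.
ALTER TABLE layoffs_staging
DROP COLUMN row_num;
```

Because every statement targets the staging table, the raw `layoffs` table stays untouched and the whole process can be rerun from scratch if a step goes wrong.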

Tools Used

  • MySQL
  • SQL Window Functions (ROW_NUMBER)
  • Data Cleaning Techniques

Key Skills Demonstrated

  • Data Cleaning
  • SQL Window Functions
  • Data Standardization
  • Handling Missing Data
  • Data Type Conversion
  • Database Management

Outcome

The final dataset is clean, standardized, and ready for data analysis and visualization.
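A few sanity checks along these lines could back up that claim. They are a sketch under the same assumptions as the rest of this README's examples: a cleaned table named `layoffs_staging` and mostly guessed column names.

```sql
-- Should return no rows if no exact duplicates remain.
SELECT company, location, industry, total_laid_off, percentage_laid_off,
       `date`, stage, country, funds_raised_millions,
       COUNT(*) AS copies
FROM layoffs_staging
GROUP BY company, location, industry, total_laid_off, percentage_laid_off,
         `date`, stage, country, funds_raised_millions
HAVING COUNT(*) > 1;

-- Eyeball the distinct labels for leftover spelling variants.
SELECT DISTINCT industry FROM layoffs_staging ORDER BY industry;
SELECT DISTINCT country  FROM layoffs_staging ORDER BY country;

-- Should return 0 once rows without any layoff figures are removed.
SELECT COUNT(*) AS empty_rows
FROM layoffs_staging
WHERE total_laid_off IS NULL
  AND percentage_laid_off IS NULL;

-- The date range should look plausible once the column is a real DATE.
SELECT MIN(`date`) AS first_date, MAX(`date`) AS last_date
FROM layoffs_staging;
```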
