GitHub PowerPoint Link | Google Drive PowerPoint Link | Accompanying Jupyter Notebook | GitHub Repository Link
The dataset used in this analysis can be downloaded here.
The link to the accompanying Tableau visualization can be found below:
This project is a statistical investigation into departmental salary disparities within a company. SQL & Python were used for the statistical calculations, while Tableau was used to create supporting visualizations.
Business Case:
Objective:
Deliverables:
Data from the raw .csv file was ingested into SQL Server through SQL Server Management Studio (SSMS). SQL queries were then built up iteratively to produce the final output, which included departmental statistical information such as:
Through exploratory data analysis (EDA) in Python, I found that the dataset contained both hourly & annually salaried workers. The scope of the departmental salary analysis was therefore restricted to annually salaried workers only, as reflected in the final SQL code.
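A minimal sketch of that kind of EDA check follows. It assumes hypothetical column names `department`, `salary_type` (with values `Annual` and `Hourly`) and `salary`, plus made-up sample rows; the real dataset's columns and values may differ.

```python
import csv
import io
from collections import Counter

# Hypothetical sample rows; the real .csv columns and values may differ.
RAW_CSV = """department,salary_type,salary
Finance,Annual,72000
Finance,Hourly,24.50
Finance,Annual,68000
Police,Annual,91000
Police,Hourly,31.75
Police,Annual,88000
"""


def load_rows(text):
    """Parse CSV text into a list of dicts, converting salary to float."""
    rows = list(csv.DictReader(io.StringIO(text)))
    for row in rows:
        row["salary"] = float(row["salary"])
    return rows


def annual_only(rows):
    """Keep only annually salaried workers (the scope restriction described above)."""
    return [row for row in rows if row["salary_type"] == "Annual"]


rows = load_rows(RAW_CSV)
# EDA step: counting pay types reveals that hourly and annual workers are mixed.
print(Counter(row["salary_type"] for row in rows))
annual = annual_only(rows)
print(len(annual), "annual-salary rows kept")
```

Mixing an hourly rate such as 24.50 with annual figures such as 72000 would inflate a department's standard deviation and CV, which is why separating the two pay types matters for this kind of analysis.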
Additionally, after examining departmental salary distributions with histograms & Quantile-Quantile (Q-Q) plots in Python, I decided to place less emphasis on the count of outliers identified via Z-scores and more emphasis on the coefficient of variation (CV) and standard deviation when evaluating departments.
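The per-department metrics mentioned above can be sketched with Python's standard `statistics` module. This is an illustrative sketch, not the project's SQL query: it assumes a hypothetical `salaries_by_dept` mapping with made-up salaries, uses the sample standard deviation, and flags outliers with a |z| > 3 cutoff, which is a common convention rather than a threshold taken from this analysis.

```python
import statistics


def department_metrics(salaries, z_cutoff=3.0):
    """Return mean, sample std. dev., CV, and Z-score outlier count for one department."""
    mean = statistics.mean(salaries)
    std_dev = statistics.stdev(salaries)  # sample standard deviation (n - 1)
    cv = std_dev / mean  # coefficient of variation: spread relative to the mean
    # Count salaries whose Z-score magnitude exceeds the cutoff.
    outliers = sum(1 for s in salaries if abs((s - mean) / std_dev) > z_cutoff)
    return {"mean": mean, "std_dev": std_dev, "cv": cv, "outliers": outliers}


# Hypothetical annual salaries per department (not taken from the real dataset).
salaries_by_dept = {
    "Finance": [60000, 65000, 70000, 75000, 80000],
    "Police": [50000, 52000, 54000, 56000, 200000],
}

for dept, salaries in salaries_by_dept.items():
    m = department_metrics(salaries)
    print(
        f"{dept}: mean={m['mean']:.0f} std={m['std_dev']:.0f} "
        f"cv={m['cv']:.3f} outliers={m['outliers']}"
    )
```

With the sample standard deviation, no value in a group of n can have |z| larger than (n − 1)/√n, so a |z| > 3 rule cannot flag anything in a department of 10 or fewer employees. Here the 200000 salary only reaches z ≈ 1.79, while the Police CV is clearly higher than the Finance CV.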
An explanation of the significance of the 4 calculated values is given in the accompanying PowerPoint slide:
Below is the final SQL query used to output all 4 statistical metrics used in the departmental salary disparity analysis: