Data Science Jobs
Documentation

Github repository

https://github.com/uthmandantata/excel_projects

Data source: kaggle

https://www.kaggle.com/datasets/nikhilbhathi/data-scientist-salary-us-glassdoor

TODO - talk about dataset(source, what it contains, how it will be useful)

  • SOURCE : Kaggle
  • Information about Data Scientist Jobs
  • Can be useful to know what to learn to get a specific job
  • Can be useful to know which company best fits your skillset


  • Data Preperations

  • Remove duplicates (alt A M)
  • Transform dataset into a table (ctrl T)
  • Rename table fields by columns (alt M C )


  • Exploratory Analysis

    Columns we'll anlyze:
  • All columns with Skills
  • Company Name
  • Salary Estimates
  • Avg Salary(K)
  • Degree
  • seniority_by_title


  • Questions Asked

  • Job title with highest frequency
  • Job title with lowest frequency
  • TOP 3 most requierd skills
  • Is Google data analytics certificate that important?
  • Jnr average Salary
  • What company has the highest frequency
  • What company hires the most jnr data analysts
  • Is sql more important than python for snr data scientists?
  • What is the most required skill based on each job title. Ps use the simplified


  • Answers

  • Job title with highest frequency
  • Ans: DATA SCIENTIST
    Process used: Pivot Table (alt H N V)
    Fields used: job_title_sim

  • Job title with lowest frequency
  • Ans: DATA MODELER & DIRECTOR
    Process used: Pivot Table (alt H N V)
    Fields used: job_title_sim

  • TOP 3 most requierd skills
  • Ans: Python , excel & sql

    Code used: =COUNTIF(Python,"Required")

  • Is Google data analytics certificate that important?
  • Ans: Python , excel & sql

    Code used: =COUNTIF(google_an,"Required") to get the number of jobs that require google data analytics certificate & =COUNTA(job_title_sim) to get the number of jobs

  • Jnr average Salary
  • Ans: 71K

    Code used:

                         
                            =AVERAGEIF(seniority_by_title,"jr",Avg_Salary_K)&"K"
                         
                     


  • What company has the highest frequency
  • Ans: MassMutual , Reynolds American & Takeda Pharmaceuticals

    Process used: Pivot Table

    (alt H N V)
    Fields used: Company Name
  • What company hires the most jnr data analysts
  • Ans: webfx.com , MITRE & The HSC Health Care System

    Process used: Pivot Table

    (alt H N V)
    Fields used: Company Name & seniority_by_title
  • Is sql more important than python for snr data scientists?
  • Ans: Python is more important

    Code used: =IF(COUNTIFS(seniority_by_title,"sr",Python,"required")>COUNTIFS
    (seniority_by_title,"sr",sql,"required"),"Python is more important","Sql is more important" )

  • What is the most required skill based on each job title. Ps use the simplified
  • Ans:

    Code used:

                        
                            
    =IF(COUNTIFS(Table1[job_title_sim],B66,Table1[google_an],"Required")>COUNTIFS(Table1[job_title_sim],B66,Table1[mongo],"Required"),"Google", IF(COUNTIFS(Table1[job_title_sim],B66,Table1[mongo],"Required")>COUNTIFS(Table1[job_title_sim],B66,Table1[flink],"Required"),"Mongo", IF(COUNTIFS(Table1[job_title_sim],B66,Table1[flink],"Required")>COUNTIFS(Table1[job_title_sim],B66,Table1[bi],"Required"),"Flink", IF(COUNTIFS(Table1[job_title_sim],B66,Table1[bi],"Required")>COUNTIFS(Table1[job_title_sim],B66,Table1[tableau],"Required"),"BI", IF(COUNTIFS(Table1[job_title_sim],B66,Table1[tableau],"Required")>COUNTIFS(Table1[job_title_sim],B66,Table1[hadoop],"Required"),"tableau", IF(COUNTIFS(Table1[job_title_sim],B66,Table1[hadoop],"Required")>COUNTIFS(Table1[job_title_sim],B66,Table1[tensor],"Required"),"hadoop", IF(COUNTIFS(Table1[job_title_sim],B66,Table1[tensor],"Required")>COUNTIFS(Table1[job_title_sim],B66,Table1[scikit],"Required"),"tensor", IF(COUNTIFS(Table1[job_title_sim],B66,Table1[scikit],"Required")>COUNTIFS(Table1[job_title_sim],B66,Table1[pytorch],"Required"),"scikit", IF(COUNTIFS(Table1[job_title_sim],B66,Table1[pytorch],"Required")>COUNTIFS(Table1[job_title_sim],B66,Table1[keras],"Required"),"pytorch", IF(COUNTIFS(Table1[job_title_sim],B66,Table1[keras],"Required")>COUNTIFS(Table1[job_title_sim],B66,Table1[sas],"Required"),"keras", IF(COUNTIFS(Table1[job_title_sim],B66,Table1[sas],"Required")>COUNTIFS(Table1[job_title_sim],B66,Table1[sql],"Required"),"sras", IF(COUNTIFS(Table1[job_title_sim],B66,Table1[sql],"Required")>COUNTIFS(Table1[job_title_sim],B66,Table1[excel],"Required"),"sql", IF(COUNTIFS(Table1[job_title_sim],B66,Table1[excel],"Required")>COUNTIFS(Table1[job_title_sim],B66,Table1[aws],"Required"),"excel", IF(COUNTIFS(Table1[job_title_sim],B66,Table1[aws],"Required")>COUNTIFS(Table1[job_title_sim],B66,Table1[spark],"Required"),"aws", IF(COUNTIFS(Table1[job_title_sim],B66,Table1[spark],"Required")>COUNTIFS(Table1[job_title_sim],B66,Table1[Python],"Required"),"spark","Python")))))))))))))))