Applied Data Analysis Course

Tools / Frameworks: R

Duration: 2 Weeks

Cost: $100

MODULE 1
VISUALIZATION WITH GGPLOT2

  • Grammar of graphics concept
  • Structure of ggplot: data, aes(),
    geoms, themes
  • Bar charts, histograms, density plots
  • Boxplots, violin plots
  • Scatterplots and trend lines
  • Faceting and multi-panel plots
  • Customizing titles, labels, legends
  • Saving and exporting plots
  • Using visualization for exploratory
    data analysis (EDA)
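
A minimal sketch of the kind of plot this module builds toward, assuming the ggplot2 package is installed. The dataset (mtcars, which ships with R), the variables and the output file name are illustrative choices, not course material.

```r
library(ggplot2)

# mtcars ships with R; the variable choices here are illustrative only.
p <- ggplot(mtcars, aes(x = wt, y = mpg)) +
  geom_point() +                                           # scatterplot layer
  geom_smooth(method = "lm", formula = y ~ x, se = FALSE) + # linear trend line
  facet_wrap(~ cyl) +                                      # one panel per cylinder count
  labs(title = "Fuel economy vs. weight",
       x = "Weight (1000 lbs)",
       y = "Miles per gallon") +
  theme_minimal()

# Export the plot; the file name is a placeholder.
ggsave("mpg_vs_weight.png", plot = p, width = 7, height = 4)
```

Each `+` adds one layer or setting, which is the grammar-of-graphics idea in practice: data and aesthetics first, then geoms, facets, labels and theme.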

MODULE 2
MORE R CODING

  • Review of data types and objects
  • dplyr verbs: filter, select, mutate,
    arrange, summarise, group_by
  • Using pipes (%>%)
  • Handling missing values (NA)
  • Conditional statements (ifelse)
  • Creating functions
  • Loops and apply family
  • Data reshaping: pivot_longer,
    pivot_wider
  • Importing and exporting data
  • Writing reproducible code
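
A short sketch of how several of these topics fit together, assuming the dplyr and tidyr packages are installed. It uses the built-in airquality dataset, and the 80-degree cutoff and the helper name count_missing are hypothetical choices made for illustration.

```r
library(dplyr)
library(tidyr)

# airquality ships with R and contains NA values in some columns.
monthly <- airquality %>%
  filter(!is.na(Ozone)) %>%                            # drop rows with missing ozone
  mutate(hot = ifelse(Temp > 80, "hot", "mild")) %>%   # conditional label
  group_by(Month, hot) %>%
  summarise(mean_ozone = mean(Ozone), .groups = "drop") %>%
  arrange(Month)

# Reshape so each temperature class becomes its own column.
wide <- monthly %>%
  pivot_wider(names_from = hot, values_from = mean_ozone)

# A small user-defined function applied to every column with sapply().
# count_missing is a hypothetical helper name.
count_missing <- function(x) sum(is.na(x))
sapply(airquality, count_missing)
```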

MODULE 3
CORRELATION

  • Meaning of correlation
  • Positive, negative and zero correlation
  • Pearson and Spearman correlation
  • Correlation matrix
  • Visualization using scatterplots and
    heatmaps
  • Testing significance
  • Interpretation and limitations
  • Correlation vs causation
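
As a rough illustration of the topics above, the following sketch uses only base R and the built-in mtcars dataset. The choice of variables is an assumption made for the example.

```r
# Pearson and Spearman correlation between weight and fuel economy.
pearson  <- cor(mtcars$wt, mtcars$mpg, method = "pearson")
spearman <- cor(mtcars$wt, mtcars$mpg, method = "spearman")

# Significance test for the Pearson coefficient.
test <- cor.test(mtcars$wt, mtcars$mpg)
test$p.value

# Correlation matrix for a few numeric columns, drawn as a basic heatmap.
cm <- cor(mtcars[, c("mpg", "wt", "hp", "disp")])
heatmap(cm, symm = TRUE)
```

A strong coefficient here describes association only. It does not show that weight causes lower fuel economy.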

MODULE 4
LINEAR REGRESSION

  • Concept of linear regression
  • Model equation
  • Running regression using lm()
  • Interpreting coefficients
  • Residual analysis
  • R-squared and Adjusted R-squared
  • Checking model assumptions
  • Plotting regression lines
  • Making predictions
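
A minimal sketch of this workflow in base R, assuming the built-in mtcars dataset. The model mpg ~ wt and the new weights used for prediction are illustrative values only.

```r
# Fit a simple linear regression: fuel economy as a function of weight.
fit <- lm(mpg ~ wt, data = mtcars)

# Coefficients, R-squared and adjusted R-squared.
fit_summary <- summary(fit)
coef(fit)
fit_summary$r.squared
fit_summary$adj.r.squared

# Standard residual diagnostic plots (four panels).
par(mfrow = c(2, 2))
plot(fit)
par(mfrow = c(1, 1))

# Scatterplot with the fitted regression line.
plot(mpg ~ wt, data = mtcars)
abline(fit)

# Predictions for hypothetical new cars, with prediction intervals.
new_cars <- data.frame(wt = c(2.5, 3.5))
pred <- predict(fit, newdata = new_cars, interval = "prediction")
pred
```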

MODULE 5
MULTIPLE REGRESSION

  • Why multiple regression
  • Multiple predictors in a model
  • Handling categorical variables
  • Interpreting partial effects
  • Multicollinearity and VIF
  • Interaction effects
  • Model diagnostics
  • Prediction using multiple
    regression
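
The sketch below illustrates several of these ideas in base R, again assuming the mtcars dataset. VIF is usually taken from an add-on package. Here it is computed by hand as 1 / (1 - R²) for one predictor so that the example needs no extra installation. The data frame name cars2 and the new-car values are hypothetical.

```r
# Treat transmission type as a categorical predictor (0 = automatic, 1 = manual).
cars2 <- transform(mtcars,
                   am = factor(am, labels = c("automatic", "manual")))

# Multiple predictors, including a categorical one.
fit <- lm(mpg ~ wt + hp + am, data = cars2)
summary(fit)

# Add an interaction between weight and transmission, then compare the models.
fit_int <- lm(mpg ~ wt * am + hp, data = cars2)
anova(fit, fit_int)

# Variance inflation factor for wt, computed by hand:
# regress wt on the other predictors and use 1 / (1 - R^2).
r2_wt  <- summary(lm(wt ~ hp + am, data = cars2))$r.squared
vif_wt <- 1 / (1 - r2_wt)
vif_wt

# Prediction for a hypothetical car.
new_car <- data.frame(wt = 3, hp = 120,
                      am = factor("manual", levels = levels(cars2$am)))
pred <- predict(fit, newdata = new_car)
```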

MODULE 6
LOGISTIC REGRESSION

  • When to use logistic regression
  • Binary outcome variables
  • Logistic function and odds
  • Using glm()
  • Interpreting odds ratios
  • Predicted probabilities
  • Confusion matrix
  • Accuracy, sensitivity, specificity
  • ROC curve (introduction)
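
A minimal base-R sketch of this module's workflow, assuming the mtcars dataset with transmission type (am) as the binary outcome. The single predictor and the 0.5 classification cutoff are illustrative assumptions.

```r
# Logistic regression: probability of a manual transmission given weight.
fit <- glm(am ~ wt, data = mtcars, family = binomial)

# Odds ratios are the exponentiated coefficients.
odds_ratios <- exp(coef(fit))

# Predicted probabilities, then class labels at a 0.5 cutoff.
prob <- predict(fit, type = "response")
pred <- factor(ifelse(prob > 0.5, 1, 0), levels = c(0, 1))

# Confusion matrix: rows are actual classes, columns are predicted classes.
cm <- table(actual = factor(mtcars$am, levels = c(0, 1)), predicted = pred)

accuracy    <- sum(diag(cm)) / sum(cm)
sensitivity <- cm["1", "1"] / sum(cm["1", ])   # true positive rate
specificity <- cm["0", "0"] / sum(cm["0", ])   # true negative rate
```

These metrics are computed on the same data used to fit the model, so they are likely to be optimistic compared with performance on new data.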