Adv Stats Blog

Posts

Showing posts from October, 2023

Module 10 Assignment

October 29, 2023

Here is the code used to conduct ANOVA and regression analysis on the "cystfibr" and "secher" datasets in RStudio:

    # Load the ISwR package, which provides the cystfibr and secher datasets
    library(ISwR)

    # Load the cystfibr dataset
    data("cystfibr")

    # Fit a linear model to the data
    model_cystfibr <- lm(pemax ~ age + weight + bmp + fev1, data = cystfibr)

    # Display the summary of the model
    summary(model_cystfibr)

    # Conduct ANOVA on the model
    anova(model_cystfibr)

    # Load the secher dataset
    data("secher")

    # Log-transform birth weight and abdominal diameter
    secher$log_bwt <- log(secher$bwt)
    secher$log_ad <- log(secher$ad)

    # Fit a linear model to the data
    model_secher <- lm(log_bwt ~ log_ad, data = secher)

    # Display the summary of the model
    summary(model_secher)

In the cystfibr dataset, we’re fitting a linear model where pemax is predicted by age, weight, bmp, and fev1. Each coefficient estimates the change in pemax for a one-unit increase in that variable, holding the other predictors constant. The intercept is the e...
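To make the log-log model form concrete, here is a minimal sketch on simulated data. It is not the secher data: the sample size, value range, noise level, and the exponent of 2 are all assumptions chosen for illustration. It shows that the slope of a regression of log(bwt) on log(ad) recovers the exponent b in a relation of the form bwt = a * ad^b.

```r
# Simulated data only: these are NOT the secher measurements.
set.seed(1)
n <- 100
ad <- runif(n, min = 80, max = 130)   # hypothetical abdominal diameters
true_slope <- 2                        # assumed exponent for the simulation
log_bwt <- -3 + true_slope * log(ad) + rnorm(n, sd = 0.05)
sim <- data.frame(ad = ad, bwt = exp(log_bwt))

# Same model form as in the post: log(bwt) regressed on log(ad)
model_sim <- lm(log(bwt) ~ log(ad), data = sim)

# The slope estimates the exponent b in bwt = a * ad^b
slope <- unname(coef(model_sim)["log(ad)"])
print(slope)

# Confidence intervals for the intercept and slope
print(confint(model_sim))
```

On the log-log scale, the slope reads as an approximate percentage effect: a 1% increase in ad goes with roughly a slope-percent increase in bwt.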

Module 9 Assignment

October 22, 2023

This week's assignment is based around two questions. The first is to generate a simple table in R that consists of four columns: Country, age, salary, and Purchased. The following code generates it.

    # 1. Simple table
    assignment_data <- data.frame(
      Country = c("France", "Spain", "Germany", "Spain", "Germany",
                  "France", "Spain", "France", "Germany", "France"),
      age = c(44, 27, 30, 38, 40, 35, 52, 48, 45, 37),
      # salary has only five values; data.frame() recycles them to fill ten rows
      salary = c(6000, 5000, 7000, 4000, 8000),
      Purchased = c("No", "Yes", "No", "No", "Yes",
                    "Yes", "No", "Yes", "No", "Yes")
    )
    print(assignment_data)

The second is to generate a contingency table known as an r x c table u...
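As a sketch of what an r x c table looks like, the Country and Purchased columns from the first question can be cross-tabulated with table(). Using these two columns is an assumption made for illustration, not necessarily the table the assignment asks for.

```r
# Rebuild the two categorical columns from the first question
assignment_data <- data.frame(
  Country = c("France", "Spain", "Germany", "Spain", "Germany",
              "France", "Spain", "France", "Germany", "France"),
  Purchased = c("No", "Yes", "No", "No", "Yes",
                "Yes", "No", "Yes", "No", "Yes")
)

# Cross-tabulate: rows are countries (r = 3), columns are outcomes (c = 2)
purchase_table <- table(assignment_data$Country, assignment_data$Purchased)
print(purchase_table)

# Append row and column totals
print(addmargins(purchase_table))
```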

Module 8 Assignment

October 15, 2023

Part 1

Using R, we have been asked to report on the drug and stress levels in the provided data set. To begin, we must create the vectors for each group.

    # Create vectors for each group
    high_stress <- c(10, 9, 8, 9, 10, 8)
    moderate_stress <- c(8, 10, 6, 7, 8, 8)
    low_stress <- c(4, 6, 6, 4, 2, 2)

Following that, the vectors must be combined into one data frame, stress_data.

    # Combine the vectors into a dataframe
    stress_data <- data.frame(
      stress_level = factor(rep(c("High", "Moderate", "Low"), each = 6)),
      score = c(high_stress, moderate_stress, low_stress)
    )

An ANOVA test is performed to determine whether there is a significant difference between the groups.

    # Perform the ANOVA test
    anova_result <- aov(score ~ stress_level, data = stress_data)

    # Print the summary of the ANOVA test
    summary(anova_result)

This creates the following output:

Part 2

To perform an ANOVA test on the zelazo dataset using R, we must first load the ISwR package and...
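To show what aov() computes in Part 1, the F statistic can also be built by hand from the between-group and within-group sums of squares. This is a minimal sketch using the same three vectors; the variable names below are illustrative and not from the original post.

```r
# Same scores as in Part 1
high_stress <- c(10, 9, 8, 9, 10, 8)
moderate_stress <- c(8, 10, 6, 7, 8, 8)
low_stress <- c(4, 6, 6, 4, 2, 2)

scores <- c(high_stress, moderate_stress, low_stress)
groups <- factor(rep(c("High", "Moderate", "Low"), each = 6))

grand_mean <- mean(scores)
group_means <- tapply(scores, groups, mean)
group_sizes <- tapply(scores, groups, length)

# Between-group sum of squares: group means around the grand mean
ss_between <- sum(group_sizes * (group_means - grand_mean)^2)

# Within-group sum of squares: scores around their own group mean
ss_within <- sum((scores - ave(scores, groups))^2)

df_between <- nlevels(groups) - 1
df_within <- length(scores) - nlevels(groups)

# F is the ratio of the two mean squares
f_manual <- (ss_between / df_between) / (ss_within / df_within)

# Compare with the value reported by aov()
f_aov <- summary(aov(scores ~ groups))[[1]][["F value"]][1]
print(c(manual = f_manual, aov = f_aov))
```

The two values should agree, which confirms that the ANOVA table is a ratio of mean squares and nothing more exotic.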

Module 7 Assignment

October 08, 2023

1. The assignment begins with the following dataset:

    x <- c(16, 17, 13, 18, 12, 14, 19, 11, 11, 10)
    y <- c(63, 81, 56, 91, 47, 57, 76, 72, 62, 48)

Following that is defining the relationship model and calculating the coefficients, which produces the following output.

2. This goes into Part 2, following the question from Chi Yau. This includes parts 2.1-2.3, along with the output.

3. Part 3 is based around using the multiple regression model and is displayed in the next image.

3.1: The coefficients tell us about the relationship between each predictor variable and the response variable, holding all other predictors constant.

4. Part 4 follows the question from our textbook, p. 110, Exercise 5.1: With the rmr data set, plot metabolic rate versus body weight. Fit a linear regression to the relation. According to t...
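For the x and y vectors in step 1, the coefficients of the simple regression can be sketched as follows. The code computes the slope and intercept from the least-squares formulas and compares them with lm(); the variable names slope, intercept, and fit are illustrative and not from the original post.

```r
# Data from step 1
x <- c(16, 17, 13, 18, 12, 14, 19, 11, 11, 10)
y <- c(63, 81, 56, 91, 47, 57, 76, 72, 62, 48)

# Least-squares slope: covariance of x and y over the variance of x
slope <- cov(x, y) / var(x)

# The fitted line passes through the point of means
intercept <- mean(y) - slope * mean(x)

# Same model fitted with lm()
fit <- lm(y ~ x)

print(c(intercept = intercept, slope = slope))
print(coef(fit))
```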

Module 6 Assignment

October 01, 2023

A.

a. The mean of the population is calculated as the sum of all values divided by the number of values. In this case, it would be (8+14+16+10+11)/5 = 11.8.

b/c.

Sample: (3, 5, 2)
Mean: (3+5+2)/3 = 3.33
Variance: ((3-3.33)^2 + (5-3.33)^2 + (2-3.33)^2) / 3 = 1.56
Standard Deviation: sqrt(1.56) = 1.25

Sample: (3, 5, 1)
Mean: (3+5+1)/3 = 3
Variance: ((3-3)^2 + (5-3)^2 + (1-3)^2) / 3 = 2.67
Standard Deviation: sqrt(2.67) = 1.63

d. Comparing these to the population:
The population mean is (8+14+16+10+11)/5 = 11.8.
The population variance is ((8-11.8)^2 + (14-11.8)^2 + (16-11.8)^2 + (10-11.8)^2 + (11-11.8)^2) / 5 = 40.8/5 = 8.16.
The population standard deviation is sqrt(8.16) = ~2.86.

B.

1. The sample proportion p will have approximately a normal distribution if both np and nq are at least 5. Since p = .95 and q = .05 (since q = 1 - p), with n = 100 we can calculate:
np = .95 * 100 = 95
nq = .05 * 100 = 5
Both np and nq are at least 5, so yes, the sample proportion p does have approximat...
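The population figures in part A can be checked in R with a short sketch. One point to watch: the built-in var() and sd() divide by n - 1, so the population variance (divide by n) has to be computed explicitly. The variable names are illustrative and not from the original post.

```r
# Population values from part A
population <- c(8, 14, 16, 10, 11)

pop_mean <- mean(population)

# Population variance: mean squared deviation, dividing by n
pop_var <- mean((population - pop_mean)^2)
pop_sd <- sqrt(pop_var)

# For comparison, var() divides by n - 1 (the sample variance)
sample_var <- var(population)

print(c(mean = pop_mean, pop_var = pop_var, pop_sd = pop_sd,
        sample_var = sample_var))
```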