Final Project - Matthew Luu
Average Real Estate Sale Amount Analysis Between 2001 and 2020 in the State of Connecticut
Over the years, the real estate market has had many ups and downs, including several economic downturns between 2001 and 2020. My goal with this project is to examine the average sale price of real estate in the State of Connecticut and determine whether it rose significantly over those years.
I obtained the data for this project from Dataset - Real Estate Sales 2001-2020 GL, which was provided by the State of Connecticut and its Office of Policy and Management. The dataset itself encompasses all real estate sales with a sales price of $2,000 or greater that occur between October 1 and September 30 of each year. For each sale record, the file includes the town, property address, date of sale, property type (residential, apartment, commercial, industrial or vacant land), sales price, and property assessment.
My hypothesis is that the average sale price of real estate in the State of Connecticut was significantly higher in the six years from 2015 to 2020 than in the six years from 2001 to 2006.
1. Dataset
● This table shows the average selling price of real estate in Connecticut for each year from 2001 to 2020.
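The yearly averages in the table above come from grouping sales by year and taking the mean sale amount. The analysis in this post was done in R; the sketch below uses Python purely as an illustration. The column names `List Year` and `Sale Amount`, the helper `average_sale_by_year`, and the sample rows are all assumptions made for the example, not values taken from the dataset.

```python
import csv
import io
from collections import defaultdict

# Hypothetical sample rows standing in for the real file. The column names
# "List Year" and "Sale Amount" are assumptions about the CSV header.
SAMPLE_CSV = """List Year,Sale Amount
2001,150000
2001,250000
2002,300000
2002,100000
2002,200000
"""


def average_sale_by_year(csv_file, year_col="List Year", amount_col="Sale Amount"):
    """Return {year: mean sale amount} for the rows read from csv_file."""
    totals = defaultdict(float)
    counts = defaultdict(int)
    for row in csv.DictReader(csv_file):
        year = int(row[year_col])
        totals[year] += float(row[amount_col])
        counts[year] += 1
    return {year: totals[year] / counts[year] for year in sorted(totals)}


yearly_averages = average_sale_by_year(io.StringIO(SAMPLE_CSV))
for year, mean_amount in yearly_averages.items():
    print(f"{year}: {mean_amount:,.2f}")
```

To run this against the real file, the `io.StringIO(SAMPLE_CSV)` argument would be replaced with an opened file handle, after checking the actual header names.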
The Welch Two Sample t-test is a statistical test used to determine whether the means of two groups are significantly different, without assuming that the two groups have equal variances. Applying it to the sale amounts from 2001-2006 and 2015-2020 gave the following results:
● t = -6.3662: This is the t-statistic, which measures the size of the difference relative to the variation in the data. The negative sign indicates that the mean of the first group (2001-2006) is less than the mean of the second group (2015-2020).
● df = 325056: This is the degrees of freedom, which is a measure of the amount of information available in the data for estimating parameters.
● p-value = 1.941e-10: The p-value is very small, much less than 0.05, which suggests that the difference in means is statistically significant.
● 95 percent confidence interval: -137954.29 to -73005.52: This is the range in which we are 95% confident that the true difference between the population means lies.
● sample estimates: mean of x = 346507.0, mean of y = 451986.9: These are the sample means for the two groups. The mean sale amount for 2001-2006 is lower than the mean sale amount for 2015-2020.
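To make the t-statistic and degrees of freedom above more concrete, here is a minimal sketch of how Welch's test computes them from two samples. It is written in Python for illustration and is not the R code used for the results above. The function name `welch_t` and the two short lists of sale amounts are made up for the example.

```python
import math
from statistics import mean, variance


def welch_t(x, y):
    """Return (t, df) for Welch's unequal-variance two-sample t-test."""
    nx, ny = len(x), len(y)
    # Squared standard error of each group mean: sample variance / n.
    vx, vy = variance(x) / nx, variance(y) / ny
    t = (mean(x) - mean(y)) / math.sqrt(vx + vy)
    # Welch-Satterthwaite approximation for the degrees of freedom.
    df = (vx + vy) ** 2 / (vx ** 2 / (nx - 1) + vy ** 2 / (ny - 1))
    return t, df


# Made-up toy sale amounts, NOT values from the dataset.
early = [200_000, 250_000, 300_000, 220_000]
late = [320_000, 400_000, 380_000, 450_000, 500_000]

t_stat, dof = welch_t(early, late)
print(f"t = {t_stat:.4f}, df = {dof:.2f}")
```

As in the results above, t is negative whenever the first group's mean is lower than the second's. The degrees of freedom are generally not a whole number, because they are an approximation rather than a simple count of observations.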
[Image: R code for the t-test]

[Image: Bar graph comparing the average sale amounts of the two time periods. It shows the large gap in average selling price between the two time periods.]
In conclusion, there is a statistically significant difference in the average sale amounts between the two time periods (2001-2006 and 2015-2020), with the latter period having a higher average sale amount. This supports my initial hypothesis that the 2015-2020 period sale amounts would be significantly higher than the 2001-2006 period sale amounts.
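As a rough check on the size of the gap, the reported figures can be combined directly. The short Python sketch below uses only the numbers from the t-test output. The one added assumption is the critical value 1.96, a normal approximation that should be very close to the t critical value at this many degrees of freedom.

```python
# Figures copied from the reported t-test output.
mean_early = 346507.0   # mean sale amount, 2001-2006
mean_late = 451986.9    # mean sale amount, 2015-2020
t_stat = -6.3662
ci_low, ci_high = -137954.29, -73005.52

# Difference in means and the relative increase over the earlier period.
diff = mean_early - mean_late
pct_increase = -diff / mean_early * 100

# Standard error implied by the t-statistic, and by the CI half-width.
# 1.96 is an assumed normal-approximation critical value.
se_from_t = diff / t_stat
se_from_ci = (ci_high - ci_low) / 2 / 1.96

print(f"difference in means: {diff:,.1f}")
print(f"increase over the 2001-2006 mean: {pct_increase:.1f}%")
print(f"standard error implied by t: {se_from_t:,.0f}")
print(f"standard error implied by the CI: {se_from_ci:,.0f}")
```

The difference in means sits at the midpoint of the confidence interval, and the two standard error estimates agree closely, so the reported numbers are consistent with one another.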





