Pages: 10 pages/≈2750 words
Sources: Check Instructions
Style: APA
Subject: Mathematics & Economics
Type: Statistics Project
Language: English (U.S.)
Document: MS Word
Topic: GitHub and R Markdown, and mastery in the practice of regression analysis.

Statistics Project Instructions:

Weight (% of final grade): 25%
Due Date: 11:59 ADT December 10th, 2020
Upon successful completion of this project, students will possess a working knowledge of GitHub and R Markdown, and mastery in the practice of regression analysis. These skills are highly valued globally by employers seeking data scientists.
Project Description:
Each group will be assigned a dataset. Collectively, group members are to perform a complete regression analysis of their data, details of which must be presented on GitHub (https://github.com) using R Markdown (https://rmarkdown.rstudio.com/articles_intro.html). The following sections must be included:
* Abstract (150 words or less)
* Introduction (must contain a thorough description of the questions of interest)
* Data Description (must contain data visualizations that are properly labelled and explained)
* Methods (must contain a complete description of all analysis tools used)
* Results (all figures should be properly labelled and discussed)
* Conclusion (must contain a concise discussion of what has been learned from the analysis)
* Appendix (must include all data and R Markdown files for reproducibility)
Data:
Datasets are found at https://lionbridge.ai/datasets/10-open-datasets-for-linear-regression/. Groups 1-5 are to analyse Dataset 1 (Cancer), Groups 6-10 are to analyse Dataset 2 (CDC), Groups 11-15 are to analyse Dataset 3 (Fish Market), Groups 16-20 are to analyse Dataset 4 (Medical Insurance), Groups 21-25 are to analyse Dataset 5 (New York Stock Exchange), Groups 26-30 are to analyse Dataset 7 (Real Estate), Groups 31-35 are to analyse Dataset 8 (Red Wine), Groups 36-40 are to analyse Dataset 9 (Vehicle), Groups 41-45 are to analyse Dataset 10 (WHO).
Note: Before commencing your analysis, you must introduce one new additional data point into your assigned dataset. A description of this unique data point must be included in your Data Description section along with some rationale for the values chosen.
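The note about introducing one new data point can be illustrated with a minimal Python sketch. The sketch appends a single row to a CSV dataset and first checks that the row supplies exactly the existing columns. It is not part of the assignment, which uses R. The column names follow the car dataset described in the sample paper, but the exact spellings are assumptions. The original rows, the new point, and the helper name `add_data_point` are hypothetical placeholders.

```python
import csv
import io

# Hypothetical stand-in for an assigned dataset. The header mirrors the car dataset's
# columns as described in the sample paper (exact names are assumptions); the two
# rows are placeholder values, not real observations.
ORIGINAL_CSV = (
    "Car_Name,Year,Selling_Price,Present_Price,Kms_Driven,"
    "Fuel_Type,Seller_Type,Transmission,Owner\n"
    "car_a,2015,3.0,5.0,30000,Petrol,Dealer,Manual,0\n"
    "car_b,2016,4.0,6.5,25000,Diesel,Individual,Automatic,1\n"
)


def add_data_point(csv_text, new_row):
    """Return csv_text with new_row appended (hypothetical helper)."""
    reader = csv.DictReader(io.StringIO(csv_text))
    rows = list(reader)
    fieldnames = reader.fieldnames
    # Reject a point that is missing a column or introduces an unknown one.
    if set(new_row) != set(fieldnames):
        raise ValueError("new row must use exactly the dataset's columns")
    rows.append(new_row)
    out = io.StringIO()
    writer = csv.DictWriter(out, fieldnames=fieldnames, lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)
    return out.getvalue()


# Placeholder values; a real submission would justify each one in Data Description.
NEW_POINT = {
    "Car_Name": "new_car", "Year": "2019", "Selling_Price": "5.5",
    "Present_Price": "7.0", "Kms_Driven": "12000", "Fuel_Type": "Petrol",
    "Seller_Type": "Dealer", "Transmission": "Manual", "Owner": "0",
}

updated = add_data_point(ORIGINAL_CSV, NEW_POINT)
updated_rows = list(csv.DictReader(io.StringIO(updated)))
print(len(updated_rows))  # 3: two original rows plus the new point
```

The explicit column check matters. Without it, `csv.DictWriter` would quietly fill a missing column with an empty string, and a mistyped field would only surface later in the analysis.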
Grading Scheme:
* 6: Overall presentation and organization of materials
* 3: Quality of data visualizations
* 6: Correctness of analysis
* 4: Quality and selection of relevant figures
* 6: Interpretation of results
* Total: 25
Regression and Analysis of Variance, STAT 3340 / MATH 3340, Fall 2020, Final Project

Statistics Project Sample Content Preview:

Regression and Prediction Using R
Student's Name
Institutional Affiliation
Abstract
Machine learning is part of daily life and is widely used to support decisions, especially by data scientists. This paper applied machine learning algorithms to the prediction of vehicle prices. First, the car.csv dataset was inspected, cleaned, and organized, and the resulting final dataset was used for further analysis. A basic linear model was fitted with Present_Price as the response variable and all remaining variables as explanatory variables. Three algorithms were selected for modeling: linear regression, random forest, and support vector machines. The data were partitioned into two sets, a training set and a testing set, and models fitted on the training set were used to predict prices in the testing set. Model performance was evaluated and improved through tuning and cross-validation, and each model was checked for overfitting. Finally, the models were compared using the root mean squared error (RMSE), and the best-performing model was selected as the basis for the conclusions.
Keywords: algorithms, linear regression, random forest, basic linear model, dataset, regression, output
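The evaluation loop summarized in the abstract can be sketched in plain Python. Its steps are to partition the data, fit on the training set, predict the testing set, and compare models by RMSE. This is not the authors' R code. The data are synthetic, and the 80/20 split and random seed are assumptions. A training-mean predictor stands in for the random forest and support vector machine, since those would require a machine learning library. RMSE is the square root of the mean squared difference between actual and predicted values.

```python
import math
import random


def train_test_split(rows, test_fraction=0.2, seed=42):
    """Shuffle a copy of rows and split it into (train, test)."""
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)
    n_test = round(len(shuffled) * test_fraction)
    return shuffled[n_test:], shuffled[:n_test]


def fit_simple_ols(pairs):
    """Least-squares intercept and slope for y = intercept + slope * x."""
    n = len(pairs)
    mean_x = sum(x for x, _ in pairs) / n
    mean_y = sum(y for _, y in pairs) / n
    sxx = sum((x - mean_x) ** 2 for x, _ in pairs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in pairs)
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope


def rmse(actual, predicted):
    """Root mean squared error between two equal-length sequences."""
    squared = [(a - p) ** 2 for a, p in zip(actual, predicted)]
    return math.sqrt(sum(squared) / len(squared))


# Synthetic (x, y) pairs with a known linear trend plus noise; 301 rows echo the
# dataset size, but the values themselves are made up.
rng = random.Random(0)
data = []
for _ in range(301):
    x = rng.uniform(1, 20)
    data.append((x, 1.5 + 0.8 * x + rng.gauss(0, 0.5)))

train, test = train_test_split(data)
intercept, slope = fit_simple_ols(train)
actual = [y for _, y in test]

# Stand-in competitor: always predict the training mean (not a random forest or SVM).
train_mean = sum(y for _, y in train) / len(train)
predictions = {
    "linear regression": [intercept + slope * x for x, _ in test],
    "training mean": [train_mean] * len(test),
}
scores = {name: rmse(actual, pred) for name, pred in predictions.items()}
best = min(scores, key=scores.get)
print(best, round(scores[best], 3))
```

Adding another model only adds an entry to `predictions`. The split and the RMSE comparison stay fixed, which keeps the comparison between models fair.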
Introduction
Applications of machine learning are encountered every day, and the algorithms help people make critical decisions in many fields of work. For instance, media sites rely on machine learning to sift through millions of options to give you song or movie recommendations, and retailers use it to gain insight into their customers' purchasing behavior. Closer to home, data scientists use machine learning to anticipate future data patterns and to advise on behavior that could be encouraged or discouraged. Machine learning also entails building models that have predictive power and can be used to understand data not yet collected.
Earlier statistics classes introduced machine learning through simple regression models. However, machine learning is a complex topic with a wide range of possibilities and applications. This study therefore sought to present a basic understanding of regression modeling using linear regression, random forest, and support vector machine models, and to answer the following questions of interest:
* What are the relationships among the variables, especially between car price and the other variables?
* Is it possible to predict the price of a new car based on historical data?
* Which of the three models is best for prediction?
Data Description
The car dataset contained information about cars and motorcycles listed on CarDekho.com. The data were stored in a CSV file with the following columns: model (Car_Name), year, selling price, showroom price (Present_Price), kilometers driven, fuel type, seller type, transmission, and number of previous owners. When loaded in R, the dataset had nine columns and 301 rows (observations). After cleaning, it contained four categorical variables (model, fuel type, seller type, and transmission) and five numerical variables (year, selling price, showroom price, kilometers driven, and number of previous owners). As instructed, an additional data point was added for a 2018 manual city selling at 4.34M but presen...