Posts

Assignment #10: Building Your Own R Package

Purpose

The Friedman package is designed to help students like me, as well as researchers, perform efficient exploratory data analysis in R. It provides tools for cleaning, summarizing, and visualizing datasets, making common workflows faster and easier. The package is aimed at beginner and intermediate R users who want a consistent set of functions for analyzing data without having to write repetitive code.

Key Functions

Here are some planned functions:

clean_data(): Standardizes column names, handles missing values, and formats data frames for analysis.
summarize_data(): Provides summary statistics for numeric and categorical columns, including counts, means, medians, and standard deviations.
plot_distribution(): Generates histograms, density plots, and boxplots for selected variables.
compare_groups(): Performs group-wise comparisons and outputs summary tables and plots.

DESCRIPTION Choices

Dependencies (Imports): I included ggplot2 and ...
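To make the planned clean_data() more concrete, here is a minimal base R sketch of what such a function might look like. Everything in it is an assumption rather than the package's actual code: the drop_na argument is hypothetical, dropping incomplete rows is only one possible way to handle missing values, and the small data frame is made-up toy data.

```r
# Hypothetical sketch of clean_data(); not the Friedman package's actual code.
clean_data <- function(df, drop_na = TRUE) {
  stopifnot(is.data.frame(df))

  # Standardize column names: lowercase, runs of other characters become "_"
  nms <- tolower(names(df))
  nms <- gsub("[^a-z0-9]+", "_", nms)
  nms <- gsub("^_+|_+$", "", nms)          # trim leading/trailing underscores
  names(df) <- make.unique(nms, sep = "_") # avoid duplicate names

  # Handle missing values by dropping incomplete rows (one possible choice)
  if (drop_na) {
    df <- df[complete.cases(df), , drop = FALSE]
  }
  df
}

# Toy data, invented purely for illustration
raw <- data.frame("Patient ID"     = c(1, 2, 3),
                  "Blood.Pressure" = c(120, NA, 135),
                  check.names = FALSE)

cleaned <- clean_data(raw)
names(cleaned)  # "patient_id" "blood_pressure"
nrow(cleaned)   # 2
```

Keeping the sketch in base R avoids adding dependencies just for name cleaning; a real implementation might instead fill in missing values rather than drop rows.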

Assignment #9: Visualization in R – Base Graphics, Lattice, and ggplot2

Base Graphics, Lattice, and ggplot2

This scatter plot shows the relationship between vehicle weight (wt) and fuel efficiency (mpg) using base R graphics. Each point represents a car in the dataset. The plot reveals a clear negative relationship: MPG decreases as weight increases. Lighter cars tend to have much higher fuel efficiency, while heavier cars fall into lower MPG ranges. This suggests that vehicle weight is a strong predictor of fuel economy.

This histogram displays the distribution of fuel efficiency (mpg) across all cars in the dataset. The x-axis represents MPG ranges, while the y-axis shows the number of cars in each range. Most vehicles fall between 15 and 25 MPG, indicating that moderate fuel efficiency is the most common. The distribution is slightly right-skewed, with a tail toward higher MPG values and fewer cars achieving very high efficiency. Overall, the dataset is concentrated in the lower-to-mid MPG range.

This lattice scatter plot shows the relationship between weigh...
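The base graphics scatter plot and histogram described in this excerpt can be reproduced roughly as follows. The post's own plotting code is not shown here, so the axis labels, titles, and point style are assumptions; the correlation at the end is an added numeric check of the negative relationship, not something the post reports.

```r
# mtcars ships with R in the datasets package
data("mtcars")

# Base graphics scatter plot: weight vs. fuel efficiency
plot(mtcars$wt, mtcars$mpg,
     xlab = "Weight (1000 lbs)", ylab = "Miles per gallon",
     main = "MPG vs. Weight", pch = 19)

# Base graphics histogram of fuel efficiency; hist() also returns
# the bin breaks and counts invisibly, so they can be inspected
mpg_hist <- hist(mtcars$mpg,
                 xlab = "Miles per gallon",
                 main = "Distribution of MPG")
mpg_hist$counts

# A negative correlation matches the downward trend in the scatter plot
wt_mpg_cor <- cor(mtcars$wt, mtcars$mpg)
wt_mpg_cor
```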

Module # 8 Input/Output, string manipulation and plyr package

Input/Output, string manipulation and plyr package

Module # 7 R Object: S3 vs. S4 assignment

For this assignment, I used the built-in R dataset mtcars, which comes from the R datasets package.

    data("mtcars")
    head(mtcars, 6)

The mtcars dataset contains information about different car models, including:

mpg (miles per gallon)
cyl (number of cylinders)
hp (horsepower)
wt (weight)
am (transmission type)

Checking its structure shows:

Class: "data.frame"
Base type: "list"

A good question to ask is whether a generic function can be assigned to this dataset. A generic function in R is a function that behaves differently depending on the class of the object passed to it. For example, both print() and summary() are generic functions. To check whether a function is generic, list its methods:

    methods(summary)

This shows different methods like:

summary.data.frame
summary.lm

Since mtcars is a data.frame, R automatically dispatches:

    summary.data.frame(mtcars)

Can a Generic Function Be Assigned?

Because mtcars is an S3 object, data.frame is an S3 class, gene...
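The dispatch behavior described in this excerpt can be illustrated with a small sketch. The describe() generic and its methods are hypothetical names invented for this example; only class(), typeof(), isS4(), and UseMethod() are standard R.

```r
data("mtcars")

class(mtcars)    # "data.frame"
typeof(mtcars)   # "list"
isS4(mtcars)     # FALSE, so mtcars is not an S4 object

# Hypothetical S3 generic, defined only for illustration
describe <- function(x, ...) UseMethod("describe")

# Fallback used when no class-specific method exists
describe.default <- function(x, ...) {
  "no specific method"
}

# Method chosen automatically for objects of class "data.frame"
describe.data.frame <- function(x, ...) {
  sprintf("data frame with %d rows and %d columns", nrow(x), ncol(x))
}

describe(mtcars)  # "data frame with 32 rows and 11 columns"
describe(1:3)     # "no specific method"
```

R picks describe.data.frame() for mtcars in the same way it picks summary.data.frame() when summary(mtcars) is called.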

Module # 6 Doing math in R part 2

1. Matrix Operations

Given two matrices:

    A <- matrix(c(2, 0, 1, 3), ncol = 2)
    B <- matrix(c(5, 2, 4, -1), ncol = 2)

a) Find A + B

To find the sum of matrices A and B, simply add corresponding elements:

    A + B

Output:

         [,1] [,2]
    [1,]    7    5
    [2,]    2    2

Explanation: matrix() fills each matrix column by column, so the sum is calculated element by element in that order:

(2 + 5) = 7
(0 + 2) = 2
(1 + 4) = 5
(3 + -1) = 2

b) Find A - B

To find the difference between matrices A and B, subtract corresponding elements:

    A - B

Output:

         [,1] [,2]
    [1,]   -3   -3
    [2,]   -2    4

Explanation: The difference of the matrices is calculated element by element, again in column order:

(2 - 5) = -3
(0 - 2) = -2
(1 - 4) = -3
(3 - -1) = 4

2. Creating a Diagonal Matrix

To build a 4x4 matrix with the diagonal values 4, 1, 2, and 3, use the diag() function:

    diag(c(4, 1, 2, 3), nrow = 4)

Output:

         [,1] [,2] [,3] [,4]
    [1,]    4    0    0    0
    [2,]    0    1    0    0
    [3,]    0    0    2    0
    [4,]    0    0    0    3

Explana...
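The matrix steps in this excerpt can be run together as a short script. The variable names sum_AB, diff_AB, and D are introduced here only for illustration. Note that matrix() fills column by column by default, which determines where each element lands in the printed result.

```r
# matrix() fills column-wise, so:
#   A = | 2 1 |    B = | 5  4 |
#       | 0 3 |        | 2 -1 |
A <- matrix(c(2, 0, 1, 3), ncol = 2)
B <- matrix(c(5, 2, 4, -1), ncol = 2)

# Element-wise sum and difference
sum_AB  <- A + B
diff_AB <- A - B

sum_AB
#      [,1] [,2]
# [1,]    7    5
# [2,]    2    2

diff_AB
#      [,1] [,2]
# [1,]   -3   -3
# [2,]   -2    4

# 4x4 matrix with 4, 1, 2, 3 on the diagonal and zeros elsewhere
D <- diag(c(4, 1, 2, 3), nrow = 4)
D
```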

Module # 5 Doing Math

  Matrix Inversion and Determinants in R

Module # 4 Programming structure assignment

Hospital Data Analysis

In this assignment, I analyzed data collected by a local hospital. The dataset includes the following variables: The analysis involves visualizing the data through boxplots and histograms to understand patterns in blood pressure and visit frequency.

Boxplot of Blood Pressure by Doctor's First Assessment

Patients with a "good" assessment (green box) have a more consistent and moderate blood pressure, generally ranging between 50 and 150, with most values clustered near the middle of that range. This suggests that patients rated as "good" tend to have stable blood pressure. Patients with a "bad" assessment (red box), on the other hand, have more variability in their blood pressure, ranging from as low as 50 to as high as 100. This wider spread could indicate that patients with more extreme or unpredictable blood pressure tend to be rated poorly by the doctor.

Histogram of Frequency of Visits

The majority of patients have a low ...