-
Notifications
You must be signed in to change notification settings - Fork 5
/
class5.Rmd
276 lines (200 loc) · 10.7 KB
/
class5.Rmd
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
219
220
221
222
223
224
225
226
227
228
229
230
231
232
233
234
235
236
237
238
239
240
241
242
243
244
245
246
247
248
249
250
251
252
253
254
255
256
257
258
259
260
261
262
263
264
265
266
267
268
269
270
271
272
273
274
275
276
---
title: 'R Small Group: Class 5'
author: "Amy Allen & Dayne Filer"
date: "July 12, 2016"
output:
html_document: null
pdf_document:
highlight: tango
---
```{r setup, include=FALSE}
knitr::opts_chunk$set(echo = TRUE, collapse = TRUE)
set.seed(1234)
```
<!-- Here we style out button a little bit -->
<style>
.showopt {
background-color: #004c93;
color: #FFFFFF;
width: 100px;
height: 20px;
text-align: center;
vertical-align: middle !important;
border-radius: 8px;
float:right;
}
.showopt:hover {
background-color: #dfe4f2;
color: #004c93;
}
</style>
<!--Include script for hiding output chunks-->
<script src="hideOutput.js"></script>
### Using this document
* Code blocks and R code have a grey background (note, code nested in the text is not highlighted in the pdf version of this document but is a different font).
* \# indicates a comment, and anything after a comment will not be evaluated in R
* The comments beginning with \#\# under the code in the grey code boxes are the output from the code directly above; any comments added by us will start with a single \#
* While you can copy and paste code into R, you will learn faster if you type out the commands yourself.
* Read through the document after class. This is meant to be a reference, and ideally, you should be able to understand every line of code. If there is something you do not understand please email us with questions or ask in the following class (you're probably not the only one with the same question!).
### Class 5 expectations
1. Know how to write and run a basic function in R
2. Understand function environments and how functions find things
3. Understand the "do not repeat yourself" (DRY) principle
### Basic functions
So far we've used a lot of functions that already exist in R. Now you will learn how to write your own functions. User-written functions take the following structure:
```{r eval=FALSE}
myfunction <- function(arg1, arg2, ...) {
do some stuff
return(object)
}
```
Make a simple function that just returns the arument that it is given.
```{r}
first_function <- function (x) {
return(x)
}
```
Now if you pass a value to `first_function` it should return that value:
```{r}
first_function(9)
```
A function that only returns the given value is not useful, but it does illustrate how you can add functions to your environment. Now make a function that squares a value and then adds one to it.
```{r}
second_function <- function(x) {
ans <- x^2 + 1
return(ans)
}
second_function(9)
```
### The function environment
An environment is a place to store variables. As we discussed in class 1, when you make assignments in R, they are generally added as entries to the global environment.
Functions are evaluated in their own environments. When a function is called a new environment is created. This new environment is called the evaluation environment. Functions also have an enclosing environment, which is the environment where the function was defined. For a functions defined in the workspace the enclosing environment is the global envrionment.
When a function is evaluated, R looks through a series of environments for variables called. It first searches the evaluation environment and then the enclosing environment. This means that a global variable can be referenced inside a function. This principle is shown below by `third_function` which sums the argument `x` with the variable `a` from the global environment.
```{r}
a <- 9
third_function <- function(x) { x + a }
third_function(11)
```
The evaluation environment is populated with local variables as the evaluation of the function procedes. Once the function completes running, the evaluation environment is destroyed. Look at `second_function` for example. Within the function the variable `ans` is assigned. However, when you evaluate the function the `ans` variable is not available the global environment because it was assigned in the evaluation environment, and the evaluation environment was destroyed after the function completed.
```{r}
second_function(9)
ls()
```
As you can see, listing objects in the global environment only returns the global variable `a` assigned previously and the three functions that you have defined so far.
### DRY principle
The don't repeat yourself (DRY) principle states that
> every piece of knowledge must have a single, unambigous, authorative representation within a system - Dave Thomas and Andy Hunt.
There is a lot to think about when using the DRY principle, like not repeating variable names throughout your code. The DRY principle is particularly useful when thinking about functional programming. If you find yourself writing the same code more than once, or copying and pasting, consider writing a function. Think back to the fruit machine discussed in last class, and consider the following code to generate two sets of fruit:
```{r}
farmer1 <- list(type = sample(x = c("orange", "apple"),
size = 25,
replace = TRUE),
wdth = rnorm(n = 25, mean = 6, sd = 2.5))
farmer2 <- list(type = sample(x = c("orange", "apple"),
size = 25,
replace = TRUE),
wdth = rnorm(n = 25, mean = 8, sd = 1.8))
```
Now imagine you have to created data for 100 farmers. After you fininsh you realize you need to add pears to the vector of possible fruit. If you wrote code like above you would have to edit each line. However, you may see how the code above could be functionalized. Consider what may be variable in the code above, and create a function parameter for each variable part of the code. You may wish to change the type of fruit, the number of fruit, or the size of the fruit.
```{r}
## Create a funciton to generate fruit.
genFruit <- function(fnum, fmean, fsd, ftype = c("apple", "orange")) {
res <- list(type = sample(x = ftype, size = fnum, replace = TRUE),
wdth = rnorm(n = fnum, mean = fmean, sd = fsd))
return(res)
}
farmer1 <- genFruit(fnum = 25, fmean = 6, fsd = 2.5)
farmer2 <- genFruit(fnum = 25, fmean = 8, fsd = 1.8)
```
Notice two things: (1) now any change needed in the fruit generation only requires the code to change in one place -- greatly reducing the amount of work, and more importantly, the change for error; (2) notice how 'ftype' is defined in the function with a default value. You can probably imagine how many functions benefit from default parameter values. For the `genFruit` function above you will most often only generate apples and oranges, so it makes sense to provide a default value for the fruit type. Almost all of the functions you will use in R run with default values that you may or may not be aware of.
Similarly, you can functionalize the fruit machine:
```{r}
fruit_machine <- function(fruit, slice_size = 1) {
# create results list
res <- list (type = character(), pieces = numeric(),
slice_width = numeric())
## For-loop iterating through the 1:(number of fruit)
for (f in 1:length(fruit$type)) {
f_type <- fruit$type[f] ## Get the fruit type from the list
f_wdth <- fruit$wdth[f] ## Get the fruit width from the list
n_pieces <- 1 ## Each fruit is initially 1 piece
## Peel the oranges
if (f_type == "orange") {
f_wdth <- f_wdth - 1/8
}
## Divide the fruit
while (f_wdth >= slice_size) {
f_wdth <- f_wdth/2
n_pieces <- n_pieces*2
}
res$type[f] <- f_type
res$pieces[f] <- n_pieces
res$slice_width[f] <- f_wdth
}
return(res)
}
```
Now you can use the `fruit_machine` function to the `farmer1` and `farmer2` datasets you generated, changing the slice size based on the farmers requests.
```{r}
results1 <- fruit_machine(farmer1, slice_size = 1)
results2 <- fruit_machine(farmer2, slice_size = 0.5)
```
### Function control statements
So far we have only discussed one aspect of function control statments: the `return` function. The `return` function can actually go anywhere in the function, and a function can have multiple `return` functions. For example consider a function that returns "big number" for numbers greater than or equal to 100 and "small number" for numbers under less than 100:
```{r}
xmpl_func <- function(x) {
if (x >= 100) {
return("big number")
}
return("small number")
}
xmpl_func(1)
xmpl_func(1e10)
xmpl_func("see what happens")
```
When a function encounters the `return` function the returns the value and halts any execution. The other two functions worth knowing are: `stop` and `warning`. The `stop` and `warning` functions allow you to do tests within the function and exit the function (`stop`) if necessary, or issue a warning (`warning`).
In the `xmpl_func("see what happens")` example, the function returned `TRUE`. (Recall from class 1 that both would be converted to character in the if statement, which is why the code did not error.) However, you may wish to make sure the input to `xmpl_func` is numeric and has a length of 1. Modify `xmpl_func` to test the input.
```{r,error=TRUE}
xmpl_func <- function(x) {
## Test the length of x
if (length(x) > 1) stop("'x' must be of length 1.")
## Test the class of x
if (!is.numeric(x)) stop("'x' must be numeric.")
if (x >= 100) {
return("big number")
}
return("small number")
}
xmpl_func("see what happens")
xmpl_func(1:10)
```
Finally, you may wish to issue a warning if the input can be converted to numeric instead of an error:
```{r,error=TRUE}
xmpl_func <- function(x) {
## Test the length of x
if (length(x) > 1) stop("'x' must be of length 1.")
if (is.logical(x)) {
x <- as.numeric(x)
warning("'x' was converted from logical to numeric.")
}
if (!is.numeric(x)) stop("'x' could not be converted to numeric.")
if (x >= 100) {
return("big number")
}
return("small number")
}
xmpl_func("see what happens")
xmpl_func(TRUE)
```
Notice the function completed when the input was logical. The `warning` function does not halt execution, but does print a warning message for the user.
### Small Group Exercises
These exercises are to help you solidify and expand on the information given above. We intentionally added some concepts that were not covered above, and hope that you will take a few minutes to think through what is happening and how R is interpreting the code.
1. Write a function that solves the quadratic formula. Recall, given:
$$ax^2 + bx + c = 0$$
The quadratic equation is:
$$x = \frac{-b\pm\sqrt{b^2-4ac}}{2a}$$
Use it to solve the following equations.
$$x^2+x-4=0$$
$$x^2-3x-4=0$$
$$6x^2+11x-35=0$$
2. Add the necessary checks for input length and class to the quadratic formula function from question 1.