## Tutorial on study designs and measures of effect
## Clinic on the Meaningful Modeling of Epidemiological Data
## International Clinics on Infectious Disease Dynamics and Data (ICI3D) Program
## African Institute for Mathematical Sciences, Muizenberg, RSA
## Jim Scott 2012
## Identifying study designs
## For each of the following descriptions, determine what type of
## study design was used. The answers are at the end of this tutorial.
## Study #1
## A study examined risk factors associated with falling at an
## assisted living facility. Researchers enrolled 75 patients who had
## fallen and another 211 inpatients who had not. For each
## study participant, the researchers examined adverse event reports,
## medical records and nurse staffing records. They found that patients
## with a balance deficit or lower extremity problem were at higher risk
## for a fall.
##
## What type of study design best describes this study?
##
## a) Cohort Study
## b) Case-Control Study
## c) Cross-Sectional
## d) Correlational Study
## e) Randomized Controlled Trial
## Study #2
## Researchers were interested in identifying risk factors associated
## with needlestick injuries among medical students. To do so, a survey
## was mailed to 417 medical students at a National University.
## The survey included questions about demographic factors, knowledge of
## needle handling protocols, and episodes of needlestick injury. Overall,
## 59 students (14.1%) reported experiencing one or more needlestick injuries.
## Investigators found that those who reported having attended at least one
## needle handling seminar had a lower prevalence of injury than those who
## had not.
##
##
## What type of study design best describes this study?
##
## a) Cohort Study
## b) Case-Control Study
## c) Cross-Sectional
## d) Correlational Study
## e) Randomized Controlled Trial
## Study #3
## Investigators sought to determine if water treatment via solar radiation
## is effective at reducing the overall incidence of diarrheal illness. To
## do so, researchers solicited participants from two neighboring towns.
## All participants received clear plastic water containers. However,
## participants in town A were asked to treat their drinking water
## using the solar radiation method, while those in town B were given no specific
## instructions. After 6 months of follow-up, diarrhea incidence rates were
## compared.
##
## What type of study design best describes this study?
##
## a) Cohort Study
## b) Case-Control Study
## c) Cross-Sectional
## d) Correlational Study
## e) Randomized Controlled Trial
## Study #4
##
## Is alcohol consumption associated with HIV transmission?
## To answer this question, researchers collected data on alcohol sales
## and HIV prevalence in 48 different countries. To control for possible
## confounding, additional data such as GDP (Gross Domestic Product),
## unemployment, and education were also included in the analysis.
##
## What type of study design best describes this study?
##
## a) Cohort Study
## b) Case-Control Study
## c) Cross-Sectional
## d) Correlational Study
## e) Randomized Controlled Trial
## Analyzing a 2 x 2 table
##
## In order to demonstrate 2 x 2 table analysis and to calculate measures of effect,
## we'll look at some data collected by Lefevre et al. (2010). In that study,
## researchers conducted an experiment to determine whether beer consumption makes
## humans more attractive to mosquitoes. In short, a number of volunteers were
## randomized to drink beer. Mosquitoes were released into a
## controlled apparatus that led them to tents filled with study participants or an
## empty tent of outdoor air (uncontaminated by participants). Two different mosquito
## releases were performed, once before beer was consumed and once after.
## You can read more details about the complete experiment online:
## http://www.plosone.org/article/info:doi%2F10.1371%2Fjournal.pone.0009546
## Prior to beer consumption, 215 mosquitoes flew towards the tent containing study
## participants while 219 flew toward the outdoor air. After beer consumption 369
## mosquitoes flew toward the participants while only 221 flew towards the outdoor air.
## We can replicate these data in R by entering the following commands:
Timing <- c(rep(1,590),rep(0,434))   # 1 = released after beer, 0 = before
Choice <- c(rep(1,369),rep(0,221),rep(1,215),rep(0,219))   # 1 = participants, 0 = outdoor air
Timing <- factor(Timing, levels = c(1,0), labels=c("After","Before"))
Choice <- factor(Choice, levels = c(1,0), labels=c("Human","Outdoors"))
MosquitoData <- data.frame(Timing,Choice)   # one row per mosquito
## To see the data in table format you can use the table() command:
table(MosquitoData)
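## As a cross-check, the same 2 x 2 table can be entered directly from the
## counts quoted above. This is a minimal sketch using base R's matrix() and
## as.table(); the name MosquitoCounts is our own illustrative choice, not part
## of the original tutorial. matrix() fills column by column, so the 'After'
## counts (369 human, 221 outdoors) come first. The result should match
## table(Choice, Timing) cell for cell.
MosquitoCounts <- as.table(matrix(c(369, 221, 215, 219), nrow = 2,
                                  dimnames = list(Choice = c("Human", "Outdoors"),
                                                  Timing = c("After", "Before"))))
addmargins(MosquitoCounts)   # adds row and column totals (590 after, 434 before)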
## Examining the data:
## It's usually a good idea to visually inspect the data with an appropriate graph.
## Immediate results can be attained by entering the following command:
barplot(table(MosquitoData))
## There are a number of reasons why this isn't a very satisfying plot. Take a
## minute to consider what it's lacking. How can it be improved?
## For one, it would probably make more sense to have the data stratified by
## exposure - in this case, Timing of mosquito release - before or after beer
## consumption. Also, the plot is lacking appropriate labels and a title.
table(Choice, Timing) # Timing variable now in columns
## This is an immediate improvement, but still somewhat misleading, because the two
## bars are based on different numbers of mosquitoes (590 after vs 434 before):
barplot(table(Choice,Timing),main="Mosquito choice by Timing of release",
        xlab="Timing of release relative to beer consumption",
        ylab="No. Choosing Participants (dark)",
        col = c("darkblue", "lightblue"))
## It's better to show the distribution of mosquito choice by beer status using
## proportions:
prop.table(table(Choice,Timing),margin=2)
## This can be directly inserted into the barplot command:
barplot(prop.table(table(Choice, Timing),margin=2),
        main="Distribution of Mosquito Choice by Timing of release",
        xlab="Timing of release relative to beer consumption",
        ylab="Proportion Choosing Participants (dark)",
        col = c("darkblue", "lightblue"))
## Now the scales are comparable and it's clear that a greater proportion of
## mosquitoes were attracted to the participants after they consumed the beer.
## The above code demonstrates the flexibility of R in creating plots. Try
## experimenting with different colors. Also, it's probably more appropriate
## to change the column ordering so that the 'Before' column is first. Try using
## your knowledge of R's indexing system to do this on your own. One possible
## answer appears at the end of this tutorial.
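## prop.table(..., margin = 2), used above, divides each cell by its column
## total. A minimal sketch of the same calculation written out with sweep()
## (the object names counts, col.totals and col.props are our own):
counts <- as.table(matrix(c(369, 221, 215, 219), nrow = 2,
                          dimnames = list(Choice = c("Human", "Outdoors"),
                                          Timing = c("After", "Before"))))
col.totals <- colSums(counts)                    # 590 after, 434 before
col.props <- sweep(counts, 2, col.totals, "/")   # divide each column by its total
round(col.props, 3)
## For example, 369/590 (about 0.625) of the mosquitoes chose the participants
## after beer, compared with 215/434 (about 0.495) before.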
## Now that you've looked at the data, a natural question that arises is:
## "Did drinking beer really increase the participants' attractiveness (as far as
## the mosquitoes are concerned, that is!)?" or, equivalently, "Could the observed
## difference be due to chance?"
## There are a number of statistical tests that could be used to answer this question.
## (In particular, you could use a permutation test, as demonstrated in previous lectures.)
## Possibly the simplest method would be to use a chi-square test of independence.
## The null hypothesis for this test is that the column and row variables are
## independent. In this case, we could state the null hypothesis as:
## "Beer consumption has no effect on participant attractiveness". Under the null
## hypothesis, the chi-square test statistic has an approximate chi-square
## distribution with (r - 1)*(c - 1) degrees of freedom (where r and c represent
## the number of rows and columns in the table - here df = 1), as long as the
## expected count in each cell is not "small" (a common rule of thumb is at least
## 5). You can get the chi-square test statistic, df, and p-value from R by using
## the summary() command in conjunction with table():
summary(table(Choice,Timing))
## The resulting p-value is very small, which provides evidence against the null
## hypothesis. Conclusion: the evidence suggests that beer consumption and
## attractiveness are not independent(!). Of course, more research is needed.
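## The statistic reported by summary() for a two-way table is Pearson's
## chi-square, X^2 = sum((O - E)^2 / E), where O is an observed count and
## E = (row total * column total) / n is the count expected if the two variables
## were independent. A minimal sketch of that calculation on the mosquito counts
## (the object names below are our own):
obs <- matrix(c(369, 221, 215, 219), nrow = 2,
              dimnames = list(Choice = c("Human", "Outdoors"),
                              Timing = c("After", "Before")))
expected <- outer(rowSums(obs), colSums(obs)) / sum(obs)   # counts under independence
expected                                   # all well above 5, so the approximation is reasonable
X2 <- sum((obs - expected)^2 / expected)   # Pearson chi-square statistic
dof <- (nrow(obs) - 1) * (ncol(obs) - 1)   # (2 - 1) * (2 - 1) = 1
p.value <- pchisq(X2, df = dof, lower.tail = FALSE)
c(X2 = X2, df = dof, p.value = p.value)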
## Chi-square test results can also be obtained in R by using the chisq.test()
## command:
?chisq.test
## Perform the same chi-square test that you did previously, but this time, use
## the chisq.test() command. Note: you may need to change one of the input arguments
## to get results that exactly match those that you previously obtained.
## Measures of effect
## The odds ratio and relative risk can be calculated directly from our table:
table(Choice,Timing)
## Take a minute to perform these calculations by hand - then check your results
## using R. Also be sure you know how to interpret these measures. Which of these
## measures of effect is most appropriate for the given study?
## It's good practice to provide confidence intervals (CIs) when reporting ORs and
## RRs. These convey the degree of uncertainty (due to sampling) that is present in
## the estimate(s) and represent a range of plausible values for the true measure of
## effect. Different methods exist for calculating CIs. We'll rely on R to do the
## calculating for us. One way to obtain CIs is via the 'epiDisplay' package. In
## RStudio, it should be listed under the 'Packages' tab. To load it, you need to
## check its box or, alternatively, type: library(epiDisplay). If it isn't installed,
## run install.packages('epiDisplay') and then library(epiDisplay).
library(epiDisplay)
##
## If you don't see it listed under the 'Packages' tab, it may not be installed on
## your computer. You can attempt to do so by clicking the 'Install Packages' button
## and typing in: epiDisplay.
## Once epiDisplay is loaded, you have access to many commands relevant to
## epidemiological analysis. The command cci() provides the OR, CI, and results
## from a number of hypothesis tests associated with 2x2 tables.
## You can get help using:
## ?cci
## Here it makes sense to consider 'Timing' as the exposure (After = exposed,
## Before = unexposed). 'Choice' could play the role of "disease" (Human = case,
## Outdoors = control).
table(Choice, Timing)
## Selecting the appropriate values for the cci command:
cci(369, 221, 215, 219, graph=FALSE)
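## cci() reports a CI for the OR. For intuition, one common approximate method
## (Woolf's logit method) builds the interval on the log scale, where
## SE(log OR) = sqrt(1/a + 1/b + 1/c + 1/d), and then exponentiates the limits.
## A minimal sketch using the same argument order as cci(); the function name
## woolf.or.ci is our own, and cci() may use a different interval method by
## default, so its limits need not match these exactly.
woolf.or.ci <- function(caseexp, controlex, casenonex, controlnonex, level = 0.95) {
  log.or <- log((caseexp * controlnonex) / (controlex * casenonex))
  se <- sqrt(1 / caseexp + 1 / controlex + 1 / casenonex + 1 / controlnonex)
  z <- qnorm(1 - (1 - level) / 2)   # 1.96 for a 95% interval
  exp(c(OR = log.or, lower = log.or - z * se, upper = log.or + z * se))
}
woolf.or.ci(369, 221, 215, 219)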
## Analogously, metrics associated with the RR can be obtained through epiDisplay's
## csi() command. Try using csi() to obtain a CI for the RR. Note: csi doesn't
## plot a graph for a 2x2 table, so you don't need to specify graph=FALSE
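## Likewise for the RR: a commonly used approximate CI (often called the Katz
## log method) uses SE(log RR) = sqrt(1/a - 1/(a + b) + 1/c - 1/(c + d)),
## where a and b are the exposed cases and non-cases and c and d are the
## unexposed cases and non-cases. A minimal sketch with the same argument order
## as csi(); the function name katz.rr.ci is our own, and csi() may compute its
## interval differently.
katz.rr.ci <- function(caseexp, controlex, casenonex, controlnonex, level = 0.95) {
  risk.exp <- caseexp / (caseexp + controlex)             # risk among exposed
  risk.nonexp <- casenonex / (casenonex + controlnonex)   # risk among unexposed
  log.rr <- log(risk.exp / risk.nonexp)
  se <- sqrt(1 / caseexp - 1 / (caseexp + controlex) +
             1 / casenonex - 1 / (casenonex + controlnonex))
  z <- qnorm(1 - (1 - level) / 2)
  exp(c(RR = log.rr, lower = log.rr - z * se, upper = log.rr + z * se))
}
katz.rr.ci(369, 221, 215, 219)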
## Based on the CIs for the OR and RR, what conclusions can you draw about the
## relationship between the variables in the table?
## Often, it is necessary to control for the effects that a potentially confounding
## variable may have on an exposure/disease relationship. When confounding is
## present, the crude (unadjusted) measure of effect is biased.
## One way to determine if confounding is present is through the application
## of stratified analysis. Consider the following raw dataset from a
## hypothetical case-control study investigating gender as a risk factor for
## Malaria (adapted from Szklo & Nieto, 2000):
## replicate raw data:
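Gender <- c(rep(1,156),rep(0,144))   # 156 males, then 144 females
Malaria <- c(rep(1,88),rep(0,68),rep(1,62),rep(0,82))   # males: 88 cases, 68 controls; females: 62, 82
## Workplace (indoor, then outdoor) within male cases, male controls, female cases, female controls:
Workplace <- c(rep(1,35),rep(0,53),rep(1,53),rep(0,15),rep(1,52),rep(0,10),rep(1,79),rep(0,3))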
Gender <- factor(Gender, levels = c(1,0), labels=c("male","female"))
Malaria <- factor(Malaria, levels = c(1,0), labels=c("case","control"))
Workplace <- factor(Workplace, levels = c(1,0), labels=c("indoor","outdoor"))
MalariaData <- data.frame(Gender,Malaria,Workplace)
## take a look at the first few lines of raw data:
head(MalariaData)
## examine the relationship between Gender and Malaria:
table(Gender,Malaria)
## An OR can be obtained using the cci command. Try it out. Note: remember
## the format is cci(caseexp, controlex, casenonex, controlnonex, graph=FALSE).
## Assume exposed = male:
## You should have found OR = 1.71. This represents the "crude" or "unadjusted"
## OR. It suggests that the odds of malaria are higher for men than for women. Now,
## let's see what happens when we stratify the data by workplace. If workplace is
## unrelated to these data (i.e. not a confounder) then we should get approximately
## the same OR (OR=1.71) as we did before for both levels of workplace.
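## Instead of retyping counts into cci(), the crude OR can also be computed
## directly from a 2 x 2 table. A minimal sketch, assuming the layout printed by
## table(Gender, Malaria): exposure (male, female) in the rows and outcome
## (case, control) in the columns. The helper or.from.table is our own; with
## the tutorial's data loaded, or.from.table(table(Gender, Malaria)) should
## give the same value.
or.from.table <- function(tab) {
  ## cross-product ratio: (exposed cases * unexposed controls) /
  ##                      (exposed controls * unexposed cases)
  (tab[1, 1] * tab[2, 2]) / (tab[1, 2] * tab[2, 1])
}
GenderMalaria <- as.table(matrix(c(88, 62, 68, 82), nrow = 2,
                                 dimnames = list(Gender = c("male", "female"),
                                                 Malaria = c("case", "control"))))
or.from.table(GenderMalaria)   # 88*82 / (68*62), about 1.71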
##
table(Gender,Malaria,Workplace)
## Compute the ORs for each table separately using cci().
## The resulting ORs are both close to 1.00. This reveals two things: 1) Workplace
## appears to be a confounder - the stratified ORs differ from the crude OR. 2)
## Since the stratified ORs are approximately equal, workplace does not appear to
## be an effect modifier (i.e. the ORs do not vary by workplace).
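## The stratum-specific ORs can also be computed in one step by applying the
## cross-product ratio to each Workplace slice of the three-way table. A minimal
## sketch; the counts are those printed by table(Gender, Malaria, Workplace),
## entered with array(), which fills the first dimension (Gender) fastest, then
## Malaria, then Workplace. The object names GMW and stratum.or are our own.
GMW <- as.table(array(c(35, 52, 53, 79,   # indoor: male case, female case, male control, female control
                        53, 10, 15, 3),   # outdoor: same order
                      dim = c(2, 2, 2),
                      dimnames = list(Gender = c("male", "female"),
                                      Malaria = c("case", "control"),
                                      Workplace = c("indoor", "outdoor"))))
stratum.or <- apply(GMW, 3, function(tab) (tab[1, 1] * tab[2, 2]) / (tab[1, 2] * tab[2, 1]))
round(stratum.or, 2)   # indoor about 1.00, outdoor about 1.06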
##
## We can further explore the confounding nature of Workplace by examining the
## relationships between workplace & gender and workplace & malaria:
table(Workplace,Gender)
cci(68, 88, 13, 131,graph=FALSE) ## Males are more likely to work outdoors
table(Workplace,Malaria)
cci(63, 18, 87, 132, graph=FALSE) ## Malaria is associated with working outside
## Because workplace is associated with both gender and malaria, it is not
## surprising that it had a confounding effect on the relationship between
## gender and malaria.
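## The two-way tables above are collapsed versions of the three-way table, and
## margin.table() produces them directly. A minimal sketch that rebuilds the
## three-way table from its counts and computes both ORs from named cells,
## using the same cell choices as the two cci() calls above (the object names
## are our own):
GMW <- as.table(array(c(35, 52, 53, 79, 53, 10, 15, 3), dim = c(2, 2, 2),
                      dimnames = list(Gender = c("male", "female"),
                                      Malaria = c("case", "control"),
                                      Workplace = c("indoor", "outdoor"))))
wg <- margin.table(GMW, c(3, 1))   # Workplace x Gender
wm <- margin.table(GMW, c(3, 2))   # Workplace x Malaria
## OR for working outdoors, males vs females (as in cci(68, 88, 13, 131))
or.wg <- (wg["outdoor", "male"] * wg["indoor", "female"]) /
         (wg["indoor", "male"] * wg["outdoor", "female"])
## OR for malaria, outdoor vs indoor workers (as in cci(63, 18, 87, 132))
or.wm <- (wm["outdoor", "case"] * wm["indoor", "control"]) /
         (wm["outdoor", "control"] * wm["indoor", "case"])
round(c(workplace.gender = or.wg, workplace.malaria = or.wm), 2)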
## To complete the analysis, the Mantel-Haenszel method for stratified data (not
## covered in this tutorial) could be applied to obtain a combined, adjusted
## measure of effect.
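## For reference, the Mantel-Haenszel summary OR combines the stratum-specific
## 2 x 2 tables as OR_MH = sum_k(a_k * d_k / n_k) / sum_k(b_k * c_k / n_k),
## where, in stratum k, a_k and b_k are the exposed cases and controls, c_k and
## d_k the unexposed cases and controls, and n_k the stratum total. A minimal
## sketch that rebuilds the three-way table and checks the hand calculation
## against base R's mantelhaen.test() (the object names are our own):
GMW <- as.table(array(c(35, 52, 53, 79, 53, 10, 15, 3), dim = c(2, 2, 2),
                      dimnames = list(Gender = c("male", "female"),
                                      Malaria = c("case", "control"),
                                      Workplace = c("indoor", "outdoor"))))
num <- apply(GMW, 3, function(tab) tab[1, 1] * tab[2, 2] / sum(tab))   # a_k * d_k / n_k
den <- apply(GMW, 3, function(tab) tab[1, 2] * tab[2, 1] / sum(tab))   # b_k * c_k / n_k
or.mh <- sum(num) / sum(den)
or.mh                           # about 1.01, close to both stratum-specific ORs
mantelhaen.test(GMW)$estimate   # should agree with or.mh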
##
## Stratified analysis is perhaps most useful when variables are categorical and
## the overall number of variables is small. When dealing with a larger number of
## variables (e.g. many confounding factors) or continuous explanatory variables,
## generalized linear model methods such as logistic regression can be used to
## estimate adjusted measures of effect.
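## For those of you who are already familiar with these types of models, you can
## use R to fit a logistic model to these data using the following commands:
my.model <- glm(Malaria == "case" ~ Gender + Workplace, family = binomial,
                data = MalariaData)
summary(my.model)
## The adjusted OR for malaria and gender after controlling for workplace can be
## obtained by exponentiating the coefficients. 'male' and 'indoor' are the
## reference levels (the first level of each factor), so the Genderfemale term is
## the OR for females vs males; its reciprocal compares males with females, as in
## the stratified analysis above.
exp(coef(my.model))
## Wald confidence intervals for the ORs can be obtained using: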
exp(confint.default(my.model))
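## confint.default() returns Wald intervals: coefficient +/- z * SE on the
## log-odds scale, exponentiated to the OR scale. A minimal sketch of that
## calculation; to stay self-contained it refits the same model to the
## aggregated counts (cases and controls per gender/workplace group), which
## gives the same estimates and standard errors as the individual-level fit.
## As in my.model, 'male' and 'indoor' are the reference levels. The object
## names grouped, fit and wald are our own.
grouped <- data.frame(
  Gender    = factor(c("male", "female", "male", "female"), levels = c("male", "female")),
  Workplace = factor(c("indoor", "indoor", "outdoor", "outdoor"), levels = c("indoor", "outdoor")),
  cases     = c(35, 52, 53, 10),
  controls  = c(53, 79, 15, 3))
fit <- glm(cbind(cases, controls) ~ Gender + Workplace, family = binomial, data = grouped)
se <- sqrt(diag(vcov(fit)))   # standard errors of the coefficients
z <- qnorm(0.975)             # 1.96 for a 95% interval
wald <- exp(cbind(OR = coef(fit), lower = coef(fit) - z * se, upper = coef(fit) + z * se))
round(wald, 3)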
##
##
##
## Answers to selected exercises appear below
##
##
##
##
##
##
##
##
##
##
##
##
##
##
##
##
##
##
##
##
## Answers to Identifying Study Designs
## Study 1: Case-control - participants were enrolled based on disease status
## (i.e. falls); distributions of various exposures were then investigated.
##
## Study 2: Cross-sectional - disease (needlestick) and exposure status
## (e.g. attending a seminar) were assessed at the same time. There is no
## way to know for certain which came first. A very detailed survey may have
## asked about dates, but even then, results may be vulnerable to recall bias.
##
## Study 3: Cohort (best answer) - exposed (solar radiation) and unexposed
## (no solar radiation) groups are followed over time and incidence rates
## between the two groups are compared. This could be considered a RCT but
## ONLY IF participants were randomly assigned to a treatment group.
##
## Study 4: Correlational - the unit of analysis is country. No individual
## level data were collected. Consequently, there is no way to determine if
## those who are HIV positive actually consumed more alcohol (on average) than
## those who are HIV negative.
##
## Examining the data:
## One answer to column switching exercise:
## barplot(prop.table(table(Choice, Timing)[,c(2,1)],margin=2),
## main="Distribution of Mosquito Choice by Timing of release",
## xlab="Timing of release relative to beer consumption",
## ylab="Proportion Choosing Participants (dark)",
## col = c("darkblue", "lightblue"))
##
## Chi-square test:
## chisq.test(Choice,Timing, correct=FALSE)
##
## Measures of effect:
## 369*219 / (221*215) ## OR = 1.7007
##
## The odds of attracting a mosquito are 1.70 times higher after consuming beer
## (compared to no beer consumption)
##
## (369/(369+221))/(215/(215+219)) ## RR = 1.2625
##
## The risk of attracting a mosquito is 1.26 times higher after consuming beer
## (compared to no beer consumption)
##
## Which is more appropriate? In this study we know the distribution of exposure
## (i.e. before/after) conditional on disease (i.e. human/outdoors) AND the
## distribution of disease conditional on exposure. As a result, either measure is
## appropriate - however, it's conventional to provide RR whenever possible because
## it is usually considered to be a more intuitive measure. In a case-control study
## only the distribution of exposure conditional on disease is known - in that case,
## it would NOT be appropriate to calculate the RR - only the OR.
##
## cci command with appropriate labels:
## cci(369, 221, 215, 219, xlab="Timing",xaxis=c("Before","After"),
## ylab="Odds of choosing a human",yaxis=c("Human","Outdoors"),
## main="Odds of choosing a human by exposure status")
##
## csi command:
## csi(369, 221, 215, 219)
##
## A possible interpretation:
## The range of plausible values for the OR does not include the value 1.00,
## suggesting that the odds of exposure differ between groups; therefore,
## we have evidence that the variables are related. Similarly, the range of
## plausible values for the RR does not include 1.00, suggesting that the risk of
## 'choosing a human' is not equal between exposure groups. We have statistical
## evidence that supports the hypothesis that exposure and disease are
## associated with one another.
##
## Crude OR for Malaria Data:
## cci(88,68,62,82, graph=FALSE)
##
## Stratified ORs
## cci(35,53,52,79,graph=FALSE)
## cci(53,15,10,3,graph=FALSE)
## References:
##
## 1. Lefevre T, et al. (2010) Beer Consumption Increases Human Attractiveness to
## Malarial Mosquitoes. PLoS ONE 5(3): e9546.
##
## 2. Szklo M, Nieto FJ (2000) Epidemiology: Beyond the Basics. Aspen Publishers.
##