Adding a Calendar Link

July 22, 2011

Does this work?

A few weeks ago I wrote a snippet of code that would do a 0700 download of new files from an archive that gets updated daily.  One of those files is a Google Earth .kmz showing 60 hours of smoke forecasts.  Today one of my colleagues emailed to ask why these files didn’t work, despite appearing to be the right size, format, etc.  A good question!  Further reading on download.file() indicated that I needed a mode = “wb” argument to indicate that the file was binary rather than text — no problem and no big deal.

However, in solving this little issue I realized that I am now an expert R user.  Not because I can get R to do almost anything (except for sending an email from Outlook, but that’s another story), but because I understand the help files.  I know how to read them, and where to look for the information I’m seeking.  The number one complaint I hear from R students is that “the help files aren’t helpful”, which is totally true until you are versed enough in R to know how to read them.  I don’t miss the learning curve.

On An Angle

June 11, 2011

I’m doing some quick-and-dirty analyses on help phone hotline counts compared with PM2.5 concentrations related to forest fire smoke.  I don’t have a lot of faith in these data to be sensitive to the smoke signal, so I don’t intend to spend a lot of time making really good code for the data exploration — there’s just no payoff.  I got the plots doing what I wanted and decided that I should add some kind of date to the X-axis to make life easier for other viewers down the road.  I stuck in an axis() with labels where I wanted them, but the dates were too long and they wouldn’t fit nicely.  I turned the labels 90 degrees and they bled off the plotting area unless they were in a tiny font.  I didn’t want to start messing with the margins, but I figured there must be some way to put the labels at an angle.  Googling it produced this:

http://jnlnet.wordpress.com/2009/05/20/x-axis-labels-on-a-45-degree-angle-using-r/#comment-61

Which, in turn, produced this:

As suspected, the data (in this case) are not strongly associated with the event of interest (lots of smoke — over the 95%ile of summer days).  Here’s the code, in case you’re interested:

Never confuse the two (I haven’t so far).  This is an interesting problem in real-time surveillance of public health, for both morbidity and mortality.  The fact is that health quantification is largely administrative, and there is some lag time associated with almost all of the available databases.  The closest exception in BC is the PharmaNet database, which logs medications as they are dispensed.  These data are reliably captured in near-real-time for all pharmacies hooked into the system, which is the vast majority of them.

Deaths get registered less quickly.  The rule-of-thumb figure seems to be that 95% are registered within 7 days, but that’s of little value to us if we want to know that something weird happened yesterday.  Earlier this week I cobbled together some code that will automatically plot mortality vs. temperature for us on a daily basis, but now I need to start generating the data that will help us to assess whether yesterday was weird today, rather than a week from now.  I am working under the assumption that there will be some deviation from the day-after-death registration baseline on days when death is significantly elevated, but the first step is to establish the baseline.  This is a different sort of problem than I used to tackling in R, but I think that I have worked it out:

I learned something new today:

mymax = apply(mydata, 1, which.max)
maxname = names(mydata)[mymax]

I’ve written a function to do this in the past, inserted where which.max sits, but the second line has never really occurred to me as being possible.  Yet another expansion of my thinking in R.


					

Stupid R Tricks

February 13, 2011

I am now using an 11″ MacBook Air as my laptop (love it!) but am trying to avoid installing Windows, even though it is the OS with which I am most familiar.  The biggest hurdle here is R, but I am working through it slowly.  Today I spent a LONG TIME trying to figure out how to submit a line of code from the editor to R, but now I know:

command + return

Gah.

Chronic vs. Sporadic

February 11, 2011

Fay and I have been going back and forth over this issue for the past couple of weeks as we try to finish a draft of the GBD paper.  Condensing 1+ year of work down into 5000 words is never easy, but we want this story to be as clear and straightforward as possible for our readers, which means not telling them about every single thing we did.

In the end we have decided to classify each of the 21 WHO regions as being either chronically- or sporadically-exposed.  This allows us to use two different sets of methods, each appropriate to the exposure type.  The next step was to make the cut in a reasonable, defensible way.  I am writing down here for myself, because with everything I have tried the path has become quite winding.

1) Take the full set of exposure values = 4208 populated cells * 120 months = 504960 values

2) Examine the distribution of these values, which is log normal and looks like this:

3) Take the 90th percentile value (3 ug/m3) as our definition of “high impact”

4) Count the number of high impact months in each of the 10 years in every cell

5) Define cells with >=3 high impact months in >=5 of 10 years as “chronically impacted”.  That looks like this:

6) In regions where >= 40% of the population OR 40% of the land area is covered by chronically-impacted cells, call the region chronically-impacted.  I went with 40% rather than 50% here because it was a natural break in both distributions.  If we went with 50% in both cases Andean South America and Eastern Sub-Saharan Africa would be excluded.  The final result looks like this:

Thousands by Thousands

November 26, 2010

Yang recently re-jigged the MODIS and MISR estimates for the GBD study.  This is a good thing for the project — the estimates are likely more realistic, and Yang was able to (1) back-estimate values for 1997-2001 and (2) fill other missing values in the data.  This is a slight pain for me, because I have to go back and re-do quite a lot of work with the new values and across the new timeline.  No big deal, but somewhat time-consuming since I haven’t given much thought to this code over the past 4 months.

Over the past two weeks I’ve been re-running everything and today I finally output a final result — totally weird when compared to our previous results, which it shouldn’t be.  Hmm.   A little investigating led me to realize that Yang had also re-jigged his factor of 10.  Where the old MODIS and MISR values needed to be multipled by 1000, the new ones did not.  I didn’t look carefully at the data at that stage, and by the time the fused estimate had been calculated the error wasn’t glaring, since it had been averaged into a bunch of other values.  Again, no big deal — just another day in data analysis.

A QuiRk

September 18, 2010

So I’m working away in R on some analyses for the food safety folks at the BCCDC.  I’ve written a nice little function that takes a data frame and an integer as its arguments and I want to loop through all the data frame / integer combinations.

for (i in integers){

for (frame in dataframes){

do function(frame, i)

}

}

No dice.  I get the error “Error in frame$Variable : $ operator is invalid for atomic vectors”.  Whaaat? Sure enough, looping through the data frames this way somehow causes them to be passed to the function as a set of vectors.  I’m sure I’ll find a way to work around it, but I really don’t understand why it’s happening.

Now that all the forecast data are extracted I am able to start evaluating their utility with respect to the development of a heat alert system.  I was working away on this today marveling at how poorly things were coming together when it hit me…forecast data from today are for tomorrow, but I hadn’t offset all the values by a day.  This is mostly just a note to remind myself (1) that I am sometimes quite dumb and (2) that this oversight has now been corrected.

Follow

Get every new post delivered to your Inbox.