mages' blog

googleVis 0.5.6 released on CRAN

Version 0.5.6 of googleVis was released on CRAN over the weekend. This version fixes a bug in gvisMotionChart. Its arguments xvar, yvar, sizevar and colorvar were not always picked up correctly.

Thanks to Juuso Parkkinen for reporting this issue.

Example: Love, or to love

A few years ago Martin Hilpert posted an interesting case study for motion charts. Martin is a linguist and he researched how the usage of words in American English changed over time, e.g. some words were more often used as nouns in the past and then became more popular as a verb. Do you talk about love, or do you tell someone that you love her/him? Visit his motion chart web page for more information and details!


R code


Session Info

R version 3.1.1 (2014-07-10)
Platform: x86_64-apple-darwin13.1.0 (64-bit)

locale:
[1] en_GB.UTF-8/en_GB.UTF-8/en_GB.UTF-8/C/en_GB.UTF-8/en_GB.UTF-8

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods  
[7] base     

other attached packages:
[1] googleVis_0.5.6

loaded via a namespace (and not attached):
[1] RJSONIO_1.3-0 tools_3.1.1

Visualising the seasonality of Atlantic windstorms

Last week Arthur Charpentier sketched out a Markov spatial process to generate hurricane trajectories. Here, I would like to take another look at the data Arthur used, but focus on its time component.

According to the Insurance Information Institute, a normal season, based on averages from 1980 to 2010, has 12 named storms, six hurricanes and three major hurricanes. The usual peak months of August and September passed without any major catastrophes this year, but the Atlantic hurricane season is not over yet.

So, let's take a look at the data again, and here I will use code from Gaston Sanchez, who looked at windstorm data earlier.

I believe, I am using the same data as Arthur, but from a different source, which allows me to download it in one file (13.9MB). Using code from Gaston my chart of all named windstorms over the period of 1989 to 2013 looks like this.


Plotting the data by months illustrates the seasonality of windstorms during the year. Note, that there are no named windstorms for the months of February and March and although the season runs until the end of November, I can clearly see that the peak months are August and September.


Perhaps there are other time components that effect the seasonality across years, such as La Niña? Plotting the years 1989 to 2013 certainly shows that there are years with more windstorm activity than others. Critical to the impact and cost of windstorms is if they make landfall or not. Hence, I add information on deaths and economic damages for major hurricanes over that period from Wikipedia.


Although 1992 doesn't show much windstorm activity, hurricane Andrew was the most expensive one to that date and changed the insurance industry significantly. Following hurricane Andrew insurance companies started to embrace catastrophe models.

The economic loss is a very poor proxy to the loss of lives. In 1998 Hurricane Mitch cost the lives of over 19,000, with most of them living in Honduras. The economic cost of $6.2bn looks small compared to the cost of Andrew six years earlier ($26.5bn).


Although over the last few years we have seen a fairly benign impact of windstorm activity, the next big event can have a much more sever impact than what we have seen historically, as the population (and economy) on the Gulf of Mexico has and is growing rapidly, 150% from 1960 to 2008 alone. A hurricane season causing damages over $200bn doesn't look unreasonable anymore.

R Code


Session Info

R version 3.1.1 (2014-07-10)
Platform: x86_64-apple-darwin13.1.0 (64-bit)

locale:
[1] en_GB.UTF-8/en_GB.UTF-8/en_GB.UTF-8/C/en_GB.UTF-8/en_GB.UTF-8

attached base packages:
[1] grid      stats     graphics  grDevices utils     datasets 
[7] methods   base     

other attached packages:
[1] XML_3.98-1.1     data.table_1.9.2 ggplot2_1.0.0   
[4] maps_2.3-7      

loaded via a namespace (and not attached):
 [1] colorspace_1.2-4 digest_0.6.4     gtable_0.1.2    
 [4] labeling_0.3     MASS_7.3-33      munsell_0.4.2   
 [7] plyr_1.8.1       proto_0.3-10     Rcpp_0.11.2     
[10] reshape2_1.4     scales_0.2.4     stringr_0.6.2   
[13] tools_3.1.1 

Running RStudio via Docker in the Cloud

Deploying applications via Docker container is the current talk of town. I have heard about Docker and played around with it a little, but when Dirk Eddelbuettel posted his R and Docker talk last Friday I got really excited and had to have a go myself.

My aim was to rent some resources in the cloud, pull an RStudio Server container and run RStudio in a browser. It was actually surprisingly simple to get started.

I chose Digital Ocean as my cloud provider. They have many Linux systems to choose from and also a pre-built Docker system.


After about a minute I had kicked off the Docker droplet I could login into the system in a browser window and start pulling the Docker file, e.g. Dirk's container.


Once the downloads finished I could start the RStudio Server using the docker run command and login to a RStudio session. To my surprise even my googleVis package worked out of the box. The plot command opened just another browser window to display the chart; here the output of the WorldBank demo.


All of this was done within minutes in a browser window. I didn't even use a terminal window. So, that's how you run R on an iPad. Considering that the cost for the server was $0.015 per hour, I wonder why I should buy my own server, or indeed buy a new computer.

Managing R package dependencies

One of my take aways from last week's EARL conference was that R is more and more growing out of its academic roots into the enterprise. And with that come some challenges, e.g. how do I ensure consistent and systematic access to a set of R packages in an organisation, in particular when one team is providing packages to others?

Two packages can help here: roxyPackage and miniCRAN.

I wrote about roxyPackage earlier on this blog. It allows me to create a local repository to distribute my package, while at the same time execute and control the build process from within R. But what about my package's dependencies? Here miniCRAN helps. miniCRAN is a new package by Andrie de Vries that enables me to find and download all package dependencies and store them in a local repository, e.g. the one used by roxyPackage.

For more details about roxyPackage and miniCRAN read the respective package vignettes.

Example

To create a local sub-CRAN repository for the two packages I maintain on CRAN and with all their dependencies I use:
library("miniCRAN")
my.pkgs <- c("googleVis", "ChainLadder")
pkgs <- pkgDep(my.pkgs, suggests = TRUE, enhances=FALSE)
makeRepo(pkgs = pkgs, path="/Users/Shared/myCRANRepos")
And to visualise the dependencies:
dg <- makeDepGraph(my.pkgs, includeBasePkgs=FALSE, 
                   suggests=TRUE, enhances=TRUE)
set.seed(1)
plot(dg, legendPosEdge = c(-1, 1), 
     legendPosVertex = c(1, 1), vertex.size=20)

What a surprise! In total I end up with 42 packages from CRAN and I didn't expect any connection between the ChainLadder and googleVis package.

Bonus tip

Don't miss out on Pat Burns's insightful talk about effective risk management from EARL. His thoughts reminded me of the great Karl Popper: Good tests kill flawed theories; we remain alive to guess again.

Session Info

R version 3.1.1 (2014-07-10)
Platform: x86_64-apple-darwin13.1.0 (64-bit)

locale:
[1] en_GB.UTF-8/en_GB.UTF-8/en_GB.UTF-8/C/en_GB.UTF-8/en_GB.UTF-8

attached base packages:
[1] stats graphics  grDevices utils  datasets  methods  
[7] base     

other attached packages:
[1] miniCRAN_0.1-0

loaded via a namespace (and not attached):
[1] httr_0.5 igraph_0.7.1  stringr_0.6.2 tools_3.1.1  
[5] XML_3.98-1.1

Notes from the Kölner R meeting, 12 September 2014

Last Friday we had guests from Belgium and the Netherlands joining us in Cologne. Maarten-Jan Kallen from BeDataDriven came from The Hague to introduce us to Renjin, and the guys from DataCamp in Leuven, namely Jonathan, Martijn and Dieter, gave an overview of their new online interactive training platform.

Next Kölner R User Meeting: Friday, 12 September 2014

Koeln R
The next Cologne R user group meeting is scheduled for this Friday, 12 September 2014.

We have a great agenda with international speakers:
  • Maarten-Jan Kallen: Introduction to Renjin, the R interpreter for the JVM
  • Jonathan Cornelissen, Martijn Theuwissen: DataCamp - An online interactive learning platform for R
The event will be followed by drinks and schnitzel at the Lux.

Zoom, zoom, googleVis

The Google Charts API is quite powerful and via googleVis you can access it from R. Here is an example that demonstrates how you can zoom into your chart.