For most purposes PDF or other vector graphic formats such as windows metafile and SVG work just fine. However, if I plot lots of points, say 100k, then those files can get quite large and bitmap formats like PNG can be the better option. I just have to be mindful of the resolution.
As an example I create the following plot: x <- rnorm(100000)
plot(x, main=“100,000 points”, col=adjustcolor(“black”, alpha=0.2))
Saving the plot as a PDF creates a 5.

I posted about the various googleVis axis options for base charts, such as line, bar and area charts earlier, but I somehow forgot to mention how to set the axes limits.
Unfortunately, there are no arguments such as ylim and xlim. Instead, the Google Charts axes options are set via hAxes and vAxes, with h and v indicating the horizontal and vertical axis. More precisely, I have to set viewWindowMode : ‘explicit’ and set the viewWindow to the desired min and max values.

Building R packages is not particular hard, but it can be a bit of a daunting endeavour at the beginning, particularly if you are more of a statistician than a computer scientist or programmer. Some concepts may appear foreign or like red tape, yet many of them evolved over time for a reason. They help to stay organise, collaborate more effectively with others and write better code. So, here are my slides of the R package development workshop at Lancaster University.

Following on from last week’s post, here are my slides on using googleVis on shiny from the Advanced R workshop at Lancaster University, 21 May 2013.
googleVis on shiny
Again, I wrote my slides in RMarkdown and I used slidify to create the HTML5 presentation. Unfortunately you may have to reload the slides that use googleVis on shiny as the JavaScript code in the background is potentially not ideal. Any pointers, which could help to improve the performance will be much appreciated.

Last week I was invited to give an introduction to googleVis at Lancaster University. This time I decided to use the R package slidify for my talk. Slidify, like knitr, is built on Markdown and makes it very easy to create beautiful HTML5 presentations.
Introduction to googleVis
Separating content from layout is always a good idea. Markup languages such as TeX/LaTeX or HTML are built on this principle. Ramnath Vaidyanathan has done a fantastic job with slidify, as it is very straightforward to create presentations with R.

Often I like to reduce the alpha value (level of transparency) of colours to identify patterns of over-plotting when displaying lots of data points with R. So, here is a tiny function that allows me to add an alpha value to a given vector of colours, e.g. a RColorBrewer palette, using col2rgb and rgb, which has an argument for alpha, in combination with the wonderful apply and sapply functions.

Setting axis options in googleVis charts can be a bit tricky. Here I present two examples where I set several options to customise the layout of a line and combo chart with two axes. The parameters have to be set in line with the Google Chart Tools API, which uses a JavaScript syntax. In googleVis chart options are set via a list in the options argument. Some of the list items can be a bit more complex, often wrapped in {} brackets, e.

The guys at RStudio have done a fantastic job with shiny. It is really easy to build web apps with R using shiny. With the help of Joe Cheng from RStudio we figured out a way to make googleVis work on shiny as well. This allows you to make use of the Google Charts Tools in your shiny app directly from R. What I present here are three initial examples which seem to work in most browsers.

This is the third post about Christofides’ paper on Regression models based on log-incremental payments [1]. The first post covered the fundamentals of Christofides’ reserving model in sections A - F, the second focused on a more realistic example and model reduction of sections G - K. Today’s post will wrap up the paper with sections L - M and discuss data normalisation and claims inflation. I will use the same triangle of incremental claims data as introduced in my previous post.

Following on from last week’s post I will continue to go through the paper Regression models based on log-incremental payments by Stavros Christofides [1]. In the previous post I introduced the model from the first 15 pages up to section F. Today I will progress with sections G to K which illustrate the model with a more realistic incremental claims payments triangle from a UK Motor Non-Comprehensive account:# Page D5.17

A recent post on the PirateGrunt blog on claims reserving inspired me to look into the paper Regression models based on log-incremental payments by Stavros Christofides [1], published as part of the Claims Reserving Manual (Version 2) of the Institute of Actuaries.
The paper is available together with a spread sheet model, illustrating the calculations. It is very much based on ideas by Barnett and Zehnwirth, see [2] for a reference.

Lattice plots are a great way of displaying multivariate data in R. Deepayan Sarkar, the author of lattice, has written a fantastic book about Multivariate Data Visualization with R [1]. However, I often have to refer back to the help pages to remind myself how to set and change the legend and how to ensure that the legend will use the same colours as my plot. Thus, I thought I write down an example for future reference.

I really should make it a habit of using data.table. The speed and simplicity of this R package are astonishing. Here is a simple example: I have a data frame showing incremental claims development by line of business and origin year. Now I would like add a column with the cumulative claims position for each line of business and each origin year along the development years.
It’s one line with data.

I discussed earlier how the action potential of a neuron can be modelled via the Hodgkin-Huxely equations. Here I will present a simple model that describes how action potentials can be generated and propagated across neurons. The tricky bit here is that I use delay differential equations (DDE) to take into account the propagation time of the signal across the network. My model is based on the paper: Epileptiform activity in a neocortical network: a mathematical model by F.

If connecting data to the real world is the next sexy job, then how do I do this? And how do I connect the real world to R? It can be done as Matt Shottwell showed with his home made ECG and a patched version of R at useR! 2011. However, there are other options as well and here I will use an Arduino. The Arduino is an open-source electronics prototyping platform.

Every year the UK’s general insurance actuarial community organises a big conference, which they call GIRO, short for General Insurance Research Organising committee.
This year’s conference is in Brussels from 18 - 21 September 2012. Despite the fact that Brussels is actually in Belgium the UK actuaries will travel all the way to enjoy good beer and great talks. On Wednesday morning I will run a session on Using R in insurance.

Today I feel very lucky, as I have been invited to the Royal Statistical Society conference to give a tutorial on interactive web graphs with R and googleVis.
I prepared my slides with RStudio, knitr, pandoc and slidy, similar to my Cambridge R talk. You can access the RSS slides online here and you find the original R-Markdown file on github. You will notice some HTML code in the file, which I had to use to overcome my knowledge gaps of Markdown or its limitations.

What is Rook?Rook is a web server interface for R, written by Jeffrey Horner, the author of rApache and brew. But unlike other web frameworks for R, such as brew, R.rsp (which I have used in the past1), Rserve, gWidgetWWWW or sumo (which I haven’t used yet) Rook appears incredible lightweight.
Rook doesn’t need any configuration. It is an R package, which works out of the box with the R HTTP server (R ≥ 2.

At the R in Finance conference Paul Teetor gave a fantastic talk about Fast(er) R Code. Paul mentioned the common higher-order function Reduce, which I hadn’t used before. Reduce allows me to apply a function successively over a vector. What does that mean? Well, if I would like to add up the figures 1 to 5, I could say: add <- function(x,y) x+y add(add(add(add(1,2),3),4),5)orReduce(add, 1:5)

Now this might not sound exciting, but Reduce can be powerful.

One of the great research papers of the 20th century celebrates its 60th anniversary in a few weeks time: A quantitative description of membrane current and its application to conduction and excitation in nerve by Alan Hodgkin and Andrew Huxley. Only shortly after Andrew Huxley died, 30th May 2012, aged 94.
In 1952 Hodgkin and Huxley published a series of papers, describing the basic processes underlying the nervous mechanisms of control and the communication between nerve cells, for which they received the Nobel prize in physiology and medicine, together with John Eccles in 1963.

This evening I will talk about Dynamical systems in R with simecol at the LondonR meeting. Thanks to the work by Thomas Petzoldt, Karsten Rinke, Karline Soetaert and R. Woodrow Setzer it is really straight forward to model and analyse dynamical systems in R with their deSolve and simecol packages. I will give a brief overview of the functionality using a predator-prey model as an example.

This is of course a repeat of my presentation given at the Köln R user group meeting in March.

Transforming data sets with R is usually the starting point of my data analysis work. Here is a scenario which comes up from time to time: transform subsets of a data frame, based on context given in one or a combination of columns.
As an example I use a data set which shows sales figures by product for a number of years:df <- data.frame(Product=gl(3,10,labels=c(“A”,“B”, “C”)), Year=factor(rep(2002:2011,3)), Sales=1:30)
head(df)
## Product Year Sales

Tonight I will give a talk at the Cambridge R user group about googleVis. Following my good experience with knitr and RStudio to create interactive reports, I thought that I should try to create the slides in the same way as well.
Christopher Gandrud’s recent post reminded me of deck.js, a JavaScript library for interactive html slides, which I have used in the past, but as Christopher experienced, it is currently not that straightforward to use with R and knitr.

Last Saturday I met the guys from RStudio at the R in Finance conference in Chicago. I was curious to find out what RStudio could offer. In the past I have used mostly Emacs + ESS for editing R files. Well, and what a surprise it was. JJ, Joe and Josh showed me a preview of version 0.96 of their software, which adds a close integration of Sweave and knitr to RStudio, helping to create dynamic web reports with the new R Markdown and R HTML formats more easily.

Waterfall charts are sometimes quite helpful to illustrate the various moving parts in financial data, particularly when I have positive and negative values like a profit and loss statement (P&L). However, they can be a bit of a pain to produce in Excel. Not so in R, thanks to the waterfall package by James Howard. In combination with the latticeExtra package it is nearly a one-liner to produce a good looking waterfall chart that mimics the look of The Economist:

It is not unusual that you will not have admin rights in an IT controlled office environment. But then again the limitations set by the IT department can spark of some creativity. And I have to admit that I enjoy this kind of troubleshooting.
The other day I ended up in front of a Windows PC with R installed, but a locked down “C:\Programme Files” folder. That ment that R couldn’t install any packages into the default directory “C:\Programme Files\R\R-X.

How do you apply one particular row of your data to all other rows?
Today I came across a data set which showed the revenue split by product and location. The data was formated to show only the split by product for each location and the overall split by location, similar to the example in the table below.
Revenue by product and continent
AfricaAmericaAsiaAustraliaEurope A 40% 30% 50% 40% 40%B 20% 40% 20% 30% 40%C 40% 30% 30% 30% 20%Total 10% 40% 20% 10% 20% I wanted to understand the revenue split by product and location.

How can I embed a small data set into my R code? That was the question I came across today, when I prepared my talk about Dynamical Systems in R with simecol for the forthcoming Cologne R user group meeting. I wanted to add all the R code of the talk to the last slide. That’s easy, but the presentation makes use of a small data set of 3 columns and 21 rows.

The data of the World Bank is absolutely amazing. I had said this before, but their updated iPhone App gives me a reason to return to this topic. Version 3 of the DataFinder App allows you to visualise the data on your phone, including motion maps, see the screen shot below.
Screen shot of DataFinder 3.0I was intrigued by the by the changes in life expectancy over time around the world.

The other day I wrote about the R functions by, apply and friends, which allow me to operate on subsets of data. All those functions work nicely, if the data is given in the right format. More often than not it isn’t and I have to reshape the data beforehand. Thus, time to discuss the reshape function. I will focus on the reshape function in base R, and not the package of the same name.

R is a language, as Luis Apiolaza pointed out in his recent post. This is absolutely true, and learning a programming language is not much different from learning a foreign language. It takes time and a lot of practice to be proficient in it. I started using R when I moved to the UK and I wonder, if I have a better understanding of English or R by now.
Languages are full of surprises, in particular for non-native speakers.

The financial crisis has put a lot of pressure on countries’ long-term foreign currency credit ratings, with France recently being downgraded by S&P. Wikipedia provides a list of countries by credit ratings as report by US rating agencies S&P, Fitch, Moody’s and Dagong, a Chinese rating agency.
So, what does the world look like today through the eyes of those rating agencies?
I use the R packages XML and googleVis to read and display the data from Wikipedia with just a few lines.

On 7th December Google published a new version of their Visualisation API. The new version adds a new chart type: Stepped Area Chart and provides improvements to Geo Chart. Now Geo Chart has similar functionality to Geo Map, but while Geo Map requires Flash, Geo Chart doesn’t, as it renders SVG/VML graphics. So it also works on your iOS devices. These new features have been added to the googleVis R package in version 0.

Fitting distribution with R is something I have to do once in a while, but where do I start?
A good starting point to learn more about distribution fitting with R is Vito Ricci’s tutorial on CRAN. I also find the vignettes of the actuar and fitdistrplus package a good read. I haven’t looked into the recently published Handbook of fitting statistical distributions with R, by Z. Karian and E.J. Dudewicz, but it might be worthwhile in certain cases, see Xi’An’s review.

Data analysis is often an iterative and interactive process. However, when I present about this subject, I feel often limited by the presentation software I use. It doesn’t matter if I use LaTeX/PDF, PowerPoint or Keynote. In all cases it is either very difficult or impossible to include interactive charts, such as Flash or SVG charts. As a result I have to switch between various applications during the talk. This can be fun, but quite often it is not.

My 12” iBook G4 is celebrating its 8th birthday today! Time for a little present. How about R 2.14.0?
The iBook is still in daily use, mostly for browsing the web, writing e-mails and this blog; and I still use it for R as well. For a long time it run R 2.10.1, the last PowerPC binary version available on CRAN for Mac OS 10.4.11 (Tiger). But, R 2.10.1 is a bit dated by now and for the development of my googleVis package I require at least R 2.

Using R with LaTeX via Sweave is a great way to create reproducible output. However, using specific fonts, e.g. your corporate fonts, can be painful with pdflatex. Over the last few weeks I have fallen in love with the TeX format XeLaTeX and its XeTeX engine.
With XeLaTeX I had to overcome some hurdles, which I would like to share here: attaching files,
trimming and clipping images,
learning how to use the tikzDevice package.

Following on from my article about accessing and plotting World Bank data with R I want to talk about how to change the initial view of a motion chart.
Over the last couple of weeks I have been asked a view times how to do this. For instance Stephen O’Grady wanted to create a motion chart, which shows initially a line chart, rather than a bubble chart. Changing the initial settings of a motion chart is actually quite easy, if you know how to.

Over the past couple of days I played around with the data sets of the World Bank, and I have to admit that I am blown away by it. It is amazing, to see what is available on their web site and it is worth visiting their Data Visualisation Tools page. It is fantastic that they provide an API to their data. They have used it to build an iPhone App which is pretty cool.

It seems that you cannot include Google Visualisation Charts into a blog post directly. So, I tried to include the output of a googleVis function as a gadget, but also unsuccessfully. Although you can include gadgets into your site template, it doesn’t seem to work with blog posts. So, here is the trick which works for me: the iframe tag. The following geo map is included as
// jsData function gvisDataGeoChartIDe67b54187fe6 () { var data = new google.