mages' blog

High resolution graphics with R

For most purposes, PDF or other vector graphics formats such as Windows Metafile and SVG work just fine. However, if I plot lots of points, say 100k, those files can get quite large, and bitmap formats like PNG can be the better option. I just have to be mindful of the resolution.

As an example, I create the following plot:
x <- rnorm(100000)
plot(x, main="100,000 points", col=adjustcolor("black", alpha=0.2))

Saving the plot as a PDF creates a 5.2 MB file on my computer, while the PNG output is only 62 KB. Of course, the PNG doesn't look as crisp as the PDF file.
png("100kPoints72dpi.png", units="px", width=400, height=400)
plot(x, main="100,000 points", col=adjustcolor("black", alpha=0.2))
dev.off()
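To reproduce the size comparison, the following sketch writes the same plot to both a PDF and a 400 x 400 pixel PNG in a temporary directory and reports their sizes with file.size(). The file names and the helper function draw() are illustrative, and the exact sizes will differ from the numbers above depending on platform and graphics device.

```r
# Sketch: write the same 100k-point plot to PDF and PNG and compare file sizes
set.seed(1)  # reproducible example data
x <- rnorm(100000)
draw <- function() {
  plot(x, main = "100,000 points", col = adjustcolor("black", alpha = 0.2))
}

pdf_file <- file.path(tempdir(), "100kPoints.pdf")
png_file <- file.path(tempdir(), "100kPoints72dpi.png")

pdf(pdf_file)            # vector format: every single point is stored
draw()
invisible(dev.off())

png(png_file, units = "px", width = 400, height = 400)  # bitmap: fixed pixel grid
draw()
invisible(dev.off())

# File sizes in KB
sizes_kb <- setNames(file.size(c(pdf_file, png_file)) / 1024, c("pdf", "png"))
round(sizes_kb)
```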

Hence, I increase the resolution to 150 dots per inch.
png("100kHighRes150dpi.png", units="px", width=400, height=400, res=150)
plot(x, main="100,000 points", col=adjustcolor("black", alpha=0.2))
dev.off()

This looks a bit odd. The file size is only 29 KB, but the annotations look too big. Well, the file still has only 400 x 400 pixels. At 150 dpi, R treats them as a canvas of only about 2.7 x 2.7 inches, so text and plotting symbols, whose sizes are set in points, take up roughly twice as many pixels as before. Thus, I have to provide more pixels, or in other words increase the plot size. Doubling the width and height as I double the resolution makes sense.
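The arithmetic behind this is simple: the canvas size in inches is the pixel count divided by the resolution, and text of a given point size (1 pt = 1/72 inch) occupies pointsize * res / 72 pixels. A small sketch with two hypothetical helper functions, assuming png()'s default pointsize of 12 and that a missing res behaves like 72 ppi:

```r
# Hypothetical helper: physical canvas size in inches for `px` pixels at `res` ppi
canvas_in <- function(px, res) px / res

# Hypothetical helper: approximate pixel height of text set at `pointsize` points
text_px <- function(pointsize, res) pointsize * res / 72

canvas_in(400, 150)  # about 2.67 inches: a small canvas
text_px(12, 72)      # 12 pixels at the default resolution
text_px(12, 150)     # 25 pixels: roughly twice as large on the same 400 px
canvas_in(800, 150)  # about 5.33 inches: doubling the pixels restores the proportions
```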
png("100kHighRes150dpi2.png", units="px", width=800, height=800, res=150)
plot(x, main="100,000 points", col=adjustcolor("black", alpha=0.2))
dev.off()

Next, I increase the resolution further to 300 dpi and the graphic size to 1600 x 1600 pixels. The chart is still very crisp. Of course, the file size has increased too: it is now 654 KB, yet still only about 1/8 of the PDF, and I can embed it in LaTeX just as well.
png("100kHighRes300dpi.png", units="px", width=1600, height=1600, res=300)
plot(x, main="100,000 points", col=adjustcolor("black", alpha=0.2))
dev.off()
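Rather than scaling the pixel dimensions by hand each time the resolution goes up, png() also accepts the size in inches via units = "in"; it then derives the pixel dimensions from res, so annotations keep the same size relative to the chart. A minimal sketch, where the 4 x 4 inch size and the file name are arbitrary choices:

```r
# Sketch: specify the chart size in inches; png() converts it to pixels via res
x <- rnorm(100000)
png_file <- file.path(tempdir(), "100kPoints4in300dpi.png")

png(png_file, units = "in", width = 4, height = 4, res = 300)  # 1200 x 1200 px
plot(x, main = "100,000 points", col = adjustcolor("black", alpha = 0.2))
invisible(dev.off())

file.exists(png_file)
```

Changing only res (say to 150) would then halve the pixel count while leaving the layout of title, axes and points unchanged.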


Review: Kölner R Meeting 18 October 2013

The Cologne R user group met last Friday for two talks, on split-apply-combine in R and on XLConnect, by Bernd Weiß and Günter Faes respectively, before the usual Schnitzel and Kölsch at the Lux.

Split apply combine in R

The apply family of functions in R is incredibly powerful, yet often somewhat mysterious to newcomers. Thus, Bernd gave an overview of the different apply functions and their cousins. The various functions differ in their inputs, e.g. vectors, arrays, data frames or lists, and in their outputs. Other related functions are by, aggregate and ave. While functions like aggregate reduce the output size, others like ave return as many elements as the input and repeat the results where necessary.
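A minimal sketch of the difference between reducing and repeating functions, using the built-in mtcars data (the example is mine, not from Bernd's slides):

```r
# Split-apply-combine with base R on the built-in mtcars data

# tapply: split mpg by number of cylinders, take the mean, return a named vector
mpg_by_cyl <- tapply(mtcars$mpg, mtcars$cyl, mean)

# aggregate: the same reduction, returned as a data frame with one row per group
agg <- aggregate(mpg ~ cyl, data = mtcars, FUN = mean)

# ave: the group mean repeated for every row, so the output length matches the input
mpg_ave <- ave(mtcars$mpg, mtcars$cyl, FUN = mean)

nrow(agg)         # 3 (one row per cylinder group)
length(mpg_ave)   # 32 (one value per car)
```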

As an alternative to the base R functions, Bernd also touched on the **ply functions of the plyr package. Their names are certainly easier to remember, but their syntax, e.g. the .() function for quoting grouping variables, can be a little awkward. Bernd's slides, in German, are already available from our Meetup site.
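For comparison, a minimal plyr sketch of the same per-group mean, assuming the plyr package is installed. The first two letters of ddply name the input and output types (data frame in, data frame out), and .() quotes the grouping variable:

```r
library(plyr)

# ddply: split mtcars by cyl, apply summarise to each piece, combine into a data frame
res <- ddply(mtcars, .(cyl), summarise, mean.mpg = mean(mpg))
res
```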

XLConnect

When dealing with data stored in spreadsheets, most members of the group rely on read.csv and write.csv in R. However, if you have a spreadsheet with multiple tabs and formatted numbers, read.csv becomes clumsy, as you have to save each tab, stripped of its formatting, as a separate file.

Günter presented the XLConnect package as an alternative to read.csv, or indeed RODBC, for reading spreadsheet data. It uses the Apache POI API as the underlying interface. XLConnect requires a Java runtime environment on your computer, but no installation of Excel. That makes it a truly platform-independent solution for exchanging data between spreadsheets and R. Not only can you read defined rows and columns, or indeed named ranges, from Excel into R, but you can also write data back to Excel files in the same way and, to top it all, even graphic output from R.
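A minimal round-trip sketch, assuming XLConnect and a Java runtime are installed; the file name, sheet name and named range carsRange are hypothetical. It writes the first ten rows of mtcars to a new workbook, defines a named range, and reads back both a block of cells and the named range:

```r
library(XLConnect)  # needs a Java runtime, but no Excel installation

xls_file <- file.path(tempdir(), "xlconnect-demo.xlsx")

# Write: create a workbook, add a sheet, write a data frame, define a named range
wb <- loadWorkbook(xls_file, create = TRUE)
createSheet(wb, name = "cars")
writeWorksheet(wb, head(mtcars, 10), sheet = "cars")  # header in row 1, data in rows 2-11
createName(wb, name = "carsRange", formula = "cars!$A$1:$C$11")
saveWorkbook(wb)

# Read: a defined block of rows and columns (first row is the header), and the named range
wb <- loadWorkbook(xls_file)
block <- readWorksheet(wb, sheet = "cars", startRow = 1, endRow = 6,
                       startCol = 1, endCol = 2)
named <- readNamedRegion(wb, name = "carsRange")
```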

Next Kölner R meeting

The next meeting is scheduled for 13 December 2013. A discussion of the data.table package is already on the agenda.

Please get in touch if you would like to present and share your experience, or indeed if you have a request for a topic you would like to hear more about. For more details see also our Meetup page.

Thanks again to Bernd Weiß for hosting the event and Revolution Analytics for their sponsorship.

Next Kölner R User Meeting: 18 October 2013

Quick reminder: The next Cologne R user group meeting is scheduled for this Friday, 18 October 2013. We will discuss and hear about the apply family of functions and the XLConnect package. Further details and the agenda are available on our KölnRUG Meetup site. Please sign up if you would like to come along. Notes from past meetings are available here.

Thanks to Revolution Analytics, who sponsors the Cologne R user group as part of their vector programme.

Why models need a certain culture to flourish

About half a year ago Ian Branagan, Chief Risk Officer of Renaissance Re (a Bermudian reinsurance company focusing on property catastrophe insurance), gave a talk about the use of models in risk management and how they have evolved over the last twenty years. Ian's presentation, titled with George E.P. Box's famous quote "All models are wrong, but some are useful", was part of the lunchtime lecture series at Lloyd's, organised by the Insurance Institute of London.

I re-discovered the talk online over the weekend and found it most enlightening again.

So, what makes models useful? Here I mean models that estimate extreme outcomes or percentiles. According to Ian, three factors are critical to embedding models successfully in risk management and decision making processes:
1. Need - A clearly defined need for the model.
2. Capabilities - The skills and resources to build and maintain the model.
3. Culture - An organisational culture that embraces, understands and challenges the model.
The need, if not driven internally, is often imposed by external requirements such as regulation; e.g. banks and insurers in many countries have to use models to estimate their risk of insolvency. Building capabilities can largely be achieved by investing in people, technology and data. However, the last factor, culture, is according to Ian often the most challenging one. Changing business processes, particularly decision making at senior level, requires people to change.

Where in the past senior management may have relied on advisors' expert judgement to guide their decision making, they now have to use models in a similar way. I suppose that, just as it takes time and effort to build effective relationships with people, the same is true for models. Equally, decisions should never rely purely on other people's opinions or on model output. As Ian put it, outsourcing all modelling and thinking, and with it the decision making, to vendors of models, such as catastrophe modelling companies or rating agencies, which both aim to provide probabilities for extreme events (catastrophes and company failures), may be sufficient to tick a risk management box, but can ultimately put the company at risk if model assumptions and limitations are not well understood.

Perhaps we are at the dawn of another enlightenment? Recall the first sentence of Kant's essay "What is Enlightenment?": "Enlightenment is man's emergence from his self-incurred immaturity." Indeed, it doesn't matter whether we use experts' opinions or the output of models; relying blindly on them is dangerous and foolish. Don't stop thinking for yourself. Be critical! Remember, all models are wrong, but some are useful.

Creating a matrix from a long data.frame

There can never be too many examples for transforming data with R. So, here is another example of reshaping a data.frame into a matrix.

Here I have a data frame that shows incremental claim payments over time for different loss occurrence (origin) years.

This long format is how this kind of data is usually stored in a database. However, I would like to see the payments of the different origin years in the rows of a matrix.

The first idea might be to use the reshape function, but that would return a data.frame. However, it is actually much easier with the matrix function itself. Most of the code below is about formatting the dimension names of the matrix. Note that I use the with function to save myself a bit of typing.
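A minimal sketch of the matrix approach, assuming a long data frame with hypothetical columns origin, dev and inc.paid and made-up payments (not the data from this post). Indexing with a two-column matrix assigns each payment to its (origin, dev) cell, so the rows don't even need to be sorted:

```r
# Hypothetical incremental claim payments in long format (illustrative numbers only)
claims <- data.frame(
  origin   = factor(rep(2010:2012, times = 3:1)),
  dev      = c(1:3, 1:2, 1),
  inc.paid = c(100, 50, 20, 110, 55, 120)
)

inc.mat <- with(claims, {
  # Empty origin x development matrix with named dimensions
  M <- matrix(NA_real_, nrow = nlevels(origin), ncol = max(dev),
              dimnames = list(origin = levels(origin), dev = seq_len(max(dev))))
  # A two-column index matrix addresses one cell per row: (origin, dev)
  M[cbind(as.integer(origin), dev)] <- inc.paid
  M
})
inc.mat
```

Cells for development periods that have not been observed yet stay NA, which gives the familiar triangle shape.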

The acast function of the reshape2 package provides an elegant alternative to matrix. It has a convenient formula argument and allows me not only to specify the aggregation function, but also to add the margin totals.
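A minimal acast sketch, assuming reshape2 is installed and using the same hypothetical claims data as a long data frame with columns origin, dev and inc.paid (made-up numbers). With fun.aggregate = sum, unobserved cells are filled with sum of an empty vector, i.e. 0, and margins = TRUE adds "(all)" totals for rows and columns:

```r
library(reshape2)

# Hypothetical incremental claim payments in long format (illustrative numbers only)
claims <- data.frame(
  origin   = factor(rep(2010:2012, times = 3:1)),
  dev      = c(1:3, 1:2, 1),
  inc.paid = c(100, 50, 20, 110, 55, 120)
)

# origin in rows, dev in columns, summed payments plus row/column totals
inc.tot <- acast(claims, origin ~ dev, value.var = "inc.paid",
                 fun.aggregate = sum, margins = TRUE)
inc.tot
```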