As an example I create the following plot:

```r
x <- rnorm(100000)
plot(x, main="100,000 points", col=adjustcolor("black", alpha=0.2))
```

Saving the plot as a PDF creates a 5.2 MB file on my computer, while the PNG output is only 62 KB. Of course, the PNG doesn't look as crisp as the PDF file.

```r
png("100kPoints72dpi.png", units="px", width=400, height=400)
plot(x, main="100,000 points", col=adjustcolor("black", alpha=0.2))
dev.off()
```
Hence, I increase the resolution to 150 dots per inch (dpi).
```r
png("100kHighRes150dpi.png", units="px", width=400, height=400, res=150)
plot(x, main="100,000 points", col=adjustcolor("black", alpha=0.2))
dev.off()
```
This looks a bit odd. The file size is only 29 KB, but the annotations look too big. Well, the file has only 400 x 400 pixels and the size of a pixel is fixed. Thus, I have to provide more pixels, or in other words, increase the plot size. Doubling the width and height as I double the resolution makes sense.
```r
png("100kHighRes150dpi2.png", units="px", width=800, height=800, res=150)
plot(x, main="100,000 points", col=adjustcolor("black", alpha=0.2))
dev.off()
```
Next I increase the resolution further to 300 dpi and the graphic size to 1600 x 1600 pixels. The chart is still very crisp. Of course, the file size increased; now it is 654 KB, yet still only about 1/8 of the PDF, and I can embed it in LaTeX as well.
```r
png("100kHighRes300dpi.png", units="px", width=1600, height=1600, res=300)
plot(x, main="100,000 points", col=adjustcolor("black", alpha=0.2))
dev.off()
```
Note that you can click on the charts to access the original files of this post.
Split apply combine in R
The `apply` family of functions in R is incredibly powerful, yet to newcomers often somewhat mysterious. Thus, Bernd gave an overview of the different `apply` functions and their cousins. The various functions differ in their input objects, e.g. vectors, arrays, data frames or lists, and in their outputs. Other related functions are `aggregate` and `ave`. While functions like `aggregate` reduce the output size, others like `ave` return as many rows as the input object and repeat the results where necessary.
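The difference in output shape can be seen with a small made-up data set (the column and group names below are my own, for illustration only):

```r
df <- data.frame(group = c("a", "a", "b", "b"),
                 value = c(1, 2, 3, 4))

## aggregate reduces the output: one row per group
aggregate(value ~ group, data = df, FUN = mean)
##   group value
## 1     a   1.5
## 2     b   3.5

## ave repeats the group result: as many values as input rows
ave(df$value, df$group)
## [1] 1.5 1.5 3.5 3.5
```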
As an alternative to the base R functions, Bernd also touched on the `**ply` functions of the `plyr` package. The function names are certainly easier to remember, but their syntax can be a little awkward (`.()`). Bernd's slides, in German, are already available from our Meetup site.
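A minimal sketch of the `plyr` syntax, assuming the `plyr` package is installed (the data set and column names are made up for illustration):

```r
library(plyr)  # assumes the plyr package is installed

df <- data.frame(group = c("a", "a", "b", "b"),
                 value = c(1, 2, 3, 4))

## ddply: data frame in, data frame out; note plyr's .() quoting syntax
ddply(df, .(group), summarise, mean.value = mean(value))
##   group mean.value
## 1     a        1.5
## 2     b        3.5
```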
XLConnect

When dealing with data stored in spreadsheets, most members of the group rely on `read.csv` and `write.csv` in R. However, if you have a spreadsheet with multiple tabs and formatted numbers, `read.csv` becomes clumsy, as you would have to save each tab without any formatting in a separate file.
Günter presented the `XLConnect` package as an alternative to `RODBC` for reading spreadsheet data. It uses the Apache POI API as the underlying interface.
`XLConnect` requires a Java runtime environment on your computer, but no installation of Excel. That makes it a truly platform-independent solution for exchanging data between spreadsheets and R. Not only can you read defined rows and columns, or indeed named ranges, from Excel into R, but in the same way data can be stored in Excel files again and, to top it all, graphic output from R as well.
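A minimal round-trip sketch, assuming `XLConnect` and a Java runtime are installed (the file and sheet names are hypothetical):

```r
library(XLConnect)  # assumes XLConnect and a Java runtime are installed

df <- data.frame(x = 1:3, y = c("a", "b", "c"))
file <- tempfile(fileext = ".xlsx")  # hypothetical target file

## Write a data frame to a named sheet, then read it back
writeWorksheetToFile(file, data = df, sheet = "MySheet")
df2 <- readWorksheetFromFile(file, sheet = "MySheet")
```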
Next Kölner R meeting

The next meeting is scheduled for 13 December 2013. A discussion of the data.table package is already on the agenda.
Please get in touch if you would like to present and share your experience, or indeed if you have a request for a topic you would like to hear more about. For more details see also our Meetup page.
Thanks again to Bernd Weiß for hosting the event and Revolution Analytics for their sponsorship.
Quick reminder: The next Cologne R user group meeting is scheduled for this Friday, 18 October 2013. We will discuss and hear about the apply family of functions and the XLConnect package. Further details and the agenda are available on our KölnRUG Meetup site. Please sign up if you would like to come along. Notes from past meetings are available here.
I re-discovered the talk online over the weekend and found it most enlightening again.
So, what makes models useful? And here I mean models that estimate extreme outcomes / percentiles. Three factors are critical, according to Ian, to embed models successfully in risk management and decision making processes.
- Need - A clearly defined need for the model.
- Capabilities - The skills and resources to build and maintain the model.
- Culture - An organisational culture that embraces, understands and challenges the model.
Where in the past senior management may have relied on advisors' expert judgement to guide them in their decision making, they now have to use models in a similar way. I suppose that, just as it takes time and effort to build effective relationships with people, the same is true for models. And equally, decisions should never rely purely on either other people's opinions or indeed model output. As Ian put it, outsourcing all modelling/thinking, and with that the decision making, to vendors of models, such as catastrophe modelling companies or rating agencies, who both aim to provide probabilities for extreme events (catastrophes and company failures), may be sufficient to tick a risk management box, but can ultimately put the company at risk if model assumptions and limitations are not well understood.
Perhaps we are at the dawn of another enlightenment? Recall Kant's first sentence of his essay What is enlightenment?: "Enlightenment is man's emergence from his self-incurred immaturity." Indeed, it doesn't matter if we use experts' opinions or the output of models, relying blindly on them is dangerous and foolish. Don't stop thinking for yourself. Be critical! Remember, all models are wrong, but some are useful.
Here I have a data frame that shows incremental claim payments over time for different loss occurrence (origin) years.
The format of the data frame above is how this kind of data is usually stored in a database. However, I would like to see the payments of the different origin years in the rows of a matrix.
The first idea might be to use the `reshape` function, but that would return a `data.frame`. Yet, it is actually much easier with the `matrix` function itself. Most of the code below is about formatting the dimension names of the matrix. Note that I use the `with` function to save me a bit of typing.
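Since the original data frame isn't reproduced here, the following sketch uses made-up payment figures; the column names `origin`, `dev` and `payments` are assumptions:

```r
## Made-up incremental claims payments in 'long' format,
## ordered by origin year and then development period
claims <- data.frame(
  origin   = rep(2010:2012, each = 3),
  dev      = rep(1:3, 3),
  payments = c(100, 50, 20, 110, 55, 22, 120, 60, 24)
)

## Reshape into a matrix: origin years in rows, development periods
## in columns. Most of the work is formatting the dimension names;
## 'with' saves typing the data frame name repeatedly.
tria <- with(claims,
  matrix(payments, nrow = length(unique(origin)), byrow = TRUE,
         dimnames = list(origin = sort(unique(origin)),
                         dev    = sort(unique(dev)))))
tria
```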
An elegant alternative to `matrix` is the `acast` function of the `reshape2` package. It has a nice formula argument and allows me not only to specify the aggregation function, but also to add the margin totals.
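A sketch of the `acast` approach, assuming the `reshape2` package is installed (the data frame and its column names are made up for illustration):

```r
library(reshape2)  # assumes the reshape2 package is installed

claims <- data.frame(origin   = rep(2010:2012, each = 3),
                     dev      = rep(1:3, 3),
                     payments = c(100, 50, 20, 110, 55, 22, 120, 60, 24))

## Formula interface: rows ~ columns, sum as aggregation function,
## with "(all)" margin totals added for rows and columns
acast(claims, origin ~ dev, sum, value.var = "payments", margins = TRUE)
```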