mages' blog

Next Kölner R User Meeting: Friday, 26 February 2016

The 17th Cologne R user group meeting is scheduled for this Friday, 26 February 2016. We have two talks, followed by networking drinks.

• Introduction to Bayesian Regression Models using Stan with the brms package - Paul-Christian Bürkner (Uni Münster)
• RKWard: A Graphical User Interface and Integrated Development Environment for Statistical Analysis with R - Meik Michalke (Uni Düsseldorf)
Venue: Microsoft Deutschland, Holzmarkt 2a, 50676 Köln

Notes from past meetings are available here.

Bayesian Mixer on Meetup

We had our first successful Bayesian Mixer Meetup last Friday night at the Artillery Arms!

We expected about 15 to 20 people to turn up when we booked the function room overlooking Bunhill Cemetery and Bayes' grave. Now, looking at the photos taken during the evening, it seems that our prior belief was pretty good.

The event started with a talk from my side about some very basic Bayesian models, which I used a while back to get my head around the concepts in an insurance context. My talk "Experience vs Data" was based on presentations I had given last year at LondonR and the Warsaw R user group.

Jon Sedar followed with a fascinating talk about outlier detection using PyMC3.

Suppose you have a bunch of data points, most of them centred, but with some further away. How do you decide whether they are outliers or not?

This question sounds very relevant to me in the insurance context as well. I have heard underwriters tell me that certain years or events (meaning costly losses) were freaks and should be disregarded, or in other words, that without those losses the underwriter would have made a huge profit. I am not sure I buy those arguments, as they undermine the fundamental business proposition of insurance: to pay when policyholders experience 'freak' events. I am getting on my soap box, which I shouldn't.

We had a good night, very good discussions and some drinks. As a result Jon and I are committed to organise another event.

Jon has already set up a Meetup page, so please register online and get in touch with ideas, venues, talks, etc.

Using SVG graphics in blog posts

My traditional workflow for embedding R graphics into a blog post has been via PNG files that I upload online. However, when I created a 'simple' graphic with only basic curves and triangles for a recent post, I noticed that the PNG output didn't look as crisp as I expected it to be. So, eventually I used an SVG (scalable vector graphic) instead.

Creating an SVG file with R couldn't be easier; e.g. use the svg() function in the same way as png(). Next, make the file available online and embed it into your page. There are many ways to do this; in the example here I placed the file into a public GitHub repository.
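As a minimal sketch of what "in the same way as png()" means, the snippet below writes the same plot to both devices. The temporary file names and the toy plot are my own choices for illustration, and svg() assumes that R was built with cairo support (check capabilities("cairo")).

```r
# Illustrative file names; any writable path would do.
png_file <- tempfile(fileext = ".png")
svg_file <- tempfile(fileext = ".svg")

# Raster device: width and height are given in pixels by default.
png(png_file, width = 480, height = 480)
plot(1:10, type = "b", main = "Example")
dev.off()

# Vector device: width and height are given in inches.
svg(svg_file, width = 7, height = 7)
plot(1:10, type = "b", main = "Example")
dev.off()

# Both files should now exist on disk.
file.exists(c(png_file, svg_file))
```

Note the one difference in the arguments: png() sizes the output in pixels, whereas svg() sizes it in inches.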

To embed the figure into my page I could use either the traditional <img> tag, or perhaps better the <object> tag. Paul Murrell provides further details on his blog.

With <object> my code looks like this:
<object data="https://rawgithub.com/mages/diesunddas/master/Blog/transitionPlot.svg" type="image/svg+xml" width="400"> </object>

There is a little trick required to display a graphic file hosted on GitHub.

By default, when I look for the raw URL, GitHub will provide an address starting with https://raw.githubusercontent.com/..., which needs to be replaced with https://rawgithub.com/....
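The replacement described above can be sketched in a line of R. The variable names are hypothetical, and the raw URL is assumed from the embed address shown earlier in this post.

```r
# Raw address as GitHub reports it (assumed from the embed URL above).
raw_url <- "https://raw.githubusercontent.com/mages/diesunddas/master/Blog/transitionPlot.svg"

# Swap the host prefix; the rest of the path stays unchanged.
embed_url <- sub("^https://raw\\.githubusercontent\\.com/",
                 "https://rawgithub.com/",
                 raw_url)
embed_url
```

The resulting address is the one used in the data attribute of the <object> tag.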

Ok, let's look at the output. As a nice example plot I use a transitionPlot by Max Gordon, something I wanted to do for a long time.

Conclusions

The SVG output is nice and crisp! Zoom in and the quality will not change. The PNG graphic, on the other hand, appears a little blurry on my screen and even the colours look washed out. Of course, the PNG output could be improved by fiddling with the parameters. But after all, it is a raster graphic.

Yet, I don't think that SVG is always a good answer. The size of an SVG file can grow quite quickly if there are many points to be plotted. As an example, check the difference in file size for two identical plots with 10,000 points.
x <- rnorm(10000)
png()
plot(x)
dev.off()
file.size("Rplot001.png")/1000
# [1] 118.071
svg()
plot(x)
dev.off()
file.size("Rplot001.svg")/1000
# [1] 3099.181

That's 3.1 MB vs 118 kB, a factor of 26! Even compressed to a .svgz file, the SVG file is still 317 kB.

Update 10 Feb 2016

Or, is SVG the answer? Kenton pointed me towards the svglite package.
library(svglite)
svglite(file = "Rplot001.svg")
plot(x)
dev.off()
file.size("Rplot001.svg")/1000
# [1] 973.619
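# Gzip-compress a text file and return the path of the compressed copy
gz <- function(in_path, out_path = tempfile()) {
  out <- gzfile(out_path, "w")
  # Copy the SVG text into the gzip connection; without this step
  # the compressed file would be empty
  writeLines(readLines(in_path), out)
  close(out)
  invisible(out_path)
}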
file.size(gz("Rplot001.svg", "Rplot001.svgz")) / 1000
#> [1] 74.11

Session Info

R version 3.2.3 (2015-12-10)
Platform: x86_64-apple-darwin13.4.0 (64-bit)
Running under: OS X 10.11.3 (El Capitan)

locale:
[1] en_GB.UTF-8/en_GB.UTF-8/en_GB.UTF-8/C/en_GB.UTF-8/en_GB.UTF-8

attached base packages:
[1] grid      stats     graphics  grDevices utils     datasets
[7] methods   base

other attached packages:
[1] RColorBrewer_1.1-2 Gmisc_1.3          htmlTable_1.5
[4] Rcpp_0.12.3

loaded via a namespace (and not attached):
[1] Formula_1.2-1       knitr_1.12.3
[3] cluster_2.0.3       magrittr_1.5
[5] splines_3.2.3       munsell_0.4.2
[7] colorspace_1.2-6    lattice_0.20-33
[9] stringr_1.0.0       plyr_1.8.3
[11] tools_3.2.3         nnet_7.3-12
[13] gtable_0.1.2        latticeExtra_0.6-26
[15] htmltools_0.3       digest_0.6.9
[17] forestplot_1.4      survival_2.38-3
[19] abind_1.4-3         gridExtra_2.0.0
[21] ggplot2_2.0.0       acepack_1.3-3.3
[23] rsconnect_0.3.79    rpart_4.1-10
[25] rmarkdown_0.9.2     stringi_1.0-1
[27] scales_0.3.0        Hmisc_3.17-1
[29] XML_3.98-1.3        foreign_0.8-66

First Bayesian Mixer Meeting in London

There is a nice pub between Bunhill Fields and the Royal Statistical Society in London: The Artillery Arms. Clearly, the perfect place to bring people together to talk about Bayesian Statistics. Well, that’s what Jon Sedar (@jonsedar, applied.ai) and I thought.

 Source: http://www.artillery-arms.co.uk/
Hence, we’d like to organise a Bayesian Mixer Meetup on Friday, 12 February, 19:00. We booked the upstairs function room at the Artillery Arms and if you look outside the window, you can see Thomas Bayes’ grave.

We intend the group to be small (announcing only on the Stan user group, pymc-devs gitter, and here for now) and geared to open discussion of Bayesian inference, tools, techniques and theory. Neither of us is a great expert; we're really just users of the tools, but we'd love to welcome academic discussion as well as real-world examples.

Jon is more the Python/PyMC guy, while I come from the R/RStan corner. We will prepare two talks to kick this off. Jon will talk about GLM Robust Regression with Outlier Detection using PyMC3, while I will talk about Experience vs Data with some stories from insurance and actuarial science, sprinkled with RStan examples.

If you would like to join us, please get in touch via the form below, so that we can keep tabs on numbers. If this all goes well, we shall set up a Meetup site.