mages' blog

Using SVG graphics in blog posts

My traditional work flow for embedding R graphics into a blog post has been via a PNG files that I upload online. However, when I created a 'simple' graphic with only basic curves and triangles for a recent post, I noticed that the PNG output didn't look as crisp as I expected it to be. So, eventually I used a SVG (scalable vector graphic) instead.

Creating a SVG file with R could't be easier; e.g. use the svg() function in the same way as png(). Next, make the file available online and embed it into your page. There are many ways to do this, in the example here I placed the file into a public GitHub repository.

To embed the figure into my page I could use either the traditional <img> tag, or perhaps better the <object> tag. Paul Murrell provides further details on his blog.

With <object> my code looks like this:
<object data="https://rawgithub.com/mages/diesunddas/master/Blog/transitionPlot.svg" type="image/svg+xml" width="400"> </object>

There is a little trick required to display a graphic file hosted on GitHub.

By default, when I look for the raw URL, GitHub will provide an address starting with https://raw.githubusercontent.com/..., which needs to be replaced with https://rawgithub.com/....

Ok, let's look at the output. As a nice example plot I use a transitionPlot by Max Gordon, something I wanted to do for a long time.

SVG output

PNG output


Conclusions

The SVG output is nice and crisp! Zoom in and the quality will not change. The PNG graphic on the other hand appears a little blurry on my screen and even the colours look washed out. Of course, the PNG output could be improved by fiddling with the parameters. But, after all it is a raster graphic.

Yet, I don't think that SVG is always a good answer. The file size of an SVG file can grow quite quickly, if there are many points to be plotted. As an example check the difference in file size for two identical plots with 10,000 points.
x <- rnorm(10000)
png()
plot(x)
dev.off()
file.size("Rplot001.png")/1000
# [1] 118.071
svg()
plot(x)
dev.off()
file.size("Rplot001.svg")/1000
# [1] 3099.181
That's 3.1 Mb vs 118 kb, a factor of 26! Even compressed to a .svgz file, the SVG file is still 317kb.

R code


Session Info

R version 3.2.3 (2015-12-10)
Platform: x86_64-apple-darwin13.4.0 (64-bit)
Running under: OS X 10.11.3 (El Capitan)

locale:
[1] en_GB.UTF-8/en_GB.UTF-8/en_GB.UTF-8/C/en_GB.UTF-8/en_GB.UTF-8

attached base packages:
[1] grid      stats     graphics  grDevices utils     datasets 
[7] methods   base     

other attached packages:
[1] RColorBrewer_1.1-2 Gmisc_1.3          htmlTable_1.5     
[4] Rcpp_0.12.3       

loaded via a namespace (and not attached):
 [1] Formula_1.2-1       knitr_1.12.3       
 [3] cluster_2.0.3       magrittr_1.5       
 [5] splines_3.2.3       munsell_0.4.2      
 [7] colorspace_1.2-6    lattice_0.20-33    
 [9] stringr_1.0.0       plyr_1.8.3         
[11] tools_3.2.3         nnet_7.3-12        
[13] gtable_0.1.2        latticeExtra_0.6-26
[15] htmltools_0.3       digest_0.6.9       
[17] forestplot_1.4      survival_2.38-3    
[19] abind_1.4-3         gridExtra_2.0.0    
[21] ggplot2_2.0.0       acepack_1.3-3.3    
[23] rsconnect_0.3.79    rpart_4.1-10       
[25] rmarkdown_0.9.2     stringi_1.0-1      
[27] scales_0.3.0        Hmisc_3.17-1       
[29] XML_3.98-1.3        foreign_0.8-66

First Bayesian Mixer Meeting in London

There is a nice pub between Bunhill Fields and the Royal Statistical Society in London: The Artillery Arms. Clearly, the perfect place to bring people together to talk about Bayesian Statistics. Well, that’s what Jon Sedar (@jonsedar, applied.ai) and I thought.

Source: http://www.artillery-arms.co.uk/
Hence, we’d like to organise a Bayesian Mixer Meetup on Friday, 12 February, 19:00. We booked the upstairs function room at the Artillery Arms and if you look outside the window, you can see Thomas Bayes’ grave.

We intend the group to be small (announcing only on the stan user group, pymc-devs gitter, and here for now) and geared to open discussion of Bayesian inference, tools, techniques and theory. Neither of us is a great expert, we're really just users of the tools, but we'd love to welcome academic discussion as well as real world examples etc.

Jon is more the Python/PyMC guy, while I come from the R/Rstan corner. We will prepare two talks to kick this off. Jon will talk about GLM Robust Regression with Outlier Detection using PyMC3, while I will talk about Experience vs Data with some stories from insurance and actuarial science, sprinkled with RStan examples.

If you would like to join us, please get in touch via the form below, so that we can keep tabs on numbers, and if this goes all well we shall set up a Meetup site.

Flowing triangles

I have admired the work of the artist Bridget Riley for a long time. She is now in her eighties, but as it seems still very creative and productive. Some of her recent work combines simple triangles in fascinating compositions. The longer I look at them, the more patterns I recognise.

Yet, the actual painting can be explained easily, in a sense of a specification document to reproduce the pattern precisely. However, seeing the real print, as I had the chance at the London Art Fair last week, and a reproduction on the screen is incommensurable.

Having said that, I could not resist programming a figure that resembles the artwork labelled Bagatelle 2. Well, at least I can say that I learned more about grid [1], grid.path [2] and gridSVG [3] in R.

Inspired by Bridget Riley Bagatelle 2

R Code


References

[1] P. Murrell. R Graphics, Second Edition. CRC Press. 2011
[2] P. Murrell. What's in a Name? . The R Journal, 4(2):5–12, dec 2012.
[3] P. Murrell and S. Potter. gridSVG: Export grid graphics as SVG. R package 1.5-0. 2015

Session Info

R version 3.2.3 (2015-12-10)
Platform: x86_64-apple-darwin13.4.0 (64-bit)
Running under: OS X 10.11.2 (El Capitan)

locale:
[1] en_GB.UTF-8/en_GB.UTF-8/en_GB.UTF-8/C/en_GB.UTF-8/en_GB.UTF-8

attached base packages:
[1] grid      stats     graphics  grDevices utils  datasets 
[7] methods   base     

other attached packages:
[1] gridSVG_1.5-0    data.table_1.9.6

loaded via a namespace (and not attached):
[1] tools_3.2.3   RJSONIO_1.3-0 chron_2.3-47  XML_3.98-1.3

Formatting table output in R

Formatting data for output in a table can be a bit of a pain in R. The package formattable by Kun Ren and Kenton Russell provides some intuitive functions to create good looking tables for the R console or HTML quickly. The package home page demonstrates the functions with illustrative examples nicely.

There are a few points I really like:
  • the functions accounting, currency, percent transform numbers into better human readable output
  • cells can be highlighted by adding color information
  • contextual icons can be added, e.g. from Glyphicons
  • output can be displayed in RStudio's viewer pane

The CRAN Task View: Reproducible Research lists other packages as well that help to create tables for web output, such as compareGroups, DT, htmlTable, HTMLUtils, hwriter, Kmisc, knitr, lazyWeave, SortableHTMLTables, texreg and ztable. Yet, if I am not mistaken, most of these packages focus more on generating complex tables with multi-columns rows, footnotes, math notation, etc, than the points I mentioned above.

Finally, here is a little formattable example from my side:



Session Info

R version 3.2.3 (2015-12-10)
Platform: x86_64-apple-darwin13.4.0 (64-bit)
Running under: OS X 10.11.2 (El Capitan)

locale:
[1] en_GB.UTF-8/en_GB.UTF-8/en_GB.UTF-8/C/en_GB.UTF-8/en_GB.UTF-8

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods  
[7] base     

other attached packages:
[1] formattable_0.1.5

loaded via a namespace (and not attached):
 [1] shiny_0.12.2.9006 htmlwidgets_0.5.1 R6_2.1.1         
 [4] rsconnect_0.3.79  markdown_0.7.7    htmltools_0.3    
 [7] tools_3.2.3       yaml_2.1.13       Rcpp_0.12.2      
[10] highr_0.5.1       knitr_1.12        jsonlite_0.9.19  
[13] digest_0.6.9      xtable_1.8-0      httpuv_1.3.3     
[16] mime_0.4  

R in Insurance: Registration and abstract submission opened

Following the successful 3rd R in Insurance conference in Amsterdam last year, we return to London this year.

The registration for the 4th conference on R in Insurance on Monday 11 July 2016 at Cass Business School has opened.

This one-day conference will focus again on applications in insurance and actuarial science that use R, the lingua franca for statistical computation.

The intended audience of the conference includes both academics and practitioners who are active or interested in the applications of R in insurance.

Invited talks will be given by:
Details about the registration and abstract submission are given on the dedicated R in Insurance page at Cass Business School, London.

The submission deadline for abstracts is 28 March 2016. Please email your abstract of no more than 300 words to: rinsuranceconference@gmail.com.

Attendance of the whole conference is the equivalent of 6.5 hours of CPD for members of the Actuarial Profession.

For more information about the past events visit www.rininsurance.com.

Sponsors

The organisers gratefully acknowledge the sponsorship of Verisk/ISO, Mirai Solutions, RStudio, Applied AI, and CYBAEA.

Gold Sponsors

ISO2
Mirai Logo

Silver Sponsors

RStudio
AAI
Cybaea

Next Kölner R User Meeting: Friday, 4 December 2015

Koeln R
The 16th Cologne R user group meeting is scheduled for this Friday, 4 December 2015 and we have great line up with three talks followed by networking drinks.

Venue: Startplatz, Im Mediapark, 550670 Köln

Drinks and Networking

The event will be followed by drinks (Kölsch!) and networking opportunities.

For further details visit our KölnRUG Meetup site. Please sign up if you would like to come along. Notes from past meetings are available here.

The organisers, Kirill Pomogajko and Markus Gesmann, gratefully acknowledge the sponsorship of Revolution Analytics, who support the Cologne R user group as part of their Matrix programme.

Notes from Warsaw R meetup

I had the great pleasure time to attend the Warsaw R meetup last Thursday. The organisers Olga Mierzwa and Przemyslaw Biecek had put together an event with a focus on R in Insurance (btw, there is a conference with the same name), discussing examples of pricing and reserving in general and life insurance.

Experience vs. Data

I kicked off with some observations of the challenges in insurance pricing. Accidents are thankfully rare events, that's why we buy insurance. Hence, there is often not a lot of claims data available for pricing. Combining the information from historical data and experts with domain knowledge can provide a rich basis for the assessment of risk. I presented some examples using Bayesian analysis to understand the probability of an event occurring. Regular readers of my blog will recognise the examples from earlier posts. You find my slides on GitHub.
Download slides

Non-life insurance in R

Emilia Kalarus from Triple A shared some of her experience of using R in non-life insurance companies. She focused on the challenges in working across teams, with different systems, data sets and mentalities.

As an example Emilia talked about the claims reserving process, which in her view should be embedded in the full life cycle of insurance, namely product development, claims, risk and performance management. Following this thought she presented an idea for claims reserving that models the live of a claim from not incurred and not reported (NINR), to incurred but not reported (IBNR), reported but not settled (RBNS) and finally paid.

Stochastic mortality modelling

The final talk was given by Adam Wróbel from the life insurer Nationale Nederlanden, discussing stochastic mortality modelling. Adam's talk on analysing mortality made me realise that life and non-life insurance companies may be much closer to each other than I thought.

Although life and non-life companies are usually separated for regulatory reasons, they both share the fundamental challenge of predicting future cash-flows. An example where the two industries meet is product liability.

Over the last century technology has changed our environment fundamentally, more so then ever before. Yet, we still don't know which long term impact some of the new technologies and products will have on our life expectancy. Some will prolong our lives, others may make us ill.

A classic example is asbestos, initial regraded as a miracle mineral, as it was impossible to set on fire, abundant, cheap to mine, and easy to manufacture. Not surprisingly it was widely used until it was linked to cause cancer. Over the last 35 years the non-life insurance industry has paid well in excess of hundred billion dollars in compensations.

Slides and Code

The slides and R code of the presentations are hosted on the Warsaw R GitHub page.