mages' blog

Notes from the 4th R in Insurance Conference

The 4th R in Insurance conference took place at Cass Business School London on 11 July 2016. This one-day conference focused once more on the wide range of applications of R in insurance, actuarial science and beyond. The conference programme covered topics including reserving, pricing, loss modelling, the use of R in a production environment and much more.

The audience of the conference included both practitioners (c.80%) and academics (c.20%) who are active or interested in the applications of R in Insurance. It was a truly international event with speakers and delegates from Europe, Asia and the Americas. The coffee breaks and conference dinner offered great networking opportunities.

Mario Wüthrich, ETH Zürich

In the first plenary session Mario Wüthrich (RiskLab, ETH Zürich) spoke about the (new) challenges in actuarial science. While the fundamentals of analysing data have not changed over the years, the data and technology available have, and with that new challenges have emerged. Yet, as Mario pointed out, insurance is still often concerned with analysing 'little' data, as losses occur rarely. Furthermore, the bigger data sets, often generated by sensors, require careful calibration, monitoring and cleansing. These new challenges provide opportunities for new research (if data is made available) and for the industry. The R community can provide links between the two. Mario would like to see more and better documentation of R packages, more insurance examples and better handling of big data.

Thereafter, the programme consisted of a combination of contributed presentations and lightning talks, as well as a panel discussion on how analytics is transforming the insurance business. Adrian Cuc (Verisk), Simon Brickman (Beazley), Roland Schmid (Mirai Solutions) and Markus Gesmann (Vario Partners) discussed the efforts made to bridge the gap between data vendors, consultants and insurers, as well as the challenges of developing collaborative business models that respond to market needs.

Dan Murphy, Trinostics

In the closing plenary, Dan Murphy (Trinostics, San Francisco) shared his experience as an actuary of how to give persuasive advice to senior management. He uses the three Cs: context, confidence and clarity. Context is about articulating the problem in language senior management can understand: why does management need to worry about the problem? Confidence means that if you have a solution, you have to deliver it with conviction because, most importantly, it has to be actionable. Clarity of your actionable insight ensures that management can delegate those actions to the relevant team or employee without you in the room.

The slides of the conference are available on request.

Scientific committee and sponsors

The members of the scientific committee were: Katrien Antonio (KU Leuven, UvA), Christophe Dutang (Université du Maine), Markus Gesmann (Vario Partners), Giorgio Spedicato (UnipolSai) and Andreas Tsanakas (Cass Business School).

Finally, we are grateful to our sponsors Verisk, Mirai Solutions, Applied AI, RStudio, CYBAEA and Oasis, without whom the event wouldn't be possible.

R in Insurance 2017

We are delighted to announce next year’s event already. The conference will travel across the Channel to ENSAE, Paris, 8 June 2017. Further details will be published on www.rininsurance.com.

Notes from the Kölner R meeting, 9 July 2016

Last Thursday the Cologne R user group came together again. This time, our two speakers arrived from Bavaria, to talk about Spark and R Server.

Introduction to Apache Spark

Dubravko Dulic gave an introduction to Apache Spark and explained why Spark might be of interest to data scientists using R. Spark is designed for cluster computing, i.e. for distributing jobs across several computers. Not all tasks in R can be split easily across several nodes in a cluster, but if you use functions like by in R, then it is most likely doable. The by function splits a data set into several subsets, applies a given function to each subset and collects the results at the end. In the world of Hadoop, this pattern is called MapReduce. Spark has an advanced DAG (directed acyclic graph) execution engine that supports acyclic data flow and in-memory computing. Additionally, Spark has a direct API for R, which makes it relatively easy to write applications with Spark.
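
To make the split-apply-combine idea concrete, here is a minimal sketch in base R. The built-in mtcars data set and the summary function (mean miles per gallon by number of cylinders) are my own choice for illustration and were not part of the talk.

```r
# Split mtcars by number of cylinders, apply a function to each subset
# and collect the results: this is what 'by' does in a single call.
by_result <- by(mtcars, mtcars$cyl, function(d) mean(d$mpg))
print(by_result)

# The same computation written as three explicit steps. The 'apply' step
# works on each piece independently, which is why this kind of job can
# be distributed across the nodes of a cluster.
pieces   <- split(mtcars, mtcars$cyl)                 # split
mapped   <- lapply(pieces, function(d) mean(d$mpg))   # apply (map)
combined <- unlist(mapped)                            # combine (reduce)
print(combined)

# Both routes give the same group means
stopifnot(isTRUE(all.equal(as.numeric(by_result), unname(combined))))
```

Nothing here runs in parallel; the sketch only shows the shape of a computation that a framework such as Spark can spread over several machines.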

Microsoft R Server

Since the acquisition of Revolution Analytics in 2015, Microsoft has been busy integrating R into its product offerings. Stefan Cronjaeger gave an overview of how R can be integrated into a production environment. Microsoft R Server aims to solve the problem of doing 'big data' analytics with R by supporting both in-memory and disk-based data analysis. Additional new tools include ScaleR for big data and parallelized analytics, ConnectR for connecting to various other data sources, and DistributedR for grid computing. Finally, Stefan showed us how Visual Studio can be used as an R development environment, similar to RStudio.

Next Kölner R meeting

The next meeting will be scheduled in about three months' time. Details will be published on our Meetup site. Thanks again to Microsoft for their support.

Please get in touch if you would like to present at the next meeting.

Notes from 3rd and 3.5th Bayesian Mixer Meetup

Two Bayesian Mixer meet-ups in a row. Can it get any better?

Our third 'regular' meeting took place at Cass Business School on 24 June. Big thanks to Pietro and Andreas, who supported us from Cass. The next day, Jon Sedar of Applied AI managed to arrange a special summer PyMC3 event.

3rd Bayesian Mixer meet-up

First up was Luis Usier, who talked about cross-validation. Luis is a former student of Andrew Gelman, so, of course, his talk touched on Stan and the 'loo' (leave-one-out) package in R. Luis started with a simple artificial example that aimed to predict the probability of a goalkeeper saving a shot on target. Adding a hierarchical structure to the model and treating the variance as a random variable resulted in a pathological posterior distribution, which makes sampling next to impossible. Instead, fitting several models with different fixed parameters allows the user to compare them via cross-validation using the 'loo' function. Clever! I need to learn more about this. Luis' slides are available here and the underlying source code on GitHub.

Luis Usier talking about cross-validation in R and Stan
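
For readers new to the idea, the following is a rough sketch of leave-one-out cross-validation in base R. It is not Luis' code: the data are simulated, the goalkeepers and their skill levels are hypothetical, and the models are plain maximum-likelihood logistic regressions refitted once per held-out shot. The 'loo' package instead approximates the same quantity from the posterior draws of a single Bayesian fit, so treat this only as an illustration of what is being estimated.

```r
set.seed(1)

# Simulated shots on target: 8 hypothetical goalkeepers, 40 shots each
n_keepers <- 8
n_shots   <- 40
keeper    <- factor(rep(seq_len(n_keepers), each = n_shots))
skill     <- rnorm(n_keepers, mean = 0.8, sd = 0.5)   # log-odds of a save
saved     <- rbinom(length(keeper), size = 1,
                    prob = plogis(skill[as.integer(keeper)]))
shots     <- data.frame(keeper = keeper, saved = saved)

# Exact leave-one-out: refit the model without shot i, then score shot i
# by the log probability the refitted model assigns to its outcome.
loo_elpd <- function(formula, data) {
  pointwise <- vapply(seq_len(nrow(data)), function(i) {
    fit <- glm(formula, family = binomial(), data = data[-i, ])
    p   <- predict(fit, newdata = data[i, , drop = FALSE], type = "response")
    dbinom(data$saved[i], size = 1, prob = p, log = TRUE)
  }, numeric(1))
  sum(pointwise)
}

# Compare a model with one common save probability against a model
# with a separate save probability per goalkeeper.
elpd_pooled <- loo_elpd(saved ~ 1, shots)
elpd_keeper <- loo_elpd(saved ~ keeper, shots)
print(c(pooled = elpd_pooled, by_keeper = elpd_keeper))
```

The model with the higher (less negative) value predicts held-out shots better. Refitting for every observation is only feasible for small examples like this one, which is one reason an approximation is attractive for models fitted in Stan.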

We were lucky to have Robert Cowell talking to us, in what was his final week at Cass. Robert has been very much at the forefront of Bayesian development over the last 30 years. He is one of the co-authors of Probabilistic Networks and Expert Systems. Robert gave an insightful talk on probabilistic models for analysing mixed DNA traces. For illustration purposes, he used a crime case in which a man was killed in a pub and blood traces were used to help identify the murderer - turning statistics into a thriller.

Following those two stimulating talks, we had a few networking drinks at the Artillery Arms. But not too many, as the next day continued with another Bayesian event.

3.5th Meetup: PyMC3 summer special

We had a rare opportunity to gather together a few of the core contributors of the PyMC3 package for a talks & hack session. PyMC3 is a leading framework for probabilistic programming entirely based in Python with a 'theano' backend, with support for the NUTS sampler, Variational Inference and lots of useful functionality - an alternative to Stan.

We had two core contributors with us: Chris Fonnesbeck (usually in Nashville, USA) and Thomas Wiecki (online from Düsseldorf, Germany), plus other package contributors.

Chris Fonnesbeck talking about PyMC3

On Saturday morning Chris gave an overview of PyMC3, followed by a detailed talk by Thomas on Bayesian Deep Learning. The afternoon was spent hacking away together on different problems. I was new to PyMC3, so I went through the tutorial on Probabilistic Programming using PyMC3, which Chris had given at a workshop in Oslo.

Many thanks to all who helped to make these events such a success and especially to Chris, Thomas, Luis, Robert, Andreas, Pietro and Jon.

If you have ideas for a future event, then please get in touch and visit our Meetup page.

Early bird registration for R in Insurance closes 30 May


Hurry! The early bird registration offer for the 4th R in Insurance conference, 11 July 2016, at Cass Business School closes 30 May.

This one-day conference will focus once more on applications in insurance and actuarial science that use R, the lingua franca for statistical computation. Topics covered include reserving, pricing, loss modelling, the use of R in a production environment, and more.

We have a fantastic programme with international speakers and conference dinner at Ironmongers Hall. Keynotes will be given by Mario Wüthrich and Dan Murphy.

The organisers gratefully acknowledge the sponsorship of Verisk, Mirai Solutions, Applied AI, RStudio, CYBAEA and Oasis, without whom the event wouldn't be possible.

R in Insurance 2016 Programme


We are delighted to announce that the programme for the 4th R in Insurance conference at Cass Business School in London, 11 July 2016, has been finalised.

Register by the end of May to get the early bird booking fee.

The organisers gratefully acknowledge the sponsorship of Verisk, Mirai Solutions, Applied AI, RStudio, CYBAEA and Oasis, without whom the event wouldn't be possible.

New R package to access World Bank data

Staying on top of new CRAN packages is quite a challenge nowadays. However, thanks to Dirk's CRANberries service I occasionally spot a new gem, such as wbstats, which appeared on CRAN last week.

Similarly to the WDI package, wbstats offers an interface to the World Bank database.

With the functions of wbstats the World Bank data can be searched and data for several indicators requested. Unlike WDI, the data is returned in a 'long' table with one column for all values and a separate column for the indicators. Additionally, the function wb allows me to specify how many of the most recent values (mrv) I am interested in.
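
To show what the 'long' layout means in practice, here is a small self-contained sketch that does not call the World Bank API. The column names and all values are invented placeholders (countries "A" and "B" are hypothetical), chosen only to mimic the shape described above, and base R's reshape is one of several ways to turn such a table into a 'wide' one.

```r
# Toy 'long' table: one row per country, year and indicator.
# All values are invented placeholders, not World Bank figures.
long <- data.frame(
  country   = rep(c("A", "B"), each = 4),
  date      = rep(rep(c(2014L, 2015L), each = 2), times = 2),
  indicator = rep(c("fertility", "life_expectancy"), times = 4),
  value     = c(2.1, 70, 2.0, 71, 3.5, 60, 3.4, 61)
)
print(long)

# Reshape to 'wide': one row per country and year,
# one column per indicator.
wide <- reshape(long,
                idvar     = c("country", "date"),
                timevar   = "indicator",
                v.names   = "value",
                direction = "wide")
names(wide)    <- sub("^value\\.", "", names(wide))
rownames(wide) <- NULL
print(wide)
```

A wide table of this kind, with one column per indicator, is the layout that charting functions typically expect when each indicator is mapped to a different axis.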

Thus, to recreate the famous Gapminder chart by Hans Rosling, showing the correlation between fertility, i.e. number of children per woman, and life expectancy over time by country and region, I can write (note, a Flash player is required):


If you'd like to learn more about how to create interactive charts with googleVis, then check out the free tutorial on DataCamp.

Session Info

R version 3.2.4 (2016-03-10)
Platform: x86_64-apple-darwin13.4.0 (64-bit)
Running under: OS X 10.11.4 (El Capitan)

locale:
[1] en_GB.UTF-8/en_GB.UTF-8/en_GB.UTF-8/C/en_GB.UTF-8/en_GB.UTF-8

attached base packages:
[1] stats     graphics  grDevices utils     datasets 
[6] methods   base     

other attached packages:
[1] googleVis_0.5.10 data.table_1.9.6 wbstats_0.1     

loaded via a namespace (and not attached):
[1] httr_1.1.0        R6_2.1.2          rsconnect_0.4.2.1
[4] tools_3.2.4       curl_0.9.7        RJSONIO_1.3-0    
[7] jsonlite_0.9.19   chron_2.3-47 

Notes from 2nd Bayesian Mixer Meetup


Last Friday the 2nd Bayesian Mixer Meetup (@BayesianMixer) took place at Cass Business School, thanks to Pietro Millossovich and Andreas Tsanakas, who helped to organise the event.
Bayesian Mixer at Cass

First up was Davide De March talking about the challenges in biochemistry experimentation, which are often characterised by complex and emerging relations among components.

The scarcity of prior knowledge about how complex molecules bind leaves fertile ground for probabilistic graphical models. In particular, Bayesian networks can help the investigator to define a conditional dependence/independence structure that determines a joint multivariate probability distribution. Hence, the use of Bayesian networks can lead to a more efficient way of designing experiments.


Davide De March: Bayesian Networks to design optimal experiments

The second act of the night was Mick Cooney, presenting ideas of using growth curves to estimate the ultimate amounts paid in insurance by some cohort of policies.

The talk showed a model for these curves, discussed the implementation in Stan and how posterior predictive checks can be used to assess the output of the model.

Mick Cooney: Bayesian Modelling for Loss Curves in Insurance
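
The core of the growth-curve idea can be sketched in a few lines of R. This is not Mick's model: the Weibull form of the curve, the parameter values and the paid amount below are all assumptions made purely for illustration, and the sketch is a deterministic point estimate, whereas the talk fitted the curves in Stan and so obtained a full posterior distribution.

```r
# Growth curve: expected share of the ultimate loss paid by development
# time t. The Weibull form and the parameters are illustrative assumptions.
weibull_growth <- function(t, omega, theta) {
  1 - exp(-(t / theta)^omega)
}

omega <- 1.3   # shape (hypothetical)
theta <- 3.0   # scale in years (hypothetical)

# Share of the ultimate expected to be paid after 1 to 10 years
dev_years  <- 1:10
share_paid <- weibull_growth(dev_years, omega, theta)
print(round(share_paid, 3))

# A cohort with a hypothetical 400 paid after 2 years of development:
# dividing by the share paid so far gives an estimate of the ultimate.
paid_to_date <- 400
ultimate_est <- paid_to_date / weibull_growth(2, omega, theta)
print(ultimate_est)
```

In a Bayesian version, omega, theta and the ultimate would be parameters with priors, and posterior predictive checks would compare simulated payment patterns with the observed ones.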

Thanks again to everyone who helped to make the event a success, particularly our speakers and Jon Sedar of Applied AI.

We are planning to run another event in mid-June. Please get in touch via our Meetup site with ideas and talk proposals.