Notes from the Kölner R meeting, 18 September 2015
R in a big data pipeline
Yuki Katoh had travelled all the way from Berlin to present on how to embed R with Luigi into a heterogeneous workflow of different applications. This is especially useful when R needs to be integrated with Hadoop/HDFS-based technologies, such as Spark and Hive. Luigi is not unlike Make, which Kirill presented at our last meeting in June. In a configuration file, Yuki specified the various workflow steps and the dependencies between the jobs.
The luigid server allows Yuki to monitor the various parts of the dependency graph visually. He can thus follow the progress of his workflow in real time and quickly identify when and where a subprocess fails. As Yuki pointed out, this becomes critical in production systems, where failures need to be known and fixed quickly, unlike when one carries out an exploratory analysis in a development/research environment. See also Yuki’s blog post for further details.
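To make the Make-like idea concrete, here is a minimal sketch in plain Python. It is not Luigi's API and not Yuki's actual configuration: the step names, the `run_workflow` helper and the stand-in actions are all hypothetical, and the standard library's `graphlib` is used in place of a real scheduler. It only illustrates running steps in dependency order and reporting where a step fails.

```python
from graphlib import TopologicalSorter

# Hypothetical workflow: each step maps to the set of steps it depends on.
WORKFLOW = {
    "extract_from_hive": set(),
    "clean_data": {"extract_from_hive"},
    "fit_model_in_r": {"clean_data"},
    "write_report": {"fit_model_in_r"},
}


def run_workflow(workflow, actions):
    """Run steps in dependency order; stop at and report the first failure."""
    completed = []
    for step in TopologicalSorter(workflow).static_order():
        try:
            actions[step]()
        except Exception as exc:
            # Return what finished so far plus the failing step and its message.
            return completed, (step, str(exc))
        completed.append(step)
    return completed, None


def succeed():
    """Stand-in for a step that finishes without error."""


def fail():
    """Stand-in for a step whose external R script exits with an error."""
    raise RuntimeError("Rscript exited with status 1")


# Assumed actions: every step succeeds except the R model fit.
ACTIONS = {name: succeed for name in WORKFLOW}
ACTIONS["fit_model_in_r"] = fail

completed, failure = run_workflow(WORKFLOW, ACTIONS)
print("completed:", completed)
print("failed at:", failure)
```

In a real pipeline each action would launch the actual job, for example an R script or a Hive query, and a scheduler such as Luigi would take over the ordering, the retries and the visual monitoring.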
Shiny + Shinyjs
Shiny is a very popular R package that allows users to develop interactive browser applications. Paul Viefers introduced us to the extension shinyjs.
Experience vs. Data
The last talk of the meeting had a more statistical focus, with examples from insurance. I repeated my talk from the LondonR user group meeting in June. One of the challenges in insurance is that, despite having many customers, insurance companies have little claims data per customer with which to assess risks. I presented some Bayesian ideas for analysing risks with little data. I used the wonderful “Hit and run accident” example from Daniel Kahneman’s book Thinking, Fast and Slow to explain Bayes’ formula, introduced Bayesian belief networks for a claims analysis, and discussed the challenge of predicting events that haven’t happened yet (also in Stan). Along the way I mentioned a few ideas on communicating risk, which I learned from David Spiegelhalter earlier this year.
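As a small illustration of Bayes’ formula, here is a sketch of the hit-and-run example in Python. The numbers are those of the usual textbook version of the problem, not taken from the talk slides: 15% of the city's cabs are Blue, 85% are Green, and a witness who identifies the colour correctly 80% of the time says the cab was Blue. The function name `posterior` and its parameters are mine.

```python
def posterior(prior, hit_rate, false_alarm_rate):
    """Bayes' formula for a binary hypothesis.

    prior            -- P(H), probability of the hypothesis before the evidence
    hit_rate         -- P(E | H), probability of the evidence if H is true
    false_alarm_rate -- P(E | not H), probability of the evidence if H is false
    """
    evidence = hit_rate * prior + false_alarm_rate * (1 - prior)
    return hit_rate * prior / evidence


# H: the cab was Blue; E: the witness says "Blue".
p_blue = posterior(prior=0.15, hit_rate=0.80, false_alarm_rate=0.20)
print(round(p_blue, 3))  # 0.414
```

Although the witness is right 80% of the time, the probability that the cab was really Blue is only about 41%, because Blue cabs are rare to begin with. The same logic applies when a prior has to carry most of the weight because a customer has very little claims history.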
Next Kölner R meeting
Please get in touch if you would like to present at the next meeting.