Notes from the Kölner R meeting, 12 December 2014

No comments
Last week's Cologne R user group meeting was the best attended so far, and it was a remarkable event - I believe not a single line of R code was shown. Still, it was an R user group meeting with two excellent talks, and you will understand shortly why not much R code needed to be displayed.

Introduction to Julia for R Users

Julia introduction for R users
Download slides
Hans Werner Borchers joined us from Mannheim to give an introduction to Julia for R users. Julia is a high-level, high-performance dynamic programming language for technical computing. The language has gained some considerable traction over the last two years and it was great to get an overview from a familiar perspective.

Interestingly, as Hans Werner pointed out, Julia is by far not the only new language around the block. Indeed, over the last decade nearly every year saw the announcement of a new language. Also big tech companies such as Microsoft, Google, Mozilla and Apple are trying to push their own programming languages: F# (2005), Go (2009), Rust (2010) and Swift (2014) respectively.

Over the more recent years we notice a movement towards the use of LLVM (Low Level Virtual Machine), on which Julia is based as well and which makes it fast. The just in time compilation demands a little mind shift if you come from R, where the mantra for speed is: vectorise - remove all for-loops. Well, the opposite is true for Julia, because your code will be compiled. For-loops are much easier to understand for the underlying compiler. Hans Werner's slides provide some good examples to get you started and pointers to further resources.

Dynamic Linear Models and Kalman Filtering

Dynamic linear models
Download slides
Forecasting time series was the topic of Holger Zien's talk. Holger gained his first experience with time series during his PhD, when he worked with experimental sensor data. That meant he had lots of data, which could often be regarded stationary as well.

Nowadays, his challenges can be very different, sometimes only a few data points from a non-stationary process are available, and yet he is still expected to predict the future.

Dynamic linear models (dlm) can provide a remedy in those situations. In their simplest version a dlm links system and observational equations in the following way:
\[
y_t = F \theta_t + \nu_t\quad\mbox{observation eq. }\\
\theta_t = G \theta_{t-1} + \omega_t\quad\mbox{system eq.}
\] with \(\nu_t, \omega_t\) mutually independent random variables. A special case of dynamic linear models is the well known Kalman filter. In the more general case \(y_t\) and \(\theta_t\) are vectors and \(F_t, G_t\) are time variant matrices.

Holger explained that a dlm can principally be used for three purposes:
  • Filtering: Estimate of the current value of the state/system variable.
  • Smoothing: Estimate of past values of the state/system variable, i.e., estimating at time \(t\) given measurements up to time \(t' > t\).
  • Forecasting: Forecasting future observations or values of the state/system variable.
With this introduction outlined Holger showed us how various ARMA and linear regression models can be expressed as dlm. His talk concluded with some remarks about his personal experience and references to various R packages, articles and books.

Next Kölner R meeting

The next meeting is scheduled for 6 March 2015.

Please get in touch if you would like to present and share your experience, or indeed if you have a request for a topic you would like to hear more about. For more details see also our Meetup page.

Thanks again to Bernd Weiß for hosting the event and Revolution Analytics for their sponsorship.

No comments :

Post a Comment

Next Kölner R User Meeting: Friday, 12 December 2014

No comments
Koeln R
The next Cologne R user group meeting is scheduled for this Friday, 12 December 2014.

We have an exciting agenda with two talks on Julia and Dynamic Linear Models:

Introduction to Julia for R Users

Hans Werner Borchers

Julia is a high-performance dynamic programming language for scientific computing, with a syntax that is familiar to users of other technical computing environments (Matlab, Python, R, etc.). It provides a sophisticated compiler, high performance with numerical accuracy, and extensive mathematical function libraries.

Some of the highlights of Julia are an implementation of automated differentiation, an optimisation modelling language, also integrating some of the optimisation solvers available from the COIN-OR project, and user-contributed packages for time series, statistics and machine learning, or operations research.

Dynamic Linear Models and Kalman Filtering

Holger Zien

One of the problems most commonly presented to everybody working in statistics is forecasting time series. The textbook answer is to fit an ARMA model to the data and to use the model for prediction. This approach works well for long and stationary time series. However, often the actual time series one is given to analyse are more complicated. They are ridiculously short and clearly non-stationary. Dynamic Linear Models (DLM) and Kalman Filtering go one step beyond ARMA models and may be applied to these more complicated data. I will give a brief introduction into their mathematical background and will talk about my experience using them in practice.

Drinks and Networking

The event will be followed by drinks and schnitzel at the Lux.

For further details visit our KölnRUG Meetup site. Please sign up if you would like to come along. Notes from past meetings are available here.

The organisers, Bernd Weiß and Markus Gesmann, gratefully acknowledge the sponsorship of Revolution Analytics, who support the Cologne R user group as part of their vector programme.


View Larger Map

No comments :

Post a Comment

Measuring temperature with my Arduino

No comments
It is really getting colder in London - it is now about 5°C outside. The heating is on and I have got better at measuring the temperature at home as well. Or, so I believe.

Last week's approach of me guessing/feeling the temperature combined with an old thermometer was perhaps too simplistic and too unreliable. This week's attempt to measure the temperature with my Arduino might be a little OTT (over the top), but at least I am using the micro-controller again.


Yet, I have to overcome similar challenges again as last week. The Arduino with the temperature sensor (TMP36) should be a better device to measure the temperature than my thermometer, but it isn't perfect either and the readings will have some uncertainty as well. But, it will be easy to make frequent measurements and with a little trick I should be able to filter out the noise and get a much better view of the 'true' temperature.

The trick is Bayesian filtering, or as it turns out, a simplified Kalman filter (univariate, no control actions, no transition matrices).

I can't observe the temperature directly, I can only try to measure it. I think of this scenario as a series of Bayesian networks (or Hidden Markov Model). The hidden or latent variable is the 'true' temperature and the observable variable is the reading of my Arduino sensor.


I start with a prior guess of the Temperature at time \(k=0\), just like last week (note, here I will use \(x_k\) instead of \(\mu_k\)). My initial prior is \(N(x_0, \sigma_0^2)\). Now I predict the state (temperature) \(\hat{x}_1\) at the next time step. I expect the temperature to be largely unchanged and therefore my prediction is just \(\hat{x}_1 = x_0 + \varepsilon_P\) with \(\varepsilon_P \sim N(0, \sigma_P^2)\) describing the additional process variance going from one time step to the next. The process variance is independent from the uncertainty of \(x_0\), hence I can add them up and thus my prior for \(x_1\) is \(N(x_0, \sigma_0^2+\sigma_P^2)\).


The measurement of the temperature should reflect the 'true' temperature with some measurement uncertainty \(\sigma_M\). My prediction of the measurement is \(\hat{z}_1 = \hat{x}_1 + \varepsilon_M\) with \(\varepsilon_M \sim N(0, \sigma_M^2)\). The best estimator for \(\hat{z}_1\) is the actual measurement \(z_1\) and therefore the likelihood distribution of \(z_1|x_1\) is \(\,N(z_1, \sigma_M)\). Applying the formulas for Bayesian conjugates again gives the hyper-parameters for the posterior normal distribution \(x_1|\,z_1, x_0\), the temperature given the measurement and my prediction from the previous time step:

\[
x_1=\left.\left(\frac{x_0}{\sigma_0^2+\sigma_P^2} + \frac{z_1}{\sigma_M^2}\right)\right/\left(\frac{1}{\sigma_0^2+\sigma_P^2} + \frac{1}{\sigma_M^2}\right) \\
\sigma_1^2=\left(\frac{1}{\sigma_0^2+\sigma_P^2} + \frac{1}{\sigma_M^2}\right)^{-1}
\]With \(K_1 := \frac{\sigma_0^2+\sigma_P^2}{\sigma_M^2+\sigma_0^2+\sigma_P^2} \) this simplifies to:
\[
x_1 = K_1\, z_1 + (1 - K_1)\, x_0\\
\sigma_1^2 = (1 - K_1) (\sigma_0^2 + \sigma_P^2)
\]
This gives the simple iterative updating procedure (or Kalman filter):
\[
K_{k+1} = \frac{\sigma_k^2+\sigma_P^2}{\sigma_M^2+\sigma_k^2+\sigma_P^2} \\
\sigma_{k+1}^2 = (1 - K_{k+1}) (\sigma_k^2 + \sigma_P^2)\\
x_{k+1} = K_{k+1}\, z_{k+1} + (1 - K_{k+1})\, x_k
\]
In my code below I set the process variance to \(\sigma_P^2=0.01\) and the measurement variance to \(\sigma_M^2=0.5\), which led to a fairly stable reading of 21.5°C.


Changing the parameters \(\sigma_P\) and \(\sigma_M\) influence how quickly the filter reacts to changes. Here is an example where I briefly touch the temperature sensor with the above parameters. The time axis in those screen shots is about 6 seconds.


I have to do a bit more reading on the Kalman filter. It appears to be an immensely powerful tool to extract the signal from the noise. Well, it helped to put a man on the moon.

Code

This is the Processing and Arduino code I used in this post. You may have to change the port number in line 28 to your own settings.

No comments :

Post a Comment

How cold is it? A Bayesian attempt to measure temperature

No comments
It is getting colder in London, yet it is still quite mild considering that it is late November. Well, indoors it still feels like 20°C (68°F) to me, but I have been told last week that I should switch on the heating.

Luckily I found an old thermometer to check. The thermometer showed 18°C. Is it really below 20°C?

The thermometer is quite old and I'm not sure that is works properly anymore. So, what shall I do now? Perhaps I should consider that both measurements are uncertain and try to combine them.

I believe that I can sense the temperature within ±3°C, while I think that the thermometer still works within ±2°C. Assuming that both measurements follow a Gaussian (Normal) distribution, with the uncertainties given as standard deviations, I can use Bayes' theorem to combine my hypothesis with the data. The posterior distribution will be Gaussian again with conjugated hyper-parameters:
\[
\mu=\left.\left(\frac{\mu_0}{\sigma_0^2} + \frac{\sum_{i=1}^n x_i}{s^2}\right)\right/\left(\frac{1}{\sigma_0^2} + \frac{n}{s^2}\right) \\
\sigma^2=\left(\frac{1}{\sigma_0^2} + \frac{n}{s^2}\right)^{-1}
\]With \(K := \frac{n\sigma_0^2}{s^2+n\sigma_0^2} \) this simplifies to:
\[
\mu = K\, \bar{x} + (1 - K)\, \mu_0 \mbox{, with } \bar{x}=\frac{1}{n}\sum_{i=1}^n x_i\\
\sigma = s \,\sqrt{K/n}
\]In my case I have: \(n=1,\; x_1=18^{\circ}C,\; s=2^{\circ}C,\; \mu_0=20^{\circ}C,\; \sigma_0=3^{\circ}C\).

Hence, the posterior distribution has parameters \(\mu=18.6^{\circ}C\) and \(\sigma=1.7^{\circ}C\). Thus, my best guess would be that is actually a little colder than I thought. One could argue that the probability that is below 20° is 80%.

Over the last five days my perception of the temperature didn't change, neither did the weather forecast, but the measurements showed: 18°C, 19°C, 17.5°C, 18°C, 18.5°C.

With that information the parameters update to \(\mu=18.3^{\circ}C\) and \(\sigma=0.9^{\circ}C\). I can't deny it any longer it has got colder. The probability that is below 20°C is now at 97% and the heating is on.


Without any prior knowledge I may have used a t-test to check the measurements. But here I believe that I have information about the thermometer and my own temperature sensing abilities which I don't want to ignore.

R code


Session Info

R version 3.1.2 (2014-10-31)
Platform: x86_64-apple-darwin13.4.0 (64-bit)

locale:
[1] en_GB.UTF-8/en_GB.UTF-8/en_GB.UTF-8/C/en_GB.UTF-8/en_GB.UTF-8

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods  
[7] base     

other attached packages:
[1] BayesianFirstAid_0.1 rjags_3-14           coda_0.16-1         
[4] lattice_0.20-29     

loaded via a namespace (and not attached):
[1] grid_3.1.2    MASS_7.3-35   mnormt_1.5-1  stringr_0.6.2

No comments :

Post a Comment

First steps with ChainLadder: Import triangle from Excel into R

1 comment
Taking the first step is often the hardest: getting data from Excel into R.

Suppose you would like to use the ChainLadder package to forecast future claims payments for a run-off triangle that you have stored in Excel.


How do you get the triangle into R and execute a reserving function, such as MackChainLadder?

Well, there are many ways to do this and the ChainLadder package vignette, as well as the R manual on Data Import/Export has all of the details, but here is a quick and dirty solution using a CSV-file.

Open a new Excel workbook and copy your triangle into cell A1, with the first column being the accident or origin period and the first row describing the development period or age. You find an example CSV-file on GitHub.

Make sure that your triangle has no formatting, such a commas to separate thousands, as Excel will save those cell as characters. Now open R and go through the following commands:

# The first command will open a window and 
# ask you to select your CSV-file
myCSVfile <- file.choose()
# Read file into R
dat <- read.csv(file=myCSVfile) 
# use read.csv2 if semicolons are used as a separator
# likely to be the case if you are in continental Europe

dat # to see your data

     AY   X1    X2    X3    X4    X5    X6    X7    X8    X9   X10
1  1981 5012  8269 10907 11805 13539 16181 18009 18608 18662 18834
2  1982  106  4285  5396 10666 13782 15599 15496 16169 16704    NA
3  1983 3410  8992 13873 16141 18735 22214 22863 23466    NA    NA
4  1984 5655 11555 15766 21266 23425 26083 27067    NA    NA    NA
5  1985 1092  9565 15836 22169 25955 26180    NA    NA    NA    NA
6  1986 1513  6445 11702 12935 15852    NA    NA    NA    NA    NA
7  1987  557  4020 10946 12314    NA    NA    NA    NA    NA    NA
8  1988 1351  6947 13112    NA    NA    NA    NA    NA    NA    NA
9  1989 3133  5395    NA    NA    NA    NA    NA    NA    NA    NA
10 1990 2063    NA    NA    NA    NA    NA    NA    NA    NA    NA
Ok, the data is in R, but now you have to convert it into a triangle. A triangle is basically a matrix with extra attributes. To do this execute the following steps.

# Load the ChainLadder package
library(ChainLadder)
# Ignore first column which holds accident year information 
tri <- dat[,-1]
# Convert to matrix
tri <- as.matrix(tri)
# Add dimension names
dimnames(tri) <- list(origin=dat[,1], dev=1:ncol(tri))
# Convert into a triangle class
tri <- as.triangle(tri)
tri
      dev
origin    1     2     3     4     5     6     7     8     9    10
  1981 5012  8269 10907 11805 13539 16181 18009 18608 18662 18834
  1982  106  4285  5396 10666 13782 15599 15496 16169 16704    NA
  1983 3410  8992 13873 16141 18735 22214 22863 23466    NA    NA
  1984 5655 11555 15766 21266 23425 26083 27067    NA    NA    NA
  1985 1092  9565 15836 22169 25955 26180    NA    NA    NA    NA
  1986 1513  6445 11702 12935 15852    NA    NA    NA    NA    NA
  1987  557  4020 10946 12314    NA    NA    NA    NA    NA    NA
  1988 1351  6947 13112    NA    NA    NA    NA    NA    NA    NA
  1989 3133  5395    NA    NA    NA    NA    NA    NA    NA    NA
  1990 2063    NA    NA    NA    NA    NA    NA    NA    NA    NA
With those preparations done you can execute the MackChainLadder function:

M <- MackChainLadder(tri, est.sigma = "Mack")
M
     Latest Dev.To.Date Ultimate   IBNR Mack.S.E CV(IBNR)
1981 18,834       1.000   18,834      0        0      NaN
1982 16,704       0.991   16,858    154      206    1.339
1983 23,466       0.974   24,083    617      623    1.010
1984 27,067       0.943   28,703  1,636      747    0.457
1985 26,180       0.905   28,927  2,747    1,469    0.535
1986 15,852       0.813   19,501  3,649    2,002    0.549
1987 12,314       0.694   17,749  5,435    2,209    0.406
1988 13,112       0.546   24,019 10,907    5,358    0.491
1989  5,395       0.336   16,045 10,650    6,333    0.595
1990  2,063       0.112   18,402 16,339   24,566    1.503

               Totals
Latest:    160,987.00
Dev:             0.76
Ultimate:  213,122.23
IBNR:       52,135.23
Mack S.E.:  26,909.01
CV(IBNR):        0.52

To copy the full triangle back into Excel you can use the clipboard:
write.table(M$FullTriangle, file="clipboard", sep="\t").
Go back to Excel and hit <Ctrl> + V on your keyboard to paste the data into R.

For more details see the package vignette and Dan's post on pasting triangles from Excel into R via the clipboard.

If you are after a thorough overview of R in insurance take a look at the book Computational Actuarial Science with R.

Finally, join the Special Interest Group on using R in actuarial science and insurance to share your questions and answers.

Session Info

R version 3.1.2 (2014-10-31)
Platform: x86_64-apple-darwin13.4.0 (64-bit)

locale:
[1] en_GB.UTF-8/en_GB.UTF-8/en_GB.UTF-8/C/en_GB.UTF-8/en_GB.UTF-8

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods  
[7] base     

other attached packages:
[1] ChainLadder_0.1.9 systemfit_1.1-14  lmtest_0.9-33    
[4] zoo_1.7-11        car_2.0-21        Matrix_1.1-4     

loaded via a namespace (and not attached):
 [1] acepack_1.3-3.3     actuar_1.1-6        cluster_1.15.3     
 [4] foreign_0.8-61      Formula_1.1-2       grid_3.1.2         
 [7] Hmisc_3.14-5        lattice_0.20-29     latticeExtra_0.6-26
[10] MASS_7.3-35         nnet_7.3-8          plyr_1.8.1         
[13] RColorBrewer_1.0-5  Rcpp_0.11.3         reshape2_1.4       
[16] rpart_4.1-8         sandwich_2.3-2      splines_3.1.2      
[19] statmod_1.4.20      stringr_0.6.2       survival_2.37-7    
[22] tools_3.1.2         tweedie_2.2.1 

1 comment :

Post a Comment

Unknown pleasures

No comments
Have I missed unknown pleasures in Python by focusing on R?

A comment on my blog post of last week suggested just that. Reason enough to explore Python a little. Learning another computer language is like learning another human language - it takes time. Often it is helpful to start by translating from the new language back into the old one.

I found a Python script by Ludwig Schwardt that creates a plot like this:


It is only a small Python script, but it illustrated how to:
  • load packages
  • use conditional statements
  • add comments
  • write functions
  • deal with arrays
  • write loops
  • create plots
  • save output into a PDF file
Here is my translation into R, which actually generated the plot above.

R code


Session Info

R version 3.1.2 (2014-10-31)
Platform: x86_64-apple-darwin13.4.0 (64-bit)

locale:
[1] en_GB.UTF-8/en_GB.UTF-8/en_GB.UTF-8/C/en_GB.UTF-8/en_GB.UTF-8

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods  
[7] base     

other attached packages:
[1] signal_0.7-4 e1071_1.6-4 

loaded via a namespace (and not attached):
[1] class_7.3-11 MASS_7.3-35  tools_3.1.2 

No comments :

Post a Comment

Phase plane analysis in R

1 comment
The forthcoming R Journal has an interesting article about phaseR: An R Package for Phase Plane Analysis of Autonomous ODE Systems by Michael J. Grayling. The package has some nice functions to analysis one and two dimensional dynamical systems.

As an example I use here the FitzHugh-Nagumo system introduced earlier:
\begin{align}
\dot{v}=&2 (w + v - \frac{1}{3}v^3) + I_0 \\
\dot{w}=&\frac{1}{2}(1 - v - w)\\
\end{align}
The FitzHugh-Nagumo system is a simplification of the Hodgkin-Huxley model of spike generation in squid giant axon. Here \(I_0\) is a bifurcation parameter. As I decrease \(I_0\) from 0 the system dynamics change (Hopf-bifurcation): a stable equilibrium solution transform into a limit cycle.

Following Michael's paper, I can use phaseR to plot the velocity field, add nullclines and plot trajectories from different starting points.

Here I plot the FitzHugh-Nagumo system for four different parameters of \(I_0\) and three different initial starting values. The blue line show the nullcline of \(w\) i.e. \(\dot{w}=0\), while the red line shows the nullcline of \(v\). For \(I_0=-2\) I can observe the limit cycle.



Yet, I was a little surprised that the paper didn't make any references to the XPPAUT software by Bard Ermentrout, which has been around for many years as tool to analyse dynamical systems.


The GUI to the software itself gives many more options to analyse dynamical systems, including an interface to the popular bifurcation program AUTO. A good tutorial with the FitzHugh-Nagumo model was given by Mathieu Desroches at the ICS summer school 2012.

Of course I could use XPPAUT as a pure integration engine from R as well:


Considering that R started as a tool for statisticians it has made a remarkable journey; here competing with more traditional engineering tools like Matlab with MatCont or special software like XPPAUT. If someday, someone would find the time and motivation to write an interface to AUTO then R would indeed be a very good environment for the analysis of dynamical systems.

Session Info

R version 3.1.2 (2014-10-31)
Platform: x86_64-apple-darwin13.4.0 (64-bit)

locale:
[1] en_GB.UTF-8/en_GB.UTF-8/en_GB.UTF-8/C/en_GB.UTF-8/en_GB.UTF-8

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods  
[7] base     

other attached packages:
[1] phaseR_1.3     deSolve_1.10-9

loaded via a namespace (and not attached):
[1] tools_3.1.2

1 comment :

Post a Comment

It is the small data that matters the most

No comments
Everyone is talking about Big Data1, but it is the small data that is holding everything together. The small slowly changing reference tables are the linchpins. Unfortunately, too often politics gets in the way as those small tables, maintained by humans, don't get the attention they deserve; or in other words their owners, if they exists - many of these little tables are orphans, make changes without understanding the potential consequences on downstream systems.

Here is a little story as it happened over the weekend.

A friend of mine flew on Saturday night from Vienna to Yerevan. The plane was scheduled to leave Austria at 10:15 PM and to arrive at 4:35 AM in Armenia. However, last weekend was the weekend when the summer time or daylight saving time (DST) ended in many European countries, including Austria, but not in Armenia.

Armenia used DST in 1981-1995 and 1997-2011. This may look a little odd, but Armenia is bar far not the only region that has changed its policy over the years. Russia decided in 2011 not to switch to winter time and keep DST all year round, only to switch to permanent winter time last weekend again.

Back to my friend, she arrives early Sunday morning at the airport in Yerivan, but her friend who promised to pick her up is not there. After a little while she takes a cab to her friends house, making it just in time before he was about to leave for the airport. What happened? Well, her friend relied on his mobile phone for his wake-up call. However, his mobile phone wasn't aware of the fact that Armenia didn't have DST anymore and set his clock back by one hour, allowing him more sleep. His phone might either be a little old or getting its signal from Russia, I don't know. Yet, this little story illustrates nicely how reliant we are nowadays on our interconnected devices having access to the correct reference data.

Imagine being in an intensive care unit relying on medial devices that have to work correctly together on time signals, when they are sourced from different companies, countries and running on different systems. Try to stay healthy, particular in late March and October, would be my advise.

So, how do systems/computers actually know which timezone to use?

Most computers will use the tz reference code and database maintained by a group of volunteers until 2011 and now by IANA. Still, how do they get informed that a region or state decided to change the rules? By keeping their ears on the ground!

The following extract from the Asia file of the tz database 2014b release is quite telling:


# Armenia
# From Paul Eggert (2006-03-22):
# Shanks & Pottenger have Yerevan switching to 3:00 (with Russian DST)
# in spring 1991, then to 4:00 with no DST in fall 1995, then
# readopting Russian DST in 1997.  Go with Shanks & Pottenger, even
# when they disagree with others.  Edgar Der-Danieliantz
# reported (1996-05-04) that Yerevan probably wouldn't use DST
# in 1996, though it did use DST in 1995.  IATA SSIM (1991/1998) reports 
# that Armenia switched from 3:00 to 4:00 in 1998 and observed DST 
# after 1991, but started switching at 3:00s in 1998.

# From Arthur David Olson (2011-06-15):
# While Russia abandoned DST in 2011, Armenia may choose to
# follow Russia's "old" rules.

# From Alexander Krivenyshev (2012-02-10):
# According to News Armenia, on Feb 9, 2012,
# http://newsarmenia.ru/society/20120209/42609695.html
#
# The Armenia National Assembly adopted final reading of Amendments to the
# Law "On procedure of calculation time on the territory of the Republic of
# Armenia" according to which Armenia [is] abolishing Daylight Saving Time.
# or
# (brief)
# http://www.worldtimezone.com/dst_news/dst_news_armenia03.html
# Zone NAME  GMTOFF RULES FORMAT [UNTIL]
Zone Asia/Yerevan 2:58:00 - LMT 1924 May  2
   3:00 - YERT 1957 Mar    # Yerevan Time
   4:00 RussiaAsia YER%sT 1991 Mar 31 2:00s
   3:00 1:00 YERST 1991 Sep 23 # independence
   3:00 RussiaAsia AM%sT 1995 Sep 24 2:00s
   4:00 - AMT 1997
   4:00 RussiaAsia AM%sT 2012 Mar 25 2:00s
   4:00 - AMT

Does this all sounds familiar to you and the challenges in your own organisation? Well, I gave a talk on Small & Big Data recently with a colleague of mine, should you be interested to find out more about this topic.

1. or Tiny Data in Rasmus' case

No comments :

Post a Comment

Approximating the impact of inflation

7 comments
The other day someone mentioned to me a rule of thumb that he was using to estimate the number of years \(n\) it would take for inflation to destroy half of the purchasing power of today's money:
\[ n = \frac{70}{p}\]
Here \(p\) is the inflation in percent, e.g. if the inflation rate is \(2\%\) then today's money would buy only half of today's goods and services in 35 years. You can also think of a saving account with an interest rate of \(2\%\) that would double your money in 35 years.

It is not difficult to derive this formula. The starting point is:
\[
2K = K (1 + \frac{p}{100})^n
\]
This is equivalent to:
\[
2 = (1 + \frac{p}{100})^n
\]
Taking the log gives:
\[
\log(2) = n \log(1 + \frac{p}{100})
\]
The first term of the Taylor series approximation of \(\log(1+x)\) for small \(x\) is \(x\). Hence for small \(p\) I can set:
\[
\log(2) \doteq n \, \frac{p}{100}
\]
Next I have to estimate the value for \(\log(2)\). Writing it as an integral leads to:
\[
\log(2) = \int_1^2 \frac{1}{x} \,dx
\]
Using Simpson's rule I can approximate the integral with:
\[
\int_1^2 \frac{1}{x} \,dx \doteq \frac{2-1}{6} (1+4\frac{2}{1+2}+\frac{1}{2} )
= \frac{25}{36} \doteq 0.7
\]
Thus,
\[
n \doteq \frac{70}{p}
\]
Plotting the two formulas against each other reveals that the approximation works pretty well, even for inflation rates up to 10%.

R Code

Here is the R code to reproduce the plot.
curve(70/x, from=1, to=10, 
      xlab="Inflation rate p%", 
      ylab="Number of years for purchaing power to half", 
      main="Impact of inflation on purchasing power",
      col="blue", 
      type="p", pch=16, cex=0.5)
curve(log(2)/(log(1+x/100)), 
      from=1, to=10, add=TRUE, 
      col="red")
legend("topright", 
       legend=c("70/p","log(2)/log(1+p/100)"), 
       bty="n",
       col=c("blue", "red"), 
       pch=c(16,16), pt.cex=c(1,1))

7 comments :

Post a Comment

googleVis 0.5.6 released on CRAN

No comments
Version 0.5.6 of googleVis was released on CRAN over the weekend. This version fixes a bug in gvisMotionChart. Its arguments xvar, yvar, sizevar and colorvar were not always picked up correctly.

Thanks to Juuso Parkkinen for reporting this issue.

Example: Love, or to love

A few years ago Martin Hilpert posted an interesting case study for motion charts. Martin is a linguist and he researched how the usage of words in American English changed over time, e.g. some words were more often used as nouns in the past and then became more popular as a verb. Do you talk about love, or do you tell someone that you love her/him? Visit his motion chart web page for more information and details!


R code


Session Info

R version 3.1.1 (2014-07-10)
Platform: x86_64-apple-darwin13.1.0 (64-bit)

locale:
[1] en_GB.UTF-8/en_GB.UTF-8/en_GB.UTF-8/C/en_GB.UTF-8/en_GB.UTF-8

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods  
[7] base     

other attached packages:
[1] googleVis_0.5.6

loaded via a namespace (and not attached):
[1] RJSONIO_1.3-0 tools_3.1.1

No comments :

Post a Comment

Visualising the seasonality of Atlantic windstorms

3 comments
Last week Arthur Charpentier sketched out a Markov spatial process to generate hurricane trajectories. Here, I would like to take another look at the data Arthur used, but focus on its time component.

According to the Insurance Information Institute, a normal season, based on averages from 1980 to 2010, has 12 named storms, six hurricanes and three major hurricanes. The usual peak months of August and September passed without any major catastrophes this year, but the Atlantic hurricane season is not over yet.

So, let's take a look at the data again, and here I will use code from Gaston Sanchez, who looked at windstorm data earlier.

I believe, I am using the same data as Arthur, but from a different source, which allows me to download it in one file (13.9MB). Using code from Gaston my chart of all named windstorms over the period of 1989 to 2013 looks like this.


Plotting the data by months illustrates the seasonality of windstorms during the year. Note, that there are no named windstorms for the months of February and March and although the season runs until the end of November, I can clearly see that the peak months are August and September.


Perhaps there are other time components that effect the seasonality across years, such as La Niña? Plotting the years 1989 to 2013 certainly shows that there are years with more windstorm activity than others. Critical to the impact and cost of windstorms is if they make landfall or not. Hence, I add information on deaths and economic damages for major hurricanes over that period from Wikipedia.


Although 1992 doesn't show much windstorm activity, hurricane Andrew was the most expensive one to that date and changed the insurance industry significantly. Following hurricane Andrew insurance companies started to embrace catastrophe models.

The economic loss is a very poor proxy to the loss of lives. In 1998 Hurricane Mitch cost the lives of over 19,000, with most of them living in Honduras. The economic cost of $6.2bn looks small compared to the cost of Andrew six years earlier ($26.5bn).


Although over the last few years we have seen a fairly benign impact of windstorm activity, the next big event can have a much more sever impact than what we have seen historically, as the population (and economy) on the Gulf of Mexico has and is growing rapidly, 150% from 1960 to 2008 alone. A hurricane season causing damages over $200bn doesn't look unreasonable anymore.

R Code


Session Info

R version 3.1.1 (2014-07-10)
Platform: x86_64-apple-darwin13.1.0 (64-bit)

locale:
[1] en_GB.UTF-8/en_GB.UTF-8/en_GB.UTF-8/C/en_GB.UTF-8/en_GB.UTF-8

attached base packages:
[1] grid      stats     graphics  grDevices utils     datasets 
[7] methods   base     

other attached packages:
[1] XML_3.98-1.1     data.table_1.9.2 ggplot2_1.0.0   
[4] maps_2.3-7      

loaded via a namespace (and not attached):
 [1] colorspace_1.2-4 digest_0.6.4     gtable_0.1.2    
 [4] labeling_0.3     MASS_7.3-33      munsell_0.4.2   
 [7] plyr_1.8.1       proto_0.3-10     Rcpp_0.11.2     
[10] reshape2_1.4     scales_0.2.4     stringr_0.6.2   
[13] tools_3.1.1 

3 comments :

Post a Comment

Running RStudio via Docker in the Cloud

12 comments
Deploying applications via Docker container is the current talk of town. I have heard about Docker and played around with it a little, but when Dirk Eddelbuettel posted his R and Docker talk last Friday I got really excited and had to have a go myself.

My aim was to rent some resources in the cloud, pull an RStudio Server container and run RStudio in a browser. It was actually surprisingly simple to get started.

I chose Digital Ocean as my cloud provider. They have many Linux systems to choose from and also a pre-built Docker system.


After about a minute I had kicked off the Docker droplet I could log into the system in a browser window and start pulling the Docker file, e.g. Dirk's container.


Once the downloads finished I could start the RStudio Server using the docker run command and log into a RStudio session. To my surprise even my googleVis package worked out of the box. The plot command opened just another browser window to display the chart; here the output of the WorldBank demo.


All of this was done within minutes in a browser window. I didn't even use a terminal window. So, that's how you run R on an iPad. Considering that the cost for the server was $0.015 per hour, I wonder why I should buy my own server, or indeed buy a new computer.

12 comments :

Post a Comment

Managing R package dependencies

No comments
One of my take aways from last week's EARL conference was that R is more and more growing out of its academic roots into the enterprise. And with that come some challenges, e.g. how do I ensure consistent and systematic access to a set of R packages in an organisation, in particular when one team is providing packages to others?

Two packages can help here: roxyPackage and miniCRAN.

I wrote about roxyPackage earlier on this blog. It allows me to create a local repository to distribute my package, while at the same time execute and control the build process from within R. But what about my package's dependencies? Here miniCRAN helps. miniCRAN is a new package by Andrie de Vries that enables me to find and download all package dependencies and store them in a local repository, e.g. the one used by roxyPackage.

For more details about roxyPackage and miniCRAN read the respective package vignettes.

Example

To create a local sub-CRAN repository for the two packages I maintain on CRAN and with all their dependencies I use:
library("miniCRAN")
my.pkgs <- c("googleVis", "ChainLadder")
pkgs <- pkgDep(my.pkgs, suggests = TRUE, enhances=FALSE)
makeRepo(pkgs = pkgs, path="/Users/Shared/myCRANRepos")
And to visualise the dependencies:
dg <- makeDepGraph(my.pkgs, includeBasePkgs=FALSE, 
                   suggests=TRUE, enhances=TRUE)
set.seed(1)
plot(dg, legendPosEdge = c(-1, 1), 
     legendPosVertex = c(1, 1), vertex.size=20)

What a surprise! In total I end up with 42 packages from CRAN and I didn't expect any connection between the ChainLadder and googleVis package.

Bonus tip

Don't miss out on Pat Burns's insightful talk about effective risk management from EARL. His thoughts reminded me of the great Karl Popper: Good tests kill flawed theories; we remain alive to guess again.

Session Info

R version 3.1.1 (2014-07-10)
Platform: x86_64-apple-darwin13.1.0 (64-bit)

locale:
[1] en_GB.UTF-8/en_GB.UTF-8/en_GB.UTF-8/C/en_GB.UTF-8/en_GB.UTF-8

attached base packages:
[1] stats graphics  grDevices utils  datasets  methods  
[7] base     

other attached packages:
[1] miniCRAN_0.1-0

loaded via a namespace (and not attached):
[1] httr_0.5 igraph_0.7.1  stringr_0.6.2 tools_3.1.1  
[5] XML_3.98-1.1

No comments :

Post a Comment

Notes from the Kölner R meeting, 12 September 2014

No comments
Last Friday we had guests from Belgium and the Netherlands joining us in Cologne. Maarten-Jan Kallen from BeDataDriven came from The Hague to introduce us to Renjin, and the guys from DataCamp in Leuven, namely Jonathan, Martijn and Dieter, gave an overview of their new online interactive training platform.

No comments :

Post a Comment

Next Kölner R User Meeting: Friday, 12 September 2014

No comments
Koeln R
The next Cologne R user group meeting is scheduled for this Friday, 12 September 2014.

We have a great agenda with international speakers:
  • Maarten-Jan Kallen: Introduction to Renjin, the R interpreter for the JVM
  • Jonathan Cornelissen, Martijn Theuwissen: DataCamp - An online interactive learning platform for R
The event will be followed by drinks and schnitzel at the Lux.

No comments :

Post a Comment

Zoom, zoom, googleVis

No comments
The Google Charts API is quite powerful and via googleVis you can access it from R. Here is an example that demonstrates how you can zoom into your chart.

No comments :

Post a Comment

ChainLadder 0.1.8 released

1 comment
Over the weekend we released version 0.1.8 of the ChainLadder package for claims reserving on CRAN.

What is claims reserving?

The insurance industry, unlike other industries, does not sell products as such but promises. An insurance policy is a promise by the insurer to the policyholder to pay for future claims for an upfront received premium.

As a result insurers don't know the upfront cost for their service, but rely on historical data analysis and judgement to predict a sustainable price for their offering. In General Insurance (or Non-Life Insurance, e.g. motor, property and casualty insurance) most policies run for a period of 12 months. However, the claims payment process can take years or even decades. Therefore often not even the delivery date of their product is known to insurers. The money set aside for those future claims payments are called reserves.

1 comment :

Post a Comment

googleVis 0.5.5 released

No comments
Earlier this week we released googleVis 0.5.5 on CRAN. The package provides an interface between R and Google Charts, allowing you to create interactive web charts from R. This is mainly a maintenance release, updating documentation and minor issues.

Screen shot of some of the Google Charts

New to googleVis? Review the examples of all googleVis charts on CRAN.

Perhaps the best known example of the Google Chart API is the motion chart, popularised by Hans Rosling in his 2006 TED talk.

No comments :

Post a Comment

GrapheR: A GUI for base graphics in R

5 comments
How did I miss the GrapheR package?

The author, Maxime Hervé, published an article about the package [1] in the same issue of the R Journal as we did on googleVis. Yet, it took me a package update notification on CRANbeeries to look into GrapheR in more detail - 3 years later! And what a wonderful gem GrapheR is.

The package provides a graphical user interface for creating base charts in R. It is ideal for beginners in R, as the user interface is very clear and the code is written along side into a text file, allowing users to recreate the charts directly in the console.

Adding and changing legends? Messing around with the plotting window settings? It is much easier/quicker with this GUI than reading the help file and trying to understand the various parameters.

Here is a little example using the iris data set.
library(GrapheR)
data(iris)
run.GrapheR()
This will bring up a window that helps me to create the chart and tweak the various parameters.

5 comments :

Post a Comment

Thanks to R Markdown: Perhaps Word is an option after all?

1 comment
In many cases Word is still the preferred file format for collaboration in the office. Yet, it is often a challenge to work with it, not so much because of the software, but how it is used and abused. Thanks to Markdown it is no longer painful to include mathematical notations and R output into Word.


I have been using R Markdown for a while now and have grown very fond of it. Although I am quite happy with PDF and HTML output for basic reports and to switch to Sweave/LaTeX for more complex documents, I was pleasantly surprised to learn that the new version of RStudio can produce MS Word files directly from R Markdown as well; thanks to the power of pandoc. Perhaps Word is an option after all?

1 comment :

Post a Comment

Hit and run. Think Bayes!

8 comments
At the R in Insurance conference Arthur Charpentier gave a great keynote talk on Bayesian modelling in R. Bayes' theorem on conditional probabilities is strikingly simple, yet incredibly thought provoking. Here is an example from Daniel Kahneman to test your intuition. But first I have to start with Bayes' theorem.

Bayes' theorem

Bayes' theorem states that given two events \(D\) and \(H\), the probability of \(D\) and \(H\) happening at the same time is the same as the probability of \(D\) occurring, given \(H\), weighted by the probability that \(H\) occurs; or the other way round. As a formula it can be written as:
\[
P(H \cap D) = P(H|D) \, P(D) = P(D|H) \, P(H)
\]
Or if I rearrange it:
\[
P(H|D) = \dfrac{P(D|H) \, P(H)}{P(D)}
\]
Imagine \(H\) is short for hypothesis and \(D\) is short for data, or evidence. Then Bayes' theorem states that the probability of a hypothesis given data is the same as the likelihood that we observe the data given the hypothesis, weighted by the prior belief of the hypothesis, normalised by the probability that we observe the data regardless of the hypothesis.

The tricky bit in real life is often to figure out what the hypothesis and data are.

Hit and run accident

This example is taken from Daniel Kahneman's book Thinking, fast and slow [1].
A cab was involved in a hit and run accident at night. Two cab companies, the Green and the Blue, operate in the city. 85% of the cabs in the city are Green and 15% are Blue. A witness identified the cab as Blue. The court tested the reliability of the witness under the same circumstances that existed on the night of the accident and concluded that the witness correctly identified each one of the two colours 80% of the time and failed 20% of the time.

What is the probability that the cab involved in the accident was Blue rather than Green knowing that this witness identified it as Blue?

What is here the data and what is here the hypothesis? Intuitively you may think that the proportion of Blue and Green cabs is the data at hand and the witness accusation that a Blue cab was involved in the accident is the hypothesis. However, after some thought I found the following assignment much more helpful, as then \(P(H|D)\) matches the above question:

\(H =\) Accident caused by Blue cab. \(D =\) Witness said the cab was Blue.

With this it is straightforward to get the probabilities of \(P(H)=15\%\) and \(P(D|H)=80\%\). But what is \(P(D)\)? Well, when would the witness say that the cab was Blue? Either, when the cab was Blue and so the witness is right, or when the cab was actually Green and the witness is incorrect. Thus, following the law of total probability:
$$\begin{align}
P(D) & = P(D|H) P(H) + P(D | \bar{H}) P(\bar{H})\\
& = 0.8 \cdot 0.15 + 0.2 \cdot 0.85 = 0.29
\end{align}$$Therefore I get \(P(H|D)=41\%\). Thus, even if the witness states that the cab involved in the accident was Blue, the probability of this being true is only \(41\%\).

An alternative way to think about this problem is via a Bayesian Network. The colour of the cab will influence the statement of the witness. In R I can specify such a network using the gRain package [2], which I discussed in an earlier post. Here I provide the distribution of the cabs and the conditional probabilities of the witness as an input. After I compile the network, I can again read off the probabilities that a Blue cab was involved, when the witness said so.

R code

8 comments :

Post a Comment

Notes from the 2nd R in Insurance Conference

3 comments
The 2nd R in Insurance conference took place last Monday, 14 July, at Cass Business School London.

This one-day conference focused once more on applications in insurance and actuarial science that use R. Topics covered included reserving, pricing, loss modelling, the use of R in a production environment and more.

In the first plenary session, Montserrat Guillen (Riskcenter, University of Barcelona) and Leo Guelman (Royal Bank of Canada, RBC Insurance) spoke about the rise of uplift models. These predictive models are used for improved targeting of policyholders by marketing campaigns, through the use of experimental data. The presenters illustrated the use of their uplift package (available on CRAN), which they have developed for such applications.

3 comments :

Post a Comment

Simple user interface in R to get login details

3 comments
Occasionally I have to connect to services from R that ask for login details, such as databases. I don't like to store my login details in the R source code file, instead I would prefer to enter the my login details when I execute the code.


Fortunately, I found some old code in a post by Barry Rowlingson that does just that. It uses the tcltk package in R to create a little window in which the user can enter her details, without showing the password. The tcltk package is part of base R, which means the code will run on any operating system. Nice!

3 comments :

Post a Comment

googleVis 0.5.3 released

4 comments
Recently we released googleVis 0.5.3 on CRAN. The package provides an interface between R and Google Charts, allowing you to create interactive web charts from R.

Screen shot of some of the Google Charts

Although this is mainly a maintenance release, I'd like to point out two changes:
  • Default chart width is set to 'automatic' instead of 500 pixels.
  • Intervals for columns roles have to end with the suffix ".i", with "i" being an integer. Several interval columns are allowed, see the roles demo and vignette for more details.
Those changes were required to fix the following issues:
  • The order of y-variables in core charts wasn't maintained. Thanks to John Taveras for reporting this bug.
  • Width and height of googleVis charts were only accepted in pixels, although the Google Charts API uses standard HTML units (for example, '100px', '80em', '60', 'automatic'). If no units are specified the number is assumed to be pixels. This has been fixed. Thanks to Paul Murrell for reporting this issue.
New to googleVis? Review the demo on CRAN.

4 comments :

Post a Comment

Last chance to register for the R in Insurance conference

2 comments

The registration for the 2nd R in Insurance conference at Cass Business School London will close this Friday, 4 July.

The programme includes talks from international practitioners and leading academics, see below. For more details and registration visit: http://www.rininsurance.com.

2 comments :

Post a Comment

Generating and visualising multivariate random numbers in R

No comments
This post will present the wonderful pairs.panels function of the psych package [1] that I discovered recently to visualise multivariate random numbers.

Here is a little example with a Gaussian copula and normal and log-normal marginal distributions. I use pairs.panels to illustrate the steps along the way.

I start with standardised multivariate normal random numbers:
library(psych)
library(MASS)
Sig <- matrix(c(1, -0.7, -.5,
                -0.7, 1, 0.6,
                -0.5, 0.6, 1), 
                nrow=3)
X <- mvrnorm(1000, mu=rep(0,3), Sigma = Sig, 
             empirical = TRUE)  
pairs.panels(X)

No comments :

Post a Comment

Who will win the World Cup and which prediction model?

8 comments
The World Cup has finally kicked off last Thursday and I have seen some fantastic games already. Perhaps the Netherlands appears to be the strongest side so far, following their 5-1 victory over Spain.

To me the question is not only which country will win the World Cup, but also which prediction model will come closest to the actual results. Here I present three teams, FiveThirtyEight, a polling aggregation website, Groll & Schauberger, two academics from Munich and finally Lloyd's of London, the insurance market.

The guys around Nate Silver at FiveThirtyEight have used the ESPN’s Soccer Power Index to predict the outcomes of games and continue to update their model. Brazil is their clear favourite.

Andreas Groll & Gunther Schauberger from the LMU Munich developed a model, which like the approach from FiveThirtyEight aims to estimate the probability of a team winning the world cup. But unlike FiveThirtyEight, they see Germany to take the trophy home.

Lloyd's chose a different approach for predicting the World Cup final. The insurance market looked at the risk aspect of the teams and ranked the teams by their insured value. Arguably the better a team the higher their insured value. As a result Lloyd's predicts Germany to win the World Cup.

Quick reminder; what's the difference between insurance and gambling? Gambling introduces risk, where none exists. Insurance mitigates risk, where risk exists.

8 comments :

Post a Comment

The joy of joining data.tables

No comments
The example I present here is a little silly, yet it illustrates how to join tables with data.table in R.

Mapping old data to new data

Categories in general are never fixed, they always change at some point. And then the trouble starts with the data. For example not that long ago we didn't distinguish between smartphones and dumbphones, or video on demand and video rental shops.

I would like to back track price change data for smartphones and online movie rental shops, assuming that their earlier development can be set to the categories they were formerly part of, namely mobile and video rental shops to create indices.

Here is my toy data:



I'd like to create price indices for all products and where data for the new product categories is missing, use the price changes of the old product category.


The data.table package helps here. I start with my original data, convert it into a data.table and create mapping tables. That allows me to add the old product with its price change to the new product. The trick here is to set certain columns as key. Two data tables with the same key can easily be joined. Once I have the price changes for products and years the price index can be added. The plot illustrates the result, the R code is below.

No comments :

Post a Comment

Early bird registration for R in Insurance closes tomorrow

1 comment
The early bird registration offer for the 2nd R in Insurance conference, 14 July 2014, at Cass Business School closes tomorrow.


This one-day conference will focus once more on applications in insurance and actuarial science that use R, the lingua franca for statistical computation. Topics covered include reserving, pricing, loss modelling, the use of R in a production environment, and more. All topics are to be discussed within the context of using R as a primary tool for insurance risk management, analysis and modelling.

The conference programme consists of invited talks and contributed presentations discussing the wide range of fields in which R is used in insurance:


Attendance of the whole conference is the equivalent of 6.5 hours of CPD for members of the Actuarial Profession.

You can register directly on the conference web site www.rininsurance.com.

The organisers gratefully acknowledge the sponsorship of Mango Solutions, CYBAEA, RStudio and PwC without whom the event wouldn't be possible.

1 comment :

Post a Comment

Notes from the Kölner R meeting, 23 May 2014

No comments
The 10th Kölner R user meeting took place last Friday at the Institute of Sociology and to celebrate the anniversary we invited Andrie de Vries to join us from Revolution Analytics. Andrie is well known in the R community; he is the co-author of the R for Dummies book and an active contributor on stackoverflow.

Taking R to the Enterprise


Andrie de Vries: Taking R to the Enterprise. Photo: Günter Faes

Andrie talked about how R is finding its way into the enterprise. He argued that R has become the go-to tool for data and statistical analysis in many companies. He observed that over the years the commercial user base evolved from a small expert community with a love for the command line and cutting edge technology into a much wider audience. Although R is still used by the technical experts, who demand more and more power, out-of-memory computations on bigger data, etc. the number of casual R users is growing rapidly. Often the casual R user relies on code written by others as part of a bigger workflow and hence prefers a more standardised user interface.

Andrie illustrated how Revolution Analytics aims to satisfy both user groups. He started with the expert R users by demonstrating Revolution Analytics' RevoScaleR package that allows them to carry out analysis on bigger data. His example followed closely the airline analysis of Joseph Rickert's white paper. Andrie then showed how more complex R code and functions can be integrated (and to some extend hidden) into Alteryx and Tableau via Rserve.

googleVis overview & developments


 googleVis Update
Slides and code available on GitHub

I gave a brief introduction to googleVis and presented recent developments. The googleVis package provides an interface between R and the Google Chart Tools API. It allows users to create web pages with interactive charts based on R data frames and to display them either via the local R HTTP help server or within their own sites, without uploading the data to Google. The best known example is perhaps the motion chart of fertility and life expectancy data from the World Bank as first presented by Hans Rosling. Another popular example is the chart that illustrates the performance of the famous Lloyd's insurance market.

Since version 0.5.0 of googleVis many new chart types have been added to googleVis, including annotation, sankey, calendar and timeline charts. Additionally new markdown vignettes have been included to present the examples also on CRAN. The slides and rmarkdown code are available via GitHub.

Kölsch & Schnitzel

No Kölner R meeting would be complete without a few Kölsch and Schnitzel at the Lux. We were lucky with the weather so that we could sit outside to enjoy drinks, food and networking opportunities.

Kölsch and Schnitzel at the Lux. Photo: Günter Faes

Next Kölner R meeting

The next meeting is scheduled for 12 September 2014.

Please get in touch if you would like to present and share your experience, or indeed if you have a request for a topic you would like to hear more about. For more details see also our Meetup page.

Thanks again to Bernd Weiß for hosting the event and Revolution Analytics for their sponsorship.

No comments :

Post a Comment

Next Kölner R User Meeting: Friday, 23 May 2014

No comments
The next Cologne R user group meeting is scheduled for this Friday, 23 May 2014.

To celebrate our 10th meeting we welcome:
Followed by drinks and schnitzel at the Lux.

Further details available on our KölnRUG Meetup site. Please sign up if you would like to come along. Notes from past meetings are available here.


The organisers, Bernd Weiß and Markus Gesmann, gratefully acknowledge the sponsorship of Revolution Analytics, who support the Cologne R user group as part of their vector programme.


View Larger Map

No comments :

Post a Comment

The Wiener takes it all? A review of the 2014 Eurovision results

2 comments

Saturday's Eurovision Song Contest (ESC) from Copenhagen was hilarious as usual with acts from all over Europe and some more or less sensible gimmicks: a circular piano, a giant hamster wheel, a sea-saw, or indeed a beard and fancy dress.

The results of the ESC were only a little different to what the bookmakers in the UK had predicted before the event started. Sweden was seen as the favourite, followed by Austria, Netherlands, Armenia and the UK.

In the end Conchita Wurst from Austria won with 290 points in front of the acts from the Netherlands with 238, Sweden with 218 and Armenia with 174 points. The UK ended on the 17th rank with only 40 points. Perhaps the reason the English bookies had put the UK in fifth place reflected the bias of their clients towards their home country.

The points were given as a combination of a public tele vote and a jury in each country. But how big was the resemblance between the votes of the juries and the public? Is there much variance between countries?

The detailed voting data are available from the Eurovision site. Out of the 37 countries that participated in the voting, two countries, San Marino and Albania, had no tele voting and one country, Georgia, had no jury. In the remaining 34 countries Austria and the Netherlands made into the top 2 of the jury and public voting results if they were treated independently. Sweden made it into the top 5 of both as well, but the other countries differed.


So, how consistent was the voting of the jury and public for the top 5 in each country?

Well, in only 11 countries out of 34, the jury agreed with the public on more than 2 of the top 5 songs. In 25 countries the jury and public agreed on at least two candidates.

In some cases the differences between public and jury were so wide, that although a candidate was voted into the top 5 of the tele rankings the act wouldn't get any points. The top public winner in Belgium (Armenia), Ireland (Poland), Montenegro (Russia) and the UK (Poland) didn't get any points at all.

Still, getting the top favourites right seems much easier then any of the followers. Or in other words, it is really difficult to produce a hit, a song/act on which many can agree. But when you hear one, it is much easier to identify it as such, something you really like and believe others would like as well.

R Code

2 comments :

Post a Comment

Customising lines and points with googleVis

No comments
At the end of March Google released a new version of the Chart Tools API with new options for point shapes and line brushes. The arguments are called pointShape and lineDashStyle and can be set directly via googleVis.

We published googleVis 0.5.2 on CRAN yesterday with added examples for those new options in gvisLineChart and gvisScatterChart. Note, these options can be used with most chart types as well, also in combination. Visit the Google documentation for more details.

Customising Points




Customising Lines




Session Info

R version 3.1.0 (2014-04-10)
Platform: x86_64-apple-darwin13.1.0 (64-bit)

locale:
[1] en_GB.UTF-8/en_GB.UTF-8/en_GB.UTF-8/C/en_GB.UTF-8/en_GB.UTF-8

attached base packages:
[1] stats graphics grDevices utils datasets methods base     

other attached packages:
[1] googleVis_0.5.2

loaded via a namespace (and not attached):
[1] RJSONIO_1.0-3 tools_3.1.0 

No comments :

Post a Comment

R in Insurance 2014: Conference Programme & Abstracts

No comments
I am delighted to announce that the programme and abstracts for the second R in Insurance conference at Cass Business School in London, 14 July 2014, have been finalised.


Register by the end of May to get the early bird booking fee.

The organisers gratefully acknowledge the sponsorship of Mango Solutions, CYBAEA, RStudio and PwC without whom the event wouldn't be possible.

No comments :

Post a Comment

Notes from the Tokyo R User Group meeting, 17 April 2014

2 comments
Last Thursday I had the pleasure to attend the Tokyo R user group meeting. And what a fun meeting it was! Over 40 R users had come together in central Tokyo. Yohei Sato, who organises the meetings, allowed me to talk a little about the recent developments of the googleVis package.


Thankfully all talks were given in English:

Following the meeting the user group had booked a pub around the corner for a few drinks and some food. Brilliant!

Delicious chicken steaks and rice porridge

The next morning, as I woke up in on the 23rd hotel floor in Shinjuku I felt that my bed was moving. I am sure it was the earthquake, but what a weird feeling it was with a little hangover.

2 comments :

Post a Comment

googleVis 0.5.1 released on CRAN

No comments
GoogleVis 0.5.1 was released on CRAN yesterday.


New Features

Changes

  • The help files of gvis charts no longer show all their options, instead a link to the online Google API documentation is given.
  • Updated googleVis demo
  • All googleVis output will be displayed in your default browser. In previous versions of googleVis output could also be viewed in the preview pane of RStudio. This feature is no longer available with the current version of RStudio, but is likely to be introduced again with the release of RStudio version 0.99 or higher.
I will discuss the new features in more detail at the R user group meetings in Tokyo this Thursday and in Cologne, 23 May.

No comments :

Post a Comment