Staying on top of new CRAN packages is quite a challenge nowadays. However, thanks to Dirk’s CRANberries service I occasionally spot a new gem, such as wbstats, which appeared on CRAN last week. Like the WDI package, wbstats offers an interface to the World Bank database.
With the functions of wbstats the World Bank data can be searched and data for several indicators requested. Unlike WDI, the data is returned in a ‘long’ table with one column for all values and a separate column for the indicators.
Formatting data for output in a table can be a bit of a pain in R. The package formattable by Kun Ren and Kenton Russell provides some intuitive functions to create good looking tables for the R console or HTML quickly. The package home page demonstrates the functions with illustrative examples nicely.
There are a few points I really like:
the functions accounting, currency and percent transform numbers into more human-readable output
I have to admit that I find the plotmath expressions in R a little fiddly to annotate plots with mathematical notation. Apparently I am not the only one, but Stefano Meschiari actually did something about it. A few days ago his package latex2exp appeared on CRAN. The package provides the wonderful function latex2exp that translates LaTeX code into plotmath expressions. Brilliant! All I have to remember is to escape the “\” character, that is write “\\” instead of “\”.
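The escaping rule can be checked in base R without any package. This is only a small sketch: the LaTeX snippet is made up for illustration, and the commented-out plot call assumes that latex2exp is installed.

```r
# A LaTeX string as it has to be typed in R source code:
# every backslash is doubled.
tex <- "$\\alpha^\\beta$"

# cat() shows what R actually stores: single backslashes.
cat(tex, "\n")

# "\\" in source code is one character in the string.
nchar("\\")

# Assumption: with latex2exp installed, the string could be used like this.
# plot(1, main = latex2exp::latex2exp(tex))
```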
I love interactive pivot tables. That is the number one reason why I keep using spreadsheet software. The ability to look at data quickly in lots of different ways, without a single line of code helps me to get an understanding of the data really fast.
Perhaps I can do the same now in R as well. At yesterday’s LondonR meeting Enzo Martoglio briefly presented his rpivotTable package. Enzo builds on Nicolas Kruchten’s PivotTable.
We released googleVis version 0.5.8 on CRAN last week. The update is a maintenance release for the forthcoming release of R 3.2.0. New to googleVis? The package provides an interface between R and the Google Charts Tools, allowing you to create interactive web charts from R without uploading your data to Google. The charts are displayed by default via the R internal help browser.
One of my takeaways from last week’s EARL conference was that R is more and more growing out of its academic roots into the enterprise. And with that come some challenges, e.g. how do I ensure consistent and systematic access to a set of R packages in an organisation, in particular when one team is providing packages to others?
Two packages can help here: roxyPackage and miniCRAN.
I wrote about roxyPackage earlier on this blog.
How did I miss the GrapheR package? The author, Maxime Hervé, published an article about the package in the same issue of the R Journal as we did on googleVis. Yet, it took me a package update notification on CRANberries to look into GrapheR in more detail - 3 years later! And what a wonderful gem GrapheR is.
The package provides a graphical user interface for creating base charts in R.
In many cases Word is still the preferred file format for collaboration in the office. Yet, it is often a challenge to work with it, not so much because of the software, but how it is used and abused. Thanks to Markdown it is no longer painful to include mathematical notations and R output into Word. I have been using R Markdown for a while now and have grown very fond of it.
Occasionally I have to connect to services from R that ask for login details, such as databases. I don’t like to store my login details in the R source code file; instead I would prefer to enter my login details when I execute the code. Fortunately, I found some old code in a post by Barry Rowlingson that does just that. It uses the tcltk package in R to create a little window in which the user can enter her details, without showing the password.
The example I present here is a little silly, yet it illustrates how to join tables with data.table in R.
Mapping old data to new data
Categories in general are never fixed, they always change at some point. And then the trouble starts with the data. For example, not that long ago we didn’t distinguish between smartphones and dumbphones, or video on demand and video rental shops. I would like to backtrack price change data for smartphones and online movie rental shops, assuming that their earlier development can be set to the categories they were formerly part of, namely mobile and video rental shops to create indices.
There can never be too many examples for transforming data with R. So, here is another example of reshaping a data.frame into a matrix.
Here I have a data frame that shows incremental claim payments over time for different loss occurrence (origin) years.
The format of the data frame above is how this kind of data is usually stored in a data base. However, I would like to see the payments of the different origin years in rows of a matrix.
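One way to do this in base R is tapply. The sketch below uses made-up payment figures and hypothetical column names (origin, dev, paid); it is not necessarily the approach the full post takes.

```r
# Hypothetical incremental claim payments in 'long' format,
# as they might come out of a database
claims <- data.frame(
  origin = c(2010, 2010, 2010, 2011, 2011, 2012),
  dev    = c(1, 2, 3, 1, 2, 1),
  paid   = c(100, 50, 25, 110, 60, 120)
)

# One row per origin year, one column per development period.
# Combinations that do not occur in the data become NA.
triangle <- with(claims,
                 tapply(paid, list(origin = origin, dev = dev), sum))
triangle
```

The result is a matrix with the origin years as row names and the development periods as column names.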
The ave function in R is one of those little helper functions I feel I should be using more. Investigating its source code showed me another twist about R and the “[” function. But first let’s look at ave.
The top of ave’s help page reads:
Group Averages Over Level Combinations of Factors
Subsets of x are averaged, where each subset consist of those observations with the same factor levels.
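A tiny illustration with made-up numbers: ave returns a vector of the same length as its input, with each element replaced by the result for its group. The default function is mean, but any other function can be supplied via FUN.

```r
x <- c(1, 2, 3, 10, 20, 30)
g <- factor(c("a", "a", "a", "b", "b", "b"))

# Group means, repeated for every observation of the group
group_mean <- ave(x, g)
group_mean   # 2 2 2 20 20 20

# Any other function works as well, e.g. a running total per group
group_cumsum <- ave(x, g, FUN = cumsum)
group_cumsum # 1 3 6 10 30 60
```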
A friend of mine asked me the other day how she could use the function optim in R to fit data. Of course, there are built-in functions for fitting data in R and I wrote about this earlier. However, she wanted to understand how to do this from scratch using optim.
The function optim provides algorithms for general-purpose optimisations and the documentation is perfectly reasonable, but I remember that it took me a little while to get my head around how to pass data and parameters to optim.
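As a minimal sketch of the idea, here is a straight line fitted by least squares with optim. The data are simulated, and the function name sse and the choice of the BFGS method are my assumptions for illustration. The point to note is that the parameters go into the first argument of the objective function, while the data are passed through optim’s “...” argument.

```r
# Simulated data: y = 2 + 3 * x + noise
set.seed(1)
x <- 1:20
y <- 2 + 3 * x + rnorm(20)

# Objective function: the first argument holds the parameters to optimise,
# all further arguments (here the data) are handed over by optim via '...'
sse <- function(par, x, y) {
  fitted <- par[1] + par[2] * x
  sum((y - fitted)^2)
}

fit <- optim(par = c(0, 0), fn = sse, x = x, y = y, method = "BFGS")
fit$par

# For comparison: the built-in least squares fit
coef(lm(y ~ x))
```

The two sets of coefficients should agree closely, which is a handy sanity check when writing an objective function from scratch.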
Documenting code can be a bit of a pain. Yet, the older (and wiser?) I get, the more I realise how important it is. When I was younger I said ‘documentation is for people without talent’. Well, I am clearly losing my talent, as I sometimes struggle to understand what I programmed years ago. Thus, anything that soothes the pain of writing and maintaining documentation must be good and should help me to better understand my ‘old me’ in the future.
I really should make it a habit of using data.table. The speed and simplicity of this R package are astonishing. Here is a simple example: I have a data frame showing incremental claims development by line of business and origin year. Now I would like to add a column with the cumulative claims position for each line of business and each origin year along the development years.
It’s one line with data.
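A sketch of what such a one-liner could look like, assuming the data.table package is installed. The figures and the column names (lob, origin, dev, incremental, cumulative) are made up for illustration.

```r
library(data.table)

# Hypothetical incremental claims by line of business, origin and development year
claims <- data.table(
  lob         = rep(c("Motor", "Property"), each = 6),
  origin      = rep(rep(c(2010, 2011), each = 3), times = 2),
  dev         = rep(1:3, times = 4),
  incremental = c(100, 50, 25, 110, 60, 30,
                  200, 80, 40, 210, 90, 45)
)

# Add the cumulative position by reference, grouped by line of business and
# origin year; the rows are already sorted by development year
claims[, cumulative := cumsum(incremental), by = .(lob, origin)]
claims
```

If the rows were not sorted by development year within each group, a call to setorder(claims, lob, origin, dev) beforehand would be needed.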
What is Rook?
Rook is a web server interface for R, written by Jeffrey Horner, the author of rApache and brew. But unlike other web frameworks for R, such as brew, R.rsp (which I have used in the past), Rserve, gWidgetsWWW or sumo (which I haven’t used yet), Rook appears incredibly lightweight.
Rook doesn’t need any configuration. It is an R package, which works out of the box with the R HTTP server (R ≥ 2.
Transforming data sets with R is usually the starting point of my data analysis work. Here is a scenario which comes up from time to time: transform subsets of a data frame, based on context given in one or a combination of columns.
As an example I use a data set which shows sales figures by product for a number of years:
df <- data.frame(Product=gl(3,10,labels=c("A","B","C")), Year=factor(rep(2002:2011,3)), Sales=1:30)
Product Year Sales
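To give a flavour of such a subset transformation, here is one possible base R sketch using the data frame from above. The concrete task, the share of each year in a product’s total sales, and the column name Share are my assumptions for illustration.

```r
df <- data.frame(Product = gl(3, 10, labels = c("A", "B", "C")),
                 Year    = factor(rep(2002:2011, 3)),
                 Sales   = 1:30)

# Total sales per product, repeated for every row of that product
product_total <- ave(df$Sales, df$Product, FUN = sum)

# Share of each year in the total sales of its product
df$Share <- df$Sales / product_total
head(df)
```

Within each product the shares add up to one.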
Waterfall charts are sometimes quite helpful to illustrate the various moving parts in financial data, particularly when I have positive and negative values like a profit and loss statement (P&L). However, they can be a bit of a pain to produce in Excel. Not so in R, thanks to the waterfall package by James Howard. In combination with the latticeExtra package it is nearly a one-liner to produce a good looking waterfall chart that mimics the look of The Economist:
It is not unusual that you will not have admin rights in an IT controlled office environment. But then again the limitations set by the IT department can spark some creativity. And I have to admit that I enjoy this kind of troubleshooting.
The other day I ended up in front of a Windows PC with R installed, but a locked down “C:\Programme Files” folder. That meant that R couldn’t install any packages into the default directory “C:\Programme Files\R\R-X.
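One common way around a read-only default library is to point R to a directory I can write to. The sketch below uses a temporary directory as a stand-in; the directory name and the package name in the comment are hypothetical, and this is not necessarily the workaround described in the full post.

```r
# A writable directory for my own packages (hypothetical location;
# a folder in the user's home directory would be the more permanent choice)
my_lib <- file.path(tempdir(), "Rlibs")
dir.create(my_lib, recursive = TRUE, showWarnings = FALSE)

# Put the new directory at the front of R's library search path
.libPaths(c(my_lib, .libPaths()))
.libPaths()

# Packages could then be installed into and loaded from that directory:
# install.packages("somePackage", lib = my_lib)
# library(somePackage)
```

To make the setting permanent, the .libPaths() call could go into a .Rprofile file.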
How do you apply one particular row of your data to all other rows?
Today I came across a data set which showed the revenue split by product and location. The data was formatted to show only the split by product for each location and the overall split by location, similar to the example in the table below.
Revenue by product and continent
        Africa  America  Asia  Australia  Europe
A          40%      30%   50%        40%     40%
B          20%      40%   20%        30%     40%
C          40%      30%   30%        30%     20%
Total      10%      40%   20%        10%     20%
I wanted to understand the revenue split by product and location.
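With the figures from the table, the overall split can be sketched in base R with sweep, which multiplies every column of the product split by the matching entry of the total row. The object names are mine; the full post may well use a different function.

```r
continents <- c("Africa", "America", "Asia", "Australia", "Europe")

# Split by product within each continent (rows A, B, C of the table)
split_by_product <- matrix(
  c(0.4, 0.3, 0.5, 0.4, 0.4,
    0.2, 0.4, 0.2, 0.3, 0.4,
    0.4, 0.3, 0.3, 0.3, 0.2),
  nrow = 3, byrow = TRUE,
  dimnames = list(Product = c("A", "B", "C"), Continent = continents)
)

# Overall split by continent (the 'Total' row of the table)
total <- c(Africa = 0.1, America = 0.4, Asia = 0.2, Australia = 0.1, Europe = 0.2)

# Multiply each column by the total of its continent
overall <- sweep(split_by_product, 2, total, "*")
overall
```

The entries of the resulting matrix add up to 100%; product A in Africa, for example, accounts for 40% of 10%, i.e. 4% of the overall revenue.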
How can I embed a small data set into my R code? That was the question I came across today, when I prepared my talk about Dynamical Systems in R with simecol for the forthcoming Cologne R user group meeting. I wanted to add all the R code of the talk to the last slide. That’s easy, but the presentation makes use of a small data set of 3 columns and 21 rows.
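One option, sketched here with made-up numbers and hypothetical column names, is to paste the data as text into the script and read it with read.table. Another is to call dput on an existing data frame and copy its output into the code.

```r
# The data set embedded as plain text in the script
small <- read.table(header = TRUE, text = "
time prey predator
0    10   5
1    12   6
2    15   8
")
small

# dput() prints R code that recreates the object;
# that output can be pasted into a script as well
dput(small)
```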
I had mentioned the Guardian’s data blog and the need for more data journalism earlier here. What I really like about the Guardian’s approach in particular is that they share the data of their articles and encourage readers to use it.
Of course there are perfectly valid reasons for only displaying a chart and not making the underlying data available, e.g. to generate leads, as potential customers may get in touch with you asking for the underlying data, or technology issues that don’t allow you to upload data, etc.
The other day I wrote about the R functions by, apply and friends, which allow me to operate on subsets of data. All those functions work nicely, if the data is given in the right format. More often than not it isn’t and I have to reshape the data beforehand. Thus, time to discuss the reshape function. I will focus on the reshape function in base R, and not the package of the same name.
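As a small taste of base R’s reshape, here is a sketch that turns a ‘wide’ data frame into a ‘long’ one. The data and the column names are invented; the naming pattern “sales.2010” matters, because reshape splits the names at the dot to work out the variable name and the time values.

```r
# Hypothetical 'wide' data: one row per product, one column per year
wide <- data.frame(id         = 1:3,
                   sales.2010 = c(5, 6, 7),
                   sales.2011 = c(8, 9, 10))

# Stack the year columns on top of each other
long <- reshape(wide,
                direction = "long",
                varying   = c("sales.2010", "sales.2011"),
                idvar     = "id")
long
```

The long table has one row per product and year, with the year stored in a column called time.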
R is a language, as Luis Apiolaza pointed out in his recent post. This is absolutely true, and learning a programming language is not much different from learning a foreign language. It takes time and a lot of practice to be proficient in it. I started using R when I moved to the UK and I wonder whether I have a better understanding of English or R by now.
Languages are full of surprises, in particular for non-native speakers.
Fitting distributions with R is something I have to do once in a while, but where do I start?
A good starting point to learn more about distribution fitting with R is Vito Ricci’s tutorial on CRAN. I also find the vignettes of the actuar and fitdistrplus packages a good read. I haven’t looked into the recently published Handbook of fitting statistical distributions with R, by Z. Karian and E.J. Dudewicz, but it might be worthwhile in certain cases, see Xi’An’s review.
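For a very first experiment, the fitdistr function of the MASS package, which ships with R, is enough. Note that this is a swapped-in alternative to the packages mentioned above, and that the data are simulated with parameters chosen for illustration.

```r
library(MASS)

# Simulated claim sizes from a log-normal distribution
set.seed(123)
x <- rlnorm(500, meanlog = 2, sdlog = 0.5)

# Maximum likelihood fit of a log-normal distribution
fit <- fitdistr(x, "lognormal")
fit

# For the log-normal the estimate of meanlog is simply the mean of log(x)
mean(log(x))
```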
Data analysis is often an iterative and interactive process. However, when I present about this subject, I often feel limited by the presentation software I use. It doesn’t matter if I use LaTeX/PDF, PowerPoint or Keynote. In all cases it is either very difficult or impossible to include interactive charts, such as Flash or SVG charts. As a result I have to switch between various applications during the talk. This can be fun, but quite often it is not.
Using R with LaTeX via Sweave is a great way to create reproducible output. However, using specific fonts, e.g. your corporate fonts, can be painful with pdflatex. Over the last few weeks I have fallen in love with the TeX format XeLaTeX and its XeTeX engine.
With XeLaTeX I had to overcome some hurdles, which I would like to share here:
attaching files,
trimming and clipping images,
learning how to use the tikzDevice package.