# R is the easiest language to speak badly

I am amazed by the number of comments I received on my recent blog entry about “by”, “apply” and friends. I had started my post by pointing out that R is a language. Well indeed, I have come to the conclusion, that it is a language with lots of irregular expressions and dialects. It feels a bit like German or French where you have to learn and memorise the different articles. The Germans have three singular definite articles: der (male), die (female) and das (neutral), the French have two: le (male) and la (female). Of course there is no mapping between them, and how do you explain that a girl in German is neutral (das Mädchen), while manhood is female (die Männlichkeit)?

Back to R. As I found out, there are lots of different ways to calculate the means on subsets of data. I begin to wonder, why so many different interfaces and functions have been developed over the years, and also why I didn’t use the aggregate function more often in the past?

Can we blame internet search engines? Why should I learn a programming language properly, when I can find approximate answers to my problem online. I may not end up with the best answer, but with something which will work after all: Don’t know why, but it works.

And sometimes the help files can be more difficult to understand than the code in the examples. Hence, I end up playing around with the example code until it works, and only then I try to figure out how it works. That was my experience with reshape.

Maybe this is a bit harsh. It is always up to the individual to improve his language skills, but you can get drunk in a pub as well, by only being able to order beer. I think it was George Bernard Shaw, who said: “R is the easiest language to speak badly.” No, actually he said: “English is the easiest language to speak badly.” Maybe that explains the success of English and R?

Reading helps. More and more books have been published on R over the last years, and not only in English. But which should you pick? Xi’an’s review on the Art of R Programming suggests that it might be a good start.

Back to aggregate. Has anyone noticed, that the formula interface of aggregate is different to summaryBy?

aggregate(cbind(Sepal.Width, Petal.Width) ~ Species, data=iris, FUN=mean)     Species Sepal.Width Petal.Width1     setosa       3.428       0.2462 versicolor       2.770       1.3263  virginica       2.974       2.026

versus

library(doBy)summaryBy(Sepal.Width + Petal.Width ~ Species, data=iris, FUN=mean)     Species Sepal.Width.mean Petal.Width.mean1     setosa            3.428            0.2462 versicolor            2.770            1.3263  virginica            2.974            2.026

And another slightly more complex example:
aggregate(cbind(ncases, ncontrols) ~ alcgp + tobgp, data = esoph, FUN=sum)summaryBy(ncases + ncontrols ~ alcgp + tobgp, data = esoph, FUN=sum)