R related books: Traditional vs online publishing
A few years ago I used the publication list on r-project.org as an argument with the IT department that R is an established statistical programming language and that they should allow me to install it on my PC. I believe at the time there were about 20 R related books available.
A recent post on Recology pointed me to a talk given by Ed Goodwin at the Houston R user group meeting about regular expressions in R, something I always wanted to learn properly but never got around to doing.
So let's see if we can extract the information on published R books and texts from r-project.org, using what we learned from Ed about regular expressions in R.
We will start analysing the bib-file on r-project.org by publisher and then move on to look more closely at the number of titles published over time, including the self-published PDF-files on CRAN.
We read the bib-file into R using the readLines function and start analysing the data with regular expressions. The function regexpr allows us to find, for each line, the character position where "publisher =" starts; it returns -1 if no match is found. We continue with the R function strsplit to cut the strings into sub-components for further analysis.
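The steps just described can be sketched roughly as follows. This is only a minimal illustration, not the original code of this post: the bib entries are made-up placeholder lines (with the real file you would fill the vector via readLines), and the variable names are my own.

```r
# Hypothetical sample lines standing in for the bib-file content;
# with the real file you would use readLines() on its path or URL.
bib <- c(
  "@Book{R:Example1,",
  "  title = {An Example Book},",
  "  publisher = {Springer},",
  "  year = {2009},",
  "}",
  "@Book{R:Example2,",
  "  title = {Another Example},",
  "  publisher = {Chapman & Hall/CRC},",
  "  year = {2008},",
  "}",
  "@Book{R:Example3,",
  "  title = {A Third Example},",
  "  publisher = {Springer},",
  "  year = {2010},",
  "}"
)

# regexpr() gives the start position of the match in each line, or -1 if none
pos <- regexpr("publisher = ", bib, fixed = TRUE)
pub_lines <- bib[pos > 0]

# strsplit() on the braces: the second piece is the publisher name
pieces <- strsplit(pub_lines, "[{}]")
publisher <- sapply(pieces, function(p) p[2])

# Count titles per publisher, most frequent first
pub_counts <- sort(table(publisher), decreasing = TRUE)
print(pub_counts)
```

Splitting on the braces assumes that each publisher entry sits on a single line in the form publisher = {Name}; entries that span several lines or use quotes instead of braces would need extra handling.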
We note that Springer is by far the most popular publisher for R related books. Thus, if you are looking for a specific topic around R your safest bet would be to check out Springer's portfolio.
However, although Springer is currently the publisher with the highest appetite for R, you may be able to find the information free online on r-project.org, in particular if you are looking for a tutorial-like document or book.
Hence we want to compare the number of R books published in the traditional way with the number of PDF-files contributed online on CRAN: http://cran.r-project.org/doc/contrib/. CRAN is in this respect also a bit of a publisher, as I assume that the guys behind CRAN have some kind of a filtering and QA process.
We use the XML package to read the online directory content of the contributed books into R to get a better understanding of the published PDF-files. Using a similar approach to the one above, we analyse the R books and PDF files published by year. Please find the R code below the charts.
From the two charts below we can see that over the years more and more R texts have been made available, illustrating the increased interest in R. The first chart shows the number of books/documents published in each year, while the second chart shows the same data in a cumulative way.
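As a rough sketch of how the per-year and cumulative counts behind such charts could be computed in base R: the input vectors below are invented placeholders (a few bib-style year lines and a few directory-style time stamps), and extract_year is a hypothetical helper of mine, not a function from any package.

```r
# Hypothetical inputs: bib-style lines for books and
# last-modified time stamps for contributed PDF-files
bib_lines <- c("  year = {2008},", "  year = {2009},",
               "  year = {2009},", "  title = {An Example},")
pdf_dates <- c("2004-03-12 10:15", "2008-11-02 09:30", "2009-06-21 14:05")

# Pull the first four-digit run out of each string;
# strings without one are dropped by regmatches()
extract_year <- function(x) {
  m <- regexpr("[0-9]{4}", x)
  as.integer(regmatches(x, m))
}

# Keep only the year lines of the bib data before extracting
is_year_line <- regexpr("year = ", bib_lines, fixed = TRUE) > 0
book_years <- extract_year(bib_lines[is_year_line])
pdf_years <- extract_year(pdf_dates)

# Tabulate both sources over a common range of years so the columns line up
years <- seq(min(book_years, pdf_years), max(book_years, pdf_years))
per_year <- rbind(
  books = table(factor(book_years, levels = years)),
  pdfs  = table(factor(pdf_years, levels = years))
)

# Running totals per source give the cumulative view
cumulative <- t(apply(per_year, 1, cumsum))
print(per_year)
print(cumulative)
```

Passing per_year or cumulative to barplot would give a stacked bar chart per year. Note that using the last-modified date of a PDF as its publication year is an assumption; a file that was updated later would be counted in the wrong year.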
Today there are 206 R related books available either on CRAN or via your bookshop. Of the 206 texts, 113 are published in the traditional sense with a publishing house, and the number is still growing. However, the growth has slowed down a bit in recent years, after a peak of 26 new books in 2009.
From the second chart I can see that I must have had the discussion on R I mentioned earlier around 2004 - 2005.