Sankey diagrams with googleVis

23 comments
Sankey diagrams are great for visualising flows from one set of values to another. Although they are named after the Irish captain Matthew Henry Phineas Riall Sankey, who used this type of diagram in 1898 to show the energy efficiency of a steam engine, the best-known Sankey diagram is probably Charles Minard's map of Napoleon's Russian campaign of 1812, which he actually produced in 1869.

[Figure: Minard's map of Napoleon's Russian campaign reproduced in R. Source: Thomas Rahlf, Datendesign mit R]

The above example from Thomas Rahlf's book Datendesign mit R shows that Minard's plot can be reproduced with base graphics in R. In 2010 Aaron Berdanier posted the SankeyR function, and January Weiner published the riverplot package on CRAN. Both let users create static Sankey charts as well.

Interactive Sankey diagrams can be generated with rCharts and now also with googleVis (version >= 0.5.0). For a first example I use UK visitor data from VisitBritain.org. The following diagram visualises the flow of visitors in 2012: where they came from and which parts of the UK they visited. This example already illustrates the key concept. I need a data frame with three columns that describe the source of each flow, its target, and the strength or weight of the connection.
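Below is a minimal sketch of that three-column layout. It assumes googleVis (version 0.5.0 or later) is installed. The countries, regions and counts are made-up placeholder values and are not the VisitBritain figures behind the chart. The column names `origin`, `visit` and `weight` and the `sankey` option string mirror the call quoted in the comments further down. `UKvisits_demo` and `visits_chart` are hypothetical names.

```r
# Hypothetical visitor flows: made-up numbers for illustration only,
# not the VisitBritain data behind the chart in this post.
UKvisits_demo <- data.frame(
  origin = c("France", "France", "Germany", "Germany", "USA", "USA"),
  visit  = c("London", "Scotland", "London", "Wales", "London", "Scotland"),
  weight = c(120, 30, 90, 25, 150, 60),
  stringsAsFactors = FALSE
)

# One row per link: source node, target node and link width (weight)
print(UKvisits_demo)

if (requireNamespace("googleVis", quietly = TRUE)) {
  visits_chart <- googleVis::gvisSankey(
    UKvisits_demo, from = "origin", to = "visit", weight = "weight",
    options = list(height = 250,
                   sankey = "{link:{color:{fill:'lightblue'}}}")
  )
  # plot() opens the chart in a web browser, so only call it interactively
  if (interactive()) plot(visits_chart)
}
```

Each row becomes one band in the chart. Repeated names in `origin` or `visit` are merged into a single node.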




My next example visualises a graph data set in the same way, but this time I start to play around with the various options of the Google Charts API.
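Here is a rough sketch of a graph-based chart that also sets a few Google Charts Sankey options (`link.color.fill`, `node.width`, `node.nodePadding`, `node.label.color`). These are passed as a JavaScript object literal in the `sankey` option. The tree and its random edge weights are hypothetical. The post's session attached igraph, but here the edge list is built in base R so the snippet needs no extra package. A tree is a convenient test graph because Google's Sankey chart does not render cyclic data.

```r
set.seed(123)  # reproducible random weights

# Hypothetical graph: a complete ternary tree with 13 nodes labelled A to M.
# Node k (k >= 2) hangs off parent (k - 2) %/% 3 + 1, so every edge points
# from a lower to a higher node number and the graph has no cycles.
n_nodes <- 13
child   <- 2:n_nodes
parent  <- (child - 2) %/% 3 + 1

edges <- data.frame(
  source = LETTERS[parent],
  target = LETTERS[child],
  value  = rpois(length(child), lambda = 4) + 1,  # random positive weights
  stringsAsFactors = FALSE
)

# Styling options, handed to the Google API as a JavaScript object literal
sankey_opts <- paste0(
  "{link: {color: {fill: '#d799ae'}}, ",
  "node: {width: 10, nodePadding: 20, label: {color: '#871b47'}}}"
)

if (requireNamespace("googleVis", quietly = TRUE)) {
  tree_chart <- googleVis::gvisSankey(
    edges, from = "source", to = "target", weight = "value",
    options = list(width = 500, height = 300, sankey = sankey_opts)
  )
  if (interactive()) plot(tree_chart)
}
```

Swapping the tree for an edge list exported from igraph should work the same way, as long as the graph stays acyclic.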




As stated by Google, the Sankey chart may be undergoing substantial revisions in future Google Charts releases.

For more information and installation instructions see the googleVis project site and Google documentation.
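If R reports that `gvisSankey()` cannot be found, the loaded googleVis is most likely older than 0.5.0 or not loaded at all. The small check below assumes only base R plus the version requirement stated above. `has_gvis_sankey()` is a hypothetical helper, not part of googleVis.

```r
# Hypothetical helper: TRUE if the installed googleVis provides gvisSankey()
# (introduced in version 0.5.0), FALSE otherwise.
has_gvis_sankey <- function() {
  if (!requireNamespace("googleVis", quietly = TRUE)) return(FALSE)
  packageVersion("googleVis") >= "0.5.0" &&
    exists("gvisSankey", envir = asNamespace("googleVis"), inherits = FALSE)
}

if (has_gvis_sankey()) {
  library(googleVis)
  if (interactive()) example(gvisSankey)  # run the package's own examples
} else {
  message("gvisSankey() not found: install googleVis >= 0.5.0 first")
}
```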

Session Info

R version 3.0.3 (2014-03-06)
Platform: x86_64-apple-darwin10.8.0 (64-bit)

locale:
[1] en_GB.UTF-8/en_GB.UTF-8/en_GB.UTF-8/C/en_GB.UTF-8/en_GB.UTF-8

attached base packages:
[1] stats graphics grDevices utils datasets  methods  
[7] base     

other attached packages:
[1] googleVis_0.5.0-4 igraph_0.7.0     

loaded via a namespace (and not attached):
[1] RJSONIO_1.0-3 tools_3.0.3 

23 comments:

  1. Somewhere_In_The_Middle, 25 March 2014 at 13:53

    The report windows just say SSL Protocol error :(

  2. I believe this is the result of a security setting at your end. The googleVis charts are hosted on Dropbox, and I am sure your proxy regards them as a potential threat.

  3. I get this message trying to reproduce the plots: Error in plot(gvisSankey(UKvisits, from = "origin", to = "visit", weight = "weight", :
    could not find function "gvisSankey"

  4. You have to install the developer version of googleVis first. Visit http://github.com/mages/googleVis for more details.

  5. Thanks! Really cool, now it works!

  6. Just a minor thing -- I am the author of the riverplot package :-)

  7. I got an error message when running the code:

    Error in gvisSankey(UKvisits, from = "origin", to = "visit", weight = "weight", :

    could not find function "gvisChart" :(

  8. Oops, I am really sorry about this, now corrected.

  9. Please send me more details by email, in particular the output of sessionInfo(). It appears that you didn't install googleVis correctly.

  10. Hi Markus, thanks for your help.

    sessionInfo()
    R version 3.0.3 (2014-03-06)
    Platform: x86_64-w64-mingw32/x64 (64-bit)

    locale:
    [1] LC_COLLATE=English_United States.1252 LC_CTYPE=English_United States.1252 LC_MONETARY=English_United States.1252
    [4] LC_NUMERIC=C LC_TIME=English_United States.1252

    attached base packages:
    [1] stats graphics grDevices utils datasets methods base

    other attached packages:
    [1] rCharts_0.4.2 devtools_1.4.1 googleVis_0.5.0-4

    loaded via a namespace (and not attached):
    [1] coin_1.0-23 digest_0.6.4 evaluate_0.5.1 grid_3.0.3 httr_0.3 lattice_0.20-27 memoise_0.1
    [8] modeltools_0.2-21 mvtnorm_0.9-9997 parallel_3.0.3 party_1.0-13 plyr_1.8.1 Rcpp_0.11.1 RCurl_1.95-4.1
    [15] rJava_0.9-6 RJSONIO_1.0-3 rpart_4.1-5 sandwich_2.3-0 splines_3.0.3 stats4_3.0.3 stringr_0.6.2
    [22] strucchange_1.5-0 survival_2.37-7 tools_3.0.3 whisker_0.3-2 yaml_2.1.11 zoo_1.7-11

  11. Hi, Markus. Great post. I'm trying to implement gvisSankey(). Just having trouble installing the dev version from your github repo. Getting the following error:

    install_github("mages/googleVis")
    Installing github repo(s) mages/googleVis/master from hadley
    Downloading mages/googleVis.zip from https://github.com/hadley/mages/googleVis/archive/master.zip

    Error: client error: (404) Not Found

    Many thanks! -adam (p.s. please don't spend more than 1 minute thinking about this.)

    Session info:
    R version 3.0.2 (2013-09-25)
    Platform: i386-w64-mingw32/i386 (32-bit)

    locale:
    [1] LC_COLLATE=English_United States.1252 LC_CTYPE=English_United States.1252
    [3] LC_MONETARY=English_United States.1252 LC_NUMERIC=C
    [5] LC_TIME=English_United States.1252

    attached base packages:
    [1] stats graphics grDevices utils datasets methods base

    other attached packages:
    [1] httpuv_1.2.1 Rcpp_0.11.0 shiny_0.8.0 knitr_1.5 RJSONIO_1.0-3 devtools_1.3

    loaded via a namespace (and not attached):
    [1] bitops_1.0-6 caTools_1.14 digest_0.6.3 evaluate_0.5 formatR_0.9
    [6] grid_3.0.2 httr_0.2 lattice_0.20-23 memoise_0.1 parallel_3.0.2
    [11] plyr_1.8 rCharts_0.4.1 RCurl_1.95-4.1 stringr_0.6.2 tools_3.0.2
    [16] whisker_0.3-2 xtable_1.7-1 yaml_2.1.7

  12. It sounds to me like you are behind a firewall that is causing the problem. You could try to download the source zip archive and install the package manually, or alternatively install my pre-built binary version for Windows.
    I hope this helps.

  13. That worked. Thanks, Markus. Much appreciated.

  14. alessandro benedetti, 3 April 2014 at 16:52

    Hi Markus,
    excuse me, I am not very experienced.
    About the error message "could not find function "gvisChart"": could you help me with the developer version?
    Thank you, Ale

    sessionInfo()
    R version 3.0.3 (2014-03-06)
    Platform: i386-w64-mingw32/i386 (32-bit)

    locale:
    [1] LC_COLLATE=Italian_Italy.1252 LC_CTYPE=Italian_Italy.1252
    [3] LC_MONETARY=Italian_Italy.1252 LC_NUMERIC=C
    [5] LC_TIME=Italian_Italy.1252

    attached base packages:
    [1] stats graphics grDevices utils datasets methods base

    other attached packages:
    [1] igraph_0.7.0 devtools_1.4.1

    loaded via a namespace (and not attached):
    [1] digest_0.6.4 evaluate_0.5.3 formatR_0.10 httr_0.3 knitr_1.5
    [6] memoise_0.1 parallel_3.0.3 RCurl_1.95-4.1 stringr_0.6.2 tools_3.0.3
    [11] whisker_0.3-2

  15. Ale, I believe you forgot to load the package.
    Try library(googleVis)
    example(gvisSankey)

  16. alessandro benedetti, 4 April 2014 at 10:41

    Thanks, Markus,

    I wouldn't like to abuse your patience, but I find your work very interesting. I tried a new approach, but I still cannot replicate your Sankey code. (I am trying to introduce R in our poor public school.)

    Here is my R session:

    R version 3.0.3 (2014-03-06) -- "Warm Puppy"
    Copyright (C) 2014 The R Foundation for Statistical Computing
    Platform: i386-w64-mingw32/i386 (32-bit)
    ...
    update.packages(ask='graphics',checkBuilt=TRUE)
    install.packages(googleVis)
    library(googleVis)
    require(igraph)
    library(igraph)
    require(RJSONIO)
    library(RJSONIO)
    #on http://cran.r-project.org/bin/windows/Rtools/ download Rtools31.exe
    install.packages(devtools)
    library(devtools)

    >UKvisits <- data.frame(origin=c(....
    .....
    > plot(
    gvisSankey(UKvisits, from="origin",
    to="visit", weight="weight",
    options=list(
    height=250,
    sankey="{link:{color:{fill:'lightblue'}}}"
    + ))
    + )

    Error in plot(gvisSankey(UKvisits, from = "origin", to = "visit", weight = "weight", :
    could not find function "gvisSankey"
    > gvisSankey <- function(data, from="", to="", weight="",
    options=list(), chartid){
    my.type <- "Sankey"
    dataName <- deparse(substitute(data))
    my.options <- list(gvis=modifyList(list(width=400, height=400),options), dataName=dataName,
    data=list(from=from, to=to, weight=weight,
    +allowed=c("number", "string"))
    )
    #checked.data <- gvisCheckSankeyData(data, my.options)
    output <- gvisChart(type=my.type, checked.data=data, options=my.options,
    chartid=chartid, package="sankey")
    return(output)
    }

    > plot(
    gvisSankey(UKvisits, from="origin",
    to="visit", weight="weight",
    options=list(
    height=250,
    sankey="{link:{color:{fill:'lightblue'}}}"
    ))
    )

    Error in gvisSankey(UKvisits, from = "origin", to = "visit", weight = "weight", :
    could not find function "gvisChart"


    Thanks in any case,
    Ale

  17. You have to install the developer version from GitHub. The following lines of R code will do just that:

    install.packages(c("devtools","RJSONIO", "knitr", "shiny", "httpuv"))
    library(devtools)
    install_github("mages/googleVis")

  18. alessandro benedetti, 7 April 2014 at 09:46

    Thank you for your kindness, and congratulations on your work.
    Ale

  19. Brent Schneeman, 29 April 2014 at 14:40

    Comparing gvisSankey and the rCharts Sankey, there are parts of both that I like. When you hover over the rCharts version, tooltips pop up showing the number of items in the edge or the node. Any chance of getting that into gvisSankey?

    http://timelyportfolio.github.io/rCharts_d3_sankey/example_build_network_sankey.html <- live version of rCharts.



    I really like the gvisSankey behavior of highlighting all edges connected to a node when hovering over the node.

  20. Hello Markus! I'm wondering if it is possible to change the order of the nodes/branches somehow?

    ReplyDelete
  21. Hi Peter, I don't think the node order can currently be set by the user. The Google documentation states that the layout is done automatically; see: https://google-developers.appspot.com/chart/interactive/docs/gallery/sankey#SimpleExample
    Markus

  22. Mohammad Moazzam Mehar, 12 August 2014 at 17:26

    Hi,
    I downloaded your code and tried to run it in my R console, but I get the error below. Please help me solve it.

    Error in texi2dvi(file = file, pdf = TRUE, clean = clean, quiet = quiet, :
    pdflatex is not available
    Calls: -> texi2pdf -> texi2dvi
    Execution halted
    Error: Command failed (1)

  23. Well, the error message tells you that pdflatex is not available. Google how to install pdflatex on your computer.
