polar histogram: pretty and useful

Do you have tens of histograms to show but no room to put them all on the page? As I was reading this paper in Nature Genetics, I came across a simple and clever way of packing all this information in a small space: arrange them all around a circle, and add some guides to help their cross-comparison.

It didn’t look too difficult to implement in ggplot2 thanks to polar coordinates, and after a busy Saturday afternoon I ended up with the following image with my data (*) (and a poster-ready pdf, after 2 seconds of prettying up with Inkscape):

The graph shows the proportion of some SNP scores (‘first’, ‘second’ and ‘third’) for a number of phenotypes, which are grouped by themes. I’m quite happy with the result. It’s pretty and useful: it’s very easy to compare one histogram with any of the other 60.

The code is still a bit rough around the edges; a few things are not terribly elegant or are hard-coded. An improved version will be shipped with our graphical package next month. In the meantime, here it is, if you want to try it with your own data.

It returns a ggplot object containing the graph. You can display it with print(), save it as a pdf with ggsave("myPlot.pdf"), or modify it with the usual ggplot2 commands. I’ve called it polar histogram, which, I think, is self-explanatory. If you know what it’s actually called, please let me know. (No, I will not call it polR histogram.)
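As a rough illustration of those three operations, here is a minimal sketch. It uses a stand-in plot built from the mtcars dataset rather than the output of polarHistogram(), so the data, the title and the output path are assumptions made for the example only.

```r
library(ggplot2)

# Stand-in for the object returned by polarHistogram(): any ggplot object
# can be displayed, saved and modified in the same way.
p <- ggplot(mtcars, aes(x = factor(cyl), fill = factor(gear))) +
  geom_bar(position = "fill") +
  coord_polar()

# 1. display it
print(p)

# 2. save it as a pdf (written to a temporary directory here)
out <- file.path(tempdir(), "myPlot.pdf")
ggsave(out, plot = p, width = 7, height = 7)

# 3. modify it with the usual ggplot2 commands
p2 <- p + ggtitle("Stand-in plot") + theme(legend.position = "bottom")
```

Passing plot = p to ggsave() makes the call independent of which plot was displayed last; without it, ggsave() saves the last plot shown.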

And here is some fake data to get you going:

# fake data for polarHistogram()
# Christophe Ladroue
library(plyr)
library(reshape2) # provides melt()
library(ggplot2)
source("polarHistogram.R")

# a little helper that generates random names for families and items.
randomName<-function(n=1,syllables=3){
  vowels<-c("a","e","i","o","u","y")
  consonants<-setdiff(letters,vowels)
  replicate(n,
            paste(
              rbind(sample(consonants,syllables,replace=TRUE),
                    sample(vowels,syllables,replace=TRUE)),
              sep='',collapse='')
            )
}

set.seed(42)

nFamily<-20
nItemPerFamily<-sample(1:6,nFamily,replace=TRUE)
nValues<-3

df<-data.frame(
  family=rep(randomName(nFamily),nItemPerFamily),
  item=randomName(sum(nItemPerFamily),2))

# one column of random scores per value (V1, V2, V3)
df<-cbind(df,as.data.frame(matrix(runif(nrow(df)*nValues),nrow=nrow(df),ncol=nValues)))

df<-melt(df,c("family","item"),variable.name="score") # from wide to long
p<-polarHistogram(df,familyLabel=FALSE)
print(p)

Options:
Many defaults can already be changed; look at the code for the complete list. The two things you might want to change are familyLabels (logical), which controls whether the name of each group is displayed as well, and direction, which is either ‘inwards’ or ‘outwards’.

Coding notes:
It wasn’t terribly difficult but it did take me a bit longer than expected, for a few reasons:

  1. coord_polar() doesn’t affect the orientation of geom_text() so it had to be calculated manually.

  2. You’ll notice that the label orientations change between 6 and 9 o’clock, or they would end up upside down and be difficult to read.
  3. There are some scoping issues with plyr and ggplot2 which can be a bit annoying once you encapsulate your code in a function. For example:

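    library(ggplot2)

    df<-data.frame(
      x=runif(10),
      y=runif(10))

    z<-10
    ggplot(df)+geom_point(aes(x=x+z,y=y)) # works: z is a global variable

    rm(z)
    fakeFunction<-function(df){
      z<-10 # local to the function
      ggplot(df)+geom_point(aes(x=x+z,y=y))
    }

    fakeFunction(df) # error: the local z is not visible to aes()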
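
For notes 1 and 2, here is a rough sketch of how a label angle can be worked out by hand. It is a generic version of the idea, not the rule used in polarHistogram.R: radialLabel() and its pos argument are made up for illustration, labels are assumed to be laid out clockwise from 12 o'clock over a full turn, and every label on the left half of the circle is flipped rather than only those between 6 and 9 o'clock.

```r
# Hypothetical helper: text angle (in degrees) and horizontal justification
# for a label placed at fraction `pos` (0 to 1) of a full turn, measured
# clockwise from 12 o'clock.
radialLabel <- function(pos) {
  # direction pointing away from the centre, in the convention used by
  # geom_text(): 0 is left-to-right, 90 is bottom-to-top
  angle <- (90 - 360 * pos) %% 360
  # between 90 and 270 degrees the text would read upside down
  flip <- angle > 90 & angle < 270
  data.frame(
    pos   = pos,
    angle = ifelse(flip, (angle + 180) %% 360, angle), # turn it back upright
    hjust = ifelse(flip, 1, 0) # anchor flipped text at its other end
  )
}

radialLabel(c(0, 0.25, 0.5, 0.75))
# angle: 90, 0, 270, 0
# hjust:  0, 0,   0, 1
```

The angle and hjust columns can then be mapped in geom_text(), for example with aes(angle=angle, hjust=hjust).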

Happy plotting!

(*) The numbers are fudged, don’t spend time reverse-engineering them.

Update (24/03/2012):
Christos Hatzis has modified my original code to plot a collection of un-normalised bar charts, like this.

He’s happy to share his code here: PolarBarchart.zip, together with a test file.

Update (02/06/2012):
You can find a better version in my R package ‘phorest’.


25 Responses to polar histogram: pretty and useful

  1. Elena says:

    When I run source("polarHistogram.R") it says the file is not found, so I cannot go through the example.

    thank you!!

    • CL says:

      Hi Elena,

      make sure to download and unzip the source code and leave it where your script is (or specify its location in source("thePath/polarHistogram.R")).

  2. Elena says:

    I will try tomorrow morning!! Many thanks for your early answer, I think it’ll be a nice figure. Now I will try to modify or change the script for my data… not easy for a beginner in R but quite exciting as well.

  3. Elena says:

    I would like to change the colour palette to my own colour vector. How could I do it without disturbing the other functions of the script?

    Thanks

    • CL says:

      It’s a ggplot object so you can change the colour palette with the usual options for scale_fill_*.

      To use your own colours, use scale_fill_manual().
      Example:

      p<-polarHistogram(myData)
      p+scale_fill_manual(values = c('V1'='red','V2'='blue', 'V3'='green'))
      

      Or if you want to use another palette from colorBrewer:

      p<-polarHistogram(myData)
      p+scale_fill_brewer(type='div',palette='RdYlBu')
      

      More information about scales here.

  4. It’s a really nice plot… I keep trying to polar-plot everything in my data!

  5. Elena says:

    Hello again,

    I got my polar plot!! It looks very nice indeed!! But to be finished and perfect I’d like to set all the item labels to the same direction. I can guess where to change it in the script, but I find it difficult because of the trigonometry.

    • CL says:

      The changes in direction are calculated in readableAngle() and readableJustification().
      Replace them with this and the labels will no longer change direction:

        # item labels
        readableAngle<-function(x){
          angle<-x*(-360/totalLength)-alphaStart*180/pi+90
          #angle+ifelse(sign(cos(angle*pi/180))+sign(sin(angle*pi/180))==-2,180,0)
          angle
        }
        readableJustification<-function(x){
          angle<-x*(-360/totalLength)-alphaStart*180/pi+90
          #ifelse(sign(cos(angle*pi/180))+sign(sin(angle*pi/180))==-2,1,0)
          0
        }
  6. Joao Ricardo says:

    Hi, amazing graph, really liked it.

    I am trying to work on the graph to publish in a paper, but I am using just one item as Family. I wanted to take out the percentages and the white lines, but I am not that advanced with R. Could you please help me?

    • CL says:

      Thanks!
      To remove the white guide lines, you can simply comment out the following lines (around line 135):

        # guides
      #   p<-p+geom_segment(
      #     aes(
      #       x=xmin,
      #       xend=xend,
      #       y=y,
      #       yend=y),
      #     colour="white",
      #     data=guidesDF)
      
        # label for guides
      #   guideLabels<-data.frame(
      #     x=0,
      #     y=affine(1-guides/100),
      #     label=paste(guides,"% ",sep='')
      #     )
      #
      #   p<-p+geom_text(
      #     aes(x=x,y=y,label=label),
      #     data=guideLabels,
      #     angle=-alphaStart*180/pi,
      #     hjust=1,
      #     size=4)
      

      Although, I have to say, I find it harder to read.
      • Joao Ricardo says:

        Great! Thank you a lot. My data set is far smaller than yours, so it will be beautiful! Thanks again!

      • Joao Ricardo says:

        Hi again, really sorry to bother, but I did not find a way to delete the lines you meant. I am a new user of R and I am having trouble with that.

        Could you give me a few more beginner’s steps to take out the lines and the percentage numbers?

        Sorry!

      • CL says:

        Hi Joao,

        The lines I’ve mentioned are in polarHistogram.R. You can either delete them or comment them out by adding a # at the beginning of each line.

  7. Bill Raynor says:

    the old guy raises his hand:

    > df<-melt(df,c("family","item"),variable_name="score") # from wide to long

    doesn’t work, giving the following message

    > p<-polarHistogram(df,familyLabel=FALSE)
    Error in order(family, item, score) : object 'score' not found

    but changing variable_name to variable.name does

    > df2 p print(p)

    I am using R 2.15.1 and RStudio 0.96.331. I’m new to R.
    Is this kind of thing (changing arg names, deprecation) typical of R or did I just wander over from SAS at the wrong time? I would have thought ggplot was fairly stable, as I’ve been hearing about it for quite a while. I’ve got decades old SAS code that still runs just fine. (misses all the improvements, but still works…)

    Thanks
    Bill

    • CL says:

      :) well it’s not so much R as R packages (reshape2 in this case) that can change from one version to the next. This can be annoying, but ultimately it’s solvable by looking at the help file.

      By the way, ggplot2 has changed substantially from 0.8 to 0.9, especially when it comes to scales. See this for a transition guide: http://cloud.github.com/downloads/hadley/ggplot2/guide-col.pdf

      (I know it’s annoying but that’s for the best.)

  8. Bill Raynor says:

    That last bit of code got mangled:

    > df2<-melt(df,c("family","item"),variable.name="score") # from wide to long

    I’ll skip reproducing all the deprecation messages (e.g. use “element_blank” instead of “theme_blank”)
    Thanks
    Bill

  9. momo says:

    I have followed the tutorial but no graphics window has been launched and I got the following errors
    Warning messages:
    1: ‘opts’ is deprecated.
    Use ‘theme’ instead.
    See help(“Deprecated”)
    2: ‘theme_blank’ is deprecated.
    Use ‘element_blank’ instead.
    See help(“Deprecated”)
    3: ‘theme_blank’ is deprecated.
    Use ‘element_blank’ instead.
    See help(“Deprecated”)
    4: ‘theme_blank’ is deprecated.
    Use ‘element_blank’ instead.
    See help(“Deprecated”)
    5: ‘theme_blank’ is deprecated.
    Use ‘element_blank’ instead.
    See help(“Deprecated”)
    6: ‘theme_blank’ is deprecated.
    Use ‘element_blank’ instead.
    See help(“Deprecated”)
    7: ‘theme_blank’ is deprecated.
    Use ‘element_blank’ instead.
    See help(“Deprecated”)
    8: ‘theme_blank’ is deprecated.
    Use ‘element_blank’ instead.
    See help(“Deprecated”)
    9: ‘theme_blank’ is deprecated.
    Use ‘element_blank’ instead.
    See help(“Deprecated”)
    > print(p)n(colour=’black’, fill=NA)
    Error: unexpected symbol in ” print(p)n”

    • CL says:

      Hi,

      Use the version from the package if you haven’t already.
      The warning messages are annoying and due to a change of syntax in ggplot2. I’ll try to update the code when I have some time but don’t hold your breath!

      I’ve run the code in the helpfile of phenotypicPhorest::polarhistogram and I do get a graph. Are you sure you’ve used print(p)? It looks like you’ve used print(p)n, which would explain the error message.

  10. Meren says:

    Hi,

    I see it’s been a long time since you posted this, so I’d understand if I don’t get any response.

    I downloaded phorest and have been trying to get your “fake data for polarHistogram()” running with polarHistogram.

    However, I keep getting this error, even for the example code you shared on this page:

    “Error: Continuous value supplied to discrete scale”

    Do you have any suggestion?

    Thanks in advance!

    • CL says:

      Hi,

      I’m not sure what’s happening: I’ve run the example and it’s working (ignoring the deprecation warnings). Are you using the latest versions of ggplot2, reshape2 and plyr? Which OS are you using?

  11. Hey CL,
    That’s super, I quickly tested it. When I have time, I’ll play more with the code, to make the white gap go, trying scatter plot and histogram in same plot. It might surpass the submissive need for Circos.

    Cheers

    • CL says:

      Thanks, I hope you find it useful.
      You could probably do something like Circos using the same projection. I was tempted to have a go at it at some point but never got round to it.

  12. monglean says:

    Hi CL,
    It’s a super nice and useful plot! Thanks so much. How do you fix the item labels when your data has more groups (family) and items within each group? If you run the following code, you’ll see that the item labels for the first group (counter-clockwise) are not properly displayed (they are flipped). Thanks for your help.

    randomName<-function(n=3,syllables=7){
      vowels<-c("a","e","i","o","u","y")
      consonants<-setdiff(letters,vowels)
      replicate(n,
                paste(
                  rbind(sample(consonants,syllables,replace=TRUE),
                        sample(vowels,syllables,replace=TRUE)),
                  sep='',collapse='')
      )
    }
    
    set.seed(42)
    
    nFamily<-6
    nItemPerFamily<-sample(1:25,nFamily,replace=TRUE)
    nValues<-4
    
    df<-data.frame(
      family=rep(randomName(nFamily),nItemPerFamily),
      item=randomName(sum(nItemPerFamily),2))
    
    df<-cbind(df,as.data.frame(matrix(runif(nrow(df)*nValues),nrow=nrow(df),ncol=nValues)))
    
    df<-melt(df,c("family","item"),variable_name="score") # from wide to long
    p<-polarHistogram(df,direction="outwards", circleProportion=0.95,familyLabel=TRUE,
                      binSize=0.5,spaceItem=0.3, spaceFamily=1.5, innerRadius=0.4, alphaStart=-.1)
    print(p)
    
    • CL says:

      I see what you mean. The angle for the labels is hardcoded in readableAngle() in polarHistogram(), so you’d need to change this function to get the orientation you want.

      I’ll try and fix it at some point but in the meantime you’re welcome to have a go at it. Let me know if you do and I’ll include your patch. Cheers!
