A graphical overview of your MySQL database

If you use MySQL, there’s a default schema called ‘information_schema‘ which contains lots of information about your schemas and tables among other things. Recently I wanted to know whether a table I use for storing the results of a large number experiments was any way near maxing out. To cut a brief story even shorter, the answer was “not even close” and could be found in ‘information_schema.TABLES‘. Not being one to avoid any opportunity to procrastinate, I went on to write a short script to produce a global overview of the entire database.

infomation_schema.TABLES contains the following fields: TABLE_SCHEMA, TABLE_NAME, TABLE_ROWS, AVG_ROW_LENGTH and MAX_DATA_LENGTH (and a few others). We can first have a look at the relative sizes of the schemas with the MySQL query “SELECT TABLE_SCHEMA,SUM(DATA_LENGTH) SCHEMA_LENGTH FROM information_schema.TABLES WHERE TABLE_SCHEMA!='information_schema' GROUP BY TABLE_SCHEMA“.

Select All Code:
1
2
3
4
5
6
7
8
9
10
11
12
13
library("ggplot2") # You'll need ggplot2 0.9 for this.
library("reshape2")
library("RMySQL")
 
connection<-dbConnect(MySQL(), user="username", password="XXXXXX",host="127.0.0.1",port=3306,dbname='')
 
  query<-"SELECT TABLE_SCHEMA,SUM(DATA_LENGTH) SCHEMA_LENGTH FROM information_schema.TABLES WHERE TABLE_SCHEMA!='information_schema' GROUP BY TABLE_SCHEMA"
  result<-dbGetQuery(connection,query)
  result$TABLE_SCHEMA<-reorder(result$TABLE_SCHEMA,result$SCHEMA_LENGTH)
  p<-ggplot(result)+geom_bar(aes(x=TABLE_SCHEMA,y=SCHEMA_LENGTH))+coord_flip()
  p<-p+xlab("Size")+ylab("")
  p<-p+opts(title="Schemas' size")
  print(p)

And for the whole overview, let’s break each schema down by tables:

Select All Code:
1
2
3
4
5
6
7
  query<-"SELECT TABLE_SCHEMA,TABLE_NAME,TABLE_ROWS,DATA_LENGTH FROM information_schema.TABLES WHERE TABLE_SCHEMA!='information_schema'"
  result<-dbGetQuery(connection,query)
  result<-within(result,TABLE_NAME<-factor(TABLE_NAME,levels=sort(TABLE_NAME,decreasing=TRUE)))
  p<-ggplot(result)+geom_bar(aes(x=TABLE_NAME,y=DATA_LENGTH))+coord_flip()+facet_wrap(~TABLE_SCHEMA,scales='free')
  p<-p+xlab("Size")+ylab("")
  p<-p+opts(title="Tables' size")
  print(p)


Also, using the AVG_ROW_LENGTH and MAX_DATA_LENGTH and assuming a relatively constant row length, we can derive the maximum number of rows that a table can use, which gives us an estimate of how much space there is left:

Select All Code:
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
  query<-"SELECT TABLE_SCHEMA,TABLE_NAME,100*TABLE_ROWS/FLOOR(MAX_DATA_LENGTH/AVG_ROW_LENGTH) AS USED FROM information_schema.TABLES WHERE TABLE_SCHEMA!='information_schema'"
#   query<-"SELECT TABLE_SCHEMA,TABLE_NAME,RAND(42)*100 AS USED FROM information_schema.TABLES WHERE TABLE_SCHEMA!='information_schema'"
 
  result<-dbGetQuery(connection,query)
  result$LEFTOVER<-100-result$USED
  result<-within(result,TABLE_NAME<-factor(TABLE_NAME,levels=sort(TABLE_NAME,decreasing=TRUE)))
  result<-melt(result,id.vars=c("TABLE_SCHEMA","TABLE_NAME"),variable.name='TYPE',value.name='PROPORTION',na.rm=TRUE)
  p<-ggplot(result)
  p<-p+geom_bar(aes(x=TABLE_NAME,y=PROPORTION,fill=TYPE),stat='identity')
  p<-p+coord_flip()+facet_wrap(~TABLE_SCHEMA,scales='free')
  p<-p+scale_fill_manual(values=c("USED"='#DD0000',LEFTOVER='#AAAAAA'))
  p<-p+xlab('')+ylab('')+opts(title="Tables' usage")
  print(p)
 
  query<-"SELECT TABLE_SCHEMA, MAX(100*TABLE_ROWS/FLOOR(MAX_DATA_LENGTH/AVG_ROW_LENGTH)) AS USED FROM information_schema.TABLES WHERE TABLE_SCHEMA!='information_schema' GROUP BY TABLE_SCHEMA"
#   query<-"SELECT TABLE_SCHEMA, MAX(100*RAND(42)) AS USED FROM information_schema.TABLES WHERE TABLE_SCHEMA!='information_schema' GROUP BY TABLE_SCHEMA"
 
  result<-dbGetQuery(connection,query)
  result$LEFTOVER<-100-result$USED
  result$TABLE_SCHEMA<-reorder(result$TABLE_SCHEMA,result$USED)
  result<-melt(result,id.vars=c("TABLE_SCHEMA"),variable.name='TYPE',value.name='PROPORTION',na.rm=TRUE)
  p<-ggplot(result)
  p<-p+geom_bar(aes(x=TABLE_SCHEMA,y=PROPORTION,fill=TYPE),stat='identity')
  p<-p+coord_flip()
  p<-p+scale_fill_manual(values=c("USED"='#DD0000',LEFTOVER='#AAAAAA'))
  p<-p+xlab("")+ylab("")+opts(title="Largest Usage")
  print(p)
dbDisconnect(connection)

Unless you are using very large tables, those last two graphs should come out pretty much all gray. You can check that the colouring works by using the commented out queries instead, which use random values for the estimates.

About dbConnect(): I left it here to make things easier to replicate but I normally call a simple function which is just a wrapper for it, with my username and password in. This way my credentials are in one single place instead of all over my scripts.

PS: This is my first anniveRsary! I’ve been using R for a year now. And I’m certainly planning to carry on.

This entry was posted in Uncategorized and tagged , , . Bookmark the permalink.

5 Responses to A graphical overview of your MySQL database

  1. CPHM says:

    I’ve just two questions :
    – which version of R do you use ?
    – and your OS ?
    Thanks by advance.

    • CL says:

      Hi there,

      I’m on Ubuntu 64b, running R 2.14.2. Is it not working at your end?

      Don’t forget to upgrade to ggplot 0.9 – I don’t think the facetting will work that way in a previous version.

  2. CPHM says:

    My primary problem is that Im on Windows and RMySQL is no longer support since R.2.11.xx. That’s why i was asking about your version and OS.

    Thanks.

    • CL says:

      I see. I never use Windows so I can’t really help you on that front I’m afraid.

      Maybe install.packages(‘RMySQL’,type=’source’) since only the binaries are not available for Windows (but the source is)?

  3. G. Grothendieck says:

    Read the info at the Installation: link on the RMySQL CRAN page: http://cran.r-project.org/package=RMySQL

Leave a Reply

Your email address will not be published. Required fields are marked *

*


nine + 6 =

You may use these HTML tags and attributes: <a href="" title=""> <abbr title=""> <acronym title=""> <b> <blockquote cite=""> <cite> <code> <del datetime=""> <em> <i> <q cite=""> <strike> <strong> <pre lang="" line="" escaped="">