Anarchy Golf! And that’s your Sunday gone.

I like to follow good practice when I program. I want my code to be readable, properly indented, modular and re-usable. And I want my variables to have descriptive names. There’s nothing that I moderately dislike more than arbitrary abbreviations and inconsistent style. I have to say that R is not the best example when it comes to style. Even base functions often have weird names, and their arguments are either camelCased, period.separated or abbrvtd, all willy-nilly with no consistency, as if saving a few keystrokes were so important. It’s like learning PHP again. But the weirdest thing I’ve come across yet is the possibility of using a partial name for an argument (e.g. co for collapse). I’m at a loss to find a rationale for this; it seems designed to engineer impenetrable code.
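To see partial matching in action, here is a small sketch using base paste(); note that co is not a real argument name, it is merely an abbreviation that R silently expands to collapse:

```r
# Partial argument matching: 'co' unambiguously abbreviates paste()'s
# 'collapse' argument, so R accepts it silently.
x <- c("a", "b", "c")

full    <- paste(x, collapse = "-")  # explicit, readable call
partial <- paste(x, co = "-")        # 'co' is expanded to 'collapse'

identical(full, partial)             # TRUE: both give "a-b-c"
```

The call only works while the abbreviation is unambiguous among the function’s argument names, which is exactly why it makes code so fragile to read and maintain.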

Good and consistent style helps you code better. Long, descriptive names make your code more readable, and tab-completion will save you those precious keystrokes. So go for it. That’s not to say it can’t be a problem sometimes. For example, this week I was adding some bells and whistles to a function I’d written. One statement involved subsetting a data frame on a hard-coded value, like subset(result, association == "firstKind").
A new argument for my function was, you guessed it, association. You can see where this is heading; the statement turned into:

subset(result,association==association)

And of course it failed; the condition is always true, because both instances of association are interpreted as referring to the column of the data frame. So all rows are selected, whatever the argument is.

How to get out of this? Well, one could change the argument or the column name, but I was already using them all over the function and elsewhere, and didn’t fancy tinkering with the code too much at that point. Besides, I was reluctant to rename them in the first place, for reasons that should be obvious by now. So what I did was read up on the documentation on scoping, which is where the problem lies, and I came up with this:

e<-environment()
subset(result,association==get('association',e))

The association on the right hand side is now correctly interpreted as the function’s argument. It’s a bit clumsy, but I get to use my beloved descriptive variable names and don’t need to go off on a replacement frenzy and its associated new bugs.
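Another way out, if you would rather avoid the get() call: copy the argument into a local variable whose name does not clash with any column, and use that in the condition. A sketch, where filter_by_association and assoc_arg are made-up names for illustration:

```r
# Sketch: both the argument and a column are called 'association'.
# 'filter_by_association' and 'assoc_arg' are hypothetical names.
filter_by_association <- function(result, association) {
  assoc_arg <- association                  # no column has this name...
  subset(result, association == assoc_arg)  # ...so subset() finds it in the enclosing frame
}

result <- data.frame(association = c("firstKind", "secondKind"),
                     value = 1:2)
filter_by_association(result, "secondKind")  # returns only the 'secondKind' row
```

This works because subset() evaluates the condition in the data frame first and falls back to the calling environment for anything that is not a column.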

If you don’t see what I mean, here is some code I left on a related Stack Overflow thread:

x<-data.frame(
  start=sample(3,20,replace=TRUE),
  someValue=runif(20))
 
e<-environment()
start<-3
cat("\nDefault scope:")
print(subset(x,start==start)) # all entries, as start==start evaluates to TRUE
cat("\nSpecific environment:")
print(subset(x,start==get('start',e)))  # the second start is looked up in the earlier environment

However, bad practice has its perks and can be a lot of fun! I recently came across this very addictive online game: anarchy golf. There are more than 500 programming tasks to choose from. Each of them is very easy to code, like printing out the Fibonacci sequence, or just ‘Hello World’, but that’s not where the challenge is.

As the name suggests, the real challenge is to do it in as few bytes as possible! And that’s where obscure and horribly nested code comes in handy. Variable names have to be one letter max., if you use variables at all, that is.
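To give a flavour (my own quick attempt, certainly not a record-holder): here is a readable Fibonacci next to a golfed one.

```r
# Readable version: the first n Fibonacci numbers
fib <- function(n) {
  x <- c(0, 1)
  for (i in seq_len(n - 2)) x <- c(x, sum(tail(x, 2)))
  x
}
fib(10)  # 0 1 1 2 3 5 8 13 21 34

# Golfed version: one-letter names, no spaces, every byte counts
a=0;b=1;for(i in 1:10){cat(a,"");t=b;b=a+b;a=t}
```

Both produce the same sequence; only one of them will still make sense to you next month.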

My current records are:

R-bloggers and readers, I challenge you to beat that!

Careful with possible invisible line breaks at the end of your file. This bit of Perl will get rid of them if your editor insists on adding one: perl -pe 'chomp if eof'. And no cheating! Your code must be pure R, so no using system() please.

It’s a great and terribly addictive game, and it teaches you some of the weirdest and most obscure R commands and shortcuts. And partial matching suddenly becomes useful, and even recommended.

(Original image: Paradise Valley Golf Course, Fairfield, CA, by David Bastyr, license: CC BY-NC-SA 2.0)


8 Responses to Anarchy Golf! And that’s your Sunday gone.

  1. Tal Galili says:

    Hello dear,
    You do not have copyrights for this image, so I removed it from R-bloggers.

Please be very careful with using images online without explicit copyrights. I already had good friends losing thousands (!!!) of dollars over this…

  2. Tony says:

    Perhaps a small point …

    I’ve often heard the argument you make about large vs small variable names. In reading good code I’ve seen many good practices. Most are consistent. As others say, good code is boring. Also, communities of practice arise where most members of this or that community follow a similar style. But different communities might have some different styles. And individuals may be consistent in their code. Don Knuth has a distinctive style that’s all his.

    With respect to large vs small variable names, I try to keep to a practice of clean abstraction and fidelity to sources. If a routine has a general application and is documented elsewhere, I put a reference to the source in comments, and code with generic variables suited to the source and its jargon, not with variables suited to my local application. The quality of my algo code will be determined by fidelity to its source, not by fidelity to my local app.

    If the code has a stats source, I’ll use “n” as the variable for population size. If the code is about matching, I’ll use “a” and “b” for the left and right sides. I may assign the value of a large name variable to my small name variable (usually by reference), and then do the algo.

In its smaller domain the algo code will work. It will be easier to check against a source using “a”, “b”, and “n” than using your local long variable names.

    In German, there’s an expression, “fach-technische Wörter”, that marks this as a practice: use terms which developed and evolved in a discipline. Code integrates, but is filled with imports. Rather than force translation from an import discipline to one’s own local app, it’s often better to use terms from the import discipline.

The documentation is easier too: just say “see reference [reference]”. Instead of lots of doc text, link to someone whose writeup was often more thoughtful than what you could come up with at home.

    • CL says:

Thank you for your comment. I hear what you’re saying and indeed there is no single true good practice; it depends on the developer’s preferences and context. I guess R having such a large community contributes to the diversity and inconsistency of styles; enforcing a single style would be near impossible and probably unproductive.

Nevertheless, there are certainly plenty of examples of bad style, and I think package providers (to whom I’m very grateful for their excellent work, obviously) should make some effort to choose self-explanatory names for their function arguments. What they do under the bonnet is up to them of course, but it would make their users’ lives much easier to be able to guess the correct arguments and have a shot at tab-completion, instead of going back and forth to the doc to check which spelling/abbreviation was chosen that day.

      I certainly don’t want to act as the style police! I myself don’t adhere to my ideals all the time. Even in this very post: I used e to store the environment! :)

  3. mpiktas says:

    What’s wrong with just using:


    x[x$start==start,]

Subset is a convenience function which creates a new environment to work in, so the usual scoping rules apply: if a local variable has the same name as a global one, the local variable is used. I admit it might seem like a quirk, yet it is consistent with how environments work in R.

    • CL says:

      Thank you for your comment. In this example, this syntax would work and I agree it’s simpler.

However, I’ve learned the hard way to be wary of that shortcut, as it can produce some unexpected edge cases. Using subset is more robust and that’s what I use in my code; I do use [] at the command line though.

      (On top of that, the actual condition in my function had more than one argument and subset was more convenient)

      Edit to add one edge effect of [] (I was replying from my phone and couldn’t check):
      If you use:

      > x[x$start==3,c("someValue")]
      [1] 0.8209 0.7502

      the result is NOT a data frame any more, only a vector:

      > class(x[x$start==3,c("someValue")])
      [1] "numeric"

      Whereas

      > subset(x,start==3,select=someValue)
         someValue
      15    0.8209
      17    0.7502
      > class(subset(x,start==3,select=someValue))
      [1] "data.frame"
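Incidentally, [] can also be told to keep the data frame: pass drop = FALSE as a third index argument. A small sketch with made-up values:

```r
# Hypothetical data mirroring the example above
x <- data.frame(start = c(3, 1, 3), someValue = c(0.8209, 0.5, 0.7502))

class(x[x$start == 3, "someValue"])                # "numeric": dropped to a vector
class(x[x$start == 3, "someValue", drop = FALSE])  # "data.frame": dimensions kept
```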

      Edit 2: another effect of []
      If the column contains some NA, you’re in for some weird behaviour:

      > x[1:5,]$start<-NA

      Then

      > x[x$start==3,]
           start someValue
      NA      NA        NA
      NA.1    NA        NA
      NA.2    NA        NA
      NA.3    NA        NA
      NA.4    NA        NA
      15       3    0.8209
      17       3    0.7502

      But

      > subset(x,start==3)
         start someValue
      15     3    0.8209
      17     3    0.7502
