A few quick observations on StackOverflow questions tagged R

While browsing through Pete Skomoroch’s Delicious bookmarks (which is a full-time job in and of itself), I learned that StackOverflow.com makes its underlying Q&A data available.

Just for fun, I wrote a few quick queries against this dataset, centered around the R tag. Here are a handful of findings; the data runs through 31-Oct-2009. Some of this is already presented on the StackOverflow site, but bear with me here.

The most common tags associated with R are:
statistics – 46
ggplot2 – 20
plot – 13
graphics – 10
vector – 9
emacs – 8
matrix – 8
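
For anyone curious how a count like this can be produced, here is a minimal sketch in Python. It is not the query I ran: the record layout (a list of dicts with a `tags` field), the sample data, and the function name `co_tags` are all made up for illustration.

```python
from collections import Counter


def co_tags(questions, tag="r"):
    """Count the tags that appear on the same question as `tag`."""
    counts = Counter()
    for q in questions:
        tags = set(q["tags"])
        if tag in tags:
            # Count every other tag on this question once.
            counts.update(tags - {tag})
    return counts


# Made-up sample records, not real StackOverflow data.
sample_questions = [
    {"id": 1, "tags": ["r", "statistics"]},
    {"id": 2, "tags": ["r", "ggplot2", "plot"]},
    {"id": 3, "tags": ["python", "statistics"]},
    {"id": 4, "tags": ["r", "statistics", "plot"]},
]

for name, n in co_tags(sample_questions).most_common():
    print(f"{name}: {n}")
```

Using a set per question means a tag is counted at most once per question, so each number reads as "questions tagged with both R and this tag".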

We all know that Dirk, Shane, and Hadley lead the way in terms of questions answered, but who knew that chris_dubois leads the pack when it comes to answering their own questions, with 10?

And finally, out of 20 posts totaling 32 answers tagged with ggplot2 (at the time), Hadley Wickham, the package’s author, has contributed only three answers. The fact that the rest of the questions were answered by other users speaks volumes about the community behind ggplot2. Excellent work, Hadley!

Here is my version of the leaderboard as of the end of October 2009.

r stackoverflow leaderboard
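
The leaderboard columns boil down to three per-user numbers: total answers, distinct questions answered (`dist_questions`), and answers to one's own questions (`own_question`). Below is a rough sketch of how they could be derived, in Python. The input shapes, the sample users, and the function name `leaderboard` are hypothetical, and counting `own_question` as distinct self-answered questions is an assumption made for this sketch rather than a statement of how the table above was built.

```python
from collections import defaultdict


def leaderboard(questions, answers):
    """Build per-user answer stats.

    questions: dict mapping question id -> user who asked it
    answers:   list of (question id, user who answered) pairs
    """
    stats = defaultdict(lambda: {"answers": 0, "questions": set(), "own": set()})
    for qid, user in answers:
        s = stats[user]
        s["answers"] += 1          # every answer counts
        s["questions"].add(qid)    # each question counts once per user
        if questions.get(qid) == user:
            s["own"].add(qid)      # assumption: distinct self-answered questions
    rows = [
        {
            "user": user,
            "answers": s["answers"],
            "dist_questions": len(s["questions"]),
            "own_question": len(s["own"]),
        }
        for user, s in stats.items()
    ]
    # Most answers first; ties broken alphabetically by user name.
    return sorted(rows, key=lambda r: (-r["answers"], r["user"]))


# Made-up sample data, not real StackOverflow users.
sample_questions = {1: "alice", 2: "bob", 3: "alice"}
sample_answers = [
    (1, "alice"), (1, "carol"), (1, "carol"),
    (2, "carol"), (3, "alice"), (2, "alice"),
]

for row in leaderboard(sample_questions, sample_answers):
    print(row)
```

In the sample, carol posts two answers to question 1, so her answer count is one higher than her distinct-question count, which is exactly the gap between the two columns.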

2 Comments

  1. Shane

    Very neat! Thanks! Looking forward to seeing more of your analysis.

    What are dist_questions?

    Regarding the own_question stat, it might be interesting to separate that out by questions where the self-answer was selected as the accepted answer.

    Posted on 12-Nov-09 at 11:13 am
  2. admin

    Sorry, dist_questions represents the number of distinct questions a user has answered; some people have posted more than one answer to a given question.

    I intend to do more with accepted answers, scoring, etc. as time permits. Hopefully soon!

    Posted on 12-Nov-09 at 11:28 am
