Browse Source

additions to org-R tutorial

Dan 11 years ago
parent
commit
9bdfe8825a

BIN
images/org-R/org-plot-example-1.png


+ 561 - 0
org-tutorials/org-R/#org-R.org#

@@ -0,0 +1,561 @@
+#+OPTIONS:    H:3 num:nil toc:t \n:nil @:t ::t |:t ^:t -:t f:t *:t TeX:t LaTeX:t skip:nil d:(HIDE) tags:not-in-toc
+#+STARTUP:    align fold nodlcheck hidestars oddeven lognotestate
+#+SEQ_TODO:   TODO(t) INPROGRESS(i) WAITING(w@) | DONE(d) CANCELED(c@)
+#+TAGS:       Write(w) Update(u) Fix(f) Check(c)
+#+TITLE:      org-R: Computing and data visualisation in Org-mode using R
+#+AUTHOR:     Dan Davison
+#+EMAIL:      davison@stats.ox.ac.uk
+#+LANGUAGE:   en
+#+PRIORITIES: A C B
+#+CATEGORY:   worg-tutorial
+
+# #+INFOJS_OPT: view:overview
+
+[[file:../index.org][{Back to Worg's index}]]
+
+* Introduction
+  org-R is an org-mode extension that performs numerical computations
+  and generates graphics. Numerical output may be stored in the org
+  buffer in org tables, and the input can also come from an org
+  table. Rather than starting off by documenting everything
+  systematically, I'll provide several commented examples. Towards the
+  end there are lists of [[*Table of available actions][available actions]] and [[*Table of available options][other options]].
+  
+  Although, behind the scenes, it uses [[http:www.r-project.org][R]], you do not need to know
+  anything about R. Common operations are provided `off the shelf' by
+  specifying options on lines starting with #+TBLR:. Having said that,
+  org-R also accepts raw R code (TBLRR: lines). For those who don't
+  yet know R, but think they might be interested, try the showcode:t
+  option. It displays the R code corresponding to the action you
+  requested, and so provides a good starting point for fine-tuning
+  your analysis. But that's getting ahead of things.
+
+  My hope is, of course, that this will be of use to people. So at
+  this stage any comments, ideas, feedback, bug reports etc would be
+  *very* welcome. I'd be happy to help anyone that's interested in
+  using this, via the Org mailing list.
+
+* Setting things up
+  The code is currently [[http://www.stats.ox.ac.uk/~davison/software/org-R/org-tblR.el][here]]. Soon it will be in the contrib
+  directory. The other things you need are R (Windows / OS X binaries
+  available on the [[http:www.r-project.org][R website]]; widely available in linux package
+  repositories) and the emacs mode [[http://ess.r-project.org/][Emacs Speaks Statistics]] (ESS). ESS
+  installation instructions are [[http://ess.r-project.org/Manual/readme.html#Installation][here.]]  Personally, under linux, I have
+  something like
+
+#+BEGIN_SRC emacs-lisp
+(add-to-list 'load-path "/path/to/ess/lisp")
+(require 'ess-site)
+#+END_SRC
+
+* Using org-R
+  org-R uses two different option lines to specify an
+  analysis/plot: #+TBLR: and #+TBLRR:. #+TBLRR: is the one that
+  accepts R code, so we'll ignore that for now. To make the action
+  happen, use M-x org-R-apply with point in the #+TBLR:
+  line. That's the only function you need, and it would make sense to
+  bind it to some key. So, first example.
+
+* Computing on org tables: tabulating values
+   Here's a command to tabulate the values in the second column. Issue
+   M-x org-R-apply in the following #+TBLR line.
+
+#+begin_example
+
+| col1 | col2 |
+|------+------|
+| A    | A    |
+| A    | B    |
+| B    | B    |
+#+TBLR: action:tabulate columns:2
+
+#+end_example
+
+  That results in
+
+#+begin_example
+
+| value | count |
+|-------+-------|
+| A     |     1 |
+| B     |     2 |
+
+#+end_example
+
+  . So the values in column 2 were tabulated as requested. However,
+  the original data got overwritten. That leads us to
+
+* Table references
+   
+   We can specify input data for analysis/plotting in 3 different
+   ways:
+   
+   1. by providing a reference to an org table with the intable:
+      option. You can optionally specify the file that the table is in
+      with the infile: option;
+
+   2. by pointing it to a csv file, locally or via http:, using
+     infile:/path/to/file.csv or infile:http://somewhere/file.csv
+
+   3. by doing neither, in which case it looks for a table immediately
+     above the #+TBLR(R) line(s).
+
+Case (3) is what happened above -- the input data came from a table
+immediately above the #+TBLR line. The default behaviour is to replace
+any such table with the output; this allows us to tweak the option
+line and update the analysis. However, normally we'll want to separate
+the data from the analysis output. So let's keep the data as a named
+table in the org file, and refer to it by name:
+
+#+begin_example
+
+#+TBLNAME:data-set-1
+| col1 | col2 |
+|------+------|
+| A    | A    |
+| A    | B    |
+| B    | B    |
+
+[arbitrary other content of org buffer]
+
+#+TBLR: intable:data-set-1 action:tabulate
+
+#+end_example
+
+which results in
+
+#+begin_example
+
+|   | A | B |
+|---+---+---|
+| A | 1 | 1 |
+| B | 0 | 1 |
+
+#+end_example
+
+Note that this time we did a different analysis: I removed the
+columns:2 option, so that tabulate was passed the whole table. As a
+result the output contains counts of joint occurrences of values in
+the two columns: out of the 4 possibilities, the only one we didn't
+observe was "B in column 1 and A in column 2". We could have achieved
+the same result with columns:(1 2). (But don't try to tabulate more
+than 2 columns: org does not do multi-dimensional tables).
+
+* Plotting data
+** Available off-the-shelf plotting commands
+  At the risk of this starting to sound like a bad and boring
+  undergraduate statistics textbook, the sort of plots that are
+  appropriate depend on the sort of data. Let's divide it up as
+
+ - discrete-valued data
+    [e.g. data-set-1 above, or the list of org variables customised by users]
+ - continuous-valued data
+   [e.g. the wing lengths of all Eagle Owls in Europe]
+ - indexed data 
+   [e.g. a data set in which each point is a time,
+    together with the size of the org source code base at that time]
+
+The available off-the-shelf actions are listed [[*Table of available actions][here]].
+
+** Continuous data example:
+    :PROPERTIES:
+    :ID:       2ce0fc04-b308-4b8d-8acc-805a9e5fed7d
+    :END:
+    We're going to need some data. So let's prove that org can also
+    speak statistics and use org-R to simulate the data. This
+    requires some raw R code, so skip this bit if you're not
+    interested.
+
+    The following #+TBLRR line simulates 10 values from a Normal
+    distribution with mean -3, and 10 values from a Normal
+    distribution with mean 3, and lumps them together. The point is that
+    the numbers we get should be concentrated around two different
+    values, and we should be able to see that in a histogram and/or
+    density plot.
+
+#+begin_example
+
+#+TBLRR: x <- c(rnorm(10, mean=-3, sd=1), rnorm(10, mean=3, sd=1))
+#+TBLR: title:"continuous-data" output-to-buffer:t
+
+#+end_example
+
+Here's what I got.  Note that the title: option set the name of the
+table with "#+TBLNAME"; we'll use that to refer to these data.
+
+#+begin_example
+
+#+TBLNAME:continuous-data
+|            values |
+|-------------------|
+| -2.48627002467785 |
+|  -4.0196287273144 |
+| -3.43471960580471 |
+| -5.21985294534255 |
+| -3.84201126431028 |
+| -1.72912705369668 |
+| -2.86703950990613 |
+| -2.82292622464752 |
+| -4.43246430621368 |
+| -1.03188727658288 |
+| 0.882823532068805 |
+|  3.28641606039499 |
+|  3.56029698321959 |
+|  2.91946660223152 |
+|  2.32506089804876 |
+|   3.3606298511366 |
+|  5.19883523425104 |
+|  4.86141359164329 |
+|  2.90073505260204 |
+|  4.21163939487907 |
+#+end_example    
+
+Now to plot the data. Let's have some colour as well, and this time
+the title: option will be used to put a title on the plot (and also to
+name the file link to the graphical output).
+
+
+#+begin_example
+
+[[file:tmp.png][histogram example]]
+#+TBLR: action:hist columns:1 colour:hotpink 
+#+TBLR: intable:continuous-data outfile:"png" title:"histogram example"
+
+#+end_example
+[[file:../../images/org-R/histogram-example.png]]
+
+[Note that you can use multiple TBLR lines rather than cramming all
+the options on to one line.]
+
+An alternative would be to produce a density plot. We don't have
+enough data points to justify that here, but we'll do it anyway just
+to show the sort of plots that are produced. This time we'll specify
+the output file for the png image using the output: option. (For the
+histogram we used output:"png". That's a special case; it doesn't
+create a file called "png" but instead uses org-attach to store the
+output in the org-attach dir for this entry. Same thing for the other
+available output image formats: "jpg", "jpeg", "pdf", "ps", "bmp",
+"tiff")
+
+#+begin_example
+
+[[file:density.png][density plot example]]
+#+TBLR: action:density columns:"values" colour:chartreuse4 args:(:lwd 4)
+#+TBLR: intable:continuous-data outfile:"density.png" title:"density plot example"
+
+#+end_example
+[[file:../../images/org-R/density.png]]
+
+There were a couple of new features there. Firstly, I referred to
+column 1 using its column label, rather than with the
+integer 1. Secondly, note the use of the args: option. It takes the
+form of a lisp property list ("p-list"), specifying extra arguments to
+pass to the R function (in this case density()). Here we used it to
+set the line thickness (lwd=4).
+
+** Discrete data example: the configuration variables survey
+
+The raw data, as collected by Manish, is in a table called
+org-variables-table, in a file called variable-popcon.org. We use the
+file: option to specify the org file containing the data, and the
+table: option to specify the name of the table within that file. [An
+alternative be to give the entry containing the table a unique id with
+org-id-get-create, refer to it with table:<uid>, and rely on the
+org-id mechanism to find it.].
+
+Now we tabulate the data. (We're not currently taking the sensible
+step that Manish did of checking whether the variables were given
+values different from their default).
+
+ Rather than cluttering up this org file with all the count data,
+we'll store them in a separate org file:
+
+#+begin_example
+
+[[file:org-variables-counts.org][org-variables-counts]]
+#+TBLR: action:tabulate columns:2 sort:t
+#+TBLR: infile:"variable-popcon.org" intable:"org-variables-table"
+#+TBLR: outfile:"org-variables-counts.org" title:"org-variables-counts"
+
+#+end_example
+[[file:org-variables-counts.org]]
+
+We can see the top few rows of the table by using action:head
+
+#+begin_example
+
+| rownames(x) | value                       | count |
+|-------------+-----------------------------+-------|
+|           1 | org-agenda-files            |    22 |
+|           2 | org-agenda-start-on-weekday |    22 |
+|           3 | org-log-done                |    22 |
+|           4 | org-todo-keywords           |    22 |
+|           5 | org-agenda-include-diary    |    19 |
+|           6 | org-hide-leading-stars      |    19 |
+#+TBLR: action:head
+#+TBLR: infile:"org-variables-counts.org" intable:"org-variables-counts" output-to-buffer:t
+
+#+end_example
+
+Here's a barplot of the counts. It makes it clear that over half the
+org variables are customised by only one or two users.
+
+#+begin_example
+
+[[file:org-variables-barplot.png][org-variables barplot]]
+#+TBLR: action:barplot rownames:t columns:1 width:800 col:darkblue
+#+TBLR: args:(:names.arg "NULL")
+#+TBLR: infile:"org-variables-counts.org" intable:"org-variables-counts"
+#+TBLR: outfile:"org-variables-barplot.png" title:"org-variables barplot"
+
+#+end_example
+[[file:../../images/org-R/org-variables-barplot.png]]
+
+*** Something more complicated: clustering org variables, and org users
+
+     OK, let's make a bit more use of R's capabilities. We can use the
+     org-variables data set to define distances between pairs of org
+     users (how similar their customisations are), and distances
+     between pairs of org variables (the extent to which people who
+     customise one of them customise the other). Then we can use those
+     distance matrices to cluster org users, and org variables.
+
+     First, let's create a table that's restricted to variables that
+     were customised by more than four users. That's going to require
+     a bit of R code:
+
+#+begin_example
+
+[[file:variable-popcon-restricted.org][org-variables-table]]
+#+TBLR: infile:"variable-popcon.org" intable:"org-variables-table"
+#+TBLR: outfile:"variable-popcon-restricted.org" title:"org-variables-table"
+#+TBLRR: tab <- table(x[,2])
+#+TBLRR: x <- subset(x, Variable %in% names(tab[tab > 4]))
+
+#+end_example
+[[file:variable-popcon-restricted.org][org-variables-table]]
+
+Now let's make a table with a row for each variable, and a column for
+each org user, and fill it with 1s and 0s according to whether user j
+customised variable i. We can do that without writing any R code:
+
+#+begin_example
+
+[[file:org-variables-incidence.org][incidence-matrix]]
+#+TBLR: action:tabulate columns:(1 2) rownames:t
+#+TBLR: infile:"variable-popcon-restricted.org" intable:"org-variables-table"
+#+TBLR: outfile:"org-variables-incidence.org" title:"incidence-matrix"
+
+#+end_example
+[[file:org-variables-incidence.org][incidence-matrix]]
+
+First we'll cluster org users. We use the R function dist to compute a
+distance matrix from the incidence matrix, then hclust to run a
+hierarchical clustering algorithm, and then plot to plot the results
+as a dendrogram:
+
+#+begin_example
+
+[[file:org-users-tree.png][org-users-tree.png]]
+#+TBLRR: par(bg="gray15", fg="turquoise2")
+#+TBLRR: plot(hclust(dist(x, method="binary")), ann=FALSE)
+#+TBLR: infile:"org-variables-incidence.org" intable:"incidence-matrix" rownames:t
+#+TBLR: outfile:"org-users-tree.png" title:"org-users-tree.png"
+
+#+end_example
+[[file:../../images/org-R/org-users-tree.png]]
+
+And to cluster org variables, we use the transpose of that incidence matrix:
+
+#+begin_example
+
+[[file:org-variables-tree.png][org-variables-tree.png]]
+#+TBLRR: par(bg="gray15", fg="turquoise2")
+#+TBLRR: plot(hclust(dist(t(x), method="binary")), ann=FALSE)
+#+TBLR: infile:"org-variables-incidence.org" intable:"incidence-matrix" rownames:t
+#+TBLR: outfile:"org-variables-tree.png" title:"org-variables-tree.png" width:1000
+
+#+end_example
+[[file:../../images/org-R/org-variables-tree.png]]
+
+
+Please note that my main aim here was to give some examples of using
+org-R, rather than to show how the org variables data should be mined
+for useful information! The org-variables dendrogram does seem to have
+made some sensible clusterings (e.g. the clusters of agenda-related
+commands), but I'm going to leave it to others to decide whether this
+exercise really served to do more than illustrate org-R. Does anyone
+recognise any usage affinities between the clustered org users?
+
+** Indexed data example
+   :PROPERTIES:
+   :ID:       45f39291-3abc-4d5b-96c9-3a32f77877a5
+   :END:
+    Let's plot the same data as Eric Schulte used in the [[../org-plot.org][org-plot tutorial]] on worg.
+
+#+begin_example
+
+[[file:/usr/local/src/org-etc/Worg/org-tutorials/org-R/data/45/f39291-3abc-4d5b-96c9-3a32f77877a5/org-R-output-8119M2O.png][An example from the org-plot tutorial, plotted using org-R]]
+#+TBLR: action:lines columns:((1)(2 3))
+#+TBLR: infile:"../org-plot.org"
+#+TBLR: intable:"org-plot-example-1" outfile:"png"
+#+TBLR: title:"An example from the org-plot tutorial, plotted using org-R"
+
+#+end_example
+[[file:../../images/org-R/org-plot-example-1.png]]
+
+* Table of available options
+  In addition to the action:<some-action> option (described [[*Table of available actions][here]], the
+  following options are available:
+|-----------------------------------------------+----------------------------------------------------------------------------------------------------------------------------------------|
+| *Input options*                               |                                                                                                                                        |
+|-----------------------------------------------+----------------------------------------------------------------------------------------------------------------------------------------|
+| infile:/path/to/file.csv                      | input data comes from file.csv                                                                                                         |
+| infile:http://www.somewhere/file.csv          | input data comes from file.csv somewhere on the web                                                                                    |
+| infile:/path/to/file.org                      | input data comes from file.org; must also specify table with intable:<name-or-id>                                                      |
+| intable:table-name                            | input data is in table named with #+TBLNAME:table-name (in same buffer unless infile:/path/to/file.org is specified)                   |
+| intable:table-id                              | input data is first table under entry with table-id as unique ID. Doesn't make sense with infile:/path/to/file.org                     |
+| rownames:t                                    | does first column contain row names? (default: nil). If t other column indices are as if first column not present --  this may change) |
+| colnames:nil                                  | does first row contain column names? (default: t)                                                                                      |
+| columns:2 columns:(2)                         | operate only on column 2                                                                                                               |
+| columns:"wing length" columns:("wing length") | operate only on column named "wing length"                                                                                             |
+| columns:((1)(2 3))                            | (when plotting) plot columns 2 and 3 on y-axis against column 1 on x-axis                                                              |
+| columns:(("age")("wing length" "fierceness")) | (when plotting) plot columns named "wing length" and "fierceness" on y-axis against "age" on x-axis                                    |
+|-----------------------------------------------+----------------------------------------------------------------------------------------------------------------------------------------|
+| *Action options*                              |                                                                                                                                        |
+|-----------------------------------------------+----------------------------------------------------------------------------------------------------------------------------------------|
+| action:some-action                            | off-the-shelf plotting action or computation (see [[*Table of available actions][separate list]]), or any R function that makes sense (e.g. head, summary)              |
+| lines:t                                       | (when plotting) join points with lines (similar to action:lines)                                                                       |
+| args:(:xlab "\"the x axis title\"" :lwd 4)    | provide extra arguments as a p-list (note the need to quote strings if they are to appear as strings in R)                             |
+|-----------------------------------------------+----------------------------------------------------------------------------------------------------------------------------------------|
+| *Output options*                              |                                                                                                                                        |
+|-----------------------------------------------+----------------------------------------------------------------------------------------------------------------------------------------|
+| outfile:/path/to/image.png                    | save image to file and insert link into org buffer (also: .pdf, .ps, .jpg, .jpeg, .bmp, .tiff)                                         |
+| outfile:png                                   | save image to file in org-attach directory and insert link                                                                             |
+| outfile:/path/to/file.csv                     | would make sense but not implemented yet                                                                                               |
+| height:1000                                   | set height of graphical output in (pixels for png, jpeg, bmp, tiff; default 480) / (inches for pdf, ps; default 7)                     |
+| width:1000                                    | set width of graphical output in pixels (default 480 for png)                                                                          |
+| title:"title of table/plot"                   | title to be used in plot, and as #+TBLNAME of table output, and as name of link to output                                              |
+| colour:hotpink col:hotpink color:hotpink      | main colour for plot (i.e. `col' argument in R, enter colors() at R prompt for list of available colours.)                             |
+| sort:t                                        | with action:tabulate, sort in decreasing count order (default is alphabetical on names)                                                |
+| output-to-buffer:t                            | force numerical output to org buffer (shouldn't be necessary)                                                                          |
+| inline:t                                      | don't name links to output (so that graphics are inline when exported to HTML)                                                         |
+|-----------------------------------------------+----------------------------------------------------------------------------------------------------------------------------------------|
+| *Misc options*                                |                                                                                                                                        |
+|-----------------------------------------------+----------------------------------------------------------------------------------------------------------------------------------------|
+| showcode:t                                    | Display a buffer containing the R code that was generated to do what was requested.                                                    |
+  
+* Table of available actions
+# <<action list>>
+To specify an action from the following list, use e.g. action:hist on
+the TBLR line.
+
+| *Actions that generate numerical output* |                                                                                                         |
+|------------------------------------------+---------------------------------------------------------------------------------------------------------|
+| tabulate                                 | count occurrences of distinct input values. Input data should be discrete. This is function table in R. |
+| summary                                  | summarise data in columns (minimum, 1st quartile, median, mean, 3rd quartile, max)                      |
+| head                                     | show first 6 rows of a larger table                                                                     |
+| transpose                                | transpose a table                                                                                       |
+|                                          |                                                                                                         |
+| *Actions that generate graphical output* |                                                                                                         |
+|------------------------------------------+---------------------------------------------------------------------------------------------------------|
+|                                          |                                                                                                         |
+| *Discrete data*                          |                                                                                                         |
+| barplot                                  | produces 'side-by-side' bar plots if multiple columns selected                                          |
+|                                          |                                                                                                         |
+| *Indexed data*                           |                                                                                                         |
+| plot                                     | if only 1 column selected, index is automatic: 1,2,...                                                  |
+| lines                                    | same as plot                                                                                            |
+| points                                   | same as plot but don't join points with lines                                                           |
+|                                          |                                                                                                         |
+| *Continuous data*                        |                                                                                                         |
+| hist                                     | histogram                                                                                               |
+| density                                  | like a smoothed histogram (i.e. a curve)                                                                |
+|                                          |                                                                                                         |
+| *Grid of values*                         |                                                                                                         |
+| image                                    | a grid image, with cells coloured according to their numerical values                                   |
+
+
+Apart from tabulate, the action: names are the same as the names of
+the R functions which implement them. `tabulate' is really called
+`table' in R.
+
+  Note that, in addition to the actions listed below, you can also use
+action:R-function, where "R-function" is the name of any existing R
+function. The function must be able to take a data frame as it's first
+argument, and must not *require* any further arguments (i.e. any
+further arguyments must have suitable default values). Any numerical
+output will be sent to the org buffer (use output-to-buffer:t to force
+this, although if that is necessary then that is a bug).
+
+  
+* More detailed description of org-R
+  My aim with org-R is to provide a fairly general facility for using
+   R with Org. The TBLR lines and TBLRR lines together specify an R
+   function, which may take numerical input, and may generate
+   graphical output, or numerical output, or both.
+
+If any input data have been specified, then the R function receives
+   those data as its first argument. The input data may come from an
+   Org table, or from a csv spreadsheet file. In either case they are
+   tabular (1- or 2-dimensional). The input data are passed to the
+   function as an R data frame (a table-like structure in which
+   different columns may contain different types of data -- numeric,
+   character, etc). Inside the R function, that data frame is called
+   'x'. 'x' is also the return value of the R function. Therefore the
+   numerical output of org-R is determined by the modifications to the
+   variable x that are made inside the function (any graphical output
+   is a side effect.)
+
+It's worth noting that one mode of using org-R would be to write your
+own code in a separate file, and use the source() function on a TBLRR
+line to evaluate the code in that file.
+
+Numerical output of the function should also be tabular, and may be
+   received by the Org buffer as an Org table, or sent to file in Org
+   table or csv format. R deals transparently with multi-dimensional
+   arrays, but Org table and csv format do not.
+
+Unless an output file has been specified, graphical output will be
+displayed on screen.
+
+The mapping from the TBLR and TBLRR lines to the R function may
+   benefit from further thought; currently what happens is that code
+   corresponding to the TBLR line is generated, and then any explicit
+   user code is appended to this. Thus the TBLRR lines have the 'last
+   word' on the output. Since multiple, intermixed, TBLR and TBLRR
+   lines can be given, it might make sense instead to follow the order
+   of those lines when constructing the code.
+
+
+* Getting help with R
+  - Bring up an R prompt with R at a shell prompt, or M-x R in emacs (if you have installed ESS)
+  - Enter ?function.name for help on function `function.name'
+  - Enter RSiteSearch("words") for online help matching "words"
+  - Enter ?par to see the full list of graphical parameters
+  - Follow the Documentation link on the left hand side of the R
+    website for "An Introduction to R", and other more technical manuals.
+* Brief advert for R
+  Seeing as this has made use of R, I'll briefly say my bit on it for
+  those who are unfamiliar.
+  1. It's good for simple numerical work, as well as having
+     implementations of a a very large range of more sophisticated
+     mathematical and statistical procedures.
+  2. It's good for producing graphics quickly, and for fine tuning
+     every last detail of the graphics for publication.
+  3. It's a syntactically reasonable, user-friendly, interpreted
+     programming language, that is often used interactively (it comes
+     with its own shell/command-line environment, and runs within
+     emacs using ESS).
+  4. It's a good language for a functional style of programming (in
+     fact I'd say that's how it should be used), which might well
+     appeal to elisp programmers. For example, you want to construct
+     an arbitrarily nested data structure, then pass some function
+     over the tips, returning a data structure of the same shape as
+     the input? No problem ([[http://stat.ethz.ch/R-manual/R-patched/library/base/html/rapply.html][rapply]]).
+  5. There's a *lot* of add-on packages for it (CRAN link on left hand
+     side of [[http://www.r-project.org/][website]].).
+  6. How many programming languages will get [[http://www.nytimes.com/2009/01/07/technology/business-computing/07program.html][their own article]] in the
+     New York Times this year?
+
+     

+ 1 - 0
org-tutorials/org-R/.#org-R.org

@@ -0,0 +1 @@
+dan@Tichodroma.8119:1233711110

BIN
org-tutorials/org-R/data/45/f39291-3abc-4d5b-96c9-3a32f77877a5/org-R-output-8119AYz.png


BIN
org-tutorials/org-R/data/45/f39291-3abc-4d5b-96c9-3a32f77877a5/org-R-output-8119M2O.png


BIN
org-tutorials/org-R/density.png


+ 77 - 15
org-tutorials/org-R/org-R.html

@@ -6,7 +6,7 @@ lang="en" xml:lang="en">
 <title>org-R: Computing and data visualisation in Org-mode using R</title>
 <meta http-equiv="Content-Type" content="text/html;charset=iso-8859-1"/>
 <meta name="generator" content="Org-mode"/>
-<meta name="generated" content="2009-02-03 22:12:38 EST"/>
+<meta name="generated" content="2009-02-03 22:58:12 EST"/>
 <meta name="author" content="Dan Davison"/>
 <style type="text/css">
  <!--/*--><![CDATA[/*><!--*/
@@ -94,8 +94,9 @@ lang="en" xml:lang="en">
 </li>
 <li><a href="#sec-7">Table of available options </a></li>
 <li><a href="#sec-8">Table of available actions </a></li>
-<li><a href="#sec-9">Getting help with R </a></li>
-<li><a href="#sec-10">Brief advert for R </a></li>
+<li><a href="#sec-9">More detailed description of org-R </a></li>
+<li><a href="#sec-10">Getting help with R </a></li>
+<li><a href="#sec-11">Brief advert for R </a></li>
 </ul>
 </div>
 </div>
@@ -121,6 +122,12 @@ option. It displays the R code corresponding to the action you
 requested, and so provides a good starting point for fine-tuning
 your analysis. But that's getting ahead of things.
 </p>
+<p>
+My hope is, of course, that this will be of use to people. So at
+this stage any comments, ideas, feedback, bug reports etc would be
+<b>very</b> welcome. I'd be happy to help anyone that's interested in
+using this, via the Org mailing list.
+</p>
 </div>
 
 </div>
@@ -453,7 +460,7 @@ available output image formats: "jpg", "jpeg", "pdf", "ps", "bmp",
 <pre class="example">
 
 [[file:density.png][density plot example]]
-#+TBLR: action:density columns:1 colour:chartreuse4 args:(:lwd 4)
+#+TBLR: action:density columns:"values" colour:chartreuse4 args:(:lwd 4)
 #+TBLR: intable:continuous-data outfile:"density.png" title:"density plot example"
 
 </pre>
@@ -466,10 +473,12 @@ available output image formats: "jpg", "jpeg", "pdf", "ps", "bmp",
 </div>
 </p>
 <p>
-Note the use of the args: option there. It takes the form of a lisp
-property list ("p-list"), specifying extra arguments to pass to the R
-function (in this case density()). Here we used it to set the line
-thickness.
+There were a couple of new features there. Firstly, I referred to
+column 1 using its column label, rather than with the
+integer 1. Secondly, note the use of the args: option. It takes the
+form of a lisp property list ("p-list"), specifying extra arguments to
+pass to the R function (in this case density()). Here we used it to
+set the line thickness (lwd=4).
 </p>
 </div>
 
@@ -699,10 +708,11 @@ recognise any usage affinities between the clustered org users?
 
 <pre class="example">
 
-[[file:tmp.png][png output]]
+[[file:/usr/local/src/org-etc/Worg/org-tutorials/org-R/data/45/f39291-3abc-4d5b-96c9-3a32f77877a5/org-R-output-8119M2O.png][An example from the org-plot tutorial, plotted using org-R]]
 #+TBLR: action:lines columns:((1)(2 3))
 #+TBLR: infile:"../org-plot.org"
 #+TBLR: intable:"org-plot-example-1" outfile:"png"
+#+TBLR: title:"An example from the org-plot tutorial, plotted using org-R"
 
 </pre>
 
@@ -748,7 +758,7 @@ following options are available:
 <tbody>
 <tr><td>action:some-action</td><td>off-the-shelf plotting action or computation (see <a href="#sec-8">separate list</a>), or any R function that makes sense (e.g. head, summary)</td></tr>
 <tr><td>lines:t</td><td>(when plotting) join points with lines (similar to action:lines)</td></tr>
-<tr><td>args:(:xlab "\"the x axis title\"" :lwd 4)</td><td>provide extra arguments as a p-list (note the need to quote strings)</td></tr>
+<tr><td>args:(:xlab "\"the x axis title\"" :lwd 4)</td><td>provide extra arguments as a p-list (note the need to quote strings if they are to appear as strings in R)</td></tr>
 </tbody>
 <tbody>
 <tr><td><b>Output options</b></td><td></td></tr>
@@ -839,9 +849,60 @@ this, although if that is necessary then that is a bug).
 </div>
 
 <div id="outline-container-9" class="outline-2">
-<h2 id="sec-9">Getting help with R </h2>
+<h2 id="sec-9">More detailed description of org-R </h2>
 <div id="text-9">
 
+<p>My aim with org-R is to provide a fairly general facility for using
+R with Org. The TBLR lines and TBLRR lines together specify an R
+function, which may take numerical input, and may generate
+graphical output, or numerical output, or both.
+</p>
+<p>
+If any input data have been specified, then the R function receives
+those data as its first argument. The input data may come from an
+Org table, or from a csv spreadsheet file. In either case they are
+tabular (1- or 2-dimensional). The input data are passed to the
+function as an R data frame (a table-like structure in which
+different columns may contain different types of data &ndash; numeric,
+character, etc). Inside the R function, that data frame is called
+'x'. 'x' is also the return value of the R function. Therefore the
+numerical output of org-R is determined by the modifications to the
+variable x that are made inside the function (any graphical output
+is a side effect.)
+</p>
+<p>
+It's worth noting that one mode of using org-R would be to write your
+own code in a separate file, and use the source() function on a TBLRR
+line to evaluate the code in that file.
+</p>
+<p>
+Numerical output of the function should also be tabular, and may be
+received by the Org buffer as an Org table, or sent to file in Org
+table or csv format. R deals transparently with multi-dimensional
+arrays, but Org table and csv format do not.
+</p>
+<p>
+Unless an output file has been specified, graphical output will be
+displayed on screen.
+</p>
+<p>
+The mapping from the TBLR and TBLRR lines to the R function may
+benefit from further thought; currently what happens is that code
+corresponding to the TBLR line is generated, and then any explicit
+user code is appended to this. Thus the TBLRR lines have the 'last
+word' on the output. Since multiple, intermixed, TBLR and TBLRR
+lines can be given, it might make sense instead to follow the order
+of those lines when constructing the code.
+</p>
+
+</div>
+
+</div>
+
+<div id="outline-container-10" class="outline-2">
+<h2 id="sec-10">Getting help with R </h2>
+<div id="text-10">
+
 <ul>
 <li>
 Bring up an R prompt with R at a shell prompt, or M-x R in emacs (if you have installed ESS)
@@ -864,9 +925,9 @@ website for "An Introduction to R", and other more technical manuals.
 
 </div>
 
-<div id="outline-container-10" class="outline-2">
-<h2 id="sec-10">Brief advert for R </h2>
-<div id="text-10">
+<div id="outline-container-11" class="outline-2">
+<h2 id="sec-11">Brief advert for R </h2>
+<div id="text-11">
 
 <p>Seeing as this has made use of R, I'll briefly say my bit on it for
 those who are unfamiliar.
@@ -902,6 +963,7 @@ side of <a href="http://www.r-project.org/">website</a>.).
 How many programming languages will get <a href="http://www.nytimes.com/2009/01/07/technology/business-computing/07program.html">their own article</a> in the
 New York Times this year?
 
+
 </li>
 </ol>
 </div>
@@ -909,7 +971,7 @@ New York Times this year?
 <div id="postamble"><p class="author"> Author: Dan Davison
 <a href="mailto:davison@stats.ox.ac.uk">&lt;davison@stats.ox.ac.uk&gt;</a>
 </p>
-<p class="date"> Date: 2009-02-03 22:12:38 EST</p>
+<p class="date"> Date: 2009-02-03 22:58:12 EST</p>
 <p>HTML generated by org-mode 6.20f in emacs 22</p>
 </div></body>
 </html>

+ 53 - 5
org-tutorials/org-R/org-R.org

@@ -30,6 +30,11 @@
   requested, and so provides a good starting point for fine-tuning
   your analysis. But that's getting ahead of things.
 
+  My hope is, of course, that this will be of use to people. So at
+  this stage any comments, ideas, feedback, bug reports etc would be
+  *very* welcome. I'd be happy to help anyone that's interested in
+  using this, via the Org mailing list.
+
 * Setting things up
   The code is currently [[http://www.stats.ox.ac.uk/~davison/software/org-R/org-tblR.el][here]]. Soon it will be in the contrib
   directory. The other things you need are R (Windows / OS X binaries
@@ -235,16 +240,18 @@ available output image formats: "jpg", "jpeg", "pdf", "ps", "bmp",
 #+begin_example
 
 [[file:density.png][density plot example]]
-#+TBLR: action:density columns:1 colour:chartreuse4 args:(:lwd 4)
+#+TBLR: action:density columns:"values" colour:chartreuse4 args:(:lwd 4)
 #+TBLR: intable:continuous-data outfile:"density.png" title:"density plot example"
 
 #+end_example
 [[file:../../images/org-R/density.png]]
 
-Note the use of the args: option there. It takes the form of a lisp
-property list ("p-list"), specifying extra arguments to pass to the R
-function (in this case density()). Here we used it to set the line
-thickness.
+There were a couple of new features there. Firstly, I referred to
+column 1 using its column label, rather than with the
+integer 1. Secondly, note the use of the args: option. It takes the
+form of a lisp property list ("p-list"), specifying extra arguments to
+pass to the R function (in this case density()). Here we used it to
+set the line thickness (lwd=4).
 
 ** Discrete data example: the configuration variables survey
 
@@ -392,6 +399,7 @@ recognise any usage affinities between the clustered org users?
 #+TBLR: action:lines columns:((1)(2 3))
 #+TBLR: infile:"../org-plot.org"
 #+TBLR: intable:"org-plot-example-1" outfile:"png"
+#+TBLR: title:"An example from the org-plot tutorial, plotted using org-R"
 
 #+end_example
 [[file:../../images/org-R/org-plot-example-1.png]]
@@ -481,6 +489,45 @@ output will be sent to the org buffer (use output-to-buffer:t to force
 this, although if that is necessary then that is a bug).
 
   
+* More detailed description of org-R
+  My aim with org-R is to provide a fairly general facility for using
+   R with Org. The TBLR lines and TBLRR lines together specify an R
+   function, which may take numerical input, and may generate
+   graphical output, or numerical output, or both.
+
+If any input data have been specified, then the R function receives
+   those data as its first argument. The input data may come from an
+   Org table, or from a csv spreadsheet file. In either case they are
+   tabular (1- or 2-dimensional). The input data are passed to the
+   function as an R data frame (a table-like structure in which
+   different columns may contain different types of data -- numeric,
+   character, etc). Inside the R function, that data frame is called
+   'x'. 'x' is also the return value of the R function. Therefore the
+   numerical output of org-R is determined by the modifications to the
+   variable x that are made inside the function (any graphical output
+   is a side effect.)
+
+It's worth noting that one mode of using org-R would be to write your
+own code in a separate file, and use the source() function on a TBLRR
+line to evaluate the code in that file.
+
+Numerical output of the function should also be tabular, and may be
+   received by the Org buffer as an Org table, or sent to file in Org
+   table or csv format. R deals transparently with multi-dimensional
+   arrays, but Org table and csv format do not.
+
+Unless an output file has been specified, graphical output will be
+displayed on screen.
+
+The mapping from the TBLR and TBLRR lines to the R function may
+   benefit from further thought; currently what happens is that code
+   corresponding to the TBLR line is generated, and then any explicit
+   user code is appended to this. Thus the TBLRR lines have the 'last
+   word' on the output. Since multiple, intermixed, TBLR and TBLRR
+   lines can be given, it might make sense instead to follow the order
+   of those lines when constructing the code.
+
+
 * Getting help with R
   - Bring up an R prompt with R at a shell prompt, or M-x R in emacs (if you have installed ESS)
   - Enter ?function.name for help on function `function.name'
@@ -511,3 +558,4 @@ this, although if that is necessary then that is a bug).
   6. How many programming languages will get [[http://www.nytimes.com/2009/01/07/technology/business-computing/07program.html][their own article]] in the
      New York Times this year?
 
+