<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN"
"http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">
<html xmlns="http://www.w3.org/1999/xhtml"
lang="en" xml:lang="en">
<head>
<title>org-R: Computing and data visualisation in Org-mode using R</title>
<meta http-equiv="Content-Type" content="text/html;charset=iso-8859-1"/>
<meta name="generator" content="Org-mode"/>
<meta name="generated" content="2009-02-03 22:58:12 EST"/>
<meta name="author" content="Dan Davison"/>
<style type="text/css">
<!--/*--><![CDATA[/*><!--*/
html { font-family: Times, serif; font-size: 12pt; }
.title { text-align: center; }
.todo { color: red; }
.done { color: green; }
.tag { background-color:lightblue; font-weight:normal }
.target { }
.timestamp { color: grey }
.timestamp-kwd { color: CadetBlue }
p.verse { margin-left: 3% }
pre {
border: 1pt solid #AEBDCC;
background-color: #F3F5F7;
padding: 5pt;
font-family: courier, monospace;
font-size: 90%;
overflow:auto;
}
table { border-collapse: collapse; }
td, th { vertical-align: top; }
dt { font-weight: bold; }
div.figure { padding: 0.5em; }
div.figure p { text-align: center; }
.linenr { font-size:smaller }
.code-highlighted {background-color:#ffff00;}
.org-info-js_info-navigation { border-style:none; }
#org-info-js_console-label { font-size:10px; font-weight:bold;
white-space:nowrap; }
.org-info-js_search-highlight {background-color:#ffff00; color:#000000;
font-weight:bold; }
/*]]>*/-->
</style>
<script type="text/javascript">
<!--/*--><![CDATA[/*><!--*/
function CodeHighlightOn(elem, id)
{
var target = document.getElementById(id);
if(null != target) {
elem.cacheClassElem = elem.className;
elem.cacheClassTarget = target.className;
target.className = "code-highlighted";
elem.className = "code-highlighted";
}
}
function CodeHighlightOff(elem, id)
{
var target = document.getElementById(id);
if(elem.cacheClassElem)
elem.className = elem.cacheClassElem;
if(elem.cacheClassTarget)
target.className = elem.cacheClassTarget;
}
/*]]>*/-->
</script>
</head><body>
<h1 class="title">org-R: Computing and data visualisation in Org-mode using R</h1>
<p>
<a href="../index.html">{Back to Worg's index}</a>
</p>
<div id="table-of-contents">
<h2>Table of Contents</h2>
<div id="text-table-of-contents">
<ul>
<li><a href="#sec-1">Introduction </a></li>
<li><a href="#sec-2">Setting things up </a></li>
<li><a href="#sec-3">Using org-R </a></li>
<li><a href="#sec-4">Computing on org tables: tabulating values </a></li>
<li><a href="#sec-5">Table references </a></li>
<li><a href="#sec-6">Plotting data </a>
<ul>
<li><a href="#sec-6.1">Available off-the-shelf plotting commands </a></li>
<li><a href="#sec-6.2">Continuous data example: </a></li>
<li><a href="#sec-6.3">Discrete data example: the configuration variables survey </a>
<ul>
<li><a href="#sec-6.3.1">Something more complicated: clustering org variables, and org users </a></li>
</ul>
</li>
<li><a href="#sec-6.4">Indexed data example </a></li>
</ul>
</li>
<li><a href="#sec-7">Table of available options </a></li>
<li><a href="#sec-8">Table of available actions </a></li>
<li><a href="#sec-9">More detailed description of org-R </a></li>
<li><a href="#sec-10">Getting help with R </a></li>
<li><a href="#sec-11">Brief advert for R </a></li>
</ul>
</div>
</div>
<div id="outline-container-1" class="outline-2">
<h2 id="sec-1">Introduction </h2>
<div id="text-1">
<p>org-R is an org-mode extension that performs numerical computations
and generates graphics. Numerical output may be stored in the org
buffer in org tables, and the input can also come from an org
table. Rather than starting off by documenting everything
systematically, I'll provide several commented examples. Towards the
end there are lists of <a href="#sec-8">available actions</a> and <a href="#sec-7">other options</a>.
</p>
<p>
Although it uses <a href="http://www.r-project.org">R</a> behind the scenes, you do not need to know
anything about R. Common operations are provided `off the shelf' by
specifying options on lines starting with #+TBLR:. Having said that,
org-R also accepts raw R code (#+TBLRR: lines). For those who don't
yet know R, but think they might be interested, try the showcode:t
option. It displays the R code corresponding to the action you
requested, and so provides a good starting point for fine-tuning
your analysis. But that's getting ahead of things.
</p>
<p>
My hope is, of course, that this will be of use to people. So at
this stage any comments, ideas, feedback, bug reports etc. would be
<b>very</b> welcome. I'd be happy to help anyone who is interested in
using this, via the Org mailing list.
</p>
</div>
</div>
<div id="outline-container-2" class="outline-2">
<h2 id="sec-2">Setting things up </h2>
<div id="text-2">
<p>The code is currently <a href="http://www.stats.ox.ac.uk/~davison/software/org-R/org-tblR.el">here</a>. Soon it will be in the contrib
directory. The other things you need are R (Windows / OS X binaries
available on the <a href="http://www.r-project.org">R website</a>; widely available in Linux package
repositories) and the Emacs mode <a href="http://ess.r-project.org/">Emacs Speaks Statistics</a> (ESS). ESS
installation instructions are <a href="http://ess.r-project.org/Manual/readme.html#Installation">here</a>. Personally, under Linux, I have
something like this:
</p>
<pre class="src src-emacs-lisp">
(add-to-list 'load-path <span style="color: #87cefa;">"/path/to/ess/lisp"</span>)
(<span style="color: #afeeee; font-weight: bold;">require</span> '<span style="color: #98fb98;">ess-site</span>)
</pre>
</div>
</div>
<div id="outline-container-3" class="outline-2">
<h2 id="sec-3">Using org-R </h2>
<div id="text-3">
<p>org-R uses two different option lines to specify an
analysis/plot: #+TBLR: and #+TBLRR:. #+TBLRR: is the one that
accepts R code, so we'll ignore that for now. To make the action
happen, use M-x org-R-apply with point on the #+TBLR:
line. That's the only function you need, and it would make sense to
bind it to some key. So, on to the first example.
</p>
</div>
</div>
<div id="outline-container-4" class="outline-2">
<h2 id="sec-4">Computing on org tables: tabulating values </h2>
<div id="text-4">
<p>Here's a command to tabulate the values in the second column. Issue
M-x org-R-apply with point on the following #+TBLR line.
</p>
<pre class="example">
| col1 | col2 |
|------+------|
| A    | A    |
| A    | B    |
| B    | B    |
#+TBLR: action:tabulate columns:2
</pre>
<p>
That results in
</p>
<pre class="example">
| value | count |
|-------+-------|
| A     |     1 |
| B     |     2 |
</pre>
<p>
So the values in column 2 were tabulated as requested. However,
the original data were overwritten, which leads us to table references.
</p>
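<p>For readers curious about the R side, here is a rough plain-R equivalent of that tabulation. It is a sketch, not the code org-R generates (showcode:t shows that), and it assumes the table reaches R as a data frame called x, which is the name the #+TBLRR examples further down use.</p>

```r
# Rebuild the example org table as an R data frame
# (assumption: org-R hands the table to R as a data frame named x)
x = data.frame(col1 = c("A", "A", "B"),
               col2 = c("A", "B", "B"),
               stringsAsFactors = FALSE)

# action:tabulate columns:2 corresponds to calling table() on column 2
counts = table(x[, 2])
print(counts)

# Arrange the counts in the same value/count layout as the org output
counts_df = data.frame(value = names(counts), count = as.vector(counts))
print(counts_df)
```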
</div>
</div>
<div id="outline-container-5" class="outline-2">
<h2 id="sec-5">Table references </h2>
<div id="text-5">
<p>
We can specify input data for analysis/plotting in 3 different
ways:
</p>
<ol>
<li>
by providing a reference to an org table with the intable:
option. You can optionally specify the file that the table is in
with the infile: option;
</li>
<li>
by pointing org-R to a csv file, locally or via http:, using
infile:/path/to/file.csv or infile:<a href="http://somewhere/file.csv">http://somewhere/file.csv</a>;
</li>
<li>
by doing neither, in which case it looks for a table immediately
above the #+TBLR(R) line(s).
</li>
</ol>
<p>Case (3) is what happened above &ndash; the input data came from a table
immediately above the #+TBLR line. The default behaviour is to replace
any such table with the output; this allows us to tweak the option
line and update the analysis. However, normally we'll want to separate
the data from the analysis output. So let's keep the data as a named
table in the org file, and refer to it by name:
</p>
<pre class="example">
#+TBLNAME:data-set-1
| col1 | col2 |
|------+------|
| A    | A    |
| A    | B    |
| B    | B    |
[arbitrary other content of org buffer]
#+TBLR: intable:data-set-1 action:tabulate
</pre>
<p>
which results in
</p>
<pre class="example">
|   | A | B |
|---+---+---|
| A | 1 | 1 |
| B | 0 | 1 |
</pre>
<p>
Note that this time we did a different analysis: I removed the
columns:2 option, so that tabulate was passed the whole table. As a
result the output contains counts of joint occurrences of values in
the two columns: out of the 4 possibilities, the only one we didn't
observe was "B in column 1 and A in column 2". We could have achieved
the same result with columns:(1 2). (But don't try to tabulate more
than 2 columns: org does not do multi-dimensional tables.)
</p>
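<p>The joint tabulation can be sketched the same way in plain R: calling table() on a two-column data frame cross-tabulates the columns, with one row per value of col1 and one column per value of col2. As before, this assumes the data arrive as a data frame named x.</p>

```r
# Same toy data as data-set-1
x = data.frame(col1 = c("A", "A", "B"),
               col2 = c("A", "B", "B"),
               stringsAsFactors = FALSE)

# table() on the whole data frame gives a two-way contingency table:
# rows are col1 values, columns are col2 values
joint = table(x)
print(joint)

# The one unobserved combination: B in column 1 and A in column 2
joint["B", "A"]
```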
</div>
</div>
<div id="outline-container-6" class="outline-2">
<h2 id="sec-6">Plotting data </h2>
<div id="text-6">
</div>
<div id="outline-container-6.1" class="outline-3">
<h3 id="sec-6.1">Available off-the-shelf plotting commands </h3>
<div id="text-6.1">
<p>At the risk of this starting to sound like a bad and boring
undergraduate statistics textbook, the sort of plot that is
appropriate depends on the sort of data. Let's divide data into
three kinds:
</p>
<ul>
<li>
discrete-valued data
[e.g. data-set-1 above, or the list of org variables customised by users]
</li>
<li>
continuous-valued data
[e.g. the wing lengths of all Eagle Owls in Europe]
</li>
<li>
indexed data
[e.g. a data set in which each point is a time,
together with the size of the org source code base at that time]
</li>
</ul>
<p>The available off-the-shelf actions are listed <a href="#sec-8">here</a>.
</p>
</div>
</div>
<div id="outline-container-6.2" class="outline-3">
<h3 id="sec-6.2"><a name="2ce0fc04-b308-4b8d-8acc-805a9e5fed7d" id="2ce0fc04-b308-4b8d-8acc-805a9e5fed7d"></a>Continuous data example: </h3>
<div id="text-6.2">
<p>We're going to need some data. So let's prove that org can also
speak statistics and use org-R to simulate the data. This
requires some raw R code, so skip this bit if you're not
interested.
</p>
<p>
The following #+TBLRR line simulates 10 values from a Normal
distribution with mean -3, and 10 values from a Normal
distribution with mean 3, and lumps them together. The point is that
the numbers we get should be concentrated around two different
values, and we should be able to see that in a histogram and/or
density plot.
</p>
<pre class="example">
#+TBLRR: x &lt;- c(rnorm(10, mean=-3, sd=1), rnorm(10, mean=3, sd=1))
#+TBLR: title:"continuous-data" output-to-buffer:t
</pre>
<p>
Here's what I got. Note that the title: option set the name of the
table with "#+TBLNAME"; we'll use that to refer to these data.
</p>
<pre class="example">
#+TBLNAME:continuous-data
| values            |
|-------------------|
| -2.48627002467785 |
|  -4.0196287273144 |
| -3.43471960580471 |
| -5.21985294534255 |
| -3.84201126431028 |
| -1.72912705369668 |
| -2.86703950990613 |
| -2.82292622464752 |
| -4.43246430621368 |
| -1.03188727658288 |
| 0.882823532068805 |
|  3.28641606039499 |
|  3.56029698321959 |
|  2.91946660223152 |
|  2.32506089804876 |
|   3.3606298511366 |
|  5.19883523425104 |
|  4.86141359164329 |
|  2.90073505260204 |
|  4.21163939487907 |
</pre>
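<p>Because rnorm() draws random numbers, running the #+TBLRR line yourself will give values different from the ones above. In plain R, calling set.seed() first makes a draw repeatable; the seed 2009 below is an arbitrary choice and does not reproduce the table above.</p>

```r
# Fix the random number generator state so the draw is repeatable
# (2009 is arbitrary; it does not reproduce the table above)
set.seed(2009)

# 10 draws centred on -3 followed by 10 draws centred on 3
x = c(rnorm(10, mean = -3, sd = 1), rnorm(10, mean = 3, sd = 1))

# The two groups should sit on either side of zero
summary(x[1:10])
summary(x[11:20])
```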
<p>
Now to plot the data. Let's have some colour as well, and this time
the title: option will be used to put a title on the plot (and also to
name the file link to the graphical output).
</p>
<pre class="example">
[[file:tmp.png][histogram example]]
#+TBLR: action:hist columns:1 colour:hotpink
#+TBLR: intable:continuous-data outfile:"png" title:"histogram example"
</pre>
<p>
<div class="figure">
<p><img src="../../images/org-R/histogram-example.png" alt="../../images/org-R/histogram-example.png" /></p>
</div>
</p>
<p>
[Note that you can use multiple TBLR lines rather than cramming all
the options on to one line.]
</p>
<p>
An alternative would be to produce a density plot. We don't have
enough data points to justify that here, but we'll do it anyway just
to show the sort of plots that are produced. This time we'll specify
the output file for the png image using the outfile: option. (For the
histogram we used outfile:"png". That's a special case; it doesn't
create a file called "png" but instead uses org-attach to store the
output in the org-attach directory for this entry. The same goes for the other
available output image formats: "jpg", "jpeg", "pdf", "ps", "bmp",
"tiff".)
</p>
<pre class="example">
[[file:density.png][density plot example]]
#+TBLR: action:density columns:"values" colour:chartreuse4 args:(:lwd 4)
#+TBLR: intable:continuous-data outfile:"density.png" title:"density plot example"
</pre>
<p>
<div class="figure">
<p><img src="../../images/org-R/density.png" alt="../../images/org-R/density.png" /></p>
</div>
</p>
<p>
There were a couple of new features there. Firstly, I referred to
column 1 using its column label, rather than with the
integer 1. Secondly, note the use of the args: option. It takes the
form of a lisp property list ("p-list"), specifying extra arguments to
pass to the R function (in this case density()). Here we used it to
set the line thickness (lwd=4).
</p>
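<p>As a rough illustration of the R functions behind these two plots, the sketch below draws a histogram and a density curve directly in R. It is not org-R's generated code: mapping colour: to col and args:(:lwd 4) to lwd = 4 follows the description above, but exactly which call org-R hands them to is an assumption here. The sketch writes a PDF because the pdf() device needs no display; png() is used the same way where it is available.</p>

```r
# Stand-in data (assumption): any numeric vector works here
set.seed(1)
x = c(rnorm(10, mean = -3), rnorm(10, mean = 3))

out = tempfile(fileext = ".pdf")
pdf(out)

# Roughly action:hist colour:hotpink title:"histogram example"
hist(x, col = "hotpink", main = "histogram example")

# Roughly action:density colour:chartreuse4 args:(:lwd 4);
# lwd is a graphics parameter, so this sketch gives it to plot()
plot(density(x), col = "chartreuse4", lwd = 4, main = "density plot example")

invisible(dev.off())
```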
</div>
</div>
<div id="outline-container-6.3" class="outline-3">
<h3 id="sec-6.3">Discrete data example: the configuration variables survey </h3>
<div id="text-6.3">
<p>
The raw data, as collected by Manish, is in a table called
org-variables-table, in a file called variable-popcon.org. We use the
infile: option to specify the org file containing the data, and the
intable: option to specify the name of the table within that file. [An
alternative would be to give the entry containing the table a unique id with
org-id-get-create, refer to it with intable:&lt;uid&gt;, and rely on the
org-id mechanism to find it.]
</p>
<p>
Now we tabulate the data. (We're not currently taking the sensible
step that Manish did of checking whether the variables were given
values different from their default.)
</p>
<p>
Rather than cluttering up this org file with all the count data,
we'll store them in a separate org file:
</p>
<pre class="example">
[[file:org-variables-counts.org][org-variables-counts]]
#+TBLR: action:tabulate columns:2 sort:t
#+TBLR: infile:"variable-popcon.org" intable:"org-variables-table"
#+TBLR: outfile:"org-variables-counts.org" title:"org-variables-counts"
</pre>
<p>
<a href="org-variables-counts.html">org-variables-counts</a>
</p>
<p>
We can see the top few rows of the table by using action:head:
</p>
<pre class="example">
| rownames(x) | value                       | count |
|-------------+-----------------------------+-------|
|           1 | org-agenda-files            |    22 |
|           2 | org-agenda-start-on-weekday |    22 |
|           3 | org-log-done                |    22 |
|           4 | org-todo-keywords           |    22 |
|           5 | org-agenda-include-diary    |    19 |
|           6 | org-hide-leading-stars      |    19 |
#+TBLR: action:head
#+TBLR: infile:"org-variables-counts.org" intable:"org-variables-counts" output-to-buffer:t
</pre>
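<p>In plain R, the tabulate-with-sort:t step followed by the head step corresponds roughly to sort(table(...), decreasing = TRUE) and then head(), whose default is also 6 rows. The sketch below uses a small made-up data set with hypothetical column names User and Variable; it is not Manish's survey data.</p>

```r
# Hypothetical long-format survey: one row per (user, customised variable)
x = data.frame(
  User = c("u1", "u2", "u3", "u1", "u2", "u3"),
  Variable = c("org-log-done", "org-log-done", "org-log-done",
               "org-agenda-files", "org-agenda-files",
               "org-hide-leading-stars"),
  stringsAsFactors = FALSE)

# Count users per variable, most popular first (like sort:t)
counts = sort(table(x[, 2]), decreasing = TRUE)

# Lay the counts out as a value/count table and show the top rows
counts_df = data.frame(value = names(counts), count = as.vector(counts),
                       stringsAsFactors = FALSE)
head(counts_df, 2)
```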
<p>
Here's a barplot of the counts. It makes it clear that over half the
org variables are customised by only one or two users.
</p>
<pre class="example">
[[file:org-variables-barplot.png][org-variables barplot]]
#+TBLR: action:barplot rownames:t columns:1 width:800 col:darkblue
#+TBLR: args:(:names.arg "NULL")
#+TBLR: infile:"org-variables-counts.org" intable:"org-variables-counts"
#+TBLR: outfile:"org-variables-barplot.png" title:"org-variables barplot"
</pre>
<p>
<div class="figure">
<p><img src="../../images/org-R/org-variables-barplot.png" alt="../../images/org-R/org-variables-barplot.png" /></p>
</div>
</p>
</div>
<div id="outline-container-6.3.1" class="outline-4">
<h4 id="sec-6.3.1">Something more complicated: clustering org variables, and org users </h4>
<div id="text-6.3.1">
<p>
OK, let's make a bit more use of R's capabilities. We can use the
org-variables data set to define distances between pairs of org
users (how similar their customisations are), and distances
between pairs of org variables (the extent to which people who
customise one of them customise the other). Then we can use those
distance matrices to cluster org users, and org variables.
</p>
<p>
First, let's create a table that's restricted to variables that
were customised by more than four users. That's going to require
a bit of R code:
</p>
<pre class="example">
[[file:variable-popcon-restricted.org][org-variables-table]]
#+TBLR: infile:"variable-popcon.org" intable:"org-variables-table"
#+TBLR: outfile:"variable-popcon-restricted.org" title:"org-variables-table"
#+TBLRR: tab &lt;- table(x[,2])
#+TBLRR: x &lt;- subset(x, Variable %in% names(tab[tab &gt; 4]))
</pre>
<p>
<a href="variable-popcon-restricted.html">org-variables-table</a>
</p>
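<p>To see what those two #+TBLRR lines do, here is the same filtering applied to a small made-up data set in plain R. The column name User is an assumption (the TBLRR code only tells us that column 2 is called Variable), and the sketch uses = for assignment instead of the &lt;- in the original lines; the two are equivalent here.</p>

```r
# Hypothetical data: org-log-done customised by 5 users, org-agenda-files by 2
x = data.frame(
  User = c(paste0("user", 1:5), "user1", "user2"),
  Variable = c(rep("org-log-done", 5), rep("org-agenda-files", 2)),
  stringsAsFactors = FALSE)

# Count how many rows (users) mention each variable
tab = table(x[, 2])

# Keep only rows whose variable was customised by more than four users
x = subset(x, Variable %in% names(tab[tab > 4]))
print(x)
```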
<p>
Now let's make a table with a row for each org user, and a column for
each org variable, and fill it with 1s and 0s according to whether user i
customised variable j. We can do that without writing any R code:
</p>
<pre class="example">
[[file:org-variables-incidence.org][incidence-matrix]]
#+TBLR: action:tabulate columns:(1 2) rownames:t
#+TBLR: infile:"variable-popcon-restricted.org" intable:"org-variables-table"
#+TBLR: outfile:"org-variables-incidence.org" title:"incidence-matrix"
</pre>
<p>
<a href="org-variables-incidence.html">incidence-matrix</a>
</p>
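<p>The incidence table is again table() applied to two columns. A minimal plain-R sketch on made-up data (hypothetical User and Variable columns, one row per user and customised variable) shows the shape of the result: one row per user, one column per variable, and a 1 where that user customised that variable.</p>

```r
# Hypothetical long-format data, one row per (user, variable) pair
x = data.frame(
  User = c("user1", "user1", "user2", "user2", "user3"),
  Variable = c("org-log-done", "org-agenda-files",
               "org-log-done", "org-agenda-files",
               "org-hide-leading-stars"),
  stringsAsFactors = FALSE)

# Rows are users (column 1), columns are variables (column 2);
# each pair occurs at most once, so every count is 0 or 1
incidence = table(x[, 1], x[, 2])
print(incidence)
```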
<p>
First we'll cluster org users. We use the R function dist to compute a
distance matrix from the incidence matrix, then hclust to run a
hierarchical clustering algorithm, and then plot to draw the result
as a dendrogram:
</p>
<pre class="example">
[[file:org-users-tree.png][org-users-tree.png]]
#+TBLRR: par(bg="gray15", fg="turquoise2")
#+TBLRR: plot(hclust(dist(x, method="binary")), ann=FALSE)
#+TBLR: infile:"org-variables-incidence.org" intable:"incidence-matrix" rownames:t
#+TBLR: outfile:"org-users-tree.png" title:"org-users-tree.png"
</pre>
<p>
<div class="figure">
<p><img src="../../images/org-R/org-users-tree.png" alt="../../images/org-R/org-users-tree.png" /></p>
</div>
</p>
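<p>The dist/hclust pipeline can be tried outside org on a tiny made-up incidence matrix (the user and variable names below are hypothetical). With method = "binary", dist() looks at the columns where at least one of the two rows is non-zero and reports the proportion of those columns in which only one row is non-zero, so identical rows are at distance 0.</p>

```r
# Hypothetical incidence matrix: rows are users, columns are variables
m = matrix(c(1, 1, 0,
             1, 1, 0,
             0, 0, 1),
           nrow = 3, byrow = TRUE,
           dimnames = list(c("user1", "user2", "user3"),
                           c("org-log-done", "org-agenda-files",
                             "org-hide-leading-stars")))

# Distances between users (rows), then hierarchical clustering
d_users = dist(m, method = "binary")
tree_users = hclust(d_users)

# Cutting the tree into two groups puts user1 and user2 together
groups = cutree(tree_users, k = 2)
print(groups)

# Transposing swaps the roles, so this clusters variables instead
tree_vars = hclust(dist(t(m), method = "binary"))
```

<p>Calling plot(tree_users) would draw the corresponding dendrogram, as the #+TBLRR line above does.</p>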
<p>
And to cluster org variables, we use the transpose of that incidence matrix:
</p>
<pre class="example">
[[file:org-variables-tree.png][org-variables-tree.png]]
#+TBLRR: par(bg="gray15", fg="turquoise2")
#+TBLRR: plot(hclust(dist(t(x), method="binary")), ann=FALSE)
#+TBLR: infile:"org-variables-incidence.org" intable:"incidence-matrix" rownames:t
#+TBLR: outfile:"org-variables-tree.png" title:"org-variables-tree.png" width:1000
</pre>
<p>
<div class="figure">
<p><img src="../../images/org-R/org-variables-tree.png" alt="../../images/org-R/org-variables-tree.png" /></p>
</div>
</p>
<p>
Please note that my main aim here was to give some examples of using
org-R, rather than to show how the org variables data should be mined
for useful information! The org-variables dendrogram does seem to have
made some sensible clusterings (e.g. the clusters of agenda-related
variables), but I'm going to leave it to others to decide whether this
exercise really served to do more than illustrate org-R. Does anyone
recognise any usage affinities between the clustered org users?
</p>
</div>
</div>
</div>
<div id="outline-container-6.4" class="outline-3">
<h3 id="sec-6.4"><a name="45f39291-3abc-4d5b-96c9-3a32f77877a5" id="45f39291-3abc-4d5b-96c9-3a32f77877a5"></a>Indexed data example </h3>
<div id="text-6.4">
<p>Let's plot the same data as Eric Schulte used in the <a href="../org-plot.html">org-plot tutorial</a> on Worg.
</p>
<pre class="example">
[[file:/usr/local/src/org-etc/Worg/org-tutorials/org-R/data/45/f39291-3abc-4d5b-96c9-3a32f77877a5/org-R-output-8119M2O.png][An example from the org-plot tutorial, plotted using org-R]]
#+TBLR: action:lines columns:((1)(2 3))
#+TBLR: infile:"../org-plot.org"
#+TBLR: intable:"org-plot-example-1" outfile:"png"
#+TBLR: title:"An example from the org-plot tutorial, plotted using org-R"
</pre>
<p>
<div class="figure">
<p><img src="../../images/org-R/org-plot-example-1.png" alt="../../images/org-R/org-plot-example-1.png" /></p>
</div>
</p>
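<p>For comparison, one way to draw "columns 2 and 3 against column 1" as lines in plain R is matplot(). This is only a sketch under stated assumptions: the three-column data frame below is made up (it is not the org-plot tutorial data), and org-R may use different R functions internally.</p>

```r
# Made-up indexed data: an index column and two series
x = data.frame(day = 1:5,
               series_a = c(2, 4, 3, 5, 6),
               series_b = c(1, 1, 2, 3, 5))

out = tempfile(fileext = ".pdf")
pdf(out)

# Column 1 on the x-axis, columns 2 and 3 as lines on the y-axis,
# roughly what action:lines columns:((1)(2 3)) describes
matplot(x[, 1], as.matrix(x[, 2:3]), type = "l", lty = 1,
        col = c("blue", "red"), xlab = names(x)[1], ylab = "value")
legend("topleft", legend = names(x)[2:3], lty = 1, col = c("blue", "red"))

invisible(dev.off())
```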
  537. </div>
  538. </div>
  539. </div>
  540. <div id="outline-container-7" class="outline-2">
  541. <h2 id="sec-7">Table of available options </h2>
  542. <div id="text-7">
  543. <p>In addition to the action:&lt;some-action&gt; option (described <a href="#sec-8">here</a>, the
  544. following options are available:
  545. </p><table border="2" cellspacing="0" cellpadding="6" rules="groups" frame="hsides">
  546. <col align="left"></col><col align="left"></col>
  547. <thead>
  548. <tr><th><b>Input options</b></th><th></th></tr>
  549. </thead>
  550. <tbody>
  551. <tr><td>infile:/path/to/file.csv</td><td>input data comes from file.csv</td></tr>
  552. <tr><td>infile:<a href="http://www.somewhere/file.csv">http://www.somewhere/file.csv</a></td><td>input data comes from file.csv somewhere on the web</td></tr>
  553. <tr><td>infile:/path/to/file.org</td><td>input data comes from file.org; must also specify table with intable:&lt;name-or-id&gt;</td></tr>
  554. <tr><td>intable:table-name</td><td>input data is in table named with #+TBLNAME:table-name (in same buffer unless infile:/path/to/file.org is specified)</td></tr>
  555. <tr><td>intable:table-id</td><td>input data is first table under entry with table-id as unique ID. Doesn't make sense with infile:/path/to/file.org</td></tr>
  556. <tr><td>rownames:t</td><td>does first column contain row names? (default: nil). If t other column indices are as if first column not present &ndash; this may change)</td></tr>
  557. <tr><td>colnames:nil</td><td>does first row contain column names? (default: t)</td></tr>
  558. <tr><td>columns:2 columns:(2)</td><td>operate only on column 2</td></tr>
  559. <tr><td>columns:"wing length" columns:("wing length")</td><td>operate only on column named "wing length"</td></tr>
  560. <tr><td>columns:((1)(2 3))</td><td>(when plotting) plot columns 2 and 3 on y-axis against column 1 on x-axis</td></tr>
  561. <tr><td>columns:(("age")("wing length" "fierceness"))</td><td>(when plotting) plot columns named "wing length" and "fierceness" on y-axis against "age" on x-axis</td></tr>
  562. </tbody>
  563. <tbody>
  564. <tr><td><b>Action options</b></td><td></td></tr>
  565. </tbody>
  566. <tbody>
  567. <tr><td>action:some-action</td><td>off-the-shelf plotting action or computation (see <a href="#sec-8">separate list</a>), or any R function that makes sense (e.g. head, summary)</td></tr>
  568. <tr><td>lines:t</td><td>(when plotting) join points with lines (similar to action:lines)</td></tr>
  569. <tr><td>args:(:xlab "\"the x axis title\"" :lwd 4)</td><td>provide extra arguments as a p-list (note the need to quote strings if they are to appear as strings in R)</td></tr>
  570. </tbody>
  571. <tbody>
  572. <tr><td><b>Output options</b></td><td></td></tr>
  573. </tbody>
  574. <tbody>
  575. <tr><td>outfile:/path/to/image.png</td><td>save image to file and insert link into org buffer (also: .pdf, .ps, .jpg, .jpeg, .bmp, .tiff)</td></tr>
  576. <tr><td>outfile:png</td><td>save image to file in org-attach directory and insert link</td></tr>
  577. <tr><td>outfile:/path/to/file.csv</td><td>would make sense but not implemented yet</td></tr>
  578. <tr><td>height:1000</td><td>set height of graphical output in (pixels for png, jpeg, bmp, tiff; default 480) / (inches for pdf, ps; default 7)</td></tr>
  579. <tr><td>width:1000</td><td>set width of graphical output in pixels (default 480 for png)</td></tr>
  580. <tr><td>title:"title of table/plot"</td><td>title to be used in plot, and as #+TBLNAME of table output, and as name of link to output</td></tr>
  581. <tr><td>colour:hotpink col:hotpink color:hotpink</td><td>main colour for plot (i.e. `col' argument in R, enter colors() at R prompt for list of available colours.)</td></tr>
  582. <tr><td>sort:t</td><td>with action:tabulate, sort in decreasing count order (default is alphabetical on names)</td></tr>
  583. <tr><td>output-to-buffer:t</td><td>force numerical output to org buffer (shouldn't be necessary)</td></tr>
  584. <tr><td>inline:t</td><td>don't name links to output (so that graphics are inline when exported to HTML)</td></tr>
  585. </tbody>
  586. <tbody>
  587. <tr><td><b>Misc options</b></td><td></td></tr>
  588. </tbody>
  589. <tbody>
  590. <tr><td>showcode:t</td><td>Display a buffer containing the R code that was generated to do what was requested.</td></tr>
  591. </tbody>
  592. </table>
  593. </div>
  594. </div>
  595. <div id="outline-container-8" class="outline-2">
  596. <h2 id="sec-8"><a name="action==list" id="action==list"></a>Table of available actions </h2>
  597. <div id="text-8">
  598. <p>To specify an action from the following list, use e.g. action:hist on
  599. the TBLR line.
  600. </p>
  601. <table border="2" cellspacing="0" cellpadding="6" rules="groups" frame="hsides">
  602. <col align="left"></col><col align="left"></col>
  603. <thead>
  604. <tr><th><b>Actions that generate numerical output</b></th><th></th></tr>
  605. </thead>
  606. <tbody>
<tr><td>tabulate</td><td>count occurrences of distinct input values. Input data should be discrete. This is the R function table.</td></tr>
<tr><td>summary</td><td>summarise data in columns (minimum, 1st quartile, median, mean, 3rd quartile, maximum)</td></tr>
<tr><td>head</td><td>show the first 6 rows of a larger table</td></tr>
<tr><td>transpose</td><td>transpose a table</td></tr>
<tr><td></td><td></td></tr>
<tr><td><b>Actions that generate graphical output</b></td><td></td></tr>
</tbody>
<tbody>
<tr><td></td><td></td></tr>
<tr><td><b>Discrete data</b></td><td></td></tr>
<tr><td>barplot</td><td>produces side-by-side bar plots if multiple columns are selected</td></tr>
<tr><td></td><td></td></tr>
<tr><td><b>Indexed data</b></td><td></td></tr>
<tr><td>plot</td><td>if only 1 column is selected, the index is automatic: 1, 2, &hellip;</td></tr>
<tr><td>lines</td><td>same as plot</td></tr>
<tr><td>points</td><td>same as plot, but points are not joined by lines</td></tr>
<tr><td></td><td></td></tr>
<tr><td><b>Continuous data</b></td><td></td></tr>
<tr><td>hist</td><td>histogram</td></tr>
<tr><td>density</td><td>like a smoothed histogram (i.e. a curve)</td></tr>
<tr><td></td><td></td></tr>
<tr><td><b>Grid of values</b></td><td></td></tr>
<tr><td>image</td><td>a grid image, with cells coloured according to their numerical values</td></tr>
</tbody>
</table>
<p>
Apart from tabulate and transpose, the action names are the same as the
names of the R functions which implement them. `tabulate' is really
called `table' in R, and the base R transpose function is called `t'.
</p>
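<p>For readers new to R, here is a small sketch of the base R functions behind the tabulate and transpose actions, called directly on data frames. The data frames and column names are made up for illustration; this is not the code that org-R itself generates.</p>

```r
# Hypothetical input: a data frame with one discrete column
x <- data.frame(colour = c("red", "blue", "red", "green", "red"))

# tabulate corresponds to table(): count each distinct value
counts <- table(x)
print(counts)

# Base R's transpose function is t(); applied to a data frame it
# returns a matrix with rows and columns swapped
y <- data.frame(a = 1:3, b = 4:6)
ty <- t(y)
print(dim(ty))
```

<p>Note that t() returns a matrix rather than a data frame, so all columns are coerced to a common type.</p>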
<p>
Note that, in addition to the actions listed above, you can also use
action:R-function, where "R-function" is the name of any existing R
function. The function must be able to take a data frame as its first
argument, and must not <b>require</b> any further arguments (i.e. any
further arguments must have suitable default values). Any numerical
output will be sent to the org buffer. If it is not, output-to-buffer:t
forces this, but needing that option indicates a bug.
</p>
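<p>A minimal sketch of a function that meets these requirements. The name col_ranges and its digits argument are hypothetical; the point is only that the data frame comes first and every further argument has a default, so the function can be called with the data frame alone. Existing base R functions such as colMeans() have the same property.</p>

```r
# Hypothetical user function: the data frame is the first argument,
# and the only other argument has a default value
col_ranges <- function(x, digits = 2) {
  # keep only the numeric columns
  nums <- x[sapply(x, is.numeric)]
  # one column per variable; rows hold the minimum and maximum
  round(sapply(nums, range), digits)
}

x <- data.frame(height = c(1.72, 1.65, 1.80),
                name = c("a", "b", "c"),
                weight = c(70.2, 58.9, 81.5))

# Callable with the data frame alone, as action:R-function requires
r <- col_ranges(x)
print(r)

# colMeans() also has defaults for all arguments after the first
print(colMeans(x[c("height", "weight")]))
```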
</div>
</div>
<div id="outline-container-9" class="outline-2">
<h2 id="sec-9">More detailed description of org-R </h2>
<div id="text-9">
<p>My aim with org-R is to provide a fairly general facility for using
R with Org. The TBLR lines and TBLRR lines together specify an R
function, which may take numerical input and may generate graphical
output, numerical output, or both.
</p>
<p>
If any input data have been specified, then the R function receives
those data as its first argument. The input data may come from an
Org table or from a csv spreadsheet file. In either case they are
tabular (1- or 2-dimensional). The input data are passed to the
function as an R data frame (a table-like structure in which
different columns may contain different types of data &ndash; numeric,
character, etc.). Inside the R function, that data frame is called
'x', and 'x' is also the return value of the R function. Therefore the
numerical output of org-R is determined by the modifications made to
the variable x inside the function (any graphical output is a side
effect).
</p>
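<p>The description above can be pictured as an R function of the following shape. This is a rough sketch under assumptions: the name org_r_fn and its body are hypothetical, and the code org-R really generates will differ (showcode:t displays it). What the sketch keeps from the description is that the input arrives as the data frame x, the code modifies x, and x is returned.</p>

```r
# Hypothetical stand-in for the function org-R builds from the
# TBLR/TBLRR lines: the input table arrives as the data frame x,
# the body modifies x, and x is returned as the numerical output
org_r_fn <- function(x) {
  # e.g. code from a TBLRR line adding a derived column
  x$total <- x$a + x$b
  # any plotting here would only be a side effect;
  # it would not change the returned value
  x
}

input <- data.frame(a = c(1, 2, 3), b = c(10, 20, 30))
output <- org_r_fn(input)
print(output)
```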
<p>
One way to use org-R is to write your own R code in a separate file,
and use the source() function on a TBLRR line to evaluate the code in
that file.
</p>
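<p>A minimal sketch of that approach, with one caveat. By default source() evaluates a file in the global environment; if the TBLRR code runs inside the generated function (an assumption based on the description above), the file's code would need source(file, local = TRUE) to see and modify the function's own x. The script contents and the wrapper f are hypothetical.</p>

```r
# Write a hypothetical user script to a temporary file; in practice
# this would be your own file of R code
script <- tempfile(fileext = ".R")
writeLines(c("# scale every column so that its maximum is 1",
             "x[] <- lapply(x, function(col) col / max(col))"),
           script)

f <- function(x) {
  # local = TRUE evaluates the file in this function's environment,
  # so the script reads and modifies the local x; with the default
  # (local = FALSE) it would run in the global environment instead
  source(script, local = TRUE)
  x
}

result <- f(data.frame(a = c(1, 2, 4), b = c(5, 10, 5)))
print(result)
```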
<p>
Numerical output of the function should also be tabular. It may be
received by the Org buffer as an Org table, or sent to a file in Org
table or csv format. R deals transparently with multi-dimensional
arrays, but Org tables and csv files are limited to two dimensions.
</p>
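<p>As an illustration of that limitation (this is plain R, not part of org-R), the sketch below flattens a 3-dimensional array into a 2-dimensional data frame with one row per cell, which can then be written as csv. The array and its dimension names are made up; as.table() followed by as.data.frame() is one standard base R way to do the flattening.</p>

```r
# A 2 x 2 x 2 array of counts: R handles this natively,
# but an Org table or csv file can only hold two dimensions
a <- array(1:8, dim = c(2, 2, 2),
           dimnames = list(sex = c("F", "M"),
                           smoker = c("no", "yes"),
                           group = c("A", "B")))

# Flatten to long format: one row per cell, one column per
# dimension, plus a column holding the cell values
flat <- as.data.frame(as.table(a), responseName = "count")
print(flat)

# The flattened table can now be written out as csv
write.csv(flat, file = tempfile(fileext = ".csv"), row.names = FALSE)
```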
<p>
Unless an output file has been specified, graphical output will be
displayed on screen.
</p>
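<p>In plain R, the difference between screen and file output comes down to which graphics device is open. This sketch (not org-R's own code) sends a histogram to a PDF file instead of the screen; pdf() and dev.off() are standard grDevices functions, and the temporary file name is made up for the example.</p>

```r
x <- data.frame(v = c(2.1, 3.4, 3.9, 4.2, 5.0, 5.5, 6.1))

# In an interactive session, plotting with no device open draws on
# screen; opening a file device first redirects the plot to the file
out <- tempfile(fileext = ".pdf")
pdf(out)
hist(x$v, main = "Histogram of v")
invisible(dev.off())  # close the device so the file is fully written

print(file.exists(out))
```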
<p>
The mapping from the TBLR and TBLRR lines to the R function may
benefit from further thought. Currently, code corresponding to the
TBLR line is generated first, and any explicit user code is then
appended to it. Thus the TBLRR lines have the 'last word' on the
output. Since multiple, intermixed TBLR and TBLRR lines can be given,
it might make more sense to follow the order of those lines when
constructing the code.
</p>
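<p>A toy R sketch of the current ordering: generated code comes first and user code is appended, so the user code runs last and determines the final value of x. The two code strings and the way they are assembled are hypothetical; they only mimic the ordering described above, not org-R's internals.</p>

```r
# Hypothetical code fragments, held as strings
tblr_code  <- "x <- x[order(x$v), , drop = FALSE]"  # stands in for TBLR-generated code
tblrr_code <- "x$v <- x$v * 10"                     # stands in for a TBLRR line

# Assemble a function: generated code first, user code appended,
# and x returned at the end
fn_text <- paste(c("function(x) {", tblr_code, tblrr_code, "x", "}"),
                 collapse = "\n")
fn <- eval(parse(text = fn_text))

result <- fn(data.frame(v = c(3, 1, 2)))
print(result)
```

<p>Because the TBLRR fragment is appended, it always sees the sorted data; following the order of the lines instead would let a TBLRR line act before some TBLR-generated code.</p>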
</div>
</div>
<div id="outline-container-10" class="outline-2">
<h2 id="sec-10">Getting help with R </h2>
<div id="text-10">
<ul>
<li>
Start R by typing R at a shell prompt, or with M-x R in Emacs (if you have installed ESS)
</li>
<li>
Enter ?function.name for help on function `function.name'
</li>
<li>
Enter RSiteSearch("words") to search online for help matching "words"
</li>
<li>
Enter ?par to see the full list of graphical parameters
</li>
<li>
Follow the Documentation link on the left-hand side of the R
website for "An Introduction to R" and other, more technical, manuals.
</li>
</ul>
</div>
</div>
<div id="outline-container-11" class="outline-2">
<h2 id="sec-11">Brief advert for R </h2>
<div id="text-11">
<p>Seeing as org-R makes use of R, I'll briefly say my bit about it for
those who are unfamiliar with it.
</p><ol>
<li>
It's good for simple numerical work, as well as having
implementations of a very large range of more sophisticated
mathematical and statistical procedures.
</li>
<li>
It's good for producing graphics quickly, and for fine-tuning
every last detail of the graphics for publication.
</li>
<li>
It's a syntactically reasonable, user-friendly, interpreted
programming language that is often used interactively (it comes
with its own shell/command-line environment, and runs within
Emacs using ESS).
</li>
<li>
It's a good language for a functional style of programming (in
fact I'd say that's how it should be used), which might well
appeal to elisp programmers. For example, do you want to construct
an arbitrarily nested data structure, then apply some function
to the tips, returning a data structure of the same shape as
the input? No problem (<a href="http://stat.ethz.ch/R-manual/R-patched/library/base/html/rapply.html">rapply</a>).
</li>
<li>
There are a <b>lot</b> of add-on packages for it (see the CRAN link on the left-hand
side of the <a href="http://www.r-project.org/">website</a>).
</li>
<li>
How many programming languages will get <a href="http://www.nytimes.com/2009/01/07/technology/business-computing/07program.html">their own article</a> in the
New York Times this year?
</li>
</ol>
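<p>As a small example of the rapply point above, the sketch below applies a function to every leaf of a nested list; with how = "replace", the result has the same nested shape as the input. The list itself is made up for illustration.</p>

```r
# An arbitrarily nested list with numeric leaves
tree <- list(a = 1, b = list(c = 2, d = list(e = 3, f = 4)))

# Square every leaf; how = "replace" keeps the original nesting
squared <- rapply(tree, function(v) v^2, how = "replace")
str(squared)
```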
</div>
</div>
<div id="postamble"><p class="author"> Author: Dan Davison
<a href="mailto:davison@stats.ox.ac.uk">&lt;davison@stats.ox.ac.uk&gt;</a>
</p>
<p class="date"> Date: 2009-02-03 22:58:12 EST</p>
<p>HTML generated by org-mode 6.20f in emacs 22</p>
</div></body>
</html>