Julian Barg: Rmarkdown and Pandoc

Julian Barg

For papers that contain a lot of simple statistics, such as counts, keeping track of all numbers when the underlying data changes and not flipping any digits is a major pain. I knew Rmarkdown could tackle this issue and would still allow me to convert to docx etc. I have already been using its underlying technology, pandoc, to generate references and so forth, so I figured I would give it a shot. The experience is quite good, though I found the process a bit confusing. Since I had already written a pandoc markdown document, I just wanted to insert the appropriate numbers, that is it. Rmarkdown is a somewhat opinionated software though and the team expects you to do everything in Rmarkdown. There is one command though that allows you to generate the intermediate .md document that is passed on to pandoc:

rmarkdown::render("file.Rmd", run_pandoc = F)

Strangely enough, the argument output_format = "md_document" does not do exactly this. Instead, it escapes square brackets etc., so it does not produce a document we can feed into pandoc. I also could not figure out how to provide a pandoc default file. Usually, I run pandoc -d article to do all conversions, but I could not figure out how to get Rmarkdown how to pass on that argument. At the end of the day, I don’t need it I suppose. The workflow is as minimal as possible. I decided to do this. Before the main text, I inserted one cell to read my data and one to generate variables. This is an example of how to generate variables without including the cell in the .md file we generate to pass on to pandoc:

```{r count, include = F} n_obs <- length(unique(corp$funder))

...

```

That allows me to write sentences like this:

We aggregated `r n_not_ff_candidates` of these `r n_obs` corporations into `r industries` industries.

Then, after running rmarkdown::render, I obtain a file called file.knit.md where the same sentence looks like this: “We aggregated 549 of these 549 corporations into 31 industries.” Simple, and quite nifty. I suppose an even easier approach would be to save all relevant variables to a csv file with three columns. One with a simple variable name, one with a description, and one with the actual value. Although the result of the current approach is also nice. In one file, I can see not just the variables name but also how exactly I generated the variable, and if I want to I can jump in with Rstudio and look at all variables, too.

Also, this inside an R code cell ensures that number formatting includes a thousands separator:

knit_hooks$set(inline = function(x) { format(x, big.mark=“,”) })

Rmarkdown and Pandoc

Citation