This document provides a brief
introduction to how to use the knitrdata
package to create
standalone Rmarkdown documents. For examples of Rmarkdown documents
using the package, please consult the examples
directory
provided with the package. There is also an instructional video demonstrating how to use the package.
Sometimes it would be useful to make completely standalone Rmarkdown
documents that do not depend on data in external files. One important
example of this is scientific publications written in Rmarkdown for
which we often would like to supply the source document with the data to
ensure results are reproducible. The knitrdata
package
addresses this need by creating a mechanism for incorporating arbitrary
text and binary data in Rmarkdown documents. It works conceptually and
technically in a manner that is very similar to how images and other
binary data are incorporated into standalone HTML web pages and email
attachments: data are incorporated into specially delimited chunks that
consist of the data themselves plus a small bit of header information
explaining how the data are to be processed. Text data (e.g., CSV data
tables, BibTeX references, LaTeX style files) is typically incorporated
in chunks as is, whereas binary data (e.g., RDS files, images, NetCDF
files) is encoded as text using one of two standard encoding schemes.
During knitting of the Rmarkdown document, chunk data is decoded if
necessary, after which it can either be loaded into the Rmarkdown R
session or saved to an external file.
knitrdata
achieves this by extending knitr
to provide a new data
language engine (i.e., a new chunk
type). Instead of putting code inside data
chunks, one puts
the contents of the data file that one wishes to use in the Rmarkdown
document. For binary data, the package currently supports two standard
encoding formats: base64
, the standard binary encoding
format used (behind the scenes) for things like email attachments and
standalone HTML web pages; and gpg
, a well-known encryption
system that prevents data from being accessed by users without the
appropriate decryption key. The latter option requires that a GPG
keyring for managing encryption keys be installed and properly
configured.
data
chunks do not produce output in the form of text or
figures as most code chunks do. Instead, the decoded contents of the
chunk are either returned as a variable in the R workspace or saved to
an external file.
There is an instructional video screencast demonstrating the use of
knitrdata
in Rstudio. It is available on YouTube or by clicking the video
insert below.
knitrdata
can be installed from its GitHub repository
using the remotes package:
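For example, using the same call as in the combined setup chunk shown below:

remotes::install_github("dmkaplan2000/knitrdata",build_vignettes = TRUE)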
Once the package is installed, it needs to be loaded in the Rmarkdown
script before the first data
chunk, typically in the
setup
chunk at the start of the document:
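For example:

library(knitrdata)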
To ensure that your document is as standalone as possible, you can
combine these two steps in the setup
chunk (though this may
install packages without asking the user):
# If package not installed, install it
if (!requireNamespace("knitrdata")) {
  if (!requireNamespace("remotes"))
    install.packages("remotes")
  remotes::install_github("dmkaplan2000/knitrdata",build_vignettes = TRUE)
}
library(knitrdata) # load package
After the package is installed, data
chunks can be
incorporated in the document. The precise format for data
chunks is described in the “Data chunks” Section,
but first the encoding of binary data is presented. If you are only
interested in incorporating text data in your Rmarkdown document, then
you can safely skip directly to the “Data chunks”
Section.
Both text and binary data files can be encoded, but encoding is only
required for binary data. Two encoding formats are currently
implemented: base64
, used for non-sensitive data; and
gpg
, allowing one to encrypt data so that only users with
the decryption key have access. The latter option requires that a GPG
keyring be installed and properly configured.
Two helper functions, data_encode
and
data_decode
, are included in the package to facilitate
encoding and decoding of data files. These are basically wrapper
functions for functionality provided by the xfun and gpg packages for
base64
and gpg
encoding, respectively. To
demonstrate their use, we will use the following simple data frame that
exists in both text (CSV) and binary (RDS) formats:
D = data.frame(a=1:3,b=letters[1:3])
write.csv(D,"test.csv", row.names = FALSE, quote = FALSE)
saveRDS(D,"test.RDS")
D
#> a b
#> 1 1 a
#> 2 2 b
#> 3 3 c
The contents of the CSV file are as follows:
a,b
1,a
2,b
3,c
This CSV text will be used as is in data
chunks (see “Data chunks” Section for details).
Base64 encoding is a widely-used standard encoding suitable for all non-sensitive binary data. It is based on translating 6 bits of information into one of 64 alphanumeric and symbolic characters.
Encoding in base64 using the data_encode
function works
as follows:
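A minimal call might look like the following (the variable name b64 is chosen to match the cat() call below):

b64 = data_encode("test.RDS","base64")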
By default this function will silently return the encoded data as a
character string. This character string can then be visualized using the
cat
function so that it can be copied and pasted directly
into a data
chunk:
cat(b64)
#> H4sIAAAAAAAAA4vgYmBgYGZgYWFkYGYFMhlYQ0PcdC2AYsJADhMQv4PSjAwsDJxA
#> mi85P7cgMbkkPjOvpDi1EE2WJSmxOBUqxgsWh9D/QDpBVjlwMICB/QdUGqpGAOIc
#> sFmMiTBGEoyRDLQB2TrWvMTc1GKoPiYs+tCUJ+ckFsOUwwS5UhJLEvXSioAmoSnn
#> LMov14PZAPIFUwOQ+P///1+QawHV+wxQOwEAAA==
This is only practical for relatively small data files, so for larger files, one can place the output in a file:
data_encode("test.RDS","base64",output="test.RDS.base64")
cat(readLines("test.RDS.base64"),sep="\n")
#> H4sIAAAAAAAAA4vgYmBgYGZgYWFkYGYFMhlYQ0PcdC2AYsJADhMQv4PSjAwsDJxA
#> mi85P7cgMbkkPjOvpDi1EE2WJSmxOBUqxgsWh9D/QDpBVjlwMICB/QdUGqpGAOIc
#> sFmMiTBGEoyRDLQB2TrWvMTc1GKoPiYs+tCUJ+ckFsOUwwS5UhJLEvXSioAmoSnn
#> LMov14PZAPIFUwOQ+P///1+QawHV+wxQOwEAAA==
Though it is rarely necessary to call the data_decode
function directly when working with the data
chunks in
Rmarkdown documents, base64
encoded data can be decoded as
follows:
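A sketch of what this might look like, assuming the as_text argument controls whether a raw vector is returned (the file name test_decoded.RDS is illustrative):

bin = data_decode(b64,"base64",as_text=FALSE)
writeBin(bin,"test_decoded.RDS")
readRDS("test_decoded.RDS")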
Encryption of data using GPG requires a properly configured GPG keyring. The functioning of GPG and GPG keyrings is beyond the scope of this document, but numerous websites explain how GPG works and how to install a GPG keyring, including the main gpg website.
For the purposes of this vignette, I will generate a test GPG
private-public key pair using the gpg package; however,
in real use scenarios proper keys would typically be generated using the
gpg
command line tool (or an equivalent alternative) with
appropriate options.
id = gpg::gpg_keygen("test","[email protected]")
Next one uses this key to encode a data file:
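A sketch of such a call, using the key ID generated above (the variable name gpgdata is illustrative):

gpgdata = data_encode("test.RDS","gpg",options = list(receiver = id))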
Note that the ID of the desired encryption key must be supplied as
the receiver
in the options
list input
argument.
Decoding works as follows:
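For example (again assuming the as_text argument controls whether a raw vector is returned):

rawdata = data_decode(gpgdata,"gpg",as_text=FALSE)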
Note that there is no need to supply the receiver ID when decoding
because the appropriate private key is in the keyring. When decoding,
the gpg
package or keyring may prompt for a password to
unlock the decryption key if the key is password protected.
We can delete the public-private key pair we created for this exercise from our keyring as follows:
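A sketch using the gpg package's key deletion function (the secret argument for also removing the private key is an assumption):

gpg::gpg_delete(id,secret=TRUE)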
Data is incorporated into Rmarkdown documents using data
chunks that consist of the data themselves preceded by a header
containing a set of special chunk options describing how the data is to
be processed.
The simplest possible data
chunk is a text data chunk
containing plain text. Textual data can be directly placed into a data
chunk in an Rmarkdown document as follows:
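A minimal example, reusing the CSV content shown above:

```{data output.var="d"}
a,b
1,a
2,b
3,c
```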
When the Rmarkdown document is knitted, this chunk will put the text
contents of the chunk into the variable d
, which will then
contain the chunk contents as a character string. For the example CSV
data above, the character string can then be converted into a
data.frame
using read.csv
:
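For example:

D = read.csv(text=d)
D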
One can also load the data directly into a data.frame
using the loader.function
chunk option. The
loader.function
should be a function (or a character string
containing the name of a function) whose first input argument will be
the name of a file. A file containing the (decoded) data
chunk contents will be passed to this function and the output will be
assigned to the variable name contained in output.var
.
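For example:

```{data output.var="d",loader.function=read.csv}
a,b
1,a
2,b
3,c
```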
This will assign to d
the output of
read.csv
applied to the CSV data in the chunk.
Additional input arguments can be passed to
loader.function
by supplying a list as the
loader.ops
chunk option:
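A sketch of such a chunk:

```{data output.var="d",loader.function=read.csv,loader.ops=list(header=FALSE)}
a,b
1,a
2,b
3,c
```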
Note that in this case the first line of the CSV data has been
treated as data instead of as a header because we supplied the
header=FALSE
optional argument.
The data inside a text data
chunk does not have
to be scientific data. It can be any textual information, including the
contents of formatting files used by Rmarkdown to generate final output
documents. These include BibTeX files with references, LaTeX style files
(.cls) and bibliography style files (.csl). For example, if we include
the following in a Rmarkdown document:
```{data output.file="references.bib",echo=FALSE}
@article{MeynardTestingmethodsspecies2019,
ids = {MeynardTestingmethodsspecie,MeynardTestingmethodsspeciesinpress},
title = {Testing Methods in Species Distribution Modelling Using Virtual Species: What Have We Learnt and What Are We Missing?},
shorttitle = {Testing Methods in Species Distribution Modelling Using Virtual Species},
author = {Meynard, Christine N. and Leroy, Boris and Kaplan, David M.},
year = {2019},
month = dec,
volume = {42},
pages = {2021--2036},
issn = {0906-7590, 1600-0587},
doi = {10.1111/ecog.04385},
file = {/home/dmk/papers/meynard.et.al.2019.testing_methods_in_species_distribution_modelling_using_virtual_species.pdf},
journal = {Ecography},
keywords = {artificial species,environmental niche models,niche,simulations,species distribution modelling,virtual ecologist},
language = {en},
number = {12}
}
@article{SantosConsequencesdriftcarcass2018,
title = {Consequences of Drift and Carcass Decomposition for Estimating Sea Turtle Mortality Hotspots},
author = {Santos, Bianca S. and Kaplan, David M. and Friedrichs, Marjorie A. M. and Barco, Susan G. and Mansfield, Katherine L. and Manning, James P.},
year = {2018},
month = jan,
volume = {84},
pages = {319--336},
issn = {1470-160X},
doi = {10.1016/j.ecolind.2017.08.064},
copyright = {All rights reserved},
file = {/home/dmk/papers/santos.et.al.2018.consequences_of_drift_and_carcass_decomposition_for_estimating_sea_turtle.pdf},
journal = {Ecological Indicators},
keywords = {Carcass decomposition,Chesapeake bay,Conservation,Drift leeway,Drift simulations,Endangered species,Sea turtle mortality,Sea turtle strandings}
}
```
This will generate the file references.bib
from the
chunk contents. Note that one uses output.file
instead of
output.var
to save the contents to a file. Textual output
to a file can also be achieved using the cat
language
environment as described in the R Markdown Cookbook here,
though the data
language engine provides more options for
the handling of chunk contents (for example, one can use
base64
encoding to embed Rmarkdown documents within
Rmarkdown documents).
As style files like the BibTeX file described above are only used by
knitr/pandoc in the final formatting phase of generating an output
document, these files can be generated from data
chunks
during the initial phases of the knitting. This allows them to be stored
inside the Rmarkdown document itself, with no need for the external file
prior to knitting.
Base64 encoded binary data is incorporated into a chunk by copying
the output of the data_encode
function into the chunk and
supplying the format="binary"
chunk option:
```{data output.var="b",format="binary",echo=FALSE}
H4sIAAAAAAAAA4vgYmBgYGZgZgNiViCTgTU0xE3XAigmDOQwAfE7KM3IwMLACaT5
kvNzCxKTS+Iz80qKUwvRZFmSEotToWK8YHEI/Q+kE2SVAwcDGNh/QKUhaph5wc6B
6GOCsFmQzWfLSS1LzSkGsgTAshBRxkQYIwnGSEbTyJqck1gM0wc3LQ3okfwiiN2o
yvMSc1NhypmwWANWzvIfzUCulMSSRL20IqBmNAM5i/LL9WCGgnzJ1AAk/v///xdk
OwBVQunahwEAAA==
```
This will place the decoded contents of the chunk into a
raw
vector b
. Note that format
must be specified as 'binary'
and that we have chosen
echo=FALSE
to avoid including lots of ugly encoded content
in our formatted document. By default, when
format="binary"
, it is assumed that
encoding="base64"
, but this can also be supplied as a chunk
option for additional clarity.
The contents of the raw vector b
must be written to a
file before they can be read back into the Rmarkdown session:
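For example (the file name test_raw.RDS is illustrative):

writeBin(b,"test_raw.RDS")
D = readRDS("test_raw.RDS")
D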
We can combine the decoding and the writing steps by specifying the
output.file
chunk option instead of
output.var
:
```{data output.file="test_output.RDS",format="binary",echo=FALSE}
H4sIAAAAAAAAA4vgYmBgYGZgZgNiViCTgTU0xE3XAigmDOQwAfE7KM3IwMLACaT5
kvNzCxKTS+Iz80qKUwvRZFmSEotToWK8YHEI/Q+kE2SVAwcDGNh/QKUhaph5wc6B
6GOCsFmQzWfLSS1LzSkGsgTAshBRxkQYIwnGSEbTyJqck1gM0wc3LQ3okfwiiN2o
yvMSc1NhypmwWANWzvIfzUCulMSSRL20IqBmNAM5i/LL9WCGgnzJ1AAk/v///xdk
OwBVQunahwEAAA==
```
This will save the decoded data to the filename given by
output.file
. Then this file can be read back into the
Rmarkdown session:
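For example:

D = readRDS("test_output.RDS")
D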
Finally, we can combine all three steps (decoding, writing to disk,
reading back into R) using the loader.function
chunk
option:
```{data output.var="b",format="binary",echo=FALSE,loader.function=readRDS}
H4sIAAAAAAAAA4vgYmBgYGZgZgNiViCTgTU0xE3XAigmDOQwAfE7KM3IwMLACaT5
kvNzCxKTS+Iz80qKUwvRZFmSEotToWK8YHEI/Q+kE2SVAwcDGNh/QKUhaph5wc6B
6GOCsFmQzWfLSS1LzSkGsgTAshBRxkQYIwnGSEbTyJqck1gM0wc3LQ3okfwiiN2o
yvMSc1NhypmwWANWzvIfzUCulMSSRL20IqBmNAM5i/LL9WCGgnzJ1AAk/v///xdk
OwBVQunahwEAAA==
```
Given these options, during knitting, the contents of the data chunk
will be decoded, written to a temporary file as binary data and then
read back into the R session using the readRDS
function. In
the end, the variable whose name is given by output.var
will be assigned the output of readRDS
:
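For example, a later R chunk could simply print the variable to confirm that the data frame was recovered:

b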
GPG chunks work similarly to base64 chunks except that one must specify
encoding="gpg"
. To demonstrate this functionality, we first
import into the GPG keyring the private key previously used to encode
some data. We would never include the private key in an Rmarkdown
document in a real use case, but this is practical for this vignette and
it demonstrates another use of text data
chunks.
```{data key,output.file="key",include=FALSE}
-----BEGIN PGP PRIVATE KEY BLOCK-----
lQVYBF6A3p8BDADcaf7tveXZUpi0IfEpmYrPP8/OSXSh3iBkd5bdTvbq/FwLGIsD
dp/dFqAWS+0BqCIMFAtV63FUOG4kXYpkajdl2QU1Hy0aY9F9K0imc5JUM1SEry5F
CckjzDFp3u4pmmCPWKF2jVnaHzahJfKz9J9qD9BfBSynfyQU2XgsrRqNgiqeNcOi
f0674hpReawnecBwhENKMWL38O1aOtP1IDx9cFI6busiiOaIHIYYW6qbv178offy
0OWogstsQ3EJQbPBPkkgVTn8wwGUtoorc/2AonSoz99QC4nMWbBaDUGuE9O32yRv
Q7Pe6bWVBuIeV5ASAfSSEypzNHB576BF6MTy+lJvhfXI41Yu97geQJM0CplJ8xav
xAhIvrKjkDoW3zwrZlG54G2TidwEyXoDx7cyRVnCf9tsBCmhEDiKvzlg2IE9Fo65
+LWrD12qCKi7cu4XE28q4zy7S4adhUCBcuflZ8wKMVvbZRXvqnAHBAK8gQxMqHMc
EjWAb7rvmN9bkTUAEQEAAQAL/if4vPeGYaGIvhKkuSRvKOIu01O4tIMKUluF6IEX
6eVxgIuulr85CwLAMKX6fO+4+vuvwuKBARth5G+J2ygcrxE0SyJ4FejcQ0hsyg8N
lHLaoDAzyLNSc/ye8jMd75jx2yMD0rw6JBpPYMvWou4JpcNJPOOOf6ucfgGd8pI/
jjotaecpHuJgLfoapeUyqIq8JK8C/WT+EdGfCpw7YObqQq4I6ZCZPuETbKMwcQ0H
yqfWC7bK9Lk/MvbdSWDH1j70f/t1KaUEBZ2z5xTALqxaFgbwXh+7FybzV+09Sxsn
l5deeubEQXwkbPthapjRpvRo197tJRHLJ8wQVCwag39ip5cvuWQIsej3qILKTepz
VBdgZa4hIyLX8uUCAtLrVYwvWzV1oWxPLAkXJ6KPCzB0jQb7q7UUyrBOUaavdnt2
aWBz5EuXPTaMqnzWqEKIazcXqiCSNjIEv7HWcU734IGUazYper3poYgOWYYIdUes
+xbdWP/j6313N3u4a9BSd3PMvQYA4CLwr+gBfX+dybX3jq3ldB3HJS/Lv90e64rh
BarRu+ByyEO5BcVJZ+ZEUOcBjF/pvG1qI9mfqBuZX/e2aW1lmMsxcXNlWRu5b5vE
geoRwqPMNIo4JIo2hByHZeEPQLcYW/QRy5xkoNbl+udPuS3PMEUnfnPeQKursY71
ao7Zo0TUeFRemEgkvxZpFXfT+IMs9DGI/Wi6PO0ChSJ/Cu/QixgK0eJFUroNCyvl
bW+xy0GSB325wkyOM5xIny681KtvBgD7v5V6n0P2UucxZYU5hhdWaaTf5aF83vtE
o88gSU5NRO1/wPFb+AFP3fw8TNtrvRlA/OakwjL+GbfhioAJ4mtPbdGUojFIAU6X
czMHbaYyNwZTMImBW9uc2gDqta8O1HiSwC7fXnTxVoSz3E/TD6dbAnFyf1FYNntJ
PLKS9H82idCqO0nrU3LtdKJx9VHJ6wLOT16D6zZAdgNB0wK9dzStayfIqQzN/FAz
01u0ehX4SDRCxxgukdR4ZyeZJfdmC5sF+wZ/2mW4Tp7v3kutNAytk4JtMvLIhe2r
BQkYw5eUFMq7tUqXgsXMjA0pVplUSosZknCIpoyoEU7rvS9BF9xdcpRixU5kxeYY
knQg5jtb+vx3Stpp0vbuvFFaGgEJhNP6Tg3al7gBCOwEEAJmSTko4cyf1e45pIMF
+jGbIeozSjeKPWjdJCr4q05tvKgsiAe7BulgUlNhS6Ty5JyQHsiM/WZTPko2BsN2
8Apa/nuOvYwRwFLGGXVVWV3jQroPI9Hbft9ctBhUZXN0IEtleSA8dGVzdEB0ZXN0
Lm9yZz6JAdQEEwEKAD4WIQTvl6O3A7Tx/z8NeR/qzEhnRW4g7QUCXoDenwIbAwUJ
A8JnAAULCQgHAgYVCgkICwIEFgIDAQIeAQIXgAAKCRDqzEhnRW4g7WxrC/94WT6J
HEEgyb9Bskm2ik+c/qUW8w7JgizYRi6jqi8+qiIesh99MZ/XPm5mgMTIvKr0z/IG
xaU+RKYFF5DqsAc4obg/ZmClOSY9FgDWlMEm7hEqourQxfJZXGWRNcU6DTr2tC/K
GpTNkhR802LnjUePeVJU5MMuJ8eyQV+NgGhwXTIcPA6ERwHIC1n24N3QDFNoijcc
pTi5p9+N33w8fBC5ZMeZwrWI6mCJjEWVbxG2zcsIJ2t7htWRM7W1rKi5lHRpQdn/
cd9WtbdDFj7ywGPnjMB2vxYVJreENGbE/LZIZPaJKJHPReWQ+GBSGkyY7nrT32SP
R+qj5gO0Bez7F+61EDU+SXP9PJ8fyTGtUWfTsgz+fTj2TDn39y0tL1wuSciEOAjD
uia+L5qiKE9GK6mBQv78yfzZ/ZOEdJn9ZNRWs8kvs/aG9BygYMdJM5T4vvk2DcWd
m061EGTg/AVUFpMuTon9tb+RCIFfVjSzat8LWcf4Me2nJeFZu+lW/lCmxkedBVgE
XoDenwEMANPff6PrZirginP4HNK7g3ANmB3bDKCI1msAQspXMzvhtMc0Hn8DpM+r
wPUuoOo4hnYwkGHSNZ4dulrtW99mlzQWcFwDuOsvPAqc/OuEIEo0BBvc5HcpNk4d
z94Vno+Dq904VnlStf6DXpGbBFZkZBoC4XVwFUSoEjD1i967ckjFUhOxE5ynlcMb
8mpS65iml4JFd572bcuo9exJ1g7IhdgFIFoDDD2eJkxEhmEHNiVd8B9/j1GHxDCq
v/D0HNbgKuFk8WJUMYvupdqA30wAc5Ujnf+nURfNejgZTOiGXm5FZBrw/dha7yTP
/mlnNFMBKUEBrxYyPo2JVSsYfPf1WzLL1dmv8JPC5fyEKYhEC+zBvlytRWqkZV88
DumgVEdhEnnMEVlofyF8KoVMmWYA9w/FUUKiNymZlK1PEGecqliEhXh+KE03ncHh
AyEo0Zcdh5sSxUW5fNsQb+tp0fqFBs7Yye432w6ID3ZIONrnWrQ6MewWwxeAGMam
x03jgyMlCwARAQABAAv9EJ0e8iicS1JuKOfUwsWHafr26ahqlhAE2EEd+6XY06JA
PbqdhZIwk0RBjjhIz/T8vjnSqIkGQU7NdSHVqW/u/VuhFeYI0xBSIfbrckBbE9Z+
V/z7QUjPBFMcIKsLUu+dQ2yOg1b0BHAis0I3ldqrasq9CStvz4FqY8JtZFrIfGJU
rEyfYBJYEQOY/7Ne3Ap8KO/vkFx8gZLPLecgTOp2bFkCj2xbwl0rXaGl8+fP3CBA
mweyok8GGFbbVDagKE1NiukpEVzHsoMyMfPkxdIMLSj0F2GzQSnhyhyGomNstuTT
EC/i3/u7M9TRvLkpNTP3I6z5VNjayrp0NBs0z3sb1wNzrACELWbTtb/Lo5BVVD9Y
m0MQtDi8+SKzTHci2AdpvewxnhO4IiS/aXYYGcPwmEX4YdlZeV0J5mRXNsvWxYZk
HHFkbfgUkiFSFOmb9uyPD0NMldJoLXbv9+LFiU1okglietVcKK7Fyt5xCKcxbtO8
kdYJTuWonsWeyC8tz1WBBgDcq6doxs3aFSVeLcZ0//WHif+iBYlLFoexmw4irx8e
LnZilDJ5i4mwcu6Q5qxao3UEyeUC7ff//Qn846TQMDDRcC3xtrbqAqVyYBE7u9EI
OMyyCfosk8nNmVBpNdnsFm76lUyG8GiuT6b0j8BiQTRPmH4Xlh3pSiihyuTJIVhX
Y663wV8EwT9IRnYCoVqw9s5qZqJGkI4rxnABuyJui4BpmkrLry70t1xb6MdX2BPD
eK5u0YJ24AmxPW5YGvXnO0sGAPXLRfarrI9IgSz28+QpfYttOIbjp3n3AxB3ImHo
oK+CLsc1vHtsdEV8hElWo9k5EqcdlhPBbeC6IILFqT69Ldx8jK85hxR0bYs2NVLC
qyWo1T3bovPePCEenN4++VPBtVBkEt51MByNIKwC3Bw0zvHcygLcHE3iXRQ40dhq
AZWrPlOqwnC8x9+UqZoWCp/JRWD5qBjD6EPVAxwbtcUdjDOhZ1y51xbUaX59Vlul
BGLse/0Q47m71HrF+d9rGUnlQQYAkDQsdbzijmB/tVzcRXJWbZVgjwLciofxVpoM
TEYyw8+oSYDI1L3Dikejp3XymVr+9pKGmPZjLqL9Q01J9epeHt5wgLjuWTXtkVLW
kbnt7vTy257BIsHGDwiJzMI7PujTlQ4B1ZTPz2WyUJ7gn1f+J9wYpNOr7qeE2pg6
cOeiPQmT5h88jWTUH/eAJ0nAWx46kwgQY4uZz7xsFtCcwQgqVe9bD5MNv/bBUdPW
RkF8ZbRCPRk4Vl2DYM/rXC2VGCFZ6OeJAbYEGAEKACAWIQTvl6O3A7Tx/z8NeR/q
zEhnRW4g7QUCXoDenwIbDAAKCRDqzEhnRW4g7ZayC/954y+kfmjtIzSRDBRpOo2s
npOOwy7RLdOdWvab6jVecyqYsDyd/fiCXVKxALOVR31WTef00iFSLHQactwFxQyJ
zY6YO8tGkvYEXXYJR5O5MNzjlhNMndBqGIbKe9tA2BFLDD/6mmvMD/i9k+IhHzFT
NhoczB5rE9oaApMZhAj9u9Uv2zy0osfcOPcy+RN9b2noodVS/7Ei2BjWl+V/MGqa
I8oBM/ETIW/jcq+OuE8oSqoByFtFHh1DgOzOFugCWApOmAjLQwQCmDiYYtKN1GWq
l1E+txLud78ZBsJQL/78MXO9V2T2dCbcIA0vOfACuoPApfu6seRE0SLeImgoRg+8
7aX6HtiRXRjExDS26YNbGYzAvVTl3Zy1VptXOMwkh5CcIgtTcDv32pLWC3xvNydG
P4xDMM+BVuDi6QTcFfbPtqYbuuT4OFyyaSzee0oWxvKoX2pL81VnMwvb7Uy47Dxf
Ng9Af4cf3nf9UzesAVbSy1gtvlZIyX0HwtZNVLNJSS4=
=C6UF
-----END PGP PRIVATE KEY BLOCK-----
```
This key then needs to be imported into the keyring:
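For example, using the gpg package's import function on the key file written by the chunk above:

gpg::gpg_import("key")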
Now that we have this private key, we can decode some binary data encrypted using this key:
```{data t3,format="binary",encoding="gpg",output.var="d",loader.function=readRDS}
-----BEGIN PGP MESSAGE-----
Version: GnuPG v2
hQGMA9TPonHna5j3AQv7BIPNOSR/024iE0Gj3DCo3DvLvj/oEJ29XORHBkn4nul1
+zaRV5E/K4LCKxkkAEx/+FdM72x1hV5FF5Vf0FSet1RHiOOXPuChEEzRHOubkh/U
gw44Q72d8Dp6TOJ+1KT5k/fdkVKsOZRSttL8hvxqC4nyfObF0CkIoG+Kfx+kkYqu
araVWqNtcb3/FbtT+ZC0Hip0Ws6IJ8mGOhZdRxZ3S8KUtgf/t7S3Wa75c6L1wolT
R1/WhPgcWB4epLTvHdSmv9qcu/vFXE8SmNE5MV4V2aSTRU7y9WdPW/+XzU2Et4BK
kGyzhkI6q7QzAXFOeD1sn0uaUeH9/BDwn3AJZEXkwN4qaarPpDKjZ9GVE9Gg8521
BYe7AIZwq7sfnF+v1WyxamFYpSSAiNHze00MHPWot3Db+4SpRFYIWlYlZF00HIMo
Qspb4AmIfnNo9zj0RGG7GJoyod8ZrW4RF5iOEUWtyQ5z6LzymGTSdArWOq1fDgYW
tvEgbkbdYsJA6usJ3Zxc0sBJARfA6gDCFF72nGiAoNS98zoFjtD7hznY9DBvCOoF
jzJ8kHfPQPK9/bRVuofUDP+jOoJyf8/7eB6kANNq+XrzoZL0N42zHR2n47xupqYN
GSdsRljTra8a9zWIs9k8E5/79qRvV25c/wPeysulWkzhLCDaCMMVYJvQQ0JgT2L+
/eBtDeAuqkCqiDjeEGGB0Q4Q81IOIHMxUXFPJHvWJE4eJhnRjLmuPCBDQdL0JYTS
Bc48A/eA/cbfUr+4RluY9RLcaUHRPjKT8e7X98VdnSBPGvikVpSjR3zZhPNQs7Vb
C7H5lml4B8FpRgQBFwt2ou8URLRYR82tUa/OcsByW9jxf988YZx57a1hAw==
=m4sT
-----END PGP MESSAGE-----
```
For small data chunks, one can copy-paste the (encoded) data from the
command line or from a file into a Rmarkdown document. For larger data
chunks, this can be awkward, so knitrdata
includes two
command-line helper functions for creating and inserting
data
chunks into Rmarkdown documents, as well as three RStudio Addins that facilitate creating data chunks.
The command-line functions are create_chunk
and
insert_chunk
. We can combine these functions with
data_encode
to generate the base64 data
chunk
described above. First, we use create_chunk
to generate the
chunk:
library(magrittr) # For pipe operator
chunk = data_encode("test.RDS","base64") %>%
create_chunk(chunk_label="mydata",output.var="d",format="binary",echo=FALSE,
loader.function=readRDS)
cat(chunk,sep="\n")
#> ```{data mydata, output.var = "d", format = "binary", echo = FALSE, loader.function = readRDS}
#> H4sIAAAAAAAAA4vgYmBgYGZgYWFkYGYFMhlYQ0PcdC2AYsJADhMQv4PSjAwsDJxA
#> mi85P7cgMbkkPjOvpDi1EE2WJSmxOBUqxgsWh9D/QDpBVjlwMICB/QdUGqpGAOIc
#> sFmMiTBGEoyRDLQB2TrWvMTc1GKoPiYs+tCUJ+ckFsOUwwS5UhJLEvXSioAmoSnn
#> LMov14PZAPIFUwOQ+P///1+QawHV+wxQOwEAAA==
#> ```
Note that, with the exception of the chunk label, the chunk contents and the other named arguments to this function, the remaining arguments are not evaluated, so they can be given exactly as they should appear in the Rmarkdown document, regardless of whether the chunk options make sense in the current context.
Next, we can use insert_chunk
to place this chunk at a
given line number in a Rmarkdown file (here taken to be
example.Rmd
):
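A sketch of such a call (the argument name rmd.file is an assumption; consult ?insert_chunk for the exact interface):

insert_chunk(chunk,11,rmd.file="example.Rmd")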
This will insert the new data
chunk at line 11 in the
Rmarkdown document.
knitrdata
also includes the list_rmd_chunks
and splice_rmd_by_chunk
functions for identifying and
potentially removing or working with the chunks in a Rmarkdown document.
See the documentation and examples for these functions for more details
on their use.
Using the procedures above should ensure that the data inside
data
chunks are free of errors. Nevertheless, there is
always the possibility that a stray keystroke would modify the data,
particularly for binary data encoded as text. To protect against this,
data
chunks can have an md5sum
chunk option
that will be used to test if the decoded data matches expectations. An
MD5 sum is essentially a very large number (typically encoded in
hexadecimal) derived from a file’s contents that has a vanishingly small
probability of being equal to the equivalent number derived from a
different file’s contents. If two files have the same MD5 sum,
they are almost certainly identical.
To use the md5sum
chunk option, one must first determine
the MD5 sum of the decoded source data:
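This can be done with tools::md5sum, as in the example below:

md5 = tools::md5sum("test.RDS")
md5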
If this character string is given as the md5sum
chunk
option to a data chunk, then the MD5 sum of the decoded data of
that chunk will be calculated and checked against this character string.
If the two do not match, an error will be generated.
We can add an MD5 sum check to the data
chunk generated
in the previous section as follows:
md5 = tools::md5sum("test.RDS")
chunk = data_encode("test.RDS","base64") %>%
create_chunk(chunk_label="mydata",output.var="d",format="binary",
echo=FALSE,loader.function=readRDS,
chunk_options_string = paste0("md5sum='",md5,"'"))
cat(chunk,sep="\n")
#> ```{data mydata, output.var = "d", format = "binary", echo = FALSE, loader.function = readRDS, md5sum='41c7786379c523cd0c75b72ca2d6a0ad'}
#> H4sIAAAAAAAAA4vgYmBgYGZgYWFkYGYFMhlYQ0PcdC2AYsJADhMQv4PSjAwsDJxA
#> mi85P7cgMbkkPjOvpDi1EE2WJSmxOBUqxgsWh9D/QDpBVjlwMICB/QdUGqpGAOIc
#> sFmMiTBGEoyRDLQB2TrWvMTc1GKoPiYs+tCUJ+ckFsOUwwS5UhJLEvXSioAmoSnn
#> LMov14PZAPIFUwOQ+P///1+QawHV+wxQOwEAAA==
#> ```
md5sum
can be used on all data
chunk types,
including text data
chunks that are not encoded in any
special way.
One disadvantage of using data
chunks is that they can
make Rmarkdown files long and difficult to navigate if you use lots of
data. Rstudio can help with this problem by allowing one to hide the
contents of a chunk and by using navigation tools to jump between
sections. In addition, using insert_chunk
or the RStudio Addins to place a data
chunk
inside a document as the last step before public diffusion may avoid
having to work extensively with the large file. Nevertheless, large file
size can still be a problem. To facilitate the initial construction of
standalone Rmarkdown documents, data chunks can be read from external
files using the external.file
chunk option. The external
file must contain the encoded chunk contents exactly as they
would appear in a data
chunk. The intended use of this
option is that large data chunks would be placed initially in external
files, but the contents of these files would be placed directly in the
data
chunks before sharing the document with others.
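For example, a sketch of a chunk that pulls its (encoded) contents from the test.RDS.base64 file created earlier (the option combination shown is an assumption based on the description above):

```{data output.var="d",format="binary",external.file="test.RDS.base64",loader.function=readRDS}
```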
As it is very common for data chunks to be used for CSV or RDS data,
additional shorthand language engines are included in the
knitrdata
package for working with these data. These chunk
types are csv
, csv2
and rds
for
working with comma-separated CSV data, semicolon-separated CSV data and
RDS data, respectively. These language engines are implemented as
wrappers for the standard data
language engine, but with
the loader.function
option predefined to be
read.csv
, read.csv2
and readRDS
,
respectively. Extra parameters for these loader functions can be given
directly as chunk options, as well as via the loader.ops
chunk option.
As an example, one could load CSV data that has no header and that uses a non-standard delimiter and comma as the decimal point indicator as follows:
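A sketch of such a chunk (the data values are illustrative; the sep and dec options are passed through to read.csv):

```{csv output.var="d",header=FALSE,sep="|",dec=","}
1|a|2,4
2|b|3,1
3|c|8,3
```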
The steps described above for encoding data, creating data chunks and
inserting them in Rmarkdown documents have been combined into a set of
GUI Shiny
applications that assist in creating and/or
removing data
chunks:
- create_data_chunk_dialog() for creating a data chunk and returning it to the command line after the user specifies a data file and a set of options
- insert_data_chunk_dialog(), which has the same functionality as create_data_chunk_dialog(), except that the data chunk that is created is inserted at the cursor location in the active source document in the Rstudio editor
- remove_chunks_dialog(), which shows a data table listing all chunks in the active source document in the Rstudio editor, allowing one to select a set of chunks (i.e., rows) and then delete those chunks with the click of a button

The last two of these, insert_data_chunk_dialog()
and
remove_chunks_dialog()
, are accessible in the Rstudio
Addins menu under the knitrdata
heading with titles
Insert filled data chunk
and Remove chunks
,
respectively. There is also a non-interactive addin entitled
Insert empty data chunk
that inserts an empty
data
chunk template in the active source document.
Before using these tools, first install all the additional packages
suggested by knitrdata
: shiny
,
miniUI
, DT
, rstudioapi
.
The use of these tools is generally self explanatory and is explained
in greater detail in the instructional video. The
chunk creation apps ask for the name of the data file to be incorporated
into the chunk, as well as a standard set of other chunk options, such
as data format and encoding and output options. The app will attempt to
suggest sensible options based on the input data file, though these can
be changed afterward. For example, if a binary file is selected, then
the format and encoding options will be set to binary
and
base64
respectively. GPG encoding of data using the app is
supported and will prompt the user to select the key ID(s) of the
key(s) to be used to encrypt the data. By default, md5sum
checks of all binary data chunks will be included.
The chunk removal addin presents a searchable list of all chunks in
the source document. To eliminate a set of chunks, one just selects the
corresponding rows and clicks the Remove chunks
button.
Note that these tools do not try to determine if the active source
document is a Rmarkdown document. If undesired changes are made to a
document, then one can undo (ctrl-z
) those changes
afterward.
There are many potential scenarios where including data inside
Rmarkdown documents can be useful. The most obvious is to render a
document fully standalone, perhaps for creating a single document that
reproduces an entire publication or report or to make life as simple as
possible for collaborators that are not comfortable managing complex
Rmarkdown documents requiring multiple files. To do this, one would
place not only data inside chunks in the Rmarkdown document, but also
ancillary files associated with the document, such as CSS, LaTeX and CSL
(bibliography) style files, and BibTeX files. These latter chunks would
typically recreate these external files using the
output.file
chunk option.
knitrdata
also has uses in cases where the objective is
not necessarily to render a document fully standalone. One can use it as
a convenient way to input small data tables and data vectors (e.g., see
Numeric vector data below) into an Rmarkdown
document. For example, the data
chunk syntax combined with
CSV data may be simpler and cleaner than creating a
data.frame
or tibble
directly in R code.
Furthermore, though markdown allows one to create small tables by hand,
it is difficult or impossible to reproduce the sophisticated tables that
knitr::kable
and kableExtra
are capable of
creating, and tables created in markdown may not float as other
tables do, potentially an issue for producing scientific publications
with Rmarkdown. knitrdata
can be used as a simple way to
input these tables into R, which can then be used as input to
kable
and other tools. Finally, data
chunks
provide a convenient system for making certain Rmarkdown document text
conditional on parameter values or results (see Conditional text in Rmarkdown using knitrdata).
One knitrdata
feature that may at first glance seem to
be of little value is the possibility to gpg
encrypt data
chunks. Why would one want to encrypt just the data when one can encrypt
the entire Rmarkdown document? However, there are many cases where the
data themselves are confidential (e.g., economically important data,
confidential medical information), but the methods used to analyze and
synthesize those data can and should be publicly available. Furthermore,
it is increasingly common to share code and documents on public
collaboration websites, such as GitHub
or Slack, but one may not have
sufficient confidence in the privacy protections of these websites to
give them access to the data themselves.
Though the knitrdata
package can be a powerful tool, it
can also be abused by placing very large amounts of data inside
Rmarkdown files. This will make the documents very large and difficult
to navigate. The options to collapse data
chunks in
Rstudio, use Rstudio’s navigation tools, use knitrdata
’s
helper functions for data
chunk creation and the use of the
external.file
chunk option all can reduce the importance of
these issues, but including very large amounts of data in Rmarkdown
documents is unlikely to be optimal. In particular, Rstudio currently
will not open Rmarkdown documents larger than 5 MB in size, though
larger documents can still be opened by other editors and rendering from
the command line using rmarkdown::render
is always
possible. Basic prudence should be used when deciding what and how much
data to include in Rmarkdown documents.
You can use a data chunk to load numeric vector data as follows:
```{data output.var="v",line.sep=""}
1.2,3.4,5.6,
7.8,9.0
```
```{r}
f = function(v) as.numeric(strsplit(v,",")[[1]])
v=f(v)
v
```
#> [1] 1.2 3.4 5.6 7.8 9.0
One way to create large amounts of text in an Rmarkdown document that
is only included in the document if certain conditions are met is with
the asis
language engine. For example, if one creates an
input parameter in the YAML header of an Rmarkdown document:
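For example (only the params entry matters here; the other header fields are illustrative):

```
---
title: "My document"
output: pdf_document
params:
  cond: true
---
```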
Then one could include content in the document based on this parameter as follows:
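A sketch of such an asis chunk, conditioned on the cond parameter (the chunk content is illustrative and contrasts with the data chunk example further below):

```{asis echo=params$cond}
# A conditional section

This will **not** be evaluated: $5+4=%d$
```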
This will add a section and a paragraph with one sentence. One
disadvantage of this approach is that the content of an
asis
chunk is never evaluated so one cannot use inline R
code.
This limitation can be overcome with knitrdata
by
pushing the content of the chunk into a text variable with conversion
specifications that can be replaced using the sprintf
function. For example, placing the following in the document:
```{data include=FALSE,output.var="cond_text"}
# A conditional section
This **will** be evaluated: $5+4=%d$
```
```{r results='asis',echo=FALSE,eval=params$cond}
cat(sprintf(cond_text,5+4))
```
This will lead to %d
being replaced with 9
.
Using md5sum
checks on text chunks that are not encoded
(i.e., 'asis'
) can be tricky because when the chunk is
processed, an operating-system-dependent newline character will be added
to each line, including the final line, and character encoding (e.g.,
UTF-8
, latin1
, etc.) translation might be
carried out. This will potentially make the md5sum check fail if the
file was created on a different operating system or the final line lacks
a newline character. To have the best possible chances of success when
using md5sum
checks with unencoded text files, follow these
steps:
- Make sure that the text file whose md5sum you calculate is formatted as is standard for your operating system (i.e., newlines with '\r\n' on DOS/Windows, and newlines with '\n' on all other operating systems).
- Alternatively, use the line.sep chunk option to explicitly set the newline format to use. For example, you can specify chunk option line.sep="\r\n" to force the data language engine to output DOS formatted text files.
- Use a standard character encoding for the text file, preferably UTF-8.

If you follow these steps, md5sum
checks of text chunk
contents should work. The Insert filled data chunk
RStudio
addin will attempt to set the line.sep
chunk option to
match that of the incoming text file so that one normally should not
need to worry about the text file format.
If for some reason these steps are impractical or are not sufficient
(e.g., due to differences in character encodings between operating
systems), one can always use a different method to check the validity of
chunk contents. Base64 encoding of text files with a md5sum
check in a data
chunk is always a possibility, but if
content visibility is important, then one can use other hash algorithms
on the output of a data
chunk. For example, the following
would do a post hoc check of the validity of loaded data using
a SHA1 hash:
```{data output.var="d",loader.function=read.csv}
a,b
1,a
2,b
3,c
```
```{r checkdata}
if (digest::sha1(d) != "a5918a84b39f0b6f42e9ab4a19771a7d6a5777a0")
stop("data corrupted!")
```
md5sum
checks on base64
or gpg
encoded text files are generally reliable even if the above steps are
not followed as the exact text file contents are encoded in the
chunk.
If one includes in a Rmarkdown document a GPG-encoded data chunk that
uses a password-protected GPG key, then knitting may fail with the error
Password callback did not return a string value
. This is
because knitting of an Rmarkdown document takes place in a
non-interactive R session and, therefore, gpg::gpg_decrypt
(via knitrdata::data_decode
) is unable to open the password
entry dialog during the knitting process. The gpg
package
does not currently include another mechanism for providing the key
password, but one can work around this problem by using any mechanism to
temporarily store the key password in the GPG keyring manager. This can
generally be achieved by decrypting something that was encrypted by the
same key that was used to encrypt the chunk. This normally leads to the
GPG key password being temporarily stored in the keyring so that
gpg::gpg_decrypt
will not attempt to open the password
entry dialog, thereby avoiding the problem.
As a workaround for this problem, knitrdata
includes the unlock_gpg_key_passphrase
function, which, when run from the command line with identifying
information about the key needed to decrypt the data chunks in the
Rmarkdown document, will attempt to unlock that key.
It achieves this by encrypting and then immediately decrypting a small amount of data with the given key.
In addition to the standard eval
and echo
chunk options, data
chunks support the following chunk
options:
Chunk option | Description |
---|---|
format | One of 'text' or 'binary'. Defaults to 'text'. |
encoding | One of 'asis', 'base64' or 'gpg'. Defaults to 'asis' for format='text' and 'base64' for format='binary'. |
decoding.ops | A list with additional arguments for data_decode. Currently only useful for passing the verify argument to gpg::gpg_decrypt for gpg encrypted chunks. |
external.file | A character string with the name of a file whose text contents will be used as if they were the contents of the data chunk. |
md5sum | A character string giving the correct md5sum of the decoded chunk data. If supplied, the md5sum of the decoded data will be calculated and compared to the supplied value, returning an error if the two do not match. |
output.var | A character string with the variable name to which the chunk output will be assigned. At least one of output.var or output.file must always be supplied. |
output.file | A character string with the filename to which the chunk output will be written. At least one of output.var or output.file must always be supplied. |
loader.function | A function that will be passed (as the first argument) the name of a file containing the (potentially decoded) contents of the data chunk. |
loader.ops | A list of additional arguments to be passed to loader.function. |
line.sep | Only used when encoding='asis'. In this case, specifies the character string that will be used to join the lines of the data chunk before export to an external file, further processing or returning the data. Defaults to platform.newline(). |
max.echo | An integer specifying the maximum number of lines of data to echo in the final output document. Defaults to 20. If the data exceeds this length, only the first 20 lines will be shown and a final line indicating the number of omitted lines will be added. |
csv
, csv2
and rds
chunks
support these same options, but format
and
loader.function
are typically not used and additional
arguments to the predefined loader functions can be passed directly as
chunk options.