vignettes/write_configuration_file.Rmd
Configuration files are central to BioInstaller. They store the URLs of software and databases, the installation scripts, and other useful information.
Most of the configuration files are parsed by configr. The one difference from the original configr syntax is the #R# R CMD #R# marker, which flags a command that is written in R.
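As a rough illustration of what the #R# marker expresses, the sketch below separates marked R commands from ordinary shell commands using base R only. The helper `classify_step` is hypothetical and is not part of BioInstaller or configr; it only shows the idea of the marker, not how BioInstaller actually parses it.

```r
# Hypothetical helper, not part of BioInstaller or configr: classify one
# configuration value by whether it is wrapped in #R# markers.
classify_step <- function(step) {
  pattern <- "^[[:space:]]*#R#(.*)#R#[[:space:]]*$"
  if (grepl(pattern, step)) {
    # Keep only the text between the two markers.
    list(type = "r", code = trimws(sub(pattern, "\\1", step)))
  } else {
    list(type = "shell", code = step)
  }
}

shell_step <- classify_step("make")
r_step <- classify_step("#R# paste0('bio', 'installer') #R#")

# A step marked as R can then be evaluated inside the current session.
r_result <- eval(parse(text = r_step$code))
```

Under this assumption, a value such as "make" would be treated as a shell command, while the marked value would be evaluated as R code.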
The built-in configuration files github.toml, nongithub.toml and db.toml (db_annovar.toml and db_main.toml, which use the nongithub.toml format) can be used to download and install a range of software and databases. install.bioinfo(show.all.names = TRUE) lists all software and databases available in github.toml and nongithub.toml.
Variables that control the download and installation steps of software and databases hosted on GitHub:
If use_git2r is set to false, BioInstaller uses your system's git.
If use_git2r is set to false and recursive_clone is set to true, BioInstaller runs the command git clone --recursive https://path/repo
before_install stores the pre-installation steps.
install mainly stores the installation steps. You can also use your own installation script by setting it to #R# system('/path/yourscript')#R#
make_dir is the directory in which the software or database is compiled. R's working directory is set to download.dir by default and must be changed to make_dir to complete the installation steps.

[bwa]
github_url = "https://github.com/lh3/bwa"
after_failure = "echo 'fail!'"
after_success = "echo 'successful!'"
make_dir = ["./"]
bin_dir = ["./"]
[bwa.before_install]
linux = ""
mac = ""
[bwa.install]
linux = "make"
mac = "make"
Version control for software hosted on GitHub is handled by the git2r package and the GitHub tag API. The source URL of software or files hosted on GitHub is given by github_url in github.toml.
Variables that control the download and installation steps of software and databases not hosted on GitHub:
github_url is replaced by source_url.
url_all_download can be set to true.
version_order_fixed can be set to true.
Optionally, if the source code consists of a single file, you can set url_all_download to false and provide multiple URLs. This helps avoid download failures caused by an invalid URL.

[gmap]
# {{version}} will be parsed to your install.bioinfo `version` parameter
# or the newest version parsed from fetched data.
source_url = "http://research-pub.gene.com/gmap/src/{{version}}.tar.gz"
after_failure = "echo 'fail!'"
after_success = "echo 'successful!'"
make_dir = ["./"]
bin_dir = ["./"]
[gmap.before_install]
linux = ""
mac = ""
[gmap.install]
linux = "./configure --prefix=`pwd` && make && make install"
mac = ["sed -i s/\"## CFLAGS='-O3 -m64' .*\"/\"CFLAGS='-O3 -m64'\"/ config.site",
"./configure --prefix=`pwd` && make && make install"]
Version control for software and databases not hosted on GitHub requires a URL-parsing function; the resulting version replaces {{version}} in source_url.
BioInstaller also uses the configr glue feature to shorten long lists of file names, so that a short expression can store many file names.
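To make the idea concrete, the sketch below rebuilds in base R, without configr, the same list of URLs that the glue expression for the BLAST nr database encodes. It only mirrors the two R snippets embedded in that expression and is not configr's own implementation.

```r
# First glue snippet: ids is defined once, then repeated for two file types.
ids <- sprintf("%02d", 0:68)
id_part <- rep(ids, 2)

# Second glue snippet: an empty suffix for the archives, ".md5" for checksums.
suffix <- c(rep("", length(ids)), rep(".md5", length(ids)))

# Paste the fixed parts of the URL around the two vectors.
urls <- paste0("ftp://ftp.ncbi.nih.gov/blast/db/nr.", id_part, ".tar.gz", suffix)
n_urls <- length(urls)
```

One line in the configuration file therefore stands for 138 download URLs: 69 archives and their 69 checksum files.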
library(configr)
library(BioInstaller)
blast.databases <- system.file('extdata',
'config/db/db_blast.toml', package = 'BioInstaller')
read.config(blast.databases)$db_blast_nr$source_url
#> [1] "!!glue ftp://ftp.ncbi.nih.gov/blast/db/nr.{ids=sprintf('%02d', 0:68);rep(ids, 2)}.tar.gz{c(rep('', length(ids)), rep('.md5', length(ids)))}"
x <- read.config(blast.databases, glue.parse = TRUE)$db_blast_nr$source_url
length(x)
#> [1] 138
head(x)
#> [1] "ftp://ftp.ncbi.nih.gov/blast/db/nr.00.tar.gz"
#> [2] "ftp://ftp.ncbi.nih.gov/blast/db/nr.01.tar.gz"
#> [3] "ftp://ftp.ncbi.nih.gov/blast/db/nr.02.tar.gz"
#> [4] "ftp://ftp.ncbi.nih.gov/blast/db/nr.03.tar.gz"
#> [5] "ftp://ftp.ncbi.nih.gov/blast/db/nr.04.tar.gz"
#> [6] "ftp://ftp.ncbi.nih.gov/blast/db/nr.05.tar.gz"
mask.github <- tempfile()
file.create(mask.github)
#> [1] TRUE
install.bioinfo(nongithub.cfg = blast.databases, github.cfg = mask.github,
show.all.names = TRUE)
#> Warning in fetch.config(github.cfg): Configuration file /var/folders/nc/
#> yl5qhkkn6vxf_m7s_yz2kzvh0000gn/T//RtmpAgNeGj/file1d3e2b6ff38b is empty,
#> please check the links.
#> [1] "db_blast_env_nr" "db_blast_est_human"
#> [3] "db_blast_est_mouse" "db_blast_est_others"
#> [5] "db_blast_gss" "db_blast_htgs"
#> [7] "db_blast_human_genomic" "db_blast_landmark"
#> [9] "db_blast_mouse_genomic" "db_blast_nr"
#> [11] "db_blast_nt" "db_blast_other_genomic"
#> [13] "db_blast_pataa" "db_blast_patnt"
#> [15] "db_blast_pdbaa" "db_blast_pdbnt"
#> [17] "db_blast_ref_prok_rep_genomes" "db_blast_ref_viroids_rep_genomes"
#> [19] "db_blast_ref_viruses_rep_genomes" "db_blast_refseq_genomic"
#> [21] "db_blast_refseq_protein" "db_blast_refseq_rna"
#> [23] "db_blast_refseqgene" "db_blast_sts"
#> [25] "db_blast_swissprot" "db_blast_taxdb"
#> [27] "db_blast_tsa_nr" "db_blast_tsa_nt"
#> [29] "db_blast_vector"

To resolve software dependencies, BioInstaller uses expressions of the form {{key:value}} and reads their values from the BIO_SOFTWARES_DB_ACTIVE database.
For example, htslib is a dependency of Pindel, so we use ./INSTALL {{htslib:source.dir}} as the installation step for Pindel. In the R session, {{htslib:source.dir}} is replaced by the actual value stored in BIO_SOFTWARES_DB_ACTIVE or in the db parameter of the install.bioinfo function.
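The substitution can be pictured with a small base R sketch. The `installed` list stands in for the records kept in the software database, and both it and the helper `resolve_dependencies` are hypothetical; the path is made up, and this is not BioInstaller's own code.

```r
# Hypothetical stand-in for the software database; the path is made up.
installed <- list(htslib = list(source.dir = "/opt/src/htslib"))

# Hypothetical helper: replace every {{software:field}} expression in a
# command with the value recorded for that software.
resolve_dependencies <- function(cmd, db) {
  pattern <- "[{][{]([^:{}]+):([^{}]+)[}][}]"
  hits <- regmatches(cmd, gregexpr(pattern, cmd))[[1]]
  for (hit in hits) {
    # parts holds the full match, the software name and the field name.
    parts <- regmatches(hit, regexec(pattern, hit))[[1]]
    value <- db[[parts[2]]][[parts[3]]]
    if (is.null(value)) stop("no stored value for ", hit)
    cmd <- sub(hit, value, cmd, fixed = TRUE)
  }
  cmd
}

resolved <- resolve_dependencies("./INSTALL {{htslib:source.dir}}", installed)
```

With the made-up record above, `resolved` is "./INSTALL /opt/src/htslib". A command without any such expression is returned unchanged.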
install.bioinfo parameter extra.list
To make configuration templates more flexible, BioInstaller uses expressions of the form {{parameters}} to read values from the extra.list parameter of install.bioinfo. Notably, name, version, os.version and destdir are passed to extra.list by default.
For example, the source_url of GMAP needs the version value, so we use source_url = "http://research-pub.gene.com/gmap/src/{{version}}.tar.gz" as the download URL. In the R session, {{version}} is replaced by the value of the version parameter of install.bioinfo (if version is NULL, it is set to the newest version).
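A minimal base R sketch of this kind of template filling follows. The helper `fill_template` and the value "EXAMPLE-VERSION" are hypothetical placeholders used only for illustration; BioInstaller performs the real substitution internally.

```r
# Hypothetical helper: replace each {{key}} in a template with the matching
# entry of a named list, in the spirit of extra.list.
fill_template <- function(template, extra.list) {
  for (key in names(extra.list)) {
    placeholder <- paste0("{{", key, "}}")
    # fixed = TRUE treats the braces literally instead of as a regex.
    template <- gsub(placeholder, extra.list[[key]], template, fixed = TRUE)
  }
  template
}

# "EXAMPLE-VERSION" is a made-up value, not a real GMAP release name.
url <- fill_template(
  "http://research-pub.gene.com/gmap/src/{{version}}.tar.gz",
  list(name = "gmap", version = "EXAMPLE-VERSION")
)
```

Keys with no matching placeholder in the template, such as name here, leave the string untouched.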