Introduction

This reference manual organizes the databases available in annovarR and provides metadata for the databases that annovarR supports, as well as for the other databases that BioInstaller supports for download only. Some of the descriptions and comments also appear in the download configuration file (BioInstaller package) and the annotation configuration file (annovarR).

Databases supported by annovarR are first made available for download from their original sites through BioInstaller (excluding those that require authentication). A portion of the BioInstaller-supported databases are then introduced into annovarR as candidate annotation databases, using one of three processing methods: left unchanged, reformatted, or reanalyzed.

Overview of supported databases

The following table shows all annotation names with their required download names.

The following table shows all database names with their versions and descriptions.

In addition, you can call the function download.database with the parameter show.all.buildvers = TRUE to get all available buildver values.

Gene and clinical annotation

Gene annotation databases cover gene classification, gene function, and phenotype correlation. Examples include HGNC, OMIM, DoCM, CIViC, DisGeNET, ClinVar, and Gene Ontology (GO).

Variant effect prediction

Variant effect prediction databases are generated by algorithms that predict the effect of variants on protein or RNA structure, such as SIFT, PolyPhen2, PROVEAN, MutationTaster, MutationAssessor, and FATHMM.

Population allele frequency

Population allele frequency databases are built from population cohort genome sequencing data (mainly whole genome sequencing and whole exome sequencing). Examples include the 1000 Genomes Project, the NHLBI GO Exome Sequencing Project (ESP), gnomAD, and ExAC.

Cancer somatic mutation

Cancer somatic mutation databases are generated from case-control paired genomic sequencing data of cancer patients. Examples include COSMIC, Cancer Hotspots, IntOGen, and the Cancer Biomarkers database.

RNA-seq variants

RNA-seq variant databases are built from variants called from RNA-seq data, including expressed alleles and RNA-editing events. annovarR provides an RNA-seq variant database, BRVar, built from the RNA-seq data of 1285 B-cell lymphoblastic leukemia (B-ALL) patients, to which four different variant detection methods were applied.

Expression quantitative trait locus (eQTL)

eQTL databases contain candidate genomic loci that may affect gene expression levels, such as Genotype-Tissue Expression (GTEx) QTL, seeQTL, and PancanQTL.

Non-coding RNA

Non-coding RNA databases contain candidate biomarkers or transcriptional regulation regions targeted by non-coding RNA, such as the Cancer-Specific CircRNA Database and [LNCediting](http://bioinfo.life.hust.edu.cn/LNCediting/).