The basic workflow of the rPHG package is as follows:
- Create a connection object
- Select a PHG “method”
- Read data into the R environment
- Analyze and visualize the retrieved data
This document introduces you to rPHG’s methods and grammar, and shows you how to apply them to the previously mentioned workflow.
Creating connection objects
PHG databases can be accessed through two primary types of connection:
- local
- server
Local connections are for databases managed with PostgreSQL or SQLite. These are typically located on a local machine or hosted on a high-performance compute cluster, and are accessed via the PHG API.
Conversely, server connections are for databases served on publicly available web services that use Breeding API (BrAPI) endpoints for data retrieval. For example, demo.hub.maizegenetics.net is a publicly available PHG database housing information on many known diversity populations in maize.
Establishing a local connection
To set up a local connection, you need some prior knowledge of how configuration files are set up. For more information about this topic, please see the vignette “Overview of configuration files”.
We can supply a path to a valid configuration file to the constructor, PHGLocalCon():
configFile |> PHGLocalCon()
## A PHGLocalCon connection object
## ❯ Host......: localhost
## ❯ DB Name...: phg_smallseq_test.db
## ❯ DB Type...: sqlite
Here, our configuration file path (configFile) is parsed to create an object of type PHGLocalCon.
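As a rough illustration of what configFile might point to, the sketch below writes a small key=value configuration file and reads it back. The key names (host, user, password, DB, DBtype) and their values are assumptions chosen to match the connection summary shown above, and parseConfig() is an illustrative helper, not part of rPHG. The configuration vignette remains the authoritative description of the format.

```r
# Hypothetical example: write a minimal key=value configuration file.
# Key names and values are assumptions; check the configuration vignette.
configLines <- c(
    "host=localhost",
    "user=sqlite",
    "password=sqlite",
    "DB=phg_smallseq_test.db",
    "DBtype=sqlite"
)
configFile <- tempfile(fileext = ".txt")
writeLines(configLines, configFile)

# Illustrative helper (not part of rPHG): parse key=value lines
# into a named list so the contents can be inspected.
parseConfig <- function(path) {
    lines  <- readLines(path)
    lines  <- lines[grepl("=", lines, fixed = TRUE)]
    keys   <- sub("=.*$", "", lines)
    values <- sub("^[^=]*=", "", lines)
    as.list(setNames(values, keys))
}

cfg <- parseConfig(configFile)
cfg$DBtype
```

For a real connection, the DB entry would need to reference an existing PHG database before the path is passed to PHGLocalCon().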
Establishing a server connection
If you would like to use a PHG web service, you can use a similar method:
"phg.maizegdb.org" |> PHGServerCon()
## A PHGServerCon connection object
## ❯ Host............: phg.maizegdb.org
## ❯ Server Status...: 200 (OK)
Here, a URL pointing to a PHG web service is supplied to the constructor PHGServerCon(), which parses it to create an object of type PHGServerCon.
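The “Server Status” line above reports an HTTP status code. To give a sense of what a BrAPI-style status check targets, the sketch below assembles the URL of the BrAPI `serverinfo` endpoint for a host. The helper brapiServerInfoUrl(), its default arguments, and the `/brapi/v2` path prefix are assumptions made for illustration; how rPHG builds its URLs internally may differ.

```r
# Illustrative helper (not part of rPHG): build the URL of the
# BrAPI serverinfo endpoint for a given host.
# The protocol and version defaults are assumptions.
brapiServerInfoUrl <- function(host, protocol = "https", version = "v2") {
    sprintf("%s://%s/brapi/%s/serverinfo", protocol, host, version)
}

url <- brapiServerInfoUrl("phg.maizegdb.org")
url
```

A status of 200 returned from such an endpoint would indicate that the service is reachable.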
PHG Methods
configFile |>
PHGLocalCon() |>
showPHGMethods()
## # A tibble: 4 × 3
##   type_name method_name              description
##   <chr>     <chr>                    <list>
## 1 PATHS     B73Ref_method_PATH       <named list [1]>
## 2 PATHS     anchorwave_assembly_PATH <named list [1]>
## 3 PATHS     GATK_PIPELINE_PATH       <named list [1]>
## 4 PATHS     PATH_METHOD              <named list [27]>
configFile |>
PHGLocalCon() |>
PHGMethod("PATH_METHOD")
## A PHGMethod promise object:
## <PHGLocalCon> --- <PATH_METHOD>
Reading data
Read samples (e.g. taxa)
configFile |>
PHGLocalCon() |>
PHGMethod("PATH_METHOD") |>
readSamples()
## [1] "LineA1_gbs" "LineA1_wgs"
## [3] "LineA_gbs" "LineA_wgs"
## [5] "LineB1_gbs" "LineB1_wgs"
## [7] "LineB_gbs" "LineB_wgs"
## [9] "RecLineA1LineA1gco2_gbs" "RecLineA1LineA1gco2_wgs"
## [11] "RecLineA1RefA1gco1_gbs" "RecLineA1RefA1gco1_wgs"
## [13] "RecLineALineB1gco3_gbs" "RecLineALineB1gco3_wgs"
## [15] "RecLineB1LineBgco7_gbs" "RecLineB1LineBgco7_wgs"
## [17] "RecLineB1RefA1gco4_gbs" "RecLineB1RefA1gco4_wgs"
## [19] "RecLineBLineB1gco5_gbs" "RecLineBLineB1gco5_wgs"
## [21] "RecLineBLineB1gco8_gbs" "RecLineBLineB1gco8_wgs"
## [23] "RecRefA1LineBgco6_gbs" "RecRefA1LineBgco6_wgs"
## [25] "RefA1_gbs" "RefA1_wgs"
## [27] "Ref_gbs" "Ref_wgs"
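Because readSamples() returns a plain character vector, base R string functions are enough to organize it. The sketch below splits a few of the names shown above into a line name and a trailing suffix. Treating `_gbs` and `_wgs` as sequencing-type labels is an assumption based on the naming pattern, and the variable names are illustrative.

```r
# A few sample names copied from the output above.
samples <- c("LineA1_gbs", "LineA1_wgs", "RecLineA1RefA1gco1_gbs", "Ref_wgs")

# Strip the trailing suffix to recover the line name.
lineName <- sub("_(gbs|wgs)$", "", samples)

# Keep only the text after the last underscore.
seqType <- sub("^.*_", "", samples)

sampleInfo <- data.frame(sample = samples, line = lineName, seqType = seqType)

# Count samples per suffix.
table(sampleInfo$seqType)
```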
Read reference ranges
configFile |>
PHGLocalCon() |>
PHGMethod("PATH_METHOD") |>
readRefRanges()
## GRanges object with 10 ranges and 1 metadata column:
##        seqnames      ranges strand |       rr_id
##           <Rle>   <IRanges>  <Rle> | <character>
##    [1]        1      1-3000      * |          R1
##    [2]        1   6501-9500      * |          R2
##    [3]        1 13001-16000      * |          R3
##    [4]        1 19501-22500      * |          R4
##    [5]        1 26001-29000      * |          R5
##    [6]        1 32501-35500      * |          R6
##    [7]        1 39001-42000      * |          R7
##    [8]        1 45501-48500      * |          R8
##    [9]        1 52001-55000      * |          R9
##   [10]        1 58501-61500      * |         R10
##   -------
##   seqinfo: 1 sequence from an unspecified genome; no seqlengths
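The coordinates above are inclusive on both ends, which matters when computing range widths and the spacing between ranges. The sketch below reproduces that arithmetic in base R, using the first three ranges copied by hand into a data frame. The data frame and its column names are illustrative stand-ins for the GRanges object, where accessors such as width() from the Bioconductor range packages would normally be used instead.

```r
# Start and end coordinates of the first three ranges shown above.
refRanges <- data.frame(
    rr_id = c("R1", "R2", "R3"),
    start = c(1, 6501, 13001),
    end   = c(3000, 9500, 16000)
)

# Width counts both endpoints, hence the + 1.
refRanges$width <- refRanges$end - refRanges$start + 1

# Number of bases between each range and the one before it
# (NA for the first range, which has no predecessor).
n <- nrow(refRanges)
refRanges$gapBefore <- c(NA, refRanges$start[-1] - refRanges$end[-n] - 1)

refRanges
```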
Read haplotype ID matrix
configFile |>
PHGLocalCon() |>
PHGMethod("PATH_METHOD") |>
readHaplotypeIds()
## R1 R2 R3 R4 R5 R6 R7 R8 R9 R10
## RecLineB1RefA1gco4_wgs 112 104 96 120 117 106 98 91 101 109
## RecLineB1RefA1gco4_gbs 112 104 96 120 117 106 98 91 101 109
## RecRefA1LineBgco6_wgs 113 103 94 118 117 106 99 93 100 111
## RecRefA1LineBgco6_gbs 113 103 94 118 117 106 99 93 100 111
## LineB1_wgs 112 104 96 120 116 107 99 93 100 111
## LineB1_gbs 112 104 96 120 116 107 99 93 100 111
## LineA_wgs 114 105 95 119 115 108 97 92 102 110
## LineA_gbs 114 105 95 119 115 108 97 92 102 110
## LineB_wgs 112 104 96 120 116 107 99 93 100 111
## LineB_gbs 112 104 96 120 116 107 99 93 100 111
## LineA1_wgs 114 105 95 119 115 108 97 92 102 110
## LineA1_gbs 114 105 95 119 115 108 97 92 102 110
## RecLineALineB1gco3_wgs 114 105 95 120 116 107 99 93 100 111
## RecLineALineB1gco3_gbs 114 105 95 120 116 107 99 93 100 111
## RecLineB1LineBgco7_wgs 112 104 96 120 116 107 99 93 100 111
## RecLineB1LineBgco7_gbs 112 104 96 120 116 107 99 93 100 111
## RefA1_wgs 113 103 94 118 117 106 98 91 101 109
## RefA1_gbs 113 103 94 118 117 106 98 91 101 109
## Ref_wgs 113 103 94 118 117 106 98 91 101 109
## Ref_gbs 113 103 94 118 117 106 98 91 101 109
## RecLineBLineB1gco8_wgs 112 104 96 120 116 107 99 93 100 111
## RecLineBLineB1gco8_gbs 112 104 96 120 116 107 99 93 100 111
## RecLineA1LineA1gco2_wgs 114 105 95 119 115 108 97 92 102 110
## RecLineA1LineA1gco2_gbs 114 105 95 119 115 108 97 92 102 110
## RecLineBLineB1gco5_wgs 112 104 96 120 116 107 99 93 100 111
## RecLineBLineB1gco5_gbs 112 104 96 120 116 107 99 93 100 111
## RecLineA1RefA1gco1_wgs 114 103 94 118 117 106 98 91 101 109
## RecLineA1RefA1gco1_gbs 113 103 94 118 117 106 98 91 101 109
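The result is laid out with samples in rows and reference ranges in columns, so simple summaries follow directly from base R matrix operations. The sketch below rebuilds a small corner of the matrix by hand (four samples, three ranges, with values copied from the output above), counts the distinct haplotype IDs per reference range, and computes the fraction of ranges at which two samples carry the same ID. The object names are illustrative.

```r
# A 4 x 3 subset of the haplotype ID matrix shown above.
hapIds <- matrix(
    c(112, 104, 96,
      113, 103, 94,
      112, 104, 96,
      114, 105, 95),
    nrow = 4, byrow = TRUE,
    dimnames = list(
        c("RecLineB1RefA1gco4_wgs", "RecRefA1LineBgco6_wgs",
          "LineB1_wgs", "LineA_wgs"),
        c("R1", "R2", "R3")
    )
)

# Number of distinct haplotype IDs observed in each reference range.
nHaplotypes <- apply(hapIds, 2, function(x) length(unique(x)))

# Fraction of reference ranges at which two samples share an ID.
shared <- mean(
    hapIds["RecLineB1RefA1gco4_wgs", ] == hapIds["LineB1_wgs", ]
)

nHaplotypes
shared
```

The same calls should apply unchanged to the full matrix returned by readHaplotypeIds(), assuming it is an ordinary matrix with row and column names as printed.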
PHGDataSet objects
configFile |>
PHGLocalCon() |>
PHGMethod("PATH_METHOD") |>
readPHGDataSet()
## class: PHGDataSet
## dim: 10 28
## metadata(0):
## assays(1): pathMatrix
## rownames(10): R1 R2 ... R9 R10
## rowData names(1): rr_id
## colnames(28): RecLineB1RefA1gco4_wgs RecLineB1RefA1gco4_gbs ...
## RecLineA1RefA1gco1_wgs RecLineA1RefA1gco1_gbs
## colData names(0):