
The basic workflow of the rPHG package is as follows:

  1. Create a connection object
  2. Select a PHG “method”
  3. Read data into the R environment
  4. Analyze and visualize the retrieved data

This document introduces rPHG’s methods and grammar and shows how to apply them to the workflow above.

Creating connection objects

rPHG can connect to PHG databases through two primary sources:

  • local
  • server

Local connections are for databases managed with PostgreSQL or SQLite, typically located on a local machine or hosted on a high-performance computing cluster, and accessed via the PHG API.

Conversely, server connections are for databases served on publicly available web services that use Breeding API (BrAPI) endpoints for data retrieval. For example, demo.hub.maizegenetics.net is a publicly available PHG database housing information on many known diversity populations in maize.

Establishing a local connection

To set up a local connection, you need some familiarity with how PHG configuration files are structured. For more information on this topic, please see the vignette “Overview of configuration files”.

We can supply a path to a valid configuration file to the constructor, PHGLocalCon():

library(rPHG)
# configFile: path to a valid PHG configuration file
configFile |> PHGLocalCon()
## A PHGLocalCon connection object
##   Host......: localhost
##   DB Name...: phg_smallseq_test.db
##   DB Type...: sqlite

Here, our configuration file path (configFile) is parsed to create an object of type PHGLocalCon.

Establishing a server connection

To connect to a PHG web service, use a similar approach with the PHGServerCon() constructor:

"phg.maizegdb.org" |> PHGServerCon()
## A PHGServerCon connection object
##   Host............: phg.maizegdb.org
##   Server Status...: 200 (OK)

Here, a URL pointing to a PHG web service is supplied to the constructor PHGServerCon(), which parses it to create an object of type PHGServerCon.

PHG Methods

To list the methods stored in a database, pass a connection object to showPHGMethods():

configFile |> 
    PHGLocalCon() |> 
    showPHGMethods()
## # A tibble: 4 × 3
##   type_name method_name              description      
##   <chr>     <chr>                    <list>           
## 1 PATHS     B73Ref_method_PATH       <named list [1]> 
## 2 PATHS     anchorwave_assembly_PATH <named list [1]> 
## 3 PATHS     GATK_PIPELINE_PATH       <named list [1]> 
## 4 PATHS     PATH_METHOD              <named list [27]>
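
To select a method, pass its name to PHGMethod(). This returns a “promise” object that records the connection and the method name; the data themselves are retrieved by the read functions described below:

configFile |> 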
    PHGLocalCon() |> 
    PHGMethod("PATH_METHOD")
## A PHGMethod promise object:
##   <PHGLocalCon> --- <PATH_METHOD>

Reading data

Read samples (i.e., taxa)

configFile |> 
    PHGLocalCon() |> 
    PHGMethod("PATH_METHOD") |> 
    readSamples()
##  [1] "LineA1_gbs"              "LineA1_wgs"             
##  [3] "LineA_gbs"               "LineA_wgs"              
##  [5] "LineB1_gbs"              "LineB1_wgs"             
##  [7] "LineB_gbs"               "LineB_wgs"              
##  [9] "RecLineA1LineA1gco2_gbs" "RecLineA1LineA1gco2_wgs"
## [11] "RecLineA1RefA1gco1_gbs"  "RecLineA1RefA1gco1_wgs" 
## [13] "RecLineALineB1gco3_gbs"  "RecLineALineB1gco3_wgs" 
## [15] "RecLineB1LineBgco7_gbs"  "RecLineB1LineBgco7_wgs" 
## [17] "RecLineB1RefA1gco4_gbs"  "RecLineB1RefA1gco4_wgs" 
## [19] "RecLineBLineB1gco5_gbs"  "RecLineBLineB1gco5_wgs" 
## [21] "RecLineBLineB1gco8_gbs"  "RecLineBLineB1gco8_wgs" 
## [23] "RecRefA1LineBgco6_gbs"   "RecRefA1LineBgco6_wgs"  
## [25] "RefA1_gbs"               "RefA1_wgs"              
## [27] "Ref_gbs"                 "Ref_wgs"
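readSamples() returns a plain character vector, so base R string functions are enough to summarize it. The sketch below copies the names printed above (in a live session you would assign the result of readSamples() instead) and splits each name at its final underscore. It assumes every name ends in either _gbs or _wgs, as all names in this test database do.

```r
# Sample names copied from the readSamples() output above. In a live session:
# samples <- configFile |> PHGLocalCon() |> PHGMethod("PATH_METHOD") |> readSamples()
samples <- c(
  "LineA1_gbs", "LineA1_wgs", "LineA_gbs", "LineA_wgs",
  "LineB1_gbs", "LineB1_wgs", "LineB_gbs", "LineB_wgs",
  "RecLineA1LineA1gco2_gbs", "RecLineA1LineA1gco2_wgs",
  "RecLineA1RefA1gco1_gbs", "RecLineA1RefA1gco1_wgs",
  "RecLineALineB1gco3_gbs", "RecLineALineB1gco3_wgs",
  "RecLineB1LineBgco7_gbs", "RecLineB1LineBgco7_wgs",
  "RecLineB1RefA1gco4_gbs", "RecLineB1RefA1gco4_wgs",
  "RecLineBLineB1gco5_gbs", "RecLineBLineB1gco5_wgs",
  "RecLineBLineB1gco8_gbs", "RecLineBLineB1gco8_wgs",
  "RecRefA1LineBgco6_gbs", "RecRefA1LineBgco6_wgs",
  "RefA1_gbs", "RefA1_wgs", "Ref_gbs", "Ref_wgs"
)

# Split each name into a line name and its suffix (assumed to be "gbs" or "wgs")
suffix   <- sub("^.*_", "", samples)
lineName <- sub("_(gbs|wgs)$", "", samples)

table(suffix)            # number of samples per suffix
length(unique(lineName)) # number of distinct line names
```

For this database, each of the 14 line names appears once with each suffix.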

Read reference ranges

configFile |> 
    PHGLocalCon() |> 
    PHGMethod("PATH_METHOD") |> 
    readRefRanges()
## GRanges object with 10 ranges and 1 metadata column:
##        seqnames      ranges strand |       rr_id
##           <Rle>   <IRanges>  <Rle> | <character>
##    [1]        1      1-3000      * |          R1
##    [2]        1   6501-9500      * |          R2
##    [3]        1 13001-16000      * |          R3
##    [4]        1 19501-22500      * |          R4
##    [5]        1 26001-29000      * |          R5
##    [6]        1 32501-35500      * |          R6
##    [7]        1 39001-42000      * |          R7
##    [8]        1 45501-48500      * |          R8
##    [9]        1 52001-55000      * |          R9
##   [10]        1 58501-61500      * |         R10
##   -------
##   seqinfo: 1 sequence from an unspecified genome; no seqlengths
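The ranges in this output follow the IRanges convention: coordinates are 1-based and both ends are inclusive, so a range's width is end - start + 1. With GenomicRanges loaded you would normally call width(), start(), and end() on the GRanges object directly. So that it runs without Bioconductor, the base R sketch below instead rebuilds the printed coordinates in a data frame and computes the widths and the gaps between consecutive ranges.

```r
# Coordinates copied from the readRefRanges() output above
# (chromosome 1, R1 to R10; each range starts 6500 bp after the previous one).
rr <- data.frame(
  rr_id = paste0("R", 1:10),
  start = seq(1, by = 6500, length.out = 10),
  end   = seq(3000, by = 6500, length.out = 10)
)

# Width of a closed, 1-based interval
rr$width <- rr$end - rr$start + 1

# Number of bases between the end of one range and the start of the next
gaps <- rr$start[-1] - rr$end[-nrow(rr)] - 1

rr
gaps
```

In this test database every reference range is 3,000 bp wide, and consecutive ranges are separated by 3,500 bp that no range covers.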

Read haplotype ID matrix

configFile |> 
    PHGLocalCon() |> 
    PHGMethod("PATH_METHOD") |> 
    readHaplotypeIds()
##                          R1  R2 R3  R4  R5  R6 R7 R8  R9 R10
## RecLineB1RefA1gco4_wgs  112 104 96 120 117 106 98 91 101 109
## RecLineB1RefA1gco4_gbs  112 104 96 120 117 106 98 91 101 109
## RecRefA1LineBgco6_wgs   113 103 94 118 117 106 99 93 100 111
## RecRefA1LineBgco6_gbs   113 103 94 118 117 106 99 93 100 111
## LineB1_wgs              112 104 96 120 116 107 99 93 100 111
## LineB1_gbs              112 104 96 120 116 107 99 93 100 111
## LineA_wgs               114 105 95 119 115 108 97 92 102 110
## LineA_gbs               114 105 95 119 115 108 97 92 102 110
## LineB_wgs               112 104 96 120 116 107 99 93 100 111
## LineB_gbs               112 104 96 120 116 107 99 93 100 111
## LineA1_wgs              114 105 95 119 115 108 97 92 102 110
## LineA1_gbs              114 105 95 119 115 108 97 92 102 110
## RecLineALineB1gco3_wgs  114 105 95 120 116 107 99 93 100 111
## RecLineALineB1gco3_gbs  114 105 95 120 116 107 99 93 100 111
## RecLineB1LineBgco7_wgs  112 104 96 120 116 107 99 93 100 111
## RecLineB1LineBgco7_gbs  112 104 96 120 116 107 99 93 100 111
## RefA1_wgs               113 103 94 118 117 106 98 91 101 109
## RefA1_gbs               113 103 94 118 117 106 98 91 101 109
## Ref_wgs                 113 103 94 118 117 106 98 91 101 109
## Ref_gbs                 113 103 94 118 117 106 98 91 101 109
## RecLineBLineB1gco8_wgs  112 104 96 120 116 107 99 93 100 111
## RecLineBLineB1gco8_gbs  112 104 96 120 116 107 99 93 100 111
## RecLineA1LineA1gco2_wgs 114 105 95 119 115 108 97 92 102 110
## RecLineA1LineA1gco2_gbs 114 105 95 119 115 108 97 92 102 110
## RecLineBLineB1gco5_wgs  112 104 96 120 116 107 99 93 100 111
## RecLineBLineB1gco5_gbs  112 104 96 120 116 107 99 93 100 111
## RecLineA1RefA1gco1_wgs  114 103 94 118 117 106 98 91 101 109
## RecLineA1RefA1gco1_gbs  113 103 94 118 117 106 98 91 101 109
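Assuming, as the printout suggests, that readHaplotypeIds() returns a numeric matrix with samples as rows and reference ranges as columns, ordinary matrix operations can answer questions such as how many distinct haplotype IDs occur in each range, or whether the _wgs and _gbs paths of the same line agree. The sketch below copies four rows from the output above; the variable names are hypothetical.

```r
# Four rows copied from the readHaplotypeIds() output above (hypothetical subset).
hapIds <- rbind(
  LineA_wgs              = c(114, 105, 95, 119, 115, 108, 97, 92, 102, 110),
  LineA_gbs              = c(114, 105, 95, 119, 115, 108, 97, 92, 102, 110),
  RecLineA1RefA1gco1_wgs = c(114, 103, 94, 118, 117, 106, 98, 91, 101, 109),
  RecLineA1RefA1gco1_gbs = c(113, 103, 94, 118, 117, 106, 98, 91, 101, 109)
)
colnames(hapIds) <- paste0("R", 1:10)

# Number of distinct haplotype IDs observed in each reference range
nHaps <- apply(hapIds, 2, function(x) length(unique(x)))

# For each line, count the ranges where the _wgs and _gbs paths differ
lineNames <- unique(sub("_(wgs|gbs)$", "", rownames(hapIds)))
mismatch <- sapply(lineNames, function(l) {
  sum(hapIds[paste0(l, "_wgs"), ] != hapIds[paste0(l, "_gbs"), ])
})

nHaps
mismatch
```

For these four rows, every range holds two distinct IDs, and the only disagreement between a _wgs/_gbs pair is RecLineA1RefA1gco1 at R1 (114 versus 113).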

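PHGDataSet objects

readPHGDataSet() collects the data above into a single PHGDataSet object, with reference ranges as rows, samples as columns, and the haplotype IDs stored in the pathMatrix assay: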

configFile |> 
    PHGLocalCon() |> 
    PHGMethod("PATH_METHOD") |> 
    readPHGDataSet()
## class: PHGDataSet 
## dim: 10 28 
## metadata(0):
## assays(1): pathMatrix
## rownames(10): R1 R2 ... R9 R10
## rowData names(1): rr_id
## colnames(28): RecLineB1RefA1gco4_wgs RecLineB1RefA1gco4_gbs ...
##   RecLineA1RefA1gco1_wgs RecLineA1RefA1gco1_gbs
## colData names(0):