The idea behind the PHG (Practical Haplotype Graph) is that, in a given breeding program, all parental genotypes can be sequenced at high coverage and loaded as parental haplotypes into a relational database. Progeny can then be sequenced at low coverage, and those reads are used to infer which parental haplotypes/genotypes from the database are most likely present in a given progeny.
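To make the inference step concrete, here is a deliberately simplified toy sketch for a single genomic region. It assumes each low-coverage progeny read has already been matched to the set of parental haplotype IDs it is compatible with (a hypothetical input format), and it simply picks the haplotype supported by the most reads. This is not how the PHG itself computes paths; it only illustrates the idea of choosing the most likely parental haplotype from read evidence.

```python
from collections import Counter


def most_likely_haplotype(read_hits: list[set[str]]) -> str:
    """Toy illustration: return the parental haplotype supported by the most reads.

    Each element of read_hits is the (hypothetical) set of parental haplotype IDs
    that one progeny read is compatible with in a single genomic region.
    """
    counts: Counter[str] = Counter()
    for hits in read_hits:
        # A read compatible with several haplotypes adds one vote to each of them
        counts.update(hits)
    if not counts:
        raise ValueError("no reads matched any parental haplotype")
    # Ties are broken by the order in which haplotypes were first seen
    return counts.most_common(1)[0][0]


reads = [{"parentA"}, {"parentA", "parentB"}, {"parentB"}, {"parentA"}]
print(most_likely_haplotype(reads))  # parentA
```

With low coverage, some regions may have only one or two reads, which is why a real implementation needs a probabilistic model rather than a simple vote count.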

The following sections give an overview of how to set up configuration files for connecting to a PHG database.

Database types

Currently, the PHG can use SQLite or PostgreSQL to store data for the pan-genomic graph. For more information about how data is stored within the database schema, please refer to the PHG Wiki.

Configuration files

Access to the PHG database, regardless of database type, requires a configuration file. This file holds the metadata needed to access PHG data and/or compute optimal paths through the graph. The database connection fields are:

Field     Description
host      database host, optionally followed by a port number (host:port)
user      username
password  password
DB        path to the database file (SQLite) or database name (PostgreSQL)
DBtype    database type (sqlite or postgres)
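The configuration file is a set of key=value lines using the fields in the table above. The minimal sketch below reads such a file into a dictionary and checks that all five fields are present and that DBtype is one of the two supported values. The function name parse_phg_config is hypothetical, and skipping lines that start with "#" is an assumption of this sketch, not documented PHG behavior.

```python
REQUIRED_FIELDS = ("host", "user", "password", "DB", "DBtype")
SUPPORTED_DB_TYPES = ("sqlite", "postgres")


def parse_phg_config(text: str) -> dict[str, str]:
    """Hypothetical helper: parse key=value lines and validate connection fields."""
    config: dict[str, str] = {}
    for raw in text.splitlines():
        line = raw.strip()
        # Skip blank lines and (assumed) '#' comment lines
        if not line or line.startswith("#"):
            continue
        # Split on the first '=' only, so values such as passwords may contain '='
        key, sep, value = line.partition("=")
        if not sep:
            raise ValueError(f"malformed line (no '='): {raw!r}")
        config[key.strip()] = value.strip()

    missing = [field for field in REQUIRED_FIELDS if field not in config]
    if missing:
        raise ValueError(f"missing required fields: {missing}")
    if config["DBtype"] not in SUPPORTED_DB_TYPES:
        raise ValueError(f"unsupported DBtype: {config['DBtype']!r}")
    return config


sqlite_cfg = parse_phg_config(
    """\
host=localHost
user=user
password=password
DB=/tempFileDir/outputDir/phgTestDB_mapq48.db
DBtype=sqlite
"""
)
print(sqlite_cfg["DBtype"])  # sqlite
```

Validating the file up front gives a clear error for a missing field or a misspelled database type before any connection is attempted.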

Example configurations for both database types are shown below:

SQLite example

host=localHost
user=user
password=password
DB=/tempFileDir/outputDir/phgTestDB_mapq48.db
DBtype=sqlite
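Because SQLite is file-based, only the DB path matters when opening the database; SQLite has no server or login, so host, user, and password are not used in this sketch. The helper below (open_sqlite_phg, a hypothetical name) takes an already-parsed configuration dictionary and opens the file with Python's standard sqlite3 module. It checks that the file exists first, because sqlite3.connect would otherwise silently create a new, empty database at a mistyped path.

```python
import sqlite3
from pathlib import Path


def open_sqlite_phg(config: dict[str, str]) -> sqlite3.Connection:
    """Hypothetical helper: open the SQLite database named by the DB field."""
    if config.get("DBtype") != "sqlite":
        raise ValueError("config is not an SQLite configuration")
    db_path = Path(config["DB"])
    # Guard against typos: connecting to a missing path would create an empty DB
    if not db_path.is_file():
        raise FileNotFoundError(f"SQLite database not found: {db_path}")
    # host, user and password are ignored: SQLite has no server-side login
    return sqlite3.connect(db_path)


# Usage with the example configuration (the file must already exist):
# conn = open_sqlite_phg({"DB": "/tempFileDir/outputDir/phgTestDB_mapq48.db",
#                         "DBtype": "sqlite"})
```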

PostgreSQL example

host=184.32.99.233:5422
user=user
password=password
DB=phgdb
DBtype=postgres
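In the PostgreSQL example, host carries both the address and the port, and DB is a database name rather than a file path. The sketch below turns such a configuration into a standard libpq-style connection URI (postgresql://user:password@host:port/dbname), which drivers built on libpq, such as psycopg2, accept as a connection string. The helper name postgres_uri is hypothetical; falling back to PostgreSQL's default port 5432 when no port is given, and handling only hostnames or IPv4 addresses (not bracketed IPv6), are assumptions of this sketch.

```python
from urllib.parse import quote


def postgres_uri(config: dict[str, str], default_port: int = 5432) -> str:
    """Hypothetical helper: build a libpq-style URI from PHG config fields."""
    if config.get("DBtype") != "postgres":
        raise ValueError("config is not a PostgreSQL configuration")
    # host may include an optional port, e.g. "184.32.99.233:5422"
    host, sep, port = config["host"].rpartition(":")
    if not sep:
        host, port = config["host"], str(default_port)
    # Percent-encode credentials and DB name so '@', ':' or '/' stay unambiguous
    user = quote(config["user"], safe="")
    password = quote(config["password"], safe="")
    dbname = quote(config["DB"], safe="")
    return f"postgresql://{user}:{password}@{host}:{port}/{dbname}"


cfg = {
    "host": "184.32.99.233:5422",
    "user": "user",
    "password": "password",
    "DB": "phgdb",
    "DBtype": "postgres",
}
print(postgres_uri(cfg))  # postgresql://user:password@184.32.99.233:5422/phgdb
```

Encoding the credentials matters in practice: a password containing "@" would otherwise be misread as the start of the host part of the URI.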