
Update Data

Most data in Pantograph can be extended or updated by admin users:

Upload Metadata

Admin users can upload metadata.

New or updated metadata should be stored in comma- or tab-separated text files (CSV/TSV) and can then be uploaded via the 'Metadata management' tab on Pantograph's Pipeline page.
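Pantograph accepts either delimiter. If you want to check locally which one a file uses before uploading, the minimal sketch below relies on Python's standard `csv.Sniffer`. The helper name `detect_delimiter` is hypothetical, and Pantograph's own format detection is not documented here.

```python
import csv


def detect_delimiter(sample: str) -> str:
    """Guess whether metadata text is comma- or tab-separated.

    The sniffer is restricted to the two delimiters Pantograph accepts,
    so no other character can be reported as the delimiter.
    """
    dialect = csv.Sniffer().sniff(sample, delimiters=",\t")
    return dialect.delimiter


if __name__ == "__main__":
    csv_sample = "genome_name,cultivation\npathA,cultivated\n"
    tsv_sample = "genome_name\tcultivation\npathA\tcultivated\n"
    print(repr(detect_delimiter(csv_sample)))  # ','
    print(repr(detect_delimiter(tsv_sample)))  # '\t'
```

For a real file, passing the first few kilobytes (for example `handle.read(4096)`) is usually enough for the sniffer.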

Requirements

The metadata file must have a header line listing the metadata categories that will be displayed in Pantograph, and the first column must be named genome_name. Each following line holds the metadata for one track, with its values in the same order as the categories in the header; see the example metadata file below.
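These two requirements can be checked locally before uploading. The sketch below assumes you have the file contents as a string and know its delimiter; `validate_metadata` is a hypothetical helper and does not reproduce Pantograph's own validation.

```python
import csv
import io


def validate_metadata(text: str, delimiter: str = ",") -> list[str]:
    """Return problems found in metadata text; an empty list means none.

    Checks only the documented requirements: a header line whose first
    column is genome_name, and one value per header column on every row.
    """
    rows = list(csv.reader(io.StringIO(text), delimiter=delimiter))
    if not rows:
        return ["file is empty"]
    header = rows[0]
    problems = []
    first = header[0] if header else ""
    if first != "genome_name":
        problems.append(f"first column is {first!r}, expected 'genome_name'")
    for line_no, row in enumerate(rows[1:], start=2):
        if not row:
            continue  # ignore blank lines
        if len(row) != len(header):
            problems.append(
                f"line {line_no}: {len(row)} values for {len(header)} columns"
            )
    return problems


if __name__ == "__main__":
    sample = "genome_name,cultivation,yield_random\npathA,cultivated\n"
    for problem in validate_metadata(sample):
        print(problem)  # line 2: 2 values for 3 columns
```

Note that an empty value between two delimiters still counts as a value, so rows with missing data pass this check.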

INFO

Both categorical and numerical metadata values are supported and will be automatically identified.
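The exact rule Pantograph uses to tell the two apart is not described here. As an illustration only, the sketch below assumes a common heuristic: a column counts as numerical if every non-empty value parses as a number, and as categorical otherwise. `classify_column` is a hypothetical name.

```python
def _is_number(value: str) -> bool:
    """True if the value parses as a floating-point number."""
    try:
        float(value)
    except ValueError:
        return False
    return True


def classify_column(values: list[str]) -> str:
    """Label one metadata column as 'numerical' or 'categorical'.

    Assumed rule: empty entries are missing data and are ignored; the column
    is numerical only if every remaining value parses as a number.
    """
    present = [v.strip() for v in values if v.strip()]
    if present and all(_is_number(v) for v in present):
        return "numerical"
    return "categorical"


if __name__ == "__main__":
    print(classify_column(["0.146957", "0.326076", "0.461396", "0.473928"]))  # numerical
    print(classify_column(["cultivated", "landrace", "cultivated", ""]))  # categorical
```

Under this assumption, a single non-numeric entry (for example a typo such as `0.1x`) would turn a whole column categorical, so it is worth checking numeric columns before upload.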

IMPORTANT

When a new file is uploaded, its metadata is added to the existing metadata in Pantograph. Each upload is assigned a unique group label, which makes the metadata easier to organize and lets you delete all metadata belonging to one specific group/file only. This is also why uploading a metadata category that already exists from a previously uploaded file duplicates that category instead of replacing it.
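To avoid accidental duplicates, you can compare a new file's header against the categories already present before uploading. In this sketch, `existing` is a hypothetical input that you would fill in yourself (for example from the categories visible in Pantograph), and `duplicated_categories` and the `plant_height` column are illustrative names only.

```python
def duplicated_categories(new_header: list[str], existing: set[str]) -> list[str]:
    """Return categories in a new file's header that Pantograph already has.

    The first column (genome_name) identifies the track and is skipped.
    `existing` is assumed to hold the category names already uploaded.
    """
    return [category for category in new_header[1:] if category in existing]


if __name__ == "__main__":
    header = ["genome_name", "cultivation", "plant_height"]
    print(duplicated_categories(header, {"cultivation", "yield_random"}))  # ['cultivation']
```

Flagged columns can be dropped from the new file, or, if the new values should replace the old ones, the earlier group can be deleted before re-uploading.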

Example Metadata CSV File

```csv
genome_name,cultivation,yield_random
pathA,cultivated,0.146957
pathB,landrace,0.326076
vcfTrack1,cultivated,0.461396
vcfTrack2,,0.473928
```

INFO

Missing data is represented by an empty entry in the respective metadata column (e.g., cultivation for vcfTrack2 in the example above).
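To review which values will be treated as missing before uploading, the sketch below lists every empty entry in a metadata file. It uses only the standard `csv.DictReader`; `missing_entries` is a hypothetical helper, and the sample text is the example file from this page.

```python
import csv
import io


def missing_entries(text: str, delimiter: str = ",") -> list[tuple[str, str]]:
    """List (genome_name, category) pairs whose value is empty."""
    reader = csv.DictReader(io.StringIO(text), delimiter=delimiter)
    missing = []
    for row in reader:
        for category, value in row.items():
            if category in (None, "genome_name"):
                continue  # skip the key column and any surplus values
            if value is None or not value.strip():
                missing.append((row["genome_name"], category))
    return missing


EXAMPLE = """genome_name,cultivation,yield_random
pathA,cultivated,0.146957
pathB,landrace,0.326076
vcfTrack1,cultivated,0.461396
vcfTrack2,,0.473928
"""

if __name__ == "__main__":
    print(missing_entries(EXAMPLE))  # [('vcfTrack2', 'cultivation')]
```

Rows that are shorter than the header are also reported here, because `DictReader` fills their absent fields with `None`.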

After the upload, reload the Pantograph website to make the new metadata available.

Upload Variation Tracks

Admin users can upload new variation tracks using input data in VCF format (Variant Call Format).

Variant tracks are organized into groups, which can be selected together within Pantograph. Typically, the tracks belonging to a group are stored in a single multi-sample VCF file. Each VCF file should be assigned a unique group name.
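If you prepare several VCF files at once, a quick check that no group name is reused for a different file can catch mistakes early. This sketch applies the one-group-per-file recommendation strictly; `check_groups` and the bucket paths are hypothetical.

```python
def check_groups(assignments: list[tuple[str, str]]) -> list[str]:
    """Report group names that are assigned to more than one VCF file.

    `assignments` holds (group, vcf_path) pairs, for example the rows you
    plan to put into a samplesheet.
    """
    first_file: dict[str, str] = {}
    problems = []
    for group, path in assignments:
        if group in first_file and first_file[group] != path:
            problems.append(
                f"group {group!r} used for both {first_file[group]} and {path}"
            )
        first_file.setdefault(group, path)
    return problems


if __name__ == "__main__":
    pairs = [
        ("vcf_group_1", "/my-bucket/a.vcf.gz"),  # hypothetical paths
        ("vcf_group_1", "/my-bucket/b.vcf.gz"),
    ]
    for problem in check_groups(pairs):
        print(problem)
```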

Uploading VCF Files

Your VCF files must be uploaded to an object storage service (e.g., S3 or MinIO) to avoid large browser uploads. If Pantograph runs on Computomics' premises, you need a 'Keycloak' account to securely access the data storage platform.

Step-by-Step Instructions

  1. Get a Keycloak account (for Pantograph installations on Computomics' premises)
  • Request a 'Keycloak' account from your Computomics representative or from support.
  2. Upload the VCF Files
  • Please follow these instructions on how to upload your VCF files to the data storage system at Computomics with your Keycloak account.
  • Copy the absolute path of each VCF file you placed into the bucket.
  3. Create a Samplesheet
  • Format the samplesheet as a CSV file (comma-separated).
  • The header line must be exactly: group,reference,vcf
  • For each VCF file, add a new line with:
    • Group: A user-defined name for the VCF track group (e.g., vcf_group_1). This name will appear in the track selection menu.
    • Reference: The reference genome name as used in the graph, without any chromosome or sequence suffix (e.g., use genomeA instead of genomeA_Chr01).
    • VCF Path: The absolute path to the uncompressed .vcf or bgzip-compressed .vcf.gz file in the bucket.
  4. Start the Upload Pipeline
  • Open the Upload VCF Pipeline tab on the Pipeline page in Pantograph.
  • Enter a pipeline name and provide the samplesheet created in step 3.
  5. Configure Output Options
  6. Start the Pipeline
  • Click "Start Pipeline" to begin processing.
  • The software will convert and integrate the VCF files into the required format. This process may take several hours.
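The samplesheet from the steps above can also be generated with a short script, which helps when there are many VCF files. This is a sketch under stated assumptions: `build_samplesheet` is a hypothetical helper, the bucket path is made up, and the chromosome-suffix pattern (`_Chr01` and similar) is a guess that you should adapt to your own sequence naming.

```python
import csv
import io
import re

# Assumed suffix pattern (e.g. "_Chr01"); adapt it to your own sequence naming.
CHROM_SUFFIX = re.compile(r"_chr\w*$", re.IGNORECASE)


def build_samplesheet(rows: list[tuple[str, str, str]]) -> str:
    """Render (group, reference, vcf_path) tuples as samplesheet CSV text."""
    out = io.StringIO()
    writer = csv.writer(out, lineterminator="\n")
    writer.writerow(["group", "reference", "vcf"])  # required header
    for group, reference, vcf_path in rows:
        if CHROM_SUFFIX.search(reference):
            raise ValueError(f"{reference!r}: drop the chromosome/sequence suffix")
        if not vcf_path.endswith((".vcf", ".vcf.gz")):
            raise ValueError(f"{vcf_path!r}: expected a .vcf or .vcf.gz file")
        writer.writerow([group, reference, vcf_path])
    return out.getvalue()


if __name__ == "__main__":
    # Hypothetical bucket path; use the absolute paths you copied after uploading.
    sheet = build_samplesheet([("vcf_group_1", "genomeA", "/my-bucket/cohort.vcf.gz")])
    print(sheet, end="")
```

Writing the result to a `.csv` file gives a samplesheet that can be provided in the Upload VCF Pipeline tab.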

Notes

  • Variant track names are extracted from the SAMPLE columns in the VCF file header.
  • Track names must be globally unique across all variant groups (they may match graph track names, if desired).
  • Warning: Uploading a sample with a name that already exists in the specified variant group will overwrite the existing data.
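Because track names come from the sample columns, you can read them from each VCF header and check for clashes across groups before starting the pipeline. Per the VCF specification, the `#CHROM` header line has eight fixed columns followed by `FORMAT` and then one column per sample. The helper names below are hypothetical; `gzip.open` can read bgzip-compressed files because bgzip output is gzip-compatible.

```python
import gzip
from typing import Iterable


def sample_names(header_lines: Iterable[str]) -> list[str]:
    """Return the sample (track) names from the #CHROM line of a VCF header."""
    for line in header_lines:
        if line.startswith("#CHROM"):
            # Columns 0-7 are fixed fields, column 8 is FORMAT, samples follow.
            return line.rstrip("\r\n").split("\t")[9:]
    raise ValueError("no #CHROM header line found")


def read_sample_names(path: str) -> list[str]:
    """Open a .vcf or .vcf.gz file and return its sample names."""
    opener = gzip.open if path.endswith(".gz") else open
    with opener(path, "rt") as handle:
        return sample_names(handle)


def duplicate_tracks(groups: dict[str, list[str]]) -> dict[str, list[str]]:
    """Map each track name that appears in more than one group to those groups."""
    seen: dict[str, list[str]] = {}
    for group, names in groups.items():
        for name in names:
            seen.setdefault(name, []).append(group)
    return {name: gs for name, gs in seen.items() if len(gs) > 1}


if __name__ == "__main__":
    header = [
        "##fileformat=VCFv4.2\n",
        "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\tFORMAT\tvcfTrack1\tvcfTrack2\n",
    ]
    print(sample_names(header))  # ['vcfTrack1', 'vcfTrack2']
```

This only covers global uniqueness across the groups you pass in; it cannot see tracks already stored in Pantograph, so re-uploading into an existing group still needs a manual check against the overwrite warning above.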

After the upload completes, refresh the Pantograph page; the new variant track groups will then be available in the Tracks Menu.