Update Data

Most data in Pantograph can be extended or updated by admin users:

  • Upload metadata
  • Upload variation tracks
  • Upload expression data

Upload Metadata

Admin users can upload metadata.

New or updated metadata should be stored in comma- or tab-separated text files (CSV/TSV) and can then be uploaded via the 'Metadata management' tab on Pantograph's Pipeline page.

Requirements

The metadata file must have a header line naming the metadata categories that will be displayed in Pantograph, and the first column needs to be named genome_name. Each subsequent line lists the metadata values for one track, in the same order as the categories in the header; see the example metadata file below.

INFO

Both categorical and numerical metadata values are supported and will be automatically identified.

IMPORTANT

When a new file is uploaded, its metadata is added to the existing metadata in Pantograph. Each upload is assigned a unique group label, which makes the metadata easier to organize and allows deleting only the metadata associated with a specific group/file. For the same reason, uploading a metadata category that already exists from a previously uploaded file creates a duplicate of that category.

Example Metadata CSV File

csv
genome_name,cultivation,yield_random  
pathA,cultivated,0.146957  
pathB,landrace,0.326076  
vcfTrack1,cultivated,0.461396  
vcfTrack2,,0.473928

INFO

Missing data is represented by an empty entry in the respective metadata column (e.g., cultivation for vcfTrack2 in the example above).
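
The requirements above can be checked locally before uploading. The following is a minimal sketch, not part of Pantograph: the function names `check_metadata` and `_is_number` are hypothetical, and the numerical/categorical guess is only an assumption about how automatic identification might behave (Pantograph's own detection may differ).

```python
import csv
import io


def _is_number(value):
    """Return True if the string parses as a float."""
    try:
        float(value)
        return True
    except ValueError:
        return False


def check_metadata(text, delimiter=","):
    """Validate a metadata table and guess the type of each category.

    Hypothetical helper: checks the header and field counts described
    in the requirements, treating empty entries as missing data.
    """
    rows = [row for row in csv.reader(io.StringIO(text), delimiter=delimiter) if row]
    rows = [[cell.strip() for cell in row] for row in rows]
    if not rows:
        raise ValueError("file is empty")
    header, records = rows[0], rows[1:]
    if header[0] != "genome_name":
        raise ValueError("first column must be named genome_name")
    for line_no, row in enumerate(records, start=2):
        if len(row) != len(header):
            raise ValueError(
                f"line {line_no}: expected {len(header)} fields, got {len(row)}"
            )
    kinds = {}
    for i, category in enumerate(header[1:], start=1):
        # Empty entries are missing values and are ignored for the type guess.
        values = [row[i] for row in records if row[i] != ""]
        is_numeric = bool(values) and all(_is_number(v) for v in values)
        kinds[category] = "numerical" if is_numeric else "categorical"
    return kinds


EXAMPLE = """genome_name,cultivation,yield_random
pathA,cultivated,0.146957
pathB,landrace,0.326076
vcfTrack1,cultivated,0.461396
vcfTrack2,,0.473928
"""

print(check_metadata(EXAMPLE))
# {'cultivation': 'categorical', 'yield_random': 'numerical'}
```

For a tab-separated file, pass `delimiter="\t"`.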

Once the upload has finished, reload the Pantograph website to make the new data available.

Upload Variation Tracks

Admin users can upload new variation tracks.

To upload new variation tracks, ensure the following requirements are met:

Samplesheet Format:

The samplesheet must be in CSV format (comma-separated).
The header line must follow this format: "group,reference,vcf".

The required samplesheet columns are:

  • Column 1: Group name for the VCF files (user-specified; this name will show up in the track menu; e.g., "vcf_group_1")
  • Column 2: Name of the reference genome (as specified in the graph; without sequence suffix, e.g. indicate 'genomeA' and not 'genomeA_Chr01')
  • Column 3: Absolute path of the uncompressed (.vcf) or bgzip-compressed (.vcf.gz) VCF file on the bucket
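
The samplesheet rules above can be sketched as a small pre-upload check. This is an illustrative example only: `check_samplesheet` is a hypothetical helper, the group, genome, and path values in `SHEET` are made up, and treating "absolute" as "starts with /" is an assumption that may need adapting to how paths are addressed on your bucket. It cannot verify that the reference name matches a genome in the graph.

```python
import csv
import io
from pathlib import PurePosixPath

EXPECTED_HEADER = ["group", "reference", "vcf"]


def check_samplesheet(text):
    """Return a list of problems found in a samplesheet (empty if none)."""
    rows = [row for row in csv.reader(io.StringIO(text)) if row]
    if not rows or [cell.strip() for cell in rows[0]] != EXPECTED_HEADER:
        raise ValueError('header line must be "group,reference,vcf"')
    problems = []
    for line_no, row in enumerate(rows[1:], start=2):
        if len(row) != 3:
            problems.append(f"line {line_no}: expected 3 columns, got {len(row)}")
            continue
        group, reference, vcf = (cell.strip() for cell in row)
        if not group or not reference:
            problems.append(f"line {line_no}: group and reference must not be empty")
        # Assumption: an absolute path on the bucket starts with "/".
        if not PurePosixPath(vcf).is_absolute():
            problems.append(f"line {line_no}: VCF path is not absolute: {vcf}")
        if not vcf.endswith((".vcf", ".vcf.gz")):
            problems.append(f"line {line_no}: expected a .vcf or .vcf.gz file: {vcf}")
    return problems


# Hypothetical samplesheet; the second data line uses a relative path on purpose.
SHEET = """group,reference,vcf
vcf_group_1,genomeA,/vcf_upload_1/samples.vcf.gz
vcf_group_1,genomeA,relative/more_samples.vcf
"""

print(check_samplesheet(SHEET))
# ['line 3: VCF path is not absolute: relative/more_samples.vcf']
```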

VCF Files

Place all VCF files listed in the samplesheet into a unique folder on the object storage.
Ensure the file paths specified in Column 3 of the samplesheet point to the respective files in that folder.

Notes:

  • Variant track names are derived from the SAMPLE columns in the header line of the VCF file.
  • Only globally unique variant track names are supported across all variant groups (they may match graph track names, if desired).
  • Uploading data from samples with names that already exist in the indicated variant group will overwrite the corresponding existing data.
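
Because track names come from the sample columns of each VCF header, it can help to list them before uploading and to look for names shared between groups. The sketch below is an assumption-laden illustration: `vcf_sample_names` and `duplicated_samples` are hypothetical helpers, they read local copies of the files, and they only compare the files you pass in, not tracks that already exist in Pantograph.

```python
import gzip
from collections import defaultdict

# A VCF header line with genotypes has 9 fixed columns before the samples:
# #CHROM POS ID REF ALT QUAL FILTER INFO FORMAT
FIXED_COLUMNS = 9


def vcf_sample_names(path):
    """Return the sample names from the #CHROM header line of a VCF file.

    bgzip-compressed files are gzip-compatible, so gzip.open can read them.
    """
    opener = gzip.open if str(path).endswith(".gz") else open
    with opener(path, "rt") as handle:
        for line in handle:
            if line.startswith("#CHROM"):
                return line.rstrip("\r\n").split("\t")[FIXED_COLUMNS:]
            if not line.startswith("#"):
                break  # reached the records without seeing a header line
    raise ValueError(f"{path}: no #CHROM header line found")


def duplicated_samples(group_by_path):
    """Map each sample name found in more than one group to those groups.

    group_by_path: dict of local VCF path -> variant group name.
    """
    groups_by_sample = defaultdict(set)
    for path, group in group_by_path.items():
        for name in vcf_sample_names(path):
            groups_by_sample[name].add(group)
    return {
        name: sorted(groups)
        for name, groups in groups_by_sample.items()
        if len(groups) > 1
    }
```

An empty result from `duplicated_samples` means no sample name is shared between the listed groups. A name repeated within a single group is not reported here, since uploading it overwrites the existing data for that sample.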

Once the upload has finished, reload the Pantograph website; the new data then appears as new variant track groups in the Tracks Menu.