Visualizing nucmer output

Programming languages, 2024-10-15 18:28:44

Gitee: https://gitee.com/liaochenlanruo/mummer2circos
GitHub: https://github.com/metagenlab/mummer2circos

Source: https://taylorreiter.github.io/2019-05-11-Visualizing-NUCmer-Output/

Alignment and visualization in R

Installing mummer

conda create -n mummer 
conda activate mummer
conda install -c bioconda mummer4=4.0.0beta2
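Before moving on, it can help to confirm that the environment actually exposes the tools used in the rest of this walkthrough. The following is a minimal sketch; missing_tools is a hypothetical helper name, not part of MUMmer.

```shell
# Print the name of every requested tool that is not on PATH.
# missing_tools is a hypothetical helper, not part of MUMmer.
missing_tools() {
    for tool in "$@"; do
        command -v "$tool" >/dev/null 2>&1 || printf '%s\n' "$tool"
    done
}

# The three MUMmer programs used below.
missing=$(missing_tools nucmer delta-filter show-coords)
if [ -n "$missing" ]; then
    echo "Not found on PATH:" $missing
else
    nucmer --version
fi
```

If anything is reported as missing, re-check that the mummer environment is activated.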

Running nucmer

To download the test data, run:

# M. harundinacea 6AC
wget -O mh6ac.fasta.gz ftp://ftp.ncbi.nlm.nih.gov/genomes/all/GCA/000/235/565/GCA_000235565.1_ASM23556v1/GCA_000235565.1_ASM23556v1_genomic.fna.gz
gunzip mh6ac.fasta.gz

# M. harundinacea MAG07
wget -O mag07.fasta https://osf.io/d9qyg/download
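A quick sanity check after downloading is to count the sequence records in each file, since an interrupted download or a failed gunzip often shows up as an empty or missing FASTA. This is a sketch only; count_seqs is a hypothetical helper name.

```shell
# Count sequence records (lines starting with ">") in a FASTA file.
# count_seqs is a hypothetical helper name.
count_seqs() {
    # grep -c exits non-zero when the count is 0, so ignore its status.
    grep -c '^>' "$1" || true
}

for f in mh6ac.fasta mag07.fasta; do
    if [ -s "$f" ]; then
        printf '%s\t%s sequences\n' "$f" "$(count_seqs "$f")"
    else
        echo "$f is missing or empty" >&2
    fi
done
```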

The general structure of the nucmer command looks like this:

nucmer --mum reference.fasta query.fasta -p query_ref_nucmer

We will use the GenBank assembly as the reference and the metagenome-assembled genome (MAG) bin as the query.

nucmer --mum mh6ac.fasta mag07.fasta -p m_harundinacea

Here, we filter the nucmer output to keep only alignments at least 1000 bp long. This threshold is arbitrary; choose a length that makes sense for your biological question.

delta-filter -l 1000 -q m_harundinacea.delta > m_harundinacea_filter.delta
show-coords -c -l -L 1000 -r -T m_harundinacea_filter.delta > m_harundinacea_filter_coords.txt
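To get a feel for the filtered alignments before plotting, the coords table can be summarised with awk. The sketch below assumes the tab-separated layout that show-coords writes with -c -l -T: S1, E1, S2, E2, LEN 1, LEN 2, % IDY, LEN R, LEN Q, COV R, COV Q, then the reference and query tags. Header lines are skipped by requiring a numeric first field. summarize_coords is a hypothetical helper name, and overlapping alignments are counted more than once.

```shell
# Summarise a show-coords table written with -c -l -r -T.
# Assumed columns (tab-separated):
# S1 E1 S2 E2 LEN1 LEN2 %IDY LENR LENQ COVR COVQ REF_TAG QUERY_TAG
summarize_coords() {
    awk -F'\t' '
        $1 ~ /^[0-9]+$/ && NF >= 13 {
            n++
            ref_bp += $5          # aligned length on the reference
            idy_bp += $7 * $5     # identity weighted by aligned length
        }
        END {
            if (n == 0) { print "no alignments"; exit }
            printf "alignments=%d ref_aligned_bp=%d mean_identity=%.2f\n", n, ref_bp, idy_bp / ref_bp
        }
    ' "$1"
}

if [ -f m_harundinacea_filter_coords.txt ]; then
    summarize_coords m_harundinacea_filter_coords.txt
fi
```

The length-weighted mean keeps a few short, divergent alignments from dominating the identity figure.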

Simple plot

-r reference fasta
-q other fasta file(s) to compare with the reference fasta
-l mandatory option to build circular plots
genome tracks are ordered based on the order of the input query fasta files
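Because the track order follows the order of the query files, and a glob such as genomes/*fna is expanded by the shell in sorted order, it can be useful to preview that order before plotting. The helper below is a hypothetical sketch and only prints its arguments; to choose a different order, list the files explicitly after -q instead of using a glob.

```shell
# Print the query files one per line, numbered in the order the shell passes them.
# list_track_order is a hypothetical helper name.
list_track_order() {
    n=0
    for f in "$@"; do
        n=$((n + 1))
        printf '%d\t%s\n' "$n" "$f"
    done
}

list_track_order genomes/*fna
```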

mummer2circos -l -r genomes/NZ_CP008827.fna -q genomes/*fna

nucmer2circos_simple.png

Condensed tracks

mummer2circos -l -c -r genomes/NZ_CP008827.fna -q genomes/*fna

nucmer2circos_condensed.png

With gene tracks

The FASTA header of each reference sequence (the chromosome and any plasmids) must match the LOCUS accession in the GenBank file. See the example file NZ_CP008828.fna.

LOCUS NZ_CP008828 15096 bp DNA CON 16-AUG-2015
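A mismatch between FASTA headers and LOCUS accessions is easy to miss, so a small check can be run first. This sketch assumes that only the first whitespace-delimited token of each FASTA header needs to match the second field of a LOCUS line; the function names are hypothetical.

```shell
# First token of every FASTA header, without the leading ">".
fasta_ids() { grep '^>' "$1" | sed 's/^>//' | awk '{print $1}' | sort -u; }

# Accession (second field) of every LOCUS line in a GenBank file.
locus_ids() { awk '$1 == "LOCUS" {print $2}' "$1" | sort -u; }

# Print FASTA ids that have no matching LOCUS accession.
unmatched_ids() {
    loci_tmp=$(mktemp)
    locus_ids "$2" > "$loci_tmp"
    fasta_ids "$1" | grep -vxF -f "$loci_tmp" || true
    rm -f "$loci_tmp"
}

if [ -f genomes/NZ_CP008827.fna ] && [ -f GCF_000281535_merged.gbk ]; then
    unmatched_ids genomes/NZ_CP008827.fna GCF_000281535_merged.gbk
fi
```

No output means every FASTA id was found among the LOCUS accessions.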

mummer2circos -l -r genomes/NZ_CP008827.fna -q genomes/*.fna -gb GCF_000281535_merged.gbk

nucmer2circos_gene_tracks.png

Label specific genes

Given a FASTA file of proteins of interest, label the BBH of each amino acid sequence on the circular plot.
The FASTA headers are used as labels (see example file VF.faa).

mummer2circos -l -r genomes/NZ_CP008827.fna -q genomes/*.fna -gb GCF_000281535_merged.gbk -b VF.faa

nucmer2circos_labels.png

Show mapping depth along the chromosome (and plasmids)

Depth files can be generated from a BAM file using samtools depth.
The labels used in the .depth file should be the same as the FASTA headers (see example files).
Regions with depth higher than 2 times the median are cropped to that limit and coloured in green (this handles highly repeated sequences).
Regions with depth lower than half of the median depth are coloured in red.
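To see where those two cutoffs would fall for a given depth file, the median can be computed from the third column of samtools depth output (sequence name, position, depth). This is a rough sketch: it takes one median across all records in the file, which may not be how mummer2circos computes it internally, and mapping.bam is a hypothetical file name.

```shell
# Report the median depth and the 2x / 0.5x cutoffs for a samtools depth file.
# Assumption: one global median over all records in the file.
depth_thresholds() {
    cut -f3 "$1" | sort -n | awk '
        { d[NR] = $1 }
        END {
            if (NR == 0) exit
            median = (NR % 2) ? d[(NR + 1) / 2] : (d[NR / 2] + d[NR / 2 + 1]) / 2
            printf "median=%g high_cutoff=%g low_cutoff=%g\n", median, 2 * median, median / 2
        }'
}

# A depth file could be produced with something like (hypothetical BAM name):
#   samtools depth -a mapping.bam > GCF_000281535.depth
if [ -f GCF_000281535.depth ]; then
    depth_thresholds GCF_000281535.depth
fi
```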

mummer2circos -l -r genomes/NZ_CP008827.fna -q genomes/*.fna -gb GCF_000281535_merged.gbk -b VF.faa -s GCF_000281535.depth

nucmer2circos_depth.png

Add labels based on coordinate file

Format: LOCUS start stop label (see labels.txt)
Labels cannot contain spaces
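A label containing a space turns a four-field line into five or more fields, so a quick format check on the file can catch the problem early. The sketch below assumes whitespace-separated fields and integer start and stop coordinates; check_labels is a hypothetical helper name.

```shell
# Report lines that are not "LOCUS start stop label" with integer coordinates.
# Exits with status 1 if any bad line is found.
check_labels() {
    awk '
        NF == 0 { next }                      # ignore blank lines
        NF != 4 || $2 !~ /^[0-9]+$/ || $3 !~ /^[0-9]+$/ {
            printf "line %d: %s\n", NR, $0
            bad++
        }
        END { exit (bad > 0) }
    ' "$1"
}

if [ -f labels.txt ]; then
    check_labels labels.txt || echo "labels.txt has malformed lines" >&2
fi
```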

mummer2circos -l -r genomes/NZ_CP008827.fna -q genomes/NZ_FO834906.fna -gb GCF_000281535_merged.gbk -b VF.faa -s GCF_000281535.depth -lf labels.txt

nucmer2circos_labels_coord.png

Show links between two genomes

mummer2circos -r genomes/NZ_CP012745.fna -q genomes/*.fna -gb GCF_000281535_merged.gbk -b VF.faa -s GCF_000281535.depth -lf labels.txt

nucmer2circos_links.png
