This page contains files giving the strand orientation and position of the variants on most genotyping arrays. Genome builds NCBI35 to GRCh38 are currently available. New files will be added as new genome builds or arrays are released.
If you have an array that is not listed here and would like me to create the files, please contact me, Will Rayner:
william dot rayner at helmholtz-munich dot de
and/or
will dot rayner at strand dot org dot uk
and I can create and post the file(s) to this page. If you have the array annotation file (.csv format) for which you require strand files, do share it with me and I can create the files as required.
The Chipendium tool, which identifies the array and strand orientation from the SNP list in a PLINK-format .bim file, is currently unavailable but will be returning soon.
The data for each array and genome build combination can be downloaded from the links below. Each zip file contains three files:
.strand
.miss
.multiple
.strand file
This contains all the variants whose flanking sequences match the relevant genomic sequence at >90%.
The strand file contains six columns:
SNP ID | SNP ID taken from the array manifest |
Chromosome | Chromosome as determined from remapping the SNP flanking sequences to the stated genome build |
Position | Position as determined from remapping the SNP flanking sequences to the stated genome build |
%match to genome | Percentage match of the flanking sequences to the stated genome build at the position specified |
Strand | Strand on the reference genome as determined from remapping the TOP strand sequence given in the array annotation file |
TOP Alleles | TOP alleles given in the array annotation file, useful for double-checking that input data are on the TOP strand |
The SNP IDs used are those from the Illumina annotation file and are not necessarily the latest dbSNP IDs for that position. The alleles listed are the Illumina TOP alleles. If you are in any doubt whether your data file can be used with these strand files, a check with the Chipendium software can confirm the IDs and orientation of your data.
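As a minimal sketch of reading a .strand (or .miss) file in Python, the parser below assumes whitespace-delimited columns in the order of the table above, no header line, and the TOP alleles written as a two-letter string such as AG. These are assumptions about the layout, not a published specification, and the class and function names are hypothetical.

```python
from dataclasses import dataclass
from typing import Iterable, Iterator


@dataclass(frozen=True)
class StrandRecord:
    """One row of a .strand or .miss file (six columns)."""
    snp_id: str
    chrom: str
    position: int
    pct_match: float
    strand: str       # '+' or '-' on the reference genome
    top_alleles: str  # assumed two-letter form, e.g. 'AG'


def parse_strand_lines(lines: Iterable[str]) -> Iterator[StrandRecord]:
    """Yield a StrandRecord for every non-blank line.

    Assumes whitespace-delimited columns and no header line.
    """
    for line in lines:
        fields = line.split()
        if not fields:
            continue  # ignore blank lines
        if len(fields) != 6:
            raise ValueError(f"expected 6 columns, got {len(fields)}: {line!r}")
        snp_id, chrom, pos, pct, strand, top = fields
        yield StrandRecord(snp_id, chrom, int(pos), float(pct), strand, top)


if __name__ == "__main__":
    # Made-up rows for illustration only; real files would be read with open().
    demo = ["SNP_1 1 1000 100 + AG", "SNP_2 X 2000 98.5 - CT"]
    for rec in parse_strand_lines(demo):
        print(rec.snp_id, rec.chrom, rec.position, rec.strand)
```

Building a dictionary keyed on SNP ID from these records gives a convenient lookup when comparing a strand file against your own .bim file.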
.miss file
The .miss file contains the same columns as the .strand file and lists all the SNPs that did not reach the required threshold (>90%) to be considered a match to the genome; the position and strand of the best match are given. SNPs that did not map to the genome at all are not included in this file.
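Because a SNP that appears in neither the .strand nor the .miss file either failed to map or carries an ID not found in the manifest, one quick check is to sort the IDs in your own data into three groups. The sketch below assumes the SNP ID is the first column of the .strand and .miss files and the second column of a PLINK .bim file; the function names are hypothetical.

```python
from typing import Dict, Iterable, List, Set


def first_column_ids(lines: Iterable[str]) -> Set[str]:
    """SNP IDs from the first column of .strand or .miss lines."""
    return {line.split()[0] for line in lines if line.strip()}


def bim_snp_ids(lines: Iterable[str]) -> List[str]:
    """Variant IDs from a PLINK .bim file (column 2 of 6)."""
    return [line.split()[1] for line in lines if line.strip()]


def classify_bim_snps(
    bim_ids: Iterable[str], strand_ids: Set[str], miss_ids: Set[str]
) -> Dict[str, List[str]]:
    """Group .bim SNP IDs by where they appear in the strand/miss files."""
    groups: Dict[str, List[str]] = {
        "matched": [],          # listed in .strand (>90% match)
        "below_threshold": [],  # listed in .miss (best match not above 90%)
        "not_listed": [],       # unmapped, or ID not in the array manifest
    }
    for snp in bim_ids:
        if snp in strand_ids:
            groups["matched"].append(snp)
        elif snp in miss_ids:
            groups["below_threshold"].append(snp)
        else:
            groups["not_listed"].append(snp)
    return groups
```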
.multiple file
The .multiple file contains SNPs that had more than one high-quality match (>90%) to the genome.
This file contains three columns:
SNP ID | SNP ID taken from the array manifest |
Number of matches >90% | Where there is more than one match, the total number of matches of the flanking sequences to the genome at >90% |
Number of identical matches >90% | Total number of those >90% matches that have the same (identical) match percentage to the genome |
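The .multiple file is a natural source of an exclusion list, since a SNP with several >90% matches has no single reliable position. A minimal sketch follows, assuming the three columns are whitespace-delimited with integer counts (header-like lines are skipped). The identical_only option, which drops only SNPs with more than one identical match, is a hypothetical policy for illustration, not a recommendation from this page.

```python
from typing import Dict, Iterable, List, Tuple


def parse_multiple(lines: Iterable[str]) -> Dict[str, Tuple[int, int]]:
    """Map SNP ID -> (number of >90% matches, number of identical matches)."""
    result: Dict[str, Tuple[int, int]] = {}
    for line in lines:
        fields = line.split()
        # Skip blank lines and anything that does not look like a data row
        # (e.g. a header), assuming both counts are plain integers.
        if len(fields) < 3 or not (fields[1].isdigit() and fields[2].isdigit()):
            continue
        result[fields[0]] = (int(fields[1]), int(fields[2]))
    return result


def snps_to_exclude(
    multiple: Dict[str, Tuple[int, int]], identical_only: bool = False
) -> List[str]:
    """Sorted SNP IDs to drop, e.g. as a one-ID-per-line PLINK --exclude list.

    With identical_only=True, only SNPs with more than one identical match
    are returned (a hypothetical policy).
    """
    return sorted(
        snp
        for snp, (_n_matches, n_identical) in multiple.items()
        if not identical_only or n_identical > 1
    )
```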
The default strand files assume that your genotype calling algorithm exported the allele calls aligned to the Illumina TOP strand; this is usually the case. If it is not the case for your data file (as determined, e.g., by Chipendium or by comparing the TOP alleles in a strand file with your data), the most likely alternative is that the alleles were derived from the Source strand. Strand files for the Source strand, and for the Illumina strand, can also be found here (link to strand):
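Comparing alleles can be sketched as follows: for each SNP present in both your .bim file and a strand file, check whether the observed alleles fall within the TOP alleles or within their complement. A/T and C/G SNPs are skipped because their complement is the same allele pair. This is a hypothetical heuristic, assuming TOP alleles stored as a string such as AG and PLINK's 0 as the missing-allele code.

```python
from typing import Dict, Iterable, Set, Tuple

BASES = {"A", "C", "G", "T"}
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def allele_set(alleles: Iterable[str]) -> Set[str]:
    """Keep only real bases; drops PLINK's missing code '0' and separators."""
    return {a for a in alleles if a in BASES}


def top_strand_concordance(
    bim_alleles: Dict[str, Tuple[str, str]], top_alleles: Dict[str, str]
) -> Tuple[int, int, int]:
    """Return (informative SNPs, consistent with TOP, consistent with complement).

    bim_alleles: SNP ID -> (A1, A2) from a .bim file
    top_alleles: SNP ID -> TOP alleles from the strand file, e.g. 'AG'
    """
    informative = match_top = match_comp = 0
    for snp, (a1, a2) in bim_alleles.items():
        top = top_alleles.get(snp)
        if top is None:
            continue  # SNP not in the strand file
        top_set = allele_set(top)
        comp_set = allele_set(top.translate(COMPLEMENT))
        if top_set == comp_set:
            continue  # A/T or C/G SNP: strand cannot be read from alleles
        observed = allele_set((a1, a2))
        if not observed:
            continue  # both alleles missing
        informative += 1
        if observed <= top_set:
            match_top += 1
        elif observed <= comp_set:
            match_comp += 1
    return informative, match_top, match_comp
```

A result dominated by TOP matches is consistent with the default files; a substantial share of complement matches suggests the calls are on another strand, such as Source.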
A script developed by Neil Robertson for updating the chromosome, position and strand of binary ped files using these strand and position files can be downloaded here:
update_build.sh
Usage is:
update_build.sh <bed-file-stem> <strand-file> <output-file-stem>
where:
<bed-file-stem> | is the name of your binary ped fileset minus the .bed, .bim or .fam extension |
<strand-file> | is the appropriate strand file for your chip and current strand orientation (TOP, SOURCE, ILMN) |
<output-file-stem> | is the name of the new output fileset to create, again minus the .bed, .bim or .fam extension |
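The bookkeeping such a script needs can be sketched as below: from one strand file, derive a chromosome list, a position list, a list of '-' strand SNPs to flip, and the full list of SNP IDs to keep. This is an illustrative Python sketch, not Neil Robertson's update_build.sh. The output file suffixes are hypothetical, and the list layouts (ID then new value, or one ID per line) are assumed to match what PLINK 1.9's --update-chr/--update-map, --flip and --extract options read, which is worth confirming against the PLINK documentation.

```python
from typing import Dict, Iterable, List


def build_update_lists(lines: Iterable[str]) -> Dict[str, List[str]]:
    """Derive PLINK-style update lists from .strand lines.

    chr/pos rows are "<snp_id>\t<value>"; flip/extract rows are a bare ID.
    """
    lists: Dict[str, List[str]] = {"chr": [], "pos": [], "flip": [], "extract": []}
    for line in lines:
        fields = line.split()
        if len(fields) < 5:
            continue  # blank or malformed line
        snp_id, chrom, pos, _pct_match, strand = fields[:5]
        lists["chr"].append(f"{snp_id}\t{chrom}")
        lists["pos"].append(f"{snp_id}\t{pos}")
        lists["extract"].append(snp_id)  # keep only SNPs in the strand file
        if strand == "-":
            lists["flip"].append(snp_id)  # flip these to the reference forward strand
    return lists


def write_update_lists(lists: Dict[str, List[str]], prefix: str) -> None:
    """Write each list to '<prefix>.<name>' (hypothetical naming)."""
    for name, rows in lists.items():
        with open(f"{prefix}.{name}", "w") as handle:
            handle.writelines(row + "\n" for row in rows)
```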
If your genotype calls are coded as A/B, the files on this tab below let you update the alleles from A/B to TOP-strand A, C, G, T calls suitable for use with the strand files on this page. If the chip you are using is not listed, contact me and I can create these files.
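Conceptually, the update is a per-SNP lookup from the A and B codes to two bases. The sketch below assumes a mapping of SNP ID to (base for allele A, base for allele B) has already been read from one of those files; their layout is not described here, so the mapping structure is hypothetical. It translates a single genotype and also emits rows in the five-column layout of PLINK's --update-alleles option (variant ID, two old alleles, two new alleles).

```python
from typing import Dict, Iterable, List, Tuple

# Hypothetical structure: SNP ID -> (TOP base for allele A, TOP base for allele B)
AlleleMap = Dict[str, Tuple[str, str]]


def ab_to_top(snp_id: str, genotype: Iterable[str], mapping: AlleleMap,
              missing: str = "0") -> Tuple[str, ...]:
    """Translate an A/B genotype, e.g. ('A', 'B'), to TOP-strand bases."""
    allele_a, allele_b = mapping[snp_id]
    lookup = {"A": allele_a, "B": allele_b}
    # Anything other than 'A' or 'B' (e.g. '-' or '0') becomes the missing code.
    return tuple(lookup.get(call, missing) for call in genotype)


def update_alleles_rows(mapping: AlleleMap) -> List[str]:
    """Tab-separated rows: ID, old A1, old A2, new A1, new A2.

    Assumes the dataset currently codes the two alleles literally as A and B.
    """
    return [f"{snp}\tA\tB\t{a}\t{b}" for snp, (a, b) in mapping.items()]
```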
Files listing the Reference and Alternate allele mappings for the Illumina arrays can be found on this tab below.
These are in plink format for use with the --A1-allele and --keep-reference-allele plink options.
These files assume the data are aligned to the TOP strand.
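Before applying such a file, it can help to see how its alleles line up with your current .bim coding. The sketch below assumes the file has been read into a mapping of SNP ID to the allele you want as A1 (a hypothetical structure). SNPs whose target allele matches neither .bim allele hint that the data are not on the TOP strand, although a monomorphic SNP can produce the same result.

```python
from typing import Dict, List, Tuple


def a1_alignment_report(
    bim_alleles: Dict[str, Tuple[str, str]], target_a1: Dict[str, str]
) -> Dict[str, List[str]]:
    """Classify SNPs by where the desired A1 allele currently sits.

    bim_alleles: SNP ID -> (A1, A2) from the .bim file
    target_a1:   SNP ID -> allele to become A1 (hypothetical input)
    """
    report: Dict[str, List[str]] = {"already_a1": [], "needs_swap": [], "absent": []}
    for snp, allele in target_a1.items():
        if snp not in bim_alleles:
            continue  # not genotyped in this dataset
        a1, a2 = bim_alleles[snp]
        if allele == a1:
            report["already_a1"].append(snp)
        elif allele == a2:
            report["needs_swap"].append(snp)
        else:
            report["absent"].append(snp)  # strand mismatch, or monomorphic SNP
    return report
```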
Mouse over the array of interest to view/download the data for the different genome builds.