This page contains files giving the strand orientation and position of the variants on most genotyping arrays. Genome builds from NCBI35 to GRCh38 are currently available, and new files will be added as new genome builds or arrays are released.

If you have an array that is not listed here and would like me to create the files, please contact me, Will Rayner, at
william dot rayner at helmholtz-munich dot de
and/or
will dot rayner at strand dot org dot uk
and I can create and post the file(s) to this page. If you have the array annotation file (.csv format) for the array you need strand files for, please share it with me and I can create the files as required.

The Chipendium tool, which identifies the array and strand orientation from the SNP list in a plink format bim file, is currently unavailable but will be returning soon.

The data for each array and genome build combination can be downloaded from the links below. Each zip file contains three files:
.strand
.miss
.multiple

.strand file This contains all the variants where the match to the relevant genomic sequence is >90%. The strand file contains six columns:

SNP ID SNP ID taken from the array manifest

Chromosome Chromosome as determined from remapping the SNP flanking sequences to the stated genome build

Position Position as determined from remapping the SNP flanking sequences to the stated genome build

%match to genome Percentage match of the flanking sequences to the stated genome build at the position specified

Strand Strand on the reference genome as determined from remapping the TOP strand sequence given in the array annotation file

TOP Alleles TOP alleles given in the array annotation file, useful for double checking that input data are on the TOP strand
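As an illustration of the six-column layout described above, the following Python sketch reads .strand lines into records. It assumes the file is tab-delimited with no header row and that the strand is written as + or -; neither detail is stated on this page, so check a real file first. The names StrandRecord and parse_strand_lines are made up for this example.

```python
from typing import Iterable, List, NamedTuple


class StrandRecord(NamedTuple):
    snp_id: str
    chromosome: str
    position: int
    pct_match: float
    strand: str
    top_alleles: str


def parse_strand_lines(lines: Iterable[str]) -> List[StrandRecord]:
    """Parse lines of a .strand file (assumed tab-delimited, no header)."""
    records = []
    for line in lines:
        line = line.rstrip("\n")
        if not line.strip():
            continue  # skip blank lines
        fields = line.split("\t")
        if len(fields) != 6:
            raise ValueError(f"expected 6 columns, got {len(fields)}: {line!r}")
        snp_id, chrom, pos, pct, strand, alleles = fields
        records.append(
            StrandRecord(snp_id, chrom, int(pos), float(pct), strand, alleles)
        )
    return records


# Made-up rows for illustration only, not taken from a real strand file.
example = [
    "rs0000001\t1\t12345\t100\t+\tAG\n",
    "rs0000002\tX\t67890\t98.5\t-\tAC\n",
]
records = parse_strand_lines(example)
```

The chromosome is kept as a string so that values such as X or MT survive unchanged.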

The SNP IDs used are those from the Illumina annotation file and are not necessarily the latest ones for that position in dbSNP. The alleles listed are the Illumina TOP alleles. If you are in any doubt whether your data file can be used with these strand files, a check using the Chipendium software can confirm the ID and orientation of your data.
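The double check mentioned above, comparing the alleles in your data with the TOP alleles column, can be sketched roughly as follows. This is only an illustration: it assumes the TOP alleles are written as a two-letter string such as AG, and the function name compare_to_top and its return labels are invented here. A/T and C/G SNPs cannot be resolved by allele comparison alone, so they are reported as ambiguous.

```python
COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A"}


def compare_to_top(data_alleles, top_alleles):
    """Classify one SNP's observed alleles against its TOP alleles.

    data_alleles: iterable of allele codes from your data, e.g. ("A", "G")
    top_alleles:  TOP alleles as a string, e.g. "AG" (assumed format)
    """
    # "0" is plink's code for a missing allele, so ignore it.
    data = frozenset(a.upper() for a in data_alleles if a != "0")
    if not data:
        return "no_data"
    top = frozenset(top_alleles.upper())
    comp = frozenset(COMPLEMENT.get(a, a) for a in top)
    in_top = data <= top
    in_comp = data <= comp
    if in_top and in_comp:
        return "ambiguous"   # A/T or C/G SNP: both strands look the same
    if in_top:
        return "top"         # consistent with the TOP alleles
    if in_comp:
        return "complement"  # consistent with the opposite strand
    return "mismatch"        # alleles do not fit either strand
```

If most unambiguous SNPs come back as "top", the data are probably on the TOP strand; many "complement" or "mismatch" results suggest a different allele set is in use.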

.miss file The .miss file contains the same columns as the strand file and lists the SNPs that did not reach the required threshold (>90%) to be considered a match to the genome. The position and strand of the best match are given. This file does not contain SNPs that did not map to the genome at all.

.multiple file The .multiple file contains SNPs that had more than one high quality match (>90%) to the genome. This file contains three columns:

SNP ID SNP ID taken from the array manifest

Number of matches >90% Where there is more than one match, this is the total number of matches of the flanking sequences to the genome that are >90%

Number of identical matches >90% Total number of matches that share the same highest (>90%) match to the genome

When there is more than one match, the match with the highest overlap is taken for the .strand file and the number of matches is reported; this is the first numeric column in the file. If there are two or more matches of the same highest quality, one is chosen at random and the number of identical matches is reported; this is the second numeric column in the file.
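The selection rule described above can be sketched in Python as follows. This is an illustration of the logic as described, not the code used to build the files. The function name summarise_hits and the tuple layout are assumptions, and the page does not say how a single best match is counted in the second column; this sketch reports 1 in that case.

```python
import random
from typing import List, Optional, Tuple

Hit = Tuple[str, int, float]  # (chromosome, position, percent match)


def summarise_hits(
    hits: List[Hit],
    threshold: float = 90.0,
    rng: Optional[random.Random] = None,
):
    """Return (chosen_hit, n_matches, n_identical) for one SNP.

    n_matches:   hits with a match above the threshold
    n_identical: hits that share the highest match value
    """
    rng = rng or random.Random()
    good = [h for h in hits if h[2] > threshold]
    if not good:
        return None, 0, 0
    best_pct = max(h[2] for h in good)
    best = [h for h in good if h[2] == best_pct]
    # A unique best hit is taken directly; ties are broken at random.
    chosen = best[0] if len(best) == 1 else rng.choice(best)
    return chosen, len(good), len(best)


# Made-up hits for one SNP: two tie at 100%, one is at 95%, one is below 90%.
hits = [("1", 1000, 100.0), ("5", 2000, 95.0), ("7", 3000, 100.0), ("9", 10, 80.0)]
chosen, n_matches, n_identical = summarise_hits(hits, rng=random.Random(0))
```

Here n_matches is 3 and n_identical is 2, and the chosen hit is one of the two 100% matches. Because tied positions are picked at random, SNPs listed in the .multiple file may deserve extra caution.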

The default strand files assume that your genotype calling algorithm has exported the allele calls aligned to the Illumina TOP strand; this is usually the case. If it is not the case for your data file (e.g. as determined by Chipendium or by comparing the TOP strand alleles in a strand file to your data), the most likely alternative is that the alleles are derived from the Source Strand. These files and those aligned to the Illumina Strand can also be found here (link to strand):

Updating the strand and position

A script developed by Neil Robertson for updating the chromosome, position and strand of binary ped files using these strand and position files can be downloaded here:
update_build.sh

Usage is:
update_build.sh <bed-file-stem> <strand-file> <output-file-stem>
where:

<bed-file-stem>	is the name of your binary ped set minus the .bed, .bim or .fam extension
<strand-file>	is the appropriate strand file for your chip and current strand orientation (TOP, SOURCE, ILMN)
<output-file-stem>	is the name of the new output file to create, again minus the .bed, .bim or .fam extension
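To make the role of the strand file more concrete, here is a rough Python sketch of the information such an update needs from it: the new chromosome and position for each SNP, and the list of SNPs whose strand should be flipped. This is not the update_build.sh script itself. It assumes a tab-delimited strand file with the six columns described above, and that SNPs on the - strand are the ones to flip; the function name build_update_tables is invented for the example.

```python
from typing import Dict, Iterable, List, Tuple


def build_update_tables(
    strand_lines: Iterable[str],
) -> Tuple[Dict[str, str], Dict[str, int], List[str]]:
    """Collect per-SNP chromosome, position and flip list from strand lines."""
    chrom_by_snp: Dict[str, str] = {}
    pos_by_snp: Dict[str, int] = {}
    to_flip: List[str] = []
    for line in strand_lines:
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 5:
            continue  # skip blank or malformed lines
        snp_id, chrom, pos, _pct, strand = fields[:5]
        chrom_by_snp[snp_id] = chrom
        pos_by_snp[snp_id] = int(pos)
        if strand == "-":  # assumed: minus-strand SNPs need flipping
            to_flip.append(snp_id)
    return chrom_by_snp, pos_by_snp, to_flip


# Made-up rows for illustration only.
rows = [
    "rs0000001\t1\t12345\t100\t+\tAG",
    "rs0000002\tX\t67890\t98.5\t-\tAC",
]
chroms, positions, flips = build_update_tables(rows)
```

SNPs in your data that are absent from the strand file have no new position available, so they would presumably need to be handled separately, for example by removing them.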

The ASHG 2011 Poster describing this work can be downloaded here.

If your genotype calls are represented as A/B, the files on this tab below allow you to update the alleles from A/B to TOP strand A, C, G, T calls suitable for use with the strand files on this page. If the chip you are using is not listed, contact me and I can create these files.
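The idea behind the A/B update can be shown with a small Python sketch. The layout of the A/B allele files is not described on this page, so the lookup table below is a purely hypothetical in-memory stand-in, and the name translate_call and the "0" missing code are assumptions made for the example.

```python
from typing import Dict, Tuple

# Hypothetical lookup: SNP ID -> (TOP allele for A, TOP allele for B).
AB_TO_TOP: Dict[str, Tuple[str, str]] = {
    "rs0000001": ("A", "G"),
    "rs0000002": ("A", "C"),
}


def translate_call(
    snp_id: str,
    ab_call: str,
    lookup: Dict[str, Tuple[str, str]] = AB_TO_TOP,
    missing: str = "0",
) -> str:
    """Translate an A/B genotype call such as "AB" into TOP strand alleles."""
    a_allele, b_allele = lookup[snp_id]
    table = {"A": a_allele, "B": b_allele}
    # Anything that is not A or B (e.g. "-") is written as the missing code.
    return "".join(table.get(ch, missing) for ch in ab_call.upper())
```

For example, translate_call("rs0000001", "AB") gives "AG" under the made-up table above.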

Files listing the Reference and Alternate allele mappings for the Illumina arrays can be found on this tab below. These are in plink format for use with the --A1-allele and --keep-reference-allele plink options.

Top Strand

These files assume the data are aligned to the TOP strand.

Mouse over the array of interest to view/download the data on the different genome builds

Source Strand

These files assume the data are aligned to the Source Strand.

Mouse over the array of interest to view/download the data on the different genome builds

ILMN Strand

These files assume the data are aligned to the ILMN Strand.

Mouse over the array of interest to view/download the data on the different genome builds

Affymetrix Strand

Affymetrix Strand Files

Coming Soon

AB Allele files

These files map the AB Alleles to the TOP strand alleles

Ref/Alt

These files update the alleles to match the Ref/Alt alleles of the relevant human genome build
