Convert PEDMAP to impute2 haps legend user JohnDoe
Contents |
[edit] Documentation
Converts from plink's PED and MAP format to impute2 haps and legend format. Haps and legend is used by impute2 to encode imputation reference datasets. Since haps stores haplotypes you have to nsure that PED stores haplotypes as well. According to: https://mathgen.stats.ox.ac.uk/impute/impute_v1.html : haps is a file containing a set of known haplotypes for the region of interest. The alleles of the haplotypes should be coded as 0 and 1. The format of this input file is one line per SNP and one column per haplotype. Legend for haplotypes file which give rs ID, position and the alleles that are coded as 0 and 1 in the haplotypes file. The alleles should be taken from A, C, G and T. Note that this file needs a header line.
[edit] Parameters
<inputs> </inputs>
[edit] Return
[edit] See also
[edit] Code
def Convert_PEDMAP_to_impute2_haps_legend_user_JohnDoe( input_map_filename = None, input_ped_filename = None, output_haps_filename = None, output_legend_filename = None, ): print "Loading ped and maps.." ped_lines = [x.replace("\n", "").split() for x in open(input_ped_filename).readlines()] map_lines = [x.replace("\n", "").split() for x in open(input_map_filename).readlines()] print "..DONE" samples = len(ped_lines) snps = len(map_lines) print "Saving haps, legend file.." output_haps_file = open(output_haps_filename, "w") output_legend_file = open(output_legend_filename, "w") output_legend_file.write("rsID position a0 a1\n") snp_alleles = {} for snp in xrange(snps): first_in = False second_in = False legend_line = [map_lines[snp][1], map_lines[snp][3]] if not snp % 10000: print snp, "/", snps line = [] for sample in xrange(samples): first_1 = ped_lines[0][(snp*2) + 6] first_2 = ped_lines[0][((snp*2)+1) + 6] current_1 = ped_lines[sample][(snp*2) + 6] current_2 = ped_lines[sample][((snp*2)+1) + 6] for current in [current_1, current_2]: if current == "0": raise Exception("'0' genotype found in ped file. haps / legend format of impute2 does not support unknown genotype..") elif current == first_1: line += ["0"] if not first_in: first_in = True legend_line += [first_1] else: if not second_in: second_in = True legend_line += [current] line += ["1"] if not second_in: print "Problem in ", map_lines[snp] output_haps_file.write(str.join(" ", line) + "\n") output_legend_file.write(str.join(" ", legend_line) + "\n") output_haps_file.close() output_legend_file.close() print "..DONE"
[edit] Unit Tests
def uni1(): return True
[edit] Development Code
def Convert_PEDMAP_to_impute2_haps_legend_user_JohnDoe(): pass
[edit] Permissions
[edit] Documentation Permissions
JohnDoe
[edit] Code Permissions
JohnDoe
[edit] Unit Tests Permissions
JohnDoe