Convert PEDMAP to impute2 haps legend user Kantale

From PyPedia
Jump to: navigation, search

Contents

[edit] Documentation

Converts from plink's PED and MAP format to impute2 haps and legend format. Haps and legend is used by impute2 to encode imputation reference datasets. Since haps stores haplotypes you have to nsure that PED stores haplotypes as well. According to: https://mathgen.stats.ox.ac.uk/impute/impute_v1.html : haps is a file containing a set of known haplotypes for the region of interest. The alleles of the haplotypes should be coded as 0 and 1. The format of this input file is one line per SNP and one column per haplotype. Legend for haplotypes file which give rs ID, position and the alleles that are coded as 0 and 1 in the haplotypes file. The alleles should be taken from A, C, G and T. Note that this file needs a header line.


[edit] Parameters

<inputs>
</inputs>


[edit] Return

[edit] See also

[edit] Code

def Convert_PEDMAP_to_impute2_haps_legend_user_Kantale(
	input_map_filename = None,
	input_ped_filename = None,
	output_haps_filename = None,
	output_legend_filename = None,
):

	print "Loading ped and maps.."
	ped_lines = [x.replace("\n", "").split()  for x in open(input_ped_filename).readlines()]
	map_lines = [x.replace("\n", "").split() for x in open(input_map_filename).readlines()]
	print "..DONE"

	samples = len(ped_lines)
	snps = len(map_lines)


	print "Saving haps, legend file.."
	output_haps_file = open(output_haps_filename, "w")
	output_legend_file = open(output_legend_filename, "w")
	output_legend_file.write("rsID position a0 a1\n")
	snp_alleles = {}
	for snp in xrange(snps):
		first_in = False
		second_in = False
		legend_line = [map_lines[snp][1], map_lines[snp][3]]
		if not snp % 10000: print snp, "/", snps
		line = []
		for sample in xrange(samples):
			first_1 = ped_lines[0][(snp*2) + 6]
			first_2 = ped_lines[0][((snp*2)+1) + 6]

			current_1 = ped_lines[sample][(snp*2) + 6]
			current_2 = ped_lines[sample][((snp*2)+1) + 6]

			for current in [current_1, current_2]:
				if current == "0":
					line += ["?"] # Unknown values are supported
					#raise Exception("'0' genotype found in ped file. haps / legend format of impute2 does not support unknown genotype..")
				elif current == first_1:
					line += ["0"]
					if not first_in:
						first_in = True
						legend_line += [first_1]
				else:
					if not second_in:
						second_in = True
						legend_line += [current]
				 	line += ["1"]
		if not second_in:
			print "Problem in ", map_lines[snp]
		output_haps_file.write(str.join(" ", line) + "\n")
		output_legend_file.write(str.join(" ", legend_line) + "\n")
	output_haps_file.close()
	output_legend_file.close()
	print "..DONE"

[edit] Unit Tests

def uni1():
	return True

[edit] Development Code

def Convert_PEDMAP_to_impute2_haps_legend_user_Kantale():
	pass

[edit] Permissions

[edit] Documentation Permissions

Kantale

[edit] Code Permissions

Kantale

[edit] Unit Tests Permissions

Kantale

[edit] Permissions Permissions

Kantale

Personal tools
Namespaces

Variants
Actions
Navigation
Toolbox