Convert impute2 to PEDMAP
From PyPedia
Contents |
[edit] Documentation
Converts from legend and haplotype file of impute2 (https://mathgen.stats.ox.ac.uk/impute/impute_v2.html) to PED and MAP files (http://pngu.mgh.harvard.edu/~purcell/plink/data.shtml#ped)
Parameters:
- chromosome: string, the chromosome.
- legend_file: string, the legend filename.
- haplotypes_file: string, the hap filename.
- sample_names : a Python list with the names of the samples in the haplotypes file. (Optional, if not set the default value is: 0,1,2,3, ..).
- sample_names_filename : filename with one sample name per line (Optional).
- family_id: a Python list with the family ids of the samples. (Optional, if not set the default value is: 1 for all the samples).
- family_id_filename : filename with one family id per line (Optional).
- p_id: a Python list with the paternal id. (Optional, if not set the default value is: 0 for all the samples).
- p_id_filename : filename with one parent_id per line (Optional).
- m_id: a Python list with the maternal id. (Optional, if not set the default value is: 0 for all the samples).
- m_id_filename: filename with one maternal_id per line (Optional).
- gender: a Python list with the gender id. (Optional, if not set the default value is: 3 for all the samples).
- gender_filenane: filename with one gender id per line (Optional)
- pheno: a Python list with the pheno id. (Optional, if not set the default value is: 0 for all the samples).
- pheno_filename: filename with one pheno id per line (Optional)
- output: The prefix of the output filenames. i.e. if output is "results" then the output files will be output.ped and output.map. If not defined then the basename of the legend_file will be used.
For more info about the famility_id, p_id, m_id, gender and pheno values check the definition of the PED format: http://pngu.mgh.harvard.edu/~purcell/plink/data.shtml#ped
[edit] Parameters
<inputs> </inputs>
[edit] Return
A tuple with the filenames if the generated ped and map files.
[edit] See also
- User articles: Convert_impute2_to_PEDMAP_user_Kantale
[edit] Code
import os def Convert_impute2_to_PEDMAP( chromosome = None, legend_file = None, haplotypes_file = None, sample_names = None, sample_names_filename = None, family_id = None, family_id_filename = None, p_id = None, p_id_filename = None, m_id = None, m_id_filename = None, gender = None, gender_filename = None, pheno = None, pheno_filename = None, output = None, ): legend = open(legend_file).readlines()[1:] # Skip first line from legend haplotypes = open(haplotypes_file).readlines() snps = len(legend) if not output: output = os.path.splitext(legend_file)[0] map_filename = output + ".map" ped_filename = output + ".ped" map_file = open(map_filename, "w") ped = {} read_filename_lines = lambda filename : [x.replace("\n", "") for x in open(filename, 'U').readlines()] if filename else None if not sample_names: sample_names = read_filename_lines(sample_names_filename) if not family_id: family_id = read_filename_lines(family_id_filename) if not p_id: p_id = read_filename_lines(p_id_filename) if not m_id: m_id = read_filename_lines(m_id_filename) if not gender: gender = read_filename_lines(gender_filename) if not pheno: phrno = read_filename_lines(pheno_filename) print "Saving map file.." for snp_index, pair in enumerate(zip(legend, haplotypes)): l_splitted = [x for x in pair[0].replace("\n", "").split() if len(x) > 0] h_splitted = [x for x in pair[1].replace("\n", "").split() if len(x) > 0] for individual_id, genotype in [(index/2, (h_splitted[index], h_splitted[index+1])) for index, value in enumerate(h_splitted) if index % 2 == 0]: if not ped.has_key(individual_id): ped[individual_id] = {} if family_id: ped[individual_id]["family_id"] = family_id[individual_id] else: ped[individual_id]["family_id"] = 1 if sample_names: ped[individual_id]["ind_id"] = sample_names[individual_id] else: ped[individual_id]["ind_id"] = individual_id if p_id: ped[individual_id]["p_id"] = p_id[individual_id] else: ped[individual_id]["p_id"] = 0 if m_id: ped[individual_id]["m_id"] = m_id[individual_id] else: ped[individual_id]["m_id"] = 0 if gender: ped[individual_id]["gender"] = gender[individual_id] else: ped[individual_id]["gender"] = 3 if pheno: ped[individual_id]["pheno"] = pheno[individual_id] else: ped[individual_id]["pheno"] = 0 ped[individual_id]["geno"] = [] if genotype[0] == "0": allele1 = l_splitted[2] elif genotype[0] == "1": allele1 = l_splitted[3] else: raise Exception("Invalid format in phased file") if genotype[1] == "0": allele2 = l_splitted[2] elif genotype[1] == "1": allele2 = l_splitted[3] else: raise Exception("Invalid format in phased file") ped[individual_id]["geno"] += [ allele1, allele2 ] if not snp_index % 1000: print snp_index, "/", snps map_file.write( str.join("\t", [str(chromosome), l_splitted[0], "0", l_splitted[1]]) + "\n") map_file.close() print "..DONE saving map file" print "Saving ped file.." individuals = len(ped) ped_file = open(ped_filename, "w") for ind, value in ped.iteritems(): print "Writing individual:", ind, "/", individuals ped_file.write(str.join("\t", [str(x) for x in [value["family_id"], value["ind_id"], value["p_id"], value["m_id"], value["gender"], value["pheno"]] + value["geno"]]) + "\n") ped_file.close() print "..DONE saving ped file" return (ped_filename, map_filename)
[edit] Unit Tests
def uni1(): return True
[edit] Development Code
def Convert_impute2_to_PEDMAP(): pass
[edit] Permissions
[edit] Documentation Permissions
Kantale
[edit] Code Permissions
Kantale
[edit] Unit Tests Permissions
Kantale