Fred2.IO Module¶

IO.ADBAdapter¶

IO.EnsemblAdapter¶

IO.FileReader¶

Fred2.IO.FileReader.read_annovar_exonic(annovar_file, gene_filter=None, experimentalDesig=None)¶

Reads a gene-based ANNOVAR output file, generates Variant objects containing all annotated Transcript IDs, and returns them as a list of Variant objects.

Parameters:	annovar_file (str) – The path to the ANNOVAR file
	gene_filter (list(str)) – A list of gene names of interest (only variants associated with these genes are generated)
Returns:	List of fully annotated `Variant` objects
Return type:	list(`Variant`)

Fred2.IO.FileReader.read_fasta(files, type=<class 'Fred2.Core.Peptide.Peptide'>, id_position=1)¶

Generator function:

Reads one or more peptide, protein, or RNA sequences from FASTA files. The user must specify the correct type of the underlying sequences: Peptide, Protein, or Transcript (for RNA).

Parameters:	files (list(str) or str) – A list of file names (or a single file name) to read in
	type (`Peptide` or `Transcript` or `Protein`) – The sequence type to read in
	id_position (int) – The position of the ID in the FASTA header, counted in fields separated by |
Returns:	A list of the specified sequence type derived from the FASTA file sequences
Return type:	(list(`type`))
Raises ValueError:	if a file is not readable
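
The role of `id_position` can be illustrated with a small stand-alone sketch. It does not call Fred2; `extract_id` is a hypothetical helper, and two details are assumptions rather than documented behaviour: that fields are counted from zero after splitting the header on `|`, and that a header without enough fields is used unchanged.

```python
def extract_id(header, id_position=1):
    """Return the field at ``id_position`` after splitting the header on '|'.

    Assumption: fields are counted from zero, so position 1 is the second
    field. If that field does not exist, the whole header is returned.
    """
    fields = header.split("|")
    if 0 <= id_position < len(fields):
        return fields[id_position]
    return header


# UniProt-style header: database | accession | entry name
print(extract_id("sp|P04637|P53_HUMAN Cellular tumor antigen p53"))  # P04637
# Header without separators: falls back to the full header
print(extract_id("my_peptide_1"))  # my_peptide_1
```

With UniProt-style headers, the default of 1 would then pick the accession, which is likely why it is the default here.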

Fred2.IO.FileReader.read_lines(files, type=<class 'Fred2.Core.Peptide.Peptide'>)¶

Generator function:

Reads one sequence per line. The user must specify the correct type of the underlying data: Peptide, Protein, Transcript, or Allele.

Parameters:	files (list(str) or str) – A list of absolute file names (or a single file name) to read
	type (`Peptide` or `Protein` or `Transcript` or `Allele`) – Possible types are `Peptide`, `Protein`, `Transcript`, and `Allele`
Returns:	A list of the specified objects
Return type:	(list(`type`))
Raises IOError:	if a file is not readable
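
The line-based format described above can be sketched without Fred2. `iter_sequences` is a hypothetical stand-in that yields plain strings instead of `Peptide` or `Allele` objects; skipping blank lines and stripping whitespace are assumptions, not documented behaviour of `read_lines`.

```python
import os
import tempfile


def iter_sequences(files):
    """Yield one stripped sequence string per non-empty line of each file."""
    if isinstance(files, str):
        files = [files]  # accept a single file name as well as a list
    for name in files:
        # open() raises IOError (an alias of OSError in Python 3)
        # when the file cannot be read
        with open(name) as handle:
            for line in handle:
                seq = line.strip()
                if seq:
                    yield seq


# Write a small example file with one peptide per line
with tempfile.NamedTemporaryFile("w", suffix=".txt", delete=False) as tmp:
    tmp.write("SYFPEITHI\n\nGILGFVFTL\n")

try:
    print(list(iter_sequences(tmp.name)))  # ['SYFPEITHI', 'GILGFVFTL']
finally:
    os.remove(tmp.name)

# An unreadable file surfaces as IOError once iteration starts
try:
    list(iter_sequences("no_such_file.txt"))
except IOError as err:
    print("not readable:", err.filename)
```

Because the sketch is a generator, the error for a missing file is raised only when the result is consumed, not when the function is called.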

IO.MartsAdapter¶

class Fred2.IO.MartsAdapter.MartsAdapter(usr=None, host=None, pwd=None, db=None, biomart=None)¶

Bases: Fred2.IO.ADBAdapter.ADBAdapter

get_all_variant_gene(locations, _db='hsapiens_gene_ensembl', _dataset='gene_ensembl_config')¶

Fetches the important db ids and names for given chromosomal location

Parameters:	chrom – integer value of the chromosome in question
	start – integer value of the variation start position on given chromosome
	stop – integer value of the variation stop position on given chromosome
Returns:	The respective gene name, i.e. the first one reported

get_all_variant_ids(**kwargs)¶

Fetches the important db ids and names for given genes or chromosomal locations. The former is recommended. The result is a list of dicts with one of the three combinations:

‘Ensembl Gene ID’, ‘Ensembl Transcript ID’, ‘Ensembl Protein ID’

‘RefSeq Protein ID [e.g. NP_001005353]’, ‘RefSeq mRNA [e.g. NM_001195597]’, first triplet

‘RefSeq Predicted Protein ID [e.g. XP_001720922]’, ‘RefSeq mRNA predicted [e.g. XM_001125684]’, first triplet

Parameters:	'locations' – list of locations as triplets of integer values representing (chrom, start, stop)
	'genes' – list of gene names (strings) of the genes of variation
Returns:	The list of dicts of entries with transcript and protein ids (either NM+NP or XM+XP)
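
The three ID combinations listed above can be told apart with a small check on each returned dict. This is only a sketch: it assumes the dict keys are exactly the column names quoted above and that an absent ID is a missing or empty value. `id_combination` is a hypothetical helper, and the Ensembl values are placeholders, not real identifiers.

```python
ENSEMBL_KEYS = ("Ensembl Gene ID", "Ensembl Transcript ID", "Ensembl Protein ID")
REFSEQ_KEYS = (
    "RefSeq Protein ID [e.g. NP_001005353]",
    "RefSeq mRNA [e.g. NM_001195597]",
)
PREDICTED_KEYS = (
    "RefSeq Predicted Protein ID [e.g. XP_001720922]",
    "RefSeq mRNA predicted [e.g. XM_001125684]",
)


def id_combination(entry):
    """Name the ID combination carried by one result dict."""
    def has(keys):
        return all(entry.get(key) for key in keys)

    if has(REFSEQ_KEYS):
        return "RefSeq (NM+NP)"
    if has(PREDICTED_KEYS):
        return "RefSeq predicted (XM+XP)"
    if has(ENSEMBL_KEYS):
        return "Ensembl only"
    return "incomplete"


# Placeholder Ensembl IDs; the RefSeq values are the examples from the key names
ensembl = {
    "Ensembl Gene ID": "ENSG_PLACEHOLDER",
    "Ensembl Transcript ID": "ENST_PLACEHOLDER",
    "Ensembl Protein ID": "ENSP_PLACEHOLDER",
}
curated = dict(ensembl)
curated["RefSeq Protein ID [e.g. NP_001005353]"] = "NP_001005353"
curated["RefSeq mRNA [e.g. NM_001195597]"] = "NM_001195597"

print(id_combination(ensembl))  # Ensembl only
print(id_combination(curated))  # RefSeq (NM+NP)
```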

get_product_sequence(product_refseq, _db='hsapiens_gene_ensembl', _dataset='gene_ensembl_config')¶

Fetches product sequence for the given id

Parameters:	product_refseq – given refseq id
Returns:	list of dictionaries of the requested sequence, the respective strand and the associated gene name

get_protein_sequence_from_protein_id(**kwargs)¶

Returns the protein sequence for a given protein ID, which can be a RefSeq, UniProt, or Ensembl ID.

Parameters:	kwargs –
Returns:

get_transcript_information(transcript_refseq, _db='hsapiens_gene_ensembl', _dataset='gene_ensembl_config')¶

This method already uses the Field-Enum for DBAdapters.

Fetches transcript sequence for the given id

Parameters:	transcript_refseq –
Returns:	list of dictionaries of the requested sequence, the respective strand and the associated gene name

get_transcript_information_from_protein_id(**kwargs)¶

This method already uses the Field-Enum for DBAdapters.

Fetches transcript sequence for the given id

Parameters:	transcript_refseq –
Returns:	list of dictionaries of the requested sequence, the respective strand and the associated gene name

get_transcript_position(start, stop, gene_id, transcript_id, _db='hsapiens_gene_ensembl', _dataset='gene_ensembl_config')¶

If no transcript position is available for the variant

Parameters:	start –
	stop –
	gene_id –
	transcript_id –
	_db –
	_dataset –
Returns:

get_transcript_sequence(transcript_refseq, _db='hsapiens_gene_ensembl', _dataset='gene_ensembl_config')¶

Fetches transcript sequence for the given id

Parameters:	transcript_refseq –
Returns:	list of dictionaries of the requested sequence, the respective strand and the associated gene name

get_variant_gene(chrom, start, stop, _db='hsapiens_gene_ensembl', _dataset='gene_ensembl_config')¶

Fetches the important db ids and names for given chromosomal location

Parameters:	chrom – integer value of the chromosome in question
	start – integer value of the variation start position on given chromosome
	stop – integer value of the variation stop position on given chromosome
Returns:	The respective gene name, i.e. the first one reported

get_variant_id_from_gene_id(**kwargs)¶

Returns all information needed to instantiate a variation.

Parameters:	trans_id – A transcript ID (either Ensembl (ENS) or RefSeq (NM, XM))
Returns:	list of dicts – containing all information needed for a variant initialization

get_variant_id_from_protein_id(**kwargs)¶

Returns all information needed to instantiate a variation.

Parameters:	trans_id – A transcript ID (either Ensembl (ENS) or RefSeq (NM, XM))
Returns:	list of dicts – containing all information needed for a variant initialization

get_variant_ids(**kwargs)¶

Fetches the important db ids and names for a given gene or chromosomal location. The former is recommended. The result is a list of dicts with one of the three combinations:

‘Ensembl Gene ID’, ‘Ensembl Transcript ID’, ‘Ensembl Protein ID’

‘RefSeq Protein ID [e.g. NP_001005353]’, ‘RefSeq mRNA [e.g. NM_001195597]’, first triplet

‘RefSeq Predicted Protein ID [e.g. XP_001720922]’, ‘RefSeq mRNA predicted [e.g. XM_001125684]’, first triplet

Parameters:	'chrom' – integer value of the chromosome in question
	'start' – integer value of the variation start position on given chromosome
	'stop' – integer value of the variation stop position on given chromosome
	'gene' – string value of the gene of variation
	'transcript_id' – string value of the transcript of variation
Returns:	The list of dicts of entries with transcript and protein ids (either NM+NP or XM+XP)

IO.RefSeqAdapter¶

class Fred2.IO.RefSeqAdapter.RefSeqAdapter(prot_file=None, prot_vers=None, mrna_file=None, mrna_vers=None)¶

Bases: Fred2.IO.ADBAdapter.ADBAdapter

get_product_sequence(product_refseq)¶

Fetches product sequence for the given id

Parameters:	product_refseq – given refseq id
Returns:	list of dictionaries of the requested sequence, the respective strand and the associated gene name

get_transcript_information(transcript_refseq)¶

get_transcript_sequence(transcript_refseq)¶

Fetches transcript sequence for the given id

Parameters:	transcript_refseq –
Returns:	list of dictionaries of the requested sequence, the respective strand and the associated gene name

load(filename)¶

IO.UniProtAdapter¶

class Fred2.IO.UniProtAdapter.UniProtDB(name='fdb')¶

exists(seq)¶

Fast check whether the given sequence exists (as a subsequence) in any sequence of the UniProtDB object's collection.

Parameters:	seq – the subsequence to be searched for
Returns:	True, if it is found somewhere, False otherwise

read_seqs(sequence_file)¶

Reads sequences from UniProt files (.dat or .fasta) or from lists or dicts of BioPython SeqRecords and makes them available for fast search. This function can also be used to append sequences to an existing collection.

Parameters:	sequence_file – UniProt files (.dat or .fasta)
Returns:

search(seq)¶

Searches for the first occurrence of the given sequence(s) in the UniProtDB object's collection, returning for each sequence the front part of the FASTA header of its first occurrence.

Parameters:	seq – a string interpreted as a single sequence or a list (of str) interpreted as a collection of sequences
Returns:	a dictionary of sequences to lists (of ids, ‘null’ if n/a)

search_all(seq)¶

Searches for all occurrences of the given sequence(s) in the UniProtDB object's collection, returning for each sequence the front part of the FASTA header of every occurrence.

Parameters:	seq – a string interpreted as a single sequence or a list (of str) interpreted as a collection of sequences
Returns:	a dictionary of the given sequences to lists (of ids, ‘null’ if n/a)
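
The difference between `search` and `search_all` can be shown with a plain-Python sketch. It is not the UniProtDB implementation: it scans linearly rather than using a fast index, `find_all_ids` and `find_first_id` are hypothetical names, the collection is toy data, and representing a miss as a list containing 'null' is one reading of the return description above.

```python
def _as_list(seqs):
    """Treat a single string as a one-element collection of sequences."""
    return [seqs] if isinstance(seqs, str) else list(seqs)


def find_all_ids(collection, seqs):
    """Map each query to the IDs of all entries that contain it."""
    result = {}
    for query in _as_list(seqs):
        hits = [ident for ident, sequence in collection.items() if query in sequence]
        result[query] = hits if hits else ["null"]
    return result


def find_first_id(collection, seqs):
    """Map each query to a list holding only the first matching ID."""
    return {query: ids[:1] for query, ids in find_all_ids(collection, seqs).items()}


# Toy collection: FASTA header front part -> sequence
collection = {"toy|A": "MKTAYIAKQR", "toy|B": "GGTAYIAGG"}

print(find_all_ids(collection, ["TAYIA", "WWW"]))
# {'TAYIA': ['toy|A', 'toy|B'], 'WWW': ['null']}
print(find_first_id(collection, "TAYIA"))
# {'TAYIA': ['toy|A']}
```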

write_seqs(name)¶

Writes all FASTA entries in the current object into one FASTA file.

Parameters:	name – the complete path with file name where the fasta is going to be written