Fred2.IO Module
IO.ADBAdapter
IO.EnsemblAdapter
IO.FileReader
-
Fred2.IO.FileReader.read_annovar_exonic(annovar_file, gene_filter=None, experimentalDesig=None)
Reads a gene-based ANNOVAR output file, generates Variant objects containing all annotated Transcript IDs, and returns them as a list of Variant objects.
Parameters: - annovar_file (str) – The path to the ANNOVAR file
- gene_filter (list(str)) – A list of gene names of interest (only variants associated with these genes are generated)
Returns: List of fully annotated Variant objects
Return type: list(Variant)
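The effect of gene_filter can be illustrated without Fred2 itself. The following is a minimal sketch, assuming the usual tab-separated layout of an ANNOVAR exonic_variant_function file, where the third column holds comma-separated GENE:TRANSCRIPT:... annotations. The function name filter_annovar_lines and the sample rows are hypothetical and are not part of Fred2; the sketch yields plain tuples rather than Variant objects.

```python
def filter_annovar_lines(lines, gene_filter=None):
    """Yield (gene, transcript_ids, fields) for rows that pass gene_filter.

    Hypothetical helper, not Fred2 code. Assumes column 3 of each row holds
    comma-separated annotations of the form GENE:TRANSCRIPT:exon:cDNA:protein.
    """
    wanted = set(gene_filter) if gene_filter else None
    for line in lines:
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 3:
            continue  # skip malformed rows
        annotations = [a.split(":") for a in fields[2].split(",") if a]
        kept = [a for a in annotations
                if len(a) >= 2 and (wanted is None or a[0] in wanted)]
        if kept:
            yield kept[0][0], [a[1] for a in kept], fields


# Made-up rows for illustration only.
rows = [
    "line1\tnonsynonymous SNV\tGENE1:NM_000001:exon2:c.A10G:p.T4A,"
    "GENE1:NM_000002:exon2:c.A10G:p.T4A,\t1\t100\t100\tA\tG",
    "line2\tsynonymous SNV\tGENE2:NM_000003:exon1:c.C3T:p.A1A,\t2\t200\t200\tC\tT",
]
selected = list(filter_annovar_lines(rows, gene_filter=["GENE1"]))
```

With the filter set to GENE1, only the first row survives, and both of its transcript IDs are kept.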
-
Fred2.IO.FileReader.read_fasta(files, type=<class 'Fred2.Core.Peptide.Peptide'>, id_position=1)
Generator function: reads one or more peptide, protein, or RNA sequences from a FASTA file. The user needs to specify the correct type of the underlying sequences: Peptide, Protein, or Transcript (for RNA).
Parameters: - files (list(str) or str) – A file name or a list of file names to read in
- type (Peptide or Transcript or Protein) – The type to read in
- id_position (int) – The position of the ID in the FASTA header, counted in fields separated by |
Returns: a list of the specified sequence type derived from the FASTA file sequences.
Return type: (list(type))
Raises ValueError: if a file is not readable
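How id_position selects an identifier from a |-separated header can be sketched in plain Python. This is an illustration only: it assumes that id_position is a zero-based index into the header split on |, which the documentation above does not state explicitly. The names parse_fasta and pick_id and the sample header are hypothetical, and the sketch yields plain tuples instead of Peptide, Protein, or Transcript objects.

```python
def pick_id(header, id_position):
    """Return the id_position-th '|'-separated field of a FASTA header.

    Assumption: zero-based indexing; falls back to the whole header
    when the header has too few fields.
    """
    fields = header.split("|")
    return fields[id_position] if id_position < len(fields) else header


def parse_fasta(lines, id_position=1):
    """Yield (identifier, sequence) pairs from FASTA-formatted lines."""
    header, chunks = None, []
    for raw in lines:
        line = raw.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                yield pick_id(header, id_position), "".join(chunks)
            header, chunks = line[1:], []
        else:
            chunks.append(line)
    if header is not None:  # flush the last record
        yield pick_id(header, id_position), "".join(chunks)


# Made-up records for illustration only.
fasta_lines = [">sp|P12345|TEST_HUMAN demo entry", "MKV", "LAA", ">plain", "GG"]
records = list(parse_fasta(fasta_lines))
```

Under that assumption, the default id_position=1 picks the accession field of a header shaped like sp|ACCESSION|NAME.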
-
Fred2.IO.FileReader.read_lines(files, type=<class 'Fred2.Core.Peptide.Peptide'>)
Generator function: reads a sequence directly from a line. The user needs to manually specify the correct type of the underlying data: Peptide, Protein, Transcript, or Allele.
Parameters: - files (list(str) or str) – A list of absolute file names that are to be read
- type (Peptide or Protein or Transcript or Allele) – Possible types are Peptide, Protein, Transcript, and Allele
Returns: A list of the specified objects
Return type: (list(type))
Raises IOError: if a file is not readable
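The line-based reading pattern can be sketched as a small generator. This is a sketch under stated assumptions, not the Fred2 implementation: it assumes one sequence per non-empty line, and DemoPeptide is a hypothetical stand-in for the real sequence classes. The parameter is called seq_type here to avoid shadowing the built-in type.

```python
import os
import tempfile


class DemoPeptide(str):
    """Hypothetical stand-in for a sequence type; not Fred2's Peptide."""


def read_lines_sketch(files, seq_type=DemoPeptide):
    """Yield one seq_type object per non-empty line of each file."""
    if isinstance(files, str):
        files = [files]  # accept a single file name as well as a list
    for path in files:
        # open() raises OSError (IOError is an alias in Python 3)
        # when the file is not readable
        with open(path) as handle:
            for line in handle:
                seq = line.strip()
                if seq:
                    yield seq_type(seq)


# Small demonstration using a temporary file with made-up peptides.
with tempfile.NamedTemporaryFile("w", suffix=".txt", delete=False) as tmp:
    tmp.write("SYFPEITHI\n\nACDEFGHIK\n")
try:
    demo = list(read_lines_sketch(tmp.name))
finally:
    os.remove(tmp.name)
```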
IO.MartsAdapter
-
class Fred2.IO.MartsAdapter.MartsAdapter(usr=None, host=None, pwd=None, db=None, biomart=None)
Bases: Fred2.IO.ADBAdapter.ADBAdapter
-
get_all_variant_gene(locations, _db='hsapiens_gene_ensembl', _dataset='gene_ensembl_config')
Fetches the important database IDs and names for a given chromosomal location.
Parameters: - chrom – integer value of the chromosome in question
- start – integer value of the variation start position on the given chromosome
- stop – integer value of the variation stop position on the given chromosome
Returns: The respective gene name, i.e. the first one reported
-
get_all_variant_ids(**kwargs)
Fetches the important database IDs and names for a given gene or chromosomal location. The former is recommended. The result is a list of dicts with one of the three combinations:
- ‘Ensembl Gene ID’, ‘Ensembl Transcript ID’, ‘Ensembl Protein ID’
- ‘RefSeq Protein ID [e.g. NP_001005353]’, ‘RefSeq mRNA [e.g. NM_001195597]’, first triplet
- ‘RefSeq Predicted Protein ID [e.g. XP_001720922]’, ‘RefSeq mRNA predicted [e.g. XM_001125684]’, first triplet
Parameters: - 'locations' – list of locations as triplets of integer values representing (chrom, start, stop)
- 'genes' – list of genes as string value of the genes of variation
Returns: The list of dicts of entries with transcript and protein ids (either NM+NP or XM+XP)
-
get_product_sequence(product_refseq, _db='hsapiens_gene_ensembl', _dataset='gene_ensembl_config')
Fetches the product sequence for the given ID.
Parameters: product_refseq – the given RefSeq ID
Returns: list of dictionaries of the requested sequence, the respective strand, and the associated gene name
-
get_protein_sequence_from_protein_id(**kwargs)
Returns the protein sequence for a given protein ID, which can be a RefSeq, UniProt, or Ensembl ID.
Parameters: kwargs – Returns:
-
get_transcript_information(transcript_refseq, _db='hsapiens_gene_ensembl', _dataset='gene_ensembl_config')
Fetches the transcript sequence for the given ID. It already uses the Field-Enum for DBAdapters.
Parameters: transcript_refseq
Returns: list of dictionaries of the requested sequence, the respective strand, and the associated gene name
-
get_transcript_information_from_protein_id(**kwargs)
Fetches the transcript sequence for the given ID. It already uses the Field-Enum for DBAdapters.
Parameters: transcript_refseq
Returns: list of dictionaries of the requested sequence, the respective strand, and the associated gene name
-
get_transcript_position(start, stop, gene_id, transcript_id, _db='hsapiens_gene_ensembl', _dataset='gene_ensembl_config')
For the case where no transcript position is available for the variant.
Parameters: - start
- stop
- gene_id
- transcript_id
- _db
- _dataset
Returns:
-
get_transcript_sequence(transcript_refseq, _db='hsapiens_gene_ensembl', _dataset='gene_ensembl_config')
Fetches the transcript sequence for the given ID.
Parameters: transcript_refseq
Returns: list of dictionaries of the requested sequence, the respective strand, and the associated gene name
-
get_variant_gene(chrom, start, stop, _db='hsapiens_gene_ensembl', _dataset='gene_ensembl_config')
Fetches the important database IDs and names for a given chromosomal location.
Parameters: - chrom – integer value of the chromosome in question
- start – integer value of the variation start position on the given chromosome
- stop – integer value of the variation stop position on the given chromosome
Returns: The respective gene name, i.e. the first one reported
-
get_variant_id_from_gene_id(**kwargs)
Returns all information needed to instantiate a variation.
Parameters: trans_id – A transcript ID (either Ensembl (ENS) or RefSeq (NM, XM))
Returns: list of dicts – containing all information needed for a variant initialization
-
get_variant_id_from_protein_id(**kwargs)
Returns all information needed to instantiate a variation.
Parameters: trans_id – A transcript ID (either Ensembl (ENS) or RefSeq (NM, XM))
Returns: list of dicts – containing all information needed for a variant initialization
-
get_variant_ids(**kwargs)
Fetches the important database IDs and names for a given gene or chromosomal location. The former is recommended. The result is a list of dicts with one of the three combinations:
- ‘Ensembl Gene ID’, ‘Ensembl Transcript ID’, ‘Ensembl Protein ID’
- ‘RefSeq Protein ID [e.g. NP_001005353]’, ‘RefSeq mRNA [e.g. NM_001195597]’, first triplet
- ‘RefSeq Predicted Protein ID [e.g. XP_001720922]’, ‘RefSeq mRNA predicted [e.g. XM_001125684]’, first triplet
Parameters: - 'chrom' – integer value of the chromosome in question
- 'start' – integer value of the variation start position on given chromosome
- 'stop' – integer value of the variation stop position on given chromosome
- 'gene' – string value of the gene of variation
- 'transcript_id' – string value of the transcript ID of the variation
Returns: The list of dicts of entries with transcript and protein ids (either NM+NP or XM+XP)
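Since the result is a list of dicts whose keys depend on which combination was returned, downstream code typically has to check which keys are present. The sketch below pairs up mRNA and protein IDs (NM+NP or XM+XP) from such a list. It uses the key names exactly as listed above; the helper refseq_pairs, the sample entries, and the Ensembl IDs in them are hypothetical placeholders, and the RefSeq IDs are simply the examples from the key names, not a verified matching pair.

```python
# Key names as documented for the returned dicts.
CURATED = ("RefSeq mRNA [e.g. NM_001195597]",
           "RefSeq Protein ID [e.g. NP_001005353]")
PREDICTED = ("RefSeq mRNA predicted [e.g. XM_001125684]",
             "RefSeq Predicted Protein ID [e.g. XP_001720922]")


def refseq_pairs(entries):
    """Collect (mRNA ID, protein ID) pairs from a list of result dicts.

    Hypothetical helper, not part of Fred2. Entries that carry neither a
    curated (NM+NP) nor a predicted (XM+XP) pair are skipped.
    """
    pairs = []
    for entry in entries:
        for mrna_key, protein_key in (CURATED, PREDICTED):
            mrna, protein = entry.get(mrna_key), entry.get(protein_key)
            if mrna and protein:
                pairs.append((mrna, protein))
    return pairs


# Placeholder entries for illustration only.
entries = [
    {"Ensembl Gene ID": "ENSG_DEMO", "Ensembl Transcript ID": "ENST_DEMO",
     "Ensembl Protein ID": "ENSP_DEMO"},
    {CURATED[0]: "NM_001195597", CURATED[1]: "NP_001005353"},
    {PREDICTED[0]: "XM_001125684", PREDICTED[1]: "XP_001720922"},
]
pairs = refseq_pairs(entries)
```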
-
IO.RefSeqAdapter
-
class Fred2.IO.RefSeqAdapter.RefSeqAdapter(prot_file=None, prot_vers=None, mrna_file=None, mrna_vers=None)
Bases: Fred2.IO.ADBAdapter.ADBAdapter
-
get_product_sequence(product_refseq)
Fetches the product sequence for the given ID.
Parameters: product_refseq – the given RefSeq ID
Returns: list of dictionaries of the requested sequence, the respective strand, and the associated gene name
-
get_transcript_information(transcript_refseq)
-
get_transcript_sequence(transcript_refseq)
Fetches the transcript sequence for the given ID.
Parameters: transcript_refseq
Returns: list of dictionaries of the requested sequence, the respective strand, and the associated gene name
-
load(filename)
-
IO.UniProtAdapter
-
class Fred2.IO.UniProtAdapter.UniProtDB(name='fdb')
-
exists(seq)
Fast check whether the given sequence exists (as a subsequence) in the UniProtDB object's collection of sequences.
Parameters: seq – the subsequence to be searched for
Returns: True if it is found somewhere, False otherwise
-
read_seqs(sequence_file)
Reads sequences from UniProt files (.dat or .fasta) or from lists or dicts of BioPython SeqRecords and makes them available for fast search. This function can also be used to append sequences.
Parameters: sequence_file – UniProt files (.dat or .fasta)
Returns:
-
search(seq)
Searches for the first occurrence of the given sequence(s) in the UniProtDB object's collection, returning for each the front part of the FASTA header of the first occurrence.
Parameters: seq – a string interpreted as a single sequence, or a list (of str) interpreted as a collection of sequences
Returns: a dictionary of sequences to lists (of IDs, ‘null’ if n/a)
-
search_all(seq)
Searches for all occurrences of the given sequence(s) in the UniProtDB object's collection, returning for each the front part of the FASTA header of all occurrences.
Parameters: seq – a string interpreted as a single sequence, or a list (of str) interpreted as a collection of sequences
Returns: a dictionary of the given sequences to lists (of IDs, ‘null’ if n/a)
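The documented behaviour of exists and search_all (subsequence lookup over a collection, accepting one sequence or a list of sequences) can be illustrated with a naive, self-contained sketch. It does not reproduce UniProtDB's fast search structure. The function names, the dict-based collection, and the sample IDs and sequences are hypothetical, and representing a miss as the one-element list ['null'] is an assumption about what "‘null’ if n/a" means.

```python
def exists_sketch(collection, seq):
    """Return True if seq occurs as a substring of any stored sequence."""
    return any(seq in full for full in collection.values())


def search_all_sketch(collection, seq):
    """Map each query sequence to the IDs of all entries containing it.

    collection: dict of ID -> full sequence (hypothetical stand-in for
    the UniProtDB object's collection).
    seq: a single string or a list of strings.
    Assumption: a query without hits maps to ['null'].
    """
    queries = [seq] if isinstance(seq, str) else list(seq)
    result = {}
    for query in queries:
        hits = [ident for ident, full in collection.items() if query in full]
        result[query] = hits if hits else ["null"]
    return result


# Made-up collection for illustration only.
collection = {"ID_A": "MKVLAASYFPEITHI", "ID_B": "GGSYFPEGG", "ID_C": "ACDEFGHIK"}
hits = search_all_sketch(collection, ["SYFPE", "WWWW"])
found = exists_sketch(collection, "DEFG")
```

A first-occurrence variant in the spirit of search would keep only the first element of each hit list.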
-
write_seqs(name)
Writes all FASTA entries in the current object into one FASTA file.
Parameters: name – the complete path, including the file name, where the FASTA file is going to be written
-