iPEP

iPEP v2.51

Documentation

"The value and utility of any experiment are determined by the fitness of the material to the purpose for which it is used, and thus in the case before us it cannot be immaterial what plants are subjected to experiment and in what manner such experiment is conducted." - Gregor Mendel

Introduction
Data source
Digestion
Workflow / Functionalities
Terminology
Measurements
Getting Started

Introduction

iPEP is a web-based application developed to compare the effectiveness of different proteolytic digests. Peptide populations can be examined to optimize detection of certain groups of proteins relative to the proteome and the digested peptidome. The application reports proteolytic peptide sequences, theoretical molecular weights, and functional annotations using Gene Ontology (GO) terms. The iPEP tool can improve experimental design for large-scale proteomics assays by maximizing the number of detectable proteins and consensus sites of research interest.

Data source

SwissProt human protein data dated 07/20/2007 is used as the test dataset for the development of iPEP. The dataset contains 15939 proteins in total.

Digestion

Twenty-two digestions are available in iPEP. They can be grouped into four categories based on their cleavage sites: basic group, acidic group, hydrophobic group, and combos (combinations of two enzymes/reagents).

Enzyme or Reagent	Cleavage site	Exception (no cleavage)
Trypsin	C-terminal side of K or R	if P is C-terminal to K or R
Arg C	C-terminal side of R	if P is C-term to R
Lys C	C-terminal side of K
Lys N	N-terminal side of K
Acid	C-terminal side of D
Asp N	N-terminal side of D
V8 phosphate	C-terminal side of D or E	if P is C-term to D or E, or if E is C-term to D or E
V8 bicarbonate	C-terminal side of E	if P is C-term to E, or if E is C-term to E
Pepsin (pH 1.3)	C-terminal side of F, L
Pepsin (pH 2.0)	C-terminal side of F, L, W, Y, A, E, Q
Proteinase K	C-terminal side of A, F, Y, W, L, I, V
Thermolysin	N-terminal side of A, F, I, L, M, V	if D or E is N-term to A, F, I, L, M, V
Chymo (F/Y/W/M/L)	C-terminal side of F, L, M, W, Y
Chymo (F/Y/W)	C-terminal side of F, Y, W
CNBr	C-terminal side of M
IBA	C-terminal side of W
NTCB	N-terminal side of C
v8_LysC	C-terminal side of D, E and K	if P is C-term to D or E, or if E is C-term to D or E
AspN_nGlu	N-terminal side of D or E
AspN_LysC	N-terminal side of D or E
Trypsin_Chymo	C-terminal side of K, R, F, Y, W	if P is C-term to K, R, F, Y, W, or if P is N-term to Y
IBA_CNBr	C-terminal side of M, W
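
The rules in the table above can be illustrated with a small in-silico digestion. The following is a minimal sketch, not iPEP's actual implementation: it assumes that the "Exception" column means "do not cleave in this context", encodes a handful of rows as zero-width regular expressions, and uses hypothetical names (`CLEAVAGE_RULES`, `digest`). Missed cleavages are not modeled.

```python
import re

# Hypothetical encoding of a few rows of the table as zero-width regexes.
# (?<=...) looks at the residue before the cut, (?=...) / (?!...) at the residue after it.
CLEAVAGE_RULES = {
    "Trypsin": r"(?<=[KR])(?!P)",  # C-terminal side of K or R, not if P follows
    "Arg C": r"(?<=R)(?!P)",       # C-terminal side of R, not if P follows
    "Lys C": r"(?<=K)",            # C-terminal side of K
    "Lys N": r"(?=K)",             # N-terminal side of K
    "Asp N": r"(?=D)",             # N-terminal side of D
    "CNBr": r"(?<=M)",             # C-terminal side of M
}


def digest(sequence, enzyme):
    """Return the peptides produced by a complete digestion of `sequence`."""
    rule = re.compile(CLEAVAGE_RULES[enzyme])
    # Positions where the rule matches; cuts at the very ends produce no new peptide.
    cuts = [m.start() for m in rule.finditer(sequence)]
    cuts = [pos for pos in cuts if 0 < pos < len(sequence)]
    bounds = [0] + cuts + [len(sequence)]
    return [sequence[start:end] for start, end in zip(bounds, bounds[1:])]


if __name__ == "__main__":
    print(digest("AKRPGK", "Trypsin"))  # R is followed by P, so no cut after R
    print(digest("AKRPGK", "Lys N"))
```

Writing each rule as a lookbehind/lookahead pair keeps the cleavage site and its exception in one expression, so adding another row of the table is a one-line change.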

Workflow / Functionalities

Single protein analysis
A single protein can be digested with all 22 enzymes/reagents. The peptide, sequence, and residue coverages and the detectability of the residue of interest can be compared across different digestions and visualized in table and chart views.
Protein level tracking analysis
Any motif of interest in PROSITE format can be searched across the whole proteome digested with two selected enzymes/reagents. A modified residue within the motif can be specified by its position and searched within the proteome. Measures such as proteome complexity, peptidome complexity, fraction of proteome, and fraction of peptidome can be calculated and compared for the selected enzymes/reagents. Limiting factors defined by different instruments, such as molecular weight and basic residues, are used to test the detectability of proteins and peptides containing the specified motif and residues. The protein datasets containing the motif after each filter can be downloaded in FASTA format for further analysis, and the Gene Ontology terms contributed by the subset of proteins are ranked to give further insight into the molecular functions, biological processes, and cellular components of these proteins. The proteins obtained from two digestions at each filter can be compared using set operations such as union and intersection, which shows the unique and complementary contributions of each enzyme and enzyme combination.
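
The comparison of two digestions at a given filter can be sketched with ordinary set operations. This is only an illustration: the function name `compare_digestions` and the protein identifiers are placeholders, not iPEP output.

```python
def compare_digestions(proteins_a, proteins_b):
    """Compare the proteins that pass a filter under two digestions."""
    a, b = set(proteins_a), set(proteins_b)
    return {
        "union": a | b,          # detectable with at least one digestion
        "intersection": a & b,   # detectable with both digestions
        "only_a": a - b,         # unique contribution of the first digestion
        "only_b": b - a,         # unique contribution of the second digestion
    }


if __name__ == "__main__":
    # Placeholder identifiers, for illustration only.
    result = compare_digestions(["P1", "P2", "P3"], ["P2", "P3", "P4"])
    for name, proteins in result.items():
        print(name, sorted(proteins))
```

The two difference sets are what indicate whether a second enzyme adds coverage or mostly repeats the first.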

Terminology

Motif

A motif is a short sequence of amino acids that is either

an exact match to a given short sequence, or
a match to a consensus regular expression (C-[GAVLIP]-[GAVLIP]-X)

A motif can be written as a regular expression in PROSITE format. For example, the CAAX motif at the C-terminus can be expressed as C-[GAVLIP]-[GAVLIP]-X->

Residues must be separated from each other by - (minus).
X represents any amino acid.
[DE] means either D or E.
{FWY} means any amino acid except for F, W and Y.
A(2,3) means that A appears 2 to 3 times consecutively.
< means N-terminal
> means C-terminal
The pattern string must be terminated with . (period)
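
To make the rules above concrete, the following sketch translates a PROSITE-style pattern into a Python regular expression. It is an assumption-laden illustration, not iPEP's parser: the function name `prosite_to_regex` is hypothetical, only the syntax elements listed above are handled, and the trailing period is treated as optional so that the CAAX example can be passed as written.

```python
import re

# One pattern element: a residue, [allowed set], or {excluded set},
# optionally followed by a repeat count such as (2) or (2,3).
_ELEMENT = re.compile(r"(\[[A-Z]+\]|\{[A-Z]+\}|[A-Za-z])(?:\((\d+)(?:,(\d+))?\))?")


def prosite_to_regex(pattern):
    """Convert a PROSITE-style pattern string into a Python regex string."""
    pattern = pattern.strip().rstrip(".")   # drop the terminating period
    at_n_term = pattern.startswith("<")     # < anchors at the N-terminus
    at_c_term = pattern.endswith(">")       # > anchors at the C-terminus
    pattern = pattern.strip("<>")
    parts = []
    for element in pattern.split("-"):
        if not element:                     # e.g. the empty piece in "X->"
            continue
        match = _ELEMENT.fullmatch(element)
        if match is None:
            raise ValueError(f"unsupported pattern element: {element!r}")
        residue, low, high = match.groups()
        if residue in ("X", "x"):
            token = "."                     # any amino acid
        elif residue.startswith("{"):
            token = "[^" + residue[1:-1] + "]"   # any residue except these
        else:
            token = residue                 # single residue or [set]
        if low is not None:
            token += "{" + low + ("," + high if high else "") + "}"
        parts.append(token)
    return ("^" if at_n_term else "") + "".join(parts) + ("$" if at_c_term else "")


if __name__ == "__main__":
    caax = prosite_to_regex("C-[GAVLIP]-[GAVLIP]-X->")
    print(caax)
    print(bool(re.search(caax, "MKTCVLS")))   # CAAX box at the C-terminus
```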

GO Enrichment Factor

Indicates the significance of a GO term in a given subset of proteins.
REF is the ratio of the prevalence of a given GO term in a subset of proteins to its prevalence in the total set of proteins.
E-value = (m_i / M) / (n_i / N)
- m_i is the number of proteins in a subset associated with a GO term
- M is total number of proteins in a subset (detectable).
- n_i is the number of proteins associated with a GO term in the whole proteome
- N is total number of proteins in the whole proteome (Reference)
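
As a worked illustration of the ratio defined above, the sketch below computes it directly from the four counts. The function name `enrichment_factor` and the counts in the example are made up for illustration; only the formula comes from the definition above.

```python
def enrichment_factor(m_i, M, n_i, N):
    """(m_i / M) / (n_i / N): prevalence of a GO term in the subset
    relative to its prevalence in the whole proteome."""
    if M <= 0 or N <= 0 or n_i <= 0:
        raise ValueError("M, N and n_i must be positive")
    return (m_i / M) / (n_i / N)


if __name__ == "__main__":
    # Illustrative counts: 10 of 100 subset proteins carry the term,
    # compared with 50 of 1000 proteins in the reference proteome.
    print(enrichment_factor(m_i=10, M=100, n_i=50, N=1000))
```

A value above 1 means the GO term is more common in the subset than in the reference proteome; a value below 1 means it is less common.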

Measurements

Proteome

Complexity of proteome: Number of proteins containing the motif / Total number of proteins
Efficiency of protein detection: Number of proteins in the detectable range / Total number of proteins
Detectable range proteome complexity: Number of proteins containing the motif in the detectable range / Total number of proteins in the detectable range
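
The three ratios above can be sketched as a single function. This is an illustration under stated assumptions: `motif_measures` is a hypothetical name, the motif is passed as an ordinary Python regular expression, and "detectable range" is represented by a caller-supplied predicate because the actual limits depend on the instrument. The length window and the short sequences in the example are placeholders only.

```python
import re


def motif_measures(sequences, motif_regex, is_detectable):
    """Compute complexity, detection efficiency and detectable-range complexity."""
    if not sequences:
        raise ValueError("at least one sequence is required")
    motif = re.compile(motif_regex)
    detectable = [s for s in sequences if is_detectable(s)]
    with_motif = [s for s in sequences if motif.search(s)]
    detectable_with_motif = [s for s in detectable if motif.search(s)]
    return {
        "complexity": len(with_motif) / len(sequences),
        "efficiency": len(detectable) / len(sequences),
        "detectable_range_complexity": (
            len(detectable_with_motif) / len(detectable) if detectable else 0.0
        ),
    }


if __name__ == "__main__":
    # Placeholder data: short made-up sequences and a length-based stand-in
    # for the instrument-defined detectable range.
    sequences = ["AK", "NGSK", "LLNVTR", "GGGGGGGGR"]
    print(motif_measures(sequences, r"N[^P][ST]", lambda s: 4 <= len(s) <= 8))
```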

Peptidome

Complexity of peptidome: Number of peptides containing the motif / Total number of peptides
Efficiency of detecting the primary and secondary motifs in proteomics experiments with digestion
Complexity of peptide: Number of peptides containing the motif / Total number of peptides
Efficiency of peptide detection: Number of peptides in the detectable range / Total number of peptides
Detectable range peptide complexity: Number of peptides containing the motif in the detectable range / Total number of peptides in the detectable range
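
For peptides, the detectable range is typically tied to molecular weight. The sketch below shows one way the efficiency of peptide detection could be computed. It assumes monoisotopic residue masses and a placeholder mass window of 500 to 3000 Da; whether iPEP uses monoisotopic or average masses, and which limits it applies, is not stated here. The names `peptide_mass` and `detection_efficiency` are hypothetical.

```python
# Monoisotopic residue masses in Da (standard values; an assumption about
# which mass type is appropriate, not a statement about iPEP).
MONOISOTOPIC_RESIDUE_MASS = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
    "I": 113.08406, "N": 114.04293, "D": 115.02694, "Q": 128.05858,
    "K": 128.09496, "E": 129.04259, "M": 131.04049, "H": 137.05891,
    "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
WATER_MASS = 18.01056  # one water is added for the free N- and C-termini


def peptide_mass(peptide):
    """Theoretical monoisotopic mass of an unmodified peptide."""
    return sum(MONOISOTOPIC_RESIDUE_MASS[aa] for aa in peptide) + WATER_MASS


def detection_efficiency(peptides, min_mass=500.0, max_mass=3000.0):
    """Fraction of peptides whose mass falls inside the (placeholder) window."""
    if not peptides:
        raise ValueError("at least one peptide is required")
    detectable = [p for p in peptides if min_mass <= peptide_mass(p) <= max_mass]
    return len(detectable) / len(peptides)


if __name__ == "__main__":
    print(round(peptide_mass("LLNVTR"), 4))
    print(detection_efficiency(["GG", "LLNVTR"]))
```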

Getting Started

How to use single protein analysis?
How to use protein level tracking?