
iPEP v2.51


"The value and utility of any experiment are determined by the fitness of the material to the purpose for which it is used, and thus in the case before us it cannot be immaterial what plants are subjected to experiment and in what manner such experiment is conducted. "-Gregor Mendel
  1. Introduction
  2. Data source
  3. Digestions
  4. Workflow / Functionalities
  5. Terminology
  6. Measurement
  7. Getting started


Introduction

iPEP is a web-based application for comparing the effectiveness of different proteolytic digests. Peptide populations can be examined to optimize the detection of particular groups of proteins relative to the proteome and the digested peptidome. The application reports proteolytic peptide sequences, theoretical molecular weights, and functional annotations using Gene Ontology (GO) terms. iPEP can improve experimental design by maximizing the detectable proteins and consensus sites of research interest in large-scale proteomics assays.

Data source

SwissProt human protein data dated 07/20/2007 is used as the test dataset for the development of iPEP. It contains a total of 15939 proteins.


Digestions

Twenty-two digestions are available in iPEP. They are grouped into four categories based on their cleavage sites: the basic group, the acid group, the hydrophobic group, and combinations (combos).

Enzyme or Reagent   Cleavage                                Exception
Trypsin             C-terminal side of K or R               if P is C-terminal to K or R
Arg C               C-terminal side of R                    if P is C-terminal to R
Lys C               C-terminal side of K
Lys N               N-terminal side of K
Acid                C-terminal side of D
Asp N               N-terminal side of D
V8 phosphate        C-terminal side of D or E               if P is C-terminal to D or E, or if E is C-terminal to D or E
V8 bicarbonate      C-terminal side of E                    if P is C-terminal to E, or if E is C-terminal to E
Pepsin (pH 1.3)     C-terminal side of F, L
Pepsin (pH 2.0)     C-terminal side of F, L, W, Y, A, E, Q
Proteinase K        C-terminal side of A, F, Y, W, L, I, V
Thermolysin         N-terminal side of A, F, I, L, M, V     if D or E is N-terminal to A, F, I, L, M, V
Chymo (F/Y/W/M/L)   C-terminal side of F, L, M, W, Y
Chymo (F/Y/W)       C-terminal side of F, Y, W
CNBr                C-terminal side of M
IBA                 C-terminal side of W
NTCB                N-terminal side of C
v8_LysC             C-terminal side of D, E and K           if P is C-terminal to D or E, or if E is C-terminal to D or E
AspN_ nGlu          N-terminal side of D or E
AspN_LysC           N-terminal side of D or E
Trypsin_Chymo       C-terminal side of K, R, F, Y, W        if P is C-terminal to K, R, F, Y, W, or if P is N-terminal to Y
IBA_CNBr            C-terminal side of M, W
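
To make the cleavage rules concrete, here is a minimal Python sketch of an in silico digest for a few of the simpler entries. It is an illustration under assumptions, not iPEP's actual implementation: the `RULES` dictionary and the `digest` function are hypothetical names, each rule is written as a zero-width regular expression marking the cut position, and missed cleavages are ignored. Splitting on zero-width matches needs Python 3.7 or later.

```python
import re

# Hypothetical rule table: each pattern is a zero-width match at the cut site.
RULES = {
    "Trypsin": r"(?<=[KR])(?!P)",  # C-terminal side of K or R, not before P
    "Lys C": r"(?<=K)",            # C-terminal side of K
    "Lys N": r"(?=K)",             # N-terminal side of K
    "CNBr": r"(?<=M)",             # C-terminal side of M
}


def digest(sequence: str, enzyme: str) -> list[str]:
    """Cut `sequence` at every site allowed by `enzyme` (no missed cleavages)."""
    pieces = re.split(RULES[enzyme], sequence)
    # A cut at the very start or end of the sequence leaves an empty piece.
    return [p for p in pieces if p]


print(digest("MKRPAGKLLR", "Trypsin"))  # ['MK', 'RPAGK', 'LLR']
print(digest("MKRPAGKLLR", "Lys N"))    # ['M', 'KRPAG', 'KLLR']
```

In the trypsin example the bond after R in "KRP" is left intact because P follows it, which is the exception listed in the table.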

Workflow / Functionalities

  1. Single protein analysis

    A single protein can be digested with all 22 enzymes/reagents. The peptide, sequence, and residue coverages, as well as the detectability of a residue of interest, can be compared across the different digestions and visualized in table and chart views.

  2. Protein level tracking analysis

    Any motif of interest in Prosite format can be searched across the whole proteome digested with two selected enzymes/reagents. A modified residue within the motif can be specified by its position and searched within the proteome. Measures such as proteome complexity, peptidome complexity, fraction of proteome, and fraction of peptidome can be calculated and compared for the selected enzymes/reagents. Instrument-dependent limiting factors, such as molecular weight and basic residues, are applied as filters to test the detectability of the proteins and peptides containing the specified motif and residues. The protein datasets containing the motif after each filter can be downloaded in FASTA format for further analysis, and the Gene Ontology terms contributed by that subset of proteins are ranked to give further insight into the molecular functions, biological processes, and cellular components of these proteins. The proteins obtained from the two digestions at each filter can be compared using set operations such as union and intersection, which shows the unique and complementary contributions of each enzyme and enzyme combination.
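
The set comparison described above can be sketched in a few lines of Python. This is only an illustration: the function name `compare_digests` and the protein identifiers are hypothetical placeholders, not iPEP output.

```python
def compare_digests(first: set[str], second: set[str]) -> dict[str, set[str]]:
    """Compare the proteins that pass a filter under two digestions."""
    return {
        "union": first | second,         # detected by at least one digestion
        "intersection": first & second,  # detected by both digestions
        "only_first": first - second,    # unique contribution of the first
        "only_second": second - first,   # unique contribution of the second
    }


# Placeholder identifiers standing in for proteins passing a filter.
trypsin_hits = {"protA", "protB", "protC"}
lysc_hits = {"protB", "protC", "protD"}

result = compare_digests(trypsin_hits, lysc_hits)
print(sorted(result["only_first"]))   # ['protA']
print(sorted(result["only_second"]))  # ['protD']
```

The two difference sets are what show whether a second digestion adds proteins that the first one misses.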



Terminology

A motif is a short sequence of amino acids that is either
  1. an exact match to a given short sequence, or
  2. a match to a consensus regular expression (for example, C-[GAVLIP]-[GAVLIP]-X).
A motif can be written as a regular expression in Prosite format. For example, the CAAX motif at the C-terminus can be expressed as C-[GAVLIP]-[GAVLIP]-X-> using the following rules:
  1. Each residue must be separated by - (minus).
  2. X represents any amino acid.
  3. [DE] means either D or E.
  4. {FWY} means any amino acid except for F, W and Y.
  5. A(2,3) means that A appears 2 to 3 times consecutively.
  6. < means N-terminal
  7. > means C-terminal
  8. The pattern string must be terminated with . (period)
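
The rules above can be sketched as a small translator from a Prosite-style pattern to a Python regular expression. This is a minimal illustration, not the parser iPEP uses: the function name `prosite_to_regex` is hypothetical, and the sketch assumes that `<` and `>` may appear either attached to a residue or as their own dash-separated element, as in the CAAX example. The test sequence is made up for the example.

```python
import re


def prosite_to_regex(pattern: str) -> str:
    """Translate a Prosite-style pattern into a Python regular expression."""
    pattern = pattern.strip().rstrip(".")  # rule 8: drop the closing period
    parts = []
    for element in pattern.split("-"):     # rule 1: residues separated by '-'
        if not element:
            continue
        prefix = suffix = ""
        if element.startswith("<"):        # rule 6: N-terminal anchor
            prefix, element = "^", element[1:]
        if element.endswith(">"):          # rule 7: C-terminal anchor
            suffix, element = "$", element[:-1]
        repeat = ""
        counted = re.fullmatch(r"(.+?)\((\d+)(?:,(\d+))?\)", element)
        if counted:                        # rule 5: A(2,3) -> A{2,3}
            element, low, high = counted.groups()
            repeat = "{" + low + ("," + high if high else "") + "}"
        if element.startswith("{") and element.endswith("}"):
            element = "[^" + element[1:-1] + "]"  # rule 4: excluded residues
        elif element.upper() == "X":
            element = "."                  # rule 2: any amino acid
        parts.append(prefix + element + repeat + suffix)  # rule 3: [DE] as-is
    return "".join(parts)


caax = prosite_to_regex("C-[GAVLIP]-[GAVLIP]-X->.")
print(caax)                                   # C[GAVLIP][GAVLIP].$
print(bool(re.search(caax, "MGCMSCKCVLS")))   # True
```

Because the pattern is anchored at the C-terminus, a sequence with CVLS in the middle rather than at the end would not match.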

GO Enrichment Factor

  1. Indicates the significance of a GO term in a given subset of proteins.
  2. REF is the ratio of the prevalence of a given GO term in the subset of proteins to its prevalence in the total set of proteins.
  3. E-value = (mi/M) / (ni/N)
    • mi is the number of proteins in the subset associated with the GO term.
    • M is the total number of proteins in the subset (detectable).
    • ni is the number of proteins associated with the GO term in the whole proteome.
    • N is the total number of proteins in the whole proteome (reference).
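
As a worked illustration of the ratio defined above, the following Python sketch computes it from the four counts. The function name `enrichment_factor` and the example counts are hypothetical; only the formula comes from the definition.

```python
def enrichment_factor(m_i: int, M: int, n_i: int, N: int) -> float:
    """Return (m_i / M) / (n_i / N) for one GO term.

    m_i: proteins in the subset annotated with the term
    M:   total proteins in the subset
    n_i: proteins in the whole proteome annotated with the term
    N:   total proteins in the whole proteome
    """
    if M <= 0 or N <= 0 or n_i <= 0:
        raise ValueError("M, N and n_i must be positive")
    return (m_i / M) / (n_i / N)


# Made-up counts: the term covers 10% of the subset but 5% of the proteome.
print(enrichment_factor(m_i=10, M=100, n_i=50, N=1000))
```

A value above 1 means the GO term is more prevalent in the subset than in the reference proteome, and a value below 1 means it is less prevalent.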


Measurement

  1. Complexity of proteome: Number of proteins containing motif / Total number of proteins
  2. Efficiency of protein detection: Number of proteins in detectable range / Total number of proteins
  3. Detectable range proteome complexity: Number of proteins containing motif in detectable range / Total number of proteins in detectable range

  1. Complexity of peptidome: Number of peptides containing motif / Total number of peptides
  2. Efficiency of detecting the primary and secondary motifs using proteomics experiments with digestion
  3. Complexity of peptide: Number of peptides containing motif / Total number of peptides
  4. Efficiency of peptide detection: Number of peptides in detectable range / Total number of peptides
  5. Detectable range peptide complexity: Number of peptides containing motif (peptide) in detectable range / Total number of peptides in detectable range
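
The peptide-level ratios above can be sketched in Python as follows. This is a rough illustration under assumptions: `peptide_measures` is a hypothetical helper, the motif is passed as an ordinary Python regular expression, and detectability is supplied by the caller as a function. The length window in the example is only a stand-in for instrument limits such as molecular weight.

```python
import re
from typing import Callable


def peptide_measures(
    peptides: list[str],
    motif_regex: str,
    is_detectable: Callable[[str], bool],
) -> dict[str, float]:
    """Compute the three peptide-level ratios for one digestion."""
    if not peptides:
        raise ValueError("peptide list is empty")
    motif = re.compile(motif_regex)
    detectable = [p for p in peptides if is_detectable(p)]
    with_motif = [p for p in peptides if motif.search(p)]
    detectable_with_motif = [p for p in detectable if motif.search(p)]
    return {
        # peptides containing motif / total peptides
        "complexity": len(with_motif) / len(peptides),
        # peptides in detectable range / total peptides
        "efficiency": len(detectable) / len(peptides),
        # motif peptides in detectable range / peptides in detectable range
        "detectable_complexity": (
            len(detectable_with_motif) / len(detectable) if detectable else 0.0
        ),
    }


# Made-up peptides; the 3-10 residue window is an assumed detectability rule.
peptides = ["MK", "RPAGK", "LLR", "NGSCVLSK"]
measures = peptide_measures(
    peptides, r"C[GAVLIP][GAVLIP].", lambda p: 3 <= len(p) <= 10
)
print(measures["complexity"], measures["efficiency"])  # 0.25 0.75
```

The protein-level measures have the same shape, with proteins in place of peptides.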

Getting Started

  1. How to use single protein analysis?
  2. How to use protein level tracking?