Protein Structure Initiative

TargetDB Statistics Summary Report

                          Last updated: Nov 7 2008


Target Status Statistics

Total number of targets deposited in TargetDB by structural genomics (SG) centers worldwide: 200026

Table 1: TargetDB Status Statistics

Status | Total Number of Targets | (%) Relative to "Cloned" Targets | (%) Relative to "Expressed" Targets | (%) Relative to "Purified" Targets | (%) Relative to "Crystallized" Targets
Cloned | 132815 | 100.0 | - | - | -
Expressed | 88663 | 66.8 | 100.0 | - | -
Soluble | 34285 | 25.8 | 38.7 | - | -
Purified | 30958 | 23.3 | 34.9 | 100.0 | -
Crystallized | 11173 | 8.4 | 12.6 | 36.1 | 100.0
Diffraction-quality Crystals | 5597 | 4.2 | 6.3 | 18.1 | 50.1
Diffraction | 5049 | 3.8 | 5.7 | 16.3 | 45.2
NMR Assigned | 1775 | 1.3 | 2.0 | 5.7 | -
HSQC | 3390 | 2.6 | 3.8 | 11.0 | -
Crystal Structure | 4078 | 3.1 | 4.6 | 13.2 | 36.5
NMR Structure | 1691 | 1.3 | 1.9 | 5.5 | -
In PDB (Note 1) | 5992 | 4.5 | 6.8 | 19.4 | 39
Work Stopped | 33964 | - | - | - | -
Test Target | 61 | - | - | - | -
Other | 8054 | - | - | - | -


Note 1:   The "In PDB" count is the number of targets with status "In PDB". A target may reference several PDB IDs (for example, structures of the same polypeptide with different ligands). Conversely, multiple targets in TargetDB may identify the same PDB structure when the structure results from a collaboration between centers and each center includes the target on its own target list.
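Because targets and PDB entries are linked many-to-many, the number of targets with status "In PDB" need not equal the number of distinct PDB IDs. A minimal Python sketch of that effect follows; every target and PDB identifier in it is invented for illustration and does not come from TargetDB.

```python
from collections import defaultdict

# Hypothetical (target, PDB ID) links; all identifiers are invented for illustration.
links = [
    ("centerA:T1", "1ABC"),  # one target solved with three different ligands
    ("centerA:T1", "1ABD"),
    ("centerA:T1", "1ABE"),
    ("centerB:T7", "2XYZ"),  # a collaborative structure listed by two centers
    ("centerC:T3", "2XYZ"),
]

def count_in_pdb(links):
    """Return (number of targets linked to any PDB ID, number of distinct PDB IDs)."""
    targets = {target for target, _ in links}
    pdb_ids = {pdb_id for _, pdb_id in links}
    return len(targets), len(pdb_ids)

def shared_structures(links):
    """Return, sorted, the PDB IDs that more than one target points to."""
    targets_by_pdb = defaultdict(set)
    for target, pdb_id in links:
        targets_by_pdb[pdb_id].add(target)
    return sorted(p for p, ts in targets_by_pdb.items() if len(ts) > 1)

print(count_in_pdb(links))       # (3, 4)
print(shared_structures(links))  # ['2XYZ']
```

Here three targets map to four PDB IDs, while one PDB ID is claimed by two targets, so neither count can be derived from the other.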

Figure 1: Experimental Status in TargetDB


This graph is normalized relative to the number of cloned targets in TargetDB.
Targets that have progressed to status "Cloned" make up 66% of TargetDB.
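The percentage columns of Table 1 and the 66% share of cloned targets are plain ratios of status counts. The sketch below recomputes a few of them from the Table 1 counts; rounding to one decimal place (and to a whole number for the 66% figure) is an assumption that happens to reproduce the published values.

```python
TOTAL_TARGETS = 200026  # total targets in TargetDB as of Nov 7 2008

# Status counts copied from Table 1.
counts = {
    "Cloned": 132815,
    "Expressed": 88663,
    "Purified": 30958,
    "Crystallized": 11173,
    "Crystal Structure": 4078,
}

def relative_pct(status, reference):
    """Percentage of targets at `status` relative to targets at `reference`, one decimal."""
    return round(100 * counts[status] / counts[reference], 1)

print(relative_pct("Expressed", "Cloned"))                # 66.8
print(relative_pct("Crystal Structure", "Crystallized"))  # 36.5
print(round(100 * counts["Cloned"] / TOTAL_TARGETS))      # 66
```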

Table 2: TargetDB Status Statistics by Organism

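Organism | Total Number (Note 1) | Work Stopped | Cloned | Expressed | Purified | Crystallized | Crystal Structure | NMR Structure | In PDB (Note 2)
Total Viruses | 685 | 117 | 357 | 237 | 131 | 34 | 27 | 8 | 32
Archaea | 13971 | 2044 | 10196 | 6965 | 3204 | 1263 | 610 | 48 | 701
Bacteria | 118953 | 15180 | 81747 | 60187 | 21230 | 8427 | 2954 | 196 | 3342
Total Prokaryotes | 132924 | 17224 | 91943 | 67152 | 24434 | 9690 | 3564 | 244 | 4043
Yeast | 2651 | 666 | 1972 | 1372 | 797 | 116 | 55 | 14 | 53
Plasmodium | 5201 | 336 | 2958 | 1264 | 201 | 67 | 19 | 0 | 19
Trypanosoma | 6420 | 79 | 3974 | 1930 | 301 | 59 | 9 | 0 | 8
Leishmania | 9597 | 288 | 4575 | 2210 | 404 | 146 | 21 | 0 | 17
Arabidopsis | 8126 | 5392 | 4029 | 1279 | 326 | 82 | 36 | 53 | 88
Rice | 134 | 101 | 126 | 62 | 12 | 4 | 1 | 0 | 1
Nematode | 15075 | 3464 | 12645 | 5596 | 426 | 99 | 30 | 7 | 38
Fly | 934 | 271 | 171 | 94 | 42 | 5 | 3 | 3 | 6
Mouse | 2437 | 926 | 1819 | 1445 | 763 | 206 | 68 | 267 | 336
Human | 12866 | 4439 | 6677 | 4853 | 2733 | 527 | 161 | 1079 | 1243
Other Eukaryotes | 2740 | 661 | 1510 | 1121 | 358 | 117 | 75 | 14 | 95
Total Eukaryotes | 66181 | 16623 | 40456 | 21226 | 6363 | 1428 | 478 | 1437 | 1904
Synthetic | 3 | 0 | 3 | 3 | 3 | 1 | 1 | 2 | 3
Unknown | 1 | 0 | 1 | 1 | 0 | 0 | 0 | 0 | 0
Total | 199794 | 33964 | 132760 | 88619 | 30931 | 11153 | 4070 | 1691 | 5982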


Note 1:   Counts in this table may not add up to the total number of targets in TargetDB. A target that is a hybrid complex (for example, a complex of human and mouse polypeptides) is counted under each organism classification it involves.

Note 2:   The "In PDB" count is the number of targets with status "In PDB". A target may reference several PDB IDs (for example, structures of the same polypeptide with different ligands). Conversely, multiple targets in TargetDB may identify the same PDB structure when the structure results from a collaboration between centers and each center includes the target on its own target list.
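Note 1 of Table 2 means the organism rows are not a strict partition of targets: a hybrid complex adds one to every organism it involves. A minimal sketch of that counting rule, using three hypothetical target records (not TargetDB data):

```python
from collections import Counter

# Hypothetical targets, each mapped to the set of source organisms of its polypeptides.
targets = {
    "T1": {"Human"},
    "T2": {"Mouse"},
    "T3": {"Human", "Mouse"},  # hybrid complex of human and mouse polypeptides
}

def count_by_organism(targets):
    """Count each target once under every organism it involves."""
    counter = Counter()
    for organisms in targets.values():
        counter.update(organisms)  # adds 1 for each organism in the set
    return counter

by_org = count_by_organism(targets)
print(by_org["Human"], by_org["Mouse"])     # 2 2
print(sum(by_org.values()), len(targets))   # 4 3
```

Summing the per-organism counts gives 4 although there are only 3 distinct targets, which is why a per-organism total cannot simply be compared with the overall target count.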

Figure 2: Source Organisms in TargetDB



Deposited Structure Statistics

Number of released X-Ray structures reported to TargetDB: 4448

Number of released NMR structures reported to TargetDB: 1713

Number of released Cryo-Electron Microscopy structures reported to TargetDB: 3

Total number of released structures from worldwide SG Centers reported to TargetDB: 6164
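As a consistency check, the released total is the sum of the three experimental methods listed above, and it equals the "Released" count for All Centers in Table 3:

```latex
4448_{\text{X-ray}} + 1713_{\text{NMR}} + 3_{\text{cryo-EM}} = 6164
```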


Table 3: PDB Status Statistics for Structural Genomics Structures

Status | All Centers | PSI Centers | Non-PSI SG Centers in North America | SG Centers in Europe | SG Centers in Asia
Total Deposited | 6255 | 3293 | 151 | 136 | 2687
Released | 6164 | 3244 | 144 | 130 | 2658
Release on Publication | 28 | 0 | 1 | 0 | 27
Release on Certain Date | 0 | 0 | 0 | 0 | 0
In Process | 63 | 49 | 6 | 6 | 2
Note 1:   Some PDB IDs are cross-referenced by more than one center. For example, PDB ID 106Y is associated with both the SPINE and TB centers. As a result, the number of structures in the "All Centers" column can differ from the direct sum of the per-project or per-region columns.
Note 2:   "Total Deposited" counts all structures in the PDB, including structures already released to the public and structures still awaiting release ("Release on Publication", "Release on Certain Date", "In Process").
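Both notes can be checked against the Table 3 counts. The sketch below copies those counts (column order: All Centers, PSI, Non-PSI North America, Europe, Asia), verifies that Total Deposited equals Released plus the three not-yet-released rows in every column, and computes how far the direct sum of the four project/region columns exceeds All Centers. A positive excess is what cross-referenced IDs would produce; the table alone does not show which IDs are involved.

```python
# Counts from Table 3; columns: All Centers, PSI, Non-PSI North America, Europe, Asia.
table3 = {
    "Total Deposited":         [6255, 3293, 151, 136, 2687],
    "Released":                [6164, 3244, 144, 130, 2658],
    "Release on Publication":  [28, 0, 1, 0, 27],
    "Release on Certain Date": [0, 0, 0, 0, 0],
    "In Process":              [63, 49, 6, 6, 2],
}

PENDING = ("Release on Publication", "Release on Certain Date", "In Process")

def deposited_matches(table):
    """True if Total Deposited == Released + all pending rows, in every column."""
    for col in range(len(table["Total Deposited"])):
        parts = table["Released"][col] + sum(table[row][col] for row in PENDING)
        if parts != table["Total Deposited"][col]:
            return False
    return True

def regional_excess(table, row="Released"):
    """Direct sum of the four project/region columns minus the All Centers value."""
    all_centers, *regions = table[row]
    return sum(regions) - all_centers

print(deposited_matches(table3))                    # True
print(regional_excess(table3))                      # 12
print(regional_excess(table3, "Total Deposited"))   # 12
```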

Figure 3: Structures Released by SG Centers by Year



Sequence Redundancy Statistics

Table 4: TargetDB Sequence Redundancy Statistics by Experimental Status

Sequence Identity (%) | Novel Targets (Selected) | Novel Targets (Cloned) | Novel Targets (Expressed) | Novel Targets (Purified) | Novel Targets (Crystallized) | Novel Targets (Crystal Structure) | Novel Targets (NMR Structure) | Novel Targets (In PDB)
<100 | 126672 | 90922 | 60928 | 23701 | 9052 | 3488 | 1566 | 4875
<90 | 116074 | 85184 | 57312 | 22408 | 8643 | 3323 | 1546 | 4685
<70 | 105202 | 78621 | 53545 | 21259 | 8429 | 3277 | 1422 | 4527
<50 | 86255 | 66658 | 46023 | 18648 | 7826 | 3121 | 1269 | 4214
<30 | 48127 | 40569 | 28867 | 12470 | 6022 | 2577 | 894 | 3295
Last updated: 08-04-08
Sequence redundancy is calculated by clustering analysis with the BLASTClust program, using the given percent sequence identity as the clustering threshold. The calculations compare each sequence against all protein sequences in TargetDB that are in the same experimental status category and are at least 20 amino acids long.
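BLASTClust groups sequences by single-linkage clustering of pairwise BLAST matches. The sketch below is not BLASTClust: it assumes pairwise percent identities are already available (the values are hypothetical), drops sequences shorter than 20 residues as described above, links any pair at or above the threshold, and counts the resulting clusters with a union-find structure. Treating each cluster as one novel target is an assumption about how the table counts are derived.

```python
MIN_LENGTH = 20  # sequences shorter than this are excluded, as stated in the text

def count_clusters(lengths, identities, threshold):
    """Count single-linkage clusters of sequences.

    lengths: dict seq_id -> sequence length
    identities: dict (id_a, id_b) -> percent identity (hypothetical precomputed values)
    threshold: percent identity at or above which two sequences are linked
    """
    ids = [s for s, n in lengths.items() if n >= MIN_LENGTH]
    parent = {s: s for s in ids}

    def find(s):
        while parent[s] != s:
            parent[s] = parent[parent[s]]  # path halving keeps trees shallow
            s = parent[s]
        return s

    for (a, b), pct in identities.items():
        if a in parent and b in parent and pct >= threshold:
            parent[find(a)] = find(b)  # merge the two clusters

    return len({find(s) for s in ids})

# Hypothetical data: D is too short and is ignored.
lengths = {"A": 150, "B": 148, "C": 210, "D": 15}
identities = {("A", "B"): 85.0, ("B", "C"): 40.0, ("A", "C"): 35.0, ("C", "D"): 95.0}

print(count_clusters(lengths, identities, 90))  # 3
print(count_clusters(lengths, identities, 50))  # 2
print(count_clusters(lengths, identities, 30))  # 1
```

Lowering the threshold merges more sequences, so the cluster count falls, which mirrors how the counts in Table 4 shrink from the <100 row to the <30 row.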

Table 5: Sequence Redundancy Statistics for Structures Released by SG Centers in the PDB by Year

Year | Released Structures | Released Structures with <30% Sequence Identity at Time of Release | Percent (%) of Released Structures with <30% Sequence Identity at Time of Release
<= 2000 | 85 | 31 | 36
2001 | 60 | 24 | 40
2002 | 152 | 57 | 38
2003 | 388 | 152 | 39
2004 | 910 | 381 | 42
2005 | 1023 | 364 | 36
2006 | 1088 | 450 | 41
2007 | 1556 | 579 | 37
2008 | 902 | 440 | 49
Total | 6164 | 2478 | 40
Last updated: 08-11-07
Sequence redundancy is calculated by clustering analysis with the BLASTClust program, using the given percent sequence identity as the clustering threshold. The calculations compare each sequence against all protein sequences in the PDB that are at least 20 amino acids long.
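The percent column of Table 5 is the ratio of the two count columns. The sketch below recomputes it from the Table 5 counts; rounding half up to a whole number is an assumption that reproduces the published values (it matters for 2002, where the exact value is 37.5).

```python
import math

# (year, released structures, released with <30% identity at time of release), from Table 5
rows = [
    ("<= 2000", 85, 31), ("2001", 60, 24), ("2002", 152, 57), ("2003", 388, 152),
    ("2004", 910, 381), ("2005", 1023, 364), ("2006", 1088, 450),
    ("2007", 1556, 579), ("2008", 902, 440),
]

def percent_novel(released, novel):
    """Percent of released structures below 30% identity, rounded half up."""
    return math.floor(100 * novel / released + 0.5)

for year, released, novel in rows:
    print(year, percent_novel(released, novel))

total_released = sum(r for _, r, _ in rows)
total_novel = sum(n for _, _, n in rows)
print(total_released, total_novel, percent_novel(total_released, total_novel))  # 6164 2478 40
```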

Figure 4: Comparison of Novel Structures with Number of Structures Released By SG Centers



© RCSB PDB