How many proteins on PDB have unknown function?


I was wondering how many of the entries in the Protein Data Bank (PDB) describe proteins of unknown function. The only paper I can find from an internet search is this one from 2012, which I assume might be outdated. I'd welcome suggestions on how to find this information for myself.

The article cited in the question indicates that the authors searched the PDB with the term “unknown function”. There is nothing special about this: you just type the term in the standard search field and hit 'Go'. I ran a search of this type myself, and it returned 4,384 structures out of approximately 149,600 in the data bank.
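For anyone who would rather repeat the count programmatically, here is a minimal sketch in Python. It assumes the RCSB PDB Search API accepts a JSON full-text query at the endpoint shown and reports the hit count in a `total_count` field. Both are assumptions to check against the current API documentation, and the helper names are made up for illustration. The last lines only redo the arithmetic for the figures quoted above.

```python
import json
import urllib.request

# Assumed endpoint of the RCSB PDB Search API; verify before relying on it.
SEARCH_URL = "https://search.rcsb.org/rcsbsearch/v2/query"


def build_full_text_query(phrase):
    """Build a full-text query payload that asks for a count of matching entries."""
    return {
        "query": {
            "type": "terminal",
            "service": "full_text",
            "parameters": {"value": phrase},
        },
        "return_type": "entry",
        "request_options": {"return_counts": True},
    }


def count_entries(phrase, url=SEARCH_URL):
    """POST the query and return the number of hits (requires network access)."""
    body = json.dumps(build_full_text_query(phrase)).encode("utf-8")
    request = urllib.request.Request(
        url, data=body, headers={"Content-Type": "application/json"}
    )
    with urllib.request.urlopen(request, timeout=30) as response:
        # "total_count" is the assumed name of the count field in the response.
        return json.load(response)["total_count"]


def fraction_percent(hits, total):
    """Express hits as a percentage of total."""
    return 100.0 * hits / total


# Figures quoted in the answer above; no network call is made here.
print(f"{fraction_percent(4384, 149_600):.1f}% of entries matched")
```

Calling `count_entries("unknown function")` would give today's number, which will differ from the figure above as the archive grows. A full-text match also counts entries that merely mention the phrase, so treat the result as an upper bound rather than a curated count.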

Of course, it is evident from the first page of the results that the number of unique proteins is smaller than this because a single study may examine different forms of the same protein.

I admit that I was surprised to find that people had spent time and money determining the structures of so many proteins of unknown function, but it appears that there is at least one blanket initiative to determine the structures of bacterial proteins of unknown function because of their potential roles in pathogenicity. The idea seems to be that even if their mechanism of pathogenicity is unknown, they could still be targeted on the basis of their structure.

Ebola Virus Proteins

At the center of the virus, a nucleocapsid composed of several types of proteins protects the genome. The ebola nucleoprotein wraps around the RNA, creating a helical complex. The interaction between nucleoprotein subunits, however, is not as rigid as in other viruses such as tobacco mosaic virus, so ebola virus often shows a wavy structure. Once inside cells, the large "L" protein, which is an RNA-dependent RNA polymerase, creates many new copies of the RNA genome.

As with the other ebola proteins, the nucleocapsid proteins contain several flexibly-connected domains, so researchers have studied them in parts. The RNA-binding portion of nucleoprotein has been studied by cryoelectron microscopy (PDB entry 5z9w), and x-ray crystallography has been used to study other parts of the protein (PDB entry 4qb0). Several other nucleocapsid proteins, which assist with formation of the structure, have also been studied (PDB entries 3vne, 3fke and 2i8b).

Exploring the Structure

Ebola Glycoprotein and Antibodies (PDB entry 3csy)

Researchers are looking hard for ways to fight infection by ebola, both with drugs and with vaccines. The glycoprotein is the major target for vaccines, since it is on the surface of the virus and is accessible to antibodies. The structure shown here, PDB entry 3csy, includes neutralizing antibodies (in red and orange) from a person who survived infection by the virus. The antibodies bind to the underside of the glycoprotein, to a portion of the protein that is not usually masked by carbohydrates and that is essential for the process of fusion. Hopefully, vaccines will be able to elicit these types of antibodies in patients, protecting them from infection.

Topics for Further Discussion

  1. Entry 3csy is thought to be the ebola glycoprotein before it binds to a cell surface. You can look at entry 2ebo to see a portion of the glycoprotein after it fuses with the cell.

References

  1. 5z9w: Y. Sugita, H. Matsunami, Y. Kawaoka, T. Noda & M. Wolf (2018) Cryo-EM structure of the Ebola virus nucleoprotein-RNA complex at 3.6 angstrom resolution. Nature 563, 137-140.
  2. 4qb0: P. J. Dziubanska, U. Derewenda, J. F. Ellena, D. A. Engle & Z. S. Derewenda (2014) The structure of the C-terminal domain of Zaire ebolavirus nucleoprotein. Acta Crystallographica Section D 70, 2420-2429.
  3. T. F. Booth, M. J. Rabb & D. R. Beniac (2013) How do filovirus filaments bend without breaking? Trends in Microbiology 21, 583-593.
  4. 4ldb, 4ldd: Z. A. Bornholdt, T. Noda, D. M. Abelson, P. Halfmann, M. R. Wood, Y. Kawaoka & E. O. Saphire (2013) Structural rearrangement of ebola virus VP40 begets multiple functions in the virus life cycle. Cell 154, 763-774.
  5. 3csy: J. E. Lee, M. L. Fusco, A. J. Hessell, W. B. Oswald, D. R. Burton & E. O. Saphire (2008) Structure of the ebola virus glycoprotein bound to an antibody from a human survivor. Nature 454, 177-182.
  6. 3vne: A. P. P. Zhang, Z. A. Bornholdt, T. Liu, D. M. Abelson, D. E. Lee, S. Li, V. L. Woods & E. O. Saphire (2012) The ebola virus interferon antagonist VP24 directly binds STAT1 and has a novel, pyramidal fold. PLoS Pathogens 8: e1002550.
  7. 3fke: D. W. Leung, N. D. Ginder, D. B. Fulton, J. Nix, C. F. Basler, R. B. Honzatko & G. K. Amarasinghe (2009) Structure of the ebola VP35 interferon inhibitory domain. Proceedings of the National Academy of Sciences USA 106, 411-416.
  8. 2i8b: B. Hartlieb, T. Muziol, W. Weissenhorn & S. Becker (2007) Crystal structure of the C-terminal domain of ebola virus VP30 reveals a role in transcription and nucleocapsid association. Proceedings of the National Academy of Sciences USA 104, 624-629.
  9. 1h2c: F. X. Gomis-Ruth, A. Dessen, J. Timmins, A. Bracher, L. Kolesnikowa, S. Becker, H. D. Klenk & W. Weissenhorn (2003) The matrix protein VP40 from ebola virus octamerizes into pore-like structures with specific RNA binding properties. Structure 11, 423-433.

October 2014, David Goodsell

About PDB-101

PDB-101 helps teachers, students, and the general public explore the 3D world of proteins and nucleic acids. Learning about their diverse shapes and functions helps to understand all aspects of biomedicine and agriculture, from protein synthesis to health and disease to biological energy.

Why PDB-101? Researchers around the globe make these 3D structures freely available at the Protein Data Bank (PDB) archive. PDB-101 builds introductory materials to help beginners get started in the subject ("101", as in an entry level course) as well as resources for extended learning.

GFP-like Proteins

Researchers have used these fluorescent proteins in many clever ways. For example, to study protein interactions, they can split GFP into two pieces and attach one piece to each protein. Then, if the two proteins get close to one another in the cell, the GFP will assemble and light up. PDB entry 4kf5 shows an example of a split GFP, in this case used as a method to assist crystallization of proteins for structure determination. Two beta strand segments of GFP are fused to a protein of interest (in this case, sfCherry, colored red here), and a version of GFP is created that lacks these two strands (colored green). When the two engineered proteins are mixed together and interact with one another, the two portions of GFP assemble into a functionally fluorescent protein, with the cargo of sfCherry.

Topics for Further Discussion

  1. Structures for several proteins fused with GFP are available in the PDB archive; for instance, PDB entry 4anj has GFP fused with myosin.
  2. Be sure to look around the internet for micrograph images of cells with GFP-labeled proteins; for particularly beautiful images, try searching for "brainbow" or "fluorescence micrograph cytoskeleton".

References

  1. D. M. Chudakov, M. V. Matz, S. Lukyanov & K. A. Lukyanov (2010) Fluorescent proteins and their applications in imaging living cells and tissues. Physiological Reviews 90, 1103-1163.
  2. 4kf5: H. B. Nguyen, L. W. Hung, T. O. Yeates, T. C. Terwilliger & G. S. Waldo (2013) Split green fluorescent protein as a modular binding partner for protein crystallization. Acta Crystallographica Section D 69, 2513-2523.
  3. 4ar7: D. Von Stetten, M. Noirclerc-Savoye, J. Goedhart, T. W. J. J. Gadella & A. Royant (2012) Structure of a fluorescent protein from Aequorea victoria bearing the obligate-monomer mutation A206K. Acta Crystallographica Section F 68, 878.
  4. 2y0g: A. Royant & M. Noirclerc-Savoye (2011) Stabilizing role of glutamic acid 222 in the structure of enhanced green fluorescent protein. Journal of Structural Biology 174, 385-390.
  5. 3m24: O. M. Subach, V. N. Malashkevich, W. D. Zencheck, K. S. Morozova, K. D. Piatkevich, S. C. Almo & V. V. Verkhusha (2010) Structural characterization of acylimine-containing blue and red chromophores in mTagBFP and TagRFP fluorescent proteins. Chemistry & Biology 17, 333-341.
  6. 2q57: G. D. Malo, L. J. Pouwels, M. Wang, A. Weichsel, W. R. Montfort, M. A. Rizzo, D. W. Piston & R. M. Wachter (2007) X-ray structure of Cerulean GFP: a tryptophan-based chromophore useful for fluorescence lifetime imaging. Biochemistry 46, 9865-9873.
  7. 2h5o, 2h5q: X. Shu, N. C. Shaner, C. A. Yarbrough, R. Y. Tsien & S. J. Remington (2006) Novel chromophores and buried charges control color in mFruits. Biochemistry 45, 9639-9647.
  8. 1huy: O. Griesbeck, G. S. Baird, R. E. Campbell, D. A. Zacharias & R. Y. Tsien (2001) Reducing the environmental sensitivity of yellow fluorescent protein. Journal of Biological Chemistry 276, 29188-29194.
  9. 1g7k: D. Yarbrough, R. M. Wachter, K. Kallio, M. V. Matz & S. J. Remington (2001) Refined crystal structure of DsRed, a red fluorescent protein from coral, at 2.0-A resolution. PNAS USA 98, 462-467.

Researchers describe protein previously unknown in biology

Ball-and-stick model of part of activated pig aconitase centered on (4Fe4S) cluster bound to cysteine-385, -448, -451, after PDB 7ACN. Credit: wikimedia commons

University of Georgia researchers have discovered a new way that iron is stored in microorganisms, a finding that provides new insights into the fundamental nature of how biological systems work. The research was recently published in the journal Nature Communications.

Iron, a metal that is required by all living organisms, is usually stored with oxygen inside a cell in a complex within a large protein known as ferritin. Researchers have now discovered a new type of protein, known as IssA, that stores iron with sulfur, instead of oxygen, in the form of an iron-sulfur polymer known as thioferrate.

"This iron-sulfur polymer has been made previously in a test tube but this is the first time thioferrate has been identified in a biological system," said Michael W. Adams, lead author and Distinguished Research Professor in the department of biochemistry and molecular biology. "In addition, this single type of protein, IssA, self-assembles into extremely large complexes or nanoparticles that can be more than 20 times the size of ferritin. The IssA nanoparticles are so large that they are visible inside whole cells using a microscope."

Researchers also discovered that this new protein plays a role not only in the storage of iron, but also in the assembly of proteins that contain iron-sulfur clusters.

"This work provides new insights into how microorganisms can store iron and also sulfur, and how single proteins can self-assemble into nanoparticles," said Adams. "It also gives a new perspective on how iron-sulfur clusters are synthesized in biological systems."

"Iron sulfur cluster-containing proteins are ubiquitous in biology where the clusters are used to catalyze chemical reactions or to transport electrons, for example, during respiration," he added. "In doing this research, we were interested in elucidating the function and biosynthesis of iron-sulfur clusters."

In the lab, the team grew microorganisms on a large scale, purified them and then were able to characterize a variety of iron-sulfur proteins and enzymes.

"From our genetic analyses of the organism we knew that IssA was a major protein in the cell, and during our biochemical analyses we noticed IssA due to its extremely large size. Its high abundance and large size made it quite easy to purify," he said. "With the purified protein we could apply various analytical, spectroscopic and microscopic techniques and that led us to conclude that IssA was a nanoparticle and contained thioferrate, an iron-sulfur polymer not previously seen in biology. With the pure IssA protein we could also generate antibodies, and this enabled us to visualize IssA in whole cells of the microorganism as a large complex within the cell."

While research of this nature provides fundamental knowledge about how biological systems work, the research could one day be used to engineer nanoparticles for medical or other applications.

"Nanoparticles are used in many medical and electronic applications, although they are typically made of inorganic components," he said. "Engineering protein nanoparticles might be possible if we could understand the properties of IssA that enable it to assemble into nanoparticle-like structures. It is also possible that nanoparticles built on the IssA protein but containing other inorganic materials could have applications."


PDB50 will mark an important milestone in the history of structural biology. In 1971, the structural biology community established the single worldwide archive for macromolecular structure data: the Protein Data Bank (PDB). From its inception, the PDB has embraced a culture of open access, leading to its widespread use by the research community. PDB data are used by hundreds of data resources and millions of users exploring fundamental biology, energy and biomedicine.

Structural biology and structural bioinformatics have had an enormous impact on our understanding of the mechanism and function of biological macromolecules. The PDB acts as a custodian for all these data, representing a repository of the vast majority of the achievements and milestones of the structural biology community. The archive is managed by the Worldwide Protein Data Bank consortium (wwPDB) of partner sites in Asia, Europe and America.

This celebration of the 50th anniversary of the founding of the Protein Data Bank as the first open access digital data resource in biology will include presentations from speakers from around the world who have made tremendous advances in structural biology and bioinformatics. Students and postdoctoral fellows are especially encouraged to attend and will be eligible for poster prizes.

Early and late stage career scientists are encouraged to submit an abstract for poster presentation during the symposium.

The online sessions will take place between 11 a.m. and 4:30 p.m. EDT each day.

The event will be recorded and made available to registered participants after the meeting.

Eddy Arnold

  • Rutgers, The State University of New Jersey
  • Using HIV-1 reverse transcriptase structures to guide anti-AIDS drug discovery

Helen M. Berman

  • Rutgers, The State University of New Jersey
  • University of Southern California
  • The evolution of the Protein Data Bank as a community resource

Thomas L. Blundell

  • University of Cambridge
  • A personal history of five decades of structural biology and the PDB: From the X-ray structure of 2-Zinc insulin hexamer in 1970 to Cryo-EM structures of DNA-PK from DNA repair in 2020

Alexandre M. J. J. Bonvin

  • Utrecht University
  • Solving 3D puzzles by integrative modelling using PDB structures

Stephen K. Burley

  • Rutgers, The State University of New Jersey
  • University of California, San Diego
  • Impact of structural biologists and fifty years of Protein Data Bank operations on drug discovery and development

Wah Chiu

Johann Deisenhofer

  • University of Texas Southwestern Medical Center
  • 50 years of PDB: from crazy idea to treasure

Juli Feigon

  • University of California, Los Angeles
  • Structural biology of telomerase

Angela M. Gronenborn

  • University of Pittsburgh
  • Integrated BioNMR: getting by with a little help from my friends

Jennifer L. Martin

  • University of Wollongong
  • Science, crystallography, reflections: A journey with the PDB over 35 years

Stephen L. Mayo

  • California Institute of Technology
  • Antibody small molecule conjugates with computationally designed target binding synergy

Zihe Rao

  • ShanghaiTech University
  • Tsinghua University
  • Structural insight into SARS-CoV-2 replication and transcription complex (RTC)

Hao Wu

  • Harvard Medical School
  • Boston Children's Hospital
  • "Speck"tacular inflammasomes: structures of supramolecular complexes in innate immunity


  • Celia Schiffer, University of Massachusetts Medical School
  • Helen M. Berman, Rutgers, The State University of New Jersey (RCSB PDB)
  • Stephen K. Burley, Rutgers, The State University of New Jersey (RCSB PDB)
  • Jeffrey C. Hoch, University of Connecticut (BMRB)
  • Gerard J. Kleywegt, European Bioinformatics Institute (PDBe)
  • Genji Kurisu, Osaka University (PDBj)
  • John L. Markley, University of Wisconsin–Madison (BMRB)
  • Sameer Velankar, European Bioinformatics Institute (PDBe)
  • Christine Zardecki, Rutgers, The State University of New Jersey (RCSB PDB)

Acknowledgement: Illustration by David S. Goodsell, The Scripps Research Institute. doi: 10.2210/rcsb_pdb/goodsell-gallery-003

This illustration shows a cross-section through the blood, with blood serum in the upper half and a red blood cell in the lower half. In the serum, look for Y-shaped antibodies, long thin fibrinogen molecules (in light red) and many small albumin proteins. The large UFO-shaped objects are low density lipoprotein and the six-armed protein is complement C1. The red blood cell is filled with hemoglobin, in red. The cell membrane, in purple, is braced on the inner surface by long spectrin chains connected at one end to a small segment of actin filament.


The primary types and functions of proteins are listed in Table 1.

Table 1. Protein Types and Functions

Type | Examples | Functions
Digestive Enzymes | Amylase, lipase, pepsin, trypsin | Help in digestion of food by catabolizing nutrients into monomeric units
Transport | Hemoglobin, albumin | Carry substances in the blood or lymph throughout the body
Structural | Actin, tubulin, keratin | Construct different structures, like the cytoskeleton
Hormones | Insulin, thyroxine | Coordinate the activity of different body systems
Defense | Immunoglobulins | Protect the body from foreign pathogens
Contractile | Actin, myosin | Effect muscle contraction
Storage | Legume storage proteins, egg white (albumin) | Provide nourishment in early development of the embryo and the seedling

Two special and common types of proteins are enzymes and hormones. Enzymes, which are produced by living cells, are catalysts in biochemical reactions (like digestion) and are usually complex or conjugated proteins. Each enzyme is specific for the substrate (a reactant that binds to an enzyme) it acts on. The enzyme may help in breakdown, rearrangement, or synthesis reactions. Enzymes that break down their substrates are called catabolic enzymes, enzymes that build more complex molecules from their substrates are called anabolic enzymes, and enzymes that affect the rate of reaction are called catalytic enzymes. It should be noted that all enzymes increase the rate of reaction and, therefore, are considered to be organic catalysts. An example of an enzyme is salivary amylase, which hydrolyzes its substrate amylose, a component of starch.

Hormones are chemical-signaling molecules, usually small proteins or steroids, secreted by endocrine cells that act to control or regulate specific physiological processes, including growth, development, metabolism, and reproduction. For example, insulin is a protein hormone that helps to regulate the blood glucose level.

Proteins have different shapes and molecular weights; some proteins are globular in shape whereas others are fibrous in nature. For example, hemoglobin is a globular protein, but collagen, found in our skin, is a fibrous protein. Protein shape is critical to its function, and this shape is maintained by many different types of chemical bonds. Changes in temperature, pH, and exposure to chemicals may lead to permanent changes in the shape of the protein, leading to loss of function, known as denaturation. Different arrangements of the same 20 types of amino acids comprise all proteins. Two rare new amino acids were discovered recently (selenocysteine and pyrrolysine), and additional new discoveries may be added to the list.

In Summary: Function of Proteins

Proteins are a class of macromolecules that perform a diverse range of functions for the cell. They help in metabolism by providing structural support and by acting as enzymes, carriers, or hormones. The building blocks of proteins (monomers) are amino acids. Each amino acid has a central carbon that is linked to an amino group, a carboxyl group, a hydrogen atom, and an R group or side chain. There are 20 commonly occurring amino acids, each of which differs in the R group. Each amino acid is linked to its neighbors by a peptide bond. A long chain of amino acids is known as a polypeptide.

Proteins are organized at four levels: primary, secondary, tertiary, and (optional) quaternary. The primary structure is the unique sequence of amino acids. The local folding of the polypeptide to form structures such as the α helix and β-pleated sheet constitutes the secondary structure. The overall three-dimensional structure is the tertiary structure. When two or more polypeptides combine to form the complete protein structure, the configuration is known as the quaternary structure of a protein. Protein shape and function are intricately linked; any change in shape caused by changes in temperature or pH may lead to protein denaturation and a loss in function.

Author Summary

Genome sequencing has led to the discovery of many new gene products, proteins. These discoveries hold tremendous potential for totally new approaches to the diagnosis and treatment of disease. To realize this potential, one important step is to understand the function of the thousands of proteins whose function is currently unknown. One of these proteins of unknown function is human DJ-1, a protein that appears to play a protective role against Parkinson and other neurodegenerative diseases. Here we present a computational approach to the classification by function of DJ-1 and its family members. Eight DJ-1 family members, all with similar 3-D structure, are analyzed. Three different probable functional classes emerge from this analysis on six of the family members, all with a simple calculation.

Citation: Wei Y, Ringe D, Wilson MA, Ondrechen MJ (2007) Identification of Functional Subclasses in the DJ-1 Superfamily Proteins. PLoS Comput Biol 3(1): e15.

Editor: Luhua Lai, Peking University, China

Received: September 11, 2006 Accepted: December 7, 2006 Published: January 26, 2007

Copyright: © 2007 Wei et al. This is an open-access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.

Funding: The support of US National Science Foundation grant MCB-0517292 is gratefully acknowledged.

Competing interests: The authors have declared that no competing interests exist.

Abbreviations: PDB, Protein Data Bank; THEMATICS, theoretical microscopic titration curves

Sequence-based classification of proteins

The first hurdle for any functional annotation process is to define 'function'. If the protein is an enzyme, then simply using the EC numbering scheme (see Box 1) can be useful. In general, however, the problem is multi-dimensional: a protein can have a molecular function, a cellular role, and be part of a functional complex or pathway (these are the distinctions used in the Gene Ontology (GO; see Box 1) [6]). Furthermore, certain aspects of molecular function can be illustrated by multiple descriptive levels (for example, the coarse 'enzyme' category versus a more specific 'protease' assignment). Even the more detailed definition would not reveal the cellular role of the protein (apoptosis, metabolism, blood coagulation, and so on).

Most function-prediction methods, both sequence and structure based, rely on inferring relationships between proteins that permit the transfer of functional annotations and binding specificities from one to the other. A notable challenge here is deciphering the connection between the detected similarities (structural or in sequence) and the actual level of functional relatedness. Function is often associated with domains, and another problem is the identification of functional domains from sequence alone. The accuracy of current methods for predicting domain boundaries is not yet completely satisfactory. Several methods provide reliable predictions if a structural template for the protein is available, but when this is not the case, one is left with the problem of whether the experimental annotation used for the inference refers to the same domain for which the sequence similarity/motif is established [7].

The function of a protein can also be inferred from its evolutionary relationship with proteins of known function, provided that the relationship is properly inspected. Orthologous proteins in different species most often share function, but paralogy (that is, divergence following duplication of the original gene) does not guarantee common function. Distinguishing between orthology and paralogy can be attempted on the basis of observed sequence-similarity patterns, by analyzing the specific conservation pattern of residues responsible for function in the family, or on the basis of the protein structure (either experimentally determined or modeled). In all cases, this requires the clustering of proteins into evolutionary families, which can be achieved using similarity-detection tools such as BLAST [8] or profiling tools based on multiple sequence alignments, for example, PSI-BLAST [9]. Several available resources provide pre-compiled family assignments for proteins on a genomic scale, based only on their sequence. Resources can be subdivided into those that consider full-length sequences and those based on domains or motifs that map to certain sub-sequences. In both cases, the degree of granularity of the classification is important, as this is related to the level of functional features that a group of proteins is expected to share.

A resource that classifies full-length proteins is PIRSF [10], in which a set of rules is applied to define primary and curated clusters that are also based on textual (protein names, literature) and parent-child relationships. These clusters (named superfamilies) are further divided into those with full-length similarity (that is, common domain architecture) and those sharing an ancestral domain. PIRSF covers more than two-thirds of the protein sequence space.

Studying proteins at a domain level allows more accurate functional inference [11] and is useful for predicting the function of novel domain combinations that possibly give rise to new protein functions [12]. In this type of resource, a family of domains is represented as a multiple sequence alignment, which is embodied in a statistical family signature profile (for example, CDD [13] and PROSITE [14]) or a profile-hidden Markov model (for example, Pfam [15] and SMART [16]), collectively referred to here as profiles. Pfam, a prototype for such collections, currently contains more than 9,000 family profiles and covers roughly 70-74% of UniProt sequences, capturing about half of their amino acids [17]. About 40-45% of Pfam families are associated with known structures, whereas 20-25% are currently uncharacterized. Other resources, for example CDD, use externally defined profiles to provide rapid assignments to sequence queries, using a BLAST-like engine to speed up searches.
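To make the idea of a family profile concrete, the sketch below builds a simple position-specific frequency profile from a toy ungapped alignment and scores sequences against it with log-odds. This is deliberately simpler than the profile hidden Markov models used by resources such as Pfam (no gaps, no insert states, a uniform background). The alignment, the pseudocount, and the function names are all illustrative assumptions, not anything taken from those resources.

```python
import math

ALPHABET = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard amino acids


def build_profile(alignment, pseudocount=1.0):
    """Return per-column amino-acid probabilities for an ungapped alignment."""
    length = len(alignment[0])
    if any(len(seq) != length for seq in alignment):
        raise ValueError("all aligned sequences must have the same length")
    profile = []
    for col in range(length):
        # Pseudocounts keep unseen residues from getting zero probability.
        counts = {aa: pseudocount for aa in ALPHABET}
        for seq in alignment:
            counts[seq[col]] += 1.0
        total = sum(counts.values())
        profile.append({aa: c / total for aa, c in counts.items()})
    return profile


def log_odds_score(profile, sequence, background=1.0 / len(ALPHABET)):
    """Sum of log2(profile probability / uniform background) over positions."""
    if len(sequence) != len(profile):
        raise ValueError("sequence length must match profile length")
    return sum(
        math.log2(column[aa] / background)
        for column, aa in zip(profile, sequence)
    )


# Hypothetical five-residue "family"; only the middle column varies.
toy_family = ["HCGHC", "HCAHC", "HCGHC", "HCSHC"]
profile = build_profile(toy_family)

member_score = log_odds_score(profile, "HCGHC")
outsider_score = log_odds_score(profile, "WLKPE")
print(f"member: {member_score:.2f}, outsider: {outsider_score:.2f}")
```

A sequence that matches the conserved columns scores above zero and an unrelated one scores below it. Real profile searches rest on the same principle, with far more careful statistics.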

Profile-based methods and resources differ significantly in their level of automation, their degree of manual curation, and the level of independence from complementary resources used in the classification. Combination of these resources provides a more comprehensive coverage, as reflected by InterPro [18], a repository of protein families integrating signatures from more than 10 member resources, currently covering nearly 75% of UniProt sequences. InterPro also includes Gene3D [19] and SUPERFAMILY [20], which provide sequence profiles corresponding to the structural classification of folds by CATH [21] and SCOP [22], respectively. A resource exploiting the multiplicity of essentially complete genome sequences is COG (Clusters of Orthologous Groups), an evolutionary classification that uses comparative genomics principles, such as phyletic profiles [23] (see Box 1), to identify the presence of orthologs, and group them accordingly.

A notable shortcoming of the methods described above is that they require definition of a threshold similarity for separating families from each other. An alternative approach to defining clusters is the construction of a tree representation that can provide a hierarchical view. Resources in this category include ProtoNet [24], CluSTr [25] and SYSTERS [26]. They are based on sequence similarities detected by an all-against-all sequence comparison, so that any level of evolutionary granularity can be inspected, from closely related subfamilies to more distant relationships.
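The contrast between a single similarity threshold and a tree can be illustrated with a small sketch. It uses plain single-linkage agglomerative clustering on a made-up distance table. This is not necessarily the algorithm used by ProtoNet, CluSTr or SYSTERS, and the protein names and distances are hypothetical. The merge history plays the role of the tree, and cutting it at different heights gives different levels of granularity without re-clustering.

```python
def single_linkage_merges(names, distance):
    """Agglomerate clusters; return (distance, merged cluster) in merge order."""
    clusters = [frozenset([n]) for n in names]
    merges = []
    while len(clusters) > 1:
        best = None
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                # Single linkage: distance between the closest pair of members.
                d = min(
                    distance[frozenset((a, b))]
                    for a in clusters[i]
                    for b in clusters[j]
                )
                if best is None or d < best[0]:
                    best = (d, i, j)
        d, i, j = best
        merged = clusters[i] | clusters[j]
        clusters = [c for k, c in enumerate(clusters) if k not in (i, j)]
        clusters.append(merged)
        merges.append((d, merged))
    return merges


def cut(names, merges, threshold):
    """Clusters obtained by applying only the merges at or below threshold."""
    clusters = {frozenset([n]) for n in names}
    for d, merged in merges:
        if d > threshold:
            break  # single-linkage merge distances never decrease
        clusters = {c for c in clusters if not c <= merged} | {merged}
    return sorted(sorted(c) for c in clusters)


# Hypothetical pairwise distances (smaller means more similar sequences).
names = ["A1", "A2", "A3", "B1"]
pairs = {
    ("A1", "A2"): 0.1, ("A1", "A3"): 0.4, ("A2", "A3"): 0.5,
    ("A1", "B1"): 0.9, ("A2", "B1"): 0.9, ("A3", "B1"): 0.8,
}
distance = {frozenset(pair): d for pair, d in pairs.items()}
merges = single_linkage_merges(names, distance)

for threshold in (0.2, 0.5, 1.0):
    print(threshold, cut(names, merges, threshold))
```

With these toy numbers, a strict cut isolates the close subfamily (A1, A2), a looser cut pulls in the more distant relative A3, and the loosest cut joins everything. A fixed-threshold method has to commit to just one of these views.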

Approaches that do not rely solely on supervised annotation of family profiles include ProDom [27], which collects putative domain profiles using known sequence domains as query sequences for iterative PSI-BLAST searches [9]. EVEREST [28] is a fully automatic unsupervised method that identifies recurrent conserved regions on the basis of local sequence similarities and iterative profile searches.

The accuracy of sequence-based methods is affected by the type and amount of information on the specific protein family but, overall, they seem to be reasonably accurate. Their success rate has been shown to be greater than 70% when tested on a limited dataset (all structures solved by the Midwest Center for Structural Genomics during the first five years of the Protein Structure Initiative) [29].


Using protein-protein interactions to annotate protein function

In addition to identifying disease-associated mutations, PrePPI can be used to gain a better understanding of the cellular processes in which a gene or protein is involved. Just as the Honig Lab adapted the BLAST concept to protein structure, it has begun using PrePPI within a gene set enrichment analysis (GSEA) framework.

Using PrePPI for gene set enrichment analysis. To infer the function of a particular protein, Q, the Honig Lab places all proteins in the human proteome, li, in a list and sorts them by the interaction likelihood ratio between li and Q. They then search for gene sets, each associated with a given Gene Ontology annotation, that are enriched among the high-scoring interactors of Q. In the example, Gene Set 1 would be enriched whereas Gene Sets 2 and 3 would not be, since the proteins in those sets are either evenly distributed throughout the ranked list or clustered with proteins that are unlikely to interact with Q. The paper reports the top-ranked gene sets found for BRCA1 and PEX2. (Image courtesy of eLife.)

For each protein, they queried PrePPI to construct a list of the proteins whose scores make them most likely to interact with it, and then used GSEA to look for the GO terms enriched among those interactors. They found that the top-ranked gene sets predicted by PrePPI accurately reflected each protein's known function, as documented in a resource called the Molecular Signatures Database (MSigDB). Moreover, using this automated computational approach, they predicted the functions of approximately 2,000 additional proteins whose functions were previously unknown.
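The ranking-and-enrichment idea can be illustrated with a small Python sketch. It assumes the classic unweighted GSEA running-sum statistic, which may differ from the scoring the Honig Lab actually used. The protein names, the likelihood-ratio values and the function `enrichment_score` are hypothetical and chosen only to mirror the three gene sets in the figure.

```python
def enrichment_score(ranked, gene_set):
    """Unweighted running-sum enrichment score for one gene set.

    Walk down the ranked list, stepping up at each member of the set and
    down at each non-member; return the largest deviation from zero.
    """
    hits = [protein in gene_set for protein in ranked]
    n_hit = sum(hits)
    n_miss = len(ranked) - n_hit
    if n_hit == 0 or n_miss == 0:
        return 0.0
    running, best = 0.0, 0.0
    for is_hit in hits:
        running += 1.0 / n_hit if is_hit else -1.0 / n_miss
        if abs(running) > abs(best):
            best = running
    return best


# Made-up interaction likelihood ratios between each protein and a query Q.
likelihood_ratio = {
    "P1": 900.0, "P2": 450.0, "P3": 120.0, "P4": 30.0,
    "P5": 4.0, "P6": 1.5, "P7": 0.8, "P8": 0.2,
}
# Sort the proteome from most to least likely interactor of Q.
ranked = sorted(likelihood_ratio, key=likelihood_ratio.get, reverse=True)

gene_sets = {
    "set_1_top_of_list": {"P1", "P2", "P3"},     # clustered among likely interactors
    "set_2_spread_out": {"P1", "P4", "P7"},      # evenly distributed
    "set_3_bottom_of_list": {"P6", "P7", "P8"},  # clustered among unlikely interactors
}
for name, members in gene_sets.items():
    print(name, round(enrichment_score(ranked, members), 3))
```

In this toy setting only the first set scores strongly positive, which corresponds to the situation described for Gene Set 1. A real analysis would also need a significance estimate, for example from permutations, which this sketch omits.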

Honig cautions that the interactions and functions predicted using PrePPI should not necessarily be taken as fact. Nevertheless, his lab’s tests so far indicate that they are largely reliable. “PrePPI is based on statistical analysis and not experiment, which is really the gold standard,” he explains. “What we’re ultimately trying to do with these methods is to generate hypotheses that can be cross-referenced with other computational and experimental methods. We’re excited because the number of interactions that PrePPI finds is unprecedented in scope, and so our hope is that it will help systems biologists and other biomedical researchers who do not typically look at structure to be able to incorporate information about this essential layer of activity into their investigations.”

Related publications

Garzón JI, Deng L, Murray D, Shapira S, Petrey D, Honig B. A computational interactome and functional annotation for the human proteome. Elife. 2016 Oct 22;5. pii: e18715.

Westphalen CB, Takemoto Y, Tanaka T, Macchini M, Jiang Z, Renz BW, Chen X, Ormanns S, Nagar K, Tailor Y, May R, Cho Y, Asfaha S, Worthley DL, Hayakawa Y, Urbanska AM, Quante M, Reichert M, Broyde J, Subramaniam PS, Remotti H, Su GH, Rustgi AK, Friedman RA, Honig B, Califano A, Houchen CW, Olive KP, Wang TC. Dclk1 defines quiescent pancreatic progenitors that promote injury-induced regeneration and tumorigenesis. Cell Stem Cell. 2016 Apr 7;18(4):441-55.

Chen TS, Petrey D, Garzon JI, Honig B. Predicting peptide-mediated interactions on a genome-wide scale. PLoS Comput Biol. 2015 May 4;11(5):e1004248.

Zhang QC, Petrey D, Deng L, Qiang L, Shi Y, Thu CA, Bisikirska B, Lefebvre C, Accili D, Hunter T, Maniatis T, Califano A, Honig B. Structure-based prediction of protein-protein interactions on a genome-wide scale. Nature. 2012 Oct 25;490(7421):556-60.
