How can I find the mRNA sequence for a specific prokaryotic gene?

What I want to find out is the start of the transcription for a specific gene, how long the UTR is before the actual coding sequence starts.

I've looked at various databases like NCBI Gene, Refseq or a specialized one like PRODORIC. But none of them actually contain the mRNA sequence or any information about the untranslated parts of the mRNA. Just the coding sequence itself.

Is that kind of information simply not annotated for many bacteria? I'm not looking at model organisms like E. coli or B. subtilis, but I would have expected some information to be available.

Is there a database where I can find the actual transcript sequences for prokaryotic genes?

mRNA-specific queries

Add this to your NCBI query: AND mrna[filter]

If it's not available in major databases it may not be known.

That said, mRNA data can be hard to find even in the major databases. I would be surprised if nothing at all were there, but since you are not working with a model organism, there may well be nothing available.

When looking for a gene transcript, NCBI has some tips for searching its databases. Tip #8 might be relevant to you, as it describes a way of filtering exclusively for mRNA records.

If there is no UniGene cluster for this gene and organism, perform a search in the Nucleotide database with the gene name, product name, or symbol. Include the organism in the search to find the most relevant results and filter for transcript sequences, for example: Cytochrome c AND bullfrog[orgn] AND mrna[filter].
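The same query can also be assembled programmatically. The sketch below is only an illustration: it builds the search term quoted above and wraps it in a URL for the NCBI E-utilities ESearch endpoint, assuming the Nucleotide database is addressed as `nuccore`. The helper names `build_mrna_query` and `build_esearch_url` are made up for this example, and nothing is sent over the network.

```python
from urllib.parse import urlencode

# Base URL of the NCBI E-utilities ESearch service.
EUTILS_ESEARCH = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi"


def build_mrna_query(gene, organism):
    """Combine a gene term, an organism restriction, and the mRNA filter."""
    return f"{gene} AND {organism}[orgn] AND mrna[filter]"


def build_esearch_url(term, db="nuccore", retmax=20):
    """Return an ESearch URL for the term; no request is made here."""
    params = {"db": db, "term": term, "retmax": retmax}
    return f"{EUTILS_ESEARCH}?{urlencode(params)}"


query = build_mrna_query("Cytochrome c", "bullfrog")
url = build_esearch_url(query)
print(query)
print(url)
```

Pasting the printed URL into a browser should return the matching record IDs as XML. For a bacterial gene, an empty result would be consistent with the point above that transcript records may simply not exist.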

Predicting mRNA

Basic: ExPASy DNA translation tool

Advanced: RegRNA 2.0

Assuming that advanced querying doesn't reveal what you need, you might be able to make some useful predictions before taking it to the lab. A handful of predictive tools may help if that is the only option available. I'm not sure exactly what information you have as input. Assuming you have the DNA sequence, I would start with the obvious choice: the ExPASy DNA translation tool.

RegRNA 2.0 might then steer you in the right direction regarding more subtle translational elements.
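As a rough illustration of what a translation-style scan can and cannot tell you, here is a minimal sketch that looks for ATG-to-stop open reading frames in the three forward frames of a DNA string. It is not how the ExPASy tool or RegRNA work internally. The function name `find_orfs` and the test sequence are invented for the example, and alternative start codons and the reverse strand are ignored.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def find_orfs(dna, min_codons=3):
    """Return (frame, start, end) for ATG-to-stop ORFs on the forward strand.

    start is the 0-based index of the A of ATG; end is the index just past
    the stop codon. min_codons counts sense codons, including the ATG.
    """
    dna = dna.upper()
    orfs = []
    for frame in range(3):
        start = None
        for i in range(frame, len(dna) - 2, 3):
            codon = dna[i:i + 3]
            if start is None and codon == "ATG":
                start = i  # first ATG in this frame opens an ORF
            elif start is not None and codon in STOP_CODONS:
                if (i - start) // 3 >= min_codons:
                    orfs.append((frame, start, i + 3))
                start = None  # look for the next ORF in this frame
    return orfs


# Made-up sequence: a 6-nt leader followed by ATG AAA CCC GGG TAA.
seq = "GGAGGTATGAAACCCGGGTAA"
for frame, start, end in find_orfs(seq):
    print(f"frame {frame}: ORF spans {start}-{end}; {start} nt lie upstream of the ATG")
```

A scan like this only locates candidate coding starts. The length of the 5′ UTR still depends on where transcription begins, which has to come from experimental data or a promoter prediction.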

Why can prokaryotic transcription and translation occur simultaneously?

This is because prokaryotes have no nucleus to separate transcription from translation. Bacterial transcripts therefore begin to be translated immediately, while they are still being synthesized. Prokaryotic transcription occurs in the cytoplasm alongside translation, so both processes can occur simultaneously.

Secondly, can eukaryotic cells undergo transcription and translation simultaneously? In a prokaryotic cell, transcription and translation are coupled; that is, translation begins while the mRNA is still being synthesized. In a eukaryotic cell, transcription occurs in the nucleus and translation occurs in the cytoplasm.

Likewise, people ask, why can transcription and translation be simultaneously in prokaryotes but not in eukaryotes?

Prokaryotic transcription occurs in the cytoplasm alongside translation. Prokaryotic transcription and translation can occur simultaneously. This is impossible in eukaryotes, where transcription occurs in a membrane-bound nucleus while translation occurs outside the nucleus in the cytoplasm.

Can a cell transcribe all genes simultaneously?

Is it likely that a cell would transcribe all the genes within its nucleus simultaneously? Why or why not? No. A cell transcribes only the genes for the proteins it needs at a given time, and certain genes are switched on or off in specific cells depending on their location and function.

A novel mRNA modification may impact gene expression

Researchers at CCR identified a novel modification in human messenger RNA (mRNA). NAT10, an enzyme that was found to be responsible for the modification, has previously been implicated in cancer and aging. This is one of the first examples of a unique chemical modification to mRNA (a key factor in deciphering the genetic code) that causes an increase in protein production. The study, by Shalini Oberdoerffer, Ph.D., Investigator in the Laboratory of Receptor Biology and Gene Expression, and colleagues, appeared November 15, 2018, in Cell.

Deciphering the genetic code is a multi-step process that begins with transcribing information contained within DNA into a messenger RNA. The resulting mRNAs are then translated into proteins that comprise key components of the cell. It is known that RNA can be modified following transcription as a means to regulate function. This study provides a first example of a chemical modification to mRNA that enhances protein production. The investigators suggest the modification alters the rate at which the genetic code is read within each strand of mRNA.

The researchers focused on a specific chemical modification to mRNA, N4-acetylcytidine (ac4C). They first mapped the presence of ac4C to thousands of human mRNAs. In the lab the scientists next determined that the presence of ac4C within mRNAs powerfully enhanced their ability to be translated into protein. They further demonstrated that ac4C impacted cell proliferation, a hallmark of cancer cells.

“We hope to one day harness this discovery to specifically direct the modification to key mRNAs, thereby creating novel therapeutics,” said Dr. Oberdoerffer.

The researchers’ next steps are to map out how the modification specifically functions to alter gene expression as well as to determine if misregulation is a root cause of certain diseases.

The title and text of this article have been edited to better reflect that the NAT10 enzyme affects gene expression by altering the mRNA so that it leads to greater protein production. This chemical modification to mRNA does not alter DNA.

What is Eukaryotic Gene Expression

Eukaryotic gene expression is the production of gene products based on the information in eukaryotic genes. Like prokaryotic gene expression, it occurs through transcription and translation. Since eukaryotic DNA is located inside the nucleus, transcription also occurs inside the nucleus. Three RNA polymerases are responsible for the transcription of different types of RNA: RNA polymerase I, which synthesizes rRNA; RNA polymerase II, which synthesizes mRNA; and RNA polymerase III, which synthesizes tRNA. Moreover, each eukaryotic gene is under the control of an individual promoter. Hence, transcription produces a monocistronic mRNA.

Figure 2: Prokaryotic and Eukaryotic Gene expression

On the other hand, the primary mRNA transcript undergoes post-transcriptional modifications, including the addition of a 5′ cap and a 3′ poly(A) tail. In addition, the introns that interrupt the protein-coding region of the eukaryotic mRNA are removed in a process called RNA splicing. The resulting mature mRNA leaves the nucleus for the cytoplasm, where it is ready for translation. 80S ribosomes are responsible for the translation of eukaryotic mRNA.

16.1 The Central Dogma: DNA Encodes mRNA and mRNA Encodes Protein

The flow of genetic information in cells from DNA to mRNA to protein is described by the Central Dogma of molecular biology (Figure 16.2). When a cell needs a particular protein, the gene that codes for that protein is activated and a single-stranded mRNA copy is made of the gene, in a process called transcription. The code copied into the mRNA is then used to determine the order of amino acids in the protein, in a process called translation. The copying of DNA to RNA is relatively straightforward, with one nucleotide being added to the mRNA strand for every nucleotide read in the DNA strand. The translation to protein is a bit more complex because three mRNA nucleotides correspond to one amino acid in the polypeptide sequence (Figure 16.2).

Figure 16.2 The central dogma of molecular biology. Segments of DNA, called genes, are transcribed into mRNA copies. mRNA is then “read” in three-nucleotide codons to specify the order of amino acids in a protein.

The Genetic Code Is Universal and Redundant

How does the order of nucleotides in an mRNA specify the order of amino acids in a protein? mRNA is “read” in three-nucleotide segments called codons. Since RNA has four nucleotides (A, C, U, and G), there are 64 (4³) possible combinations of three nucleotides (Figure 16.3). 61 of these codons code for one of the 20 common amino acids. The other three are called stop codons or nonsense codons because they do not code for an amino acid.
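The counts in this paragraph can be checked with a few lines of code. The sketch below enumerates all three-letter combinations of U, C, A and G and pairs them with the standard genetic code, written as a 64-letter string in the conventional UCAG ordering with `*` marking a stop codon. The variable names are arbitrary choices for this illustration.

```python
from itertools import product

BASES = "UCAG"
# Standard genetic code, one letter per codon in UCAG order; "*" marks a stop codon.
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"

# All 4 x 4 x 4 triplets, generated in the same order as AMINO_ACIDS.
codons = ["".join(triplet) for triplet in product(BASES, repeat=3)]
code = dict(zip(codons, AMINO_ACIDS))

stops = sorted(c for c, aa in code.items() if aa == "*")
sense = [c for c, aa in code.items() if aa != "*"]
amino_acids = set(code.values()) - {"*"}

print(len(codons), "codons in total")
print(len(sense), "sense codons;", "stop codons:", stops)
print(len(amino_acids), "distinct amino acids")
```

The dictionary `code` can also be queried directly, for example `code["AUG"]` for the start codon.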

Figure 16.3 The genetic code allows cells to translate each nucleotide triplet in mRNA into an amino acid or a termination signal in a protein. (credit: modification of work by NIH)

Scientists painstakingly solved the genetic code by translating synthetic mRNAs in vitro and sequencing the proteins they specified (Figure 16.4). Once all of the codons were known, they discovered some important features of the code.

The genetic code is universal. With a few exceptions, virtually all species use the same genetic code for protein synthesis. Conservation of codons means that a purified mRNA encoding the globin protein in horses could be transferred to a tulip cell, and the tulip would synthesize horse globin. That there is only one genetic code is powerful evidence that all of life on Earth shares a common origin.

Since there are more nucleotide triplets than there are amino acids, the genetic code is redundant. In other words, a given amino acid can be encoded by more than one nucleotide triplet. Redundancy reduces the negative impact of random mutations. Codons that specify the same amino acid typically only differ by one nucleotide, usually the third one. For example, ACU, ACC, ACA and ACG all code for the amino acid threonine. In addition, amino acids with chemically similar side chains are encoded by similar codons. For example, UGU and UGC code for the amino acid cysteine, while AGU and AGC code for the amino acid serine. Cysteine and serine both have polar side chains that are very similar in size and other properties. Thus, the redundancy of the genetic code ensures that a single-nucleotide substitution mutation might specify either the same amino acid or a similar amino acid, preventing the protein from being rendered completely nonfunctional.

While 61 of the 64 codons specify the addition of a specific amino acid to a polypeptide chain, the remaining three codons terminate protein synthesis and release the polypeptide from the translation machinery. These triplets are called nonsense codons, or stop codons. Another codon, AUG, also has a special function. In addition to specifying the amino acid methionine, it also serves as the start codon to initiate translation. The reading frame for translation is set by the AUG start codon near the 5′ end of the mRNA.

Elucidating the Genetic Code

Given the different numbers of “letters” in the mRNA and protein “alphabets,” scientists theorized that combinations of nucleotides corresponded to single amino acids. Nucleotide doublets would not be sufficient to specify every amino acid because there are only 16 possible two-nucleotide combinations (4²). In contrast, there are 64 possible nucleotide triplets (4³). The fact that amino acids were encoded by nucleotide triplets was confirmed experimentally by Francis Crick and Sydney Brenner. They inserted one, two, or three nucleotides into the gene of a virus. When one or two nucleotides were inserted, the protein was not made. When three nucleotides were inserted, the protein was synthesized and functional. This demonstrated that three nucleotides specify each amino acid. The nucleotide triplets that code for amino acids are called codons. The insertion of one or two nucleotides completely changed the triplet reading frame, thereby altering the message for every subsequent amino acid (Figure 16.4). Though insertion of three nucleotides caused an extra amino acid to be inserted during translation, the integrity of the rest of the protein was maintained.
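The effect of insertions on the reading frame can be illustrated with a short sketch. The message below is made up, and the helpers `codons` and `insert` are hypothetical names. The code only splits sequences into triplets, which is enough to show that one- or two-nucleotide insertions change every downstream codon while a three-nucleotide insertion adds a single codon and leaves the rest intact.

```python
def codons(seq):
    """Split a sequence into complete triplets, dropping any trailing 1-2 bases."""
    usable = len(seq) - len(seq) % 3
    return [seq[i:i + 3] for i in range(0, usable, 3)]


def insert(seq, pos, bases):
    """Return seq with bases inserted before index pos."""
    return seq[:pos] + bases + seq[pos:]


# Made-up message: AUG GCU GAA UUC CGU
original = "AUGGCUGAAUUCCGU"
print("original:", codons(original))

# Insert one, two, or three nucleotides after the second codon.
for bases in ("A", "AA", "AAA"):
    mutant = insert(original, 6, bases)
    print(f"+{len(bases)} nt:  ", codons(mutant))
```

With the three-nucleotide insertion, the codons GAA, UUC and CGU reappear unchanged after the extra AAA codon. With one or two inserted nucleotides, none of them survive.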

Figure 16.4 The deletion of two nucleotides shifts the reading frame of an mRNA and changes the entire protein message, creating a nonfunctional protein or terminating protein synthesis altogether.

Exceptions to the Central Dogma

Many genes code for RNA molecules that do not function as mRNAs and are therefore not translated into proteins. Some RNAs, called rRNA, form parts of the ribosomes. Others form transfer RNAs, or tRNA, which help with translation. Still others can regulate which genes are expressed.

Another exception to the central dogma is that, in some cases, information flows backwards, as is seen in certain viruses called retroviruses. These viruses have genes made up of RNA, and when a retrovirus infects a cell, it has to synthesize a DNA version of its RNA genes using a specialized viral polymerase called reverse transcriptase. The human immunodeficiency virus (HIV), which causes AIDS, is a retrovirus, and many of the drugs prescribed for AIDS patients target the HIV reverse transcriptase.


Gene sampling

Extensive BLAST [39] searches spanning the diversity of eukaryotic and prokaryotic life were performed to identify putative eukaryotic and prokaryotic PPO homologues. Additional searches were also performed excluding land plants (embryophytes, taxid: 3193). The searches for amino acid sequences were conducted against the GenBank non-redundant protein sequences database using the BLASTp tool of the NCBI (National Center for Biotechnology Information) BLAST server. The PPO-TYR domain was also used to search the genomes of unicellular eukaryotes deposited in the Joint Genome Institute and Phytozome databases. To check the robustness of the gene sampling, additional searches using PSI-BLAST [39] (5 iterations), tBLASTn and HMMER [40] were performed.

Phylogenetic analysis

Tyrosinase domains of PPO proteins and other TYR-containing proteins were aligned with ClustalW [41] and edited with Jalview 2.8 [42] to minimize gaps. Information about the sequences used is given in Supplementary Table S1. The tree in Fig. 1b used the LG protein substitution model [43], but JTT [44] and Blosum62 [45] yielded essentially the same results. Bootstrapped maximum likelihood phylogenetic trees were constructed with Phyml 3.0 [46,47]. Gamma correction was used to account for heterogeneity in evolutionary rates with four discrete classes of sites, an estimated alpha parameter and an estimated proportion of invariable sites as implemented in Phyml. A total of 100 bootstrap replicates were used to assess robustness, and tree topology was optimized by subtree pruning and regrafting (SPR). The tree was visualized and edited with TreeDyn [48]. The Fig. 1d scheme summarizing and simplifying our current understanding of the evolution of the supergroup Plantae was inspired by those recently published [3,49,50].

Protein domain architecture comparison

Genes coding for proteins with tyrosinase domains were compared by examining their corresponding protein domain architectures using CDART (Conserved Domain Architecture Retrieval Tool) [51] and the CDD (Conserved Domains Database) [13] of the NCBI.

Structural modelling

The structural modelling was performed using the intensive modelling mode of the Protein Homology/analogy Recognition Engine (Phyre) Version 2.0 ( html/page.cgi?id=index) [52].

Plasmids, plant material and growth conditions

Full-length cDNAs encoding Solanum tuberosum PPO (GenBank accession number: U22921) or a PPO version lacking the localization signal were amplified from cDNA and cloned into plasmid pDONR207 using Gateway technology (Invitrogen, California, USA). The subcellular localization signal was detected in silico using ChloroP [21]. Sequences were then subcloned into plasmids pGWB405 [53], pET28 [54] (modified for Gateway-compatible cloning) and pB7FWG2 [55]. Constructs were confirmed by restriction mapping and DNA sequence analysis. Information about the primers used is given in Supplementary Table S2. Constructs in pGWB405 were used for transient expression in Nicotiana benthamiana leaves [56]. Constructs in pET28 were used for protein expression in Escherichia coli. Constructs in pB7FWG2 were used for stable Agrobacterium tumefaciens-mediated transformation of Arabidopsis thaliana plants (ecotype Col-0) [57]. Homozygous Arabidopsis lines containing a single T-DNA insertion were selected based on the segregation of the phosphinothricin (PPT) resistance marker. Initial segregation analysis was performed by sowing seeds on sterile Murashige and Skoog (MS) medium plates supplemented with 20 μg/mL PPT (Duchefa, Haarlem, Netherlands). A total of 25 PPOM and 29 PPOA homozygous lines were isolated, from which three independent lines of each PPO version were selected for further characterization. Arabidopsis plants were grown on soil in a climate-controlled growth chamber (22 °C, 65–70% RH and 60 μmol m⁻² s⁻¹ photosynthetically active radiation) under an 8 h light/16 h dark photoperiod for 4 weeks. For phenylpropanoid profiling, Arabidopsis seeds were surface-sterilized and sown on Petri plates with sterile MS medium containing 1% agar and 30 g/L sucrose. When indicated, plates were supplemented with 5 μM norflurazon. After stratification for 3 days at 4 °C in the dark, the plates were incubated in growth chambers for 20 days at 22 °C under continuous light (130 μmol m⁻² s⁻¹ photosynthetically active radiation).

Expression of recombinant PPOA and PPOM in E. coli cells

Following transformation of competent E. coli cells strain Rosetta 2 (DE3) (Novagen, Merck KGaA, Darmstadt, Germany) with purified plasmids pET28-PPOA and pET28-PPOM, the activity of PPOA and PPOM was confirmed by growing the transformed E. coli cells for three days at 28 °C on LB agar plates supplemented with 34 μg/mL chloramphenicol, 25 μg/mL kanamycin, 0.5 mM IPTG, 40 μg/mL CuSO4 and 600 μg/mL chlorogenic acid, similarly to methods previously described [58] to detect melanin formation. PPOA and PPOM colonies were dark because the cells were able to produce melanin-like polymers.

Confocal laser-scanning microscopy

Subcellular localization of GFP fluorescence and chlorophyll fluorescence was determined with an Olympus FV 1000 confocal laser-scanning microscope (Olympus, Tokyo, Japan) using an argon laser for excitation (at 488 nm) and a 500–510 nm filter for detection of GFP fluorescence and a 610–700 nm filter for detection of chlorophyll fluorescence.

Measurement of rosette leaves surface area

The surface area of rosette leaves was measured using Photoshop (Adobe, California, USA) and GraphPad Prism 5.0a (GraphPad Software, California, USA) by analysing digital images (acquired from above) of 4-week-old plants and comparing them with a size standard series ranging from 4 to 16 cm².

Transcripts analysis

Leaf samples from 4-week-old plants were harvested and total RNA was extracted using the Maxwell 16 LEV simplyRNA Tissue Kit (Promega, Wisconsin, USA) according to the manufacturer’s instructions. Purified RNAs were quantified by spectroscopy using a NanoDrop apparatus (Thermo Scientific, Massachusetts, USA) and RNA integrity was evaluated by agarose gel electrophoresis. The First Strand cDNA Synthesis Kit (Roche, Basel, Switzerland) was used to generate cDNA according to the manufacturer’s instructions, using 50 pmol of an anchored poly(dT) primer [d(T18V)] and 2 μg of total RNA. The relative mRNA abundance was evaluated via quantitative reverse transcription PCR (RT-qPCR) in a total reaction volume of 20 μL using LightCycler 480 SYBR Green I Master (Roche, Basel, Switzerland) on a LightCycler 480 Real-Time PCR System (Roche, Basel, Switzerland) with 0.3 μM of each specific sense and anti-sense primer. Three independent biological replicates of each sample and three technical replicates of each biological replicate were performed and the mean values were considered for further calculations. The normalized expression of PPO was calculated as described [59] using UBC (Arabidopsis ubiquitin-conjugating enzyme gene At5g25760, GenBank accession number: DQ027035) as the endogenous reference gene. Information about the primers used is given in Supplementary Table S2.

PPO activity

Protein extracts were obtained from rosette leaves homogenized at a ratio of 50 mg of fresh weight to 1000 μL of 20 mM phosphate buffer (pH 6). The homogenate was centrifuged at 16,000 × g and 4 °C for 5 min and the supernatant was used for protein assessment of PPO activity. PPO activity assays were performed in 20 mM phosphate buffer (pH 6) containing 10 mM L-DOPA. The reaction was measured with a SpectraMax M3 spectrophotometer (Molecular Devices, California, USA) by the change in absorbance at 475 nm and 25 °C.

Phenylpropanoid profiling

Phenylpropanoids (specifically, flavonoids) were purified and analysed as described previously [60] with minor modifications. In brief, lyophilized Arabidopsis seedlings were homogenized at a ratio of 1 mg of dry weight to 33 μL of methanol:acetate:H2O (9:1:10) extraction solvent containing 0.1 mg/mL of naringenin as an internal standard for quantification. Homogenates were incubated in the dark at 4 °C for 30 min with agitation (1000 rpm) and then centrifuged at 14,000 × g and 4 °C for 5 min. Supernatants were recovered, filtered and analysed by high-performance liquid chromatography (HPLC) using an Agilent 1200 series HPLC system (Agilent Technologies, California, USA) with an XSelect CSH C18 (3.5 μm, 4.6 × 100 mm) column (Waters, Massachusetts, USA). Flavonols and anthocyanins were determined at 320 and 520 nm, respectively.

Data analysis

ANOVA followed by Newman-Keuls multiple comparison post-hoc tests were used to determine statistical significance for multiple groups. Pearson correlation coefficients (r values) were calculated using the means of the surface area of rosette leaves, PPO mRNA abundance and PPO activity. Statistical analysis was performed using GraphPad Prism 5.0a (GraphPad Software, California, USA).
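For readers unfamiliar with the r values mentioned here, the following is a minimal sketch of how a Pearson correlation coefficient is computed from two lists of means. The authors used GraphPad Prism, not this code. The function name `pearson_r` and the numbers are invented purely for illustration and are not data from the study.

```python
from math import sqrt


def pearson_r(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Sum of cross-products and sums of squares around the means.
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


# Made-up per-line means, for illustration only.
leaf_area = [12.0, 9.5, 7.0, 5.5]
ppo_activity = [1.0, 2.1, 3.2, 3.9]
print(round(pearson_r(leaf_area, ppo_activity), 3))
```

A value near −1 or +1 indicates a strong linear relationship between the two sets of means, and a value near 0 indicates none.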


Bacterial strains and culture conditions

Escherichia coli MG1655 (ATCC: 700926) overnight cultures were inoculated into fresh LB medium at 1:50 and grown at 37 °C with shaking (150 rpm). Upon reaching the exponential growth phase, the culture was centrifuged at 3000 g for 10 min. The media was removed and the pellet was resuspended in PBS to a concentration of 10⁷ cells per μL. The cells were stored on ice and total RNA extraction was performed immediately.

RNA extraction

Trizol (Thermo Fisher Scientific, Cat. # 15596018) RNA extraction was performed following the manufacturer’s protocol. Briefly, 10⁸ cells were added to 750 μL Trizol, mixed, and then combined with 150 μL chloroform. After centrifugation, the clear aqueous layer was recovered and precipitated with 375 μL of isopropanol and 0.67 μL of GlycoBlue (Thermo Fisher Scientific, Cat. # AM9515). The pellet was washed twice with 75% ethanol and after the final centrifugation, the resulting pellet was resuspended in RNase-free water.


Polyadenylation

100 ng of total RNA in 2 μL was combined with 3 μL poly-A mix, comprised of 1 μL 5x first strand buffer [250 mM Tris-HCl (pH 8.3), 375 mM KCl, 15 mM MgCl2, comes with Superscript II reverse transcriptase, Invitrogen Cat. # 18064–014], 1 μL blocking primer mix (see Primers), 0.8 μL nuclease-free water, 0.1 μL 10 mM ATP, and 0.1 μL E. coli poly-A polymerase (New England Biolabs, Cat. # M0276S). The mixture was incubated at 37 °C for 10 min. In the control group, no blocking primers were added and 1.8 μL of nuclease-free water was added instead. For EMBR-seq with either unmodified or phosphorylated 3′-end blocking primers, the blocking primer mix was prepared by mixing equal volumes of 50 μM blocking primers specific to 5S, 16S and 23S rRNA. For EMBR-seq with hotspot blocking primers, the blocking primer mix was prepared by mixing equal volumes of 100 μM 3′-end blocking primers with 100 μM hotspot blocking primers, such that the final mixture was 50 μM 3′-end primers (3 primers mixed) and 50 μM hotspot primers (6 primers mixed).

Reverse transcription

The polyadenylation product was mixed with 0.5 μL 10 mM dNTPs (New England Biolabs, Cat. # N0447L), 1 μL reverse transcription primers (25 ng/μL, see Primers), and 1.3 μL blocking primer mix, and heated to 65 °C for 5 min, 58 °C for 1 min, and then quenched on ice. In the control samples, the blocking primers were again replaced with nuclease-free water. Next, 3.2 μL RT mix, consisting of 1.2 μL 5x first strand buffer, 1 μL 0.1 M DTT, 0.5 μL RNaseOUT (Thermo Fisher Scientific, Cat. #10777019), and 0.5 μL Superscript II reverse transcriptase was added to the solution, followed by 1 h incubation at 42 °C. The temperature was then raised to 70 °C for 10 min to heat inactivate Superscript II.

Second strand synthesis

49 μL of the second strand mix, containing 33.5 μL water, 12 μL 5x second strand buffer [100 mM Tris-HCl (pH 6.9), 23 mM MgCl2, 450 mM KCl, 0.75 mM β-NAD, 50 mM (NH4)2 SO4, Invitrogen, Cat. # 10812–014], 1.2 μL 10 mM dNTPs, 0.4 μL E. coli ligase (Invitrogen, Cat. # 18052–019), 1.5 μL DNA polymerase I (Invitrogen, Cat. # 18010–025), and 0.4 μL RNase H (Invitrogen, Cat. # 18021–071), was added to the product from the previous step. The mixture was incubated at 16 °C for 2 h. cDNA was purified with 1x AMPure XP DNA beads (Beckman Coulter, Cat. # A63881) and eluted in 24 μL nuclease-free water that was subsequently concentrated to 6.4 μL.

In vitro transcription

The concentrated solution was mixed with 9.6 μL of Ambion in vitro transcription mix (1.6 μL of each ribonucleotide, 1.6 μL 10x T7 reaction buffer, 1.6 μL T7 enzyme mix, MEGAscript T7 Transcription Kit, Thermo Fisher Scientific, Cat. # AMB13345) and incubated at 37 °C for 13 h. Next, the aRNA was treated with 6 μL EXO-SAP (ExoSAP-IT™ PCR Product Cleanup Reagent, Thermo Fisher Scientific, Cat. # 78200.200.UL) at 37 °C for 15 min followed by fragmentation with 5.5 μL fragmentation buffer (200 mM Tris-acetate (pH 8.1), 500 mM KOAc, 150 mM MgOAc) at 94 °C for 3 min. The reaction was then quenched with 2.75 μL stop buffer (0.5 M EDTA) on ice. The fragmented aRNA was size selected with 0.8x AMPure RNA beads (RNAClean XP Kit, Beckman Coulter, Cat. # A63987) and eluted in 15 μL nuclease-free water. Thereafter, Illumina libraries were prepared as described previously [20].

EMBR-seq with TEX digestion

To test the Terminator™ 5′-phosphate-dependent exonuclease (Lucigen, Cat. # TER5120), 100 ng of total RNA in 2 μL was combined with 18 μL TEX mix, comprised of 14.5 μL nuclease free water, 2 μL Terminator 10x buffer A, 0.5 μL RNAseOUT, and 1 μL TEX. The solution was incubated at 30 °C for 1 h and quenched with 1 μL of 100 mM EDTA. The product was purified with 1x AMPure RNA beads and eluted in 10 μL nuclease-free water and concentrated to 2 μL. This TEX digested total RNA was then used as starting RNA in the EMBR-seq protocol described above.

EMBR-seq bioinformatic analysis

Paired-end sequencing of the EMBR-seq libraries was performed on an Illumina NextSeq 500. All sequencing data has been deposited to Gene Expression Omnibus under the accession number GSE149666. In the sequencing libraries, the left mate contains information about the sample barcode (see Primers). The right mate is mapped to the bacterial transcriptome. Prior to mapping, only reads containing valid sample barcodes were retained. Subsequently, the reads were mapped to the reference transcriptome (E. coli K12 substr. MG1655 cds ASM584v2) using Burrows-Wheeler Aligner (BWA) with default parameters.

Analysis of detection bias in EMBR-seq

E. coli operons were downloaded from RegulonDB [44]. Operons with at least 2 genes were included for this analysis. The data from EMBR-seq libraries with 100 ng starting material was mapped to E. coli K12 substr. MG1655 reference genome (ASM584v2). For each read that maps within an operon, the distance of the mapped location from the 3′ end of the operon was calculated, accounting for the read length. Next, the operons were discretized into 50 bins, and all operons with more than 200 unique reads were considered for downstream analysis. The number of reads in each bin was then normalized by the total number of reads in each operon, and the average of the relative reads within each bin was calculated. To compare bacterial data from EMBR-seq to mammalian data from CEL-seq, we downloaded CEL-seq data reported in Grün et al. (GEO Accession: GSM1322290) and performed similar analysis for the mouse genes [45].
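The binning and normalization steps described above can be sketched as follows. This is an illustrative reading of the text, not the authors' code. The function name `binned_profile` and the input layout (one operon length plus a list of per-read distances from the 3′ end) are assumptions, and the reads are assumed to have been deduplicated and mapped already.

```python
def binned_profile(operons, n_bins=50, min_reads=200):
    """Average relative read density per bin across operons.

    operons: list of (operon_length, [distance_from_3prime_end, ...]).
    Operons with min_reads or fewer reads are skipped.
    """
    profiles = []
    for length, distances in operons:
        if len(distances) <= min_reads:
            continue
        counts = [0] * n_bins
        for d in distances:
            # Map the distance to a bin; clamp reads at the very 5' end.
            b = min(int(d / length * n_bins), n_bins - 1)
            counts[b] += 1
        total = sum(counts)
        # Normalize by the total number of reads in this operon.
        profiles.append([c / total for c in counts])
    if not profiles:
        return []
    # Average the relative counts bin by bin across operons.
    return [sum(p[b] for p in profiles) / len(profiles) for b in range(n_bins)]


# Toy input: two operons pass the read threshold, the third does not.
toy = [
    (1000, [10] * 300),   # all reads close to the 3' end
    (500, [499] * 400),   # all reads close to the 5' end
    (800, [50] * 100),    # too few reads, ignored
]
profile = binned_profile(toy)
print(profile[0], profile[-1])
```

Bin 0 corresponds to the 3′ end of the operon, so a profile that is heavily weighted toward low bin indices would indicate a 3′ detection bias.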

Sequence conservation of 16S and 23S rRNA

16S rRNA sequences from 4000 species were obtained from rrnDB [46], while 23S rRNA sequences from 119 species were selected from NCBI RefSeq [47]. Next, the last 100 bases from the 3′ end of each sequence were aligned using Clustal Omega [48]. Shannon entropy for each aligned base location was then calculated such that the maximal entropy value was 1. Five possibilities were allowed: “A”, “T”, “C”, “G”, and “-”.
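One way to obtain a maximal entropy of 1 with five allowed symbols is to take logarithms to base 5. The sketch below assumes that is the normalization meant here, which the text does not state explicitly. The function names and the toy alignment are made up for illustration.

```python
from collections import Counter
from math import log

ALPHABET = "ATCG-"  # the five allowed symbols


def column_entropy(column):
    """Shannon entropy of one alignment column, using log base 5 so the maximum is 1."""
    counts = Counter(column)
    n = len(column)
    return -sum((c / n) * log(c / n, len(ALPHABET)) for c in counts.values())


def alignment_entropy(aligned_seqs):
    """Entropy at every position of a list of equal-length aligned sequences."""
    return [column_entropy(col) for col in zip(*aligned_seqs)]


# Toy alignment: columns 0-2 are fully conserved, column 3 uses all five symbols.
toy_alignment = ["ACGT-", "ACGA-", "ACGC-", "ACGG-", "ACG--"]
entropies = alignment_entropy(toy_alignment)
print([round(h, 3) for h in entropies])
```

Positions with entropy near 0 are highly conserved across species, and positions near 1 are maximally variable.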


Reverse transcription primers are shown below with the 6-nucleotide sample barcodes underlined [20]:

The following five barcodes were used in this study:

In the case of the 3′ phosphorylated primers, all blocking primers have a 3′ phosphorylation modification.

16S primer for hotspot at position 107:

16S primer for hotspot at position 682:

16S primer for hotspot at position 1241:

23S primer for hotspot at position 375:

23S primer for hotspot at position 1421:

23S primer for hotspot at position 1641:

Each primer is designed to anneal approximately 100 bp downstream of the hotspot. The exact position and length of each primer was adjusted to ensure the Tm was above 65 °C.

How to locate promoter sequence for a specific gene

Many people have problems identifying or predicting the promoter sequence of a gene, or don’t know how to get the actual sequence for analyses such as primer design or transcription factor binding site searches. Here I describe how I do these things.

How to find and retrieve promoter sequences from genome databases. Promoter sequences are usually the sequence immediately upstream of the transcription start site (TSS) or first exon. If we know the TSS of a gene, we know with confidence where the promoter is, even without experimental characterization. For many organisms, such as human and mouse, the genome is well annotated and TSSs are well defined, so promoter sequence retrieval is an easy task. There are three major genome browsers: NCBI, Ensembl and UCSC. For our purpose, Ensembl provides the most convenient interface. Here is an example:
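Independently of which browser is used, the idea of taking the sequence immediately upstream of the TSS can be sketched in a few lines once the chromosome sequence, the TSS coordinate and the strand are known. The function `promoter_sequence`, the 0-based coordinate convention and the default window of 1000 bases are assumptions made for this illustration, and the toy sequence is invented.

```python
COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def reverse_complement(seq):
    """Return the reverse complement of a DNA string."""
    return seq.translate(COMPLEMENT)[::-1]


def promoter_sequence(chrom_seq, tss, strand, upstream=1000):
    """Return up to `upstream` bases immediately 5' of the TSS.

    tss is the 0-based index, in forward-strand coordinates, of the first
    transcribed base. The result is always written 5'->3' on the gene's strand.
    """
    if strand == "+":
        start = max(0, tss - upstream)
        return chrom_seq[start:tss]
    if strand == "-":
        # On the minus strand, "upstream" lies at higher forward coordinates.
        end = min(len(chrom_seq), tss + 1 + upstream)
        return reverse_complement(chrom_seq[tss + 1:end])
    raise ValueError("strand must be '+' or '-'")


toy_chrom = "AACCGGTTACGA"
print(promoter_sequence(toy_chrom, tss=4, strand="+", upstream=4))
print(promoter_sequence(toy_chrom, tss=7, strand="-", upstream=4))
```

For a minus-strand gene the upstream region is taken from the higher coordinates and reverse-complemented, which is the step most easily forgotten when cutting promoter sequences by hand.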

Materials and Methods

Host Cell Lines and Reporter Gene. A DNA fragment containing 96 head-to-tail tandem repeats of the 50-nt-long sequence 5′-CAGGAGTTGTGTTTGTGGACGAAGAGCACCAGCCAGCTGATCGACCTCGA-3′ was prepared as described by Robinett et al. (19) and inserted into the plasmid pTRE-d2EGFP (Clontech) by using its multiple cloning sites. The resulting plasmid, pTRE-GFP-96-mer, was used to transfect CHO cell line CHO-AA8-Tet-off (Clontech), which possesses a stably integrated gene for the tetracycline-controlled Tet-off transactivator. A geneticin G418-resistant clone (CHO-GFP-96-mer) that responded to 10 ng/ml doxycycline in the medium by turning off its fluorescence within 24 h was selected. To obtain cells expressing histone H2B-GFP, this cell line was transfected with plasmid pBOS-H2BGFP (BD Biosciences), and a clone that exhibited an intense GFP signal in the nuclei was isolated.

Cells were cultured in the α modification of Eagle's minimal essential medium (Sigma) supplemented with 10% TET-System-Approved FBS (Clontech). Imaging was performed in phenol red-free OptiMEM (Invitrogen). Cells used in the ATP-depletion studies were first incubated in glucose-free Dulbecco's modified Eagle's medium (Invitrogen) containing 10 mM sodium azide and 60 mM 2-deoxyglucose for 30 min and then imaged in OptiMEM containing the same inhibitors. After this treatment, the mitochondria in the cells could not be stained by rhodamine 123 (Sigma), confirming that the inhibitors were effective (14).

Molecular Beacons. The sequences of the molecular beacons were Cy3 or Alexa-594-5′-CUUCGUCCACAAACACAACUCCUGAAG-3′-Black Hole Quencher 2. The backbone of the molecular beacons was composed of 2′-O-methylribonucleotides.
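To see where the beacon pairs with the 50-nt repeat, one can take the reverse complement of the beacon and look for the longest stretch it shares with the repeat. The sketch below does this with the two sequences quoted above. The helper names are hypothetical, and the text itself does not state which beacon nucleotides form the probe and which form the stem, so the result only shows the complementary stretch.

```python
# Sequences copied from the Materials and Methods text above.
REPEAT = "CAGGAGTTGTGTTTGTGGACGAAGAGCACCAGCCAGCTGATCGACCTCGA"
BEACON_RNA = "CUUCGUCCACAAACACAACUCCUGAAG"

_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement_as_dna(seq: str) -> str:
    """Reverse-complement an RNA or DNA sequence, returned in the DNA alphabet."""
    dna = seq.upper().replace("U", "T")
    return dna.translate(_COMPLEMENT)[::-1]


def longest_shared_stretch(a: str, b: str) -> str:
    """Return the longest substring of `a` that also occurs in `b` (brute force)."""
    for length in range(len(a), 0, -1):
        for start in range(len(a) - length + 1):
            piece = a[start:start + length]
            if piece in b:
                return piece
    return ""


# The stretch of the repeat that the beacon can base-pair with.
target_site = longest_shared_stretch(reverse_complement_as_dna(BEACON_RNA), REPEAT)
site_offset = REPEAT.find(target_site)  # 0-based position within the repeat
```

With these inputs the shared stretch is the first 24 nt of the repeat (`site_offset` is 0), so every copy of the repeat in the transcript offers one such site.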

Live Cell Imaging. Cells were maintained at 37°C on the microscope stage by controlled heating of the objective and the culture dish (Delta T4 open system, Bioptechs, Butler, PA). Molecular beacons were dissolved in water at a concentration of 2.5 ng/μl, and an ≈0.1- to 1-fl solution was microinjected into each cell by using a FemtoJet microinjection apparatus (Brinkmann). Images were acquired with an Axiovert 200M inverted fluorescence microscope (Zeiss) equipped with a ×100 oil-immersion objective and a CoolSNAP HQ camera (Photometrics, Pleasanton, CA) cooled to −30°C, using Openlab acquisition software (Improvision, Sheffield, U.K.).

Synthetic RNA Transcripts and Their Hybrids with Molecular Beacons. We prepared a series of pGEM plasmids (Promega) containing 1, 2, 4, 8, 16, 32, or 64 tandem repeats of the sequence described above. In addition, we excised the gene encoding GFP-mRNA-96-mer from pTRE-GFP-96-mer and inserted it into plasmid pGEM, because that plasmid contains a bacteriophage T7 promoter. To produce RNA transcripts possessing a different number of repeats, these plasmids were linearized and used as templates for in vitro transcription by T7 RNA polymerase. The transcript containing 96 repeats possessed a GFP-mRNA sequence, whereas the other transcripts only possessed the repeat motifs. Hybrids were formed by incubating 20 ng of transcripts with 20 ng of molecular beacons in 10 μl of 10 mM Tris·HCl (pH 8.0) containing 1 mM MgCl2 at 37°C for 60 min and were then injected into the cells.


RNA structural probing studies with the 5′ UTRs of ribD and ypaA RNAs confirm that the highly conserved RFN element is a natural FMN-binding aptamer. This RNA motif exhibits an exceptionally high affinity for its target ligand (apparent KD of <10 nM). As with the two natural metabolite-binding aptamers reported (2, 4), the FMN-binding domain of ribD exhibits a high level of discrimination against closely related compounds. Furthermore, both the thiamine pyrophosphate- and the FMN-dependent aptamers require the presence of phosphate groups on their respective ligands to bind with the highest affinity, which is a somewhat surprising achievement for a polyanionic receptor molecule. These aspects of molecular recognition are of particular importance in a biological setting, where the promiscuous binding of closely related biosynthetic intermediates would interfere with proper regulation of genetic expression.

The role of the natural FMN aptamer in these two instances most likely is to serve as the recognition domain for FMN-dependent riboswitches that control gene expression of the riboflavin biosynthetic operon and the riboflavin transporter in B. subtilis. Preliminary investigations into the mechanisms used by the associated expression platforms in both cases are consistent with those proposed from comparative sequence analysis data (9). Specifically, ribD RNA undergoes an increased frequency of transcription termination after the addition of FMN. This is perhaps the most efficient riboswitch mechanism for controlling the expression of large operons, because termination of transcription in the 5′ UTR prevents the synthesis of long mRNAs when their translation is unnecessary. Indeed, regulation by transcription termination may be common among many bacterial species (26). A search for sequences in the B. subtilis genome that could form terminator–antiterminator elements revealed nearly 200 candidates (27). In contrast, the riboswitch in the ypaA mRNA leader seems to control ribosome access to the SD element. Although the entire sequence of this smaller mRNA would be produced, this genetic control mechanism permits the organism to respond rapidly to declining concentrations of FMN.

Our findings provide additional evidence in support of earlier speculation (3, 8, 28–31) that mRNAs have the ability to play an active role in sensing metabolites for the purpose of genetic control. It seems likely that new riboswitches will be discovered that respond to other metabolites and exhibit more diverse mechanisms of genetic control. The riboswitches examined in this study provide examples of two mechanisms for expression platform function for the down-regulation of gene expression. However, we speculate that there will be instances where gene expression might be increased in response to metabolite binding to mRNAs. For example, certain enzymes that make use of the riboswitch effectors coenzyme B12 (2), thiamine pyrophosphate (4), and FMN might make use of expression platforms that permit gene activation. If the occurrence of these or other riboswitches extends across the phylogenetic landscape, or if they are used to control the expression of more than just biosynthetic and transport genes, then it is likely that a greater diversity of genetic control mechanisms will be discovered.
