31.2: eQTL Basics - Biology



Whole genome eQTL analysis has separated eQTLs into two distinct types. The first is a cis-eQTL, which maps near the physical position of the gene it regulates, where "near" is defined by a distance cutoff. That cutoff is arbitrary, however, and can be altered by an order of magnitude, for instance.


The second distinct type of eQTL is a trans-eQTL (Figure 31.4). A trans-eQTL does not map near the physical position of the gene it regulates. Its effect on gene expression is generally more indirect: rather than directly boosting or inhibiting transcription, it affects kinetics, signaling pathways, and so on. Because such effects are harder to determine explicitly, they are harder to find in eQTL analysis; in addition, the networks involved can be extremely complex, which further limits trans-eQTL analysis. However, eQTL analysis has led to the discovery of trans hotspots, loci that have widespread transcriptional effects [11].
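The cis/trans distinction can be sketched as a simple distance rule. This is only an illustration: the `Locus` type, the `classify_eqtl` function, and the 1,000,000 bp default window are assumptions made for the example, not values taken from the text, which says only that the cutoff is arbitrary.

```python
from dataclasses import dataclass

# Assumption for illustration only: a 1 Mb window. The text says the cutoff
# is arbitrary and can be altered by an order of magnitude.
CIS_WINDOW_BP = 1_000_000


@dataclass(frozen=True)
class Locus:
    """A position in the genome (hypothetical helper type)."""
    chrom: str
    pos: int  # base-pair coordinate on the chromosome


def classify_eqtl(snp: Locus, gene: Locus, window_bp: int = CIS_WINDOW_BP) -> str:
    """Label a SNP-gene pair 'cis' if the SNP lies on the same chromosome
    and within window_bp of the gene position, otherwise 'trans'."""
    same_chrom = snp.chrom == gene.chrom
    if same_chrom and abs(snp.pos - gene.pos) <= window_bp:
        return "cis"
    return "trans"


snp = Locus("chr1", 1_500_000)
gene = Locus("chr1", 1_000_000)
label_default = classify_eqtl(snp, gene)                     # 500 kb apart
label_narrow = classify_eqtl(snp, gene, window_bp=100_000)   # tighter cutoff
label_other_chrom = classify_eqtl(Locus("chr2", 1_000_000), gene)
```

Shrinking the window by an order of magnitude flips the same pair from cis to trans, which is why the cutoff should be reported alongside any cis/trans counts.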

Perhaps the biggest surprise of eQTL research is that, despite the discovery of trans hotspots and cis-eQTLs, no major trans loci for specific genes have been found in humans [12]. This is probably attributable to the current process of whole genome eQTL analysis itself. As useful and widespread as whole genome eQTL analysis is, genome-wide significance occurs at \(p = 5 \times 10^{-8}\), with multiple testing on about 20,000 genes. Thus, studies generally use a sample size too small to establish the significance of many trans-eQTL associations, which start with very low prior probability compared to cis-eQTLs [4]. Further, the bias reduction methods described in earlier sections deflate variance, which is integral to capturing the microtrait associations inherent in trans loci. Finally, non-normal distributions limit the statistical significance of associations between trans-eQTLs and gene expression [4]. This has been partly remedied by the use of cross-phenotype meta-analysis (CPMA) [5], which relies on summary statistics from GWAS rather than individual-level data. This cross-trait analysis is effective because trans-eQTLs affect many genes and thus have multiple associations originating from a single marker. Sample CPMA code can be found in Tools and Resources.
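To get a feel for why sample size becomes limiting, the arithmetic can be sketched as follows. The Bonferroni-style division is an assumption chosen for illustration; the text does not say which correction is applied, and `bonferroni_threshold` is a hypothetical helper name.

```python
def bonferroni_threshold(alpha: float, n_tests: int) -> float:
    """Per-test significance threshold under a Bonferroni-style correction
    (an assumption for this sketch, not the method named in the text)."""
    if n_tests <= 0:
        raise ValueError("n_tests must be positive")
    return alpha / n_tests


# Figures from the text: genome-wide threshold and approximate gene count.
GENOME_WIDE_P = 5e-8
N_GENES = 20_000

# If each variant is additionally tested against every gene, the per-test
# threshold shrinks by a further factor of N_GENES.
per_test_p = bonferroni_threshold(GENOME_WIDE_P, N_GENES)
print(f"per-test threshold: {per_test_p:.1e}")
```

Under these assumptions a trans association would need a p-value on the order of \(10^{-12}\), which helps explain why modest cohorts rarely detect such effects.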

However, while trans loci have not been found, trans-acting variants have been. Since it can be inferred that trans-eQTLs affect many genes, CPMA and ChIP-Seq can be used to detect such cross-trait variants. Indeed, 24 significant trans-acting transcription factors were identified from a group of 1311 trans-acting SNP variants by observing allelic effects on populations and target gene interactions/connections.

Biology OER

Scientists use a methodology for systematically investigating natural phenomena. This method uses existing information or observations to acquire new information or validate previous knowledge. These knowledge types come from empirical (experiential) or measured information. Empirical and measured data (or knowledge) are referred to as observations. While empirical data comes from experiences, science has developed into a mode of inquiry using experimentation. Experimental science uses the pre-existing base of knowledge to ask a testable question called a hypothesis. As youngsters, we're incorrectly taught that a hypothesis is an educated guess. Formulating previous observations and measurements into a cohesive line of inquiry requires no guessing. People often say they have "theories" about something when they actually have hypotheses based on their observations and assumptions.

Experimental Science

Hypothesis testing is the means by which experimental science is conducted. Experimental science is designed to enhance the understanding of a problem and to remove biases from the interpretation. The goal of hypothesis testing is to try every way possible to disqualify the validity of the hypothesis. By doing so, the experimenter removes any biases in the experimental design. If the experimenter is unable to invalidate the hypothesis, the hypothesis becomes more valid and better able to act as a predictor of phenomena.

Experiments utilize controls. In a controlled experiment, there is a positive and negative control. These controls act as references in the experiment. A positive control is an experimental condition where the expected outcome that is tested will be produced. This control is necessary to assess the validity of a test or treatment. There can be multiple instances used as a positive control to examine the sensitivity of the experiment. A negative control is an experimental condition where the expected outcome is known not to occur. This type of control sometimes comes in the form of a sham or mock treatment such as giving someone a sugar pill (a placebo).

Through the use of experimental science and hypothesis testing, an increased refinement of existing knowledge can aid in designing new hypotheses. Hypothesis testing is re-iterative. That is to say, we use new knowledge to continue to enhance our understanding of the universe.

The scientific method is a reiterative process based on testing and revising knowledge. (CC-BY-NC-SA Jeremy Seto)


A scientific theory comes from repeated substantiation of multiple tested hypotheses. That is to say, confirmed hypotheses, observations and experiments permit scientists to formulate a cohesive idea that integrates multiple substantiated pieces of evidence. As with hypotheses, theories are designed to be predictive and falsifiable. In the common language, we often hear the word theory to mean a conjecture, and as already discussed, conjectures based on evidence can be formulated into testable hypotheses.

When a theory is accepted by a predominant population of the specialists, it is referred to as a scientific principle. An example of a scientific principle is the theory of evolution by natural selection. Numerous tested hypotheses have been confirmed that lead to the understanding of natural selection as a method of evolution. This theory allows scientists to understand the underlying relatedness of all living things on the planet. Additionally, it unifies the disparate fields of Biology that can utilize the theory in a predictive manner. It is therefore also referred to as a unifying principle of Biology.

Biology: An Introduction

The science of Biology developed from scientists sharing their knowledge and observations and studies with each other. Their fascination with and investigation of the real world has led to a vast body of knowledge that is continuously growing.

Definition of Biology:

Biology is a natural science concerned with the study of life and living organisms, including their structure, function, growth, evolution, distribution, and taxonomy.

Nowadays the field is broader than ever before. Hundreds or even thousands of related topics can be included in this branch of knowledge. First, we will go through its branches:

Biology, the scientific study of life, includes several relevant branches. Below is a list of major branches of biology with a brief description for each.

Agriculture – science and practice of producing crops and livestock from the natural resources of the earth.

Anatomy – study of the animal form, particularly human body

Astrobiology – branch of biology concerned with the effects of outer space on living organisms and the search for extraterrestrial life.

Biochemistry – the study of the structure and function of cellular components, such as proteins, carbohydrates, lipids, nucleic acids, and other biomolecules, and of their functions and transformations during life processes

Bioclimatology – a science concerned with the influence of climates on organisms, for instance the effects of climate on the development and distribution of plants, animals, and humans

Bioengineering – or biological engineering, is a broad-based engineering discipline that deals with bio-molecular and molecular processes, product design, sustainability and analysis of biological systems.

Biogeography – a science that attempts to describe the changing distributions and geographic patterns of living and fossil species of plants and animals

Bioinformatics – information technology as applied to the life sciences, especially the technology used for the collection, storage, and retrieval of genomic data

Biomathematics – mathematical biology or biomathematics is an interdisciplinary field of academic study which aims at modelling natural, biological processes using mathematical techniques and tools. It has both practical and theoretical applications in biological research.

Biophysics – or biological physics is an interdisciplinary science that applies the theories and methods of physical sciences to questions of biology

Biotechnology – applied science that is concerned with biological systems, living organisms, or derivatives thereof, to make or modify products or processes for specific use

Botany – the scientific study of plants

Cell biology – the study of cells at the microscopic or at the molecular level. It includes studying the cells’ physiological properties, structures, organelles, interactions with their environment, life cycle, division and apoptosis

Chronobiology – a science that studies time-related phenomena in living organisms

Conservation Biology – concerned with the studies and schemes of habitat preservation and species protection for the purpose of alleviating the extinction crisis and conserving biodiversity

Cryobiology – the study of the effects of low temperatures on living organisms

Developmental Biology – the study of the processes by which an organism develops from a zygote to its full structure

Ecology – the scientific study of the relationships between plants, animals, and their environment

Ethnobiology – a study of the past and present human interactions with the environment, for instance the use of diverse flora and fauna by indigenous societies

Evolutionary Biology – a subfield concerned with the origin and descent of species, as well as their change over time, i.e. their evolution

Freshwater Biology – a science concerned with the life and ecosystems of freshwater habitats

Genetics – a science that deals with heredity, especially the mechanisms of hereditary transmission and the variation of inherited characteristics among similar or related organisms

Geobiology – a science that combines geology and biology to study the interactions of organisms with their environment

Immunobiology – a study of the structure and function of the immune system, innate and acquired immunity, the bodily distinction of self from nonself, and laboratory techniques involving the interaction of antigens with specific antibodies

Marine Biology – study of ocean plants and animals and their ecological relationships

Medicine – the science which relates to the prevention, cure, or alleviation of disease

Microbiology – the branch of biology that deals with microorganisms and their effects on other living organisms

Molecular Biology – the branch of biology that deals with the formation, structure, and function of macromolecules essential to life, such as nucleic acids and proteins, and especially with their role in cell replication and the transmission of genetic information

Mycology – the study of fungi

Neurobiology – the branch of biology that deals with the anatomy and physiology and pathology of the nervous system

Paleobiology – the study of the forms of life existing in prehistoric or geologic times, as represented by the fossils of plants, animals, and other organisms

Parasitology – the study of parasites and parasitism

Pathology – the study of the nature of disease and its causes, processes, development, and consequences

Pharmacology – the study of preparation and use of drugs and synthetic medicines

Physiology – the biological study of the functions of living organisms and their parts

Protistology – the study of protists

Psychobiology – the study of mental functioning and behavior in relation to other biological processes

Toxicology – the study of how natural or man-made poisons cause undesirable effects in living organisms

Virology – study of viruses

Zoology – The branch of biology that deals with animals and animal life, including the study of the structure, physiology, development, and classification of animals

The Basics of Soil Biology

Have you ever wondered how to create healthy soil to grow strong and vigorous plants in your garden? In order to create and maintain healthy soil, you need to understand the basics of soil science, with a focus on soil biology.

There are millions of organisms living in our soil. These organisms are vital to the health of our gardens. Adding organic material to your garden beds has multiple positive effects for your soil and plants. In this animated video, “Excuses to Buy More Plants” from Fine Gardening, learn more about the interactions between organisms in your soil and plants, and how to feed and support these organisms.

For even more information on how to support soil life, check out the article How to Support Soil Life by Anne Bilke from Fine Gardening #195. Here’s how it begins:

The vitality and resiliency of every garden depends on plants interacting with a vast array of insects, fungi, and microorganisms, especially those that make their homes in the soil. Through their root systems, plants participate in diverse underground communities where nutrients are continuously exchanged and recycled. The above-ground parts of a plant also play a role, with certain soil dwellers relying on material such as dead leaves as a food source. Keeping soil-based communities in mind, gardeners must think carefully about what we add to the soil and minimize activities that disrupt soil life.



An introduction to the statistics behind the most popular genomic data science projects. This is the sixth course in the Genomic Big Data Science Specialization from Johns Hopkins University.

Skills you will gain

Statistics, Data Analysis, R Programming, Biostatistics


Very good course and useful understanding statistical aspects of data.

This is the best. It opens my eye for genomic data analysis.

In this week we will cover a lot of the general pipelines people use to analyze specific data types like RNA-seq, GWAS, ChIP-Seq, and DNA Methylation studies.


Jeff Leek, PhD

Associate Professor, Biostatistics

Video transcript

Expression quantitative trait loci, or eQTL, analysis is one of the most common integrative analyses performed in genomics. An eQTL analysis is one where you're trying to identify variations in DNA that correlate with variations in RNA. So basically what you do is you measure the abundance of different RNA molecules and measure the DNA in those same samples, and then you try to correlate the variation in DNA to the variation in RNA. This is representative of a whole class of problems that are associated with combining different genomic data types, whether it's measuring proteomic data and RNA data, or DNA data and RNA data, or RNA and methylation, and then trying to integrate those data together to identify the cross-regulation between these different measurements.

One of the first examples of an eQTL study was the study by Brem et al. in 2002 in Science. They crossed two strains of yeast and created 112 random segregants. Once they had those yeast segregants, they measured mRNA expression (at the time, gene expression was measured with microarrays), and they measured genotypes using a microarray genotyping tool. The goal was to identify associations between the expression levels and the genotypes.

You can think of this as basically having two components. One is the SNP data, so that's the marker or SNP associated with each gene in the genome; in this case, it's the yeast genome. You have the position of the particular SNP that you're measuring, and then you also have information about a particular gene, like how much that gene is turned on or expressed, and where that gene is located in the genome as well. So you're basically trying to do an association between all possible gene expression levels and all possible SNPs.
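The "all SNPs against all genes" scan described here can be sketched in a few lines. This is a toy illustration, not the method of the study mentioned in the lecture: the Pearson correlation, the 0/1 genotype coding, the six made-up samples, and the names `pearson_r` and `scan_all_pairs` are all assumptions for the example.

```python
from math import sqrt


def pearson_r(x, y):
    """Pearson correlation between two equal-length numeric sequences."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0  # a constant vector carries no association signal
    return sxy / sqrt(sxx * syy)


def scan_all_pairs(genotypes, expression):
    """Correlate every SNP with every gene; returns {(snp, gene): r}."""
    return {
        (snp, gene): pearson_r(geno, expr)
        for snp, geno in genotypes.items()
        for gene, expr in expression.items()
    }


# Made-up data for six samples. Genotypes are coded 0 for the allele from
# one parent strain and 1 for the allele from the other.
genotypes = {
    "snp_A": [0, 0, 0, 1, 1, 1],
    "snp_B": [0, 1, 0, 1, 0, 1],
}
expression = {
    "gene_1": [1.0, 1.2, 0.9, 3.1, 2.9, 3.0],
    "gene_2": [2.0, 2.1, 1.9, 2.0, 2.1, 1.9],
}

results = scan_all_pairs(genotypes, expression)
for pair, r in sorted(results.items()):
    print(pair, round(r, 3))
```

Even this toy scan runs one test per SNP-gene pair, so the number of tests grows as the product of the two counts.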
This obviously complicates the issue of multiple testing, because you're testing all possible SNPs versus all possible gene expression values. If you think about it, for every single SNP you're performing basically a gene expression microarray analysis. And if you have thousands or hundreds of thousands of SNPs, that's thousands or hundreds of thousands of micro experiments.

In this case, there are two strains, the BY and RM strains, so those are surrogates for the genotypes, and you're looking for differences in expression. Here you don't see any difference, or not a very strong difference, in expression between the BY and RM strains for this particular gene and this particular variant. For this other variant and this other gene, you do see differences in the mean level of expression between the two genotypes. That would be classified as an eQTL if it passed the significance thresholds.

This is typically the kind of plot that you can make when you do an eQTL analysis. On the x-axis, we've got the position of the marker or the genotype, so again, that was where that SNP was positioned in the genome. Then you also have the trait position, so that's where the gene whose expression was measured is located. So basically you can imagine: where's the gene that codes for the mRNA that is being measured, and where is the SNP that's being measured? Then you'd just line up the chromosomes on each axis, and this circled component right here, this diagonal line, represents what are typically called cis-eQTL. cis-eQTL are often defined as eQTL where the SNP position is close to the position of the gene whose expression is measured. Then there are also what are called trans-eQTL, and in this case, there appear to be lots of trans-eQTL.
But what's often been noticed is that if you see these big stripes of loci that seem to associate with many genes' expression levels, very often those tend to be artifacts. It might be a batch effect or some other artifact in the data that is driving the variability. Now, that's not always the case. If you identify, for example, a biological reason that there might be a large number of associations between a particular locus and lots of genes' expression, the associations might be real. But typically, your assumption is that it might be an artifact if you see these large stripes in the pattern, where there's a particular marker that's associated with many genes.

This idea is actually really popular right now. It's being used in a large number of studies. One of the most recent and very large-scale studies of gene expression variation in the context of eQTL is the GTEx project, where they took multiple donors, and from each donor multiple tissues, and they measured information about their DNA sequence. They also measured their level of expression in various tissues, say brain, heart, and liver, and then they performed eQTL analyses both across tissues and within tissues. They've identified a large number of eQTL, including cross-tissue eQTL. That data is all available, and you can start analyzing it yourself if you're interested. So eQTL is an area that's here to stay and is probably the most popular of the integrative genomic applications.

So, just some notes and further reading. The cis-eQTL tend to be more believable than trans-eQTL. The cis-eQTL are those eQTL where the SNP position, or the variant position, is close to the coding region of the gene; the trans-eQTL are those where the SNP position is very distant from the position of the coding gene. There are many potential confounders here.
In this analysis, just like in a GWAS analysis, you have to adjust for population stratification. You also have to adjust for things like batch effects on the gene expression data, just like you would in a gene expression analysis. And then there are even more complicated things like sequence artifacts, where a sequence artifact could make it look like there's an eQTL, especially a trans-eQTL, when it's not actually there. The paper I've linked to here is an excellent review of many of the issues associated with eQTL analysis if you want to learn a little bit more about that.
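A batch effect can manufacture an apparent genotype difference, as the following toy sketch shows. Centering expression within each batch is used here as a crude stand-in for the adjustment the lecture describes; it is an assumption for illustration, not the lecture's method. The data and the names `center_within_batch` and `mean_difference` are made up.

```python
from collections import defaultdict


def center_within_batch(values, batches):
    """Subtract each batch's mean from the measurements in that batch."""
    if len(values) != len(batches):
        raise ValueError("values and batches must have the same length")
    sums = defaultdict(float)
    counts = defaultdict(int)
    for value, batch in zip(values, batches):
        sums[batch] += value
        counts[batch] += 1
    return [value - sums[batch] / counts[batch]
            for value, batch in zip(values, batches)]


def mean_difference(values, genotypes):
    """Mean expression of genotype-1 samples minus genotype-0 samples."""
    group0 = [v for v, g in zip(values, genotypes) if g == 0]
    group1 = [v for v, g in zip(values, genotypes) if g == 1]
    if not group0 or not group1:
        raise ValueError("both genotype groups must be non-empty")
    return sum(group1) / len(group1) - sum(group0) / len(group0)


# Made-up data: batch "b2" reads about 2 units higher than "b1", and the
# two genotypes are unevenly spread across the batches.
expression = [1.0, 1.1, 0.9, 3.0, 3.1, 2.9]
batches = ["b1", "b1", "b1", "b2", "b2", "b2"]
genotypes = [0, 0, 1, 0, 1, 1]

raw_diff = mean_difference(expression, genotypes)
adjusted_diff = mean_difference(
    center_within_batch(expression, batches), genotypes
)
print(f"raw difference: {raw_diff:.3f}")
print(f"batch-adjusted difference: {adjusted_diff:.3f}")
```

In this toy data the raw genotype difference mostly disappears after centering, which suggests it came from the batch imbalance. In practice batch and population structure are more often handled as covariates in a regression model, and centering can also remove real signal when genotype is strongly confounded with batch.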
