# Appropriate statistical test for a student lab?


I am a HS Bio teacher and doing a microevolution lab involving candy. Essentially students use four candy types and push them together until one cracks (Nat Sel). They also do simulations of migration, mutation and genetic drift. They then calculate the "allele frequency" of each type per generation and look for changes. I want to have my students do a simple statistical test to see whether the change was significant or not from each evolutionary factor. I was going to do a chi square test, but someone told me that is not appropriate to this type of experiment. So, can anyone confirm, is chi square inappropriate, and if it is… Can anyone point me to a more appropriate test. These are high school kids with no stats background (and I have forgotten virtually all the stats I learned 20 years ago)…
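To make the calculation in the question concrete, here is a minimal Python sketch of how "allele frequencies" per generation could be computed from candy counts, together with the chi-square goodness-of-fit statistic the question mentions. The counts, the generation labels and the function names are all made up for illustration, and the sketch does not settle whether chi-square is the right test for this lab.

```python
# Hypothetical counts of the four candy types in two generations (made-up numbers).
generations = {
    "gen0": {"A": 25, "B": 25, "C": 25, "D": 25},
    "gen5": {"A": 40, "B": 30, "C": 20, "D": 10},
}


def allele_frequencies(counts):
    """Return each candy type's share of the total count."""
    total = sum(counts.values())
    return {allele: n / total for allele, n in counts.items()}


def chi_square_statistic(observed, expected_freqs):
    """Goodness-of-fit statistic: sum of (O - E)^2 / E over all types."""
    total = sum(observed.values())
    return sum(
        (observed[a] - expected_freqs[a] * total) ** 2 / (expected_freqs[a] * total)
        for a in observed
    )


start = allele_frequencies(generations["gen0"])
end = allele_frequencies(generations["gen5"])

# Compare the final counts with what the starting frequencies would predict.
chi2 = chi_square_statistic(generations["gen5"], start)
print("start:", start)
print("end:  ", end)
print("chi-square statistic:", chi2)
```

With four categories there are 3 degrees of freedom, so the statistic would be compared with the tabulated critical value of 7.815 at the 0.05 level.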

To choose the right statistical method (it is more than just saying "use the t-test") you need to think about your experiment. A good starting point is this figure from Bitesizebio: There are two relevant articles on that website:

Probably also interesting is the definition of statistical terms:

## How to choose the right statistical test?

Today statistics provides the basis for inference in most medical research. Yet, for want of exposure to statistical theory and practice, it continues to be regarded as the Achilles heel by all concerned in the loop of research and publication – the researchers (authors), reviewers, editors and readers.

Most of us are familiar to some degree with descriptive statistical measures such as those of central tendency and those of dispersion. However, we falter at inferential statistics. This need not be the case, particularly with the widespread availability of powerful and at the same time user-friendly statistical software. As we have outlined below, a few fundamental considerations will lead one to select the appropriate statistical test for hypothesis testing. However, it is important that the appropriate statistical analysis is decided before starting the study, at the stage of planning itself, and the sample size chosen is optimum. These cannot be decided arbitrarily after the study is over and data have already been collected.

The great majority of studies can be tackled through a basket of some 30 tests out of the more than 100 that are in use. The test to be used depends upon the type of research question being asked. The other determining factors are the type of data being analyzed and the number of groups or data sets involved in the study. The following schemes, based on five generic research questions, should help.

Question 1: Is there a difference between groups that are unpaired? Groups or data sets are regarded as unpaired if there is no possibility of the values in one data set being related to or being influenced by the values in the other data sets. Different tests are required for quantitative or numerical data and qualitative or categorical data, as shown in Fig. 1. For numerical data, it is important to decide whether they follow a normal (Gaussian) distribution, in which case parametric tests are applied. If the distribution of the data is not normal, or if one is not sure about the distribution, it is safer to use non-parametric tests. When comparing more than two sets of numerical data, a multiple group comparison test such as one-way analysis of variance (ANOVA) or the Kruskal-Wallis test should be used first. Only if it returns a statistically significant p value (usually meaning p < 0.05) should it be followed by a post hoc test to determine between exactly which two data sets the difference lies. Repeatedly applying the t test or its non-parametric counterpart, the Mann-Whitney U test, to a multiple group situation increases the possibility of incorrectly rejecting the null hypothesis.

Fig. 1: Tests to address the question: Is there a difference between groups in the unpaired (parallel and independent groups) situation?
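The warning about repeated pairwise testing can be illustrated with a little arithmetic. The sketch below assumes, as a simplification, that the pairwise tests are independent (in practice they share data, so the figures are only approximate); the function name is illustrative.

```python
from math import comb


def familywise_error_rate(n_tests, alpha=0.05):
    """Chance of at least one false positive across n_tests independent tests."""
    return 1 - (1 - alpha) ** n_tests


# Number of pairwise comparisons grows quickly with the number of groups.
for groups in (3, 4, 5):
    pairs = comb(groups, 2)
    rate = familywise_error_rate(pairs)
    print(f"{groups} groups -> {pairs} pairwise tests -> error rate about {rate:.3f}")
```

Under this assumption, three groups already push the chance of a false positive from 0.05 to roughly 0.14, which is the reason for running a single multiple group test first.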

Question 2: Is there a difference between groups which are paired? Pairing signifies that data sets are derived by repeated measurements (e.g. before-after measurements or multiple measurements across time) on the same set of subjects. Pairing will also occur if subject groups are different but values in one group are in some way linked or related to values in the other group (e.g. twin studies, sibling studies, parent-offspring studies). A crossover study design also calls for the application of paired group tests for comparing the effects of different interventions on the same subjects. Sometimes subjects are deliberately paired to match baseline characteristics such as age, sex, severity or duration of disease. A scheme similar to Fig. 1 is followed in paired data set testing, as outlined in Fig. 2. Once again, multiple data set comparison should be done through appropriate multiple group tests followed by post hoc tests.

Fig. 2: Tests to address the question: Is there a difference between groups in the paired situation?

Question 3: Is there any association between variables? The various tests applicable are outlined in Fig. 3. It should be noted that the tests meant for numerical data are for testing the association between two variables. These are correlation tests and they express the strength of the association as a correlation coefficient. An inverse correlation between two variables is depicted by a minus sign. All correlation coefficients vary in magnitude from 0 (no correlation at all) to 1 (perfect correlation). A perfect correlation may indicate but does not necessarily mean causality. When two numerical variables are linearly related to each other, a linear regression analysis can generate a mathematical equation, which can predict the dependent variable based on a given value of the independent variable. Odds ratios and relative risks are the staple of epidemiologic studies and express the association between categorical data that can be summarized as a 2 × 2 contingency table. Logistic regression is actually a multivariate analysis method that expresses the strength of the association between a binary dependent variable and two or more independent variables as adjusted odds ratios.

Fig. 3: Tests to address the question: Is there an association between variables?
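As a rough illustration of the correlation coefficient and the regression equation described above, the following sketch computes Pearson's r and an ordinary least-squares line using only the standard library. The data and function names are invented for the example.

```python
from math import sqrt


def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)


def least_squares_line(xs, ys):
    """Return (slope, intercept) of the line y = slope * x + intercept."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


# Made-up measurements: dose (x) and response (y).
dose = [1, 2, 3, 4, 5]
response = [2.1, 3.9, 6.2, 7.8, 10.1]

r = pearson_r(dose, response)
slope, intercept = least_squares_line(dose, response)
print(f"r = {r:.3f}; predicted response = {slope:.2f} * dose + {intercept:.2f}")
```

The sign of r carries the direction of the relationship, while its magnitude carries the strength.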

Question 4: Is there agreement between data sets? This can be a comparison of a new screening technique against the standard test, of a new diagnostic test against the available gold standard, or of the ratings or scores given by different observers. As seen from Fig. 4, agreement between numerical variables may be expressed quantitatively by the intraclass correlation coefficient or graphically by constructing a Bland-Altman plot, in which the difference between two variables x and y is plotted against the mean of x and y. In the case of categorical data, Cohen's kappa statistic is frequently used; a kappa of 0 indicates agreement no better than chance and 1 indicates perfect agreement, with values > 0.7 indicating strong agreement. It is inappropriate to infer agreement by showing that there is no statistically significant difference between means or by calculating a correlation coefficient.

Fig. 4: Tests to address the question: Is there agreement between assessment (screening / rating / diagnostic) techniques?
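For categorical ratings, Cohen's kappa compares the observed agreement with the agreement expected by chance. The sketch below is one straightforward way to compute it for two raters; the ratings and the function name are hypothetical.

```python
from collections import Counter


def cohens_kappa(ratings_a, ratings_b):
    """Cohen's kappa for two raters scoring the same items.

    Undefined (division by zero) when both raters use a single identical category.
    """
    n = len(ratings_a)
    observed = sum(a == b for a, b in zip(ratings_a, ratings_b)) / n
    freq_a, freq_b = Counter(ratings_a), Counter(ratings_b)
    categories = freq_a.keys() | freq_b.keys()
    # Chance agreement: product of the two raters' marginal proportions per category.
    expected = sum(freq_a[c] * freq_b[c] for c in categories) / n**2
    return (observed - expected) / (1 - expected)


# Made-up ratings of ten samples by two observers.
rater_1 = ["pos", "pos", "pos", "pos", "pos", "neg", "neg", "neg", "neg", "neg"]
rater_2 = ["pos", "pos", "pos", "pos", "neg", "neg", "neg", "neg", "neg", "pos"]

kappa = cohens_kappa(rater_1, rater_2)
print(f"kappa = {kappa:.2f}")
```

Here the raters agree on 8 of 10 items, but half of that agreement would be expected by chance, so kappa is noticeably lower than the raw agreement of 0.8.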

Question 5: Is there a difference between time-to-event trends or survival plots? This question is specific to survival analysis (the endpoint for such analysis could be death or any event that can occur after a period of time), which is characterized by censoring of data, meaning that a sizeable proportion of the original study subjects may not reach the endpoint in question by the time the study ends. Data sets for survival trends are always considered to be non-parametric. If there are two groups, the applicable tests are the Cox-Mantel test, Gehan's (generalized Wilcoxon) test or the log-rank test. In the case of more than two groups, Peto and Peto's test or the log-rank test can be applied to look for a significant difference between time-to-event trends.

It can be appreciated from the above outline that distinguishing between parametric and non-parametric data is important. Tests of normality (e.g. the Kolmogorov-Smirnov test or the Shapiro-Wilk goodness of fit test) may be applied rather than making assumptions. Some of the other prerequisites of parametric tests are that samples have the same variance (i.e. are drawn from the same population), that observations within a group are independent, and that the samples have been drawn randomly from the population.

A one-tailed test assesses deviation from the null hypothesis in one specific direction, whereas a two-tailed test assesses deviation from the null hypothesis in either direction. When Intervention A is compared with Intervention B in a clinical trial, the null hypothesis assumes there is no difference between the two interventions. Deviation from this hypothesis can occur in favor of either intervention in a two-tailed test, but in a one-tailed test it is presumed that only one intervention can show superiority over the other. Although for a given data set a one-tailed test will return a smaller p value than a two-tailed test (provided the observed effect lies in the presumed direction), the latter is usually preferred unless there is a watertight case for one-tailed testing.
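The relationship between the two p values can be seen with a standard normal test statistic. This is a minimal sketch that assumes a z statistic has already been computed; the function names are illustrative.

```python
from math import erfc, sqrt


def p_one_tailed(z):
    """P(Z >= z) for a standard normal Z: the upper-tail p value."""
    return 0.5 * erfc(z / sqrt(2))


def p_two_tailed(z):
    """P(|Z| >= |z|): deviation in either direction counts."""
    return erfc(abs(z) / sqrt(2))


for z in (1.96, -1.96):
    print(f"z = {z:+.2f}: one-tailed p = {p_one_tailed(z):.4f}, "
          f"two-tailed p = {p_two_tailed(z):.4f}")
```

When the effect lies in the presumed direction, the one-tailed p value is half the two-tailed one. When it lies in the opposite direction, the one-tailed p value is close to 1, which is part of why one-tailed testing needs to be justified in advance.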

It is obvious that we cannot refer to all statistical tests in one editorial. However, the schemes outlined will cover the hypothesis testing demands of the majority of observational as well as interventional studies. Finally, one must remember that there is no substitute for actually working hands-on with dummy or real data sets, and for seeking the advice of a statistician, in order to learn the nuances of statistical hypothesis testing.

## 6.1 Goals for this Chapter

Familiarize ourselves with the statistical machinery of hypothesis testing, its vocabulary, its purpose, and its strengths and limitations.

Understand what multiple testing means.

See that multiple testing is not a problem – but rather, an opportunity, as it overcomes many of the limitations of single testing.

Understand the false discovery rate.

Learn how to make diagnostic plots.

Use hypothesis weighting to increase the power of our analyses.

### 6.1.1 Drinking from the firehose

If statistical testing (decision making with uncertainty) seems a hard task when making a single decision, then brace yourself: in genomics, or more generally with “big data”, we need to accomplish it not once, but thousands or millions of times. In Chapter 2, we saw the example of epitope detection and the challenges from considering not only one, but several positions. Similarly, in whole genome sequencing, we scan every position in the genome for a difference between the DNA library at hand and a reference (or, another library): that’s on the order of three billion tests if we are looking at human data! In genetic or chemical compound screening, we test each of the reagents for an effect in the assay, compared to a control: that’s again tens of thousands, if not millions of tests. In Chapter 8, we will analyse RNA-Seq data for differential expression by applying a hypothesis test to each of the thousands of genes assayed.

Figure 6.1: High-throughput data in modern biology are being screened for associations with millions of hypothesis tests. (Source: Bayer)

Yet, in many ways, multiplicity makes the task simpler, not harder. Since we have so much data, and so many tests, we can ask questions like: are the requirements of the tests actually met by the data? What are the prior probabilities that we should assign to the possible outcomes of the tests? Answers to these questions can be incredibly helpful, and we can address them because of the multiplicity. So we should think about it not as a multiple testing “problem”, but as an opportunity!

There is a powerful premise in large-scale association screening: we usually expect that most null hypotheses will not be rejected. Out of the thousands or millions of tests, we expect that only a small fraction will be interesting. In fact, if that is not the case, if the hits are not rare, then arguably our analysis method (serially screening each variable, one at a time, for association with the outcome) is not suitable for the dataset. Either we need better data (a more specific assay), or a different analysis method, e.g., a multivariate model.

So, since we can assume that most of our many null hypotheses are true, we can use the behaviour of their test statistics and p-values to empirically understand the null distributions, their correlations, and so on. Rather than having to rely on abstract assumptions we can check the requirements empirically.
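The idea of learning about the null distribution empirically can be sketched with a small simulation. It assumes every null hypothesis is true and uses a simple z-test on made-up standard normal data; the sample size, the number of tests and the seed are arbitrary choices for illustration.

```python
import random
from math import erfc, sqrt

random.seed(0)  # fixed seed so the run is repeatable


def two_sided_p(sample):
    """z-test of 'mean is 0' for data with known standard deviation 1."""
    n = len(sample)
    z = sum(sample) / n * sqrt(n)
    return erfc(abs(z) / sqrt(2))


# Every simulated data set comes from the null: mean 0, standard deviation 1.
n_tests = 10_000
p_values = [
    two_sided_p([random.gauss(0, 1) for _ in range(20)]) for _ in range(n_tests)
]

# If the test's requirements hold, about a fraction alpha of p values fall below alpha.
for alpha in (0.01, 0.05, 0.5):
    fraction = sum(p < alpha for p in p_values) / n_tests
    print(f"fraction of p values below {alpha}: {fraction:.3f}")
```

With real screening data, a marked departure from this roughly uniform pattern among the bulk of the p values would hint that the assumptions behind the test are not met.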

### 6.1.2 Testing versus classification

Suppose we measured the expression level of a marker gene to decide whether the cells we are studying are from cell type A or B. First, let’s consider that we have no prior assumption, and it’s equally important to us to get the assignment right no matter whether the true cell type is A or B. This is a classification task. We’ll cover classification in Chapter 12. In this chapter, we consider the asymmetric case: based on what we already know (we could call this our prior), we lean towards calling it an A, and would need strong enough evidence to be convinced otherwise. We can think of this as an application of Occam’s razor (see also https://en.wikipedia.org/wiki/Occam%27s_razor): do not use a complicated explanation if a simpler one does the job. Or maybe the consequences and costs of calling A and B are very different, for instance, A could be healthy and B diseased. In these cases, the machinery of hypothesis testing is right for us.

Formally, there are many similarities between hypothesis testing and classification. In both cases, we aim to use data to choose between several possible decisions. It is even possible to think of hypothesis testing as a special case of classification. However, these two approaches are geared towards different objectives and underlying assumptions, and when you encounter a statistical decision problem, it is good to keep that in mind in your choice of methodology.

## PROCEDURE

### Materials

Proper personal protective equipment (PPE), including gloves, lab coats, and eye protection, is necessary for students performing this activity. In addition, each pair of students will require the following materials: two Mueller-Hinton agar plates, one nutrient (or tryptic soy) broth culture each of Bacillus cereus str. 971 and Escherichia coli str. K12, a bent-rod bacteria spreader (or two sterile swabs), a 1,000-μL micropipette and four sterile tips, two screw-cap centrifuge tubes (15 mL), two small beakers, two disposable sterile petri dishes, two Luer-Lok syringes (10 mL), two syringe membrane filters (sterile, 0.2–0.45 μm), four blank paper disks (sterile), two tetracycline disks, two penicillin disks, one mortar and pestle, one pair of forceps, one 10-mL graduated cylinder (or a 10-mL pipette with pipettor), one weighing spatula, two moringa seeds, one clove of fresh garlic, and 5 mL of 60°C water (Fig. 1A). If necessary, the students can share a tabletop microcentrifuge, ethanol, forceps, and a Bunsen burner. To save time, one half of the students can prepare the moringa extract, and the other half can prepare the garlic extract.

Fig. 1: Images of laboratory procedures. (A) Materials needed for the activity; (B) moringa seed with hull removed; (C) mortar and pestle grinding of moringa seed; (D) ground moringa seed slurry being poured into a 15-mL centrifuge tube; (E) slurry supernatant sterilized through a membrane filter (0.2–0.45 μm); (F) sterile disks soaked in sterile extracts; (G) plating extract-soaked disks onto inoculated plates; (H) disk placement on plates of either E. coli (left) or B. cereus (right); (I) zones of inhibition apparent on plates following 24-hour incubation (37°C).

### Student instructions

Detailed student instructions to perform this activity are provided in Appendix 3. Briefly, the students worked together as a class to prepare filter-sterilized moringa and garlic extracts in water (Fig. 1B–E). They then plated a “lawn” of each of the provided bacteria (one plate of E. coli and one plate of B. cereus) using either an ethanol-flamed bacteria spreader or sterile swabs. Ethanol-flamed forceps were then used to apply both extract-soaked and antibiotic-infused disks to each inoculated plate (Fig. 1F–H), and the plates were incubated for 24 hours at 37°C. After incubation, the students measured zones of growth inhibition around each disk (Fig. 1I) and completed a post-lab analysis, which included recording their data and answering the post-lab summary questions.

### Faculty instructions

The instructor may obtain the moringa seeds from several possible sources. We obtained our seeds from www.echonet.org. Other vendors may be available via the Internet. Fresh garlic may be purchased from a local grocery store. Fresh (approximately 24-hour) bacterial cultures should be provided for the students to conduct this activity (see Appendix 4 for more details). All other materials are standard laboratory supplies.

Present a pre-lab lecture to ensure the students are familiar with basic microbiological concepts (Appendix 4). If it has not already been covered in the course, the pre-lab lecture should include an introduction to artificial culture media and the use of proper growth conditions to cultivate bacteria. Students should also be made aware of the structural differences between gram-positive and gram-negative bacteria, as these often determine the efficacy of antibiotics that affect cell wall biosynthesis, such as penicillin, versus those that target other processes in the bacterial cell, such as tetracycline, an inhibitor of protein synthesis. In addition, students should be given basic background information regarding plants that produce antimicrobial compounds and the use of moringa and garlic as medicinal plants.

The students should also be reminded of their training regarding the safety precautions for handling bacterial cultures, including their proper disposal, and an instructor demonstration of aseptic technique is recommended. As it is a potential source of injury, students should also be shown the correct technique for ethanol and flame sterilization of forceps and inoculating tools.

### Suggestions for determining student learning

Before beginning the activity, the students were given an unannounced Pre-Lab Quiz (Table 1; also see Appendices 1 and 2). The quiz was designed to assess the students’ existing knowledge of plants as a source of antibacterial products and how the activity of these products can be measured. After completing the activity and obtaining their results, the students were required to complete a post-lab analysis (Appendices 3 and 4), which included several summary questions to encourage critical thinking and assess student comprehension. The post-lab analysis, with its accompanying collection of data and answers to the summary questions, was due at the beginning of the next lab period. To gauge their retention of key concepts introduced the previous week, the students were then given the same Pre-Lab Quiz (unannounced and completed without the aid of notes) as a Post-Lab assessment.

### TABLE 1

Pre- and post-lab activity quiz, including the learning objective to which each question corresponds.

| Question Number | Quiz Question | Corresponding Learning Objective |
|---|---|---|
| 1 | Some plants have been shown to have medicinal value. How might you test a plant for production of antibacterial compounds? | 1 |
| 2 | What are the requirements for growing bacteria in the laboratory, and what is the significance of aseptic technique in this process? | 2 |
| 3 | What do clear zones around antimicrobial disks on a bacterial petri plate culture indicate? | 3 |
| 4 | Bacteria are often categorized as either gram-positive or gram-negative. Why might an antibacterial compound inhibit the growth of gram-positive but not gram-negative bacteria? | 4 |
| 5 | Why do some plants produce chemicals that have antimicrobial properties? | 5 |
| 6 | How might humans make use of antimicrobial plant products to combat infectious disease? | 6 |

### Sample data

Students measured zones of inhibition 24 hours after plating and incubation. They were able to see the effect of moringa and garlic extract and compare it with penicillin and tetracycline. Classroom results were generally uniform since all students shared the same extracts and tested the same strains of bacteria. Several observations were made, including the effects of penicillin and tetracycline on gram-positive versus gram-negative bacteria and the effects of garlic and moringa extracts on the two bacteria. The students could clearly see that tetracycline was more effective against both gram-positive and gram-negative bacteria than penicillin (Fig. 2). Garlic was also effective against both types of bacteria, whereas moringa was more effective against gram-positive than gram-negative bacteria (Fig. 2). Table 2 shows an example of quantitative data typical of that collected by the students.

Fig. 2: Measuring zones of inhibition after 24-hour incubation. Inhibition of bacterial growth was evaluated by measuring the diameter (mm) of each zone of inhibition. Comparisons of individual antibacterial agents can be made between the gram-negative E. coli (A) and the gram-positive B. cereus (B). Disks clockwise from top left: P, penicillin; T, tetracycline; G, garlic; M, moringa.

### TABLE 2

Example of typical data collected and reported by the students.

Zone of inhibition (diameter, mm):

| Bacterial Species | Moringa Seed Extract | Garlic Extract | Penicillin | Tetracycline |
|---|---|---|---|---|
| Bacillus cereus | 11 | 15 | 6.5 | 12 |
| Escherichia coli | 7 | 20 | 7.5 | 18 |

### Safety issues

The bacterial strains used in this activity (E. coli strain K12 and B. cereus strain 971) are classified as BSL 1 organisms. (Note that several alternative bacterial strains or species may be suitable for use in this activity, but some of these are classified as BSL 2 organisms (see “Possible modifications” below) and, as such, must be handled accordingly in a properly equipped lab.) The students were trained in the safe handling of these microorganisms according to the ASM Biosafety Guidelines (https://www.asm.org/images/asm_biosafety_guidelines-FINAL.pdf) described by Emmert et al. (12). The students acknowledged their training by signing a safety agreement and adhered to standard laboratory safety procedures, which included disinfecting lab surfaces before and after completing the activity and using proper personal protective equipment (because of the potential splash hazard associated with the manipulation of plant extracts and liquid bacterial cultures, PPE should include eye protection). Contaminated pipet tips and swabs, and all bacterial cultures were disposed of by autoclave sterilization after use.

Finally, for this activity, it is necessary for the students to sterilize laboratory utensils (e.g., forceps and bacteria spreader) by dipping in ethanol and flaming with a Bunsen burner. Ensure that student training includes how to properly carry out this procedure.

## Appropriate statistical test for a student lab? - Biology


Once your statistical analyses are complete, you will need to summarize the data and results for presentation to your readers. Data summaries may take one of 3 forms: text, Tables and Figures.

Text: contrary to what you may have heard, not all analyses or results warrant a Table or Figure. Some simple results are best stated in a single sentence, with data summarized parenthetically:

Seed production was higher for plants in the full-sun treatment (52.3 +/- 6.8 seeds) than for those receiving filtered light (14.7 +/- 3.2 seeds; t=11.8, df=55, p<0.001).

Tables: Tables present lists of numbers or text in columns, each column having a title or label. Do not use a table when you wish to show a trend or a pattern of relationship between sets of values - these are better presented in a Figure. For instance, if you needed to present population sizes and sex ratios for your study organism at a series of sites, and you planned to focus on the differences among individual sites according to (say) habitat type, you would use a table. However, if you wanted to show us that sex ratio was related to population size, you would use a Figure.

Figures: Figures are visual presentations of results, including graphs, diagrams, photos, drawings, schematics, maps, etc. Graphs are the most common type of figure and will be discussed in detail; examples of other types of figures are included at the end of this section. Graphs show trends or patterns of relationship.

How to refer to Tables and Figures from the text: Every Figure and Table included in the paper MUST be referred to from the text. Use sentences that draw the reader's attention to the relationship or trend you wish to highlight, referring to the appropriate Figure or Table only parenthetically:

Germination rates were significantly higher after 24 h in running water than in controls (Fig. 4).

DNA sequence homologies for the purple gene from the four congeners (Table 1) show high similarity, differing by at most 4 base pairs.

Avoid sentences that give no information other than directing the reader to the Figure or Table:

Table 1 shows the summary results for male and female heights at Bates College.

Abbreviation of the word "Figure": When referring to a Figure in the text, the word "Figure" is abbreviated as "Fig.", while "Table" is not abbreviated. Both words are spelled out completely in descriptive legends.

Legends: a complete legend conveys as much of the following as applies.

• the first sentence functions as the title for the figure (or table) and should clearly indicate what results are shown in the context of the study question,
• the summary statistics that have been plotted (e.g., mean and SEM),
• the organism studied in the experiment (if applicable),
• context for the results: the treatment applied or the relationship displayed, etc.
• location (ONLY if a field experiment),
• specific explanatory information needed to interpret the results shown (in tables, this is frequently done as footnotes) and may include a key to any annotations,
• culture parameters or conditions (temperature, media, etc.) if applicable, and,
• sample sizes and statistical test summaries as they apply.
• Do not simply restate the axis labels with a "versus" written in between.

Format and placement of legends:

• Both Figure and Table legends should match the width of the Table or graph.
• Table legends go above the body of the Table and are left justified; Tables are read from the top down.
• Figure legends go below the graph and are left justified; graphs and other types of Figures are usually read from the bottom up.
• Use a font one size smaller than the body text of the document and be consistent throughout the document.
• Use the same font as the body text.

Table 4 below shows the typical layout of a table in three sections demarcated by lines. Tables are most easily constructed using your word processor's table function or a spreadsheet such as Excel. Gridlines or boxes, commonly invoked by word processors, are helpful for setting cell and column alignments, but should be eliminated from the printed version. Tables formatted with cell boundaries showing are unlikely to be permitted in a journal.

Example 1: Courtesy of Shelley Ball.

Example 2: Courtesy of Shelley Ball.

Example 3: Courtesy of Greg Anderson

In these examples notice several things:

• the presence of a period after "Table #"
• the legend (sometimes called the caption) goes above the Table
• units are specified in column headings wherever appropriate
• lines of demarcation are used to set legend, headers, data, and footnotes apart from one another.
• footnotes are used to clarify points in the table, or to convey repetitive information about entries
• footnotes may also be used to denote statistical differences among groups.

The sections below show when and how to use the four most common Figure types (bar graph, frequency histogram, XY scatterplot, XY line graph.) The final section gives examples of other, less common, types of Figures.

Parts of a Graph: Below are example figures (typical line and bar graphs) with the various component parts labeled in red. Refer back to these examples if you encounter an unfamiliar term as you read the following sections.

• Big or little? For course-related papers, a good rule of thumb is to size your figures to fill about one-half of a page. Use an easily readable font size for axes and ticks. Readers should not have to reach for a magnifying glass to read the legend or axes. Compound figures may require a full page.
• Color or no color? Most often black and white is preferred. The rationale is that if you need to photocopy or fax your paper, any information conveyed by colors will be lost to the reader. However, for a poster presentation or a talk with projected images, color can be helpful in distinguishing different data sets. Every aspect of your Figure should convey information; never use color simply because it is pretty.
• Title or no title? Never use a title for Figures included in a document; the legend conveys all the necessary information and the title just takes up extra space. However, for posters or projected images, where people may have a harder time reading the small print of a legend, a larger font title is very helpful.
• Offset axes or not? Elect to offset the axes only when data points will be obscured by being printed over the Y axis.
• Error bars or not? Always include error bars (e.g., SD or SEM) when plotting means. In some courses you may be asked to plot other measures associated with the mean, such as confidence intervals. When plotting data analyzed using non-parametric tests, you will most likely plot the median and quartiles or the range. These might be dotplots or box and whisker plots.
• Tick marks - Use common sense when deciding on major (numbered) versus minor ticks. Major ticks should be used to reasonably break up the range of values plotted into integer values. Within the major intervals, it is usually necessary to add minor interval ticks that further subdivide the scale into logical units (i.e., an interval that is a factor of the major tick interval). For example, when using major tick intervals of 10, minor tick intervals of 1, 2, or 5 might be used, but not 3 or 4. When the data follow a uniform interval on the x-axis (e.g., a time series, or equal increments of concentration), use major ticks to match the data. No minor intervals would be used in this case.
• Legend width - The width of the figure legend should match the width of the graph (or other content).
• Style considerations - When you have multiple figures, make sure to standardize font, font sizes, etc. such that all figures look stylistically similar.
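For the error bars mentioned in the list above, the quantities usually plotted are the mean with either the standard deviation (SD) or the standard error of the mean (SEM). A minimal sketch of how they could be computed for each treatment group follows; the seed counts and the `summarize` helper are made up for illustration.

```python
from math import sqrt
from statistics import mean, stdev


def summarize(values):
    """Return n, mean, sample SD and SEM for one group of measurements."""
    n = len(values)
    sd = stdev(values)  # sample standard deviation (n - 1 in the denominator)
    return {"n": n, "mean": mean(values), "sd": sd, "sem": sd / sqrt(n)}


# Hypothetical seed counts per plant in two light treatments.
groups = {
    "full sun": [48, 55, 51, 60, 47],
    "filtered light": [12, 18, 15, 11, 17],
}

for name, values in groups.items():
    s = summarize(values)
    print(f"{name}: {s['mean']:.1f} +/- {s['sem']:.1f} seeds (mean +/- SEM, n = {s['n']})")
```

Whichever measure is plotted, the legend should say which one it is, since SD and SEM can differ considerably for the same data.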

When you have multiple graphs, or graphs and other illustrative materials that are interrelated, it may be most efficient to present them as a compound figure. Compound figures combine multiple graphs into one common figure and share a common legend. Each graph must be clearly identified by a capital letter (A, B, C, etc.) and, when referred to from the Results text, is specifically identified by that letter, e.g., "... (Fig. 1B)". The legend of the compound figure must also identify each graph and the data it presents by letter.

Bar graphs are used when you wish to compare the value of a single variable (usually a summary value such as a mean) among several groups. For example, a bar graph is appropriate to show the mean sizes of plants harvested from plots that received 4 different fertilizer treatments. (Note that although a bar graph might be used to show differences between only 2 groups, especially for pedagogical purposes, editors of many journals would prefer that you save space by presenting such information in the text.)

In this example notice that:

• legend goes below the figure
• a period follows "Figure 1" and the legend itself; "Figure" is not abbreviated
• the measured variable is labelled on the Y axis. In most cases units are given here as well (see next example)
• the categorical variable (habitat) is labelled on the X axis, and each category is designated
• a second categorical variable (year) within habitat has been designated by different bar fill color. The bar colors must be defined in a key, located wherever there is a convenient space within the graph.
• error bars are included, extending +1 SD or SEM above the mean.
• statistical differences may be indicated by a system of letters above the bars, with an accompanying note in the caption indicating the test and the significance level used.
• the completeness of the legend, which in this case requires over 3 lines just to describe the treatments used and variable measured.
• axis labels, with units
• treatment group (pH) levels specified on X axis
• error bars and group sample sizes accompany each bar, and each of these is well-defined in legend
• statistical differences in this case are indicated by lines drawn over the bars, and the statistical test and significance level are identified in the legend.

Frequency histograms (also called frequency distributions) are bar-type graphs that show how the measured individuals are distributed along an axis of the measured variable. Frequency (the Y axis) can be absolute (i.e. number of counts) or relative (i.e. percent or proportion of the sample.) A familiar example would be a histogram of exam scores, showing the number of students who achieved each possible score. Frequency histograms are important in describing populations, e.g. size and age distributions.

• the Y axis includes a clear indication ("%") that relative frequencies are used. (Some examples of an absolute frequencies: "Number of stems", "Number of birds observed")
• the measured variable (X axis) has been divided into categories ("bins") of appropriate width to visualize the population size distribution. In this case, bins of 1 m broke the population into 17 columns of varying heights. Setting the bin size at 0.5 m would have yielded too many columns with low frequencies in each, making it difficult to visualize a pattern. Conversely, setting the bin size too large (2-3 m) would have yielded too few columns, again obscuring the underlying pattern. A rule of thumb is to start with a number of bins that is equal to the square root of the largest value in your data set(s) to be plotted.
• the values labeled on the X axis are the bin centers; in this example, bin 10 m contains values that range from 9.50 to 10.49 m.
• sample size is clearly indicated, either in the legend (as in this case) or in the body of the graph itself
• the Y axis includes numbered and minor ticks to allow easy determination of bar values.

These are plots of X,Y coordinates showing each individual's or sample's score on two variables. When plotting data this way we are usually interested in knowing whether the two variables show a "relationship", i.e. do they change in value together in a consistent way?

Note in this example that:

• each axis is labeled (including units where appropriate) and includes numbered and minor ticks to allow easy determination of the values of plotted points
• sample size is included in the legend or the body of the graph
• if the data have been analyzed statistically and a relationship between the variables exists, it may be indicated by plotting the regression line on the graph, and by giving the equation of the regression and its statistical significance in the legend or body of the figure
• the range of each axis has been carefully selected to maximize the spread of the points and to minimize wasted blank space where no points fall. For instance, the X axis is truncated below 50 g because no plants smaller than 52 g were measured. The ranges selected also result in labeled ticks that are easy to read (50, 100, 150…, rather than 48, 96, 144…)

Which variable goes on the X axis? When one variable is clearly dependent upon another (e.g. height depends on age, but it is hard to imagine age depending on height), the convention is to plot the dependent variable on the Y axis and the independent variable on the X axis. Sometimes there is no clear independent variable (e.g. length vs. width of leaves: does length depend on width, or vice-versa?). In these cases it makes no difference which variable is on which axis; the variables are interdependent, and an X,Y plot of these shows the relationship BETWEEN them (rather than the effect of one upon the other).

In the example plotted above, we can imagine that seed production might depend on plant biomass, but it is hard to see how biomass could depend directly on seed production, so we choose biomass as the X axis. Alternatively, the relationship might be indirect: both seed production and plant biomass might depend on some other, unmeasured variable. Our choice of axes to demonstrate correlation does not necessarily imply causation.

Line graphs plot a series of related values that depict a change in Y as a function of X. Two common examples are a growth curve for an individual or population over time, and a dose-response curve showing effects of increasing doses of a drug or treatment.

When to connect the dots? If each point in the series is obtained from the same source and is dependent on the previous values (e.g. a plot of a baby's weight over the course of a year, or of muscle strength on successive contractions as a muscle fatigues), then the points should be connected by a line in a dot-to-dot fashion. If, however, the series represents independent measurements of a variable to show a trend (e.g. mean price of computer memory over time, or a standard curve of optical density vs. solute concentration), then the trend or relationship can be modeled by calculating the best-fit line or curve by regression analysis (see A Painless Guide to Statistics). Do not connect the dots when the measurements were made independently.

• a different symbol is used for each group (species), and the key to the symbols is placed in the body of the graph where space permits. Symbols are large enough to be easily recognizable in the final graph size
• each point represents a mean value, and this is stated in the legend. Error bars are therefore plotted for each point and defined in the legend as well.
• because measurements were taken on independent groups for each species, the points are NOT connected dot-to-dot; instead a curve is fitted to the data to show the trend.
• this time the dots ARE connected dot-to-dot within each treatment, because cumulative percent germination was measured within the same set of seeds each day, and thus is dependent on the measurements of the prior days
• a different symbol is used for each treatment, and symbols are large enough (and connecting lines fine enough) so that all can be easily read at the final graph size
• in addition to the key to symbols, two other kinds of helpful information are supplied in the body of the figure: the values of the highest and lowest final cumulative percents, and a dashed line (baseline) showing the lowest cumulative % germination achieved. This baseline is defined in the legend.

Some Other Types of Figures

Figure 9. Aerial photo of the study site ca. 1949 and in 1998 (inset) showing the regeneration of the forest. Photos courtesy of the USDA Field Office, Auburn, Maine.

## Student's t–test for one sample

Use Student's t–test for one sample when you have one measurement variable and a theoretical expectation of what the mean should be under the null hypothesis. It tests whether the mean of the measurement variable is different from the null expectation.

### Introduction

There are several statistical tests that use the t-distribution and can be called a t–test. One is Student's t–test for one sample, named after "Student," the pseudonym that William Gosset used to hide his employment by the Guinness brewery in the early 1900s (they had a rule that their employees weren't allowed to publish, and Guinness didn't want other employees to know that they were making an exception for Gosset). Student's t–test for one sample compares a sample to a theoretical mean. It has so few uses in biology that I didn't cover it in previous editions of this Handbook, but then I recently found myself using it (McDonald and Dunn 2013), so here it is.

### When to use it

Use Student's t–test when you have one measurement variable, and you want to compare the mean value of the measurement variable to some theoretical expectation. It is commonly used in fields such as physics (you've made several observations of the mass of a new subatomic particle; does the mean fit the mass predicted by the Standard Model of particle physics?) and product testing (you've measured the amount of drug in several aliquots from a new batch; is the mean of the new batch significantly less than the standard you've established for that drug?). It's rare to have this kind of theoretical expectation in biology, so you'll probably never use the one-sample t–test.

I've had a hard time finding a real biological example of a one-sample t–test, so imagine that you're studying joint position sense, our ability to know what position our joints are in without looking or touching. You want to know whether people over- or underestimate their knee angle. You blindfold 10 volunteers, bend their knee to a 120° angle for a few seconds, then return the knee to a 90° angle. Then you ask each person to bend their knee to the 120° angle. The measurement variable is the angle of the knee, and the theoretical expectation from the null hypothesis is 120°. You get the following imaginary data:

If the null hypothesis were true that people don't over- or underestimate their knee angle, the mean of these 10 numbers would be 120. The mean of these ten numbers is 117.2; the one-sample t–test will tell you whether that is significantly different from 120.

### Null hypothesis

The statistical null hypothesis is that the mean of the measurement variable is equal to a number that you decided on before doing the experiment. For the knee example, the biological null hypothesis is that people don't under- or overestimate their knee angle. You decided to move people's knees to 120°, so the statistical null hypothesis is that the mean angle of the subjects' knees will be 120°.

### How the test works

Calculate the test statistic, ts, using this formula:

$$t_s = \frac{\bar{x} - \mu_0}{s / \sqrt{n}}$$

where $\bar{x}$ is the sample mean, $\mu_0$ is the mean expected under the null hypothesis, $s$ is the sample standard deviation and $n$ is the sample size. The test statistic, ts, gets bigger as the difference between the observed and expected means gets bigger, as the standard deviation gets smaller, or as the sample size gets bigger.

Applying this formula to the imaginary knee position data gives a t-value of −3.69.

You calculate the probability of getting the observed ts value under the null hypothesis using the t-distribution. The shape of the t-distribution, and thus the probability of getting a particular ts value, depends on the number of degrees of freedom. The degrees of freedom for a one-sample t–test is the total number of observations in the group minus 1. For our example data, the P value for a t-value of −3.69 with 9 degrees of freedom is 0.005, so you would reject the null hypothesis and conclude that people return their knee to a significantly smaller angle than the original position.

### Assumptions

The t–test assumes that the observations within each group are normally distributed. If the distribution is symmetrical, such as a flat or bimodal distribution, the one-sample t–test is not at all sensitive to the non-normality; you will get accurate estimates of the P value, even with small sample sizes. A severely skewed distribution can give you too many false positives unless the sample size is large (above 50 or so). If your data are severely skewed and you have a small sample size, you should try a data transformation to make them less skewed. With large sample sizes (simulations I've done suggest 50 is large enough), the one-sample t–test will give accurate results even with severely skewed data.

### Example

McDonald and Dunn (2013) measured the correlation of transferrin (labeled red) and Rab-10 (labeled green) in five cells. The biological null hypothesis is that transferrin and Rab-10 are not colocalized (found in the same subcellular structures), so the statistical null hypothesis is that the correlation coefficient between red and green signals in each cell image has a mean of zero. The correlation coefficients were 0.52, 0.20, 0.59, 0.62 and 0.60 in the five cells. The mean is 0.51, which is highly significantly different from 0 (t=6.46, 4 d.f., P=0.003), indicating that transferrin and Rab-10 are colocalized in these cells.
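As a rough illustration, the calculation for this example can be sketched in Python using only the standard library. The function names `one_sample_t` and `two_tailed_p` are hypothetical helpers written for this sketch, and the P value is approximated by numerically integrating the t density (Simpson's rule) rather than by whatever routine a statistics package would use.

```python
import math
import statistics


def one_sample_t(values, mu0):
    """Return (t_s, degrees of freedom) for a one-sample t-test."""
    n = len(values)
    mean = statistics.mean(values)
    sd = statistics.stdev(values)  # sample SD, n - 1 in the denominator
    t_s = (mean - mu0) / (sd / math.sqrt(n))
    return t_s, n - 1


def two_tailed_p(t_s, df, steps=20000):
    """Approximate the two-tailed P value by integrating the t density
    from 0 to |t_s| with Simpson's rule (steps must be even)."""
    coef = math.gamma((df + 1) / 2) / (math.sqrt(df * math.pi) * math.gamma(df / 2))

    def pdf(x):
        return coef * (1 + x * x / df) ** (-(df + 1) / 2)

    upper = abs(t_s)
    h = upper / steps
    total = pdf(0.0) + pdf(upper)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * pdf(i * h)
    central_area = 2 * total * h / 3  # P(-|t_s| < T < |t_s|)
    return 1 - central_area


# Correlation coefficients for the five cells quoted in the example.
rab10 = [0.52, 0.20, 0.59, 0.62, 0.60]
t_s, df = one_sample_t(rab10, mu0=0.0)
p_value = two_tailed_p(t_s, df)
print(f"t = {t_s:.2f}, d.f. = {df}, P = {p_value:.3f}")
```

The printed values should agree with the reported t=6.46, 4 d.f., P=0.003 to the precision shown; for real analyses a tested statistics package is a safer choice than a hand-rolled integral.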

### Graphing the results

Because you're just comparing one observed mean to one expected value, you probably won't put the results of a one-sample t–test in a graph. If you've done a bunch of them, I guess you could draw a bar graph with one bar for each mean, and a dotted horizontal line for the null expectation.

### Similar tests

The paired t–test is a special case of the one-sample t–test; it tests the null hypothesis that the mean difference between two measurements (such as the strength of the right arm minus the strength of the left arm) is equal to zero. Experiments that use a paired t–test are much more common in biology than experiments using the one-sample t–test, so I treat the paired t–test as a completely different test.

The two-sample t–test compares the means of two different samples. If one of your samples is very large, you may be tempted to treat the mean of the large sample as a theoretical expectation, but this is incorrect. For example, let's say you want to know whether college softball pitchers have greater shoulder flexion angles than normal people. You might be tempted to look up the "normal" shoulder flexion angle (150°) and compare your data on pitchers to the normal angle using a one-sample t–test. However, the "normal" value doesn't come from some theory; it is based on data that has a mean, a standard deviation, and a sample size, and at the very least you should dig out the original study and compare your sample to the sample the 150° "normal" was based on, using a two-sample t–test that takes the variation and sample size of both samples into account.

### How to do the test

I have set up a spreadsheet to perform the one-sample t–test. It will handle up to 1000 observations.

#### Web pages

There are web pages to do the one-sample t–test here and here.

You can use PROC TTEST for Student's t–test; the VAR parameter is the measurement variable (a one-sample test has no nominal variable, so no CLASS parameter is needed). Here is an example program for the joint position sense data above. Note that the "H0" parameter for the theoretical value is "H" followed by the numeral zero, not a capital letter O.

The output includes some descriptive statistics, plus the t-value and P value. For these data, the P value is 0.005.

### Power analysis

To estimate the sample size needed to detect a significant difference between a mean and a theoretical value, you need the following:

• the effect size, or the difference between the observed mean and the theoretical value that you hope to detect
• the standard deviation
• alpha, or the significance level (usually 0.05)
• beta, the probability of accepting the null hypothesis when it is false; this is usually specified through the power, 1−beta (0.50, 0.80 and 0.90 are common values of power)

The G*Power program will calculate the sample size needed for a one-sample t–test. Choose "t tests" from the "Test family" menu and "Means: Difference from constant (one sample case)" from the "Statistical test" menu. Click on the "Determine" button and enter the theoretical value ("Mean H0") and a mean with the smallest difference from the theoretical that you hope to detect ("Mean H1"). Enter an estimate of the standard deviation. Click on "Calculate and transfer to main window". Change "tails" to two, set your alpha (this will almost always be 0.05) and your power (0.5, 0.8, or 0.9 are commonly used).

As an example, let's say you want to follow up the knee joint position sense study that I made up above with a study of hip joint position sense. You're going to set the hip angle to 70° (Mean H0=70) and you want to detect an over- or underestimation of this angle of 1°, so you set Mean H1=71. You don't have any hip angle data, so you use the standard deviation from your knee study and enter 2.4 for SD. You want to do a two-tailed test at the P<0.05 level, with a probability of detecting a difference this large, if it exists, of 90% (1−beta=0.90). Entering all these numbers in G*Power gives a sample size of 63 people.
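For a quick sanity check without G*Power, the hip example can be approximated with the textbook normal-distribution formula n ≈ ((z for alpha/2 + z for power) / d)², where d is the difference divided by the standard deviation. This is only a sketch: `approx_n_one_sample` is a hypothetical helper, and because it ignores the extra uncertainty of the t-distribution it is expected to come out slightly below G*Power's exact answer.

```python
import math
from statistics import NormalDist


def approx_n_one_sample(effect, sd, alpha=0.05, power=0.90):
    """Normal-approximation sample size for a two-tailed one-sample test."""
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)
    z_power = NormalDist().inv_cdf(power)
    d = effect / sd  # standardized effect size
    return math.ceil(((z_alpha + z_power) / d) ** 2)


# Hip example: detect a 1 degree difference with SD 2.4.
n_hip = approx_n_one_sample(effect=1.0, sd=2.4)
print(n_hip)
```

The approximation gives 61, a little under the 63 reported by G*Power, which is the kind of gap one would expect from substituting the normal for the t-distribution.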

### Reference

McDonald, J.H., and K.W. Dunn. 2013. Statistical tests for measures of colocalization in biological microscopy. Journal of Microscopy 252: 295-302.

This page was last revised July 20, 2015. Its address is http://www.biostathandbook.com/onesamplettest.html. It may be cited as:
McDonald, J.H. 2014. Handbook of Biological Statistics (3rd ed.). Sparky House Publishing, Baltimore, Maryland. This web page contains the content of pages 121-125 in the printed version.

©2014 by John H. McDonald. You can probably do what you want with this content; see the permissions page for details.

## Student's t–test for two samples

Use Student's t–test for two samples when you have one measurement variable and one nominal variable, and the nominal variable has only two values. It tests whether the means of the measurement variable are different in the two groups.

### Introduction

There are several statistical tests that use the t-distribution and can be called a t–test. One of the most common is Student's t–test for two samples. Other t–tests include the one-sample t–test, which compares a sample mean to a theoretical mean, and the paired t–test.

Student's t–test for two samples is mathematically identical to a one-way anova with two categories; because comparing the means of two samples is such a common experimental design, and because the t–test is familiar to many more people than anova, I treat the two-sample t–test separately.

### When to use it

Use the two-sample t–test when you have one nominal variable and one measurement variable, and you want to compare the mean values of the measurement variable. The nominal variable must have only two values, such as "male" and "female" or "treated" and "untreated."

### Null hypothesis

The statistical null hypothesis is that the means of the measurement variable are equal for the two categories.

### How the test works

The test statistic, ts, is calculated using a formula that has the difference between the means in the numerator; this makes ts get larger as the means get further apart. The denominator is the standard error of the difference in the means, which gets smaller as the sample variances decrease or the sample sizes increase. Thus ts gets larger as the means get farther apart, the variances get smaller, or the sample sizes increase.

You calculate the probability of getting the observed ts value under the null hypothesis using the t-distribution. The shape of the t-distribution, and thus the probability of getting a particular ts value, depends on the number of degrees of freedom. The degrees of freedom for a t–test is the total number of observations in the groups minus 2, or n1+n2−2.

### Assumptions

The t–test assumes that the observations within each group are normally distributed. Fortunately, it is not at all sensitive to deviations from this assumption, if the distributions of the two groups are the same (if both distributions are skewed to the right, for example). I've done simulations with a variety of non-normal distributions, including flat, bimodal, and highly skewed, and the two-sample t–test always gives about 5% false positives, even with very small sample sizes. If your data are severely non-normal, you should still try to find a data transformation that makes them more normal, but don't worry if you can't find a good transformation or don't have enough data to check the normality.

If your data are severely non-normal, and you have different distributions in the two groups (one data set is skewed to the right and the other is skewed to the left, for example), and you have small samples (less than 50 or so), then the two-sample t–test can give inaccurate results, with considerably more than 5% false positives. A data transformation won't help you here, and neither will a Mann-Whitney U-test. It would be pretty unusual in biology to have two groups with different distributions but equal means, but if you think that's a possibility, you should require a P value much less than 0.05 to reject the null hypothesis.

The two-sample t–test also assumes homoscedasticity (equal variances in the two groups). If you have a balanced design (equal sample sizes in the two groups), the test is not very sensitive to heteroscedasticity unless the sample size is very small (less than 10 or so); the standard deviations in one group can be several times as big as in the other group, and you'll get P<0.05 about 5% of the time if the null hypothesis is true. With an unbalanced design, heteroscedasticity is a bigger problem; if the group with the smaller sample size has a bigger standard deviation, the two-sample t–test can give you false positives much too often. If your two groups have standard deviations that are substantially different (such as one standard deviation is twice as big as the other), and your sample sizes are small (less than 10) or unequal, you should use Welch's t–test instead.

### Example

In fall 2004, students in the 2 p.m. section of my Biological Data Analysis class had an average height of 66.6 inches, while the average height in the 5 p.m. section was 64.6 inches. Are the average heights of the two sections significantly different? Here are the data:

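| 2 p.m. | 5 p.m. |
|---|---|
| 69 | 68 |
| 70 | 62 |
| 66 | 67 |
| 63 | 68 |
| 68 | 69 |
| 70 | 67 |
| 69 | 61 |
| 67 | 59 |
| 62 | 62 |
| 63 | 61 |
| 76 | 69 |
| 59 | 66 |
| 62 | 62 |
| 62 | 62 |
| 75 | 61 |
| 62 | 70 |
| 72 | |
| 63 | |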

There is one measurement variable, height, and one nominal variable, class section. The null hypothesis is that the mean heights in the two sections are the same. The results of the t–test (t=1.29, 32 d.f., P=0.21) do not reject the null hypothesis.
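To make the arithmetic behind these numbers concrete, here is a minimal Python sketch of the pooled-variance (Student's) statistic and, for comparison, Welch's statistic with its Welch-Satterthwaite degrees of freedom. The helper names `student_t` and `welch_t` are made up for this illustration, and the sketch stops at the test statistic and degrees of freedom; the P value would still come from the t-distribution.

```python
import math
import statistics

# Heights (inches) for the two class sections listed above.
two_pm = [69, 70, 66, 63, 68, 70, 69, 67, 62, 63, 76, 59, 62, 62, 75, 62, 72, 63]
five_pm = [68, 62, 67, 68, 69, 67, 61, 59, 62, 61, 69, 66, 62, 62, 61, 70]


def student_t(a, b):
    """Student's two-sample t: pooled variance, n1 + n2 - 2 d.f."""
    n1, n2 = len(a), len(b)
    v1, v2 = statistics.variance(a), statistics.variance(b)
    pooled = ((n1 - 1) * v1 + (n2 - 1) * v2) / (n1 + n2 - 2)
    se = math.sqrt(pooled * (1 / n1 + 1 / n2))
    return (statistics.mean(a) - statistics.mean(b)) / se, n1 + n2 - 2


def welch_t(a, b):
    """Welch's t: separate variances, Welch-Satterthwaite d.f."""
    n1, n2 = len(a), len(b)
    q1, q2 = statistics.variance(a) / n1, statistics.variance(b) / n2
    t_s = (statistics.mean(a) - statistics.mean(b)) / math.sqrt(q1 + q2)
    df = (q1 + q2) ** 2 / (q1 ** 2 / (n1 - 1) + q2 ** 2 / (n2 - 1))
    return t_s, df


t_pooled, df_pooled = student_t(two_pm, five_pm)
t_welch, df_welch = welch_t(two_pm, five_pm)
print(f"Student: t = {t_pooled:.2f}, d.f. = {df_pooled}")
print(f"Welch:   t = {t_welch:.2f}, d.f. = {df_welch:.1f}")
```

The Student's result should reproduce the t=1.29 with 32 d.f. quoted above; Welch's version gives a slightly different statistic and a non-integer number of degrees of freedom, which is normal for that test.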

### Graphing the results

Because it's just comparing two numbers, you'll rarely put the results of a t–test in a graph for publication. For a presentation, you could draw a bar graph like the one for a one-way anova.

### Similar tests

Student's t–test is mathematically identical to a one-way anova done on data with two categories; you will get the exact same P value from a two-sample t–test and from a one-way anova, even though you calculate the test statistics differently. The t–test is easier to do and is familiar to more people, but it is limited to just two categories of data. You can do a one-way anova on two or more categories. I recommend that if your research always involves comparing just two means, you should call your test a two-sample t–test, because it is more familiar to more people. If you write a paper that includes some comparisons of two means and some comparisons of more than two means, you may want to call all the tests one-way anovas, rather than switching back and forth between two different names (t–test and one-way anova) for the same thing.

The Mann-Whitney U-test is a non-parametric alternative to the two-sample t–test that some people recommend for non-normal data. However, if the two samples have the same distribution, the two-sample t–test is not sensitive to deviations from normality, so you can use the more powerful and more familiar t–test instead of the Mann-Whitney U-test. If the two samples have different distributions, the Mann-Whitney U-test is no better than the t–test. So there's really no reason to use the Mann-Whitney U-test unless you have a true ranked variable instead of a measurement variable.

If the variances are far from equal (one standard deviation is two or more times as big as the other) and your sample sizes are either small (less than 10) or unequal, you should use Welch's t–test (also known as Aspin-Welch, Welch-Satterthwaite, Aspin-Welch-Satterthwaite, or Satterthwaite t–test). It is similar to Student's t–test except that it does not assume that the standard deviations are equal. It is slightly less powerful than Student's t–test when the standard deviations are equal, but it can be much more accurate when the standard deviations are very unequal. My two-sample t–test spreadsheet will calculate Welch's t–test. You can also do Welch's t–test using this web page, by clicking the button labeled "Welch's unpaired t–test".

Use the paired t–test when the measurement observations come in pairs, such as comparing the strength of the right arm with the strength of the left arm on a set of people.

Use the one-sample t–test when you have just one group, not two, and you are comparing the mean of the measurement variable for that group to a theoretical expectation.

### How to do the test

I've set up a spreadsheet for two-sample t–tests. It will perform either Student's t–test or Welch's t–test for up to 2000 observations in each group.

#### Web pages

There are web pages to do the t–test here and here. Both will do both the Student's t–test and Welch's t–test.

You can use PROC TTEST for Student's t–test; the CLASS parameter is the nominal variable, and the VAR parameter is the measurement variable. Here is an example program for the height data above.

The output includes a lot of information; the P value for the Student's t–test is under "Pr > |t|" on the line labeled "Pooled", and the P value for Welch's t–test is on the line labeled "Satterthwaite." For these data, the P value is 0.2067 for Student's t–test and 0.1995 for Welch's.

### Power analysis

To estimate the sample sizes needed to detect a significant difference between two means, you need the following:

• the effect size, or the difference in means you hope to detect
• the standard deviation. Usually you'll use the same value for each group, but if you know ahead of time that one group will have a larger standard deviation than the other, you can use different numbers
• alpha, or the significance level (usually 0.05)
• beta, the probability of accepting the null hypothesis when it is false; this is usually specified through the power, 1−beta (0.50, 0.80 and 0.90 are common values of power)
• the ratio of one sample size to the other. The most powerful design is to have equal numbers in each group (N1/N2=1.0), but sometimes it's easier to get large numbers of one of the groups. For example, if you're comparing the bone strength in mice that have been reared in zero gravity aboard the International Space Station vs. control mice reared on earth, you might decide ahead of time to use three control mice for every one expensive space mouse (N1/N2=3.0)

The G*Power program will calculate the sample size needed for a two-sample t–test. Choose "t tests" from the "Test family" menu and "Means: Difference between two independent means (two groups)" from the "Statistical test" menu. Click on the "Determine" button and enter the means and standard deviations you expect for each group. Only the difference between the group means is important; it is your effect size. Click on "Calculate and transfer to main window". Change "tails" to two, set your alpha (this will almost always be 0.05) and your power (0.5, 0.8, or 0.9 are commonly used). If you plan to have more observations in one group than in the other, you can make the "Allocation ratio" different from 1.

As an example, let's say you want to know whether people who run regularly have wider feet than people who don't run. You look for previously published data on foot width and find the ANSUR data set, which shows a mean foot width for American men of 100.6 mm and a standard deviation of 5.26 mm. You decide that you'd like to be able to detect a difference of 3 mm in mean foot width between runners and non-runners. Using G*Power, you enter 100 mm for the mean of group 1, 103 for the mean of group 2, and 5.26 for the standard deviation of each group. You decide you want to detect a difference of 3 mm, at the P<0.05 level, with a probability of detecting a difference this large, if it exists, of 90% (1−beta=0.90). Entering all these numbers in G*Power gives a sample size for each group of 66 people.
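The foot-width example can be roughly cross-checked with the normal-approximation formula for two groups, which also shows how an unequal allocation ratio changes the group sizes. This is a sketch under stated assumptions: `approx_n_two_sample` is a hypothetical helper, both groups are assumed to share one standard deviation, and the normal distribution stands in for the t-distribution, so the answer should land slightly below G*Power's.

```python
import math
from statistics import NormalDist


def approx_n_two_sample(diff, sd, alpha=0.05, power=0.90, ratio=1.0):
    """Normal-approximation group sizes for a two-tailed two-sample test.

    ratio is n2 / n1; both groups are assumed to have the same SD.
    """
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)
    z_power = NormalDist().inv_cdf(power)
    d = diff / sd  # standardized effect size
    n1 = math.ceil((1 + 1 / ratio) * ((z_alpha + z_power) / d) ** 2)
    n2 = math.ceil(ratio * n1)
    return n1, n2


# Foot-width example: 3 mm difference, SD 5.26 mm, equal group sizes.
n_runners, n_nonrunners = approx_n_two_sample(diff=3.0, sd=5.26)
print(n_runners, n_nonrunners)
```

With equal allocation the approximation gives 65 per group, close to the 66 from G*Power. Passing a `ratio` other than 1 shrinks the first group and enlarges the second, at the cost of a larger total sample.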

This page was last revised July 20, 2015. Its address is http://www.biostathandbook.com/twosamplettest.html. It may be cited as:
McDonald, J.H. 2014. Handbook of Biological Statistics (3rd ed.). Sparky House Publishing, Baltimore, Maryland. This web page contains the content of pages 126-130 in the printed version.

The title should tell the readers “This is what this figure is about” very clearly. Use active voice, and keep it short.

Make it a sentence that summarizes the major result seen in the figure.

Example: Hippocampal neurons derived from patients with bipolar disorder show hyperexcitability.

Or, make it a phrase stating the type of analysis used.

Example: Quantification of XYZ transgene expression using RT-PCR.

## Materials

### *Cell Preparation

MCF-7 cells (American Type Culture Collection: HTB-22), CO2 incubator (ref: MCO-19AIC Panasonic).

Medium: Dulbecco's Modified Eagle's Medium/Nutrient Mixture F-12 Ham (ref: D6434 Sigma-Aldrich), L-Glutamine 100× (ref: G7513 Sigma-Aldrich), Fetal Bovine Serum (FBS) (ref: 10270-106 Gibco), activated charcoal (ref: C9157 Sigma-Aldrich), 100× Penicillin/Streptomycin (ref: P0781-100ML Sigma-Aldrich). 17β-Estradiol (E2) (ref: 2824 Tocris).

Cell passaging/splitting: Trypsin EDTA solution (ref: T3924 Sigma-Aldrich) and Dulbecco's Phosphate buffered saline (ref: D8537 Sigma-Aldrich). Culture plates 145 cm² (ref: 168381 Nunc). Serological pipettes 10 mL (ref: 170356 Nunc).

Rocking platform (ref: WT16 Biometra), cell scraper (ref: 3010 Corning), formaldehyde (ref: F8775 Sigma-Aldrich), Glycine (ref: 410225 Sigma-Aldrich).

### *Cellular and Nuclear Lysis

Dounce B pestle homogenizer (ref: D9938 Sigma-Aldrich).

• Cellular lysis buffer: 5 μM PIPES (pH 8), 85 mM KCl, 0.5% NP-40 + protease inhibitors.
• Nuclear lysis buffer: 50 mM Tris (pH 8.1), 10 mM EDTA (pH 8), 1% SDS + protease inhibitors.
• Dilution buffer: 0.01% SDS, 1% Triton X100, 1 mM EDTA (pH 8), 1.5 mM Tris (pH 8.1), 0.167 mM NaCl.
• IP (ImmunoPrecipitation) buffer: 1 volume nuclear lysis buffer/9 volumes dilution buffer.
• Dialysis buffer: 2 mM EDTA (pH 8), 50 mM Tris (pH 8), 0.2% sodium lauroyl sarcosinate.
• Wash buffer: 100 mM Tris (pH 8.8), 0.5 M LiCl, 1% NP-40, 1% sodium deoxycholate.
• TE buffer: 10 mM Tris (pH 8), 1 mM EDTA (pH 8).

### *Sonication/Nucleic Acid Concentration Measurements

Branson 450 sonifier (ref: 101-063-591 Branson), NanoDrop 1000 (Thermo Scientific).

Antibodies: anti-ERα HC-20 (ref: sc-543 Santa Cruz); nonrelevant (if used) anti-HA tag (ref: H9658 Sigma-Aldrich) (alternatively, use H2O, which is cheaper).

Other supplies: EZview Red protein A affinity Gel (ref: P6486 Sigma-Aldrich); Sonicated salmon testes DNA (ref: D1626 Sigma-Aldrich); Bovine Serum Albumin (100× solution, New England Biolabs); RNAse A (ref: RB0473 Biobasic); Proteinase K (ref: EU0090 Euromedex); Phenol/Chloroform/Isoamyl alcohol (ref: 15593-049 Invitrogen); Chloroform (ref: 372978 Sigma-Aldrich); N-Laurylsarcosine sodium salt (ref: L9150 Sigma-Aldrich); Sodium dodecyl sulfate (SDS) (ref: 05030 Sigma-Aldrich); Protease inhibitor cocktail EDTA-free (ref: 04693132001 Roche); Sodium acetate (ref: S2889 Sigma-Aldrich); Sodium deoxycholate (ref: D6750 Sigma-Aldrich); NP-40 (has been substituted by IGEPAL-630) (ref: I8896 Sigma-Aldrich); Lithium chloride (ref: L9650 Sigma-Aldrich); Tris-Base (ref: 200923-A Euromedex); PIPES (ref: 1124-B Euromedex); Triton X-100 (ref: 2000-A Euromedex); EDTA (ref: EU0007 Euromedex); Sodium chloride (ref: 1112-A Euromedex); Glycogen-Glycoblue (ref: AM9515 Thermofisher); Rotating wheel (ref: SB3 Stuart).

SsoFast EvaGreen mix (ref: 1725203 Bio-Rad Laboratories), Real Time PCR detection system CFX 96 (ref: 1855196 Bio-Rad Laboratories), 96 well PCR plates (ref: HSP9601 Bio-Rad Laboratories), adhesive seals, optical (ref: MSB1001 Bio-Rad Laboratories), single channel electronic pipettes 10–300 μL (ref: 735061 Sartorius) and 5–50 μL (ref: 46200200 Thermoscientific).

### General Equipment

Snap cap 1.5 mL polypropylene microcentrifuge tubes (ref: 72706 Sarstedt), 15 mL tubes (ref: 352296 Corning), pipettes p20, p200, p1000 (ref: F123600 F123601 F1236012 Gilson), tips 10 μL, 200 μL, 1000 μL (ref: 713110, 713111, 713113 Clearline), centrifuges (ref: 5424 Eppendorf; ref: Megafuge 16 Heraeus).

## Appropriate statistical test for a student lab? - Biology

The results of your statistical analyses help you to understand the outcome of your study, e.g., whether or not some variable has an effect, whether variables are related, whether groups of observations differ, etc. Statistics are tools of science, not an end unto themselves. Statistics should be used to substantiate your findings and help you to say objectively when you have significant results. Therefore, when reporting the statistical outcomes relevant to your study, subordinate them to the actual biological results.

### Reporting Descriptive (Summary) Statistics

Means: Always report the mean (average value) along with a measure of variability (standard deviation or standard error of the mean). Two common ways to express the mean and variability are shown below:

"Total length of brown trout (n=128) averaged 34.4 cm ( s = 12.4 cm) in May, 1994, samples from Sebago Lake."

s = standard deviation (this format is preferred by Huth and others, 1994)

"Total length of brown trout (n=128) averaged 34.4 ± 12.4 cm in May, 1994, samples from Sebago Lake."

This style necessitates specifically saying in the Methods what measure of variability is reported with the mean.
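As a minimal sketch of how these summary values might be computed, the Python snippet below derives the mean, the sample standard deviation (s), and the standard error of the mean, then prints both reporting styles. The `summarize` helper and the measurements are hypothetical, invented only for illustration; they are not the Sebago Lake data.

```python
import math
import statistics


def summarize(values):
    """Return n, mean, sample SD, and SEM for a list of measurements."""
    n = len(values)
    mean = statistics.mean(values)
    sd = statistics.stdev(values)   # sample SD (n - 1 in the denominator)
    sem = sd / math.sqrt(n)         # standard error of the mean
    return {"n": n, "mean": mean, "sd": sd, "sem": sem}


# Hypothetical total lengths in cm (illustrative values only)
lengths = [31.2, 35.8, 29.4, 40.1, 36.5, 33.0]
stats = summarize(lengths)

# Style 1: variability named explicitly in the sentence
print(f"Total length (n={stats['n']}) averaged "
      f"{stats['mean']:.1f} cm (s = {stats['sd']:.1f} cm).")

# Style 2: mean ± value; the Methods must say whether this is SD or SEM
print(f"Total length (n={stats['n']}) averaged "
      f"{stats['mean']:.1f} ± {stats['sd']:.1f} cm.")
```

Because the standard error is always smaller than the standard deviation for n > 1, a "±" value is ambiguous unless the Methods state which one it is.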

If the summary statistics are presented in graphical form (a Figure), you can simply report the result in the text without verbalizing the summary values:

"Mean total length of brown trout in Sebago Lake increased by 3.8 cm between May and September, 1994 (Fig. 5)."

Frequencies: Frequency data should be summarized in the text with appropriate measures such as percents, proportions, or ratios.

"During the fall turnover period, an estimated 47% of brown trout and 24% of brook trout were concentrated in the deepest parts of the lake (Table 3)."

### Reporting Results of Inferential (Hypothesis) Tests

In this example, the key result is stated first, and the statistical result, which substantiates the finding, follows in parentheses at the end of the sentence.

"Mean total length of brown trout in Sebago Lake increased significantly (3.8 cm) between May (34.4 ± 12.4 cm, n = 128) and September (38.2 ± 11.7 cm, n = 114) 1994 (two-sample t-test, p < 0.001)."

NOTE: AVOID writing whole sentences that simply say what test you used to analyze a result, followed by another giving the result. This wastes precious words (economy!) and unnecessarily increases your paper's length.
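One way to keep the statistical outcome subordinate to the biological result is to append it as a short parenthetical. The sketch below is a hypothetical helper, not part of any statistics library: `format_p` and `report` are invented names, and the convention of reporting very small values as "p < 0.001" is an assumption based on the example above.

```python
def format_p(p, floor=0.001):
    """Format a p-value for text; values below `floor` become 'p < floor'."""
    if not 0.0 <= p <= 1.0:
        raise ValueError("p must be between 0 and 1")
    if p < floor:
        return f"p < {floor}"
    return f"p = {p:.3f}"


def report(result, test_name, p):
    """Attach the test name and p-value to the biological result."""
    return f"{result} ({test_name}, {format_p(p)})."


# Hypothetical result sentence and p-value, for illustration only
sentence = report(
    "Mean total length increased significantly between May and September",
    "two-sample t-test",
    0.0004,
)
print(sentence)
```

The biological finding stays the grammatical subject of the sentence, and the test is mentioned only once, in the parenthetical.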

### Summarizing Statistical Test Outcomes in Figures

If the results shown in a figure have been tested with an inferential test, it is appropriate to summarize the outcome of the test in the graph so that your reader can quickly grasp the significance of the findings. It is imperative that you include information in your Materials and Methods, or in the figure legend, to explain how to interpret whatever system of coding you use.

Several common methods for summarizing statistical outcomes are shown below.

Examples: Comparing groups (t-tests, ANOVA, etc)

Comparison of the means of 2 or more groups is usually depicted in a bar graph of the means and associated error bars.

For two groups, the larger mean may have 1–4 asterisks centered over the error bar to indicate the relative level of the p-value. In general, "*" means p < 0.05, "**" means p < 0.01, "***" means p < 0.001, and "****" means p < 0.0001. In all cases, the p-value should be reported as well in the figure legend.
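The asterisk convention just described can be written as a small lookup. This is a sketch only: `significance_stars` is a hypothetical helper name, and returning an empty string for non-significant results is an assumption (some authors print "ns" instead).

```python
def significance_stars(p):
    """Map a p-value to the asterisk code described in the text."""
    if not 0.0 <= p <= 1.0:
        raise ValueError("p must be between 0 and 1")
    # Check the strictest threshold first so the most asterisks win
    thresholds = [(0.0001, "****"), (0.001, "***"), (0.01, "**"), (0.05, "*")]
    for cutoff, stars in thresholds:
        if p < cutoff:
            return stars
    return ""  # not significant at the 0.05 level


# Hypothetical p-values, for illustration only
for p in (0.2, 0.03, 0.005, 0.0005, 0.00001):
    print(p, repr(significance_stars(p)))
```

Whatever coding is used, the figure legend still needs to define it and give the actual p-values.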

The asterisk may also be used with tabular results as shown below. Note how the author has used a footnote to define the p-values that correspond to the number of asterisks. (Courtesy of Shelley Ball)

For three or more groups there are two systems typically used: lines or letters. The system you use depends on how complicated it is to summarize the result. The first example below shows a comparison of three means. The line spanning two adjacent bars indicates that they are not significantly different (based on a multiple comparisons test), and because the line does not include the pH 2 mean, it indicates that the pH 2 mean is significantly different from both the pH 5.3 (control) and the pH 3.5 group means. Note that information about how to interpret the coding system (line or letters) is included in the figure legend.

When lines cannot easily be drawn to summarize the result, the most common alternative is to use capital letters placed over the error bars. Letters shared in common between or among the groups would indicate no significant difference.