# Applied Biostatistics 1

PUH 5302, Applied Biostatistics 1

Course Learning Outcomes for Unit VIII

Upon completion of this unit, students should be able to:

3. Evaluate study designs and statistical tests for public health research and analysis.

3.1 Compare and contrast various types of tests used in nonparametric methods.

3.2 Analyze the use of data visualization methods in public health.

| Course/Unit Learning Outcomes | Learning Activity |
| --- | --- |
| 3.1 | Unit Lesson; Chapter 10; Unit VIII Assessment |
| 3.2 | Unit Lesson; Chapter 12; Unit VIII Assessment |

Reading Assignment

Chapter 10: Nonparametric Tests

Chapter 12: Data Visualization

Unit Lesson

Welcome to Unit VIII. In the previous unit, we learned how to analyze public health information and interpret results of biostatistical analysis. We also defined examples of the dependent and independent variables and closed with a discussion on multivariable methods.

In this unit, we will discuss how to select appropriate study designs and statistical methods for public health. In doing so, we will compare and contrast various methods used in nonparametric statistics and close with some information on data presentation and visualization methods.

Nonparametric Methods

Statistical methods can be classified in several ways, for example as descriptive or inferential and as parametric or nonparametric. We will concentrate on nonparametric methods, but let's briefly review some points about parametric methods.

Parametric methods work best with normally distributed (or approximately normal) populations.

They describe the population through parameters; a normal distribution, for example, is fully characterized by two parameters, its mean and its standard deviation.

They rely on assumptions about the population, and they underlie procedures such as the confidence interval for a population mean with known or unknown standard deviation, the confidence interval for a population variance, and the confidence interval for the difference between two means with unknown standard deviations.

In contrast with parametric methods, nonparametric methods make few assumptions about the population under study. They do not assume a particular population distribution or depend on fixed parameters, which is why they are called distribution-free methods. Many researchers have shown interest in nonparametric methods because, in addition to these characteristics, they are easy to apply and understand, and they impose fewer constraints on the data.

UNIT VIII STUDY GUIDE

Selecting the Appropriate Study Design


Comparing Parametric and Nonparametric Methods (Summary)

Parametric statistics depend on normal distribution, but nonparametric statistics do not.

Fewer assumptions are made in nonparametric statistics than in parametric statistics.

Parametric statistics use simpler formulae in comparison to nonparametric statistics.

Parametric statistics are used for normal or close to normal distribution. Nonparametric methods are used for data that are not normally distributed.

Parametric statistics are commonly used in preliminary data analysis, while nonparametric statistics are not used as often and generally only apply to special cases (Sullivan, 2018).

Applications of Nonparametric Methods

Nonparametric methods are used mostly in studies of attributes that can be ranked. Data that can be ranked but have no clear numerical meaning or interpretation, such as ordinal data, are commonly analyzed with nonparametric methods. Nonparametric methods are applied widely because they make fewer assumptions about the population under study. Because they rely on fewer assumptions, they are also closely tied to robust statistics, which aims to provide methods that still perform well when the assumptions behind popular parametric methods do not hold. One important difference is that nonparametric methods are much less affected by outliers, that is, values that lie several standard deviations above or below the mean. Nonparametric methods are also associated with simplicity because they spare the researcher from additional analyses to justify the use of parametric methods. However, this simplicity can be a weakness: in cases where a parametric test would be appropriate, the researcher may choose the nonparametric method instead and lose statistical power.

Types of Data and Tests Used in Nonparametric Statistics

Nonparametric statistics are used with data measured on nominal or ordinal scales. The chi-square statistics and their modifications are used for nominal data. The remaining nonparametric tests listed here require data measured on at least an ordinal scale. The table below summarizes examples of the tests used for nominal and ordinal data.

| Nominal Data | Ordinal Data |
| --- | --- |
| Chi-Square Goodness-of-Fit Test | Mann-Whitney U Test |
| Chi-Square Test of Independence | Wilcoxon Signed Rank Test |
| McNemar's Test | Kruskal-Wallis Test |
|  | Friedman Two-way Analysis of Variance (ANOVA) by Ranks Test |
|  | Spearman Rank Correlation (rs) |

For a closer look at some of the tests above, as well as other nonparametric tests, read the descriptions below.

The Chi-Square Goodness-of-Fit test is a nonparametric test used to determine whether observed frequencies differ significantly from the frequencies expected under a hypothesized distribution. In other words, it assesses how well the theoretical (expected) values fit the observed values. It is most often applied to counts of observations that fall into categories or intervals.
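To make the calculation concrete, here is a minimal Python sketch of the goodness-of-fit statistic: the sum over categories of (observed - expected)^2 / expected, with k - 1 degrees of freedom for k categories. The function name `chi_square_gof` and the survey counts are hypothetical, chosen only for illustration; the critical value 7.815 is the standard chi-square table value for 3 degrees of freedom at alpha = 0.05.

```python
def chi_square_gof(observed, expected):
    """Chi-square goodness-of-fit statistic: sum of (O - E)^2 / E."""
    if len(observed) != len(expected):
        raise ValueError("observed and expected must have the same length")
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))


# Hypothetical example: 200 respondents in 4 categories.
# Null hypothesis: each category holds 25% of the population.
observed = [62, 48, 45, 45]
expected = [sum(observed) * 0.25] * len(observed)  # 50 expected per category

stat = chi_square_gof(observed, expected)
df = len(observed) - 1   # k - 1 degrees of freedom
critical_value = 7.815   # chi-square table value for df = 3, alpha = 0.05

print(f"chi-square = {stat:.2f}, df = {df}")
print("Reject H0" if stat >= critical_value else "Do not reject H0")
```

Here the statistic is 3.96, which is below 7.815, so this sketch would not reject the hypothesis of equal proportions.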

The Fisher Exact Probability test is used to test statistical significance in small samples of categorical data. It belongs to the class of exact tests because the exact significance of the deviation from the null hypothesis can be calculated rather than approximated. The Fisher Exact Probability test is useful for examining the significance of the association between two categorical variables, for example in a 2 x 2 table.
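A minimal sketch, assuming a 2 x 2 table, of how the exact p-value can be calculated instead of approximated: with the row and column totals held fixed, every possible table has a hypergeometric probability, and the two-sided p-value below sums the probabilities of all tables that are no more likely than the observed one. This "sum of small probabilities" rule is one common two-sided definition, not the only one; the function name and the counts are hypothetical.

```python
from math import comb


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher exact p-value for the 2 x 2 table [[a, b], [c, d]]."""
    row1, row2 = a + b, c + d
    col1 = a + c
    n = row1 + row2

    def prob(x):
        # Hypergeometric probability that the top-left cell equals x,
        # with all row and column totals held fixed
        return comb(row1, x) * comb(row2, col1 - x) / comb(n, col1)

    p_observed = prob(a)
    lowest, highest = max(0, col1 - row2), min(row1, col1)
    # Sum every table that is no more likely than the observed one;
    # the tiny tolerance guards against floating-point round-off
    return sum(
        prob(x) for x in range(lowest, highest + 1)
        if prob(x) <= p_observed * (1 + 1e-7)
    )


# Hypothetical 2 x 2 table: rows = exposed / unexposed, columns = ill / not ill
p = fisher_exact_two_sided(8, 2, 1, 5)
print(f"two-sided p = {p:.4f}")  # prints: two-sided p = 0.0350
```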

The Mann-Whitney U test is the nonparametric counterpart of the parametric two-independent-samples t-test. It does not require normally distributed data, and it performs nearly as well as the t-test even when the data are normal. In order to conduct the Mann-Whitney U test, certain conditions must hold: all observations are independent, and the data are at least ordinal. The null hypothesis states that the two distributions are equal, and the alternative hypothesis states that they are not. When these conditions are met, the researcher can conduct the test and obtain reliable results.
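The following sketch, written in plain Python under the assumption of two small independent samples, shows the ranking step the Mann-Whitney U test relies on: all observations are pooled and ranked (tied values receive the average of their ranks), and U is derived from the rank sum of one group. The helper names `midranks` and `mann_whitney_u` and the pain scores are hypothetical. The sketch stops at the statistic; the smaller of U1 and U2 would then be compared with a critical value from a table.

```python
def midranks(values):
    """Rank from smallest (1) to largest; tied values share their average rank."""
    first, last = {}, {}
    for pos, v in enumerate(sorted(values), start=1):
        first.setdefault(v, pos)  # first sorted position of this value
        last[v] = pos             # last sorted position of this value
    return [(first[v] + last[v]) / 2 for v in values]


def mann_whitney_u(group1, group2):
    """Return (U1, U2) computed from the rank sum of group1 in the pooled sample."""
    n1, n2 = len(group1), len(group2)
    ranks = midranks(list(group1) + list(group2))
    r1 = sum(ranks[:n1])          # rank sum of group 1
    u1 = r1 - n1 * (n1 + 1) / 2
    u2 = n1 * n2 - u1             # U1 + U2 always equals n1 * n2
    return u1, u2


# Hypothetical 0-10 pain scores for two independent groups
treatment = [2, 3, 3, 5, 6]
control = [4, 6, 7, 8, 9]
u1, u2 = mann_whitney_u(treatment, control)
print(f"U1 = {u1}, U2 = {u2}, U = {min(u1, u2)}")  # U1 = 2.5, U2 = 22.5, U = 2.5
```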

The Wilcoxon Signed-Rank test is a nonparametric test used to evaluate the differences between two related (paired or matched) groups. The basic requirements for using this test are that the data must be matched, the dependent variable must be continuous, and ideally there should be no ties among the paired differences. It tests whether the median of the paired differences is zero.
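As a minimal sketch, assuming paired before/after measurements, the signed-rank statistic can be computed by taking the paired differences, dropping zero differences, ranking the absolute differences, and summing the ranks separately for positive and negative differences. The function names and the scores are hypothetical; the smaller of W+ and W- would then be compared with a critical value from a table.

```python
def midranks(values):
    """Rank from smallest (1) to largest; tied values share their average rank."""
    first, last = {}, {}
    for pos, v in enumerate(sorted(values), start=1):
        first.setdefault(v, pos)
        last[v] = pos
    return [(first[v] + last[v]) / 2 for v in values]


def wilcoxon_signed_rank(x, y):
    """Return (W_plus, W_minus, n_used) for paired samples x and y."""
    diffs = [a - b for a, b in zip(x, y) if a != b]  # zero differences are dropped
    ranks = midranks([abs(d) for d in diffs])        # rank the absolute differences
    w_plus = sum(r for d, r in zip(diffs, ranks) if d > 0)
    w_minus = sum(r for d, r in zip(diffs, ranks) if d < 0)
    return w_plus, w_minus, len(diffs)


# Hypothetical 0-10 pain scores for 6 patients before and after treatment
before = [7, 6, 8, 5, 9, 6]
after = [4, 5, 8, 6, 5, 3]
w_plus, w_minus, n_used = wilcoxon_signed_rank(before, after)
print(f"W+ = {w_plus}, W- = {w_minus}, n = {n_used}, W = {min(w_plus, w_minus)}")
```

A quick check on the arithmetic: W+ and W- always add up to n(n + 1)/2 for the n non-zero differences (here 13.5 + 1.5 = 15).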

The Kruskal-Wallis test is the nonparametric counterpart of the parametric analysis of variance (ANOVA). Both tests are used to examine whether a continuous variable differs significantly across the groups of a categorical variable. The continuous variable must be the dependent variable, and the categorical variable must be the independent variable with two or more groups. Unlike ANOVA, which assumes that the dependent variable is normally distributed, the Kruskal-Wallis test makes no such assumption.
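A minimal sketch of the Kruskal-Wallis statistic, assuming the usual form H = 12 / (N(N + 1)) * sum(R_j^2 / n_j) - 3(N + 1), where N is the total sample size and R_j and n_j are the rank sum and size of group j in the pooled ranking. The function names and scores are hypothetical, and this sketch omits the correction for ties. With larger samples, H is typically compared with a chi-square critical value with k - 1 degrees of freedom (5.991 for three groups at alpha = 0.05).

```python
def midranks(values):
    """Rank from smallest (1) to largest; tied values share their average rank."""
    first, last = {}, {}
    for pos, v in enumerate(sorted(values), start=1):
        first.setdefault(v, pos)
        last[v] = pos
    return [(first[v] + last[v]) / 2 for v in values]


def kruskal_wallis_h(*groups):
    """Kruskal-Wallis H from pooled ranks (no correction for ties)."""
    pooled = [x for g in groups for x in g]
    ranks = midranks(pooled)
    n_total = len(pooled)
    weighted = 0.0
    start = 0
    for g in groups:
        rank_sum = sum(ranks[start:start + len(g)])  # R_j for this group
        weighted += rank_sum ** 2 / len(g)
        start += len(g)
    return 12 / (n_total * (n_total + 1)) * weighted - 3 * (n_total + 1)


# Hypothetical scores for three independent groups
group_a = [12, 15, 11, 14]
group_b = [18, 20, 16, 17]
group_c = [22, 19, 25, 21]
h = kruskal_wallis_h(group_a, group_b, group_c)
print(f"H = {h:.3f}")
```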

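The Friedman Two-way Analysis of Variance (ANOVA) by Ranks test is also a nonparametric test similar to ANOVA; it uses ranks to examine differences across multiple related samples, such as repeated measurements on the same subjects. It is similar to the Kruskal-Wallis test, except that the Kruskal-Wallis test compares independent groups, whereas the Friedman test ranks the observations within each subject (or block).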
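As a sketch, assuming each row holds one subject's measurements under k conditions, the Friedman statistic ranks the values within each row and combines the column rank sums R_j as Fr = 12 / (n k (k + 1)) * sum(R_j^2) - 3 n (k + 1), where n is the number of subjects. The function names and ratings are hypothetical, no tie correction is applied, and for larger samples Fr is typically compared with a chi-square critical value with k - 1 degrees of freedom.

```python
def midranks(values):
    """Rank from smallest (1) to largest; tied values share their average rank."""
    first, last = {}, {}
    for pos, v in enumerate(sorted(values), start=1):
        first.setdefault(v, pos)
        last[v] = pos
    return [(first[v] + last[v]) / 2 for v in values]


def friedman_statistic(blocks):
    """Friedman Fr for a list of rows (one row per subject, k values per row)."""
    n, k = len(blocks), len(blocks[0])
    rank_sums = [0.0] * k
    for row in blocks:
        # Rank within the subject, not across the whole data set
        for j, r in enumerate(midranks(row)):
            rank_sums[j] += r
    fr = 12 / (n * k * (k + 1)) * sum(r * r for r in rank_sums) - 3 * n * (k + 1)
    return fr, rank_sums


# Hypothetical ratings: 4 patients, each measured under 3 treatments
ratings = [
    [5, 7, 9],
    [4, 8, 6],
    [6, 7, 8],
    [3, 5, 9],
]
fr, rank_sums = friedman_statistic(ratings)
print(f"rank sums = {rank_sums}, Fr = {fr}")
```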

The Kolmogorov-Smirnov test determines whether two datasets differ significantly, that is, whether they appear to come from the same distribution. It is a distribution-free test and makes no assumption about the distribution of the data. The Kolmogorov-Smirnov test can also serve another purpose: it can be modified to function as a goodness-of-fit test, but as a test for normality it has been found to be less powerful than other tests.
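The two-sample statistic is easy to picture: D is the largest vertical gap between the two empirical cumulative distribution functions. The sketch below, assuming small numeric samples and using only the Python standard library, computes D; the function name and data are hypothetical, and the p-value (or table lookup) step is left out.

```python
from bisect import bisect_right


def ks_two_sample_d(sample1, sample2):
    """Two-sample Kolmogorov-Smirnov D: largest gap between the empirical CDFs."""
    s1, s2 = sorted(sample1), sorted(sample2)
    d = 0.0
    # The empirical CDFs only change at observed values, so checking those suffices
    for x in s1 + s2:
        ecdf1 = bisect_right(s1, x) / len(s1)  # share of sample 1 that is <= x
        ecdf2 = bisect_right(s2, x) / len(s2)  # share of sample 2 that is <= x
        d = max(d, abs(ecdf1 - ecdf2))
    return d


# Hypothetical measurements from two groups
group_a = [1.2, 2.3, 2.9, 3.1, 4.0]
group_b = [2.5, 3.3, 3.8, 4.6, 5.1]
d = ks_two_sample_d(group_a, group_b)
print(f"D = {d:.2f}")  # prints: D = 0.60
```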

The Anderson-Darling test is a modification of the Kolmogorov-Smirnov test and is more powerful. It gives more weight to the tails of the distribution, which makes it more sensitive to departures there, and its critical values depend on the specific distribution being tested. The limitation of the test is that critical values must be calculated separately for each distribution.
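For illustration only, here is a sketch of the Anderson-Darling statistic in its simplest case, where the hypothesized distribution is fully specified (a normal distribution whose mean and standard deviation are assumed known): A^2 = -n - (1/n) * sum over i of (2i - 1)[ln F(x_(i)) + ln(1 - F(x_(n+1-i)))], with the sample sorted in increasing order. The function names, the mean of 5.0, the SD of 0.3, and the data are all hypothetical. The sketch computes only A^2; as the paragraph above notes, the critical value depends on the distribution being tested.

```python
import math


def normal_cdf(x, mu, sigma):
    """CDF of a normal distribution with known mean mu and SD sigma."""
    return 0.5 * (1 + math.erf((x - mu) / (sigma * math.sqrt(2))))


def anderson_darling_a2(sample, cdf):
    """Anderson-Darling A^2 for a sample against a fully specified CDF."""
    xs = sorted(sample)
    n = len(xs)
    total = 0.0
    for i in range(1, n + 1):
        f_low = cdf(xs[i - 1])   # F(x_(i))
        f_high = cdf(xs[n - i])  # F(x_(n+1-i))
        total += (2 * i - 1) * (math.log(f_low) + math.log(1 - f_high))
    return -n - total / n


def hypothesized_cdf(x):
    # Hypothetical null distribution: normal with mean 5.0 and SD 0.3
    return normal_cdf(x, mu=5.0, sigma=0.3)


sample = [4.8, 5.1, 5.3, 4.6, 5.0, 5.4, 4.9, 5.2]
a2_close = anderson_darling_a2(sample, hypothesized_cdf)
a2_shifted = anderson_darling_a2([x + 1.0 for x in sample], hypothesized_cdf)
print(f"A^2 (sample) = {a2_close:.3f}; A^2 (shifted by 1.0) = {a2_shifted:.3f}")
```

A larger A^2 means a poorer fit, so the shifted sample, which sits far from the hypothesized mean, yields a much larger value.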

In order to choose among these tests, we must examine our samples in terms of the number of groups and the relationship between variables (for example, whether the samples are independent or matched). That is, the researcher must conduct exploratory data analysis and prepare the data for testing. Many nonparametric tests, unlike parametric tests, are based on ranks. As an example of ranking, consider the pain scale. Pain is often measured on a scale from 0 to 10, with 0 representing no pain and 10 representing agonizing pain. Some pain scales use visual anchors, such as smiling or crying faces, to rank the intensity of the pain. Nonparametric tests use such ranks to compare data without relying on the data being normally distributed. Let's now examine the steps in nonparametric testing.

Steps in Nonparametric Testing

Like parametric testing (discussed in Chapter 7), nonparametric testing follows the same five steps of hypothesis testing. Please see page 227 of your textbook for further discussion of these steps.

Data Visualization

Data visualization is the graphical representation of information communicated to an audience. The information is encoded into visual graphics such as charts, line graphs, and bar graphs. The goal is to help the researcher communicate information clearly and effectively through graphics that capture the viewer's attention (Sullivan, 2018). Recipients of scientific findings need clear and accurate reporting of data and statistical results, and graphical presentations can help generate interest and provoke thought in the audience.

Different Formats of Graphical Presentations

Information can be presented in different formats, including text, tables, and figures such as pie charts. Chapter 12 in the textbook describes each of these formats in detail. All of these formats have one thing in common: they must be labeled effectively so that readers can interpret the information they represent.

The table below summarizes when each format is most useful.


| Texts | Tables | Figures |
| --- | --- | --- |
| A few numbers | Many data points to present and exact values | Complex relationships among variables |
| Data that are secondary or ancillary to main analysis | Main findings (often readers turn to tables before reading text) | Trends over time |
|  |  | Geographic variation |
|  |  | Main findings (often readers turn to figures before reading text) |

(Sullivan, 2018)

Importance of Data Visualizations

To both the researcher and the consumer, data visualization is important because it helps people absorb information quickly. It also saves time by letting viewers see the big picture instead of isolated pieces of information, and it reveals patterns and trends in the data. Many consumers have become more interested in reading research findings presented in graphical form because graphics hold their interest longer. Furthermore, data visualization makes data more accessible and less confusing, and it helps researchers share their insights with everyone. In many cases, data visualization quickly reveals outliers in the data and helps researchers and presenters remember the important insights (Tandon, 2017).

Summary

Statistical analysis is key for researchers and for consumers of the reports produced by scientific studies. Two main forms of data analysis commonly used in research are parametric and nonparametric methods. Both follow the same hypothesis-testing steps; the major difference is that nonparametric methods do not require the data to be normally distributed. The results of these analyses are sometimes best presented in visual form so that consumers can understand them easily and clearly.

References

Sullivan, L. M. (2018). Essentials of biostatistics in public health (3rd ed.). Burlington, MA: Jones & Bartlett Learning.

Tandon, D. (2017, March 14). The importance of data visualization in your business and 10 ways to pull it off easily [Blog post]. Retrieved from https://thekinigroup.com/importance-data-visualization/