analysis of variance

Non-normal data: Is ANOVA still a valid option?

María J. Blanca1, Rafael Alarcón1, Jaume Arnau2, Roser Bono2 and Rebecca Bendayan1,3
1 Universidad de Málaga, 2 Universidad de Barcelona and 3 MRC Unit for Lifelong Health and Ageing, University College London

Psicothema 2017, Vol. 29, No. 4, 552-557. doi: 10.7334/psicothema2016.383
ISSN 0214 – 9915 CODEN PSOTEG. Copyright © 2017 Psicothema. www.psicothema.com
Received: December 14, 2016 • Accepted: June 20, 2017
Corresponding author: María J. Blanca, Facultad de Psicología, Universidad de Málaga, 29071 Málaga (Spain). E-mail: blamen@uma.es

Abstract

Background: The robustness of the F-test to non-normality has been studied from the 1930s through to the present day. However, this extensive body of research has yielded contradictory results, there being evidence both for and against its robustness. This study provides a systematic examination of F-test robustness to violations of normality in terms of Type I error, considering a wide variety of distributions commonly found in the health and social sciences. Method: We conducted a Monte Carlo simulation study involving a design with three groups and several known and unknown distributions. The manipulated variables were: equal and unequal group sample sizes; group sample size and total sample size; coefficient of sample size variation; shape of the distribution and equal or unequal shapes of the group distributions; and pairing of group size with the degree of contamination in the distribution. Results: The results showed that in terms of Type I error the F-test was robust in 100% of the cases studied, independently of the manipulated conditions.

Keywords: F-test, ANOVA, robustness, skewness, kurtosis.

One-way analysis of variance (ANOVA), or the F-test, is one of the most common statistical techniques in educational and psychological research (Keselman et al., 1998; Kieffer, Reese, & Thompson, 2001). The F-test assumes that the outcome variable is normally and independently distributed with equal variances among groups. However, real data are often not normally distributed and variances are not always equal. With regard to normality, Micceri (1989) analyzed 440 distributions from ability and psychometric measures and found that most of them were contaminated, including different types of tail weight (uniform to double exponential) and different classes of asymmetry. Blanca, Arnau, López-Montiel, Bono, and Bendayan (2013) analyzed 693 real datasets from psychological variables and found that 80% of them presented values of skewness and kurtosis ranging between -1.25 and 1.25, with extreme departures from the normal distribution being infrequent. These results were consistent with other studies with real data (e.g., Harvey & Siddique, 2000; Kobayashi, 2005; Van Der Linder, 2006).

The effect of non-normality on F-test robustness has, since the 1930s, been extensively studied under a wide variety of conditions. As our aim is to examine the independent effect of non-normality, the literature review focuses on studies that assumed variance homogeneity. Monte Carlo studies have considered unknown and known distributions such as mixed non-normal, lognormal, Poisson, exponential, uniform, chi-square, double exponential, Student’s t, binomial, gamma, Cauchy, and beta (Black, Ard, Smith, & Schibik, 2010; Büning, 1997; Clinch & Keselman, 1982; Feir-Walsh & Toothaker, 1974; Gamage & Weerahandi, 1998; Lix, Keselman, & Keselman, 1996; Patrick, 2007; Schmider, Ziegler, Danay, Beyer, & Bühner, 2010).

One of the first studies on this topic was carried out by Pearson (1931), who found that the F-test was valid provided that the deviation from normality was not extreme and the number of degrees of freedom apportioned to the residual variation was not too small. Norton (1951, cit. Lindquist, 1953) analyzed the effect of distribution shape on robustness (considering either that the distributions had the same shape in all the groups or a different shape in each group) and found that, in general, the F-test was quite robust, the effect being negligible. Likewise, Tiku (1964) stated that distributions with skewness values in a different direction had a greater effect than did those with values in the same direction unless the degrees of freedom for error were fairly large.
However, Glass, Peckham, and Sanders (1972) summarized these early studies and concluded that the procedure was affected by kurtosis, whereas skewness had very little effect. Conversely, Harwell, Rubinstein, Hayes, and Olds (1992), using meta-analytic techniques, found that skewness had more effect than kurtosis. A subsequent meta-analytic study by Lix et al. (1996) concluded that Type I error performance did not appear to be affected by non-normality. These inconsistencies may be attributable to the fact that a standard criterion has not been used to assess robustness, thus leading to different interpretations of the Type I error rate. The use of a single and standard criterion such as that proposed by Bradley (1978) would be helpful in this context. According to Bradley’s (1978) liberal criterion, a statistical test is considered robust if the empirical Type I error rate is between .025 and .075 for a nominal alpha level of .05. In fact, had Bradley’s criterion of robustness been adopted in the abovementioned studies, many of their results would have been interpreted differently, leading to different conclusions. Furthermore, when this criterion is considered, more recent studies provide empirical evidence for the robustness of the F-test under non-normality with homogeneity of variances (Black et al., 2010; Clinch & Keselman, 1982; Feir-Walsh & Toothaker, 1974; Gamage & Weerahandi, 1998; Kanji, 1976; Lantz, 2013; Patrick, 2007; Schmider et al., 2010; Zijlstra, 2004).
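Bradley's liberal criterion is simple enough to express as a one-line check. The sketch below is illustrative only: the function name `within_bradley_liberal` is hypothetical, and the generalization of the .025 to .075 interval to "0.5 to 1.5 times the nominal alpha" is the usual statement of the liberal criterion rather than something spelled out in the text above.

```python
def within_bradley_liberal(empirical_rate: float, alpha: float = 0.05) -> bool:
    """Return True if an empirical Type I error rate meets Bradley's liberal criterion.

    Assumption: the criterion is taken as 0.5*alpha <= rate <= 1.5*alpha,
    which gives the .025 to .075 interval quoted for alpha = .05.
    """
    lower, upper = 0.5 * alpha, 1.5 * alpha
    return lower <= empirical_rate <= upper


# Example with two illustrative rates: one inside the interval, one outside.
print(within_bradley_liberal(0.0434))  # inside .025 to .075
print(within_bradley_liberal(0.0800))  # above the upper bound
```

Applying one fixed rule of this kind to every simulated condition is what makes results comparable across studies.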
Based on most early studies, many classical handbooks on research methods in education and psychology draw the following conclusions: Moderate departures from normality are of little concern in the fixed-effects analysis of variance (Montgomery, 1991); violations of normality do not constitute a serious problem, unless the violations are especially severe (Keppel, 1982); the F-test is robust to moderate departures from normality when sample sizes are reasonably large and are equal (Winer, Brown, & Michels, 1991); and researchers do not need to be concerned about moderate departures from normality provided that the populations are homogeneous in form (Kirk, 2013). To summarize, the F-test is robust to departures from normality when: a) the departure is moderate; b) the populations have the same distributional shape; and c) the sample sizes are large and equal. However, these conclusions are broad and ambiguous, and they are not helpful when it comes to deciding whether or not the F-test can be used. The main problem is that expressions such as “moderate”, “severe” and “reasonably large sample size” are subject to different interpretations and, consequently, they do not constitute a standard guideline that helps applied researchers decide whether they can trust their F-test results under non-normality.

Given this situation, the main goal of the present study is to provide a systematic examination of F-test robustness, in terms of Type I error, to violations of normality under homogeneity using a standard criterion such as that proposed by Bradley (1978). Specifically, we aim to answer the following questions: Is the F-test robust to slight and moderate departures from normality? Is it robust to severe departures from normality? Is it sensitive to differences in shape among the groups? Does its robustness depend on the sample sizes? Is its robustness associated with equal or unequal sample sizes?
To this end, we designed a Monte Carlo simulation study to examine the effect of a wide variety of distributions commonly found in the health and social sciences on the robustness of the F-test. Distributions with a slight and moderate degree of contamination (Blanca et al., 2013) were simulated by generating distributions with values of skewness and kurtosis ranging between -1 and 1. Distributions with a severe degree of contamination (Micceri, 1989) were represented by exponential, double exponential, and chi-square with 8 degrees of freedom. In both cases, a wide range of sample sizes were considered with balanced and unbalanced designs and with equal and unequal distributions in groups. With unequal sample size and unequal shape in the groups, the pairing of group sample size with the degree of contamination in the distribution was also investigated.

Method

Instruments

We conducted a Monte Carlo simulation study with non-normal data using SAS 9.4 (SAS Institute, 2013). Non-normal distributions were generated using the procedure proposed by Fleishman (1978), which uses a polynomial transformation to generate data with specific values of skewness and kurtosis.

Procedure

In order to examine the effect of non-normality on F-test robustness, a one-way design with 3 groups and homogeneity of variance was considered. The group effect was set to zero in the population model. The following variables were manipulated:

1. Equal and unequal group sample sizes. Unbalanced designs are more common than balanced designs in studies involving one-way and factorial ANOVA (Golinski & Cribbie, 2009; Keselman et al., 1998). Both were considered in order to extend our results to different research situations.

2. Group sample size and total sample size. A wide range of group sample sizes were considered, enabling us to study small, medium, and large sample sizes.
With balanced designs the group sizes were set to 5, 10, 15, 20, 25, 30, 40, 50, 60, 70, 80, 90, and 100, with total sample size ranging from 15 to 300. With unbalanced designs, group sizes were set between 5 and 160, with a mean group size of between 10 and 100 and total sample size ranging from 15 to 300.

3. Coefficient of sample size variation (Δn), which represents the amount of inequality in group sizes. This was computed by dividing the standard deviation of the group sample size by its mean. Different degrees of variation were considered and were grouped as low, medium, and high. A low Δn was fixed at approximately 0.16 (0.141 – 0.178), a medium coefficient at 0.33 (0.316 – 0.334), and a high value at 0.50 (0.491 – 0.521). Keselman et al. (1998) showed that the ratio of the largest to the smallest group size was greater than 3 in 43.5% of cases. With Δn = 0.16 this ratio was equal to 1.5, with Δn = 0.33 it was equal to either 2.3 or 2.5, and with Δn = 0.50 it ranged from 3.3 to 5.7.

4. Shape of the distribution and equal and unequal shape in the groups. Twenty-two distributions were investigated, involving several degrees of deviation from normality and with both equal and unequal shape in the groups. For equal shape and slight and moderate departures from normality, the distributions had values of skewness (γ1) and kurtosis (γ2) ranging between -1 and 1, these values being representative of real data (Blanca et al., 2013). The values of γ1 and γ2 are presented in Table 2 (distributions 1-12). For severe departures from normality, distributions had values of γ1 and γ2 corresponding to the double exponential, chi-square with 8 degrees of freedom, and exponential distributions (Table 2, distributions 13-15). For unequal shape, the values of γ1 and γ2 of each group are presented in Table 3.
Distributions 16-21 correspond to slight and moderate departures from normality and distribution 22 to severe departure.

5. Pairing of group size with degree of contamination in the distribution. This condition was included with unequal shape and unequal sample size. The pairing was positive when the largest group size was associated with the greater contamination, and vice versa. The pairing was negative when the largest group size was associated with the smallest contamination, and vice versa.

The specific conditions with unequal sample size are shown in Table 1. Ten thousand replications of the 1308 conditions resulting from the combination of the above variables were performed at a significance level of .05. This number of replications was chosen to ensure reliable results (Bendayan, Arnau, Blanca, & Bono, 2014; Robey & Barcikowski, 1992).

Data analysis

Empirical Type I error rates associated with the F-test were analyzed for each condition according to Bradley’s robustness criterion (1978).

Results

Tables 2 and 3 show descriptive statistics for the Type I error rate across conditions for equal and unequal shapes. Although the tables do not include all available information (due to article length limitations), the maximum and minimum values are sufficient for assessing robustness. Full tables are available upon request from the corresponding author. All empirical Type I error rates were within the bounds of Bradley’s criterion. The results show that the F-test is robust for 3 groups in 100% of cases, regardless of the degree of deviation from a normal distribution, sample size, balanced or unbalanced cells, and equal or unequal distribution in the groups.

Discussion

We aimed to provide a systematic examination of F-test robustness to violations of normality under homogeneity of variance, applying Bradley’s (1978) criterion.
Specifically, we sought to answer the following question: Is the F-test robust, in terms of Type I error, to slight, moderate, and severe departures from normality, with various sample sizes (equal or unequal sample size) and with same or different shapes in the groups? The answer to this question is a resounding yes, since the F-test controlled Type I error to within the bounds of Bradley’s criterion. Specifically, the results show that the F-test remains robust with 3 groups when distributions have values of skewness and kurtosis ranging between -1 and 1, as well as with data showing a greater departure from normality, such as the exponential, double exponential, and chi-squared (8) distributions. This applies even when sample sizes are very small (i.e., n = 5) and quite different in the groups, and also when the group distributions differ significantly. In addition, the test’s robustness is independent of the pairing of group size with the degree of contamination in the distribution. Our results support the idea that the discrepancies between studies on the effect of non-normality may be primarily attributed to differences in the robustness criterion adopted, rather than to the degree of contamination of the distributions. These findings highlight the need to establish a standard criterion of robustness to clarify the potential implications when performing Monte Carlo studies.
Table 1. Specific conditions studied under non-normality for unequal shape in the groups as a function of total sample size (N), mean group size (N/J), coefficient of sample size variation (Δn), and pairing of group size with the degree of distribution contamination: (+) the largest group size is associated with the greater contamination and vice versa, and (-) the largest group size is associated with the smallest contamination and vice versa

N     N/J   Δn     n, pairing (+)   n, pairing (-)
30    10    0.16   8, 10, 12        12, 10, 8
30    10    0.33   6, 10, 14        14, 10, 6
30    10    0.50   5, 8, 17         17, 8, 5
45    15    0.16   12, 15, 18       18, 15, 12
45    15    0.33   9, 15, 21        21, 15, 9
45    15    0.50   6, 15, 24        24, 15, 6
60    20    0.16   16, 20, 24       24, 20, 16
60    20    0.33   12, 20, 28       28, 20, 12
60    20    0.50   8, 20, 32        32, 20, 8
75    25    0.16   20, 25, 30       30, 25, 20
75    25    0.33   15, 25, 35       35, 25, 15
75    25    0.50   10, 25, 40       40, 25, 10
90    30    0.16   24, 30, 36       36, 30, 24
90    30    0.33   18, 30, 42       42, 30, 18
90    30    0.50   12, 30, 48       48, 30, 12
120   40    0.16   32, 40, 48       48, 40, 32
120   40    0.33   24, 40, 56       56, 40, 24
120   40    0.50   16, 40, 64       64, 40, 16
150   50    0.16   40, 50, 60       60, 50, 40
150   50    0.33   30, 50, 70       70, 50, 30
150   50    0.50   20, 50, 80       80, 50, 20
180   60    0.16   48, 60, 72       72, 60, 48
180   60    0.33   36, 60, 84       84, 60, 36
180   60    0.50   24, 60, 96       96, 60, 24
210   70    0.16   56, 70, 84       84, 70, 56
210   70    0.33   42, 70, 98       98, 70, 42
210   70    0.50   28, 70, 112      112, 70, 28
240   80    0.16   64, 80, 96       96, 80, 64
240   80    0.33   48, 80, 112      112, 80, 48
240   80    0.50   32, 80, 128      128, 80, 32
270   90    0.16   72, 90, 108      108, 90, 72
270   90    0.33   54, 90, 126      126, 90, 54
270   90    0.50   36, 90, 144      144, 90, 36
300   100   0.16   80, 100, 120     120, 100, 80
300   100   0.33   60, 100, 140     140, 100, 60
300   100   0.50   40, 100, 160     160, 100, 40

The present analysis made use of Bradley’s criterion, which has been argued to be one of the most suitable criteria for examining the robustness of statistical tests (Keselman, Algina, Kowalchuk, & Wolfinger, 1999). In this respect, our results are consistent with previous studies whose Type I error rates were within the bounds of Bradley’s criterion under certain departures from normality (Black et al., 2010; Clinch & Keselman, 1982; Feir-Walsh & Toothaker, 1974; Gamage & Weerahandi, 1998; Kanji, 1976; Lantz, 2013; Lix et al., 1996; Patrick, 2007; Schmider et al., 2010; Zijlstra, 2004). By contrast, however, our results do not concur, at least for the conditions studied here, with those classical handbooks which conclude that the F-test is only robust if the departure from normality is moderate (Keppel, 1982; Montgomery, 1991), the populations have the same distributional shape (Kirk, 2013), and the sample sizes are large and equal (Winer et al., 1991).

Our findings are useful for applied research since they show that, in terms of Type I error, the F-test remains a valid statistical procedure under non-normality in a variety of conditions. Data transformation or nonparametric analysis is often recommended when data are not normally distributed. However, data transformations offer no additional benefits over the good control of Type I error achieved by the F-test. Furthermore, it is usually difficult to determine which transformation is appropriate for a set of data, and a given transformation may not be applicable when groups differ in shape. In addition, results are often difficult to interpret when data transformations are adopted. There are also disadvantages to using non-parametric procedures such as the Kruskal-Wallis test. This test converts quantitative continuous data into rank-ordered data, with a consequent loss of information. Moreover, the null hypothesis associated with the Kruskal-Wallis test differs from that of the F-test, unless the distribution of groups has exactly the same shape (see Maxwell & Delaney, 2004). Given these limitations, there is no reason to prefer the Kruskal-Wallis test under the conditions studied in the present paper.
Only with equal shape in the groups might the Kruskal-Wallis test be preferable, given its power advantage over the F-test under specific distributions (Büning, 1997; Lantz, 2013). However, other studies suggest that the F-test is robust, in terms of power, to violations of normality under certain conditions (Ferreira, Rocha, & Mequelino, 2012; Kanji, 1976; Schmider et al., 2010), even with very small sample size (n = 3; Khan & Rayner, 2003). In light of these inconsistencies, future research should explore the power of the F-test when the normality assumption is not met. At all events, we encourage researchers to analyze the distribution underlying their data (e.g., coefficients of skewness and kurtosis in each group, goodness of fit tests, and normality graphs) and to estimate a priori the sample size needed to achieve the desired power.

Table 2. Descriptive statistics of Type I error for F-test with equal shape for each combination of skewness (γ1) and kurtosis (γ2) across all conditions

Distribution   γ1     γ2     n (= equal, ≠ unequal)   Min     Max     Mdn     M       SD
1              0      0.4    =                        .0434   .0541   .0491   .0493   .0029
1              0      0.4    ≠                        .0445   .0556   .0497   .0496   .0022
2              0      0.8    =                        .0444   .0534   .0474   .0479   .0023
2              0      0.8    ≠                        .0458   .0527   .0484   .0487   .0016
3              0      -0.8   =                        .0468   .0512   .0490   .0491   .0014
3              0      -0.8   ≠                        .0426   .0532   .0486   .0487   .0024
4              0.4    0      =                        .0360   .0499   .0469   .0457   .0044
4              0.4    0      ≠                        .0392   .0534   .0477   .0472   .0032
5              0.8    0      =                        .0422   .0528   .0477   .0476   .0029
5              0.8    0      ≠                        .0433   .0553   .0491   .0491   .0030
6              -0.8   0      =                        .0427
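The Type I error rates summarized in Table 2 are obtained by counting how often the F-test rejects a null hypothesis that is true by construction. The sketch below illustrates that counting logic only and differs from the study in several stated ways: it is written in plain Python rather than SAS, it draws directly from an exponential distribution instead of using Fleishman's polynomial transformation, it runs 4,000 replications rather than 10,000, and it covers a single balanced condition (3 groups of 10). The critical value 3.354 is the tabled .05 value for F with 2 and 27 degrees of freedom; all function names are hypothetical.

```python
import random


def f_statistic(groups):
    """One-way ANOVA F statistic for a list of groups (each a list of numbers)."""
    n_total = sum(len(g) for g in groups)
    means = [sum(g) / len(g) for g in groups]
    grand_mean = sum(sum(g) for g in groups) / n_total
    ss_between = sum(len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, means))
    ss_within = sum((x - m) ** 2 for g, m in zip(groups, means) for x in g)
    df_between = len(groups) - 1
    df_within = n_total - len(groups)
    return (ss_between / df_between) / (ss_within / df_within)


def empirical_type1_rate(n_per_group=10, n_groups=3, reps=4000,
                         f_crit=3.354, seed=2017):
    """Proportion of replications in which a true null hypothesis is rejected.

    Every group is drawn from the same exponential population, so any
    rejection is a Type I error. f_crit must match the design's degrees
    of freedom (here 2 and 27 at alpha = .05).
    """
    rng = random.Random(seed)
    rejections = 0
    for _ in range(reps):
        groups = [[rng.expovariate(1.0) for _ in range(n_per_group)]
                  for _ in range(n_groups)]
        if f_statistic(groups) > f_crit:
            rejections += 1
    return rejections / reps


rate = empirical_type1_rate()
print(f"empirical Type I error rate: {rate:.4f}")
print("within Bradley's liberal bounds:", 0.025 <= rate <= 0.075)
```

Changing `n_per_group` requires changing `f_crit` to the critical value for the new error degrees of freedom; a library routine for the F distribution would remove that manual step.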

 


Article Analysis

Article Analysis: Example 2

Article Citation and Permalink

 

Utens, C. M. A., Goossens, L. M. A., van Schayck, O. C. P., Rutten-van Mölken, M. P. M. H., van Litsenburg, W., Janssen, A., … Smeenk, F. W. J. M. (2013). Patient preference and satisfaction in hospital-at-home and usual hospital care for COPD exacerbations: Results of a randomised controlled trial. International Journal of Nursing Studies, 50, 1537–1549. https://doi.org/10.1016/j.ijnurstu.2013.03.006

 

Link: https://www.ncbi.nlm.nih.gov/pubmed/23582671

(Include permalink for articles from GCU Library.)

 

Point: Description

Broad Topic Area/Title: The differences in preference and satisfaction based upon hospital care location for COPD exacerbations.

Define Hypotheses: Hypothesis not stated. Below is an example from the study:

H0: There is no difference in satisfaction levels based upon treatment location.

H1: There is a difference in satisfaction levels based upon treatment locations.

Define Variables and Types of Data for Variables:

Treatment Location – categorical – “home treatment” and “hospital treatment”

Satisfaction – Ordinal Scale (1-5)

Preference – categorical “home treatment” and “hospital treatment”

Population of Interest for the Study: COPD exacerbation patients from five hospitals and three home care organizations

Sample: 139 patients

69 from the usual hospital care group

70 from the early assisted discharge care group

Sampling Method: Mixed methods; quantitative was randomized sampling

How Were Data Collected? A questionnaire with both open-ended questions and questions with a scale of 1-5 (p. 1539)

 

© 2019. Grand Canyon University. All Rights Reserved.

 


 


Project Two Summary Report

MAT 243 Project Two Summary Report

 

[Full Name]

[SNHU Email]

Southern New Hampshire University

Note: Replace the bracketed text on page one (the cover page) with your personal information.

 

Introduction: Problem Statement

 

Discuss the statement of the problem in terms of the statistical analyses that are being performed. In your response, you should address the following questions:

 

· What is the problem you are going to solve?

· What data set are you using?

· What statistical methods will you be using to do the analysis for this project?

 

 Answer the questions in a paragraph response. Remove all questions and this note before submitting! Do not include Python code in your report.

 

Introduction: Your Team and the Assigned Team

 

In the Python script, you picked the same team and years that you picked for Project One. The assigned team and its range of years will be the same as in Project One as well.

 

See Steps 1 and 2 in the Python script to address the following items in the table below:

 

· What team did you pick and what years were picked to do the analysis?

· What team and range of years were you assigned for the comparative study? (Hint: this is called the assigned team in the Python script.) Present this information in a formatted table as shown below.

 

Table 1. Information on the Teams

 

               Name of Team          Years Picked
1. Yours       Team (e.g. Knicks)    XXXX-YYYY (e.g. 2013 – 2015)
2. Assigned    Team (e.g. Bulls)     XXXX-YYYY (e.g. 1996 – 1998)

 

 

 Answer the questions in a paragraph response. Remove all questions and this note (but not the table) before submitting! Do not include Python code in your report.

 

Hypothesis Test for the Population Mean (I)

 

Suppose a relative skill level of 1420 represents a critically low skill level in the league. The management of your team has hypothesized that the average relative skill level of your team is greater than 1420. You tested this claim using a 5% level of significance. For this test, you assumed that the population standard deviation for relative skill level is unknown. Explain the steps you took to test this problem and interpret your results.

 

See Step 3 in the Python script to address the following items:

 

· In general, how is hypothesis testing used to test claims about a population mean?

· Summarize all important steps of the hypothesis test. This includes:

a. Null Hypothesis (statistical notation and its description in words)

b. Alternative Hypothesis (statistical notation and its description in words)

c. Level of Significance

d. Report the Test Statistic and the P-value in a formatted table as shown below:

Table 2: Hypothesis Test for the Population Mean (I)

 

Statistic         Value
Test Statistic    X.XX (*Round off to 2 decimal places.)
P-value           X.XXXX (*Round off to 4 decimal places.)

 

 

e. Conclusion of the hypothesis test and its interpretation based on the P-value

· What are the implications of your findings from this hypothesis test? What is its practical significance?

 

 Answer the questions in a paragraph response. Remove all questions and this note (but not the table) before submitting! Do not include Python code in your report.

 

Hypothesis Test for the Population Mean (II)

 

Your team’s coach has hypothesized that the average number of points scored by your team in the team’s years is less than 110 points. For this test, you assumed that the population standard deviation for points scored is unknown. You tested the claim using a 1% level of significance. Explain the steps you took to test this problem and interpret your results.

 

See Step 4 in the Python script to address the following items:

 

· Summarize all important steps of the hypothesis test. This includes:

a. Null Hypothesis (statistical notation and its description in words)

b. Alternative Hypothesis (statistical notation and its description in words)

c. Level of Significance

d. Report the Test Statistic and the P-value in a formatted table as shown below:

Table 3: Hypothesis Test for the Population Mean (II)

 

Statistic         Value
Test Statistic    X.XX (*Round off to 2 decimal places.)
P-value           X.XXXX (*Round off to 4 decimal places.)

 

 

e. Conclusion of the hypothesis test and its interpretation based on the P-value

· What are the implications of your findings from this hypothesis test? What is its practical significance?

 

 Answer the questions in a paragraph response. Remove all questions and this note (but not the table) before submitting! Do not include Python code in your report.

 

Hypothesis Test for the Population Proportion

 

Suppose the management claims that the proportion of games that your team wins when scoring 80 or more points is 0.50. You tested this claim using a 5% level of significance. Explain the steps you took to test this problem and interpret your results.

 

See Step 5 in the Python script to address the following items:

 

· In general, how is hypothesis testing used to test claims about a population proportion?

· Summarize all important steps of the hypothesis test. This includes:

a. Null Hypothesis (statistical notation and its description in words)

b. Alternative Hypothesis (statistical notation and its description in words)

c. Level of Significance

d. Report the Test Statistic and the P-value in a formatted table as shown below:

 

Table 4: Hypothesis Test for the Population Proportion

 

Statistic         Value
Test Statistic    X.XX (*Round off to 2 decimal places.)
P-value           X.XXXX (*Round off to 4 decimal places.)

 

 

e. Conclusion of the hypothesis test and its interpretation based on the P-value

· What are the implications of your findings from this hypothesis test? What is its practical significance?

 

 Answer the questions in a paragraph response. Remove all questions and this note (but not the table) before submitting! Do not include Python code in your report.
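For orientation, the arithmetic behind a test of a population proportion can be sketched in a few lines. This is not the course's Python script and is not meant for the report itself. It assumes a two-sided alternative (the claim is that the proportion equals 0.50), the usual large-sample normal approximation, and made-up counts of 60 wins in 100 games; the function name is hypothetical.

```python
import math


def one_proportion_z_test(successes, n, p0=0.50):
    """Two-sided z-test for H0: p = p0 using the normal approximation."""
    p_hat = successes / n                      # sample proportion
    std_error = math.sqrt(p0 * (1 - p0) / n)   # standard error under H0
    z = (p_hat - p0) / std_error               # test statistic
    p_value = math.erfc(abs(z) / math.sqrt(2)) # two-sided tail probability
    return z, p_value


# Hypothetical counts, for illustration only.
z, p_value = one_proportion_z_test(successes=60, n=100)
print(f"Test Statistic: {z:.2f}")
print(f"P-value: {p_value:.4f}")
```

The P-value is then compared with the chosen level of significance: a value below 0.05 would lead to rejecting the claim that the proportion is 0.50.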

 

Hypothesis Test for the Difference Between Two Population Means

 

You were asked to compare your team’s skill level (from its years) with the assigned team’s skill level (from the assigned time frame). You tested the claim that the skill level of your team is the same as the skill level of the assigned team, using a 1% level of significance.

 

See Step 6 in the Python script to address the following items:

 

· In general, how is hypothesis testing used to test claims about the difference between two population means?

· Summarize all important steps of the hypothesis test. This includes:

a. Null Hypothesis (statistical notation and its description in words)

b. Alternative Hypothesis (statistical notation and its description in words)

c. Level of Significance

d. Report the Test Statistic and the P-value in a formatted table as shown below:

 

Table 5: Hypothesis Test for the Difference Between Two Population Means

 

Statistic         Value
Test Statistic    X.XX (*Round off to 2 decimal places.)
P-value           X.XXXX (*Round off to 4 decimal places.)

 

 

e. Conclusion of the hypothesis test and its interpretation based on the P-value

· What are the implications of your findings from this hypothesis test? What is its practical significance?

 

 Answer the questions in a paragraph response. Remove all questions and this note (but not the table) before submitting! Do not include Python code in your report.

 

 

Conclusion

 

Describe the results of your statistical analyses clearly, using proper descriptions of statistical terms and concepts.

 

· What is the practical importance of the analyses that were performed?

· Describe what these results mean for the scenario.

 

 Answer the questions in a paragraph response. Remove all questions and this note before submitting! Do not include Python code in your report.

 

Citations

 

You were not required to use external resources for this report. If you did not use any resources, you should remove this entire section. However, if you did use any resources to help you with your interpretation, you must cite them. Use proper APA format for citations.

 

Insert references here in the following format:

 

Author’s Last Name, First Initial. Middle Initial. (Year of Publication). Title of book: Subtitle of book, edition. Place of Publication: Publisher.

 


ALUMNI GIVING

UV6394 Rev. Sept. 22, 2014

 

This case was prepared by Phillip E. Pfeifer, Richard S. Reynolds Professor of Business Administration. It was written as a basis for class discussion rather than to illustrate effective or ineffective handling of an administrative situation. It should not be used as a source of primary research. Although the characters and situation are fictional, the data are not. Copyright  2012 by the University of Virginia Darden School Foundation, Charlottesville, VA. All rights reserved. To order copies, send an e-mail to sales@dardenbusinesspublishing.com. No part of this publication may be reproduced, stored in a retrieval system, used in a spreadsheet, or transmitted in any form or by any means—electronic, mechanical, photocopying, recording, or otherwise—without the permission of the Darden School.

 

ALUMNI GIVING

Madison Kryswada had been director of alumni relationships at State University for two years before she shared her frustrations about alumni giving with her staff. Kryswada’s team was responsible for all aspects of alumni relationships at State University, but the only thing the president’s office seemed to care about was money.

“Not only is there extra pressure to increase total giving now that state support has been cut back,” she lamented, “but we are also pressured to increase the percentage of alumni who give (in any period).” Kryswada continued:

This new metric is called average alumni giving rate by U.S. News & World Report, and it constitutes 5% of the overall rating the publication gives to colleges and universities. In today’s competitive market for students, those ratings, as flawed as they are, are tremendously important. The average alumni giving rate is used as a measure of student satisfaction, and our 8% rate is in the low category. Analysis done by folks in the president’s office suggests that raising it is one of the easier ways to improve our overall #132 U.S. News ranking in the National Universities category. Not that I’m looking for excuses, but it always seemed to me that 8% was a very reasonable rate for a school like ours. We are not a small liberal-arts college that can cater to its students. We are a large public university that serves students from a wide variety of backgrounds. Although we do have a first-rate football team, our campus life is not, shall we say, as “memorable” as some of the football-powerhouse schools in the South. Our faculty members do a lot of research, and classes tend to be large and are often taught by graduate students. A substantial percentage of our students commute. All of this suggests to me that convincing 8% of alums to give something to good old State University in any year is actually something to be proud of. Or, excuse me, something of which we should be proud.

Do Not Copy or Post

This document is authorized for educator review use only by Adam Guerrero, until January 2017. Copying or posting is an infringement of copyright. Permissions@hbsp.harvard.edu or 617.783.7860

 

 


Alumni Giving Rate Data

To investigate further, Kryswada asked her assistant to assemble data on class sizes, graduation rates, and, most important, alumni giving rates for a selection of peer schools. Rather than spend a lot of energy defining the set of peer schools, Kryswada suggested using the list of U.S. schools that fielded football teams in the Football Bowl Subdivision (the highest competitive division for U.S. college football). Within a day, the assistant (with the help of a $34.99 subscription to premier content on the U.S. News & World Report website) assembled relevant data on 123 U.S. schools. Exhibit 1 describes the variables in the database, Exhibit 2 shows the first and last three entities in the database, and Exhibit 3 provides summary statistics for each of the six variables.

The Research Questions

Kryswada was eager to learn what these data could tell her. As she saw it, this was an opportunity to uncover “the drivers of alumni giving rate.” To help her assistant take the next steps, Kryswada compiled a list of very specific questions:

1. School A’s graduation rate is 10 points higher than school B’s. How much higher do we expect A’s giving rate to be?

2. How does the answer to question 1 change if we learn that A and B have identical student-to-faculty ratios?

3. Which of the 123 schools has the most (least) impressive giving rate?

4. Consider a school similar to ours. We have a 67% graduation rate and a student–faculty ratio of 17:1, 34% of the classes have fewer than 20 students, 23% of the classes have more than 50 students, and we have a freshman retention rate of 77%. Should this school’s giving rate be greater than or less than 8%?

“We’ll meet first thing in the morning,” Kryswada told her assistant. “You can give me the answers, and we’ll take it from there. You find this exciting, right?”


Exhibit 1

ALUMNI GIVING

Descriptions of Variables

Variable    Description
ID          An identifier running from 1 to 123 (schools are listed in alphabetical order)
School      Name of the school
SFR         Student-to-faculty ratio
LT20        Percentage of classes with fewer than 20 students
GT50        Percentage of classes with greater than 50 students
GRAD        Average six-year graduation rate
FRR         Freshman retention rate
GIVE        Average alumni giving rate

Source: Created by case writer.

 

Exhibit 2

ALUMNI GIVING

Alumni Giving Database

ID     School                         SFR    LT20    GT50    GRAD    FRR    GIVE
1      Arizona State University       24     42%     16%     59%     81%    8%
2      Arkansas State University      19     49%     4%      37%     69%    11%
3      Auburn University              18     24%     17%     66%     87%    31%
. . .
121    West Virginia University       23     32%     19%     59%     80%    12%
122    Western Kentucky University    19     43%     6%      49%     73%    13%
123    Western Michigan University    19     36%     11%     52%     74%    10%

Data source: U.S. News & World Report, 2010 survey data, http://premium.usnews.com/best-colleges (accessed May 22, 2012).


Exhibit 3

ALUMNI GIVING

Summary Statistics

 

Source: Created by the case writer using StatTools, a statistics toolkit from the Palisade Corporation.

StatTools Report Analysis: One Variable Summary

Performed By: Pfeifer, Phil Date: Thursday, April 17, 2014

Updating: Live

One Variable Summary (all columns from the Alumni Giving Data)

             SFR        LT20      GT50       GRAD       FRR        GIVE
Mean         17.772     0.4037    0.13628    0.6452     0.84114    0.14179
Variance     20.407     0.0179    0.00361    0.0288     0.00705    0.00651
Std. Dev.    4.517      0.1339    0.06006    0.1698     0.08394    0.08067
Skewness     -0.3623    1.1068    -0.0129    -0.0459    -0.2427    1.1040
Median       18.000     0.3800    0.13000    0.6400     0.84000    0.13000
Mode         19.000     0.3400    0.16000    0.8000     0.97000    0.13000
Minimum      6.000      0.1400    0.00000    0.2600     0.58000    0.02000
Maximum      31.000     0.9500    0.31000    0.9600     0.98000    0.41000
Count        123        123       123        123        123        123


This assignment picks up where the Module Two assignment left off and will use components of that assignment as a foundation.


You have submitted your initial analysis to the sales team at D.M. Pan Real Estate Company. You will continue your analysis of the provided Real Estate County Data spreadsheet using your selected region to complete your analysis. You may refer back to the initial report you developed in the Module Two Assignment Template to continue the work. This document and the National Statistics and Graphs spreadsheet will support your work on the assignment.

Note: In the report you prepare for the sales team, the dependent, or response, variable (y) should be the median listing price and the independent, or predictor, variable (x) should be the median square feet.

Using the Module Three Assignment Template, specifically address the following:

  • Regression Equation: Provide the regression equation for the line of best fit using the scatterplot from the Module Two assignment.
  • Determine r: Determine r and what it means. (What is the relationship between the variables?)
    • Determine the strength of the correlation (weak, moderate, or strong).
    • Discuss how you determine the direction of the association between the two variables.
      • Is there a positive or negative association?
      • What do you see as the direction of the correlation?
  • Examine the Slope and Intercepts: Examine the slope b1 and intercept b0.
    • Draw conclusions from the slope and intercept in the context of this problem.
      • Does the intercept make sense based on your observation of the line of best fit?
    • Determine the value of the land only.
      Note: You can assume, when the square footage of the house is zero, that the price is the value of just the land. This happens when x=0, which is the y-intercept.
  • Determine the R-squared Coefficient: Determine the R-squared value.
    • Discuss what R-squared means in the context of this analysis.
  • Conclusions: Reflect on the Relationship: Reflect on the relationship between square feet and sales price by answering the following questions:
    • Is the square footage for homes in your selected region different than for homes overall in the United States?
    • For every 100 square feet, how much does the price go up (i.e., can you use slope to help identify price changes)?
    • Use the regression equation to estimate how much you would list your house for if it was 1,200 square feet.
    • What square footage range would the graph be best used for?


Report: Selling Price and Area Analysis for D.M. Pan National Real Estate Company

Kayla Murphy


Southern New Hampshire University

Introduction

This study was carried out for the real estate industry to help a business identify where it can gain a competitive advantage. Two statistical methods were used: regression analysis, to determine the relationship between the selling price of properties and their size in square feet, and descriptive statistics, to summarize the data. A random sample of 30 properties was selected, and the median listing price and median square feet of each were recorded.

Representative Data Sample

Table 1 shows the 30 randomly sampled observations from the Texas region.

Table 1

median listing price ($)    median square feet
226000                      1500
314000                      1300
215000                      1600
354000                      1650
245000                      1700
257000                      1200
305000                      1350
317000                      2000
267000                      1850
268000                      1750
271000                      1640
229000                      1625
321000                      1550
216000                      1150
290000                      1350
279000                      1700
320000                      1850
211000                      1400
269000                      1450
225000                      1250
316000                      1540
328000                      1260
335000                      1360
327000                      1900
238000                      1870
240000                      2100
237000                      1270
302000                      1670
301000                      1980
224000                      2140

 

Descriptive statistics

According to Nick (2007), descriptive statistics summarize data using measures such as central tendency and spread. The table below summarizes the median listing price and the median square feet.

Table 2

                      median listing price ($)    median square feet
Mean                  274900                      1598.5
Median                270000                      1612.5
Standard Deviation    42842.4                     278.336

 

The table above shows that the mean of the median listing price was $274900, while the mean of the median square feet was 1598.5. The median of the listing price was $270000, and the median of the square feet was 1612.5. The standard deviation measures how far the data values typically vary from the mean. The standard deviation of the median listing price was $42842.4, while that of the median square feet was 278.336.
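The summary values can be re-derived from the 30 observations in Table 1. The following is a minimal sketch, assuming the sample (n - 1) form of the standard deviation; the variable names are illustrative and not part of the original report.

```python
import statistics

# Observations copied from Table 1 (30 pairs, in the order listed).
prices = [226000, 314000, 215000, 354000, 245000, 257000, 305000, 317000,
          267000, 268000, 271000, 229000, 321000, 216000, 290000, 279000,
          320000, 211000, 269000, 225000, 316000, 328000, 335000, 327000,
          238000, 240000, 237000, 302000, 301000, 224000]
sqft = [1500, 1300, 1600, 1650, 1700, 1200, 1350, 2000, 1850, 1750,
        1640, 1625, 1550, 1150, 1350, 1700, 1850, 1400, 1450, 1250,
        1540, 1260, 1360, 1900, 1870, 2100, 1270, 1670, 1980, 2140]

for label, values in (("median listing price ($)", prices),
                      ("median square feet", sqft)):
    mean = statistics.mean(values)
    median = statistics.median(values)
    # statistics.stdev uses the sample formula (divides by n - 1).
    sd = statistics.stdev(values)
    print(f"{label}: mean={mean:.1f}, median={median:.1f}, sd={sd:.3f}")
```

Using statistics.pstdev instead would give the population form, which is slightly smaller for a sample of this size.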

 

 

Data Analysis

From the descriptive statistics, the average median listing price in the sample ($274900) was lower than that of the national market, which was approximately $284600. The sample is reasonably reflective of the national market, since its mean is close to the national mean. The sample was randomized through stratified sampling: the target population was divided into groups called strata, and the 30 observations were randomly selected from the various strata. A sample of 30 was large enough to be a good representation of the population.

Scatterplot

 

 

The Pattern

In the scatter plot, median square feet was the predictor variable and median listing price was the response variable. Median square feet was treated as the independent variable, so it was used to predict the listing price of the properties.

The graph showed a positive relationship between median square feet and median listing price, so an increase in the area of a property would be associated with an increase in its listing price. There were no apparent outliers in the data set, since the values were concentrated in the same area.

The regression equation from the graph was

y = 8.7399x + 260929

For an area of 1200 square feet, the predicted listing price would be

Median listing price = 8.7399(1200) + 260929

= $271416.88

The slope of the regression equation was positive, indicating that an increase in the independent variable would lead to an increase in the dependent variable.
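The line of best fit can be checked against the Table 1 observations. The sketch below is an illustration rather than the method used to produce the report's graph: it assumes ordinary least squares, and the function name least_squares is hypothetical. It also returns the correlation coefficient r, which the report does not state.

```python
import math

# Observations copied from Table 1 (x = median square feet, y = median listing price).
sqft = [1500, 1300, 1600, 1650, 1700, 1200, 1350, 2000, 1850, 1750,
        1640, 1625, 1550, 1150, 1350, 1700, 1850, 1400, 1450, 1250,
        1540, 1260, 1360, 1900, 1870, 2100, 1270, 1670, 1980, 2140]
prices = [226000, 314000, 215000, 354000, 245000, 257000, 305000, 317000,
          267000, 268000, 271000, 229000, 321000, 216000, 290000, 279000,
          320000, 211000, 269000, 225000, 316000, 328000, 335000, 327000,
          238000, 240000, 237000, 302000, 301000, 224000]


def least_squares(x, y):
    """Return (slope, intercept, r) for the simple linear regression of y on x."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    syy = sum((yi - mean_y) ** 2 for yi in y)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x  # the fitted line passes through the means
    r = sxy / math.sqrt(sxx * syy)
    return slope, intercept, r


slope, intercept, r = least_squares(sqft, prices)
predicted_1200 = intercept + slope * 1200

print(f"y = {slope:.4f}x + {intercept:.0f}")
print(f"r = {r:.4f}, R-squared = {r * r:.4f}")
print(f"predicted listing price at 1200 sq ft: ${predicted_1200:.2f}")
```

On these 30 points the fitted slope and intercept agree with the equation above, while r comes out small (below 0.1), so the positive slope is best read as a weak association rather than a strong one.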

 

 

 

 

 

 

 

 

 

 

 

 

 

References

Nick, Todd G. (2007). “Descriptive Statistics”. Topics in Biostatistics. Methods in Molecular Biology. 404. New York: Springer. pp. 33–52.

Phillip D. (2010). “Upper and Lower Bounds for the Sample Standard Deviation”. Teaching Statistics. 2 (3): 84–86

[Figure: Scatter plot of median listing price ($) versus median square feet]

 


Homework Problems

STAT 200 Week 4 Homework Problems

6.1.2

1.) The commuter trains on the Red Line for the Regional Transit Authority (RTA) in Cleveland, OH, have a waiting time during peak rush hour periods of eight minutes (“2012 annual report,” 2012).

a.) State the random variable.

b.) Find the height of this uniform distribution.

c.) Find the probability of waiting between four and five minutes.

d.) Find the probability of waiting between three and eight minutes.

e.) Find the probability of waiting five minutes exactly.
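The steps in this problem can be sketched in code. This is a minimal illustration, assuming the waiting time is uniform between 0 and 8 minutes; the function name uniform_probability is hypothetical.

```python
def uniform_probability(low, high, a, b):
    """Return P(a < X < b) for X uniformly distributed on [low, high]."""
    a = max(a, low)   # clip the interval to the support
    b = min(b, high)
    if b <= a:
        return 0.0    # empty interval, or a single point
    return (b - a) / (high - low)


LOW, HIGH = 0, 8              # assumed bounds of the waiting time, in minutes
height = 1 / (HIGH - LOW)     # height of the density, so the total area is 1

print(height)
print(uniform_probability(LOW, HIGH, 4, 5))
print(uniform_probability(LOW, HIGH, 3, 8))
print(uniform_probability(LOW, HIGH, 5, 5))
```

The last call returns zero because a continuous distribution assigns no probability to a single exact value.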

 

6.3.2

 

Find the z-score corresponding to the given area. Remember, z is distributed as the standard normal distribution with mean of µ = 0 and standard deviation σ = 1.

a.) The area to the left of z is 15%.

b.) The area to the right of z is 65%.

c.) The area to the left of z is 10%.

d.) The area to the right of z is 5%.

e.) The area between −z and z is 95%. (Hint: draw a picture and figure out the area to the left of −z.)

f.) The area between −z and z is 99%.
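Each of these lookups is an inverse-CDF calculation on the standard normal distribution. The sketch below uses Python's statistics.NormalDist; the three helper names are hypothetical and only illustrate how a left, right, or central area is converted to a left-tail area first.

```python
from statistics import NormalDist

std_normal = NormalDist(mu=0, sigma=1)


def z_for_left_area(area):
    """z such that the area to the left of z equals `area`."""
    return std_normal.inv_cdf(area)


def z_for_right_area(area):
    """z such that the area to the right of z equals `area`."""
    return std_normal.inv_cdf(1 - area)


def z_for_central_area(area):
    """Positive z such that the area between -z and z equals `area`."""
    # The two tails share 1 - area, so the area to the left of z is (1 + area) / 2.
    return std_normal.inv_cdf((1 + area) / 2)


print(round(z_for_left_area(0.15), 4))
print(round(z_for_right_area(0.05), 4))
print(round(z_for_central_area(0.95), 4))
```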

6.3.4

According to the WHO MONICA Project the mean blood pressure for people in China is 128 mmHg with a standard deviation of 23 mmHg (Kuulasmaa, Hense & Tolonen, 1998). Assume that blood pressure is normally distributed.

a.) State the random variable.

b.) Find the probability that a person in China has blood pressure of 135 mmHg or more.

c.) Find the probability that a person in China has blood pressure of 141 mmHg or less.

d.) Find the probability that a person in China has blood pressure between 120 and 125 mmHg.

e.) Is it unusual for a person in China to have a blood pressure of 135 mmHg? Why or why not?

f.) What blood pressure do 90% of all people in China have less than?
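The calculations in this kind of problem follow one pattern: build the normal distribution from the stated mean and standard deviation, then read off areas or percentiles. A minimal sketch with statistics.NormalDist, using the values given in problem 6.3.4 (variable names are illustrative):

```python
from statistics import NormalDist

# Blood pressure in mmHg, with the mean and standard deviation given in the problem.
blood_pressure = NormalDist(mu=128, sigma=23)

# Right-tail area: P(X >= 135) = 1 - P(X < 135).
p_at_least_135 = 1 - blood_pressure.cdf(135)

# Area between two values: difference of two left-tail areas.
p_between_120_125 = blood_pressure.cdf(125) - blood_pressure.cdf(120)

# Percentile: the value with 90% of the distribution below it.
cutoff_90 = blood_pressure.inv_cdf(0.90)

print(round(p_at_least_135, 4))
print(round(p_between_120_125, 4))
print(round(cutoff_90, 1))
```

The same three operations (cdf, a difference of cdf values, and inv_cdf) cover the dishwasher and rainfall problems that follow, with the mean and standard deviation swapped in.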

6.3.8

A dishwasher has a mean life of 12 years with an estimated standard deviation of 1.25 years (“Appliance life expectancy,” 2013). Assume the life of a dishwasher is normally distributed.

a.) State the random variable.

b.) Find the probability that a dishwasher will last more than 15 years.

c.) Find the probability that a dishwasher will last less than 6 years.

d.) Find the probability that a dishwasher will last between 8 and 10 years.

e.) If you found a dishwasher that lasted less than 6 years, would you think that you have a problem with the manufacturing process? Why or why not?

f.) A manufacturer of dishwashers only wants to replace free of charge 5% of all dishwashers. How long should the manufacturer make the warranty period?

 

6.3.10

The mean yearly rainfall in Sydney, Australia, is about 137 mm and the standard deviation is about 69 mm (“Annual maximums of,” 2013). Assume rainfall is normally distributed.

a.) State the random variable.

b.) Find the probability that the yearly rainfall is less than 100 mm.

c.) Find the probability that the yearly rainfall is more than 240 mm.

d.) Find the probability that the yearly rainfall is between 140 and 250 mm.

e.) If a year has a rainfall less than 100mm, does that mean it is an unusually dry year? Why or why not?

f.) What rainfall amount are 90% of all yearly rainfalls more than?

 

6.4.4

Annual rainfalls for Sydney, Australia are given in table #6.4.6. (“Annual maximums of,” 2013). Can you assume rainfall is normally distributed?

Table #6.4.6: Annual Rainfall in Sydney, Australia

146.8    383      90.9     178.1    267.5    95.5     156.5    180
90.9     139.7    200.2    171.7    187.2    184.9    70.1     58
84.1     55.6     133.1    271.8    135.9    71.9     99.4     110.6
47.5     97.8     122.7    58.4     154.4    173.7    118.8    88
84.6     171.5    254.3    185.9    137.2    138.9    96.2     85
45.2     74.7     264.9    113.8    133.4    68.1     156.4

 

6.5.2

A random variable is normally distributed. It has a mean of 245 and a standard deviation of 21.

a.) If you take a sample of size 10, can you say what the shape of the distribution for the sample mean is? Why?

b.) For a sample of size 10, state the mean of the sample mean and the standard deviation of the sample mean.

c.) For a sample of size 10, find the probability that the sample mean is more than 241.

d.) If you take a sample of size 35, can you say what the shape of the distribution of the sample mean is? Why?

e.) For a sample of size 35, state the mean of the sample mean and the standard deviation of the sample mean.

f.) For a sample of size 35, find the probability that the sample mean is more than 241.

g.) Compare your answers in part c and f. Why is one smaller than the other?
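The comparison in parts c and f comes down to the standard deviation of the sample mean, σ/√n, shrinking as n grows. A minimal sketch using the values from problem 6.5.2 (the function name sample_mean_distribution is hypothetical):

```python
import math
from statistics import NormalDist


def sample_mean_distribution(mu, sigma, n):
    """Sampling distribution of the mean of n draws from a normal population."""
    return NormalDist(mu=mu, sigma=sigma / math.sqrt(n))


# Population from problem 6.5.2: mean 245, standard deviation 21.
p_n10 = 1 - sample_mean_distribution(245, 21, 10).cdf(241)
p_n35 = 1 - sample_mean_distribution(245, 21, 35).cdf(241)

print(round(p_n10, 4))
print(round(p_n35, 4))
```

Because 241 lies below the population mean, the larger sample concentrates the sample mean more tightly around 245 and so leaves more area above 241.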

 

6.5.4

According to the WHO MONICA Project the mean blood pressure for people in China is 128 mmHg with a standard deviation of 23 mmHg (Kuulasmaa, Hense & Tolonen, 1998). Blood pressure is normally distributed.

a.) State the random variable.

b.) Suppose a sample of size 15 is taken. State the shape of the distribution of the sample mean.

c.) Suppose a sample of size 15 is taken. State the mean of the sample mean.

d.) Suppose a sample of size 15 is taken. State the standard deviation of the sample mean.

e.) Suppose a sample of size 15 is taken. Find the probability that the sample mean blood pressure is more than 135 mmHg.

f.) Would it be unusual to find a sample mean of 15 people in China of more than 135 mmHg? Why or why not?

g.) If you did find a sample mean for 15 people in China to be more than 135 mmHg, what might you conclude?

 

6.5.6

The mean cholesterol levels of women age 45-59 in Ghana, Nigeria, and Seychelles is 5.1 mmol/l and the standard deviation is 1.0 mmol/l (Lawes, Hoorn, Law & Rodgers, 2004). Assume that cholesterol levels are normally distributed.

a.) State the random variable.

b.) Find the probability that a woman age 45-59 in Ghana has a cholesterol level above 6.2 mmol/l (considered a high level).

c.) Suppose doctors decide to test the woman’s cholesterol level again and average the two values. Find the probability that this woman’s mean cholesterol level for the two tests is above 6.2 mmol/l.

d.) Suppose doctors being very conservative decide to test the woman’s cholesterol level a third time and average the three values. Find the probability that this woman’s mean cholesterol level for the three tests is above 6.2 mmol/l.

e.) If the sample mean cholesterol level for this woman after three tests is above 6.2 mmol/l, what could you conclude?

 

6.5.8

A dishwasher has a mean life of 12 years with an estimated standard deviation of 1.25 years (“Appliance life expectancy,” 2013). The life of a dishwasher is normally distributed. Suppose you are a manufacturer and you take a sample of 10 dishwashers that you made.

a.) State the random variable.

b.) Find the mean of the sample mean.

c.) Find the standard deviation of the sample mean.

d.) What is the shape of the sampling distribution of the sample mean? Why?

e.) Find the probability that the sample mean of the dishwashers is less than 6 years.

f.) If you found the sample mean life of the 10 dishwashers to be less than 6 years, would you think that you have a problem with the manufacturing process? Why or why not?

 

 



Graduate Statistics Homework Help

Complete the following exercises located at the end of each chapter and put them into a Word document to be submitted as directed by the instructor.

Show all relevant work; use the equation editor in Microsoft Word when necessary.

1. Chapter 16, numbers 16.9, 16.10, 16.12 and 16.14

2. Chapter 17, numbers 17.6, 17.7, and 17.8

3. Chapter 18, numbers 18.8, 18.11, and 18.12

 

 


Interpreting Statistics Worksheet

PSY 335 Interpreting Statistics Worksheet

 

Testing of 20 randomly selected subjects yielded two sets of scores: Verbal IQ and Reading Comprehension. The table below displays the results and descriptive statistics, as well as frequency distribution charts for each variable and a scatterplot of the correlations between the two variables:

 

Subject              Verbal IQ    Rdg. Comp.
1                    90           88
2                    99           97
3                    101          110
4                    115          121
5                    97           98
6                    98           95
7                    92           89
8                    104          109
9                    103          100
10                   117          100
11                   100          96
12                   98           101
13                   96           105
14                   105          105
15                   78           80
16                   121          99
17                   82           75
18                   85           89
19                   81           78
20                   82           86
MEAN                 97           96
MEDIAN               98           97
MODE                 98           89
STD.DEV.             11.8         11.2
RANGE                43           46
CORREL VERB x RDG    0.79
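The correlation reported in the table can be reproduced from the 20 score pairs. The sketch below assumes the Pearson product-moment coefficient; the function name pearson_r is illustrative. The STD.DEV. row appears to match the population form of the standard deviation (statistics.pstdev), which is an inference from the numbers rather than something the worksheet states.

```python
import math
import statistics

# Scores copied from the table, subjects 1 to 20.
verbal_iq = [90, 99, 101, 115, 97, 98, 92, 104, 103, 117,
             100, 98, 96, 105, 78, 121, 82, 85, 81, 82]
reading = [88, 97, 110, 121, 98, 95, 89, 109, 100, 100,
           96, 101, 105, 105, 80, 99, 75, 89, 78, 86]


def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length lists."""
    mean_x = statistics.mean(x)
    mean_y = statistics.mean(y)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    syy = sum((yi - mean_y) ** 2 for yi in y)
    return sxy / math.sqrt(sxx * syy)


r = pearson_r(verbal_iq, reading)
print(round(r, 2))
print(round(statistics.pstdev(verbal_iq), 1), round(statistics.pstdev(reading), 1))
```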

 


 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

Please answer the following questions within the appropriate column in the table below:

I. Using the provided data and graphs, describe the frequency distribution for the IQ test:

a) What is a typical score for this sample?

b) How variable are the scores?

c) How are the scores distributed?

 

 
II. Using the provided data and graphs, describe the frequency distribution of the reading test scores:

a) What is a typical score for this sample?

b) How variable are the scores?

c) How are the scores distributed?

 

 
III. Consider the correlation data given the provided data and graph:

a) How are IQ and reading achievement related?

 

 
IV. Evaluate the data from a psychological testing perspective:

a) Are these samples good representations of the general population? How do you know?

b) What could you do to make them a more representative sample?

c) How would you interpret the correlation results?

d) What are some ways this knowledge of their relationship could be used?

 

 

 

 

Reading Comprehension Frequency Distribution

STD SCORE    70    80    90    100    110    120    More
Frequency    0     3     4     7      6      1      0

Verbal IQ Frequency Distribution

STD SCORE    70    80    90    100    110    120    More
Frequency    0     1     4     7      4      2      1

[Figure: Scatterplot of Reading Scores versus Verbal IQ Score]

 


Statistics

Class Name : MAT2058 Statistics (10wk) – 20200627 – MAT2058 Statistics VB05

Instructor Name : Bibi

Student Name : _____________________ Instructor Note :

Question 1 of 13

There is some evidence that, in the years 1981–85, a simple name change resulted in a short-term increase in the price of certain business firms’ stocks (relative to the prices of similar stocks). (See D. Horsky and P. Swyngedouw, “Does it pay to change your company’s name? A stock market perspective,” Marketing Science v. 6, pp. 320–35, 1987.)

Suppose that, to test the profitability of name changes in the more recent market (the past five years), we analyze the stock prices of a large sample of corporations shortly after they changed names, and we find that the mean relative increase in stock price was about 0.83%, with a standard deviation of 0.17%. Suppose that this mean and standard deviation apply to the population of all companies that changed names during the past five years. Complete the following statements about the distribution of relative increases in stock price for all companies that changed names during the past five years.

(a) According to Chebyshev’s theorem, at least 84% of the relative increases in stock price lie between _____% and _____%. (Round your answer to 2 decimal places.)

(b) According to Chebyshev’s theorem, at least _____% of the relative increases in stock price lie between 0.49% and 1.17%. a. 56% b. 75% c. 84% d. 89%

(c) Suppose that the distribution is bell-shaped. According to the empirical rule, approximately _____% of the relative increases in stock price lie between 0.49% and 1.17%. a. 68% b. 75% c. 95% d. 99.7%

(d) Suppose that the distribution is bell-shaped. According to the empirical rule, approximately 68% of the relative increases in stock price lie between _____% and _____%.

Question 2 of 13

A nationwide test taken by high school sophomores and juniors has three sections, each scored on a scale of 20 to 80. In a recent year, the national mean score for the writing section was 48.2, with a standard deviation of 10.9. Based on this information, complete the following statements about the distribution of the scores on the writing section for the recent year.

(a) According to Chebyshev’s theorem, at least _____% of the scores lie between 26.4 and 70.0. a. 56% b. 75% c. 84% d. 89%

(b) According to Chebyshev’s theorem, at least _____% of the scores lie between 31.85 and 64.55. a. 56% b. 75% c. 84% d. 89%

(c) Suppose that the distribution is bell-shaped. According to the empirical rule, approximately _____% of the scores lie between 26.4 and 70.0. a. 68% b. 75% c. 95% d. 99.7%

(d) Suppose that the distribution is bell-shaped. According to the empirical rule, approximately 99.7% of the scores lie between _____ and _____.

© 2020 McGraw-Hill Education. All Rights Reserved. Homework 2 #3

Question 3 of 13

Loretta, who turns eighty this year, has just learned about blood pressure problems in the elderly and is interested in how her blood pressure compares to those of her peers. Specifically, she is interested in her systolic blood pressure, which can be problematic among the elderly. She has uncovered an article in a scientific journal that reports that the mean systolic blood pressure measurement for women over seventy-five is 134.9 mmHg, with a standard deviation of 5.9 mmHg.

Assume that the article reported correct information. Complete the following statements about the distribution of systolic blood pressure measurements for women over seventy-five.

(a) According to Chebyshev’s theorem, at least _____% of the measurements lie between 123.1 mmHg and 146.7 mmHg. a. 56% b. 75% c. 84% d. 89%

(b) According to Chebyshev’s theorem, at least 8/9 (about 89%) of the measurements lie between _____ mmHg and _____ mmHg. (Round your answer to 1 decimal place.)

(c) Suppose that the distribution is bell-shaped. According to the empirical rule, approximately 68% of the measurements lie between _____ mmHg and _____ mmHg.

(d) Suppose that the distribution is bell-shaped. According to the empirical rule, approximately _____% of the measurements lie between 123.1 mmHg and 146.7 mmHg. a. 68% b. 75% c. 95% d. 99.7%

Question 4 of 13

A real estate company is interested in the ages of home buyers. They examined the ages of thousands of home buyers and found that the mean age was 45 years old, with a standard deviation of 9 years. Suppose that these measures are valid for the population of all home buyers. Complete the following statements about the distribution of all ages of home buyers.

(a) According to Chebyshev’s theorem, at least 84% of the home buyers’ ages lie between _____ years and _____ years. (Round your answer to the nearest integer.)

(b) According to Chebyshev’s theorem, at least _____% of the home buyers’ ages lie between 27 years and 63 years. a. 56% b. 75% c. 84% d. 89%

(c) Suppose that the distribution is bell-shaped. According to the empirical rule, approximately _____% of the home buyers’ ages lie between 27 years and 63 years. a. 68% b. 75% c. 95% d. 99.7%

(d) Suppose that the distribution is bell-shaped. According to the empirical rule, approximately 68% of the home buyers’ ages lie between _____ years and _____ years.
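The Chebyshev and empirical-rule questions all follow one pattern: express the interval as the mean plus or minus k standard deviations, then apply the relevant rule. The sketch below is an illustration with hypothetical function names, taking the home-buyer mean of 45 years and standard deviation of 9 years as the example values.

```python
import math


def chebyshev_lower_bound(k):
    """Minimum proportion of any distribution within k standard deviations (k > 1)."""
    if k <= 1:
        raise ValueError("Chebyshev's theorem is informative only for k > 1")
    return 1 - 1 / k ** 2


def chebyshev_k(proportion):
    """Number of standard deviations needed to guarantee at least `proportion`."""
    return 1 / math.sqrt(1 - proportion)


# Empirical rule for bell-shaped distributions: approximate proportion within k SDs.
EMPIRICAL_RULE = {1: 0.68, 2: 0.95, 3: 0.997}

mean, sd = 45, 9  # example values: home-buyer ages in years

# Interval 27 to 63 years: how many standard deviations from the mean?
k = (63 - mean) / sd
print(k, chebyshev_lower_bound(k), EMPIRICAL_RULE[round(k)])

# Interval guaranteed to hold at least 84% of ages under Chebyshev's theorem.
k84 = chebyshev_k(0.84)
print(mean - k84 * sd, mean + k84 * sd)
```

Chebyshev's bound holds for any distribution, which is why it is weaker than the empirical rule for the same interval.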

Question 5 of 13

Suppose that the genders of the three children of a certain family are soon to be revealed. Outcomes are thus triples of “girls” (g) and “boys” (b), which we write gbg, bbb, etc. For each outcome, let R be the random variable counting the number of boys in each outcome. For example, if the outcome is bgb, then R(bgb) = 2. Suppose that the random variable X is defined in terms of R as follows: X = R − 4. The values of X are thus:

Outcome       bgg    ggb    bbb    bbg    bgb    gbb    gbg    ggg
Value of X    −3     −3     −1     −2     −2     −2     −3     −4

Calculate the probability distribution function of X, i.e. the function pX(x). First, fill in the first row with the values of X. Then fill in the appropriate probabilities in the second row.

Value x of X
pX(x)

Question 6 of 13

Fill in the values in the table below to give a legitimate probability distribution for the discrete random variable X, whose possible values are −4, 3, 4, 5, and 6.

Value x of X    P(X = x)
−4              0.29
3               0.11
4
5               0.21
6

Question 7 of 13

The ages (in years) of the 6 employees at a particular computer store are the following.

32, 32, 24, 45, 35, 42

Assuming that these ages constitute an entire population, find the standard deviation of the population. Round your answer to two decimal places.

(If necessary, consult a list of formulas.)

Question 8 of 13

Find the standard deviation of this sample of numbers. Round your answer to two decimal places.

69, 51, 56, 70, 62, 40

(If necessary, consult a list of formulas.)

Question 9 of 13

Many tests that purport to measure intelligence are timed. The decision of how much time to allow for such a test often is made after examination of samples of times for completion of practice versions of the test. Below is a histogram summarizing such times for completion for one type of test.

Based on this histogram, estimate the standard deviation of the sample of times.

Carry your intermediate computations to at least four decimal places, and round your answer to at least one decimal place.

(If necessary, consult a list of formulas.)

Question 10 of 13

A major cab company in Chicago has computed its mean fare from O’Hare Airport to the Drake Hotel to be $29.63, with a standard deviation of $3.91. Based on this information, complete the following statements about the distribution of the company’s fares from O’Hare Airport to the Drake Hotel.

(a) According to Chebyshev’s theorem, at least _____% of the fares lie between 21.81 dollars and 37.45 dollars. a. 56% b. 75% c. 84% d. 89%

(b) According to Chebyshev’s theorem, at least _____% of the fares lie between 23.765 dollars and 35.495 dollars. a. 56% b. 75% c. 84% d. 89%

(c) Suppose that the distribution is bell-shaped. According to the empirical rule, approximately _____% of the fares lie between 21.81 dollars and 37.45 dollars. a. 68% b. 75% c. 95% d. 99.7%

(d) Suppose that the distribution is bell-shaped. According to the empirical rule, approximately 68% of the fares lie between _____ dollars and _____ dollars.
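Both rules in this question start from the same step: express the interval as the mean plus or minus k standard deviations. Chebyshev's theorem then guarantees at least 1 - 1/k^2 of the values for any distribution (k > 1), while the empirical rule gives approximate percentages for bell-shaped distributions at k = 1, 2, 3. The sketch below assumes a mean fare of $29.63 and a standard deviation of $3.91, which is consistent with the intervals quoted in parts (a) and (b); the function names are illustrative.

```python
def chebyshev_lower_bound(k):
    """Minimum fraction of values within k standard deviations of the mean."""
    if k <= 1:
        raise ValueError("Chebyshev's theorem is informative only for k > 1")
    return 1 - 1 / k ** 2

# Empirical rule for bell-shaped distributions (approximate fractions).
EMPIRICAL_RULE = {1: 0.68, 2: 0.95, 3: 0.997}

def half_width_in_sd(lower, upper, sd):
    """k such that an interval centered on the mean equals mean +/- k * sd."""
    return (upper - lower) / (2 * sd)

mean, sd = 29.63, 3.91   # assumed values, consistent with the quoted intervals

k_a = half_width_in_sd(21.81, 37.45, sd)      # interval in parts (a) and (c)
k_b = half_width_in_sd(23.765, 35.495, sd)    # interval in part (b)

print(round(k_a, 2), round(100 * chebyshev_lower_bound(k_a)))
print(round(k_b, 2), round(100 * chebyshev_lower_bound(k_b)))
print(EMPIRICAL_RULE[round(k_a)])
print(round(mean - sd, 2), round(mean + sd, 2))   # one-sd interval, part (d)
```

Note how much weaker the Chebyshev guarantee is than the empirical rule for the same interval: it makes no assumption about the shape of the distribution.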

Question 11 of 13

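[Histogram for Question 9: frequency versus time for completion (in minutes). Class boundaries 8, 12, 16, 20, 24; frequencies 2, 10, 14, 6 (32 times in total).]

Given values for Question 10: mean fare $29.63; standard deviation $3.91.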

 

 


Archie is fed up with waiting in line at his local post office and decides to take action. Over the course of the next few months, he records the waiting times for each of a random selection of post office visits made by him and other customers. These waiting times (in minutes) are as follows:

Construct a box-and-whisker plot for the data.
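A box-and-whisker plot is drawn from the five-number summary: minimum, first quartile, median, third quartile, and maximum. The sketch below computes that summary using the median-of-halves rule for the quartiles, which is one common textbook convention (other conventions can give slightly different quartiles). It assumes the 18 values printed on this page are the waiting times for this question; `five_number_summary` is an illustrative name.

```python
import statistics

# Assumed to be the 18 waiting times (in minutes) for this question.
waiting_times = [27, 19, 24, 23, 19, 26, 25, 10, 22,
                 5, 14, 22, 11, 13, 6, 30, 15, 12]

def five_number_summary(values):
    """Minimum, Q1, median, Q3, maximum using the median-of-halves rule.

    Assumes at least two values. For an odd count, the middle value is
    left out of both halves.
    """
    ordered = sorted(values)
    half = len(ordered) // 2
    lower = ordered[:half]       # values below the median position
    upper = ordered[-half:]      # values above the median position
    return (ordered[0], statistics.median(lower), statistics.median(ordered),
            statistics.median(upper), ordered[-1])

summary = five_number_summary(waiting_times)
print(summary)
```

The box spans the first to the third quartile with a line at the median, and the whiskers extend to the minimum and maximum.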

Question 12 of 13

The following are the annual salaries of chief executive officers of major companies. (The salaries are written in thousands of dollars.)

Find the 25th and 70th percentiles for these salaries.

(If necessary, consult a list of formulas.)
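Percentile conventions vary between textbooks and software, so the formula list for the course takes precedence. The sketch below uses one common locator rule: compute L = (p/100) * n; if L is a whole number, average the L-th and (L+1)-th ordered values, otherwise round L up and take that ordered value. It assumes the 17 values printed on this page are the salaries for this question and that the requested percentiles are the 25th and 70th; `percentile` is an illustrative helper, not a library function.

```python
import math

# Assumed to be the 17 salaries (in thousands of dollars) for this question.
salaries = [723, 472, 315, 790, 405, 633, 676, 609, 362,
            495, 338, 743, 519, 75, 428, 542, 224]

def percentile(values, p):
    """p-th percentile (integer p, 0 < p < 100) by the locator rule L = (p/100) * n."""
    ordered = sorted(values)
    n = len(ordered)
    if (p * n) % 100 == 0:
        # L is a whole number: average the L-th and (L+1)-th ordered values.
        position = p * n // 100
        return (ordered[position - 1] + ordered[position]) / 2
    # Otherwise round L up and take the value at that 1-based position.
    position = math.ceil(p * n / 100)
    return ordered[position - 1]

p25 = percentile(salaries, 25)
p70 = percentile(salaries, 70)
print(p25, p70)
```

Interpolating methods, such as the default in many numerical libraries, can return values that fall between two data points, so results may differ from this rule.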

Question 13 of 13

BIG Corporation produces just about everything but is currently interested in the lifetimes of its batteries, hoping to obtain its share of a market boosted by the popularity of portable CD and MP3 players. To investigate its new line of Ultra batteries, BIG randomly selects 1000 Ultra batteries and finds that they have a mean lifetime of 815 hours, with a standard deviation of 82 hours. Suppose that this mean and standard deviation apply to the population of all Ultra batteries. Complete the following statements about the distribution of lifetimes of all Ultra batteries.

(a) According to Chebyshev’s theorem, at least 36% of the lifetimes lie between _____ hours and _____ hours. (Round your answer to the nearest integer.)

(b) According to Chebyshev’s theorem, at least _____% of the lifetimes lie between 651 hours and 979 hours. a. 56% b. 75% c. 84% d. 89%

(c) Suppose that the distribution is bell-shaped. According to the empirical rule, approximately _____% of the lifetimes lie between 651 hours and 979 hours. a. 68% b. 75% c. 95% d. 99.7%

(d) Suppose that the distribution is bell-shaped. According to the empirical rule, approximately 68% of the lifetimes lie between _____ hours and _____ hours.
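Part (a) runs Chebyshev's theorem in reverse: given a guaranteed fraction, solve 1 - 1/k^2 = fraction for k, then form the interval mean +/- k * sd. The sketch below shows that inversion. It assumes a mean of 815 hours and a standard deviation of 82 hours, which is consistent with 651 and 979 hours lying two standard deviations from the mean; the function names are illustrative, and rounding to the nearest integer is left to the reader.

```python
import math

def chebyshev_k(fraction):
    """k for which Chebyshev's bound 1 - 1/k^2 equals the given fraction."""
    if not 0 < fraction < 1:
        raise ValueError("fraction must lie strictly between 0 and 1")
    return 1 / math.sqrt(1 - fraction)

def chebyshev_interval(mean, sd, fraction):
    """Interval mean +/- k * sd that is guaranteed to hold at least `fraction`."""
    k = chebyshev_k(fraction)
    return mean - k * sd, mean + k * sd

mean, sd = 815, 82   # assumed values, consistent with the 651-979 interval

k = chebyshev_k(0.36)
low, high = chebyshev_interval(mean, sd, 0.36)
print(round(k, 4), round(low, 1), round(high, 1))
```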

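Data for Question 11 (18 waiting times, in minutes): 27, 19, 24, 23, 19, 26, 25, 10, 22, 5, 14, 22, 11, 13, 6, 30, 15, 12

Data for Question 12 (17 salaries, in thousands of dollars): 723, 472, 315, 790, 405, 633, 676, 609, 362, 495, 338, 743, 519, 75, 428, 542, 224

Given values for Question 12: 25th and 70th percentiles.

Given values for Question 13: 1000 batteries; mean lifetime 815 hours; standard deviation 82 hours.

[Axis for the Question 11 box-and-whisker plot: waiting time (in minutes), with ticks at 5, 10, 15, 20, 25.]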
