MATH 331 – Probability and Statistics

September 12, 2014 Exam 1 Name ____________________

1. Consider the sample data of ratings (on a scale from 0.0 to 10.0) for 24 participants on the America’s Got No Talent variety show:

5.9 3.3 1.7 4.1 0.1 5.3

4.1 4.4 6.9 5.0 1.4 3.1

1.3 2.0 3.1 1.3 9.2 5.8

3.0 2.4 4.1 2.6 2.5 6.5

a. Construct an ordered stem-and-leaf plot. Be sure to include a legend and a title.

Ordered stem-and-leaf plot for the ratings of participants on America’s Got No Talent

Stem Leaf

0 1

1 3 3 4 7

2 0 4 5 6

3 0 1 1 3

4 1 1 1 4

5 0 3 8 9

6 5 9

7

8

9 2

Key: 1|3 = 1.3

b. Identify the modal stem.

Stems 1, 2, 3, 4, and 5 are all modal stems: each has the highest number of leaves (4).
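The ordered stem-and-leaf plot and the modal stems can be reproduced with a short Python sketch. It assumes every rating has exactly one decimal place, so the units digit serves as the stem and the tenths digit as the leaf; the function name `stem_and_leaf` is illustrative only.

```python
from collections import defaultdict

# Ratings from Problem 1; assumes each value has exactly one decimal place
ratings = [5.9, 3.3, 1.7, 4.1, 0.1, 5.3,
           4.1, 4.4, 6.9, 5.0, 1.4, 3.1,
           1.3, 2.0, 3.1, 1.3, 9.2, 5.8,
           3.0, 2.4, 4.1, 2.6, 2.5, 6.5]


def stem_and_leaf(values):
    """Map each stem (units digit) to its sorted leaves (tenths digit)."""
    leaves = defaultdict(list)
    for v in values:
        tenths = round(v * 10)          # integer tenths avoid floating-point digit errors
        stem, leaf = divmod(tenths, 10)
        leaves[stem].append(leaf)
    # Keep empty stems (here 7 and 8) so gaps in the data remain visible
    return {s: sorted(leaves.get(s, [])) for s in range(min(leaves), max(leaves) + 1)}


plot = stem_and_leaf(ratings)
for stem, leaf_list in plot.items():
    print(stem, "|", " ".join(str(leaf) for leaf in leaf_list))
print("Key: 1|3 = 1.3")

# Modal stems: every stem whose leaf count equals the maximum count
most = max(len(leaf_list) for leaf_list in plot.values())
modal_stems = [s for s, leaf_list in plot.items() if len(leaf_list) == most]
print("Modal stems:", modal_stems)
```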

c. Determine the range of the data. R = H – L

R = 9.2 – 0.1 = 9.1

d. Calculate x̄ and s to three decimal places. (Points: 16)

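Σx = 89.1

Σx² = 435.95

Sample mean: x̄ = Σx/n = 89.1/24 = 3.7125 ≈ 3.713

Sample standard deviation: s = √[(Σx² − (Σx)²/n)/(n − 1)] = √[(435.95 − 89.1²/24)/23] = √(105.16625/23) ≈ √4.5724 ≈ 2.138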
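A minimal Python sketch of the computational formulas used above, assuming the 24 ratings are entered as listed; `statistics.stdev` (which also divides by n − 1) serves as a cross-check on s.

```python
import math
import statistics

# Ratings from Problem 1
ratings = [5.9, 3.3, 1.7, 4.1, 0.1, 5.3,
           4.1, 4.4, 6.9, 5.0, 1.4, 3.1,
           1.3, 2.0, 3.1, 1.3, 9.2, 5.8,
           3.0, 2.4, 4.1, 2.6, 2.5, 6.5]

n = len(ratings)
sum_x = sum(ratings)
sum_x2 = sum(x * x for x in ratings)

mean = sum_x / n
# Computational formula for the sample variance: (Σx² − (Σx)²/n) / (n − 1)
s = math.sqrt((sum_x2 - sum_x ** 2 / n) / (n - 1))

print(f"Σx = {sum_x:.2f}, Σx² = {sum_x2:.2f}")
print(f"x̄ = {mean:.4f}, s = {s:.3f}")
# Cross-check: statistics.stdev uses the same n − 1 denominator
print(f"statistics.stdev = {statistics.stdev(ratings):.3f}")
```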

2. Consider the sample data:

5.9 3.3 1.7 4.1 0.1 5.3

4.1 4.4 6.9 5.0 1.4 3.1

1.3 2.0 3.1 1.3 9.2 5.8

3.0 2.4 4.1 2.6 2.5 6.5

a. Find the five-number summary. Round to three decimal places.

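Quartiles use linear interpolation at position 1 + (n − 1)p in the ordered data (n = 24).

Minimum = 0.1

Q1 = 2.300 (position 6.75: 2.0 + 0.75(2.4 − 2.0))

Q2 = 3.200 (position 12.5: (3.1 + 3.3)/2)

Q3 = 5.075 (position 18.25: 5.0 + 0.25(5.3 − 5.0))

Maximum = 9.2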

b. Find the outlier boundaries using the IQR. If there are any, state which are outliers.

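IQR = Q3 − Q1; lower boundary = Q1 − 1.5 × IQR

Upper boundary = Q3 + 1.5 × IQR

IQR = 5.075 − 2.3 = 2.775

Lower boundary = 2.3 − 1.5 × 2.775 = 2.3 − 4.1625 = −1.8625

Upper boundary = 5.075 + 1.5 × 2.775 = 5.075 + 4.1625 = 9.2375

Every data value lies between −1.8625 and 9.2375 (the maximum, 9.2, falls just inside the upper boundary), so there are no outliers.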
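The five-number summary and the 1.5 × IQR fences can be checked with Python's standard library. This sketch assumes `statistics.quantiles(..., method="inclusive")`, which interpolates at position 1 + (n − 1)p and therefore matches the quartiles above; other quartile conventions can give slightly different Q1 and Q3.

```python
import statistics

# Ratings from Problem 2 (same data as Problem 1)
ratings = [5.9, 3.3, 1.7, 4.1, 0.1, 5.3,
           4.1, 4.4, 6.9, 5.0, 1.4, 3.1,
           1.3, 2.0, 3.1, 1.3, 9.2, 5.8,
           3.0, 2.4, 4.1, 2.6, 2.5, 6.5]

data = sorted(ratings)
# method="inclusive" treats the min and max as the 0th and 100th percentiles
q1, q2, q3 = statistics.quantiles(data, n=4, method="inclusive")
five_number = (data[0], q1, q2, q3, data[-1])

iqr = q3 - q1
lower_fence = q1 - 1.5 * iqr
upper_fence = q3 + 1.5 * iqr
outliers = [x for x in data if x < lower_fence or x > upper_fence]

print("Five-number summary:", [round(v, 3) for v in five_number])
print(f"IQR = {iqr:.3f}, fences = ({lower_fence:.4f}, {upper_fence:.4f})")
print("Outliers:", outliers)
```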

c. Determine the skewness of the data.

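Using the adjusted sample skewness coefficient G1 = [n/((n − 1)(n − 2))] Σ((x − x̄)/s)³, the skewness is 0.6714. The positive value means the data are skewed to the right (a longer right tail, driven largely by 9.2); consistent with this, the mean (3.713) is greater than the median (3.2).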
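The skewness value can be reproduced with a short sketch of the adjusted Fisher-Pearson coefficient G1 (the formula behind Excel's `SKEW`). Other skewness measures, such as Pearson's 3(x̄ − median)/s, give a different number for the same data, so the formula should match the one used in the course.

```python
import statistics

# Ratings from Problem 2
ratings = [5.9, 3.3, 1.7, 4.1, 0.1, 5.3,
           4.1, 4.4, 6.9, 5.0, 1.4, 3.1,
           1.3, 2.0, 3.1, 1.3, 9.2, 5.8,
           3.0, 2.4, 4.1, 2.6, 2.5, 6.5]

n = len(ratings)
mean = statistics.mean(ratings)
s = statistics.stdev(ratings)   # sample standard deviation (n − 1)

# Adjusted Fisher-Pearson sample skewness:
# G1 = n / ((n − 1)(n − 2)) · Σ((x − x̄)/s)³
g1 = n / ((n - 1) * (n - 2)) * sum(((x - mean) / s) ** 3 for x in ratings)

print(f"G1 = {g1:.4f}")
```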

d. Sketch the boxplot. Label all parts, including the outliers (if there are any), whiskers, and the box. (Points: 20)

3. Consider the sample data:

5.9 3.3 1.7 4.1 0.1 5.3

4.1 4.4 6.9 5.0 1.4 3.1

1.3 2.0 3.1 1.3 9.2 5.8

3.0 2.4 4.1 2.6 2.5 6.5

Assume nothing is known about the shape of the distribution. Using Tchebysheff’s theorem, at least 75% of the data will lie between what two data values? Show work.

(Points: 6)

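According to Tchebysheff’s theorem, at least 1 − 1/k² of any data set lies within k standard deviations of the mean. Setting 1 − 1/k² = 0.75 gives k² = 4, so k = 2.

With x̄ = 3.713 and s = 2.138:

x̄ − 2s = 3.713 − 2(2.138) = 3.713 − 4.276 = −0.563

x̄ + 2s = 3.713 + 2(2.138) = 3.713 + 4.276 = 7.989

Therefore at least 75% of the data lie between −0.563 and 7.989.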
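The same Tchebysheff calculation as a Python sketch: solve 1 − 1/k² = 0.75 for k, then form x̄ ± ks. Because it uses the unrounded x̄ = 3.7125 and s ≈ 2.1383, the lower bound comes out as −0.564 rather than −0.563; the difference is only the rounding of x̄ and s before multiplying.

```python
import math
import statistics

# Ratings from Problem 3
ratings = [5.9, 3.3, 1.7, 4.1, 0.1, 5.3,
           4.1, 4.4, 6.9, 5.0, 1.4, 3.1,
           1.3, 2.0, 3.1, 1.3, 9.2, 5.8,
           3.0, 2.4, 4.1, 2.6, 2.5, 6.5]

proportion = 0.75
# Tchebysheff: at least 1 − 1/k² of the data lie within k standard deviations,
# so 1 − 1/k² = proportion gives k = sqrt(1 / (1 − proportion))
k = math.sqrt(1 / (1 - proportion))

mean = statistics.mean(ratings)
s = statistics.stdev(ratings)
low, high = mean - k * s, mean + k * s
print(f"k = {k:g}; at least {proportion:.0%} of the data lie in ({low:.3f}, {high:.3f})")
```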

4. An almost bell-shaped (symmetric) population of measurements has mean µ = 75 and standard deviation σ = 8. Between what two values will approximately 68% of the measurements be found (empirical rule)? Show work. (Points: 6)

If the population is approximately bell-shaped, about 68% of the measurements lie within one standard deviation of the mean, that is, in µ ± σ.

Lower limit = µ − σ = 75 − 8 = 67

Upper limit = µ + σ = 75 + 8 = 83

Therefore approximately 68% of the measurements will be found between 67 and 83.
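A tiny sketch of the empirical rule for µ = 75 and σ = 8, extended to the usual 95% (two σ) and 99.7% (three σ) bands for comparison; only the one-σ interval is asked for above.

```python
mu, sigma = 75, 8   # population mean and standard deviation from Problem 4

# Empirical rule for a roughly bell-shaped population:
# about 68% within 1σ, 95% within 2σ, and 99.7% within 3σ of the mean
coverage = {1: "68%", 2: "95%", 3: "99.7%"}
intervals = {k: (mu - k * sigma, mu + k * sigma) for k in coverage}

for k, (low, high) in intervals.items():
    print(f"about {coverage[k]} of measurements between {low} and {high}")
```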

5. Use the table below. Round all answers to four decimal places. (15 points)

education x (in years) 12 17 9 14 15 7 10

income y (in thousands) 36 88 13 32 48 16 32

a. Find the linear regression equation y = ax + b.

The regression equation is ŷ = 6.1184x − 35.5639, so a = 6.1184 and b = −35.5639.

b. Find the coefficient of determination r² and interpret.

The coefficient of determination is r² = 0.7517. About 75.17% of the variation in income (y) is explained by the linear regression on years of education (x).
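The least-squares coefficients and r² can be computed from the sums of squares SS(x), SS(y), and SS(xy). This Python sketch assumes the seven (x, y) pairs from the table; `a` is the slope and `b` the intercept, matching the form y = ax + b.

```python
x = [12, 17, 9, 14, 15, 7, 10]    # education (years)
y = [36, 88, 13, 32, 48, 16, 32]  # income (thousands of dollars)
n = len(x)

# Sums of squares via the computational formulas
ss_x = sum(xi * xi for xi in x) - sum(x) ** 2 / n
ss_y = sum(yi * yi for yi in y) - sum(y) ** 2 / n
ss_xy = sum(xi * yi for xi, yi in zip(x, y)) - sum(x) * sum(y) / n

a = ss_xy / ss_x                  # slope
b = sum(y) / n - a * sum(x) / n   # intercept: ȳ − a·x̄
r_squared = ss_xy ** 2 / (ss_x * ss_y)

print(f"ŷ = {a:.4f}x + ({b:.4f}),  r² = {r_squared:.4f}")
```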

c. Use the regression equation to predict y when x = 11 and x = 18.

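The predicted values below are computed with the unrounded coefficients (a = 465/76 and b = ȳ − a·x̄ = 265/7 − 12a) and rounded at the end; substituting the rounded coefficients can change the last decimal place.

When x = 11, ŷ = −35.5639 + 6.1184(11) ≈ 31.7387

When x = 18, ŷ = −35.5639 + 6.1184(18) ≈ 74.5677

d. Find residuals for x = 14 and x = 10.

When x = 14, ŷ = −35.5639 + 6.1184(14) ≈ 50.0940

Residual = y − ŷ = 32 − 50.0940 = −18.0940

When x = 10, ŷ = −35.5639 + 6.1184(10) ≈ 25.6203

Residual = y − ŷ = 32 − 25.6203 = 6.3797

e. Manually find the residual for x = 17. Round to four decimal places.

When x = 17, ŷ = −35.5639 + 6.1184(17) ≈ 68.4492

Residual = y − ŷ = 88 − 68.4492 = 19.5508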
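A sketch of the prediction and residual steps, assuming the fitted line is recomputed with unrounded coefficients (the helper `predict` is a hypothetical name for illustration). Residuals are observed minus predicted, y − ŷ.

```python
x = [12, 17, 9, 14, 15, 7, 10]    # education (years)
y = [36, 88, 13, 32, 48, 16, 32]  # income (thousands of dollars)
n = len(x)

# Least-squares coefficients, kept unrounded
x_bar, y_bar = sum(x) / n, sum(y) / n
ss_x = sum((xi - x_bar) ** 2 for xi in x)
ss_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
a = ss_xy / ss_x
b = y_bar - a * x_bar


def predict(x_value):
    """Predicted income ŷ = a·x + b (hypothetical helper)."""
    return a * x_value + b


for x_value in (11, 18):
    print(f"x = {x_value}: ŷ = {predict(x_value):.4f}")

# Residual = observed y − predicted ŷ at the observed x values 14, 10, and 17
observed = dict(zip(x, y))   # the x values are distinct, so this lookup is safe
residuals = {xv: observed[xv] - predict(xv) for xv in (14, 10, 17)}
for xv, res in residuals.items():
    print(f"x = {xv}: residual = {res:.4f}")
```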

6. Following is a sample of bivariate data listing years of formal education, x, and annual income in thousands of dollars, y, for seven persons:

education x (in years) 12 17 9 14 15 7 10

income y (in thousands) 36 88 13 32 48 16 32

a. Find Sx, Sy, and Sxy. Round to two decimal places.

Find Sxy using both formulas.
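A minimal sketch of part (a) and of r, assuming the sample (n − 1) versions of Sx, Sy, and Sxy. It evaluates Sxy with both the definition formula and the computational shortcut; both give 77.50, with Sx ≈ 3.56 and Sy ≈ 25.12.

```python
import math

x = [12, 17, 9, 14, 15, 7, 10]    # education (years)
y = [36, 88, 13, 32, 48, 16, 32]  # income (thousands of dollars)
n = len(x)
x_bar, y_bar = sum(x) / n, sum(y) / n

# Sample standard deviations (n − 1 denominator)
s_x = math.sqrt(sum((xi - x_bar) ** 2 for xi in x) / (n - 1))
s_y = math.sqrt(sum((yi - y_bar) ** 2 for yi in y) / (n - 1))

# Definition formula: Σ(x − x̄)(y − ȳ) / (n − 1)
s_xy_def = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / (n - 1)
# Computational formula: (Σxy − ΣxΣy/n) / (n − 1)
s_xy_comp = (sum(xi * yi for xi, yi in zip(x, y)) - sum(x) * sum(y) / n) / (n - 1)

# Sample correlation coefficient
r = s_xy_def / (s_x * s_y)

print(f"Sx = {s_x:.2f}, Sy = {s_y:.2f}")
print(f"Sxy = {s_xy_def:.2f} (definition), {s_xy_comp:.2f} (computational)")
print(f"r = {r:.2f}")
```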

b. Calculate to two decimal places and interpret the sample correlation coefficient r. (12 pts.)

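r = Sxy/(Sx·Sy) = 77.50/(3.56 × 25.12) ≈ 0.87 (0.8670 to four decimal places). This indicates a strong positive linear correlation between education (x) and income (y): more years of education tend to go with higher income.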

7. Consider the sample data:

5.9 3.3 1.7 4.1 0.1 5.3

4.1 4.4 6.9 5.0 1.4 3.1

1.3 2.0 3.1 1.3 9.2 5.8

3.0 2.4 4.1 2.6 2.5 6.5

Using a class width of 2.0, and starting the first class with 0.0, create the frequency distribution. Show all classes and the frequency for the classes. Use left inclusion for the classes. Draw a histogram to represent this frequency distribution. Identify the modal class (16 pts.)

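Class | Frequency

0.0 ≤ x < 2.0 | 5

2.0 ≤ x < 4.0 | 8

4.0 ≤ x < 6.0 | 8

6.0 ≤ x < 8.0 | 2

8.0 ≤ x < 10.0 | 1

Total | 24

The modal classes are 2.0 ≤ x < 4.0 and 4.0 ≤ x < 6.0, each with frequency 8.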
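The frequency table, the modal classes, and a text stand-in for the histogram can be generated with the Python sketch below. It assumes left-inclusive classes [start + i·width, start + (i + 1)·width) with width 2.0 starting at 0.0; a row of asterisks replaces each histogram bar.

```python
# Ratings from Problem 7
ratings = [5.9, 3.3, 1.7, 4.1, 0.1, 5.3,
           4.1, 4.4, 6.9, 5.0, 1.4, 3.1,
           1.3, 2.0, 3.1, 1.3, 9.2, 5.8,
           3.0, 2.4, 4.1, 2.6, 2.5, 6.5]
width, start = 2.0, 0.0

# Left inclusion: value v goes in class i where start + i·width <= v < start + (i + 1)·width
counts = {}
for v in ratings:
    i = int((v - start) // width)
    counts[i] = counts.get(i, 0) + 1

# Build every class up to the last non-empty one, including empty classes
classes = []
for i in range(max(counts) + 1):
    low = start + i * width
    classes.append((low, low + width, counts.get(i, 0)))

# Text histogram: one asterisk per observation
for low, high, f in classes:
    print(f"{low:4.1f} <= x < {high:4.1f} | {'*' * f} ({f})")

top = max(f for _, _, f in classes)
modal = [(low, high) for low, high, f in classes if f == top]
print("Modal classes:", modal)
```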

8. Create a set of bivariate data that has 8 ordered pairs (points) which will have a correlation coefficient equal to -1. Explain why your example has all residuals equal to 0. (9 points)

Pairs x y

1 8 1

2 7 2

3 6 3

4 5 4

5 4 5

6 3 6

7 2 7

8 1 8

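Every pair satisfies y = 9 − x, so all eight points lie exactly on a line with slope −1. A correlation coefficient of −1 means the points fall perfectly on a line with negative slope, so the least-squares line is ŷ = 9 − x itself. Each predicted value ŷ therefore equals the observed y, and every residual y − ŷ is 0.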
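A quick numerical check of the example, as a Python sketch: it computes r, fits the least-squares line, and lists the residuals. Because every point lies on y = 9 − x, r is exactly −1 and every residual is 0.

```python
import math

x = [8, 7, 6, 5, 4, 3, 2, 1]
y = [1, 2, 3, 4, 5, 6, 7, 8]
n = len(x)

# Sums of squares via the computational formulas
ss_x = sum(v * v for v in x) - sum(x) ** 2 / n
ss_y = sum(v * v for v in y) - sum(y) ** 2 / n
ss_xy = sum(xi * yi for xi, yi in zip(x, y)) - sum(x) * sum(y) / n

r = ss_xy / math.sqrt(ss_x * ss_y)
slope = ss_xy / ss_x
intercept = sum(y) / n - slope * sum(x) / n

# Residual = observed y − predicted ŷ for each pair
residuals = [yi - (intercept + slope * xi) for xi, yi in zip(x, y)]

print(f"r = {r}, ŷ = {intercept} + ({slope})x")
print("Residuals:", residuals)
```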