STAT 250 Spring 2019 Data Analysis Assignment 1

Full assignment instructions, as well as an example, are attached as a Word file. Access to StatCrunch is required: https://www.statcrunch.com/5.0/group.php?groupid=7… I will provide the login info.

Your submitted document should include the following items. Points will be deducted if the following are not included.

1. Type your name and STAT 250 with your correct section number (e.g. STAT 250-xxx) right justified, and then Data Analysis Assignment #1 centered at the top of page 1 below your name. Then begin your document.

2. Number your pages across your entire solutions document.

3. Your document should include the ANSWERS ONLY, with each answer labeled by its corresponding number and subpart. Keep the answers in order. Do not include the questions in your submitted document.

4. Generate all requested graphs and tables using StatCrunch.

5. Upload your document onto Blackboard as a Word (docx) file or pdf file using the link provided by your instructor. You are responsible for uploading a readable file.

Elements of good technical writing:

Use complete and coherent sentences to answer the questions.

Graphs must be appropriately titled and should refer to the context of the question.

Graphical displays must include labels with units if appropriate for each axis.

Units should always be included when referring to numerical values.

When making a comparison you must use comparative language, such as “greater than”, “less than”, or “about the same as.”

Ensure that all graphs and tables appear on one page and are not split across two pages.

Type all mathematical calculations when directed to compute an answer ‘by-hand.’

Pictures of actual handwritten work are not accepted on this assignment.

When writing mathematical expressions into your document you may use either an equation editor or common shortcuts such as: √x can be written as sqrt(x), p̂ can be written as p-hat, x̄ can be written as x-bar.

Problem 1: 2018 Movies

MoviePass is a subscription service that allows users to see one movie per day at select theaters. To compete with MoviePass, AMC Theatres released its own movie subscription service, called A-List, which allows users to see up to three movies per week. Raw data were collected from one user who purchased an annual MoviePass subscription in January 2018 and a subscription to A-List in November 2018. The dataset found in our StatCrunch Group presents the 169 movies seen along with other variables describing each movie. The data set is called “2018 Movies.”

a) Use StatCrunch to create a one-way table for the variable “Genre” using both counts and percentages. Select Stat → Tables → Frequency and select both ‘Frequency’ and ‘Percent of total’ in the Statistic(s) box by holding down the Ctrl key (Command key on Macs) when making these selections. Copy your table into your document, and then manually round the values in the ‘Percent of total’ column of the copied table to two decimal places.
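For reference, the arithmetic behind a one-way table can be sketched in a few lines of Python. This is only an illustration of how each “Percent of total” value arises (count divided by the total, times 100). The function name `one_way_table` and the short genre list are hypothetical and are not taken from the “2018 Movies” data set. The table you submit must still come from StatCrunch.

```python
from collections import Counter


def one_way_table(values):
    """Return (category, count, percent of total) rows, sorted by category."""
    counts = Counter(values)
    total = sum(counts.values())
    # Percent of total = 100 * count / total, rounded to two decimal places.
    return [(cat, n, round(100 * n / total, 2)) for cat, n in sorted(counts.items())]


# Hypothetical genres for illustration only; NOT the "2018 Movies" data.
genres = ["Action", "Comedy", "Action", "Drama", "Comedy", "Action"]

for category, count, percent in one_way_table(genres):
    print(f"{category:<10}{count:>5}{percent:>8.2f}")
```

The percentages in such a table should add up to 100 apart from rounding error, which is a quick check on a table copied into your document.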

b) Interpret your findings from the table in part (a) by identifying the least and most popular genre by percent of total. Use complete sentences with context and include the genre and percentage in the sentences.

c) Use StatCrunch to generate a two-way table for the variables “Genre” and “Viewer Rating”. Go to Stat → Tables → Contingency → With Data (since you have the raw data in StatCrunch). Select “Genre” as your row variable and “Viewer Rating” as your column variable. In the display box, select only Percent of Total. Lastly, deselect “Chi-Square test for independence”, which is highlighted by default, by holding the Ctrl key and clicking on it. Copy your table into your document.

d) How many and what percentage of the 169 movies did the viewer dislike? Answer this question in a complete sentence.

e) What values are the same when looking at both your one-way table and your two-way table? Be specific if referencing rows or columns.

f) Now, create two more two-way tables keeping “Genre” as your row variable and “Viewer Rating” as your column variable. One table needs to include row percentages and the other needs to include column percentages. To do this, change what you select in the display box from percent of total (in part (c)) to row percent for the first table and column percent for the second table. Include both tables in your document.

g) Specifically interpret the meaning of the row percentage found in the “Children’s/Animated” and “Liked” cell. Note that there are 14 movies in that cell.

h) Now, specifically interpret the meaning of the column percentage found in the “Children’s/Animated” and “Liked” cell. Note that there are 14 movies in that cell.
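The difference between a row percentage, a column percentage, and a percent of total is only the choice of denominator. The Python sketch below shows this for a single cell. The function `cell_percentages` and the (genre, rating) pairs are hypothetical and chosen so the arithmetic is easy to follow. They are not the “2018 Movies” data, and the sketch assumes the requested row and column both occur in the data.

```python
from collections import Counter


def cell_percentages(pairs, row, col):
    """Return (row %, column %, % of total) for the cell (row, col).

    Assumes `row` and `col` each appear at least once in `pairs`.
    """
    cells = Counter(pairs)
    cell = cells[(row, col)]
    row_total = sum(n for (r, _), n in cells.items() if r == row)
    col_total = sum(n for (_, c), n in cells.items() if c == col)
    total = sum(cells.values())
    return (100 * cell / row_total, 100 * cell / col_total, 100 * cell / total)


# Hypothetical (genre, viewer rating) pairs for illustration only.
movies = (
    [("Comedy", "Liked")] * 3
    + [("Comedy", "Disliked")] * 1
    + [("Drama", "Liked")] * 2
    + [("Drama", "Disliked")] * 4
)

row_pct, col_pct, total_pct = cell_percentages(movies, "Comedy", "Liked")
print(f"Row %: {row_pct:.2f}  Column %: {col_pct:.2f}  Total %: {total_pct:.2f}")
```

In this made-up example the row percentage reads "of the comedies seen, what percent were liked", while the column percentage reads "of the liked movies, what percent were comedies". The same cell count gives different percentages because the denominators differ.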

Problem 2: 2018 Movies Revisited

Which genre is most popular among the 169 movies seen? Use the “2018 Movies” data set posted in our StatCrunch group to answer the following questions.

a) Using the variable named “Genre”, produce a relative frequency bar chart using Graph → Bar Plot → With Data. Please properly label axes and provide a meaningful title and copy it into your document.

b) Using the variable “Genre”, produce a relative frequency Pareto chart. Begin with your bar chart, and edit it by changing “Order by” to Count Descending. Properly title and label your graph and copy it into your document.

c) Using the variable “Genre”, produce a Pie Chart using Graph → Pie Chart → With Data. Add an appropriate title and copy this entire graph including the legend into your document.

d) Use the three graphs to answer the question: Which genre of movie did this individual see the most of? Present both the count and the proportion and write your answer in one sentence.

e) Now produce two grouped relative frequency bar charts (to copy to your document) by following the directions below.

Go to Graph → Bar Plot → With Data.

For the first grouped bar chart, graph the variable “Viewer Rating” and group by “Genre.” To “group by”, click the arrow next to the Group by box (the third box down) and select the variable you are asked to group by. In the Type box (5th box down from the top) choose relative frequency within category. Title these graphs clearly. You may keep the default labels for the x and y-axis.

For the second grouped bar chart, graph the variable “Genre” and group by “Viewer Rating.” In the Type box (5th box down from the top) choose relative frequency within category. Title these graphs clearly. You may keep the default labels for the x and y-axis.

f) Compare the graph variable among the categories of the genres. Describe what you see from each graph in one sentence each. Specifically, for the graph grouped by Viewer Rating, revisit your answer to 1(h) in your comment.

Problem 3: Metro Bike Share

On July 7, 2016, the Los Angeles County Metropolitan Transportation Authority launched a bicycle sharing system called Metro Bike Share. The system uses a fleet of about 1,400 bikes and includes 93 stations in Downtown Los Angeles, Venice, and the Port of Los Angeles. It is the first bike share system in the United States to be integrated as part of the city’s existing public transit system. The “Metro Bike Share” data set includes a random sample of 300 trips lasting between one and 60 minutes. Twelve variables are included for each observation. The Duration variable indicates the length of the trip in minutes.

a) Create a frequency histogram for the variable “Duration” by using Graph → Histogram. Properly title and label your graph and copy it into your document.

b) Interpret the shape of this distribution in one complete sentence.

c) Use StatCrunch to obtain the sample size, mean, and standard deviation for the “Duration” variable by using Stat → Summary Stats → Columns. Note: in the Statistics box, select the summary statistics listed above in the exact order given. Copy the entire table into your document and manually round each value to two decimal places.

d) Use StatCrunch to obtain the five number summary and the IQR for the “Duration” variable (the five number summary includes Min, Q1, Median, Q3, Max). Go to Stat → Summary Stats → Columns to obtain these values. Note: in the Statistics box, select the summary statistics listed above in the exact order given. Copy the entire table into your document and manually round each value to two decimal places.

e) Choose the appropriate summary statistics for center and spread (presented in either 3c or 3d) based on your stated shape of the distribution in 3b.

f) Use your summary statistics from part 3d to determine the fences used to mathematically identify outliers for the “Duration” variable. Show all steps in your manual calculations, including how you obtained the upper and lower fences. Please type your work and calculations.
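As a reminder of the standard 1.5 × IQR rule, the lower fence is Q1 − 1.5 × IQR and the upper fence is Q3 + 1.5 × IQR, where IQR = Q3 − Q1. The Python sketch below illustrates that arithmetic. The function names and the quartile values are hypothetical and are not the values from the “Metro Bike Share” data set. Your typed work should use the Q1 and Q3 that StatCrunch reports in part 3d.

```python
def outlier_fences(q1, q3):
    """Return (lower fence, upper fence) using the 1.5 * IQR rule."""
    iqr = q3 - q1
    return q1 - 1.5 * iqr, q3 + 1.5 * iqr


def find_outliers(values, q1, q3):
    """Return the values that fall strictly outside the fences."""
    lower, upper = outlier_fences(q1, q3)
    return [v for v in values if v < lower or v > upper]


# Hypothetical quartiles and durations (minutes), for illustration only.
q1, q3 = 10, 20
lower, upper = outlier_fences(q1, q3)
print(f"IQR = {q3 - q1}, lower fence = {lower}, upper fence = {upper}")
print(find_outliers([2, 8, 12, 15, 19, 36, 58], q1, q3))
```

A negative lower fence, as in this made-up example, simply means that no observation can be a low outlier when the variable cannot take negative values.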

g) Construct a horizontally oriented boxplot of the “Duration” variable by using Graph → Boxplot. To do this, click the “Draw boxes horizontally” box. Properly title and label this graph and copy it into your document.

h) How many outliers do you identify (please use both the boxplot and your results from 3f)? Write your response in a complete sentence.

Problem 4: SAT Scores

This data set presents SAT Verbal and Math scores for a random sample of 300 individuals. In addition, each individual’s gender and college are recorded. The sample was collected from one of six colleges (numbered 1 – 6). The data set is called “SAT Scores.”

a) Construct two relative frequency histograms using the “Math” variable (one for Males and one for Females). To do this, go to Graph → Histogram. Select Math to enter it in the graph box and then click the arrow in the “Group by:” box and select Gender. Properly title and label your graphs. Finally, below the titling area, under “For multiple graphs” change Columns per page from 1 to 2 and click Compute! Once the graph is computed, click the three lines in the bottom left of the leftmost graph. Select the x-axis and change the minimum to 250, and select the y-axis and change the maximum to 0.24. (You have to do this so that each graph has the same scale on the x and y axes.) Copy and paste your graphs into your document.

b) Describe the shape of each distribution in context in one sentence each.

c) Use StatCrunch to obtain the sample size (n), the mean, and the standard deviation of the “Math” variable by Gender (using “Group by:”). Copy and paste the table into your document. Round your answers to whole numbers in your document.

For parts 4d-4f, determine how well the Empirical Rule does in predicting the percentage of observations within some number of standard deviations of the mean.

d) Use your rounded summary statistics for females from part 4c to calculate the intervals corresponding to one, two, and three standard deviations about the mean SAT Math score. Type your work showing how you obtained these intervals. Round the endpoints of the final intervals correctly to whole numbers and clearly label and list these three intervals in your document as shown below:

68% interval (lower value, upper value)

95% interval (lower value, upper value)

99.7% interval (lower value, upper value)
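Each Empirical Rule interval has the form (mean − k × SD, mean + k × SD) for k = 1, 2, 3. The Python sketch below illustrates the calculation. The function name and the mean and standard deviation used here are hypothetical round numbers, not the values for females in the “SAT Scores” data set. Use your own rounded statistics from part 4c in your typed work.

```python
def empirical_rule_intervals(mean, sd):
    """Return the mean +/- k*sd intervals for k = 1, 2, 3, keyed by label."""
    labels = {1: "68%", 2: "95%", 3: "99.7%"}
    return {labels[k]: (mean - k * sd, mean + k * sd) for k in (1, 2, 3)}


# Hypothetical mean and standard deviation, for illustration only.
for label, (lower, upper) in empirical_rule_intervals(mean=500, sd=100).items():
    print(f"{label} interval ({lower}, {upper})")
```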

e) Use StatCrunch to determine the count and percentage of observations falling in each of these intervals by following the instructions listed below or using another appropriate counting method. Properly label and list these counts and percentages in your document.

Start in the “Female Math SAT Scores” data set (found in your StatCrunch Group). Go to Data → Row Selection → Interactive Tools. In the slider selectors box, click the variable Math into the variable box. Then click Compute.

The box that appears has a slider under the word Math that allows you to create the ranges of scores that you determined in 4d. Use the slider to obtain the count for each interval by looking at the “# rows selected” presented in the first line of the box. Calculate the percentages from the counts you obtained for each interval and include them in your document.
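The percentage for each interval is the count of scores inside the interval divided by the number of scores, times 100. The Python sketch below is one possible "other appropriate counting method" in spirit only. The function name and the scores are hypothetical, not the “Female Math SAT Scores” data, and treating both endpoints as included is an assumption that may differ from how the StatCrunch slider handles them.

```python
def count_within(values, lower, upper):
    """Return (count, percent) of values with lower <= value <= upper.

    Both endpoints are treated as inside the interval (an assumption).
    """
    count = sum(1 for v in values if lower <= v <= upper)
    return count, 100 * count / len(values)


# Hypothetical Math scores and interval, for illustration only.
scores = [380, 450, 500, 520, 560, 610, 640, 720]
count, percent = count_within(scores, 400, 600)
print(f"{count} of {len(scores)} scores ({percent:.1f}%) fall in (400, 600)")
```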

f) Does each of the three percentages found in part 4e match what the Empirical Rule predicts? Compare your results in 4e with the expected percentages stated in the Empirical Rule. State your answer in one to three sentences.

g) Suppose a new female student with a Math SAT score of 700 was recorded. Calculate the z-score of this ‘new’ score and explain in a complete sentence what this z-score indicates.

Sample Solution to Display Formatting

Problem X: Students’ Grades

A random sample of 30 students was selected from a STAT 250 course taught during the summer session and their first exam scores were recorded.

a) Create a histogram in StatCrunch. Be sure to title and label it correctly.

b) Interpret the histogram’s shape.

See sample solution and formatting on page 2.

Notes about submission

Following these main points will help you submit a professionally completed assignment.

1) Right justify your name and provide your correct section and the due date.

2) Center the specific homework assignment title.

3) Bold each complete problem number.

4) The graph should be about the size shown below for readability (click on the graph once and only adjust the size of the graph by using the bottom right dot).

5) Remember not to include the questions in your answer. Only provide answers. Please keep the assignment in problem and part order (present 1a, then 1b, and so on).

Kenneth Strazzeri

STAT 250-0xx (your correct section)

Data Analysis Assignment 1

Problem X

a)

b) The shape of this distribution is left skewed because I see the majority of the data values falling in the upper end of the distribution and a few 50s and 60s skewing the shape. There do not seem to be any outliers visible on the graph.

…

