Full assignment instructions, as well as an example, are attached as a Word file. Access to StatCrunch is required. https://www.statcrunch.com/5.0/group.php?groupid=7…I will provide the login info.
STAT 250 Spring 2019 Data Analysis Assignment 1
Your submitted document should include the following items.
Points will be deducted if the
following are not included.
1.
Type your name and STAT 250 with your correct section number
(e.g. STAT 250-xxx) right justified, and then Data Analysis
Assignment #1 centered at the top of page 1 below your name.
Then begin your document.
2.
Number your pages across your entire solutions document.
3.
Your document should include the ANSWERS ONLY with each
answer labeled by its
corresponding number and subpart. Keep the answers in order.
Do not include the questions
in your submitted document.
4.
Generate all requested graphs and tables using StatCrunch.
5.
Upload your document onto Blackboard as a Word (docx) file or
pdf file using the link
provided by your instructor. You are responsible for
uploading a readable file.
Elements of good technical writing:
Use complete and coherent sentences to answer the questions.
Graphs must be appropriately titled and should refer to the
context of the question.
Graphical displays must include labels with units if
appropriate for each axis.
Units should always be included when referring to numerical
values.
When making a comparison you must use comparative language,
such as “greater than”, “less
than”, or “about the same as.”
Ensure that all graphs and tables appear on one page and are
not split across two pages.
Type all mathematical calculations when directed to compute
an answer ‘by-hand.’
Pictures of actual handwritten work are not accepted on this
assignment.
When writing mathematical expressions into your document you
may use either an equation editor
or common shortcuts such as:
√x can be written as sqrt(x), p̂ can be written as p-hat, and x̄
can be written as x-bar.
Problem 1: 2018 Movies
Moviepass is a subscription service that allows users to see
one movie per day at select theaters.
AMC Theatres released its own movie subscription service,
called A-List, to compete with Moviepass.
A-List allows users to see up to three movies per
week. Raw data were collected from
one user who purchased an annual Moviepass subscription in
January 2018 and a subscription to A-List in November 2018. The
data set found in our StatCrunch Group presents the 169 movies seen
along with other variables describing each movie. The data
set is called “2018 Movies.”
a) Use StatCrunch to create a one-way table for the variable
“Genre” using both counts and
percentages. Select Stat → Tables → Frequency and select both
‘Frequency’ and ‘Percent of
total’ in the Statistic(s) box by holding down the Ctrl Key
(Command Key on Macs) when
making these selections. Copy your table into your document
and then manually round the
values in the ‘Percent of total’ column to two decimal places
in the StatCrunch table that
you have copied into your document.
b) Interpret your findings from the table in part (a) by
identifying the least and most popular
genre by percent of total. Use complete sentences with
context and include the genre and
percentage in the sentences.
c) Use StatCrunch to generate a two-way table for the
variables “Genre” and “Viewer
Rating”. Go to Stat → Tables → Contingency → With Data (since
you have the raw data in
StatCrunch). Select “Genre” as your row variable and “Viewer
Rating” as your column
variable. In the display box, select only Percent of Total.
Lastly, deselect “Chi-Square test for independence” (it is
highlighted by default) by holding the Ctrl key and
clicking on it. Copy your table into your document.
d) How many and what percentage of the 169 movies did the
viewer dislike? Answer this
question in a complete sentence.
e) What values are the same when looking at both your one-way
table and your two-way table?
Be specific if referencing rows or columns.
f) Now, create two more two-way tables keeping “Genre” as
your row variable and “Viewer
Rating” as your column variable. One table needs to include
row percentages and the other
needs to include column percentages. To do this, change what
you select in the display box
from percent of total (in part (c)) to row percent for the
first table and column percent for the
second table. Include both tables in your document.
g) Specifically interpret the meaning of the row percentage
found in the “Children’s/Animated”
and “Liked” cell. Note that there are 14 movies in that cell.
h) Now, specifically interpret the meaning of the column
percentage found in the
“Children’s/Animated” and “Liked” cell. Note that there are
14 movies in that cell.
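The three kinds of percentages requested in parts (c) and (f) differ only in the total used as the denominator. The Python sketch below is a hand check, not a replacement for the StatCrunch tables. Only the count of 14 in the “Children’s/Animated” and “Liked” cell comes from this assignment; the other counts, the “Drama” row, the “Disliked” label, and the function names are made-up placeholders.

```python
# Hypothetical two-way table of counts. Only the 14 in
# ("Children's/Animated", "Liked") is taken from the assignment;
# every other number and label is a placeholder.
counts = {
    "Children's/Animated": {"Liked": 14, "Disliked": 6},
    "Drama": {"Liked": 21, "Disliked": 9},
}


def row_percent(table, row, col):
    """Percent of the row total that falls in the (row, col) cell."""
    row_total = sum(table[row].values())
    return 100 * table[row][col] / row_total


def column_percent(table, row, col):
    """Percent of the column total that falls in the (row, col) cell."""
    col_total = sum(cells[col] for cells in table.values())
    return 100 * table[row][col] / col_total


def total_percent(table, row, col):
    """Percent of the grand total that falls in the (row, col) cell."""
    grand_total = sum(sum(cells.values()) for cells in table.values())
    return 100 * table[row][col] / grand_total


cell = ("Children's/Animated", "Liked")
print(f"row %:    {row_percent(counts, *cell):.2f}")     # 14 / (14 + 6)
print(f"column %: {column_percent(counts, *cell):.2f}")  # 14 / (14 + 21)
print(f"total %:  {total_percent(counts, *cell):.2f}")   # 14 / 50
```

With these placeholder counts, the same cell of 14 movies gives three different percentages because the row percentage divides by the genre total, the column percentage divides by the rating total, and the percent of total divides by all movies.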
Problem 2: 2018 Movies Revisited
Which genre is most popular among the 169 movies seen? Use
the “2018 Movies” data set posted
in our StatCrunch group to answer the following questions.
a) Using the variable named “Genre”, produce a relative
frequency bar chart using Graph →
Bar Plot → With Data. Please properly label axes and provide
a meaningful title and copy it
into your document.
b) Using the variable “Genre”, produce a relative frequency
Pareto chart. Begin with your bar
chart, and edit it by changing “Order by” to Count
Descending. Properly title and label your
graph and copy it into your document.
c) Using the variable “Genre”, produce a Pie Chart using
Graph → Pie Chart → With Data.
Add an appropriate title and copy this entire graph including
the legend into your document.
d) Use the three graphs to answer the question: Which genre
of movie did this individual see
the most of? Present both the count and the proportion and
write your answer in one
sentence.
e) Now produce two grouped relative frequency bar charts (to
copy to your document) by
following the directions below.
Go to Graph → Bar Plot → With Data.
For the first grouped bar chart, graph the variable “Viewer
Rating” and group by “Genre.”
To “group by” click the arrow next to Group by box (the third
box down) and select the
variable you are asked to group by. In the Type box (5th box
down from the top) choose
relative frequency within category. Title these graphs
clearly. You may keep the default
labels for the x and y-axis.
For the second grouped bar chart, graph the variable “Genre”
and group by “Viewer
Rating.” In the Type box (5th box down from the top) choose
relative frequency within
category. Title these graphs clearly. You may keep the
default labels for the x and y-axis.
f) Compare the graph variable among the categories of the
genres. Describe what you see
from each graph in one sentence each. Specifically with the
graph grouped by Viewer
Rating, revisit your answer to 1(h) in your comment.
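For readers who want to verify the ordering in the Pareto chart from part (b), the relative frequencies can be tallied and sorted by count in descending order. This is a minimal Python sketch under stated assumptions: the ten genre labels are invented for illustration (the real data set has 169 movies), and the function name is hypothetical.

```python
from collections import Counter

# Hypothetical genre labels for ten movies (placeholder data).
genres = ["Drama", "Comedy", "Drama", "Action", "Drama",
          "Comedy", "Horror", "Drama", "Action", "Comedy"]


def relative_frequencies(values):
    """Return (category, count, proportion) tuples sorted by count, descending."""
    tally = Counter(values)
    n = len(values)
    # most_common() yields categories from the largest count to the smallest.
    return [(category, k, k / n) for category, k in tally.most_common()]


for category, count, proportion in relative_frequencies(genres):
    print(f"{category:<8}{count:>3}{proportion:>8.2f}")
```

The first tuple in the returned list corresponds to the tallest bar in the Pareto chart, and the proportions sum to 1.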
See next page for Problem 3
Problem 3: Metro Bike Share
On July 7, 2016, the Los Angeles County Metropolitan
Transportation Authority launched a bicycle
sharing system called Metro Bike Share. The system uses a
fleet of about 1,400 bikes and includes
93 stations in Downtown Los Angeles, Venice, and the Port of
Los Angeles. It is the first bike
share system in the United States to be integrated as part of
the city’s existing public transit system.
The “Metro Bike Share” data set includes a random sample of
300 trips lasting between one and
60 minutes. Twelve variables are included for each
observation. The Duration variable indicates
the length of the trip in minutes.
a) Create a frequency histogram for the variable “Duration”
by using Graph → Histogram.
Properly title and label your graph and copy it into your
document.
b) Interpret the shape of this distribution in one complete
sentence.
c) Use StatCrunch to obtain the sample size, mean, and
standard deviation for the “Duration”
variable by using Stat → Summary Stats → Columns. Note: in
the Statistics box, select the
summary statistics listed above in the exact order given.
Copy the entire table into your
document and manually round each value to two decimal places.
d) Use StatCrunch to obtain the five number summary and the
IQR for the “Duration” variable
(the five number summary includes Min, Q1, Median, Q3, Max).
Go to Stat → Summary
Stats → Columns to obtain these values. Note: in the
Statistics box, select the summary
statistics listed above in the exact order given. Copy the
entire table into your document and
manually round each value to two decimal places.
e) Choose the appropriate summary statistics for center and
spread (presented in either 3c or
3d) based on your stated shape of the distribution in 3b.
f) Use your summary statistics from part 3d and determine the
fences used to mathematically
identify outliers for the “Duration” variable. To do this,
show all steps in your calculations
manually including how you obtained the upper and lower
fences. Please type your work
and calculations.
g) Construct a horizontally oriented boxplot of the
“Duration” variable by using Graph →
Boxplot. To do this, click the “Draw boxes horizontally” box.
Properly title and label and
copy this graph into your document.
h) How many outliers do you identify (please use both the
boxplot and your results from 3f)?
Write your response in a complete sentence.
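The fence calculation in part 3f can be sketched in Python as a check on typed work. The sketch assumes the conventional 1.5 × IQR rule (lower fence = Q1 − 1.5 × IQR, upper fence = Q3 + 1.5 × IQR); the quartiles and durations shown are placeholders, not values from the “Metro Bike Share” data set, and the function names are hypothetical. Quartiles are passed in rather than computed, since different software can use slightly different quartile methods and the assignment asks for the StatCrunch values.

```python
def outlier_fences(q1, q3):
    """Return (lower_fence, upper_fence) using the 1.5 * IQR rule."""
    iqr = q3 - q1
    return q1 - 1.5 * iqr, q3 + 1.5 * iqr


def find_outliers(values, q1, q3):
    """Return the values that fall strictly outside the fences."""
    lower, upper = outlier_fences(q1, q3)
    return [v for v in values if v < lower or v > upper]


# Placeholder quartiles and trip durations in minutes (not the real data).
q1, q3 = 8.0, 20.0
durations = [2, 5, 9, 12, 15, 21, 30, 39, 55]

lower, upper = outlier_fences(q1, q3)
print(f"IQR = {q3 - q1} minutes")
print(f"lower fence = {lower} minutes, upper fence = {upper} minutes")
print(f"outliers: {find_outliers(durations, q1, q3)}")
```

With these placeholder quartiles the IQR is 12 minutes, so the fences are −10 and 38 minutes, and the two durations above 38 minutes are flagged.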
Problem 4: SAT Scores
This data set presents SAT Verbal and Math scores for a
random sample of 300 individuals. In
addition, each individual’s gender and college are recorded.
The sample was collected from one of six
colleges (numbered 1 – 6). The data set is called “SAT
Scores.”
a) Construct two relative frequency histograms using the
“Math” variable (one for Males and
one for Females). To do this, go to Graph → Histogram. Select
Math to enter it in the
graph box and then click the arrow in the “Group by:” box and
select Gender. Properly title
and label your graphs. Finally, below the titling area, under
“For multiple graphs” change
Columns per page from 1 to 2 and click Compute! Once the
graph is computed, click the
three lines in the bottom left of the leftmost graph. Select
x-axis and change the minimum
to 250, then select the y-axis and change the maximum to 0.24.
(This gives both graphs the same scale on the
x and y axes.) Copy
and paste your graphs into
your document.
b) Describe the shape of each distribution in context in one
sentence each.
c) Use StatCrunch to obtain sample size (n), the mean, and
standard deviation of the “Math”
variable by Gender (using “Group by:”) Copy and paste the
table into your document.
Round your answers to whole numbers in your document.
For parts 4d-4f, determine how well the Empirical Rule does
in predicting the percentage of
observations within some number of standard deviations of the
mean.
d) Use your rounded summary statistics for females from part
4c to calculate the interval
corresponding to one, two, and three standard deviations
about the mean SAT Math
Score. Type your work showing how you obtained these
intervals. Round the endpoints
of the final intervals correctly to whole numbers and clearly
label and list these three
intervals in your document as shown below:
68% interval (lower value, upper value)
95% interval (lower value, upper value)
99.7% interval (lower value, upper value)
e) Use StatCrunch to determine the count and percentage of
observations falling in each of
these intervals by following the instructions listed below or
using another appropriate
counting method. Properly label and list these counts and
percentages in your document.
Start in the “Female Math SAT Scores” data set (found in your
StatCrunch Group). Go to
Data → Row Selection → Interactive Tools. In the slider
selectors box, click the variable
Math into the variable box. Then Click compute.
The box that appears has a slider under the words Math that
allows you to create ranges of
scores that you determined in 4d. Use the slider to obtain
the count for each interval by
looking at the “# rows selected” presented in the first line
of the box. Calculate the
percentages from the counts you obtained for each interval
and include them in your
document.
f) Does each of the three percentages found in part 4e match
what the Empirical Rule
predicts? Compare your results in 4e with the expected
percentages stated in the Empirical
Rule. State your answer in one to three sentences.
g) Suppose a new female student with a Math SAT score of 700
was recorded. Calculate the z-score
of this ‘new’ score and explain in a complete sentence
what this z-score indicates.
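The arithmetic behind parts 4d through 4g can be sketched in Python as a hand check. The sketch assumes intervals of the form mean ± k × SD for k = 1, 2, 3, counts endpoints as inside the interval (an assumption; the slider tool may treat endpoints differently), and uses z = (x − mean) / SD. The mean, standard deviation, and scores below are placeholders, not values from the “SAT Scores” data set, and the function names are hypothetical.

```python
def empirical_intervals(mean, sd):
    """Return {label: (lower, upper)} for 1, 2, and 3 SDs about the mean."""
    labels = {1: "68%", 2: "95%", 3: "99.7%"}
    return {labels[k]: (mean - k * sd, mean + k * sd) for k in (1, 2, 3)}


def percent_within(values, lower, upper):
    """Return (count, percent) of values with lower <= value <= upper."""
    count = sum(lower <= v <= upper for v in values)
    return count, 100 * count / len(values)


def z_score(x, mean, sd):
    """Number of standard deviations that x lies above (+) or below (-) the mean."""
    return (x - mean) / sd


# Placeholder summary statistics and scores (not the real data).
mean, sd = 600, 80
scores = [430, 520, 560, 590, 600, 610, 650, 690, 700, 790]

for label, (lower, upper) in empirical_intervals(mean, sd).items():
    count, percent = percent_within(scores, lower, upper)
    print(f"{label} interval ({lower}, {upper}): {count} scores, {percent:.1f}%")

print(f"z-score for 700: {z_score(700, mean, sd):.2f}")
```

Comparing each printed percentage with its label (68%, 95%, 99.7%) mirrors the comparison requested in part 4f. With these placeholder statistics, a score of 700 sits 1.25 standard deviations above the mean.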
Sample Solution to Display Formatting
Problem X: Students’ Grades
A random sample of 30 students was selected from a STAT 250
course taught during the
summer session and their first exam scores were recorded.
a) Create a histogram in StatCrunch. Be sure to title and
label it correctly.
b) Interpret the histogram’s shape
See sample solution and formatting on page 2.
Notes about submission
Following these main points will help you submit a
professionally completed assignment.
1) Right justify your name and provide your correct section and
the due date.
2) Center the specific homework assignment title.
3) Bold each complete problem number.
4) The graph can be about the size shown below for readability (click
on the graph once and only
adjust the size of the graph by using the bottom right dot).
5) Remember not to include the questions in your answer. Only
provide answers. Please
keep the assignment in problem and part order (present 1a,
then 1b, and so on).
Kenneth Strazzeri
STAT 250-0xx (your correct section)
Data Analysis Assignment 1
Problem X
a)
b) The shape of this distribution is left skewed because I
see the majority of the data values
falling in the upper end of the distribution and a few 50s
and 60s skewing the shape. There do
not seem to be any outliers visible on the graph.
…