5 – Clean it up

🎯 Learning Objectives

Develop the Communication and Networks Learning Strands:

  • Define the terms correlation and outliers in relation to data trends.
  • Identify the steps of the investigative cycle.
  • Solve a problem by implementing the steps of the investigative cycle on a data set.
  • Use your findings to support a recommendation.
💬 Key Vocabulary

  • data
  • visualisation
  • insight
  • prediction
  • outliers
  • criteria
📝 Starter Activity – Analyse the graph

View this graph by clicking here.

  • What data is being displayed on the graph?
  • Does the graph show a trend?
  • Where are the anomalies in the data?
  • Why do you think those anomalies have occurred?

Put this information in the worksheet below:

📝 Correlation doesn’t always mean causation

A correlation shows that there is a relationship between two or more variables, but that doesn’t guarantee that one causes the other.

For example, there is likely to be a correlation between ice cream sales and the weather. Does that mean that ice cream sales cause hot weather?

📝 Where are the anomalies in the data?

Until 1949, most of the data follows a slow upward trend, but there are a few odd blips.

Data that sits outside a trend is known as an outlier.

Outliers can cause problems when working out statistics such as the mean, but they shouldn’t be removed from the data set without investigating the reason for them.
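
To see why outliers matter for the mean, here is a minimal Python sketch. The numbers are made up for illustration and are not taken from the graph; the value 95 plays the part of the outlier.

```python
from statistics import mean, median

# Hypothetical values, invented for this example. 95 is the outlier.
values = [10, 12, 11, 13, 12, 95]

# The mean is pulled upwards by the single large value.
mean_with_outlier = mean(values)

# The median (the middle value) is barely affected by it.
median_with_outlier = median(values)

# For comparison only: the mean of the same data without the outlier.
# In a real investigation, find out WHY the outlier is there before
# deciding whether to leave it out.
values_without_outlier = [v for v in values if v != 95]
mean_without_outlier = mean(values_without_outlier)

print("Mean with outlier:   ", mean_with_outlier)
print("Median with outlier: ", median_with_outlier)
print("Mean without outlier:", mean_without_outlier)
```

One unusual value more than doubles the mean here, while the median hardly moves. That is why it is worth reporting both, and why an outlier should be investigated rather than quietly deleted.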

📖 The investigative cycle

So far we have spent time investigating data sets to see patterns or to extract meaning.

The PPDAC cycle (Problem, Plan, Data, Analysis, Conclusion) is a framework that we can follow when posing and answering real-world questions using data.

The Problem

Pose a question that you think the data will help you to answer.

Context is important when framing questions.

“What is the average number of goals scored in the first half for teams in the Premier League?”

In this example, the question includes variables that can be compared with one another.

The Plan

The plan involves working out:

Where will we get the data from?

How will it be collected (if we are collecting it ourselves)?

  • Predict an answer
  • Find a data set
  • Evaluate the quality of data
  • Plan how to collect the data

The Data

In this step, we gather the data.

Once we have the data we need to help us answer the question, we should check whether it needs cleansing. Cleansing means detecting corrupt or inaccurate data and then correcting or removing it.
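
As a rough illustration of cleansing, the Python sketch below checks a small table of first-half goals. The team names, the values and the `cleanse` function are all hypothetical, and the rule used (goals must be a whole number that is zero or more) is an assumption made for this example.

```python
# Hypothetical raw rows: (team, first-half goals exactly as typed in).
raw_rows = [
    ("Team A", "2"),
    ("Team B", ""),      # missing value
    ("Team C", "1"),
    ("Team D", "-3"),    # impossible: goals cannot be negative
    ("Team E", " 0 "),   # stray spaces, but otherwise valid
    ("Team F", "two"),   # not a number
]


def cleanse(rows):
    """Split rows into usable data and rows that need investigating."""
    clean = []
    rejected = []
    for team, goals_text in rows:
        try:
            goals = int(goals_text.strip())
        except ValueError:
            # Empty or non-numeric entries cannot be used as they are.
            rejected.append((team, goals_text))
            continue
        if goals < 0:
            # A negative number of goals must be a mistake.
            rejected.append((team, goals_text))
        else:
            clean.append((team, goals))
    return clean, rejected


clean_rows, rejected_rows = cleanse(raw_rows)
print("Usable rows:", clean_rows)
print("Rows to investigate:", rejected_rows)
```

The rejected rows are kept in their own list rather than thrown away, so that someone can check the original source and correct them where possible.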

Analysis

This step is all about making sense of the data. To do this you need to:

  • Visualise the data
  • Spot any patterns, trends, correlations, or outliers
  • Write down your observations of what the data is showing you
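
A tool such as CODAP lets you spot a correlation by eye, but it can also be measured. The sketch below is one possible way to calculate a correlation coefficient (Pearson's r) in Python. The temperature and ice cream sales figures are invented for illustration, and `pearson` is a helper name chosen for this example.

```python
from math import sqrt


def pearson(xs, ys):
    """Return Pearson's r for two lists of numbers of equal length."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # How far each pair of values moves away from the means together.
    covariance = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = sum((x - mean_x) ** 2 for x in xs)
    spread_y = sum((y - mean_y) ** 2 for y in ys)
    return covariance / sqrt(spread_x * spread_y)


# Hypothetical data: daily temperature (°C) and ice creams sold.
temperatures = [14, 17, 20, 23, 26]
ice_cream_sales = [20, 31, 39, 52, 58]

r = pearson(temperatures, ice_cream_sales)
print("Correlation coefficient:", round(r, 2))
```

A value of r close to 1 means the two variables rise together, close to -1 means one falls as the other rises, and close to 0 means there is little linear relationship. A high value still does not tell you that one variable causes the other.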

Conclusions and Recommendations

What’s the answer to your question?

How does the data help prove the answer?

Is the answer reliable?

What can we do with the results?

Can we use this data to make a case for action, or has it led to further questions that need to be answered? 

📖 Gold/Platinum – Roller coasters (River Kingdom)

River Kingdom is a new theme park that is opening in the UK.

They want you to recommend design considerations that would help make a great experience for their visitors.

One of the main restrictions that they know of is that they can’t build a roller coaster taller than 350 ft, due to limitations of the site.

The problem: roller coasters (River Kingdom)

“What makes a really cool roller coaster?” would be considered a poorly defined problem.

It doesn’t help us to understand what we are measuring.

What variables about roller coasters could we measure in order to help us answer that question?

Think/pair/share.

Measurable variables

  • Speed
  • Height
  • Drop
  • Number of twists and loops (inversions)
  • Length (distance)
  • Duration
  • Position (e.g. sitting down, suspended)

We can get data on existing roller coasters to find out what is possible.

First, we need to pose questions that help turn our vague aim into precise goals.

Remember that we have a height restriction of 350 ft.
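
One precise question could be: "Of the roller coasters that are no taller than 350 ft, which is the fastest?" The Python sketch below shows how that question might be answered. The coaster names and figures are made up and are not from the CODAP data set; only the 350 ft limit comes from the brief.

```python
# The height limit comes from the River Kingdom brief.
MAX_HEIGHT_FT = 350

# Hypothetical roller coasters, invented for this example.
coasters = [
    {"name": "Coaster A", "height_ft": 420, "speed_mph": 120},
    {"name": "Coaster B", "height_ft": 310, "speed_mph": 93},
    {"name": "Coaster C", "height_ft": 205, "speed_mph": 74},
    {"name": "Coaster D", "height_ft": 350, "speed_mph": 95},
]

# Keep only the coasters that River Kingdom would be able to build.
allowed = [c for c in coasters if c["height_ft"] <= MAX_HEIGHT_FT]

# Of those, find the one with the highest top speed.
fastest = max(allowed, key=lambda c: c["speed_mph"])

print("Coasters within the limit:", [c["name"] for c in allowed])
print("Fastest within the limit:", fastest["name"])
```

The same filter-then-compare approach works for the other measurable variables, such as drop, duration or number of inversions.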

Analysis

You can get the data from the CODAP website, which we will also use to help us visualise the data. Click here to access the data.

Conclusion

Write a conclusion based on your findings and make a recommendation to River Kingdom.

  • What is your recommendation?
  • How does the data help support it?
  • Is this data enough to support your recommendation, or is there any further action or research that they should do?

In this lesson, you…

Looked at the terms correlation and outliers in relation to data trends

Used the steps of an investigative cycle to help find answers to a question or problem using data

Next lesson, you will…

Use the steps of the investigative cycle to solve a problem unique to you

Decide on the data you need to help you answer a question and create a method of capturing the data

🏅 Badge it

🥈 Silver Badge

🥇 Gold Badge

🥉 Platinum Badge

  • Complete the Gold/Platinum task including the explorer tasks and upload the worksheet to Bourne to Learn.