# Little Big Data

## 1 Big data

### Learn It

• The world is constantly generating new data at an astonishing rate.
• According to IBM, every day in 2012, 2.5EB (exabytes) of new data were generated. In this project, we'll look at where this data is coming from, see how it can be used, identify your contribution and try analysing some large data sets for ourselves.
• Let's start by thinking about how much data we're talking about.
• All data is stored as 1s and 0s. A single 1 or 0 is called a 'bit'.
• 8 bits are called a byte. A single letter or punctuation mark takes up 1 byte of data. The word 'hello' is 5 bytes long.
• 1000 bytes are called a kilobyte (kB).
• This picture of a kitten is about 333kB in size.
• There are 1000kB in a megabyte (MB). This 37s long sound file is about 1MB in size. In the same space, you could store the kitten photo three times.
• There are 1000MB in a gigabyte (GB). You could store a little over 3000 copies of the kitten picture on a 1GB memory stick, or around 250 MP3 music tracks.
• There are 1000GB in a terabyte (TB). As of June 2015, Wikipedia (English only) takes up 10TB. Larger home computers often have 1TB hard drives to store data on.
• There are 1000TB in a petabyte (PB). In 2013, Netflix reported that they had 3.14PB of movies on their streaming service. This figure is likely to now be somewhat higher!
• If you had a petabyte of MP3 music, it'd take approximately 2000 years to play it all.
• It is estimated that a human brain has a storage capacity of around 2.5PB of binary data.
• There are 1000PB in an exabyte (EB); a billion gigabytes.
• According to Wikipedia:
• The world's technological capacity to store information grew from 2.6 ("optimally compressed") exabytes in 1986 to 15.8 in 1993, over 54.5 in 2000, and to 295 (optimally compressed) exabytes in 2007. This is equivalent to less than one CD (650 MB) per person in 1986 (539 MB per person), roughly four CDs per person of 1993, 12 CDs per person in the year 2000, and almost 61 CDs per person in 2007. Piling up the imagined 404 billion CDs from 2007 would create a stack from the earth to the moon and a quarter of this distance beyond (with 1.2 mm thickness per CD).
• It is estimated that Google hold around 10EB on their servers and keep around 5EB on backup tapes, making them the largest holder of data in the world.
• Daily Internet traffic around the world is (as of 2015) estimated to average 2.5EB of data.
• A gramme of DNA can in theory hold 455EB of binary data.
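
The unit sizes above can be checked with a few lines of Python. This is only a rough sketch: the function names `to_bytes` and `human_readable` are made up for this example, and the 4 minutes per MP3 track is our own assumption (the 250 tracks per gigabyte comes from the list above).

```python
# Decimal storage units, as used in the list above: each step up is x1000.
UNITS = ["B", "kB", "MB", "GB", "TB", "PB", "EB"]

def to_bytes(value, unit):
    """Convert a value in the given unit to a number of bytes."""
    return value * 1000 ** UNITS.index(unit)

def human_readable(num_bytes):
    """Express a byte count in the largest unit that keeps the value at 1 or more."""
    value = float(num_bytes)
    for unit in UNITS:
        # Stop when the value is small enough, or when we run out of units.
        if value < 1000 or unit == UNITS[-1]:
            return f"{value:g}{unit}"
        value /= 1000

# A letter takes one byte, so 'hello' takes five.
hello_size = len("hello".encode("utf-8"))

# How many 333kB kitten pictures fit on a 1GB memory stick?
kittens_per_gb = to_bytes(1, "GB") // to_bytes(333, "kB")

# Rough playing time for 1PB of MP3s, assuming 250 tracks per GB
# (from the list above) and 4 minutes per track (an assumption).
tracks = 250 * to_bytes(1, "PB") // to_bytes(1, "GB")
years = tracks * 4 / (60 * 24 * 365)

print(hello_size, kittens_per_gb, round(years))
print(human_readable(to_bytes(2.5, "EB")))
```

With these assumptions, the kitten count comes out a little over 3000 and the playing time at roughly 1900 years, which is in line with the figures quoted above.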

### Research it

• Some organisations, Google for instance, store large amounts of data that need processing.
• Each of us has a digital footprint, which we leave every day. The data we generate can be used as part of big data analysis projects to provide information to the Government to help assign NHS funding, or to help supermarkets decide how much milk to order.
• Consider the NHS. Across the country, millions of people are treated every year. Each time a patient is treated, an electronic record of the problem, diagnosis and treatment is stored.
• As the Government has all that data, they can quickly see what diseases are responsible for the most deaths in Britain, and fund research on improving treatments. It is also possible to see which hospitals are curing more people than others, and take steps to tackle underperformance. Hospitals that treat more patients can be given more funding, and so on.
• The tech and lifestyle website Mashable has an excellent article on big data, featuring examples like:
• WNYC (a New York radio station) used big data to help commuters anticipate travel times.
• Twitter engineers used geolocation data from every tweet in several major cities to build a contour map of the region. The highest points indicate areas where more tweeting takes place.
• Patient.info collates search data from their site to show a heat-map of the UK based on web searches.

### Case Study

• A good example of big data in everyday life is attendance in school.
• At a class level, a form tutor could use attendance data to identify students who are consistently late.
• At a year group level, a head of year could use registration data to see which tutor groups have the best attendance.
• At a whole-school level, a headmaster could use this data to monitor the whole school's attendance.
• At a county level, local authorities can use this data to identify any schools where attendance is an issue.
• At a national level, the government can use this data to ensure attendance in schools is consistent around the country.
• At a global level, countries can compare their students' attendance with that of students in other countries around the world.
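
The idea of looking at the same data at different levels can be sketched in Python. Everything here is hypothetical: the student names, tutor groups and records are made up for illustration, and the sketch tracks only present or absent rather than lateness.

```python
from collections import defaultdict

# Hypothetical registration records: (student, tutor group, year group, present?)
# One tuple per student per registration session; all values are made up.
records = [
    ("Asha", "7A", 7, True),
    ("Asha", "7A", 7, True),
    ("Ben", "7A", 7, False),
    ("Ben", "7A", 7, True),
    ("Cara", "7B", 7, True),
    ("Cara", "7B", 7, True),
    ("Dev", "8A", 8, False),
    ("Dev", "8A", 8, False),
]

def attendance_rate(records, key):
    """Return the percentage of sessions attended, grouped by key(record)."""
    present = defaultdict(int)
    total = defaultdict(int)
    for record in records:
        group = key(record)
        total[group] += 1
        present[group] += record[3]  # True counts as 1, False as 0
    return {group: 100 * present[group] / total[group] for group in total}

# The same records, summarised at four different levels.
by_student = attendance_rate(records, key=lambda r: r[0])      # form tutor's view
by_tutor_group = attendance_rate(records, key=lambda r: r[1])  # head of year's view
by_year_group = attendance_rate(records, key=lambda r: r[2])
whole_school = attendance_rate(records, key=lambda r: "school")  # headmaster's view

print(by_student)
print(by_tutor_group)
print(by_year_group)
print(whole_school)
```

Only the `key` function changes between levels. A county or national view would work the same way, grouping by school or by local authority, just with far more records.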