We create 2.5 quintillion bytes of data every day, and over 90% of the data in the world today has been created in just the last two years. The data comes from everywhere: social media sites, user-generated content including digital pictures and videos, purchase transaction records, cell phone GPS signals, and devices and sensors, to name just a few sources. Such data from structured and unstructured sources is known as Big Data.
Big data is a term for data sets that are so large or complex that traditional data processing application software is inadequate to deal with them. Challenges include capture, storage, analysis, data curation, search, sharing, transfer, visualization, querying, updating and information privacy. The term “big data” often refers simply to the use of predictive analytics, user behavior analytics, or certain other advanced data analytics methods that extract value from data, and seldom to a particular size of data set. – Wikipedia
In addition to unstructured data from the web, business applications used by companies and enterprises generate terabytes of data during their daily operations. Making sense of such data and gleaning insights from it requires tools and people with the right functional and technical expertise, and has given rise to the business of managing Big Data.
Some of the key attributes of Big Data that also contribute to its complexity include:
- Volume – Data at rest. Data scientists need to process terabytes to exabytes of data.
- Velocity – Data in motion. This includes streaming data, where systems have only milliseconds to seconds to respond.
- Variety – Data in many forms, including structured, unstructured, text, and multimedia. It is difficult to process and manage the massive volume of structured and unstructured data using traditional database and software techniques.
- Veracity – Data in doubt. Uncertainty due to data inconsistency, incompleteness, ambiguities, latency, deception, approximations.
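To put the Volume figures in perspective, the "2.5 quintillion bytes every day" quoted earlier can be converted into the units used above. This is a minimal sketch that assumes the short-scale quintillion (10^18) and decimal (SI) byte units; the variable names are illustrative only.

```python
# Decimal (SI) byte units; binary units (TiB, EiB) would give slightly smaller numbers.
TERABYTE = 10**12
EXABYTE = 10**18
SECONDS_PER_DAY = 24 * 60 * 60

# "2.5 quintillion bytes" per day, assuming the short-scale quintillion (10**18).
bytes_per_day = 25 * 10**17

exabytes_per_day = bytes_per_day / EXABYTE
terabytes_per_day = bytes_per_day / TERABYTE
terabytes_per_second = bytes_per_day / SECONDS_PER_DAY / TERABYTE

print(f"{exabytes_per_day:.1f} EB per day")         # 2.5 EB per day
print(f"{terabytes_per_day:,.0f} TB per day")       # 2,500,000 TB per day
print(f"{terabytes_per_second:.1f} TB per second")  # 28.9 TB per second
```

In other words, the daily figure works out to roughly 2.5 exabytes, or a sustained rate of about 29 terabytes every second.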
Researchers, analysts and data scientists use the term “data mining” for techniques used to refine the raw data into information or knowledge.
Gaining Insights from Big Data
Business operations require insights on data from customers and other organizations, government agencies and other research organizations. In addition, they may be required to analyze inputs from social media, satellites, drones, and sensors that generate vast amounts of unstructured data in a variety of formats including image, voice and text. They will also have to process transactional and reference data held by software applications that run on commercial databases such as IBM’s DB2, Microsoft’s SQL Server, or Oracle.
The figure below (source) highlights a framework to leverage Big Data techniques including Reporting, Dashboards and Data Discovery in a corporate environment. Such data can be cataloged, indexed, and queried using well-understood tools and techniques.
Emerging data and analytic techniques are being applied to make sense of structured and unstructured data. Users are embracing results from the analysis of large, real-world data sets from public sources, and reviewing such data can produce reliable recommendations much more quickly.
Because of the way the human brain processes information, using charts or graphs to visualize large amounts of complex data is easier than poring over spreadsheets or reports. Data visualization is a quick, easy way to convey concepts in a universal manner, and many analytic tools also come with advanced visualization capabilities.
Why should corporate executives pay attention to Big Data?
As the challenges of big data become better understood, innovative applications are highlighting the potential to glean insights from aggregated sources.
Innovative data aggregators, organizations, and scientists are applying different types of analytic techniques, such as investigative data discovery, descriptive data aggregation, and predictive analytics. Here are a few examples:
- Experian is using Big Data and Machine Learning to reduce Mortgage Application time to a Few Days – Credit reference agency Experian holds around 3.6 petabytes of data from people all over the world. Banks and financial institutions depend on it for insights into the creditworthiness of individuals and businesses that they want to lend to. “Just a few years ago when we did analytics on a dataset it was based on a smaller, representative set of information. Today we don’t really reduce the size of the dataset, we do analytics across a terabyte, or petabyte, and that’s something we couldn’t do before,” said an executive (Forbes.com).
- Introducing a New Coffee at Starbucks – During a recent product rollout, Starbucks’s executives were concerned about the ‘strong taste’ of a new coffee being introduced. On the day of the rollout, its data scientists began monitoring social media – blogs, Twitter, and niche coffee discussion forums – to review customers’ reactions in real time. By mid-morning, Starbucks discovered that while customers were complimenting the taste, a number of them thought it was too expensive. Starbucks immediately lowered the price, and by the end of the day most of the negative comments had disappeared. Such real-time analysis using big-data techniques can help companies react to customer feedback much faster than traditional techniques like waiting for market surveys.
- Drilling for Oil at Chevron – Oil drilling is an expensive business, and each well drilled in the Gulf of Mexico costs Chevron upwards of $100 million. Traditionally, the odds of finding oil have been around 1 in 5. To improve its odds, Chevron invested in a program to digitally analyze seismic data. Its geologists began to leverage advances in computing power and storage capacity to refine their computer models. Chevron was able to improve the odds of a successful well to nearly 1 in 3, resulting in tremendous cost savings.
- Formula One – Formula One cars generate terabytes of data during a typical race. The F1 cars are equipped with hundreds of IoT sensors, which provide a stream of data that is analyzed in real time. During a typical race, dozens of engineers at the track comb over the data as it arrives, looking for any adjustment that could make the difference between winning and losing a race.
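The cost savings in the Chevron example can be made concrete with a little arithmetic. The sketch below is a back-of-the-envelope estimate, not Chevron's own model: it assumes a flat $100 million per well (the article says "upwards of"), treats "nearly 1 in 3" as exactly 1 in 3, and assumes each well succeeds independently, so the expected number of wells per success is simply the inverse of the odds. The function name is hypothetical.

```python
from fractions import Fraction

# Assumption: a flat 100 (million USD) per well; the article says "upwards of $100 million".
COST_PER_WELL_MUSD = 100


def expected_cost_per_success(cost_per_well, success_odds):
    """Expected spend per successful well, assuming every attempt succeeds
    independently with probability success_odds (mean of a geometric distribution)."""
    return cost_per_well / success_odds


# Traditional odds of about 1 in 5 versus the improved odds, rounded to 1 in 3.
before = expected_cost_per_success(COST_PER_WELL_MUSD, Fraction(1, 5))
after = expected_cost_per_success(COST_PER_WELL_MUSD, Fraction(1, 3))

print(f"Before: ${before} million per successful well")   # Before: $500 million ...
print(f"After:  ${after} million per successful well")    # After:  $300 million ...
print(f"Saving: ${before - after} million per success")   # Saving: $200 million ...
```

Under these simplifying assumptions, better seismic analysis would cut the expected spend per successful well from about $500 million to about $300 million.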
Resources and References
Corporate digitization efforts and the need for expertise to guide the transformation translate into opportunities for consultants and software product development firms. Startups have begun exploring new and innovative techniques for big-data management, data aggregation and visualization. A few articles and research reports on the topic:
- Review of Big Data techniques and tools
- What is Big Data? – IBM
- Big data: The next frontier for innovation, competition, and productivity – McKinsey report 2011
- Visualization is the future: 6 startups re-imagining how we consume data – Interesting article from 2013. Many of these startups have since been acquired.
- Enabling Agronomy Data and Analytical Modeling: A Journey
Edited and compiled by: Mohan K | Reproduction with permission only | Contact myDigitalStartup.net