Trends in Big Data Analytics, Aggregation and Visualization – 2018

We create 2.5 quintillion bytes of data every day, and over 90% of the data in the world today has been created in just the last two years. The data comes from everywhere: social media sites, user-generated content such as digital pictures and videos, purchase transaction records, cell phone GPS signals, and data from devices and sensors, to name just a few sources. Such data from structured and unstructured sources is known as Big Data.

Big data is a term for data sets that are so large or complex that traditional data processing application software is inadequate to deal with them. Challenges include capture, storage, analysis, data curation, search, sharing, transfer, visualization, querying, updating and information privacy. The term “big data” often refers simply to the use of predictive analytics, user behavior analytics, or certain other advanced data analytics methods that extract value from data, and seldom to a particular size of data set. (Wikipedia)

In addition to unstructured data from the web, business applications used by enterprises generate terabytes of data during their daily operations. Making sense of all the data and gleaning insights from it requires tools and people with the right functional and technical expertise. This has given rise to demand for data scientists and the business of managing Big Data.

The key attributes of Big Data that also contribute to its complexity include:

  • Volume – data at rest. Data scientists need to process terabytes and even exabytes of data.
  • Velocity – data in motion, including streaming data and transactions that must respond within milliseconds to seconds.
  • Variety – data in many forms, including structured, unstructured, text, and multimedia. It is difficult to process and manage this massive volume of structured and unstructured data using traditional database and software techniques.
  • Veracity – data in doubt. Uncertainty due to data inconsistency, incompleteness, ambiguities, latency, deception, approximations and unknown sources.

Researchers, analysts and data scientists use the term “data mining” for techniques used to refine the raw data into information or knowledge.


Gaining Insights from Big Data

Efficient business operations require insights from data aggregated from customers, partner organizations, government agencies and external research organizations. In addition, business leaders may seek insights on external trends. This requires continual analysis of inputs from social media, satellites, drones, and sensors that generate vast amounts of unstructured data in a variety of formats. Such analysis may have to be correlated against transactional and reference data from databases like IBM’s DB2, Microsoft’s SQL Server, or Oracle.
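To make the correlation step concrete, here is a minimal sketch in Python using pandas (assumed to be available); the column names, dates, and figures are invented for illustration, not actual data:

```python
import pandas as pd

# Hypothetical daily in-store sales, as they might be extracted from a
# relational database (DB2, SQL Server, Oracle) with a SQL query.
sales = pd.DataFrame({
    "date": ["2018-03-01", "2018-03-02", "2018-03-03"],
    "units_sold": [1200, 950, 1430],
})

# Hypothetical counts of product mentions harvested from social media.
mentions = pd.DataFrame({
    "date": ["2018-03-01", "2018-03-02", "2018-03-03"],
    "mentions": [340, 210, 580],
})

# Correlate the two sources by joining on their shared date key.
combined = sales.merge(mentions, on="date")
correlation = combined["units_sold"].corr(combined["mentions"])
print(f"sales/mentions correlation: {correlation:.2f}")
```

In practice the transactional side would arrive via a SQL query and the social side via an API or stream, but the join-then-correlate pattern is the same.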

During a new product launch, executives may closely monitor its success by reviewing in-store sales data against insights from social media (refer to the Starbucks example in the next section).

The figure below (source) highlights a framework to leverage Big Data techniques including Reporting, Dashboards and Data Discovery in a corporate environment. Such data can be cataloged, indexed, and queried using well-understood tools and techniques.


Figure: Framework for Visualization in a corporate environment


Emerging data and analytic techniques are being used to make sense of structured and unstructured data. Users are embracing results from the analysis of large, real-world data sets drawn from public sources, and such big-data analysis can produce reliable recommendations much more quickly than traditional methods.

Because of the way the human brain processes information, using charts or graphs to visualize large amounts of complex data is easier than poring over spreadsheets or reports. Data visualization is a quick, easy way to convey concepts in a universal manner, and many analytic tools also come with advanced visualization capabilities.
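As a small illustration, the sketch below plots a hypothetical table of monthly transaction volumes as a bar chart; matplotlib is assumed to be available, and the figures are invented. A jump that is easy to miss in a spreadsheet stands out immediately in the chart:

```python
import matplotlib
matplotlib.use("Agg")  # render off-screen, no display required
import matplotlib.pyplot as plt

# Hypothetical monthly transaction volumes (millions). Scanning these
# numbers in a report is slow; a bar chart makes the Q2 jump obvious.
months = ["Jan", "Feb", "Mar", "Apr", "May", "Jun"]
volume = [48, 51, 47, 75, 82, 96]

fig, ax = plt.subplots()
ax.bar(months, volume)
ax.set_xlabel("Month")
ax.set_ylabel("Transactions (millions)")
ax.set_title("Monthly transaction volume")
fig.savefig("volume.png")
```

Many analytic suites generate such charts automatically; the point is that the visual form, not the tool, is what makes the pattern legible.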


Why should corporate executives pay attention to Big Data?

As the challenges of big-data analysis and mining are being understood, innovative applications are highlighting the art of the possible.

Innovative data aggregators, organizations, and scientists are applying analytic techniques such as investigative data discovery, descriptive data aggregation, and predictive analytics. A few examples across businesses:

  • Experian Uses Big Data and Machine Learning to Reduce Mortgage Application Time to a Few Days – Credit reference agency Experian holds around 3.6 petabytes of data from people all over the world. Banks and financial institutions depend on it for insights into the creditworthiness of the individuals and businesses they want to lend to. “Just a few years ago when we did analytics on a dataset it was based on a smaller, representative set of information. Today we don’t really reduce the size of the dataset, we do analytics across a terabyte, or petabyte, and that’s something we couldn’t do before,” said an executive.
  • Introducing a New Coffee at Starbucks – During a recent product rollout, Starbucks executives were concerned about the ‘strong taste’ of a new coffee being introduced. On the day of the rollout, its data scientists began monitoring social media – blogs, Twitter, and niche coffee discussion forums – to review customers’ reactions in real time. By mid-morning, Starbucks discovered that while customers were complimenting the taste, a number of them thought the coffee was too expensive. Starbucks immediately lowered the price, and by the end of the day most of the negative comments had disappeared. Such real-time analysis using big-data techniques can help companies react to customer feedback much faster than traditional techniques like waiting for market surveys.
  • Drilling for Oil at Chevron – Oil drilling is an expensive business, and a single well in the Gulf of Mexico costs Chevron upwards of $100 million. Traditionally, the odds of finding oil have been around 1 in 5. To improve its odds, Chevron invested in a program to digitally analyze seismic data. Its geologists began to leverage advances in computing power and storage to refine their computer models. Chevron was able to improve the odds of drilling a successful well to nearly 1 in 3, resulting in tremendous cost savings.
  • Formula One – A Formula One car generates terabytes of data during a typical race. The F1 cars are equipped with hundreds of IoT sensors that provide a stream of data analyzed in real time. During a typical race, dozens of engineers at the track comb over the data in real time, looking for any adjustment that could make the difference between winning and losing a race.
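The real-time monitoring in the Starbucks and Formula One examples can be sketched in miniature as a rolling window over a stream of events. Everything here (the class, window size, threshold, and sample values) is illustrative, using only the Python standard library:

```python
from collections import deque
from statistics import mean

class RollingMonitor:
    """Keep the last `window` readings and flag unusual spikes."""

    def __init__(self, window=5, threshold=1.5):
        self.readings = deque(maxlen=window)
        self.threshold = threshold

    def observe(self, value):
        """Record a reading; return True if it spikes above the rolling mean."""
        spike = bool(self.readings) and value > self.threshold * mean(self.readings)
        self.readings.append(value)
        return spike

# E.g. negative product mentions per minute, or a sensor channel on a car.
monitor = RollingMonitor(window=5, threshold=1.5)
stream = [10, 12, 11, 13, 12, 30, 12, 11]
alerts = [i for i, v in enumerate(stream) if monitor.observe(v)]
print(alerts)  # → [5], the index of the spike
```

Production systems use stream-processing platforms rather than a single in-memory loop, but the idea is the same: maintain a bounded window of recent events and react the moment a statistic crosses a threshold, instead of waiting for a batch report.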


Resources and References

Corporate digitization efforts and the need for expertise to guide the transformation translates to opportunity for consultants and software product development firms. Startups have begun exploring new and innovative techniques for big-data management, data aggregation and visualization. A few articles and research reports on the topic:

Edited and compiled by: Mohan K | Reproduction with permission only | Contact