A 2017 article in The Economist declared: “The world’s most valuable resource is no longer oil, but data”. Over the past few years, big tech companies such as Google, Facebook, Amazon, and Microsoft, as well as startups, have been using data to build data-driven products and recommendations. Why has data risen to such significance? And how does one analyze this big data to draw conclusions?
We live in a world where data affects almost every aspect of our lives, from personal suggestions such as what to eat to public opinion on politics. Data is generated every second in enormous quantities, and its ability to affect lives is what has driven its value so high. Many technologies have been developed to make good use of this big data, so it is important for us to understand them and keep up with them.
Data is stored as numbers and characters, and it is difficult to understand in its raw form in long tables or Excel sheets. Humans understand and process information better when it is represented visually. A visual representation helps us find patterns and anomalies in the data, which gives us better insight. Below is an example where data is presented as a table, with each row representing the percentage of different languages used by users, sourced from their GitHub profiles. From these observations, can we tell which programming language is used the most, or how many developers use C++?
To answer such questions, it is necessary to find patterns in the data. This can be achieved with data visualization, a technique for representing data as visual objects to convey the key observations. But what is the best way to visualize the data? Should you use a bar graph, a pie chart, a scatter plot, or something else? What should the limits of your graph be? Here are some tips that will help you find these answers.
Why Visualization?
Understanding the purpose of a visualization helps us choose the right visual representation. Take the earnings of two companies in Q1 and Q2 of 2019 as an example. The stacked bar graph on the left suggests that Company X is performing much better than Company Y. However, the line chart on the right shows that Company Y grew in Q2 whereas Company X declined. Hence, the intent of the visualization drives the story the analysis tells.
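To make the contrast concrete, here is a minimal sketch of the two quantities each chart emphasizes. The earnings figures and the helper names `total` and `change` are made up for illustration; the article does not give the actual numbers.

```python
# Hypothetical earnings in arbitrary units; not the figures behind the charts.
earnings = {
    "Company X": {"Q1": 120, "Q2": 100},
    "Company Y": {"Q1": 40, "Q2": 70},
}


def total(company):
    """What a stacked bar emphasizes: combined earnings over both quarters."""
    return sum(earnings[company].values())


def change(company):
    """What a line chart emphasizes: the movement from Q1 to Q2."""
    return earnings[company]["Q2"] - earnings[company]["Q1"]


for name in earnings:
    print(f"{name}: total={total(name)}, Q1->Q2 change={change(name):+d}")
```

With these assumed numbers, Company X leads on the total while Company Y leads on the change, so the choice of chart decides which of the two facts the reader notices first.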
What is the data format?
It is crucial to identify the format of the data and then select charts accordingly. For example, line charts are useful for showing comparisons over time, because they make the rise and fall of a metric clearly visible. Bar graphs, on the other hand, are not the best option for time-varying data: with many time steps, they become very difficult to read. Bar graphs are generally used to show ranking-based comparisons. Plotting the data in sorted order makes these comparisons easier to see.
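As a rough sketch of the sorted bar graph idea, the snippet below ranks the categories before plotting them. It assumes matplotlib as the plotting library (the article does not name a tool), and the language shares, the helper names, and the output file name are hypothetical.

```python
def rank_descending(values):
    """Return (label, value) pairs sorted from largest to smallest."""
    return sorted(values.items(), key=lambda item: item[1], reverse=True)


def plot_ranking(values, path="ranking.png"):
    """Draw a bar graph whose bars are ordered by value."""
    # Imported here so rank_descending can be used without matplotlib installed.
    import matplotlib
    matplotlib.use("Agg")  # render to a file; no display is needed
    import matplotlib.pyplot as plt

    labels, heights = zip(*rank_descending(values))
    fig, ax = plt.subplots()
    ax.bar(labels, heights)
    ax.set_ylabel("Share of users (%)")
    fig.savefig(path)
    plt.close(fig)


# Hypothetical language shares, for illustration only.
usage = {"C++": 9.0, "Python": 31.0, "Java": 18.0, "JavaScript": 24.0}
print(rank_descending(usage))
```

Calling `plot_ranking(usage)` would write the sorted chart to `ranking.png`. For time-varying data, the same values would instead go to `ax.plot` in chronological order, without sorting.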
Do your visuals communicate your data?
It is very important that your visuals represent the information in your data and do not exist only for visual appeal. For example, the two plots below communicate the same information, namely the accuracy of different models on the skeleton action recognition dataset NTU-120. In the plot on the left, the y-axis ranges from 0 to 100 and suggests that there is not much improvement across the models. The plot on the right, however, shows that the state-of-the-art model performs significantly better than the previous methods. Similarly, the minor grid lines in the latter make it easier to read off the accuracy of each model than in the former.
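One possible way to get the tighter axis and the minor grid lines described above is sketched below, again assuming matplotlib. The model names and accuracy values are placeholders rather than real NTU-120 results, and `tight_ylim` is a hypothetical helper.

```python
def tight_ylim(values, padding=0.1):
    """Return y-limits that hug the data, padded by a fraction of its range."""
    low, high = min(values), max(values)
    margin = (high - low) * padding
    if margin == 0:
        margin = 1.0  # all values equal: fall back to a fixed margin
    return low - margin, high + margin


def plot_accuracy(models, accuracies, path="accuracy.png"):
    """Plot accuracies with a tight y-axis and minor grid lines."""
    # Imported here so tight_ylim can be used without matplotlib installed.
    import matplotlib
    matplotlib.use("Agg")  # render to a file; no display is needed
    import matplotlib.pyplot as plt
    from matplotlib.ticker import AutoMinorLocator

    fig, ax = plt.subplots()
    ax.plot(models, accuracies, marker="o")
    ax.set_ylim(*tight_ylim(accuracies))
    ax.yaxis.set_minor_locator(AutoMinorLocator())
    ax.grid(True, which="both", axis="y", linestyle=":")
    ax.set_ylabel("Accuracy (%)")
    fig.savefig(path)
    plt.close(fig)


# Placeholder values, not actual results on NTU-120.
models = ["Model A", "Model B", "Model C"]
accuracies = [60.0, 70.0, 80.0]
print(tight_ylim(accuracies))
```

Replacing the `set_ylim` call with `ax.set_ylim(0, 100)` would reproduce the flatter look of the left-hand plot, which makes it easy to compare the two framings on the same data.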
Aesthetics vs Functionality?
Visualization tools have evolved over the years, and adding aesthetics to a graph is now just a click away. But it is crucial to recognize whether such changes bring any functional advantage or are simply distracting. Let’s take a look at another example. The plots below represent the sales of a retail shop over a week. Although the first one looks more pleasing, its different colors do not add any value. In the second plot, the data is separated into two logical groups, weekdays and weekends. Thus, clustering the data by a common property and color-coding the clusters helps to present the analysis effectively.
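The grouping idea could be sketched as follows, assuming matplotlib once more. The sales figures, the color choices, and the helper names are hypothetical; the point is only that the color is derived from the group, not assigned per bar.

```python
WEEKEND = {"Sat", "Sun"}
GROUP_COLORS = {"Weekday": "tab:blue", "Weekend": "tab:orange"}


def group_of(day):
    """Assign a day to one of the two logical groups."""
    return "Weekend" if day in WEEKEND else "Weekday"


def bar_colors(days):
    """One color per bar, chosen by group rather than by individual day."""
    return [GROUP_COLORS[group_of(day)] for day in days]


def plot_sales(days, sales, path="sales.png"):
    """Draw weekly sales with weekdays and weekends in different colors."""
    # Imported here so the helpers above work without matplotlib installed.
    import matplotlib
    matplotlib.use("Agg")  # render to a file; no display is needed
    import matplotlib.pyplot as plt

    fig, ax = plt.subplots()
    ax.bar(days, sales, color=bar_colors(days))
    ax.set_ylabel("Sales")
    fig.savefig(path)
    plt.close(fig)


# Hypothetical weekly sales, for illustration only.
days = ["Mon", "Tue", "Wed", "Thu", "Fri", "Sat", "Sun"]
sales = [120, 110, 130, 125, 150, 210, 230]
print(bar_colors(days))
```

Because the palette has only two entries, every color in the chart carries meaning, which is the functional advantage the second plot has over the first.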
Visualizing data is important because it helps convey information effectively. While designing a visualization, consider the following questions: Why Visualization? What is the data format? Do your visuals communicate your data? Aesthetics vs Functionality?