
A Closer Look at Exploratory Data Analysis: What and Why

What it is

An Exploratory Data Analysis, or EDA, is an exhaustive look at existing data from current and historical surveys conducted by a company.

In addition, the appropriate variables from your company’s customer database—such as information about rate plans, usage, account management, and others—are typically included in the analysis.
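Combining the two sources usually means joining survey responses to account records on a shared customer identifier. The minimal sketch below assumes a hypothetical customer_id field and made-up rate plan and usage values; in practice a tool such as pandas' DataFrame.merge would typically do this job. It also collects survey responses that have no matching account, which is often one of the first data problems an EDA turns up.

```python
def merge_survey_with_accounts(survey_rows, account_rows, key="customer_id"):
    """Attach account fields to each survey response.

    Returns (merged_rows, unmatched_ids). The key name is hypothetical.
    """
    accounts = {row[key]: row for row in account_rows}
    merged, unmatched = [], []
    for response in survey_rows:
        account = accounts.get(response[key])
        if account is None:
            # Survey response with no account record: flag it for review.
            unmatched.append(response[key])
            continue
        # If a field appears in both sources, the survey value wins.
        merged.append({**account, **response})
    return merged, unmatched


# Made-up example records.
survey = [
    {"customer_id": "C001", "satisfaction": 9},
    {"customer_id": "C002", "satisfaction": 4},
    {"customer_id": "C999", "satisfaction": 7},
]
accounts = [
    {"customer_id": "C001", "rate_plan": "premium", "monthly_usage": 310},
    {"customer_id": "C002", "rate_plan": "basic", "monthly_usage": 95},
]

merged, unmatched = merge_survey_with_accounts(survey, accounts)
print(merged)
print(unmatched)  # ['C999']
```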

The intent of the EDA is to determine whether a predictive model is a viable analytical tool for a particular business problem, and if so, which type of modeling is most appropriate.

The deliverable is a low-risk, low-cost, comprehensive report of findings from the univariate analysis (a variable-by-variable look at the data), along with recommendations about whether and how the company should use additional modeling.
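A univariate summary looks at one variable at a time. The following minimal sketch, using only Python's standard library, profiles one numeric field and one categorical field; the field names and values are hypothetical, and None stands in for a missing value.

```python
from collections import Counter
from statistics import mean, median, stdev


def profile_numeric(values):
    """Summarize one numeric field; None marks a missing value.

    Assumes at least one non-missing value.
    """
    present = [v for v in values if v is not None]
    return {
        "count": len(present),
        "missing": len(values) - len(present),
        "min": min(present),
        "max": max(present),
        "mean": mean(present),
        "median": median(present),
        # Sample standard deviation needs at least two values.
        "stdev": stdev(present) if len(present) > 1 else 0.0,
    }


def profile_categorical(values):
    """Frequency table for one categorical field, most common level first."""
    present = [v for v in values if v is not None]
    return {
        "count": len(present),
        "missing": len(values) - len(present),
        "levels": Counter(present).most_common(),
    }


# Hypothetical customer fields with made-up values.
monthly_usage = [120, 95, None, 300, 110, 4500]
rate_plan = ["basic", "premium", "basic", None, "basic", "premium"]

usage_summary = profile_numeric(monthly_usage)
plan_summary = profile_categorical(rate_plan)
print(usage_summary)
print(plan_summary)
```

In this made-up data the mean monthly usage (1025) sits far above the median (120) because of a single value of 4500, the kind of gap that prompts a check for a data-entry error or a genuinely unusual account.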

At the very least, the EDA may reveal aspects of your company’s performance that had previously gone unnoticed.

Why do it

An EDA is a thorough examination meant to uncover the underlying structure of a data set. It matters to a company because it exposes trends, patterns, and relationships that are not readily apparent.

You can’t draw reliable conclusions from a massive quantity of data by just glancing over it. Instead, you have to examine it carefully and methodically through an analytical lens.

Getting a “feel” for this critical information can help you detect mistakes, debunk assumptions, and understand the relationships between different key variables. Such insights may eventually lead to the selection of an appropriate predictive model.
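One common first step toward understanding the relationships between key variables is a table of pairwise correlations. This sketch computes Pearson correlation coefficients in plain Python for a few hypothetical numeric fields with made-up values. Pearson’s r only captures linear association, so it complements rather than replaces plots and a review by someone who knows the business.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length numeric lists."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equal-length lists with at least 2 values")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = sqrt(sum((y - mean_y) ** 2 for y in ys))
    if spread_x == 0 or spread_y == 0:
        raise ValueError("correlation is undefined for a constant column")
    return cov / (spread_x * spread_y)


def correlation_table(columns):
    """Pearson r for every pair of named columns (each pair listed once)."""
    names = list(columns)
    return {
        (a, b): pearson(columns[a], columns[b])
        for i, a in enumerate(names)
        for b in names[i + 1:]
    }


# Hypothetical customer fields, one value per customer.
columns = {
    "employees": [5, 12, 40, 8, 100],
    "annual_revenue_k": [300, 650, 2100, 500, 5200],
    "satisfaction": [7, 6, 8, 9, 5],
}

table = correlation_table(columns)
for (a, b), r in table.items():
    print(f"{a} vs {b}: r = {r:.2f}")
```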

What else you can do

If additional predictive modeling is deemed appropriate, a number of approaches may then be used.

Approach 1: A logistic model (LOGIT)

You may elect to divide a company’s business customers into two distinct classes: those who place a high value on, or express high satisfaction with, company services, and those who don’t. This type of analysis is sometimes referred to as a response model.

A LOGIT model could offer some insight into the factors that drive this customer rating, especially when some of those factors are opinion-oriented (from existing surveys, a survey designed expressly for this purpose, or both). The model would also use appropriate customer data from the company’s various strategic surveys, demographic variables, and so on. One outcome of LOGIT modeling is a probability, or “score,” that can be appended to each customer record in the larger database from which the analysis came.
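The sketch below illustrates the idea under simplifying assumptions that are not part of the approach described above: a hypothetical 1 to 10 satisfaction rating is turned into the two classes (8 or above counts as “high”), two made-up predictors (years as a customer and support tickets filed) are used, and the model is fit by plain gradient descent on the log-loss rather than with a statistics package such as statsmodels or scikit-learn. The fitted probability is then appended to each customer record as its score.

```python
from math import exp


def sigmoid(z):
    """Logistic function, written to avoid overflow for large |z|."""
    if z >= 0:
        return 1.0 / (1.0 + exp(-z))
    ez = exp(z)
    return ez / (1.0 + ez)


def fit_logit(X, y, lr=0.1, epochs=5000):
    """Fit an intercept and weights by batch gradient descent on the log-loss."""
    n, k = len(X), len(X[0])
    w, b = [0.0] * k, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * k, 0.0
        for xi, yi in zip(X, y):
            err = sigmoid(b + sum(wj * xj for wj, xj in zip(w, xi))) - yi
            grad_b += err
            for j in range(k):
                grad_w[j] += err * xi[j]
        b -= lr * grad_b / n
        w = [wj - lr * gj / n for wj, gj in zip(w, grad_w)]
    return w, b


def score(x, w, b):
    """Estimated probability that a customer is in the high-satisfaction class."""
    return sigmoid(b + sum(wj * xj for wj, xj in zip(w, x)))


# Made-up data: x = [years as customer, support tickets filed], rating on 1-10.
customers = [
    {"id": "C001", "x": [5, 0], "rating": 9},
    {"id": "C002", "x": [1, 4], "rating": 3},
    {"id": "C003", "x": [4, 1], "rating": 8},
    {"id": "C004", "x": [2, 3], "rating": 4},
    {"id": "C005", "x": [6, 0], "rating": 10},
    {"id": "C006", "x": [1, 5], "rating": 2},
    {"id": "C007", "x": [3, 2], "rating": 7},
    {"id": "C008", "x": [5, 1], "rating": 9},
]
# Hypothetical cut-off: a rating of 8 or above is the "high satisfaction" class.
y = [1 if c["rating"] >= 8 else 0 for c in customers]
X = [c["x"] for c in customers]

w, b = fit_logit(X, y)
for c in customers:
    c["score"] = score(c["x"], w, b)  # append the score to each record
    print(c["id"], round(c["score"], 2))
```

With real data you would also standardize the predictors, hold out part of the data to check the model, and review the fitted coefficients for plausibility before relying on the scores.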

Approach 2: Recursive partitioning

Recursive partitioning is a technique that uses the same database and survey information but in a different way.

This modeling approach handles categorical variables as well as ordinal and continuous ones. Categorical variables tend to be characteristics, such as type of rate plan, type of business, or location. Continuous variables are numbers, like number of employees or annual revenue. Rating scales from surveys are, strictly speaking, ordinal, but they are commonly treated as numeric in this kind of analysis.

Recursive partitioning provides a “tree” output, and customers branch toward one classification or another based on how they respond to questions or how their behaviors and characteristics are measured.
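Here is a minimal sketch of recursive partitioning in plain Python, assuming hypothetical fields (rate_plan, employees) and a made-up high/low label. It grows a small classification tree by repeatedly choosing the split that most reduces Gini impurity, one common criterion in CART-style trees. Categorical variables get “equals this level” tests, and numeric ones (continuous values, or ordinal ratings coded as numbers) get “at most this threshold” tests. Established tools such as R’s rpart package or scikit-learn’s DecisionTreeClassifier add pruning and other refinements not shown here.

```python
from collections import Counter


def gini(labels):
    """Gini impurity: 0 when every label in the group is the same."""
    n = len(labels)
    return 1.0 - sum((count / n) ** 2 for count in Counter(labels).values())


def candidate_splits(rows, feature):
    """Yield (feature, op, value) tests for one variable."""
    values = {row[feature] for row in rows}
    if all(isinstance(v, (int, float)) for v in values):
        # Numeric (continuous, or ordinal coded as numbers): "value <= t".
        for t in sorted(values)[:-1]:
            yield (feature, "<=", t)
    else:
        # Categorical: "value == level".
        for level in sorted(values):
            yield (feature, "==", level)


def goes_left(row, split):
    feature, op, value = split
    return row[feature] <= value if op == "<=" else row[feature] == value


def build_tree(rows, labels, features, depth=0, max_depth=3):
    """Recursively pick the split that most reduces weighted Gini impurity."""
    majority = Counter(labels).most_common(1)[0][0]
    if depth >= max_depth or len(set(labels)) == 1:
        return majority  # leaf: predict the most common class
    best, best_impurity = None, gini(labels)
    for feature in features:
        for split in candidate_splits(rows, feature):
            left = [y for r, y in zip(rows, labels) if goes_left(r, split)]
            right = [y for r, y in zip(rows, labels) if not goes_left(r, split)]
            if not left or not right:
                continue
            impurity = (
                len(left) * gini(left) + len(right) * gini(right)
            ) / len(labels)
            if impurity < best_impurity:
                best, best_impurity = split, impurity
    if best is None:
        return majority
    branches = {}
    for side, keep in (("left", True), ("right", False)):
        idx = [i for i, r in enumerate(rows) if goes_left(r, best) == keep]
        branches[side] = build_tree(
            [rows[i] for i in idx], [labels[i] for i in idx],
            features, depth + 1, max_depth,
        )
    return {"split": best, **branches}


def classify(tree, row):
    """Follow the branches until a leaf (a class label) is reached."""
    while isinstance(tree, dict):
        tree = tree["left"] if goes_left(row, tree["split"]) else tree["right"]
    return tree


# Made-up business customers and their (hypothetical) satisfaction class.
rows = [
    {"rate_plan": "premium", "employees": 50},
    {"rate_plan": "basic", "employees": 5},
    {"rate_plan": "premium", "employees": 120},
    {"rate_plan": "basic", "employees": 80},
    {"rate_plan": "premium", "employees": 10},
    {"rate_plan": "basic", "employees": 45},
    {"rate_plan": "premium", "employees": 40},
    {"rate_plan": "basic", "employees": 15},
]
labels = ["high", "low", "high", "low", "low", "low", "high", "low"]

tree = build_tree(rows, labels, ["rate_plan", "employees"])
print(tree)
print(classify(tree, {"rate_plan": "premium", "employees": 75}))  # high
```

On this data the tree first separates basic from premium plans, then splits premium customers at 10 employees, so a premium customer with 75 employees branches toward the high class.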

No matter which course of action your company decides to take, the first step is always an EDA. It’s an important component of the marketing research process that allows data to be organized, reviewed, and interpreted for the benefit of your business.