Exploratory Data Analytics (EDA)
Exploratory data analytics is needed before building any predictions from the data: we need to understand the nature of the data and see what patterns it contains. So, with this in mind:
Tell Us: What kind of questions would you be able to answer through exploratory analysis?
Describe any specific exploratory analysis you have performed. How does this step help in designing the predictive analytics model? Finally:
Exploratory analysis is not necessarily about answering questions but about generating questions for further investigation. In exploratory analysis we often visualize the data and perform simple data transformations and aggregations such as sum and count. One question that exploratory analysis can answer is “what is the correlation between the variables?”. A correlation heat map answers this quickly.
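To make the correlation question concrete, here is a minimal sketch of the matrix of pairwise Pearson correlations that a heat map would color. It uses only the Python standard library; the column names and values are made up for illustration and are not from any dataset mentioned in this thread.

```python
from math import sqrt

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length numeric lists."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)

def correlation_matrix(columns):
    """Map every pair of column names to the correlation of those columns."""
    names = list(columns)
    return {a: {b: pearson(columns[a], columns[b]) for b in names} for a in names}

# Hypothetical toy data, invented for this example only.
data = {
    "price": [10.0, 12.0, 14.0, 16.0, 18.0],
    "units": [50.0, 44.0, 41.0, 35.0, 30.0],
    "rating": [3.0, 5.0, 2.0, 4.0, 3.0],
}

matrix = correlation_matrix(data)
for name, row in matrix.items():
    print(name, {other: round(r, 2) for other, r in row.items()})
```

In this toy data, price and units move in opposite directions, so their cell would show up as a strongly negative value on the heat map, while the diagonal is always 1.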
In my job, exploratory data analytics helps me choose a new bond fund to add to our portfolios. More than 50 funds are available. I gathered data on all of them, generated a correlation heatmap of the dataset, reduced the dataset by removing some highly correlated variables, and then used k-means cluster analysis to see whether the funds fell into any definitive groupings of similar funds. From there I used one fund per grouping in our portfolio-building process to see whether that grouping would generate the desired outcome.
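The workflow described above (drop highly correlated variables, cluster, pick one fund per group) could be sketched roughly as follows. This is not the poster's actual code: the fund names, feature names, values, the 0.9 correlation threshold, and the choice of two clusters are all assumptions made for illustration, and the k-means here is a bare-bones version seeded with the first k points so the result is repeatable.

```python
from math import sqrt

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length numeric lists."""
    mean_x, mean_y = sum(xs) / len(xs), sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)

def drop_correlated(columns, threshold=0.9):
    """Keep a column only if |r| with every already-kept column is below threshold."""
    kept = {}
    for name, values in columns.items():
        if all(abs(pearson(values, other)) < threshold for other in kept.values()):
            kept[name] = values
    return kept

def kmeans(points, k, iterations=20):
    """Basic Lloyd's algorithm; the first k points seed the centroids."""
    centroids = [list(p) for p in points[:k]]
    labels = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: nearest centroid by squared Euclidean distance.
        labels = [
            min(range(k), key=lambda c: sum((a - b) ** 2 for a, b in zip(p, centroids[c])))
            for p in points
        ]
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, label in zip(points, labels) if label == c]
            if members:
                centroids[c] = [sum(dim) / len(members) for dim in zip(*members)]
    return labels

# Hypothetical funds and features, invented for this example only.
funds = ["A", "B", "C", "D", "E", "F"]
features = {
    "yield_pct": [2.0, 5.0, 2.2, 5.3, 2.1, 5.1],
    "sec_yield_pct": [1.9, 4.8, 2.1, 5.1, 2.0, 4.9],  # nearly duplicates yield_pct
    "duration_yrs": [3.0, 3.5, 4.0, 3.0, 3.5, 4.0],
}

kept = drop_correlated(features, threshold=0.9)
points = list(zip(*kept.values()))
labels = kmeans(points, k=2)

# One representative per cluster: the first fund encountered in each group.
representatives = {}
for fund, label in zip(funds, labels):
    representatives.setdefault(label, fund)

print("kept features:", list(kept))
print("cluster labels:", dict(zip(funds, labels)))
print("representatives:", representatives)
```

In practice the features would usually be standardized before clustering so that no single variable dominates the distance, and the number of clusters would be chosen by inspecting the data rather than fixed in advance.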
Exploratory analysis is crucial in business analytics because it enables practitioners to summarize and report on the main characteristics of a dataset. The major questions for EDA focus on the distribution of the data, the presence of outliers, the correlation between predictors, and the percentage of missing values in the dataset. I compare EDA to detective work: understanding the data before making any statistical inferences.
My current job involves using SQL to handle most of the required deliverables. However, this program has exposed me to various EDA processes in R. An example is the recently completed Laptopsales project in R. The EDA involved understanding the mean, the median, and the existence of outlier prices in the data. Additionally, making tables of factors and understanding the percentage contribution of each variable is a form of EDA to me. We plot time series to ascertain trends, draw histograms to understand distributions, and compute correlation matrices to detect multicollinearity. I am not sure whether unsupervised machine learning approaches like clustering can be considered EDA.
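As a small illustration of the checks listed above, the sketch below computes the percentage of missing values, compares the mean with the median, and flags outliers using the common 1.5 × IQR rule. The price list is invented for this example, and the IQR rule is one convention among several, not necessarily the one used in any project mentioned in this thread.

```python
from statistics import mean, median, quantiles

def missing_pct(values):
    """Percentage of entries recorded as None."""
    return 100.0 * sum(v is None for v in values) / len(values)

def iqr_outliers(values, k=1.5):
    """Values lying more than k * IQR below Q1 or above Q3 (missing values ignored)."""
    present = [v for v in values if v is not None]
    q1, _, q3 = quantiles(present, n=4)
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    return [v for v in present if v < low or v > high]

# Hypothetical prices, invented for this example; None marks a missing value.
prices = [499.0, 520.0, None, 510.0, 505.0, 2999.0, 515.0, None, 495.0, 500.0]
present = [p for p in prices if p is not None]

print("missing %:", missing_pct(prices))
print("mean:", mean(present))
print("median:", median(present))
print("outliers:", iqr_outliers(prices))
```

A mean far above the median, as in this toy data, is a quick hint that a few extreme values are pulling the average and deserve a closer look.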
Exploratory data analytics is a way to explore large datasets in order to become familiar with them, look for patterns, and understand what the data (information) represents. The goal is to get the most out of the dataset: extract meaningful information, find outliers that may skew the data, and identify underlying connections or relationships within the information. Once this is done, the analyst needs to check the data for errors and see how the information trends over time.
When I started working with data in the emergency department, there was no significant dataset we could query for information. From there, we needed to explore the raw data, find the fields that needed to be sourced, and write the CCL code to export to SQL. In my case, I had to do the analysis in reverse, starting from the questions that needed to be answered and finding the fields that supplied the information.
Inaccurate information, or information with large outliers caused by errors, can have a negative impact on predictive analytics, so it is important to explore the data to make sure the information is accurate. With accurate historical data, we were able to create a basic predictive model of ED arrivals. This allows leaders to decide whether to have additional staff report to work or to start flexing staff who are no longer needed. Below is an example of one of our dashboards using predictive modeling based on current arrivals and predicting the trend over the next four hours.
Through exploratory data analytics, I would be able to answer questions about our different products, profitability, and costs. Some specific questions I would ask are: What is the most profitable product? What was the total revenue for the last year? For the last month? What was our total profit for the last year and the last month?
I have performed some exploratory analysis on the current dataset and answered some of the questions above. Once the data is pulled into a data analytics program such as Power BI or Tableau, the process becomes quite easy. By pulling in a few fields, I was able to break out profit by product, for example. The exploratory step is crucial to designing a predictive analytics model because it helps you refine the questions you are attempting to answer and understand the current state of the business, including certain trends. Then you can do predictive analytics and answer questions such as: if current trends hold, where will we be in 6 months? In a year? And so on.
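Outside of a BI tool, the same two steps (break out profit by product, then ask where the trend leads) can be sketched in a few lines. The sales rows below are invented for this example, and the straight-line projection is just the simplest possible reading of "if current trends hold", not a recommended forecasting model.

```python
from collections import defaultdict

# Hypothetical rows: (month_index, product, revenue, cost), invented for this example.
sales = [
    (1, "widget", 100.0, 60.0), (1, "gadget", 50.0, 40.0),
    (2, "widget", 110.0, 65.0), (2, "gadget", 60.0, 45.0),
    (3, "widget", 120.0, 70.0), (3, "gadget", 70.0, 50.0),
    (4, "widget", 130.0, 75.0), (4, "gadget", 80.0, 55.0),
]

# Exploratory step: total profit per product and per month.
profit_by_product = defaultdict(float)
profit_by_month = defaultdict(float)
for month, product, revenue, cost in sales:
    profit_by_product[product] += revenue - cost
    profit_by_month[month] += revenue - cost

most_profitable = max(profit_by_product, key=profit_by_product.get)

def fit_line(xs, ys):
    """Ordinary least-squares slope and intercept for y = slope * x + intercept."""
    mean_x, mean_y = sum(xs) / len(xs), sum(ys) / len(ys)
    slope = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / sum(
        (x - mean_x) ** 2 for x in xs
    )
    return slope, mean_y - slope * mean_x

# Predictive step: extend the monthly profit trend six months past the last month.
months = sorted(profit_by_month)
slope, intercept = fit_line(months, [profit_by_month[m] for m in months])
projected = slope * (months[-1] + 6) + intercept

print("profit by product:", dict(profit_by_product))
print("most profitable:", most_profitable)
print("projected monthly profit in 6 months:", projected)
```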