Exploratory Data Analysis Defined
Exploratory data analysis (EDA) is the critical process of performing initial investigations on data to discover patterns, spot anomalies, test hypotheses, and check assumptions using summary statistics and graphical representations.
The objectives of EDA include:
- Suggesting hypotheses about the causes of observed anomalies
- Assessing assumptions on which statistical inference will be based
- Supporting the selection of appropriate statistical tools and techniques
Exploratory data analysis can also help with:
- Generating questions about a user's data
- Searching for answers by visualizing, transforming, and modelling a user's data
- Refining questions and/or generating new questions
- Detecting mistakes or anomalies
- Allowing for preliminary selection of appropriate models
- Determining relationships among the explanatory variables
- Assessing the direction and size of relationships between explanatory and outcome variables
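A few of the items above (summarizing data, detecting anomalies, and assessing the direction and size of a relationship) can be sketched in a handful of lines. The example below is a minimal illustration using only Python's standard library. The data set (hours studied versus exam score) is invented for the example, and the helper names `summarize`, `iqr_outliers`, and `pearson` are hypothetical, not part of any library.

```python
import math
import statistics

# Hypothetical sample: hours studied vs. exam score.
# The last score is deliberately suspicious (an invented data-entry error).
hours = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0]
scores = [52.0, 55.0, 61.0, 64.0, 70.0, 73.0, 79.0, 5.0]


def summarize(values):
    """Return basic summary statistics for a list of numbers."""
    return {
        "mean": statistics.mean(values),
        "median": statistics.median(values),
        "stdev": statistics.stdev(values),
        "min": min(values),
        "max": max(values),
    }


def iqr_outliers(values, k=1.5):
    """Flag values outside [Q1 - k*IQR, Q3 + k*IQR] (Tukey's rule)."""
    q1, _, q3 = statistics.quantiles(values, n=4)
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    return [v for v in values if v < low or v > high]


def pearson(xs, ys):
    """Pearson correlation: the sign gives direction, the magnitude gives strength."""
    mean_x, mean_y = statistics.mean(xs), statistics.mean(ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


print(summarize(scores))
print("Possible anomalies:", iqr_outliers(scores))
print("Correlation with the anomaly:", round(pearson(hours, scores), 3))
print("Correlation without it:", round(pearson(hours[:7], scores[:7]), 3))
```

Comparing the two correlations shows why anomaly detection usually comes early in EDA: a single bad record can hide, or even reverse, an otherwise strong relationship.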
Techniques commonly used in EDA include:
- Clustering and dimension reduction techniques
- Univariate visualization of each field in the raw dataset
- Bivariate visualizations and summary statistics
- Multivariate visualizations and mapping
- K-means clustering
- Predictive models, e.g., linear regression
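To make two of these techniques concrete, here is a rough sketch of k-means clustering and simple linear regression in plain Python. It is a toy illustration, not a production implementation: the data is invented, the clustering is restricted to one-dimensional points with caller-supplied starting centroids, and the function names `kmeans_1d` and `fit_line` are hypothetical. In practice a library such as scikit-learn would normally be used instead.

```python
def kmeans_1d(points, centroids, max_iter=100):
    """Toy k-means on 1-D data, starting from caller-supplied centroids."""
    centroids = list(centroids)
    clusters = []
    for _ in range(max_iter):
        # Assignment step: attach each point to its nearest centroid.
        clusters = [[] for _ in centroids]
        for p in points:
            nearest = min(range(len(centroids)), key=lambda i: abs(p - centroids[i]))
            clusters[nearest].append(p)
        # Update step: move each centroid to the mean of its cluster.
        # An empty cluster keeps its previous centroid.
        updated = [
            sum(c) / len(c) if c else centroids[i]
            for i, c in enumerate(clusters)
        ]
        if updated == centroids:
            break  # converged: assignments can no longer change
        centroids = updated
    return centroids, clusters


def fit_line(xs, ys):
    """Ordinary least squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


# Invented data: two obvious groups of values.
points = [1.0, 1.2, 0.8, 5.0, 5.2, 4.8]
centroids, clusters = kmeans_1d(points, centroids=[min(points), max(points)])
print("Centroids:", centroids)
print("Clusters:", clusters)

# Invented data lying exactly on y = 2x + 1.
slope, intercept = fit_line([1.0, 2.0, 3.0, 4.0], [3.0, 5.0, 7.0, 9.0])
print("Slope:", slope, "Intercept:", intercept)
```

In an exploratory setting the fitted slope and the cluster centers are starting points for further questions rather than final answers, which is why these models sit alongside visualizations in the list above.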
In Data Defined, we help make the complex world of data more accessible by explaining some of the field's most difficult concepts.