1.2.3 Understand Data
The data understanding phase involves data collection and exploration, which includes loading the data and analyzing it in the context of your business problem.
Assess the various data sources and formats. Load data into appropriate data management tools, such as Oracle Database. Explore relationships in data so it can be properly integrated. Query and visualize the data to address specific data mining questions, such as the distribution of attributes and the relationships between pairs or small numbers of attributes, and perform simple statistical analysis. As you take a closer look at the data, you can determine how well it can be used to address the business problem. You can then decide to remove some of the data or add additional data. This is also the time to identify data quality problems by asking questions such as:
- Is the data complete?
- Are there missing values in the data?
- What types of errors exist in the data and how can they be corrected?
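The data quality questions above can be turned into simple checks. The following is a minimal sketch in Python using only the standard library; the sample rows, the column names (`customer_id`, `age`, `income`), the helper functions, and the plausible age range of 0 to 120 are all assumptions made for illustration and are not part of any Oracle API.

```python
# Hypothetical sample rows; None marks a missing value.
rows = [
    {"customer_id": 1, "age": 34, "income": 52000.0},
    {"customer_id": 2, "age": None, "income": 61000.0},
    {"customer_id": 3, "age": 29, "income": None},
    {"customer_id": 4, "age": 240, "income": 48000.0},  # implausible age
]


def missing_counts(rows):
    """Count missing (None) values per column."""
    columns = rows[0].keys()
    return {col: sum(1 for r in rows if r.get(col) is None) for col in columns}


def out_of_range(rows, column, low, high):
    """Return rows whose value in `column` is present but outside [low, high]."""
    return [
        r for r in rows
        if r.get(column) is not None and not (low <= r[column] <= high)
    ]


# Completeness: how many values are missing in each column?
missing = missing_counts(rows)

# Errors: which rows hold an age outside the assumed plausible range?
suspect_ages = out_of_range(rows, "age", 0, 120)

print(missing)
print([r["customer_id"] for r in suspect_ages])
```

Whether a flagged row should be corrected, imputed, or removed depends on the business problem; the checks only surface candidates for review.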
To summarize, in this phase, you will:
- Access and collect data
- Explore data
- Assess data quality
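As an illustration of the exploration step, the sketch below looks at the distribution of one attribute, summarizes another with simple statistics, and measures the relationship between a pair of attributes. It is written in Python with the standard library only; the sample values, the variable names, and the `pearson` helper are hypothetical, and in practice the data would typically be queried from a database rather than typed in.

```python
from collections import Counter
from statistics import mean, median

# Hypothetical sample data, one entry per customer.
ages = [23, 35, 35, 41, 52, 29]
incomes = [30000, 48000, 50000, 61000, 75000, 39000]
segments = ["A", "B", "B", "C", "B", "A"]


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    var_x = sum((x - mx) ** 2 for x in xs)
    var_y = sum((y - my) ** 2 for y in ys)
    return cov / (var_x * var_y) ** 0.5


# Distribution of a categorical attribute.
segment_distribution = Counter(segments)

# Simple statistical summary of a numeric attribute.
age_summary = {
    "min": min(ages),
    "max": max(ages),
    "mean": mean(ages),
    "median": median(ages),
}

# Relationship between a pair of attributes.
age_income_corr = pearson(ages, incomes)

print(segment_distribution)
print(age_summary)
print(round(age_income_corr, 3))
```

A correlation close to 1 or -1 suggests a strong linear relationship between the two attributes, which can inform which attributes to keep, combine, or investigate further.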