Prepare data for DSM processing
The most prevalent challenges for data scientists are:
- Data accessibility
- Data reliability
- Data quality
Data Accessibility
To make data accessible, we recommend that you transfer the necessary data from your data warehouse into your DSM project data store, as shown below.
A lot of public data is also available on the web. To help you access it, we provide some free tools here.
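As a rough illustration of the transfer step, the sketch below exports the result of a warehouse query to a CSV file. It is not the DSM transfer mechanism: the in-memory SQLite database stands in for a real data warehouse, the temporary directory stands in for the project data store, and the `sales` table and the `export_query_to_csv` helper are hypothetical names chosen for the example.

```python
import csv
import sqlite3
import tempfile
from pathlib import Path


def export_query_to_csv(conn: sqlite3.Connection, query: str, dest: Path) -> int:
    """Run `query` and write the result to `dest` with a header row.

    Returns the number of data rows written.
    """
    cursor = conn.execute(query)
    header = [col[0] for col in cursor.description]
    rows = 0
    with dest.open("w", newline="", encoding="utf-8") as fh:
        writer = csv.writer(fh)
        writer.writerow(header)
        for row in cursor:
            writer.writerow(row)
            rows += 1
    return rows


# Assumption: an in-memory SQLite database plays the role of the warehouse.
warehouse = sqlite3.connect(":memory:")
warehouse.execute("CREATE TABLE sales (id INTEGER, region TEXT, amount REAL)")
warehouse.executemany(
    "INSERT INTO sales VALUES (?, ?, ?)",
    [(1, "north", 120.0), (2, "south", 80.5), (3, "north", 42.0)],
)

# Assumption: a temporary directory plays the role of the project data store.
project_store = Path(tempfile.mkdtemp())
dest = project_store / "sales_north.csv"

# Transfer only the data the project needs, not the whole table.
exported = export_query_to_csv(
    warehouse,
    "SELECT id, amount FROM sales WHERE region = 'north' ORDER BY id",
    dest,
)
```

Selecting only the required columns and rows in the query keeps the project data store small and makes it easier to trace each file back to the query that produced it.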
Data Reliability
Data science projects require reliable data to produce good results. The following are required:
- Single source of truth
- Data source traceability
- Stakeholder collaboration to determine that the data is trusted and viable
- Data converted to a common frame of reference
Data Quality
Poor-quality data leads to wrong outcomes. Data must be measured against the following quality dimensions:
- Entirety
- Completeness
- Consistency
- Validity
- Uniqueness
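Some of these dimensions can be expressed as simple ratios. The sketch below is one possible way to score completeness, uniqueness, and validity on a list of records. The metric definitions, the sample records, and the 0 to 120 age rule are assumptions made for illustration, not definitions taken from DSM.

```python
from typing import Any, Callable

Record = dict[str, Any]


def completeness(records: list[Record], field: str) -> float:
    """Share of records in which `field` is present and not None or empty."""
    if not records:
        return 1.0
    filled = sum(1 for r in records if r.get(field) not in (None, ""))
    return filled / len(records)


def uniqueness(records: list[Record], key: str) -> float:
    """Number of distinct `key` values divided by the number of non-missing values."""
    values = [r[key] for r in records if r.get(key) is not None]
    if not values:
        return 1.0
    return len(set(values)) / len(values)


def validity(records: list[Record], field: str,
             is_valid: Callable[[Any], bool]) -> float:
    """Share of non-missing `field` values that pass the `is_valid` rule."""
    values = [r[field] for r in records if r.get(field) is not None]
    if not values:
        return 1.0
    return sum(1 for v in values if is_valid(v)) / len(values)


# Hypothetical sample data with one missing age, one duplicate id,
# one implausible age, and one empty country.
records = [
    {"id": 1, "age": 34, "country": "DE"},
    {"id": 2, "age": None, "country": "FR"},
    {"id": 2, "age": 210, "country": "FR"},
    {"id": 4, "age": 51, "country": ""},
]

report = {
    "completeness_age": completeness(records, "age"),
    "completeness_country": completeness(records, "country"),
    "uniqueness_id": uniqueness(records, "id"),
    "validity_age": validity(records, "age", lambda v: 0 <= v <= 120),
}
```

Each score lies between 0 and 1, so a project can set a threshold per dimension and flag any field that falls below it.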
Quality issues must be cleansed: the affected records are either deleted or corrected with predicted values.
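The two cleansing options can be sketched as follows. This is a minimal example under stated assumptions: records without a key or with a duplicate key are deleted, and a missing numeric value is corrected with the median of the observed values. The median is only a simple stand-in for a predicted correction; the source does not say which prediction method DSM uses, and the `cleanse` helper and sample data are hypothetical.

```python
from statistics import median
from typing import Any

Record = dict[str, Any]


def cleanse(records: list[Record], key: str, numeric_field: str) -> list[Record]:
    """Return cleansed copies of `records`; the input list is not modified."""
    seen = set()
    kept: list[Record] = []

    # Deletion: drop records with a missing key or a key already seen.
    for r in records:
        k = r.get(key)
        if k is None or k in seen:
            continue
        seen.add(k)
        kept.append(dict(r))

    # Correction: fill missing numeric values with the median of observed ones.
    observed = [r[numeric_field] for r in kept if r.get(numeric_field) is not None]
    if observed:
        fill = median(observed)
        for r in kept:
            if r.get(numeric_field) is None:
                r[numeric_field] = fill
    return kept


# Hypothetical raw data: a missing age, a duplicate id, and a missing id.
raw = [
    {"id": 1, "age": 34},
    {"id": 2, "age": None},
    {"id": 2, "age": 40},
    {"id": None, "age": 29},
    {"id": 3, "age": 50},
]

clean = cleanse(raw, key="id", numeric_field="age")
```

Deletion is the safer choice when the damaged records are few. Predicted corrections preserve more data but add estimated values, so it helps to record which fields were filled in.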
The more high-quality data that data scientists have, the more accurate their outcomes will be.