Input Data

Prepare data for DSM processing

The most prevalent challenges for data scientists are:

  • Data accessibility
  • Data reliability
  • Data quality

Data Accessibility

To make data accessible, we recommend that you transfer the necessary data from your data warehouse into your DSM project data store, as shown below.

A lot of public data is available on the web. To help you access it, we provide some free tools here.
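
The warehouse-to-project-store transfer recommended above could be sketched as follows. This is only an illustration: two in-memory SQLite databases stand in for the data warehouse and the DSM project data store, and the function name transfer_table, the table sales, and its columns are all hypothetical. DSM's own import mechanism is not shown here.

```python
import sqlite3


def transfer_table(source, target, table, columns):
    """Copy the selected columns of one table from source to target.

    Table and column names are interpolated into the SQL text, so they
    must come from trusted configuration, never from user input.
    """
    col_list = ", ".join(columns)
    rows = source.execute(f"SELECT {col_list} FROM {table}").fetchall()
    target.execute(f"CREATE TABLE IF NOT EXISTS {table} ({col_list})")
    placeholders = ", ".join("?" for _ in columns)
    target.executemany(
        f"INSERT INTO {table} ({col_list}) VALUES ({placeholders})", rows
    )
    target.commit()
    return len(rows)


# Stand-in "warehouse" with hypothetical sample data.
warehouse = sqlite3.connect(":memory:")
warehouse.execute(
    "CREATE TABLE sales (id INTEGER, region TEXT, amount REAL, internal_note TEXT)"
)
warehouse.executemany(
    "INSERT INTO sales VALUES (?, ?, ?, ?)",
    [(1, "north", 120.0, "x"), (2, "south", 80.5, "y")],
)
warehouse.commit()

# Stand-in "project data store": only the necessary columns are copied.
project_store = sqlite3.connect(":memory:")
copied = transfer_table(warehouse, project_store, "sales", ["id", "region", "amount"])
```

Copying only the columns the project needs keeps the project data store small and leaves warehouse-internal fields behind.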

Data Reliability

Data science projects require reliable data to produce good results. The following are required:

  • Single source of truth
  • Data source traceability
  • Stakeholder collaboration to determine that the data is trusted and viable
  • Data converted to a common frame of reference
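
One possible way to support data source traceability is to store a small lineage record next to every extract. The sketch below is an assumption, not a DSM feature: the SourceRecord class, the describe_source function, and the sample system and table names are hypothetical.

```python
import hashlib
import json
from dataclasses import dataclass
from datetime import datetime, timezone


@dataclass(frozen=True)
class SourceRecord:
    """Hypothetical lineage record kept alongside an extracted data set."""
    source_system: str
    source_table: str
    extracted_at: str
    checksum: str


def describe_source(rows, source_system, source_table):
    """Build a lineage record whose checksum depends only on the row content."""
    payload = json.dumps(rows, sort_keys=True).encode("utf-8")
    return SourceRecord(
        source_system=source_system,
        source_table=source_table,
        extracted_at=datetime.now(timezone.utc).isoformat(),
        checksum=hashlib.sha256(payload).hexdigest(),
    )


rows = [{"id": 1, "amount": 120.0}, {"id": 2, "amount": 80.5}]
lineage = describe_source(rows, "warehouse", "sales")
```

Because the checksum is computed from the rows alone, stakeholders can later confirm that the data they reviewed is the same data the project used.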

Data Quality

Poor-quality data leads to wrong outcomes. Measure your data against the following quality dimensions:

  • Entirety
  • Completeness
  • Consistency
  • Validity
  • Uniqueness
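
As a rough illustration, three of the dimensions listed above (completeness, uniqueness, validity) can be expressed as simple ratios between 0 and 1. The function names, the sample records, and the age rule below are assumptions made for this sketch, and other definitions of each dimension are equally reasonable.

```python
def completeness(records, fields):
    """Fraction of required field values that are present (not None or empty)."""
    total = len(records) * len(fields)
    if total == 0:
        return 1.0
    present = sum(1 for r in records for f in fields if r.get(f) not in (None, ""))
    return present / total


def uniqueness(records, key):
    """Number of distinct key values divided by the number of records."""
    if not records:
        return 1.0
    return len({r.get(key) for r in records}) / len(records)


def validity(records, field, is_valid):
    """Fraction of non-missing values that pass the given rule."""
    values = [r[field] for r in records if r.get(field) is not None]
    if not values:
        return 1.0
    return sum(1 for v in values if is_valid(v)) / len(values)


# Hypothetical sample data with one missing age, one empty country,
# one duplicated id, and one implausible age.
records = [
    {"id": 1, "age": 34, "country": "DE"},
    {"id": 2, "age": None, "country": "FR"},
    {"id": 2, "age": 212, "country": ""},
    {"id": 4, "age": 51, "country": "US"},
]

scores = {
    "completeness": completeness(records, ["id", "age", "country"]),
    "uniqueness": uniqueness(records, "id"),
    "validity": validity(records, "age", lambda a: 0 <= a <= 120),
}
```

Tracking such scores per data set over time makes it easier to see whether quality is improving or degrading between loads.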

Quality issues must be cleansed: the affected records are either deleted or corrected with predicted values.
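
The two cleansing options, deleting a bad record or replacing a bad value with a predicted one, might look like this in a minimal form. The sketch assumes that records without a key cannot be repaired and are dropped, and it uses the mean of the known values as a deliberately simple stand-in for a real prediction model. The function name cleanse and the sample data are hypothetical.

```python
from statistics import mean


def cleanse(records, key, numeric_field):
    """Drop records with a missing key, then fill missing numeric values.

    The fill value is the mean of the known values, a simple placeholder
    for whatever prediction method a project actually uses.
    """
    kept = [dict(r) for r in records if r.get(key) is not None]
    known = [r[numeric_field] for r in kept if r.get(numeric_field) is not None]
    fill = mean(known) if known else None
    for r in kept:
        if r.get(numeric_field) is None:
            r[numeric_field] = fill
    return kept


# Hypothetical raw data: one record lacks an id, one lacks an amount.
raw = [
    {"id": 1, "amount": 100.0},
    {"id": None, "amount": 55.0},
    {"id": 3, "amount": None},
    {"id": 4, "amount": 200.0},
]

clean = cleanse(raw, "id", "amount")
```

The function works on copies, so the raw records stay unchanged and the cleansing step can be repeated or audited later.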

The more high-quality data that data scientists have, the more accurate their outcomes will be.