Describe how cleaning tools are used to prepare data for data mining projects.

People often assume that ETL (extract, transform, and load) is all it takes to ensure data quality. Data quality involves a lot more than ETL; even so, a data analyst should be familiar with the ETL basics: processes, techniques, and tools. Data mining models may not perform well on inaccurate or dirty data. When data is sparse, the time needed to train and test a model can cause a project to fail, because sparse data tends to require more time for exploration and for finding better data to use in training. Understanding the basics of data management, including data quality, can help a data analyst complete a data mining project in less time.

1. Explain how extract, transform, and load (ETL) can affect data quality, data management goals, and data mining projects, both positively and negatively.
2. Describe how cleaning tools are used to prepare data for data mining projects.
3. Explain how to use SAS to create tasks and data flows.
4. Explain how ensuring data integrity with SAS can enhance a data model.
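
As a rough illustration of the kind of cleaning step the second objective refers to, the sketch below trims whitespace, drops rows that are missing a required field, and removes duplicates. It is written in plain Python rather than SAS, and the function name `clean_records`, the field names, and the sample rows are all hypothetical, chosen only for this example.

```python
from typing import Optional


def clean_records(
    rows: list[dict[str, Optional[str]]], required: list[str]
) -> list[dict[str, str]]:
    """Trim whitespace, drop incomplete rows, and remove duplicates."""
    cleaned: list[dict[str, str]] = []
    seen: set[tuple] = set()
    for row in rows:
        # Normalize every value: treat None as empty, strip stray whitespace.
        normalized = {key: (value or "").strip() for key, value in row.items()}
        # Skip rows where any required field is empty after normalization.
        if any(not normalized.get(field) for field in required):
            continue
        # Skip rows whose normalized content has already been kept.
        fingerprint = tuple(sorted(normalized.items()))
        if fingerprint in seen:
            continue
        seen.add(fingerprint)
        cleaned.append(normalized)
    return cleaned


# Hypothetical sample data with typical "dirty" rows.
raw = [
    {"id": " 1 ", "city": "Austin "},  # stray whitespace
    {"id": "1", "city": "Austin"},     # duplicate once trimmed
    {"id": "2", "city": None},         # missing required value
    {"id": "3", "city": " Boston"},
]
result = clean_records(raw, required=["id", "city"])
print(result)
```

Dedicated cleaning tools typically automate the same kinds of checks at a larger scale; the point here is only that dirty rows are normalized or filtered out before a model is trained on them.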