Transform data more efficiently – AI school

AWS – SageMaker For Data Scientists

SageMaker Data Wrangler offers a selection of 300+ prebuilt, PySpark-based data transformations so you can transform your data and scale your data preparation workflow without writing a single line of code.

Preconfigured transformations cover common use cases such as flattening JSON files, deleting duplicate rows, imputing missing data with the mean or median, one-hot encoding, and time-series–specific transformers that accelerate the preparation of time-series data for ML. For image data, SageMaker Data Wrangler offers common image augmentations (e.g., blur, enhance, resize) and cleaning operations (e.g., dropping corrupted images and duplicates). You can also author custom transformations in PySpark, SQL, and Pandas. For computer vision (CV) use cases, SageMaker Data Wrangler provides image libraries (imgaug, OpenCV) for creating custom transforms, along with a rich library of code snippets that make custom transformations easier to author.
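To make a few of these operations concrete, here is a minimal Pandas sketch of three of the transformations mentioned above: deleting duplicate rows, imputing missing values with the median, and one-hot encoding. It is only an illustration written in plain Pandas, not Data Wrangler's built-in implementation. The function name `prepare`, the column names `age` and `plan`, and the sample data are all hypothetical, and how the DataFrame is handed to a custom transform step inside Data Wrangler is not shown here.

```python
import pandas as pd


def prepare(df: pd.DataFrame) -> pd.DataFrame:
    """Hypothetical helper: dedupe, impute, and one-hot encode a small table."""
    # Delete exact duplicate rows and renumber the index.
    out = df.drop_duplicates().reset_index(drop=True)
    # Impute missing values in the (assumed) numeric column with its median.
    out["age"] = out["age"].fillna(out["age"].median())
    # One-hot encode the (assumed) categorical column as 0/1 integer columns.
    out = pd.get_dummies(out, columns=["plan"], dtype=int)
    return out


# Made-up sample data: one duplicate row and one missing value.
raw = pd.DataFrame(
    {
        "age": [34.0, 34.0, None, 52.0],
        "plan": ["basic", "basic", "pro", "basic"],
    }
)

clean = prepare(raw)
print(clean)
```

Logic along these lines could serve as a starting point when a prebuilt transformation does not quite fit and a custom Pandas step is needed instead.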
