Models are only as good as the data they’re trained on. But digging into your data to find deficiencies can be a time-consuming and frustrating process. In this session, Voxel51 Co-Founder Brian Moore and ML Engineer Jacob Marks will demonstrate how a systematic, structured approach to improving data quality can streamline your ML workflows and help you achieve state-of-the-art performance. We’ll cover best practices for co-developing data and models, including techniques for active learning, data cleaning, and identifying edge cases. Using the open-source FiftyOne library, we’ll also show how to organize and visualize training data, build and execute data curation workflows, evaluate models, and integrate with other tools in your ML stack, such as annotation and experiment tracking tools.
You’ll walk away from this demonstration with a set of actionable workflows that you can apply to your own ML projects to improve the quality of your training data and your model’s performance.