Talk in Workshop: Decentralization and Trustworthy Machine Learning in Web3: Methodologies, Platforms, and Applications
Invited Talk: Richard Socher, Swetha Mandava, Zairah Mustahsan - Becoming Data-Centric at You.com - a privacy-focussed search engine
Zairah Mustahsan · Mani Swetha Mandava
Being data-driven improves decision-making outcomes and enables automation, but building data-driven tooling and culture is complex and challenging, especially for startups with limited resources. We will discuss how we built an analytics platform from scratch at you.com that protects user privacy while driving decision-making across the organization.
The amount of data created daily is rising exponentially, and harnessing that data effectively and ethically is crucial for success in today’s world. We’ll talk about automatic data collection under privacy constraints and the infrastructure setup for data ingestion (Kafka), persistence (Delta Lake, CosmosDB), processing (Spark), access, and analytics platforms (Scuba, Databricks). We’ll walk through the lessons learned from using this mostly unstructured and unlabelled data for A/B testing and for training our search and ranking models, the importance of defining custom metrics specific to your product, and the organizational changes needed to drive adoption of, and confidence in, data-centric approaches.