Talk
Reality-Centric AI
Responsible AI
Mihaela van der Schaar
Professor of ML, AI and Medicine at the University of Cambridge
Mihaela van der Schaar is the John Humphrey Plummer Professor of Machine Learning, Artificial Intelligence, and Medicine at the University of Cambridge and a Fellow at The Alan Turing Institute in London. In addition to leading the van der Schaar Lab, Mihaela is the founder and director of the Cambridge Centre for AI in Medicine. Mihaela has received numerous awards, including the Oon Prize for Preventative Medicine from the University of Cambridge (2018). She is personally credited as the inventor of 35 US patents and has made over 45 contributions to international standards, for which she received 3 ISO Awards.
Talk
Friends don't let friends deploy black-box models
Data Engineering & MLOps
Explainability
Responsible AI
Rich Caruana
Senior Principal Researcher at Microsoft Research
Rich Caruana is a senior principal researcher at Microsoft Research. Before joining Microsoft, Rich was on the CS faculty at Cornell, at UCLA’s Med School, and at CMU’s Center for Learning and Discovery. Rich’s Ph.D. is from CMU. His thesis on Multi-Task Learning helped create interest in a new subfield of machine learning called Transfer Learning. Rich received an NSF CAREER Award in 2004 for Meta Clustering, best paper awards in 2005, 2007, and 2014, and co-chaired KDD in 2007. His current research focuses on learning for medical decision making, interpretable AI, and large language models.
Talk
Building recommendation systems for e-commerce marketplaces
Data Engineering & MLOps
Data Product Management
Recommendation and Personalization
Bongani Shongwe
Senior Data Engineer at Adevinta
Bongani is a data engineer working in personalization and recommendations for Adevinta's central team for the online classified markets across Europe and North America. He has a background as a full-stack software engineer but swapped to Data Engineering as it allowed him to follow his passion for large-scale distributed systems and related topics. Bongani obtained his MSc in Computer Science from the University of the Witwatersrand in 2014, focusing on peer-to-peer infrastructures for online gaming.
Talk
Creating a large dataset for pretraining LLMs
Deep Learning
Generative AI
NLP
Guilherme Penedo
ML Research Engineer at HuggingFace
With a background in Aerospace Engineering, Guilherme entered the ML world as a Research Engineer at a French startup called LightOn. While at LightOn, he was part of the Falcon team and was in charge of creating the pretraining dataset for the Falcon LLM: the RefinedWeb dataset. After working on the Falcon project, he joined HuggingFace, where he maintains the open-source data processing library "datatrove" and works on improving pretraining datasets as a member of the HuggingFace Science Team.
Talk
Vector Databases (pg_vector) and Real-Time Analytics
Advanced Analytics
Generative AI
NLP
Hubert Dulay
Developer Advocate at StarTree
Hubert Dulay is the O'Reilly author of "Streaming Data Mesh" and "Streaming Databases" (early access). He is a veteran engineer with over 20 years of experience in big and fast data. Hubert draws on his experience consulting for many financial institutions, healthcare organizations, and telecommunications companies, where he provided simple solutions to many data problems. With the massive growth in ML/AI, Hubert helps data practitioners understand how existing data pipelines can be altered for ML/AI.
Talk
Beyond text: Exploring the world with Large Multimodal Models
Advanced Analytics
NLP
Computer Vision
Jules Talloen
Senior Machine Learning Engineer at ML6
As a full-stack ML engineer, Jules is interested in the entire development process, from state-of-the-art research to data pipelines and web development. To stay at the forefront of innovation, he continuously monitors the latest research and enables applications built on top of the latest advancements. Using his technical expertise and drive for perfectionism, Jules translates business needs into high-quality, scalable, and performant data processing solutions.
Talk
First Principles Thinking in real world Machine Learning
Data Career Skills
Data Product Management
Data Team Leadership
Pedro Azevedo
Machine Learning Analyst at Adidas
Pedro Azevedo is a mechanical engineering graduate from the University of Aveiro who researched Autonomous Driving Assistance Systems (ADAS) in the Laboratory of Automation and Robotics at the Department of Mechanical Engineering. After completing his master's degree, Pedro transitioned to the industry, where he worked on real-time ML applied to the manufacturing industry and now works on large-scale machine learning systems, developing end-to-end ML pipelines through use cases in world-class fashion brands such as Adidas.
Talk
Policy as code: Automating compliance with Data Mesh
Data-Centric
Data Engineering & MLOps
Data Product Management
Responsible AI
Shawn Kyzer
Associate Director of Data Engineering at AstraZeneca
Shawn is an innovative technologist and data strategist with over 15 years of experience. He is passionate about harnessing data strategy, engineering, and analytics to help businesses uncover new opportunities. His holistic view of technology and emphasis on developing strong engineering talent, focusing on delivering outcomes while minimizing outputs, sets him apart. Shawn's deep technical knowledge includes distributed computing, cloud architecture, data science, machine learning, and engineering analytics platforms. He worked as a consultant for prestigious clients, from secret-level government organizations to Fortune 500 companies.
Talk
High Mileage of Low Rank Adapters
Data Engineering & MLOps
Deep Learning
NLP
Shikhar Chauhan
Data Scientist at KBC Bank NV
Shikhar Chauhan has worked with Deep Learning for images and text throughout his entire career, from solving fake news with AI to automating electronic waste recycling to automating document processing for a bank and optimizing internal processes using GenAI. Shikhar has also been a contributor to Transformers, Torchvision, and spaCy. He loves to share knowledge and keep up to date with the industry and the latest in the field. Shikhar is fascinated by NLP, Images, and Multi-Modal AI.
Talk
DGCF: Deep Graph-Powered Recommendations
Deep Learning
Recommendation and Personalization
Sofia Bourhim
Research Scientist at ENSIAS
Sofia Bourhim is a Research Scientist working on Graph Machine Learning and its interdisciplinary applications to AI for Good problems. She recently obtained her Ph.D. in the field of AI. She previously interned at the Microsoft research lab (MARI) and is a recipient of the Microsoft PhD Fellowship. Sofia has received recognition for best research papers at various conferences and notable events such as NeurIPS'23 and Gitex'23.
Talk
Unlocking LLMs: From forgiving errors to ethical considerations in domain adaptation
Domain-specific applications of AI
NLP
Responsible AI
Vivek Kumar
Sr Researcher/Scientist at Universität der Bundeswehr München
Vivek Kumar is a Senior Researcher/Scientist and the Chair of Open Source Intelligence at the University of the Federal Armed Forces under the Federal Ministry of Defence, Germany. He coordinates STELAR, an EU Horizon 2020 program targeting "Artificial Intelligence, Data, and Robotics" in agri-food. Dr. Kumar won the Marie Sklodowska-Curie fellowship in 2019 for the EU's Horizon 2020 ITN network program PhilHumans and worked with Philips Research, Netherlands, and the University of Cagliari, receiving his doctorate in 2023. His current research focuses on NLP, Knowledge Graphs, LLMs, and AI Fairness & Bias.
Talk
The ML monitoring flow for models deployed to production
Data Engineering & MLOps
Explainability
Responsible AI
Wojtek Kuberski
Co-Founder and CTO at NannyML
Wojtek Kuberski is an AI professional and entrepreneur with a master's in AI from KU Leuven. He founded Prophecy Labs, a consultancy specializing in machine learning, before getting his current role as a co-founder and CTO of NannyML. NannyML is an open-source library for ML monitoring. At NannyML, he leads the research and product teams, contributing to novel algorithms in the model monitoring space. Wojtek was featured in the Breakout Session at Web Summit for "some of the world's most exciting early-stage startups".

The emerging festival for all data practitioners.

We could claim to be the best data conference in Europe, but we can do better because we're a festival. Join us this September for the only data event you must attend this year.

About the Event

What is
Data Makers Fest?

Data Makers Fest is a festival dedicated to all data makers. A vibrant, must-attend gathering for anyone passionate about making things happen with data. Here's what we're expecting for this second edition.

>1000
Attendees
3
Stages
50
Talks
4
Hands-on Tutorials
2
Conference days
>20
Exhibitors
Join Us

Reasons to join
Data Makers Fest

Access lots of high-quality content
Not the commercial stuff we’ve all seen too much of by now, but the things that really matter.
It’s for the “real” data makers
You know, the ones who actually work with data and not only pretend to.
Meet worldwide data experts
An opportunity to informally network with experts, colleagues, and companies.
It’s a festival!
Not another boring conference. Expect beer, music, fun, and games between the talks.
TOPICS

Explore hot topics in data and AI

Embark on a journey into the future of AI and data with dynamic sessions covering cutting-edge topics such as Responsible AI implementation, advancements in Natural Language Processing, and the transformative power of Generative AI.

Advanced Analytics
Explainability
Data Career Skills
Forecasting
Data-Centric
Generative AI
Data Engineering & MLOps
Information Retrieval
Data Product Management
NLP
Data Team Leadership
Recommendation and Personalization
Data Visualization
Responsible AI
Deep Learning
Domain-specific applications of AI
SPEAKERS

Meet our speakers

Take a sneak peek into the lineup for Data Makers Fest 2024.
We will be announcing more names very soon.

Mihaela van der Schaar
Professor of ML, AI and Medicine at the University of Cambridge
Rich Caruana
Senior Principal Researcher at Microsoft Research
Bongani Shongwe
Senior Data Engineer at Adevinta
Guilherme Penedo
Machine Learning Research Engineer at HuggingFace
Hubert Dulay
Developer Advocate at StarTree
Jules Talloen
Senior Machine Learning Engineer at ML6
Pedro Azevedo
Machine Learning Analyst at Adidas
Shawn Kyzer
Associate Director of Data Engineering at AstraZeneca
Shikhar Chauhan
Data Scientist at KBC Bank NV
Sofia Bourhim
Research Scientist at ENSIAS
Vivek Kumar
Sr Researcher/Scientist at Universität der Bundeswehr München
Wojtek Kuberski
Co-Founder and CTO at NannyML
Get news about Data Makers Fest 2024 agenda
PARTNERS

Meet our partners

From market-leading companies to innovative startups, our partners fuel our journey with unparalleled support and collaboration.
As you see, there's still room for you.

Maker
Builders
Supporters
Communication Partners
Become a Partner

Partner with us!

Partner with Data Makers Fest side by side with market-leading companies. Check our previous partners here and our current ones above.

Find out the corporate opportunities we have for this edition and tailor your plan to your needs. We want you to join the Fest!