
EPSO AD7 Data Management and Data Knowledge Training Set
Introduction
Preparing for the EPSO AD7 Data Management and Data Knowledge competition requires mastering a broad spectrum of topics at a senior level. This training set is designed to mirror that breadth and depth, incorporating a mix of question types, from factual knowledge checks to case-based reasoning and practical scenario questions. The aim is to test not just memorized facts, but also the ability to apply concepts in realistic situations. The question set covers technical foundations, regulatory frameworks, and hands-on problem-solving to reflect the real duties of an AD7 data specialist.
All questions are grounded in the 11 “Typical Duties” listed in Annex II of the official competition notice. These duties span architecture, governance, policy, engineering, platforms, analytics, master data, metadata, quality, security, and AI readiness, essentially the full data management lifecycle. By aligning questions with these areas, the training ensures coverage of the same competencies that candidates’ professional experience is expected to encompass. Each question explicitly maps to one (or more) of the duties, ensuring role-faithfulness: the content remains relevant to tasks one would actually perform as a data expert in the EU context.
The difficulty level is calibrated for AD7 profiles, meaning questions probe an in-depth understanding and senior-level judgment. AD7 candidates are typically required to have 5–7 years of relevant experience on top of a university degree, so the exercises reflect that advanced proficiency. For example, rather than asking basic definitions, many questions present scenarios or data problems that require integrating multiple concepts or making policy-driven decisions. This senior-level calibration is intentional – the competition’s Selection Board ultimately controls test difficulty, and this set is meant to challenge you accordingly while remaining fair.
We have also balanced professional knowledge with applied know-how. On one hand, the questions will assess familiarity with EU data frameworks, standards, and compliance requirements (e.g. understanding what the GDPR covers, or knowing the principles of the European Data Strategy). On the other hand, you will encounter practical questions about tools and best practices – for instance, interpreting a data quality report, choosing an appropriate data architecture for a case, or troubleshooting a data pipeline. This reflects the dual expectation for AD7 experts to be conversant with high-level concepts, policies, and frameworks, and also to demonstrate hands-on practical skills in data management.
The underpinning philosophy of this training set is to ensure relevance, reproducibility, and authenticity. Relevance means many questions are situated in public-sector or EU institutional contexts, such as implementing interoperability across Member State systems or aligning a data policy with EU legislation, reinforcing knowledge of the European dimension. Reproducibility means each question is grounded in standard, publicly documented best practices, so the correct answer can be verified against authoritative sources. Finally, role-faithfulness (or authenticity) means the scenarios and tasks mirror what an EU AD7 data manager might really face, making your practice as practical as possible.
In summary, this introduction has outlined how the training set is structured and why. Below, for each of the 11 Annex II topics, you will find a curated list of free online learning materials. These resources are selected to help you deepen your knowledge and skills in each area, from foundational theories to applied techniques. They include open courses, tutorials, and hands-on guides that you can use directly in your preparation. We have avoided simply listing dry EU regulations or technical specs; instead, we highlight interactive or explanatory materials (courses, videos, practitioner guides) that make the content engaging and easier to absorb. Use these resources to fill any gaps in your understanding, practice relevant tools, and connect concepts to real-world applications. Good luck with your preparation!
Field-related MCQ learning resources
1. Data Architecture and Interoperability
- Interoperable Europe Academy (Free Online Courses): An EU initiative offering self-paced courses, webinars, and workshops to build advanced digital skills in interoperability and data architecture. The academy’s curriculum is aimed at public sector officials and IT professionals, covering reference architectures, semantic interoperability, and cross-border e-services{5}. These courses provide EU-specific context on designing interoperable data systems and frameworks.
- GODAN Open Data Management MOOC (Unit on Interoperability): A free online course (originally for agriculture data) that includes modules on data sharing frameworks and semantic interoperability. Lesson 4.4 of this MOOC explains the basic principles of semantic interoperability, how to use vocabularies/ontologies for consistent data exchange, and practical guidance on choosing and embedding the right metadata schemas{6}{7}. It’s a useful primer on ensuring different systems “speak” the same data language.
- Web of Data (Coursera, EIT Digital): A popular MOOC introducing Linked Data standards and the Semantic Web, which are key to interoperability. It teaches how to publish and consume structured data on the web, covering RDF, SPARQL, ontologies, etc., enabling design of applications that leverage the volume and variety of web data{8}. This course strengthens your understanding of common data models, ontologies, and data exchange standards in an architecture context.
- Enterprise Architecture & Interoperability Tutorials: For a more technical view on designing interoperable systems, consider free resources on enterprise architecture frameworks. For example, TOGAF tutorials on YouTube (e.g. by Edureka or The Open Group) can help you grasp reference architecture concepts and how to standardize system design across an organization. While not EU-specific, they provide insight into structuring complex data systems for interoperability.
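To make the semantic-interoperability idea from the resources above concrete, here is a minimal sketch of a schema "crosswalk": translating records from a national schema into a shared exchange format, with a controlled vocabulary for codes. All field names, labels, and mappings are hypothetical illustrations, not any official EU schema.

```python
# Hypothetical crosswalk from a Member State schema to a shared exchange format.
FIELD_CROSSWALK = {
    "nom": "family_name",
    "prenom": "given_name",
    "pays": "country_code",
}

# Controlled vocabulary: map national labels to ISO 3166-1 alpha-2 codes.
COUNTRY_VOCAB = {"France": "FR", "Belgique": "BE", "Deutschland": "DE"}

def to_exchange_format(record: dict) -> dict:
    """Translate a source record into the shared exchange schema."""
    out = {}
    for src_field, value in record.items():
        target_field = FIELD_CROSSWALK.get(src_field)
        if target_field is None:
            continue  # drop fields with no agreed mapping
        if target_field == "country_code":
            value = COUNTRY_VOCAB.get(value, value)  # normalise to ISO code
        out[target_field] = value
    return out

print(to_exchange_format({"nom": "Dupont", "prenom": "Marie", "pays": "France"}))
# -> {'family_name': 'Dupont', 'given_name': 'Marie', 'country_code': 'FR'}
```

This is the essence of semantic interoperability: agreeing on a common model and shared code lists so that two systems interpret the same value the same way.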
2. Data Governance and Compliance
- Fundamentals of Data Governance (Edureka/Coursera): A comprehensive introductory course that equips you to “create and apply a foundational governance framework that ensures data integrity, security, regulatory compliance, and efficient data management” in an organization{9}. It covers core principles, roles (data owners, stewards), governance models, and tools, with real-world case studies (including GDPR and other regulations){10}. This is great for learning how to set up governance structures and policies.
- Resources.data.gov Data Governance Playbooks: An open US government portal aggregating playbooks, templates, and guides for implementing data governance programs{11}. These resources (though US-centric) are practical and cover how to establish governance bodies, define data stewardship roles, manage data quality policies, and ensure compliance. They provide checklists and best practices that are broadly applicable.
- Understanding the GDPR (MOOC, University of Groningen): A free online course focused on the EU General Data Protection Regulation. It explores individuals’ data rights, obligations of controllers and processors, and enforcement of privacy compliance{12}. If you need a solid grounding in privacy law and ethical data handling, this course is invaluable. It ensures you grasp the compliance requirements under the GDPR, knowledge directly relevant to regulatory compliance duties.
- LightsOnData Governance Resources: LightsOnData offers a collection of free templates, articles, and best-practice guides on data governance and management{13}. Topics include creating data charters, data ownership matrices, and governance maturity assessments. These practical tools can help you understand how organizations implement governance and measure its effectiveness (useful for duties like maturity assessments and audits of data governance).
3. Data Policies and Strategy
- Introduction to Data Strategy and Management (edX): This course lays the foundation for treating data as a strategic asset, showing how to merge business strategy with data strategy{15}. It covers developing data policies, aligning them with organizational goals, and adapting to evolving needs. By completing it, you’ll understand how to craft data strategies and governance policies that support broader objectives, echoing tasks like policy development, stakeholder consultation, and aligning with the European Data Strategy.
- World Bank, Open Data for Policymakers: A self-paced e-learning course designed for public sector policymakers on open data principles and best practices{18}. It gives a thorough overview of data policy frameworks, legal considerations, and the value of open data. Given that AD7 roles may involve drafting policies for data sharing and engaging stakeholders, this resource provides insight into policy levers and how open data strategies are formulated in government.
- Data for Effective Policy Making (IDBx on edX): This course helps you “take control of data” in public policy contexts{19}. It teaches how to use data in planning, management, and evaluation of policies. While not solely about writing data strategy, it strengthens your ability to incorporate data evidence into policy decisions, a skill relevant to developing informed data policies and strategic roadmaps (as mentioned in typical duty 3f).
- OECD or EU Data Strategy Papers (for reference): Although we avoid raw official documents, it can be useful to skim high-level strategy documents like the European Data Strategy (2020) or OECD’s data governance reports. These outline key objectives (e.g. data spaces, AI innovation, data sovereignty) that your strategy should support. Understanding these will help ensure any policy/strategy you devise is aligned with overarching EU objectives{17}. (These documents are publicly available on EU/OECD websites.)
4. Data Engineering and Integration
- Data Engineering Zoomcamp (DataTalks.Club): A free, 9-week project-based course teaching practical data engineering{21}. You’ll learn to build data ingestion pipelines, integrate data from various sources, and set up workflows for ETL. The curriculum spans workflow orchestration, data pipelines, data warehouses, batch processing, and streaming, all applied in a real-world project. This is directly relevant to duties like designing ETL/ELT processes, real-time streaming, and data orchestration. By following along, you gain hands-on experience with tools like Apache Airflow (orchestration), Kafka (streaming), and APIs for integration.
- Confluent Developer Tutorials (Apache Kafka): Confluent’s free online resources provide an excellent beginner’s guide to Apache Kafka and real-time data integration. Their tutorial collection helps you understand key Kafka concepts and how to get started with streaming data pipelines{24}. Since real-time data streaming is explicitly in the duties, these tutorials will teach you how to set up producers/consumers, topic design, and stream processing, reinforcing the concepts of immediate data processing across distributed systems.
- IBM DataOps Methodology (Cognitive Class): A free course introducing DataOps best practices{25}. DataOps is all about automating and integrating data flows for reliable delivery of data. This course covers how to build a repeatable, business-oriented data pipeline framework – touching on version control for data, CI/CD for data integration, and monitoring data processes. It’s highly relevant for understanding how to automate and control data integration processes{26} and ensure continuous data delivery in complex environments.
- Microsoft Learn, API Integration Tutorials: Microsoft’s documentation includes step-by-step tutorials on building and consuming APIs (for example, using Azure API Management or building RESTful services). These free guides can strengthen your knowledge of Application Programming Interfaces and how to facilitate seamless data exchange between systems. Even if you use different technology stacks, the principles of API integration (endpoints, authentication, data format exchange) are broadly applicable.
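To anchor the ETL concepts covered by the resources above, here is a minimal extract-transform-load sketch in pure Python. The data and the quarantine rule are illustrative; a real pipeline would extract from an API, Kafka topic, or file store, and an orchestrator such as Airflow would schedule the steps.

```python
# Minimal ETL sketch: extract from a mocked source, transform (with a
# quarantine for malformed rows), and load into an in-memory "warehouse".

def extract():
    # Stand-in for an API call or a read from Kafka/object storage.
    return [
        {"id": 1, "amount": "12.50", "currency": "EUR"},
        {"id": 2, "amount": "bad", "currency": "EUR"},   # malformed row
        {"id": 3, "amount": "7.00", "currency": "EUR"},
    ]

def transform(rows):
    clean, rejected = [], []
    for row in rows:
        try:
            clean.append({**row, "amount": float(row["amount"])})
        except ValueError:
            rejected.append(row)  # quarantine bad records for later review
    return clean, rejected

def load(rows, warehouse):
    warehouse.extend(rows)
    return len(rows)

warehouse = []
clean, rejected = transform(extract())
loaded = load(clean, warehouse)
print(f"loaded={loaded}, rejected={len(rejected)}")  # loaded=2, rejected=1
```

The quarantine pattern (never silently dropping bad rows) is a common exam-relevant design choice: it keeps the pipeline running while preserving evidence for data quality follow-up.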
5. Data Warehousing and Data Platforms
- IBM Big Data 101 (Cognitive Class): An introductory course that gives the “big picture” of big data architecture and technologies{28}. It explains the characteristics of Big Data (the 4 V’s) and covers foundational platforms like Apache Hadoop and Spark. Through this course, you’ll learn why technologies like Hadoop HDFS or Spark are used for large-scale data storage/processing{29}, knowledge useful for understanding big data components of modern data platforms (duty 5e mentions big data and data mining).
- Microsoft Learn, Data Warehouse Tutorial: A step-by-step walkthrough of building an end-to-end data warehousing solution on the cloud{30}. This free tutorial takes you from data acquisition to data transformation and loading into a warehouse, then to creating reports. Following it helps solidify how data lakes, warehouses, and pipelines work together in practice. It’s excellent practice for duties like designing data warehouse/lake architectures and implementing scalable storage solutions.
- Intellipaat’s Data Warehouse Tutorial: A comprehensive text and video tutorial for beginners covering everything from data warehouse concepts and schemas to ETL and OLAP{32}. It breaks down star vs. snowflake schema, the role of a staging area, and how business intelligence layers query a warehouse. This resource is useful for grasping the conceptual underpinnings behind duty 5a–d (reporting frameworks, warehouse design, cloud storage architecture). It will reinforce why we design warehouses in certain ways for efficient querying and analysis.
- Coursera: Modernizing Data Lakes and Data Warehouses (Google Cloud): This course focuses on the modern data platform paradigm, combining data lakes and warehouses on cloud infrastructure. It goes through building a data lake on Google Cloud Storage and then provisioning a warehouse (BigQuery), and how to ensure they work in a hybrid model. This aligns with understanding cloud and hybrid data storage architectures and how to handle semi-structured or unstructured data at scale.
6. Business Intelligence and Reporting
- Microsoft “Dashboard in a Day” Workshop: A free Power BI workshop (available as an online tutorial on Microsoft Learn) that teaches you to create reports and dashboards step-by-step. You learn how to connect data, build visuals, publish a report, and combine visuals into a dashboard, including mobile layouts{34}. This hands-on training reflects duty 6a (dashboard development) and 6d (report automation/distribution), giving you practical skills in designing interactive BI dashboards and scheduling them for stakeholders.
- Power BI Dashboard Design Best Practices: Microsoft’s guidance on designing effective dashboards provides tips on layout, choosing the right visuals, and making key information stand out{35}. It’s a concise read to improve your data visualization and storytelling abilities. Since BI is not only about generating charts but conveying insights, applying these best practices will help you create dashboards that decision-makers can easily interpret (duty 6a and 6b on self-service analytics).
- Google Data Studio (Looker Studio) Tutorial: Looker Studio is Google’s free BI reporting platform. The Coupler.io blog offers a beginner-friendly tutorial on creating a Data Studio dashboard from scratch, explaining data sources, connectors, metrics, and chart types{36}. By following this, you get exposure to a BI tool widely used in the public domain for quick reporting. It’s useful for understanding how to enable self-service reporting (duty 6b) using cloud-based tools, and how to implement report sharing (related to duty 6d).
- Predictive Analytics Courses: For duty 6c (predictive analytics), consider free courses like “Introduction to Predictive Analytics using Python” (many are available on Coursera or edX to audit). These usually cover regression, forecasting, and using tools like scikit-learn. One example is IBM’s Data Analysis with Python, which has a module on predictive modeling. Gaining familiarity with applying statistical models will help you answer questions on forecasting trends and behaviors in a BI context.
7. Master and Reference Data Management
- Master Data Management (MDM) Fundamentals (Udemy Free Course): A free Udemy tutorial that highlights key MDM concepts, methodologies, and processes{38}. It covers definitions of master vs. reference data, common MDM challenges, and approaches to building a single source of truth. This gives a solid conceptual foundation for duty 7a (master data management) and 7b (reference data standardization). You’ll learn why MDM is needed and how organizations tackle duplicate or inconsistent core data.
- Informatica’s MDM 101 Blog Series: Informatica (a leading data management company) has a “Master Data Management 101” article series aiming to simplify basic MDM concepts for everyone{39}. These blog posts often discuss real-world scenarios of consolidating customer data, managing hierarchies, and governance around master data. They’re great for seeing practical examples of MDM in action, complementing the theory with how enterprises implement MDM solutions (people, process, technology).
- Reference Data Management & Data Quality (Udemy/Class Central): An overview course (free) that ties together reference data management and data quality management{40}. It explains how maintaining reference codes and classifications (duty 7b) intersects with ensuring data quality. Since reference data (like country codes, product categories) must be consistent to avoid garbage-in/garbage-out problems, this resource helps you understand processes to govern reference data sets and their lifecycle.
- Microsoft Purview & CluedIn MDM Workshop: For a hands-on angle, Microsoft’s guided project on building an end-to-end governance and MDM stack (with Azure Purview and CluedIn) is valuable. It walks through deploying a solution that standardizes, deduplicates, and enriches data across an organization{41}{42}. Working through it can deepen your applied skills in setting up automated master data pipelines and using data cataloging tools – experience directly relevant to duties 7c (business glossaries) and 7d (data harmonization across systems).
8. Data Taxonomy and Metadata Management
- UNC Chapel Hill, Metadata Management Tutorial: A basic introduction to what metadata is, the types of metadata (descriptive, structural, administrative), and best practices for assigning metadata{43}. Although aimed at research data management, it’s very useful for grasping metadata cataloguing (duty 8a) and why documenting data is critical. It covers how to create useful metadata records so that data assets can be found and understood by others.
- World Bank, Documenting Data with Metadata Standards: A free online course (15 hours) focused on creating high-quality, standards-based metadata to improve data discoverability and reusability{44}. It emphasizes using international metadata standards and even introduces a tool (Metadata Editor) for hands-on practice. For someone in an AD7 role, understanding standards like DCAT-AP, ISO 19115, or DDI (depending on data domain) is important – this course gives you that insight, aligned with duty 8a (maintaining metadata repositories) and 8b (data lineage and provenance tracking).
- Coursera Blog, “What Is Metadata Management?”: A brief explainer article that outlines how metadata management helps organizations catalog and maintain information about their data{45}. It touches on metadata’s role in data governance and how metadata catalogs work. This is a quick read to reinforce key points: e.g., metadata management involves tools for data dictionaries, business glossaries, and data lineage tracking – directly reflecting duty 8b and 8c.
- W3C and Semantic Web Resources: Since duty 8c mentions ontologies and linked open data, leveraging some free W3C tutorials can help. “Intro to Semantic Web and Linked Data” (a W3C-curated online course or tutorials on w3schools) can teach you the basics of RDF, OWL, and using ontologies. Additionally, the Linked Open Vocabularies (LOV) website is a great place to explore existing ontologies/thesauri. Familiarity with these will prepare you for questions about using semantic technologies and multilingual thesauri in metadata management.
9. Data Quality Management
- Great Expectations for Data Quality (Cognitive Class): Great Expectations is an open-source Python library for data quality checks, and IBM offers a free course on it{46}. Through this, you learn how to profile data, define validation rules, and catch anomalies in datasets. It directly addresses duty 9a (data profiling) and 9c/9d (quality monitoring and tooling) by showing practical techniques to implement those processes. Even if you don’t code in Python daily, seeing how a tool automates data quality checks gives insight into what to look for (completeness, consistency, validity, etc.).
- Data Quality Management 101 (Dataversity): An article by Dataversity that succinctly explains why DQ management is crucial and outlines the practices involved. It notes that low-quality data wastes time and energy (through reprocessing) and can hide operational problems or compliance issues{47}. It then defines data quality management as the set of practices to maintain accurate information through all data handling steps{48}. This read will solidify your conceptual understanding of data quality dimensions and the end-to-end process (from data entry to analysis) of keeping data clean.
- DataCamp, Data Quality Fundamentals: DataCamp has a beginner-friendly module (free to try) that covers key concepts, dimensions, and techniques for monitoring and improving data quality{49}. It’s a quick way to learn about data quality dimensions like accuracy, completeness, timeliness, uniqueness, etc., and common techniques like data cleansing and deduplication. Knowing these dimensions and techniques is essential for answering scenario questions on identifying or fixing data quality issues (duties 9b and 9c).
- Informatica Data Quality & Profiling Videos: Informatica’s free “Train-to-Certify” videos or webinars on data quality (if accessible) are useful to see enterprise-grade approaches. They often demonstrate profiling a dataset, setting up data quality rules, and generating data quality scorecards. Watching a demo can give you a visual understanding of what tools do under the hood for tasks like data parsing, standardization, and monitoring. This aligns with duty 9d’s mention of methods and tools for processing and interpreting data quality assessments.
10. Data Security and Privacy
- IBM Data Privacy Fundamentals: A free course covering data privacy concepts illustrated by five prominent data breach cases{50}. It highlights common vulnerabilities and how organizations can protect personal data. By exploring real breach scenarios, you’ll better understand privacy risk assessment and mitigation – supporting duty 10c (privacy impact assessments) by learning what can go wrong if data protection isn’t robust. The course also reinforces principles of confidentiality, integrity, and availability in data security.
- Alison, GDPR Data Protection Officer Course: Alison offers a free course aimed at data protection officers which provides an in-depth look at GDPR and data security measures. It covers the basics of privacy law, but also practical skills like conducting data protection impact assessments (DPIAs), managing data subject requests, and implementing access controls. This maps to multiple sub-duties: ensuring compliant access control (10a) and encryption practices (10b), as well as general regulatory compliance.
- HubSpot Academy, GDPR and Data Privacy Course: A short, free course that explores the implications of GDPR on business practices and customer data handling{51}. It’s useful for understanding privacy from a business perspective – for example, how to implement consent mechanisms, what “privacy by design” means in practice, and how to train staff on data protection. This kind of knowledge supplements the technical side of security with the governance side of privacy (related to duty 10c).
- OWASP Top 10 and Security Basics: For the technical security aspect, it’s worth reviewing resources like the OWASP Top 10 (common web app security risks) or basic cryptography tutorials. OWASP’s guides are free and will deepen your understanding of access control failures, sensitive data exposure, etc., linking to duty 10a and 10b. Additionally, a tutorial on encryption basics (for instance, a free module on Coursera about cryptography or a YouTube series on TLS/encryption) will give insight into how encryption safeguards data in storage and transit.
11. Advanced Analytics and Data Preparation for AI
- Kaggle, Feature Engineering Course: Kaggle’s free micro-course on Feature Engineering teaches the principles of creating better features for machine learning models{53}. It covers how to identify which raw attributes have the most potential, and how to transform or encode features to improve model performance (e.g. one-hot encoding, scaling, creating interaction terms). This directly addresses duty 11a (feature engineering). Through short lessons and hands-on exercises, you’ll practice turning raw data into model-ready datasets.
- Class Central, Data Labeling Techniques: Class Central lists free courses focusing on data labeling and annotation, crucial for supervised AI. These cover annotation pipelines, quality control in labeling, and programmatic labeling with tools like Snorkel{55}. By learning systematic labeling techniques, you’ll be better prepared for duty 11b (data labelling and annotation for AI). One recommended resource is Snorkel’s own tutorials on weak supervision, which show how to label training data at scale with heuristic functions, very relevant to modern AI data preparation.
- DataCamp, Guide to Data Augmentation: A tutorial (with TensorFlow/Keras examples) that explains various data augmentation techniques and their applications{57}. Although often focused on images, it also touches on text and numeric data augmentation. Understanding augmentation (duty 11c) is key to improving model robustness; this guide will teach you methods like flipping/rotating images, synonym replacement in text, noise addition, etc., and why they help reduce overfitting. Even if you won’t implement them from scratch in an exam, knowing the concepts allows you to answer theoretical questions or propose solutions in case studies.
- Google ML Crash Course, Training Data Preparation: Google’s free Machine Learning Crash Course has a section on preparing and splitting data for ML. It covers practices like normalization, dealing with outliers, splitting data into train/validation/test sets properly, and ensuring that the test data is representative (duty 11d, managing anonymized and representative test datasets). Going through this will reinforce how to curate datasets that yield reliable model evaluations, crucial when dealing with AI model readiness and unbiased performance assessment.
Each of these learning resources is intended to strengthen your proficiency in the corresponding topic. By engaging with them, you can deepen the knowledge required for the AD7 competition and practice the kinds of analyses and decisions that the EU institutions expect of a data management professional. Good luck in your preparation!
Bibliography
{5} Interoperable Europe (IOPEU) Academy - Eipa
{6} {7} Lesson 4.4: Semantic Interoperability | Open Data MOOC
{8} Web of Data | Coursera
{9} {10} Fundamentals of Data Governance | Coursera
{11} Data management & governance | resources.data.gov
{12} Understanding the GDPR | Massive Open Online Courses (MOOC)
{13} Top 10 Data Governance Courses and Training - LightsOnData
{15} Introduction to Data Strategy, Management, and Governance - edX
{18} Open Data for Policymakers (Self-Paced) - World Bank
{19} IDBx: Data for Effective Policy Making - edX
{21} A Guide to Free Online Courses at DataTalks.Club – DataTalks.Club
{24} Intro to Apache Kafka®: Tutorials, Explainer Videos & More
{25} DataOps Methodology course - Cognitive Class
{28} {29} Big Data 101 course | Cognitive Class
{30} Data Warehouse Tutorial: Introduction - Microsoft Fabric
{32} Data Warehouse Tutorial for Beginners - Intellipaat
{34} Dashboard in a Day - Online workshop - Training - Microsoft Learn
{35} Tips for Designing a Great Power BI Dashboard - Microsoft Learn
{36} Looker Studio (Google Data Studio) Tutorial: Dashboard for Beginners
{38} Free Data and Information Management Tutorial - Master ... - Udemy
{39} The Basics of Master Data Management (MDM), Part 1 - Informatica
{40} Understanding Data Quality and Reference Data Management
{41} {42} Build an end to end data governance and master data management stack with Microsoft Purview and CluedIn. - Training | Microsoft Learn
{43} LibGuides: Metadata for Data Management: A Tutorial: Intro
{44} Documenting Development Data Using Metadata Standards
{45} What Is Metadata Management? - Coursera
{46} Great Expectations, a data validation library for Python
{47} {48} Data Quality Management 101 - Dataversity
{49} Introduction to Data Quality Course - DataCamp
{50} 70+ Cognitive Class Courses (2025) | Learn Online for Free
{51} The General Data Protection Regulation (GDPR) Course
{53} Free Course: Feature Engineering from Kaggle | Class Central
{55} Data Labeling Courses and Certifications - Class Central
{57} A Complete Guide to Data Augmentation | DataCamp