FAQs
Section dedicated to frequently asked questions.
Frequent Terms
Data Products
A data product is essentially a dataset that has been designed, packaged, and published in a way that makes it easily consumable by others in the organization. Key aspects of a data product are listed below:
- Domain-Driven: Data products are aligned with specific business domains or subjects (e.g., finance, marketing, operations). This ensures that data is organized around how the business operates and understands its data, rather than around technical constructs like databases or systems.
- Data-Driven: A data product is a dataset or a collection of datasets that provides value to consumers. It could be raw data, derived data (like aggregations or calculations), or even machine learning models and their outputs.
- Packaged: Data products are packaged with all the metadata necessary for consumption. This includes data dictionary information, data quality metrics, usage guidelines, data lineage details, etc.
- Governed: Each data product has clear ownership and governance processes. This ensures that data is accurate, complete, timely, secure, and compliant with relevant regulations.
- Discoverable: Data products are catalogued in a way that makes them easily discoverable by data consumers. This could be through a data catalog or other discovery tools.
- Usable: Data products are designed to be used by their intended audience. They may be offered in different formats (e.g., CSV, JSON, Parquet) based on the preferences and capabilities of the consumers.
Data Contracts
A Data Contract is an agreement between the Data Product and a consumer (the team or individual who wants to use the data). It is similar to a software contract, in which a service provides certain guarantees about its API so that consumers can rely on it. Key aspects of a data contract are listed below:
- Data Product Definition: A data contract starts with a clear definition of the data product itself. This includes:
  - Data product name and ID
  - Business domain or subject area
  - Purpose and intended use cases
  - List of datasets or data entities included
- Data Schema and Structure: The contract specifies the schema and structure of the data, including:
  - Column names and data types
  - Primary keys and foreign keys (if applicable)
  - Associated Model
  - Data format (e.g., CSV, Parquet, JSON)
- Data Freshness: The contract states how frequently the data will be updated or refreshed.
- Data Governance: The contract outlines governance processes such as:
  - Access controls and permissions
  - Data quality checks and SLAs (Service Level Agreements)
  - Data lineage tracking and auditing
- Data Availability: The contract specifies when and how the data will be available for consumption.
- Data Usage Guidelines: The contract includes any specific instructions or recommendations for using the data, such as:
  - Joining with other datasets
  - Specific aggregations or calculations required
  - Known limitations or biases in the data
- Data Deprecation/Retirement Policy: The contract should also include a process for when and how the data product might be deprecated or retired.
- Points of Contact: Lastly, the contract specifies who to contact if there are any issues with the data or questions about its usage.
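To make these aspects more concrete, here is a minimal sketch of how a data contract might be represented and checked in Python. Everything in it is an assumption made for illustration: the class names `Column` and `DataContract`, the field names, the `schema_violations` and `is_fresh` helpers, and the example values do not come from any particular tool or standard, and only a subset of the aspects above (definition, schema, freshness, points of contact) is modeled.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from datetime import datetime, timedelta, timezone


@dataclass
class Column:
    """One column in the agreed schema (hypothetical structure)."""
    name: str
    dtype: str
    primary_key: bool = False


@dataclass
class DataContract:
    """Illustrative subset of the contract aspects listed above."""
    # Data product definition
    product_id: str
    product_name: str
    domain: str
    purpose: str
    datasets: list[str]
    # Data schema and structure
    columns: list[Column]
    data_format: str
    # Data freshness: maximum allowed age of the latest refresh
    max_staleness: timedelta
    # Points of contact
    contacts: list[str] = field(default_factory=list)

    def schema_violations(self, actual: dict[str, str]) -> list[str]:
        """Compare an actual {column name: type} mapping with the contract."""
        problems = []
        for col in self.columns:
            if col.name not in actual:
                problems.append(f"missing column: {col.name}")
            elif actual[col.name] != col.dtype:
                problems.append(
                    f"type mismatch for {col.name}: "
                    f"expected {col.dtype}, got {actual[col.name]}"
                )
        return problems

    def is_fresh(self, last_refreshed: datetime, now: datetime) -> bool:
        """True if the last refresh is within the agreed staleness window."""
        return now - last_refreshed <= self.max_staleness


# Example values are invented for illustration only.
contract = DataContract(
    product_id="dp-001",
    product_name="orders",
    domain="finance",
    purpose="Monthly revenue reporting",
    datasets=["orders"],
    columns=[
        Column("order_id", "string", primary_key=True),
        Column("amount", "decimal"),
    ],
    data_format="Parquet",
    max_staleness=timedelta(hours=24),
    contacts=["finance-data-team@example.com"],
)
```

A consumer or a pipeline check could call `contract.schema_violations(...)` with the column types it actually observes, and `contract.is_fresh(...)` with the timestamp of the last refresh, to detect when the producer has drifted from the agreement. Governance, availability, usage guidelines, and deprecation policy are left out of the sketch to keep it short.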
Domain Team
Each data contract is associated with an upstream data product, which is linked either to another data product or, at the very top, to a source system. Every source system is linked to a domain team that is responsible for it. A domain team associated with data products has the following responsibilities and composition:
Responsibilities:
- Domain Expertise: Serve as subject matter experts for their respective business domains, understanding the data needs, use cases, and KPIs.
- Data Ownership: Take ownership of domain-specific data assets, ensuring they are accurate, complete, and up-to-date.
- Data Governance: Establish and maintain data governance policies within their domains, such as data quality rules, access controls, and metadata management.
- Data Productization: Collaborate with data engineers to design, develop, and maintain data products that meet the domain's needs.
- Stakeholder Communication: Act as a point of contact between business stakeholders, data engineers, and data consumers within their domains.
- Data Lineage and Metadata Management: Maintain up-to-date lineage information and metadata for better traceability and discoverability of data assets.
- Continuous Improvement: Monitor data usage patterns, gather feedback from users, and drive improvements to data products and governance processes.
Composition:
- Domain Owner/Lead: A representative from the business domain who has decision-making authority and ensures alignment with business objectives.
- Data Stewards: Domain experts responsible for day-to-day data management tasks, such as maintaining data quality, enforcing access controls, and managing metadata.
- Data Engineers/Data Product Owners: Technical team members responsible for designing, implementing, and supporting domain-specific data products and pipelines.
- Business Analysts/Stakeholders: Representatives from the business who provide insights into data needs, use cases, and KPIs.
Concept linking
Domain Teams, Source Systems, Data Products, and Data Contracts are linked as below:
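As a rough illustration in code, the chain described under Domain Team (a data contract points to a data product, which points to another data product or to a source system, which in turn belongs to a domain team) could be modeled as follows. This is a hypothetical sketch: the class names, the single `upstream` link per data product, the `responsible_team` helper, and the example names are all assumptions made for the example, and a real platform may allow several upstream sources per product.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Union


@dataclass
class DomainTeam:
    name: str


@dataclass
class SourceSystem:
    name: str
    owner: DomainTeam  # every source system is linked to a domain team


@dataclass
class DataProduct:
    name: str
    # Assumption: exactly one upstream, either another product or a source.
    upstream: Union[DataProduct, SourceSystem]


@dataclass
class DataContractLink:
    consumer: str
    product: DataProduct  # the data product the contract is associated with


def responsible_team(contract: DataContractLink) -> DomainTeam:
    """Walk upstream from the contract's product to the owning domain team."""
    node: Union[DataProduct, SourceSystem] = contract.product
    # Follow product-to-product links until a source system is reached.
    # Assumes the upstream chain contains no cycles.
    while isinstance(node, DataProduct):
        node = node.upstream
    return node.owner


# Example values are invented for illustration only.
finance_team = DomainTeam("Finance")
erp = SourceSystem("erp", owner=finance_team)
raw_invoices = DataProduct("raw_invoices", upstream=erp)
monthly_revenue = DataProduct("monthly_revenue", upstream=raw_invoices)
link = DataContractLink(consumer="marketing-analytics", product=monthly_revenue)

print(responsible_team(link).name)
```

Tracing upstream like this is one way to answer "who is ultimately responsible for the data behind this contract?" without storing the team on every data product.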