Get started with MetricFlow
This getting started page walks through a sample workflow to help you create your first metrics in dbt Cloud or with the command-line interface (CLI). It uses the Jaffle shop example project as the data source and is available for you to use.
If you prefer, you can create semantic models and metrics for your own dbt project instead. This page will guide you on how to:
- Create a semantic model using MetricFlow
- Define metrics using MetricFlow
- Test and query metrics locally using MetricFlow
- Run a production job in dbt Cloud
- Set up dbt Semantic Layer in dbt Cloud
- Connect to and query the API with dbt Cloud
MetricFlow allows users to define metrics in their dbt project whether in dbt Cloud or in dbt Core. dbt Core users can use the MetricFlow CLI to define metrics in their local dbt Core project.
However, to experience the power of the universal dbt Semantic Layer and query those metrics in downstream tools, you'll need a dbt Cloud Team or Enterprise account.
Prerequisites
- Have an understanding of key concepts in MetricFlow, which powers the revamped dbt Semantic Layer.
- Have both your production and development environments running dbt version 1.6 or higher. Refer to upgrade in dbt Cloud for more info.
- Use Snowflake, BigQuery, Databricks, Redshift, or Postgres (CLI only; dbt Cloud support coming soon).
- Create a successful run in the environment where you configure the Semantic Layer.
- Note: The Semantic Layer currently supports querying in the deployment environment only (development querying experience coming soon).
- Set up the Semantic Layer API in the integrated tool to import metric definitions.
- Note: To access the API and query metrics in downstream tools, you must have a dbt Cloud Team or Enterprise account. dbt Core or Developer accounts can define metrics using MetricFlow CLI or the dbt Cloud IDE.
New to dbt or metrics? Try our Jaffle shop example project to help you get started!
Create a semantic model
The following steps will walk you through setting up semantic models, which you can do with the dbt Cloud IDE or the CLI. Semantic models consist of entities, dimensions, and measures.
We highly recommend you read the overview of what a semantic model is before getting started. If you're working in the Jaffle shop example, delete the `orders.yml` config, or remove the `.yml` extension so it's ignored during parsing. We'll be rebuilding it step by step in this example.
If you're following the guide in your own project, pick a model that you want to build a semantic manifest from and fill in the config values accordingly.
- Create a new YAML config file for the orders model, such as `orders.yml`. It's best practice to create semantic models in the `/models/semantic_models` directory in your project. Semantic models are nested under the `semantic_models` key. First, fill in the name and appropriate metadata, map it to a model in your dbt project, and specify model defaults. For now, `default_agg_time_dimension` is the only supported default.
```yaml
semantic_models:
  # The name of the semantic model.
  - name: orders
    defaults:
      agg_time_dimension: ordered_at
    description: |
      Order fact table. This table is at the order grain with one row per order.
    # The name of the dbt model and schema
    model: ref('orders')
```
- Define your entities. These are the keys in your table that MetricFlow will use to join other semantic models. These are usually columns like `customer_id`, `order_id`, and so on.
```yaml
# Entities. These usually correspond to keys in the table.
entities:
  - name: order_id
    type: primary
  - name: location
    type: foreign
    expr: location_id
  - name: customer
    type: foreign
    expr: customer_id
```
- Define your dimensions and measures. Dimensions are properties of the records in your table that are non-aggregatable. They provide categorical or time-based context to enrich metrics. Measures are the building blocks for creating metrics. They are numerical columns that MetricFlow aggregates to create metrics.
```yaml
# Measures. These are the aggregations on the columns in the table.
measures:
  - name: order_total
    description: The total revenue for each order.
    agg: sum
  - name: order_count
    expr: 1
    agg: sum
  - name: tax_paid
    description: The total tax paid on each order.
    agg: sum
  - name: customers_with_orders
    description: Distinct count of customers placing orders
    agg: count_distinct
    expr: customer_id
  - name: locations_with_orders
    description: Distinct count of locations with orders
    expr: location_id
    agg: count_distinct
  - name: order_cost
    description: The cost for each order item. Cost is calculated as a sum of the supply cost for each order item.
    agg: sum
# Dimensions. Either categorical or time. These add additional context to metrics.
# The typical querying pattern is Metric by Dimension.
dimensions:
  - name: ordered_at
    type: time
    type_params:
      time_granularity: day
  - name: order_total_dim
    type: categorical
    expr: order_total
  - name: is_food_order
    type: categorical
  - name: is_drink_order
    type: categorical
```
Putting it all together, a complete semantic model configuration based on the orders model would look like the following example:
```yaml
semantic_models:
  # The name of the semantic model.
  - name: orders
    defaults:
      agg_time_dimension: ordered_at
    description: |
      Order fact table. This table is at the order grain with one row per order.
    # The name of the dbt model and schema
    model: ref('orders')
    # Entities. These usually correspond to keys in the table.
    entities:
      - name: order_id
        type: primary
      - name: location
        type: foreign
        expr: location_id
      - name: customer
        type: foreign
        expr: customer_id
    # Measures. These are the aggregations on the columns in the table.
    measures:
      - name: order_total
        description: The total revenue for each order.
        agg: sum
      - name: order_count
        expr: 1
        agg: sum
      - name: tax_paid
        description: The total tax paid on each order.
        agg: sum
      - name: customers_with_orders
        description: Distinct count of customers placing orders
        agg: count_distinct
        expr: customer_id
      - name: locations_with_orders
        description: Distinct count of locations with orders
        expr: location_id
        agg: count_distinct
      - name: order_cost
        description: The cost for each order item. Cost is calculated as a sum of the supply cost for each order item.
        agg: sum
    # Dimensions. Either categorical or time. These add additional context to metrics.
    # The typical querying pattern is Metric by Dimension.
    dimensions:
      - name: ordered_at
        type: time
        type_params:
          time_granularity: day
      - name: order_total_dim
        type: categorical
        expr: order_total
      - name: is_food_order
        type: categorical
      - name: is_drink_order
        type: categorical
```
If you're familiar with writing SQL, you can think of dimensions as the columns you would group by and measures as the columns you would aggregate.
```sql
select
  metric_time_day,  -- time
  country,          -- categorical dimension
  sum(revenue_usd)  -- measure
from
  snowflake.fact_transactions  -- sql table
group by metric_time_day, country  -- dimensions
```
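Once the equivalent measure and dimensions are defined in a semantic model, the same question becomes a single MetricFlow query instead of hand-written SQL. The metric and dimension names below (`revenue_usd`, `country`) are illustrative and assume matching definitions in your project:

```shell
# Query the revenue_usd metric by day and country
# (names are illustrative; substitute your own metric and dimensions)
mf query --metrics revenue_usd --group-by metric_time__day,country
```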
Define metrics
Now that you've created your first semantic model, it's time to define your first metric! You can define metrics with the dbt Cloud IDE or CLI.
MetricFlow supports different metric types like simple, ratio, cumulative, and derived. It's recommended that you read the metrics overview docs before getting started.
You can define metrics in the same YAML files as your semantic models or create a new file. If you want to create your metrics in a new file, create another directory called `/models/metrics`. The file structure for metrics can become more complex from here if you need to further organize your metrics, for example, by data source or business line.
The example metric we'll create is a simple metric that refers directly to the `order_total` measure, which will be implemented as a `sum()` function in SQL. Again, if you're working in the Jaffle shop sandbox, we recommend deleting the original `orders.yml` file, or removing the `.yml` extension so it's ignored during parsing. We'll be rebuilding the `order_total` metric from scratch. If you're working in your own project, create a simple metric like the one below using one of the measures you created in the previous step.
```yaml
metrics:
  - name: order_total
    description: Sum of total order amount. Includes tax + revenue.
    type: simple
    label: Order Total
    type_params:
      measure: order_total
```
- Save your code, and in the next section, you'll validate your configs before committing them to your repository.
To continue building out your metrics based on your organization's needs, refer to Build your metrics for detailed info on how to define different metric types and semantic models.
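As one illustration of another metric type, a ratio metric divides one measure by another. The sketch below reuses the `order_total` and `order_count` measures defined earlier; the metric name and description are illustrative, not part of the example project:

```yaml
metrics:
  - name: average_order_value
    description: Average revenue per order (order_total divided by order_count).
    type: ratio
    label: Average Order Value
    type_params:
      numerator: order_total
      denominator: order_count
```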
Configure the MetricFlow time spine model
MetricFlow requires a time spine for certain metric types and join resolution patterns, like cumulative metrics. You will need to create this model in your dbt project. This article explains how to add the `metricflow_time_spine` model to your project.
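As a rough sketch of what that model might look like, the time spine is a dbt model that produces one row per date in a `date_day` column. The upstream model name here is illustrative; any model (or generated date spine) with one row per date works under these assumptions:

```sql
-- models/metricflow_time_spine.sql
-- One row per day; adjust the source and date range to fit your project.
select date_day
from {{ ref('all_days') }}  -- illustrative: any model with one row per date
```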
Test and query metrics
Support for testing and querying metrics in the dbt Cloud IDE is not available in the current beta but is coming soon. In the meantime, you can use the Preview or Compile buttons in the IDE to run semantic validations and make sure your metrics are defined. On a dbt Cloud Team or Enterprise plan, you can dynamically query metrics with integrated tools using the Semantic Layer API, or test them using SQL client tools like DataGrip, DBeaver, or RazorSQL.
This section explains how to test and query metrics using the MetricFlow CLI (dbt Cloud IDE support coming soon).
Before you begin, you'll need to install the MetricFlow CLI package and make sure you run at least one model.
Install MetricFlow
Install the MetricFlow CLI as an extension of a dbt adapter from PyPI. The MetricFlow CLI is compatible with Python versions 3.8, 3.9, 3.10, and 3.11.
To install, use `pip` with your dbt adapter:
- Create or activate your virtual environment: run `python -m venv venv` to create one, or `source your-venv/bin/activate` to activate an existing one.
- Run `pip install "dbt-metricflow[your_adapter_name]"`, replacing `[your_adapter_name]` with the adapter you use. For example, run `pip install "dbt-metricflow[snowflake]"` if you use a Snowflake adapter.
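Putting the install steps together, a typical session might look like the following (the virtual environment name and the Snowflake adapter are examples; substitute your own):

```shell
# Create and activate a fresh virtual environment
python -m venv venv
source venv/bin/activate

# Install MetricFlow with your dbt adapter (Snowflake shown as an example)
pip install "dbt-metricflow[snowflake]"

# Confirm the CLI is available
mf --help
```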
Query and commit your metrics using the CLI
MetricFlow needs a `semantic_manifest.json` in order to build a semantic graph. To generate a `semantic_manifest.json` artifact, run `dbt parse`. This will create the file in your `/target` directory. If you're working from the Jaffle shop example, run `dbt seed && dbt run` before proceeding to ensure the data exists in your warehouse.
- Make sure you have the MetricFlow CLI installed and up to date.
- Run `mf --help` to confirm you have MetricFlow installed and to view the available commands.
- Run `mf query --metrics <metric_name> --group-by <dimension_name>` to query the metrics and dimensions. For example: `mf query --metrics order_total --group-by metric_time`.
- Verify that the metric values are what you expect. To further understand how the metric is being generated, you can view the generated SQL by adding `--explain` to your query.
- Run `mf validate-configs` to run validation on your semantic models and metrics.
- Commit and merge the code changes that contain the metric definitions.
To streamline your metric querying process, you can connect to the dbt Semantic Layer API to access your metrics programmatically. Refer to Querying the API for metric metadata for the SQL syntax to query metrics using the API.
Run a production job
Before you begin, you must have a dbt Cloud Team or Enterprise multi-tenant deployment, hosted in North America (cloud.getdbt.com login URL).
Once you’ve defined metrics in your dbt project, you can perform a job run in your dbt Cloud deployment environment to materialize your metrics. Only the deployment environment is supported for the dbt Semantic Layer at this moment.
- Go to Deploy in the menu bar
- Select Jobs to re-run the job with the most recent code in the deployment environment.
- Your metric should appear as a red node in the dbt Cloud IDE and in the dbt directed acyclic graph (DAG).
Set up dbt Semantic Layer
You can set up the dbt Semantic Layer in dbt Cloud at the environment and project level. Before you begin:
- You must have a dbt Cloud Team or Enterprise multi-tenant deployment, hosted in North America.
- You must be part of the Owner group, and have the correct license and permissions to configure the Semantic Layer:
- Enterprise plan — Developer license with Account Admin permissions. Or Owner with a Developer license, assigned Project Creator, Database Admin, or Admin permissions.
- Team plan — Owner with a Developer license.
- You must have a successful run in your new environment.
If you're using the legacy Semantic Layer, we highly recommend you upgrade your dbt version to dbt v1.6 or higher to use the new dbt Semantic Layer. Refer to the dedicated migration guide for more info.
In dbt Cloud, create a new deployment environment or use an existing environment on dbt 1.6 or higher.
- Note: Only the deployment environment is currently supported (development experience coming soon).
Navigate to Account Settings and select the specific project you want to enable the Semantic Layer for.
In the Project Details page, navigate to the Semantic Layer section, and select Configure Semantic Layer.
- In the Set Up Semantic Layer Configuration page, enter the credentials you want the Semantic Layer to use specific to your data platform. We recommend credentials have the least privileges required because your Semantic Layer users will be querying it in downstream applications. At a minimum, the Semantic Layer needs to have read access to the schema(s) that contains the dbt models that you used to build your semantic models.
Select the deployment environment you want for the Semantic Layer and click Save.
After saving, you'll be provided with the connection information that allows you to connect to downstream tools. If your tool supports JDBC, save the JDBC URL or its individual components (like environment ID and host).
Save and copy your environment ID, service token, and host, which you'll need when using downstream tools. For more info on how to integrate with partner integrations, refer to Available integrations.
Return to the Project Details page, then select Generate Service Token. You will need Semantic Layer Only and Metadata Only service token permissions.
Great job, you've configured the Semantic Layer 🎉!
Connect and query API
You can query your metrics in a JDBC-enabled tool or use existing first-class integrations with the dbt Semantic Layer.
You must have a dbt Cloud Team or Enterprise multi-tenant deployment, hosted in North America. (Additional region support coming soon)
- To learn how to use the JDBC API and what tools you can query it with, refer to the dbt Semantic Layer API.
- To authenticate, you need to generate a service token with Semantic Layer Only and Metadata Only permissions.
- Refer to the SQL query syntax to query metrics using the API.
To learn more about the sophisticated integrations that connect to the dbt Semantic Layer, refer to Available integrations.
FAQs
If you're encountering some issues when defining your metrics or setting up the dbt Semantic Layer, check out a list of answers to some of the questions or problems you may be experiencing.
How do I migrate from the legacy Semantic Layer to the new one?
How are you storing my data?
Is the dbt Semantic Layer open source?
dbt Cloud Developer or dbt Core users can define metrics in their project, including a local dbt Core project, using the dbt Cloud IDE or the MetricFlow CLI. However, to experience the universal dbt Semantic Layer and access those metrics using the API or downstream tools, users must be on a dbt Cloud Team or Enterprise plan.