Get started with MetricFlow
This getting started page walks through a sample workflow to help you create your first metrics in dbt Cloud or with the command-line interface (CLI). It uses the Jaffle shop example project as the data source and is available for you to use.
If you prefer, you can create semantic models and metrics for your own dbt project instead. This page will guide you on how to:
- Create a semantic model using MetricFlow
- Define metrics using MetricFlow
- Test and query metrics locally using MetricFlow
- Run a production job in dbt Cloud
- Set up dbt Semantic Layer in dbt Cloud
- Connect to and query the API with dbt Cloud
MetricFlow allows users to define metrics in their dbt project whether in dbt Cloud or in dbt Core. dbt Core users can use the MetricFlow CLI to define metrics in their local dbt Core project.
However, to experience the power of the universal dbt Semantic Layer and query those metrics in downstream tools, you'll need a dbt Cloud Team or Enterprise account.
Prerequisites
- Have an understanding of key concepts in MetricFlow, which powers the revamped dbt Semantic Layer.
- Have both your production and development environments running dbt version 1.6 or higher. Refer to upgrade in dbt Cloud for more info.
- Use Snowflake, BigQuery, Databricks, Redshift, or Postgres (CLI only; dbt Cloud support coming soon).
- Create a successful run in the environment where you configure the Semantic Layer.
- Note: The Semantic Layer currently supports querying in the deployment environment only (development querying experience coming soon).
- Set up the Semantic Layer API in the integrated tool to import metric definitions.
- Note: To access the API and query metrics in downstream tools, you must have a dbt Cloud Team or Enterprise account. dbt Core or Developer accounts can define metrics using MetricFlow CLI or the dbt Cloud IDE.
New to dbt or metrics? Try our Jaffle shop example project to help you get started!
Create a semantic model
The following steps will walk you through setting up semantic models, which you can do with the dbt Cloud IDE or the CLI. Semantic models consist of entities, dimensions, and measures.
We highly recommend you read the overview of what a semantic model is before getting started. If you're working in the Jaffle shop example, delete the `orders.yml` config, or remove the `.yml` extension so it's ignored during parsing. We'll be rebuilding it step by step in this example.
If you're following the guide in your own project, pick a model that you want to build a semantic manifest from and fill in the config values accordingly.
- Create a new YAML config file for the orders model, such as `orders.yml`. It's best practice to create semantic models in the `/models/semantic_models` directory in your project. Semantic models are nested under the `semantic_models` key. First, fill in the name and appropriate metadata, map it to a model in your dbt project, and specify model defaults. For now, `default_agg_time_dimension` is the only supported default.
```yaml
semantic_models:
  # The name of the semantic model.
  - name: orders
    defaults:
      agg_time_dimension: ordered_at
    description: |
      Order fact table. This table is at the order grain with one row per order.
    # The name of the dbt model and schema
    model: ref('orders')
```
- Define your entities. These are the keys in your table that MetricFlow will use to join other semantic models. These are usually columns like `customer_id`, `order_id`, and so on.
```yaml
# Entities. These usually correspond to keys in the table.
entities:
  - name: order_id
    type: primary
  - name: location
    type: foreign
    expr: location_id
  - name: customer
    type: foreign
    expr: customer_id
```
- Define your dimensions and measures. Dimensions are properties of the records in your table that are non-aggregatable. They provide categorical or time-based context to enrich metrics. Measures are the building blocks for creating metrics. They are numerical columns that MetricFlow aggregates to create metrics.
```yaml
# Measures. These are the aggregations on the columns in the table.
measures:
  - name: order_total
    description: The total revenue for each order.
    agg: sum
  - name: order_count
    expr: 1
    agg: sum
  - name: tax_paid
    description: The total tax paid on each order.
    agg: sum
  - name: customers_with_orders
    description: Distinct count of customers placing orders
    agg: count_distinct
    expr: customer_id
  - name: locations_with_orders
    description: Distinct count of locations with orders
    expr: location_id
    agg: count_distinct
  - name: order_cost
    description: The cost for each order item. Cost is calculated as a sum of the supply cost for each order item.
    agg: sum
# Dimensions. Either categorical or time. These add additional context to metrics.
# The typical querying pattern is Metric by Dimension.
dimensions:
  - name: ordered_at
    type: time
    type_params:
      time_granularity: day
  - name: order_total_dim
    type: categorical
    expr: order_total
  - name: is_food_order
    type: categorical
  - name: is_drink_order
    type: categorical
```
Putting it all together, a complete semantic model configuration based on the orders model would look like the following example:
```yaml
semantic_models:
  # The name of the semantic model.
  - name: orders
    defaults:
      agg_time_dimension: ordered_at
    description: |
      Order fact table. This table is at the order grain with one row per order.
    # The name of the dbt model and schema
    model: ref('orders')
    # Entities. These usually correspond to keys in the table.
    entities:
      - name: order_id
        type: primary
      - name: location
        type: foreign
        expr: location_id
      - name: customer
        type: foreign
        expr: customer_id
    # Measures. These are the aggregations on the columns in the table.
    measures:
      - name: order_total
        description: The total revenue for each order.
        agg: sum
      - name: order_count
        expr: 1
        agg: sum
      - name: tax_paid
        description: The total tax paid on each order.
        agg: sum
      - name: customers_with_orders
        description: Distinct count of customers placing orders
        agg: count_distinct
        expr: customer_id
      - name: locations_with_orders
        description: Distinct count of locations with orders
        expr: location_id
        agg: count_distinct
      - name: order_cost
        description: The cost for each order item. Cost is calculated as a sum of the supply cost for each order item.
        agg: sum
    # Dimensions. Either categorical or time. These add additional context to metrics.
    # The typical querying pattern is Metric by Dimension.
    dimensions:
      - name: ordered_at
        type: time
        type_params:
          time_granularity: day
      - name: order_total_dim
        type: categorical
        expr: order_total
      - name: is_food_order
        type: categorical
      - name: is_drink_order
        type: categorical
```
If you're familiar with writing SQL, you can think of dimensions as the columns you would group by and measures as the columns you would aggregate.
```sql
select
  metric_time_day,  -- time
  country,          -- categorical dimension
  sum(revenue_usd)  -- measure
from
  snowflake.fact_transactions  -- sql table
group by metric_time_day, country  -- dimensions
```
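Once the equivalent measure and dimensions are defined in a semantic model, the same question becomes a single MetricFlow query instead of hand-written SQL. The metric and dimension names below (`revenue_usd`, `country`) are illustrative and assume matching definitions in your project:

```shell
# Query the revenue_usd metric by day and country
# (names are illustrative; substitute your own metric and dimensions)
mf query --metrics revenue_usd --group-by metric_time__day,country
```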
Define metrics
Now that you've created your first semantic model, it's time to define your first metric! You can define metrics with the dbt Cloud IDE or CLI.
MetricFlow supports different metric types like simple, ratio, cumulative, and derived. It's recommended that you read the metrics overview docs before getting started.
You can define metrics in the same YAML files as your semantic models or create a new file. If you want to create your metrics in a new file, create another directory called `/models/metrics`. The file structure for metrics can become more complex from here if you need to further organize your metrics, for example, by data source or business line.
The example metric we'll create is a simple metric that refers directly to the `order_total` measure, which will be implemented as a `sum()` function in SQL. Again, if you're working in the Jaffle shop sandbox, we recommend deleting the original `orders.yml` file, or removing the `.yml` extension so it's ignored during parsing. We'll be rebuilding the `order_total` metric from scratch. If you're working in your own project, create a simple metric like the one below using one of the measures you created in the previous step.
```yaml
metrics:
  - name: order_total
    description: Sum of total order amount. Includes tax + revenue.
    type: simple
    label: Order Total
    type_params:
      measure: order_total
```
- Save your code, and in the next section, you'll validate your configs before committing them to your repository.
To continue building out your metrics based on your organization's needs, refer to Build your metrics for detailed info on how to define different metric types and semantic models.
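As one illustration of another metric type, a ratio metric divides one measure by another. The sketch below reuses the `order_total` and `order_count` measures defined earlier; the metric name and description are illustrative, not part of the example project:

```yaml
metrics:
  - name: average_order_value
    description: Average revenue per order (order_total divided by order_count).
    type: ratio
    label: Average Order Value
    type_params:
      numerator: order_total
      denominator: order_count
```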
Configure the MetricFlow time spine model
MetricFlow requires a time spine for certain metric types and join resolution patterns, like cumulative metrics. You will need to create this model in your dbt project. This article explains how to add the `metricflow_time_spine` model to your project.
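As a rough sketch of what that model might look like, the time spine is a dbt model that produces one row per date in a `date_day` column. The upstream model name here is illustrative; any model (or generated date spine) with one row per date works under these assumptions:

```sql
-- models/metricflow_time_spine.sql
-- One row per day; adjust the source and date range to fit your project.
select date_day
from {{ ref('all_days') }}  -- illustrative: any model with one row per date
```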
Test and query metrics
Support for testing and querying metrics in the dbt Cloud IDE is not available in the current beta but is coming soon. In the meantime, you can use the Preview or Compile buttons in the IDE to run semantic validations and make sure your metrics are defined. On a dbt Cloud Team or Enterprise plan, you can dynamically query metrics with integrated tools using the Semantic Layer API, or test them using SQL client tools like DataGrip, DBeaver, or RazorSQL.
This section explains how to test and query metrics using the MetricFlow CLI (dbt Cloud IDE support coming soon).
Before you begin, you'll need to install the MetricFlow CLI package and make sure you run at least one model.
Install MetricFlow
Install the MetricFlow CLI as an extension of a dbt adapter from PyPI. The MetricFlow CLI is compatible with Python versions 3.8, 3.9, 3.10, and 3.11.
To install, use `pip` with your dbt adapter:
- Create or activate your virtual environment: run `python -m venv venv` to create one, or `source your-venv/bin/activate` to activate an existing one.
- Run `pip install "dbt-metricflow[your_adapter_name]"`, replacing `[your_adapter_name]` with the adapter you use. For example, run `pip install "dbt-metricflow[snowflake]"` if you use a Snowflake adapter.
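Putting the install steps together, a typical session might look like the following (the virtual environment name and the Snowflake adapter are examples; substitute your own):

```shell
# Create and activate a fresh virtual environment
python -m venv venv
source venv/bin/activate

# Install MetricFlow with your dbt adapter (Snowflake shown as an example)
pip install "dbt-metricflow[snowflake]"

# Confirm the CLI is available
mf --help
```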
Query and commit your metrics using the CLI
MetricFlow needs a `semantic_manifest.json` in order to build a semantic graph. To generate a `semantic_manifest.json` artifact, run `dbt parse`. This will create the file in your `/target` directory. If you're working from the Jaffle shop example, run `dbt seed && dbt run` before proceeding to ensure the data exists in your warehouse.
- Make sure you have the MetricFlow CLI installed and up to date.
- Run `mf --help` to confirm you have MetricFlow installed and to view the available commands.
- Run `mf query --metrics <metric_name> --group-by <dimension_name>` to query the metrics and dimensions. For example: `mf query --metrics order_total --group-by metric_time`.
- Verify that the metric values are what you expect. To further understand how the metric is being generated, you can view the generated SQL by adding `--explain` to your query.
- Run `mf validate-configs` to run validation on your semantic models and metrics.
- Commit and merge the code changes that contain the metric definitions.
To streamline your metric querying process, you can connect to the dbt Semantic Layer API to access your metrics programmatically. Refer to Querying the API for metric metadata for the SQL syntax to query metrics using the API.
Run a production job
Before you begin, you must have a dbt Cloud Team or Enterprise multi-tenant deployment, hosted in North America (cloud.getdbt.com login URL).
Once you’ve defined metrics in your dbt project, you can perform a job run in your dbt Cloud deployment environment to materialize your metrics. Only the deployment environment is supported for the dbt Semantic Layer at this moment.
- Go to Deploy in the menu bar
- Select Jobs to re-run the job with the most recent code in the deployment environment.
- Your metric should appear as a red node in the dbt Cloud IDE and in the dbt directed acyclic graph (DAG).
Set up dbt Semantic Layer
You can set up the dbt Semantic Layer in dbt Cloud at the environment and project level. Before you begin:
- You must have a dbt Cloud Team or Enterprise multi-tenant deployment, hosted in North America.
- You must be part of the Owner group, and have the correct license and permissions to configure the Semantic Layer:
- Enterprise plan — Developer license with Account Admin permissions. Or Owner with a Developer license, assigned Project Creator, Database Admin, or Admin permissions.
- Team plan — Owner with a Developer license.
- You must have a successful run in your new environment.
If you're using the legacy Semantic Layer, we highly recommend you upgrade your dbt version to dbt v1.6 or higher to use the new dbt Semantic Layer. Refer to the dedicated migration guide for more info.
In dbt Cloud, create a new deployment environment or use an existing environment on dbt 1.6 or higher.
- Note: Only the deployment environment is currently supported (development experience coming soon).
Navigate to Account Settings and select the specific project you want to enable the Semantic Layer for.
In the Project Details page, navigate to the Semantic Layer section, and select Configure Semantic Layer.
- In the Set Up Semantic Layer Configuration page, enter the credentials you want the Semantic Layer to use specific to your data platform. We recommend credentials have the least privileges required because your Semantic Layer users will be querying it in downstream applications. At a minimum, the Semantic Layer needs to have read access to the schema(s) that contains the dbt models that you used to build your semantic models.
Select the deployment environment you want for the Semantic Layer and click Save.
After saving, you'll be provided with the connection information that allows you to connect to downstream tools. If your tool supports JDBC, save the JDBC URL or its individual components (like environment ID and host).
Save and copy your environment ID, service token, and host, which you'll need when using downstream tools. For more info on how to integrate with partner integrations, refer to Available integrations.
Return to the Project Details page, then select Generate Service Token. You will need Semantic Layer Only and Metadata Only service token permissions.
Great job, you've configured the Semantic Layer 🎉!
Connect and query API
You can query your metrics in a JDBC-enabled tool or use existing first-class integrations with the dbt Semantic Layer.
You must have a dbt Cloud Team or Enterprise multi-tenant deployment, hosted in North America. (Additional region support coming soon)
- To learn how to use the JDBC API and what tools you can query it with, refer to the dbt Semantic Layer API.
- To authenticate, you need to generate a service token with Semantic Layer Only and Metadata Only permissions.
- Refer to the SQL query syntax to query metrics using the API.
To learn more about the sophisticated integrations that connect to the dbt Semantic Layer, refer to Available integrations.
FAQs
If you're encountering some issues when defining your metrics or setting up the dbt Semantic Layer, check out a list of answers to some of the questions or problems you may be experiencing.
How do I migrate from the legacy Semantic Layer to the new one?
How are you storing my data?
Is the dbt Semantic Layer open source?
dbt Cloud Developer or dbt Core users can define metrics in their project, including a local dbt Core project, using the dbt Cloud IDE or the MetricFlow CLI. However, to experience the universal dbt Semantic Layer and access those metrics using the API or downstream tools, users must be on a dbt Cloud Team or Enterprise plan.