Hi, I'm James

Opinions on naming dbt models

Strong opinions, loosely held.

The first 97% of quality is achieved by naming things well. Approximately.

Having built and operated a data warehouse from scratch three times, I’ve gained 1 or 2 strong opinions about naming the gazillions of databases, schemas, tables, columns, and functions involved.

The rules

General

  • snake_case only for identifiers. No exceptions.
  • Most to least significant names. E.g., account_marketing_channel_first instead of first_marketing_channel_for_account.
  • Use double-underscores to create namespaces (e.g., stg__google_analytics__events, stg__google_analytics__properties).
  • Even if imperfect, follow the names that are already present in the codebase.
  • Never change names in the consumption layer, no matter how bad they are.
  • Consumption layer names are always nouns and plural (e.g., fct__leads).
  • No abbreviations (excepting the prefixes defined below).

Catalogs / databases

  • Use suffixes _prd, _dev, and _branch_* to differentiate environments:
    • Production: thing_prd
    • Development / pre-prod: thing_dev
    • Personal / feature branches: thing_branch_pr_42 or thing_branch_james
  • The warehouse is called warehouse_prd. No need to be clever.
  • Ingestion databases are named after the systems they replicate (e.g., salesforce_prd).

Schemas

  • The options are bronze, silver, and gold.
  • Maybe you also have models and functions.

Tables & views

  • Never depend on fully-qualified names to disambiguate data models (e.g., warehouse_prd.silver.account collides with salesforce_prd.bronze.account).
  • Use prefixes to differentiate the roles of different tables / data models:
    • raw__: ingested raw data
    • stg__: cleaned staged versions of raw data
    • int__: intermediate transformations
    • fct__, dim__: dimensional data models
    • wide__: wide-table data models
    • feat__: feature tables
    • rpt__: reports
  • A fit-for-purpose data mart can avoid the need for prefixes (e.g., a mart of finance reports, named the way finance likes it).
  • Use suffixes to differentiate table grains (e.g., _daily, _weekly for time-series pre-calculated aggregations).
  • Build up names in a way that ensures tables related to a specific topic are always grouped together (e.g., raw__salesforce__account and raw__salesforce__opportunity).
  • Don’t use names to differentiate tables from views (e.g., v_). information_schema handles that for you.

Columns

  • Consumption layer tables have a unique ID column that shares a name with the table (e.g., fct__leads.lead_id not fct__leads.id).
  • Booleans are prefixed with is_ or has_, always in the present tense (e.g., is_invalid not was_flagged_invalid).
  • Numerics are always suffixed with a unit such as _count, _usd, _percent, _days, etc.
  • Datetimes are always suffixed with _at.
  • Dates are always suffixed with _date and are always in a specific time-zone (e.g., America/New_York).
  • If dates / times are in a different time zone, include the locality source in the name (e.g., created_date_user_localized).

Macros & functions

  • Verbs. With the addition of _to_ and _from_ to be honorary verbs for casting and conversion.
  • Be clear about what the thing does, a long name is probably better (e.g., generate_date_spine, date_to_day_of_week).

An example

Here’s a sample of a real warehouse that has successfully survived maintenance since 2020:

marts_prd
├── finance
│   ├── annual_recurring_revenue_weekly
│   ├── annual_recurring_revenue_monthly
│   └── ...
├── contact_center
│   ├── aging_support_cases_daily
│   ├── support_case_inventory
│   └── ...
└── ...

warehouse_prd
├── gold
│   ├── dim__employee
│   │   ┌─────────────┬──────────────┬─────┐
│   │   │ employee_id │ tenure_years │ ... │
│   │   ├─────────────┼──────────────┼─────┤
│   │   │ 8c4jfqa0    │ 4.6          │     │
│   │
│   ├── fct__support_tickets
│   │   ┌───────────────────┬───────────────────┬──────────┬─────┐
│   │   │ support_ticket_id │ owner_employee_id │ age_days │ ... │
│   │   ├───────────────────┼───────────────────┼──────────┼─────┤
│   │   │ cnhwgd72          │ 8c4jfqa0          │ 0.3      │     │
│   │
│   └── ...
├── silver
│   ├── int__genesys_conversation_zendesk_ticket_map
│   ├── stg__genesys__conversation
│   ├── stg__zendesk__ticket
│   └── ...
└── bronze
    ├── raw__genesys__conversation
    ├── raw__genesys__rule
    ├── raw__zendesk__ticket
    ├── raw__zendesk__user
    └── ...