Opinions on naming dbt models
Strong opinions, loosely held.
The first 97% of quality is achieved by naming things well. Approximately.
Having built and operated a data warehouse from scratch three times, I’ve gained 1 or 2 strong opinions about naming the gazillions of databases, schemas, tables, columns, and functions involved.
The rules
General
snake_caseonly for identifiers. No exceptions.- Most to least significant names. E.g.,
account_marketing_channel_firstinstead offirst_marketing_channel_for_account. - Use double-underscores to create namespaces (e.g.,
stg__google_analytics__events,stg__google_analytics__properties). - Even if imperfect, follow the names that are already present in the codebase.
- Never change names in the consumption layer, no matter how bad they are.
- Consumption layer names are always nouns and plural (e.g.,
fct__leads). - No abbreviations (excepting the prefixes defined below).
Catalogs / databases
- Use suffixes
_prd,_dev, and_branch_*to differentiate environments:- Production:
thing_prd - Development / pre-prod:
thing_dev - Personal / feature branches:
thing_branch_pr_42orthing_branch_james
- Production:
- The warehouse is called
warehouse_prd. No need to be clever. - Ingestion databases are named after the systems they replicate (e.g.,
salesforce_prd).
Schemas
- The options are
bronze,silver, andgold. - Maybe you also have
modelsandfunctions.
Tables & views
- Never depend on fully-qualified names to disambiguate data models (e.g.,
warehouse_prd.silver.accountcollides withsalesforce_prd.bronze.account). - Use prefixes to differentiate the roles of different tables / data models:
raw__: ingested raw datastg__: cleaned staged versions of raw dataint__: intermediate transformationsfct__,dim__: dimensional data modelswide__: wide-table data modelsfeat__: feature tablesrpt__: reports
- A fit-for-purpose data mart can avoid the need for prefixes (e.g., a mart of finance reports, named the way finance likes it).
- Use suffixes to differentiate table grains (e.g.,
_daily,_weeklyfor time-series pre-calculated aggregations). - Build up names in a way that ensures tables related to a specific topic are always grouped together (e.g.,
raw__salesforce__accountandraw__salesforce__opportunity). - Don’t use names to differentiate tables from views (e.g.,
v_).information_schemahandles that for you.
Columns
- Consumption layer tables have a unique ID column that shares a name with the table (e.g.,
fct__leads.lead_idnotfct__leads.id). - Booleans are prefixed with
is_orhas_, always in the present tense (e.g.,is_invalidnotwas_flagged_invalid). - Numerics are always suffixed with a unit such as
_count,_usd,_percent,_days, etc. - Datetimes are always suffixed with
_at. - Dates are always suffixed with
_dateand are always in a specific time-zone (e.g.,America/New_York). - If dates / times are in a different time zone, include the locality source in the name (e.g.,
created_date_user_localized).
Macros & functions
- Verbs. With the addition of
_to_and_from_to be honorary verbs for casting and conversion. - Be clear about what the thing does, a long name is probably better (e.g.,
generate_date_spine,date_to_day_of_week).
An example
Here’s a sample of a real warehouse that has successfully survived maintenance since 2020:
marts_prd
├── finance
│ ├── annual_recurring_revenue_weekly
│ ├── annual_recurring_revenue_monthly
│ └── ...
│
├── contact_center
│ ├── aging_support_cases_daily
│ ├── support_case_inventory
│ └── ...
│
└── ...
warehouse_prd
├── gold
│ ├── dim__employee
│ │ ┌─────────────┬──────────────┬─────┐
│ │ │ employee_id │ tenure_years │ ... │
│ │ ├─────────────┼──────────────┼─────┤
│ │ │ 8c4jfqa0 │ 4.6 │ │
│ │
│ ├── fct__support_tickets
│ │ ┌───────────────────┬───────────────────┬──────────┬─────┐
│ │ │ support_ticket_id │ owner_employee_id │ age_days │ ... │
│ │ ├───────────────────┼───────────────────┼──────────┼─────┤
│ │ │ cnhwgd72 │ 8c4jfqa0 │ 0.3 │ │
│ │
│ └── ...
│
├── silver
│ ├── int__genesys_conversation_zendesk_ticket_map
│ ├── stg__genesys__conversation
│ ├── stg__zendesk__ticket
│ └── ...
│
└── bronze
├── raw__genesys__conversation
├── raw__genesys__rule
├── raw__zendesk__ticket
├── raw__zendesk__user
└── ...