Introduction
Today we will explore what Polars selectors are and what we can do with them.
What are Polars Selectors?
Selectors in Polars let you select columns based on their properties, such as name, data type, or a naming pattern. For example, to select all string or all numeric columns in a DataFrame:
import polars as pl
import polars.selectors as cs
df = pl.DataFrame({
    "id": [1, 2],
    "test_score": [88, 92],
    "final_score": [95, 89],
    "category": ["A", "B"],
    "updated_at": [None, None]
})
# select all string columns
df.select(cs.string())
# select all numeric columns
df.select(cs.numeric())
There are three main types of selectors in Polars:
1. Type-based: select columns based on their data type.
2. Pattern-based: select columns whose names match a pattern.
3. Set-logic: combine multiple selectors.
We will look at them by category.
Type-based Selectors
cs.numeric()
cs.string()
cs.temporal()
This selector targets columns with a time-based data type (Date, Datetime, Duration, or Time).
import polars as pl
import polars.selectors as cs
from datetime import datetime
df = pl.DataFrame({
    "id": [1, 2],
    "transaction_date": [datetime(2023, 5, 12), datetime(2023, 6, 15)],
    "upload_at": [datetime(2023, 5, 13, 10, 0), datetime(2023, 6, 16, 11, 0)],
    "amount": [100.0, 200.0]
})
# Use cs.temporal() to apply a date operation to all time columns
result = df.with_columns(
    cs.temporal().dt.month_start()
)
cs.by_name()
- Selecting columns by name
import polars as pl
import polars.selectors as cs
df = pl.DataFrame({
    "id": [1, 2],
    "test_score": [88, 92],
    "final_score": [95, 89],
    "category": ["A", "B"],
    "updated_at": [None, None]
})
# Select specific columns by exact name
df.select(cs.by_name("id", "category"))
cs.by_dtype()
- Selecting columns by data type
Pattern-based Selectors
cs.contains()
cs.contains("score")
cs.matches()
- Selects columns whose names match a regex pattern
import polars as pl
import polars.selectors as cs
df = pl.DataFrame({
    "abc_123": [1],
    "abc_456": [2],
    "xyz_123": [3],
    "id_primary": [4],
    "id_secondary": [5]
})
# Select columns whose names are exactly 3 lowercase letters, an underscore, and then digits
# Pattern: ^[a-z]{3}_\d+$
result = df.select(
    cs.matches(r"^[a-z]{3}_\d+$")
)
cs.starts_with()
cs.starts_with("sale_")
cs.ends_with()
cs.ends_with("sum")
Combining Selectors
Set Operations
- Union (|)
target_cols = cs.starts_with("sale_") | cs.numeric()
df.select(target_cols)
- Intersection (&)
target_cols = cs.starts_with("sale_") & cs.numeric()
df.select(target_cols)
- Difference (-): used to exclude some columns
# Logic: All Numerics MINUS the columns we want to protect
features = cs.numeric() - cs.by_name("user_id", "target_label")
df.with_columns(
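    # Standardize each feature: subtract the column mean, divide by the standard deviation
    (features - features.mean()) / features.std()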
)
- Complement (~)
df.select(~cs.string())
Practice Exercise
Now it’s time to practice! Try solving this exercise using selectors:
Scenario: You have a sales dataset with the following structure:
import polars as pl
import polars.selectors as cs
from datetime import datetime
sales_df = pl.DataFrame({
    "order_id": [1001, 1002, 1003, 1004, 1005],
    "customer_id": [501, 502, 503, 504, 505],
    "product_price": [29.99, 149.99, 79.99, 199.99, 49.99],
    "shipping_cost": [5.99, 12.99, 8.99, 15.99, 6.99],
    "tax_amount": [2.40, 12.00, 6.40, 16.00, 4.00],
    "sale_region": ["North", "South", "East", "West", "North"],
    "sale_channel": ["Online", "Store", "Online", "Store", "Online"],
    "order_date": [
        datetime(2026, 1, 10),
        datetime(2026, 1, 11),
        datetime(2026, 1, 12),
        datetime(2026, 1, 13),
        datetime(2026, 1, 14)
    ],
    "delivery_date": [
        datetime(2026, 1, 15),
        datetime(2026, 1, 16),
        datetime(2026, 1, 17),
        datetime(2026, 1, 18),
        datetime(2026, 1, 19)
    ]
})
Tasks:
- Select all columns that contain the word “sale” in their name
- Select all numeric columns EXCEPT the ID columns (order_id and customer_id)
- Calculate the sum of all columns that end with “_cost” or “_amount”
- Extract just the month from all temporal columns
- Select all string columns that start with “sale_” and convert them to lowercase
Bonus Challenge: Create a single expression that selects all numeric columns (except IDs), rounds them to 2 decimal places, and adds a “_rounded” suffix to each column name.
Solutions
# Task 1: Select columns containing "sale"
sales_df.select(cs.contains("sale"))
# Task 2: Numeric columns excluding IDs
sales_df.select(cs.numeric() - cs.by_name("order_id", "customer_id"))
# Task 3: Sum of cost and amount columns
sales_df.select(
    (cs.ends_with("_cost") | cs.ends_with("_amount")).sum()
)
# Task 4: Extract month from temporal columns
sales_df.with_columns(
    cs.temporal().dt.month()
)
# Task 5: Lowercase string columns starting with "sale_"
sales_df.with_columns(
    (cs.starts_with("sale_") & cs.string()).str.to_lowercase()
)
# Bonus: Round numeric columns (except IDs) with suffix
sales_df.with_columns(
    (cs.numeric() - cs.by_name("order_id", "customer_id"))
    .round(2)
    .name.suffix("_rounded")
)