Background image

API Advice

Type Safety in Python: Pydantic vs. Data Classes vs. Annotations vs. TypedDicts

Tristan Cartledge

Tristan Cartledge

August 29, 2024

Featured blog post image
Success Icon

Tip

A massive thank you to Sydney Runkle (opens in a new tab) from the Pydantic (opens in a new tab) team for her invaluable feedback and suggestions on this post!!

Python's dynamic typing is one of its greatest strengths. It is the language developers use to get things done without getting bogged down by type definitions and boilerplate code. When prototyping, you don't have time to think about unions, generics, or polymorphism - close your eyes, trust the interpreter to guess your variable's type, and then start working on the next feature.

That is, until your prototype takes off and your logs are littered with TypeError: 'NoneType' object is not iterable or TypeError: unsupported operand type(s) for /: 'str' and 'int'. You might blame the users for adding units in the amount field, or the frontend devs for posting null instead of []. So you fix the bug with another if statement, a try block, or the tenth validation function you've written this week. No time for reflection, just keep shipping, right? The ball of twine must grow.

We all know there is a better way. Python has had type annotations for years, and data classes and typed dictionaries allow us to document the shapes of the objects we expect.

Pydantic is the most comprehensive solution available to enforce type safety and data validation in Python, which is why we chose it for our SDKs at Speakeasy.

In this post we'll run through how we got to this conclusion. We'll detail the history of type safety in Python and explain the differences between: type annotations, data classes, TypedDicts, and finally, Pydantic.

If It Walks Like a Duck and It Quacks Like a Duck, Then It Must Be a Duck

Python is a duck-typed language (opens in a new tab). In a duck-typed language, an object's type is determined by its behavior at runtime, based on the parts of the object that are actually used. Duck-typing makes it easier to write generic code that works with different types of objects.

If your code expects a Duck object to make it quack, Python doesn't care if the object is a Mallard or a RubberDuck. From Python's perspective, anything with a quack method is a Duck:


class Duck:
def quack(self):
print("Quack!")
class Mallard:
def quack(self):
print("Quack!")
def make_duck_quack(duck):
duck.quack()
make_duck_quack(Duck()) # prints "Quack!"
make_duck_quack(Mallard()) # prints "Quack!"

This code runs without errors, even though make_duck_quack expects a Duck object in our mental model, and we pass it a Mallard object. The Mallard object has a quack method, so it behaves like a Duck object.

One of the reasons for Python's popularity is its flexibility. You can write generic and reusable code without worrying about the specific object types.

But this flexibility comes at a cost. If you pass the wrong type of object to a function you'll only find out at runtime, leading to bugs that are difficult to track down.

This was the motivation behind developing type annotations.

Type Annotations

Type annotations were introduced in Python 3.5 to add optional type hints to your code (PEP 484 (opens in a new tab)). Type hints can help you catch bugs while you are still writing your code by telling you when you pass the wrong type of object to a function.

Info Icon

TIP

To make the most of these type hints, many developers use type checkers. Type checkers are tools that analyze your Python code without running it, looking for potential type-related errors. One popular type checker is Pylance (opens in a new tab), a Visual Studio Code Extension that checks your Python code for type mismatches and shows you hints in your IDE.

If you're not using VS Code, Pyright (opens in a new tab) has similar functionality and can be run from the command line (opens in a new tab) or as an extension (opens in a new tab) to many text editors.

Here's how you can add type hints to the make_duck_quack function:


class Duck:
def quack(self):
print("Quack!")
class RubberDuck:
def quack(self):
print("Quack!")
def make_duck_quack(duck: Duck):
duck.quack()
make_duck_quack(Duck()) # prints "Quack!"
make_duck_quack(RubberDuck())
# Pylance will show the hint: Argument 1 to "make_duck_quack" has incompatible type "RubberDuck"; expected "Duck".

Now, when you pass a RubberDuck object to the make_duck_quack function, your IDE hints that there's a type mismatch. Using annotations won't prevent you from running the code if there is a type mismatch, but it can help you catch bugs during development.

This covers type annotations for functions, but what about classes? We can use data classes to define a class with specific types for its fields.

Data Classes

Data classes were introduced in Python 3.7 (PEP 557 (opens in a new tab)) as a convenient way to create classes that are primarily used to store data. Data classes automatically generate special methods like __init__(), __repr__(), and __eq__(), reducing boilerplate code. This feature aligns perfectly with our goal of making type-safe code easier to write.

By using data classes, we can define a class with specific types for its fields while writing less code than we would with a traditional class definition. Here's an example:


from dataclasses import dataclass
@dataclass
class Duck:
name: str
age: int
def quack(self):
print(f"{self.name} says: Quack!")
donald = Duck("Donald", 5)
print(donald) # Duck(name='Donald', age=5)
donald.quack() # Donald says: Quack!
daffy = Duck("Daffy", "3")
# Pylance will show the hint: Argument of type "Literal['3']" cannot be assigned to parameter "age" of type "int" in function "__init__".

We define a Duck data class with two fields: name and age. When we create a new Duck object and pass in values, the data class automatically generates an __init__() method that initializes the object with these values.

In the data class definition, the type hints specify that the name field should be a string and that age should be an integer. If we create a Duck object with the wrong data types, the IDE hints that there's a type mismatch in the __init__ method.

We get a level of type safety that wasn't there before, but at runtime, the data class still accepts any value for the fields, even if they don't match the type hints. Data classes make it convenient to define classes that store data, but they don't enforce type safety.

What if we're building an SDK and want to help users pass the right types of objects to functions? Using TypedDict types can help with that.

TypedDict Types

Introduced in Python 3.8 (PEP 589 (opens in a new tab)), TypedDict lets you define specific key and value types for dictionaries, making it particularly useful when working with JSON-like data structures:


from typing import TypedDict
class DuckStats(TypedDict):
name: str
age: int
feather_count: int
def describe_duck(stats: DuckStats) -> str:
return f"{stats['name']} is {stats['age']} years old and has {stats['feather_count']} feathers."
print(
describe_duck(
{
"name": "Donald",
"age": 5,
"feather_count": 3000,
}
)
)
# Output: Donald is 5 years old and has 3000 feathers.
print(
describe_duck(
{
"name": "Daffy",
"age": "3", # Pylance will show the hint: Argument of type "Literal['3']" cannot be assigned to parameter "age" of type "int" in function "describe_duck"
"feather_count": 5000,
}
)
)

In this example, we define a DuckStats TypedDict with three keys: name, age, and feather_count. The type hints in the TypedDict definition specify that the name key should have a string value, while the age and feather_count keys should have integer values.

When we pass a dictionary to the describe_duck function, the IDE will show us a hint if there is a type mismatch in the dictionary values. This can help us catch bugs early and ensure that the data we are working with has the correct types.

While we now have type hints for dictionaries, data passed to our functions from the outside world are still unvalidated. Users can pass in the wrong types of values and we won't find out until runtime. This brings us to Pydantic.

Pydantic

Pydantic is a data validation library for Python that enforces type hints at runtime. It helps developers with the following:

  1. Data Validation: Pydantic ensures that data conforms to the defined types and constraints.
  2. Data Parsing: Pydantic can convert input data into the appropriate Python types.
  3. Serialization: Pydantic makes it easy to convert Python objects into JSON-compatible formats.
  4. Deserialization: It can transform JSON-like data into Python objects.

These Pydantic functionalities are particularly useful when working with APIs that send and receive JSON data, or when processing user inputs.

Here's how you can use Pydantic to define a data model for a duck:


from pydantic import BaseModel, Field, ValidationError
class Duck(BaseModel):
name: str
age: int = Field(gt=0)
feather_count: int | None = Field(default=None, ge=0)
# Correct initialization
try:
duck = Duck(name="Donald", age=5, feather_count=3000)
print(duck) # Duck(name='Donald', age=5, feather_count=3000)
except ValidationError as e:
print(f"Validation Error:\n{e}")
# Faulty initialization
try:
invalid_duck = Duck(name="Daffy", age=0, feather_count=-1)
print(invalid_duck)
except ValidationError as e:
print(f"Validation Error:\n{e}")

In this example, we define a Duck data model with three fields: name, age, and feather_count. The name field is required and should have a string value, while the age and feather_count fields are optional and should have integer values.

We use the Field class from Pydantic to define additional constraints for the fields. For example, we specify that the age field should be greater than or equal to zero, and the feather_count field should be greater than or equal to zero, or None.

In Python 3.10 and later, we can use the | operator for union types (PEP 604 (opens in a new tab)), allowing us to write int | None instead of Union[int, None].

When we try to create an invalid Duck instance, Pydantic raises a ValidationError. The error message is detailed and helpful:


Validation Error:
2 validation errors for Duck
age
Input should be greater than 0 [type=greater_than, input_value=0, input_type=int]
For further information visit https://errors.pydantic.dev/2.8/v/greater_than
feather_count
Input should be greater than or equal to 0 [type=greater_than_equal, input_value=-1, input_type=int]
For further information visit https://errors.pydantic.dev/2.8/v/greater_than_equal

This error message clearly indicates which fields failed validation and why. It specifies that:

  1. The 'age' should be greater than 0, but we provided 0.
  2. The 'feather_count' should be greater than or equal to 0, but we provided -1.

Detailed error messages make it much easier to identify and fix data validation issues, especially when working with complex data structures or processing user inputs.

Simplifying Function Validation with Pydantic

While we've seen how Pydantic can validate data in models, it can also be used to validate function arguments directly. This can simplify our code while making it safer to run. Let's revisit our describe_duck function using Pydantic's validate_call decorator:


from pydantic import BaseModel, Field, validate_call
class DuckDescription(BaseModel):
name: str
age: int = Field(gt=0)
feather_count: int = Field(gt=0)
@validate_call
def describe_duck(duck: DuckDescription) -> str:
return f"{duck.name} is {duck.age} years old and has {duck.feather_count} feathers."
# Valid input
print(describe_duck(DuckDescription(name="Donald", age=5, feather_count=3000)))
# Output: Donald is 5 years old and has 3000 feathers.
# Invalid input
try:
print(describe_duck(DuckDescription(name="Daffy", age=0, feather_count=-1)))
except ValueError as e:
print(f"Validation Error: {e}")
# Validation Error: 2 validation errors for DuckDescription
# age
# Input should be greater than 0 [type=greater_than, input_value=0, input_type=int]
# For further information visit https://errors.pydantic.dev/2.8/v/greater_than
# feather_count
# Input should be greater than 0 [type=greater_than, input_value=-1, input_type=int]
# For further information visit https://errors.pydantic.dev/2.8/v/greater_than

In this example, we made the following changes:

  1. We defined a DuckDescription Pydantic model to represent the expected structure and types of our duck data.
  2. We used the @validate_call decorator on our describe_duck function. This decorator automatically validates the function's arguments based on the type annotations.
  3. The function now expects a DuckDescription object instead of separate parameters. This ensures that all the data is validated as a unit before the function is called.
  4. We simplified the function body since we can now be confident that the data is valid and of the correct type.

By using Pydantic's @validate_call decorator, we made our function safer and easier to read.

Comparing Python Typing Methods

The table below summarizes the key differences between the Python typing methods we discussed. Keep in mind that some points may have exceptions or nuances depending on your specific use case. The table is meant to provide a general overview only.

FeatureType AnnotationsData ClassesTypedDictPydantic
Static type checking
Runtime type checking
Automatic data validation
JSON serialization
Nested object support
Custom validation rules
IDE autocomplete support
Performance overheadNoneMinimalNoneMinimal
Compatibility with dicts
Standard Library

Why Speakeasy Chose Pydantic

At Speakeasy, we chose Pydantic as the primary tool for data validation and serialization in the Python SDKs we create.

After our initial Python release, support for Pydantic was one of the most requested features from our users. Pydantic provides a great balance between flexibility and type safety. And because Pydantic uses Rust under the hood, it has a negligible performance overhead compared to other third-party data validation libraries.

SDKs are an ideal use case for Pydantic, providing automatic data validation and serialization for the data structures that API users interact with.

By working with the Pydantic team, we've contributed to the development of features that make Pydantic even better suited for SDK development.

The Value of Runtime Type Safety

To illustrate the value of runtime type safety, consider a scenario where we are building an API that receives JSON data from a client to represent an order from a shop. Let's use a TypedDict to define the shape of the order data:


from typing import TypedDict
class Order(TypedDict):
customer_name: str
quantity: int
unit_price: float
def calculate_order_total(order: Order) -> float:
return order["quantity"] * order["unit_price"]
print(
calculate_order_total(
{
"customer_name": "Alex",
"quantity": 10,
"unit_price": 5,
}
)
) # Output: 50

In this example, we define an Order TypedDict with three keys: customer_name, quantity, and unit_price. We then create an order_data dictionary with values for these keys and pass it to the calculate_order_total function.

The calculate_order_total function multiplies the quantity and unit_price values from the order dictionary to calculate the total order amount. It works fine when the order_data dictionary has the correct types of values, but what if the client sends us invalid data?


print(
calculate_order_total(
{
"customer_name": "Sam",
"quantity": 10,
"unit_price": "5",
}
)
) # Output: 5555555555

In this case, the client sends us a string value for the unit_price key instead of a float. Since Python is a duck-typed language, the code will still run without errors, but the result will be incorrect. This is a common source of bugs in Python code, especially when working with JSON data from external sources.

Now, let's see how we can use Pydantic to define a data model for the order data and enforce type safety at runtime:


from pydantic import BaseModel, computed_field
class Order(BaseModel):
customer_name: str
quantity: int
unit_price: float
@computed_field
def calculate_total(self) -> float:
return self.quantity * self.unit_price
order = Order(
customer_name="Sam",
quantity=10,
unit_price="5",
)
print(order.calculate_total) # Output: 50.0

In this case, Pydantic converts the string "5" to a float value of 5.0 for the unit_price field. The automatic type coercion prevents errors and ensures the data is in the correct format.

Pydantic enforces type safety at runtime, but don't we lose the simplicity of passing dictionaries around?

But we don't have to give up on dictionaries.

Using Typed Dictionaries With Pydantic Models

In some cases, you may want to accept both TypedDict and Pydantic models as input to your functions. You can achieve this by using a union type in your function signature:


from typing import TypedDict
from pydantic import BaseModel
class OrderTypedDict(TypedDict):
customer_name: str
quantity: int
unit_price: float
class Order(BaseModel):
customer_name: str
quantity: int
unit_price: float
def calculate_order_total(order: Order | OrderTypedDict) -> float:
if not isinstance(order, BaseModel):
order = Order(**order)
return order.quantity * order.unit_price
print(
calculate_order_total(
{
"customer_name": "Sam",
"quantity": 10,
"unit_price": "5",
}
)
) # Output: 50.0

In this example, we define an OrderTypedDict TypedDict and an Order Pydantic model for the order data. We then define a calculate_order_total function to accept a union type of Order and OrderTypedDict.

If the input is a TypedDict, it'll be converted to a Pydantic model before performing the calculation. Now our function can accept both TypedDict and Pydantic models as input, providing us flexibility while still enforcing type safety at runtime.

Speakeasy SDKs employ this pattern so users can pass in either dictionaries or Pydantic models to the SDK functions, reducing the friction of using the SDK while maintaining type safety.

Conclusion

To learn more about how we use Pydantic in our SDKs, see our post about Python Generation with Async & Pydantic Support.

CTA background illustrations

Speakeasy Changelog

Subscribe to stay up-to-date on Speakeasy news and feature releases.