5 Useful DIY Python Functions for JSON Parsing and Processing


# Introduction

Working with JSON in Python is often challenging. The basic json.loads() only takes you so far.

API responses, configuration files, and data exports often contain JSON that is messy or poorly structured. You need to flatten nested objects, safely extract values without KeyError exceptions, merge multiple JSON files, or convert between JSON and other formats. These tasks come up constantly in web scraping, API integration, and data processing. This article walks you through five practical functions to handle common JSON parsing and processing tasks.

You can find the code for these functions on GitHub.

# 1. Safely Extracting Nested Values

JSON objects are often nested several levels deep. Accessing deeply nested values with bracket notation quickly becomes unwieldy. If a key is missing, you get a KeyError.
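To make the failure concrete, here is a small sketch showing chained bracket access failing as soon as one key is absent. The `response` dictionary and the variable names are made up for illustration:

```python
# Illustrative data only: the "email" key is deliberately absent.
response = {"user": {"profile": {"name": "Allie"}}}

missing_key = None
try:
    email = response["user"]["profile"]["email"]
except KeyError as exc:
    # exc.args[0] holds the key that could not be found
    missing_key = exc.args[0]

print(f"Missing key: {missing_key}")
```

Wrapping every lookup in try/except like this gets repetitive fast, which is the motivation for a reusable helper.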

Here’s a function that lets you access nested values using dot notation, with a fallback for missing keys:

def get_nested_value(data, path, default=None):
    """
    Safely extract nested values from JSON using dot notation.

    Args:
        data: Dictionary or JSON object
        path: Dot-separated string like "user.profile.email"
        default: Value to return if path doesn't exist

    Returns:
        The value at the path, or default if not found
    """
    keys = path.split('.')
    current = data

    for key in keys:
        if isinstance(current, dict):
            current = current.get(key)
            if current is None:
                return default
        elif isinstance(current, list):
            try:
                index = int(key)
                current = current[index]
            except (ValueError, IndexError):
                return default
        else:
            return default

    return current

Let’s test this with a complex nested structure:

# Sample JSON data
user_data = {
    "user": {
        "id": 123,
        "profile": {
            "name": "Allie",
            "email": "allie@example.com",
            "settings": {
                "theme": "dark",
                "notifications": True
            }
        },
        "posts": [
            {"id": 1, "title": "First Post"},
            {"id": 2, "title": "Second Post"}
        ]
    }
}

# Extract values
email = get_nested_value(user_data, "user.profile.email")
theme = get_nested_value(user_data, "user.profile.settings.theme")
first_post = get_nested_value(user_data, "user.posts.0.title")
missing = get_nested_value(user_data, "user.profile.age", default=25)

print(f"Email: {email}")
print(f"Theme: {theme}")
print(f"First post: {first_post}")
print(f"Age (default): {missing}")

Output:

Email: allie@example.com
Theme: dark
First post: First Post
Age (default): 25

The function splits the path string on dots and walks through the data structure one key at a time. At each level, it checks whether the current value is a dictionary or a list. For dictionaries, it uses .get(key), which returns None for missing keys instead of raising an error. For lists, it tries to convert the key to an integer index.

The default parameter provides a fallback when any part of the path does not exist. This prevents your code from crashing when working with incomplete or inconsistent JSON data from APIs.

This pattern is particularly useful when processing API responses where some fields are optional or present only under certain conditions.
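One edge case is worth knowing about: because the function treats a None result from .get(key) as missing, a key that is present with a JSON null value also falls back to the default. If you need to tell the two apart, a sentinel object is one option. The sketch below is a hypothetical variant; `get_nested_or_null` and `_MISSING` are names introduced here, not part of the original code:

```python
# Hypothetical sentinel: a unique object that can never appear in parsed JSON.
_MISSING = object()

def get_nested_or_null(data, path, default=None):
    """Like get_nested_value, but an explicit null (None) is returned as-is."""
    current = data
    for key in path.split("."):
        if isinstance(current, dict):
            current = current.get(key, _MISSING)
            if current is _MISSING:
                return default
        elif isinstance(current, list):
            try:
                current = current[int(key)]
            except (ValueError, IndexError):
                return default
        else:
            return default
    return current

# Illustrative data: "nickname" exists but is null, "age" does not exist.
record = {"user": {"nickname": None, "tags": ["a", "b"]}}

print(get_nested_or_null(record, "user.nickname", default="n/a"))  # None
print(get_nested_or_null(record, "user.age", default="n/a"))       # n/a
print(get_nested_or_null(record, "user.tags.1"))                   # b
```

Whether null should count as missing depends on the API you are consuming, so treat this as an alternative rather than a replacement.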

# 2. Flattening Nested JSON into Single-Level Dictionaries

Machine learning models, CSV exports, and database inserts often require flat data structures. But API responses and configuration files use nested JSON. Converting nested objects to flat key-value pairs is a common task.

Here’s a function that flattens nested JSON with a customizable separator:

def flatten_json(data, parent_key='', separator="_"):
    """
    Flatten nested JSON into a single-level dictionary.

    Args:
        data: Nested dictionary or JSON object
        parent_key: Prefix for keys (used in recursion)
        separator: String to join nested keys

    Returns:
        Flattened dictionary with concatenated keys
    """
    items = []

    if isinstance(data, dict):
        for key, value in data.items():
            new_key = f"{parent_key}{separator}{key}" if parent_key else key

            if isinstance(value, dict):
                # Recursively flatten nested dicts
                items.extend(flatten_json(value, new_key, separator).items())
            elif isinstance(value, list):
                # Flatten lists with indexed keys
                for i, item in enumerate(value):
                    list_key = f"{new_key}{separator}{i}"
                    if isinstance(item, (dict, list)):
                        items.extend(flatten_json(item, list_key, separator).items())
                    else:
                        items.append((list_key, item))
            else:
                items.append((new_key, value))
    else:
        items.append((parent_key, data))

    return dict(items)

Let’s now flatten a complex nested structure:

# Complex nested JSON
product_data = {
    "product": {
        "id": 456,
        "name": "Laptop",
        "specs": {
            "cpu": "Intel i7",
            "ram": "16GB",
            "storage": {
                "type": "SSD",
                "capacity": "512GB"
            }
        },
        "reviews": [
            {"rating": 5, "comment": "Excellent"},
            {"rating": 4, "comment": "Good value"}
        ]
    }
}

flattened = flatten_json(product_data)

for key, value in flattened.items():
    print(f"{key}: {value}")

Output:

product_id: 456
product_name: Laptop
product_specs_cpu: Intel i7
product_specs_ram: 16GB
product_specs_storage_type: SSD
product_specs_storage_capacity: 512GB
product_reviews_0_rating: 5
product_reviews_0_comment: Excellent
product_reviews_1_rating: 4
product_reviews_1_comment: Good value

The function uses recursion to handle arbitrary nesting depth. When it encounters a dictionary, it processes each key-value pair, building the flattened key by joining the parent key and the current key with the separator.

For lists, it uses the index as part of the key. This preserves the order and structure of array elements in the flattened output. For example, product_reviews_0_rating tells you that this is the rating of the first review.

The separator parameter lets you customize the output format. Use dots for dot notation, underscores for snake_case, or slashes for path-like keys, depending on your needs.

This function is especially useful when you need to convert JSON API responses into DataFrames or CSV rows, where each column needs a unique name.
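As a rough illustration of the CSV use case, the sketch below flattens a couple of made-up records and writes them with the standard library's csv.DictWriter. To keep the example runnable on its own it uses `flatten`, a compact stand-in for the function above (an assumption of this sketch, not the article's code), and the `records` data is invented:

```python
import csv
import io

def flatten(data, parent_key="", separator="_"):
    """Compact stand-in for flatten_json so this example runs on its own."""
    if isinstance(data, dict):
        pairs = data.items()
    elif isinstance(data, list):
        pairs = enumerate(data)
    else:
        return {parent_key: data}
    items = {}
    for key, value in pairs:
        new_key = f"{parent_key}{separator}{key}" if parent_key else str(key)
        items.update(flatten(value, new_key, separator))
    return items

# Illustrative records with slightly different shapes
records = [
    {"id": 1, "specs": {"cpu": "Intel i7", "ram": "16GB"}},
    {"id": 2, "specs": {"cpu": "Ryzen 5"}, "tags": ["sale"]},
]

rows = [flatten(record) for record in records]

# Union of all column names, in first-seen order
columns = list(dict.fromkeys(key for row in rows for key in row))

buffer = io.StringIO()
writer = csv.DictWriter(buffer, fieldnames=columns, restval="")
writer.writeheader()
writer.writerows(rows)
csv_text = buffer.getvalue()
print(csv_text)
```

Records rarely share exactly the same keys, so the column list is built from the union of all flattened keys and restval fills the gaps with empty strings.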

# 3. Deep Merging Multiple JSON Objects

Configuration management often requires merging multiple JSON files containing default settings, environment-specific configurations, user preferences, and more. A simple dict.update() only handles the top level. You need a deep merge, which recursively combines nested structures.
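The limitation of dict.update() is easy to see with a small, made-up pair of configurations. In this sketch the nested "port" value disappears because the whole "database" dictionary is replaced rather than merged:

```python
# Illustrative configs: the override only mentions "host".
defaults = {"database": {"host": "localhost", "port": 5432}}
overrides = {"database": {"host": "prod-db.example.com"}}

shallow = dict(defaults)    # top-level copy so defaults stays untouched
shallow.update(overrides)   # replaces the entire "database" value

print(shallow)
# {'database': {'host': 'prod-db.example.com'}}
```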

Here’s a function that deep merges JSON objects:

def deep_merge_json(base, override):
    """
    Deep merge two JSON objects, with override taking precedence.

    Args:
        base: Base dictionary
        override: Dictionary with values to override/add

    Returns:
        New dictionary with merged values
    """
    result = base.copy()

    for key, value in override.items():
        if key in result and isinstance(result[key], dict) and isinstance(value, dict):
            # Recursively merge nested dictionaries
            result[key] = deep_merge_json(result[key], value)
        else:
            # Override or add the value
            result[key] = value

    return result

Let’s merge some sample configurations:

import json

# Default configuration
default_config = {
    "database": {
        "host": "localhost",
        "port": 5432,
        "timeout": 30,
        "pool": {
            "min": 2,
            "max": 10
        }
    },
    "cache": {
        "enabled": True,
        "ttl": 300
    },
    "logging": {
        "level": "INFO"
    }
}

# Production overrides
prod_config = {
    "database": {
        "host": "prod-db.example.com",
        "pool": {
            "min": 5,
            "max": 50
        }
    },
    "cache": {
        "ttl": 600
    },
    "monitoring": {
        "enabled": True
    }
}

merged = deep_merge_json(default_config, prod_config)

print(json.dumps(merged, indent=2))

Output:

{
  "database": {
    "host": "prod-db.example.com",
    "port": 5432,
    "timeout": 30,
    "pool": {
      "min": 5,
      "max": 50
    }
  },
  "cache": {
    "enabled": true,
    "ttl": 600
  },
  "logging": {
    "level": "INFO"
  },
  "monitoring": {
    "enabled": true
  }
}

The function recursively merges nested dictionaries. When both the base and the override have dictionaries on the same key, it merges those dictionaries instead of completely replacing them. This preserves values that are not explicitly overridden.

Notice how database.port and database.timeout keep their default values, whereas database.host gets overridden. The pool settings merge at the nested level, so min and max both get updated.

The function also adds new keys that are not present in the base configuration, such as the monitoring section in the production overrides.

You can chain multiple merges to layer configurations:

final_config = deep_merge_json(
    deep_merge_json(default_config, prod_config),
    user_preferences
)

This pattern is common in application configuration, where you have defaults, environment-specific settings, and runtime overrides.
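When there are more than two or three layers, nesting the calls gets awkward. One possible alternative is functools.reduce, which folds a list of layers from lowest to highest priority. The sketch below uses `deep_merge`, a compact stand-in for the function above, and invented `layers` data so that it runs on its own:

```python
from functools import reduce

def deep_merge(base, override):
    """Compact stand-in for deep_merge_json so this example runs on its own."""
    result = dict(base)
    for key, value in override.items():
        if isinstance(result.get(key), dict) and isinstance(value, dict):
            result[key] = deep_merge(result[key], value)
        else:
            result[key] = value
    return result

# Illustrative layers, ordered from lowest to highest priority
layers = [
    {"logging": {"level": "INFO", "format": "json"}, "cache": {"ttl": 300}},
    {"logging": {"level": "WARNING"}},
    {"cache": {"ttl": 60}, "theme": "dark"},
]

# Start from an empty dict and merge each layer on top of the result so far
final_config = reduce(deep_merge, layers, {})
print(final_config)
```

Later layers win on conflicts, so the order of the list is the order of precedence.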

# 4. Filtering JSON by Schema or Whitelist

APIs often return more data than you need. Large JSON responses make your code harder to read. Sometimes you only want specific fields, or you need to remove sensitive data before logging.

Here’s a function that filters JSON to only contain specified fields:

def filter_json(data, schema):
    """
    Filter JSON to keep only fields specified in schema.

    Args:
        data: Dictionary or JSON object to filter
        schema: Dictionary defining which fields to keep
                Use True to keep a field, nested dict for nested filtering

    Returns:
        Filtered dictionary containing only specified fields
    """
    if not isinstance(data, dict) or not isinstance(schema, dict):
        return data

    result = {}

    for key, value in schema.items():
        if key not in data:
            continue

        if value is True:
            # Keep this field as-is
            result[key] = data[key]
        elif isinstance(value, dict):
            # Recursively filter nested object
            if isinstance(data[key], dict):
                filtered_nested = filter_json(data[key], value)
                if filtered_nested:
                    result[key] = filtered_nested
            elif isinstance(data[key], list):
                # Filter each item in the list
                filtered_list = []
                for item in data[key]:
                    if isinstance(item, dict):
                        filtered_item = filter_json(item, value)
                        if filtered_item:
                            filtered_list.append(filtered_item)
                    else:
                        filtered_list.append(item)
                if filtered_list:
                    result[key] = filtered_list

    return result

Let’s filter a sample API response:

import json
# Sample API response
api_response = {
    "user": {
        "id": 789,
        "username": "Cayla",
        "email": "cayla@example.com",
        "password_hash": "secret123",
        "profile": {
            "name": "Cayla Smith",
            "bio": "Software developer",
            "avatar_url": "https://example.com/avatar.jpg",
            "private_notes": "Internal notes"
        },
        "posts": [
            {
                "id": 1,
                "title": "Hello World",
                "content": "My first post",
                "views": 100,
                "internal_score": 0.85
            },
            {
                "id": 2,
                "title": "Python Tips",
                "content": "Some tips",
                "views": 250,
                "internal_score": 0.92
            }
        ]
    },
    "metadata": {
        "request_id": "abc123",
        "server": "web-01"
    }
}

# Schema defining what to keep
public_schema = {
    "user": {
        "id": True,
        "username": True,
        "profile": {
            "name": True,
            "avatar_url": True
        },
        "posts": {
            "id": True,
            "title": True,
            "views": True
        }
    }
}

filtered = filter_json(api_response, public_schema)

print(json.dumps(filtered, indent=2))

Output:

{
  "user": {
    "id": 789,
    "username": "Cayla",
    "profile": {
      "name": "Cayla Smith",
      "avatar_url": "https://example.com/avatar.jpg"
    },
    "posts": [
      {
        "id": 1,
        "title": "Hello World",
        "views": 100
      },
      {
        "id": 2,
        "title": "Python Tips",
        "views": 250
      }
    ]
  }
}

The schema acts as a whitelist. Setting a field to True includes it in the output. Using nested dictionaries lets you filter nested objects. The function recursively applies the schema to nested structures.

For arrays, the schema is applied to each item. In the example, the posts array is filtered so each post contains only id, title, and views, whereas content and internal_score are excluded.

Notice how sensitive fields such as password_hash and private_notes do not appear in the output. This makes the function useful for sanitizing data before logging it or sending it to a frontend application.

You can create different schemas for different use cases, such as a minimal schema for list views, a detailed schema for single-item views, and an admin schema that covers everything.
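Here is a rough sketch of what per-view schemas might look like. `keep_fields` is a simplified stand-in for the filter function (unlike the original, it does not drop empty results), and the `post` data and the `LIST_VIEW` and `DETAIL_VIEW` names are invented for illustration:

```python
def keep_fields(data, schema):
    """Simplified stand-in for filter_json so this example runs on its own."""
    if isinstance(data, list):
        return [keep_fields(item, schema) for item in data]
    if not isinstance(data, dict):
        return data
    result = {}
    for key, rule in schema.items():
        if key not in data:
            continue
        if rule is True:
            result[key] = data[key]
        elif isinstance(rule, dict):
            result[key] = keep_fields(data[key], rule)
    return result

# Illustrative record
post = {
    "id": 1,
    "title": "Hello World",
    "content": "My first post",
    "author": {"id": 789, "username": "Cayla", "email": "cayla@example.com"},
}

# One schema per use case, all applied to the same data
LIST_VIEW = {"id": True, "title": True}
DETAIL_VIEW = {
    "id": True,
    "title": True,
    "content": True,
    "author": {"username": True},
}

list_item = keep_fields(post, LIST_VIEW)
detail_item = keep_fields(post, DETAIL_VIEW)

print(list_item)
print(detail_item)
```

Keeping the schemas as plain dictionaries means they can live next to the code that uses them or be loaded from a JSON file themselves.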

# 5. Converting JSON to Dot Notation

Some systems use flat key-value stores, but you may want to work with nested JSON in your code. Converting between flat dot-notation keys and nested structures helps to achieve this.

Here is a pair of functions for bidirectional conversion.

// Converting JSON to Dot Notation

def json_to_dot_notation(data, parent_key=''):
    """
    Convert nested JSON to flat dot-notation dictionary.

    Args:
        data: Nested dictionary
        parent_key: Prefix for keys (used in recursion)

    Returns:
        Flat dictionary with dot-notation keys
    """
    items = {}

    if isinstance(data, dict):
        for key, value in data.items():
            new_key = f"{parent_key}.{key}" if parent_key else key

            if isinstance(value, dict):
                items.update(json_to_dot_notation(value, new_key))
            else:
                items[new_key] = value
    else:
        items[parent_key] = data

    return items

// Converting Dot Notation to JSON

def dot_notation_to_json(flat_data):
    """
    Convert flat dot-notation dictionary to nested JSON.

    Args:
        flat_data: Dictionary with dot-notation keys

    Returns:
        Nested dictionary
    """
    result = {}

    for key, value in flat_data.items():
        parts = key.split('.')
        current = result

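        # Walk (and create) every intermediate level except the last part
        for part in parts[:-1]:
            if part not in current:
                current[part] = {}
            current = current[part]

        # The last part is the leaf key that holds the value
        current[parts[-1]] = value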

    return result

Let’s test the round-trip conversion:

import json
# Original nested JSON
config = {
    "app": {
        "name": "MyApp",
        "version": "1.0.0"
    },
    "database": {
        "host": "localhost",
        "credentials": {
            "username": "admin",
            "password": "secret"
        }
    },
    "features": {
        "analytics": True,
        "notifications": False
    }
}

# Convert to dot notation (for environment variables)
flat = json_to_dot_notation(config)
print("Flat format:")
for key, value in flat.items():
    print(f"  {key} = {value}")

print("\n" + "=" * 50 + "\n")

# Convert back to nested JSON
nested = dot_notation_to_json(flat)

print("Nested format:")
print(json.dumps(nested, indent=2))

Output:

Flat format:
  app.name = MyApp
  app.version = 1.0.0
  database.host = localhost
  database.credentials.username = admin
  database.credentials.password = secret
  features.analytics = True
  features.notifications = False

==================================================

Nested format:
{
  "app": {
    "name": "MyApp",
    "version": "1.0.0"
  },
  "database": {
    "host": "localhost",
    "credentials": {
      "username": "admin",
      "password": "secret"
    }
  },
  "features": {
    "analytics": true,
    "notifications": false
  }
}

The json_to_dot_notation function flattens the structure by recursively walking through nested dictionaries and joining the keys with dots. Unlike the earlier flatten function, this one does not handle arrays; it is optimized for configuration data that is entirely key-value.

The dot_notation_to_json function reverses the process. It splits each key on dots and builds the nested structure by creating intermediate dictionaries as needed. The loop handles all parts except the last, creating the nesting levels. Then it assigns the value to the last key.

This approach keeps your configuration readable and maintainable while working within the constraints of a flat key-value system.
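One caveat follows directly from splitting on dots: a key that itself contains a dot will not survive the round trip. The sketch below demonstrates this with `to_dots` and `from_dots`, compact stand-ins for the two functions above (names and data are assumptions of this example):

```python
def to_dots(data, parent_key=""):
    """Compact stand-in for json_to_dot_notation."""
    items = {}
    for key, value in data.items():
        new_key = f"{parent_key}.{key}" if parent_key else key
        if isinstance(value, dict):
            items.update(to_dots(value, new_key))
        else:
            items[new_key] = value
    return items

def from_dots(flat):
    """Compact stand-in for dot_notation_to_json."""
    result = {}
    for key, value in flat.items():
        *parents, leaf = key.split(".")
        current = result
        for part in parents:
            current = current.setdefault(part, {})
        current[leaf] = value
    return result

# Illustrative data: the hostname key contains a dot.
original = {"hosts": {"example.com": "10.0.0.1"}}

flat = to_dots(original)
restored = from_dots(flat)

print(flat)                  # {'hosts.example.com': '10.0.0.1'}
print(restored)              # {'hosts': {'example': {'com': '10.0.0.1'}}}
print(restored == original)  # False
```

If your keys can contain dots, consider a separator that cannot appear in them, or escape the dots before flattening.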

# Wrapping Up

JSON processing goes beyond basic json.loads(). In most projects, you’ll need tools to navigate nested structures, flatten them, merge configurations, filter fields, and convert between formats.

The techniques in this article also transfer to other data processing tasks. You can modify these patterns for XML, YAML, or custom data formats.

Start with the safe access function to prevent KeyError exceptions in your code. Add the others as specific needs arise. Happy coding!

Bala Priya C is a developer and technical writer from India. She likes to work in the fields of mathematics, programming, data science, and content creation. Her areas of interest and expertise include DevOps, data science, and natural language processing. She loves reading, writing, coding, and coffee! Currently, she is working on learning and sharing her knowledge with the developer community by writing tutorials, how-to guides, opinion pieces, and more. Bala also creates engaging resource overviews and coding tutorials.
