Author(s): Dua Asif
Originally published on Towards AI.
# config.py
DATABASE_URL = "postgresql://localhost/mydb"
API_KEY = "sk_live_abc123"
DEBUG = True
# app.py
import psycopg2
import requests
import config

def connect_database():
    return psycopg2.connect(config.DATABASE_URL)

def call_api(endpoint):
    return requests.get(f"https://api.example.com/{endpoint}",
                        headers={'X-API-Key': config.API_KEY})
Clean. Easy. Every module can access configuration with a simple import config. No passing parameters everywhere. Convenient.
Then the bug reports started.
“The app works fine on my machine but not in production.” “Tests pass locally but fail in CI.” “Feature works when run alone but breaks when run after other tests.”
I couldn’t reproduce any of them. The code worked perfectly for me. But users were experiencing random, intermittent failures.
Then I discovered the problem…global variables.
# test_api.py
import config

def test_with_mock_api():
    # Override config for testing
    config.API_KEY = "test_key_123"
    result = call_api('users')
    assert result.status_code == 200
# test_database.py
import config

def test_with_test_database():
    # Override config for testing
    config.DATABASE_URL = "postgresql://localhost/testdb"
    result = query_users()
    assert len(result) > 0
The tests were modifying global state. When run in sequence, they interfered with each other. The order of test execution determined which tests passed.
# Run test_api first, then test_database
pytest test_api.py test_database.py # Both pass
# Run test_database first, then test_api
pytest test_database.py test_api.py # test_api fails!
# Why?
# test_database changed DATABASE_URL globally
# test_api used the changed value
# Results were unpredictable
That month… a victim of race conditions and strange test failures… I learned the most painful lesson about global state.
Global variables are not convenient. They are invisible dependencies that make it impossible to reason about the code. And they create bugs that appear only under specific circumstances.
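To make "invisible dependency" concrete, here's a minimal self-contained sketch (the settings dict and apply_discount function are hypothetical stand-ins): the same call returns different answers depending on what some other code did to the global earlier.

```python
# A mutable module-level global shared by everyone
settings = {"discount": 0.0}

def apply_discount(price):
    # Hidden dependency: the result depends on `settings`,
    # which is invisible in the function signature
    return price * (1 - settings["discount"])

print(apply_discount(100))   # 100.0, discount is still 0.0
settings["discount"] = 0.5   # some other module mutates the global
print(apply_discount(100))   # 50.0, same call, different answer
```

Nothing in `apply_discount(100)` tells you the answer can change between calls.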
Let me show you how global variables destroyed my code.
The race condition that occurred only in production
I had a counter as a global variable.
# counter.py
request_count = 0

def track_request():
    global request_count
    request_count += 1
    return request_count
# app.py
import logging
from counter import track_request

@app.route('/api/endpoint')
def handle_request():
    count = track_request()
    logging.info(f"Request #{count}")
    return process_request()
It worked perfectly in development. Each request got a unique sequential number.
In production, with multiple threads… the request numbers were wrong.
# Thread 1 and Thread 2 run simultaneously
# Thread 1: Read request_count (0)
# Thread 2: Read request_count (0)
# Thread 1: Add 1, get 1
# Thread 2: Add 1, get 1
# Both threads see count = 1
# Later
# Thread 3: Read request_count (1)
# Thread 4: Read request_count (1)
# Thread 3: Add 1, get 2
# Thread 4: Add 1, get 2
# Counts are duplicated!
# Logs show
# Request #1
# Request #1 ← Duplicate!
# Request #2
# Request #2 ← Duplicate!
Global variables are not thread-safe. Multiple threads accessing the same global create race conditions.
import threading

request_count = 0
count_lock = threading.Lock()

def track_request():
    global request_count
    with count_lock:  # Only one thread at a time
        request_count += 1
        return request_count
But this introduced another problem…lock contention. Every request had to wait for the lock. Under load, performance degraded.
# Better: thread-local storage
import threading

thread_local = threading.local()

def track_request():
    if not hasattr(thread_local, 'request_count'):
        thread_local.request_count = 0
    thread_local.request_count += 1
    return thread_local.request_count

# Each thread has its own counter
# No race conditions
# No lock contention
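To check the per-thread behavior concretely, here is a runnable sketch of the thread-local counter: each worker thread counts from 1 independently, with no shared state.

```python
import threading

thread_local = threading.local()

def track_request():
    if not hasattr(thread_local, "request_count"):
        thread_local.request_count = 0
    thread_local.request_count += 1
    return thread_local.request_count

results = []

def worker():
    # Three "requests" handled by this thread
    results.append([track_request() for _ in range(3)])

threads = [threading.Thread(target=worker) for _ in range(2)]
for t in threads:
    t.start()
for t in threads:
    t.join()

print(results)  # [[1, 2, 3], [1, 2, 3]], each thread has its own counter
```

Note the trade-off: the numbers are unique per thread, not globally, so this fits per-thread statistics, not a global request counter.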
Or don’t use globals at all…
class RequestTracker:
    def __init__(self):
        self.count = 0
        self.lock = threading.Lock()

    def track(self):
        with self.lock:
            self.count += 1
            return self.count

# Create one instance per application
tracker = RequestTracker()

@app.route('/api/endpoint')
def handle_request():
    count = tracker.track()
    logging.info(f"Request #{count}")
    return process_request()
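It's worth verifying that the locked tracker really issues each number exactly once under contention. A quick self-contained check (the Flask route is dropped so the sketch runs standalone):

```python
import threading

class RequestTracker:
    def __init__(self):
        self.count = 0
        self.lock = threading.Lock()

    def track(self):
        with self.lock:
            self.count += 1
            return self.count

tracker = RequestTracker()
seen = []

def worker():
    for _ in range(1000):
        seen.append(tracker.track())

threads = [threading.Thread(target=worker) for _ in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()

# 4000 requests, 4000 distinct numbers: no duplicates, no gaps
print(len(seen), len(set(seen)))  # 4000 4000
```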
The cache that caused the memory leak
I created a global cache for performance.
# cache.py
_cache = {}

def cache_set(key, value):
    _cache[key] = value

def cache_get(key):
    return _cache.get(key)
# user_service.py
from cache import cache_set, cache_get

def get_user(user_id):
    cached = cache_get(f"user:{user_id}")
    if cached:
        return cached
    user = database.query("SELECT * FROM users WHERE id = %s", user_id)
    cache_set(f"user:{user_id}", user)
    return user
Good job. Cached users, fast lookups.
In production… memory usage kept climbing. The process grew to 8GB, then 16GB, then 32GB. Finally it crashed.
# After 1 hour
_cache = {
'user:1': {...},
'user:2': {...},
'user:3': {...},
# ... 10,000 users cached
}
# After 1 day
_cache = {
'user:1': {...},
'user:2': {...},
# ... 240,000 users cached
# 5GB of memory
}
# After 1 week
# 1,680,000 cached users
# 32GB of memory
# Server crashes
The cache never cleared itself. Each unique user_id was cached forever. Global state accumulated without bound.
from functools import lru_cache

@lru_cache(maxsize=1000)  # Limit cache size
def get_user(user_id):
    return database.query("SELECT * FROM users WHERE id = %s", user_id)
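To see the eviction in action, here's a sketch with a tiny cache and a stub in place of the real database query (the `calls` list just records which lookups miss the cache):

```python
from functools import lru_cache

calls = []  # records which lookups actually reached the "database"

@lru_cache(maxsize=2)  # tiny limit so eviction is visible
def get_user(user_id):
    calls.append(user_id)  # stub for the real database query
    return {"id": user_id}

get_user(1)   # miss: queried
get_user(2)   # miss: queried
get_user(1)   # hit: served from cache
get_user(3)   # miss: evicts the least-recently-used entry (user 2)
get_user(2)   # miss again: must re-query

print(calls)                       # [1, 2, 3, 2]
print(get_user.cache_info().hits)  # 1
```

`cache_info()` makes the hit/miss behavior observable, which is handy when tuning `maxsize`.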
# Or use time-based expiration
from datetime import datetime, timedelta

class ExpiringCache:
    def __init__(self, ttl_seconds=300):
        self._cache = {}
        self._expiry = {}
        self.ttl = timedelta(seconds=ttl_seconds)

    def set(self, key, value):
        self._cache[key] = value
        self._expiry[key] = datetime.now() + self.ttl

    def get(self, key):
        if key not in self._cache:
            return None
        if datetime.now() > self._expiry[key]:
            del self._cache[key]
            del self._expiry[key]
            return None
        return self._cache[key]
# One cache instance
cache = ExpiringCache(ttl_seconds=300)

def get_user(user_id):
    cached = cache.get(f"user:{user_id}")
    if cached:
        return cached
    user = database.query("SELECT * FROM users WHERE id = %s", user_id)
    cache.set(f"user:{user_id}", user)
    return user
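A quick check that expired entries really are evicted (using a very short TTL so the sketch runs fast; the user data is made up):

```python
import time
from datetime import datetime, timedelta

class ExpiringCache:
    def __init__(self, ttl_seconds=300):
        self._cache = {}
        self._expiry = {}
        self.ttl = timedelta(seconds=ttl_seconds)

    def set(self, key, value):
        self._cache[key] = value
        self._expiry[key] = datetime.now() + self.ttl

    def get(self, key):
        if key not in self._cache:
            return None
        if datetime.now() > self._expiry[key]:
            # Expired: evict so the memory is actually reclaimed
            del self._cache[key]
            del self._expiry[key]
            return None
        return self._cache[key]

cache = ExpiringCache(ttl_seconds=0.1)
cache.set("user:1", {"name": "Ada"})
print(cache.get("user:1"))   # {'name': 'Ada'}, still fresh
time.sleep(0.25)
print(cache.get("user:1"))   # None, expired
print(len(cache._cache))     # 0, the entry was removed
```

Note that eviction happens lazily on `get`; keys that are never read again still linger, which is why production caches usually add a periodic sweep or an LRU bound on top.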
The test that changed global state
I wrote a test that modified the global.
# settings.py
DEBUG = False
MAX_RETRIES = 3
TIMEOUT = 30
# test_api.py
import settings

def test_api_with_debug():
    # Enable debug mode for this test
    settings.DEBUG = True
    result = call_api_endpoint()
    assert 'debug_info' in result

# test_retry.py
import settings

def test_retry_logic():
    # This test expects DEBUG = False
    result = call_api_endpoint()
    assert 'debug_info' not in result
# If test_api runs first
# settings.DEBUG is now True
# test_retry fails!
The tests modified global state. The changes persisted from one test to the next. Test order mattered.
# Test isolation broken
pytest test_api.py test_retry.py # test_retry fails
pytest test_retry.py test_api.py # Both pass
pytest test_retry.py # Passes when run alone
pytest test_api.py test_retry.py # Fails when run after test_api
I needed to save and restore the state.
import settings
import pytest

@pytest.fixture
def restore_settings():
    # Save original values
    original_debug = settings.DEBUG
    original_retries = settings.MAX_RETRIES
    original_timeout = settings.TIMEOUT

    yield  # Test runs here

    # Restore original values
    settings.DEBUG = original_debug
    settings.MAX_RETRIES = original_retries
    settings.TIMEOUT = original_timeout

def test_api_with_debug(restore_settings):
    settings.DEBUG = True
    result = call_api_endpoint()
    assert 'debug_info' in result

def test_retry_logic(restore_settings):
    result = call_api_endpoint()
    assert 'debug_info' not in result
Or better… don’t modify globals in tests.
# Instead of modifying globals
def test_api_with_debug():
    settings.DEBUG = True
    result = call_api_endpoint()

# Pass configuration explicitly
def test_api_with_debug():
    config = {'DEBUG': True}
    result = call_api_endpoint(config)

# Or use dependency injection
def call_api_endpoint(debug=False):
    if debug:
        return {'data': '...', 'debug_info': '...'}
    return {'data': '...'}

def test_api_with_debug():
    result = call_api_endpoint(debug=True)
    assert 'debug_info' in result
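The save-and-restore dance can also be delegated to the standard library: `unittest.mock.patch.object` temporarily rebinds an attribute and restores it even if the test raises. A minimal sketch, using `types.SimpleNamespace` to stand in for the settings module:

```python
from types import SimpleNamespace
from unittest import mock

# Stand-in for a settings module with module-level configuration
settings = SimpleNamespace(DEBUG=False, MAX_RETRIES=3)

with mock.patch.object(settings, "DEBUG", True):
    print(settings.DEBUG)   # True, overridden inside the block
print(settings.DEBUG)       # False, restored automatically, even on error
```

The same call works on a real imported module, and the `pytest` `monkeypatch` fixture offers the equivalent `monkeypatch.setattr`.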
The singleton that wasn't
I implemented the singleton pattern with a global.
# database.py
_db_instance = None

def get_database():
    global _db_instance
    if _db_instance is None:
        _db_instance = DatabaseConnection()
    return _db_instance
# Multiple modules use it
from database import get_database

def save_user(user):
    db = get_database()
    db.execute("INSERT INTO users ...")

def get_orders():
    db = get_database()
    return db.query("SELECT * FROM orders")
I thought I had a single database connection. In fact, I had one per process.
# With multiprocessing
from multiprocessing import Process

def worker():
    db = get_database()
    # Each process creates its own connection
    # _db_instance is not shared between processes

# In process 1
get_database()  # Creates connection A

# In process 2
get_database()  # Creates connection B (different instance!)

# Two separate connections
# Not a singleton across processes
And there were thread-safety issues…
# Thread 1 and Thread 2 run simultaneously
def get_database():
    global _db_instance
    if _db_instance is None:  # Thread 1 checks: None
                              # Thread 2 checks: None
        _db_instance = DatabaseConnection()  # Thread 1 creates connection
        _db_instance = DatabaseConnection()  # Thread 2 creates connection (overwrites)
    return _db_instance

# Two connections created
# One immediately lost
# The extra connection is leaked
Thread-safe singleton…
import threading

_db_instance = None
_db_lock = threading.Lock()

def get_database():
    global _db_instance
    if _db_instance is None:
        with _db_lock:
            # Double-check inside lock
            if _db_instance is None:
                _db_instance = DatabaseConnection()
    return _db_instance
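A quick sanity check that the double-checked lock hands every thread the same instance (with a stub class standing in for a real database connection):

```python
import threading

class DatabaseConnection:
    pass  # stub: stands in for a real connection object

_db_instance = None
_db_lock = threading.Lock()

def get_database():
    global _db_instance
    if _db_instance is None:
        with _db_lock:
            if _db_instance is None:  # double-check inside the lock
                _db_instance = DatabaseConnection()
    return _db_instance

instance_ids = set()

def worker():
    instance_ids.add(id(get_database()))

threads = [threading.Thread(target=worker) for _ in range(8)]
for t in threads:
    t.start()
for t in threads:
    t.join()

print(len(instance_ids))  # 1, all eight threads got the same instance
```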
But better… don't use a singleton or a global at all.
class Application:
    def __init__(self):
        self.db = DatabaseConnection()

    def save_user(self, user):
        self.db.execute("INSERT INTO users ...")

    def get_orders(self):
        return self.db.query("SELECT * FROM orders")

# Create one instance
application = Application()

# Pass it where needed
@app.route('/orders')
def orders_endpoint():
    return application.get_orders()
The import side effect that broke everything
I had initialization code at module level.
# config.py
import os

DATABASE_URL = os.environ['DATABASE_URL']  # Read from environment
API_KEY = os.environ['API_KEY']

# Initialize connection at import time
db_connection = connect_to_database(DATABASE_URL)
Importing config executed this code, even when I just wanted to inspect the module.
# test_config.py
import config # This crashes!
# Why?
# Environment variables aren't set
# KeyError: 'DATABASE_URL'
# Even if I'm just testing something else
import other_module
# other_module imports config
# config tries to read environment variables
# Crash!
The side effects of importing made it impossible to test code or even import without a full environment setup.
# Can't do this in tests
import config
config.DATABASE_URL = "postgresql://localhost/testdb"
# Because it already tried to connect during import
# Connection was made with os.environ['DATABASE_URL']
# Setting it after import does nothing
I had to set up the entire environment just to import the modules.
# test_config.py
import os

# Must set environment before importing
os.environ['DATABASE_URL'] = 'postgresql://localhost/testdb'
os.environ['API_KEY'] = 'test_key'

import config  # Now it works

# But this affects all other tests
# Global environment pollution
Solution…lazy initialization.
# config.py
import os

_db_connection = None

def get_database_url():
    return os.environ.get('DATABASE_URL', 'postgresql://localhost/defaultdb')

def get_api_key():
    return os.environ.get('API_KEY', 'default_key')

def get_db_connection():
    global _db_connection
    if _db_connection is None:
        _db_connection = connect_to_database(get_database_url())
    return _db_connection

# Now importing doesn't crash
# Connection only created when actually needed
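A small check that the lazy version runs without any environment setup (the connection function is omitted; only the URL lookup, which is what made import crash before, is exercised):

```python
import os

def get_database_url():
    # A default means no environment is required just to import or test
    return os.environ.get("DATABASE_URL", "postgresql://localhost/defaultdb")

os.environ.pop("DATABASE_URL", None)   # simulate a bare environment
print(get_database_url())  # postgresql://localhost/defaultdb

os.environ["DATABASE_URL"] = "postgresql://localhost/testdb"
print(get_database_url())  # postgresql://localhost/testdb

os.environ.pop("DATABASE_URL", None)   # clean up so we don't pollute others
```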
Or use a class…
class Config:
    def __init__(self):
        self._db_connection = None
        self._database_url_override = None

    @property
    def database_url(self):
        if self._database_url_override:
            return self._database_url_override
        return os.environ.get('DATABASE_URL', 'postgresql://localhost/defaultdb')

    @property
    def db_connection(self):
        if self._db_connection is None:
            self._db_connection = connect_to_database(self.database_url)
        return self._db_connection

# Create config when needed
config = Config()

# Tests can override
config_test = Config()
config_test._database_url_override = 'postgresql://localhost/testdb'
Hidden dependencies that made testing impossible
My function used a global without declaring it.
# config.py
API_ENDPOINT = "https://api.production.com"

# api.py
import requests

def call_api(path):
    # Uses global without importing it explicitly
    import config
    url = f"{config.API_ENDPOINT}/{path}"
    return requests.get(url)

# Looks fine
result = call_api('users')
But testing was impossible.
# test_api.py
import config
config.API_ENDPOINT = "https://api.test.com"
from api import call_api
# This STILL uses production API!
# Why?
# Because call_api imports config inside the function
# The import happens when the function runs
# After we've already overridden config.API_ENDPOINT
# But the import creates a new reference
Actually, it’s confusing. Let me make it clear…
# api.py
def call_api(path):
    import config  # Gets the config module
    url = f"{config.API_ENDPOINT}/{path}"
    # This WILL use the modified value
# Real problem is this
API_ENDPOINT = "https://api.production.com"

def call_api(path):
    # Implicitly uses module-level global
    url = f"{API_ENDPOINT}/{path}"
    return requests.get(url)

# test_api.py
import api
api.API_ENDPOINT = "https://api.test.com"
result = api.call_api('users')
# This happens to work, because the global is looked up at call time
# But any module that did `from api import API_ENDPOINT`
# copied the value and still uses the production URL
Hidden dependencies are invisible in the function signature.
# This function looks like it only depends on 'path'
def call_api(path):
    url = f"{API_ENDPOINT}/{path}"
    return requests.get(url)

# But it secretly depends on the API_ENDPOINT global
# Impossible to see without reading the implementation
Clear dependencies…
# Dependency is visible in signature
def call_api(path, endpoint="https://api.production.com"):
    url = f"{endpoint}/{path}"
    return requests.get(url)

# Testing is easy
def test_api():
    result = call_api('users', endpoint='https://api.test.com')
    assert result.status_code == 200

# Production uses default
result = call_api('users')

# Test overrides
result = call_api('users', endpoint='https://mock.api')
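Because the network call itself isn't what the test cares about, the sketch below drops `requests` and checks only the URL construction, which is the part the dependency actually affects (`build_api_url` is a hypothetical helper):

```python
def build_api_url(path, endpoint="https://api.production.com"):
    # The endpoint dependency is visible in the signature and overridable
    return f"{endpoint}/{path}"

print(build_api_url("users"))
# https://api.production.com/users
print(build_api_url("users", endpoint="https://api.test.com"))
# https://api.test.com/users
```

No global to patch, no state to restore: the test just passes a different argument.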
Or use dependency injection…
class APIClient:
    def __init__(self, endpoint="https://api.production.com"):
        self.endpoint = endpoint

    def call_api(self, path):
        url = f"{self.endpoint}/{path}"
        return requests.get(url)

# Production
client = APIClient()
result = client.call_api('users')

# Testing
test_client = APIClient(endpoint='https://api.test.com')
result = test_client.call_api('users')
The monkey patch that had a lasting effect
I monkey-patched global state for testing.
# production_module.py
import datetime

def get_current_time():
    return datetime.datetime.now()

# test_module.py
import datetime
import production_module

class FakeDatetime(datetime.datetime):
    @classmethod
    def now(cls, tz=None):
        return cls(2024, 1, 1, 12, 0, 0)

def test_time_dependent_function():
    # Mock the current time by rebinding the module attribute
    # (datetime.datetime.now itself can't be reassigned on the C type)
    original_datetime = datetime.datetime
    datetime.datetime = FakeDatetime
    result = production_module.get_current_time()
    assert result.day == 1
    # Forgot to restore!
    # datetime.datetime = original_datetime

# All subsequent tests use the mocked time!
def test_something_else():
    now = datetime.datetime.now()
    print(now)  # 2024-01-01 12:00:00
    # Forever stuck in the past!
Monkey patching modifies global state. If it is not restored it affects everything.
import datetime
import pytest

class FakeDatetime(datetime.datetime):
    @classmethod
    def now(cls, tz=None):
        return cls(2024, 1, 1, 12, 0, 0)

@pytest.fixture
def freeze_time():
    original_datetime = datetime.datetime
    datetime.datetime = FakeDatetime
    yield
    datetime.datetime = original_datetime  # Always restore

def test_time_dependent_function(freeze_time):
    result = get_current_time()
    assert result.day == 1
Or use a library designed for this…
from freezegun import freeze_time

@freeze_time("2024-01-01 12:00:00")
def test_time_dependent_function():
    result = get_current_time()
    assert result.day == 1

# Automatically restored after test
What I learned about global state
After weeks of debugging impossible-to-reproduce bugs… I learned that global variables aren’t about convenience. They are about hidden dependencies and invisible coupling.
Problems with globals:
- Not thread-safe (race conditions)
- Not process-safe (each process gets its own copy)
- Make testing harder (shared state between tests)
- Make code harder to understand (hidden dependencies)
- Cause memory leaks (data stored forever)
- Create order dependencies (initialization order matters)
When globals are actually okay:
- True constants (never modified)
- Module-level configuration (if immutable)
- Caches with proper size limits and eviction
- Logging configuration
- Read-only lookups (if thread-safe)
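For the read-only lookup case, the standard library can enforce the "never modified" part: `types.MappingProxyType` wraps a dict in a view that rejects mutation. A sketch (the constant and the status labels are made-up examples):

```python
from types import MappingProxyType

MAX_PAGE_SIZE = 100  # a true constant: assigned once, never rebound

_labels = {"active": "Active", "banned": "Banned"}
STATUS_LABELS = MappingProxyType(_labels)  # read-only view, safe to share

print(STATUS_LABELS["active"])  # Active
try:
    STATUS_LABELS["active"] = "Changed"
except TypeError:
    print("mutation rejected, the global stays read-only")
```

Any code that tries to mutate the shared mapping fails loudly at the call site instead of silently corrupting state for everyone else.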
The checklist I use now…
Is this actually a global?
☐ Can it be a parameter instead?
☐ Can it be a class attribute instead?
☐ Can it be dependency-injected instead?
If it must be global:
☐ Is it truly constant? (Never modified)
☐ Does it need thread safety?
☐ Does it need process safety?
☐ Does it need size limits?
☐ How will tests handle it?
☐ Are initialization side effects safe?
The Golden Rule…if you're reaching for the global keyword, you're probably doing something wrong.
Which global variables created bugs you couldn't reproduce? Share them below… We've all learned that globals make everything harder.
