
# Introduction
Parsing dates and times is one of those tasks that seems simple until you actually try to do it. Python’s `datetime` module handles standard formats well, but real-world data is messy. User input, scraped web data, and legacy systems often throw curveballs.
This article walks you through five practical functions that handle common date and time parsing tasks. By the end, you’ll understand how to build flexible parsers for the messy date formats you encounter in real projects.
# 1. Parsing Relative Time Strings
Social media apps, chat applications, and activity feeds display timestamps such as “5 minutes ago” or “2 days ago.” When you scrape or process this data, you need to convert these relative strings back to real datetime objects.
Here’s a function that handles common relative time expressions:

    from datetime import datetime, timedelta
    import re

    def parse_relative_time(time_string, reference_time=None):
        """
        Convert relative time strings to datetime objects.
        Examples: "2 hours ago", "3 days ago", "1 week ago"
        """
        if reference_time is None:
            reference_time = datetime.now()

        # Normalize the string
        time_string = time_string.lower().strip()

        # Pattern: number + time unit + "ago"
        pattern = r'(\d+)\s*(second|minute|hour|day|week|month|year)s?\s*ago'
        match = re.match(pattern, time_string)
        if not match:
            raise ValueError(f"Cannot parse: {time_string}")

        amount = int(match.group(1))
        unit = match.group(2)

        # Map units to timedelta kwargs
        unit_mapping = {
            'second': 'seconds',
            'minute': 'minutes',
            'hour': 'hours',
            'day': 'days',
            'week': 'weeks',
        }

        if unit in unit_mapping:
            delta_kwargs = {unit_mapping[unit]: amount}
            return reference_time - timedelta(**delta_kwargs)
        elif unit == 'month':
            # Approximate: 30 days per month
            return reference_time - timedelta(days=amount * 30)
        elif unit == 'year':
            # Approximate: 365 days per year
            return reference_time - timedelta(days=amount * 365)
The function uses a regular expression (regex) to extract the number and the time unit from the string. The pattern `(\d+)` captures one or more digits, and `(second|minute|hour|day|week|month|year)` matches the time unit. The `s?` makes the plural “s” optional, so both “hour” and “hours” work.
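To see what the pattern captures, here is a small standalone check. It is only an illustrative sketch: the pattern is the one used in `parse_relative_time`, and the sample strings are made up for this example.

```python
import re

# Same pattern as in parse_relative_time: digits, a time unit, optional "s", then "ago"
pattern = r'(\d+)\s*(second|minute|hour|day|week|month|year)s?\s*ago'

samples = ["5 minutes ago", "1 hour ago", "3weeks ago", "yesterday"]  # made-up inputs

for text in samples:
    match = re.match(pattern, text)
    if match:
        # group(1) is the number, group(2) is the singular unit name
        print(f"{text!r}: amount={match.group(1)}, unit={match.group(2)}")
    else:
        print(f"{text!r}: no match")
```

Because `\s*` allows zero whitespace, “3weeks ago” still matches, while “yesterday” does not and would trigger the `ValueError` branch in the function.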
For the units that `timedelta` supports directly (seconds through weeks), we create a `timedelta` and subtract it from the reference time. For months and years, we approximate with 30 and 365 days, respectively. It’s not perfect, but it’s good enough for most use cases.
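If the 30-day approximation is too coarse for your data, one possible alternative is to step back whole calendar months using only the standard library. The helper below is a sketch under that assumption: `subtract_months` is a hypothetical name, not part of the function above, and it clamps the day when the target month is shorter.

```python
import calendar
from datetime import datetime

def subtract_months(dt, months):
    """Hypothetical helper: step back whole calendar months, clamping the day."""
    # Use a zero-based month index so divmod handles the year rollover
    total = dt.year * 12 + (dt.month - 1) - months
    year, month_index = divmod(total, 12)
    month = month_index + 1
    # Clamp the day to the length of the target month (e.g. Mar 31 -> Feb 28/29)
    last_day = calendar.monthrange(year, month)[1]
    return dt.replace(year=year, month=month, day=min(dt.day, last_day))

reference = datetime(2026, 3, 31, 9, 0)
print(subtract_months(reference, 1))  # 2026-02-28 09:00:00
print(subtract_months(reference, 3))  # 2025-12-31 09:00:00
```

Passing a fixed `reference` here also keeps the result reproducible, which is the same reason `parse_relative_time` accepts a reference time.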
The `reference_time` parameter lets you specify a different “now” for testing or when processing historical data.
Let’s test it:

    result1 = parse_relative_time("2 hours ago")
    result2 = parse_relative_time("3 days ago")
    result3 = parse_relative_time("1 week ago")

    print(f"2 hours ago: {result1}")
    print(f"3 days ago: {result2}")
    print(f"1 week ago: {result3}")

Output:

    2 hours ago: 2026-01-06 12:09:34.584107
    3 days ago: 2026-01-03 14:09:34.584504
    1 week ago: 2025-12-30 14:09:34.584558
# 2. Extracting Dates from Natural Language Text
Sometimes you need to find dates buried in text: “The meeting is scheduled for January 15, 2026” or “Please respond by March 3”. Instead of manually parsing the entire sentence, you just want to extract the date.
Here is a function that finds and extracts dates from natural language:

    import re
    from datetime import datetime

    def extract_date_from_text(text, current_year=None):
        """
        Extract dates from natural language text.
        Handles formats like:
        - "January 15th, 2024"
        - "March 3rd"
        - "Dec 25th, 2023"
        """
        if current_year is None:
            current_year = datetime.now().year

        # Month names (full and abbreviated)
        months = {
            'january': 1, 'jan': 1,
            'february': 2, 'feb': 2,
            'march': 3, 'mar': 3,
            'april': 4, 'apr': 4,
            'may': 5,
            'june': 6, 'jun': 6,
            'july': 7, 'jul': 7,
            'august': 8, 'aug': 8,
            'september': 9, 'sep': 9, 'sept': 9,
            'october': 10, 'oct': 10,
            'november': 11, 'nov': 11,
            'december': 12, 'dec': 12
        }

        # Pattern: Month Day(st/nd/rd/th), Year (year optional)
        pattern = r'(january|jan|february|feb|march|mar|april|apr|may|june|jun|july|jul|august|aug|september|sep|sept|october|oct|november|nov|december|dec)\s+(\d{1,2})(?:st|nd|rd|th)?(?:,?\s+(\d{4}))?'
        matches = re.findall(pattern, text.lower())
        if not matches:
            return None

        # Take the first match
        month_str, day_str, year_str = matches[0]
        month = months[month_str]
        day = int(day_str)
        year = int(year_str) if year_str else current_year
        return datetime(year, month, day)
The function creates a dictionary that maps month names (both full and abbreviated) to their numerical values. The regex pattern matches the month name followed by the day number with an optional ordinal suffix (st, nd, rd, th) and an optional year.
The `(?:...)` syntax creates a non-capturing group: the pattern is matched, but the match is not saved as a separate group. This is useful for optional parts such as the ordinal suffix and the separator before the year.
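The difference is easiest to see with `re.findall`, which returns one tuple of captured groups per match. The snippet below is a simplified illustration using a made-up sentence and a shortened pattern, not the full month pattern from the function.

```python
import re

text = "due march 3rd, 2026"

# Capturing group around the suffix: findall returns it as an extra field
with_capture = re.findall(r'(march)\s+(\d{1,2})(st|nd|rd|th)?', text)

# Non-capturing group: the suffix is still matched, but not returned
without_capture = re.findall(r'(march)\s+(\d{1,2})(?:st|nd|rd|th)?', text)

print(with_capture)     # [('march', '3', 'rd')]
print(without_capture)  # [('march', '3')]
```

Keeping the suffix out of the results is what lets the function unpack each match into exactly three values: month, day, and year.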
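When no year is provided, the function defaults to the current year. This is a reasonable default: if someone mentions “March 3” in January, they usually mean the upcoming March, not the previous year. Note that the function does not roll forward, so a date that has already passed this year still resolves to the current year.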
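If you would rather resolve a yearless date to its next occurrence, one option is a small helper like the one below. This is only a sketch: `resolve_year` is a hypothetical name, not part of the function above, and it does not handle the February 29 edge case.

```python
from datetime import date

def resolve_year(month, day, today):
    """Hypothetical helper: return the year of the next occurrence of month/day."""
    candidate = date(today.year, month, day)
    # If the date has already passed this year, assume next year is meant
    if candidate < today:
        candidate = date(today.year + 1, month, day)
    return candidate.year

print(resolve_year(3, 3, date(2026, 1, 10)))  # 2026
print(resolve_year(3, 3, date(2026, 6, 1)))   # 2027
```

The result could then be passed as the `current_year` argument of `extract_date_from_text`. Whether rolling forward is the right choice depends on your data: deadlines usually point to the future, while log entries usually point to the past.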
Let’s test it with different text formats:

    text1 = "The meeting is scheduled for January 15th, 2026 at 3pm"
    text2 = "Please respond by March 3rd"
    text3 = "Deadline: Dec 25th, 2026"

    date1 = extract_date_from_text(text1)
    date2 = extract_date_from_text(text2)
    date3 = extract_date_from_text(text3)

    print(f"From '{text1}': {date1}")
    print(f"From '{text2}': {date2}")
    print(f"From '{text3}': {date3}")

Output:

    From 'The meeting is scheduled for January 15th, 2026 at 3pm': 2026-01-15 00:00:00
    From 'Please respond by March 3rd': 2026-03-03 00:00:00
    From 'Deadline: Dec 25th, 2026': 2026-12-25 00:00:00
# 3. Parsing Flexible Date Formats with Smart Detection
Real-world data comes in many formats. It is difficult to write separate parsers for each format. Instead, let’s create a function that automatically tries multiple formats.
Here’s a smart date parser that handles common formats:

    from datetime import datetime

    def parse_flexible_date(date_string):
        """
        Parse dates in multiple common formats.
        Tries various formats and returns the first match.
        """
        date_string = date_string.strip()

        # List of common date formats
        formats = [
            '%Y-%m-%d',
            '%Y/%m/%d',
            '%d-%m-%Y',
            '%d/%m/%Y',
            '%m/%d/%Y',
            '%d.%m.%Y',
            '%Y%m%d',
            '%B %d, %Y',
            '%b %d, %Y',
            '%d %B %Y',
            '%d %b %Y',
        ]

        # Try each format
        for fmt in formats:
            try:
                return datetime.strptime(date_string, fmt)
            except ValueError:
                continue

        # If nothing worked, raise an error
        raise ValueError(f"Unable to parse date: {date_string}")
This function uses a brute-force approach: it tries each format until one works. `strptime` raises a `ValueError` if the date string doesn’t match the format, so we catch that exception and move on to the next format.
The order of the formats matters. We put the International Organization for Standardization (ISO) format (`%Y-%m-%d`) first because it is the most common in technical contexts. Ambiguous formats such as `%d/%m/%Y` and `%m/%d/%Y` appear later, and because the day-first version is tried first, a string like “01/02/2026” is read as 1 February. If you know your data consistently uses one of these formats, reorder the list to prioritize it.
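The ambiguity is easy to reproduce with `strptime` directly. The snippet below is a small illustration with made-up input strings: the same text yields two different dates depending on which format is tried first.

```python
from datetime import datetime

ambiguous = "03/04/2026"  # made-up input: 3 April or March 4?

day_first = datetime.strptime(ambiguous, "%d/%m/%Y")
month_first = datetime.strptime(ambiguous, "%m/%d/%Y")

print(day_first.date())    # 2026-04-03
print(month_first.date())  # 2026-03-04

# A string is only unambiguous when one of the two readings is invalid
try:
    datetime.strptime("25/12/2026", "%m/%d/%Y")
except ValueError as exc:
    print(f"month-first rejected: {exc}")
```

This is why a format list cannot detect the intended convention on its own. The list order encodes an assumption about where your data comes from.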
Let’s test it with different date formats:

    # Test different formats
    dates = [
        "2026-01-15",
        "15/01/2026",
        "01/15/2026",
        "15.01.2026",
        "20260115",
        "January 15, 2026",
        "15 Jan 2026"
    ]

    for date_str in dates:
        parsed = parse_flexible_date(date_str)
        print(f"{date_str:20} -> {parsed}")

Output:

    2026-01-15           -> 2026-01-15 00:00:00
    15/01/2026           -> 2026-01-15 00:00:00
    01/15/2026           -> 2026-01-15 00:00:00
    15.01.2026           -> 2026-01-15 00:00:00
    20260115             -> 2026-01-15 00:00:00
    January 15, 2026     -> 2026-01-15 00:00:00
    15 Jan 2026          -> 2026-01-15 00:00:00
This approach isn’t the most efficient, but it’s simple and handles most of the date formats you’ll encounter.
# 4. Parsing Time Durations
Video players, workout trackers, and time-tracking apps display durations like “1 hour 30 minutes” or “2:45:30.” When parsing user input or scraped data, you need to convert these into `timedelta` objects for calculations.
Here’s a function that parses common duration formats:

    from datetime import timedelta
    import re

    def parse_duration(duration_string):
        """
        Parse duration strings into timedelta objects.
        Handles formats like:
        - "1h 30m 45s"
        - "2:45:30" (H:M:S)
        - "90 minutes"
        - "1.5 hours"
        """
        duration_string = duration_string.strip().lower()

        # Try colon format first (H:M:S or M:S)
        if ':' in duration_string:
            parts = duration_string.split(':')
            if len(parts) == 2:
                # M:S format
                minutes, seconds = map(int, parts)
                return timedelta(minutes=minutes, seconds=seconds)
            elif len(parts) == 3:
                # H:M:S format
                hours, minutes, seconds = map(int, parts)
                return timedelta(hours=hours, minutes=minutes, seconds=seconds)

        # Try unit-based format (1h 30m 45s)
        total_seconds = 0

        # Find hours
        hours_match = re.search(r'(\d+(?:\.\d+)?)\s*h(?:ours?)?', duration_string)
        if hours_match:
            total_seconds += float(hours_match.group(1)) * 3600

        # Find minutes
        minutes_match = re.search(r'(\d+(?:\.\d+)?)\s*m(?:in(?:ute)?s?)?', duration_string)
        if minutes_match:
            total_seconds += float(minutes_match.group(1)) * 60

        # Find seconds
        seconds_match = re.search(r'(\d+(?:\.\d+)?)\s*s(?:ec(?:ond)?s?)?', duration_string)
        if seconds_match:
            total_seconds += float(seconds_match.group(1))

        if total_seconds > 0:
            return timedelta(seconds=total_seconds)

        raise ValueError(f"Unable to parse duration: {duration_string}")
The function handles two main formats: colon-separated times and unit-based strings. For the colon format, we split on the colon and interpret the parts as hours, minutes, and seconds (or just minutes and seconds for a two-part duration).
For the unit-based format, we use three separate regex patterns to find hours, minutes, and seconds. The pattern `(\d+(?:\.\d+)?)` matches integers or decimals such as “1.5”. The pattern `\s*h(?:ours?)?` matches “h”, “hour”, or “hours”, with optional whitespace before it.
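As a quick check of the hours pattern in isolation, the sketch below runs it against a few made-up strings. It assumes the same pattern as in `parse_duration`, and `hours_pattern` is just a local name for this example.

```python
import re

# Number (integer or decimal), optional whitespace, then "h", "hour", or "hours"
hours_pattern = r'(\d+(?:\.\d+)?)\s*h(?:ours?)?'

for text in ["2h", "1.5 hours", "3 hour", "45s"]:
    match = re.search(hours_pattern, text)
    # Convert the captured number to float so decimals like 1.5 survive
    value = float(match.group(1)) if match else None
    print(f"{text!r}: {value}")
```

The last string contains no hour unit, so `re.search` returns `None` and the hours step contributes nothing to the total.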
Each matched value is converted to seconds and added to the total. This approach lets the function handle partial durations like “45 seconds” or “2 hours 15 minutes” without requiring every unit to be present.
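The accumulation step is plain arithmetic. As a worked example with made-up values, here is how “1.5 hours” and “15 minutes” would add up before being turned into a `timedelta`:

```python
from datetime import timedelta

total_seconds = 0.0
total_seconds += 1.5 * 3600  # "1.5 hours" -> 5400 seconds
total_seconds += 15 * 60     # "15 minutes" -> 900 seconds

duration = timedelta(seconds=total_seconds)
print(duration)                  # 1:45:00
print(duration.total_seconds())  # 6300.0
```

Summing in seconds first means fractional hours and minutes need no special handling, because `timedelta` normalizes the total into hours, minutes, and seconds.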
Let’s now test the function with different duration formats:

    durations = [
        "1h 30m 45s",
        "2:45:30",
        "90 minutes",
        "1.5 hours",
        "45s",
        "2h 15m"
    ]

    for duration in durations:
        parsed = parse_duration(duration)
        print(f"{duration:15} -> {parsed}")

Output:

    1h 30m 45s      -> 1:30:45
    2:45:30         -> 2:45:30
    90 minutes      -> 1:30:00
    1.5 hours       -> 1:30:00
    45s             -> 0:00:45
    2h 15m          -> 2:15:00
# 5. Parsing ISO Week Dates
Some systems use ISO week dates instead of regular calendar dates. An ISO week date such as “2026-W03-2” means “week 3 of 2026, day 2 (Tuesday)”. This format is common in business contexts where planning happens on a weekly basis.
Here’s a function to parse ISO week dates:

    from datetime import datetime, timedelta

    def parse_iso_week_date(iso_week_string):
        """
        Parse ISO week date format: YYYY-Www-D
        Example: "2024-W03-2" = Week 3 of 2024, Tuesday
        ISO week numbering:
        - Week 1 is the week with the first Thursday of the year
        - Days are numbered 1 (Monday) through 7 (Sunday)
        """
        # Parse the format: YYYY-Www-D
        parts = iso_week_string.split('-')
        if len(parts) != 3 or not parts[1].startswith('W'):
            raise ValueError(f"Invalid ISO week format: {iso_week_string}")

        year = int(parts[0])
        week = int(parts[1][1:])  # Remove 'W' prefix
        day = int(parts[2])

        if not (1 <= week <= 53):
            raise ValueError(f"Week must be between 1 and 53: {week}")
        if not (1 <= day <= 7):
            raise ValueError(f"Day must be between 1 and 7: {day}")

        # Find January 4th (always in week 1)
        jan_4 = datetime(year, 1, 4)

        # Find Monday of week 1
        week_1_monday = jan_4 - timedelta(days=jan_4.weekday())

        # Calculate the target date
        target_date = week_1_monday + timedelta(weeks=week - 1, days=day - 1)
        return target_date
ISO week dates follow specific rules. Week 1 is defined as the week containing the first Thursday of the year. This means the first week could start in December of the previous year.
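The December case can be checked with the standard library’s `isocalendar()` method. The dates below are chosen for illustration: January 1, 2026 falls on a Thursday, so ISO week 1 of 2026 begins on the Monday before it.

```python
from datetime import date

# Monday, December 29, 2025 already belongs to ISO week 1 of 2026
iso_year, iso_week, iso_weekday = date(2025, 12, 29).isocalendar()
print(iso_year, iso_week, iso_weekday)  # 2026 1 1

# The reverse also happens: Friday, January 1, 2027 still belongs to ISO year 2026
late_year, late_week, late_weekday = date(2027, 1, 1).isocalendar()
print(late_year, late_week, late_weekday)  # 2026 53 5
```

In both cases the ISO year differs from the calendar year, which is why ISO week dates cannot be parsed by simply counting weeks from January 1.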
The function uses a reliable approach: find January 4 (which is always in week 1), then find Monday of that week. From there, we add the appropriate number of weeks and days to reach the target date.
Calling `jan_4.weekday()` returns 0 for Monday through 6 for Sunday. Subtracting that many days from January 4 gives us the Monday of week 1. We then add `(week - 1)` weeks and `(day - 1)` days to get the final date.
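One way to gain confidence in this arithmetic is to compare it with `datetime.fromisocalendar`, which is available in the standard library from Python 3.8. The sketch below repeats the January 4 calculation in a compact helper (`week_date_via_jan_4` is a hypothetical name for this example) and checks it against the built-in for a few sample dates.

```python
from datetime import datetime, timedelta

def week_date_via_jan_4(year, week, day):
    """Hypothetical helper: anchor on January 4, step back to Monday, then add offsets."""
    jan_4 = datetime(year, 1, 4)
    week_1_monday = jan_4 - timedelta(days=jan_4.weekday())
    return week_1_monday + timedelta(weeks=week - 1, days=day - 1)

# Compare the manual arithmetic with the standard-library conversion
for year, week, day in [(2024, 1, 1), (2024, 3, 2), (2026, 1, 1), (2026, 53, 7)]:
    manual = week_date_via_jan_4(year, week, day)
    builtin = datetime.fromisocalendar(year, week, day)
    print(f"{year}-W{week:02d}-{day}: {manual.date()} (match: {manual == builtin})")
```

If you only target Python 3.8 or later, calling `fromisocalendar` directly is a reasonable shortcut. It also rejects week 53 for years that only have 52 weeks, which the simple range check in the manual version does not.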
Let’s test it:

    # Test ISO week dates
    iso_dates = [
        "2024-W01-1",  # Week 1, Monday
        "2024-W03-2",  # Week 3, Tuesday
        "2024-W10-5",  # Week 10, Friday
    ]

    for iso_date in iso_dates:
        parsed = parse_iso_week_date(iso_date)
        print(f"{iso_date} -> {parsed.strftime('%Y-%m-%d (%A)')}")

Output:

    2024-W01-1 -> 2024-01-01 (Monday)
    2024-W03-2 -> 2024-01-16 (Tuesday)
    2024-W10-5 -> 2024-03-08 (Friday)
This format is less common than regular dates, but when you do encounter it, having a parser ready saves significant time.
# Wrapping Up
The functions in this article rely on regex patterns, format lists, and datetime arithmetic to handle variations in formatting. These techniques transfer to other parsing challenges, and you can adapt the patterns to custom date formats in your own projects.
Creating your own parser helps you understand how date parsing operates. When you run into a non-standard date format that the standard libraries can’t handle, you’ll be ready to write a custom solution.
These functions are especially useful for small scripts, prototypes, and learning projects where adding heavy external dependencies may be overkill. Happy coding!
Bala Priya C is a developer and technical writer from India. She likes working at the intersection of mathematics, programming, data science, and content creation. Her areas of interest and expertise include DevOps, data science, and natural language processing. She loves reading, writing, coding, and coffee! Currently, she is working on learning and sharing her knowledge with the developer community by writing tutorials, how-to guides, opinion pieces, and more. Bala also creates engaging resource overviews and coding tutorials.