Bulk URL Checker with uv: Validate Website Accessibility in Python
Learn how to build a powerful URL checker script using uv that validates multiple websites concurrently, detects broken links, and generates detailed reports.
Table of Contents
- What Makes This URL Checker Special?
- The Complete URL Checker Script
- How the Script Works
- Running the Script
- Advanced Usage Examples
- Understanding the Output
- Common Use Cases
- Troubleshooting Common Issues
- Script Customization Options
- Best Practices
- Integration with Other Tools
- What’s Next?
- Conclusion
Managing websites and ensuring all links are working properly is a crucial task for web developers, SEO specialists, and content managers. Broken links can hurt your SEO rankings, frustrate users, and damage your website’s credibility. With uv, creating a powerful URL checker script is straightforward and efficient.
In this comprehensive guide, we’ll build a feature-rich URL validation tool that can check hundreds of URLs concurrently, categorize different types of errors, generate detailed reports, and save problematic URLs for further investigation. Whether you’re auditing a website, validating external links, or monitoring API endpoints, this script has you covered.
New to uv?
If you’re new to uv or want to learn how to set up full Python projects, start with our comprehensive guide Getting Started with uv: Setting Up Your Python Project in 2025 before diving into this advanced script.
What Makes This URL Checker Special?
Unlike basic URL validation tools, our script offers enterprise-level features:
- Concurrent Processing: Check multiple URLs simultaneously using ThreadPoolExecutor
- Smart URL Handling: Automatically adds HTTPS protocol to URLs without schemes
- Comprehensive Error Detection: Identifies timeouts, connection errors, and HTTP status codes
- Detailed Reporting: Provides response times, status codes, and error categorization
- File Input/Output: Read URLs from files and save problematic URLs for review
- Progress Tracking: Real-time progress indicators during bulk checking
- Flexible Configuration: Customizable timeout settings and worker thread counts
- Cross-Platform: Works seamlessly on macOS, Windows, and Linux
The Complete URL Checker Script
Let’s start with our comprehensive URL checker script. Save this as url_checker.py:
#!/usr/bin/env -S uv run
# /// script
# dependencies = [
# "requests",
# ]
# ///
import requests
from urllib.parse import urlparse
import time
from concurrent.futures import ThreadPoolExecutor, as_completed
import sys


def check_url(url, timeout=10):
    """
    Check if a URL is accessible and return status information.

    Args:
        url (str): The URL to check
        timeout (int): Timeout in seconds (default: 10)

    Returns:
        dict: Contains url, status, error_type, and response_time
    """
    # Add https:// if no scheme is provided
    if not url.startswith(('http://', 'https://')):
        url = 'https://' + url
    start_time = time.time()

    try:
        response = requests.get(url, timeout=timeout, allow_redirects=True)
        response_time = time.time() - start_time
        return {
            'url': url,
            'status': 'OK',
            'status_code': response.status_code,
            'error_type': None,
            'response_time': round(response_time, 2)
        }
    except requests.exceptions.Timeout:
        return {
            'url': url,
            'status': 'TIMEOUT',
            'status_code': None,
            'error_type': 'Connection timeout',
            'response_time': timeout
        }
    except requests.exceptions.ConnectionError as e:
        return {
            'url': url,
            'status': 'CONNECTION_ERROR',
            'status_code': None,
            'error_type': f'Connection error: {str(e)[:100]}...',
            'response_time': time.time() - start_time
        }
    except requests.exceptions.RequestException as e:
        return {
            'url': url,
            'status': 'ERROR',
            'status_code': None,
            'error_type': f'Request error: {str(e)[:100]}...',
            'response_time': time.time() - start_time
        }


def read_urls_from_file(filename):
    """Read URLs from a text file, one per line."""
    urls = []
    try:
        with open(filename, 'r', encoding='utf-8') as file:
            for line in file:
                url = line.strip()
                if url and not url.startswith('#'):  # Skip empty lines and comments
                    urls.append(url)
        return urls
    except FileNotFoundError:
        print(f"Error: File '{filename}' not found.")
        return []
    except Exception as e:
        print(f"Error reading file: {e}")
        return []


def check_urls_batch(urls, timeout=10, max_workers=10):
    """
    Check multiple URLs concurrently.

    Args:
        urls (list): List of URLs to check
        timeout (int): Timeout per request in seconds
        max_workers (int): Maximum number of concurrent threads

    Returns:
        list: List of results for each URL
    """
    results = []

    with ThreadPoolExecutor(max_workers=max_workers) as executor:
        # Submit all tasks
        future_to_url = {executor.submit(check_url, url, timeout): url for url in urls}

        # Process completed tasks
        for i, future in enumerate(as_completed(future_to_url), 1):
            result = future.result()
            results.append(result)

            # Progress indicator
            print(f"Checked {i}/{len(urls)} URLs: {result['url']} - {result['status']}")

    return results


def main():
    # Configuration
    filename = input("Enter the filename containing URLs (or press Enter for 'urls.txt'): ").strip()
    if not filename:
        filename = 'urls.txt'

    timeout = input("Enter timeout in seconds (or press Enter for 10): ").strip()
    timeout = int(timeout) if timeout.isdigit() else 10

    print(f"\nReading URLs from '{filename}'...")
    urls = read_urls_from_file(filename)

    if not urls:
        print("No URLs found to check.")
        return

    print(f"Found {len(urls)} URLs to check.")
    print(f"Using timeout: {timeout} seconds")
    print("-" * 50)

    # Check all URLs
    results = check_urls_batch(urls, timeout=timeout)

    # Separate problematic URLs
    problematic_urls = [r for r in results if r['status'] != 'OK']
    working_urls = [r for r in results if r['status'] == 'OK']

    print("\n" + "=" * 50)
    print("SUMMARY")
    print("=" * 50)
    print(f"Total URLs checked: {len(results)}")
    print(f"Working URLs: {len(working_urls)}")
    print(f"Problematic URLs: {len(problematic_urls)}")

    if problematic_urls:
        print("\n" + "=" * 50)
        print("PROBLEMATIC URLs")
        print("=" * 50)

        # Group by error type
        timeout_urls = [r for r in problematic_urls if r['status'] == 'TIMEOUT']
        connection_error_urls = [r for r in problematic_urls if r['status'] == 'CONNECTION_ERROR']
        other_error_urls = [r for r in problematic_urls if r['status'] == 'ERROR']

        if timeout_urls:
            print(f"\nTIMEOUT ERRORS ({len(timeout_urls)}):")
            for result in timeout_urls:
                print(f" - {result['url']}")

        if connection_error_urls:
            print(f"\nCONNECTION ERRORS ({len(connection_error_urls)}):")
            for result in connection_error_urls:
                print(f" - {result['url']}")
                print(f"   Error: {result['error_type']}")

        if other_error_urls:
            print(f"\nOTHER ERRORS ({len(other_error_urls)}):")
            for result in other_error_urls:
                print(f" - {result['url']}")
                print(f"   Error: {result['error_type']}")

        # Save problematic URLs to file
        with open('problematic_urls.txt', 'w', encoding='utf-8') as f:
            f.write("# Problematic URLs found during check\n")
            f.write(f"# Checked on: {time.strftime('%Y-%m-%d %H:%M:%S')}\n\n")

            if timeout_urls:
                f.write("# TIMEOUT ERRORS\n")
                for result in timeout_urls:
                    f.write(f"{result['url']}\n")
                f.write("\n")

            if connection_error_urls:
                f.write("# CONNECTION ERRORS\n")
                for result in connection_error_urls:
                    f.write(f"{result['url']}\n")
                f.write("\n")

            if other_error_urls:
                f.write("# OTHER ERRORS\n")
                for result in other_error_urls:
                    f.write(f"{result['url']}\n")

        print(f"\nProblematic URLs saved to 'problematic_urls.txt'")

    if working_urls:
        print(f"\nWORKING URLs ({len(working_urls)}):")
        for result in working_urls:
            print(f" ✓ {result['url']} (Status: {result['status_code']}, Time: {result['response_time']}s)")


if __name__ == "__main__":
    print("URL Connection Checker")
    print("=" * 30)
    main()
How the Script Works
Our URL checker script is built around several key components that work together to provide comprehensive URL validation:
Core Functions Breakdown
| Function | Purpose | Key Features |
|---|---|---|
| check_url() | Validates individual URLs | Handles timeouts, connection errors, measures response time |
| read_urls_from_file() | Loads URLs from text files | Skips comments and empty lines, handles file errors |
| check_urls_batch() | Processes multiple URLs concurrently | Uses ThreadPoolExecutor, provides progress tracking |
| main() | Orchestrates the entire process | Interactive configuration, result categorization |
Error Detection Categories
The script categorizes different types of URL problems:
- OK: URL is accessible and returns a valid HTTP response
- TIMEOUT: URL takes longer than the specified timeout to respond
- CONNECTION_ERROR: Network-level issues (DNS resolution, connection refused)
- ERROR: Other HTTP-related errors (invalid URLs, server errors)
Running the Script
The beauty of using uv is that you can run this script immediately without any setup. Save the script as url_checker.py and follow these steps:
Prerequisites
No additional software installation is required! The script only uses the requests library, which uv will automatically install when you first run the script.
Basic Usage
1. Create a URL List File
First, create a text file with URLs to check. Save this as urls.txt:
# Website URLs to check
https://www.google.com
https://www.github.com
https://www.stackoverflow.com
https://nonexistent-website-12345.com
https://httpstat.us/500
https://httpstat.us/404
# Add more URLs here
bitdoze.com
example.com
2. Run the Script
# Basic execution
uv run url_checker.py
The script will prompt you for:
- Filename: Press Enter to use urls.txt or specify a different file
- Timeout: Press Enter for 10 seconds or specify a custom timeout
3. Example Output
URL Connection Checker
==============================
Enter the filename containing URLs (or press Enter for 'urls.txt'):
Enter timeout in seconds (or press Enter for 10):
Reading URLs from 'urls.txt'...
Found 8 URLs to check.
Using timeout: 10 seconds
--------------------------------------------------
Checked 1/8 URLs: https://www.google.com - OK
Checked 2/8 URLs: https://www.github.com - OK
Checked 3/8 URLs: https://www.stackoverflow.com - OK
Checked 4/8 URLs: https://nonexistent-website-12345.com - CONNECTION_ERROR
Checked 5/8 URLs: https://httpstat.us/500 - OK
Checked 6/8 URLs: https://httpstat.us/404 - OK
Checked 7/8 URLs: https://bitdoze.com - OK
Checked 8/8 URLs: https://example.com - OK
==================================================
SUMMARY
==================================================
Total URLs checked: 8
Working URLs: 7
Problematic URLs: 1
==================================================
PROBLEMATIC URLs
==================================================
CONNECTION ERRORS (1):
- https://nonexistent-website-12345.com
Error: Connection error: HTTPSConnectionPool(host='nonexistent-website-12345.com', port=443)...
Problematic URLs saved to 'problematic_urls.txt'
WORKING URLs (7):
✓ https://www.google.com (Status: 200, Time: 0.15s)
✓ https://www.github.com (Status: 200, Time: 0.23s)
✓ https://www.stackoverflow.com (Status: 200, Time: 0.18s)
✓ https://httpstat.us/500 (Status: 500, Time: 1.02s)
✓ https://httpstat.us/404 (Status: 404, Time: 0.98s)
✓ https://bitdoze.com (Status: 200, Time: 0.45s)
✓ https://example.com (Status: 200, Time: 0.32s)
Advanced Usage Examples
1. Custom Configuration
# Run with custom file and timeout
uv run url_checker.py
# When prompted:
# Filename: my_links.txt
# Timeout: 5
2. Checking Different Types of URLs
Create specialized URL lists for different purposes:
API Endpoints (api_endpoints.txt):
https://api.github.com
https://jsonplaceholder.typicode.com/posts/1
https://httpbin.org/get
https://api.openweathermap.org/data/2.5/weather
Social Media Links (social_links.txt):
https://twitter.com/username
https://linkedin.com/in/profile
https://facebook.com/page
https://instagram.com/account
Internal Website Links (internal_links.txt):
https://yourwebsite.com/about
https://yourwebsite.com/contact
https://yourwebsite.com/blog
https://yourwebsite.com/products
3. Performance Optimization
For large URL lists, you can modify the script to use more concurrent workers:
# In the check_urls_batch function call, increase max_workers
results = check_urls_batch(urls, timeout=timeout, max_workers=20)
Performance Guidelines:
| URL Count | Recommended Workers | Expected Time |
|---|---|---|
| 1-50 | 5-10 | 10-30 seconds |
| 51-200 | 10-15 | 30-60 seconds |
| 201-500 | 15-25 | 1-3 minutes |
| 500+ | 20-30 | 3+ minutes |
Understanding the Output
Status Codes and Their Meanings
| Status Code | Meaning | Action Required |
|---|---|---|
| 200 | OK - Page loads successfully | None |
| 301/302 | Redirect - Page moved | Update URL if permanent |
| 404 | Not Found - Page doesn’t exist | Remove or fix URL |
| 500 | Server Error - Website issue | Contact website owner |
| Timeout | No response within time limit | Check URL or increase timeout |
| Connection Error | Network/DNS issues | Verify URL spelling |

Note that because the script follows redirects (allow_redirects=True), a 301/302 is usually reported as the status code of the final destination page.
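As the sample output shows, the script reports any HTTP response as OK, including 404 and 500 pages, because the request itself succeeded. If you would rather flag those pages, here is a minimal sketch of how the success branch of check_url() could be adapted; the HTTP_ERROR status name is an assumption for illustration, not part of the original script:

```python
# Hypothetical variation of the success branch in check_url():
# flag 4xx/5xx responses instead of reporting them as OK.
status = 'OK' if response.status_code < 400 else 'HTTP_ERROR'
return {
    'url': url,
    'status': status,
    'status_code': response.status_code,
    'error_type': None if status == 'OK' else f'HTTP {response.status_code}',
    'response_time': round(response_time, 2)
}
```

Anything with a status other than OK already counts toward the problematic total in main(), though you would need to add a fourth grouping there if you want HTTP_ERROR URLs listed separately in the report.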
Generated Files
The script creates a problematic_urls.txt file containing:
# Problematic URLs found during check
# Checked on: 2025-01-15 14:30:25
# TIMEOUT ERRORS
https://slow-website.com
# CONNECTION ERRORS
https://nonexistent-site.com
https://typo-in-url.co
# OTHER ERRORS
https://broken-ssl-site.com
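Because read_urls_from_file() skips lines starting with #, this report is itself a valid input file: on a later run you can enter problematic_urls.txt at the filename prompt to re-test only the URLs that failed.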
Common Use Cases
1. Website Audit
Use the script to audit your website’s external links:
# Extract all external links from your website first
# Then check them with the script
uv run url_checker.py
2. SEO Link Validation
Validate backlinks and external references:
# backlinks.txt
https://partner-site1.com/link-to-us
https://directory.com/our-listing
https://blog.com/article-mentioning-us
3. API Endpoint Monitoring
Monitor API endpoints for availability:
# api_health.txt
https://api.yourservice.com/health
https://api.yourservice.com/status
https://api.yourservice.com/version
4. Competitor Analysis
Check competitor websites for availability:
# competitors.txt
https://competitor1.com
https://competitor2.com
https://competitor3.com
Troubleshooting Common Issues
Issue 1: “File not found” Error
Problem: Script can’t find the URL file
Solution:
# Make sure the file exists in the same directory
ls -la urls.txt
# Or use absolute path
/full/path/to/urls.txt
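If you prefer passing the file on the command line (handy for absolute paths), a small optional tweak to main() could look like the sketch below; it reuses the sys import already at the top of the script:

```python
# Optional tweak to main(): accept the filename as a command-line argument,
# e.g. `uv run url_checker.py /full/path/to/urls.txt`
if len(sys.argv) > 1:
    filename = sys.argv[1]
else:
    filename = input("Enter the filename containing URLs (or press Enter for 'urls.txt'): ").strip() or 'urls.txt'
```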
Issue 2: Too Many Timeouts
Problem: Many URLs showing timeout errors
Solutions:
- Increase timeout value (try 20-30 seconds)
- Check your internet connection
- Reduce concurrent workers to avoid overwhelming your network
Issue 3: SSL Certificate Errors
Problem: SSL-related connection errors
Solution: The script uses requests with default SSL verification. For testing purposes, you could modify the script to handle SSL issues, but this is not recommended for production use.
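If you only need to test URLs with broken or self-signed certificates, one possible change (again, for testing only) is to disable verification inside check_url() and silence the resulting warning:

```python
# Testing only, NOT recommended for production: skip certificate verification
import urllib3
urllib3.disable_warnings(urllib3.exceptions.InsecureRequestWarning)

response = requests.get(url, timeout=timeout, allow_redirects=True, verify=False)
```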
Script Customization Options
1. Add User-Agent Header
Some websites block requests without proper user agents:
# In the check_url function, modify the requests.get call:
headers = {
    'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36'
}
response = requests.get(url, timeout=timeout, allow_redirects=True, headers=headers)
2. Add Response Size Tracking
# Add to the return dictionary in check_url:
'content_length': len(response.content) if response else 0
3. Export Results to CSV
import csv

# Add this function to save results as CSV
def save_to_csv(results, filename='url_check_results.csv'):
    with open(filename, 'w', newline='', encoding='utf-8') as csvfile:
        fieldnames = ['url', 'status', 'status_code', 'response_time', 'error_type']
        writer = csv.DictWriter(csvfile, fieldnames=fieldnames)
        writer.writeheader()
        for result in results:
            writer.writerow(result)
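To actually write the CSV, call the helper once the batch check has finished, for example:

```python
# In main(), after results have been collected by check_urls_batch():
save_to_csv(results)  # writes url_check_results.csv in the current directory
```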
Best Practices
1. Respectful Checking
- Use reasonable timeouts: Don’t set timeouts too low (minimum 5 seconds)
- Limit concurrent requests: Don’t overwhelm servers with too many simultaneous requests
- Add delays for large batches: Consider adding small delays between batches
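The last point is easy to bolt on. Here is a rough sketch (not part of the original script) that splits the URL list into chunks and pauses briefly between them, reusing check_urls_batch() and the time module already imported:

```python
# Hypothetical batched variant: check URLs in chunks with a short pause between them
def check_urls_in_batches(urls, batch_size=50, pause_seconds=2, timeout=10, max_workers=10):
    results = []
    for start in range(0, len(urls), batch_size):
        batch = urls[start:start + batch_size]
        results.extend(check_urls_batch(batch, timeout=timeout, max_workers=max_workers))
        if start + batch_size < len(urls):
            time.sleep(pause_seconds)  # be kind to the servers (and your own network)
    return results
```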
2. File Organization
- Use descriptive filenames: social_media_links.txt, api_endpoints.txt
- Add comments: Use # to add context to your URL lists
- Regular updates: Keep your URL lists current
3. Monitoring and Automation
- Schedule regular checks: Use cron jobs or task schedulers
- Set up alerts: Monitor the problematic_urls.txt file
- Track trends: Keep historical data of URL health
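One caveat for scheduled runs: main() prompts for input, which blocks unattended execution. A small, hypothetical tweak is to fall back to defaults whenever stdin is not a terminal, so cron or a task scheduler can run the script as-is:

```python
# Hypothetical tweak for unattended runs (cron, schedulers): skip the prompts
# when no terminal is attached and fall back to sensible defaults.
if sys.stdin.isatty():
    filename = input("Enter the filename containing URLs (or press Enter for 'urls.txt'): ").strip() or 'urls.txt'
    timeout_input = input("Enter timeout in seconds (or press Enter for 10): ").strip()
    timeout = int(timeout_input) if timeout_input.isdigit() else 10
else:
    filename, timeout = 'urls.txt', 10
```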
Integration with Other Tools
1. Combine with Web Scraping
# Extract URLs from a webpage first
# Note: requires beautifulsoup4, so add it to the script's dependencies block
import requests
from bs4 import BeautifulSoup

def extract_links_from_page(url):
    response = requests.get(url)
    soup = BeautifulSoup(response.content, 'html.parser')
    links = [a.get('href') for a in soup.find_all('a', href=True)]
    return links
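To tie this back to the checker, you could resolve the extracted hrefs (which are often relative) against the page URL and feed them straight into check_urls_batch(). A rough sketch, where page_url is a placeholder for whatever site you are auditing:

```python
from urllib.parse import urljoin

# Rough glue code: resolve relative hrefs against the page URL, drop anchors,
# mailto: and javascript: links, deduplicate, then reuse check_urls_batch().
page_url = 'https://yourwebsite.com'  # placeholder
raw_links = extract_links_from_page(page_url)
absolute_links = sorted({
    urljoin(page_url, href)
    for href in raw_links
    if not href.startswith(('#', 'mailto:', 'javascript:'))
})
results = check_urls_batch(absolute_links, timeout=10)
```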
2. Integration with CI/CD
# GitHub Actions example
name: URL Health Check

on:
  schedule:
    - cron: "0 9 * * 1" # Every Monday at 9 AM

jobs:
  url-check:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v2
      - name: Install uv
        run: curl -LsSf https://astral.sh/uv/install.sh | sh
      - name: Check URLs
        run: uv run url_checker.py
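Two caveats for CI: the script prompts for input, so in a pipeline you would pipe defaults to it (for example `printf '\n\n' | uv run url_checker.py`), and it currently exits with code 0 even when broken links are found, so the job never fails. A small, hypothetical addition at the end of main() addresses the latter:

```python
# Hypothetical addition at the end of main(): fail the CI job when problems were found
if problematic_urls:
    sys.exit(1)
```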
What’s Next?
Now that you’ve mastered URL checking with uv, you might want to explore more automation scripts:
- Web Scraping: Build scrapers to extract URLs automatically
- API Monitoring: Create scripts to monitor API health and performance
- SEO Tools: Develop tools for SEO analysis and link building
- Website Monitoring: Set up comprehensive website health monitoring
For more advanced Python automation with uv, check out our other guides:
- Getting Started with uv: Setting Up Your Python Project in 2025
- Text-to-Speech with uv: Create Audio from Text in Python
Conclusion
The URL checker script demonstrates the power and simplicity of using uv for Python automation tasks. With just a few lines of code and zero configuration, you can validate hundreds of URLs, detect broken links, and generate comprehensive reports.
Key benefits of this approach:
- Zero Setup: No virtual environments or dependency management needed
- High Performance: Concurrent processing for fast results
- Comprehensive Reporting: Detailed error categorization and timing information
- Flexible Input: Support for file-based URL lists with comments
- Actionable Output: Problematic URLs saved for easy follow-up
Whether you’re maintaining a website, conducting SEO audits, or monitoring API endpoints, this script provides a solid foundation that you can customize and extend for your specific needs. The combination of uv’s simplicity and Python’s powerful libraries makes automation tasks like this both accessible and powerful.
Ready to start checking your URLs? Save the script, create your URL list, and experience the efficiency of automated link validation!