
RPA Pension Payment Automation System

Insurance Sector

RPA, Python, Selenium, Web Scraping, SharePoint, PDF Processing, Automation

Designed and implemented a large-scale Robotic Process Automation (RPA) system for automated pension payment processing in the insurance sector. Built with Python and Selenium, the system processes 500,000+ payment documents monthly, performing automated web scraping with intelligent batch splitting, PDF text extraction, summarization, and SharePoint integration. The automation saves 50,000 hours annually and significantly reduces regulatory complaints from Chile's Superintendency.

500K+

PDFs Processed Monthly

50K

Hours Saved Yearly

100%

Automated Workflow

Zero

IP Blocking Events

Business Impact

50,000 hours saved annually - Massive reduction in manual processing time
Reduced Superintendency complaints - Improved compliance and customer satisfaction
No blocking-related downtime - Robust anti-blocking measures ensure uninterrupted operation
Enterprise-grade integration - Seamless SharePoint integration for document management
Scalable architecture - Handles peak loads without performance degradation

System Architecture

flowchart TD
    A[Monthly Trigger<br/>Scheduled Job] --> B[Download Lists<br/>Nominas Paid Last Month]
    B --> C[Intelligent Batch Splitting<br/>Anti-Blocking Algorithm]
    C --> D[Web Scraping with Selenium<br/>Batch 1-N]
    D --> E[Download PDFs<br/>Payment Documents]
    E --> F[Queue Management<br/>Rate Limiting]
    F --> G[PDF Processing Pipeline]
    G --> H[Text Extraction]
    H --> I[Data Extraction<br/>Structured Information]
    I --> J[Summary Generation<br/>Automated Reports]
    J --> K[SharePoint Integration<br/>Document Upload]
    K --> L[Completion Notification<br/>Status Logging]
    F --> M{Batch Complete?}
    M -->|No| D
    M -->|Yes| G

Workflow Process

Monthly Trigger: Automated job initiation downloads previous month's payment lists
Intelligent Batching: The system divides large datasets into optimal batches using an anti-blocking algorithm
Web Scraping & Downloads: Selenium retrieves payment documents with sophisticated rate limiting to prevent blocking
PDF Processing: Extracts text and structures payment information from 500,000+ documents monthly
Summarization: Generates automated reports with totals, trends, and regulatory compliance data
SharePoint Integration: Uploads organized documents with metadata tagging for enterprise searchability
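
The first step of the workflow depends on knowing which month counts as "previous" at the moment the job fires. A minimal sketch of that calculation follows; the function name `previous_month` is an illustrative assumption and is not taken from the project code.

```python
from datetime import date, timedelta
from typing import Tuple


def previous_month(today: date) -> Tuple[date, date]:
    """Return the first and last day of the month before `today`."""
    first_of_this_month = today.replace(day=1)
    # Stepping back one day from the 1st always lands on the last day
    # of the previous month, including across a year boundary.
    last_of_previous = first_of_this_month - timedelta(days=1)
    return last_of_previous.replace(day=1), last_of_previous


if __name__ == "__main__":
    start, end = previous_month(date.today())
    print(f"Processing payment lists for {start:%Y-%m} ({start} to {end})")
```

Deriving the period from the run date, rather than hard-coding it, keeps the monthly job correct when it fires late or is re-run by hand.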

Key Technical Challenges

Large-Scale Web Scraping Without Blocking

Processing 500,000+ documents risks IP blocking. Solution: Intelligent batch splitting with dynamic sizing, exponential backoff, session rotation, and adaptive speed monitoring prevents detection while maintaining high throughput.
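
Two of the ideas above, dynamic batch sizing and exponential backoff, can be sketched in a few lines. This is an illustration under stated assumptions, not the production algorithm: the class `AdaptiveBatcher`, the function `backoff_delay`, and every numeric default (batch sizes, growth factor, delay cap) are hypothetical.

```python
import random
from typing import List, Sequence


def backoff_delay(attempt: int, base: float = 2.0, cap: float = 300.0) -> float:
    """Exponential backoff with full jitter.

    Returns a random delay in [0, min(cap, base * 2**attempt)] seconds.
    """
    return random.uniform(0.0, min(cap, base * (2 ** attempt)))


class AdaptiveBatcher:
    """Hands out batches whose size shrinks after throttling and grows after success."""

    def __init__(self, items: Sequence[str], size: int = 200,
                 min_size: int = 25, max_size: int = 500) -> None:
        self.items = items
        self.size = size
        self.min_size = min_size
        self.max_size = max_size
        self.position = 0

    def has_next(self) -> bool:
        return self.position < len(self.items)

    def next_batch(self) -> List[str]:
        """Return the next slice of items using the current batch size."""
        batch = list(self.items[self.position:self.position + self.size])
        self.position += len(batch)
        return batch

    def report(self, throttled: bool) -> None:
        """Halve the batch size when throttled, otherwise grow it by 25%."""
        if throttled:
            self.size = max(self.min_size, self.size // 2)
        else:
            self.size = min(self.max_size, int(self.size * 1.25))
```

In a real run, the scraper would call `report(throttled=True)` when the site starts returning errors or slowing down, sleep for `backoff_delay(attempt)`, and re-queue the affected documents before requesting the next batch.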

PDF Processing at Scale

The system handles half a million PDFs monthly, of varying quality. Solution: A multi-format parser with OCR fallback, combined with parallel processing, automatic retry logic, and quality validation checkpoints, handles corrupted, scanned, and text-based documents.
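
The fallback idea can be sketched as an ordered chain of extractors with a quality check between them. The sketch below is library-agnostic on purpose: each extractor is any callable that takes PDF bytes and returns text, so a pdfplumber-based parser and a Tesseract-based OCR step could be plugged in. The names `extract_with_fallback`, `looks_valid`, and the 40-character threshold are assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import Callable, Optional, Sequence, Tuple

Extractor = Callable[[bytes], str]


@dataclass
class ExtractionResult:
    text: str
    method: Optional[str]  # name of the extractor that succeeded, or None
    ok: bool


def looks_valid(text: str, min_chars: int = 40) -> bool:
    """Hypothetical quality checkpoint: enough non-whitespace characters recovered."""
    return len("".join(text.split())) >= min_chars


def extract_with_fallback(pdf_bytes: bytes,
                          extractors: Sequence[Tuple[str, Extractor]],
                          min_chars: int = 40) -> ExtractionResult:
    """Try each extractor in order and accept the first output that passes the check."""
    for name, extractor in extractors:
        try:
            text = extractor(pdf_bytes)
        except Exception:
            # A corrupted file may crash one parser but not the next one.
            continue
        if looks_valid(text, min_chars):
            return ExtractionResult(text=text, method=name, ok=True)
    # Nothing usable was recovered; the caller can route the file to manual review.
    return ExtractionResult(text="", method=None, ok=False)
```

Ordering the chain from cheapest to most expensive means OCR only runs on the documents that need it, which matters at this volume.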

Enterprise SharePoint Integration

Uploading 500,000+ documents requires robust organization. Solution: Dynamic folder creation by payment period/Convenio/nomina with metadata tagging ensures searchable, well-organized document management.
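
A small sketch of how the folder hierarchy and metadata could be derived before upload. It deliberately stops short of calling any SharePoint API. The root path, the metadata column names (`PaymentPeriod`, `Convenio`, `Nomina`), and the conservative set of replaced characters are all assumptions for illustration.

```python
import re
from datetime import date
from typing import Dict


def safe_segment(name: str) -> str:
    """Replace characters that are risky in folder names (a conservative, assumed set)."""
    cleaned = re.sub(r'["*:<>?/\\|#%]', "_", name).strip(" .")
    return cleaned or "unknown"


def target_folder(root: str, period: date, convenio: str, nomina: str) -> str:
    """Build <root>/<YYYY-MM>/<convenio>/<nomina> as a folder path."""
    parts = [root.rstrip("/"), period.strftime("%Y-%m"),
             safe_segment(convenio), safe_segment(nomina)]
    return "/".join(parts)


def document_metadata(period: date, convenio: str, nomina: str) -> Dict[str, str]:
    """Metadata to attach to the uploaded file; column names are hypothetical."""
    return {
        "PaymentPeriod": period.strftime("%Y-%m"),
        "Convenio": convenio,
        "Nomina": nomina,
    }
```

Keeping the original convenio and nomina values in metadata, while sanitizing only the folder names, preserves searchability even when a name contains characters a path cannot hold.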

Reliability Achievements

Zero blocking events: System has operated without IP blocking or account suspension since deployment
99.9% uptime: Reliable monthly processing with self-healing from transient issues
Graceful degradation: Continues processing even when services experience temporary problems

Data Extraction & Summarization

The system extracts actionable information and generates summaries that provide business intelligence and support regulatory compliance with Chile's Superintendency.

Structured Data Extraction: Key fields (amounts, dates, payment types, customer IDs) automatically extracted from PDFs
Validation & Quality Control: Business logic validation ensures extracted data meets quality standards
Automated Summaries: Monthly reports with totals, averages, trends, and regulatory compliance data
Dashboard Integration: Extracted data is exported to analytics dashboards for real-time monitoring and decision support
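
The extraction, validation, and summary steps listed above might look like the following sketch. The label layout (`RUT:`, `Monto:`, `Fecha:`, `Tipo:`), the date format, and the use of "." as a thousands separator are assumptions about the documents; real payment PDFs would need their own patterns and business rules.

```python
import re
from dataclasses import dataclass
from datetime import date, datetime
from decimal import Decimal
from typing import List, Optional

# Hypothetical label layout, one regex per field.
PATTERNS = {
    "customer_id": re.compile(r"RUT:\s*([\d.]+-[\dkK])"),
    "amount": re.compile(r"Monto:\s*\$?\s*(\d[\d.]*)"),
    "paid_on": re.compile(r"Fecha:\s*(\d{2}/\d{2}/\d{4})"),
    "payment_type": re.compile(r"Tipo:\s*(.+)"),
}


@dataclass
class Payment:
    customer_id: str
    amount: Decimal
    paid_on: date
    payment_type: str


def parse_payment(text: str) -> Optional[Payment]:
    """Return a Payment, or None when a field is missing or fails validation."""
    found = {}
    for field, pattern in PATTERNS.items():
        match = pattern.search(text)
        if match is None:
            return None
        found[field] = match.group(1).strip()
    try:
        paid_on = datetime.strptime(found["paid_on"], "%d/%m/%Y").date()
    except ValueError:
        return None  # e.g. an impossible calendar date produced by bad OCR
    amount = Decimal(found["amount"].replace(".", ""))  # "." as thousands separator
    if amount <= 0:
        return None
    return Payment(found["customer_id"], amount, paid_on, found["payment_type"])


def summarize(payments: List[Payment]) -> dict:
    """Compute the count, total, and average amount for a monthly report."""
    total = sum((p.amount for p in payments), Decimal(0))
    average = total / len(payments) if payments else Decimal(0)
    return {"count": len(payments), "total": total, "average": average}
```

Using `Decimal` rather than `float` avoids rounding drift when hundreds of thousands of amounts are summed.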

Key Learnings

Building this large-scale RPA system provided valuable insights into enterprise automation:

Scale Changes Everything: Solutions working for 1,000 documents fail at 500,000. Architecture must be designed for scale from day one with parallel processing, efficient memory management, and robust error handling.
Anti-Blocking is Critical: At enterprise scale, sophisticated detection mechanisms require more than simple rate limiting. Behavioral simulation, adaptive algorithms, and continuous monitoring are essential to maintain high throughput.
PDF Variability Demands Robustness: Handling text-based PDFs, scanned images, corrupted documents, and mixed layouts requires multi-format parsing with OCR fallback. Quality validation checkpoints are non-negotiable.
Error Recovery Strategy: At this volume, errors are inevitable. The system must distinguish between transient (retryable), fatal (manual intervention), and expected (ignore) errors with comprehensive logging and alerting.
Automation Isn't "Set It and Forget It": The 50,000-hour annual savings justified development investment, but ongoing maintenance costs must be factored into total ROI. Monitoring, updates, and occasional refactoring are essential.
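
The three-way error split described under Error Recovery Strategy can be sketched as a classifier plus a small retry wrapper. The exception `DocumentNotIssued`, the mapping of exception types to categories, and the retry limits are hypothetical; a real system would classify the errors its own portal and libraries raise.

```python
import logging
import time
from enum import Enum
from typing import Callable, Optional, TypeVar

T = TypeVar("T")
log = logging.getLogger("rpa.errors")


class ErrorKind(Enum):
    TRANSIENT = "transient"  # retry
    FATAL = "fatal"          # stop and alert a human
    EXPECTED = "expected"    # log and move on


class DocumentNotIssued(Exception):
    """Hypothetical 'expected' case: no document exists for this payment."""


def classify(exc: Exception) -> ErrorKind:
    if isinstance(exc, DocumentNotIssued):
        return ErrorKind.EXPECTED
    if isinstance(exc, (TimeoutError, ConnectionError)):
        return ErrorKind.TRANSIENT
    return ErrorKind.FATAL


def run_with_policy(task: Callable[[], T], max_attempts: int = 3,
                    sleep: Callable[[float], None] = time.sleep) -> Optional[T]:
    """Run task; retry transient errors, skip expected ones, re-raise fatal ones."""
    for attempt in range(1, max_attempts + 1):
        try:
            return task()
        except Exception as exc:
            kind = classify(exc)
            if kind is ErrorKind.EXPECTED:
                log.info("skipping expected condition: %r", exc)
                return None
            if kind is ErrorKind.FATAL or attempt == max_attempts:
                log.error("giving up after attempt %d: %r", attempt, exc)
                raise
            log.warning("transient error on attempt %d: %r", attempt, exc)
            sleep(2 ** attempt)  # simple exponential wait between retries
    return None
```

Passing `sleep` as a parameter keeps the wrapper testable without real delays.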

Technology Stack

Component	Technology	Purpose
Programming Language	Python 3.x	Core automation logic
Web Automation	Selenium WebDriver	Browser automation and web scraping
PDF Processing	PyPDF2, pdfplumber	Text extraction from PDFs
OCR	Tesseract	Image-based PDF text extraction
SharePoint Integration	Office365-REST-Python-Client	Enterprise document management
Authentication	OAuth 2.0 / MSAL	Secure access to SharePoint APIs
Job Scheduling	Windows batch script (.bat)	Monthly automated execution