Enterprise ETL Modernization & IoT
Business Impact
Reduced sync latency by 90% and stabilized ingestion across 64 biometric devices.
The Context
Before modernization, every department (HR, Finance, Academic) maintained its own individual PHP scripts to sync student and attendance data.
These scripts:
- Ran separately with no shared logic
- Updated records row-by-row
- Used inefficient patterns like
SELECT ... LIMIT 1 - Frequently timed out
- Produced inconsistent results across systems
Biometric attendance from 64 machines also lacked monitoring. When a device disconnected, teams only discovered it during payroll crunch time—creating last-minute panic and manual corrections.
The system was slow, fragile, and difficult to scale.
The Solution
I rebuilt the entire flow as a centralized Python ETL pipeline, replacing all department-specific scripts and standardizing how data moves across the organization.
1. Centralized Data Pipeline
All PHP scripts were consolidated into a single ETL service:
- Unified extraction rules
- Consistent transformation logic
- Centralized MySQL schema for student + attendance data
- Proper logging and observability
This eliminated conflicting updates and created a single source of truth.
2. IoT Ingestion for 64 Biometric Devices
Using pyzk, I redesigned the biometric ingestion engine:
- Concurrent polling instead of sequential
- Retry + backoff for unstable machines
- Per-device logging for daily visibility
- Automatic connection pooling
Result: Sync time reduced from hours to minutes.
3. Fixing the "N+1 Update" Pattern
The old system updated one row at a time.
I replaced this with a batch-oriented ETL using Python + Pandas:
- Extract: Pull raw logs from devices
- Transform: Clean, deduplicate, and apply payroll rules in memory
- Load: Bulk insert using
executemanywith transaction safety
This removed the N+1 bottleneck and improved stability.
4. Twice-Daily Device Health Notifications
To eliminate last-minute payroll panic, I implemented automated health reports:
- Sent every morning and afternoon
- Show which devices are online, slow, or disconnected
- Notifies Support, Network, and Payroll teams
- Issues get fixed before payroll day instead of during crisis mode
This solved one of the organization’s biggest pain points.
The Result
The organization shifted from manual firefighting to a reliable, automated, and monitored system.
- Consistent data across all departments
- Predictable ETL runs via Systemd timers
- Visibility into machine health
- Dramatically faster sync (90%+ faster)
- No more “rush hour panic” during payroll cycles
While the ETL is significantly faster now, the new architecture provides a strong foundation for future optimization and scaling.