2025Case Studies

Enterprise ETL Modernization & IoT

Business Impact

Reduced sync latency by 90% and stabilized ingestion across 64 biometric devices.

PythonPandasPyZK (IoT)MySQLSystemd

The Context

Before modernization, every department (HR, Finance, Academic) maintained its own individual PHP scripts to sync student and attendance data.

These scripts:

  • Ran separately with no shared logic
  • Updated records row-by-row
  • Used inefficient patterns like SELECT ... LIMIT 1
  • Frequently timed out
  • Produced inconsistent results across systems

Biometric attendance from 64 machines also lacked monitoring. When a device disconnected, teams only discovered it during payroll crunch time—creating last-minute panic and manual corrections.

The system was slow, fragile, and difficult to scale.


The Solution

I rebuilt the entire flow as a centralized Python ETL pipeline, replacing all department-specific scripts and standardizing how data moves across the organization.

1. Centralized Data Pipeline

All PHP scripts were consolidated into a single ETL service:

  • Unified extraction rules
  • Consistent transformation logic
  • Centralized MySQL schema for student + attendance data
  • Proper logging and observability

This eliminated conflicting updates and created a single source of truth.

2. IoT Ingestion for 64 Biometric Devices

Using pyzk, I redesigned the biometric ingestion engine:

  • Concurrent polling instead of sequential
  • Retry + backoff for unstable machines
  • Per-device logging for daily visibility
  • Automatic connection pooling

Result: Sync time reduced from hours to minutes.

3. Fixing the "N+1 Update" Pattern

The old system updated one row at a time.
I replaced this with a batch-oriented ETL using Python + Pandas:

  • Extract: Pull raw logs from devices
  • Transform: Clean, deduplicate, and apply payroll rules in memory
  • Load: Bulk insert using executemany with transaction safety

This removed the N+1 bottleneck and improved stability.

4. Twice-Daily Device Health Notifications

To eliminate last-minute payroll panic, I implemented automated health reports:

  • Sent every morning and afternoon
  • Show which devices are online, slow, or disconnected
  • Notifies Support, Network, and Payroll teams
  • Issues get fixed before payroll day instead of during crisis mode

This solved one of the organization’s biggest pain points.


The Result

The organization shifted from manual firefighting to a reliable, automated, and monitored system.

  • Consistent data across all departments
  • Predictable ETL runs via Systemd timers
  • Visibility into machine health
  • Dramatically faster sync (90%+ faster)
  • No more “rush hour panic” during payroll cycles

While the ETL is significantly faster now, the new architecture provides a strong foundation for future optimization and scaling.