Senior Enterprise Platform Reliability Engineer
Infrastructure, ERP, Database & Security Operations
Position Summary
We are seeking an experienced Senior Enterprise Platform Reliability Engineer to lead the reliability, scalability, security, and operational continuity of enterprise infrastructure and ERP environments.
This position combines responsibilities across:
- Enterprise Infrastructure Architecture
- Site Reliability Engineering (SRE)
- Database Reliability Engineering (DBRE)
- ERP Platform Engineering
- Security Engineering & Compliance
- Operational Leadership
You will be responsible for maintaining and evolving mission-critical production systems that support multi-region ERP operations, PostgreSQL database environments, Linux infrastructure, virtualization platforms, security monitoring, and enterprise observability.
The ideal candidate is capable of operating independently within complex production environments, troubleshooting high-impact incidents, improving operational maturity, and driving long-term infrastructure reliability.
This is not a traditional DevOps role; it carries ownership across infrastructure, databases, ERP platforms, and security operations.
Key Responsibilities
Enterprise Infrastructure Architecture
Design, maintain, and improve enterprise-grade infrastructure environments, including:
- Multi-region Linux infrastructure environments
- High-availability PostgreSQL clusters using Patroni, etcd, and Keepalived
- Inter-datacenter networking and secure VPN architecture using WireGuard
- Proxmox virtualization infrastructure and workload management
- OPNsense firewall, routing, reverse proxy, and edge security architecture
- Enterprise storage, backup, and disaster recovery systems following 3-2-1 backup strategies
- Infrastructure redundancy and failover planning
- Production workload scaling and operational continuity planning
ERP Platform Engineering
Manage and support enterprise Odoo ERP environments, including:
- Odoo v9, v16, and v18 production environments
- Multi-region ERP deployments and infrastructure coordination
- Custom module integration support
- Worker tuning, memory analysis, and platform scaling
- High-availability ERP failover environments
- Production recovery and restoration workflows
- Neutralized production restores for development environments
- Release troubleshooting and production issue resolution
- ERP operational performance optimization
Database Reliability Engineering (DBRE)
Maintain the stability, performance, recoverability, and availability of PostgreSQL environments that support mission-critical business systems.
Responsibilities include:
- PostgreSQL performance tuning and workload optimization
- SQL execution plan analysis and query troubleshooting
- Locking, contention, and replication analysis
- Autovacuum and database maintenance strategy management
- Cache-hit ratio and buffer performance analysis
- High-availability PostgreSQL architecture and failover management
- Backup validation and recovery testing
- Disaster recovery validation and restoration procedures
- Monitoring long-running queries and operational bottlenecks
- Database observability using Prometheus, Grafana, exporters, and log aggregation systems
- Database operational support during releases, migrations, and upgrades
Site Reliability Engineering (SRE)
Ensure the reliability, availability, stability, and operational continuity of enterprise production systems.
Responsibilities include:
- Maintaining high system uptime across multi-region environments
- Designing and managing enterprise observability platforms
- Centralized monitoring, metrics collection, logging, and alerting
- Proactive alerting strategy development
- Production incident troubleshooting and operational response
- Root-cause analysis and operational recovery coordination
- High-availability infrastructure design and failover validation
- Backup validation and disaster recovery readiness testing
- Infrastructure and platform health monitoring
- Operational documentation and reliability process development
- Cross-functional collaboration between infrastructure, development, database, and operational teams
Security Engineering & Compliance
Design and maintain enterprise security architecture and operational controls across infrastructure, databases, networking, and ERP systems.
Responsibilities include:
- SIEM architecture and centralized security monitoring
- Wazuh and Security Onion deployment and management
- IDS/IPS implementation and network security monitoring
- Firewall segmentation and network security policy design
- Secure VPN architecture and encrypted inter-site connectivity
- Security observability and enterprise logging
- Vulnerability identification and operational risk analysis
- Security incident investigation and forensic support
- Business Continuity Planning (BCP) and Disaster Recovery (DR) strategy development
- ISO 27001-aligned operational security practices
- PIPEDA-aware operational controls and data protection processes
- Security documentation, audit readiness, and compliance support
Operational Leadership
Provide operational leadership and technical governance across infrastructure and production operations.
Responsibilities include:
- Developing and maintaining Standard Operating Procedures (SOPs)
- Defining operational standards and governance processes
- Coordinating production incident escalation and response
- Supporting deployment governance and change management practices
- Evaluating operational risks associated with infrastructure and software changes
- Supporting development teams with deployment and infrastructure troubleshooting
- Creating operational workflows and recovery procedures
- Improving infrastructure maturity, standardization, and reliability practices
- Supporting management with infrastructure planning and operational readiness initiatives
- Driving long-term infrastructure sustainability and operational resilience
Required Qualifications
- 7+ years of experience managing enterprise Linux infrastructure
- Advanced PostgreSQL administration and performance tuning experience
- Strong understanding of high-availability architecture and failover systems
- Experience managing enterprise virtualization platforms such as Proxmox
- Experience with observability platforms including Grafana, Prometheus, Loki, and exporters
- Strong networking knowledge including VPNs, routing, firewalls, and reverse proxies
- Experience supporting production ERP environments
- Strong incident response and troubleshooting abilities
- Experience designing backup and disaster recovery strategies
- Strong understanding of operational security and infrastructure hardening
- Ability to independently manage production-critical systems
- Strong documentation and operational process development skills
Preferred Qualifications
- Experience supporting Odoo ERP environments
- Experience with Patroni, etcd, and PostgreSQL HA clustering
- Experience with Wazuh, Security Onion, or SIEM platforms
- Familiarity with ISO 27001 operational practices
- Experience managing multi-region infrastructure deployments
- Experience working within hybrid cloud and on-premises environments
- Experience leading operational improvement initiatives
What Success Looks Like
Successful candidates will:
- Operate comfortably within complex enterprise production environments
- Take ownership of infrastructure reliability and operational continuity
- Improve platform stability and observability over time
- Reduce operational risk through automation, documentation, and standardization
- Troubleshoot high-impact production issues efficiently and methodically
- Balance performance, reliability, scalability, and security considerations
- Communicate clearly with technical and non-technical stakeholders
Environment & Technology Stack
Infrastructure
- Ubuntu Linux
- Proxmox
- OPNsense
- WireGuard
- Enterprise storage and backup systems
Databases
- PostgreSQL
- Patroni
- etcd
- HA clustering and replication environments
ERP / Application Platforms
- Odoo ERP
- Custom module environments
- Multi-region application deployments
Monitoring & Observability
- Grafana
- Prometheus
- Loki
- Exporters
- SIEM platforms
Security
- Wazuh
- Security Onion
- IDS/IPS systems
- Reverse proxy architecture
Compensation & Benefits
Compensation will be competitive and aligned with experience, technical capability, and operational leadership ability.
Additional benefits may include:
- Extended health and dental coverage
- Paid vacation
- Professional development support
- Flexible work arrangements
- Access to enterprise-grade infrastructure environments
- Long-term career growth opportunities
Pay: $80,000.00-$90,000.00 per year
Benefits:
- Company events
- Dental care
- Employee assistance program
- Extended health care
- Flexible schedule
- Life insurance
- On-site parking
Work Location: In person