01 Senior DevOps / Platform Engineer · 11+ yrs production

Loay Ali

I build, harden, and operate production cloud platforms the way they're supposed to work: quiet, observable, recoverable, and cheap enough to defend in a FinOps review.

Based
Munich, Germany · CET
Working with
Linux · AWS · OCI · Kubernetes · Terraform
Open to work
11+
Years operating production infrastructure
2M+
Domains under management at a country-code TLD registry
500K+
Users on a VPN platform I built solo from zero
99.9%
Sustained uptime SLA across multi-year tenures
02 — about

A working biography, briefly.

// Eleven years of being the engineer responsible when the pager goes off, the certificate expires, the cluster autoscaler misbehaves, or the CFO asks why the AWS bill spiked.

Senior infrastructure engineer with a craftsman's bias toward simple, observable, recoverable systems.

I'm Loay Ali, a Senior DevOps / Platform Engineer based in Munich. I've spent the last eleven years operating production Linux at every scale that matters: from a regulated ccTLD registry to a global Anycast DNS network across 60+ POPs, from enterprise email on AWS EKS to a VPN platform I built from zero to 500K+ users as the sole engineer.

My background is unusual in one specific way. I sit comfortably between the classic stack (BIND, BGP, iptables, FreeBSD, VMware vSphere, hardened bare metal) and the modern stack (Kubernetes on EKS/OKE, Terraform, Helm, GitOps, ELK, Prometheus). That bridge matters when you're migrating an old platform to a new cloud without dropping a query.

I work like a craftsman, not a checklist. I write infrastructure-as-code by default, run quarterly FinOps reviews that have delivered 20-25% cost reductions, document everything so the next person can sleep, and treat security as a feature, not an afterthought. I've held the pager 24/7 as the only engineer for an ISP serving roughly 35% of a country's private internet customers. That experience teaches you to design things that don't break.

Currently open to senior DevOps, Platform, SRE, and Cloud Architect roles — remote, hybrid, or on-site. Munich-based with full EU work authorization.

Strongest at
Operating things that must not fail
Regulated environments, sole-engineer ownership, 24/7 on-call without drama.
Building toward
Platform Engineering & MLOps
Self-service platforms, golden paths, GenAI workloads on Kubernetes.
Languages spoken
English · Arabic · German
Fluent · Native · A1 (actively learning)
Operates from
Munich, Germany
CET timezone · full EU work authorization · open to relocation
03 — how I work

Six principles, learned the hard way.

// Each one of these cost me at least one outage to learn. They're cheaper to inherit than to discover.

01 / DESIGN

Boring infrastructure ships on Fridays.

If a system is exciting, something's wrong. Predictable beats clever. Recoverable beats fast. Documented beats novel. The best platform is the one nobody talks about because it just works.

02 / SECURITY

Hardening is day-one, not month-six.

Least-privilege IAM, centralized secrets (Vault, KMS, SSM), TLS/PKI automation, hardened images, and audit-readiness all live in the original Terraform module — not in a panicked retro after the first compliance review.

03 / OBSERVABILITY

If you can't see it, you can't operate it.

Prometheus, Grafana, ELK, kube-prometheus-stack, structured logging, SLO-driven alerting. Dashboards before deployment. Alerts that mean something. Runbooks linked from every alert.
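
As a rough illustration of what SLO-driven alerting budgets against, the sketch below converts an availability target into allowed downtime. The function name and the 30-day default window are assumptions made for this example, not taken from any platform described on this page.

```python
def error_budget_minutes(slo: float, window_days: int = 30) -> float:
    """Return the downtime, in minutes, that an availability SLO allows per window."""
    if not 0.0 < slo < 1.0:
        raise ValueError("slo must be a fraction between 0 and 1, e.g. 0.999")
    total_minutes = window_days * 24 * 60
    return total_minutes * (1.0 - slo)


if __name__ == "__main__":
    # A 99.9% target over 30 days leaves about 43 minutes of budget.
    print(round(error_budget_minutes(0.999), 1))
```

An alert that fires when the budget is burning faster than the window allows tends to mean more than one that fires on a single failed probe.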

04 / COST

A platform defensible in a FinOps review.

Right-sized nodes. Reserved instances where stable. Spot where tolerable. Quarterly reviews with measurable savings (20-25% delivered). Cost is an architectural concern, not someone else's problem.

05 / OPERATIONS

ITIL without the bureaucracy.

Structured incident response. Post-mortems that produce action items, not blame. Change windows planned with stakeholders. BCP/DR drills that are actually rehearsed. The boring parts of ops, treated with respect.

06 / HANDOFF

If only I can run it, I haven't finished.

Architecture docs, deployment runbooks, scaling guides, DR procedures, on-call rotations. The job ends when another engineer can operate the platform independently. Not before.

04 — selected work

A handful of platforms that didn't fail.

// Build · Operate · Hand off. Each entry below was owned end-to-end, from architecture through production go-live and team enablement.

CASE / 01
Recent
Dec 2025 — Apr 2026 Sole infrastructure engineer
Reconhece · Brazil (Remote)

[01] Reconhece, a complete platform built from zero on OCI.

Designed, built, and launched a complete cloud platform from scratch for a SaaS startup running NestJS microservices and a NextJS frontend. Owned every decision from VCN topology through go-live, then handed it off to the in-house engineering team to operate independently.

100%
Greenfield IaC
04
CI/CD pipelines
0
Manual deploys
25%+
Cost savings at launch
  • Cloud foundation — modular Terraform: VCN, public/private subnets, NAT/IGW/Service gateways, NSGs, Bastion, managed PostgreSQL, Redis Cache, OCI Vault (KMS/AES-256).
  • Production Kubernetes (OKE) with Cluster Autoscaler + HPA, RBAC, resource quotas, topology spread, pod anti-affinity.
  • CI/CD on Azure Pipelines — four multi-stage flows: build, registry push, rolling deploy, health checks, auto-rollback, blue-green strategy.
  • TLS automation with Nginx Ingress + Cert-Manager + Let's Encrypt across all endpoints.
  • Observability — kube-prometheus-stack + ELK across all microservices; SLA alerting rules.
  • Handoff — runbooks, scaling guides, DR procedures so the in-house team owns it now.
Terraform · OCI / OKE · Kubernetes · Helm · Azure Pipelines · Nginx Ingress · Cert-Manager · Prometheus · Grafana · ELK · PostgreSQL · Redis · RabbitMQ · Vault / KMS
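
For readers less familiar with the HPA mentioned above, here is a simplified sketch of the scaling rule documented for the Kubernetes Horizontal Pod Autoscaler: desired replicas equal the current count scaled by the ratio of observed to target metric, rounded up and clamped. The function name and default bounds are assumptions for illustration, and the sketch leaves out the tolerance band and stabilization windows the real controller applies.

```python
import math


def desired_replicas(current_replicas: int, current_metric: float,
                     target_metric: float, min_replicas: int = 1,
                     max_replicas: int = 10) -> int:
    """Simplified HPA rule: ceil(current * observed / target), clamped to bounds."""
    if target_metric <= 0:
        raise ValueError("target_metric must be positive")
    raw = math.ceil(current_replicas * current_metric / target_metric)
    return max(min_replicas, min(max_replicas, raw))


if __name__ == "__main__":
    # Four pods at 90% average CPU against a 60% target scale out to six.
    print(desired_replicas(4, 90, 60))
```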
CASE / 02
Enterprise email
2024 — 2025 Infrastructure lead
Dynamic IT · Dubai

[02] Axigen on AWS EKS, enterprise email at production scale.

Email-on-Kubernetes is rare. Doing it under enterprise hardening discipline is rarer. Built a production EKS cluster for the Axigen enterprise email platform from scratch with Terraform, with three dedicated node groups, NLB load balancers, and full email-authentication and observability baked in.

03
Dedicated node groups
SMTP + WebMail
HPA-scaled frontends
SSM
Secrets management
IRSA
Identity hardening
  • VPC + node groups — system / frontend / backend with taints, tolerations, and topology-spread.
  • Storage + identity — EBS CSI persistent storage, IRSA for least-privilege IAM, Cert-Manager wildcard TLS.
  • Email authentication — DKIM, SPF, DMARC via Route53, validated end-to-end.
  • Observability — Prometheus / Grafana with mail-flow dashboards, SLO-driven alerting.
  • Hardening — SSM-backed secrets, audit-ready posture, regulated-environment runbooks.
AWS EKS · Terraform · VPC · IRSA · EBS CSI · Cert-Manager · NLB · Route53 · DKIM / SPF / DMARC · SSM · Prometheus · Grafana
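
A minimal sketch of the kind of sanity check that can run against SPF and DMARC TXT values before they are published in DNS. The helper names and example records are hypothetical, and the checks are deliberately simplified: real validation also covers DKIM selectors, SPF lookup limits, and tag ordering.

```python
def check_spf(txt: str) -> bool:
    """Simplified check: record starts with v=spf1 and ends with an 'all' mechanism."""
    terms = txt.split()
    return bool(terms) and terms[0] == "v=spf1" and terms[-1].lstrip("+-~?") == "all"


def parse_dmarc(txt: str) -> dict:
    """Split a DMARC record into its tag=value pairs."""
    tags = {}
    for part in txt.split(";"):
        if "=" in part:
            key, value = part.split("=", 1)
            tags[key.strip()] = value.strip()
    return tags


def check_dmarc(txt: str) -> bool:
    """Simplified check: version tag is DMARC1 and the policy is a known value."""
    tags = parse_dmarc(txt)
    return tags.get("v") == "DMARC1" and tags.get("p") in {"none", "quarantine", "reject"}


if __name__ == "__main__":
    # Example values only; example.com and 192.0.2.10 are documentation placeholders.
    print(check_spf("v=spf1 ip4:192.0.2.10 -all"))
    print(check_dmarc("v=DMARC1; p=reject; rua=mailto:dmarc@example.com"))
```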
CASE / 03
Global DNS
2022 — 2025 Operator + migration lead
Dynamic IT · Dubai

[03] CDNS, Anycast DNS across 60+ POPs migrated without dropping a query.

Operated a global Anycast DNS network across 60+ locations on five continents using BGP routing and AXFR distribution. Then executed a full operating-system migration (Slackware 11 to modern Linux) across every node in production with zero service disruption. The end users never knew.

60+
Anycast POPs
05
Continents covered
2006 → 2024
OS migration span
0
Lost queries
  • BGP + Anycast routing for resilient, low-latency DNS resolution at the edge.
  • AXFR-based zone distribution with strict consistency monitoring.
  • Migration playbook — withdraw BGP, drain queries, reinstall OS, restore config from version control, re-announce, monitor. Per POP. Repeated 60+ times.
  • Bridged classic + modern — BIND configs translated from Slackware 11 era to the modern toolchain without breaking the abstraction.
BGP · Anycast DNS · BIND · AXFR · FreeBSD · Slackware → modern Linux · iptables · Nagios
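
The per-POP playbook above can be sketched as a loop that migrates one site at a time and halts on the first failed step. This is an illustration only: the step names mirror the playbook, while `Pop`, `migrate_fleet`, and the `run_step` callback are hypothetical stand-ins for the real routing and provisioning tooling.

```python
from dataclasses import dataclass, field
from typing import Callable, List

# Order matters: traffic is withdrawn before the node is touched
# and re-announced only after the config is restored.
STEPS = ["withdraw_bgp", "drain_queries", "reinstall_os",
         "restore_config", "announce_bgp", "monitor"]


@dataclass
class Pop:
    name: str
    log: List[str] = field(default_factory=list)


def migrate_pop(pop: Pop, run_step: Callable[[Pop, str], bool]) -> bool:
    """Run every step in order; stop at the first one that reports failure."""
    for step in STEPS:
        pop.log.append(step)
        if not run_step(pop, step):
            return False
    return True


def migrate_fleet(pops: List[Pop], run_step: Callable[[Pop, str], bool]) -> List[str]:
    """Migrate one POP at a time; halt on failure so untouched POPs keep serving."""
    done = []
    for pop in pops:
        if not migrate_pop(pop, run_step):
            break
        done.append(pop.name)
    return done


if __name__ == "__main__":
    fleet = [Pop("pop-a"), Pop("pop-b")]
    print(migrate_fleet(fleet, lambda pop, step: True))
```

Going one POP at a time means a failed node simply stays withdrawn while Anycast keeps routing queries to the remaining sites.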
CASE / 04
Regulated
2022 — 2025 Registry operator
Dynamic IT · Dubai

[04] .TM ccTLD Registry, country-code TLD run under strict compliance.

Operated and upgraded the full registry stack for the .TM country-code TLD: EPP, WHOIS, RDAP. A hardened, regulated enterprise environment managing roughly 2 million domains under strict compliance, audit, and uptime requirements where change windows are scheduled and post-mortems are formal.

2M+
Domains under management
EPP · WHOIS · RDAP
Registry protocols
24/7
Regulated compliance posture
  • Full stack ownership — EPP server, WHOIS/RDAP, supporting databases, monitoring, backups.
  • Hardened posture — patch management, least-privilege IAM, DDoS resilience, structured incident response.
  • Audit-ready — documented runbooks, change-control discipline, compliance-grade logging.
  • Stakeholder coordination — change windows aligned with registrar partners and oversight bodies.
EPP · WHOIS / RDAP · BIND · PostgreSQL · Linux hardening · Veeam DR · ITIL change mgmt
CASE / 05
Multi-tenant
2022 — 2025 Architect
Dynamic IT · Dubai

[05] Multi-tenant SaaS on AWS, 100K+ daily users, zero downtime.

Architected a multi-tenant SaaS platform on AWS using Terraform, EKS, Helm, Nginx Ingress, ALB + ACM wildcard TLS, and Route53. Achieved zero-downtime blue-green deployments across multiple environments. Owned tenant isolation, scaling policy, and the operational discipline that kept it boring.

100K+
Daily users
Blue / Green
Deployment model
20-25%
FinOps savings delivered
  • VPC-native networking with ALB + ACM wildcard TLS and Route53 traffic management.
  • Containerized services on EKS via Helm, with HPA, Cluster Autoscaler, and PodDisruptionBudgets.
  • CI/CD through GitHub Actions and GitLab CI, with structured rollback paths.
  • Quarterly FinOps reviews delivering 20-25% infrastructure cost reduction through right-sizing and reserved instances.
AWS EKS · Terraform · Helm · ALB + ACM · Route53 · Nginx Ingress · GitHub Actions · GitLab CI · RDS · FinOps
CASE / 06
Sole engineer
2016 — 2020 Sole infrastructure owner
I2VPN · Berlin (Remote)

[06] I2VPN, from zero to 500K+ global users, alone.

Built and operated the entire production stack for a startup VPN platform from scratch as the only infrastructure engineer. Scaled it to 500,000+ active global users with 99.9% uptime and zero-downtime architecture. Wrote my own DDoS mitigation when the off-the-shelf options didn't fit.

500K+
Active users
12
Node HA MySQL cluster
90%
DDoS attack reduction
24/7
On-call, alone
  • Multi-protocol VPN — OpenVPN, WireGuard, SOCKS5, V2Ray, Squid.
  • 12-node HA MySQL cluster — 2 controllers + 4 storage + 6 SQL nodes for load-balanced traffic.
  • Custom DDoS mitigation — iptables scripts I authored personally, cutting attack impact by 90%.
  • Privacy-first DNS infrastructure eliminating leaks for 500K+ users.
  • Published apps live on App Store + Google Play (i2vpn-secure-vpn-proxy).
OpenVPN · WireGuard · SOCKS5 / V2Ray / Squid · MySQL HA · Docker · Ansible · iptables · Nagios · BIND
CASE / 07
High stakes
2013 — 2016 System administrator
Tatweer Co · Damascus

[07] Tatweer ISP, data-center migration during war.

Led a zero-downtime live data-center migration for an ISP serving roughly 35% of private internet customers in Syria, using replication over a four-day cutover. A mission-critical, 24/7 environment where the consequences of a misstep were national, not just operational.

  • VMware vSphere/vCenter environments — dedicated servers, VPS fleets, virtualized infrastructure.
  • Veeam-based daily backups with under 1-hour RTO and documented DR procedures.
  • Replication-based cutover — 4 days of dual-site running, final flip with zero downtime.
  • Zero security breaches across the tenure despite the operating environment.
VMware vSphere · vCenter · Veeam · Linux / Windows Server · Proxy infrastructure
05 — career log

A timeline, most recent first.

// Five roles. Three continents. One discipline. The pattern: take ownership, ship infrastructure that survives, document it so the next person can operate it.

2025-12 → 2026-04 · 5 months · freelance
Reconhece / Brazil · Remote
Freelance Systems / Platform Engineer

Sole infrastructure engineer engaged to design, build, and launch a complete cloud platform from scratch for a SaaS startup (NestJS microservices, NextJS frontend). Owned full lifecycle through production go-live and team handoff.

2022-02 → 2025-11 · 3 yrs 10 mo
Dynamic IT Consultant / Dubai · UAE
Senior Systems / DevOps Engineer — Infrastructure Lead

Owned the full infrastructure lifecycle for multiple enterprise platforms on AWS and on-premises: regulated .TM ccTLD registry, global Anycast DNS network across 60+ POPs, enterprise email (Axigen), and multi-tenant SaaS serving 100K+ daily users. ITIL-aligned operations, FinOps savings of 20-25%, structured incident response. Designed and deployed an AI-powered n8n agent for operational automation.

2020-07 → 2022-02 · 1 yr 8 mo
Tech Studio Technology / Dubai · UAE
Senior System Administrator & Support Engineer

Delivered enterprise system administration and end-user support across dozens of enterprise clients. Resolved 200+ incidents per month with documented RCA and 85%+ customer satisfaction. Owned ticket triage, SLA tracking, escalation, and Ansible/Chef automation. Mentored junior engineers as the Linux escalation point.

2016-06 → 2020-12 · 4 yrs 7 mo · remote
I2VPN / Berlin · Remote
System Administrator (sole engineer)

Built the entire production VPN platform from scratch with full autonomy. Scaled from zero to 500,000+ global users with 99.9% uptime. Engineered a 12-node HA MySQL cluster, wrote custom DDoS-mitigation tooling (reduced attack impact by 90%), designed leak-free DNS infrastructure, and handled 24/7 on-call as the only infrastructure engineer.

2013-11 → 2016-01 · 2 yrs 3 mo
Tatweer Co / Damascus · Syria
System Administrator

Managed enterprise infrastructure for a major ISP serving roughly 35% of private internet customers in Syria. VMware vSphere/vCenter virtualization, Veeam-based backups with sub-1-hour RTO. Led a zero-downtime data-center migration over a 4-day cutover. Zero security breaches over the tenure.

06 — stack

Tools, not religions.

// Everything listed below has been in production with my hands on it. Items in amber are in heavy daily use.

Cloud Platforms [03]
AWS (8+ yrs) · OCI (production) · Azure Pipelines · EC2 · VPC · EKS · IAM / IRSA · Lambda · S3 · ECR · ALB / NLB · Route53 · ACM · RDS · SSM · EBS CSI · CloudWatch · CodePipeline · OKE · VCN · Vault / KMS
Containers & Orchestration [10]
Kubernetes · Docker · Helm · HPA · Cluster Autoscaler · RBAC · NetworkPolicy · Cert-Manager · External-Secrets · Nginx Ingress · Rancher
IaC, CI/CD & Automation [10]
Terraform · Ansible · Chef · Bash · Python · GitLab CI · GitHub Actions · Jenkins · Azure Pipelines · AWS CodePipeline
Networking & Security [14]
BGP · Anycast DNS · BIND · Route53 · Cloudflare CDN · Nginx · iptables · OpenVPN · WireGuard · TLS / PKI · SPF / DKIM / DMARC · HashiCorp Vault · OCI KMS · AWS SSM
Linux & Systems [08]
RHEL / CentOS / Rocky · Ubuntu / Debian · FreeBSD · Slackware · systemd · SELinux · Kernel tuning · VMware vSphere/vCenter
Data & Messaging [07]
PostgreSQL · Redis / ElastiCache · MySQL HA · MongoDB · TimescaleDB · RabbitMQ · Veeam
Observability [08]
Prometheus · Grafana · kube-prometheus-stack · ELK / Elasticsearch · Nagios · PRTG · CloudWatch · Rsyslog
Operations & Discipline
ITIL · Incident response · Post-mortems · RCA · Runbooks · BCP / DR · Capacity planning · FinOps · SLO / SLA design
07 — writing

Field notes from production.

// Occasional writing on running infrastructure that doesn't fall over. Currently published on LinkedIn while the long-form home is under construction.

08 — beyond the stack

Education, credentials, language.

// Operational depth is built on top of a fundamentals education in communications and information technology engineering.

Education
Master's in Web Science
Syrian Virtual University (SVU)
2015
B.Sc. Communication & Information Technology Engineering
Tishreen University
2008 — 2013
Certifications
Data Loss Prevention (DLP)
INFOWATCH · Cert ID: EISSTM6 012938 2019
2019
Languages
English
Fluent · B2+
Arabic
Native
German
A1 · actively learning
09 — contact

Open to the next platform.

// I'm based in Munich, hold full EU work authorization, and I'm currently open to senior DevOps, Platform, SRE, and Cloud Architect roles. Remote, hybrid, on-site — all viable.

The most reliable way to reach me is email.

Reply window is typically same day during CET working hours. Comfortable on-call and weekend coverage if the role calls for it.