Anton Ferra

DevOps & Platform Engineer · Boston, MA
I don't maintain a LinkedIn profile or public work links by choice. References from former managers are available on request, and I'm happy to walk through code samples or architecture in conversation.
Email: antonferra@gmail.com

About

DevOps and platform engineer with over 5 years of experience operating production cloud infrastructure at scale. I came up through network engineering before moving into infrastructure automation, and I tend to focus on reliability, automation, and developer experience.

Most of my career has been spent keeping the lights on and serving as glue between application, data, and platform teams. I gravitate toward systems work where the feedback loop is tight: build something, hand it to engineers, iterate based on what breaks or what they ask for next.

What I Work On

Cloud & Kubernetes

Multi-region EKS and GKE clusters running production workloads at scale. VPC and network design, IAM, load balancing, Gateway API with HTTPRoute. Cross-cloud migrations including database replication, cutover sequencing, and rollback playbooks.

Self-Service Infrastructure

Crossplane-based provisioning systems that let application teams stand up their own infrastructure via GitOps. Go-based Kubernetes mutating admission webhooks for pod configuration and sidecar injection. KCL schema libraries for standardizing how teams define infrastructure.

GitOps & Deployment

ArgoCD-driven deployment workflows with automated rollout orchestration, health checks, and rollback. Terraform and Terragrunt modules covering database provisioning, security groups, IAM policies, and network setup.

MLOps & AI Infrastructure

SageMaker inference infrastructure captured in Terraform: endpoint provisioning, IAM, right-sizing, and cost optimization for ML workloads. Partnered with data engineering on the model side; owned the infrastructure layer underneath.

Observability & Security

Prometheus pipelines feeding Grafana dashboards for infrastructure and application health. OpenTelemetry for distributed tracing across services. On-call response for production database incidents (SingleStore, Couchbase, Postgres). Security group auditing and centralized RBAC policy tooling.

Network Engineering

Full network stack from data center to desktop. Python-based configuration management automation across hundreds of devices. Linux system administration, SNMP monitoring, and capacity planning.

Technical Skills

AWS: EKS, EC2, RDS, VPC, NLB, Route53, S3, IAM, MSK, Lambda, EMR, SageMaker
GCP: GKE, CloudSQL, Cloud Armor
Kubernetes: EKS, GKE, Docker, Helm, Kustomize, Crossplane, Gateway API, RBAC, mutating webhooks
IaC: Terraform, Terragrunt, Crossplane
CI/CD: ArgoCD, GitHub Actions, Jenkins
MLOps: Amazon SageMaker (Terraform-managed inference infrastructure, IAM, cost optimization)
Observability: Prometheus, Grafana, OpenTelemetry, CloudWatch
Databases: PostgreSQL (RDS, CloudSQL), Redis, SingleStore, Couchbase
Languages: Python, Go, Bash, Ruby

Outside of Work

I'm a tinkerer at heart. Whether it's software, hardware, Raspberry Pi, Arduino, LoRa, or just messing with networking gear, I tend to have something half-built on a desk somewhere. A lot of what I know professionally started as hobby curiosity that got out of hand.