Site Reliability Engineer

TYPE OF WORK

Full Time

SALARY

TBD

HOURS PER WEEK

40

DATE POSTED

Mar 6, 2025

JOB OVERVIEW

-- DIRECT HIRE
-- FULLY REMOTE (PHILIPPINES)

About Aphex
At Aphex, we're on a mission to make it easier for construction teams to get on the same page with a live, multiplayer planning platform. Instead of juggling spreadsheets, whiteboards, and outdated systems, we build tools that allow anyone on a project to plan and communicate their work.
Our customers are the largest construction contractors responsible for the tunnels, roads, bridges, and buildings we use daily. Our users are the engineers managing and planning these incredible projects.
Since launching our V1 in the UK in 2019 and entering the Australian market in 2022, we've become the leading solution in both markets. With strong product-market fit in both, we're continuing to scale our product team, and we're looking for people who are energised by the idea of improving an industry that contributes around 13% of global GDP.

The Role
As a Mid/Senior-level Site Reliability Engineer at Aphex, you'll be responsible for building and maintaining our cloud infrastructure, implementing observability solutions, and ensuring our platform's reliability. You'll work at the intersection of development and operations, using your expertise to scale our systems and maintain high availability.

What You'll Do
- Design, implement, and manage our GCP cloud infrastructure
- Build and maintain Infrastructure as Code using Terraform
- Implement and enhance observability using OpenTelemetry and GCP's observability stack
- Design and maintain monitoring, logging, and alerting systems
- Manage CI/CD pipelines using GitHub Actions and Cloud Build
- Implement automated health checks with Playwright
- Drive reliability improvements through SLOs, error budgets, and continuous improvement
- Collaborate with development teams to improve system reliability and performance


Required Skills & Experience:

Cloud Infrastructure
- Strong experience with Google Cloud Platform (GCP)
- Proficiency in Infrastructure as Code using Terraform
- Understanding of cloud security best practices

Observability
- Experience with OpenTelemetry implementation
- Proficiency with GCP observability tools (Logging, Monitoring, Alerts)
- Ability to design and implement effective monitoring solutions

CI/CD
- Experience with GitHub Actions
- Knowledge of Cloud Build and deployment automation
- Understanding of deployment strategies and rollback procedures

Automated Testing
- Experience with Playwright or similar testing frameworks
- Understanding of test automation best practices
- Ability to build and maintain automated test suites

Nice to Have
- Experience with construction or engineering software
- Knowledge of additional cloud platforms
- Experience with chaos engineering
- Site reliability certifications
- Previous remote work experience


Our Values in Action:

Win Together
Reliability is a team effort. You'll work closely with development teams to build reliable systems from the ground up, sharing knowledge and fostering a reliability-first mindset across the organisation.

Make Ourselves Proud
We take pride in building robust, scalable systems that our customers can depend on. You'll help establish and maintain high standards of reliability in everything we do.

Take Ownership
You'll have the autonomy to shape our reliability strategy and the responsibility to ensure its successful implementation. We're looking for someone who proactively identifies potential issues and drives solutions.

Build for Tomorrow
Our platform is growing rapidly. You'll help build sustainable practices and infrastructure that can scale with our growth while maintaining reliability.


Growth and Development:
Here's a real example of career progression at Aphex:
"Jec joined us as a Full Stack Developer four years ago. He leads our Application Pod today, having grown through various technical and leadership roles. His journey included leading critical projects, mentoring junior developers, and helping shape our technical architecture. We support similar growth paths for all our team members through structured mentorship, learning opportunities, and increasing responsibilities."

Benefits:
- A high-performing team: Be part of and contribute to a genuinely collaborative and motivated team.
- Flexibility: Remote-first working.
- Genuine development: Ongoing training, learning, and coaching to improve daily, with regular team events and knowledge-sharing sessions.
- Focus on culture: At Aphex, we are serious about making a real impact together and strive to walk the talk every day.

How to apply?
PLEASE SEND UPDATED 'RESUME' at Upgrade to see actual info
