AI-Focused DevOps Engineer
Full Stack
Cape Town, Western Cape
This listing does not state a salary.
Job description
Company: Full Stack
Level: Intermediate (4 years' experience)
Type: Full-Time
Location: Onsite — Cape Town, South Africa
About Full Stack
Full Stack is a software and AI consultancy based in Cape Town. We build products, platforms, and tooling for a broad range of clients — from financial services to retail. Our engineering team runs lean and moves fast, and we are investing heavily in AI-assisted development workflows. We use the tools we build, and we expect everyone on the team to do the same.
The Role
We are looking for an intermediate DevOps Engineer who is genuinely excited about AI tooling — not as a buzzword, but as a daily part of how work gets done.
This is not a pure infrastructure role. You will maintain cloud environments, own deployments, and keep services running, but you will also integrate and support AI-assisted developer tooling, manage LLM inference infrastructure, and help developers stay unblocked. If you have been quietly building confidence with AI coding assistants and want to work somewhere that takes them seriously, this role is for you.
What You Will Do
Cloud Infrastructure
- Provision and maintain Azure resources: App Services, Azure SQL (elastic pools), Key Vault, Entra ID, and resource groups
- Administer Azure DevOps pipelines, variable groups, service connections, and artifact feeds
- Monitor cloud resource spend and flag projected cost overruns before they occur
- Manage Entra ID app registrations, service principals, and OAuth 2.0 authentication flows (Device Code, Client Credentials)
- Rotate and manage secrets via Azure Key Vault; enforce the rule that credentials never live in source code
Containerisation & Deployment
- Write and maintain Dockerfile and docker-compose.yml configurations for Python, Node.js, and web application stacks
- Own deployments end-to-end: push, verify, tail logs, confirm health — no silent failures
- Manage nginx as a reverse proxy: SSL termination, upstream routing, and virtual host configuration
- Diagnose container issues: restart loops, failed health checks, port conflicts, network isolation problems
AI Tooling & Developer Platforms
- Deploy and maintain internal AI tooling platforms used by the development team across Windows, macOS, and Linux
- Manage Ollama model deployments (cloud and local), including model availability monitoring and inference endpoint health
- Integrate and maintain third-party AI service APIs: Claude (Anthropic), Google AI / Gemini, ElevenLabs TTS, and others as the stack evolves
- Configure and maintain Claude Code settings, hook definitions, and MCP server registrations
- Troubleshoot AI toolchain failures: model timeouts, token quota exhaustion, broken API connections, authentication errors
Networking & Systems
- Maintain internal network services: DNS, basic VLAN configuration, SSH access management
- Troubleshoot connectivity across Docker networks, on-premises hosts, and cloud endpoints
- Apply firewall rules, port exposure policies, and TLS certificate management (Let's Encrypt and Azure-managed)
- Enforce SSH key-based authentication across all deployment targets
Databases
- Manage Azure SQL databases: schema migrations, connection string management, elastic pool utilisation monitoring
- Perform routine database operations: backups, index maintenance, slow query identification
- Work with ClickHouse or similar analytical databases for data pipeline support
- Understand vector database concepts (pgvector, ChromaDB, or similar) for AI-adjacent workloads
Security
- Enforce secrets hygiene: scan staged files before every commit, audit git history, remediate accidental leaks with git filter-repo
- Maintain .env patterns in .gitignore; ensure every repository has a safe .env.example
- Apply least-privilege principles to all service accounts, Azure RBAC assignments, and API key scopes
- Respond to and document security incidents with clear root-cause analysis and remediation steps
Support & Developer Enablement
- Serve as first-line support for developer tooling issues: broken deployments, failed API integrations, environment misconfigurations, Whisper/TTS pipeline failures
- Maintain cross-platform installation and configuration scripts (Windows PowerShell, bash for macOS and Linux)
- Write and maintain runbooks; if you fix something, document it so the next person (or AI agent) can fix it faster
- Run automated test suites after infrastructure changes; do not ship if tests are broken
What We Are Looking For
Must Have
Cloud & Infrastructure
- Azure (intermediate): comfortable provisioning and managing resources, pipelines, Key Vault, and Entra ID
- Docker (intermediate–confident): Compose stacks, networking, health checks, volume management
- Linux administration: systemd, cron, SSH, bash scripting, process and log management
- Windows administration: PowerShell, Task Scheduler, environment variable management
- nginx: reverse proxy setup, SSL termination, location routing
AI & Developer Tooling
- Active daily use of an AI coding assistant (Claude Code, GitHub Copilot, Cursor, or equivalent) — not just awareness
- Working understanding of LLM concepts: token limits, context windows, model selection trade-offs
- Ability to read and write Python scripts for automation, API integration, and data pipeline tasks
- Comfort integrating with third-party AI APIs (authentication, rate limits, error handling)
Networking
- TCP/IP fundamentals: subnetting, routing, NAT, DNS
- TLS/SSL: certificate types, renewal, and troubleshooting
- Firewall rules: UFW, iptables, or Azure NSG management
Databases
- SQL (intermediate): queries, joins, indexes, basic execution plan reading
- Azure SQL: elastic pools, firewall rules, Entra ID (formerly Azure AD) authentication
- Secure connection string management: environment variables, Key Vault references
Security
- Secret scanning and git history auditing
- OAuth 2.0 flows: understand and debug Device Code, Client Credentials, and Authorization Code
- Key rotation and credential lifecycle management
Development Adjacent
- Python (intermediate): write and modify automation scripts independently
- Bash / PowerShell (intermediate): multi-step scripting, API calls, file handling
- YAML: fluent — Compose files, Azure Pipelines, GitHub Actions
- Git (intermediate–advanced): branching, hooks, multi-remote setups, history rewriting when needed
- TypeScript / JavaScript (basic): enough to read configs and not break pipelines
Nice to Have
- Experience with Kubernetes (kubectl, Helm) — we are moving workloads there
- Terraform for infrastructure-as-code on Azure
- Previous work with LLM inference servers: Ollama, vLLM, or cloud model endpoints
- Familiarity with ClickHouse or other column-store analytical databases
- Exposure to ElevenLabs, Runway AI, or other AI media generation APIs
- Understanding of vector search and embedding pipelines (pgvector, FAISS, ChromaDB)
- Familiarity with Tauri or Rust-based desktop applications
The Kind of Person Who Thrives Here
- You reach for an AI tool before writing boilerplate from scratch
- You read logs before guessing — docker compose logs, git log, and cloud monitoring exist; you use them
- You verify before destroying — rollback plans and backups are habits, not afterthoughts
- You write a short runbook after fixing something, because you know it will happen again
- You ask before provisioning cloud resources, because you understand that costs accumulate
- You think about blast radius: who else is affected, what downstream services could break
What a Typical Week Looks Like
Monday: Check platform health dashboards; review container logs; triage any overnight alerts
Tuesday: Pipeline work (CI/CD improvements, dependency updates, or a new deployment script)
Wednesday: Developer support (environment issues, broken integrations, tooling questions)
Thursday: Security hygiene (secret scan, key rotation review, access audit)
Friday:
Found on Indeed · Posted 6 days ago