Client:
ERP Systems Developer from the EU
[ Detailed information about the client cannot be disclosed under the provisions of the NDA ]
Project workflow
Challenge
The client ran a complex Kubernetes setup on a private cloud built on OpenStack (a free, open-standard cloud computing platform, mainly deployed as infrastructure-as-a-service (IaaS))
The system had an availability of around 80%, which was unacceptable. The infrastructure was challenging to support, and the skills and costs required to maintain its complexity, scalability, and security exceeded expectations
Solution — Preliminary Investigation
We were part of the DevOps team, which consisted of 10 DevOps Engineers facing this challenge
Discovery showed that:
- Maintaining IaaS required the DevOps team to have many low-level skills to manage the underlying cloud assets (storage, node mesh) and to maintain its services, such as Container Registry, CI/CD Integration, and Infrastructure Provisioning and Management. This significantly raised the average salary across the DevOps team, as well as the time and cost of recruiting such employees;
- Solving critical issues required many complicated manual actions and, for hardware problems, the participation of third-party vendors. This often left the Software Development Team idle;
- Scalability and elasticity were limited: expanding the cloud when the load reached its limits took a significant amount of time, which could make the application unavailable or cause it to work incorrectly;
- Infrastructure security needed constant tracking and improvement, which required extra actions and increased the risk of possible problems during security updates
To address these problems, our team focused on moving to a public cloud. Management chose Azure after comparing budgets for AWS, GCP, and Azure
Solution — What was done
We developed a plan according to client needs:
- Created new IaC using Terraform to represent the new infrastructure in Azure;
- Set up network tunnels (main and alternate) between the corporate and Azure networks, so the two infrastructures could work in the same private network;
- Added Azure ACLs based on the existing corporate MS LDAP service;
- Migrated k8s resources to Azure specifically to be able to use Azure-managed services;
- Set up default Azure scaling for the k8s cluster;
- Migrated images to Azure Container Registry (ACR);
- Added scanning and alerting for outdated k8s features and versions, covering security by keeping k8s and base image versions up to date
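The version scanning in the last step can be illustrated with a minimal sketch. It assumes the running versions have already been collected from the cluster and the registry. The component names, version numbers, and function names below are hypothetical and do not represent the tooling actually used on the project.

```python
from dataclasses import dataclass


def parse_version(text: str) -> tuple[int, ...]:
    """Turn 'v1.27.3' or '1.27.3' into a tuple of ints for comparison."""
    return tuple(int(part) for part in text.lstrip("v").split("."))


@dataclass
class Finding:
    component: str
    running: str
    minimum: str


def find_outdated(running: dict[str, str], minimum: dict[str, str]) -> list[Finding]:
    """Return every component whose running version is below the required minimum."""
    findings = []
    for component, version in sorted(running.items()):
        required = minimum.get(component)
        if required is not None and parse_version(version) < parse_version(required):
            findings.append(Finding(component, version, required))
    return findings


# Hypothetical inventory: what is deployed vs. the oldest version still allowed.
running_versions = {"kubernetes": "v1.24.9", "base-image": "3.18.4"}
minimum_versions = {"kubernetes": "1.26.0", "base-image": "3.18.0"}

for finding in find_outdated(running_versions, minimum_versions):
    # In a real setup this is where an alert would be sent instead of printed.
    print(f"ALERT: {finding.component} {finding.running} is below {finding.minimum}")
```

With this sample data only the Kubernetes entry is reported, since the base image already meets its minimum.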
Results
Increased availability
Decreased maintenance costs
Timings
The project lasted 6 months: 1 month spent on IaC, 2 months for k8s apps migration, and 3 months of ongoing improvement of previous steps as well as adding processes for tracking security updates, cluster updates, and ACL support
Results achieved:
- System availability increased from 80% to 99.95%;
- DevOps team payroll costs were reduced. Engineers with low-level infrastructure skills were no longer needed, so the team was reduced to general k8s DevOps engineers;
- Infrastructure problems were eliminated thanks to automatic scaling and timely security updates;
- Hardware maintenance costs were eliminated due to the complete migration to Public Cloud;
- Journaling was simplified: access logs, cluster state, and transparency of the whole setup are provided by integrated monitoring and logging
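To put the availability figures in perspective, the short calculation below converts an availability percentage into expected downtime per year. It is only an illustration: it assumes a 365-day year and evenly distributed downtime, and the function name is hypothetical.

```python
HOURS_PER_YEAR = 365 * 24  # 8760 hours, ignoring leap years


def downtime_hours_per_year(availability_percent: float) -> float:
    """Expected hours of downtime in a year at the given availability."""
    return HOURS_PER_YEAR * (1 - availability_percent / 100)


for availability in (80.0, 99.95):
    hours = downtime_hours_per_year(availability)
    print(f"{availability}% availability: about {hours:.2f} hours of downtime per year")
```

Under these assumptions, 80% availability corresponds to roughly 1,752 hours (73 days) of downtime per year, while 99.95% corresponds to about 4.4 hours.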
Technologies used
- Azure
- OpenStack Cloud Platform
- Terraform
- Kubernetes