Client:
Automotive Company (EU)
[ Detailed information about the client cannot be disclosed under the provisions of the NDA ]
Project workflow
Challenge
The client had a complex embedded software development project for electric vehicle controllers.
Most of the project's tools were low-level and ran in real time directly on the hardware.
The client faced massive delays in releasing new features, bug fixes, and updates. Speeding up delivery was crucial to keep pace with changing market trends, customer needs, and emerging technologies in the rapidly evolving automotive industry.
Solution — Preliminary Investigation
We were part of the DevOps team of 8 engineers assigned to this challenge.
Discovery showed that:
- The complex CI/CD process took days to verify new code, and it was barely documented;
- Pipelines were unstable and consisted of dozens of steps: there was no telling whether a run would finish, reproducibility was low, and the success rate was below 50%;
- Jenkins was poorly configured: complex pipelines consumed too much memory and crashed the Jenkins machine, which blocked the work of the whole team;
- Root cause analysis was hardly possible because of the complexity of working directly with hardware;
- The QA team could not guarantee software stability because of the complexity of a low-level application running on a real-time operating system on a custom PCB (printed circuit board); QA needed to learn how to test the system so that developers no longer had to do the testing themselves
To address these problems, our team focused on optimizing and improving the CI/CD pipeline, automating testing and deployment, investing in scalable infrastructure, and fostering a culture of continuous improvement and collaboration within the development teams.
Solution — What was done
We developed a plan based on the client's needs:
- Pipeline logic was rebuilt: steps were simplified by adding abstractions that replaced low-level code with communication at higher OSI model layers, and stale legacy code was removed from the steps and replaced with predictable, well-logged instructions;
- CI steps were made as non-blocking as possible so that problems surfaced as early as possible;
- CI steps were rewritten with more flexible verification so that each step's success or failure could be determined reliably, because the PCB could change its state while a pipeline was running;
- The Jenkins codebase was simplified by refactoring repeated code into reusable DRY (Don't Repeat Yourself) methods;
- Infrastructure capacity was increased: additional servers were purchased, new instances were set up to avoid problems with the Jenkins machines, and Jenkins nodes were configured to run in parallel to avoid out-of-memory crashes;
- Autotests were divided into critical and non-critical ones, and the non-critical steps were made non-blocking for the pipeline;
- A custom QA framework (QAF) was built on top of Robot Framework to replace hardware behavior with abstractions, making hardware testing similar to software testing. QAF was also containerized to improve the repeatability and scalability of pipelines;
- Reliable methods were created to determine the state of the low-level software and hardware at each step of a pipeline
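The case study does not include the project's Jenkins code. As an illustration only, written in Python rather than Jenkins Groovy, here is a minimal sketch of two ideas from the list above: every step runs even if an earlier one fails, so a single run surfaces all problems, and only failures in steps marked critical fail the pipeline. `Step`, `StepResult`, `run_pipeline` and the `critical` flag are hypothetical names, not part of the original project.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple


# Hypothetical sketch: not the project's actual Jenkins code.
@dataclass
class Step:
    name: str
    action: Callable[[], None]  # raises an exception on failure
    critical: bool = True       # assumption: only critical failures fail the run


@dataclass
class StepResult:
    name: str
    ok: bool
    critical: bool
    error: str = ""


def run_pipeline(steps: List[Step]) -> Tuple[bool, List[StepResult]]:
    """Run every step even if earlier ones fail (non-blocking steps).

    Returns (passed, results); the pipeline fails only if a critical step failed.
    """
    results: List[StepResult] = []
    for step in steps:
        try:
            step.action()
            results.append(StepResult(step.name, True, step.critical))
        except Exception as exc:
            # Record the failure and keep going so later problems are reported too.
            results.append(StepResult(step.name, False, step.critical, str(exc)))
    passed = all(r.ok for r in results if r.critical)
    return passed, results
```

One design caveat: a step that depends on an earlier one (tests need a successful build) should still be skipped when that dependency fails; this sketch leaves dependencies out for brevity.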
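Deciding whether a step succeeded is tricky when the PCB can change state mid-run, as the list above notes. One common approach, sketched here as an assumption rather than the project's documented method, is to poll the board until the expected state is observed on several consecutive reads, or fail once a timeout expires. `read_state` stands in for a hypothetical function that queries the board; `wait_for_stable_state` and its parameters are likewise illustrative.

```python
import time
from typing import Callable, Hashable


def wait_for_stable_state(
    read_state: Callable[[], Hashable],  # hypothetical: queries the board's current state
    expected: Hashable,
    timeout: float = 30.0,
    interval: float = 0.5,
    stable_reads: int = 3,
) -> bool:
    """Poll until `expected` is seen on `stable_reads` consecutive reads.

    Returns True on success, or False if `timeout` seconds pass first.
    A single matching read is not trusted, since the board may still be transitioning.
    """
    deadline = time.monotonic() + timeout
    consecutive = 0
    while time.monotonic() < deadline:
        if read_state() == expected:
            consecutive += 1
            if consecutive >= stable_reads:
                return True
        else:
            consecutive = 0  # state changed under us; start counting again
        time.sleep(interval)
    return False
```

Requiring several consecutive matching reads trades a little latency for robustness against transient states, and the timeout turns a hanging step into a clear failure instead of a pipeline that never finishes.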
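The case study does not describe QAF's internals. The sketch below only illustrates the general idea behind a hardware abstraction layer used from Robot Framework: tests call keywords that talk to an interface, and a simulated backend can stand in for the real board, for example inside a container. `BoardInterface`, `SimulatedBoard`, `BoardKeywords` and the `IN_x`/`OUT_x` pin names are all hypothetical. Robot Framework can use a plain Python class as a keyword library, exposing its public methods as keywords, so no Robot Framework import is needed here.

```python
from abc import ABC, abstractmethod
from typing import Dict, Optional


class BoardInterface(ABC):
    """Hypothetical hardware abstraction: tests talk to this, never to the PCB directly."""

    @abstractmethod
    def set_input(self, pin: str, value: int) -> None: ...

    @abstractmethod
    def read_output(self, pin: str) -> int: ...


class SimulatedBoard(BoardInterface):
    """In-memory stand-in for the PCB, so the same tests can run without hardware."""

    def __init__(self) -> None:
        self._inputs: Dict[str, int] = {}

    def set_input(self, pin: str, value: int) -> None:
        self._inputs[pin] = value

    def read_output(self, pin: str) -> int:
        # Toy behavior for illustration only: output "OUT_x" mirrors input "IN_x".
        return self._inputs.get(pin.replace("OUT_", "IN_", 1), 0)


class BoardKeywords:
    """Keyword library: public methods become keywords (set_pin -> "Set Pin").

    The backend is chosen once here, so test cases do not change between
    simulated and real hardware.
    """

    def __init__(self, board: Optional[BoardInterface] = None) -> None:
        self._board = board if board is not None else SimulatedBoard()

    def set_pin(self, pin: str, value: int) -> None:
        self._board.set_input(pin, int(value))

    def output_should_be(self, pin: str, expected: int) -> None:
        actual = self._board.read_output(pin)
        if actual != int(expected):
            raise AssertionError(f"{pin} is {actual}, expected {expected}")
```

Assuming the code is saved as `BoardKeywords.py`, a Robot Framework suite could import it with `Library    BoardKeywords.py` and then use keywords such as `Set Pin    IN_1    1` and `Output Should Be    OUT_1    1`.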
Results
Timings
The project lasted 18 months: 6 months for the initial phase, which improved CI time-to-market and success rate; 6 months to improve QA results and add scalability and reproducibility; and 6 months of ongoing work on remaining bottlenecks.
Results achieved:
- Average pipeline duration was reduced more than fourfold: an end-to-end pipeline run took a couple of hours instead of a few days. Pipelines became fully reproducible, and the success rate rose to 75–85%;
- Infrastructure problems caused by the lack of scalability were eliminated;
- The QA team's contribution increased tremendously thanks to QAF and the newly available root cause analysis;
- The development team could see build results in hours instead of days, which significantly accelerated development
In addition to a shorter time to market, the completed project allowed the client to improve the quality and scalability of its software and to reduce operational costs.
Technologies used
The central car control unit project, based on an RTOS and developed from scratch, encapsulates the following low-level tools and technologies:
- QNX OS
- C++
- Rust
- Python
- Jenkins
- Docker
- QA framework (QAF)
- Shell scripts
- Sonar
- Fortify
- Grafana
- AWS