How ITGLOBAL.COM launched a new high-performance cluster for migrating and scaling Serverspace cloud infrastructure


As performance requirements for cloud services grew, ITGLOBAL.COM needed to migrate Serverspace customers from an outdated cluster to a new site. The team decided to design and launch a new high-performance cluster that would provide higher virtual machine density, lower latency, and room for further scaling.

This article describes ITGLOBAL.COM’s experience in selecting hardware, building a vStack HCP-based architecture, migrating virtual infrastructure, and testing the new cluster’s fault tolerance.

Project profile

Company name: Serverspace

Industry: Cloud infrastructure and virtualization

ITGLOBAL.COM’s role: Design, implementation, and operation of infrastructure

Project objectives

The previous cluster, built on second-generation Intel Xeon Scalable processors, had reliably served thousands of virtual machines for several years. However, an analysis of operational metrics revealed limits to further growth:

– growing customer requirements for IOPS, latency, and network bandwidth;

– disk subsystem response times approaching their limits under peak load;

– limited ability to scale and increase virtual machine density;

– the need for better performance for databases and high-load web applications.

To address these challenges, it was necessary to create a new infrastructure platform with higher performance, fault tolerance, and efficient resource utilization.

Solution from ITGLOBAL.COM

Hardware platform selection

ITPOD-SL201-D24R-NV-G4 servers were chosen as the basis for the new cluster because their characteristics met the performance and scalability requirements.

Configuration of each node:

Processors: 2 × Intel Xeon 6526Y (Scalable Gen5)

Memory: 16 × 64 GB RDIMM 5600 MHz (1 TB RAM per node)

System disks: 2 × 960 GB U.2 NVMe

Data disks: 7 × 1920 GB U.2 NVMe

Network: 2 × dual-port 25Gb Ethernet SFP28 (OCP 3.0)

A key feature of the platform is that the NVMe disks connect directly to the motherboard without intermediate controllers, which minimizes latency and improves the performance of the software-defined storage layer.
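The per-node totals implied by the configuration list can be checked with a few lines of arithmetic. This is only a sketch: the per-node figures come from the list above, while the four-node cluster size used in the last line is a hypothetical value, since the article does not state the node count. The capacity is raw, before any redundancy overhead.

```python
# Per-node figures taken from the configuration list above.
DIMM_COUNT, DIMM_GB = 16, 64
DATA_DISKS, DATA_DISK_GB = 7, 1920
SYSTEM_DISKS, SYSTEM_DISK_GB = 2, 960


def node_totals():
    """Return raw per-node totals in GB."""
    return {
        "ram_gb": DIMM_COUNT * DIMM_GB,
        "data_nvme_gb": DATA_DISKS * DATA_DISK_GB,
        "system_nvme_gb": SYSTEM_DISKS * SYSTEM_DISK_GB,
    }


def cluster_raw_data_tb(node_count):
    """Raw (pre-redundancy) data capacity in decimal TB for node_count nodes."""
    return node_count * node_totals()["data_nvme_gb"] / 1000


totals = node_totals()
print(totals)                  # ram_gb is 1024, i.e. the 1 TB stated above
print(cluster_raw_data_tb(4))  # 4 nodes is a hypothetical cluster size
```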

vStack HCP-based architecture

The cluster is based on the vStack HCP platform, which combines computing resources, data storage, and network infrastructure into a single system. This approach makes it possible to work with a shared pool of resources instead of managing separate storage and network components. In the new cluster:

– computing resources are provided by ITPOD servers;

– NVMe disks ensure stable operation of applications under high load;

– built-in vStack HCP network architecture simplifies management of routing and traffic balancing;

– the system maintains performance even with a sharp increase in load.

Virtual infrastructure migration

After the cluster was deployed at IXcellerate, the Serverspace virtual infrastructure was migrated. The virtual machines were transferred:

– between different virtualization environments;

– between vStack clusters.

The Infrastructure as Code (IaC) approach was used for data-level migration: an identical information system was first deployed in the target environment without user data, then the data was copied and the applications were started on the new site.

We also withdrew nodes from the original infrastructure in phases after integrating the new servers, which allowed the migration to proceed without stopping services.
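The phased withdrawal can be pictured as a rolling drain: old nodes are emptied one at a time onto the new servers, and a node leaves the cluster only once it hosts nothing. The sketch below is a simplified model, not the vStack migration mechanism. The function name, the node and VM names, and the least-loaded placement rule are all assumptions made for illustration.

```python
def rolling_withdrawal(old_nodes, new_nodes, vms_by_node):
    """Drain old nodes one at a time onto the new nodes.

    vms_by_node maps a node name to the list of VM names it hosts.
    Returns (moves, placement), where moves is the ordered list of
    (vm, source, target) tuples and placement is the final layout.
    """
    placement = {n: list(vms_by_node.get(n, [])) for n in old_nodes + new_nodes}
    moves = []
    for source in old_nodes:
        for vm in list(placement[source]):
            # Hypothetical rule: pick the least-loaded new node;
            # ties resolve by the order of new_nodes.
            target = min(new_nodes, key=lambda n: len(placement[n]))
            placement[source].remove(vm)
            placement[target].append(vm)
            moves.append((vm, source, target))
        # The source is empty at this point and can leave the cluster.
        assert not placement[source]
    return moves, placement


moves, placement = rolling_withdrawal(
    ["old-1", "old-2"],
    ["new-1", "new-2"],
    {"old-1": ["vm-a", "vm-b"], "old-2": ["vm-c"]},
)
for move in moves:
    print(move)
```

Because only one source node is drained at a time, the remaining old nodes keep serving their virtual machines until their own turn comes.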

Testing and fault tolerance

The cluster architecture was designed from the outset to handle single-node failures. ITGLOBAL.COM conducted a series of load and failure tests.

Scenario 1: Single-node failure

Under active load (100 MB/s write, 300 MB/s read), one server was shut down:

– after 8 seconds, fencing was triggered and the node was removed from the cluster;

– the virtual machines automatically restarted on other nodes;

– a degradation in software-defined storage (SDS) performance was recorded.
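The sequence in this scenario (detect the silent node, fence it, restart its virtual machines elsewhere) can be sketched as follows. The 8-second value comes from the test above. Everything else is an assumption for illustration: the article does not say how vStack HCP detects failures, so the heartbeat model, the function names, and the least-loaded restart rule are hypothetical.

```python
FENCING_TIMEOUT_S = 8.0  # from the test above: fencing fired after 8 seconds


def find_nodes_to_fence(last_heartbeat, now, timeout=FENCING_TIMEOUT_S):
    """Return nodes whose last heartbeat is at least `timeout` seconds old.

    Heartbeat-based detection is an assumption, not a documented vStack detail.
    """
    return sorted(n for n, ts in last_heartbeat.items() if now - ts >= timeout)


def restart_plan(vms_by_node, fenced):
    """Assign VMs from fenced nodes to surviving nodes, least-loaded first."""
    survivors = {n: len(v) for n, v in vms_by_node.items() if n not in fenced}
    if not survivors:
        raise RuntimeError("no surviving nodes to restart VMs on")
    plan = {}
    for node in fenced:
        for vm in vms_by_node.get(node, []):
            # Ties on load are broken by node name to keep the plan deterministic.
            target = min(survivors, key=lambda n: (survivors[n], n))
            plan[vm] = target
            survivors[target] += 1
    return plan


heartbeats = {"n1": 100.0, "n2": 91.5, "n3": 99.0}
fenced = find_nodes_to_fence(heartbeats, now=100.0)
plan = restart_plan({"n1": ["a", "b"], "n2": ["c", "d"], "n3": ["e"]}, fenced)
print(fenced, plan)
```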

Scenario 2: Sequential failure of two nodes

After the first node failed, the second server was disabled:

– the system correctly handled the double failure;

– the availability of services and data was preserved;

– performance dropped by less than 15% during rebalancing;

– after ~20 minutes, performance returned to 100%.
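The two numeric outcomes of this scenario can be turned into an acceptance check over throughput samples. This is a sketch under stated assumptions: the 15% and 20-minute thresholds come from the results above, while the sample values, the sampling format, and the function name are hypothetical and do not reproduce the actual test data.

```python
MAX_DEGRADATION = 0.15       # "less than 15%" from the results above
RECOVERY_WINDOW_S = 20 * 60  # "~20 minutes" from the results above


def check_rebalance(samples, baseline):
    """Evaluate time-ordered (seconds_since_failure, throughput) samples.

    Returns (worst_drop_fraction, recovery_time_s), where recovery_time_s
    is the start of the final stretch at or above baseline, or None.
    """
    worst = max((baseline - value) / baseline for _, value in samples)
    recovered_at = None
    for ts, value in samples:
        if value >= baseline:
            if recovered_at is None:
                recovered_at = ts
        else:
            recovered_at = None  # dipped again, so recovery is not yet stable
    return worst, recovered_at


# Hypothetical read-throughput samples in MB/s against a 300 MB/s baseline.
samples = [(0, 300), (60, 260), (600, 270), (1200, 300), (1260, 300)]
worst, recovered_at = check_rebalance(samples, baseline=300)
passed = worst < MAX_DEGRADATION and recovered_at is not None and recovered_at <= RECOVERY_WINDOW_S
print(round(worst, 3), recovered_at, passed)
```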

Network architecture

The cluster uses 25-gigabit network adapters because in the vStack HCP architecture all of the following traffic passes over Ethernet:

– client traffic;

– storage traffic;

– virtual machine migration traffic;

– platform management traffic.

Two network adapters connected to different CPU sockets provide fault tolerance and distribute load optimally with respect to the NUMA architecture.
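The bandwidth available to a node, with and without an adapter failure, follows directly from the adapter configuration. The sketch below uses the figures from the node configuration (two dual-port 25 Gb adapters). The percentage split between traffic classes is purely hypothetical and is included only to show how the four traffic types share the same links.

```python
ADAPTERS = 2
PORTS_PER_ADAPTER = 2
PORT_GBPS = 25

# Hypothetical traffic shares in percent; the article gives no such split.
TRAFFIC_SHARE_PCT = {"client": 30, "storage": 50, "migration": 15, "management": 5}


def node_bandwidth_gbps(failed_adapters=0):
    """Aggregate line rate of the node's healthy ports, in Gb/s."""
    if not 0 <= failed_adapters <= ADAPTERS:
        raise ValueError("failed_adapters must be between 0 and ADAPTERS")
    return (ADAPTERS - failed_adapters) * PORTS_PER_ADAPTER * PORT_GBPS


def split_by_class(total_gbps):
    """Divide total bandwidth between traffic classes by the assumed shares."""
    return {name: total_gbps * pct / 100 for name, pct in TRAFFIC_SHARE_PCT.items()}


print(node_bandwidth_gbps())                   # 100
print(node_bandwidth_gbps(failed_adapters=1))  # 50
print(split_by_class(node_bandwidth_gbps(failed_adapters=1)))
```

These are line rates, so real throughput will be lower once protocol overhead is taken into account.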

Results

Launching the new cluster delivered the following results:

– disk subsystem performance tripled;

– the number of CPU cores per node doubled at the same power consumption;

– the cost of ownership per virtual machine decreased by 35%;

– fault tolerance was confirmed even with multiple failures;

– the cost of the solution increased by only 15% with a threefold increase in performance.
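The last item can be restated as a change in cost per unit of performance. The calculation below is a derived figure, not one reported in the article, and it is separate from the 35% reduction in cost of ownership per virtual machine, which depends on factors the article does not break down. The function name is hypothetical.

```python
def cost_per_performance_change(cost_increase, performance_factor):
    """Relative change in cost per unit of performance.

    cost_increase is a fraction (0.15 for +15%);
    performance_factor is a multiplier (3.0 for a threefold increase).
    """
    return (1 + cost_increase) / performance_factor - 1


change = cost_per_performance_change(0.15, 3.0)
print(f"{change:.1%}")  # roughly a 62% reduction
```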

Overcommit support allowed us to increase the density of virtual machines and use resources more efficiently without expanding the physical infrastructure.
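Overcommit is usually expressed as the ratio of allocated virtual CPUs to physical CPUs. The sketch below shows how that ratio translates into virtual machine density. All of the numbers are hypothetical: the article does not publish the overcommit ratio, the per-node CPU count used for scheduling, or the typical VM size.

```python
def overcommit_ratio(allocated_vcpus, physical_cpus):
    """Ratio of vCPUs handed out to VMs versus physical CPUs available."""
    if physical_cpus <= 0:
        raise ValueError("physical_cpus must be positive")
    return allocated_vcpus / physical_cpus


def max_vms(physical_cpus, ratio, vcpus_per_vm):
    """How many VMs of a given size fit under a chosen overcommit ratio."""
    return int(physical_cpus * ratio) // vcpus_per_vm


# Hypothetical node: 64 schedulable CPUs, 4:1 overcommit, 2 vCPUs per VM.
print(max_vms(physical_cpus=64, ratio=4.0, vcpus_per_vm=2))
print(overcommit_ratio(allocated_vcpus=256, physical_cpus=64))
```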

Development plans

The next stage of the architecture's development will add classic ITPOD Storage with HDDs to implement tiered storage:

– NVMe: for hot data;

– HDD: for cold data.

This will make it possible to support backup scenarios, S3-compatible storage, file shares, and Dev/Test environments while keeping unified management through vStack HCP.
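A hot/cold split of this kind is often driven by how recently data was accessed. The sketch below illustrates one such rule. The 30-day threshold, the tier labels, and the function name are assumptions for illustration, since the article does not describe how the planned tiering will decide placement.

```python
from datetime import datetime, timedelta

HOT_WINDOW = timedelta(days=30)  # hypothetical threshold, not from the article


def choose_tier(last_access, now, hot_window=HOT_WINDOW):
    """Return "nvme" for recently accessed data and "hdd" otherwise."""
    return "nvme" if now - last_access <= hot_window else "hdd"


now = datetime(2025, 1, 31)
print(choose_tier(datetime(2025, 1, 20), now))  # accessed 11 days ago
print(choose_tier(datetime(2024, 10, 1), now))  # accessed months ago
```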

