Ahead of KubeCon + CloudNativeCon EU 2023, VMblog spoke with AB Periasamy, Co-Founder and CEO of MinIO, about Kubernetes and object storage.
VMblog: Why has object storage
been so successful in the Kubernetes ecosystem?
AB Periasamy: First, Kubernetes is
designed to support cloud-native applications, which require highly scalable
and durable storage solutions. Object storage provides these capabilities, along with RESTful APIs suited to microservices architectures, making it a natural fit for Kubernetes deployments. Kubernetes and the cloud-native world have no friction with object storage - and that makes a difference to developers. Cloud-native applications just snap into modern object storage.
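As a brief illustration of how little friction there is, the following sketch uses the MinIO Python SDK to talk to an S3-compatible object store over its REST API. The endpoint, credentials, bucket and object names are placeholders, not values from the interview.

```python
from minio import Minio

# Connect to an S3-compatible object store (placeholder endpoint and credentials).
client = Minio(
    "object-store.example.com:9000",
    access_key="YOUR_ACCESS_KEY",
    secret_key="YOUR_SECRET_KEY",
    secure=True,
)

# Create a bucket if it does not exist, then upload an object over the S3 API.
if not client.bucket_exists("app-data"):
    client.make_bucket("app-data")

client.fput_object("app-data", "reports/2023-04.json", "/tmp/2023-04.json")
```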
Second, object storage
is highly flexible and customizable, which is important in the context of
Kubernetes, where applications may have different storage requirements. Object
storage can be seamlessly integrated with Kubernetes storage classes, making it
simple for developers to define and manage storage resources for their
applications. In a world characterized by complexity - and this is a drawback
of K8s - simple, scalable storage becomes very attractive.
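As a minimal sketch of that storage-class integration, the snippet below uses the kubernetes Python client to list the storage classes a cluster exposes, which is typically the first step before a developer writes a PersistentVolumeClaim against one of them. It assumes a local kubeconfig and the `kubernetes` package are available.

```python
from kubernetes import client, config

# Load credentials from the local kubeconfig (use load_incluster_config() inside a pod).
config.load_kube_config()

# List the storage classes the cluster exposes so a team can pick one for its claims.
for sc in client.StorageV1Api().list_storage_class().items:
    print(sc.metadata.name, "->", sc.provisioner)
```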
Finally, most of the
leading object stores are open-source. MinIO is one of those. The cloud
operating model is defined by open source, and the Kubernetes community values
the choice and transparency that comes with open-source solutions. They provide
greater flexibility and customization options, as well as access to a large and
active community of contributors who can help to improve and enhance the
software over time.
VMblog: What are some challenges
that organizations may face when deploying object storage with Kubernetes?
AB Periasamy: A key challenge with any storage system, and one that is particularly important in the Kubernetes world, is ensuring persistence and availability in the face of node failures or other
issues. There are multiple approaches here - but versioning, replication and
erasure coding all play a role in delivering a robust architecture for data
protection.
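To make one of those mechanisms concrete, here is a hedged sketch that enables bucket versioning through the MinIO Python SDK so overwrites and deletes keep prior versions recoverable. The endpoint, credentials and bucket name are assumed placeholders.

```python
from minio import Minio
from minio.commonconfig import ENABLED
from minio.versioningconfig import VersioningConfig

client = Minio(
    "object-store.example.com:9000",
    access_key="YOUR_ACCESS_KEY",
    secret_key="YOUR_SECRET_KEY",
)

# Turn on object versioning so every overwrite or delete keeps earlier versions recoverable.
client.set_bucket_versioning("app-data", VersioningConfig(ENABLED))

# Confirm the bucket's versioning status.
print(client.get_bucket_versioning("app-data").status)
```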
Data locality can also be a challenge. While Kubernetes workloads are portable, data has weight: it can be time-consuming and costly to move. In addition, getting the topology of storage and compute right goes a long way toward meeting increasingly challenging data governance and data sovereignty requirements.
Scalability can be a
challenge if clusters are architected incorrectly. Systems are not static; they are dynamic, and this needs to be taken into consideration from the beginning. A
core principle of cloud-native design is to use simple building blocks and
scale them infinitely. This provides needed flexibility as your data and
application stack grow.
Finally, application performance must be optimized. Modern object storage is fast and throughput bound. Ensure your object storage solution can deliver the performance you need for your workloads, current and envisioned. High-performance object storage supports a diversity of workloads - from AI/ML to databases to more traditional ones like archival. Choose wisely.
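One rough way to sanity-check whether a deployment delivers the throughput a workload needs is to time a large upload, as in the sketch below using the MinIO Python SDK. The endpoint, credentials, bucket and object size are illustrative assumptions; a purpose-built benchmark tool will give more rigorous numbers.

```python
import os
import time

from minio import Minio

client = Minio(
    "object-store.example.com:9000",
    access_key="YOUR_ACCESS_KEY",
    secret_key="YOUR_SECRET_KEY",
)

# Stage a 256 MiB file locally, then time a single PUT to estimate sustained write throughput.
size_bytes = 256 * 1024 * 1024
with open("/tmp/throughput.bin", "wb") as f:
    f.write(os.urandom(size_bytes))

start = time.monotonic()
client.fput_object("app-data", "bench/throughput.bin", "/tmp/throughput.bin")
elapsed = time.monotonic() - start
print(f"PUT throughput: {size_bytes / (1024 * 1024) / elapsed:.1f} MiB/s")
```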
VMblog: MinIO has open-sourced
DirectPV for Kubernetes workloads. What is it and why should developers care?
AB Periasamy: DirectPV (Direct Persistent Volume) is a CSI driver for Direct Attached Storage. In simpler terms, it is a distributed persistent volume manager, not a storage system like SAN or NAS. It is used to discover, format, mount, schedule and monitor drives across nodes. Kubernetes hostPath and local PVs are statically provisioned and limited in functionality, so DirectPV was created to address those limitations.
While we liked the simplicity of Kubernetes-native LocalPV, it did not scale operationally. Volumes had to be provisioned statically, and there was no easy way to manage large numbers of drives. Further, LocalPV lacked the sophistication of the CSI driver model.
Taking MinIO's
minimalist design philosophy into account, we sought a dynamic LocalPV CSI
driver implementation with useful volume management functions like drive discovery,
formatting, quotas and monitoring.
DirectPV allows
containerized workloads to directly access persistent volumes without going
through a network interface. This feature provides direct access to the
underlying storage hardware, which can result in improved performance and
reduced latency for storage-intensive workloads.
Once the local volume is provisioned and attached to the container, DirectPV doesn't get in the way of application I/O. All reads and writes are performed directly on the volume. You can expect the performance to be as fast as the local drives. Distributed stateful services like MinIO, Cassandra, Kafka, Elastic, MongoDB, Presto, Trino, Spark and Splunk would benefit significantly from the DirectPV driver for their hot tier needs. For long-term persistence, they may use MinIO or another S3-compatible object store as the warm tier. MinIO would in turn use DirectPV to write the data to the persistent storage media.
DirectPV is licensed
under AGPL v3. You can find more on GitHub.
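As a concrete sketch of how a workload would request a DirectPV-managed volume, the snippet below creates a PersistentVolumeClaim with the kubernetes Python client. The storage class name and sizing are assumptions based on a typical DirectPV install; confirm the class name in your own cluster.

```python
from kubernetes import client, config

config.load_kube_config()

# Request a locally attached volume through the DirectPV-provisioned storage class
# (class name assumed here; confirm with `kubectl get storageclass`).
pvc = client.V1PersistentVolumeClaim(
    metadata=client.V1ObjectMeta(name="hot-tier-data"),
    spec=client.V1PersistentVolumeClaimSpec(
        access_modes=["ReadWriteOnce"],
        storage_class_name="directpv-min-io",
        resources=client.V1ResourceRequirements(requests={"storage": "500Gi"}),
    ),
)
client.CoreV1Api().create_namespaced_persistent_volume_claim(
    namespace="default", body=pvc
)
```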
VMblog: Performance seems to be
a key theme with object storage and Kubernetes. How can organizations ensure
optimal performance when combining the two?
AB Periasamy: The key to optimizing performance when combining Kubernetes and object storage is recognizing that what you are really after is performance at scale. Performance is one thing; performance over tens or hundreds of petabytes is an entirely different thing.
One area to focus on is the optimization of network settings. The network
configuration between Kubernetes nodes and the object storage system can
significantly impact performance. Tuning network settings, such as MTU size and
buffer settings, can help to improve performance and reduce latency - but
remember, with object storage you are optimizing for throughput first and
foremost.
As mentioned above,
DirectPV can provide a more direct and efficient path for data to flow between
the workload and the storage system. This delivers improved performance and
reduced latency.
Implementing data caching also helps. Caching data in memory can reduce the number of read and write operations hitting the underlying storage system, which improves performance. The Kubernetes ecosystem offers several caching options, from in-memory caches to distributed caching solutions.
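A minimal sketch of the simplest form of this idea is an in-process read cache in front of object GETs, shown below with Python's functools.lru_cache and the MinIO SDK. The endpoint, credentials and object names are illustrative assumptions.

```python
from functools import lru_cache

from minio import Minio

client = Minio(
    "object-store.example.com:9000",
    access_key="YOUR_ACCESS_KEY",
    secret_key="YOUR_SECRET_KEY",
)

@lru_cache(maxsize=256)
def get_object_cached(bucket: str, key: str) -> bytes:
    """Fetch an object once and serve repeat reads from process memory."""
    response = client.get_object(bucket, key)
    try:
        return response.read()
    finally:
        response.close()
        response.release_conn()

# The first call hits the object store; later calls for the same key come from the cache.
config_blob = get_object_cached("app-data", "config/settings.json")
```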
Load balancing can help to distribute traffic evenly across multiple storage nodes, improving performance and avoiding bottlenecks. There is a high-performance load balancer called Sidekick that attaches a tiny load balancer as a sidecar to each of the client application processes. This allows the developer to eliminate the centralized load balancer bottleneck and DNS failover management. Sidekick automatically avoids sending traffic to failed servers by checking their health via the readiness API and HTTP error returns.
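The following is not Sidekick itself, but a hedged Python sketch of the client-side, health-checked load-balancing idea it implements as a sidecar. The endpoints are placeholders, and the readiness path shown is the one MinIO typically exposes.

```python
import itertools

import requests

# Candidate object-storage endpoints (placeholders) and the readiness path used for health checks.
ENDPOINTS = [
    "http://storage-node-1:9000",
    "http://storage-node-2:9000",
    "http://storage-node-3:9000",
]
HEALTH_PATH = "/minio/health/ready"

def healthy_endpoints() -> list[str]:
    """Return only the endpoints whose readiness check answers with HTTP 200."""
    live = []
    for endpoint in ENDPOINTS:
        try:
            if requests.get(endpoint + HEALTH_PATH, timeout=1).status_code == 200:
                live.append(endpoint)
        except requests.RequestException:
            pass
    return live

# Rotate requests across healthy nodes; a sidecar like Sidekick does this continuously per client process.
live = healthy_endpoints()
if live:
    rotation = itertools.cycle(live)
    print("routing next request to", next(rotation))
```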
Finally, always monitor and optimize performance. Whether through built-in systems or solutions like Prometheus, it is critical to regularly monitor the performance of the Kubernetes and object storage systems. Small changes can have broad impacts in complex systems and are always easier to sleuth when you catch them early.
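One lightweight way to feed Prometheus from the application side is to record object-store request latency and expose it for scraping, as in the sketch below using prometheus_client and the MinIO SDK. The metric name, port, endpoint and credentials are assumptions for illustration.

```python
import time

from minio import Minio
from prometheus_client import Histogram, start_http_server

client = Minio(
    "object-store.example.com:9000",
    access_key="YOUR_ACCESS_KEY",
    secret_key="YOUR_SECRET_KEY",
)

# Histogram of object-store request latency, labeled by operation, from the app's point of view.
REQUEST_SECONDS = Histogram(
    "object_store_request_seconds",
    "Latency of requests made to the object storage backend",
    ["operation"],
)

def timed_get(bucket: str, key: str) -> bytes:
    """Wrap a GET in a latency observation so Prometheus can scrape it later."""
    start = time.monotonic()
    try:
        response = client.get_object(bucket, key)
        try:
            return response.read()
        finally:
            response.close()
            response.release_conn()
    finally:
        REQUEST_SECONDS.labels(operation="get").observe(time.monotonic() - start)

# Expose /metrics on port 8000 so a Prometheus scrape job can collect the histogram.
start_http_server(8000)
```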
VMblog: What excites you most
about the future of Kubernetes storage?
AB Periasamy: The rise of object storage has really been an exciting development to watch, and it has contributed greatly to the adoption of Kubernetes. In the public cloud, object storage was always primary storage. Services like Snowflake, Snap, Redshift and BigQuery were built on top of object storage, and it was a natural extension to pair it with the EKS, GKE and AKS distributions.
What has happened in the past few years is that the cloud operating model can be found on-prem, in the colo, and at the edge. This has carried Kubernetes and, by extension, its storage running mate (object storage) with it. It made sense, really - you cannot containerize a storage appliance. The concept of infrastructure as software is what has enabled this modern software stack to scale like it has - both in terms of the number of organizations on it and the types of applications that run on it. That is what I find most compelling about where we are today as a
community.
##
ABOUT AB PERIASAMY
AB Periasamy is the co-founder and CEO of MinIO, an open source provider of high-performance object storage software. In addition to this role, AB is an active investor and advisor to a wide range of technology companies, from H2O.ai and Manetu, where he serves on the board, to advisor or investor roles with Humio, Isovalent, Starburst, Yugabyte, Tetrate, Postman, Storj, Procurify, and Helpshift. Successful exits include Gitter.im (Gitlab), Treasure Data (ARM) and Fastor (SMART).
AB co-founded Gluster in 2005 to commoditize scalable storage systems. As CTO, he was the primary architect and strategist for the development of the Gluster file system, a pioneer in software-defined storage. After the company was acquired by Red Hat in 2011, AB joined Red Hat’s Office of the CTO. Prior to Gluster, AB was CTO of California Digital Corporation, where his work led to the scaling of commodity cluster computing to supercomputing-class performance. His work there resulted in the development of Lawrence Livermore National Laboratory’s “Thunder” supercomputer, which, at the time, was the second fastest in the world.
AB holds a Computer Science Engineering degree from Annamalai University, Tamil Nadu, India.
AB is one of the leading proponents and thinkers on the subject of open source software, articulating the difference between the philosophy and the business model. He is also an active contributor to a number of open source projects and serves on the board of the Free Software Foundation India.