Ahead of KubeCon + CloudNativeCon EU 2023, VMblog spoke with AB Periasamy, Co-Founder and CEO of MinIO, about Kubernetes and object storage.
VMblog: Why has object storage
been so successful in the Kubernetes ecosystem?
AB Periasamy: First, Kubernetes is
designed to support cloud-native applications, which require highly scalable
and durable storage solutions. Object storage provides these capabilities, along with RESTful APIs suited to microservices architectures, making it a natural fit for Kubernetes deployments. Kubernetes and the cloud-native world have no friction with object storage - and that makes a difference to developers. Cloud-native applications just snap into modern object storage.
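As a brief illustration of how little friction there is, the following sketch uses the MinIO Python SDK to talk to an S3-compatible object store over its REST API. The endpoint, credentials, bucket and object names are placeholders, not values from the interview.

```python
from minio import Minio

# Connect to an S3-compatible object store (placeholder endpoint and credentials).
client = Minio(
    "object-store.example.com:9000",
    access_key="YOUR_ACCESS_KEY",
    secret_key="YOUR_SECRET_KEY",
    secure=True,
)

# Create a bucket if it does not exist, then upload an object over the S3 API.
if not client.bucket_exists("app-data"):
    client.make_bucket("app-data")

client.fput_object("app-data", "reports/2023-04.json", "/tmp/2023-04.json")
```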
Second, object storage
is highly flexible and customizable, which is important in the context of
Kubernetes, where applications may have different storage requirements. Object
storage can be seamlessly integrated with Kubernetes storage classes, making it
simple for developers to define and manage storage resources for their
applications. In a world characterized by complexity - and this is a drawback
of K8s - simple, scalable storage becomes very attractive.
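As a minimal sketch of that storage-class integration, the snippet below uses the kubernetes Python client to list the storage classes a cluster exposes, which is typically the first step before a developer writes a PersistentVolumeClaim against one of them. It assumes a local kubeconfig and the `kubernetes` package are available.

```python
from kubernetes import client, config

# Load credentials from the local kubeconfig (use load_incluster_config() inside a pod).
config.load_kube_config()

# List the storage classes the cluster exposes so a team can pick one for its claims.
for sc in client.StorageV1Api().list_storage_class().items:
    print(sc.metadata.name, "->", sc.provisioner)
```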
Finally, most of the
leading object stores are open-source. MinIO is one of those. The cloud
operating model is defined by open source, and the Kubernetes community values
the choice and transparency that comes with open-source solutions. They provide
greater flexibility and customization options, as well as access to a large and
active community of contributors who can help to improve and enhance the
software over time.
VMblog: What are some challenges
that organizations may face when deploying object storage with Kubernetes?
AB Periasamy: A key challenge with any storage system, and one that is particularly important in the Kubernetes world, is ensuring persistence and availability in the face of node failures or other
issues. There are multiple approaches here - but versioning, replication and
erasure coding all play a role in delivering a robust architecture for data
protection.
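To make one of those mechanisms concrete, here is a hedged sketch that enables bucket versioning through the MinIO Python SDK so overwrites and deletes keep prior versions recoverable. The endpoint, credentials and bucket name are assumed placeholders.

```python
from minio import Minio
from minio.commonconfig import ENABLED
from minio.versioningconfig import VersioningConfig

client = Minio(
    "object-store.example.com:9000",
    access_key="YOUR_ACCESS_KEY",
    secret_key="YOUR_SECRET_KEY",
)

# Turn on object versioning so every overwrite or delete keeps earlier versions recoverable.
client.set_bucket_versioning("app-data", VersioningConfig(ENABLED))

# Confirm the bucket's versioning status.
print(client.get_bucket_versioning("app-data").status)
```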
Data locality can also be a challenge. While Kubernetes workloads are portable, data has weight: it can be time-consuming and costly to move. In addition, getting the topology of storage and compute right goes a long way toward meeting increasingly challenging data governance and data sovereignty requirements.
Scalability can be a
challenge if clusters are architected incorrectly. Systems are not static; they are dynamic, and this needs to be taken into consideration from the beginning. A
core principle of cloud-native design is to use simple building blocks and
scale them infinitely. This provides needed flexibility as your data and
application stack grow.
Finally, application performance must be optimized. Modern object storage is fast and throughput bound. Ensure your object storage solution can deliver the performance you need for your workloads, current and envisioned. High-performance object storage supports a diversity of workloads - from AI/ML to databases to more traditional ones like archival. Choose wisely.
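One rough way to sanity-check whether a deployment delivers the throughput a workload needs is to time a large upload, as in the sketch below using the MinIO Python SDK. The endpoint, credentials, bucket and object size are illustrative assumptions; a purpose-built benchmark tool will give more rigorous numbers.

```python
import os
import time

from minio import Minio

client = Minio(
    "object-store.example.com:9000",
    access_key="YOUR_ACCESS_KEY",
    secret_key="YOUR_SECRET_KEY",
)

# Stage a 256 MiB file locally, then time a single PUT to estimate sustained write throughput.
size_bytes = 256 * 1024 * 1024
with open("/tmp/throughput.bin", "wb") as f:
    f.write(os.urandom(size_bytes))

start = time.monotonic()
client.fput_object("app-data", "bench/throughput.bin", "/tmp/throughput.bin")
elapsed = time.monotonic() - start
print(f"PUT throughput: {size_bytes / (1024 * 1024) / elapsed:.1f} MiB/s")
```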
VMblog: MinIO has open-sourced
DirectPV for Kubernetes workloads. What is it and why should developers care?
AB Periasamy: DirectPV (Direct Persistent Volume) is a CSI driver for Direct Attached Storage. In simpler terms, it is a distributed persistent volume manager, not a storage system like SAN or NAS. It is used to discover, format, mount, schedule and monitor drives across nodes. Kubernetes hostPath and local PVs are statically provisioned and limited in functionality, so DirectPV was created to address those limitations.
While we liked the simplicity of Kubernetes-native LocalPV, it did not scale operationally. Volumes had to be provisioned statically, and there was no easy way to manage large numbers of drives. Further, LocalPV lacked the sophistication of the CSI driver model.
Taking MinIO's
minimalist design philosophy into account, we sought a dynamic LocalPV CSI
driver implementation with useful volume management functions like drive discovery,
formatting, quotas and monitoring.
DirectPV allows
containerized workloads to directly access persistent volumes without going
through a network interface. This feature provides direct access to the
underlying storage hardware, which can result in improved performance and
reduced latency for storage-intensive workloads.
Once the local volume is provisioned and attached to the container, DirectPV doesn't get in the way of application I/O. All reads and writes are performed directly on the volume. You can expect the performance to be as fast as the local drives. Distributed stateful services like MinIO, Cassandra, Kafka, Elastic, MongoDB, Presto, Trino, Spark and Splunk would benefit significantly from the DirectPV driver for their hot tier needs. For long-term persistence, they may use MinIO or another S3-compatible object store as the warm tier. MinIO would in turn use DirectPV to write the data to the persistent storage media.
DirectPV is licensed
under AGPL v3. You can find more on GitHub.
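As a concrete sketch of how a workload would request a DirectPV-managed volume, the snippet below creates a PersistentVolumeClaim with the kubernetes Python client. The storage class name and sizing are assumptions based on a typical DirectPV install; confirm the class name in your own cluster.

```python
from kubernetes import client, config

config.load_kube_config()

# Request a locally attached volume through the DirectPV-provisioned storage class
# (class name assumed here; confirm with `kubectl get storageclass`).
pvc = client.V1PersistentVolumeClaim(
    metadata=client.V1ObjectMeta(name="hot-tier-data"),
    spec=client.V1PersistentVolumeClaimSpec(
        access_modes=["ReadWriteOnce"],
        storage_class_name="directpv-min-io",
        resources=client.V1ResourceRequirements(requests={"storage": "500Gi"}),
    ),
)
client.CoreV1Api().create_namespaced_persistent_volume_claim(
    namespace="default", body=pvc
)
```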
VMblog: Performance seems to be
a key theme with object storage and Kubernetes. How can organizations ensure
optimal performance when combining the two?
AB Periasamy: The key to optimizing performance when combining Kubernetes and object storage is recognizing that what you are really after is performance at scale. Performance is one thing; performance over tens or hundreds of petabytes is an entirely different thing.
One area to focus on is the optimization of network settings. The network
configuration between Kubernetes nodes and the object storage system can
significantly impact performance. Tuning network settings, such as MTU size and
buffer settings, can help to improve performance and reduce latency - but
remember, with object storage you are optimizing for throughput first and
foremost.
As mentioned above,
DirectPV can provide a more direct and efficient path for data to flow between
the workload and the storage system. This delivers improved performance and
reduced latency.
Implementing data caching also helps. Caching data in memory can reduce the number of read and write operations hitting the underlying storage system, which improves performance. The Kubernetes ecosystem offers several caching options, from in-memory caches to distributed caching solutions.
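A minimal sketch of the simplest form of this idea is an in-process read cache in front of object GETs, shown below with Python's functools.lru_cache and the MinIO SDK. The endpoint, credentials and object names are illustrative assumptions.

```python
from functools import lru_cache

from minio import Minio

client = Minio(
    "object-store.example.com:9000",
    access_key="YOUR_ACCESS_KEY",
    secret_key="YOUR_SECRET_KEY",
)

@lru_cache(maxsize=256)
def get_object_cached(bucket: str, key: str) -> bytes:
    """Fetch an object once and serve repeat reads from process memory."""
    response = client.get_object(bucket, key)
    try:
        return response.read()
    finally:
        response.close()
        response.release_conn()

# The first call hits the object store; later calls for the same key come from the cache.
config_blob = get_object_cached("app-data", "config/settings.json")
```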
Load balancing can help to distribute traffic evenly across multiple storage nodes, improving performance and avoiding bottlenecks. There is a high-performance load balancer called Sidekick that attaches a tiny load balancer as a sidecar to each of the client application processes. This allows the developer to eliminate the centralized load balancer bottleneck and DNS failover management. Sidekick automatically avoids sending traffic to failed servers by checking their health via the readiness API and HTTP error returns.
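The following is not Sidekick itself, but a hedged Python sketch of the client-side, health-checked load-balancing idea it implements as a sidecar. The endpoints are placeholders, and the readiness path shown is the one MinIO typically exposes.

```python
import itertools

import requests

# Candidate object-storage endpoints (placeholders) and the readiness path used for health checks.
ENDPOINTS = [
    "http://storage-node-1:9000",
    "http://storage-node-2:9000",
    "http://storage-node-3:9000",
]
HEALTH_PATH = "/minio/health/ready"

def healthy_endpoints() -> list[str]:
    """Return only the endpoints whose readiness check answers with HTTP 200."""
    live = []
    for endpoint in ENDPOINTS:
        try:
            if requests.get(endpoint + HEALTH_PATH, timeout=1).status_code == 200:
                live.append(endpoint)
        except requests.RequestException:
            pass
    return live

# Rotate requests across healthy nodes; a sidecar like Sidekick does this continuously per client process.
live = healthy_endpoints()
if live:
    rotation = itertools.cycle(live)
    print("routing next request to", next(rotation))
```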
Finally, always monitor and optimize performance. Whether through built-in systems or solutions like Prometheus, it is critical to regularly monitor the performance of the Kubernetes and object storage systems. Small changes can have broad impacts in complex systems and are always easier to sleuth when you catch them early.
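One lightweight way to feed Prometheus from the application side is to record object-store request latency and expose it for scraping, as in the sketch below using prometheus_client and the MinIO SDK. The metric name, port, endpoint and credentials are assumptions for illustration.

```python
import time

from minio import Minio
from prometheus_client import Histogram, start_http_server

client = Minio(
    "object-store.example.com:9000",
    access_key="YOUR_ACCESS_KEY",
    secret_key="YOUR_SECRET_KEY",
)

# Histogram of object-store request latency, labeled by operation, from the app's point of view.
REQUEST_SECONDS = Histogram(
    "object_store_request_seconds",
    "Latency of requests made to the object storage backend",
    ["operation"],
)

def timed_get(bucket: str, key: str) -> bytes:
    """Wrap a GET in a latency observation so Prometheus can scrape it later."""
    start = time.monotonic()
    try:
        response = client.get_object(bucket, key)
        try:
            return response.read()
        finally:
            response.close()
            response.release_conn()
    finally:
        REQUEST_SECONDS.labels(operation="get").observe(time.monotonic() - start)

# Expose /metrics on port 8000 so a Prometheus scrape job can collect the histogram.
start_http_server(8000)
```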
VMblog: What excites you most
about the future of Kubernetes storage?
AB Periasamy: The rise of object storage has really been an exciting development to watch, and it has contributed greatly to the adoption of Kubernetes. In the public cloud, object storage was always primary storage. Services like Snowflake, Snap, Redshift and BigQuery were built on top of object storage, and it was a natural extension to pair it with the EKS, GKE and AKS distributions.
What has happened in the past few years is that the cloud operating model can be found on-prem, in the colo, and at the edge. This has carried Kubernetes and, by extension, its storage running mate (object storage) with it. It made sense, really - you cannot containerize a storage appliance. The concept of infrastructure as software is what has enabled this modern software stack to scale like it has - both in terms of the number of organizations on it and the types of applications that run on it. That is what I find most compelling about where we are today as a
community.
##
ABOUT AB PERIASAMY
AB Periasamy is the co-founder and CEO of MinIO, an open source provider of high-performance object storage software. In addition to this role, AB is an active investor and advisor to a wide range of technology companies, from H2O.ai and Manetu, where he serves on the board, to advisor or investor roles with Humio, Isovalent, Starburst, Yugabyte, Tetrate, Postman, Storj, Procurify, and Helpshift. Successful exits include Gitter.im (Gitlab), Treasure Data (ARM) and Fastor (SMART).
AB co-founded Gluster in 2005 to commoditize scalable storage systems. As CTO, he was the primary architect and strategist for the development of the Gluster file system, a pioneer in software-defined storage. After the company was acquired by Red Hat in 2011, AB joined Red Hat’s Office of the CTO. Prior to Gluster, AB was CTO of California Digital Corporation, where his work led to the scaling of commodity cluster computing to supercomputing-class performance. His work there resulted in the development of Lawrence Livermore National Laboratory’s “Thunder” supercomputer, which, at the time, was the second fastest in the world.
AB holds a Computer Science Engineering degree from Annamalai University, Tamil Nadu, India.
AB is one of the leading proponents and thinkers on the subject of open source software, articulating the difference between the philosophy and the business model. He is also an active contributor to a number of open source projects and serves on the board of the Free Software Foundation India.