Principles for handling data in Kubernetes

Kubernetes has become an industry standard: according to surveys, up to 94% of organizations deploy their services and applications on the container orchestration platform. One of the key reasons enterprises adopt Kubernetes is standardization, which lets advanced users double their productivity gains.

Standardizing on Kubernetes gives organizations the ability to deploy any container, anywhere. But there was a missing piece: the technology assumed that containers were ephemeral, meaning that only stateless workloads could be safely deployed on Kubernetes. However, the community recently changed that paradigm and introduced features such as StatefulSets and storage classes, which make it possible to work with data in Kubernetes.

While it is possible to run stateful containers on Kubernetes, it is still a challenge.

Do it progressively

Kubernetes is on its way to becoming as popular as Linux and the de facto way to run any application, anywhere, in a distributed fashion. Using Kubernetes involves learning a lot of technical concepts and vocabulary. For example, newcomers may have problems with the many logical units of Kubernetes, such as containers, pods, nodes, and clusters.

If you're not already running Kubernetes in production, it's not a good idea to jump right into stateful containers. It's best to start by moving stateless apps, so that no data is lost when things go wrong.

Understand the limitations and specifics

Once you're familiar with the general concepts of Kubernetes, you can dive into the details of stateful concepts. For example, because applications may have different storage needs, such as performance or capacity requirements, the correct underlying storage system must be provided.

What the industry generally calls storage "profiles" are called storage classes in Kubernetes. They provide a way to describe the different types of storage that a Kubernetes cluster can access. Storage classes can map to different quality-of-service levels, such as I/O operations per second per GiB, to backup policies, or to arbitrary policies, such as volume binding modes and allowed topologies.
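As a rough illustration of the paragraph above, the sketch below builds a StorageClass manifest as a plain Python dictionary and prints it as JSON, which `kubectl apply -f` also accepts. The class name, the provisioner `csi.example.com`, the `iopsPerGiB` parameter, and the zone names are hypothetical placeholders: real parameters depend entirely on the storage driver in use.

```python
import json


def storage_class(name, provisioner, parameters,
                  binding_mode="WaitForFirstConsumer", zones=None):
    """Build a StorageClass manifest as a plain dict."""
    manifest = {
        "apiVersion": "storage.k8s.io/v1",
        "kind": "StorageClass",
        "metadata": {"name": name},
        "provisioner": provisioner,
        # Parameters are provisioner-specific; Kubernetes passes them through.
        "parameters": parameters,
        "reclaimPolicy": "Retain",
        "allowVolumeExpansion": True,
        "volumeBindingMode": binding_mode,
    }
    if zones:
        # Restrict where volumes of this class may be provisioned.
        manifest["allowedTopologies"] = [{
            "matchLabelExpressions": [
                {"key": "topology.kubernetes.io/zone", "values": list(zones)}
            ]
        }]
    return manifest


# Hypothetical "fast" class; every value below is an assumption.
fast = storage_class(
    name="fast-ssd",
    provisioner="csi.example.com",
    parameters={"iopsPerGiB": "50"},
    zones=["zone-a", "zone-b"],
)
print(json.dumps(fast, indent=2))
```

An application then requests this kind of storage by naming the class in its PersistentVolumeClaim, without needing to know which storage system sits underneath.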

Another critical component to understand is StatefulSet. It is the Kubernetes API object used to manage stateful applications and offers key features such as:

  • Unique and stable network identifiers, so that each pod can be tracked along with its volume and can disconnect and reconnect without losing its identity
  • Stable and persistent storage to keep your data safe
  • Ordered, graceful deployment and scaling, which many Day 2 operations depend on.
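The stable identities listed above follow predictable naming rules, which the short sketch below reproduces: pods are named `<statefulset>-<ordinal>`, each pod gets a DNS name under the governing headless service, and each volume claim is named `<claim template>-<pod>`. The StatefulSet name `pg`, the service `pg-headless`, and the claim template `data` are made-up examples, and `cluster.local` is only the default cluster domain.

```python
def stateful_identities(statefulset, service, replicas, claim_template,
                        namespace="default", cluster_domain="cluster.local"):
    """List the pod name, DNS name and PVC name of every replica."""
    identities = []
    for ordinal in range(replicas):
        pod = f"{statefulset}-{ordinal}"
        identities.append({
            "pod": pod,
            "dns": f"{pod}.{service}.{namespace}.svc.{cluster_domain}",
            "pvc": f"{claim_template}-{pod}",
        })
    return identities


# Hypothetical three-replica PostgreSQL StatefulSet.
ids = stateful_identities("pg", "pg-headless", 3, "data")
for identity in ids:
    print(identity)
```

Because these names do not change when a pod is rescheduled, a restarted `pg-1` is reattached to the same claim, `data-pg-1`. By default, pods are created in ordinal order and removed in reverse order.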

While StatefulSet has been a successful replacement for the infamous (now deprecated) PetSet, it is still imperfect and has limitations. For example, the StatefulSet controller comes with no built-in support for resizing volumes (PVCs), which is a big challenge if the size of your application's data set is about to grow beyond your currently allocated storage capacity. Workarounds exist, but such limitations must be understood well in advance so that the engineering team knows how to handle them.
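One workaround that is often used, and which is not described in this article, is to resize each replica's PersistentVolumeClaim directly. This only works if the storage class allows volume expansion. The sketch below merely generates the merge patches such a procedure would apply; the names `pg` and `data` and the size `200Gi` are hypothetical. The claim template inside the StatefulSet still carries the old size afterwards, so teams usually also recreate the StatefulSet object with the new size.

```python
import json


def pvc_resize_patches(statefulset, claim_template, replicas, new_size):
    """Return {pvc_name: merge_patch} for the PVC of every replica."""
    return {
        f"{claim_template}-{statefulset}-{ordinal}": {
            "spec": {"resources": {"requests": {"storage": new_size}}}
        }
        for ordinal in range(replicas)
    }


# Hypothetical StatefulSet "pg" with a claim template named "data".
patches = pvc_resize_patches("pg", "data", replicas=3, new_size="200Gi")
for name, patch in patches.items():
    # Each pair could be applied with:
    #   kubectl patch pvc <name> --type merge -p '<patch>'
    print(name, json.dumps(patch))
```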

Have a plan

Once you are comfortable with Kubernetes' stateful concepts, you can progressively migrate data workloads in a specific order. This allows you to learn from mistakes and avoid getting overwhelmed, because not all data technologies are equally easy to run on Kubernetes.

Established technologies such as databases and storage should be migrated first, and emerging technologies such as AI and ML last. This is reflected in a recent report, which found that databases and persistent storage are the two most commonly run data workloads on Kubernetes. The main reason emerging workloads lag behind is the lack of tooling for Day 2 operations.

Availability of operators

Moving stateful containers to Kubernetes is only half the job, aka Day 1. Now it's time to handle Day 2 operations (one of the most discussed topics at the last KubeCon). This is where things get tricky. There are tons of Day 2 operations that Kubernetes can't handle natively, like patching and updating, backup and recovery, log processing, monitoring, scaling, and tuning.

All of these operations are application-specific. For example, a PostgreSQL cluster and a MySQL cluster require two completely different approaches to electing a new primary server in an HA setup. Kubernetes cannot know every application's specific Day 2 operations. This is where operators come in.

Operators are software extensions that perform operations Kubernetes cannot handle natively. They provide dynamic, intelligent management capabilities by extending the functionality of the Kubernetes API, and one of their most common uses is to perform these Day 2 operations. Operators are typically developed not by the Kubernetes maintainers but by third-party developers and organizations.
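At its core, an operator is a control loop that compares the desired state with the observed state and works out what to do about the difference. The toy simulation below illustrates that idea only: it does not talk to a Kubernetes API, and every class name, field, and action string is invented for the example.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class DesiredState:
    replicas: int
    version: str


@dataclass
class ObservedState:
    members: List[str]
    primary: Optional[str]
    version: str


def reconcile(desired, observed):
    """Return the actions needed to move `observed` toward `desired`."""
    actions = []
    # Application-specific knowledge lives here: which member to promote,
    # and how, differs between database engines.
    if observed.members and observed.primary not in observed.members:
        actions.append(f"promote {observed.members[0]} to primary")
    missing = desired.replicas - len(observed.members)
    if missing > 0:
        actions.append(f"add {missing} replica(s)")
    elif missing < 0:
        actions.append(f"remove {-missing} replica(s)")
    if observed.version != desired.version:
        actions.append(f"rolling upgrade {observed.version} -> {desired.version}")
    return actions


desired = DesiredState(replicas=3, version="16.2")
degraded = ObservedState(members=["db-1", "db-2"], primary="db-0", version="16.1")
healthy = ObservedState(members=["db-0", "db-1", "db-2"], primary="db-0", version="16.2")

print(reconcile(desired, degraded))
print(reconcile(desired, healthy))
```

A real operator runs this kind of comparison continuously against custom resources and executes the actions itself. The second call returns an empty list because nothing needs to change.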

Before moving a data workload to Kubernetes, make sure there is an operator for it. OperatorHub does a great job of indexing them. With 282 operators available on the site, their distribution echoes the point above: some workloads have supporting tools and some don't. For example, the database category has 38 operators (eight for PostgreSQL alone), while the entire AI/ML category has only seven.

The right level of operator capability

Having an operator for your technology is not enough, because operators differ in capabilities and often exist at various levels of maturity. The Operator Framework suggests a capability model that categorizes operators based on their features:

  • Level 1: Handles basic installation, such as automated application provisioning and configuration management.
  • Level 2: Supports seamless upgrades, including patches and minor version upgrades.
  • Level 3: Handles the entire application and storage lifecycle (backup, failover, etc.).
  • Level 4: Provides insights, metrics, alerts, log processing, and workload analysis.
  • Level 5: Provides automatic horizontal/vertical scaling, automatic configuration tuning, anomaly detection, and scheduling tuning.

When choosing an operator, make sure its capabilities match your needs. If you're not sure which level is right, the Data on Kubernetes 2022 report found that most organizations look for operators that are at least at level 3. That makes sense, since having backups for your stateful workloads is good practice.
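Matching operators against a minimum capability can be expressed in a few lines. The sketch below encodes the five levels as an ordered enumeration, using the names the Operator Framework gives them, and filters a list of candidates. The operator names and their assigned levels are purely hypothetical.

```python
from enum import IntEnum


class Capability(IntEnum):
    BASIC_INSTALL = 1
    SEAMLESS_UPGRADES = 2
    FULL_LIFECYCLE = 3
    DEEP_INSIGHTS = 4
    AUTO_PILOT = 5


def shortlist(operators, minimum):
    """Return the names of operators at or above the minimum level."""
    return sorted(name for name, level in operators.items() if level >= minimum)


# Hypothetical candidates for one database technology.
candidates = {
    "operator-a": Capability.SEAMLESS_UPGRADES,
    "operator-b": Capability.FULL_LIFECYCLE,
    "operator-c": Capability.AUTO_PILOT,
}
print(shortlist(candidates, Capability.FULL_LIFECYCLE))
```

With backup and failover as the minimum requirement, only `operator-b` and `operator-c` remain on the list.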

If you can't find an operator that fits your needs, it's not a problem because most of them are open source. You can extend the capabilities of existing operators with in-house development, or better yet, contribute to the open source project.

Understand the operator

The extensibility of operators is their strength, but also their weakness. The lack of standards means they are programmed in different ways, so you have to look at their configuration formats and choose the one that works best for you.

Also, operators can take different technical routes to the same goal. For example, one of the eight PostgreSQL operators, CloudNativePG, does not use StatefulSets and instead relies on its own custom controller. That's rather unexpected, considering that StatefulSets are the foundation for stateful containers in Kubernetes.

Its developers decided to go with this design due to StatefulSet's inability to resize PVCs (as we discussed above). As the operator documentation explains, choosing "different layout options leads down several paths." Therefore, when choosing an operator, make sure you understand its implementation and its advantages and disadvantages, and choose the one that is most comfortable for you.

Worth the effort

Running data on Kubernetes isn't always easy, but the good news is that it's worth the hard work: 54% of organizations surveyed attributed more than 10% of their revenue to running data on Kubernetes. Additionally, 33% said it had a transformational impact on productivity, and another 51% saw a significant positive impact.

As organizations increasingly adopt multicloud infrastructure to optimize their cost and infrastructure performance, Kubernetes has become the preferred tool. With approximately 66% of countries having some form of data privacy and consumer rights legislation, which often requires data governance enforcement, companies must increasingly host user data in countries in which they operate.
