Cloud Platform - Experience on building managed cloud data services

We recently worked on a self-service cloud platform that lets companies and end users install complex tools (e.g. MongoDB, InfluxDB) with just a few clicks. The project was done for the IT arm of one of the biggest retailers in the world. The retailer's IT department consists of many teams and products with shared infrastructure requirements, so they decided to develop an in-house cloud service under their own brand and offer it not only to internal customers but also externally to the industry.
The vision is to become a cloud service provider that offers the range of managed services businesses need, from virtual machines to managed Kubernetes and common databases. The service provider owns and manages the IT infrastructure (SaaS, IaaS) and also provides the portal through which users interact with it. The first phase of the project was designed to deliver MongoDB as a service.
What follows is an overview of the technologies, standards and architecture used during this cool, challenging and fun project.

Business Context

In large companies, new services are developed and deployed by different teams and generate a massive amount of data. Traditionally, installing and maintaining the databases that hold this data is a task for IT administrators. Usually, if a database is needed, an email is fired off to the admin, a ticket is created in a workflow tool, or a face-to-face conversation happens. No matter how the communication is set up, in the end it is about getting a MongoDB installed. From that moment on, the requester also expects help with maintenance.

This works while an organization is relatively small. However, problems appear if 300 instances of MongoDB are needed (how do you scale communication?). Supporting such demand would need a huge team. The solution is to automate as much as possible of what the admin would usually be doing (installing every MongoDB instance). On top of the automated installation, we provide automatic backups, logging and monitoring.

This kind of automation lets you work off a large chunk of the demand with only a few DBAs, who just keep an eye on the platform and provide specific support where customers need it.

In our project, the customer needed a variety of services including Redis, MySQL, MongoDB, Elasticsearch and Cassandra. We were asked to develop a custom, fully functional solution for this. As the service catalog contains many technologies, we started with an MVP: a managed MongoDB as a service, which was the service most in demand among the internal IT teams. The customer already used Cloud Foundry as its service manager, so we could add new services to Cloud Foundry by implementing a standard called the Open Service Broker API. Based on this and other requirements, our MongoDB service offers the following functionality:

  • Provisioning single-instance or replica-set MongoDB via the Cloud Foundry UI.
  • Choosing between different plans with different hardware sizes.
  • Providing security features like authorization and TLS.
  • Providing automated backup and restore with some level of customization per service.
  • Providing access from the internet via the customer service gateway.
  • Integration with the customer’s current cloud infrastructure (OpenStack).

Concepts

  • Open Service Broker API: A REST API specification that allows independent software vendors, SaaS providers, and developers to easily provide backing services to workloads running on cloud-native platforms such as Cloud Foundry and Kubernetes. The specification describes a simple set of API endpoints that can be used to provision, gain access to and manage service offerings, so users can use all the related Cloud Foundry functionality, including fetching the catalog, provisioning, de-provisioning, binding and unbinding. Because we implement this standard, Cloud Foundry can discover our services, show them on user dashboards and let users request services, choose appropriate plans, and manage their deployments.
  • Service broker: An implementation of the Open Service Broker API specification (written in Golang in our case), which exposes a REST API that supports provision, de-provision, last operation, binding, unbinding, etc. Because it implements this standard, it is relatively easy to integrate the Service Broker with the Cloud Foundry marketplace and other marketplaces such as the Kubernetes Service Catalog.
  • Service Gateway: A customer-specific component that provides discovery from external networks. For each provision request, our Service Broker will request external access to MongoDB hosts and a unique URL for each host will be returned. The broker will provide internet-accessible MongoDB connection URLs to customers this way.

Tech Stack

We use Cloud Foundry BOSH as a unified release, deployment, and lifecycle management tool. BOSH is a cloud-agnostic toolchain that works with different infrastructure providers, including VirtualBox, OpenStack, Azure and AWS, by providing an abstraction over all of them, so we can easily switch between these platforms.

To manage VMs and disks we use OpenStack, a de facto open-source standard for VM orchestration. BOSH has a CPI (Cloud Provider Interface) concept, which is an abstraction over the cloud infrastructure. We use the OpenStack CPI to connect BOSH to OpenStack, so BOSH translates every resource request it receives into OpenStack resource requests. BOSH itself is also installed as a VM inside OpenStack.

To manage multiple MongoDB clusters and enable backup and automation on them, we use MongoDB Ops Manager. Ops Manager is a tool that comes out of the box with the MongoDB enterprise edition to automate common scenarios like backup and restore, TLS, etc. on MongoDB clusters.

In the following section, due to the importance of BOSH and Concourse to our architecture, we will describe them in detail.

Cloud Foundry BOSH

BOSH is an orchestration tool for virtual machines. In BOSH, we can define a bundle of everything that is needed to build and run a piece of software on a machine (a BOSH release), which can be deployed on multiple instances for a single service. In our case, we have a MongoDB release which is the base image for deployed MongoDB instances. When a user requests a 3-node replica set from the Cloud Foundry marketplace, the request is routed to our registered Service Broker, which asks the BOSH director (the BOSH node that accepts requests) to create 3 virtual machines from our BOSH release.

Cloud Foundry BOSH Components

BOSH handles the job with the following components:

  • BOSH director: The main BOSH component that coordinates the agents and responds to user requests and system events. The director is the orchestrator of deployments.
  • BOSH agent: Runs on every VM deployed by BOSH. It is responsible for all the tasks that happen inside the VM.
  • CPI (Cloud Provider Interface): CPI is the abstraction layer which is in charge of interacting with the IaaS platform. Due to this abstraction, we can switch between different IaaS providers by just changing a few lines of BOSH CPI configuration. It currently supports VirtualBox, OpenStack, Azure, AWS, etc.
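
The value of the CPI abstraction is easier to see in code. The sketch below is a conceptual model in Go, not the real BOSH CPI contract: the `CPI` interface, its two methods and the `fakeCloud` type are all hypothetical simplifications (the real CPI also covers stemcells, disks and networks). It shows a director that depends only on the interface, so the IaaS behind it can be swapped without touching the deployment logic.

```go
package main

import "fmt"

// CPI is a simplified, hypothetical model of the Cloud Provider Interface.
type CPI interface {
	CreateVM(stemcell, vmType string) (vmID string, err error)
	DeleteVM(vmID string) error
}

// fakeCloud is an in-memory stand-in for an IaaS; it makes no API calls.
type fakeCloud struct {
	provider string
	vms      map[string]bool
	next     int
}

func newFakeCloud(provider string) *fakeCloud {
	return &fakeCloud{provider: provider, vms: map[string]bool{}}
}

func (c *fakeCloud) CreateVM(stemcell, vmType string) (string, error) {
	c.next++
	id := fmt.Sprintf("%s-vm-%d", c.provider, c.next)
	c.vms[id] = true
	return id, nil
}

func (c *fakeCloud) DeleteVM(vmID string) error {
	if !c.vms[vmID] {
		return fmt.Errorf("unknown VM %q", vmID)
	}
	delete(c.vms, vmID)
	return nil
}

// Director depends only on the CPI interface, never on a concrete cloud.
type Director struct{ cpi CPI }

// Deploy asks the CPI for the requested number of VMs and returns their IDs.
func (d Director) Deploy(instances int, stemcell, vmType string) ([]string, error) {
	ids := make([]string, 0, instances)
	for i := 0; i < instances; i++ {
		id, err := d.cpi.CreateVM(stemcell, vmType)
		if err != nil {
			return ids, err
		}
		ids = append(ids, id)
	}
	return ids, nil
}

func main() {
	// The same Deploy call works against either stand-in provider.
	for _, cloud := range []CPI{newFakeCloud("openstack"), newFakeCloud("aws")} {
		ids, _ := Director{cpi: cloud}.Deploy(3, "centos-stemcell", "XS")
		fmt.Println(ids)
	}
}
```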

When working with BOSH you’ll use the following constructs:

  • Stemcell: A generic VM image that BOSH clones and configures during deployment. A stemcell is a template from which BOSH creates whatever VMs are needed for a wide variety of components and products. We use the OpenStack CentOS raw stemcell in production.
  • BOSH release: A BOSH release is a bundle of everything that is needed to build and run a piece of software on a machine. So it includes all runtimes, shared libraries and scripts that are needed to get the application running on a stemcell.
  • Manifest: This is a YAML file that describes how stemcells and releases will be combined into a deployment. It describes the desired state.
  • BOSH deployments: An encapsulation of software and configuration that describes the desired state of a collection of VMs: what software should run on them, what resources they use, and how these are orchestrated. BOSH also manages persistent disks so that state (for example, database data files) can survive when VMs are re-created. A combination of a deployment manifest, stemcells, and releases is portable across different clouds with minimal changes to the deployment manifest.
  • Cloud-config: The cloud-config is a YAML file that defines IaaS specific configuration used by the director. It allows us to separate IaaS specific configuration into its own file and keep deployment manifests IaaS agnostic. BOSH director uses the configuration in Cloud Config to create deployments. For example, if a deployment manifest references a VM with the name XS, the BOSH director expects to find that definition in Cloud Config. Also, many vendor-specific configurations go into Cloud Config.

When we want to create a VM using BOSH, we have to set up two different resources: the release and the deployment. The release defines the services that run in the VM, how to install the packages and how to start the applications running on it. The release is composed of a set of YAML files and scripts. The deployment resource is a YAML file that provides the variables for the services we want to execute, the underlying OS (stemcell), the VM and disk types to use (referenced in Cloud Config), and the release to use.

Concourse

In the production environment, we deployed hundreds of Service Broker instances on BOSH. Making any manual change to all of them would be cumbersome. For example, if we want to fix a bug or upgrade to a new version of a piece of software, we would need to create a BOSH release and update each deployment manually, which is time-consuming and error-prone. So we need a tool to manage deployments and upgrades.

Concourse CI is a continuous integration tool that is widely used with BOSH and Cloud Foundry. We created a Concourse pipeline for the MongoDB Service Broker in order to make creating the BOSH release a repeatable process.

Concourse Pipeline for the MongoDB Service Broker

For the production Service Broker VMs, the Concourse pipeline is triggered when a new Git tag with a specific prefix is created. The pipeline is composed of several tasks. The first uploads the blobs to the S3 blobstore: the Golang package and the Service Broker / Dashboard application binaries. The second creates the new BOSH release and uploads it to the BOSH Director. These first two tasks also update some files in the Service Broker Git repository, which are used to make the BOSH deployment process reproducible. The last step upgrades the existing BOSH deployments to the new BOSH release.

Architecture

The following diagram depicts the overall architecture. The user interacts with the system through the Cloud Foundry Marketplace, and Cloud Foundry manages the services in OpenStack using an implementation of the Open Service Broker API specification: the MongoDB Service Broker. It is an application we implemented in Golang; it provisions MongoDB service instances and also provides the endpoints used by the dashboard.

The MongoDB Service Broker interacts with MongoDB Ops Manager through its public REST APIs in order to create and manage Ops Manager resources such as projects and deployments. In the OpenStack environment, Ops Manager is already installed, so the Service Broker doesn't have to manage it. Instead, the Service Broker needs to manage the MongoDB instances and the MongoDB VMs, on which the MongoDB Agents reside (the automation agents responsible for configuring, launching, and maintaining MongoDB processes). As we have seen previously, we can manage the VMs in OpenStack using the BOSH director.

Even if this is a simplified description of the system, we can see that the Service Broker has to interact with many different services in order to be able to expose the MongoDB service to the customer network or the public internet.

The Overall Architecture of our Proposed MongoDB as a Service

The picture below represents the sequence diagram of creating a user deployment (a.k.a. provision):

  1. The user requests a new MongoDB database service from the Cloud Foundry Marketplace.
  2. Cloud Foundry sends a provision request to the MongoDB Service Broker.
  3. The Service Broker handles this request asynchronously: it first creates a new provision task that will be handled by the Task Manager.
  4. The Service Broker returns the task id stored in the PostgreSQL database in response.
  5. Using a scheduler, the Task Manager picks up a scheduled task and starts the provisioning process.
  6. The provision process creates a project in the MongoDB Ops Manager and an API Key used by the MongoDB Agent to communicate with the Ops Manager.
  7. The Service Broker sends a request to the BOSH Director in order to create the BOSH deployment for the MongoDB Agent and will wait for the process to complete.
  8. The Service Broker sends a request to the Service Gateway to create an IP and port through which the MongoDB service is exposed outside the PaaS Service Network.
  9. The provision process creates a deployment in the MongoDB Ops Manager matching the selected plan (single instance or replica set).
  10. If the user enabled the backup feature (in step 1), the Service Broker also sends a request to the Ops Manager to enable automatic backups of the MongoDB database.
Service Provision Sequence Diagram

Conclusion

In this blog post, we outlined the work we did for a cloud provider customer to build a managed MongoDB as a service. We covered the business context and the customer's requirements, as well as the architecture of our solution, which is built on top of OpenStack, Cloud Foundry BOSH, Concourse and MongoDB Ops Manager. For a deeper understanding of the technologies mentioned above, check out these resources: