Building Microservices with Polyglot Persistence Using Spring Cloud and Docker

Tuesday, August 25, 2015

This series continues from the last blog post about building microservices using Spring Cloud. This post has two parts. The first part describes how to create cloud-native data services using Spring Boot. The second part is a companion example project that uses Docker Compose to run multiple microservices locally to simulate a polyglot persistence setup.

What is polyglot persistence?

Polyglot persistence is a term that describes an architecture that uses a collection of different database technologies as part of a platform’s core design. More plainly, each backing database is owned exclusively by a single Spring Boot service, which exposes that domain data to other services as HTTP resources.

The central idea behind polyglot persistence is that a service architecture should be able to use the best data store, and its query language, for the job at hand. There is no clear definition of how to do this well, and adoption tends to evolve organically as a central shared database becomes cumbersome to extend with new features.

Spring Boot Roles

When designing microservices that manage exclusive access to multiple data providers, it can be useful to think about the roles that your microservices will play.

We can think of a Spring Boot application as the basic building block for our microservice architecture.

Figure 1. Each Spring Boot application plays a role when integrating with other services

The diagram above describes six Spring Boot applications that are color-coded by the role they play when integrated using Spring Cloud.

Data Services

Each Spring Boot application in a microservices architecture plays a role of varying importance. The data service role is one of the most important in any setup: it exposes the application’s domain data to other microservices in the platform.
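
To make this role concrete, here is a minimal sketch of a data service built with Spring Boot, Spring Data JPA, and Spring Data REST. The Movie domain and class names are hypothetical, but the shape is representative: one service owns its repository exclusively and exposes the domain data as HTTP resources.

```java
import javax.persistence.Entity;
import javax.persistence.GeneratedValue;
import javax.persistence.Id;

import org.springframework.boot.SpringApplication;
import org.springframework.boot.autoconfigure.SpringBootApplication;
import org.springframework.data.jpa.repository.JpaRepository;
import org.springframework.data.rest.core.annotation.RepositoryRestResource;

@SpringBootApplication
public class MovieDataServiceApplication {
    public static void main(String[] args) {
        SpringApplication.run(MovieDataServiceApplication.class, args);
    }
}

@Entity
class Movie {
    @Id
    @GeneratedValue
    private Long id;
    private String title;
    // getters and setters omitted for brevity
}

// Spring Data REST exposes this repository over HTTP at /movies
@RepositoryRestResource(path = "movies")
interface MovieRepository extends JpaRepository<Movie, Long> {
}
```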

Polyglot Data Services

The diagram below describes an example microservice architecture with multiple Spring Boot applications that expose data from multiple database providers.

Figure 2. Example Polyglot Persistence Architecture

Building Microservices with Spring Cloud and Docker

Sunday, July 12, 2015

This blog series will introduce you to some of the foundational concepts of building a microservice-based platform using Spring Cloud and Docker.

What is Spring Cloud?

Spring Cloud is a collection of tools from Pivotal that provides solutions to some of the commonly encountered patterns when building distributed systems. If you’re familiar with building applications with Spring Framework, Spring Cloud builds upon some of its common building blocks.

Among the solutions provided by Spring Cloud, you will find tools for the following problems:

  • Configuration management
  • Service discovery
  • Circuit breakers
  • Intelligent routing
  • Micro-proxy
  • Control bus
  • One-time tokens
  • Global locks
  • Leadership election
  • Distributed sessions
  • Cluster state

Spring Boot

The great part about working with Spring Cloud is that it builds on the concepts of Spring Boot.

For those of you who are new to Spring Boot, the name of the project means exactly what it says. You get all of the best parts of the Spring Framework and its ecosystem of projects, tuned to perfection, with minimal configuration, all ready for production.
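
As a minimal, hypothetical example, the single class below is a complete Spring Boot web application that serves one endpoint from an embedded server:

```java
import org.springframework.boot.SpringApplication;
import org.springframework.boot.autoconfigure.SpringBootApplication;
import org.springframework.web.bind.annotation.RequestMapping;
import org.springframework.web.bind.annotation.RestController;

// A complete Spring Boot application: auto-configuration plus an embedded
// servlet container, started from a plain main method.
@SpringBootApplication
@RestController
public class HelloApplication {

    @RequestMapping("/hello")
    public String hello() {
        return "Hello from Spring Boot";
    }

    public static void main(String[] args) {
        SpringApplication.run(HelloApplication.class, args);
    }
}
```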

Service Discovery and Intelligent Routing

Each service has a dedicated purpose in a microservices architecture. When building a microservices architecture on Spring Cloud, there are a few primary concerns to deal with first. The first two microservices you will want to create are the Configuration Service and the Discovery Service.
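
On the discovery side, Spring Cloud Netflix makes standing up a Eureka server a one-annotation affair. The sketch below is a hypothetical but typical discovery service; the other services would then register with it using @EnableDiscoveryClient.

```java
import org.springframework.boot.SpringApplication;
import org.springframework.boot.autoconfigure.SpringBootApplication;
import org.springframework.cloud.netflix.eureka.server.EnableEurekaServer;

// A minimal discovery service: @EnableEurekaServer starts an embedded Eureka
// registry that other microservices can register with and query.
@SpringBootApplication
@EnableEurekaServer
public class DiscoveryServiceApplication {
    public static void main(String[] args) {
        SpringApplication.run(DiscoveryServiceApplication.class, args);
    }
}
```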

[Figure: Microservice configuration]

The graphic above illustrates a 4-microservice setup, with the connections between them indicating a dependency.

The configuration service sits at the top, in yellow, and is depended on by the other microservices. The discovery service sits at the bottom, in blue, and is also depended on by the other microservices.

In green, we have two microservices that deal with a part of the domain of the example application I will use throughout this blog series: movies and recommendations.

Configuration Service

The configuration service is a vital component of any microservices architecture. Based on the twelve-factor app methodology, configurations for your microservice applications should be stored in the environment and not in the project.

The configuration service is essential because it centralizes the configurations for all of the services, which retrieve them with a simple point-to-point service call. This approach has several advantages.

Let’s assume that we have multiple deployment environments. If we have a staging environment and a production environment, the configurations for those environments will differ. A configuration service might have a dedicated Git repository for the configurations of that environment. None of the other environments will be able to access this configuration; it is available only to the configuration service running in that environment.
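
As a sketch of what this looks like in Spring Cloud, the class below turns a Spring Boot application into a Config Server. The spring.cloud.config.server.git.uri property, set per environment, would point it at that environment’s Git repository.

```java
import org.springframework.boot.SpringApplication;
import org.springframework.boot.autoconfigure.SpringBootApplication;
import org.springframework.cloud.config.server.EnableConfigServer;

// A minimal configuration service: @EnableConfigServer serves configuration
// from a backing Git repository over HTTP to the other microservices.
@SpringBootApplication
@EnableConfigServer
public class ConfigServiceApplication {
    public static void main(String[] args) {
        SpringApplication.run(ConfigServiceApplication.class, args);
    }
}
```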

[Figure: Microservice configuration (2)]

Using Graph Analysis to Decompose Monoliths into Microservices with Neo4j

Thursday, May 14, 2015

This blog post takes some of the lessons I’ve learned developing microservices and applies a graph processing technique to simulate the decomposition of a service architecture into microservices.

What is a microservice?

Microservices are an extension of SOA principles, better suited to agile software development. A microservice architecture usually starts from decomposing monolithic applications into services that are cheaper to evolve and easier to throw away. The guiding theme behind this movement is to decentralize change management and reduce the conflicts that tend to cause roadblocks in an SOA-based platform.

Using Data to Design Better Technology Platforms

Microservices aren't new. The pattern has been adopted at many software companies.

When companies with an SOA add new features to their platform, there tends to be a fair amount of conflict between service teams. Certain services in the SOA become increasingly relied upon by other services or applications in the platform.

What I’ve seen is that services tend toward growth rather than decomposition into smaller units. It’s far easier to add features to existing services than to create new services that require operational support. Every new service requires a focus on deployment and configuration, and that complexity can be tough to support with rigid processes and a lack of focus on automation.

Jumping head first into microservices is a major commitment. A monolith’s highly centralized components will gain even more mass as new microservices are born around them, since each service call that replaces a module or adds functionality introduces additional complexity. It’s important to analyze these connections to understand which services in an SOA are becoming more depended upon.

Measuring Service Centrality

My time spent analyzing data with graphs has given me a great tool for using data to drive decisions about decomposing an SOA. The first metric I will use is network centrality, which measures how central a service is within a network of dependencies.

The idea here is to determine which components of a service are good candidates to become microservices. A good candidate is the component that, once removed, contributes the most to decreasing the service’s overall centrality.

The graph metric for centrality is a great starting point for analyzing how services are gaining mass and how best to decompose them.
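
To illustrate the metric itself, here is a toy sketch of one simple centrality measure, in-degree centrality, over a made-up service dependency graph. The analysis in this post uses Neo4j; the plain Java below is only meant to show the arithmetic, and the service names are hypothetical.

```java
import java.util.*;

// A toy in-degree centrality calculation over a service dependency graph.
public class CentralityExample {

    // For each service, the set of services that depend on it.
    static Map<String, Set<String>> dependents = new HashMap<>();

    static void addDependency(String from, String to) {
        dependents.computeIfAbsent(to, k -> new HashSet<>()).add(from);
        dependents.computeIfAbsent(from, k -> new HashSet<>());
    }

    public static void main(String[] args) {
        addDependency("StoreFrontUI", "CatalogService");
        addDependency("StoreFrontUI", "OrderService");
        addDependency("OrderService", "CatalogService");
        addDependency("OrderService", "AccountService");

        // In-degree centrality: the fraction of other services that depend
        // on a given service. CatalogService scores highest here.
        int n = dependents.size();
        dependents.forEach((service, deps) ->
                System.out.printf("%s: %.2f%n", service, deps.size() / (double) (n - 1)));
    }
}
```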

Decomposition Strategy

The decomposition strategy that I would like to demonstrate is based on RESTful web services that manage a set of resources.

Each service will expose a set of REST API methods to interact with the resources of the domain. The graph data model used to calculate centrality will be built from the relationships created by service-to-service interactions.

Graphs are a great way to model the resources of a domain and their interactions. Below I've sketched out a domain model for an eCommerce website based on an example by Chris Richardson.

[Figure: Store front domain resources]

This domain model has a set of resources, each represented by its label. Those resources are:

  • Customer
  • Order
  • Account
  • Address
  • Product
  • Warehouse
  • Credit Card

In a monolithic architecture, all of our services are contained in a single project, for example a WAR file, with modules representing each service.

From Chris's example, we have the following deployment model:

[Figure: Deployment model]

From this example deployment model I've mapped the calls from each module to resources in the domain. That ends up looking like this:

[Figure: Service to resource mappings]

As systems scale and their dependencies grow, they become harder for us to understand. These mappings, however, can be tremendously valuable for understanding which service is best suited to become the first microservice.

Mapping Stories to Release Artifacts

Conway’s law states that organizations are constrained to produce systems that mirror their communication structures. In order to make the jump to microservices, we need to scale teams horizontally, not vertically. To do this well, we need to figure out how to split applications into independently releasable containers. One principal metric to be aware of is the number of business stories that are affected per release. Each of these stories carries a certain level of functionality that drives revenue for the business, which can help determine which features are more valuable in terms of revenue than others.

Let’s take, for example, the following story.

As a user, I want to be able to browse the product catalog so that I can find products I want to buy.

If the product catalog becomes unavailable to users of the website, there will be an impact to revenue. This shows that not all user stories have the same business value.

Ideally we want to find ways to empower single teams to be accountable for single stories. This way, if there is an outage that affects a story, teams will have more autonomy to bring that functionality back online.

Dependency Graph

Below you will find an example graph data model of the service dependencies shared between containers, services, resources, and user stories that describe product features.

[Figure: Service dependency model]

In order to generate a rich dataset to analyze, I chose to use the concept of a user story as an added dimension to the dependency graph. User stories naturally group together a set of features, and those features act as a good boundary criterion for deciding how to make components more modular from a business-value perspective.

The relationships between concepts in this dependency graph are driven by the following rules, with a small sketch after the list:

  • User stories depend on domain resources
  • Domain resources are owned by services
  • Services are managed by teams
  • A service belongs to a deployment container
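
As a toy illustration of the first two rules, the sketch below (with hypothetical story, resource, and service names) answers the question raised earlier: which user stories are affected when a given service becomes unavailable?

```java
import java.util.*;

// A toy model of two dependency rules: user stories depend on domain
// resources, and domain resources are owned by services.
public class DependencyGraphExample {
    public static void main(String[] args) {
        // Rule: user stories depend on domain resources
        Map<String, List<String>> storyToResources = Map.of(
                "Browse the product catalog", List.of("Product"),
                "Place an order", List.of("Order", "Product", "Credit Card"));

        // Rule: domain resources are owned by services
        Map<String, String> resourceToService = Map.of(
                "Product", "CatalogService",
                "Order", "OrderService",
                "Credit Card", "BillingService");

        // A story is affected if any resource it depends on is owned by
        // the unavailable service.
        String downService = "CatalogService";
        storyToResources.forEach((story, resources) -> {
            boolean affected = resources.stream()
                    .anyMatch(r -> downService.equals(resourceToService.get(r)));
            if (affected) {
                System.out.println("Affected story: " + story);
            }
        });
    }
}
```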

Interactive Neo4j GraphGist Example

I’ve put together a step-by-step walkthrough of how you can use Neo4j to do graph analysis and functionally decompose a monolithic application into microservices.

In the coming months I will be focusing a lot on this topic, with demos that revolve around how to build great microservice architectures using Spring Boot.

Getting Started with Apache Spark and Neo4j Using Docker Compose

Tuesday, March 10, 2015

I’ve received a lot of interest in Neo4j Mazerunner since first announcing it a few months ago. People from around the world, from authors writing new books about big data to PhD researchers who need it to solve the world’s most challenging problems, have reached out to me, excited about the possibilities of using Apache Spark and Neo4j together.

I’m glad to see such a wide range of needs for a simple integration like this. Spark and Neo4j are two great open source projects, each focused on doing one thing very well. Integrating the two makes for an awesome result.

Less is always more, simpler is always better.

Apache Spark and Neo4j are both tremendously useful tools. I’ve seen how each gives its users a way to transform problems that start out large and complex into problems that become simpler and easier to solve. That’s what the companies behind these platforms are getting at. They are two sides of the same coin.

One tool solves for scaling the size, complexity, and retrieval of data, while the other solves for the complexity of processing enormous amounts of data through distributed computation at scale. Both products achieve this without sacrificing ease of use.

Inspired by this, I've been working to make the integration in Neo4j Mazerunner easier to install and deploy. I believe I've taken a step forward in this and I'm excited to announce it in this blog post.

Categorical PageRank Using Neo4j and Apache Spark

Monday, January 19, 2015

PageRank is an important concept in computer science and modern technology. It is important because it is the underlying algorithm that largely dictates what the more than 3 billion people who use the internet experience as they browse the world wide web.

How does PageRank work?

The first PageRank algorithm was developed by Larry Page and Sergey Brin at Stanford in 1996. The idea was that pages on the world wide web could be ordered and ranked by analyzing the number of links that point to each page. This idea was the foundation of the eventual rise of Google as the world’s most popular search engine, which now serves over 3.5 billion searches every day.

PageRank gives us a measure of popularity in an ever connected world of information. As the complexity of this virtual space of information sharing increases every day, PageRank gives us a way to understand what is important to us as users.
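
At its core, the algorithm is an iterative redistribution of rank along links. Below is a toy sketch of that idea, not Google's implementation, over a hypothetical three-page graph, using the conventional damping factor of 0.85.

```java
import java.util.*;

// A toy PageRank calculation: every page starts with equal rank, and on each
// iteration a page distributes its rank evenly across its outgoing links.
public class PageRankExample {
    public static void main(String[] args) {
        // links.get(p) = the pages that p links to (hypothetical pages a, b, c)
        Map<String, List<String>> links = Map.of(
                "a", List.of("b", "c"),
                "b", List.of("c"),
                "c", List.of("a"));

        int n = links.size();
        double damping = 0.85;
        Map<String, Double> rank = new HashMap<>();
        links.keySet().forEach(p -> rank.put(p, 1.0 / n));

        for (int i = 0; i < 20; i++) {
            Map<String, Double> next = new HashMap<>();
            links.keySet().forEach(p -> next.put(p, (1 - damping) / n));
            links.forEach((p, outs) -> outs.forEach(q ->
                    next.merge(q, damping * rank.get(p) / outs.size(), Double::sum)));
            rank.clear();
            rank.putAll(next);
        }

        // Pages with more (and better-ranked) incoming links score higher.
        rank.forEach((p, r) -> System.out.printf("%s: %.4f%n", p, r));
    }
}
```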

The unfortunate bit is that PageRank itself is mostly unapproachable to all but seasoned engineers and esteemed academics. That’s why I want to make it easier for every developer around the world to make this algorithm a foundation of their own innovations.

Distributing PageRank Jobs

It should be no surprise to regular readers of this blog that I am all about the graph. Graphs are the best abstraction of data that we have today. The concept is brilliantly easy and intuitive. Nodes represent data points and are described by metadata. Relationships connect nodes together, are likewise described by metadata, and enrich the information of each node relative to the others.

Neo4j Mazerunner Project

As I have been building the open source project Neo4j Mazerunner to use Apache Spark GraphX and Neo4j for large-scale graph analysis, I’ve come to understand the need to break PageRank down into categories, something I call 'Categorical PageRank'.
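
As a conceptual sketch of that idea, the snippet below (with hypothetical categories and pages) partitions a link graph by category and runs the same toy PageRank loop from the sketch above on each category's subgraph independently.

```java
import java.util.*;
import java.util.stream.Collectors;

// A conceptual sketch of Categorical PageRank: group links by category, then
// rank each category's subgraph on its own. All names are hypothetical.
public class CategoricalPageRankExample {
    record Link(String from, String to, String category) {}

    public static void main(String[] args) {
        List<Link> links = List.of(
                new Link("a", "b", "Movies"), new Link("b", "a", "Movies"),
                new Link("a", "c", "Books"), new Link("c", "a", "Books"));

        // Partition the graph by category, then rank each subgraph separately
        links.stream()
                .collect(Collectors.groupingBy(Link::category))
                .forEach((category, subgraph) ->
                        System.out.println(category + ": " + pageRank(subgraph)));
    }

    // The same toy PageRank loop as in the sketch above, applied to the
    // links of a single category
    static Map<String, Double> pageRank(List<Link> links) {
        Map<String, List<String>> out = new HashMap<>();
        links.forEach(l -> {
            out.computeIfAbsent(l.from(), k -> new ArrayList<>()).add(l.to());
            out.computeIfAbsent(l.to(), k -> new ArrayList<>());
        });
        int n = out.size();
        double damping = 0.85;
        Map<String, Double> rank = new HashMap<>();
        out.keySet().forEach(p -> rank.put(p, 1.0 / n));
        for (int i = 0; i < 20; i++) {
            Map<String, Double> next = new HashMap<>();
            out.keySet().forEach(p -> next.put(p, (1 - damping) / n));
            out.forEach((p, outs) -> outs.forEach(q ->
                    next.merge(q, damping * rank.get(p) / outs.size(), Double::sum)));
            rank.clear();
            rank.putAll(next);
        }
        return rank;
    }
}
```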