
Using Graph Analysis to Decompose Monoliths into Microservices with Neo4j

Thursday, May 14, 2015

This blog post takes some of the lessons I've learned developing microservices and applies a graph processing technique to simulate the decomposition of service architectures into microservices.

What is a microservice?

Microservices are an extension of SOA principles that are better suited for agile software development. A microservice architecture usually starts from decomposing monolithic applications into services that are cheaper to evolve and easier to throw away. The guiding theme behind this movement is to decentralize change management and reduce conflicts that tend to cause roadblocks in an SOA-based platform.

Using Data to Design Better Technology Platforms

Microservices aren't new. The pattern has been adopted at many software companies.

When companies building on an SOA add new features to their platform, there tend to be a fair number of conflicts between service teams. Certain services in the SOA become more heavily relied upon by other services or applications in the platform.

What I've seen is that services tend toward growth rather than decomposing into smaller units. It's far easier to add features to existing services than to create new ones that require operational support. Every new service demands attention to deployment and configuration, and that complexity is tough to support with rigid processes and little focus on automation.

Jumping head first into microservices is a major commitment. A monolith's highly centralized components gain more mass as new microservices are born around them, since every service call that replaces a module or adds functionality introduces more complexity. It's important to analyze these connections to understand which services in an SOA are becoming more depended upon.

Measuring Service Centrality

My time spent using graphs to analyze data has given me a great tool for understanding how to use data to drive decisions about decomposing an SOA. The first metric I will use is network centrality, which measures how central a service is within a network of dependencies.

The whole idea here is to determine which components of a service are good candidates for a microservice. A good candidate is the component whose extraction would most decrease the service's overall centrality.

The graph metric for centrality is a great starting point to analyze how services are gaining mass and how best to decompose services.
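
To make this concrete, here is a minimal sketch of a centrality query, assuming a dependency graph where Service nodes are linked by DEPENDS_ON relationships (the labels and relationship type are my own illustration, not a fixed schema):

    // Rank services by in-degree centrality: the number of other
    // services or applications that depend on each one.
    MATCH (s:Service)<-[:DEPENDS_ON]-(dependent)
    RETURN s.name AS service, count(dependent) AS inboundDependencies
    ORDER BY inboundDependencies DESC;

A service that keeps climbing this ranking is gaining mass, which is exactly the signal we want to catch early.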

Decomposition Strategy

The decomposition strategy that I would like to demonstrate is based on RESTful web services that manage a set of resources.

Each service will expose a set of REST API methods to interact with the resources of the domain. The graph data model used to calculate centrality is built from relationships representing service-to-service interactions.

Graphs are a great way to model the resources of a domain and their interactions. Below I've sketched out a domain model for an eCommerce website based on an example by Chris Richardson.

Store front domain resources

This domain model has a set of resources, each represented by its label (sketched as Cypher nodes after the list). Those resources are:

  • Customer
  • Order
  • Account
  • Address
  • Product
  • Warehouse
  • Credit Card
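
As a minimal illustration, those resources could be created as labeled nodes in Cypher (the Resource label and name property are assumptions for this sketch):

    // Create the store front domain resources as nodes.
    CREATE (:Resource {name: 'Customer'}),
           (:Resource {name: 'Order'}),
           (:Resource {name: 'Account'}),
           (:Resource {name: 'Address'}),
           (:Resource {name: 'Product'}),
           (:Resource {name: 'Warehouse'}),
           (:Resource {name: 'Credit Card'});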

In a monolithic architecture all of our services will be contained in a single project, for example a WAR, with modules representing each service.

From Chris's example, we have the following deployment model:

Deployment model

From this example deployment model I've mapped the calls from each module to resources in the domain. That ends up looking like this:

Service to resource mappings

As systems scale and their dependencies grow, they become harder to understand. These mappings, however, are tremendously valuable for deciding which service is best suited to become a microservice first.
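
Once these call mappings live in the graph, ranking the most depended-upon resources is a short query. Here is a sketch, assuming Service nodes connected to Resource nodes by a CALLS relationship (my own naming, not taken from Chris's example):

    // Record one service-to-resource call mapping.
    MERGE (catalog:Service {name: 'CatalogService'})
    MERGE (product:Resource {name: 'Product'})
    MERGE (catalog)-[:CALLS]->(product);

    // Rank resources by how many distinct services call them.
    MATCH (s:Service)-[:CALLS]->(r:Resource)
    RETURN r.name AS resource, count(DISTINCT s) AS callers
    ORDER BY callers DESC;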

Mapping Stories to Release Artifacts

Conway's law states that organizations are constrained to produce systems that mirror their communication structures. To make the jump to microservices, we need to scale teams horizontally, not vertically. To do this well, we need to figure out how to split applications into independently releasable containers. One principal metric to be aware of is the number of business stories affected per release. Each of these stories carries a certain level of functionality that drives revenue for the business, which can help determine which features are more valuable in terms of revenue than others.

Let's take for example the following story.

As a user, I want to be able to browse the product catalog so that I can find products I want to buy.

If the product catalog becomes unavailable to users of the website, there will be an impact to revenue. This shows that not all user stories have the same business value.

Ideally we want to find ways to empower single teams to be accountable for single stories. This way, if there is an outage that affects a story, teams will have more autonomy to bring that functionality back online.

Dependency Graph

Below you will find an example graph data model of the service dependencies shared between containers, services, resources, and user stories that describe product features.

Service Dependency Model

In order to generate a rich dataset to analyze, I chose to use the concept of a user story as an added dimension of the dependency graph. User stories do well to group together a set of features, and those features act as a good boundary criterion for determining how to make components more modular from a business value perspective.

The relationships between concepts in this dependency graph are driven by the following rules (sketched in Cypher after the list):

  • User stories depend on domain resources
  • Domain resources are owned by services
  • Services are managed by teams
  • A service belongs to a deployment container
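
Encoded in Cypher, one path through the graph that satisfies all four rules might look like this (every label, relationship type, and name is an assumption for illustration):

    // One illustrative path that satisfies all four rules.
    MERGE (team:Team {name: 'Catalog Team'})
    MERGE (container:Container {name: 'storefront-war'})
    MERGE (service:Service {name: 'CatalogService'})
    MERGE (resource:Resource {name: 'Product'})
    MERGE (story:UserStory {name: 'Browse the product catalog'})
    MERGE (story)-[:DEPENDS_ON]->(resource)
    MERGE (resource)-[:OWNED_BY]->(service)
    MERGE (service)-[:MANAGED_BY]->(team)
    MERGE (service)-[:BELONGS_TO]->(container);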

Interactive Neo4j GraphGist Example

I've put together a step-by-step walkthrough of how you can use Neo4j to run graph analysis and functionally decompose a monolithic application into microservices.
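
As a taste of the kind of analysis the walkthrough covers, a query along these lines (written against the illustrative model above) counts how many user stories depend, transitively, on each deployment container:

    // Count the distinct user stories that transitively depend on
    // each container, a rough measure of concentrated business value.
    MATCH (story:UserStory)-[:DEPENDS_ON]->(:Resource)
          -[:OWNED_BY]->(:Service)-[:BELONGS_TO]->(c:Container)
    RETURN c.name AS container, count(DISTINCT story) AS storiesAtRisk
    ORDER BY storiesAtRisk DESC;

The container with the most stories at risk is the one where an outage hurts the business most, and usually the first place to look for extraction candidates.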

In the coming months I will be focusing a lot on this topic, with demos that revolve around how to build great microservice architectures using Spring Boot.

Understanding How Neo4j Cypher Queries are Evaluated

Wednesday, July 9, 2014

There are many ways to store and manage data within a Neo4j graph database. When Neo4j 2.0 launched late last year, it brought an entirely new browser experience for interacting with graphs. The graph visualization of results returned from Cypher queries was at the core of the user experience enhancements to the platform.

Building a Neo4j Reporting Service Part II

Wednesday, April 30, 2014


It is a capital mistake to theorize before one has data. Insensibly one begins to twist facts to suit theories, instead of theories to suit facts.

Sir Arthur Conan Doyle, Author of Sherlock Holmes stories
A Subgraph From Neo4j's Browser

Just as Sir Arthur Conan Doyle's character Sherlock Holmes manically collects facts and evidence to prove theories, we find ourselves doing much the same today, except at a much larger scale: web scale. The web is an ever-growing expanse of facts and evidence, ours to observe without much of a challenge. But storing it and retrieving it in a way that answers the big questions, that's challenging.

Continuing on from Building a Graph-based Reporting Platform: Part I, I posed some questions related to understanding how to build great community experiences around Neo4j using Meetup.com for local events. I presented an idea to use Neo4j to build a platform that could help us understand the demand for presenting compelling content at events.

Compelling content is at the core of great community experiences. That content fuels the conversations between people, ideas begin to flow, and innovation is born.

My idea was to build an open-source platform that would poll public APIs, translate collected data into a graph, and store it in a graph database to be analyzed, queried, and visualized over time. The first component of this architecture is the Data Import Scheduler, which this post describes in detail.

Polling Data From Public APIs

Let's start out by answering a basic question.

What does the data import scheduler do?
The analytics data import scheduler is a Node.js process that can be hosted for free on Heroku and is responsible for collecting time-based statistics from a public API. In this case, the Meetup.com REST API exposes a set of methods that provide a momentary snapshot of the number of members a group has at the time of the request. The data import scheduler polls this endpoint once a day to retrieve Meetup group statistics, which are later used for time-based analysis in our graph database, Neo4j.

As illustrated in the diagram below, the Node.js application wakes up once a day and checks in with the Meetup.com REST API.


The scheduler process polls Meetup.com's REST API daily. An HTTP GET request is dispatched for each city we're tracking, returning a JSON formatted response for groups in those cities. The JSON data for each group is then translated into a subgraph, formatted as Neo4j's Cypher query language. The Cypher query is then sent as a transaction to Neo4j and updates a snapshot of the group's stats for that day.
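
The generated statement for a single group might look something like this sketch (the graph model and parameter names are my assumptions; the real queries live in the project's source):

    // Upsert the group and today's day node, then attach a snapshot
    // of the member count. Parameters are bound from the JSON response.
    MERGE (g:Group {name: $groupName})
    MERGE (d:Day {date: $today})
    MERGE (g)-[s:STATS_ON]->(d)
    SET s.members = $memberCount;

Because every clause is an upsert, re-running an import for the same day is harmless.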

Importing a Meetup Group's Subgraph

The image below is a visualization of a Meetup group's subgraph, translated from JSON data polled on an arbitrary date.

Graph Database - San Francisco on 4/28/2014

We see that the group has a set of topic nodes, which may already exist within the database. The subgraph must be merged into the larger graph without duplicating any nodes. Using Cypher's MERGE clause we can get or create nodes, which is useful for expanding our graph's connected data. Each topic will collect more groups as new subgraphs are merged for daily imports. The same is also true for both day and location nodes.
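
In Cypher, that get-or-create behavior looks like this sketch (labels and relationship types are assumed for illustration):

    // MERGE matches an existing node or creates it, so repeated
    // daily imports never duplicate topic or location nodes.
    MERGE (g:Group {name: 'Graph Database - San Francisco'})
    MERGE (t:Topic {name: 'NoSQL'})
    MERGE (l:Location {city: 'San Francisco'})
    MERGE (g)-[:HAS_TOPIC]->(t)
    MERGE (g)-[:LOCATED_IN]->(l);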

After a few days of scheduled imports, a group's subgraph begins to take shape. Each day node is connected to the previous day's node, chaining the membership statistics together over time.

Neo4j Data Import Model
A Meetup Group Statistics Subgraph, 4/23 to 4/28

The data import scheduler application is open-source and available on GitHub. Also, full documentation is available to help you get started with customizing your own graph-based reporting platform.

All analysis on the temporal stats collected from the data import scheduler is performed within the REST API module of the reporting platform. It also safely exposes the graph database to a front-end web dashboard, consumed from client-side JavaScript. The REST API uses Swagger, which is a specification and complete framework for describing, producing, consuming, and visualizing RESTful web services.

Building a Neo4j Reporting Service Part I

Thursday, April 24, 2014

Data science is pretty hot right now. The obvious reason is that data is rapidly expanding in complexity and size. There is an opportunity to be had in building systems that can capture this data, classify it along multiple dimensions, and scale to the demands of analysts looking to convert data into valuable reports.


As a developer evangelist for Neo4j, I am frequently out in the community talking about things I build using our database. We use Meetup.com to schedule and promote our community events all over the world.

If you're unfamiliar with Meetup.com, here is a description from their Wikipedia entry:

"Meetup is an online social networking portal that facilitates offline group meetings in various localities around the world. Meetup allows members to find and join groups unified by a common interest, such as politics, books, games, movies, health, pets, careers or hobbies. Users enter their postal code or their city and the topic they want to meet about, and the website helps them arrange a place and time to meet. Topic listings are also available for users who only enter a location."

At Neo4j, we're obsessed with data, especially connected data. We believe in our product because we use it to solve our own problems every day. With something like Meetup.com, we found ourselves guessing about many of the aspects of our community and how we could do a better job creating a great community experience.

Some of those questions were:
  • How many people will show up to an event from the attendee list?
  • What kind of content are people interested in hearing about?
  • What's the best location to host our meetups to boost attendance?

I wanted to use Neo4j to do reporting. I decided to put together a platform to track some of this information and build some reports to visualize the data we collected. I started by breaking down the problem into a set of stories to be implemented as a report.

Problem

  • Track meetup group growth over time
  • Apply tags to meetup groups and report combined growth of all groups over time

Questions

  • Given a start date and an end date, what is the time series that plots the membership growth of a given meetup group? (See the Cypher sketch after this list.)
  • Given a start date, an end date, and a combination of tags, what is the time series that plots the combined membership growth of all meetup groups with those tags?
  • How do you generate the JSON data of a time series for a basic JS line chart plugin?
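
For the first question, a time series query might look like this sketch, assuming a model where a group's daily membership snapshots hang off day nodes with ISO-formatted dates (all names and parameters are illustrative):

    // Membership growth for one group between two dates.
    // Dates are assumed to be stored as ISO strings, e.g. '2014-04-23'.
    MATCH (g:Group {name: $groupName})-[s:STATS_ON]->(d:Day)
    WHERE d.date >= $startDate AND d.date <= $endDate
    RETURN d.date AS date, s.members AS members
    ORDER BY date;

The shape of that result maps almost directly onto the JSON that a basic JS line chart plugin expects, which points toward an answer to the third question as well.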

I decided to start with a GraphGist, which is an open source project we built to enable our community to put together a quick proof of concept using our database.

Neo4j for Graph Analytics: Meetup.com Example

I designed an example graph data model, which I then translated into Neo4j's Cypher query language to create an example dataset.
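
The dataset behind such a GraphGist is itself just Cypher. A tiny setup in that spirit might look like this (all names and numbers are invented for illustration):

    // Minimal example dataset: one tagged group with two days of stats.
    CREATE (g:Group {name: 'Graph Database - San Francisco'})
    CREATE (t:Tag {name: 'NoSQL'})
    CREATE (g)-[:TAGGED]->(t)
    CREATE (d1:Day {date: '2014-04-23'})
    CREATE (d2:Day {date: '2014-04-24'})
    CREATE (d1)-[:NEXT]->(d2)
    CREATE (g)-[:STATS_ON {members: 1240}]->(d1)
    CREATE (g)-[:STATS_ON {members: 1252}]->(d2);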



Now it was time to scale it up to a full platform. I decided to use Node.js.


There would be three Node.js-driven components: one console application for importing data on a schedule and two web applications, a dashboard for displaying reports and a REST API to communicate with the Neo4j graph database.


With an architecture in place, I went forward with building out each of the modules.

In my next blog post I will go through the details of building the import scheduler, which polls the Meetup.com API each day and imports the graph data model into Neo4j.

Feel free to take a look at the finished documentation which details the creation of each of the Node.js modules:

Graph-based Reporting Documentation

Also, I put a slide deck together.