An Interesting Case about Micro-Service Dependency

Published Feb 9, 2020

Micro-service dependency

Help! Database is Drowning!

Near the end of last year, there were a few incidents of significant database slowdown that affected our whole backend system, and many of our clients were impacted. This was hardly surprising once we found that several of our backend data services scanned through hundreds of millions of records in our Postgres databases every couple of hours in order to compute metrics like hourly revenue and occupancy. Some queries included hundred-line-long customized SQL filters and breakdown clauses. They had to go through so many records because we stored all our clients’ transaction data in the same database tables.

This triggered our heroic journey to separate each client’s data into its own database. Well, maybe not so heroic since so many people have done similar things before… Nonetheless, this post is the story of our first steps.

How They All Need Each Other

The first consideration was that many of our backend services, such as ETL, data API, and machine learning tasks, need to know which databases they can read from and write to after we separate data by client. Specifically, we had three key concepts that we needed to sort out right away:

  • Location: Such as a garage, an open lot, an on-street zone, etc.
  • ETL job: Each job pulls data from one data source (kiosks, sensors, mobile payments, etc.).
  • Account: Just another word for client, i.e., one account for one client.

Each location might need data from multiple ETL jobs, and each job could include data for multiple locations. The good news is that each location only needs data from sources belonging to one account, and each job only supplies data for a single account. OK, I just successfully confused you, and to make up for it, here are an additional thousand words:

Location-Job-Account Relationships

The structure is actually quite interesting, because the connections between locations and jobs do not cross account boundaries. This was an important motivation for us to separate data by account, instead of other parameters such as geo-location or timezone.
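
To make the "no crossing" property concrete, here is a minimal sketch that checks it on toy data. Everything in it is made up for illustration (the `Location` and `Job` classes, the account names, the IDs); it is not our actual schema.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Location:
    location_id: str
    account_id: str  # every location belongs to exactly one account


@dataclass(frozen=True)
class Job:
    job_id: str
    account_id: str  # every ETL job supplies data for exactly one account


def crossing_links(locations, jobs, links):
    """Return the (location_id, job_id) links whose two ends sit in different accounts."""
    loc_account = {loc.location_id: loc.account_id for loc in locations}
    job_account = {job.job_id: job.account_id for job in jobs}
    return [(l, j) for l, j in links if loc_account[l] != job_account[j]]


# Hypothetical sample data: two accounts, many-to-many links inside each one.
locations = [Location("garage-1", "acme"), Location("lot-2", "acme"), Location("zone-3", "globex")]
jobs = [Job("kiosk-etl", "acme"), Job("sensor-etl", "acme"), Job("mobile-etl", "globex")]
links = [
    ("garage-1", "kiosk-etl"),
    ("garage-1", "sensor-etl"),
    ("lot-2", "kiosk-etl"),
    ("zone-3", "mobile-etl"),
]

print(crossing_links(locations, jobs, links))  # [] means no link crosses an account boundary
```

If a check like this ever returned a non-empty list, splitting the data by account would leave some location without the job data it needs.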

Since we needed to put data for each account into its home database, the first step was to set up a new micro-service that locations and jobs could query to get database connections. Easy, right?

A Tale of Two Designs

Easy indeed, except Steve and I came up with two different designs:

  • The Maokai-all-in-one design: Create an account table, an account_location table, and an account_job table, all managed by the account micro-service:

    Maokai's Design

    In my design, anyone can ask the account service: here is my location ID, give me the database connection. The account service then searches the account_location table for the given location_id, and uses its corresponding account_id to get the connection (the conn column in the account table). Similarly, one can ask: here is my job ID, give me the database connection. Perfectly straightforward! And you can leave the location and job services untouched!

  • The Steve-not-so-quick-think-it-through design: Just an account table managed by the account service. Add an account_id column to the location table that’s managed by the location micro-service, and an account_id column to the job table that’s managed by the job micro-service:

    Steve's Design

    In Steve’s design, when asking for a database connection, the caller must provide an account ID. The account service has no idea about location or job.
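
The two designs can be sketched side by side with in-memory dictionaries standing in for the tables. This is only an illustration: the class names, method names, and connection strings are hypothetical, and the real services talk to Postgres rather than to dicts.

```python
# Hypothetical connection strings; stands in for the conn column of the account table.
ACCOUNT = {
    "acme": "postgresql://acme-db.example/parking",
    "globex": "postgresql://globex-db.example/parking",
}


class MaokaiAccountService:
    """All-in-one: owns the account, account_location, and account_job tables."""

    def __init__(self, account, account_location, account_job):
        self.account = account                    # account_id -> conn
        self.account_location = account_location  # location_id -> account_id
        self.account_job = account_job            # job_id -> account_id

    def conn_for_location(self, location_id):
        return self.account[self.account_location[location_id]]

    def conn_for_job(self, job_id):
        return self.account[self.account_job[job_id]]


class SteveAccountService:
    """Owns only the account table; knows nothing about locations or jobs."""

    def __init__(self, account):
        self.account = account  # account_id -> conn

    def conn_for_account(self, account_id):
        return self.account[account_id]


class LocationService:
    """Steve's design: the location table carries its own account_id column."""

    def __init__(self, location, account_service):
        self.location = location  # location_id -> account_id
        self.account_service = account_service

    def conn_for(self, location_id):
        return self.account_service.conn_for_account(self.location[location_id])


maokai = MaokaiAccountService(ACCOUNT, {"garage-1": "acme"}, {"kiosk-etl": "acme"})
steve_locations = LocationService({"garage-1": "acme"}, SteveAccountService(ACCOUNT))

# Both designs resolve the same connection; they differ in who owns the mapping.
print(maokai.conn_for_location("garage-1"))
print(steve_locations.conn_for("garage-1"))
```

A job service in Steve's design would look just like `LocationService`, with a job table in place of the location table.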

Each design seems viable. However, after some discussion (aka Maokai trying to convince Steve his design is better), Steve pointed out that the concept dependency in my design is circular. Can you see why?

In my design, the account service needs to know the concepts of location and job, because it has to manage the account_location and account_job tables. At the same time, the location and job services need to know the concept of account, because they have to ping the account service to get database connections.

In Steve’s design, the location service needs to know about account, and the job service also needs to know about account, but the account service does not need to know about location or job. This way we untangle the circular concept dependency and get a clean dependency graph.
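
One way to see the difference is to write each design down as a "needs to know about" graph and look for a cycle. The sketch below uses a plain depth-first search; the graph encoding and the `find_cycle` helper are my own illustration, and it assumes every service appears as a key in the graph.

```python
def find_cycle(graph):
    """Return one dependency cycle as a list of nodes, or None if there is none."""
    WHITE, GRAY, BLACK = 0, 1, 2  # unvisited, on the current path, fully explored
    color = {node: WHITE for node in graph}
    path = []

    def visit(node):
        color[node] = GRAY
        path.append(node)
        for dep in graph[node]:
            if color[dep] == GRAY:  # we walked back onto the current path
                return path[path.index(dep):] + [dep]
            if color[dep] == WHITE:
                cycle = visit(dep)
                if cycle:
                    return cycle
        path.pop()
        color[node] = BLACK
        return None

    for node in graph:
        if color[node] == WHITE:
            cycle = visit(node)
            if cycle:
                return cycle
    return None


# service -> concepts it needs to know about
maokai_design = {"account": ["location", "job"], "location": ["account"], "job": ["account"]}
steve_design = {"account": [], "location": ["account"], "job": ["account"]}

print(find_cycle(maokai_design))  # ['account', 'location', 'account']
print(find_cycle(steve_design))   # None
```

In the first graph the account service and the location service each point at the other, which is exactly the circle Steve spotted.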

Steve’s design is also more extensible: if we need to associate each user with one account in the future, we can add an account_id column to the user table, instead of modifying the account service.

The Verdict

This is an interesting yet common case when deciding where to draw the boundaries between micro-services. The lesson is not to go with the first viable solution, but to think through the relationships between the services and get their dependencies right.

Sadly, in this post Steve scored one, Maokai zero… Rest assured, I am not gonna let that last for long. It’s my blog after all. So stay tuned for the next episode when Maokai makes it even!

  • backend
  • micro-service
  • system design