Data lakes can help hospitals and healthcare facilities turn data into business insights, maintain business continuity, and protect patient privacy. A data lake is a centralized, managed, and secure repository that stores all of your data, both in its raw form and processed for analysis. Data lakes allow you to break down data silos and combine different types of analytics to gain insights and make better business decisions.
This blog post is part of a larger series on getting started with setting up a healthcare data lake. In my previous post in the series, “Getting Started with Healthcare Data Lakes: Diving into Amazon Cognito”, I focused on the specifics of using Amazon Cognito and attribute-based access control (ABAC) to authenticate and authorize users in the healthcare data lake solution. In this post, I detail how the solution evolved at a foundational level, including the design decisions I made and the additional features used. You can access the code samples for the solution in this Git repo for reference.
The main change since the last presentation of the overall architecture is the decomposition of a single service into a set of smaller services to improve maintainability and flexibility. Integrating a large volume of diverse healthcare data often requires specialized connectors for each format; by keeping them encapsulated separately as microservices, we can add, remove, and modify each connector without affecting the others. The microservices are loosely coupled via publish/subscribe messaging centered in what I call the “pub/sub hub.”
This solution represents what I would consider another reasonable sprint iteration from my last post. The scope is still limited to the ingestion and basic parsing of HL7v2 messages formatted in Encoding Rules 7 (ER7) through a REST interface.
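For readers unfamiliar with ER7, it is the pipe-and-hat delimited wire format for HL7v2. A minimal ADT^A01 (patient admission) message might look like the following; this sample is entirely synthetic and for illustration only, not taken from the solution:

```
MSH|^~\&|SENDING_APP|SENDING_FACILITY|RECEIVING_APP|RECEIVING_FACILITY|20240101120000||ADT^A01|MSG00001|P|2.3
EVN|A01|20240101120000
PID|1||123456^^^HOSP^MR||DOE^JANE||19800101|F
```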
The solution architecture is now as follows:
Figure 1. Overall architecture; colored boxes represent distinct services.
While the term microservices has some inherent ambiguity, certain traits are common: services are small and autonomous, loosely coupled, independently deployable, and communicate through well-defined interfaces.
When determining where to draw boundaries between microservices, consider the scope of the communication involved; different scopes suit different technologies and patterns:
| Communication scope | Technologies / patterns to consider |
| --- | --- |
| Within a single microservice | Amazon Simple Queue Service (Amazon SQS), AWS Step Functions |
| Between microservices in a single service | AWS CloudFormation cross-stack references, Amazon Simple Notification Service (Amazon SNS) |
| Between services | Amazon EventBridge, AWS Cloud Map, Amazon API Gateway |
Using a hub-and-spoke architecture (or message broker) works well with a small number of tightly related microservices. The drawback is that coordination and monitoring are needed to keep each microservice from processing messages intended for another; see the subscription filter sketch below.
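One common way to handle this with SNS is a subscription filter policy, so that a microservice only receives the message types it can handle. The following is a minimal sketch, not code from the solution's repo: the `ParserQueue` resource, the `core-Topic` export name, and the `event_type` message attribute are all assumptions for illustration.

```yaml
Resources:
  # Hypothetical work queue internal to the parser microservice
  ParserQueue:
    Type: AWS::SQS::Queue

  # Hypothetical subscription to the pub/sub hub; the filter policy ensures
  # this microservice only receives messages whose "event_type" attribute matches
  ParserSubscription:
    Type: AWS::SNS::Subscription
    Properties:
      TopicArn: !ImportValue core-Topic   # assumes the core stack is named "core"
      Protocol: sqs
      Endpoint: !GetAtt ParserQueue.Arn
      FilterPolicy:
        event_type:
          - hl7v2_message_received
```

In a real template, the queue would also need an SQS queue policy granting the topic permission to deliver messages to it.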
Provides the foundational data and communication layer, including: the central data lake storage (an Amazon S3 bucket), a bucket for supporting artifacts, the pub/sub hub (an Amazon SNS topic), and a data catalog (exported by both name and ARN).
Write access to the data lake is allowed only indirectly, through a Lambda function; funneling all writes through a single point ensures consistency.
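One way to enforce this is a bucket policy that denies direct object writes from any principal other than the writer function's execution role. The sketch below is an assumption about how this could look, not code from the solution; `WriteLambdaRole` is a hypothetical resource name:

```yaml
  # Hypothetical policy: deny s3:PutObject unless the request comes from
  # the dedicated writer Lambda's execution role
  DataLakeBucketPolicy:
    Type: AWS::S3::BucketPolicy
    Properties:
      Bucket: !Ref Bucket
      PolicyDocument:
        Version: "2012-10-17"
        Statement:
          - Sid: DenyDirectWrites
            Effect: Deny
            Principal: "*"
            Action: s3:PutObject
            Resource: !Sub "${Bucket.Arn}/*"
            Condition:
              StringNotEquals:
                aws:PrincipalArn: !GetAtt WriteLambdaRole.Arn
```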
Example outputs from the core microservice's CloudFormation template:
```yaml
Outputs:
  Bucket:
    Value: !Ref Bucket
    Export:
      Name: !Sub ${AWS::StackName}-Bucket
  ArtifactBucket:
    Value: !Ref ArtifactBucket
    Export:
      Name: !Sub ${AWS::StackName}-ArtifactBucket
  Topic:
    Value: !Ref Topic
    Export:
      Name: !Sub ${AWS::StackName}-Topic
  Catalog:
    Value: !Ref Catalog
    Export:
      Name: !Sub ${AWS::StackName}-Catalog
  CatalogArn:
    Value: !GetAtt Catalog.Arn
    Export:
      Name: !Sub ${AWS::StackName}-CatalogArn
```
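Downstream microservices can then consume these exports with `Fn::ImportValue`. The following is a minimal sketch of what a consuming stack might look like, assuming the core stack was deployed under the name passed in as `CoreStackName`; the `ParserFunction` and its role parameter are hypothetical:

```yaml
Parameters:
  CoreStackName:
    Type: String              # name of the deployed core stack (assumption)
  ParserRoleArn:
    Type: String              # pre-existing execution role ARN (hypothetical)

Resources:
  ParserFunction:
    Type: AWS::Lambda::Function
    Properties:
      Runtime: python3.12
      Handler: index.handler
      Role: !Ref ParserRoleArn
      Environment:
        Variables:
          # Resolve the core stack's exported names at deploy time
          DATA_LAKE_BUCKET:
            Fn::ImportValue: !Sub "${CoreStackName}-Bucket"
          HUB_TOPIC:
            Fn::ImportValue: !Sub "${CoreStackName}-Topic"
      Code:
        ZipFile: |
          import os
          def handler(event, context):
              # Placeholder handler; real parsing logic would go here
              return {"bucket": os.environ["DATA_LAKE_BUCKET"]}
```

Note that cross-stack exports create a hard dependency: CloudFormation will not let the core stack delete or modify an export while another stack imports it, which is one reason to reserve them for communication between microservices within a single service.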