A 2018 survey by The New Stack suggests that around 46% of IT decision-makers are either using or evaluating serverless options. Organizations of all sizes, from cloud-native startups to large enterprises, are exploring opportunities in this field. Dig a little and you will find that companies pursue this strategic move to avoid tech hassles, reduce costs, and focus more on bringing their ideas to market. In this article, we will discuss how to pick the right data analytics strategy for serverless systems.
Most of these companies are banking on AWS serverless applications. As a user, you may have opted for Lambda functions (the most popular choice) to host the business logic behind APIs and AWS Aurora Serverless to store and manage data for the web application. You can use this stored data for both reporting and analytics. You can also apply BI to develop new business strategies based on the insights or patterns observed in the data.
Analyzing A Use Case
Let’s consider a coaching platform that aims to improve participants’ current knowledge or skill level. Such programs include various sessions on skills and sub-skills with learners and coaches. After a program is completed, you can run a survey to find out how much participants have improved. If you use analytics for it, you can assess their strengths and weaknesses too. Because the data is not one-dimensional, you can go even further and get feedback on coaches and learning content. Then, with the collected data, you can plan your strategy.
If you use analytics properly, getting real-time insights into engagement is not difficult. If the engagement or the feedback is negative, you can immediately launch corrective measures. The entire process has the potential to improve the program’s efficacy.
Things to do before introducing Analytics
Now, if the deployment of Analytics and a serverless DB is your priority, you need to consider a few factors for AWS Aurora Serverless:
- Analytics introduces a higher amount of reads and might involve a lot of computation.
- It requires the proper execution of aggregation operations. Frequent or heavy use of analytics can keep your DB busy with heavy reads, causing a bottleneck for traditional applications.
- The serverless DB might scale up based on the percentage of utilization or the maximum connections reached in serverless applications. But the scale-out operation can take up to 2.5 minutes, during which users may notice the application slowing down.
Things to do before introducing serverless DB
Points to be considered for serverless DBs:
- One of the major blockers of the AWS Aurora Serverless DB is that it cannot create read replicas.
- AWS Aurora Serverless does not guarantee durability.
- The DB instance for an Aurora Serverless DB cluster is created in a single Availability Zone. Automatic Multi-AZ failover takes longer in the case of Serverless DB.
- There are constraints regarding DB connection pooling if the Data API is not used. AWS Aurora Serverless does not support RDS Proxy for DB connection pooling.
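To make the Data API point concrete, here is a minimal sketch of a query helper that goes through the Data API instead of a pooled DB connection. It assumes a client created with boto3.client("rds-data"); the helper names, the ARNs, the database name, and the RecordingClient stub are hypothetical and exist only for illustration.

```python
def to_data_api_params(params):
    """Convert {"name": value} into the Data API's typed parameter list."""
    converted = []
    for name, value in params.items():
        if value is None:
            typed = {"isNull": True}
        elif isinstance(value, bool):      # check bool before int: bool is an int subclass
            typed = {"booleanValue": value}
        elif isinstance(value, int):
            typed = {"longValue": value}
        elif isinstance(value, float):
            typed = {"doubleValue": value}
        else:
            typed = {"stringValue": str(value)}
        converted.append({"name": name, "value": typed})
    return converted


def run_query(client, cluster_arn, secret_arn, database, sql, params=None):
    """Run one SQL statement over HTTPS; no DB connection is held open."""
    return client.execute_statement(
        resourceArn=cluster_arn,
        secretArn=secret_arn,
        database=database,
        sql=sql,
        parameters=to_data_api_params(params or {}),
    )


class RecordingClient:
    """Hypothetical stand-in for boto3.client("rds-data"), used for a dry run."""

    def execute_statement(self, **kwargs):
        self.last_call = kwargs
        return {"records": []}


client = RecordingClient()  # in real use: boto3.client("rds-data")
run_query(
    client,
    cluster_arn="arn:aws:rds:region:account:cluster:example",      # placeholder ARN
    secret_arn="arn:aws:secretsmanager:region:account:secret:ex",  # placeholder ARN
    database="coaching",                                           # hypothetical DB name
    sql="SELECT skill, rating FROM feedback WHERE learner_id = :learner_id",
    params={"learner_id": 42},
)
```

Because each call is a stateless HTTPS request, a Lambda function using this pattern does not need to manage its own connection pool.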
Building an efficient data pipeline
Although AWS Aurora Serverless handles scalability, high availability, and DB maintenance at the AWS end, you must be aware of its constraints before building a resilient system.
If analytics is required and you have an AWS Aurora Serverless DB, you can use Elasticsearch, Redis, DynamoDB, or Redshift as the source for analytics data. Build data pipelines that incrementally update the secondary storage with raw data or computed information.
Another option for data pipelines is to use message queues. Consumers of these queues listen to events, perform the computations, and update the secondary storage accordingly.
You can also improve speed with a design pattern such as a denormalized DB, domain aggregation, or a star schema, which can provide near real-time aggregated data for analytics. In this scenario, data can be aggregated and stored in a denormalized table periodically, or processed after listening to events. You can then use this information directly for analytics.
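One way to picture an incremental update is a watermark-based sync: each run copies only the rows changed since the previous run. The sketch below is an illustration under assumptions, not a prescribed implementation. The IncrementalSync class, the updated_at column, and the in-memory dict standing in for the secondary store are all hypothetical.

```python
from dataclasses import dataclass, field
from datetime import datetime


@dataclass
class IncrementalSync:
    """Copies rows changed since the last run into a secondary store."""

    secondary: dict = field(default_factory=dict)  # stand-in for Elasticsearch, Redis, etc.
    watermark: datetime = datetime.min             # newest updated_at already copied

    def run(self, source_rows):
        """Copy only rows newer than the watermark; return how many were copied."""
        changed = [r for r in source_rows if r["updated_at"] > self.watermark]
        for row in changed:
            self.secondary[row["id"]] = row        # upsert by primary key
        if changed:
            self.watermark = max(r["updated_at"] for r in changed)
        return len(changed)


# Hypothetical rows as they might be read from the primary DB.
rows = [
    {"id": 1, "score": 70, "updated_at": datetime(2021, 1, 1)},
    {"id": 2, "score": 85, "updated_at": datetime(2021, 1, 2)},
]
sync = IncrementalSync()
first = sync.run(rows)    # both rows are new on the first run
rows.append({"id": 3, "score": 90, "updated_at": datetime(2021, 1, 3)})
second = sync.run(rows)   # only the row added after the first run is copied
```

In a real pipeline the filter would be pushed into the SQL query (for example a WHERE clause on the timestamp column) so that the primary DB only reads the changed rows.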
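The queue-based variant can be sketched with the standard library alone. Here queue.Queue stands in for a managed queue service, and the event shape (session_id, rating) is an assumption made for the coaching example, not something the platform defines.

```python
import queue
from collections import defaultdict

events = queue.Queue()  # stand-in for a managed message queue

# Secondary storage: running totals per session, updated as events arrive.
session_stats = defaultdict(lambda: {"count": 0, "total": 0})


def handle(event):
    """Fold one feedback event into the running aggregate for its session."""
    stats = session_stats[event["session_id"]]
    stats["count"] += 1
    stats["total"] += event["rating"]


def drain(q):
    """Process every event currently waiting in the queue; return the count."""
    processed = 0
    while True:
        try:
            event = q.get_nowait()
        except queue.Empty:
            return processed
        handle(event)
        processed += 1


# Hypothetical feedback events published by the application.
for session_id, rating in [("s1", 4), ("s1", 2), ("s2", 5)]:
    events.put({"session_id": session_id, "rating": rating})

processed = drain(events)
average_s1 = session_stats["s1"]["total"] / session_stats["s1"]["count"]
```

Because the aggregate is updated per event, analytics reads hit the precomputed totals rather than scanning the primary DB.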
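The periodic aggregation idea can be tried locally with SQLite as a stand-in for Aurora. The table and column names below (feedback, skill_summary) are hypothetical, and the full rebuild on each refresh is a simplification; a production job would more likely update only the changed groups.

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # SQLite stands in for the Aurora DB
conn.executescript("""
CREATE TABLE feedback (learner_id INTEGER, skill TEXT, rating INTEGER);
CREATE TABLE skill_summary (skill TEXT PRIMARY KEY, responses INTEGER, avg_rating REAL);
""")
conn.executemany(
    "INSERT INTO feedback VALUES (?, ?, ?)",
    [(1, "sql", 4), (2, "sql", 2), (1, "python", 5)],
)


def refresh_summary(conn):
    """Rebuild the aggregate table in one transaction (run this on a schedule)."""
    with conn:  # commits on success, rolls back on error
        conn.execute("DELETE FROM skill_summary")
        conn.execute("""
            INSERT INTO skill_summary (skill, responses, avg_rating)
            SELECT skill, COUNT(*), AVG(rating) FROM feedback GROUP BY skill
        """)


refresh_summary(conn)

# Analytics queries now read the small precomputed table, not the raw rows.
summary = {
    skill: (responses, avg_rating)
    for skill, responses, avg_rating in conn.execute(
        "SELECT skill, responses, avg_rating FROM skill_summary"
    )
}
```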
Strategy Comparison
Analytics can introduce heavy reads or higher levels of computation, so you have to strategize wisely. The table below compares the strategies.
Key Consideration | Strategy 1 | Strategy 2 | Strategy 3 | Strategy 4 |
Name | Aurora Serverless with denormalized tables, using a higher-configuration machine | Provisioned DB with read replicas and denormalized tables | Elasticsearch with Aurora Serverless MySQL (needs a data pipeline or queues to keep the data in sync) | Aurora Serverless V2 with denormalized tables |
Speed | Fast | Faster | Fastest | Fast (need to benchmark) |
Cost | Pay as you use. Aurora Capacity Unit: $0.06 per ACU-hour; (0.06*24*31) = $44.64 per month at max | db.t3.medium: $0.065/hour; (0.065*24*31) = $48.36 per month. Storage rate: $0.10 per GB-month. I/O rate: $0.20 per 1 million requests | t3.medium.elasticsearch: $0.073/hour; (0.073*24*31) = $54.312 per month | Pay as you use. Aurora Capacity Unit: $0.12 per ACU-hour; (0.12*24*31) = $89.28 per month at max |
Durability | No | Yes, as read replicas can be created | Yes, if replication is done | Yes |
Scaling | Autoscaling | Needs to be configured | Needs to be configured | Autoscaling |
High Availability | Provided by AWS | Needs to be managed | Needs to be managed | Provided by AWS |
Maintenance | Provided by AWS | Needs to be managed | Needs to be managed | Provided by AWS |
Analytics Usage Pattern | Low | High | High | Medium to High |
Latency | Low (need to benchmark) | Low (need to benchmark) | Lowest | Low (need to benchmark) |
API Integration | Yes, Data API can be used | No | Yes | No |
Pros | Scaling and maintenance are taken care of by AWS | No connection pooling issue; performant | Fast | Read replication available; granular scaling available; scaling up and down is faster than in Aurora Serverless V1 |
Cons | Slowness might be observed during scale-out; connection pooling issue; scaling doubles the instance size | Higher cost; need to manage scaling and handling | Higher cost | New to market; expensive; does not support AWS RDS Proxy to solve the connection pooling issue |
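The monthly figures in the cost row come from multiplying an hourly rate by the hours in a month. The short sketch below reproduces that arithmetic, assuming a 31-day month (744 hours) and, for the serverless options, a single capacity unit running continuously. The $0.12 rate for V2 is taken from the multiplication shown in the table, and all rates are the article's figures, not current AWS pricing.

```python
HOURS_PER_MONTH = 24 * 31  # assumption: a 31-day month, i.e. 744 hours

# Hourly rates as quoted in the comparison table (not current AWS pricing).
hourly_rates = {
    "Aurora Serverless V1 (1 ACU)": 0.06,
    "Provisioned db.t3.medium": 0.065,
    "t3.medium.elasticsearch": 0.073,
    "Aurora Serverless V2 (1 ACU)": 0.12,
}


def monthly_cost(rate, hours=HOURS_PER_MONTH):
    """Cost of running one unit continuously for the whole month, in USD."""
    return round(rate * hours, 2)


monthly = {name: monthly_cost(rate) for name, rate in hourly_rates.items()}

for name, cost in monthly.items():
    print(f"{name}: ${cost:.2f} per month")
```

These are upper bounds for the pay-as-you-use options, since a serverless cluster that scales down or pauses is billed for fewer capacity-unit hours. Storage and I/O charges are not included.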
After going through the strategy comparison chart, you will be able to plan out a proper strategy. Try this process and share your experience with us. Stay safe and happy coding!
References:
https://docs.aws.amazon.com/AmazonRDS/latest/AuroraUserGuide/aurora-serverless.html
https://aws.amazon.com/rds/aurora/serverless/
https://aws.amazon.com/blogs/database/best-practices-for-working-with-amazon-aurora-serverless/