AWS re:Invent 2023 - Scaling on AWS for the first 10 million users (ARC206)

In this session, learn about patterns and techniques for architecting the ability to handle rapid growth and success in the early days of your business into your infrastructure on AWS. From implementing highly scalable AWS services to architecting proven patterns, there are a number of choices you can make early on to help you overcome some common infrastructure challenges. Learn how to adapt to growth and scale without needing to completely rebuild.

AWS Events

Welcome to the presentation. I want to start by saying that no one builds an application and thinks to themselves, "Wow, I hope no one uses it. I hope I have no users on my platform and it's just me." No, people build applications with the intention of being useful. You have customer signups, you have internal processes you're improving, you have external processes. You build with intention, and you build to scale. At AWS and at Amazon we take scale very seriously, and so today
we're going to walk you through best practices and give you the tools and resources so your applications can be future-proof, wherever you are in your scaling journey: whether you're a startup founder building the next big generative AI product and scaling to 10,000 users overnight, or you're an enterprise institution building mission-critical applications where, if your app goes down, people will notice. We're here to set you up for success. My name is Skye Hart and I'm a manager for Solutions Architecture at Amazon Web Services. I'll be joined halfway through the presentation by my esteemed colleague Chris Munns for an interpretive dance on scalability. I'm kidding; he's going to beat me up for that. Chris Munns is a very tenured Amazonian: he's been around for over 12 years, and he's the startup lead and tech advisor for our entire organization. So let's begin. Let's pretend we're not sitting in a giant ballroom, but that you and I are sitting in a conference room
together, and our developer team comes in and says, "I have an idea for a new application." As a leader you might be asking yourself three questions. One: where do I start? AWS has over 200 services available to you, so how do I choose the right patterns for my architecture? How do I get started? What's the return on investment if I build this, and what if I don't? Ask yourself these business questions and work backwards from your needs and requirements. The second question: okay, great, I have this application, how do I build it for scale? Let's talk about everyone's favorite topic, risk mitigation. No, I'm kidding, but how do I make sure it's resilient enough to stand the test of time, so that when I scale up to 100,000 users things aren't going to crash? Which leads me to my last and most important point: at the center of it all are our users, and how do we make sure they're happy? How many of you have opened an application, it crashes, and you never go back? Yeah, or the latency is bad and you don't want to use it anymore. So how do we make sure our systems work and are always up, that our users are happy and leave us that five-star review at the end of the day? Throughout this presentation we're going to be talking about this concept of an app. You can define an app in a lot of different ways; for the purposes of this presentation, we define an app as the full stack: the front end, the back end, and the data storage. We have our user-interface layer, the front end you see when you log on to an application. We have our business-logic layer, the compute and the engine behind the scenes. And then of course our data storage: how do we leverage existing data to feed back into our business? Another piece of groundwork I want to lay before we get into it is acknowledging how much our world is shifting. One is developer experience and the modern frameworks developers are utilizing
and we need to accommodate them and change with the times. The second one I want to mention is the move to serverless technologies; we'll keep coming back to this. It means shifting that heavy lifting, the management of underlying infrastructure, onto AWS so you can spend more time innovating. The last thing I want to mention is rapid scale. What do we mean by that? We have more data than we've ever seen before, and with it comes the expectation
of harnessing the power of that data in close to real time, in applications running at global scale. I'm going to give you a quote, a really famous, really infamous quote: no architecture is designed for high scalability on day one, but we'll certainly try. I like to use an analogy here: pretend I'm a contractor constructing a building. I'm not handy, so don't quote me on any of this, but the most important thing is laying the foundation. Before you can put the roof on, before you can put the windows in, you make sure the foundation will stand the test of time against weather conditions and everything else; it needs to be ready. Another mental model we like to use is the virtuous cycle of building. You might see this in MVP or prototyping processes. Pretend the users are in the middle: you start building, your developers come in and build the application. That's great, but we need to understand what we're building. We need monitoring and observability tools to see what's happening proactively: does my app have good latency? What about the compute? What about utilization? Is it cost-effective? All of these are questions we need to measure so we can get in front of them. Then we learn from that feedback loop, from our users and customers, and we build again. So wherever you are in your scaling journey, remember there's constant iteration and process improvement to make your application more efficient and more cost-effective, starting from day one. So we're in the conference room, we're about to build this application, and we onboard our first users, our beta testers; let's get them using the application. A very modern way of approaching this, and we're going to see it throughout, is to split your application into the
front end and the back end. It's called decoupling; you might have heard this buzzword, and we're going to dive a little deeper in a second. The reason is this architecture of traditional front-end hosting: really commonly, people might have EC2 on a single host, attach an Auto Scaling group and an ELB, then send it through a CDN. It's a very common pattern, but there are limitations and pain points you might be experiencing. One is what you're managing: you have to manage a lot of the underlying infrastructure, the patches, the bug fixes; it's not a managed service. And there are limitations to having everything on one single host: your failover, your redundancy. If everything is coupled together, there's nowhere to go when disaster occurs. So we move toward this modern front end, this way of thinking, which is splitting things apart. The benefits of the modern front end are what I said at the beginning: it meets developer expectations, moving toward modern frameworks with built-in scale and performance and all the bells and whistles. And I'm going to go into Amplify, which is a really powerful technology here, in a second. Amplify Hosting is one of the simplest technologies you can use in our stack; I used it with a bunch of customers when I was an SA. How it works: you simply connect your repository, whether that's GitHub or whatever you're most familiar with; you configure the build settings, which is step two, so that's your users, governance, and all your configuration the way you'd like it; and then you deploy your app. It really is that easy, and it's serverless, so you don't have to manage the underlying infrastructure. The way all of our AWS services work is that we take feedback from developers: what do you see commonly, what tools do you need? From that we came up with these Amplify Hosting features: atomic deployments, feature-branch deployments, all built in. And remember when I said earlier that with this ever-growing scale we want applications to be globally available? I don't want to be configuring EC2 instances across a bunch of different AZs; instead we have a front end that's globally available, built on the idea of a CDN.
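To make the "configure your build settings" step concrete: Amplify Hosting reads a small build specification, typically an amplify.yml. This is a minimal sketch for an npm-based front end; the package manager, build command, and output directory are assumptions about your particular project:

```yaml
version: 1
frontend:
  phases:
    preBuild:
      commands:
        - npm ci            # install dependencies
    build:
      commands:
        - npm run build     # produce the deployable bundle
  artifacts:
    baseDirectory: dist     # where your framework writes its output
    files:
      - '**/*'
  cache:
    paths:
      - node_modules/**/*   # speed up subsequent builds
```

Amplify detects many frameworks automatically, so in practice you often only adjust this file when your build deviates from the defaults.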
It's built on CloudFront. For those of you not familiar with the different front-end frameworks, there are three rendering approaches Amplify supports. Client-side rendering: the best way I can describe it is that you essentially have a container that runs in the client's browser. The second really common approach is server-side rendering, which is the opposite: it puts the load back on the servers, and it's popular for Next.js and Gatsby and frameworks like that. The third approach Amplify supports is static site generation. You might see people host static websites on S3, and some customers I talk to still do, but a lot of customers prefer Amplify because it's the full package in terms of ease of deployment. So you might say, "Sky, what about the back end? We have the front end; how do I select compute?" There are three buckets I want to go over for compute engines, and I'm going to dive a little deeper here. People know Amazon EC2, a service created in 2006: you're managing a lot, and you have the most control over your configuration. Then containers became really popular and we rolled out a few options: ECS, EKS for the Kubernetes fans out there, and AWS Fargate, which is our serverless container option. The third bucket, of course, is everyone's best friend, AWS Lambda: serverless technology for compute. You might say, "That's a lot. What do I do, where do I start, how do I evaluate compute options?" Similar to the slide before, a lot of times people say, "I want to run EC2, I want to do this on a single host," but the limitation is that there's no failover and no redundancy. If you have this architecture right now, that's okay; it's something we see really commonly, but it's like putting all your eggs in one basket: if something fails, the whole system goes down, your application goes down, and users aren't happy. You can go absolutely far with this, but there's a different pattern to consider. You might see this in terms of both the back end and the data tier, where people host self-managed databases on EC2; that's an anti-pattern we see. We want to leverage managed services as much as we possibly can when getting our application off the ground. Another way to look at this is this matrix, and it's a really good slide, so I'm going to pause here so you can take a picture, but let me talk you through it. I want to start with the chart on the left: you'll see "more opinionated" and "less opinionated." I like hyperbolizing these services, because the less opinionated end is Amazon EC2, and what we really mean by that is that you have the most control in terms of configuration; but with great power comes great responsibility, so you then have to configure, control, and debug all of
these things, and make sure your compute is right-sized. That's another factor from a cost perspective: you want to make sure you're not scaling out to oblivion, and that you're only using what you really need. On the opposite side of the spectrum is AWS Lambda; on the customer-managed side it's really just application code, which is very easy. So when you're launching, think of this as a spectrum, and pick what's purpose-built for both your workload and your experience and comfort level. After you select your compute, the next question is: how do I expose my application to the internet? There are three services I'll cover briefly. API Gateway is very purpose-built, I think, for REST APIs. The second one to note is Application Load Balancer, which really acts like a layer-7 proxy; it doesn't have all the bells and whistles of API Gateway, but it's a really powerful service in a lot of ways. The third is AWS AppSync, for those of you familiar with GraphQL; it's really popular with our customers hosting that. I have a cheat sheet for picking an API front end. There's a lot of overlap here, so if you have questions, find Chris and me in the hall afterwards, but here's a good cheat sheet to go off of, covering things like WebSockets, how many requests you're getting, and other considerations. After this cheat sheet, though, I'm actually going to recommend something else, and I'll tell you why. While the photo's still up: App Runner. App Runner is the best employee you will ever have, and I say this because it provides the entire stack you'd need to expose these APIs: you'll see it's purpose-built with Fargate, auto scaling, ELB, and ECR built into the service. I work with startups a lot, and the fastest way to deploy an application is App Runner.
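As a sketch of how little there is to configure: App Runner can build straight from your repository using a small apprunner.yaml. This is a minimal example under assumptions about your service; the runtime, commands, and port here are placeholders, not values from the talk:

```yaml
version: 1.0
runtime: python3          # managed runtime for the service
build:
  commands:
    build:
      - pip install -r requirements.txt
run:
  command: python app.py  # how App Runner starts your service
  network:
    port: 8080            # the port your application listens on
```

Everything else in the talk's diagram (load balancing, scaling, container registry) is handled by the service rather than configured here.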
You don't have to configure all the different components; it gets you off the ground and running really quickly. Let's pull it all together. I haven't gone through the database yet, but with Amplify Hosting and App Runner, at this point I'm managing none of the underlying infrastructure, which makes me happy: I get to spend more time innovating and focusing on the things I really care about. So, next question: to NoSQL or not to NoSQL, say that five times fast. This is a question I get all the time. I'm a data engineer by trade, and this is a controversial topic, but in this instance I'm going to recommend you start with SQL databases, and I'll tell you why. Why start with SQL? It's a very popular ecosystem where you can get a lot of support from other people, with Postgres and MySQL and the tools you're familiar with. Also, a lot of applications actually follow a relational data structure. Now you might find me in the hall and say, "Wait, Sky, I have massive amounts of data. That's me: I'm building a time-series application, I'm building a trading application, I'm building something with massive scale, petabytes in a couple of weeks." Okay, fine: you might need NoSQL or some other purpose-built database. What I mean by purpose-built databases is that we have a whole suite of them, and I recommend checking out one of those tracks: time series, graph, DocumentDB, you name it. Other use cases for NoSQL are when you really don't have relational data sets or that sort of structure and schema. Again, if you have use cases that reach petabytes of scale really quickly, you'll want to consider starting with a NoSQL database, but I'm guessing that isn't most of you in the room, so we're going to start with SQL databases, specifically Amazon Aurora. The callout here, like I said before, is Postgres
and MySQL: if you're hosting those on EC2, I really recommend you try Aurora. Why? Because it scales automatically; there's a lot of built-in performance, availability, and durability, and it does the heavy lifting for you. AWS manages all of that, and it's very easy to spin up in a matter of minutes. I'm an old database person, I came from that world, and Aurora makes it really easy. At AWS we're constantly coming out with more iterations of our services, so you'll see Amazon Aurora Serverless v2. The really important point here is the decoupling of the compute and storage layers, and why that matters when you're scaling is really cost: it scales out to meet peak workloads, say a huge Black Friday spike, and then scales back in by itself, so you're only paying for what you and your users need, which is very compelling. So let me bring this all back together: at this point we're all serverless, with Amplify Hosting, App Runner, and Aurora Serverless v2. Again, I constantly think about TCO, the total cost of ownership, and being able to leverage these managed services to scale to peak demand, plus high availability if you're building a global application; a lot of these features are already built into the services as soon as you stand them up. Wow, we did it: with that, we hit a hundred users, so I'm probably having a mini party, I'm really excited, I'm getting the energy up. Then I hit a thousand users, and we're starting to get a lot of attention on the application. Then I hit 10,000 users, but stuff starts breaking: all of a sudden my systems go down, my database is handling too many writes, and things are failing. So I hate to hand it to Chris while things are breaking, but Chris, I would really love your help here. Great, thanks Sky, thanks for setting everything up for us. So Sky walked us through the initial architecture and some of the initial patterns we'd be thinking about, and again, the key when you're building a new application is thinking about how we can encourage you to do less with your infrastructure and spend more time on your business application, making use of managed services and serverless here at AWS. When we start to reach this point of scale, say hypothetically in
the tens of thousands of users, we potentially start to see some things go wrong, and these things typically follow very common patterns. Here at AWS we've been working with startups and businesses building for a long time, so we see a lot of the same kinds of issues trickle in at roughly the same points in an application's scale. One thing we'll see become a pain point is that the business itself has grown, the product has grown, the features and capabilities have grown, and different parts of the application start to impact each other. Many, many years ago, in the early 2000s, amazon.com was originally a monolithic application: the entire site was one monolith, and what we found was that the demands of the various components were effectively causing negative impacts on the others. The other place this typically becomes apparent is in the database:
you'll have some queries that are very intensive, looking across very large amounts of data, and others that are quicker and easier, and this conflict and imbalance of resources can cause challenges. So what we need to do is go through the stack and understand where we have room to optimize, where we can start to pull together or pull apart these various components and think about how to scale them further. Just as Sky outlined, we're going to work on the front end, the back end, and the data tier. But before we go much further, there's something we need: a way to measure what's happening inside our infrastructure and architecture. For those of you who've been building and running applications, I'm sure you've heard from a support person, someone on the team, or a customer: "Hey, the site is slow. The app is slow." And you think: what does slow mean? What's happening? What are you seeing? Can you explain this further? So we need tooling to get the data we need in order to scale and grow. Here at AWS we have a number of products that can help. The two biggest buckets of products, which themselves have multiple components most of you are probably familiar with, are Amazon CloudWatch and AWS X-Ray. Amazon CloudWatch has been around since the earliest days of AWS; it's deeply integrated across almost the entirety of our product portfolio and has a number of capabilities built in around logging, metrics, alarms, and dashboards, plus some other aspects that help on the front end. There's a tool called Synthetics canaries that lets us test our APIs remotely from CloudWatch's infrastructure, and there's a real user monitoring (RUM) component you can include in your application's front end to get performance data. Similarly, with X-Ray we have the ability to do traces across our architecture. Now, at this point we don't have too big a distributed architecture, just App Runner and our database, so there's only so much to measure, but as we break apart and decompose our application, these traces will become more and more important.
Now, the other thing we've seen over the last several years, and this predates this year's craze in the generative AI space, is more and more tools coming out for both ops and dev folks that let you make use of machine learning. We have two core products in this space today at AWS; I don't know what the space will look like by the end of the week, of course, since this is re:Invent and there's always a lot happening. The first is Amazon DevOps Guru, which, as the name implies, is a useful tool for the DevOps folks in your organization. It can look at your infrastructure, understand the components you have, and then, based on data and machine-learning models we've developed internally at AWS, offer up guidance: it looks for high latency, slow database queries, and other things in your infrastructure you wouldn't want, and then says, "Hey, this is something you should take a look at; here's our recommendation." Next is Amazon CodeGuru, which, as the name implies, looks at your application code. You can point it at your code repository, have it scan all your code, and get an understanding of what your code does, how it might be performing, and where there might be issues it can help identify. These two tools bring machine-learning models based on the knowledge from decades of development
and infrastructure inside Amazon, and they can be a powerful multiplier for your efforts. Here's just a quick example, a screenshot from DevOps Guru, where it's highlighting slow queries and other aspects of what's happening inside my database. There's a wealth of information you can pull from these services to help you really understand where the pain points are. We don't want this to be the
thing where you just turn the knob and say, "Let's just run bigger instances, let's just run more of them." There are other options you can reach for before you get there. Cool, so now that we have some tools in our toolbox, let's get back to how we think about scaling this architecture, breaking it down bit by bit. On the front end, if you run on Amplify Hosting, one of the great aspects of the product is that it scales really, really far without you having to do anything. There are effectively no knobs or levers in Amplify Hosting that you can twist, tweak, or tune to make it scale further or better. I've had a number of conversations with the team about this, asking, "Tell me where it breaks. Where's the breaking point? Where do customers hit challenges?" And they said that, practically speaking, they don't see one. One of the best aspects of Amplify Hosting is that it's built on top of CloudFront. CloudFront just celebrated its 15th birthday here at AWS; it's our CDN service, with over 550 points of presence today. I remember when it was a very small baby service many years ago, and now it has massive global capacity to bring your customers to your site faster and easier than ever before. On the front end, a lot of the performance actually comes from the work that you do: from tuning your front-end code, from looking at how many calls you're making to back-end databases and how slow those might be, and from how much CSS, JavaScript, and static image content you're loading. This is where the CDN aspect of CloudFront becomes really powerful: it can cache those resources at the edge, closer to where your customers are. And this is one area where Amplify does expose some capability for you, which is that you have the
ability to set custom headers on objects hosted by Amplify. Here's an example where I'm setting the Cache-Control header, specifying the maximum age, for the regex pattern matching everything under the /images URI. You can imagine I'm caching my static images and saying, "Hey, those don't change that often; I'm comfortable having them cached as long as possible at the edge and in the customer's browser." But beyond that, there's really not a lot more you can do with Amplify or CloudFront to change or tweak these things, so let's move on to the back end and the data storage. The data-storage area, let's just say the database, is where I find customers actually get the best impact on being able to scale early on. It's typically where there are more pain points, and typically where physics becomes a barrier to how you have to think about scaling things. Now, one of the awesome things we have today is Aurora Serverless v2, and one of the key aspects of Aurora that's different from a traditional relational-database hosting product is that it separates compute from storage. I ran MySQL and Postgres at scale for many years, pre-Amazon, and what you end up running into is either compute or storage as a bottleneck. Aurora basically detaches the two so they can scale independently, and Aurora Serverless v2 helps automate this process by scaling out compute and storage for you automatically. The storage you really don't have to think about; it's taken care of behind the scenes using an essentially shared storage model. The CPU side is where you get a bit more control. Aurora Serverless v2 is based on the overall Aurora Serverless model, which has a concept called Aurora capacity units: a single ACU represents roughly 2 GiB of memory for an underlying database node. Today you can configure Aurora for a minimum of half an ACU all the way up to 128 ACUs, which is 256 GiB of memory, so at the high end you're talking about a fairly large host; this is the scaling up of a single node in an Aurora cluster. Again, Aurora takes care of this for you: Aurora Serverless v2 basically looks at a
number of metrics about what's happening inside your database and automatically adds ACUs, completely transparently. It shouldn't impact transactions, and it shouldn't impact things like buffer pools, which are things you'd typically have to tweak any time you resized a database in a non-Aurora model where compute and storage are mashed together. The other option you have, and this is part of Aurora overall, is that you can add other nodes to your cluster: we can keep adding read nodes, and those read nodes can be a mix of Aurora Serverless and regular Aurora, so you can mix hard-configured provisioned instances with serverless nodes that scale up and down based on the load they see. Today Aurora supports up to 15 read replicas, each able to go from half an ACU all the way up to 128 ACUs.
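To keep the capacity arithmetic straight (one ACU represents roughly 2 GiB of memory, and a node can run between 0.5 and 128 ACUs), here's a small helper; the function is purely illustrative, not an AWS API:

```python
# Aurora Serverless v2 capacity arithmetic: 1 ACU ≈ 2 GiB of memory.
GIB_PER_ACU = 2
MIN_ACU, MAX_ACU = 0.5, 128  # per-node capacity bounds described in the talk

def memory_gib(acus: float) -> float:
    """Approximate memory (GiB) backing a node running at `acus` capacity units."""
    if not MIN_ACU <= acus <= MAX_ACU:
        raise ValueError(f"capacity must be within [{MIN_ACU}, {MAX_ACU}] ACUs")
    return acus * GIB_PER_ACU
```

So a node idling at 0.5 ACU has about 1 GiB behind it, while a node at the 128-ACU ceiling has about 256 GiB, which matches the numbers in the talk.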
Now, there's an interesting thing Aurora does here: you'll see there are different tiers assigned as part of this. You always have a writer node in a cluster, and then you can have those 15 read replicas, and each read replica has to be given what's called a tier. This is part of how Aurora handles its availability SLA: if the primary node were to fail or die, Aurora promotes a reader in order of tier. The other thing that happens is that tier-0 and tier-1 readers are always kept the same size as the writer. So you can get really flexible with this: you could have a reader at a much lower tier that you deliberately scale down and cap in size, which can be useful for things like internal admin panels, BI tools, or other data-analytics tools that don't need the full size of the Aurora node powering everything else. This is the one area where you do have control: these read replicas are things you create and add based on the overall needs of your application. Now, if you add these read replicas and grow the cluster, with your writer node growing automatically thanks to Aurora Serverless and read replicas added as needed, and remember some customers see 20- or 30-to-1 read-to-write demand, where read replicas become really important, then what you're going to want to add is some sort of database proxy. Database proxies have been around in the Postgres and MySQL world for a very long time, and a few years ago we announced RDS Proxy. RDS Proxy, as the name implies, is a database proxy that sits between your application and all of your database nodes: between your application and the writer node, and between your application and all the read nodes. What it does is simplify things such as connection handling and connection memory consumption. This is really important, especially when our application tier has autoscaling or serverless behavior: having RDS Proxy own the connection pools for you actually reduces resource demand on the database itself. So by using RDS Proxy you effectively get more scale out of your database without changing much architecturally: you point your app at the proxy, the proxy connects to the database, and away you go. So our architecture evolves a small bit here: we've implemented RDS Proxy, we have our primary writer node in our Aurora Serverless v2 cluster, and we can add some read nodes to help split out that read traffic. What we start to see is a pattern we can take really pretty far: if we had 15 read nodes at 128 ACUs each, you're talking about multiple terabytes of memory and the aligned CPU capacity for a database. This will take you really, really far, and again, the storage tier scales completely transparently, so it's not something you have to factor in yourself. Now, Sky of course gave us that awesome quote earlier in the presentation, and there's another one here that I like to lean on, which is that the best
database queries you're ever going to make are the ones that you don't make. And so another trick that we have here, and I purposely put this in this order, is to add caching in front of our database. There have been a number of different ways and models for doing this over the years, and a bunch of different ways you could think about it; I'm personally a fan of bringing my database cache off of the database, and not running it in my application either. Today, with Amazon ElastiCache, which is another product that's been out for a number of years, you've got two primary options, or two primary engines I should say, for caching. One is Memcached, which was first built at LiveJournal almost 20 years ago; it's in all sorts of large infrastructures, works really well, and handles really heavy scale. The other is Redis, and we see Redis also used in incredibly high-scale architectures. Now, the one gotcha here with ElastiCache, compared to everything else we've said so far, is that with ElastiCache you are going to have to take on scaling the resources yourself: you do have to think about the size of your clusters, and you do have to think about how you address objects in it. So adding a cache product is probably the first time you see a more significant change in your application code than just splitting up, say, the reads and the writes between two different types of database nodes.
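The caching idea can be sketched as a cache-aside lookup. A plain dict stands in for an ElastiCache (Redis or Memcached) client here so the example is self-contained, and the TTL and lookup function are hypothetical:

```python
import time

# Cache-aside sketch: an in-memory dict stands in for a Redis/ElastiCache
# client so the example is self-contained; in a real app you'd swap in
# e.g. redis.Redis(host="<your-elasticache-endpoint>") and its get/setex calls.
_cache: dict[str, tuple[float, object]] = {}
TTL_SECONDS = 60

def get_product(product_id: str, db_lookup) -> object:
    """Return a product, consulting the cache before the database."""
    entry = _cache.get(product_id)
    if entry is not None:
        expires_at, value = entry
        if time.time() < expires_at:
            return value           # cache hit: no database query made
    value = db_lookup(product_id)  # cache miss: fall through to the database
    _cache[product_id] = (time.time() + TTL_SECONDS, value)
    return value

# Hypothetical database lookup that counts how often it is actually called.
calls = {"n": 0}
def db_lookup(product_id):
    calls["n"] += 1
    return {"id": product_id, "name": "widget"}

get_product("p1", db_lookup)
get_product("p1", db_lookup)  # second call is served from the cache
print(calls["n"])             # 1
```

The design choice worth noting is that the cache sits beside the database rather than inside the application's data model, so swapping the dict for a real client changes only the get/set calls.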
Again, what we've done here is taken our baseline architecture and added a cache to it. This can help buffer how much we need to think about scaling our database: you could move a considerable amount of your read traffic to a cache and end up saving a ton of time and money on scaling out the database further. And so what we see here is this model where we think about scaling first up and then out. Aurora Serverless v2 is going to scale up automatically, making those nodes bigger; adding more read replicas becomes something that you have to think about. In the case of a cache, you can scale those nodes up bigger, and then you can add more to a pool of nodes that's represented by the service behind the scenes. Again, you want to lean as much as you can on things that reduce load and demand: a proxy in front of your database helping to remove connection overhead is a real simple, easy win, and finding a way to do off-database caching for your application can be another real simple, easy win.

So let's talk a little bit more now about the application tier, or the back end. Sky talked really briefly about what App Runner does for you in terms of how it's built on top of ECS and Fargate and auto scaling and ELB and all these other components that you yourself don't have to configure. If you were to write that in CloudFormation, it would be a couple hundred lines of CloudFormation; in this case you're not doing any of that. Behind the scenes, App Runner has a couple of core components. The first is that it brings up, at your application's front door, an NLB, or Network Load Balancer, which is part of our overall Elastic Load Balancing product suite. Behind that we run an L7 request router, a request router for HTTP traffic; App Runner today only allows you to run essentially web applications, so port 80 or port 443. Then behind that it manages ECS Fargate tasks for you, so it is effectively using Fargate behind the scenes and managing the scale of all those resources for you. And as Sky was saying earlier, you see none of this: you connect your code repository, you deploy your application, and your application runs. You're not exposed to any of the bits under the hood.

Now App Runner, a little bit differently from Aurora Serverless v2, does give you some options when it comes to tweaking knobs and levers for scale. App Runner refers to the Fargate tasks that it runs as instances, or App Runner instances. An App Runner instance is configured for a certain amount of memory and CPU, and effectively this directly correlates with the cost of the product when you're running it. Today you have everything from a quarter of a vCPU (0.25 vCPUs) and half a gigabyte of memory up to four vCPUs and 12 gigabytes of memory. These are hard allocations: you have to choose one of these configurations, and you can't tweak the two dimensions independently. If you wanted to do that, you would choose, for example, different EC2 instances, comparing a compute-intensive versus a memory-intensive versus a more mixed-mode option. So you choose the size of the underlying App Runner instance, and then, as I'll talk about in a moment, it scales this up and down for you.

Now, by default, every single App Runner instance has a maximum number of concurrent requests that it can handle. Note, this is not TPS, this is concurrent requests. There's a hard upper bound on this today, which is 200 requests, and so this becomes one aspect that effectively bounds the scale at an instance size for App Runner. The second factor that goes into scaling App Runner is the number of instances per service, or per application, that you have; today that's a default
soft limit of 25. I've talked to the team about this one, and that number is an easy thing to request a limit increase on; it's set to 25 more as a safety mechanism, to make sure you don't accidentally create something too huge. So if we put these two numbers together, assuming I can get up to 200 concurrent requests per instance and a default soft limit of 25 instances in my application, that gives me about 5,000 maximum concurrency in an App Runner application, again by default. The next aspect of this is that App Runner manages scaling these tasks, or instances as it calls them, for you: it will fire up more of these Fargate instances behind the scenes, run them through the health check that you configure, add them to the L7 load balancer, and pay attention to how requests are coming in and out. One thing that it does for you is an intelligent scale-down. When it sees that load has dropped to the point where instances are no longer as active as they need to be, the L7 request router will intelligently shift traffic away from some of them so they can be spun down automatically. This is one tricky thing people have typically seen with auto scaling: in a traditional auto scaling model with EC2, when you scale down a node you may have no idea what that node is doing. So this L7 request router helps remove the load from an App Runner instance and makes it easier for it to then be spun down. One of the other tricks App Runner does is that, behind the scenes, it effectively keeps the Fargate task in a frozen state, keeping the memory active, so that if there was a sudden surge of traffic it can fire that back up for you very quickly. So again, App Runner is taking care of all of this for you, and by default you can scale an App Runner application down to a single instance.

Now, thinking about this aspect of our back-end application, what App Runner is going to get us is that maximum of roughly 5,000 concurrency. People sometimes mix up concurrency and transactions per second: if you think about the duration, how long your action actually takes, you can factor that in against that 5,000 concurrency and then think about the scale here. Sometimes thinking about it in terms of TPS is a little trickier,
so think of it this way: say I have a request that takes up to 2 seconds. With 5,000 requests in flight at once, that means in a minute I can do 150,000 requests, which is a decent amount of scale. If each request took one second, I could do 300,000 requests in that minute. One thing that we don't get into here is your application performance tuning: you're going to want to use those tools we talked about before to see where your application is slow, where the code in it runs slow. One interesting tip that we see quite often, and one aspect of App Runner, is that App Runner has managed runtimes for you, and sometimes moving to a new version of your runtime can bring a pretty beefy performance boost; we see this very commonly with Lambda. So in terms of our scale here, this could probably pretty easily get us over that 100,000-plus user mark and maybe start to get us into our first millions of users,
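That back-of-the-envelope math can be written out directly; the 200 and 25 figures are the App Runner defaults just mentioned:

```python
# Back-of-the-envelope App Runner throughput, using the defaults from the
# talk: up to 200 concurrent requests per instance, soft limit of 25 instances.
MAX_CONCURRENCY_PER_INSTANCE = 200
DEFAULT_MAX_INSTANCES = 25

max_concurrency = MAX_CONCURRENCY_PER_INSTANCE * DEFAULT_MAX_INSTANCES
print(max_concurrency)  # 5000 concurrent requests per service, by default

def requests_per_minute(concurrency: int, seconds_per_request: float) -> int:
    """Each in-flight slot completes 60 / t requests per minute."""
    return int(concurrency * 60 / seconds_per_request)

print(requests_per_minute(max_concurrency, 2.0))  # 150000
print(requests_per_minute(max_concurrency, 1.0))  # 300000
```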
but then what do we have to think about after that? Well, at some point everything that we've discussed does start to hit a wall. Say we've reached those 15 read replicas for our Aurora database and they're the maximum size they can possibly be; what we'll probably run into before then is that contention on our write node is maxed out. In a relational database of any type, you are basically limited to a single write node, even if you've got some sort of primary and standby configuration with read replicas. The other thing is that we do have limits on App Runner, and there's probably a point where our application, due to the complexity of our business, starts to have issues, with components causing increases in latency and friction elsewhere. And really, where you start to see this become a pain point is organizationally. Going back to the example of amazon.com: we had this large monolith in the early 2000s, and we were starting to see all sorts of cracks in the foundation, as it were. One of the biggest problems was with our development teams: the developers kept stepping on each other's toes in that single code base, and so that monolith became a barrier to moving as fast as we needed to, with all of the overlapping entanglements that happen inside of it. This is typically where you start to talk about decomposing the application, whether you want to call it a service-oriented architecture or a microservices-based architecture or what have you; what you have to start thinking about is how to break apart the monolith. There are a number of sessions this week you can go to about different strategies and examples for this. I think of there being primarily two models for breaking apart an application. One is data domain mapping: you look at the data in your databases and see how it can be grouped together in terms of commonalities and needs for the business. The second is business function mapping, typically pretty close to the first: you say, OK, there are different needs of my business inside of my application and my data, and I'm going to group things by business grouping, or line of business as it might be.

Now, this also becomes an area where you might start to evaluate other technologies. If we're going to start building whole new applications as part of this, maybe I want to look at serverless with Lambda, maybe I want to run some things on EC2, maybe, as Sky was talking about earlier, I've got different database needs. Again, one of the simplest, easiest first tricks when it comes to scaling your database is just to have more databases: database federation. We take that data domain mapping, we take that business domain mapping, and we separate the data out into completely different clusters. So if I talked before about how you can have a single writer in Aurora at up to 128 ACUs with 15 read replicas, imagine now that I can have three or four of each of those, and you keep pushing the bounds on scalability as you break this up. Now, this changes a couple of things. I'll no longer be able to query all of my data in one place, but you're probably not doing that terribly often, and when you are, it's probably for things like business intelligence or analytics needs, and there are better products to do that, whether it be Redshift or Athena or Snowflake or something like that: a purposeful BI or analytics tool. And doing something like this gives us the ability to continue to stamp out all of the scalability patterns, tips, and tricks that we've used before, as needed, for these different areas. So maybe, for example, forums on my site are particularly heavy, so I have to scale that database a little harder; the users database is critical and important, so I have to scale that a little further; but maybe my product database has a much more manageable amount of writes and reads, so I don't have to tune it up quite the same.

Now, the data area is typically where we focus a lot of scaling work with customers, and this becomes a point in time where we want to ask: do I have very specific data needs where I should think about a purpose-built database? Do I actually have non-relational data that could go in a key-value store? Do I have document data that should go into a document store, or time-series data? So think about where you want to put the data, and think about the database product that aligns to it. That's the kind of thing where, if you don't have deep expertise in this, you can lean on the team that you're working with here at AWS.
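The federation idea can be sketched as a simple routing table from data domain to cluster; the domain names and endpoints below are hypothetical:

```python
# A minimal sketch of database federation: route each query to the cluster
# that owns its data domain. The domains and endpoints are hypothetical; in
# practice each entry would be an Aurora cluster (or RDS Proxy) endpoint.
FEDERATED_CLUSTERS = {
    "forums":   "forums-cluster.cluster-abc.us-east-1.rds.amazonaws.com",
    "users":    "users-cluster.cluster-def.us-east-1.rds.amazonaws.com",
    "products": "products-cluster.cluster-ghi.us-east-1.rds.amazonaws.com",
}

def endpoint_for(domain: str) -> str:
    """Pick the cluster that owns a data domain; fail loudly on unknown domains."""
    try:
        return FEDERATED_CLUSTERS[domain]
    except KeyError:
        raise ValueError(f"no federated cluster owns domain {domain!r}") from None

print(endpoint_for("forums"))
```

Each service then talks only to the cluster that owns its domain, and each cluster can be scaled (up to its own writer size and read replica count) independently.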
Your Solutions Architects can do data discovery exercises with you, data mapping exercises with you; we can help you figure out, based on what your use case is, the right product to think about. Then, typically following breaking up the database tier, you think about breaking up the application tier. I always lean toward starting with data and then moving on to the application tier. It becomes a little bit of the same thing: where we had a single App Runner application, now we could have multiple App Runner applications. The challenge you run into now is how to glue these things together, and how to expose the different services out to your clients. API Gateway has a neat thing it can do called base path mapping, which allows you to map different parts of your API to different backends. So it's not just "I have different Lambda functions": you could literally delegate out to different teams, saying "you own this path and you own that path off of our API." The other thing we might have to think about is moving to completely different technology patterns. Maybe exposing APIs inside of our infrastructure for internal microservices isn't what I should do; maybe I want to think about moving from synchronous models with APIs to asynchronous models with other ways of connecting these services. This is another area where you can see some really great talks this week: Thinking Asynchronously, part of the serverless track, and a couple of others in the serverless and app integration tracks. If we think of a traditional model with two different services in a synchronous world, service A calls service B, service B replies back to service A, and then back out to the client. There's tight coupling there, and a bunch of brittleness: every single one of those arrows becomes a potential place where I have to think about how to recover from a failure. In the second model, an asynchronous application, the client calls service A, service A calls service B, but service A replies before it waits for any work from service B. Another way to think about this: I have an order service and an invoice service. In the asynchronous model, the order service calls the invoice service and tells the client, "Hey, I've done that, here's an order ID," and later on the client can go back and make a request to that next service. Just by separating this out, we reduce some of the tight coupling inside of our architecture. I used to lead serverless developer advocacy here for many years, and we would spend a lot of time talking to customers about synchronous versus asynchronous, and I've seen some of the best companies that have built serverless applications, companies like Lego for example, move heavily to asynchronous models.
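A minimal sketch of that order/invoice pattern, with an in-process queue standing in for a managed queue like SQS (all names here are illustrative, not from the talk):

```python
import itertools
import queue

# An in-process queue stands in for SQS/EventBridge so the sketch is
# self-contained; in production the order service would publish to a real
# queue or bus and the invoice service would consume from it. The point is
# the shape of the flow: reply to the client first, do the invoice work later.
invoice_queue: queue.Queue = queue.Queue()
_order_ids = itertools.count(1)

def order_service(cart: dict) -> dict:
    """Accept an order, enqueue the invoicing work, and reply immediately."""
    order_id = f"order-{next(_order_ids)}"
    invoice_queue.put({"order_id": order_id, "amount": cart["amount"]})
    # The client gets its order ID back *before* any invoice work happens;
    # it can poll for the invoice later using that ID.
    return {"order_id": order_id, "status": "accepted"}

def invoice_service() -> dict:
    """Consume one message and produce an invoice, decoupled from the caller."""
    msg = invoice_queue.get()
    return {"order_id": msg["order_id"], "invoiced": msg["amount"]}

reply = order_service({"item": "book", "amount": 42})
print(reply["status"])      # accepted
invoice = invoice_service()
print(invoice["invoiced"])  # 42
```

If the invoice service is slow or briefly down, the order service's reply path is unaffected; the message simply waits in the queue.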
It really changes the way that you mentally think about your application, and it really lends itself to helping you think about scale very differently. So I'd really encourage you to explore the motions between components inside of your architecture, and where moving from a synchronous to an asynchronous model can help free up some of that tight coupling.

Now, beyond API-to-API calls, there's a whole bunch of services that essentially allow you to pass events from service A to service B as part of our app integration suite. We have Amazon Simple Notification Service, or SNS; Amazon Simple Queue Service, or SQS, technically the first AWS product (that's the trivia answer if you're very curious: the first preview product for AWS was actually SQS, back in 2004); Amazon EventBridge, which can basically act as a message bus hub between different services; and Kinesis Data Streams, which allows you to ingest quickly and spread out lots of information at scale. Another quick cheat sheet: these four products have bits of overlap, and there are ways you could use each of them similarly, however there really are distinct use cases where each of them shines. Again, I'd encourage you this week to check out some of the sessions in the serverless and app integration tracks, which go into a lot more depth about how to think about persistence, durability, retries, the cost models, the consumption models; there are a lot of different factors to weigh when you're choosing one of these products to sit between services.

And so eventually our architecture starts to evolve into something much more complex. This is not a real customer's diagram, not something I directly replicated from a conversation, but it is the kind of architecture that you could expect to see, and you shouldn't feel daunted or concerned by it. What we're showing in this image is a whole bunch of managed services where you don't have to think about the underlying infrastructure; almost all of them have automatic capabilities for things like scaling and Multi-AZ recovery options, stuff that, again, you don't have to think about. And where you can get with this is really quite far. So hypothetically here: we've decomposed our application, we're leaning on the scalability of these managed services, we're breaking our services out into different components, our front end is still continuing to scale away for us without us having to do too much work, and so we can reach our initial goal of exceeding 10 million users on AWS.

Where do we go from here? Everything after this point is where it starts to look a little bit more unique. The first thing, of course, is stamping out more of the patterns we've talked about; we can talk about companies like Uber, for example, that had thousands of microservices that all looked exactly the same and followed the same types of scaling patterns. We do something very similar internally at Amazon: a lot of our products under the hood look exactly the same, following patterns that we've established over and over again, and when there are certain break points we respond to them. You always want to have the ability to dive really deep into your stack's performance, so your observability tools, your monitoring tools, your code profiling tools become incredibly important, key, and critical to understanding where to scale, how to scale, and so forth. And then you do potentially reach a point where you say, hey, you know what, it actually makes more sense for me at this scale to operate some of these things myself. I'll be the first to admit that Lambda is the most expensive way to buy compute at AWS, but you use Lambda because the TCO, and not having to manage that stuff for a really long time, is really great. There can come a point in time where you say, you know what, I do want to take back that responsibility, there are economies of scale that I want to make use of, but again, that's really far out on the scale line for where that starts to make sense. Cool, so, to infinity. In closing: I've been giving a variation of this talk now for almost
a decade here at AWS, and we're constantly revamping it. The things I talked about 10 years ago in this talk, and five years ago in this talk, would look quaint compared to what you can do today. There are constantly things you can think about in terms of what we call serverless: getting away from managing manual infrastructure and dealing with resources like auto scaling yourself. The other thing is that speed today in the cloud is so different: CPU, storage, memory, networks are so much faster than they were a decade ago. You really do want to think about things like caching; that's where you're going to save a lot of time when it comes to scale. Where inside of your infrastructure can you cache: at the edge, at the application, at the database tier? Reducing those queries to your database, reducing the load on your database, is going to be one of the easiest ways to win when it comes to thinking about scale. And when it comes to quick, easy wins, things like federation: people think that's a cheap hack, and it is a cheap hack that works almost every single time, and it will really help you overcome some of those pain points where you're at the bounds of what a product can do in terms of scale or concurrency or something like that. And again, you want to look for best-fit technologies based on need. We've talked heavily here about App Runner; you might already know today that App Runner is not going to work for you, and that's fine, that's great, but for a lot of people starting off really small, not having to think about those resources or how to build something is going to be more beneficial. So with that, I really want to thank you for coming to this session and joining us here Monday at re:Invent. We hope that you have a really great week; this is a really fun event every year, and it's been really exciting to be here. We live by these survey results, so please do come and give us your results; we really appreciate it. And then, most importantly, a round of applause for Sky: this is her first re:Invent, and I think she did great.
