Learn how Intuit leverages zero ETL to get near real-time insights | AWS Events

Intuit leverages zero-ETL integrations to enable faster decision-making with near-real-time analytics. Migration to Amazon Redshift ensured business continuity with zero downtime, enabling business opportunities at scale. Intuit modernized their identity platform to enable business scalability and unified experiences. Learn how Intuit leverages zero ETL in their data migration journey to get real-time insights.

AWS Events

I'm Smith. I work as a principal engineer at Intuit in the platform group, primarily focusing on modernizing our identity platform and the data infra. Along with me I have Rakesh, who is one of the engineers leading the large-scale data migration from our legacy identity stack.

Before I get into the details, I want to give a quick intro about Intuit. I'm pretty sure most of you here know some of our marquee product offerings such as TurboTax, Credit Karma, QuickBooks, and Mailchimp, and we are really proud and humbled to be serving hundreds of millions of customers and small businesses in the US and around the world. At Intuit, our mission is to power prosperity around the world, and we want to provide tools to our customers and communities to overcome financial and business challenges. We are striving toward our strategy of building an AI-driven expert platform so that customers can really overcome their financial challenges, and for that we want to solve the customer problems that help us attract, retain, and delight our customers on an ongoing basis.

So today we're going to talk about how we are leveraging zero ETL in our data migration journey to get real-time insights and provide transparency to all of our stakeholders, and not just that, but how it helps us save time and effort so that we can focus on solving key customer problems.
Before I get into more details, I want to talk about why identity at Intuit is so important. Identity is central to everything we do at Intuit: letting customers log into their products, authorization, managing customer profile data for a lot of different workflows, federated identities, identity proofing; I could keep going. The general belief we hold is that if identity moves fast, Intuit moves fast, and we are able to unlock a lot of features for our customers.

Around three to four years back, we embarked on a journey to transform our identity from a monolithic architecture to a more modern, microservices-based platform. The primary goal was to deliver features quickly to our customers while making sure our architecture is scalable enough to handle the requirements of the next decade or more. And this migration was not just a straightforward move of the database from one relational database to another, or writing new services against the existing database; it was a complete overhaul of the data model and of how we decomposed our architecture. That is why it was so important for us to develop a migration framework that is seamless and robust. We were very clear on our principles: we cannot afford any downtime for our consumers, and data consistency is paramount; we cannot corrupt the data. We also wanted to operate whatever stack we developed in a double bubble, so that if for any reason things didn't work out on the new stack, we could go back to the battle-tested stack we had.
For that, we wanted to make sure we could get real-time insights into our whole data migration journey, and we also wanted to provide crystal-clear transparency for our stakeholders. If you really look at it, what we're talking about is migrating the hundreds of millions of customers and small businesses we have from our legacy stack to the new stack, with customers operating around the globe, and we had to slice and dice the data to make sure we did not impact them.

When I talk about ensuring clear transparency for our stakeholders, this is something we did not figure out on day one. When we started doing the migration in our pre-production environments, we realized our team was spending a lot of time and energy updating all our stakeholders, all our business units, on the migration status: where are we with the migration, what errors are happening, are we done yet? They also wanted the confidence that the migration was going well for the other products before kick-starting it for the more critical products. That is why we said we needed an asynchronous mechanism for them to look at the data in real time and gain that confidence. And not just for our external stakeholders; we wanted data in real time for ourselves too, so we could see how the migration was going, what errors were happening, and use that as a feedback loop to improve our migration journey.

Here is a high-level architecture of how we looked at this problem. On the left is the legacy platform, and on the right is the more modern, microservices-based architecture.
The overhaul I was talking about: we adopted a completely new API strategy through GraphQL APIs, and we completely changed our database, going from legacy relational databases to a more modern NoSQL database, DynamoDB. It was a complete shift in the way we were doing things as far as identity was concerned. And, as I said, we couldn't afford any downtime, so we adopted an adapter pattern: regardless of whether the user's data is in the legacy stack or in the new identity stack, we route the traffic accordingly. We developed that anti-corruption layer so that regardless of which API you come through, the traffic gets routed either to the legacy stack or to the new stack.
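A minimal sketch of what such a routing adapter could look like in Java; the names here (`IdentityStore`, `MigrationStateRepository`, `UserProfile`) are hypothetical placeholders, not Intuit's actual code:

```java
// Hypothetical anti-corruption-layer adapter: callers never know which stack
// served them; the adapter consults migration state and routes accordingly.
public class IdentityRoutingAdapter {
    private final IdentityStore legacyStore;   // legacy relational-backed stack
    private final IdentityStore modernStore;   // new DynamoDB-backed stack
    private final MigrationStateRepository migrationState;

    public IdentityRoutingAdapter(IdentityStore legacyStore,
                                  IdentityStore modernStore,
                                  MigrationStateRepository migrationState) {
        this.legacyStore = legacyStore;
        this.modernStore = modernStore;
        this.migrationState = migrationState;
    }

    /** Route a profile lookup to whichever stack currently owns the user's data. */
    public UserProfile getProfile(String userId) {
        IdentityStore target = migrationState.isMigrated(userId) ? modernStore : legacyStore;
        return target.getProfile(userId);
    }
}

interface IdentityStore { UserProfile getProfile(String userId); }
interface MigrationStateRepository { boolean isMigrated(String userId); }
record UserProfile(String userId, String email) {}
```

The key design point is that the routing decision lives in one place, so either stack can serve any API transparently while the migration is in flight.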
We also developed a Spring Batch based framework that takes the data from the legacy stack and migrates it to the new stack, and the backend database for that Spring Batch application was Aurora Serverless. We had built a pipeline from our legacy system to seed the data into Aurora Serverless, and I wish we had zero ETL there; we wouldn't have had to build that pipeline. Unfortunately it wasn't available yet, so we had to build an AWS Glue based pipeline to seed all that data into Aurora Serverless. The batch applications would then pick up the jobs and execute the migration.
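A hedged sketch of what a chunk-oriented Spring Batch job for this kind of migration might look like (Spring Batch 5 style; the reader, processor, and writer beans, and the `LegacyUser`/`ModernUser` types, are illustrative assumptions rather than the actual framework):

```java
import org.springframework.batch.core.Job;
import org.springframework.batch.core.Step;
import org.springframework.batch.core.job.builder.JobBuilder;
import org.springframework.batch.core.repository.JobRepository;
import org.springframework.batch.core.step.builder.StepBuilder;
import org.springframework.batch.item.ItemProcessor;
import org.springframework.batch.item.ItemReader;
import org.springframework.batch.item.ItemWriter;
import org.springframework.context.annotation.Bean;
import org.springframework.context.annotation.Configuration;
import org.springframework.transaction.PlatformTransactionManager;

@Configuration
public class UserMigrationJobConfig {

    @Bean
    public Job userMigrationJob(JobRepository jobRepository, Step migrateUsersStep) {
        return new JobBuilder("userMigrationJob", jobRepository)
                .start(migrateUsersStep)
                .build();
    }

    // Reader/processor/writer beans are assumed to be defined elsewhere.
    @Bean
    public Step migrateUsersStep(JobRepository jobRepository,
                                 PlatformTransactionManager txManager,
                                 ItemReader<LegacyUser> legacyUserReader,
                                 ItemProcessor<LegacyUser, ModernUser> userTransformer,
                                 ItemWriter<ModernUser> dynamoDbWriter) {
        return new StepBuilder("migrateUsersStep", jobRepository)
                .<LegacyUser, ModernUser>chunk(100, txManager)
                .reader(legacyUserReader)    // reads batches of users from the legacy store
                .processor(userTransformer)  // maps the legacy schema to the new data model
                .writer(dynamoDbWriter)      // writes to the new DynamoDB-backed stack
                .build();
    }
}

record LegacyUser(long id, String email) {}        // illustrative legacy shape
record ModernUser(String userId, String email) {}  // illustrative new shape
```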
Now, all of that migration data was sitting in Aurora Serverless, and as I said, we wanted to power our dashboards so that all the stakeholders could watch in real time what was happening with the migration, and we wanted to do all sorts of real-time analytics and ad hoc queries. If you look at that box from Aurora Serverless to Redshift, it's just a simple arrow; if zero ETL were not there, we would be looking at four or five more boxes in between to get the data from Aurora Serverless to the Redshift cluster.
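For reference, this is roughly what creating such a zero-ETL integration could look like with the AWS SDK for Java v2, assuming the RDS `CreateIntegration` API available in recent SDK versions; all ARNs and names below are placeholders:

```java
import software.amazon.awssdk.services.rds.RdsClient;
import software.amazon.awssdk.services.rds.model.CreateIntegrationRequest;
import software.amazon.awssdk.services.rds.model.CreateIntegrationResponse;

public class ZeroEtlIntegrationSetup {
    public static void main(String[] args) {
        // Placeholder ARNs for an Aurora MySQL cluster (source) and a
        // Redshift Serverless namespace (target).
        String sourceArn = "arn:aws:rds:us-east-1:123456789012:cluster:identity-migration-cluster";
        String targetArn = "arn:aws:redshift-serverless:us-east-1:123456789012:namespace/migration-analytics";

        try (RdsClient rds = RdsClient.create()) {
            CreateIntegrationResponse response = rds.createIntegration(
                    CreateIntegrationRequest.builder()
                            .integrationName("identity-migration-zero-etl")
                            .sourceArn(sourceArn)
                            .targetArn(targetArn)
                            .build());
            System.out.println(response);
        }
        // Once the integration is active, a database is created from it on the
        // Redshift side, e.g.:
        //   CREATE DATABASE migration_db FROM INTEGRATION '<integration-id>';
    }
}
```

Once the integration is active, changes land in Redshift without any intermediate Glue jobs, streams, or staging buckets, which is exactly the "simple arrow" in the diagram.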
Let's talk about the outcomes. What this has enabled for us: it has allowed all of our stakeholders to look at the data in real time and gain the confidence that the migration is going well. We've been able to identify patterns in the failures that are happening, through ad hoc queries or through our dashboards, and use that as a feedback loop to improve our overall migration process. At a high level, with the earlier approach we were looking at, it would have taken us four to five hours to get the data into Redshift, and our dashboards would be stale. With zero ETL we were instantly able to get all the data into the Redshift cluster, and we were able to do our analytics pretty much in real time. I would say that something that would have taken us at least a couple of months to develop, we were able to deliver in production in a couple of days with zero ETL. This was amazing, and we are a big Redshift shop, so this was just the first use case; I'm hoping that as we go along there will be many more use cases where we'll be able to leverage zero ETL to get data into Redshift and do a lot more real-time analytics. With this, I'm going to hand it over to Rakesh for a quick demo. Rakesh, all yours.
That was awesome, thank you Smith. Good afternoon everyone, I'm Rakesh and I'm part of the identity team at Intuit. Today I'm excited to share a demo of the migration process, which utilizes Aurora Serverless v2 with analytics backed by Redshift. We initially started with a complicated pipeline that replicated the data from Aurora to Redshift, but thanks to zero ETL it became very simple, and it reduced the replication lag from hours to seconds.

The setup is pretty simple: you just have to launch the source and the destination. In my case, the source is Aurora Serverless v2 and the destination is Redshift. I'll be running a couple of queries to get the counts on Aurora and on Redshift, and this is the dashboard built on top of Redshift. It has a few filters and shows the test user count.
It also has a bar chart indicating the various migration statuses, and a table that gives you a clear count of what was migrated. Let me go ahead and update the Aurora table. I'll be mocking the migration process: I'm going to select a few records from the table, and as you can see, the migration status for all of these records is "not migrated". I'll mock the migration so that, say, 10 records are updated to "migrated", and let me do the update. Now, say 10 more records fail and get marked as "skipped" for whatever reason. Once these records are updated, let me go back and select them. OK, the update is complete; let's go back to the dashboard built on Redshift. We anticipate the number of records for migrated users to increase by 10, the users still to be migrated to decrease by 10, and the skipped records to increase by 10.
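A sketch of what the mocked update and the verification query might look like over JDBC; the table and column names (`users`, `migration_status`), status values, and endpoints are guesses based on the narration, not the actual demo schema:

```java
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.Statement;

public class MigrationDemo {
    public static void main(String[] args) throws Exception {
        // Placeholder endpoints and credentials; the MySQL and Redshift JDBC
        // drivers are assumed to be on the classpath.
        String auroraUrl = "jdbc:mysql://identity-migration-cluster.cluster-xyz.us-east-1.rds.amazonaws.com:3306/identity";
        String redshiftUrl = "jdbc:redshift://migration-analytics.123456789012.us-east-1.redshift-serverless.amazonaws.com:5439/migration_db";

        // 1. Mock the migration on the Aurora MySQL side.
        try (Connection aurora = DriverManager.getConnection(auroraUrl, "admin", "secret");
             Statement stmt = aurora.createStatement()) {
            stmt.executeUpdate(
                "UPDATE users SET migration_status = 'MIGRATED' " +
                "WHERE migration_status = 'NOT_MIGRATED' LIMIT 10");
            stmt.executeUpdate(
                "UPDATE users SET migration_status = 'SKIPPED' " +
                "WHERE migration_status = 'NOT_MIGRATED' LIMIT 10");
        }

        // 2. Read the replicated counts on the Redshift side; with zero ETL the
        //    change should show up within seconds, with no pipeline in between.
        try (Connection redshift = DriverManager.getConnection(redshiftUrl, "admin", "secret");
             Statement stmt = redshift.createStatement();
             ResultSet rs = stmt.executeQuery(
                 "SELECT migration_status, COUNT(*) FROM users GROUP BY migration_status")) {
            while (rs.next()) {
                System.out.printf("%s: %d%n", rs.getString(1), rs.getLong(2));
            }
        }
    }
}
```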
Let me go ahead and refresh the dashboard. Wow, this is amazing: the counts got updated. The update I did was on Aurora MySQL, but it got reflected on the Redshift dashboard. As a business unit owner, this gives me a lot of confidence in the data; I don't need to go back to an engineer and ask whether the data is accurate. And as an engineer it is invaluable to me: I can go in directly, dig in, find out what caused a failure, and go ahead and rectify it. This is also a win-win situation: the source and the destination are both serverless, so they can scale efficiently, and to top it off, the integration between them is near real time. It has saved a lot of engineering effort, and not only that, it has also saved a lot of cost in maintaining the pipeline. So this concludes the demo.
