
Tinybird Screencast - Stream data with the Tinybird Events API

Learn how to use the Tinybird Events API, a high-frequency ingestion endpoint that lets you stream up to 1,000 requests per second and up to 20 MB/s to a managed Tinybird Data Source with a simple HTTP request. The Tinybird Events API is one of the simplest ways to stream data to Tinybird, so much so that some Tinybird customers have moved their streaming from Kafka to the Events API to reduce costs and complexity. To learn more about the Events API, check out the docs:
https://www.tinybird.co/docs/guides/ingest-from-the-events-api.html
https://www.tinybird.co/docs/api-reference/events-api.html

Tinybird

1 year ago

The Tinybird Events API is a high-frequency ingestion endpoint that makes it very easy to stream events data to Tinybird with just an HTTP request. You can instrument your product or website with a few lines of code and send that data to a Tinybird Data Source. In fact, it's so powerful and easy to use that some Tinybird customers have replaced things like Kafka with the Tinybird Events API for certain use cases. In this screencast, I'll show you how to stream data to Tinybird using the Events API.

Let's start by sending a single event to the Events API with a simple curl. Before we start: the Tinybird Workspace we're using for this screencast is in the US East region, so we'll use that region's host for the request, and I've saved my Tinybird token for that Workspace in an environment variable. The curl itself is a POST request to the Events endpoint on the US East host, specifying the name of the target Data Source and passing that auth token, along with some simple JSON data containing a timestamp, an href, and an event.
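For reference, here's a sketch of what that curl looks like, assuming the US East host and a `TB_TOKEN` environment variable as described in the narration; the event field values are placeholders:

```bash
# Send a single JSON event to the Events API (US East host assumed).
curl \
  -X POST 'https://api.us-east.tinybird.co/v0/events?name=simple_events' \
  -H "Authorization: Bearer $TB_TOKEN" \
  -d '{"timestamp": "2024-06-01 12:00:00", "href": "/docs", "event": "pageview"}'
```

The response is a small JSON body reporting how many rows were successfully ingested and how many were quarantined.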
When we run that command, we get a response indicating that we've successfully sent one row of data. When you make a valid request to the Events API with a name for a Data Source that doesn't already exist, Tinybird will automatically create that Data Source and guess the schema based on the data you send it. So if we pull the resources down from the server, we'll see that a Data Source called `simple_events` was created, and if we run `tb sql "select * from simple_events"`, we get back the data we just sent via the Events API.
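In shell terms, that check looks roughly like this, assuming an authenticated Tinybird CLI:

```bash
# Pull the remote resources into the local project, then query the
# auto-created Data Source.
tb pull
tb sql "select * from simple_events"
```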
Okay, now for the fun part: let's start streaming. Out of the box, you can send up to 1,000 requests per second and up to 20 MB per second to the Events API. Let's see how close we can get to those limits.

Here's a bit of Python code. It fills a buffer of events and flushes that buffer to the Events API when it reaches a specified sample size; when we stop the process, it prints some stats about the throughput of the requests to the endpoint. The important part is how the Events API is called: we construct some newline-delimited JSON from the events buffer and issue a very simple POST request to the Events API, passing parameters that include the name of the Data Source, an auth token that allows us to append data to that Data Source, and a `wait` parameter, which I'll talk about in a bit. This is Python, but you can see how easy this would be anywhere you can make an HTTP request; even a webhook would work.
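Here's a minimal sketch of that pattern, not the exact script from the video; the host, token variable, event fields, and use of the `requests` library are assumptions:

```python
# A minimal sketch of the streaming script from the screencast: buffer fake
# events, flush them to the Events API as newline-delimited JSON, and print
# throughput stats on exit. Assumes the US East host and a TB_TOKEN env var.
import json
import os
import random
import time

import requests

TB_HOST = "https://api.us-east.tinybird.co"
TB_TOKEN = os.environ["TB_TOKEN"]
DATA_SOURCE = "simple_events"
SAMPLE_SIZE = 100  # flush the buffer after this many events


def fake_event():
    """Build a fake pageview-style event (placeholder fields)."""
    return {
        "timestamp": time.strftime("%Y-%m-%d %H:%M:%S"),
        "href": random.choice(["/", "/docs", "/pricing"]),
        "event": "pageview",
    }


def flush(buffer):
    """Send the buffered events to the Events API as newline-delimited JSON."""
    payload = "\n".join(json.dumps(event) for event in buffer)
    requests.post(
        f"{TB_HOST}/v0/events",
        params={"name": DATA_SOURCE, "token": TB_TOKEN, "wait": "false"},
        data=payload.encode("utf-8"),
    )
    return len(payload)


if __name__ == "__main__":
    buffer, sent_requests, sent_bytes = [], 0, 0
    started = time.monotonic()
    try:
        while True:
            buffer.append(fake_event())
            if len(buffer) >= SAMPLE_SIZE:
                sent_bytes += flush(buffer)
                sent_requests += 1
                buffer.clear()
    except KeyboardInterrupt:
        # Print throughput stats when the process is stopped, as in the video.
        elapsed = time.monotonic() - started
        print(f"\n{sent_requests / elapsed:.1f} req/s, "
              f"{sent_bytes / elapsed / 1024:.1f} kB/s")
```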
Now let's run this script, generating 100 events at a time and then flushing the buffer with a request to the Events API. As it runs, we can go back to our shell and run `tb sql "select count() from simple_events"` a few times to watch the data being written to the Data Source as we make those requests. When I stop the script, it prints the throughput stats: we were sending about 5 requests per second and about 50 kB per second, so at these rates we're not getting anywhere near the rate limits of the Events API. We could raise the throughput by adding more events per request, but since this script isn't threaded, it's not going to get close to the request limit. Note that if you do need more than the allowed 1,000 requests per second or 20 MB per second, you can always get in touch with the Tinybird Customer Success team and ask about reserved capacity.
One last thing I want to show you with the Events API is that `wait` parameter. You'll notice we get a 202 response when we send data to the Events API with `wait` at its default value of `False`. A 202 means that Tinybird has successfully received and processed the request, but hasn't yet confirmed that the data has been written to the Data Source. If you want that confirmation, set the `wait` parameter to `True`. I'm using the `click` library to expose this as a command line flag, but all it does is send `wait=True` as part of the parameters in the request to the Events API.
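As a sketch, that flag might be wired up like this; the host, Data Source name, and `TB_TOKEN` variable follow the assumptions from the earlier snippet:

```python
# A sketch of exposing `wait` as a CLI flag with click, as described in the
# screencast; this is not the exact script shown on screen.
import json
import os
import time

import click
import requests


@click.command()
@click.option("--wait", is_flag=True, default=False,
              help="Ask Tinybird to confirm the write before responding.")
def send(wait):
    event = {"timestamp": time.strftime("%Y-%m-%d %H:%M:%S"),
             "href": "/docs", "event": "pageview"}
    resp = requests.post(
        "https://api.us-east.tinybird.co/v0/events",
        params={"name": "simple_events",
                "token": os.environ["TB_TOKEN"],
                "wait": "true" if wait else "false"},
        data=json.dumps(event),
    )
    # Expect a 202 without --wait (received) and a 200 with --wait (written).
    print(resp.status_code, resp.text)


if __name__ == "__main__":
    send()
```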
Waiting slows down throughput a bit, but it's useful if you want acknowledgement that the data was actually written to the database. If you use the `wait` parameter, you'll get a 200 response code instead of a 202, indicating that the data was successfully processed and stored. And that's how you stream data to Tinybird using the Events API. If you want a little more detail, you can always check out the docs on the Events API; I've linked to them below. I'll see you later.

Comments

@ticosnet

Thanks a lot. Is there a JavaScript version of this?