<?xml version="1.0" encoding="UTF-8"?><rss version="2.0" xmlns:content="http://purl.org/rss/1.0/modules/content/"><channel><title>Pranshu Raj - blog on backend systems, performance and sidequests</title><description>Deep dives into backend systems, databases, failure modes, performance and a whole lot of sidequests.</description><link>https://blog.pranshu-raj.in/</link><item><title>How I use AI</title><link>https://blog.pranshu-raj.in/posts/how-i-use-ai/</link><guid isPermaLink="true">https://blog.pranshu-raj.in/posts/how-i-use-ai/</guid><description>What AI use has improved for me, what I avoid doing and things that have worked best for me so far.</description><pubDate>Tue, 27 Jan 2026 19:14:20 GMT</pubDate><content:encoded>I&apos;ve been using AI ever since LLMs exploded in popularity with ChatGPT&apos;s launch (around December 2022), for various purposes - help with classwork, one-off scripts, and search.

Over the past year AI tools have become a part of my daily workflow, helping me iterate faster, find blindspots, and learn a lot more efficiently.

This post covers how I use AI, where I avoid it, and what makes it effective in my everyday work.

![code review - ticker bug](@/assets/images/how-i-use-ai/ticker-bug.png)

## What I use AI for
### Search 
Very useful for grokking unfamiliar code. I find [Deepwiki](https://deepwiki.com/) very good on open-source repos, and I use agents in my IDE (currently Copilot) for the same purpose - to learn which parts of the code are relevant and reduce the amount of code I actually have to look through.

This greatly reduces the cognitive load of entering an existing project - you form mental models faster.

Also pretty good for one off questions that you might have, such as this [query I had on Redis TTL](https://deepwiki.com/search/how-does-redis-implement-ttl_9b05c051-41c0-49fe-b8be-69d400b5609b?mode=deep).

### Code review
This is one area I haven&apos;t explored much, though it looks promising. The signal-to-noise ratio requires careful filtering, but when it works, it catches issues I would not have noticed quickly.

I&apos;ve had good results the few times I&apos;ve used it recently - for example, this [concurrency bug it found](https://github.com/pranshu-raj-211/leaderboard/pull/11#issuecomment-3677673775) on my leaderboard project, or this [issue with tickers](https://github.com/pranshu-raj-211/leaderboard/pull/19#discussion_r2648481150) which I had missed.

This helps especially when you do not have a huge depth of knowledge in the tech you&apos;re working with; if the ecosystem is mature (like Go&apos;s), automated code reviews can surface deeper issues like these.

### Reducing repetitive work
Low-stakes work where you know what needs to be done and the task is common: initial Dockerfiles, CI, frontend (I don&apos;t know any frontend), spinning up POCs to understand integrations.

In fact, all the design changes I&apos;ve made to this blog that differentiate it from the default astro-paper theme were implemented by AI. These are quite simple modifications, but they would have taken me a long time to learn and do myself.

### Pair programming
I use this mostly to learn while doing, where I do not ask for implementations (I like to think of it as the LLM prompting you, the user).

I&apos;ve used this to refactor tightly coupled code, add tests to a project that started off without any, and work through unfamiliar frameworks or languages (Wes McKinney [talks about this](https://wesmckinney.com/blog/agent-ergonomics/)).

---

## What I don&apos;t use it for
Architecture decisions, complex or sensitive logic, and questions about the product - these should be handled by humans, and well documented so the AI can assist with implementation.

I avoid using AI here because of the limitations I&apos;ve seen agents show on these sorts of tasks; having to redo everything from scratch is not pleasant. If you don&apos;t know what you&apos;re doing and get stuck, it&apos;s better to ask another person.

I&apos;ve learned these boundaries through experience with what goes wrong when AI handles these tasks:

- [Hallucinated APIs during integrations](https://github.com/pranshu-raj-211/score_profiles): even when pointed to the correct documentation, AI sometimes invents methods or parameters that don&apos;t exist, especially for newer or less common libraries
- Adding unnecessary abstractions or functions that will never be called
- Generating verbose code with defensive checks (multiple error checks) that obscure the actual logic

Having to review and redo everything from scratch negates any speed benefits - there&apos;s no point being fast if you&apos;re wrong.

### Tests
A lot of people (especially on Twitter) say that AI is really good at writing tests, which is something I&apos;ve never experienced. Every time I&apos;ve used AI to write tests it has ended badly, with it focusing on pointless details and missing the point. In one instance I tried to make it generate a property-based test (PBT) for a simple function, to learn more about the technique without having to read a long article. I got some unit tests that were not PBT at all.

---

## Workflow
Before implementation
- Identify the correct thing to work on
- Break it down into chunks; write about the components, how they need to be built, design preferences - everything (keep it coherent, and add indexes and summaries to prevent the agent from getting lost)
- Start with questions to identify gaps

Implementation
- Iterate fast - agents implement, developers review, write and run tests (faster feedback cycles)
- Update and expand on spec as you go (to ensure we stay on track)
- Ask questions where something is done in a weird way (can be a good learning experience)

Post implementation
- Run tests to see if changes violate expected behaviour
- If something breaks, it&apos;s often better to hand it to another model if the initial one can&apos;t figure it out (I prefer Claude)

## Tools I use
I currently use GitHub Copilot (free through my student account) integrated with VSCode, and Claude and Gemini web interface for more complex tasks requiring careful context management.

I&apos;ve experimented with other tools - Cursor, soon after launch, had significant hallucination issues that pushed me toward the web-chat approach. I&apos;m planning to try Cline and Windsurf next, and I&apos;ve heard promising things about Claude Code, though I haven&apos;t used it yet due to the cost.


## Context and the web chat hack
I do use coding agents integrated with an IDE (Copilot with VSCode is the only one I have right now). Until very recently these were a terrible experience for me, frequently giving grossly incorrect code that I&apos;d have to redo from scratch (which is why I prefer spec-driven development over one-off prompts when using integrated agents).

I found a hack that often works around this limitation of one-off prompts (it&apos;s all about context).

I believe coding agents are sometimes terrible at choosing the correct context even when you tell them explicitly which files to use (you&apos;ve probably noticed them reading multiple unrelated files to make a change).

Instead, I copy the parts of the code I know are relevant, write a prompt that explains exactly what I want to do and any design rules it should conform to, and paste it into Claude or another web-based chat. This has worked well for many tasks where integrated coding agents would have bloated the context.

## The human part
I strongly believe that there&apos;s no replacing humans in software engineering, and the best way to make use of this tech is as a tool to multiply output.

However, output is not impact: humans are still needed to figure out where things should be going, and will be valued for their taste and judgement going forward. Humans should stay in the loop (especially for any and all code that goes to production), and with humans + AI we&apos;ll be able to achieve a lot more (multiples, not incremental changes) than before.


### So, is this a good thing?
Yes.

Personally, apart from everything mentioned above, it allows iterating on a lot more ideas than before. A lot of my posts contain scripts that were generated by AI, iterated upon, then refined by hand to remove the fluff and keep exactly what&apos;s needed. I&apos;m also using it to assist in writing the code for upcoming sidequests on [TTL](https://blog.pranshu-raj.in/posts/ttl) and [durable execution](https://blog.pranshu-raj.in/posts/durable-execution).

I&apos;d love to not think about syntax too much when building things, and get more of my ideas out into the world. I really enjoy the process of building, and syntax significantly slows it down, so any improvement there helps me get things done while the idea is still fresh in my mind.

On the other hand, I also like to implement things by hand to learn - scaling, performance improvement, and memory are some of the things I prefer to do without any AI assistance.

I do like to discuss difficult learning topics with Gemini though, and make it critique the design of my systems (something I&apos;m not very good at) in a way that keeps it from being sycophantic. Sycophancy is one limitation of LLMs that, if overcome, could improve things a lot: you could get critique for your ideas anytime, using the knowledge the LLM was trained on to spot flaws early.

## What about prod?
Production code should still be thoroughly reviewed, and any generated code should be based on specific details, not vibes. Automated testing helps, but tests alone aren&apos;t enough; understanding both the generated code and the existing codebase is essential when using these tools where code affects live software.

## Other blogs I liked on this topic
- https://iximiuz.com/en/posts/grounded-take-on-agentic-coding/
- https://simonwillison.net/2025/Mar/11/using-llms-for-code/
- https://addyosmani.com/blog/ai-coding-workflow/
- https://github.blog/ai-and-ml/generative-ai/spec-driven-development-with-ai-get-started-with-a-new-open-source-toolkit/
- https://antirez.com/news/158
- https://antirez.com/news/155
- https://antirez.com/news/154
- https://antirez.com/news/153</content:encoded></item><item><title>Breaking the 28k SSE connection barrier</title><link>https://blog.pranshu-raj.in/posts/scaling-sse-1m-connections/</link><guid isPermaLink="true">https://blog.pranshu-raj.in/posts/scaling-sse-1m-connections/</guid><description>How I figured out the issue that was limiting my leaderboard to 28232 SSE connections, fixed it and a framework to reach 1 million.</description><pubDate>Sat, 13 Dec 2025 19:10:04 GMT</pubDate><content:encoded>**TLDR:**
Built a real-time leaderboard that streams updates to users over SSE (Server-Sent Events). Load tests (Go scripts) hit a limit of 28,232 concurrent connections on Linux (a very peculiar number). This post explains what was blocking this, the path to go beyond it, and a framework to scale to potentially millions of connections.


&lt;div class=&quot;series-box&quot;&gt;

&lt;p&gt;This post is part of a series on my &lt;a href=&quot;https://github.com/pranshu-raj-211/leaderboard&quot;&gt;real-time leaderboard&lt;/a&gt; project&lt;/p&gt;

&lt;ul&gt;
  &lt;li&gt;&lt;a href=&quot;/posts/optimizing-docker-builds/&quot;&gt; &lt;b&gt;Optimizing Docker Image builds&lt;/b&gt;&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;/posts/scaling-sse-1m-connections/&quot;&gt;Scaling SSE to 150k connections ( ← you are here)&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;/posts/backpressure/&quot;&gt;&lt;b&gt;Backpressure in Distributed Systems&lt;/b&gt;&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;/posts/implementing-correct-fanout/&quot;&gt;&lt;b&gt;Fixing fanout, and other issues&lt;/b&gt;&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;/posts/exploring-sse/&quot;&gt;&lt;b&gt;Introduction to Server Sent Events(SSE)&lt;/b&gt;&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;Replacing Redis sorted sets (coming soon)&lt;/li&gt;
  &lt;li&gt;Reproducible Grafana setup (coming soon)&lt;/li&gt;
  &lt;li&gt;TCP stack tuning (coming soon)&lt;/li&gt;
&lt;/ul&gt;

&lt;/div&gt;


---

### Acknowledgements

I&apos;d like to thank [Tushar Tripathi](https://www.linkedin.com/in/ACoAAAsFkkQBRjsCxKyuMzJl_BTAwOWMAEn8cFM/?lipi=urn%3Ali%3Apage%3Ad_flagship3_feed%3B7Wg6l7mNSFyslVwyc7s0bQ%3D%3D) for pointing me to resources explaining how others achieved similar results (on websockets). And to my friends, [Ritu Raj](https://www.linkedin.com/in/rituraj12797/) and [Chahat Sagar](https://www.linkedin.com/in/chahat-sagar-aaab67222/) for listening to my ramblings on this topic for days on end.

---

## System overview
The leaderboard service (a Go HTTP server) is being tested for the maximum number of connections it can sustain, specifically the SSE (Server Sent Events) endpoint exposed by it. 

Each client connection is handled by a goroutine, and each goroutine receives updates through its own dedicated channel from a single broadcaster goroutine. The broadcaster continuously polls Redis for leaderboard updates and fans them out to all connected clients.

![System overview](@/assets/images/scaling/leaderboard_system.png)
&lt;p align=&quot;center&quot;&gt;&lt;u&gt;System overview&lt;/u&gt;&lt;/p&gt;


The SSE implementation I&apos;m using here is stateless: clients do not need to replay past events if they disconnect and rejoin, they only care about the latest state (historical data is accessed through another endpoint backed by a Postgres database storing aggregated data).

Prometheus and Grafana are used for monitoring the setup, to visually observe resource usage and other metrics.

The client is a script that opens up tens of thousands of SSE connections to the `/stream-leaderboard` endpoint. Initially it was being run on the native OS, without a container for the client.

All of the server components run via Docker Compose. The client and the server are on the same laptop (but the server components are inside Docker containers, which separates them from the client).

This benchmark does not measure
- Game servers submitting results (a more realistic benchmark)
- Network spikes (I know from experience that this server is terrible at that)

---

## The 28,232 connection limit

Initial testing was done through a Go script that opens a bunch of persistent HTTP connections to the backend server, based on the testing script used in [this talk](https://www.youtube.com/watch?v=LI1YTFMi8W4).

This got me to 15,400 concurrent connections on Windows, and 28,232 on Linux. All tests run the server via Docker Compose with the client script running natively.

![Grafana dashboard showing 28k active SSE connections](@/assets/images/scaling/28k_conns.png)
&lt;p align=&quot;center&quot;&gt;&lt;u&gt;28k concurrent SSE connections reached&lt;/u&gt;&lt;/p&gt;

This number was peculiar, and I got curious about why it was happening; I believed it should have scaled to hundreds of thousands more connections, since there were no issues with memory, CPU or network resources.

&lt;details&gt;
&lt;summary&gt;&lt;p align=&quot;center&quot;&gt;&lt;u&gt;&lt;b&gt;Client script&lt;/b&gt;&lt;/u&gt;&lt;/p&gt;&lt;/summary&gt;

```go
package main

import (
	&quot;flag&quot;
	&quot;fmt&quot;
	&quot;io&quot;
	&quot;log&quot;
	&quot;net/http&quot;
	&quot;os&quot;
	&quot;time&quot;
)

var (
	ip          = flag.String(&quot;ip&quot;, &quot;127.0.0.1&quot;, &quot;Server IP&quot;)
	connections = flag.Int(&quot;conn&quot;, 10000, &quot;Number of SSE connections&quot;)
)

func main() {
	flag.Usage = func() {
		io.WriteString(os.Stderr, `SSE client generator
Example usage: ./client -ip=127.0.0.1 -conn=10000
`)
		flag.PrintDefaults()
	}
	flag.Parse()

	url := fmt.Sprintf(&quot;http://%s:8080/stream-leaderboard&quot;, *ip)
	var conns []*http.Response

	for i := 0; i &lt; *connections; i++ {
		resp, err := http.Get(url)
		if err != nil {
			log.Printf(&quot;conn %d failed: %v&quot;, i, err)
			break
		}
		// bodies are intentionally never closed: each connection must stay open
		conns = append(conns, resp)
		log.Printf(&quot;Connection %d established&quot;, i)
	}

	// keep alive
	for {
		time.Sleep(30 * time.Second)
	}
}
```

&lt;/details&gt;

### The bottleneck

A quick Google search showed that the issue was ephemeral port exhaustion, something I was encountering for the first time.

The core idea is - whenever a client tries to connect to a server, it needs the destination IP and port. The operating system then selects a source IP and port to form the 4-tuple required to establish a connection.

A TCP connection is uniquely identified by a 4-tuple:
`(src_ip, src_port, dst_ip, dst_port)`

The source port is chosen from a range of available ports configured for outgoing connections, known as the ephemeral port range.

On Linux this can be viewed by this command:&lt;br&gt;
`sysctl net.ipv4.ip_local_port_range`

Which on my machine gives the output `net.ipv4.ip_local_port_range = 32768    60999`

This means that a client is assigned an outgoing port from 32768 to 60999, which is exactly 28,232 ports (60999 - 32768 + 1 = 28232). Windows has a smaller default range, maxing out around 15k. This is a client-side limit, not a bottleneck on my server.
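As a quick sanity check on that arithmetic (using the range values from my machine):

```bash
# both ends of the range are inclusive
echo $((60999 - 32768 + 1))   # prints 28232
```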

However, this is not a hard limit.

### Fixing ephemeral port exhaustion
Linux allows you to expand the usable range of ephemeral ports by modifying:
`/proc/sys/net/ipv4/ip_local_port_range`

And the sockets (and therefore source ports) can be reused between multiple connections under certain conditions, which [this Cloudflare blog explores](https://blog.cloudflare.com/how-to-stop-running-out-of-ephemeral-ports-and-start-to-love-long-lived-connections/).
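For reference, widening the range is a single privileged command - shown here as a sketch with hypothetical bounds (it also resets on reboot unless persisted in `/etc/sysctl.conf`):

```bash
sudo sysctl -w net.ipv4.ip_local_port_range=&quot;1024 65000&quot;
```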

But I didn&apos;t apply these changes on my machine, since I did not want to risk breaking something on the **only machine that I have**.

---

## The TCP bottleneck
Even if ephemeral port exhaustion is addressed using the techniques above, there are further issues that will prevent scaling much beyond that.

The main issue here would be how a TCP socket works.

A socket connection is a 4-tuple consisting of 32-bit IP addresses (source and destination) and 16-bit port numbers. With the destination IP and port fixed, only the 16-bit source port varies, which limits a single source IP to 65,536 connections (2^16) to that destination.

![TCP socket 4 tuple - 16 bit for ports (destination and source each), 32 bit for ip (destination and source each)](@/assets/images/scaling/tcp_4_tuple.png)
&lt;p align=&quot;center&quot;&gt;&lt;u&gt;TCP socket 4 tuple&lt;/u&gt;&lt;/p&gt;

This was something I first read in [this blog post describing the scaling to 2 million websocket connections with the Phoenix (Elixir) framework](https://phoenixframework.org/blog/the-road-to-2-million-websocket-connections).

This part clicked instantly: the solution would be giving clients different source IPs, each bringing its own port space, so the total could exceed what a single client IP can reach.

### Scaling clients
I first thought of creating clients on different machines and using that to test out this idea I had, but since I don&apos;t have multiple devices I decided to find another solution.

So I dockerized the client script and wrote a bash script to create multiple client containers inside a specific Docker network (the one created for my server). Each client can be configured to open a specific number of connections.

This method led me to scale the setup to 150,000 concurrent SSE connections on a single laptop (8 GB RAM) running Fedora Linux.


![Grafana dashboard showing 150k connections (nearly - crashed right after this)](@/assets/images/scaling/150kconns.jpeg)
&lt;p align=&quot;center&quot;&gt;&lt;u&gt;Reached 150k connections (almost)&lt;/u&gt;&lt;/p&gt;

Grafana crashes at this point.

---

## Docker networking
&lt;details&gt;
&lt;summary&gt;&lt;p align=&quot;center&quot;&gt;&lt;u&gt;&lt;b&gt;Click to expand section&lt;/b&gt;&lt;/u&gt;&lt;/p&gt;&lt;/summary&gt;

To exceed the 65k connection limit when using a single source IP, I&apos;m using Docker networking, which solves this problem in a creative way.

When a container connects to a Docker network, it is assigned an IP address (IPv4 by default), and it can communicate with other containers and external services through the host as a gateway (depending on the type of network), without knowing whether the device it is interacting with is a real machine or just another container on the same host.

Each container gets its own IP address, virtual network interface, and DNS-based service discovery. Containers only see IP addresses, the routing table and the gateways; the network interface abstracts away the rest.


The server compose setup already uses a user defined bridge network:
```yaml
networks:
  internal:
```

This was done to fix issues with networking faced when connecting services to the server (Redis mostly).

Every container connecting to this `leaderboard_internal` network will be automatically assigned an IPv4 address. So something like this works:
```bash
docker run -d --name client1 --network=leaderboard_internal sseclient:0.2 -ip=leaderboard -conn=20000
```

This container (client1) attaches to the `leaderboard_internal` network; the `-ip` param sets the server to connect to (the hostname `leaderboard`, resolved by Docker&apos;s DNS); the `-conn` param sets the number of concurrent SSE connections the container is required to maintain. `sseclient` is just the image name.

Creating multiple containers is easy this way: just change the container name. Since this is a repetitive task, I created a simple bash script to do it for me. To avoid the server crashing due to network spikes (a problem I haven&apos;t solved yet), small delays are added.

```bash
#!/usr/bin/env bash

CLIENTS=5
CONNS_PER_CLIENT=20000

for i in $(seq 1 $CLIENTS); do
  docker run -d --name client$i --network=leaderboard_internal sseclient:0.2 -ip=leaderboard -conn=$CONNS_PER_CLIENT
  sleep 2
done
```

This is how a system can escape the single-IP TCP ceiling and go beyond 65k connections, simulating (potentially) millions of connections.

The bottleneck now becomes memory, something that can be solved through vertical scaling (more RAM), which is hard to do these days due to the soaring prices.

A workaround is horizontal scaling - using Wi-Fi to connect multiple devices, each running several of these client containers. Since the containers do not know whether they are connecting to external services or to other containers, the process is largely the same; only a trivial config change for the server&apos;s IP address may be needed.

&lt;/details&gt;

---

## What&apos;s stopping me from going further
OOM at 150k.

Since this testing setup had both client and server containers on a single machine, moving the client containers to a different machine would greatly increase the number of connections that could be made.

Further easy gains could come from replacing Redis with an in-memory sorted set data structure, or removing the Grafana container (at the cost of the visual representation). More difficult updates would be memory optimizations, and profiling to see how the goroutines could be optimized for this.

&gt;Note: The testing setup does not currently cover the other part of the system, the game servers submitting game results that are broadcast to the clients, which would definitely increase CPU and network usage, and a bit of memory too.

## References
- [Phoenix framework - road to 2 million websocket connections](https://phoenixframework.org/blog/the-road-to-2-million-websocket-connections)
- [Cloudflare - How to stop running out of ephemeral ports and start to love long-lived connections](https://blog.cloudflare.com/how-to-stop-running-out-of-ephemeral-ports-and-start-to-love-long-lived-connections/)
- [Docker networking overview](https://docs.docker.com/engine/network/)
- [Docker - Bridge network driver](https://docs.docker.com/engine/network/drivers/bridge/)
- [Port publishing](https://docs.docker.com/engine/network/port-publishing/)
- [SO_REUSEPORT](https://lwn.net/Articles/542629/)

---

There are a few experiments I&apos;m looking forward to running on this system: horizontal scaling (to 1M connections), replacing Redis with a native sorted set implementation, profiling, realistic benchmarking (with game servers submitting results), and preventing crashes due to request spikes (an annoying issue I have not figured out to date). Trying the ephemeral port range expansion is also interesting, maybe some other time though.

If you&apos;ve got any ideas, or interesting plans for this, reach out to me through [email](mailto:pranshuraj65536@gmail.com), [twitter](https://x.com/seigino99707047), or just submit an idea to the [repo](https://github.com/pranshu-raj-211/leaderboard).</content:encoded></item><item><title>Building a scalable real time leaderboard</title><link>https://blog.pranshu-raj.in/posts/leaderboard/</link><guid isPermaLink="true">https://blog.pranshu-raj.in/posts/leaderboard/</guid><description>Describing my journey of building the leaderboard, what I learnt, what issues I faced, and some design decisions (which will be elaborated on later).</description><pubDate>Fri, 14 Nov 2025 17:12:32 GMT</pubDate><content:encoded>WIP, please come back later


**TL;DR**

I built a real-time tournament leaderboard system using Go, Redis Sorted Sets, and Server-Sent Events. It streams live updates to clients and can handle hundreds of concurrent connections and game submissions per second. This project helped me learn Go, experiment with observability using Prometheus + Grafana, and optimize a system for high concurrency. [Repo](https://github.com/pranshu-raj-211/leaderboard)

---

Some time ago I started working on a real-time leaderboard and got to learn a lot from building it. This post discusses many of the details: how I improved it, and what I learnt while building.


## Motivation
&lt;details&gt;
&lt;summary&gt;Click to expand&lt;/summary&gt;

This project started from a conversation with my friend [Ritu Raj](https://github.com/rituraj12797) about game servers in MMORPGs. I tried to dig deeper, could not find any good resources, and decided to look into actual MMORPG code. Since there weren&apos;t any good open-source examples, I settled for the next closest thing - Lichess.

Lichess is an online chess platform that handles ~100,000 games at any time of day. It&apos;s completely free to play (run by a non-commercial org), all the code is open source, and the community is pretty nice too. I&apos;ve used it for years (mostly for puzzles), and after some digging into the code and talking to the people who made it, I realized it&apos;s a huge system and a bit too complex to replicate at my current skill level.

So I started with recreating a component of this website, the real time leaderboard that&apos;s used in the tournaments.
&lt;/details&gt;

## The problem
Real time leaderboards are read heavy applications, but the write throughput and latency need to be optimized too. There&apos;s high frequency updates (especially in MMORPG games), concurrent reads and writes and a need for low latency (although this requirement is nowhere close to some other stuff I&apos;ve built).

I&apos;ve been an avid user of Python, and I&apos;d like to think I&apos;m somewhat decent at it (though I do doubt that sometimes). I did not believe that this would be the right language for these requirements, so I looked for a tool that&apos;d provide me the necessary performance without having a steep learning curve, which led me to Go.


## Architecture
### Core components
- Application layer - a simple http server
- Data layer - Redis for sorted set operations (eventually replaced by Go implementation)
- Real time communication - the most interesting technical aspect, based on server sent events (SSE)

### APIs
APIs intended for game server use:
- POST /submit-game: Endpoint for game servers to publish results of a game, with a predefined schema in order to update scores.

APIs intended for end user use:
- GET /stream-leaderboard
- GET /leaderboard
- GET /player/:id/stats

APIs intended for internal use
- GET /metrics

These provide all the necessary interactions with outside components, but the main thing to look at is the streaming API, which enables real-time updates. Most of the discussion will be about it, since it&apos;s expected to be the most frequently used access pattern by a huge margin in these kinds of systems, and optimizing the critical path matters most.


### Read path


### Write path


### Why this tech stack?
Since one of the primary motivations for this project was to learn concurrency and observability, I decided to go with a tech stack I&apos;m not familiar with.

- Go - It&apos;s fast, great support for concurrency, has strong tooling, and is well-suited for systems programming (plus good benchmarking tools).
- Sorted Sets - Natural data structure for this use case. Adding, updating and deleting entries all cost O(log N) time, and range queries are supported (needed to get the top k players quickly).
- Redis - Has a great sorted set implementation already. Is also incredibly fast, although I&apos;d love to see how it compares to a sorted set implementation in native Go.
- Gin - Lightweight web framework that keeps things simple while giving useful abstractions like middleware and routing.
- Server-Sent-Events - Needed a way to stream updates unidirectionally. Polling and websockets were my other options but these are said to be more resource intensive for this use case.
- Prometheus + Grafana - To monitor system behavior and visualize metrics like request rates, Redis latency, memory usage, etc.
- Zap (Uber) - For structured logging, to make logs easier to search and filter later.
- Docker (and Compose) - Containerization, spinning up the whole system with simple commands, and help with reproducible dashboards. The later testing setup used Docker networking for multiple client containers (which gives each container a different IP).

## Concurrency


## Observability
Initially started as a way to monitor performance.

Zap (Uber) is used for logs, Prometheus for metrics, and Grafana as the dashboard for Prometheus metrics.

Some examples of the metrics used:
- Counter ():
- Gauge ():
- Histogram:

One thing that bugged me while testing this app was having to recreate the Grafana dashboard every time I started it up.

Usual methods for dashboard persistence use Helm charts and something called Jsonnet, which felt far too complex for the thing I&apos;m building.

Searching for options led me to a simple way of provisioning reproducible dashboards with just a JSON file and a change to the Docker Compose setup, which I&apos;m going to document soon at [pranshu-raj.in/reproducible-grafana-json](pranshu-raj.in/reproducible-grafana-json)


## Performance


## Benchmarking setup
Initially I used a simple Go script that opened multiple SSE clients, which got up to about 800 connections. This felt incredibly odd, since I wasn&apos;t seeing any clear reason for failure (all system indicators were normal). I tried improving memory usage, which at this point was about 7-10 goroutines per connection (don&apos;t ask - I don&apos;t know why this happened), but things didn&apos;t improve even after lots of tweaks to memory and CPU usage.

Eventually I decided to try a different testing method and checked out common load-test tools (k6, vegeta, wrk), but none had SSE-specific support. I went back to one of the earliest inspirations for this project - Eran Yanay&apos;s 1 million websocket connections repository - adapted his test script for SSE, and got to `15,400` connections on Windows. This was still not what I wanted, but a huge win nevertheless.

Still, this didn&apos;t show the memory or CPU utilization issues that might have explained the ceiling, so I decided to dig around. It turns out that Windows has a limit on outbound ports (something I later learned also exists on Linux), so I switched over to Linux and got to `28,232` connections.

28,232 is a very peculiar number: it&apos;s exactly the size of the default Linux ephemeral port range (32768 to 60999, i.e. 28,232 ports), so the client machine was running out of outbound ports rather than memory or CPU.
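The ceiling comes from the client&apos;s ephemeral port range. On Linux you can inspect it directly (a quick sanity check, assuming a stock kernel layout):

```shell
# show the ephemeral (outbound) port range, usually "32768 60999" by default
cat /proc/sys/net/ipv4/ip_local_port_range
```

Widening it (for example with `sysctl -w net.ipv4.ip_local_port_range="1024 65000"`) raises the per-client ceiling, though only until the next shared limit kicks in.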

## Issues faced (and how they were/will be fixed)
- Docker builds taking too much time (and space) - [fix documentation](https://blog.pranshu-raj.in/posts/optimizing-docker-builds)
- Broadcast not working as intended (messages sent to only one client) - [implement fixed fanout](https://blog.pranshu-raj.in/posts/implementing-correct-fanout)
- 28,232 connection limit - [some Docker networking magic]()
- Load test script not working correctly - built a new one based off of [Eran Yanay&apos;s Gophercon talk]()

## Experimental features
- Historical querying through event streaming to Postgres (a TSDB would be fine too)
- Initial value update (push a lb update when client connects)
- Sorted set implementation using skip list

## Planned features

## Extensions
- Create and connect with matchmaking and playing services (probably use bots for playing)
- Horizontal scaling of servers
- Test how far this can scale (I believe 1M is easy enough, given enough laptops to run clients on)

</content:encoded></item><item><title>Backpressure in Distributed Systems</title><link>https://blog.pranshu-raj.in/posts/backpressure/</link><guid isPermaLink="true">https://blog.pranshu-raj.in/posts/backpressure/</guid><description>Understanding what it is, how to deal with it, where it&apos;s used and how I handled it in the real time leaderboard.</description><pubDate>Tue, 07 Oct 2025 10:52:32 GMT</pubDate><content:encoded>Backpressure is one of those things that can make or break a distributed system, and is handled in an amazing way by a lot of tech around us.

I recently got the chance to interact with it while building my [real time leaderboard](https://github.com/pranshu-raj-211/leaderboard), where I had to account for it to give clients the best possible experience.

&lt;div class=&quot;series-box&quot;&gt;

&lt;p&gt;This post is part of a series on my &lt;a href=&quot;https://github.com/pranshu-raj-211/leaderboard&quot;&gt;real-time leaderboard&lt;/a&gt; project&lt;/p&gt;

&lt;ul&gt;
  &lt;li&gt;&lt;a href=&quot;/posts/optimizing-docker-builds/&quot;&gt; &lt;b&gt;Optimizing Docker Image builds&lt;/b&gt;&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;/posts/scaling-sse-1m-connections/&quot;&gt;&lt;b&gt;Scaling SSE to 150k connections&lt;/b&gt;&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;/posts/backpressure/&quot;&gt;Backpressure in Distributed Systems ( ← you are here)&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;/posts/implementing-correct-fanout/&quot;&gt;&lt;b&gt;Fixing fanout, and other issues&lt;/b&gt;&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;/posts/exploring-sse/&quot;&gt;&lt;b&gt;Introduction to Server Sent Events(SSE)&lt;/b&gt;&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;Replacing Redis sorted sets (coming soon)&lt;/li&gt;
  &lt;li&gt;Reproducible Grafana setup (coming soon)&lt;/li&gt;
  &lt;li&gt;TCP stack tuning (coming soon)&lt;/li&gt;
&lt;/ul&gt;

&lt;/div&gt;



## So what is it, really?
There are two competing definitions.

1. A technique used to regulate the transmission of messages (or events, packets).
2. The capability of producers of messages to overload consumers.
([Reference to Baeldung&apos;s blog](https://www.baeldung.com/spring-webflux-backpressure))


Though both are correct, I prefer the second definition and will use that throughout this post. 

![Fast producers overwhelm slow consumers](@/assets/images/backpressure/rate_mismatch.png)

Backpressure happens when your system can’t keep up with the amount of work being thrown at it.

## Why is this an issue?
Here is what breaks if backpressure isn&apos;t handled correctly.

- OOM errors (client killed due to huge memory usage on buffers)
- Dropped messages (buffer capacity reached - drops incoming automatically)
- Low throughput (resources are wasted trying to keep up)
- Network waste
- Latency increase
- Producers getting blocked (in case of Go channels)

---

## When does backpressure occur?
Let&apos;s first define the system in which backpressure will be encountered, then do a code prototype and discuss strategies for resolving it.

There are three components in this system:

1. The producer creates messages and initiates sending them to the consumer.
2. The messaging system receives messages from the producer and forwards them to the consumer. (This may not exist as a separate component; it can simply be the network buffers of the system.)
3. The consumer receives messages from the messaging system and processes them.

Things work fine if the rate at which messages are created by the producer is less than or equal to the rate at which messages are processed by the consumer. If the creation rate exceeds the rate of consumption, we have a problem.

I like to think of this in terms of playing Tetris: at first the blocks arrive slowly and you&apos;re able to process (move and rotate) them easily. As time goes on, the blocks arrive faster and faster until they overwhelm you, and at some point it&apos;s game over.

---

## How to fix it?
There are four ways this can be handled, depending on the system constraints.

### 1. Slow down producer
The consumer sends a signal to the producer to slow down. This can be applied where the rate of messages is controllable, and the consumer can be given control over it.

In Go this can be implemented through the use of a channel to signal when the message rate should drop.

![Slowing down producers by sending signal](@/assets/images/backpressure/slow_down_producer.png)
&lt;p align=&quot;center&quot;&gt;&lt;u&gt;Slow down producers&lt;/u&gt;&lt;/p&gt;


TCP uses the same trick: the receiver advertises a shrinking window until the sender voluntarily slows to match the receiver’s pace.

Tradeoffs
- Complexity overhead for feedback integration. In Go this is quite simple: add an extra channel to send a slow-down signal, and have the producer adjust its rate until messages arrive at a pace the consumer can work with.
- It might not always be possible to slow down producers, as message production rate might be out of our control.

&lt;details&gt;
&lt;summary&gt;&lt;b&gt;Click to view code&lt;/b&gt;&lt;/summary&gt;

```go
package main

import (
	&quot;fmt&quot;
	&quot;time&quot;
)

func main() {
	ch := make(chan int, 5)
	signal := make(chan int) // feedback channel

	// producer
	go func() {
		delay := 50 * time.Millisecond
		for i := 1; i &lt;= 40; i++ {
			select {
			case ch &lt;- i:
				fmt.Printf(&quot;producer: sent %d (delay=%v)\n&quot;, i, delay)
			default:
				fmt.Println(&quot;producer: channel full, backing off&quot;)
				delay *= 2 // multiplicative decrease
				i--        // retry same message
			}

			// adjust rate of messages
			select {
			case free := &lt;-signal:
				if free &gt; 2 &amp;&amp; delay &gt; 20*time.Millisecond {
					delay -= 10 * time.Millisecond // additive increase
				}
			default:
				// no feedback
			}

			time.Sleep(delay)
		}
		close(ch)
	}()

	// consumer
	go func() {
		for v := range ch {
			fmt.Printf(&quot;consumer: got %d\n&quot;, v)
			time.Sleep(200 * time.Millisecond)
			free := cap(ch) - len(ch)

			select {
			case signal &lt;- free: // send feedback
			default:
			}
		}
		close(signal)
	}()

	time.Sleep(15 * time.Second)
	fmt.Println(&quot;done&quot;)
}
```
&lt;/details&gt;  
&lt;br&gt;



### 2. Drop existing messages
If the messages already in the queue are less important than the ones being sent by the producer, the existing messages can be dropped. The exact dropping strategy (drop oldest, drop all, priority based etc.) depends on the use case.

![Dropping existing messages in the buffer](@/assets/images/backpressure/drop_existing.png)
&lt;p align=&quot;center&quot;&gt;&lt;u&gt;Drop existing messages&lt;/u&gt;&lt;/p&gt;


This is the approach I&apos;ve used in my [real time leaderboard](https://github.com/pranshu-raj-211/leaderboard), as the final state matters and not the intermediate states. If the producer were throttled instead, all clients would receive leaderboards from older intervals. Skipping a few intermediate updates (which the client hasn&apos;t received yet) and directly sending the latest state to slow clients is a better solution.

Tradeoffs
- Loss of data that is already queued to be processed. This works if it&apos;s a case where final data matters more than incoming (or priority based), but in systems where messages are critical, this cannot be applied.

&lt;details&gt;
&lt;summary&gt;&lt;b&gt;Click to view code&lt;/b&gt;&lt;/summary&gt;

```go
package main

import (
	&quot;fmt&quot;
	&quot;time&quot;
)

func main() {
	var msg int
	ch := make(chan int, 3)
	drain := false // set true to drain whole channel when full, false to remove oldest message instead

	// producer
	go func() {
		for i := 1; i &lt;= 40; i++ {
			select {
			case ch &lt;- i:
				fmt.Printf(&quot;producer: sent %d\n&quot;, i)
			default:
				// drop existing - drain channel
				fmt.Println(&quot;producer: channel full&quot;)
				if drain {
					for len(ch) &gt; 0 {
						&lt;-ch
					}
					fmt.Println(&quot;drained channel&quot;)
					ch &lt;- i
			} else {
				// note: this assumes a single producer; the consumer could drain
				// the channel between the full check and this receive, which
				// would block the producer here
				msg = &lt;-ch // drop oldest
				fmt.Printf(&quot;dropped oldest %d\n&quot;, msg)
				ch &lt;- i
			}
			}
			time.Sleep(50 * time.Millisecond)
		}
		close(ch)
	}()

	// consumer
	for v := range ch {
		fmt.Printf(&quot;consumer: got %d\n&quot;, v)
		time.Sleep(200 * time.Millisecond)
	}
	fmt.Println(&quot;done&quot;)
}
```
&lt;/details&gt;
&lt;br&gt;

### 3. Drop incoming messages
Probably the simplest method: don&apos;t accept any more messages from the producer until space has freed up, without explicitly telling it to slow down. On the producer side this can be combined with retries and checks - if retries exceed a certain limit, throttling can be done without any explicit communication.

![Dropping incoming messages](@/assets/images/backpressure/drop_incoming.png)
&lt;p align=&quot;center&quot;&gt;&lt;u&gt;Drop incoming messages&lt;/u&gt;&lt;/p&gt;

Tradeoffs
- Similar to the previous fix, we might not always have the luxury of dropping incoming messages - data may be critical enough that we&apos;re unable to drop any.
- Retries can be added to the producer: keep sending a message until an ack is received, instead of a fire-and-forget approach (at-least-once delivery; this is more of a nuance than anything).

&lt;details&gt;
&lt;summary&gt;&lt;b&gt;Click to view code&lt;/b&gt;&lt;/summary&gt;

```go
package main

import (
	&quot;fmt&quot;
	&quot;time&quot;
)


func main() {
	ch := make(chan int, 3)

	go func() {
		for i := 1; i &lt;= 10; i++ {
			select {
			case ch &lt;- i:
				fmt.Printf(&quot;producer: sent %d\n&quot;, i)
			default:
				// channel full, drop incoming
				fmt.Printf(&quot;producer: dropped %d (channel full)\n&quot;, i)
			}
			time.Sleep(50 * time.Millisecond)
		}
		close(ch)
	}()

	for v := range ch {
		fmt.Printf(&quot;consumer: processing %d\n&quot;, v)
		time.Sleep(200 * time.Millisecond)
	}
	fmt.Println(&quot;done&quot;)
}
```
&lt;/details&gt;
&lt;br&gt;

### 4. Increase consumers
An example of this is an async task queue for processing documents (or a scalable notification system, which serves a similar function).

![System when worker pool is used](@/assets/images/backpressure/scale_workers.png)
&lt;p align=&quot;center&quot;&gt;&lt;u&gt;Scale out workers (autoscaling essentially)&lt;/u&gt;&lt;/p&gt;

There will be a pool of workers which can be scaled up or down based on the volume of messages being received; an intermediate consumer may be used just to assign the messages (tasks) to workers.

Tradeoffs
- This works when messages can be processed in parallel, but breaks down if some serial processing between consecutive messages is required (otherwise you need other ways to enforce ordering, things get complicated).

&lt;details&gt;
&lt;summary&gt;&lt;b&gt;Click to view code&lt;/b&gt;&lt;/summary&gt;

```go
package main

import (
	&quot;fmt&quot;
	&quot;math/rand&quot;
	&quot;sync&quot;
	&quot;time&quot;
)

var wg sync.WaitGroup

func worker(id int, tasks &lt;-chan int, quit &lt;-chan struct{}) {
	defer wg.Done()
	for {
		select {
		case t, ok := &lt;-tasks:
			if !ok {
				fmt.Printf(&quot;worker %d: exiting (channel closed)\n&quot;, id)
				return
			}
			fmt.Printf(&quot;worker %d: processing %d\n&quot;, id, t)
			time.Sleep(400 * time.Millisecond)
		case &lt;-quit:
			// consumer sends a quit signal (analogy - master process of Nginx)
			fmt.Printf(&quot;worker %d: exiting (quit signal)\n&quot;, id)
			return
		}
	}
}

func main() {
	tasks := make(chan int, 10)
	stopProducer := make(chan struct{})

	workerQuitChans := make(map[int]chan struct{})
	workerCount := 0
	var mu sync.Mutex

	startWorker := func() {
		mu.Lock()
		defer mu.Unlock()
		workerCount++
		id := workerCount
		quit := make(chan struct{})
		workerQuitChans[id] = quit
		wg.Add(1)
		go worker(id, tasks, quit)
		fmt.Printf(&quot;controller: started worker %d (total=%d)\n&quot;, id, workerCount)
	}

	stopWorker := func() {
		mu.Lock()
		defer mu.Unlock()
		if workerCount == 0 {
			return
		}
		// stop a worker - scale down
		quit := workerQuitChans[workerCount]
		close(quit)
		delete(workerQuitChans, workerCount)
		workerCount--
		fmt.Printf(&quot;controller: stopped a worker (total=%d)\n&quot;, workerCount)
	}

	// init one worker
	startWorker()

	// producer
	go func() {
		taskID := 0
		for {
			select {
			case &lt;-stopProducer:
				close(tasks)
				return
			default:
				taskID++
				tasks &lt;- taskID
				fmt.Printf(&quot;producer: queued %d\n&quot;, taskID)
				time.Sleep(time.Duration(rand.Intn(200)+50) * time.Millisecond)
			}
		}
	}()

	// consumer (controls number of workers according to load)
	go func() {
		for {
			time.Sleep(1 * time.Second)
			queueLen := len(tasks)
			queueCap := cap(tasks)

			// read workerCount under the mutex to avoid a data race with start/stopWorker
			mu.Lock()
			count := workerCount
			mu.Unlock()

			if queueLen &gt; queueCap/2 {
				startWorker() // scale up
			} else if queueLen == 0 &amp;&amp; count &gt; 1 {
				stopWorker() // scale down
			}
		}
	}()

	time.Sleep(10 * time.Second)
	close(stopProducer)

	mu.Lock()
	for _, quit := range workerQuitChans {
		close(quit)
	}
	mu.Unlock()

	wg.Wait()
	fmt.Println(&quot;done&quot;)
}
```
&lt;/details&gt;
&lt;br&gt;

&gt;Note: Not all systems can increase consumers dynamically. For example, Nginx uses a predefined worker pool size, so it handles backpressure differently.

---

## How I dealt with backpressure in the real-time leaderboard
In my real time leaderboard, I used channels and goroutines in a manner similar to the [Actor Pattern](https://alamrafiul.com/posts/go-actor-model/). Each client connected to the server would have a goroutine associated with it, and a separate buffered channel would be created on which only that client would receive messages. The broadcaster goroutine would iterate through these channels and send messages to each buffered channel.

The key constraints for this system were:
- Final state mattered, not intermediate states: clients just needed the most recent leaderboard.  
- Multiple clients with variable speeds, some could consume updates in real time while others lagged.  

Even one client being blocked would mean that others do not get updates.  

Because of this, I chose the drop existing messages strategy, skipping intermediate updates for slower clients and only delivering the latest state. 


&lt;details&gt;
&lt;summary&gt;&lt;b&gt;Click to view code&lt;/b&gt;&lt;/summary&gt;

```go
for _, client := range lb.clients {
	select {
	case client.channel &lt;- update:
	case &lt;-client.ctx.Done():
		clientsToRemove = append(clientsToRemove, client)
	default:
		metrics.FilledSSEChannels.Inc()
		// drain channel before pushing new update
	drainLoop:
		for {
			select {
			case &lt;-client.channel:
			default:
				client.channel &lt;- update
				break drainLoop
			}
		}
	}
}
```
&lt;/details&gt;

---

## Warpstream reference
Warpstream is a diskless, Apache Kafka-compatible streaming platform whose blog has a great post on this topic, which I referred to while building my understanding.

It takes a more comprehensive view of backpressure, more aligned with keeping a steady, uniform stream of data flowing into the system rather than just dealing with a high influx rate.

[Dealing with rejection (in distributed systems)](https://www.warpstream.com/blog/dealing-with-rejection-in-distributed-systems)

---

## How TCP deals with backpressure
TCP uses flow control and congestion control, both of which use backpressure in some capacity.

In flow control, the sender (producer) learns the receiver&apos;s capacity so it can slow down when the rate of messages is too high.

Flow control is implemented using the `sliding window protocol` in TCP. The receiver sends its available window size to the sender (piggybacked on the ack for packets received), and the sender regulates its rate of sending based on the window size it receives. More on that [here](https://www.baeldung.com/cs/tcp-flow-control-vs-congestion-control#introduction-to-sliding-window-protocol).


TCP doesn&apos;t send at a constant rate. It begins with slow start, where the congestion window grows rapidly, switches to congestion avoidance once a threshold is reached, where growth becomes gradual, and reacts to packet loss in a phase called congestion detection. This is part of [congestion control](https://www.baeldung.com/cs/tcp-flow-control-vs-congestion-control#congestion-control), which helps limit the flow of packets across each node of the network, as opposed to only at the end receiver.
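Those phases can be sketched with a toy congestion-window simulation (a rough model in MSS units, not real TCP):

```go
package main

import "fmt"

// simulateCwnd is a toy model of TCP congestion control in MSS units:
// slow start doubles cwnd every RTT until it reaches ssthresh, congestion
// avoidance then adds 1 MSS per RTT, and a loss halves ssthresh and
// restarts cwnd from 1.
func simulateCwnd(rtts int, ssthresh int, lossAt map[int]bool) []int {
	cwnd := 1
	history := []int{}
	for rtt := 0; rtt != rtts; rtt++ {
		history = append(history, cwnd)
		if lossAt[rtt] {
			ssthresh = cwnd / 2 // multiplicative decrease on congestion detection
			cwnd = 1            // back to slow start
			continue
		}
		if cwnd >= ssthresh {
			cwnd++ // congestion avoidance: additive increase
		} else {
			cwnd *= 2 // slow start: exponential growth
			if cwnd > ssthresh {
				cwnd = ssthresh
			}
		}
	}
	return history
}

func main() {
	// 12 RTTs, initial ssthresh of 16, a single loss at RTT 8
	fmt.Println(simulateCwnd(12, 16, map[int]bool{8: true}))
	// prints [1 2 4 8 16 17 18 19 20 1 2 4]
}
```

The printed window sizes show the exponential ramp, the linear crawl past the threshold, and the reset after the loss.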

---

## What other systems implement this
Backpressure is a recurring theme in distributed systems. It shows up quite prominently in
- Kafka
- gRPC streaming
- Sidekiq

Check out the code for this post [here](https://github.com/pranshu-raj-211/backpressure/).

---

## Further reading
- [Baeldung](https://www.baeldung.com/spring-webflux-backpressure#backpressure-in-reactive-streams)
- [Dealing with rejection (in distributed systems)](https://www.warpstream.com/blog/dealing-with-rejection-in-distributed-systems)</content:encoded></item><item><title>28K+ connections, zero messages</title><link>https://blog.pranshu-raj.in/posts/implementing-correct-fanout/</link><guid isPermaLink="true">https://blog.pranshu-raj.in/posts/implementing-correct-fanout/</guid><description>How I learnt about Go&apos;s concurrency patterns the hard way.</description><pubDate>Thu, 28 Aug 2025 11:00:08 GMT</pubDate><content:encoded>## TL;DR: 
I scaled a real-time leaderboard in Go to 28K concurrent SSE connections, but discovered that my broadcast pattern was fundamentally broken.

Fixing it taught me lessons about concurrency, backpressure, deduplication, fan-out, and observability that apply to any real-time system design.

&lt;div class=&quot;series-box&quot;&gt;

&lt;p&gt;This post is part of a series on my &lt;a href=&quot;https://github.com/pranshu-raj-211/leaderboard&quot;&gt;real-time leaderboard&lt;/a&gt; project&lt;/p&gt;

&lt;ul&gt;
  &lt;li&gt;&lt;a href=&quot;/posts/optimizing-docker-builds/&quot;&gt; &lt;b&gt;Optimizing Docker Image builds&lt;/b&gt;&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;/posts/scaling-sse-1m-connections/&quot;&gt;&lt;b&gt;Scaling SSE to 150k connections&lt;/b&gt;&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;/posts/backpressure/&quot;&gt;&lt;b&gt;Backpressure in Distributed Systems&lt;/b&gt;&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;/posts/implementing-correct-fanout/&quot;&gt;Fixing fanout, and other issues ( ← you are here)&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;/posts/exploring-sse/&quot;&gt;&lt;b&gt;Introduction to Server Sent Events(SSE)&lt;/b&gt;&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;Replacing Redis sorted sets (coming soon)&lt;/li&gt;
  &lt;li&gt;Reproducible Grafana setup (coming soon)&lt;/li&gt;
  &lt;li&gt;TCP stack tuning (coming soon)&lt;/li&gt;
&lt;/ul&gt;

&lt;/div&gt;


---

## Optimizations at a glance
- Removed per-connection Redis polling in favor of centralized poller
- Deduplication with hashing
- Broadcast using the Fan-out pattern
- Backpressure handling (clear buffer, latest update only)
- Observability with Prometheus + Grafana

---

Last month, I built a real-time leaderboard system in Go that could handle 28,232 concurrent SSE connections - pretty impressive for a Python developer&apos;s first Go project. 

Then I ran a test with actual score updates and watched all of my messages disappear into the void. 

Here&apos;s how I learned about Go&apos;s concurrency model the hard way, and why understanding channels, goroutines, mutexes deeply can make the difference between a system that works in testing and one that works in production.

&gt;Some of the concepts here might be a little difficult to understand; I&apos;m working on a second post to explain what they are, how I&apos;m using them, how they connect, and their tradeoffs.


---

## The goal
I wanted to build a real-time, read-mostly (write privilege restricted to certain nodes) leaderboard that could push the same top-K view to a lot of clients over SSE.

**Targets (going in):**
- Handle 10k+ concurrent SSE connections on one machine.
- Every connected client gets every broadcast.
- Built-in observability as a sanity check (visibility).

**Outcome (spoiler)**: I hit 28,232 concurrent connections, then discovered my “broadcast” only delivered to one consumer. The post walks through how I found that flaw and fixed it (correct fan-out, dedup, decoupling, backpressure).

---

## The system
![overall architecture](@/assets/images/leaderboard/architecture.png)
&lt;p align=&quot;center&quot;&gt;&lt;u&gt;Architecture&lt;/u&gt;&lt;/p&gt;

The system is laid out into the write and read path.

### Write path (Game server -&gt; Redis)
Game servers submit game results to an endpoint, which updates the Redis sorted set (source of truth).

### Read path (Redis -&gt; clients)
A single goroutine polls Redis and detects changes; the latest top-K players are then broadcast to per-client buffered channels that feed the SSE connections.

Other read access patterns exist, but they&apos;re not the topic of this post.


### What this design focuses on
- The same update goes to all connected clients (correctness)
- Latest-only semantics are fine (skip intermediates under load - latest state matters, not the intermediates)
- Slow consumers don’t stall the producer (backpressure)

### What this design is not concerned with
- Intermediate updates when clients reconnect. Heavy emphasis on latest state view

Given the use case, access patterns and scaling concerns, I opted for SSE (over WebSockets or polling) for sending real-time leaderboard updates.

I started implementing the main SSE handling code, which I did progressively in stages, improving and learning along the way.

![Legend](@/assets/images/leaderboard/legend.png)
&lt;p align=&quot;center&quot;&gt;&lt;u&gt;Legend for future diagrams&lt;/u&gt;&lt;/p&gt;


---

## Per connection Redis polling
In this version, the SSE handler function itself contains the Redis read call and the JSON marshaling.

### Implemented
1. Simple handler code
2. Redis reads (polling every 2 seconds)
3. Basic active connection metrics

![Initial Redis polling architecture](@/assets/images/leaderboard/init_coupled.png)
&lt;p align=&quot;center&quot;&gt;&lt;u&gt;Initial implementation - note that each connection polls Redis&lt;/u&gt;&lt;/p&gt;

&lt;details&gt;
&lt;summary&gt;&lt;p align=&quot;center&quot;&gt;&lt;u&gt;&lt;b&gt;Click to view code&lt;/b&gt;&lt;/u&gt;&lt;/p&gt;&lt;/summary&gt;

```go
package backend

import (
	&quot;encoding/json&quot;
	&quot;fmt&quot;
	&quot;leaderboard/src/metrics&quot;
	&quot;leaderboard/src/redisclient&quot;
	&quot;time&quot;

	&quot;github.com/gin-gonic/gin&quot;
)

func StreamLeaderboard(c *gin.Context) {
	c.Writer.Header().Set(&quot;Content-Type&quot;, &quot;text/event-stream&quot;)
	c.Writer.Header().Set(&quot;Cache-Control&quot;, &quot;no-cache&quot;)
	c.Writer.Header().Set(&quot;Connection&quot;, &quot;keep-alive&quot;)
	c.Writer.Flush()

	metrics.ActiveSSEConnections.Inc()

	ticker := time.NewTicker(2 * time.Second)
	defer ticker.Stop()

	for {
		select {
		case &lt;-ticker.C:
			results, err := redisclient.GetTopNPlayers(c, &quot;leaderboard&quot;, 10)
			if err != nil {
				continue
			}

			data, _ := json.Marshal(results)
			fmt.Fprintf(c.Writer, &quot;data: %s\n\n&quot;, data)
			c.Writer.Flush()

		case &lt;-c.Request.Context().Done():
			return
		}
	}
}
```
&lt;/details&gt;  
&lt;br&gt;

### Why it&apos;s flawed
1. Redis reads coupled with the handler function - each SSE connection will have its own read call every 2 seconds. That&apos;s immensely wasteful (considering the same data needs to go out to everyone). Higher queries per second to Redis soon become an issue that could turn into a scaling bottleneck.
2. No deduplication - every single read gets converted to a data object to be sent over the network.
3. Just returning when the connection is closed - connection closing needs to be done properly.
4. Unbuffered channels - may block.
5. JSON encoding on each tick.
6. Active connections never decremented.
7. Improper error handling.
8. At scale, Redis connection pool will be exhausted.

---

## Deduplication to reduce waste
Contains a basic deduplication measure - compare the marshaled JSON payloads byte by byte. This reduces the data sent, helping cut network usage.

There&apos;s also the addition of more metrics to help understand how things are working and what failures are occurring in the app.

Config variables are used more frequently in this update, reducing the number of hardcoded values needed. There are still a few places where hardcoded values remain, which may need to be modified later.

### What improved
- Deduplication by comparing JSON objects
- Replace hardcoded values with configs
- Improved Prometheus metrics

&lt;details&gt;
&lt;summary&gt;&lt;p align=&quot;center&quot;&gt;&lt;u&gt;&lt;b&gt;Click to view code&lt;/b&gt;&lt;/u&gt;&lt;/p&gt;&lt;/summary&gt;

```diff
 import (
 	&quot;encoding/json&quot;
 	&quot;fmt&quot;
+	&quot;leaderboard/src/config&quot;
 	&quot;leaderboard/src/metrics&quot;
 	&quot;leaderboard/src/redisclient&quot;
 	&quot;time&quot;
 )
 
 func StreamLeaderboard(c *gin.Context) {
+	metrics.ConcurrentClients.Inc()
+	defer metrics.ConcurrentClients.Dec()
+
 	c.Writer.Header().Set(&quot;Content-Type&quot;, &quot;text/event-stream&quot;)
 	c.Writer.Header().Set(&quot;Cache-Control&quot;, &quot;no-cache&quot;)
 	c.Writer.Header().Set(&quot;Connection&quot;, &quot;keep-alive&quot;)
 	c.Writer.Flush()
 
 	metrics.ActiveSSEConnections.Inc()
+	defer metrics.ActiveSSEConnections.Dec()
 
-	ticker := time.NewTicker(2 * time.Second)
+	ticker := time.NewTicker(5 * time.Second)
 	defer ticker.Stop()
 
+	var lastData []byte
+
 	for {
 		select {
 		case &lt;-ticker.C:
-			results, err := redisclient.GetTopNPlayers(c, &quot;leaderboard&quot;, 10)
+			results, err := redisclient.GetTopNPlayers(c, &quot;leaderboard&quot;, int64(config.AppConfig.Leaderboard.TopPlayersLimit))
+			if err != nil {
+				metrics.RedisOperationErrors.WithLabelValues(&quot;get_top_players&quot;).Inc()
+			}
+			jsonStart := time.Now()
+			data, err := json.Marshal(results)
+			metrics.JSONMarshalDuration.Observe(time.Since(jsonStart).Seconds())
+    
 			if err != nil {
-				continue
+				metrics.JSONErrors.WithLabelValues(&quot;marshal&quot;).Inc()
+				return
+			}
+			if !jsonEqual(data, lastData) {
+				fmt.Fprintf(c.Writer, &quot;data: %s\n\n&quot;, data)
+				c.Writer.Flush()
+				metrics.SSEMessagesSent.Inc()
+				lastData = data
 			}
-
-			data, _ := json.Marshal(results)
-			fmt.Fprintf(c.Writer, &quot;data: %s\n\n&quot;, data)
-			c.Writer.Flush()
 
 		case &lt;-c.Request.Context().Done():
+			metrics.DroppedSSEConnections.Inc()
 			return
 		}
 	}
+}
+
+func jsonEqual(a, b []byte) bool {
+	return string(a) == string(b)
 }
```
&lt;/details&gt;  
&lt;br&gt;

### Why it&apos;s flawed
1. JSON object comparison is expensive (in case of large blobs, which may be encountered here).
2. Marshaling still happens on each read, and each read is still coupled, which means high QPS on Redis.

---

## Decoupling Redis reads
Instead of having each client hammer Redis, I implemented a broadcast pattern using goroutines and channels (single producer, multiple consumers).

### What improved
1. Separate goroutine for Redis polling
2. Deduplication using hashing (SHA256)
3. JSON marshaling done only when data is changed

![Fixed broadcast overview](@/assets/images/leaderboard/broadcast_overview.png)
&lt;p align=&quot;center&quot;&gt;&lt;u&gt;How it should have worked&lt;/u&gt;&lt;/p&gt;

&lt;details&gt;
&lt;summary&gt;&lt;p align=&quot;center&quot;&gt;&lt;u&gt;&lt;b&gt;Click to view code&lt;/b&gt;&lt;/u&gt;&lt;/p&gt;&lt;/summary&gt;

```diff
 package backend
 
 import (
+	&quot;context&quot;
+	&quot;crypto/sha256&quot;
 	&quot;encoding/json&quot;
 	&quot;fmt&quot;
 	&quot;leaderboard/src/config&quot;
 	&quot;leaderboard/src/metrics&quot;
 	&quot;leaderboard/src/redisclient&quot;
+	&quot;sync&quot;
 	&quot;time&quot;
 
 	&quot;github.com/gin-gonic/gin&quot;
 )
 
+type LeaderboardUpdate struct {
+	Data []byte
+	Hash [32]byte
+}
+
+type LeaderboardBroadcaster struct {
+	// channel for all SSE conns to listen to get lb updates
+	broadcastChan chan LeaderboardUpdate
+
+	ctx    context.Context
+	cancel context.CancelFunc
+	wg     sync.WaitGroup
+}
+
+func CreateLeaderboardBroadcaster() *LeaderboardBroadcaster {
+	ctx, cancel := context.WithCancel(context.Background())
+
+	lb := &amp;LeaderboardBroadcaster{
+		// make the channel buffered - clients may be slow, messages can pile up
+		broadcastChan: make(chan LeaderboardUpdate, config.AppConfig.Server.BroadcastBufferSize),
+		ctx:           ctx,
+		cancel:        cancel,
+	}
+
+	lb.wg.Add(1)
+	go lb.detectLeaderboardChanges()
+	return lb
+}
+
+func (lb *LeaderboardBroadcaster) StopBroadcast() {
+	lb.cancel()
+	lb.wg.Wait()
+	close(lb.broadcastChan)
+}
+
+func (lb *LeaderboardBroadcaster) GetBroadcastChannel() &lt;-chan LeaderboardUpdate {
+	return lb.broadcastChan
+}
+
+// package level var
+var broadcaster *LeaderboardBroadcaster
+
+func SetBroadcaster(b *LeaderboardBroadcaster) {
+	broadcaster = b
+}
+
+func (lb *LeaderboardBroadcaster) detectLeaderboardChanges() {
+	defer lb.wg.Done()
+
+	// TODO: check this time conversion
+	ticker := time.NewTicker(time.Duration(config.AppConfig.Server.PollingIntervalSeconds) * time.Second)
+	defer ticker.Stop()
+
+	var lastHash [32]byte
+
+	for {
+		select {
+		case &lt;-ticker.C:
+			results, err := redisclient.GetTopNPlayers(lb.ctx, &quot;leaderboard&quot;, int64(config.AppConfig.Leaderboard.TopPlayersLimit))
+			if err != nil {
+				metrics.RedisOperationErrors.WithLabelValues(&quot;get_top_players&quot;).Inc()
+				config.Error(&quot;Failed to fetch leaderboard from Redis.&quot;, map[string]any{&quot;Error&quot;: err, &quot;source&quot;: &quot;/stream-leaderboard&quot;})
+				continue
+			}
+
+			resultString := fmt.Sprintf(&quot;%+v&quot;, results)
+			currentHash := sha256.Sum256([]byte(resultString))
+
+			if currentHash != lastHash {
+				lastHash = currentHash
+
+				jsonStart := time.Now()
+				jsonData, err := json.Marshal(results)
+				if err != nil {
+					config.Error(&quot;JSON marshaling error&quot;,
+						map[string]any{&quot;Error&quot;: err, &quot;source&quot;: &quot;/stream-leaderboard&quot;, &quot;results&quot;: results})
+					metrics.JSONErrors.WithLabelValues(&quot;marshal&quot;).Inc()
+					continue
+				}
+				metrics.JSONMarshalDuration.Observe(time.Since(jsonStart).Seconds())
+
+				update := LeaderboardUpdate{
+					Data: jsonData,
+					Hash: currentHash,
+				}
+
+				// non blocking send
+				select {
+				case lb.broadcastChan &lt;- update:
+				default:
+				}
+			}
+		case &lt;-lb.ctx.Done():
+			return
+		}
+	}
+}
+
 func StreamLeaderboard(c *gin.Context) {
 	metrics.ConcurrentClients.Inc()
 	defer metrics.ConcurrentClients.Dec()
 	c.Writer.Flush()
 
 	metrics.ActiveSSEConnections.Inc()
+	config.Info(&quot;New SSE conn&quot;, map[string]any{&quot;Num active clients&quot;: metrics.ActiveSSEConnections})
 	defer metrics.ActiveSSEConnections.Dec()
 
-	ticker := time.NewTicker(5 * time.Second)
-	defer ticker.Stop()
-
-	var lastData []byte
+	broadcastChan := broadcaster.GetBroadcastChannel()
 
 	for {
 		select {
-		case &lt;-ticker.C:
-			results, err := redisclient.GetTopNPlayers(c, &quot;leaderboard&quot;, int64(config.AppConfig.Leaderboard.TopPlayersLimit))
-			if err != nil {
-				metrics.RedisOperationErrors.WithLabelValues(&quot;get_top_players&quot;).Inc()
-			}
-			jsonStart := time.Now()
-			data, err := json.Marshal(results)
-			metrics.JSONMarshalDuration.Observe(time.Since(jsonStart).Seconds())
-
-			if err != nil {
-				metrics.JSONErrors.WithLabelValues(&quot;marshal&quot;).Inc()
+		case update, ok := &lt;-broadcastChan:
+			if !ok {
+				// channel closed
 				return
 			}
-			if !jsonEqual(data, lastData) {
-				fmt.Fprintf(c.Writer, &quot;data: %s\n\n&quot;, data)
-				c.Writer.Flush()
-				metrics.SSEMessagesSent.Inc()
-				lastData = data
-			}
+			fmt.Fprintf(c.Writer, &quot;data: %s\n\n&quot;, update.Data)
+			c.Writer.Flush()
+			metrics.SSEMessagesSent.Inc()
 
 		case &lt;-c.Request.Context().Done():
 			metrics.DroppedSSEConnections.Inc()
+			config.Info(&quot;Closed SSE conn&quot;, map[string]any{&quot;open&quot;: metrics.ActiveSSEConnections})
 			return
 		}
 	}
 }
-
-func jsonEqual(a, b []byte) bool {
-	return string(a) == string(b)
-}
```
&lt;/details&gt;  
&lt;br&gt;

Benchmarking this with a [Go script that opens persistent SSE connections (based off of Eran Yanay&apos;s Gophercon talk)](https://github.com/eranyanay/1m-go-websockets) got me to 28,232 concurrent connections.

![Grafana dashboard with 28232 connections](@/assets/images/leaderboard/decoupled_redis_dashboard.png)
&lt;p align=&quot;center&quot;&gt;&lt;u&gt;Scaled to 28,232 SSE connections after decoupling&lt;/u&gt;&lt;/p&gt;

I was pretty happy with these results, especially since the connections maxed out due to hitting the upper limit of outbound ports on Linux, not due to memory or CPU bottlenecks.

_Then I realized I had a major bug which I forgot to test for._

![Flawed broadcast explanation](@/assets/images/leaderboard/incorrect_fanout.png)
&lt;p align=&quot;center&quot;&gt;&lt;u&gt;The error in my broadcasting implementation&lt;/u&gt;&lt;/p&gt;

Dashed arrows here show the connections that never receive a given message: each update reaches only one client.

### Why it&apos;s flawed
1. Fan out implementation is incorrect. A message on a Go channel is consumed by exactly one receiver (goroutine), so every event added to the channel goes to a single client and the others never see it.
2. Error handling gaps.
3. No initial message is sent (messages go out only when a ticker event fires and the leaderboard has changed). The client should receive a leaderboard snapshot on connect, even if it&apos;s somewhat stale.

---

## Fixing Fan Out
The previous version pushed all broadcast messages onto a single channel, so each message was received by exactly one goroutine (client). That defeats the purpose of my app: every single client needs to receive the same update.

### What improved
- One buffered channel per client, stored in a map
- Mutex applied whenever map is modified, preventing race conditions
- Leaderboard change leads to broadcasting message to all channels

![Fixed fanout explanation](@/assets/images/leaderboard/fixed_broadcast.png)
&lt;p align=&quot;center&quot;&gt;&lt;u&gt;Correct fan out (broadcast) implementation&lt;/u&gt;&lt;/p&gt;

Alternative implementations might use a managed pub/sub service, but that&apos;s probably overkill for this use case.

![Grafana dashboard with fanout fix](@/assets/images/leaderboard/fanout_dash_conns.png)
&lt;p align=&quot;center&quot;&gt;&lt;u&gt;28232 connections with fixed fan out code&lt;/u&gt;&lt;/p&gt;

![Grafana dashboard for game submissions and SSE messages received](@/assets/images/leaderboard/fanout_game_submissions.png)
&lt;p align=&quot;center&quot;&gt;&lt;u&gt;Game submissions are being received by the SSE clients - fan out works&lt;/u&gt;&lt;/p&gt;

&lt;details&gt;
&lt;summary&gt;&lt;p align=&quot;center&quot;&gt;&lt;u&gt;&lt;b&gt;Click to view code&lt;/b&gt;&lt;/u&gt;&lt;/p&gt;&lt;/summary&gt;

```diff
-type LeaderboardBroadcaster struct {
-	// channel for all SSE conns to listen to get lb updates
-	broadcastChan chan LeaderboardUpdate
+type Client struct {
+	ID      int64
+	channel chan LeaderboardUpdate
+	ctx     context.Context
+	cancel  context.CancelFunc
+}
 
-	ctx    context.Context
-	cancel context.CancelFunc
-	wg     sync.WaitGroup
+type LeaderboardBroadcaster struct {
+	clients       map[int64]*Client
+	clientsMutex  sync.RWMutex
+	ctx           context.Context
+	cancel        context.CancelFunc
+	wg            sync.WaitGroup
+	clientCounter int64
 }
 
 func CreateLeaderboardBroadcaster() *LeaderboardBroadcaster {
 	lb := &amp;LeaderboardBroadcaster{
 		// make the channel buffered - clients may be slow, messages can pile up
-		broadcastChan: make(chan LeaderboardUpdate, config.AppConfig.Server.BroadcastBufferSize),
-		ctx:           ctx,
-		cancel:        cancel,
+		clients: make(map[int64]*Client),
+		ctx:     ctx,
+		cancel:  cancel,
 	}
 	lb.wg.Add(1)
 	return lb
 }
 
-func (lb *LeaderboardBroadcaster) GetBroadcastChannel() &lt;-chan LeaderboardUpdate {
-	return lb.broadcastChan
+// Create new channel for client, add to map
+func (lb *LeaderboardBroadcaster) AddClient() (*Client, &lt;-chan LeaderboardUpdate) {
+	lb.clientsMutex.Lock()
+	lb.clientCounter++
+	ctx, cancel := context.WithCancel(lb.ctx)
+
+	client := &amp;Client{
+		ID:      lb.clientCounter,
+		ctx:     ctx,
+		cancel:  cancel,
+		channel: make(chan LeaderboardUpdate, config.AppConfig.Server.BroadcastBufferSize),
+	}
+	lb.clients[lb.clientCounter] = client
+	lb.clientsMutex.Unlock()
+
+	return client, client.channel
+}
+
+// remove specific client channel - closed connection
+func (lb *LeaderboardBroadcaster) RemoveClient(client *Client) {
+	lb.clientsMutex.Lock()
+	defer lb.clientsMutex.Unlock()
+
+	if _, exists := lb.clients[client.ID]; exists {
+		delete(lb.clients, client.ID)
+		client.cancel()
+		close(client.channel)
+	}
+}
+
+// broadcastToAllClients sends an update to all connected clients
+func (lb *LeaderboardBroadcaster) broadcastToAllClients(update LeaderboardUpdate) {
+	lb.clientsMutex.RLock()
+
+	var clientsToRemove []*Client
+
+	// what to do in case Client channel is full, skip this client (to be changed later - add channel clearing mechanism + alerting)
+	for _, client := range lb.clients {
+		select {
+		case client.channel &lt;- update:
+			// sent
+		case &lt;-client.ctx.Done():
+			// clean
+			clientsToRemove = append(clientsToRemove, client)
+		default:
+			metrics.FilledSSEChannels.Inc()
+		}
+	}
+	lb.clientsMutex.RUnlock()
+
+	if len(clientsToRemove) &gt; 0 {
+		lb.clientsMutex.Lock()
+		for _, client := range clientsToRemove {
+			if _, exists := lb.clients[client.ID]; exists {
+				delete(lb.clients, client.ID)
+				client.cancel()
+				close(client.channel)
+			}
+		}
+		lb.clientsMutex.Unlock()
+	}
 }
 
 // package level var
 	broadcaster = b
 
+// poll redis, dedup leaderboard values, push to broadcast to all clients
 func (lb *LeaderboardBroadcaster) detectLeaderboardChanges() {
 	defer lb.wg.Done()
-
-	// TODO: check this time conversion
 	ticker := time.NewTicker(time.Duration(config.AppConfig.Server.PollingIntervalSeconds) * time.Second)
 	defer ticker.Stop()
 					Hash: currentHash,
 				}
 
-				// non blocking send
-				select {
-				case lb.broadcastChan &lt;- update:
-				default:
-				}
+				lb.broadcastToAllClients(update)
 			}
 		case &lt;-lb.ctx.Done():
 			return
 	c.Writer.Flush()
 
 	metrics.ActiveSSEConnections.Inc()
-	config.Info(&quot;New SSE conn&quot;, map[string]any{&quot;Num active clients&quot;: metrics.ActiveSSEConnections})
+	config.Info(&quot;New SSE conn&quot;, map[string]any{})
 	defer metrics.ActiveSSEConnections.Dec()
 
-	broadcastChan := broadcaster.GetBroadcastChannel()
+	client, channel := broadcaster.AddClient()
+	defer broadcaster.RemoveClient(client)
 
 	for {
 		select {
-		case update, ok := &lt;-broadcastChan:
+		case update, ok := &lt;-channel:
 			if !ok {
 				// channel closed
 				return
 			fmt.Fprintf(c.Writer, &quot;data: %s\n\n&quot;, update.Data)
 			c.Writer.Flush()
 			metrics.SSEMessagesSent.Inc()
-
 		case &lt;-c.Request.Context().Done():
 			metrics.DroppedSSEConnections.Inc()
 			config.Info(&quot;Closed SSE conn&quot;, map[string]any{&quot;open&quot;: metrics.ActiveSSEConnections})
```
&lt;/details&gt;  
&lt;br&gt;

### What can be improved
1. Global variable usage for leaderboardBroadcaster
2. Testing - no tests yet (except for load testing benchmarks)
3. If a client&apos;s buffer is found full, flush (empty) it and add only the latest update (that&apos;s the one that matters)
4. Figure out how to deal with surges and spikes in traffic
5. Send an initial update to client when connecting to /stream-leaderboard

---

## Final version

As implemented in [this PR](https://github.com/pranshu-raj-211/leaderboard/pull/7).

![Sequence diagram of fanout](@/assets/images/leaderboard/sequence_diagram_leaderboard.png)
&lt;p align=&quot;center&quot;&gt;&lt;u&gt;Sequence diagram of a SSE request&lt;/u&gt;&lt;/p&gt;

What improved:
1. Added backpressure handling (clear buffer, add latest update - intermediates do not matter in this application)
2. Moved JSON marshaling before the hash-based dedup; hashing the `fmt.Sprintf` output was unreliable since Go&apos;s maps are unordered
3. Heartbeats to detect dead clients
4. Fix bugs in Redis code
5. Use Dependency Injection instead of global var for leaderboardBroadcaster
6. Send an update to the client as soon as it connects to the SSE endpoint `/stream-leaderboard`


### What can be improved
1. Still need to figure out how to deal with spikes in traffic (100s of clients try to connect at once)

---

## Learnings
1. Go channels are point to point, not broadcast
2. You really need to have different kinds of tests for sanity checks
3. Sometimes test code can be the issue (earlier load test scripts were a bottleneck)
4. Connection pools matter for external systems like Redis, databases
5. Ephemeral ports can be a hidden scaling limit
6. Observability is not optional

---

## Future work
### Immediate fixes
1. Send a leaderboard update immediately on connecting to SSE stream
2. Understand how to cleanly shut down connections

### Scaling improvements
1. Add retries to SSE send, add jitter to prevent thundering herd
2. Learn how to deal with spikes in traffic
3. Explore horizontal scaling in this kind of system

### Production considerations
1. Reduce Grafana and Prometheus interval
2. Throughput metrics - per time interval
3. Validation for score updates - with proper error messages
4. Authentication for game servers
5. Postgres integration (leaderboard historical data - aggregation)
6. Redis/database based id to username translation

---

## Implementation notes
### Tech stack
I tried to choose the most sensible tech stack, and ended up with:
- Go for the backend (high concurrency, more fine-grained control, and lower resource usage than Python)
- Redis for the sorted set data structure. Could have implemented in pure Go, but overkill at this stage.
- Prometheus and Grafana for observability - wanted to learn how observability works and get hands on experience with it.
- Docker and Docker compose for coordinating everything. Wrote a [blog on docker optimization for this project](https://blog.pranshu-raj.in/posts/optimizing-docker-builds/).
- PostgreSQL for persistent storage, historical time queries (in the works).

---

## Further reading
1. [Backpressure](https://news.ycombinator.com/item?id=29366275)
2. [Broadcasting in Go](https://stackoverflow.com/questions/36417199/how-to-broadcast-message-using-channel)
3. [Dealing with ephemeral port exhaustion](https://blog.cloudflare.com/how-to-stop-running-out-of-ephemeral-ports-and-start-to-love-long-lived-connections/)
4. [Understanding how goroutines work](https://x.com/RituRaj12797/status/1954228899296010716)
5. [Race conditions in Go](https://thinhdanggroup.github.io/golang-race-conditions/)
6. [Redis pipelines and transactions](https://redis.io/docs/latest/develop/using-commands/pipelining/)
7. [Redis connection pooling](https://redis.io/docs/latest/develop/clients/pools-and-muxing/)
8. [The c10k problem](https://www.kegel.com/c10k.html)
9. [Go concurrency patterns - talk by Rob Pike](https://www.youtube.com/watch?v=f6kdp27TYZs)</content:encoded></item><item><title>Blogs (and discussions, threads) that I found interesting</title><link>https://blog.pranshu-raj.in/posts/interesting-blogs/</link><guid isPermaLink="true">https://blog.pranshu-raj.in/posts/interesting-blogs/</guid><description>Collection of blogs, discussions, threads on twitter I found interesting, which I intend to explore deeply.</description><pubDate>Thu, 31 Jul 2025 17:25:43 GMT</pubDate><content:encoded>Ordered by date of discovery, latest first.

Each entry has the format:

`date, [type] name (link) - short description | #tags`

Date formatted as DD-MM-YYYY.

Type can be one of `blog`, `discussion` or `thread`.

## Important (to deep dive, implement)
- [Post training](https://tokens-for-thoughts.notion.site/post-training-101#264b8b68a46d80e1b9faf7d6c2da2baa)

## 2025
### September
- 23-09-2025, [blog] [setsum](https://avi.im/blag/2025/setsum/) - order agnostic, additive and subtractive checksum
- 23-09-2025, [blog] [replacing a cache service with a db](https://avi.im/blag/2025/db-cache/)
- 23-09-2025, [blog] [tackling alert noise](https://importhuman.me/blog/alert-noise/) - How to build and manage alerting systems for enterprise.
- 11-09-2025, [blog] [backpressure warpstream](https://www.warpstream.com/blog/dealing-with-rejection-in-distributed-systems) - Warpstream explanation on backpressure, how it&apos;s implemented in production systems, what could go wrong and how to fix things.
- 11-09-2025, [blog] [sqlite does not have the n+1 query problem](https://www.sqlite.org/np1queryprob.html) - docs really, explains why sqlite does not have the n+1 query problem (hint - same process as app).
- 11-09-2025, [blog] [joel test for better code](https://www.joelonsoftware.com/2000/08/09/the-joel-test-12-steps-to-better-code/) - trying to improve my own process of building software by implementing some of these ideas.
- 11-09-2025, [blog] [ssh forwarding](https://yashikabadaya.medium.com/ssh-port-forwarding-visualized-dc7e677974a3) - how ssh forwarding works.
- 09-09-2025, [blog] [write ahead log on s3](https://trychroma.com/engineering/wal3) - interesting system detail by chroma that I want to implement.

### August

- 02-08-2025, [blog] [benchmarking postgres](https://planetscale.com/blog/benchmarking-postgres) - Planetscale blog on benchmarking postgres, will likely learn a lot on benchmarking in industry from this | #benchmarking #postgres</content:encoded></item><item><title>Tech I find interesting</title><link>https://blog.pranshu-raj.in/posts/interesting-tech/</link><guid isPermaLink="true">https://blog.pranshu-raj.in/posts/interesting-tech/</guid><description>Collection of tech I found interesting, which I intend to explore sometime in the future.</description><pubDate>Thu, 31 Jul 2025 17:25:43 GMT</pubDate><content:encoded>Ordered by date of discovery, latest first.

Each entry has the format:

`date, name (link) - short description | #tags`

Date formatted as DD-MM-YYYY.

## 2025

### August

- 02-08-2025, [slim](https://github.com/slimtoolkit/slim) - tool to improve Docker image builds, for size, speed, security | #optimization #docker
- 01-08-2025, [VictoriaLogs](https://github.com/VictoriaMetrics/VictoriaLogs) - Fast and easy to use db for logs | #observability #databases</content:encoded></item><item><title>How I Shrunk My Docker Image size by 48x (and Cut Build Time in Half)</title><link>https://blog.pranshu-raj.in/posts/optimizing-docker-builds/</link><guid isPermaLink="true">https://blog.pranshu-raj.in/posts/optimizing-docker-builds/</guid><description>How to optimize (and benchmark) docker image builds for build time and image size.</description><pubDate>Thu, 31 Jul 2025 17:25:43 GMT</pubDate><content:encoded>tl;dr:

Reduced a Go backend Docker image from 1.29 GB to 27.1 MB using `.dockerignore`, Alpine base images, and multi-stage builds. That&apos;s a 48x reduction in image size, alongside a 46% reduction in build time (from 43.8s to 23.6s).

&lt;div class=&quot;series-box&quot;&gt;

&lt;p&gt;This post is part of a series on my &lt;a href=&quot;https://github.com/pranshu-raj-211/leaderboard&quot;&gt;real-time leaderboard&lt;/a&gt; project&lt;/p&gt;

&lt;ul&gt;
  &lt;li&gt;&lt;a href=&quot;/posts/optimizing-docker-builds/&quot;&gt; Optimizing Docker Image builds&lt;/a&gt; ( ← you are here)&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;/posts/scaling-sse-1m-connections/&quot;&gt;&lt;b&gt;Scaling SSE to 150k connections&lt;/b&gt;&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;/posts/backpressure/&quot;&gt;&lt;b&gt;Backpressure in Distributed Systems&lt;/b&gt;&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;/posts/28k-connections-zero-messages/&quot;&gt;&lt;b&gt;Fixing fanout, and other issues&lt;/b&gt;&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;/posts/exploring-sse/&quot;&gt;&lt;b&gt;Introduction to Server Sent Events(SSE)&lt;/b&gt;&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;Replacing Redis sorted sets (coming soon)&lt;/li&gt;
  &lt;li&gt;Reproducible Grafana setup (coming soon)&lt;/li&gt;
  &lt;li&gt;TCP stack tuning (coming soon)&lt;/li&gt;
&lt;/ul&gt;

&lt;/div&gt;


---

While working on my [real time leaderboard](https://github.com/pranshu-raj-211/leaderboard) I was making multiple changes every day, testing how each one would improve performance. That iteration eventually scaled the system to 28,232 concurrent SSE connections on a single machine (the default Linux max for outbound ports).

&gt;Linux&apos;s default ephemeral port range (32768-60999) gives 28,232 ports for outbound connections; it can be increased, see [Baeldung](https://www.baeldung.com/linux/increase-max-tcp-ip-connections).

Each change required a rebuild of the backend Docker image. With rapid iteration and long build times, I found myself spending more time waiting than building. I wanted to solve this, and set out to optimize builds for both build time and image size.


## Initial metrics

Initially, the image for my Go backend server was one of the major bottlenecks for build time. I had been using a very basic Go 1.24 base image, with a generic Dockerfile to build the image and run the server at runtime (`CMD`).

```dockerfile
FROM golang:1.24

WORKDIR /app

COPY go.mod go.sum ./
RUN go mod download

COPY src/ ./src/

RUN go build -o main ./src/main.go

CMD [&quot;./main&quot;]
```


This had terrible performance, with respect to both time and image size:

![baseline dockerfile performance](@/assets/images/baseline_dockerfile_perf.png)


To optimize it, I started learning about what kind of base images were available for Go and reading blogs on best practices to improve the build performance.

To understand why these worked, I dived into Docker&apos;s documentation to figure out how building an image really works.

---

## Image deep dive
Docker images are essentially packaged bundles of configuration files, binaries, and everything else needed to make your application work. An image isn&apos;t an active component, though; for that you turn it into a container. You run containers from images, and while a container can be in several states (created, running, exited), the image itself is static.

Every instruction you add to a Dockerfile adds a layer to the final image. A layer is itself an image, and layers are add-only: even a command that removes files adds a layer rather than shrinking the image. So be careful not to install packages you don&apos;t need, remove build artifacts in the same command that created them, or use other techniques (such as the ones discussed later) to reduce size.
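For example, a delete in its own instruction still ships the file in the layer below it; combining download, use and cleanup into one `RUN` keeps the layer small. A hypothetical sketch (the URL is a placeholder, not from this project):

```dockerfile
# Adds roughly the archive&apos;s size to the image; the rm only masks the file
RUN wget https://example.com/tool.tar.gz
RUN tar -xzf tool.tar.gz
RUN rm tool.tar.gz

# Same result, one layer, no leftover archive
RUN wget https://example.com/tool.tar.gz \
    &amp;&amp; tar -xzf tool.tar.gz \
    &amp;&amp; rm tool.tar.gz
```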

### Docker history
Use `docker history` to inspect every layer in the final image, including each layer&apos;s size.

![docker history output](@/assets/images/docker_history_op.png)

## Benchmarking setup
The method I used to measure the effect of each change is pretty simple: after each change, a bash script runs the build step with Docker&apos;s cache disabled, repeats it a number of times, and logs every trial to a file for review (with terminal output as well).

After improving the benchmarking script (better logging, repeated trials, and averaging over trials), I ended up with this:

```bash
#!/bin/bash

IMAGE_NAME=${1:-&quot;leaderboard&quot;}
DOCKERFILE=${2:-&quot;Dockerfile&quot;}
TRIALS=${3:-5}

echo &quot;Starting Docker build benchmarking for $IMAGE_NAME : (${TRIALS} trials)&quot;
echo &quot;Testing dockerfile changes - single stage non alpine build, no cache flag&quot;
echo &quot;$(date): Starting docker build benchmarking for $IMAGE_NAME (${TRIALS} trials)&quot; &gt;&gt; logs/docker_benchmark.log
echo &quot;Testing dockerfile changes - single stage non alpine build, no cache flag&quot; &gt;&gt; logs/docker_benchmark.log

total_time=0
size_mb_total=0
successful_trials=0

for ((i=1; i&lt;=TRIALS; i++)); do
    echo &quot;Running trial $i/$TRIALS...&quot;
    
    start_time=$(date +%s)
    
    build_output=$(docker build --no-cache -t $IMAGE_NAME -f $DOCKERFILE . 2&gt;&amp;1)
    exit_code=$?
    
    end_time=$(date +%s)
    build_time=$((end_time - start_time))
    
    if [[ $exit_code -eq 0 ]]; then
        # docker images prints sizes with unit suffixes (27.1MB, 1.29GB),
        # so get a plain byte count from docker image inspect for the math
        size_bytes=$(docker image inspect $IMAGE_NAME --format &apos;{{.Size}}&apos;)
        image_size=$(awk &quot;BEGIN {print $size_bytes / 1000000}&quot;)
        
        echo &quot;Trial $i: Success - ${build_time}s, ${image_size}MB&quot;
        echo &quot;Trial $i: build time ${build_time}s image size ${image_size}MB time start $start_time time end $end_time&quot; &gt;&gt; logs/docker_benchmark.log
        
        total_time=$((total_time + build_time))
        size_mb_total=$(awk &quot;BEGIN {print $size_mb_total + $image_size}&quot;)
        successful_trials=$((successful_trials + 1))
    else
        echo &quot;Trial $i: Failed (exit code: $exit_code)&quot;
        echo &quot;Trial $i: BUILD FAILED (exit code: $exit_code) time start $start_time time end $end_time&quot; &gt;&gt; logs/docker_benchmark.log
        echo &quot;$(date): $build_output&quot;&gt;&gt;logs/error.log
        echo -e &quot;$(date): $build_output \n --------------------&quot;
    fi
done

echo &quot;=================================&quot;
if [[ $successful_trials -gt 0 ]]; then
    avg_time=$(awk &quot;BEGIN {print $total_time / $successful_trials}&quot;)
    avg_size_mb=$(awk &quot;BEGIN {print $size_mb_total / $successful_trials}&quot;)
    
    echo &quot;Successful trials: $successful_trials/$TRIALS&quot;
    echo &quot;Average time: ${avg_time}s&quot;
    echo &quot;Average size: $avg_size_mb mb&quot;
    
    echo &quot;$(date): Average results - time ${avg_time}s size - ${avg_size_mb}mb (${successful_trials}/${TRIALS} successful)&quot; &gt;&gt; logs/docker_benchmark.log
else
    echo &quot;All trials failed!&quot;
    echo &quot;$(date): All trials failed!&quot; &gt;&gt; logs/docker_benchmark.log
fi
echo &quot;=================================&quot;

echo &quot;Cleaning up Docker resources&quot;
docker builder prune -af
docker image rm -f $IMAGE_NAME 2&gt;/dev/null
docker container prune -f
docker volume prune -f
docker network prune -f
```

At the start of the file I&apos;ve defined some variables, which can be passed as args when running the script. A few echo commands follow, signalling the start of a test run and logging the change being tested along with other details of the current run.

---

## Optimizations 
### 1. .dockerignore

Like the `.gitignore` file prevents unnecessary files from being tracked in Git, the `.dockerignore` file keeps the listed files and directories out of the build context, so they are never copied into the image.

```dockerignore
.git
.gitattributes
.gitignore

.env

docker-compose.yml
Dockerfile

logs/
grafana/
prometheus/

.vscode/
```

#### Why These Specific Files Are Excluded
Let&apos;s break down each entry in my .dockerignore and why it matters:
- git metadata: Containers don&apos;t need Git history to run, and it can take up significant space in repositories with many commits.
- .env: Prevents secrets from being baked into the image. Set env vars through other means (configs at runtime).
- docker-compose.yml and Dockerfile: Required for building, but not at runtime.
- logs/, grafana/, prometheus/: The directories with the biggest space usage after src. `/grafana` and `/prometheus` hold configs that make the dashboards reproducible, so no setup is needed beyond starting with compose.
- .vscode: IDE-specific configs.

Get templates of .dockerignore [here](https://github.com/garygitton/dockerignore).

#### Performance improvements:

![dockerignore benchmark](@/assets/images/dockerignore_based.png)

Since I&apos;m already copying only the required files (src, go.mod), this step doesn&apos;t make much of a difference here. Any size difference would be on the order of a few megabytes (in my case), which is overshadowed by the 1.29 GB final image. There is a build time improvement, though I can&apos;t rely on it too much since it fluctuates between runs; the average still decreased.


### 2. Smaller base images
I&apos;ve used `alpine` and `slim` base images for Python before, but didn&apos;t know Go had them until I wandered onto the [Dockerhub repository for Go](https://hub.docker.com/_/golang).

These contain just the bare essentials needed to run applications, so they&apos;re much smaller. The downside is that they lack many of the tools required to compile some packages (C compilers, among others).

Dockerfile:
```dockerfile
FROM golang:1.24-alpine

WORKDIR /app

COPY go.mod go.sum ./
RUN go mod download

COPY src/ ./src/

RUN go build -o main ./src/main.go

CMD [&quot;./main&quot;]
```

This provides a significant improvement:
![alpine build benchmark](@/assets/images/alpine_build.png)

That&apos;s almost a **2x improvement in image size**.


### 3. Multi stage builds
Multi stage builds use multiple `FROM` statements in a single Dockerfile, which lets you copy only what you need from earlier stages (binaries, configs) without the overhead of the packages and tools required to build them. This is extremely convenient: the builder image can be bulky, full of packages needed to compile the app, while the final binaries get copied into a much smaller base image, dramatically cutting resource usage.

```dockerfile
FROM golang:1.24-alpine AS builder

WORKDIR /app

COPY go.mod go.sum ./
RUN go mod download

COPY config.yaml .
COPY src/ ./src/

RUN go build -o main ./src/main.go

FROM alpine:latest

WORKDIR /app

COPY --from=builder /app/main .
COPY --from=builder /app/config.yaml .

CMD [&quot;./main&quot;]
```

![multi stage build benchmark](@/assets/images/multi_stage_build.png)

That&apos;s an **improvement of ~26x in image size** over the single stage Alpine build.

And an **overall improvement of ~48x in image size** (1.29 GB to 27.1 MB).

I know this configuration can still be improved; for example, the two `COPY --from=builder` lines in the final stage could be combined into a single command.

---
## Results Summary
After applying all the optimizations step by step, here&apos;s how the image size and average build time changed.

| Build Stage          | Size       | Avg Build Time |
| -------------------- | ---------- | -------------- |
| Baseline             | 1.29 GB    | 43.8s          |
| + `.dockerignore`         | 1.29 GB    | 28.4s          |
| + Alpine + `.dockerignore`      | 699 MB     | 27s            |
| + Multi-stage + Alpine + `.dockerignore` | **27.1MB** | **23.6s**      |

The real gains in image size came from switching to Alpine and combining it with multi stage builds. `.dockerignore` did not improve image size, but it had a considerable effect on build time.

---

## Production considerations
Although this optimization journey shows a bunch of ways to improve builds, it does not use one of Docker&apos;s great features: layer caching (the benchmarks above ran with `--no-cache`).

Every time you build an image, Docker caches the layers so that it does not need to be rebuilt each time, saving a lot of resources.

This doesn&apos;t help once something changes, though. Say we&apos;re copying config files and a value in the config needs to change: the copy layer and every layer after it will have to be rebuilt, since a dependency of those layers changed.

This is why the least frequently changed operations (copying go.mod, downloading modules, copying configs) should go in the layers before the frequently changing ones (copying source code, building the binary).
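This is exactly the ordering used in the multi-stage Dockerfile above; a source edit invalidates only the layers from the source copy onward, so the module download stays cached:

```dockerfile
# rarely changes, so these layers stay cached across most builds
COPY go.mod go.sum ./
RUN go mod download

# changes on every edit, so only these layers rebuild
COPY src/ ./src/
RUN go build -o main ./src/main.go
```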

&gt;Additional reduction in size can be achieved by opting for scratch or distroless base images, but those have some drawbacks (missing packages, so you might not be able to do a lot of things out of the box).

## Conclusion
This was a fun side quest that taught me how Docker really builds images under the hood. The impact of optimizing Docker builds is huge - faster iteration, lower CI/CD costs, and leaner deploys; all especially valuable for small teams or fast-moving solo devs.

By understanding Docker image concepts, using `.dockerignore`, choosing a leaner base image, and combining it with multi-stage builds, I reduced my Go backend image from 1.29 GB to 27.1 MB: a 48x reduction in image size and a 46% reduction in build time.

If you&apos;re working on performance-critical systems or just want tighter feedback loops, I strongly recommend auditing your Docker setup. Small improvements in how you build today can pay off massively as your project scales.

&gt;P.S.: I learnt of an even better way to improve docker images, this one&apos;s automatic - try [slim](https://github.com/slimtoolkit/slim).


## Where to go from here?
I&apos;d highly recommend reading the following:
- [Buildkit](https://docs.docker.com/build/buildkit/)
- [Docker cache](https://docs.docker.com/build/cache/)
- [Layer caching in CI](https://testdriven.io/blog/faster-ci-builds-with-docker-cache/)</content:encoded></item><item><title>Building an AI LinkedIn Sourcing Agent (Full version)</title><link>https://blog.pranshu-raj.in/posts/linkedin-scraping-full/</link><guid isPermaLink="true">https://blog.pranshu-raj.in/posts/linkedin-scraping-full/</guid><description>How I built a complete recruiting pipeline that finds candidates, scores them intelligently, and generates personalized outreach</description><pubDate>Tue, 01 Jul 2025 07:22:48 GMT</pubDate><content:encoded>## The Challenge: Round two with an old nemesis

Recruiting is broken. Finding the right candidates is like searching for needles in a haystack, and when you do find them, your generic LinkedIn message gets lost in their inbox with 50 others.

For Synapse&apos;s AI hackathon, the challenge was to &quot;Build a LinkedIn Sourcing Agent that finds profiles, scores candidates using AI, and generates personalized outreach messages.&quot;

Two years ago, I tried building something similar: a LinkedIn scraper combined with a job recommendation engine. I was going to scrape LinkedIn jobs, match them to user resumes, and use that to reduce information overload in job searching. LinkedIn&apos;s anti-scraping measures crushed that dream within days.

Back then, I had less technical knowledge, but even with perfect execution it wouldn&apos;t have worked. LinkedIn&apos;s defenses are just too aggressive. I pivoted to scraping job postings from Indeed and YCombinator instead.

That experience taught me LinkedIn scraping is essentially impossible without using paid solutions. So when this hackathon challenge came up, I knew what not to do.


## What I Built

Instead of another keyword-matching tool, I built something that tries to think like a recruiter:

`Job Description → Smart Search → Profile Scraping → AI Scoring → Personalized Messages`

The core components:

- Multi-source discovery: LinkedIn + GitHub profile combination
- 6-factor scoring algorithm: Because fit isn&apos;t just about keywords
- AI-powered outreach: Llama via Groq for personalized messaging
- Async processing: Handle multiple jobs without blocking

You can check out the full code [here](https://github.com/pranshu-raj-211/score_profiles).

### The Architecture

I used FastAPI for the backend with async processing throughout. The data flow looks like this:

1. **Search Query Generation**: Transform job descriptions into effective search queries
2. **Profile Discovery**: SerpAPI to find profiles - LinkedIn and Github URLs
3. **Data Fetch**: RapidAPI&apos;s LinkedIn service (because scraping LinkedIn directly is a nightmare), HTTP calls for Github
4. **Data Extraction**: Custom logic (using BeautifulSoup for Github)
5. **Intelligent Scoring**: 6-factor algorithm with confidence levels
6. **Message Generation**: Llama-powered personalized outreach
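
Under stated assumptions (the stage functions below are illustrative stubs, not the repo&apos;s actual API), the flow above can be sketched as an async pipeline:

```python
import asyncio

# Hypothetical stage functions - names and payloads are illustrative only.
async def generate_queries(job_description: str) -> list[str]:
    return [f"site:linkedin.com/in {job_description}"]

async def discover_profiles(queries: list[str]) -> list[str]:
    # In the real pipeline this calls SerpAPI; stubbed here.
    return [f"https://linkedin.com/in/candidate-{i}" for i, _ in enumerate(queries)]

async def score_profile(url: str) -> dict:
    # Stands in for data fetch, extraction, and LLM scoring.
    return {"url": url, "fit_score": 7.5}

async def run_pipeline(job_description: str) -> list[dict]:
    queries = await generate_queries(job_description)
    urls = await discover_profiles(queries)
    # Score all discovered profiles concurrently.
    return await asyncio.gather(*(score_profile(u) for u in urls))

results = asyncio.run(run_pipeline("ML Engineer at AI startup"))
```

Each stage only depends on the previous stage&apos;s output, which is what makes the later move to a queue-per-stage architecture straightforward.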

![data flow diagram](@/assets/images/data_flow.png)
*Data flow diagram*

![architecture](@/assets/images/architecture_lnkd_scraper.png)
*App architecture*

### The Scoring Algorithm

| Factor | Weight | What It Measures |
|--------|--------|------------------|
| **Education** | 20% | Elite schools get higher scores |
| **Career Trajectory** | 20% | Clear progression vs. lateral moves |
| **Company Relevance** | 15% | Relevant industry experience |
| **Skill Match** | 25% | How well skills align with job requirements |
| **Location** | 10% | Geographic fit for the role |
| **Tenure** | 10% | Stability vs. job hopping patterns |

Since it&apos;s using LLMs, it understands context. An engineer who moved from startup → Google → senior role gets a higher trajectory score than someone who stayed at the same level for years.
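
As a sketch of the arithmetic only (the real per-factor scores come from an LLM; this just shows how the table&apos;s weights would combine factor scores on a 0-10 scale):

```python
# Weights from the table above; per-factor scores assumed to be on a 0-10 scale.
WEIGHTS = {
    "education": 0.20,
    "trajectory": 0.20,
    "company": 0.15,
    "skills": 0.25,
    "location": 0.10,
    "tenure": 0.10,
}

def fit_score(factor_scores: dict) -> float:
    """Weighted combination of the six factor scores."""
    return round(sum(WEIGHTS[name] * factor_scores[name] for name in WEIGHTS), 2)

# The sample breakdown from the results section of this post.
breakdown = {
    "education": 9.2, "trajectory": 8.5, "company": 9.0,
    "skills": 9.1, "location": 10.0, "tenure": 7.8,
}
```

Note that a pure weighted mean of that breakdown lands near 8.9, while the sample&apos;s overall fit score is 8.7 - the real number is produced by the LLM, not by this formula.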

### Smart Outreach Generation

Generic LinkedIn messages get ignored. My solution uses Llama (via Groq) to create personalized messages that:

- Reference specific experience and achievements
- Connect candidate background to job requirements  
- Feel personal, not templated
- Include clear next steps

**Example output:**
*&quot;Hi John, I noticed your work at OpenAI on transformer architectures and your ICML 2023 paper on attention mechanisms. Your blend of research and production ML experience is exactly what Windsurf needs for their ML Research Engineer role...&quot;*

### Sample Results

Testing with the Windsurf ML Research Engineer role:

```json
{
  &quot;name&quot;: &quot;John Doe&quot;,
  &quot;fit_score&quot;: 8.7,
  &quot;confidence&quot;: 0.91,
  &quot;score_breakdown&quot;: {
    &quot;education&quot;: 9.2,    // Stanford PhD in ML
    &quot;trajectory&quot;: 8.5,   // Research → Engineering → Lead
    &quot;company&quot;: 9.0,      // Google, OpenAI experience
    &quot;skills&quot;: 9.1,       // Perfect LLM/transformer match
    &quot;location&quot;: 10.0,    // Mountain View based
    &quot;tenure&quot;: 7.8        // Healthy 2-3 year progression
  },
  &quot;outreach_message&quot;: &quot;Hi John, I came across your transformer optimization work at Google Research, particularly your ICML paper on efficient attention mechanisms. Your move from research to production ML at OpenAI shows the exact blend we need at Windsurf...&quot;
}
```

**Why this works for this JD:**

- Specific achievements (ICML paper)
- Career progression understanding
- Clear connection to role requirements

---

## Key Technical Decisions

### Why These Choices Mattered

**Llama via Groq instead of OpenAI**: Faster, cheaper, and surprisingly good at personalized messaging

**RapidAPI for LinkedIn data**: More reliable than web scraping, cleaner data extraction

**Async processing with FastAPI**: Can handle multiple jobs in parallel without blocking

**MongoDB for storage**: Perfect for flexible candidate profiles and easy scaling

**Smart caching**: Avoids re-fetching the same profiles, reduces overhead, cost

## What I Learned

### 1. Focus on the Algorithm, Not the Data Collection

Anyone can scrape LinkedIn (using paid APIs to fetch data). The value is in smart scoring that understands candidate quality beyond keywords - automating manual triage and reducing the amount of information a recruiter has to process.

### 2. Personalization Actually Works

Generic outreach gets low response rates. AI-generated personalized messages referencing specific achievements can convert a lot of leads.

As a fallback, we always have template messages.

### 3. Production Thinking From Day 1

Built with FastAPI, async processing, proper error handling, and caching. This is designed to scale easily.

### 4. Multi-Source Data is Key

Combining LinkedIn + GitHub profiles gives much richer candidate insights than either alone.

## Scaling Strategy

For production use (100s of jobs daily):

1. **Async Processing**: Already built with asyncio for parallel job handling. Can explore multiprocessing as well
2. **Queue System**: Redis/Celery integration template implemented, integration remains
3. **Database**: MongoDB for caching profiles and storing results
4. **Rate Limiting**: Smart backoff with API key rotation
5. **Observability**: Comprehensive logging for performance tracking (start simple, add complexity later)
6. **Comprehensive Testing**: Including load testing, e2e and more

---

## The Real Challenges (And Why They Matter)

1. **LinkedIn&apos;s War Against Scraping (Round Two)**
LinkedIn really, really doesn&apos;t want you scraping their data. Having learned this lesson the hard way two years ago, I didn&apos;t even attempt direct scraping this time. My previous attempt involved rotating user agents, proxy pools, CAPTCHA solving - all of it failed within days.

This time I went straight to RapidAPI&apos;s LinkedIn service. More expensive per request ($0.01 per profile), but infinitely more reliable than fighting LinkedIn&apos;s ever-evolving bot detection. My 2022 self would have spent weeks trying to outsmart their defenses. My 2024 self just paid for the API.

Lesson learned: Sometimes the expensive solution is actually the cheap one when you factor in development time.

2. **LLM Consistency is a Myth**

Groq&apos;s Llama model was supposed to return structured JSON for scoring. In practice? It worked maybe 70% of the time. The other 30% of the time, I&apos;d get beautifully written prose instead of the JSON structure I needed.

What I learned: Always have fallback parsing. I ended up writing regex patterns to extract scores from malformed responses, and implementing retry logic with different prompts.
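
A minimal sketch of that fallback idea - try strict JSON first, then fall back to regex extraction (the pattern here is illustrative, not the exact one I used):

```python
import json
import re

# Illustrative fallback pattern: pull '"field": number' pairs out of prose.
SCORE_RE = re.compile(r'"?(\w+)"?\s*:\s*([0-9]+(?:\.[0-9]+)?)')

def parse_llm_score(raw: str) -> dict:
    """Try strict JSON first, then fall back to regex extraction."""
    try:
        return json.loads(raw)
    except json.JSONDecodeError:
        pass
    scores = {key: float(value) for key, value in SCORE_RE.findall(raw)}
    if not scores:
        raise ValueError("could not extract any scores")
    return scores

# A typical "beautifully written prose" failure case.
messy = 'Sure! Here is the breakdown: "education": 9.2, "skills": 8.7 overall.'
```

In the real system, a `ValueError` here would trigger a retry with a different prompt.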

3. **GitHub Profile Matching Gone Wrong**

Searching for GitHub profiles is not straightforward; I would often get company profiles suggested instead of people.

Combining LinkedIn and GitHub data seemed straightforward - match by name and see if their GitHub activity aligns with their LinkedIn experience. Reality check: turns out &quot;John Smith&quot; working at &quot;Google&quot; could match with 47 different GitHub profiles.

Current state: I built the GitHub integration but disabled it for the final demo. Sometimes the feature that sounds coolest causes the most headaches.

4. **The MongoDB Integration That Never Happened**

I planned to use MongoDB with Motor for async operations.

What actually happened: I spent hours debugging data validation mismatches. For the hackathon timeline, I switched to simple JSON file caching.

Lesson: Sometimes the &quot;better&quot; technical choice isn&apos;t worth the time cost, especially under deadline pressure.

5. **Data Validation**

This was the biggest and most frustrating issue of the build. A major chunk of my time went into debugging and fixing data validation mismatches, so midway I switched to a TDD-style workflow and made my logger verbose enough to capture plenty of context.

---

## What actually worked well

### Caching

I implemented simple profile caching that saves both time and API costs. Before making any external calls, the system checks if we&apos;ve seen this LinkedIn URL before. At hackathon scale, simple file-based caching works fine; for production, I&apos;d use Redis with proper TTL settings.
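
A minimal sketch of what such a file-based cache can look like (the cache directory and function names are hypothetical, not the repo&apos;s actual layout):

```python
import hashlib
import json
from pathlib import Path

CACHE_DIR = Path("cache/profiles")  # hypothetical location

def cache_path(url: str) -> Path:
    # Hash the URL so it is safe to use as a filename.
    digest = hashlib.sha256(url.encode()).hexdigest()[:16]
    return CACHE_DIR / f"{digest}.json"

def get_profile(url: str, fetch) -> dict:
    """Return a cached profile if this URL was seen before, else fetch and store it."""
    path = cache_path(url)
    if path.exists():
        return json.loads(path.read_text())
    profile = fetch(url)  # the expensive external API call
    CACHE_DIR.mkdir(parents=True, exist_ok=True)
    path.write_text(json.dumps(profile))
    return profile
```

The second lookup for the same URL is served from disk and never touches the API.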

### Async Processing

FastAPI with asyncio lets me process multiple candidates simultaneously. Instead of waiting 30 seconds for 10 profiles sequentially, I can get them all in 5-6 seconds.

I could have used FastAPI&apos;s `BackgroundTasks`, but it wouldn&apos;t have made a lot of difference. It would be a lot more sensible to go to a task queue based setup for scaling (using Redis + Celery).
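
To make the speedup concrete, here&apos;s a toy comparison with `asyncio.sleep` standing in for the external API call (timings scaled down; the numbers are illustrative, not measurements from the real system):

```python
import asyncio
import time

async def fetch_profile(i: int) -> str:
    await asyncio.sleep(0.1)  # stand-in for a multi-second external API call
    return f"profile-{i}"

async def sequential() -> list[str]:
    # One await at a time: total time is the sum of all calls.
    return [await fetch_profile(i) for i in range(10)]

async def concurrent() -> list[str]:
    # All ten coroutines run at once: total time is roughly one call.
    return await asyncio.gather(*(fetch_profile(i) for i in range(10)))

start = time.perf_counter()
asyncio.run(sequential())
seq_time = time.perf_counter() - start  # roughly 1.0s (10 x 0.1s)

start = time.perf_counter()
results = asyncio.run(concurrent())
conc_time = time.perf_counter() - start  # roughly 0.1s
```

`asyncio.gather` also preserves result order, so candidate lists stay aligned with their inputs.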

### LLM based scoring

Rather than just keyword matching, LLMs understand context. An engineer who went from startup → Google → senior role gets a higher trajectory score than someone who&apos;s been at the same level for years. The LLM can recognize patterns that regex never could.

---

## Scaling

The current system handles maybe 20-30 profiles before throttling and API rate limits kick in. For production scale (hundreds of concurrent jobs), here&apos;s what needs to change:

### Code Quality &amp; Architecture

The current codebase is a mess of random object creation everywhere. I&apos;m instantiating API clients, scrapers, and scoring services scattered throughout the code. This makes testing painful and concurrency unpredictable.

Dependency injection would clean this up significantly. Instead of creating LinkedInScraper() objects everywhere, I&apos;d inject them as dependencies. For FastAPI, this means using dependency providers that create singleton instances for thread-safe operations.

```python
from fastapi import Depends

# Current messy approach
async def score_candidates(candidates):
    scraper = LinkedInScraper()  # New instance every time
    scorer = FitScorer()         # Another new instance
    ...  # rest of logic

# Better approach with DI
async def score_candidates(
    candidates,
    scraper: LinkedInScraper = Depends(get_scraper),
    scorer: FitScorer = Depends(get_scorer),
):
    ...  # Clean, testable, predictable
```

For concurrency, dependency injection actually helps. You can inject thread-safe, connection-pooled clients rather than creating new HTTP sessions for every request. This reduces overhead and prevents connection exhaustion.

Combining DI with connection pooling is another great idea.

### API key rotation

Though the code is set up, it&apos;s not being used yet. Ideally I&apos;d rotate keys with a generator, which would help when rate limits for one API kick in.
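
A sketch of that generator-based rotation idea (the keys are placeholders); real rotation logic would also skip keys that recently hit a rate limit:

```python
from itertools import cycle

def key_rotator(keys):
    """Yield API keys round-robin so no single key absorbs all the traffic."""
    yield from cycle(keys)

rotator = key_rotator(["key-a", "key-b", "key-c"])  # placeholder keys
picked = [next(rotator) for _ in range(5)]
# picked: ["key-a", "key-b", "key-c", "key-a", "key-b"]
```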

### Real Production Scaling

For hundreds of concurrent jobs, the architecture needs fundamental changes:

1. **Multi-Query Strategy**
Instead of a single search query, I&apos;d implement tiered searching:

- **Strict query**: Perfect keyword matches, paginate deeply (until you stop getting results)
- **Medium query**: Broader terms, fewer pages
- **Loose query**: Industry + location only, limited results

This builds a large candidate pool while prioritizing the most relevant profiles.

2. **Smart Pre-filtering**
Before hitting expensive LLMs:

- **Deduplication**: Bloom filters for URL dedup at scale
- **Basic filtering**: Years of experience, location, title keywords
- **Batch scoring**: Group similar profiles for bulk processing
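
As a sketch of the Bloom-filter dedup idea (a toy implementation, not a production library): membership tests can return false positives but never false negatives, so a duplicate URL is never re-processed, at the cost of very occasionally skipping a genuinely new one.

```python
import hashlib

class BloomFilter:
    """Toy Bloom filter: constant memory, no false negatives, rare false positives."""

    def __init__(self, num_bits: int = 2 ** 16, num_hashes: int = 4):
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = 0  # one big int acting as the bit array

    def _positions(self, item: str):
        # Derive several bit positions by salting the hash with a seed.
        for seed in range(self.num_hashes):
            digest = hashlib.sha256(f"{seed}:{item}".encode()).digest()
            yield int.from_bytes(digest[:8], "big") % self.num_bits

    def add(self, item: str) -> None:
        for pos in self._positions(item):
            self.bits |= 2 ** pos

    def __contains__(self, item: str) -> bool:
        return all((self.bits // 2 ** pos) % 2 == 1 for pos in self._positions(item))

seen = BloomFilter()
seen.add("https://linkedin.com/in/john-doe")
```

At real scale you would size `num_bits` and `num_hashes` for the expected URL volume and target false-positive rate.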

3. **Queue Architecture (Async Task Queue Pattern)**
Job Queue → Search Workers → Filter Workers → LLM Workers → Results

Each stage handles its bottlenecks independently. Search workers can run fast and cheap, while LLM workers are expensive but fewer in number.

4. **Resource Management**

- **API key pools**: Rotate keys across workers to handle rate limits
- **Connection pooling**: Shared HTTP clients across async workers
- **Circuit breakers**: Fail fast when external services are down

---

## Future Roadmap

### Short Term (1-2 months)

- [ ] Complete MongoDB async integration with Motor
- [ ] Dockerization for consistency across environments
- [ ] Enhanced deduplication using bloom filters
- [ ] A/B testing framework for prompt optimization

### Medium Term (3-6 months)

- [ ] Multi-platform integration (Twitter, personal websites)
- [ ] Advanced ML models for candidate scoring
- [ ] Real-time job market insights
- [ ] Integration with ATS systems

### Long Term (6+ months)

- [ ] Predictive analytics for hiring success
- [ ] Automated interview scheduling
- [ ] Bias detection and mitigation
- [ ] Custom model training for specific companies

## Try It Yourself

**GitHub Repository**: [score_profiles](https://github.com/pranshu-raj-211/score_profiles)  
**API Documentation**: Available at `/docs` when running locally

&gt; I tried using uv, but there were some issues on my laptop recently - so I switched to pip

```bash
# Quick start
git clone https://github.com/pranshu-raj-211/score_profiles.git
cd score_profiles
pip install -r requirements.txt
cp .env.example .env  # Add your API keys
python app/main.py
```

**API Usage:**

```bash
curl -X POST &quot;http://localhost:8000/jobs&quot; \
  -H &quot;Content-Type: application/json&quot; \
  -d &apos;{&quot;search_query&quot;: &quot;ML Engineer at AI startup&quot;, &quot;max_candidates&quot;: 10}&apos;
```

---

**[GitHub Repository](https://github.com/pranshu-raj-211/score_profiles)**</content:encoded></item><item><title>Scaling Server Sent Events - A practical guide to 28,000+ concurrent connections</title><link>https://blog.pranshu-raj.in/posts/exploring-sse/</link><guid isPermaLink="true">https://blog.pranshu-raj.in/posts/exploring-sse/</guid><description>Understanding Server-Sent Events, their use cases, and advantages over other realtime protocols.</description><pubDate>Sun, 29 Jun 2025 07:01:27 GMT</pubDate><content:encoded>Check out my post on [scaling SSE to 150k concurrent connections](https://blog.pranshu-raj.in/posts/scaling-sse-1m-connections/).

&gt;This post is under maintenance, expected to have a new version by 26th January 2026 (Monday).

I&apos;ve been diving into Server-Sent Events (SSE) lately, trying to understand how it works, where it fits, and what its tradeoffs are. It’s an interesting protocol, especially compared to WebSockets and traditional HTTP streaming.

&lt;div class=&quot;series-box&quot;&gt;

&lt;p&gt;This post is part of a series on my &lt;a href=&quot;https://github.com/pranshu-raj-211/leaderboard&quot;&gt;real-time leaderboard&lt;/a&gt; project&lt;/p&gt;

&lt;ul&gt;
  &lt;li&gt;&lt;a href=&quot;/posts/optimizing-docker-builds/&quot;&gt;&lt;b&gt;Optimizing Docker Image builds&lt;/b&gt;&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;/posts/scaling-sse-1m-connections/&quot;&gt;&lt;b&gt;Scaling SSE to 150k connections&lt;/b&gt;&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;/posts/backpressure/&quot;&gt;&lt;b&gt;Backpressure in Distributed Systems&lt;/b&gt;&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;/posts/28k-connections-zero-messages/&quot;&gt;&lt;b&gt;Fixing fanout, and other issues&lt;/b&gt;&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;/posts/exploring-sse/&quot;&gt;Introduction to Server Sent Events(SSE) ( ← you are here)&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;Replacing Redis sorted sets (coming soon)&lt;/li&gt;
  &lt;li&gt;Reproducible Grafana setup (coming soon)&lt;/li&gt;
  &lt;li&gt;TCP stack tuning (coming soon)&lt;/li&gt;
&lt;/ul&gt;

&lt;/div&gt;

## What is SSE?

Unlike the full-duplex (two-way) communication channel of WebSockets, SSE offers a simpler, lightweight alternative built directly on HTTP. It&apos;s designed for scenarios where a server needs to push data to a client, without needing to receive messages back. This makes it highly efficient for real-time updates like live news feeds or notifications.

## How SSE Works: Peeking under the hood

### The Protocol Flow

- The client sends a GET request with the header `Accept: text/event-stream`.
- The server responds with `Content-Type: text/event-stream` and keeps the connection open.
- The response is sent in chunks (using `Transfer-Encoding: chunked`), each containing an event.
- The underlying TCP connection ensures reliable delivery, but this also means each packet must be acknowledged, unlike UDP-based solutions where you trade reliability for speed.
- The client can automatically reconnect if the connection is lost by using the retry field sent by the server.

### The Event Stream Format

The data sent from the server is a plain text stream with a specific format. Each message is separated by a pair of newlines. The format supports several fields:

```
: this is a comment and will be ignored
retry: 10000
id: event-123
event: leaderboard_update
data: {&quot;user&quot;: &quot;pranshu&quot;, &quot;score&quot;: 9001}
```

  - **`id`**: Allows the browser to track the last received event. If the connection drops, the browser will automatically reconnect and send a `Last-Event-ID` header, so the server can resume the stream.
  - **`data`**: The payload of the message. You can have multiple `data` lines for a single event.
  - **`event`**: A custom name for the event. The client can listen for specific event types. If omitted, it defaults to a &apos;message&apos; event.
  - **`retry`**: Tells the client how long to wait (in milliseconds) before attempting to reconnect if the connection is lost.
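
The wire format is simple enough to parse by hand. Here&apos;s a minimal sketch in Python (illustrative, not a spec-complete parser): it skips comments, joins multi-line `data` fields, and applies the default `message` event type.

```python
def parse_sse(stream: str) -> list[dict]:
    """Parse a text/event-stream payload into a list of event dicts (minimal sketch)."""
    events, current = [], {}
    for line in stream.splitlines():
        if line.startswith(":"):   # comment line, ignored
            continue
        if line == "":             # blank line marks the end of an event
            if "data" in current:
                current.setdefault("event", "message")  # spec default type
                events.append(current)
            current = {}
            continue
        field, _, value = line.partition(":")
        value = value.lstrip(" ")
        if field == "data":
            # Multiple data lines are joined with newlines.
            current["data"] = current["data"] + "\n" + value if "data" in current else value
        else:
            current[field] = value
    return events

raw = 'id: event-123\nevent: leaderboard_update\ndata: {"user": "pranshu"}\n\n'
```

A browser&apos;s `EventSource` does exactly this kind of parsing for you, plus reconnection handling.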

### Client Side Implementation: The EventSource API

On the client-side, browsers provide the native EventSource API, which handles all the complexity of connection management and parsing for you.

```javascript
var es = new EventSource(&quot;http://localhost:8080/stream&quot;);
```

### Server Side Implementation (Go)

```go
import (
  &quot;fmt&quot;
  &quot;net/http&quot;
  &quot;time&quot;
)

func sseHandler(w http.ResponseWriter, r *http.Request) {
    // Setup http headers
    w.Header().Set(&quot;Content-Type&quot;, &quot;text/event-stream&quot;)
    w.Header().Set(&quot;Cache-Control&quot;, &quot;no-cache&quot;)
    w.Header().Set(&quot;Connection&quot;, &quot;keep-alive&quot;)

    // CORS headers may be needed if you&apos;re using a browser to test

    // Create channel for client disconnection
    clientGone := r.Context().Done()

    rc := http.NewResponseController(w)
    t := time.NewTicker(2*time.Second)
    defer t.Stop()
    for {
        select {
        case &lt;-clientGone:
            fmt.Println(&quot;Client disconnected&quot;)
            return
        case &lt;-t.C:
            // Send an event to the client
            _, err := fmt.Fprintf(w, &quot;data: The time is %s\n\n&quot;, time.Now().Format(time.UnixDate))
            if err != nil {
                return
            }
            err = rc.Flush()
            if err != nil {
                return
            }
        }
    }
}

func main(){
  http.HandleFunc(&quot;/stream&quot;, sseHandler)
  fmt.Println(&quot;Started SSE Server&quot;)
  // ListenAndServe blocks until the server fails, so log before calling it
  err := http.ListenAndServe(&quot;:8080&quot;, nil)
  if err != nil {
    fmt.Println(err.Error())
  }
}
```

## Stateless or Stateful?

Technically, SSE is mostly stateless, but there’s a catch. The server might need to track client state to some extent, especially when handling reconnections. Ideally, I’d love to make my implementation fully stateless, but then:

- How do you handle reconnections?
- Should the client resume from the last event it received?
- What if the server doesn’t store any state at all?

One approach is to send an id field with each event, which the client can send back to resume from the last received message after reconnecting. This allows for stateless reconnections while still maintaining continuity.

## Scaling and Proxying SSE

SSE works with both HTTP 1.1 and HTTP 2.0, but there are some considerations when scaling it (more on that later). Since it’s built on top of HTTP, it behaves like any other HTTP request-response cycle but keeps the connection open, allowing the server to send data whenever it wants.

Proxying SSE can be a bit tricky. Since the connection is persistent, Layer 7 proxies (like Nginx) need to be properly configured to support long-lived connections. While it’s simpler than WebSockets, some proxies may still close the connection prematurely.

&gt; Note: Another concern is the six-connection limit in HTTP 1.1 — this limit applies per domain in a browser.
&gt; This means if a user opens many tabs making SSE connections to the same server, they may hit this limit, preventing subsequent connections from that browser from being established until an existing one is closed.
&gt; However, HTTP/2 mitigates this with multiplexing, allowing multiple streams over a single connection.

## Observability &amp; Performance

If I scale SSE servers, I’d want to measure:

- Connection handling (how many concurrent clients?)
- Latency (how fast are events being pushed?)
- Resource usage (CPU, memory overhead per connection)

I plan to use Prometheus for monitoring and observability to track performance at scale.

### Key considerations for SSE

1. Will the six-connection limit in HTTP 1.1 affect SSE scaling?

&gt;Yes, but only for browser clients — HTTP/2 helps mitigate this.

2. How is SSE different from HTTP streaming apart from the headers?

&gt;SSE is a standardized protocol with event formatting, automatic reconnection, and an event ID mechanism.

3. How truly stateless is SSE?

&gt;Stateless by design, but client state tracking may be needed for reconnections.

4. How do I detect client disconnections and clean up resources efficiently?

&gt;Use TCP connection close detection or periodic heartbeats.

5. Why is timeout used in SSE?

&gt;To detect stalled connections and trigger reconnections.

## When to use SSE (and when not to)

| Feature | Server-Sent Events (SSE) | WebSockets |
| :--- | :--- | :--- |
| **Direction** | Unidirectional (Server → Client) | Bidirectional (Two-way) |
| **Transport** | Standard HTTP/S | Upgraded from HTTP |
| **Protocol** | Simple text-based | More complex binary/text protocol |
| **Reconnects** | Built-in, automatic | Must be implemented manually |
| **Use Cases** | Notifications, news feeds, stock tickers, monitoring dashboards, live score updates for spectators. | Chat apps, collaborative document editing, real-time multiplayer games (for player actions). |

---

&gt; Theory is great, but to truly understand the performance characteristics and limitations of SSE, I decided to put it to the test. My goal was to build a simple real-time leaderboard and see how many concurrent connections a single Go server could handle.

## Key Production Considerations

### State management and reconnection

While the SSE protocol itself is stateless, a robust implementation requires thinking about state. When a client reconnects using the `Last-Event-ID` header, the server needs a way to reconstruct and send the missed events. This could involve querying a database or a cache (like Redis) for messages created after that ID. For true statelessness at the web-server level, this logic can be offloaded to a message broker or cache.

### Scaling, Proxies and Connection Limits

Proxying SSE requires careful configuration. Since connections are long-lived, proxies like Nginx must be configured to not buffer the response and not time out the connection prematurely.

Furthermore, browsers limit the number of concurrent HTTP/1.1 connections per domain (typically to six). If a user opens many tabs to your site, they can exhaust this pool. HTTP/2 largely solves this with multiplexing, allowing many streams over a single TCP connection, making it the preferred protocol for scaling SSE.

### Security

Since SSE runs over HTTP, you can secure it using standard web security practices:

- **Authentication**: An SSE endpoint is just a `GET` request. You can protect it like any other API endpoint. The client can send a session cookie or a JWT `Authorization: Bearer &lt;token&gt;` header. The server should validate this before starting the stream.
- **Transport Security**: Always serve SSE over HTTPS (`TLS`) to encrypt the data in transit, preventing man-in-the-middle attacks.
- **Cross-Origin Resource Sharing (CORS)**: If your client and server are on different domains, you&apos;ll need to configure the correct CORS headers on the server, including `Access-Control-Allow-Origin`.

## Benchmarking

I tried building a real time leaderboard that streams its state to consumers (broadcasting) through SSE. While the players in a game would need WebSockets to send their moves, a leaderboard that broadcasts updates to all spectators is a perfect, one-way communication scenario for SSE.

The testing setup does not mimic realistic traffic for now. This is intentional: I wanted to test how many connections could be made in a standalone manner before tackling realistic load.

The testing methodology is detailed in this [repo](https://github.com/pranshu-raj-211/benchmarks/leaderboard).

Find the code for the server at [leaderboard](https://github.com/pranshu-raj-211/leaderboard).

![Real time leaderboard benchmarking (15400 SSE connections)](@/assets/images/leaderboard/real_time_lb_init.png)
&gt;Reaches about 15,400 connections before new clients are unable to connect, due to either a queue filling up or memory issues. I did not know which at first, so I dug deeper to understand the bottleneck.

  _Update_: The 15400 connection limit was hit while testing on Windows. Since memory, CPU and other system metrics seemed to be fine, I dug deeper into the issue to figure out what exactly was the bottleneck.

  Specifically, I was getting this error:
  `conn 15400 failed: Get &quot;http://127.0.0.1:8080/stream&quot;: dial tcp 127.0.0.1:8080: bind: An operation on a socket could not be performed because the system lacked sufficient buffer space or because a queue was full.`

  A quick Google search revealed that this is a common problem on Windows systems, often seen in applications like Docker.

The error indicates ephemeral port exhaustion on the client machine running the benchmark. A single client machine can only initiate a certain number of outgoing connections (around 16k on Windows by default) before it runs out of available source ports. This was a limitation of my client, not the Go server itself.

This can be bypassed by forcibly increasing the limit, but I didn&apos;t want to risk messing with Windows settings, which have been a pain to fix in the past.

![Grafana dashboard-28231 conns](@/assets/images/leaderboard/fedora_28k_conns.png)
&gt; _Update_: Crossed 28k connections

  The 28k limit this time is probably due to a limit on the number of file descriptors (ulimit) or some other system constraint (port exhaustion, NAT table limits). I will check and update, but I need to optimize memory usage first (it grows too fast and does not get deallocated). This is a client-side issue, not a bottleneck on the server. To find the server&apos;s true scaling limits, I believe multiple distributed clients, each making a large number of connections, are needed.

  To do this, a standard way to test would be to host the server on a cloud instance (using Docker on a fixed-size VM for a standard environment) and have multiple client VMs send a large number of requests to it. Since we already have Prometheus and Grafana set up on this server, it&apos;ll be easy to monitor changes as the number of client connections grows.</content:encoded></item><item><title>Building a Python Package to Turn Unstructured Data into Financial Insights</title><link>https://blog.pranshu-raj.in/posts/building-concall-parser/</link><guid isPermaLink="true">https://blog.pranshu-raj.in/posts/building-concall-parser/</guid><description>How we built concall-parser to extract structured insights from messy earnings call transcripts, dealing with PDFs, text processing, and speaker identification challenges.</description><pubDate>Mon, 23 Jun 2025 08:50:45 GMT</pubDate><content:encoded>Earnings calls are packed with crucial information, but trying to extract it from their transcripts is often a painful manual process. These documents, usually PDFs, are a nightmare of inconsistent formatting. Every company seems to do things differently, layouts change without warning, and they&apos;re filled with noise and excessive text.

We ([Jay Shah](https://www.linkedin.com/in/jay-shah-4a4829209/) and I) needed a way to programmatically get structured data out of this mess – with the end goal of building a full pipeline that can deliver insights and reduce time spent to understand how a company is doing.

This resulted in a Python package, [`concall-parser`](https://pypi.org/project/concall-parser/); the code can be found [here](https://github.com/JS12540/concall-parser).

### The Problem: Unstructured PDFs Break Everything

The core challenge here is turning a visually-oriented PDF document into clean, logically structured text that code can understand. When you pull text out of these transcripts, you don&apos;t just get the content; you get artifacts that completely disrupt parsing:

- Headers and footers are stuck into the text stream.
- Words get broken up across lines or pages, or by random characters inserted during transcription (we actually saw `management` turn into `m\n anagement`).
- Spacing and paragraph breaks are inconsistent.

These aren&apos;t just minor annoyances; they break any simple pattern-matching logic you might use later. If your code expects a clean word or phrase, noisy text like `m anagement` will cause it to fail.

---

### The Pipeline

We designed `concall-parser` as a multi-step process to handle this progressively. Here&apos;s the basic flow:

![concall parser workflow](https://raw.githubusercontent.com/pranshu-raj-211/pranshu-raj-211.github.io/main/_posts/static/concall-parser-workflow.png)

1. **Load:** Get the PDF input.
2. **Extract Text:** Pull the raw text out page by page (used `pdfplumber` for this).
3. **Clean &amp; Segment:** Process the raw text to fix errors, identify who&apos;s speaking, and break the text into individual speaker turns.
4. **Categorize:** Group the speaker turns into sections like Management&apos;s opening remarks and the Q&amp;A.
5. **Output:** Provide the final structured data.

Steps 2 and 3 were where most of the technical headaches (and interesting solutions) lay.

---

### Tackling the Hard Parts

Building this meant confronting some specific problems head-on.

#### Problem 1: Getting Clean Text

The text coming straight out of the PDF is just not reliable. Besides headers/footers, those internal text errors like `m anagement` mean you can&apos;t trust the raw output for pattern matching.

- **Why does regex break here?** Regex is too fragile for this kind of unpredictable noise. You can&apos;t write rules for every way text might be broken or have junk inserted.
- **Why not LLMs for cleaning?** For this specific task – fixing character-level or word-level errors – LLMs felt like overkill and came with risks. You don&apos;t want an AI &quot;fixing&quot; text in a way that changes a financial number or removes a critical piece of jargon. They&apos;re also non-deterministic and add latency.

Our solution here was to accept the raw text and then apply specific, deterministic heuristics _after_ extraction to fix the most common issues we saw. For instance, I built logic to detect and reassemble fragmented words based on common patterns. It&apos;s not perfect, but it cleans up enough of the noise to make downstream steps possible. We&apos;re also exploring trying other PDF-to-text libraries and using page cropping to get cleaner input from the start.
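
A minimal sketch of the kind of deterministic heuristic I mean (the vocabulary set is a stand-in; the package&apos;s actual logic is more involved): if two adjacent tokens concatenate into a known word, merge them.

```python
import re

# Tiny known-word set standing in for a real vocabulary/dictionary.
VOCAB = {"management", "revenue", "quarter"}

def reassemble_fragments(text: str) -> str:
    """Rejoin words broken by stray whitespace, e.g. 'm anagement' into 'management'."""
    tokens = re.split(r"\s+", text.strip())
    out, i = [], 0
    while i != len(tokens):
        # If two adjacent tokens concatenate into a known word, merge them.
        if i + 1 != len(tokens) and (tokens[i] + tokens[i + 1]).lower() in VOCAB:
            out.append(tokens[i] + tokens[i + 1])
            i += 2
        else:
            out.append(tokens[i])
            i += 1
    return " ".join(out)
```

Because the rule is deterministic, it can never silently rewrite a number or a term the way an LLM &quot;fix&quot; might.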

#### Problem 2: Figuring Out Who Spoke

Identifying speakers and segmenting the text by speaker is critical. The pattern is usually `&lt;Name&gt;:&lt;Speech&gt;`, but it&apos;s rarely that simple: names might be missing from introductions, the &quot;Moderator&quot; tag isn&apos;t always used, analysts don&apos;t appear in a speaker list, and colons appear _everywhere_ in financial text (ratios, bullet points, etc.). A simple regex looking for `.*:` on every line fails miserably.

We needed a more robust way to find speakers that wasn&apos;t fooled by stray colons or missing intros. The solution was to develop a heuristic: scan the text for structural cues that _typically_ indicate a speaker turn – think capitalized text at the start of a line, followed by punctuation. This gives a list of _potential_ speaker names.

Then, I apply my own validation logic to this list. This involves checking potential names against various criteria (like capitalization consistency, length, frequency patterns) to filter out false positives and build a reliable list of the actual speakers in the call. This approach is flexible enough to work even when the formatting isn&apos;t perfect. Once we have that list of validated speakers, segmenting the text becomes straightforward – split the text block every time I see a validated speaker name followed by a colon.
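
The detect-then-validate idea can be sketched as follows - illustrative only; the real validation rules in the parser are more involved than the word-count and recurrence checks shown here:

```python
import re
from collections import Counter

# Structural cue: capitalized text at the start of a line, followed by a colon.
CANDIDATE_RE = re.compile(r"^([A-Z][A-Za-z. ]{1,40}):", re.MULTILINE)

def validated_speakers(text, min_turns=2):
    counts = Counter(m.strip() for m in CANDIDATE_RE.findall(text))
    # keep candidates that look like names and recur across the call;
    # stray colons in ratios or bullet points rarely pass both checks
    return {name for name, n in counts.items()
            if n >= min_turns and len(name.split()) in range(1, 5)}

def segment(text, speakers):
    turns = []
    for line in text.splitlines():
        m = CANDIDATE_RE.match(line)
        if m and m.group(1).strip() in speakers:
            turns.append([m.group(1).strip(), line[m.end():].strip()])
        elif turns:
            # continuation line: append to the current speaker turn
            turns[-1][1] = (turns[-1][1] + " " + line.strip()).strip()
    return [tuple(t) for t in turns]
```

Note how a line like &quot;The ratio stands at 2:1&quot; never becomes a speaker: the digit breaks the name pattern, and one-off matches fail the recurrence check anyway.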

#### Problem 3: What Kind of Content is This?

Finally, we needed to differentiate between Management&apos;s prepared remarks and the Analyst Q&amp;A.

If a Moderator is present (which is usually the case), this becomes a lot easier: moderators introduce the speakers and signal the start of the analyst session, and those cues can be used to separate the parts.

If there isn&apos;t a moderator, we fall back to heuristics and LLM-based logic to find the section boundary.
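
With a moderator present, the split can be as simple as scanning speaker turns for the cue phrases moderators use - a sketch; the cue list here is made up and the real parser&apos;s cues will differ:

```python
QA_CUES = ("question-and-answer", "first question", "q and a session")

def qa_start(turns):
    # Return the index of the moderator turn that opens the analyst
    # session, or None when no cue is found (the no-moderator fallback).
    for i, (speaker, said) in enumerate(turns):
        if speaker == "Moderator" and any(cue in said.lower() for cue in QA_CUES):
            return i
    return None
```

Everything before that index is management&apos;s prepared remarks; everything after belongs to the Q&amp;A.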

---

### The Output: Clean Data for Analysis

After all this processing, `concall-parser` gives you structured data: a list of identified management personnel, the text of management&apos;s comments, and the Q&amp;A pairs.

This structured output is the whole point. It&apos;s clean, predictable, and ready for the next step. You can easily feed it into other scripts or libraries for NLP tasks like sentiment analysis, run simple code to pull out key numbers or metrics, or compare sections across different reports. We&apos;ve managed to get the parser working reliably for a majority of companies in the Nifty 200 index, which shows the approach holds up against real-world variability.
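
For illustration, the structured output might look like this - a hypothetical shape with field names of my own choosing; check the library itself for the real API:

```python
# Hypothetical output shape: management list, remarks text, Q and A pairs.
parsed = {
    "management": ["Rahul Sharma", "Priya Mehta"],
    "opening_remarks": "Thank you. This quarter revenue grew 12 percent.",
    "qa": [
        {"analyst": "Arjun K.",
         "question": "What drove margin expansion?",
         "answer": "Primarily operating leverage and lower input costs."},
    ],
}

# Downstream use is plain dict access, e.g. feeding question text to NLP:
questions = [pair["question"] for pair in parsed["qa"]]
```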

---

### What&apos;s Next

This is still a work in progress. We&apos;re continuing to improve `concall-parser` by:

- Trying to get even cleaner text out of the PDF initially.
- Handling more of the edge case formats I still encounter.
- Thinking about building some modules on top of the parser for common tasks like automatically spotting metrics or basic sentiment.

Turning messy, unstructured documents into usable data is a challenging but rewarding problem to solve as an engineer.

We&apos;re actively developing concall-parser. Try it out and let us know your feedback!</content:encoded></item><item><title>Designing a minimal, local-first version of lichess.</title><link>https://blog.pranshu-raj.in/posts/designing-tinychess/</link><guid isPermaLink="true">https://blog.pranshu-raj.in/posts/designing-tinychess/</guid><description>How I&apos;m planning to build tinychess, and what I&apos;ve learnt so far while preparing to build it.</description><pubDate>Sat, 10 May 2025 15:22:00 GMT</pubDate><content:encoded>I&apos;m building a smaller, local version of [Lichess](https://lichess.org) to learn more about how it works and to improve my knowledge of real-time protocols and the local-first web.

I&apos;m learning Go as I build this project, and I&apos;m stoked about the delicate balance the language strikes between simplicity and low-level control.

## Functional Requirements

A wishlist of what users and the system should be able to do.

1. Anonymous game playing (like the &quot;play with a friend&quot; option on Lichess)
2. Real time gameplay (moves of opponent relayed to player in real time)
3. Game logic validation
4. Game state persistence (server side)
5. Client interface
6. View ongoing game (spectate)
7. Leaderboard (for tournaments)
8. Rate Limiting
9. Matchmaking
10. Scoring (Elo/Glicko/any other system)
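
Item 10 can be made concrete with the standard Elo update - a sketch in Python for brevity (the project itself is in Go), assuming a fixed K-factor of 32; real systems tune K or use Glicko:

```python
def expected_score(rating_a, rating_b):
    # probability that A beats B under the Elo model
    return 1 / (1 + 10 ** ((rating_b - rating_a) / 400))

def update_elo(rating_a, rating_b, score_a, k=32):
    # score_a is 1.0 for an A win, 0.5 for a draw, 0.0 for a loss
    ea = expected_score(rating_a, rating_b)
    new_a = rating_a + k * (score_a - ea)
    new_b = rating_b + k * ((1 - score_a) - (1 - ea))
    return new_a, new_b
```

For two equally rated players, a win transfers k/2 = 16 points; a draw changes nothing.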

## Non Functional Requirements

1. High concurrency support
2. Low latency move relaying
3. Extensible (plan to build an online version and scale later)

## Suggested Tech stack

- Go (server side logic)
- Websockets (real time communication)
- Javascript (client interface)

Since it&apos;s a local system, I&apos;m trying to keep it as lean as possible, so I&apos;ll use SQLite or even a flat file as the database.
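
To make the lean-persistence idea concrete, here&apos;s a sketch of game-state storage in SQLite (Python for brevity, since the server will be Go; table and column names are hypothetical):

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # a file path in the real setup
conn.execute("""
    CREATE TABLE games (
        id     INTEGER PRIMARY KEY,
        white  TEXT,
        black  TEXT,
        moves  TEXT DEFAULT '',      -- space-separated move list
        status TEXT DEFAULT 'ongoing'
    )
""")

def record_move(game_id, move):
    # append the move; full game state can be replayed from this list
    conn.execute(
        "UPDATE games SET moves = trim(moves || ' ' || ?) WHERE id = ?",
        (move, game_id),
    )

conn.execute("INSERT INTO games (id, white, black) VALUES (1, 'anon1', 'anon2')")
record_move(1, "e4")
record_move(1, "e5")
```

Storing the move list rather than board snapshots keeps writes tiny and makes spectating trivial: replay the moves on the client.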

![init data model]</content:encoded></item><item><title>A quick introduction to data modeling in real world applications</title><link>https://blog.pranshu-raj.in/posts/data-modeling/</link><guid isPermaLink="true">https://blog.pranshu-raj.in/posts/data-modeling/</guid><description>What data modeling is, why it&apos;s so useful, and how we can do it effectively to get the best results for our use case.</description><pubDate>Sun, 13 Apr 2025 15:22:00 GMT</pubDate><content:encoded>Data modeling is one of those obscure topics that everyone has an idea of but no one can really explain in detail. In the last few interviews I had, I was asked a lot of questions on this topic, which led me to dig into how it&apos;s done, what the key considerations are, and how the concept applies to different kinds of databases.

## So, what is it, really?

Data modeling can be defined as understanding what data is relevant to the use case, deciding how that data will be represented and worked with, and communicating this &apos;model&apos; of the system to others.

Basically, we create a simplified model of what the use case needs and of how data and system components will interact.

As we have real world entities, we want to map them to some concept in our database. This can be a person, a book, a ship, or anything else we want to represent.

Both tangible (people, cars, houses) and intangible (loans, accounts) entities are considered, as per needs of the system. These are generally represented as tables in Relational databases like Postgres, and collections in Document databases like MongoDB.

(The original Relational model uses entity sets to refer to what we are calling entities.)

Specific examples of these entities are called entity instances - a particular person, bank account, or book.

Attributes are the properties that describe an entity - for a book: name, author, ISBN, price.

Relationships define connections between entities. For example books are connected to authors and genres.
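
To ground these terms, here&apos;s a toy relational schema (illustrative names): `books` is the entity, its columns are the attributes, and `author_id` encodes the relationship to `authors`.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE authors (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE books (                         -- entity: book
        isbn      TEXT PRIMARY KEY,              -- attributes: isbn, title, price
        title     TEXT,
        price     REAL,
        author_id INTEGER REFERENCES authors(id) -- relationship: book to author
    );
    INSERT INTO authors VALUES (1, 'George Orwell');
    INSERT INTO books VALUES ('978-0451524935', '1984', 9.99, 1);
""")

# A row of books is one entity instance; the join follows the relationship.
row = conn.execute("""
    SELECT b.title, a.name FROM books b JOIN authors a ON a.id = b.author_id
""").fetchone()
```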

## The data modeling process

As with all software processes, we should start by talking about some key considerations to keep in mind when designing a data model.

1. **Understand application requirements and workload**
   What kinds of data entities are required, what relationships exist between them, and how the data will be accessed (access patterns - queries). This is the most important part of the data modeling process.

   More nuanced considerations here include estimating data sizes, listing all potential operations ranked by importance, and estimating the number of queries running per day.

2. **Mapping entities and relationships**
   In this phase we build a basic schema that defines entities under consideration by our system and the relationships between them. The kind of relationship between entities is also important.

   This is usually done through a diagrammatic representation, using ER diagrams for relational databases and collection relationship diagrams for MongoDB.

3. **Apply relevant design patterns**
   This requires some domain expertise, as you&apos;ll need to find suitable design patterns based on your use case (access patterns defined in the first step), map them out to your specific data model and refine it based on the design pattern being used.

## Interesting things in modeling data for MongoDB

Mongo is not a relational database, so you don&apos;t have the concept of foreign keys there. But we still need to define relationships between different kinds of entities, which is where linking and embedding are useful.

Linking is where we place some unique field of one entity in another to connect them. This is usually done by referencing the id (or a list of ids) of documents from one collection in another collection. It looks somewhat like this:
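
A sketch with plain Python dicts standing in for MongoDB documents (the collections and fields are hypothetical):

```python
# The book links to its author via an id field.
author = {"_id": "a1", "name": "George Orwell"}
book = {"_id": "b1", "title": "1984", "author_id": "a1"}  # the link

# Resolving the link needs a second lookup (an application-side join):
authors_by_id = {doc["_id"]: doc for doc in [author]}
linked_author = authors_by_id[book["author_id"]]
```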

Embedding is when you put the whole related object (or objects) inside the entity it has a relationship with, so a document of one entity can fully contain one or more documents of another entity type. This is pretty useful when the related entities are needed along with the parent, which saves a query at read time.
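
The same hypothetical book, but with the author embedded - one read returns everything:

```python
book = {
    "_id": "b1",
    "title": "1984",
    "author": {"name": "George Orwell", "country": "UK"},  # embedded document
}
author_name = book["author"]["name"]  # no second query needed
```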

Decisions on whether to link or embed are made by understanding how the data will be accessed. You&apos;ll need to consider whether data is queried using embedded info, how often the embedded info will really be used, and how frequently the embedded data is updated.

A hybrid of link and embed can be used, depending on the use case where some entities need to be accessed together frequently but some are rarely needed while querying the parent entity.

Link and embed are somewhat parallel to normalization and denormalization concepts in relational databases.</content:encoded></item></channel></rss>