Since its inception, Saavn has used REST architecture for its suite of core products, and like most apps, the underlying protocol has been HTTP. This has performed well for us, but we're always looking for ways to improve. With access to ever-evolving technology and a growing user base, we're committed to finding new ways to provide an even better user experience. Search, in particular, caught our attention.
With users across a variety of networks and devices, we needed search to work seamlessly for all of them. Creating a new HTTP connection for every search request increased latencies considerably, negatively affecting our click-through rates. Apart from the latency issue, a single-layered REST architecture serving all APIs made it very difficult to release and scale search-specific features independently.
To solve these issues, we decided to build the search API as a separate microservice. This would help us scale, manage, and maintain its features independently.
To make the search infrastructure more scalable, we started looking into asynchronous messaging queues. After exploring RabbitMQ, ZeroMQ, Redis, and a couple of others, we decided on ZeroMQ. It consumes far fewer resources than the alternatives and has solid PHP support, which was a requirement since most of our backend code is written in PHP.
We also switched from HTTP to the WebSocket protocol, which removed the need to create a new server connection on every request, thereby reducing overall latency.
To set up the WebSocket server, we looked into Nginx and Ratchet (based on ReactPHP). The main issue with Nginx was that, apart from setting up the server and ZeroMQ, we also needed a PHP-FPM setup; we ended up making an internal curl call, which added maintenance overhead that far outweighed the latency benefits. Ratchet, by contrast, was easily compatible with the PHP binding for ZeroMQ, and ReactPHP provided all the robustness and scalability we required. So we decided to implement our WebSocket server with Ratchet.
Before looking into the exact architecture, let's take a quick look at the core technologies we are using.
Ratchet Server request execution
Ratchet is a PHP library that provides support for both the HTTP and WebSocket protocols. The diagram above shows how Ratchet upgrades an HTTP connection to a WebSocket connection. Even though Ratchet is a single-threaded server, it's still able to handle multiple connections and requests in parallel thanks to its event-driven architecture.
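Ratchet itself is PHP, but the single-threaded, event-driven idea is easy to illustrate. Here is a minimal sketch using Python's asyncio (the handler names and timings are ours, purely illustrative): one thread runs one event loop, and while any handler is waiting on I/O, the loop services the other connections.

```python
import asyncio

async def handle_search(conn_id: int, query: str) -> str:
    # Simulate non-blocking I/O (index lookup, network hop).
    # While this handler awaits, the loop serves other connections.
    await asyncio.sleep(0.01)
    return f"conn {conn_id}: results for '{query}'"

async def main() -> list[str]:
    # One thread, one event loop, many concurrent "connections".
    queries = [(i, f"song {i}") for i in range(5)]
    return await asyncio.gather(*(handle_search(i, q) for i, q in queries))

if __name__ == "__main__":
    for line in asyncio.run(main()):
        print(line)
```

All five handlers complete in roughly one 10 ms tick rather than five, because none of them blocks the thread while waiting.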
In simple terms: when you type something into the search bar on Saavn, it is treated as an event on the server, and the reaction is a list of the most relevant songs (which you can also download if you are a Pro user), as well as albums and playlists.
ZeroMQ workflow
ZeroMQ (also spelled ØMQ, 0MQ or ZMQ) is a high-performance asynchronous messaging library, aimed at use in distributed or concurrent applications. It provides a message queue, but unlike message-oriented middleware, a ZeroMQ system can run without a dedicated message broker.
Why ZMQ? Firstly, it's really easy to integrate ZMQ with the Ratchet server. Apart from that, it can easily create multiple parallel connections over multiple transport protocols such as TCP, IPC, or inproc. The fact that ZMQ is essentially a socket library gives us far more flexibility and control than other asynchronous message-queuing systems. Also, the cost of processing a single message is so low that handling very high message volumes (millions of messages per second) is not an issue.
The above diagram shows the complete flow of a user request. We have multiple WebSocket servers, load-balanced through an AWS ALB. Requests are tunneled through the ALB to the socket servers. Let's have a closer look at the different steps involved.
The client initiates the connection by sending an HTTP Upgrade request to the socket server. After accepting the request, the server establishes the connection with TCP as the underlying transport protocol.
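The Upgrade handshake is defined by RFC 6455: the server proves it understood the WebSocket request by hashing the client's `Sec-WebSocket-Key` together with a fixed GUID. Ratchet does this internally; a quick sketch of the computation (in Python for brevity):

```python
import base64
import hashlib

# Fixed GUID defined in RFC 6455, section 1.3.
WS_GUID = "258EAFA5-E914-47DA-95CA-C5AB0DC85B11"

def websocket_accept(sec_websocket_key: str) -> str:
    """Compute the Sec-WebSocket-Accept value the server must return."""
    digest = hashlib.sha1((sec_websocket_key + WS_GUID).encode("ascii")).digest()
    return base64.b64encode(digest).decode("ascii")

# Test vector from RFC 6455:
print(websocket_accept("dGhlIHNhbXBsZSBub25jZQ=="))
# → s3pPLMBiTxaQ9kYGzzhZRbK+xOo=
```

If the value echoed back in the `101 Switching Protocols` response doesn't match, the client must fail the handshake.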
Unlike the traditional synchronous request-response cycle of REST, messages over WebSocket are sent and received asynchronously over the established TCP connection. We attach a timestamp to our messages, which lets the client discard out-of-order responses from the server.
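The post doesn't spell out the exact discard scheme, so here is a minimal client-side sketch of one way it can work, assuming each response echoes back the timestamp of the request it answers (all names here are hypothetical):

```python
class SearchClient:
    """Keep only the response to the most recent query, using timestamps."""

    def __init__(self):
        self.latest_sent = 0.0   # timestamp of the newest request sent
        self.results = None

    def send(self, query: str, now: float) -> dict:
        self.latest_sent = now
        return {"q": query, "ts": now}   # message pushed over the socket

    def receive(self, response: dict) -> bool:
        # Drop any response older than the newest request we issued.
        if response["ts"] < self.latest_sent:
            return False
        self.results = response["hits"]
        return True

client = SearchClient()
m1 = client.send("ed", now=1.0)
m2 = client.send("ed sheeran", now=2.0)
# The response to the older query arrives late and is dropped:
assert client.receive({"ts": m1["ts"], "hits": ["stale"]}) is False
assert client.receive({"ts": m2["ts"], "hits": ["Shape of You"]}) is True
assert client.results == ["Shape of You"]
```

Without this guard, a slow response for "ed" could overwrite the fresher results for "ed sheeran" as the user types.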
The socket server pushes the request to a worker process over ZMQ. The worker processes the request and pushes the response back to the server over ZMQ. The push-pull design pattern of this flow is shown in the diagram above. At any given time, we have multiple workers running to serve incoming requests from the socket server. Each worker is a long-running PHP process (managed by Supervisor).
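ZeroMQ's PUSH/PULL sockets fan requests out to whichever worker is free and collect results as they finish, in any order. The same pattern can be mimicked with stdlib queues and threads (illustrative only — in production these are ZMQ sockets and long-running PHP worker processes):

```python
import queue
import threading

requests = queue.Queue()    # server PUSH -> worker PULL
responses = queue.Queue()   # worker PUSH -> server PULL

def worker():
    # Long-running worker; here, a thread pulling from the shared queue.
    while True:
        req = requests.get()
        if req is None:          # shutdown sentinel
            break
        responses.put((req["id"], f"results for '{req['q']}'"))

NUM_WORKERS = 3
threads = [threading.Thread(target=worker) for _ in range(NUM_WORKERS)]
for t in threads:
    t.start()

# The socket server pushes incoming search requests...
for i, q in enumerate(["arijit", "coldplay", "a r rahman", "badshah"]):
    requests.put({"id": i, "q": q})

# ...and collects responses as workers finish them (any order).
collected = dict(responses.get() for _ in range(4))

for _ in threads:
    requests.put(None)
for t in threads:
    t.join()

print(collected[2])   # → results for 'a r rahman'
```

The request id plays the same role as the timestamp on the client side: it lets the server match each response back to the connection that asked for it, regardless of completion order.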
Scaling — Too many queries, what to do?
Over time, search volume has grown from around 1 million queries per day to 5 million. The flexibility of the server-worker architecture has helped us scale our search infrastructure accordingly.
How do we handle increasing requests?
We looked into increasing both the number of socket servers (increasing the number of EC2 instances) and the number of workers on a single machine (using a bigger, more powerful EC2 instance).
Scaling server processes
Scaling the number of servers helped handle the increasing number of incoming connections, while increasing the number of workers led to higher throughput (more requests served per second).
Scaling worker processes
Apart from scaling, we also tuned some system-level TCP settings to further improve how ZMQ performs. The default values for the file descriptor limit, tcp_fin_timeout, tcp_tw_reuse, and tcp_keepalive_time aren't optimized for high throughput when working with ZMQ. These were tuned to appropriate values based on multiple load tests.
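For reference, such tuning typically lives in a sysctl configuration file. The values below are illustrative placeholders only — the post doesn't publish the actual numbers, which came out of load tests and depend on the workload:

```shell
# /etc/sysctl.d/99-websocket.conf — illustrative values, not our tuned ones.
fs.file-max = 1000000              # raise the system-wide file-descriptor cap
net.ipv4.tcp_fin_timeout = 30      # reclaim FIN_WAIT_2 sockets faster (default 60)
net.ipv4.tcp_tw_reuse = 1          # reuse TIME_WAIT sockets for new connections
net.ipv4.tcp_keepalive_time = 600  # probe idle connections sooner (default 7200)
# Apply with: sudo sysctl --system
```

With thousands of long-lived WebSocket connections per machine, the file-descriptor cap and TIME_WAIT recycling are usually the first limits hit.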
Through rigorous load testing, we were able to define the limits of our infrastructure, which made production deployment easier. Currently our infrastructure consists of 5 EC2 instances, each hosting 2 socket servers and 150 workers. This easily handles an overall request rate of 60 req/sec, with 18,000 active connections per minute and 5,000 new connections per minute, peaking at 10,000 new connections per minute.
Deploying — Building the monitoring system
Shipping a new piece of infrastructure into production takes much more than just load testing and deploying individual components. We built a real-time monitoring system to alert us to any unseen issues and help us fix problems proactively.
We run all our server and worker processes under Supervisor to effectively manage them against failures.
Supervisor is a client/server system that allows its users to monitor and control a number of processes on UNIX-like operating systems.
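A Supervisor program section for a worker pool looks roughly like this — the file path and program name below are hypothetical, sketching how a pool of long-running PHP workers can be supervised:

```ini
; /etc/supervisor/conf.d/search-workers.conf — hypothetical names and paths.
[program:search-worker]
command=php /var/www/search/worker.php
process_name=%(program_name)s_%(process_num)02d
numprocs=150                 ; one entry per worker process on the machine
autostart=true
autorestart=true             ; restart crashed workers automatically
stopasgroup=true
```

With `autorestart=true`, a worker that dies mid-request is replaced immediately, so the ZMQ queue keeps draining without manual intervention.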
To further monitor server and worker performance, we process logs in real time (using a Storm topology). Alarms were configured to notify us of any anomalies in the data, such as increased latencies, and to send a series of mails and notifications to the ops group.
We also configured alerts for key metrics on the AWS console, which gave us pretty much everything we needed to know about the machines. Machine reachability, throughput, and CPU utilization were top priorities.
Effective monitoring helped us further tune our system for sudden spikes in traffic and increased the overall fault tolerance.
Moving to the new search microservice not only helped reduce overall latencies but also enabled modular deployment and management of the search infrastructure. This led to an increase of around 1% in search CTRs, and overall average response time dropped from 600 ms to 300 ms, even on low-bandwidth networks.
HTTP vs. WebSocket latency
This new system has been in production for quite some time now and has been working great for us. The learnings from building the search microservice have helped us move more services to this architecture in a safe and reliable manner.
Illustrations by Fahm Sikder