Designing a backend to store and process 100k images daily

Why?

The company for which this software project was undertaken provided 24/7 CCTV health and security monitoring services to around 500 locations across India, each with 8–10 cameras. Health monitoring for a CCTV system means checking whether its cameras are in working condition. This can be done in two ways: have someone watch every feed around the clock (24/7 * 500 locations * 10 cameras), or capture an image of what each camera is currently viewing every N minutes.

We obviously chose the second option.

Infrastructure

  • All of these locations had DVRs (Digital Video Recorders) of multiple brands installed.

Features Required

  • Store an image from every CCTV camera at 15-minute intervals.

How to get images from a DVR?

For DVRs reachable over a VPN or a public IP, a simple HTTP GET request returns the current image for the specified camera. A DVR behind a NAT cannot be reached this way, but it can be configured to send images via FTP.

System Design

VPN or Public IP

Let us call every DVR a device and every camera a channel, which means every device has multiple channels.

The IP address, port number, username, password, and dvr_brand of each device were available in a MySQL database.

This means we now have the details needed to fetch an image; all we need is a storage backend API to store and process the images.
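To make the device and channel terminology concrete, here is a minimal sketch of how one snapshot could be fetched over HTTP. The `Device` fields mirror the columns mentioned above. The brand names and snapshot URL templates are hypothetical, since every DVR brand exposes its own path, and HTTP basic auth is an assumption as well.

```python
import base64
import urllib.request
from dataclasses import dataclass

# Hypothetical URL templates; real paths differ per DVR brand and model.
SNAPSHOT_TEMPLATES = {
    "brand_a": "http://{ip}:{port}/snapshot?channel={channel}",
    "brand_b": "http://{ip}:{port}/cgi-bin/snap.cgi?ch={channel}",
}


@dataclass
class Device:
    """One DVR, with the connection details kept in the MySQL database."""
    device_id: str
    ip: str
    port: int
    username: str
    password: str
    dvr_brand: str
    channels: int  # number of cameras attached to this DVR


def snapshot_url(device: Device, channel: int) -> str:
    """Build the snapshot URL for one channel of a device."""
    template = SNAPSHOT_TEMPLATES[device.dvr_brand]
    return template.format(ip=device.ip, port=device.port, channel=channel)


def fetch_snapshot(device: Device, channel: int, timeout: float = 10.0) -> bytes:
    """HTTP GET the current frame of a channel (assumes HTTP basic auth)."""
    credentials = f"{device.username}:{device.password}".encode()
    token = base64.b64encode(credentials).decode()
    request = urllib.request.Request(
        snapshot_url(device, channel),
        headers={"Authorization": f"Basic {token}"},
    )
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return response.read()
```

A poller would then loop over `range(1, device.channels + 1)` for every device and hand the returned bytes to the storage API.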

FTP

A custom FTP server was developed using the Python library pyftpdlib; it pushes each image to the Image API as soon as the image is received.

FTP usernames were set to the device_id so as to identify which device sent an image.
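The FTP flow can be sketched roughly as follows. This is not the original server, only a minimal example of pyftpdlib's `on_file_received` callback. The `upload_event` helper, the `on_image` callback (standing in for the call to the Image API), the port, and the upload-only permissions are all assumptions made for illustration.

```python
import os
from typing import Callable, Dict


def upload_event(username: str, file_path: str) -> dict:
    """Describe a finished upload; the FTP username doubles as the device_id."""
    return {
        "device_id": username,
        "file_path": file_path,
        "file_name": os.path.basename(file_path),
    }


def build_ftp_server(
    users: Dict[str, str],             # device_id -> FTP password
    upload_dir: str,                   # must be an existing directory
    on_image: Callable[[dict], None],  # hypothetical hook that calls the Image API
    host: str = "0.0.0.0",
    port: int = 2121,
):
    """Create a pyftpdlib server that reports every completed upload."""
    # Imported lazily so upload_event stays usable without pyftpdlib installed.
    from pyftpdlib.authorizers import DummyAuthorizer
    from pyftpdlib.handlers import FTPHandler
    from pyftpdlib.servers import FTPServer

    authorizer = DummyAuthorizer()
    for device_id, password in users.items():
        # "elw": change directory, list, and store files (upload only).
        authorizer.add_user(device_id, password, upload_dir, perm="elw")

    class ImageHandler(FTPHandler):
        def on_file_received(self, file):
            # self.username is the login name, i.e. the device_id.
            on_image(upload_event(self.username, file))

    ImageHandler.authorizer = authorizer
    return FTPServer((host, port), ImageHandler)
```

Calling `build_ftp_server({"dev-42": "secret"}, "/srv/ftp", print).serve_forever()` would start the server and print one event per uploaded image.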

Image Storage

Minio Object Storage was used to store images because of its ease of use and features such as easy bucket replication, object expiry, pre-signed URLs, and AWS S3 compatibility.
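As a rough illustration of the storage step, the sketch below uploads one image with the Minio Python client and returns a pre-signed URL. The object key layout and the one-hour expiry are assumptions, not the layout used in the project.

```python
import io
from datetime import datetime, timedelta


def object_name(device_id: str, channel_id: int, capture_time: datetime) -> str:
    """Hypothetical key layout: device/channel/date/time.jpg."""
    return (
        f"{device_id}/{channel_id}/"
        f"{capture_time:%Y-%m-%d}/{capture_time:%H%M%S}.jpg"
    )


def store_image(client, bucket: str, name: str, image: bytes) -> str:
    """Upload one JPEG and return a pre-signed URL valid for one hour.

    `client` is expected to be a `minio.Minio` instance.
    """
    client.put_object(
        bucket_name=bucket,
        object_name=name,
        data=io.BytesIO(image),
        length=len(image),
        content_type="image/jpeg",
    )
    return client.presigned_get_object(
        bucket_name=bucket,
        object_name=name,
        expires=timedelta(hours=1),
    )
```

Keys that start with the device and channel keep all frames of one camera under a common prefix, which makes listing them for a time-lapse straightforward.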

Metadata Storage

Every captured image had metadata, or tags, associated with it that are essential for searching and filtering, for example site_id, device_id, channel_id, device_name, channel_name, capture_time, insert_time, etc. Since this metadata was variable in nature, and the number of fields was not fixed and could grow over time, MongoDB was used to store it.

Each image was first stored in Minio, and the corresponding bucket name and object name were then stored as tags in MongoDB, along with an image access URL that returns the stored image when called, e.g. https://example.com/api/getImage/<bucket_name>/<object_name>
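A metadata document for one image could look roughly like this. The field names follow the tags listed above; the `image_url` key name and the helper functions are assumptions made for the sketch.

```python
from datetime import datetime, timezone

# Base URL taken from the example in the text.
IMAGE_API_BASE = "https://example.com/api/getImage"


def build_metadata(bucket_name, object_name, capture_time, **tags):
    """Assemble the MongoDB document for one stored image."""
    document = {
        "bucket_name": bucket_name,
        "object_name": object_name,
        "capture_time": capture_time,
        "insert_time": datetime.now(timezone.utc),
        "image_url": f"{IMAGE_API_BASE}/{bucket_name}/{object_name}",
    }
    # Extra tags (site_id, device_id, channel_name, ...) need no schema change.
    document.update(tags)
    return document


def save_metadata(collection, document):
    """Insert the document; `collection` is expected to be a pymongo Collection."""
    return collection.insert_one(document).inserted_id
```

Searching then becomes an ordinary query, for example `collection.find({"device_id": "dev-42", "capture_time": {"$gte": start}})`.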

Timelapse

A separate API server was also developed to generate time-lapses in response to API calls, using FFmpeg in the background. A scheduled job script calls this HTTP API at specified intervals to create the time-lapses.
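The FFmpeg step might be wrapped along these lines. The exact options used in the project are not stated, so the frame rate, codec, and glob-based input below are assumptions; the frames also need file names that sort in capture order.

```python
import subprocess
from typing import List


def timelapse_command(frames_glob: str, output_path: str, fps: int = 10) -> List[str]:
    """Build an FFmpeg command that stitches JPEG frames into an MP4."""
    return [
        "ffmpeg",
        "-y",                       # overwrite the output file if it exists
        "-framerate", str(fps),     # input frames shown per second
        "-pattern_type", "glob",    # let FFmpeg expand the wildcard itself
        "-i", frames_glob,
        "-c:v", "libx264",
        "-pix_fmt", "yuv420p",      # pixel format most players can decode
        output_path,
    ]


def make_timelapse(frames_glob: str, output_path: str, fps: int = 10) -> None:
    """Run FFmpeg and raise CalledProcessError if it fails."""
    subprocess.run(timelapse_command(frames_glob, output_path, fps), check=True)
```

With 96 frames per camera per day (one every 15 minutes), `fps=10` would turn a day into a clip of just under ten seconds.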

Image Analytics

On receipt of every image, a message is published to the Kafka message queue to trigger the image analytics jobs for that image. Once image analytics is done, alerts (if any) are pushed to the Kafka queue, and the metadata in MongoDB is updated with the new AI tags generated by the AI engine.
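The publish side of this flow could be sketched as below, assuming the kafka-python client. The topic name and the message fields are hypothetical; the idea is only that the message carries a pointer to the image in Minio rather than the image bytes themselves.

```python
import json
from datetime import datetime

# Hypothetical topic name; the original name is not given.
IMAGES_TOPIC = "images.received"


def image_received_message(bucket_name: str, object_name: str,
                           device_id: str, capture_time: datetime) -> bytes:
    """Serialize the event that tells the analytics workers where the image is."""
    payload = {
        "bucket_name": bucket_name,
        "object_name": object_name,
        "device_id": device_id,
        "capture_time": capture_time.isoformat(),
    }
    return json.dumps(payload, sort_keys=True).encode("utf-8")


def publish_image_received(producer, message: bytes) -> None:
    """Send the event; `producer` is expected to be a kafka-python KafkaProducer."""
    producer.send(IMAGES_TOPIC, value=message)
    producer.flush()  # block until the message has actually been sent
```

A producer for this could be created with `KafkaProducer(bootstrap_servers="localhost:9092")`, where the broker address is again only a placeholder.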

Architecture Diagram

image-storage-architecture-design

In case you are curious, the architecture diagram above was created with draw.io.

This architecture is completely deployed on Docker. The base filesystem used for MongoDB, Kafka, and Minio is ZFS.

If you have any questions, please post them in the comments section or email me at zanwarnihar@gmail.com

Since this is my first major system design project, a few of the choices I have made can be improved, and your suggestions are always welcome.

I am a final-year undergraduate at BITS Pilani, India, and an OSS and system design enthusiast.