Jos Bredek is a senior computer science lecturer researching 5G applications at Hanze University of Applied Sciences.
At the 5Groningen field lab, the next generation of wireless technology is being put to the test in an experiment with a prototype involving real-time decision-making using 5G edge computing. One of the applications envisioned is a smart police vest that can fully automatically detect threats like firearms and stabbing weapons.
The fifth-generation (5G) mobile networks unleash a wide range of new possibilities and opportunities. They include a feature to hook up vast amounts of low-power IoT devices and over 1 Gb/s of bandwidth – ten to a hundred times that of 4G. Last but not least, they provide super-low-latency capabilities.
The 5G feature set is made up of three core components. Massive machine-type communication is intended for low-bandwidth usage, typically in the 700 MHz range. Smart sleep and other energy-saving modes in the communication between the device and the 5G antenna minimize power consumption. Applications include lightweight sensors that perform measurements every hour or every few hours and send this data via 5G using NB-IoT (narrowband IoT), which is effectively already in place within 4G LTE. A standard 3000 mAh battery is sufficient to power them for more than ten years without charging.
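As a rough sanity check on the ten-year figure, the battery capacity can be turned into an average current budget. The sketch below is illustrative only: it assumes an ideal battery without self-discharge, and the function name is made up for this example.

```python
HOURS_PER_YEAR = 365 * 24  # leap days ignored


def average_current_budget_ua(capacity_mah: float, lifetime_years: float) -> float:
    """Average current (in microamperes) that empties the battery in exactly lifetime_years."""
    lifetime_hours = lifetime_years * HOURS_PER_YEAR
    return capacity_mah / lifetime_hours * 1000.0  # mA -> uA


# Assumption: ideal 3000 mAh cell, no self-discharge, ten years of operation.
budget = average_current_budget_ua(3000, 10)
print(f"average current budget: {budget:.1f} uA")
```

Under these assumptions, the sensor may draw about 34 µA on average, which is why long sleep periods between measurements matter so much.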
Enhanced mobile broadband promises a tenfold (or even more) increase in capacity. Whereas 4G download speeds top out at about 100-200 Mb/s, 5G enables a bandwidth increase to 1 Gb/s and beyond. Typical applications are high-resolution (4K or even 8K) video streaming services. The upload bandwidth will also increase tenfold, from tens of Mb/s to several hundred. Frequency bands are in the 3.5 GHz range.
For ultra-reliable and low-latency communication, the ambitions are on the order of 99.999 percent (or more) availability and 1 ms delay (in the 5G radio part/control plane). High availability can be offered by increasing the number of antennas (and overlapping cells). For low delays, the options are limited. Electromagnetic signals always travel at (near) the speed of light. However, reducing the distance and the number of network devices the signal has to pass can reduce latency. By computing close to the 5G antenna itself, for example, the data doesn’t have to cross a provider network (and part of the internet). This is called edge computing.
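To get a feel for the distance argument, the propagation delay alone can be estimated. The sketch below is a simplification under stated assumptions: signals in optical fiber travel at roughly two-thirds of the speed of light in vacuum, the two distances are hypothetical, and the processing and queuing delay added by every network device on the path isn't modeled at all.

```python
SPEED_OF_LIGHT_KM_S = 299_792.458  # in vacuum


def round_trip_propagation_ms(distance_km: float, velocity_factor: float = 2 / 3) -> float:
    """Round-trip propagation delay in ms for a one-way distance of distance_km."""
    speed_km_s = SPEED_OF_LIGHT_KM_S * velocity_factor
    return 2 * distance_km / speed_km_s * 1000.0


# Hypothetical distances, chosen only to contrast edge and remote computing.
for label, km in [("edge node near the antenna", 1), ("remote data center", 200)]:
    print(f"{label}: {round_trip_propagation_ms(km):.3f} ms")
```

With these numbers, the remote case costs about 2 ms in propagation alone, against roughly 0.01 ms for the edge node. Every router and switch in between comes on top of that.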
In November 2020, the second 5G Student Battle was held at the 5Groningen field lab. Mixed groups of students had to come up with innovative ideas using (one of) the 5G functions. The winning idea involved enhancing the standard safety vest of a police officer with the ability to detect, fully automatically, threats like firearms and stabbing weapons. Since there’s no room for (large) additional battery packs and a powerful computing unit, interpreting video data, detecting objects using AI and sending a signal to the vest itself must all be done remotely.
About two months after the idea was born, two students started working on the implementation. Realizing the smart vest is representative of a host of situations that require real-time decision-making, they chose to generalize the problem statement: is it possible to make real-time decisions using a high-bandwidth sensor (such as a camera), 5G (at 3.5 GHz) and edge computing? For the vest, real-time means sub-second. For other scenarios, like self-flying drones with object recognition, the round-trip delay has to be below 200 ms. To be able to draw more general conclusions, measuring the delay throughout the component pipeline is crucial.
The pipeline set up by the students consisted of a Raspberry Pi connected to a camera unit, a 5G modem, a 5G antenna and a compute cluster, with additional software on both sides. The Pi, version 4B, was running Raspberry Pi OS, the de facto operating system for this device. GStreamer was used to read the camera’s raw H.264 stream frame by frame. The frames were timestamped by a GStreamer plugin, encoded using the Pi’s H.264 hardware accelerator and sent to the 5G network (using RTP, UDP and IP). A simple UDP service running on the Pi generated an audio signal every time it received an alert.
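The idea of letting a timestamp travel with each frame can be sketched in a few lines of Python. This isn't the GStreamer plugin the students used. The 8-byte header layout and the function names are assumptions made for illustration, and the computed delay is only meaningful if sender and receiver clocks are synchronized, for example via NTP.

```python
import struct
import time

HEADER = struct.Struct("!Q")  # one 8-byte unsigned integer, network byte order


def stamp(payload: bytes, now_ns=None) -> bytes:
    """Prefix payload with the sender's wall-clock time in nanoseconds."""
    if now_ns is None:
        now_ns = time.time_ns()
    return HEADER.pack(now_ns) + payload


def unstamp(packet: bytes, now_ns=None):
    """Return (payload, one_way_delay_ms), trusting both clocks to be in sync."""
    if now_ns is None:
        now_ns = time.time_ns()
    (sent_ns,) = HEADER.unpack_from(packet)
    delay_ms = (now_ns - sent_ns) / 1e6
    return packet[HEADER.size:], delay_ms
```

Any clock offset between the two hosts shows up directly in the result, so the measurement is never more precise than the time synchronization.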
On the compute cluster, running an Ubuntu server on the ESXi hypervisor, the stream was again fed into GStreamer. Taking into account the delay information derived from the binary timestamps and the precise NTP timekeeping on both sides, the frames were decoded. Through a Python wrapper, the resulting stream was fed into the Yolo v4 object detection model (tiny edition). Whenever the Yolo engine detected a relevant object (beyond a certainty threshold), the wrapper fired off a message back to the Pi (using a UDP packet).
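The decision step in such a wrapper could look roughly like the sketch below. It doesn't call Yolo itself. The detection format (a list of label/confidence pairs), the JSON message layout and the function names are all assumptions for illustration, not the students' actual code.

```python
import json
import socket


def pick_alert(detections, watched_labels, threshold):
    """Return the most confident watched detection at or above threshold, else None."""
    hits = [d for d in detections if d[0] in watched_labels and d[1] >= threshold]
    return max(hits, key=lambda d: d[1]) if hits else None


def send_alert(sock, address, label, confidence):
    """Send one small JSON datagram describing the detection."""
    message = json.dumps({"label": label, "confidence": confidence}).encode("utf-8")
    sock.sendto(message, address)


# Hypothetical detector output for a single frame.
frame_detections = [("person", 0.98), ("knife", 0.55), ("knife", 0.91)]
hit = pick_alert(frame_detections, {"knife"}, threshold=0.8)
print(hit)
```

In a real loop, a hit would be followed by `send_alert` on a UDP socket pointing at the Pi. UDP keeps the return path short but offers no delivery guarantee, so repeating the alert for consecutive frames is one way to cope with a lost packet.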
This pipeline is heavily intertwined. Changes made within one component may influence several other components and thereby the performance of the whole. For example, choosing a different video resolution at the capturing side automatically affects the codec and the CPU usage on both ends as well as the bandwidth consumption. It can also have an impact on the precision of object detection.
The video stream was captured from a standard Pi camera, set at 720p/1080p and 30 fps, and hardware-encoded by the Pi in H.264 format. Successor H.265 has about 50 percent better compression ratios, reducing the required bandwidth, but this encoding wasn’t yet available in hardware on the Pi, which is why H.264 was used during the tests. The expected bandwidth of about 2 Mb/s turned out to be about 1 Mb/s (probably related to the entropy of the input images). The capture latency of the standard Pi camera was about 35 ms (at 25 fps).
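To put the measured bitrate in perspective, it helps to estimate what the same stream would cost uncompressed. The sketch assumes 720p at 30 fps and 12 bits per pixel (8-bit YUV 4:2:0). The camera's actual raw format may differ, so the outcome is an order-of-magnitude figure at best.

```python
def raw_bitrate_mbps(width: int, height: int, fps: float, bits_per_pixel: int = 12) -> float:
    """Uncompressed video bitrate in Mb/s (default: 8-bit YUV 4:2:0)."""
    return width * height * bits_per_pixel * fps / 1e6


OBSERVED_MBPS = 1.0  # approximate H.264 bitrate observed in the tests

raw = raw_bitrate_mbps(1280, 720, 30)
print(f"raw 720p30: {raw:.0f} Mb/s, about {raw / OBSERVED_MBPS:.0f} times the encoded stream")
```

Under these assumptions, the raw stream would need more than 300 Mb/s, which shows why encoding at the camera side can't simply be skipped to save latency.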
Directly attached to the Pi was a USB 5G modem (either a Netgear Nighthawk or a Quectel RM520Q-GL). The setup used the 3.5 GHz band as the carrier. Due to national Dutch regulations for this frequency – its usage isn’t allowed north of Zwolle – the tests were performed indoors. Using ping, the round-trip time at IPv4 layer 3 from host to host was measured to be 8-9 milliseconds. These tests were performed with no other users/devices on the network and with an excellent signal quality (about 20 meters between modem and antenna). Some tests were run with a secure VPN between the two hosts (WireGuard). For as yet unknown reasons, this resulted in large amounts of jitter. Since applications like the vest do require shielding from eavesdropping and tampering, further efforts have to be put into this.
On the other side, the compute cluster ran an Ubuntu server as a virtual machine, which was allocated 4-8 cores, 16 GB RAM and full access to the 2560 CUDA cores provided by the Nvidia T4 hardware. The T4 supports GPU virtualization (grid mode), splitting it into a maximum of four GPUs. The setup of the proper ESXi and guest OS drivers for the GPU card turned out to be quite a hassle.
The artificial intelligence for image recognition on the compute cluster was based on a Yolo v4 engine set to use three convolution layers and trained with about 1,500 handmade photos of knives. Tests with a prerecorded stream, input as a file, showed that the engine in this particular setup could interpret a maximum of about 300 frames per second while adding about 6 ms of delay – a fraction of the entire round-trip delay. At the peak, about half of the 8 CPUs and 2 GB of the 16 GB of memory were used and the GPU operated at 75 percent (with 1,176 MB of GPU memory in use). So, none of these resources were fully stressed. The I/O could still be a bottleneck – this needs further investigation.
There’s still a lot of room for improvement in the Yolo setup, eg by increasing the precision of the detection algorithm. Since the main goal of testing was to see whether real-time decision-making was possible at all, no further efforts were made to improve the actual image recognition and decision-making. Once the AI concluded that there was a knife (beyond the certainty threshold), a signal was sent back to the Pi, which, upon receipt, could trigger actuators. This took about 20 ms.
The actuator used in the tests was a simple speaker add-on to the Pi, generating an audio alarm. However, since audio isn’t always effective, this could be combined with an actuator sending haptic feedback, eg on the back of the vest’s wearer. To protect the wearer even further, a number of high-luminance LEDs could be activated on the rear side of the vest. Another option would be to extend the alarm to the police control room or the nearby buddy of the officer.
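On the receiving side, a small UDP listener that fans out each alert to one or more actuators might look like the sketch below. The port number, the function names and the actuator callables are hypothetical. Driving a speaker, a vibration motor or LEDs is left to whatever callables are passed in.

```python
import socket


def open_alert_socket(host="0.0.0.0", port=5005):
    """Create a UDP socket bound to the given address (port 5005 is an arbitrary choice)."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    sock.bind((host, port))
    return sock


def serve_alerts(sock, actuators, max_alerts=None):
    """Pass every received datagram to each actuator; stop after max_alerts if given."""
    handled = 0
    while max_alerts is None or handled < max_alerts:
        payload, _sender = sock.recvfrom(1024)
        for actuate in actuators:
            actuate(payload)
        handled += 1
    return handled
```

A call such as `serve_alerts(open_alert_socket(), [play_alarm, buzz_vest])`, with `play_alarm` and `buzz_vest` supplied by the application, would keep listening until the process is stopped. Keeping the actuators as plain callables makes it easy to add haptics or LEDs later without touching the network code.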
A primary requirement from the police is for the vest to respond to a threatening situation (in this case a pulled knife) within a second. During the tests, the round-trip delay on the application level turned out to be far better – around 300 ms. The tests also showed that real-time decision-making using edge computing and 5G is definitely possible.
While decoding is really quick, the primary source of delay is the video encoding and the length of the pipeline in general. Upcoming tests will target a further reduction of the overall latency. The focus will be on using faster encoding hardware at the camera side and further optimizing the GStreamer decoding settings. This will hopefully decrease the overall round-trip delay to below 200 ms. Another avenue for latency reduction is replacing the Python code with C (or another compiled language).
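A rough budget based on the figures reported in this article gives an impression of how much of the round trip is left for encoding, buffering and decoding. This is an indicative sketch rather than a measurement: the individual numbers were taken under different conditions (the capture latency at 25 fps, for instance), the uplink is assumed to be half of the ping round trip, and the stage names are chosen for this example.

```python
# Figures as reported in the article, in milliseconds.
MEASURED_MS = {
    "camera capture": 35.0,
    "uplink (assumed half of the 8-9 ms ping)": 4.25,
    "object detection": 6.0,
    "alert back to the Pi": 20.0,
}
ROUND_TRIP_MS = 300.0  # application-level round trip observed in the tests


def unaccounted_ms(total_ms: float, parts: dict) -> float:
    """Part of the total that the listed stages do not explain."""
    return total_ms - sum(parts.values())


rest = unaccounted_ms(ROUND_TRIP_MS, MEASURED_MS)
print(f"accounted for: {sum(MEASURED_MS.values()):.2f} ms, remaining: {rest:.2f} ms")
```

With these inputs, well over 200 ms remains unexplained by capture, network and inference, which fits the observation that the encoding and the length of the pipeline dominate.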
For the vest itself, the main focus is on improving object detection. At the moment, only one specific type of knife is detected. This needs to be generalized to all kinds of knives but also firearms like pistols, revolvers and (semi-)automatic guns. On top of that, object detection from further away has to be implemented – right now, the maximum distance is about 5 meters. To make this possible, the resolution of the video has to go up. However, this could impact CPU and bandwidth usage (and thereby increase the delay). Furthermore, the detection should also work in twilight or even darker conditions, in bad weather and with backlighting.
The general pipeline can be improved in lots of ways, eg by using different (better) hardware, faster codecs and optimized codec settings. The same goes for the decoding, but less gain is to be expected there. Alternative AI models could speed up decision-making. Bringing the latency below 200 ms opens up a lot of new opportunities and use cases, such as self-flying drones that recognize objects in flight and, in doing so, can avoid crashes. Lightweight drones in particular, in which every gram matters, are a very interesting area to experiment with the 5G edge computing pipeline. The sky is the limit.
This experiment has been made possible by contributions from Hanze University of Applied Sciences and its HBO-ICT student Mathijs Volker, the Northern Netherlands Alliance (SNN), TNO, VMware, the Innovation House of the National Police Force’s Unit Northern Netherlands and the 5Groningen field lab. 5Groningen is an initiative of the Economic Board Groningen and has received funding from the European Regional Development Fund (ERDF), the Dutch Ministry of Economic Affairs and Climate Policy and SNN.