Articles | May 18, 2022

IoT device traffic to demonstrate office personnel traffic

Hetal Kapadia

IoT, or the internet of things, refers to the world of devices or objects that connect to other systems and devices over a network. Each IoT device generates network traffic, the accumulation of which provides a wealth of data to be studied.

Network logging and packet capture register the events that occur in a network. A record usually includes the source and destination IP addresses, the sizes of the forward and backward packets, the duration of the event, and the start time of the event, among other details. This data can help an administrator monitor network usage and identify risks or issues. While it is frequently used to investigate malicious traffic or bandwidth problems, there are other possible benefits to monitoring the network traffic generated by IoT devices.

Consider an office with multiple conference rooms and an open bullpen. The location of personnel could be determined from their schedule: no meetings suggests they are at their desk, and a meeting in conference room 1 suggests they are in conference room 1. However, this does not take into account transient movement, such as being called into a meeting that isn’t on the calendar or choosing to take a private meeting ad hoc in a free conference room.

As more and more devices become network capable or become a part of the IoT, network data becomes a part of managing a system.

Using network traffic to understand where people are allows a system to react accordingly, for example by intelligently managing blinds to prevent a room from getting too hot or cold when there are more or fewer people than expected. It can also complement motion sensors, which can fail to detect a person who is working at a laptop and not moving much. Finally, it is useful for understanding network usage and whether there is enough bandwidth in an area.

This article explores network usage in a room based on a subset of generated data.

Import of packet capture data

Packet capture data generally includes information like the source and destination IP addresses, MAC addresses, the sizes of the forward and backward packets, event durations, and event start times, among other details.

Either an IP address or a MAC address could be used to identify a device, but IP addresses are typically assigned dynamically, while a MAC address is a fixed device ID. For the purpose of this notebook, we’ll focus on MAC addresses, event start times, and durations to determine IoT device location. Fields like packet size and direction are important for understanding the full network picture, but are not necessary for understanding IoT device locations.
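To make the fields of interest concrete, here is a minimal sketch of a single network event reduced to the three fields this notebook relies on. The class name `FlowEvent`, the field names, the unit of seconds, and the sample date are assumptions for illustration, not the dataset's actual schema.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass(frozen=True)
class FlowEvent:
    """One network event, reduced to the fields needed to locate a device."""

    mac_address: str      # fixed device identifier
    event_time: datetime  # start time of the event
    duration_s: float     # duration of the event, in seconds (assumed unit)

    @property
    def end_time(self) -> datetime:
        # The event ends `duration_s` seconds after it starts.
        return self.event_time + timedelta(seconds=self.duration_s)


# Hypothetical sample event for the device discussed later in the article.
event = FlowEvent("00:17:88:XU:ED:P6", datetime(2022, 4, 4, 9, 30), 12.5)
print(event.end_time)
```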

This is simulated data of network usage, broken down by network gateway device and location. We’ll combine it into one large dataframe that includes the room location as a column, and then study that dataframe.

Code block: Merging multiple CSVs into one dataframe
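A minimal sketch of such a merge with pandas, assuming one CSV export per room. The `MacAddress` and `EventTime` column names match the ones used later in this article, but the `Duration` and `Room` column names, the MAC addresses, and the sample rows are assumptions. In-memory strings stand in for the real files so that the example is self-contained.

```python
import io

import pandas as pd

# Hypothetical per-room CSV exports; in practice these would be file paths.
csv_by_room = {
    "Bullpen": io.StringIO(
        "MacAddress,EventTime,Duration\n"
        "AA:AA:AA:00:00:01,2022-04-04 09:00:00,30\n"
        "AA:AA:AA:00:00:02,2022-04-04 09:05:00,45\n"
    ),
    "Conference Room 1": io.StringIO(
        "MacAddress,EventTime,Duration\n"
        "AA:AA:AA:00:00:01,2022-04-04 10:00:00,600\n"
    ),
}

# Read each CSV, tag its rows with the room, and stack everything together.
frames = [
    pd.read_csv(source, parse_dates=["EventTime"]).assign(Room=room)
    for room, source in csv_by_room.items()
]
NetworkFlows = pd.concat(frames, ignore_index=True)

print(NetworkFlows)
```

Tagging each frame with its room before concatenating keeps the location information that would otherwise only live in the file name.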

Session creation from network data

Now that the data is combined into a single dataframe, we can move ahead with ingesting it into Atoti. We’ll create a session, set up a few configs, and create our cube.

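import atoti as tt

session = tt.Session(user_content_storage="./content")  # start the Atoti session
Flows = session.read_pandas(NetworkFlows, table_name="NetworkFlows")  # load the merged dataframe
Flows.head()

cube = session.create_cube(Flows)  # build the cube from the table
m = cube.measures  # shorthand used when defining measures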

With our cube created, we can investigate what the data looks like, generally speaking. For example, we can investigate the traffic across rooms due to any connected IoT device.

Table: Count of network instances per room

Hierarchy management

Having investigated the basic shape of our data, we would like to classify and explore it in other ways. For example, it would be useful to look at our traffic by time bucket. We have a datetime column, so we can use create_date_hierarchy to break it down further.

Since our data is all in the same year and month, we’ll only break this down to the day and hour. We’ll also create a separate date hierarchy for just the hour.

cube.create_date_hierarchy("DateTime", column=Flows["EventTime"], levels={"Day": "d", "Hour": "HH"})

cube.create_date_hierarchy("Hour", column=Flows["EventTime"], levels={"Hour": "HH"})

Figure: Number of events over time (line chart)

From this, we can already see something pretty intuitive: there is more network traffic between the hours of 08:00 and 18:00, which are reasonable working hours for an office.
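The hourly bucketing behind this chart can be sketched in plain Python, outside of Atoti, to show what the day and hour levels group on. The timestamps below are made up for illustration.

```python
from collections import Counter
from datetime import datetime

# Hypothetical event start times.
event_times = [
    datetime(2022, 4, 4, 3, 15),
    datetime(2022, 4, 4, 9, 5),
    datetime(2022, 4, 4, 9, 40),
    datetime(2022, 4, 4, 14, 20),
    datetime(2022, 4, 5, 9, 10),
]

# Count events per (day, hour) bucket, mirroring a Day > Hour hierarchy.
events_per_day_hour = Counter((t.date(), t.hour) for t in event_times)

# Count events per hour of day only, mirroring the standalone Hour hierarchy.
events_per_hour = Counter(t.hour for t in event_times)

# Share of events that start during the 08:00-18:00 working window.
in_hours = sum(n for hour, n in events_per_hour.items() if 8 <= hour < 18)
working_share = in_hours / len(event_times)

print(events_per_hour, working_share)
```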

We also notice that between 18:00 and 08:00 the following day, the network traffic doesn’t quite drop to zero. Let’s investigate what is contributing to this by drilling through on one of those hours.

Table: Drill through during one of the lulls

Looking at this, we see that the MAC addresses of these devices are similar. We can look up the manufacturer of these devices to see what they are, or at least where they come from, using a website like macvendorlookup.

Figure: MAC Address Lookup website (screen capture)

Looking up one of the devices, ‘00:17:88:XU:ED:P6’, we see the vendor is Philips Lighting. If the office has network-capable light bulbs, this type of network traffic makes sense. In the modern day, so many previously mundane objects are now ‘smart’.
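This lookup works because the first three octets of a MAC address form the vendor prefix (the OUI). A minimal sketch follows, assuming a tiny hand-written vendor table. The helper names `oui_prefix` and `vendor_for` and the table itself are illustrative, and only the Philips Lighting entry comes from the lookup described above.

```python
# Hypothetical, hand-maintained vendor table keyed by OUI prefix.
KNOWN_VENDORS = {
    "00:17:88": "Philips Lighting",  # the prefix looked up in this article
}


def oui_prefix(mac_address: str) -> str:
    """Return the first three octets of a colon-separated MAC address."""
    return ":".join(mac_address.upper().split(":")[:3])


def vendor_for(mac_address: str) -> str:
    """Look up the vendor for a MAC address, or 'Unknown' if it is not in the table."""
    return KNOWN_VENDORS.get(oui_prefix(mac_address), "Unknown")


print(vendor_for("00:17:88:XU:ED:P6"))
```

Grouping devices by this prefix is one way to spot the "similar" addresses seen in the drill-through without looking them up one at a time.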

If we look at the data specifically for that device, it is always found in the same room. This makes sense, as a lightbulb shouldn’t travel, unlike, say, a laptop.

Now that we have the hierarchies we want, we can investigate what our data is saying, and create additional measures to gain insights on how people move in our offices. Before we start, let’s first see in better detail where people are in our offices during the day.

As would be reasonable to expect, the bulk of our traffic seems to be located in the bullpen area of the office during the workday. Some questions to consider:

Do people stay fixed in the same area throughout the day?
How many people tend to gather in the conference rooms when the conference rooms are in use?
Is there a specific time during the day where a particular room is favored or disfavored?

To answer these questions, we can create additional measures. Let’s start by creating a measure which returns the number of distinct rooms. This one simple measure will allow us to see two things immediately:

How many devices cross through multiple rooms
Roughly how many of our devices are stationary devices that belong to the office
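Outside of Atoti, the idea behind this measure can be sketched in plain Python: collect the set of rooms seen for each MAC address and count them. The sample events, MAC addresses, and room names below are made up for illustration.

```python
from collections import defaultdict

# Hypothetical (mac_address, room) pairs, one per network event.
events = [
    ("AA:AA:AA:00:00:01", "Bullpen"),
    ("AA:AA:AA:00:00:01", "Conference Room 1"),
    ("AA:AA:AA:00:00:01", "Bullpen"),
    ("AA:AA:AA:00:00:02", "Bullpen"),
    ("00:17:88:XU:ED:P6", "Conference Room 2"),
]

# Collect the distinct rooms seen for each device.
rooms_by_device = defaultdict(set)
for mac, room in events:
    rooms_by_device[mac].add(room)

# Distinct room count per device: a plain-Python analogue of the measure.
room_counts = {mac: len(rooms) for mac, rooms in rooms_by_device.items()}

# Devices seen in more than one room versus devices that never leave one room.
moving = sorted(mac for mac, n in room_counts.items() if n > 1)
single_room = sorted(mac for mac, n in room_counts.items() if n == 1)

print(room_counts, moving, single_room)
```

Note that a count of one does not by itself prove a device is office equipment; it only says the device was never seen anywhere else in the sample.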

Table: #Locations each device is found on April 4, 5, 6

So, which devices are moving around? For this, we’ll look at devices found in more than one room. This will naturally exclude devices which are room features, as well as employees who either work in one conference room or are always in the bullpen area.

Table: Devices found in multiple rooms on April 4, 5, 6

We can also get a sense of the types of devices from the flow traffic. For example, a person with a laptop and a smartwatch may only sit in the bullpen area, but those devices will likely log a greater number of events than a lightbulb.

Figure: Total network flow time vs number of rooms devices are found (scatter plot)

We get the sense that most of our devices are transient, meaning they, at some point or another, end up visiting every room. Let’s see when these rooms are occupied.

This is a bit difficult, since we have devices like lightbulbs, which means there could be network traffic even if no humans are in the room. Let’s start by visualizing the duration of network events for each room per hour. We’ll focus on the hours between 08:00 and 18:00.
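A minimal plain-Python sketch of that aggregation, assuming each event is attributed to the hour in which it starts (a simplification, since a long event can span several hours). The sample events and the choice of seconds as the duration unit are assumptions.

```python
from collections import defaultdict
from datetime import datetime

# Hypothetical events: (room, start time, duration in seconds).
events = [
    ("Bullpen", datetime(2022, 4, 4, 9, 10), 120.0),
    ("Bullpen", datetime(2022, 4, 4, 9, 45), 60.0),
    ("Conference Room 1", datetime(2022, 4, 4, 9, 30), 300.0),
    ("Conference Room 1", datetime(2022, 4, 4, 15, 0), 30.0),
    ("Conference Room 1", datetime(2022, 4, 4, 22, 0), 5.0),  # outside working hours
]

# Sum event durations per (room, hour), keeping only events starting in 08:00-18:00.
duration_by_room_hour = defaultdict(float)
for room, start, duration_s in events:
    if 8 <= start.hour < 18:
        duration_by_room_hour[(room, start.hour)] += duration_s

for (room, hour), total in sorted(duration_by_room_hour.items()):
    print(f"{room} {hour:02d}:00 -> {total:.0f} s")
```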

Figure: Hourly network flow (bar chart)

We see conference room usage throughout the day. However, zooming in, Conference Room 1 seems to become unpopular in the afternoon.

Let’s recreate this same visual, excluding the bullpen data.

Figure: Hourly network flow in just the conference rooms (bar chart)

It seems like Conference Room 1 becomes unpopular as the day goes on. There could be many reasons for this:

It gets hot/cold
The sun/lighting gets worse throughout the day
Fewer people need that room and its features
Or other reasons

From an office management point of view, it is clear that either the room’s conditions need to be investigated or some of the resources allocated to it could be diverted elsewhere.

Let’s also create a similar measure counting the distinct MAC addresses. This can help us see how many devices are in a room during a specific period.

m["#DevicesFound"] = tt.agg.count_distinct(Flows["MacAddress"])

Table: Count of IoT devices found per room per hour during the work day

What else can we see from this data?

For example, how long, on average, does a device stay in a conference room? We’ll build this metric up iteratively. Let’s start by determining what percent of its active time (i.e., time spent actively communicating) a device spends in a conference room. Ideally, we would exclude any IoT device that is natively a part of that room, but we’ll include them for now.
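As a rough illustration of this first step, the share can be computed in plain Python as the time a device spends in conference rooms divided by its total active time. This is a sketch under stated assumptions rather than the notebook's own measure: the sample events are made up, durations are assumed to be in seconds, and the helper `is_conference_room` assumes conference rooms can be recognized by name.

```python
from collections import defaultdict

# Hypothetical events: (mac_address, room, duration in seconds).
events = [
    ("AA:AA:AA:00:00:01", "Bullpen", 600.0),
    ("AA:AA:AA:00:00:01", "Conference Room 1", 300.0),
    ("AA:AA:AA:00:00:01", "Conference Room 2", 100.0),
    ("AA:AA:AA:00:00:02", "Bullpen", 500.0),
]


def is_conference_room(room: str) -> bool:
    # Assumption: conference rooms are identifiable by their name.
    return room.startswith("Conference Room")


total_active = defaultdict(float)       # total active time per device
conference_active = defaultdict(float)  # active time spent in conference rooms
for mac, room, duration_s in events:
    total_active[mac] += duration_s
    if is_conference_room(room):
        conference_active[mac] += duration_s

# Share of each device's active time that is spent in conference rooms.
conference_share = {
    mac: conference_active[mac] / total
    for mac, total in total_active.items()
    if total > 0
}

print(conference_share)
```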