Lumos: Identifying and Localizing Diverse Hidden IoT Devices in an Unfamiliar Environment
Original paper: usenixsecurity22

1 Introduction

1.1 Problem Scenario

an unfamiliar environment such as an Airbnb or hotel room

1.2 Problems Addressed

  1. Detect: is a device present?
  2. Identify: what kind of device is it?
  3. Localize: where is the device?

1.3 Challenges

  1. Limited privileges: users have limited visibility and control inside such an unfamiliar environment
  2. Simple tools: users typically only have personal (commodity) handhelds

1.4 Limitations of Related Work

  1. Reliance on manual, exhaustive scanning: “spy-tech” solutions rely on manual and thorough scanning of the environment
  2. Camera-specific, so not generalizable to other IoT devices: they focus exclusively on camera-specific effects (e.g., motion or light triggering)
  3. Need for privileged network access: network-based device fingerprinting solutions rely on privileged access to the host network and fail in the presence of limited network visibility
  4. No localization, and extra equipment required: they cannot localize devices, and/or would need separate instrumentation of the environment

1.5 This Paper's Work

Lumos sniffs and collects encrypted wireless packets over the air (aka 802.11) to detect and identify the hidden devices. It then predicts the location of each identified device with respect to the user as they walk around the perimeter of the space.

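1.5.1 Three Contributions

  1. Identifying diverse devices with limited features: uses only 802.11 headers and builds an ML model that extracts features automatically
  2. Data acquisition with limited knowledge: proposes a method for efficiently collecting data across multiple channels in an unfamiliar environment
  3. Infrastructure-free device localization: determines device locations using only phone sensors and the correlation between the user's movement and the resulting signal variation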

1.5.2 Experimental Setup

  1. six different environments
  2. a total of 44 devices
  3. walk around the perimeter of each space of around 1000 sq. ft (about 92.9 m²)

1.5.3 Results

  1. identifies device types with 95% accuracy in under 30 minutes
  2. localizes devices with a median localization accuracy of 1.5 m

Code (not yet public), video demo

2 Problem Setting, Threat Model, and Scope

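[Figure omitted: problem setting and threat model]
Key assumptions:

  1. All devices connect to Wi-Fi over an 802.11 wireless network
  2. The Wi-Fi network used by the IoT devices may differ from the user's (a different network, different channels, or a hidden network)
  3. The user can move around freely and has tools that can capture WiFi 802.11 packets (across all channels) over the air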

3 System overview

[Figure omitted: system overview]

4 Device Fingerprinting Module

Assumption for this section: all IoT devices are operating on the same channel.
All wireless 802.11 packets serve as model input, grouped by MAC address.

4.1 Feature Engineering

4.1.1 Available Features

a sample 802.11 wireless packet:
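[Figure omitted: sample 802.11 wireless packet]

  1. Total features: 125 (max) attributes
  2. Pruning: to simplify the feature set and prevent overfitting, features that take the same value across different devices are pruned
  3. After pruning: 52 out of 125 attributes remain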
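The pruning step above can be sketched in a few lines of Python. This is only an illustration of the idea (drop attributes that never differ between devices); the attribute names and values are hypothetical, and the paper's actual pruning procedure may differ in detail.

```python
def prune_constant_attributes(per_device_values):
    """Keep only attributes whose value differs between at least two devices.

    per_device_values maps a device id to a dict of {attribute: value}.
    """
    devices = list(per_device_values.values())
    attributes = set().union(*(d.keys() for d in devices))
    kept = []
    for attr in sorted(attributes):
        distinct = {d.get(attr) for d in devices}
        if len(distinct) > 1:  # varies across devices, so it may be informative
            kept.append(attr)
    return kept


# Hypothetical attribute names and values, for illustration only.
sample = {
    "cam": {"protocol_version": 0, "frame_len": 1420, "data_rate": 54},
    "plug": {"protocol_version": 0, "frame_len": 98, "data_rate": 54},
    "speaker": {"protocol_version": 0, "frame_len": 310, "data_rate": 24},
}
kept_attributes = prune_constant_attributes(sample)
```

Here `protocol_version` is identical on every device and is dropped, while the other two attributes survive.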

4.1.2 Multi-Time Resolution Aggregation

  • Apply aggregate functions to the raw attributes within a sliding time window: we define a sliding window of time and apply different aggregate functions on each raw attribute (including mean, standard deviation, median, max, and min)
  • Choose a suitable time window size for each device: we design a multiple timescales scheme to pick a time window suitable for each device’s transmission pattern.
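A minimal sketch of the windowed aggregation, assuming a single numeric attribute per packet. The window sizes, the function names, and the sample trace are assumptions for illustration; the rule Lumos uses to pick the window for each device is not described in these notes, so the sketch simply computes the aggregates at several resolutions.

```python
import statistics


def aggregate_windows(packets, window, step):
    """Aggregate one raw attribute over sliding time windows.

    packets: list of (timestamp_seconds, value), sorted by timestamp.
    Returns one dict of aggregate features per non-empty window.
    """
    if not packets:
        return []
    start, end = packets[0][0], packets[-1][0]
    rows = []
    t = start
    while t <= end:
        values = [v for ts, v in packets if t <= ts < t + window]
        if values:
            rows.append({
                "window_start": t,
                "mean": statistics.fmean(values),
                "std": statistics.pstdev(values),
                "median": statistics.median(values),
                "max": max(values),
                "min": min(values),
            })
        t += step
    return rows


def multi_resolution(packets, windows=(1.0, 10.0, 60.0)):
    """Compute the same aggregates at several window sizes (assumed values)."""
    return {w: aggregate_windows(packets, window=w, step=w) for w in windows}


# Hypothetical frame lengths observed from one MAC address.
trace = [(0.0, 100), (0.5, 120), (1.2, 80), (3.0, 200)]
features = multi_resolution(trace, windows=(1.0, 2.0))
```

A chatty device fills short windows with enough packets, while a quiet device only yields stable statistics at the longer resolutions, which is the motivation for keeping more than one timescale.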

4.1.3 Feature Post-Processing

  1. Standardization: we standardize the features while maintaining the distribution of values
  2. Reduction: we remove correlated features to avoid over-fitting in the training phase
  3. Selecting the top ten features that have the highest mutual information score (the computation method is given in a cited reference)
  4. Among features whose similarity exceeds 95%, only one is kept
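The standardization and correlation-based reduction steps can be illustrated as follows. This sketch assumes that "similarity" means absolute Pearson correlation and that features are kept greedily in insertion order; both are assumptions, as are the feature names. The mutual-information ranking is not shown.

```python
import math


def standardize(column):
    """Z-score a feature column: zero mean, unit variance."""
    mean = sum(column) / len(column)
    var = sum((x - mean) ** 2 for x in column) / len(column)
    std = math.sqrt(var)
    if std == 0:
        return [0.0 for _ in column]
    return [(x - mean) / std for x in column]


def pearson(a, b):
    """Pearson correlation of two equal-length columns."""
    za, zb = standardize(a), standardize(b)
    return sum(x * y for x, y in zip(za, zb)) / len(a)


def drop_correlated(features, threshold=0.95):
    """Keep a feature only if |r| <= threshold against every feature kept so far."""
    kept = {}
    for name, column in features.items():
        if all(abs(pearson(column, other)) <= threshold
               for other in kept.values()):
            kept[name] = column
    return kept


# Hypothetical aggregated features, one value per time window.
table = {
    "len_mean": [100.0, 200.0, 300.0, 400.0],
    "len_max": [110.0, 210.0, 310.0, 410.0],  # shifted copy of len_mean
    "gap_std": [5.0, 1.0, 4.0, 2.0],
}
reduced = drop_correlated(table)
```

`len_max` is perfectly correlated with `len_mean` and is dropped, while `gap_std` carries different information and is kept.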

4.2 Model Training and Inference

4.2.1 Training

two types of classifiers:

  1. multi-class: learns a single classifier for all the classes
  2. one-vs-rest: learns one classifier per class ✅

Reasons:

  1. Different transmission rates cause class imbalance: some IoT devices transmit much more frequently than others, which leads to an extreme class imbalance
  2. The relevant features differ from device to device: the relevant and informative features for each device type could be different

We picked XGBoost as our ML classifier

4.2.2 Inference

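Feature extraction
[Equation omitted: per-window classifier output]

  • F is the feature vector
  • M_k is the one-vs-rest classifier for device type k
  • L_{t,k} is the probability of predicting the type of device as k at time t.

Aggregating the results by voting
[Equation omitted: aggregation of predictions over time]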
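The inference step can be sketched as below. The exact aggregation formula lives in a figure that these notes do not reproduce, so this sketch assumes the simplest option: average each classifier's per-window probability L_{t,k} over time and pick the device type with the highest average. The lambda "classifiers" are hypothetical stand-ins for trained one-vs-rest models.

```python
def predict_device_type(windows, classifiers):
    """Aggregate one-vs-rest probabilities over time windows.

    windows: list of feature vectors F_t for one MAC address.
    classifiers: dict mapping device type k to a callable M_k(F) -> probability.
    Returns (best_type, mean_probability_per_type).
    """
    if not windows:
        raise ValueError("need at least one time window")
    totals = {k: 0.0 for k in classifiers}
    for features in windows:
        for k, model in classifiers.items():
            totals[k] += model(features)  # this is L_{t,k}
    scores = {k: total / len(windows) for k, total in totals.items()}
    best = max(scores, key=scores.get)
    return best, scores


# Hypothetical stand-ins for trained one-vs-rest models.
classifiers = {
    "camera": lambda f: 0.9 if f["mean_len"] > 1000 else 0.2,
    "plug": lambda f: 0.8 if f["mean_len"] < 200 else 0.1,
}
windows = [{"mean_len": 1400}, {"mean_len": 1300}, {"mean_len": 150}]
label, scores = predict_device_type(windows, classifiers)
```

Averaging over windows means a single atypical window (the third one here) does not flip the final label.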

5 Device-Aware Channel Sensing

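Background

  • Section 4 only considers the case where all IoT devices are on the same channel
  • In practice: the IoT devices are possibly on different wireless networks, spread over 30 channels across 2.4 and 5GHz WiFi frequency ranges.
  • This section handles the case where multiple IoT devices use different channels
    [Figure omitted]

Goal

Data acquisition: monitor the channels and hop between them so as to collect data from all IoT devices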

we need to capture a sufficient number of packets from each active device to identify its device type.

Challenge
we have no knowledge of what channel, when, where, and for how long each device is transmitting.

5.1 Hindsight-Optimal Problem Formulation

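(an optimization problem posed in hindsight)

Assumptions
Each device's traffic behavior is known in advance: assumes that we know the traffic behavior of each device ahead of time.

Only one channel can be sniffed at any given moment: we chunk time into epochs, and in each epoch, our channel sniffer can sense at most one channel.

Objective

Find a schedule such that the captured packets cover as many devices as possible: to determine a sensing schedule to cover as many devices as possible.

Problem model

NumThresh is the minimum number of packets needed to classify a device accurately, and num_i is the number of packets actually captured from device i; a device counts as covered when num_i >= NumThresh. The hindsight-optimal formulation is written as the following Integer Linear Program (ILP):

[Equation omitted: ILP formulation, Eqs. 3 to 5 plus the variable constraints]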

  • Eq 3 captures that we can sense at most one channel in any given time epoch.
  • Eq 4 captures the total number of packets sensed per device
  • Eq 5 captures that each device is successfully sensed if we have more than NumThresh packets.
  • The last three equations simply capture the constraints on the variables.

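The data-collection problem thus becomes an ILP. If the activity matrix were known, i.e., if we already knew when each device transmits packets on its assigned channel, the ILP could be solved directly. That is only an assumption, however; in practice we do not know it.
Lumos instead predicts the activity matrix from coarsely collected data.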
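To make the hindsight-optimal problem concrete, here is a brute-force sketch for a tiny instance. It enumerates every schedule rather than calling an ILP solver, so it only illustrates the objective and constraints (one channel per epoch, a device is covered once at least NumThresh of its packets are sensed). The variable names and the toy activity matrix are assumptions.

```python
from itertools import product


def hindsight_optimal(activity, channel_of, channels, num_thresh):
    """Exhaustively search sensing schedules for a tiny instance.

    activity[d][t]: packets device d transmits in epoch t (known in hindsight).
    channel_of[d]: the channel device d transmits on.
    Returns (best_schedule, number_of_devices_covered).
    """
    epochs = len(next(iter(activity.values())))
    best_schedule, best_covered = None, -1
    for schedule in product(channels, repeat=epochs):  # one channel per epoch
        covered = 0
        for d, per_epoch in activity.items():
            num_d = sum(n for t, n in enumerate(per_epoch)
                        if schedule[t] == channel_of[d])
            if num_d >= num_thresh:  # device d is successfully sensed
                covered += 1
        if covered > best_covered:
            best_schedule, best_covered = schedule, covered
    return best_schedule, best_covered


# Toy activity matrix: one chatty device and two that speak only once.
activity = {
    "camera": [5, 5, 5, 5],
    "plug": [0, 3, 0, 0],
    "sensor": [0, 0, 0, 3],
}
channel_of = {"camera": 1, "plug": 6, "sensor": 11}
schedule, covered = hindsight_optimal(
    activity, channel_of, channels=[1, 6, 11], num_thresh=3)
```

The optimal schedule must be on channel 6 in the second epoch and on channel 11 in the fourth, which shows why knowing when the quiet devices transmit matters more than listening to the chatty one.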

5.2 Prior Work on Spectrum Sensing and Limitations

SpecInsight: the proposed reward function tries to capture all packets from every device.
It uses a greedy strategy; the reward function for a device d at time t is:
[Equation omitted: SpecInsight reward function]

  • T is the last time a packet was observed
  • μ_i represents the mean packet inter-arrival time for the device.

This reward function assumes that the next packet will arrive at time T +μ, T +2μ, and so on.

Limitations

  1. The collected data is biased toward high transmission rate devices. A high transmission rate device has lower packet inter-arrival times, and as a result, high reward value. This results in missing packets from a low transmission rate device as it is still trying to collect every packet from a high transmission rate device.
  2. The averaged inter-arrival time is estimated inaccurately: it calculates the mean inter-arrival time from the previously captured packets. However, some packets transmitted by a device may be missed while sniffing in another channel, resulting in inaccurate estimation of averaged inter-arrival time.
  3. A lot of time is wasted collecting packets on inactive channels: there are more than 30 possible wireless channels, but a majority of them might not be active in the vicinity of a user, so it ends up wasting a lot of time sensing traffic on inactive wireless channels.

5.3 Our Approach

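Problems addressed:

  1. To avoid wasting time on unused channels, Lumos first hops quickly across the channels to discover the active ones
  2. To avoid bias toward high transmission rate devices, Lumos reduces a device's reward to 0 once enough packets have been collected from it, which happens quickly for high transmission rate devices, so that Lumos can focus on low transmission rate devices
  3. To fix incorrect packet arrival time estimates, Lumos makes a coarse estimate of the device type from a small number of packets; because so few packets make this coarse classification less accurate, it uses the mean inter-arrival times of the top three predictions

Lumos spends a fixed time (10 seconds) on the selected channel; the reward for a channel c is defined as
[Equation omitted: Lumos channel reward function]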
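The three ideas above can be combined into a greedy channel picker. The actual reward formula is in a figure that these notes do not reproduce, so the reward below is an assumed stand-in: for each device that still needs packets, it adds the chance-like score min(1, dwell / μ) of hearing that device during one dwell, and devices that already have enough packets contribute 0. All names and numbers are hypothetical.

```python
DWELL_SECONDS = 10.0  # fixed time spent on the chosen channel (from the text)


def estimated_interarrival(top3_type_means):
    """Average the mean inter-arrival times of the top three predicted types."""
    return sum(top3_type_means) / len(top3_type_means)


def channel_reward(devices, num_thresh):
    """Assumed reward: how many unfinished devices are likely heard in one dwell."""
    reward = 0.0
    for dev in devices:
        if dev["packets_seen"] >= num_thresh:
            continue  # enough packets collected: this device's reward is 0
        mu = estimated_interarrival(dev["top3_interarrival"])
        reward += min(1.0, DWELL_SECONDS / mu)
    return reward


def pick_channel(devices_by_channel, num_thresh):
    """Greedy choice among the active channels found by the quick scan."""
    return max(devices_by_channel,
               key=lambda c: channel_reward(devices_by_channel[c], num_thresh))


# Hypothetical state after the initial quick scan of all channels.
state = {
    1: [{"packets_seen": 500, "top3_interarrival": [0.01, 0.02, 0.03]}],
    6: [{"packets_seen": 2, "top3_interarrival": [20.0, 30.0, 40.0]}],
}
next_channel = pick_channel(state, num_thresh=100)
```

Channel 1 hosts a chatty device that is already done, so its reward is 0 and the sniffer moves to channel 6, where a quiet device still needs packets.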

6 Localization

Two techniques:

  • Distance: RSSI provides a coarse estimate of the distance between each IoT device and the user’s phone
  • Position and orientation: VIO (Visual Inertial Odometry) determines the change in position and orientation of the user over time

Lumos leverages the spatial measurements of RSSI values and their variations to estimate the location of each device.
Three methods:

  • Highest RSSI value
  • Grid-based
  • Curve-based ✅ (surface-fitting based)

[Figure omitted: localization methods]
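The two simpler methods can be sketched as follows, given (x, y, RSSI) samples where the positions come from VIO. The log-distance path loss model and its parameters are assumptions chosen for illustration, and the curve-based method that Lumos actually adopts is not reproduced here.

```python
import math


def highest_rssi(samples):
    """Baseline: assume the device sits where the strongest RSSI was heard."""
    x, y, _ = max(samples, key=lambda s: s[2])
    return x, y


def path_loss_rssi(distance, tx_power=-40.0, exponent=2.0):
    """Log-distance path loss model; parameter values are illustrative."""
    return tx_power - 10.0 * exponent * math.log10(max(distance, 0.1))


def grid_localize(samples, size=10.0, step=0.5):
    """Pick the grid cell whose predicted RSSI best matches the measurements."""
    best, best_err = None, float("inf")
    n = int(size / step) + 1
    for i in range(n):
        for j in range(n):
            gx, gy = i * step, j * step
            err = sum(
                (rssi - path_loss_rssi(math.hypot(gx - x, gy - y))) ** 2
                for x, y, rssi in samples)
            if err < best_err:
                best, best_err = (gx, gy), err
    return best


# Synthetic, noise-free walk around a 10 m x 10 m room; device at (3, 4).
device = (3.0, 4.0)
walk = ([(float(x), 0.0) for x in range(0, 11, 2)]
        + [(10.0, float(y)) for y in range(2, 11, 2)]
        + [(float(x), 10.0) for x in range(8, -1, -2)]
        + [(0.0, float(y)) for y in range(8, 0, -2)])
samples = [(x, y, path_loss_rssi(math.hypot(x - device[0], y - device[1])))
           for x, y in walk]
estimate = grid_localize(samples)
baseline = highest_rssi(samples)
```

On this synthetic trace the grid search recovers the device position, while the highest-RSSI baseline can only return a point on the walked perimeter, which hints at why fitting over all samples is preferred.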

7 Evaluation

Experimental results

  1. Lumos can accurately identify diverse devices with 95% accuracy in under 30 minutes.
  2. Lumos’ channel sensing outperforms baselines such as random, round robin, and state-of-the-art spectrum sensing techniques.
  3. Our localization system can locate devices within 1.5m with a single random walk through the space.
  4. Lumos can identify previously unseen devices of the same type from different vendors and is robust across typical changes in device settings.

7.1 Implementation and Experimental Setup

7.1.1 Prototype

  1. Using a MacBook Pro (2018) to sniff 802.11 wireless traffic and an Intel RealSense Camera T265 for capturing the VIO traces.✅
  2. Using a combination of an iOS device and a Raspberry Pi (RPi): Lumos runs as an application on the phone and uses Bluetooth to communicate with the RPi.

7.1.2 Devices

[Figure omitted: devices used in the evaluation]

7.1.3 Environments

[Figure omitted: evaluation environments]

7.1.4 Baselines for Comparison

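No prior work is directly comparable end to end, so each module is compared separately against other work

  • Fingerprinting: Sivanathan et al., which uses higher layers (TCP, DNS)
  • Channel Sensing: SpecInsight; the method in this paper improves on SpecInsight
  • Localization: against maximized RSSI and grid-based techniques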

7.2 End-to-End Performance

30 minutes of scan time (27 minutes of wireless sniffing followed by 3 minutes of walking)
[Figure omitted: end-to-end performance]

  1. Lumos can identify the type of devices with an accuracy of 95% to 98%.
  2. Lumos achieves a median localization accuracy of 1.5m

7.3 Device Fingerprinting Sensitivity Analysis

7.3.1 Device-Based Confusion Matrix

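[Figure omitted: device-based confusion matrix]
Conclusion: 802.11 packets contain enough information for fingerprinting IoT devices.

7.3.2 Impact of Classifiers

[Figure omitted: comparison of classifiers]
Conclusion: XGBoost performs best across different device types by achieving an average accuracy of 95.26%.

7.3.3 Impact of Multi-Time Resolution Aggregation

A time window size chosen per device vs. a fixed time window size
[Figure omitted: multi-time resolution vs. fixed window]

7.3.4 Impact of Scan Time on Fingerprinting

Assumption: single channel
[Figure omitted: impact of scan time on fingerprinting]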

7.4 Channel Sensing Sensitivity Analysis

7.4.1 Impact of Channel Sensing Scan Time on Device Discovery

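[Figure omitted: device discovery vs. channel sensing scan time]
Reason: Lumos predicts the next packet arrival time from each device, whereas SpecInsight spends its time on high transmission rate devices and neglects low transmission rate devices
Conclusion: Lumos substantially outperforms the baselines and discovers 92% of devices in 15 minutes and 100% of devices by 40 minutes.

7.4.2 Impact of the Number of Active Channels

[Figure omitted: impact of the number of active channels]
Conclusions: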

  1. the scan time increases as we increase the number of channels.
  2. With only one active channel, Lumos only requires 16 minutes to discover all devices.
  3. across 20 channels, Lumos can achieve 80% discovery rate in around 15 minutes and 100% discovery rate by 50 minutes.

7.5 Localization Sensitivity Analysis

7.5.1 Impact of Device Setups

the user performs the same walking pattern around the perimeters of the room.
[Figure omitted: impact of device setups]

7.5.2 Impact of Walking Duration

we installed 16 devices (at least one from each device type) in Locations 1, 5, and 7 of the lab space, shown in Figure 8e.
[Figure omitted: impact of walking duration]

7.6 Other Sensitivity Analysis

7.6.1 Effect of Feature Reduction Technique

[Figure omitted: effect of feature reduction technique]
Conclusion: removing correlated features is more effective in improving the classification performance, while the top ten features do not cover all the important features for differentiating device types.

7.6.2 Effect of Changing Device Settings

[Figure omitted: effect of changing device settings]
Conclusion: settings have some effect, but Lumos generalizes well across vendors and settings. The fingerprinting accuracy slightly drops when we evaluate Lumos on cameras with modified settings. However, Lumos can still generalize well across different cameras from different vendors and settings.

7.6.3 Effect of Traffic Direction

[Figure omitted: effect of traffic direction]
Conclusion: combining two directions of data allows Lumos to capture more fine-grained information from each IoT device.

7.7 Appendix A Impact of Device Density

[Figure omitted: impact of device density]

  • Low (with 1 device from each category)
  • Medium (2 devices from each category)
  • High (all devices)

Conclusion: device density doesn’t have any significant impact on our classification performance.

8 Discussion

8.1 Evading Lumos

  1. frequent MAC address randomization
  2. modify hidden device behavior
  3. randomly changing their transmit power
  4. avoiding wireless transmission

8.2 Unprofiled Devices

Works across device brands and models: Lumos can potentially generalize across different device brands and models, as long as it has seen at least one device with similar behavior in the training phase.

8.3 Other Wireless Technologies

For other wireless protocols such as 5G, how well Lumos performs in practice depends on the features available

8.4 End-to-End Prototype on a Phone

Lumos needs to sniff 802.11 packets over the air, which most phones do not permit; the paper offers two workarounds:

  1. a smartphone with an RPi for WiFi sniffing
  2. a laptop with an Intel RealSense Camera for VIO tracking

9 Related Work

9.1 Device Fingerprinting Methods

  • Device identification using encrypted wired packets
  • Device identification using decrypted wired packets
  • Device identification using encrypted wireless packets(802.11 layer)

9.2 Using Hardware Properties for Detection

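These approaches exploit properties specific to semiconductors; they only detect the presence of a device and do not identify it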

9.3 Device Localization

  • Based on signal distance or angle: schemes map signal measurements into geometric parameters (distance or angle) to localize a target device with respect to reference points
  • Based on signal strength: fingerprint the received signal strength at all possible locations

10 Conclusions

Contribution: a low-cost solution that lets users detect, identify, and localize IoT devices in an unfamiliar environment
Challenges

  1. Limited privileges: limited wireless access
  2. Simple tools: the lack of sophisticated hardware
  3. Device diversity: the diversity of potential snooping devices
  4. Untouched environment: the inability to instrument the environment

Future work: we plan to extend Lumos to run on more mobile devices and support a broader spectrum of wireless protocols.