Identity, Location, Disease and More: Inferring Your Secrets from Android Public Resources

2013-10-14

tl;dr (too long; didn't read)

The authors develop a proof-of-concept Android app with no permissions that can "acquire sensitive information such as a smartphone user's identity, the disease condition he/she is interested in, her location and her driving route."

More specifically:

What revealing apps has the user installed? (e.g. a diabetes or gay social network app)
Where is the user?
What is the phone owner's Twitter account?
What diseases is he/she searching on the WebMD mobile app?
What stocks is he/she searching for in Yahoo! Finance?
What is the current travel route of the user?

The app uses 4 primary public channels of info:

1) Other apps installed on the phone
2) ARP information
3) Per-app data usage statistics
4) Speaker status (on or off)

Neat techniques

Capturing network requests and building a model of app network send/response sizes to automatically infer user behavior based on network packet size alone.
Using MOCK_LOCATION to more quickly and easily automate driving through routes.
Crawling driving directions from Google Maps and then running them through text-to-speech to know how long they would take to pronounce.

Abstract

The design of Android is based on a set of unprotected shared resources, including those inherited from Linux (e.g., Linux public directories). However, the dramatic development in Android applications (app for short) makes available a large amount of public background information (e.g., social networks, public online services), which can potentially turn such originally harmless resource sharing into serious privacy breaches. In this paper, we report our work on this important yet understudied problem. We discovered three unexpected channels of information leaks on Android: per-app data-usage statistics, ARP information, and speaker status (on or off). By monitoring these channels, an app without any permission may acquire sensitive information such as smartphone user’s identity, the disease condition she is interested in, her geo-locations and her driving route, from top-of-the-line Android apps. Furthermore, we show that using existing and new techniques, this zero-permission app can both determine when its target (a particular application) is running and send out collected data stealthily to a remote adversary. These findings call into question the soundness of the design assumptions on shared resources, and demand effective solutions. To this end, we present a mitigation mechanism for achieving a delicate balance between utility and privacy of such resources.

Review

Pros

I feel the main contribution of the paper is enumerating the public channels of info, especially the per-app data usage statistics and ARP info. Most people familiar with Android know that there's an API to see the installed packages and it's unsurprising that it's possible to check the speaker status.

I thought determining the user's driving route from a series of speech durations was pretty impressive. The idea is easy to conceptualize, but actually making it work in practice is quite a feat.

Overall I thought this paper was an interesting examination of what an app with no permissions can tell about the user.

Cons

Clearly a bit FUD-y. The abstract makes it sound like Android itself is leaking users' medical conditions and identity, when in reality Android just leaks per-app network usage data. The effect is similar, but in my opinion the former framing is almost intentionally sensationalist.

The technique of inferring app behavior from network request sizes was taken pretty much directly from another paper, I believe from the same group. Once you know it's possible to get decently fine-grained per-app data usage statistics, the Twitter/WebMD/Yahoo! Finance results they demonstrate are only a logical extension.

I feel the route inference attack is not very scalable: it relies on a small number of points of interest, each with many routes to it analyzed in advance. This doesn't seem like a general technique for determining user travel to/from any arbitrary location. I would have liked a little discussion of the feasibility, or the time and effort required, of extending this attack to determining a user's route to/from anywhere in a reasonably sized city.

Notes

Most people who have done Android work know that 1) is easy to obtain with no permissions. However, this alone can reveal potentially quite sensitive info about the user. Two examples they give are disease-specific healthcare apps (like a diabetes app) and specific lifestyle apps (e.g. a gay social network like Hornet).

2) contains the BSSID of the wireless access point (WAP) the phone is connected to, i.e. which WiFi network you're currently on. Geo-location databases can determine your location when given that BSSID. I believe the ACCESS_WIFI_STATE permission is required to see all nearby WiFi networks, but with no permissions you can still see the current one.
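To make 2) concrete, here is a minimal sketch of pulling a MAC address out of ARP-table text. It assumes the Linux /proc/net/arp column layout, that the access point is also the gateway (so its ARP entry carries the BSSID), and that the WiFi interface is called wlan0. The sample table and the function names are made up for illustration and are not from the paper.

```python
def parse_arp_table(text):
    """Return (ip, mac, device) tuples from /proc/net/arp-style text."""
    entries = []
    for line in text.strip().splitlines()[1:]:  # skip the header row
        fields = line.split()
        if len(fields) < 6:
            continue  # ignore malformed rows
        ip, _hw_type, _flags, mac, _mask, device = fields[:6]
        if mac != "00:00:00:00:00:00":  # incomplete entries carry an all-zero MAC
            entries.append((ip, mac.lower(), device))
    return entries


def gateway_mac(text, device="wlan0"):
    """Return the first resolved MAC seen on the given interface, or None."""
    for _ip, mac, dev in parse_arp_table(text):
        if dev == device:
            return mac
    return None


# Invented sample in the /proc/net/arp layout (addresses are illustrative).
SAMPLE = """\
IP address       HW type     Flags       HW address            Mask     Device
192.168.1.1      0x1         0x2         00:1A:2B:3C:4D:5E     *        wlan0
192.168.1.7      0x1         0x0         00:00:00:00:00:00     *        wlan0
"""

bssid = gateway_mac(SAMPLE)
```

The resulting MAC is what would be handed to a geo-location database lookup, which is left out here.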

Different network requests are sent depending on how you interact with an app. When the sizes of the network requests for different events are easily distinguishable, you can actually tell what the user has done based on 3). The authors demonstrate that:

Using some location filtering (helped by 2) and a series of time stamps of Twitter posts (from 3), you can fairly reliably identify the exact Twitter user of the phone you're on. This is important because many Twitter accounts contain the user's real name and potentially a link to their personal website. So in some cases an app with no permissions can identify the real person owning the phone.
The authors profiled the WebMD mobile app and found retrieving information about each disease resulted in a unique response size. Thus using 3) the app can observe the diseases searched by the user.
A similar attack is shown to be effective on the Yahoo! Finance app with stocks.
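The size-fingerprinting idea behind the WebMD and Yahoo! Finance attacks can be sketched roughly as follows. This is my own toy illustration, not the authors' code: the byte counts, the labels and the function names are all hypothetical, and how the per-app counter gets sampled is left out.

```python
def counter_deltas(samples):
    """Turn cumulative byte-counter samples into per-interval increments,
    dropping intervals in which nothing was received."""
    return [b - a for a, b in zip(samples, samples[1:]) if b > a]


def classify(delta, signatures, tolerance=0):
    """Return every label whose profiled response size is within
    `tolerance` bytes of the observed increment."""
    return [label for label, size in signatures.items()
            if abs(delta - size) <= tolerance]


# Hypothetical response sizes profiled offline (numbers invented).
SIGNATURES = {"condition A": 5123, "condition B": 7410, "condition C": 6288}

# Hypothetical cumulative "bytes received" readings for the target app.
samples = [1000, 1000, 6123, 6123, 13533]

deltas = counter_deltas(samples)
events = [classify(d, SIGNATURES) for d in deltas]
```

With unique response sizes each increment maps to exactly one label. If two pages had the same size, classify would return both, which is why the attack depends on the sizes being distinguishable.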

Finally, the authors show how a user's driving route can be inferred from whether the speaker is on or off, i.e. 4). The key idea is that reading out specific street names and directions in Navigation takes a different amount of time depending on the instruction. Given the sequence of durations for which Navigation is reading out directions, one can infer the path and destination. The route needed to have at least 9 steps for high accuracy.
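A rough sketch of the matching step, under my own simplifying assumptions: each candidate route has already been reduced to a list of spoken-instruction durations in seconds, and an observed duration counts as a match if it is within a fixed tolerance. The routes, durations, tolerance and function names are invented for illustration; the paper's actual matching is more involved.

```python
def prefix_matches(observed, expected, tolerance):
    """True if every observed duration is within `tolerance` seconds of
    the corresponding duration at the start of `expected`."""
    if len(observed) > len(expected):
        return False
    return all(abs(o - e) <= tolerance for o, e in zip(observed, expected))


def candidate_routes(observed, routes, tolerance=0.3):
    """Return the names of routes still consistent with what was heard."""
    return [name for name, durations in routes.items()
            if prefix_matches(observed, durations, tolerance)]


# Hypothetical per-instruction speech durations (seconds) for three routes.
ROUTES = {
    "route 1": [3.2, 4.1, 2.8, 5.0],
    "route 2": [3.2, 4.0, 4.9, 2.1],
    "route 3": [2.5, 4.1, 2.8, 5.0],
}

# Hypothetical speaker-on durations measured by the zero-permission app.
observed = [3.3, 4.0, 2.9, 4.8]

after_two = candidate_routes(observed[:2], ROUTES)
after_three = candidate_routes(observed[:3], ROUTES)
```

In this toy data two instructions still leave two candidate routes and the third one disambiguates them, which gives some intuition for why longer routes are identified more reliably.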
