Fingerprinting defenses in the Tor Browser

6:26:00 AM

The Tor Browser is based on Mozilla's Extended Support Release (ESR) Firefox branch. It has a series of patches against the browser to enhance privacy and security. Browser behavior is additionally augmented through the Torbutton extension, though they are in the process of moving this functionality into direct Firefox patches. They also change a number of Firefox preferences from their defaults. Tor process management and configuration is accomplished through the Tor Launcher addon, which provides the initial Tor configuration splash screen and bootstrap progress bar. Tor Launcher is also compatible with Thunderbird, Instantbird, and XULRunner.

To help protect against potential Tor Exit Node eavesdroppers, it includes HTTPS-Everywhere. To provide users with optional defense-in-depth against Javascript and other potential exploit vectors, it also includes NoScript. It also modifies several extension preferences from the defaults. To provide censorship circumvention in areas where the public Tor network is blocked either by IP or by protocol fingerprint, it includes several Pluggable Transports in the distribution: Obfsproxy, meek, FTE, and FlashProxy.


Design Requirements and Philosophy


The Tor Browser Design Requirements are meant to describe the properties of a Private Browsing Mode that defends against both network and local forensic adversaries. There are two main categories of requirements: Security Requirements, and Privacy Requirements. Security Requirements are the minimum properties in order for a browser to be able to support Tor and similar privacy proxies safely. Privacy requirements are the set of properties that cause us to prefer one browser over another.

Security Requirements

The security requirements are primarily concerned with ensuring the safe use of Tor. Violations in these properties typically result in serious risk for the user in terms of immediate deanonymization and/or observability. With respect to browser support, security requirements are the minimum properties in order for Tor to support the use of a particular browser.

Proxy Obedience

The browser MUST NOT bypass Tor proxy settings for any content.

State Separation

The browser MUST NOT provide the content window with any state from any other browsers or any non-Tor browsing modes. This includes shared state from independent plugins, and shared state from operating system implementations of TLS and other support libraries.

Disk Avoidance

The browser MUST NOT write any information that is derived from or that reveals browsing activity to the disk, or store it in memory beyond the duration of one browsing session, unless the user has explicitly opted to store their browsing history information to disk.

Application Data Isolation

The components involved in providing private browsing MUST be self-contained, or MUST provide a mechanism for rapid, complete removal of all evidence of the use of the mode. In other words, the browser MUST NOT write or cause the operating system to write any information about the use of private browsing to disk outside of the application's control. The user must be able to ensure that secure deletion of the software is sufficient to remove evidence of the use of the software. All exceptions and shortcomings due to operating system behavior MUST be wiped by an uninstaller. However, due to permissions issues with access to swap, implementations MAY choose to leave it out of scope, and/or leave it to the operating system/platform to implement ephemeral-keyed encrypted swap.

Privacy Requirements

The privacy requirements are primarily concerned with reducing linkability: the ability for a user's activity on one site to be linked with their activity on another site without their knowledge or explicit consent. With respect to browser support, privacy requirements are the set of properties that cause us to prefer one browser over another.

For the purposes of the unlinkability requirements of this section as well as the descriptions in the implementation section, a url bar origin means at least the second-level DNS name. For example, for mail.google.com, the origin would be google.com. Implementations MAY, at their option, restrict the url bar origin to be the entire fully qualified domain name.
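
As a rough illustration of the url bar origin definition above, the following minimal sketch reduces a hostname to its last two DNS labels. The function name and the `use_full_fqdn` flag are hypothetical, and the two-label rule is a simplifying assumption: it gives the wrong answer for suffixes such as co.uk, where a public suffix list would be needed.

```python
def url_bar_origin(hostname: str, use_full_fqdn: bool = False) -> str:
    """Return the url bar origin for a hostname.

    Naive assumption: the second-level name is the last two DNS labels.
    This is wrong for multi-label suffixes such as "co.uk".
    """
    host = hostname.strip().rstrip(".").lower()
    if use_full_fqdn:
        # Optional stricter mode: the whole fully qualified domain name.
        return host
    labels = host.split(".")
    return ".".join(labels[-2:])


print(url_bar_origin("mail.google.com"))                      # google.com
print(url_bar_origin("mail.google.com", use_full_fqdn=True))  # mail.google.com
```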

Cross-Origin Identifier Unlinkability

User activity on one url bar origin MUST NOT be linkable to their activity in any other url bar origin by any third party automatically or without user interaction or approval. This requirement specifically applies to linkability from stored browser identifiers, authentication tokens, and shared state. The requirement does not apply to linkable information the user manually submits to sites, or due to information submitted during manual link traversal. This functionality SHOULD NOT interfere with interactive, click-driven federated login in a substantial way.

Cross-Origin Fingerprinting Unlinkability

User activity on one url bar origin MUST NOT be linkable to their activity in any other url bar origin by any third party. This property specifically applies to linkability from fingerprinting browser behavior.

Long-Term Unlinkability

The browser MUST provide an obvious, easy way for the user to remove all of its authentication tokens and browser state and obtain a fresh identity. Additionally, the browser SHOULD clear linkable state by default automatically upon browser restart, except at user option.


Adversary Model


A Tor web browser adversary has a number of goals, capabilities, and attack types that can be used to illustrate the design requirements for the Tor Browser. The sections below cover where the adversary can position themselves and which attacks they can perform.

Adversary Capabilities - Positioning

The adversary can position themselves at a number of different locations in order to execute their attacks.

Exit Node or Upstream Router

The adversary can run exit nodes, or alternatively, they may control routers upstream of exit nodes. Both of these scenarios have been observed in the wild.

Ad servers and/or Malicious Websites

The adversary can also run websites, or more likely, they can contract out ad space from a number of different ad servers and inject content that way. For some users, the adversary may be the ad servers themselves. It is not inconceivable that ad servers may try to subvert or reduce a user's anonymity through Tor for marketing purposes.

Local Network/ISP/Upstream Router

The adversary can also inject malicious content at the user's upstream router when they have Tor disabled, in an attempt to correlate their Tor and Non-Tor activity.

Additionally, at this position the adversary can block Tor, or attempt to recognize the traffic patterns of specific web pages at the entrance to the Tor network.

Physical Access

Some users face adversaries with intermittent or constant physical access. Users in Internet cafes, for example, face such a threat. In addition, in countries where simply using tools like Tor is illegal, users may face confiscation of their computer equipment for excessive Tor usage or just general suspicion.


Adversary Capabilities - Attacks


The adversary can perform the following attacks from a number of different positions to accomplish various aspects of their goals. It should be noted that many of these attacks (especially those involving IP address leakage) are often performed by accident by websites that simply have Javascript, dynamic CSS elements, and plugins. Others are performed by ad servers seeking to correlate users' activity across different IP addresses, and still others are performed by malicious agents on the Tor network and at national firewalls.

Read and insert identifiers

The browser contains multiple facilities for storing identifiers that the adversary creates for the purposes of tracking users. These identifiers are most obviously cookies, but also include HTTP auth, DOM storage, cached scripts and other elements with embedded identifiers, client certificates, and even TLS Session IDs.

An adversary in a position to perform MITM content alteration can inject document content elements to both read and inject cookies for arbitrary domains. In fact, even many "SSL secured" websites are vulnerable to this sort of active sidejacking. In addition, the ad networks of course perform tracking with cookies as well.

These types of attacks are attempts at subverting our Cross-Origin Identifier Unlinkability and Long-Term Unlinkability design requirements.

Fingerprint users based on browser attributes

There is an absurd amount of information available to websites via attributes of the browser. This information can be used to reduce anonymity set, or even uniquely fingerprint individual users. Attacks of this nature are typically aimed at tracking users across sites without their consent, in an attempt to subvert our Cross-Origin Fingerprinting Unlinkability and Long-Term Unlinkability design requirements.

Fingerprinting is an intimidating problem to attempt to tackle, especially without a metric to determine or at least intuitively understand and estimate which features will most contribute to linkability between visits.

The Panopticlick study done by the EFF uses the Shannon entropy - the number of identifying bits of information encoded in browser properties - as this metric. Their result data is definitely useful, and the metric is probably the appropriate one for determining how identifying a particular browser property is. However, some quirks of their study mean that they do not extract as much information as they could from display information: they only use desktop resolution and do not attempt to infer the size of toolbars. In the other direction, they may be over-counting in some areas, as they did not compute joint entropy over multiple attributes that may exhibit a high degree of correlation. Also, new browser features are added regularly, so the data should not be taken as final.
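
To make the over-counting point concrete, here is a small sketch that computes Shannon entropy for two attributes separately and jointly. The visitor sample is invented for illustration and is not Panopticlick data. Because the two attributes are perfectly correlated in this sample, adding the per-attribute entropies counts the same information twice.

```python
import math
from collections import Counter


def entropy_bits(observations):
    """Shannon entropy, in bits, of a list of hashable observations."""
    counts = Counter(observations)
    total = len(observations)
    return -sum((n / total) * math.log2(n / total) for n in counts.values())


# Hypothetical sample: (platform, user agent family) for four visitors.
visitors = [
    ("Win32", "Firefox/Windows"),
    ("Win32", "Firefox/Windows"),
    ("MacIntel", "Firefox/Mac"),
    ("Linux x86_64", "Firefox/Linux"),
]

platform_bits = entropy_bits([p for p, _ in visitors])
agent_bits = entropy_bits([a for _, a in visitors])
joint_bits = entropy_bits(visitors)

print(platform_bits + agent_bits)  # 3.0 bits if the attributes are summed
print(joint_bits)                  # 1.5 bits when measured jointly
```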

Despite the uncertainty, all fingerprinting attacks leverage the following attack vectors:

Observing Request Behavior

Properties of the user's request behavior comprise the bulk of low-hanging fingerprinting targets. These include: User agent, Accept-* headers, pipeline usage, and request ordering. Additionally, the use of custom filters such as AdBlock and other privacy filters can be used to fingerprint request patterns (as an extreme example).

Inserting Javascript

Javascript can reveal a lot of fingerprinting information. It provides DOM objects such as window.screen and window.navigator to extract information about the useragent. Also, Javascript can be used to query the user's timezone via the Date() object, WebGL can reveal information about the video card in use, and high precision timing information can be used to fingerprint the CPU and interpreter speed. In the future, new JavaScript features such as Resource Timing may leak an unknown amount of network timing related information.

Inserting Plugins

The Panopticlick project found that the mere list of installed plugins (in navigator.plugins) was sufficient to provide a large degree of fingerprintability. Additionally, plugins are capable of extracting font lists, interface addresses, and other machine information that is beyond what the browser would normally provide to content. In addition, plugins can be used to store unique identifiers that are more difficult to clear than standard cookies. Flash-based cookies fall into this category, but there are likely numerous other examples. Beyond fingerprinting, plugins are also abysmal at obeying the proxy settings of the browser.

Inserting CSS

CSS media queries can be inserted to gather information about the desktop size, widget size, display type, DPI, user agent type, and other information that was formerly available only to Javascript.

Website traffic fingerprinting


Website traffic fingerprinting is an attempt by the adversary to recognize the encrypted traffic patterns of specific websites. In the case of Tor, this attack would take place between the user and the Guard node, or at the Guard node itself.

The most comprehensive study of the statistical properties of this attack against Tor was done by Panchenko et al. Unfortunately, the publication bias in academia has encouraged the production of a number of follow-on attack papers claiming "improved" success rates, in some cases even claiming to completely invalidate any attempt at defense. These "improvements" are actually enabled primarily by taking a number of shortcuts (such as classifying only very small numbers of web pages, neglecting to publish ROC curves or at least false positive rates, and/or omitting the effects of dataset size on their results). Despite these subsequent "improvements", we are skeptical of the efficacy of this attack in a real world scenario, especially in the face of any defenses.

In general, with machine learning, as you increase the number and/or complexity of categories to classify while maintaining a limit on reliable feature information you can extract, you eventually run out of descriptive feature information, and either true positive accuracy goes down or the false positive rate goes up. This error is called the bias in your hypothesis space. In fact, even for unbiased hypothesis spaces, the number of training examples required to achieve a reasonable error bound is a function of the complexity of the categories you need to classify.

In the case of this attack, the key factors that increase the classification complexity (and thus hinder a real world adversary who attempts this attack) are large numbers of dynamically generated pages, partially cached content, and also the non-web activity of the entire Tor network. This yields an effective number of "web pages" many orders of magnitude larger than even Panchenko's "Open World" scenario, which suffered continuous near-constant decline in the true positive rate as the "Open World" size grew (see figure 4 of that study). This large level of classification complexity is further confounded by a noisy and low resolution featureset - one which is also relatively easy for the defender to manipulate at low cost.

To make matters worse for a real-world adversary, the ocean of Tor Internet activity (at least, when compared to a lab setting) makes it a certainty that an adversary attempting to examine large amounts of Tor traffic will ultimately be overwhelmed by false positives (even after making heavy tradeoffs on the ROC curve to minimize false positives to below 0.01%). This problem is known in the IDS literature as the Base Rate Fallacy, and it is the primary reason that anomaly and activity classification-based IDS and antivirus systems have failed to materialize in the marketplace (despite early success in academic literature).
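
The base rate argument can be checked with a few lines of arithmetic. The sketch below applies Bayes' rule to assumed numbers: a 90% true positive rate, the 0.01% false positive rate mentioned above, and a hypothetical base rate of one targeted page load per million observed. Only the false positive figure comes from the text; the others are illustrative.

```python
def precision(true_positive_rate, false_positive_rate, base_rate):
    """Fraction of alerts that are real: P(target page | classifier fired)."""
    true_alerts = true_positive_rate * base_rate
    false_alerts = false_positive_rate * (1.0 - base_rate)
    return true_alerts / (true_alerts + false_alerts)


# Assumed numbers for illustration only.
p = precision(
    true_positive_rate=0.90,
    false_positive_rate=0.0001,  # 0.01%
    base_rate=1e-6,              # 1 targeted load per million observed
)
print(f"{p:.2%} of alerts are real")
```

Under these assumptions fewer than 1 in 100 alerts points at a real visit to the targeted page, even though the classifier looks excellent in isolation.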

Still, we do not believe that these issues are enough to dismiss the attack outright. But we do believe these factors make it both worthwhile and effective to deploy light-weight defenses that reduce the accuracy of this attack by further contributing noise to hinder successful feature extraction.

Remotely or locally exploit browser and/or OS


Last, but definitely not least, the adversary can exploit either general browser vulnerabilities, plugin vulnerabilities, or OS vulnerabilities to install malware and surveillance software. An adversary with physical access can perform similar actions.

For the purposes of the browser itself, we limit the scope of this adversary to one that has passive forensic access to the disk after browsing activity has taken place. This adversary motivates our Disk Avoidance defenses.

An adversary with arbitrary code execution typically has more power, though. It can be quite hard to really significantly limit the capabilities of such an adversary. The Tails system can provide some defense against this adversary through the use of readonly media and frequent reboots, but even this can be circumvented on machines without Secure Boot through the use of BIOS rootkits.


Fingerprinting Properties


The Implementation section is divided into subsections, each of which corresponds to a Design Requirement. Each subsection is divided into specific web technologies or properties. The implementation is then described for that property.

In some cases, the implementation meets the design requirements in a non-ideal way (for example, by disabling features). In rare cases, there may be no implementation at all. Both of these cases are denoted by differentiating between the Design Goal and the Implementation Status for each property. Corresponding bugs in the Tor bug tracker are typically linked for these cases.

Proxy Obedience


Proxy obedience is assured through the following:

Firefox proxy settings, patches, and build flags
The Firefox preferences file shipped with the Tor Browser sets the Firefox proxy settings to use Tor directly as a SOCKS proxy. It sets network.proxy.socks_remote_dns, network.proxy.socks_version, network.proxy.socks_port, and network.dns.disablePrefetch.
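
One way to sanity-check these settings is to scan a preferences file for the four names listed above. The following is a minimal sketch, not a tool from the Tor Project: the expected values (SOCKS version 5, remote DNS enabled, port 9150, prefetch disabled) are assumptions for illustration and should be checked against your own build, and the helper names are hypothetical.

```python
import re

# Expected values are assumptions for illustration, not taken from the text.
EXPECTED = {
    "network.proxy.socks_remote_dns": "true",
    "network.proxy.socks_version": "5",
    "network.proxy.socks_port": "9150",
    "network.dns.disablePrefetch": "true",
}

# Matches lines of the form pref("name", value); or user_pref("name", value);
PREF_LINE = re.compile(r'^\s*(?:user_)?pref\("([^"]+)",\s*(.+?)\);\s*$')


def parse_prefs(text):
    """Parse pref lines into a dict of name -> raw value string."""
    prefs = {}
    for line in text.splitlines():
        match = PREF_LINE.match(line)
        if match:
            prefs[match.group(1)] = match.group(2).strip()
    return prefs


def mismatched_prefs(text, expected=EXPECTED):
    """Return the names whose value is missing or differs from expected."""
    prefs = parse_prefs(text)
    return sorted(name for name, value in expected.items()
                  if prefs.get(name) != value)


sample = '''
pref("network.proxy.socks_remote_dns", true);
pref("network.proxy.socks_version", 5);
pref("network.proxy.socks_port", 9150);
pref("network.dns.disablePrefetch", false);
'''
print(mismatched_prefs(sample))  # ['network.dns.disablePrefetch']
```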

To prevent proxy bypass by WebRTC calls, they disable WebRTC at compile time with the --disable-webrtc configure switch, as well as set the pref media.peerconnection.enabled to false.

They also patch Firefox in order to provide several defense-in-depth mechanisms for proxy safety. Notably, they patch the DNS service to prevent any browser or addon DNS resolution, and they also patch OCSP and PKIX code to prevent any use of the non-proxied command-line tool utility functions from being functional while linked in to the browser. In both cases, they could find no direct paths to these routines in the browser, but it seemed better safe than sorry.

During every Extended Support Release transition, they perform in-depth code audits to verify that there were no system calls or XPCOM activity in the source tree that did not use the browser proxy settings.

They have verified that these settings and patches properly proxy HTTPS, OCSP, HTTP, FTP, gopher (now defunct), DNS, SafeBrowsing Queries, all JavaScript activity, including HTML5 audio and video objects, addon updates, wifi geolocation queries, searchbox queries, XPCOM addon HTTPS/HTTP activity, WebSockets, and live bookmark updates. They have also verified that IPv6 connections are not attempted, through the proxy or otherwise (Tor does not yet support IPv6). They have also verified that external protocol helpers, such as smb urls and other custom protocol handlers are all blocked.

Disabling plugins

Plugins have the ability to make arbitrary OS system calls and bypass proxy settings. This includes the ability to make UDP sockets and send arbitrary data independent of the browser proxy settings.

Torbutton disables plugins by using the @mozilla.org/plugin/host;1 service to mark the plugin tags as disabled. This block can be undone through both the Torbutton Security UI, and the Firefox Plugin Preferences.

If the user does enable plugins in this way, plugin-handled objects are still restricted from automatic load through Firefox's click-to-play preference plugins.click_to_play.

In addition, to reduce any unproxied activity by arbitrary plugins at load time, and to reduce the fingerprintability of the installed plugin list, they also patch the Firefox source code to prevent the load of any plugins except for Flash and Gnash.

External App Blocking and Drag Event Filtering

External apps can be induced to load files that perform network activity. Unfortunately, there are cases where such apps can be launched automatically with little to no user input. In order to prevent this, Torbutton installs a component to provide the user with a popup whenever the browser attempts to launch a helper app.

Additionally, modern desktops now pre-emptively fetch any URLs in Drag and Drop events as soon as the drag is initiated. This download happens independent of the browser's Tor settings, and can be triggered by something as simple as holding the mouse button down for slightly too long while clicking on an image link. They filter drag and drop events in Torbutton before the OS downloads the URLs the events contain.

Disabling system extensions and clearing the addon whitelist

Firefox addons can perform arbitrary activity on your computer, including bypassing Tor. It is for this reason we disable the addon whitelist (xpinstall.whitelist.add), so that users are prompted before installing addons regardless of the source. We also exclude system-level addons from the browser through the use of extensions.enabledScopes and extensions.autoDisableScopes.


Cross-Origin Fingerprinting Unlinkability


Fingerprinting defenses in the Tor Browser

The following defenses are listed roughly in order of most severe fingerprinting threat first. This ordering is based on the above intuition that user configurable aspects of the computer are the most severe source of fingerprintability, though we are in need of updated measurements to determine this with certainty.

Where our actual implementation differs from an ideal solution, we separately describe our Design Goal and our Implementation Status.

Plugins

Plugins add to fingerprinting risk via two main vectors: their mere presence in window.navigator.plugins (because they are optional, end-user installed third party software), as well as their internal functionality.

HTML5 Canvas Image Extraction

After plugins and plugin-provided information, we believe that the HTML5 Canvas is the single largest fingerprinting threat browsers face today. Initial studies show that the Canvas can provide an easy-access fingerprinting target: The adversary simply renders WebGL, font, and named color data to a Canvas element, extracts the image buffer, and computes a hash of that image data. Subtle differences in the video card, font packs, and even font and graphics library versions allow the adversary to produce a stable, simple, high-entropy fingerprint of a computer. In fact, the hash of the rendered image can be used almost identically to a tracking cookie by the web server.

In some sense, the canvas can be seen as the union of many other fingerprinting vectors. If WebGL is normalized through software rendering, system colors were standardized, and the browser shipped a fixed collection of fonts (see later points in this list), it might not be necessary to create a canvas permission. However, until then, to reduce the threat from this vector, we have patched Firefox to prompt before returning valid image data to the Canvas APIs, and for access to isPointInPath and related functions. If the user hasn't previously allowed the site in the URL bar to access Canvas image data, pure white image data is returned to the Javascript APIs.
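
The attack and the defense can both be modeled in a few lines. This sketch stands in for the browser with plain byte buffers: the two "machines", the pixel values, and the function names are hypothetical, and SHA-256 is just one hash an adversary might choose. It shows that a one-byte rendering difference yields a different fingerprint, while returning pure white data makes every machine look the same.

```python
import hashlib


def canvas_fingerprint(pixels: bytes) -> str:
    """Hash an extracted image buffer, as a fingerprinting script might."""
    return hashlib.sha256(pixels).hexdigest()


def extract_image_data(rendered: bytes, site_allowed: bool) -> bytes:
    """Model of the patched behavior: white data unless the site is allowed."""
    if site_allowed:
        return rendered
    return b"\xff" * len(rendered)


# Two hypothetical machines render the same drawing commands as 4x4 RGBA;
# the second differs in a single byte (e.g. a different font rasterizer).
machine_a = bytes([10, 20, 30, 255] * 16)
machine_b = bytes([11]) + machine_a[1:]

allowed = [canvas_fingerprint(extract_image_data(m, True))
           for m in (machine_a, machine_b)]
blocked = [canvas_fingerprint(extract_image_data(m, False))
           for m in (machine_a, machine_b)]

print(allowed[0] == allowed[1])  # False: the hashes distinguish the machines
print(blocked[0] == blocked[1])  # True: white data is identical everywhere
```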

Open TCP Port Fingerprinting

In Firefox, by using either WebSockets or XHR, it is possible for remote content to enumerate the list of TCP ports open on 127.0.0.1. In other browsers, this can be accomplished by DOM events on image or script tags. This open vs filtered vs closed port list can provide a very unique fingerprint of a machine, because it essentially enables the detection of many different popular third party applications and optional system services (Skype, Bitcoin, Bittorrent and other P2P software, SSH ports, SMB and related LAN services, CUPS and printer daemon config ports, mail servers, and so on). It is also possible to determine when ports are closed versus filtered/blocked (and thus probe custom firewall configuration).

In Tor Browser, we prevent access to 127.0.0.1/localhost by ensuring that even these requests are still sent by Firefox to our SOCKS proxy (i.e. we set network.proxy.no_proxies_on to the empty string). The local Tor client then rejects them, since it is configured by default to refuse proxy requests for internal IP addresses.
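
The kind of check a local proxy could apply before forwarding a request can be sketched as follows. This is an illustrative model, not the Tor client's actual code: the function name is hypothetical, and treating loopback, private, and link-local ranges as "internal" is an assumption about what such a check would cover.

```python
import ipaddress


def is_internal_destination(host: str) -> bool:
    """Return True if a SOCKS request for `host` targets a local address."""
    if host.lower().rstrip(".") == "localhost":
        return True
    try:
        addr = ipaddress.ip_address(host)
    except ValueError:
        # An ordinary hostname: not treated as internal in this sketch.
        return False
    return addr.is_loopback or addr.is_private or addr.is_link_local


for target in ("127.0.0.1", "localhost", "192.168.1.10", "example.com"):
    verdict = "reject" if is_internal_destination(target) else "forward"
    print(target, verdict)
```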

Invasive Authentication Mechanisms (NTLM and SPNEGO)

Both NTLM and SPNEGO authentication mechanisms can leak the hostname, and in some cases the current username. The only reason why these aren't a more serious problem is that they typically involve user interaction, and likely aren't an attractive vector for this reason. However, because it is not clear if certain carefully-crafted error conditions in these protocols could cause them to reveal machine information and still fail silently prior to the password prompt, these authentication mechanisms should either be disabled, or placed behind a site permission before their use. We simply disable them.

USB Device ID Enumeration

The GamePad API provides web pages with the USB device id, product id, and driver name of all connected game controllers, as well as detailed information about their capabilities. This API should be behind a site permission in Private Browsing Modes, or should present a generic controller type (perhaps a two button controller that can be mapped to the keyboard) in all cases. We simply disable it via the pref dom.gamepad.enabled.

Fonts

According to the Panopticlick study, fonts provide the most linkability when they are provided as an enumerable list in filesystem order, via either the Flash or Java plugins. However, it is still possible to use CSS and/or Javascript to query for the existence of specific fonts. With a large enough pre-built list to query, a large amount of fingerprintable information may still be available, especially given that additional fonts often end up installed by third party software and for multilingual support.

Monitor, Widget, and OS Desktop Resolution

Both CSS and Javascript have access to a lot of information about the screen resolution, usable desktop size, OS widget size, toolbar size, title bar size, and OS desktop widget sizing information that are not at all relevant to rendering and serve only to provide information for fingerprinting. Since many aspects of desktop widget positioning and size are user configurable, these properties yield customized information about the computer, even beyond the monitor size.

Display Media information

Beyond simple resolution information, a large amount of so-called "Media" information is also exported to content. Even without Javascript, CSS has access to a lot of information about the device orientation, system theme colors, and other desktop and display features that are not at all relevant to rendering and also user configurable. Most of this information comes from CSS Media Queries, but Mozilla has exposed several user and OS theme defined color values to CSS as well.

WebGL

WebGL is fingerprintable both through information that is exposed about the underlying driver and optimizations, as well as through performance fingerprinting.

Because of the large amount of potential fingerprinting vectors and the previously unexposed vulnerability surface, we deploy a similar strategy against WebGL as for plugins. First, WebGL Canvases have click-to-play placeholders (provided by NoScript), and do not run until authorized by the user. Second, we obfuscate driver information by setting the Firefox preferences webgl.disable-extensions and webgl.min_capability_mode, which reduce the information provided by the following WebGL API calls: getParameter(), getSupportedExtensions(), and getExtension().

Another option for WebGL might be to use software-only rendering, using a library such as Mesa. The use of such a library would avoid hardware-specific rendering differences.

Locale Fingerprinting

In Tor Browser, we provide non-English users the option of concealing their OS and browser locale from websites. It is debatable if this should be as high of a priority as information specific to the user's computer, but for completeness, we attempt to maintain this property.

Timezone and Clock Offset

While the latency in Tor connections varies anywhere from milliseconds to a few seconds, it is still possible for the remote site to detect large differences between the user's clock and an official reference time source.

Javascript Performance Fingerprinting

Javascript performance fingerprinting is the act of profiling the performance of various Javascript functions for the purpose of fingerprinting the Javascript engine and the CPU.

Keystroke Fingerprinting

Keystroke fingerprinting is the act of measuring key strike time and key flight time. It is seeing increasing use as a biometric.
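
The two measurements can be derived from press and release timestamps. In this minimal sketch, "key strike time" is taken to mean how long each key is held down and "flight time" the gap between releasing one key and pressing the next; both readings are assumptions, as are the sample timings and the function name.

```python
def keystroke_features(events):
    """Compute hold and flight times from key events.

    events: list of (key, press_time_ms, release_time_ms) in typing order.
    """
    # Hold time: how long each key stays pressed.
    hold = [release - press for _, press, release in events]
    # Flight time: release of one key to press of the next.
    flight = [events[i + 1][1] - events[i][2] for i in range(len(events) - 1)]
    return hold, flight


# Hypothetical timings, in milliseconds, for typing "tor".
sample = [("t", 0, 90), ("o", 150, 230), ("r", 300, 410)]
hold, flight = keystroke_features(sample)
print(hold)    # [90, 80, 110]
print(flight)  # [60, 70]
```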

Operating System Type Fingerprinting

As we mentioned in the introduction of this section, OS type fingerprinting is currently considered a lower priority, due simply to the numerous ways that characteristics of the operating system type may leak into content, and the comparatively low contribution of OS to overall entropy. In particular, there are likely to be many ways to measure the differences in widget size, scrollbar size, and other rendered details on a page. Also, directly exported OS routines, such as the Math library, expose differences in their implementations through the results they return.

The source of this article and more details can be found here

