HTTPS traffic analysis can leak sensitive user data

March 14, 2014

10:00 ET, 14 March 2014

A team of US researchers at UC Berkeley conducted a traffic-analysis study of ten widely used HTTPS-secured websites, with surprising results.

User privacy has been considered a top priority since Snowden’s revelations about the US surveillance program. Recently, two cases shocked the IT security community, both related to incorrect verification of digital certificates that allowed attackers to spy on secure communications. The first involved Apple, which released an urgent iOS update (iOS 7.0.6) to fix a flaw in certificate-validation checks that attackers on the victim’s network could abuse to conduct a man-in-the-middle attack and capture or modify data even when it was protected by SSL/TLS. Yesterday I reported another high-profile case involving GnuTLS: a serious flaw in the certificate verification process of the open source secure communications library exposes Linux distros and apps to attack, and in this case too the flaw is exploitable for surveillance purposes. In both cases encrypted communications were not enough to protect users from monitoring. Now another earthquake is shaking the IT industry: HTTPS itself can leak personal data to attackers.

The UC Berkeley team analyzed the HTTPS traffic of ten widely used HTTPS-secured websites and found that it is possible to capture “personal details, including medical conditions, financial and legal affairs and sexual orientation.”

HTTPS alone is not sufficient to keep web browsing private: the Berkeley researchers have revealed a technique for identifying the individual web pages visited by a user with an average accuracy of 89%.

The HTTPS websites examined include Mayo Clinic, Planned Parenthood and Kaiser Permanente for the health-care industry, Wells Fargo and Bank of America for banking, and Netflix and YouTube for streaming video.

The team, composed of researchers Brad Miller, A. D. Joseph and J. D. Tygar and Intel Labs researcher Ling Huang, demonstrated that HTTPS may be vulnerable to traffic analysis. The details of the research are reported in their paper, titled ‘I Know Why You Went to the Clinic: Risks and Realization of HTTPS Traffic Analysis’.

“We present a traffic analysis attack against over 6000 webpages spanning the HTTPS deployments of 10 widely used, industry-leading websites in areas such as healthcare, finance, legal services and streaming video. Our attack identifies individual pages in the same website with 89% accuracy, exposing personal details including medical conditions, financial and legal affairs and sexual orientation.”

By analogy with the Bag-of-Words approach to document classification, the researchers refer to their technique as Bag-of-Gaussians (BoG): a Gaussian distribution is used to determine similarity to each cluster and to map traffic samples into a fixed-width representation compatible with a wide range of machine learning techniques.

“We design our attack to distinguish minor variations in HTTPS traffic from significant variations which indicate distinct traffic contents. Minor traffic variations may be caused by caching, dynamically generated content, or user-specific content, including cookies. Our attack applies clustering techniques to identify patterns in traffic. We then use a Gaussian distribution to determine similarity to each cluster and map traffic samples into a fixed width representation compatible with a wide range of machine learning techniques,” say the researchers.
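To make the idea concrete, here is a minimal sketch of a Bag-of-Gaussians style feature map. It is not the researchers' implementation: the choice of k-means for clustering, the representation of traffic as (bytes sent, bytes received) pairs, the diagonal covariance, and every function name and number below are assumptions made for illustration only.

```python
import math

def assign(points, centers):
    """Group each point with its nearest center (Euclidean distance)."""
    groups = [[] for _ in centers]
    for p in points:
        nearest = min(range(len(centers)), key=lambda c: math.dist(p, centers[c]))
        groups[nearest].append(p)
    return groups

def kmeans(points, k, iters=20):
    """Tiny k-means with deterministic init; assumes len(points) >= k."""
    step = max(1, len(points) // k)
    centers = [points[i * step] for i in range(k)]
    for _ in range(iters):
        groups = assign(points, centers)
        centers = [
            tuple(sum(axis) / len(g) for axis in zip(*g)) if g else centers[i]
            for i, g in enumerate(groups)
        ]
    return assign(points, centers)

def fit_gaussians(groups, min_var=1.0):
    """One Gaussian per non-empty cluster: mean and per-axis variance."""
    models = []
    for g in groups:
        if not g:
            continue
        mean = [sum(axis) / len(g) for axis in zip(*g)]
        var = [
            max(min_var, sum((x - m) ** 2 for x in axis) / len(g))
            for axis, m in zip(zip(*g), mean)
        ]
        models.append((mean, var))
    return models

def gaussian_pdf(point, mean, var):
    """Density of a Gaussian with diagonal covariance at `point`."""
    density = 1.0
    for x, m, v in zip(point, mean, var):
        density *= math.exp(-((x - m) ** 2) / (2 * v)) / math.sqrt(2 * math.pi * v)
    return density

def bog_features(sample, models):
    """Map a variable-length traffic sample to one value per cluster."""
    return [sum(gaussian_pdf(b, mean, var) for b in sample) for mean, var in models]

# Made-up traffic samples: each tuple is (bytes sent, bytes received).
page_a = [(300, 12000), (310, 11800), (500, 40000)]
page_b = [(900, 2000), (880, 2100)]

models = fit_gaussians(kmeans(page_a + page_b, k=3))
vec_a = bog_features(page_a, models)
vec_b = bog_features(page_b, models)
print(len(vec_a), len(vec_b))  # both vectors have one entry per cluster
```

Although the two samples contain a different number of observations, both end up as vectors of the same length, which is what lets a standard classifier be trained on them.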

A prerequisite for the attack is that the attacker must be able to visit the same web pages as the victim and must have access to the target’s traffic data, in order to identify patterns in the encrypted traffic that indicate the content visited by the user. This means that an ISP, for example, could spy on web browsing even when it takes place over a secure HTTPS channel.
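What can an observer actually see in encrypted traffic? TLS hides the content of each packet, but not its size or direction. The sketch below shows one hypothetical way to turn such observations into a pattern: it collapses a packet sequence into (bytes sent, bytes received) pairs. The packet format, the function name `burst_pairs` and the sample values are assumptions for illustration, not details taken from the paper.

```python
def burst_pairs(packets):
    """Collapse (direction, size) packets into (bytes_up, bytes_down) pairs.

    A new pair starts each time the flow turns from 'down' back to 'up',
    i.e. roughly when the client issues its next request.
    """
    pairs = []
    up = down = 0
    prev = None
    for direction, size in packets:
        if direction == "up" and prev == "down":
            pairs.append((up, down))
            up = down = 0
        if direction == "up":
            up += size
        else:
            down += size
        prev = direction
    if up or down:
        pairs.append((up, down))
    return pairs

# Made-up observation: only sizes and directions, no decrypted content.
observed = [
    ("up", 420), ("down", 1460), ("down", 1460), ("down", 380),
    ("up", 510), ("down", 900),
]
pairs = burst_pairs(observed)
print(pairs)
```

An attacker who records the same kind of pairs while visiting the site could then compare them with those seen in the victim's traffic.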

“ISPs are uniquely well positioned to target and sell advertising since they have the most comprehensive view of the consumer. Both ISPs and commercial chains of Wi-Fi access points have shown efforts to mine customer data and/or sell advertising. These vulnerabilities would allow ISPs to conduct data mining despite the presence of encryption,” the researchers say.

The technique could also be adopted by oppressive regimes to target dissidents, or by private companies to monitor the web pages visited by their employees. The researchers evaluated the performance of their attack in detail, comparing it with a selection of previous techniques: Liberatore and Levine (LL), Panchenko et al. (Pan), and Wang.

The BoG attack averages 89% accuracy, but what is surprising is that only 4 of the 10 websites included in the experiment have an accuracy below 92%. It must also be considered that, to date, all approaches have assumed that the victim browses the web in a single tab and that successive page loads can be easily delineated.

The team announced that in the future they will investigate how removing this limitation affects the results of the analysis, since many users, for example, keep multiple tabs open at the same time.

Pierluigi Paganini

(Editor-In-Chief, CDM)