The goal of this project was to begin at a base directory (in this case The Hidden Wiki) and spider outward to discover all reachable Tor servers. After a few trial runs, I placed some restrictions on the crawl:
- Only HTML/JSON was parsed/spidered for more links to follow (no jpegs/xml, etc)
- A few websites were skipped, notably Facebook, Reddit, and a few Blockchain websites, due to the amount of spidering/time that would be required
- Visits were limited to 10k per host so the crawl wouldn’t keep spidering indefinitely and could finish in a reasonable time frame
- Responses with a non-200 OK status were skipped
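A minimal sketch of how the content-type filter, skip list, and per-host cap above could be enforced in Go. This is not the crawler's actual code: the `crawlPolicy` type, the function names, and the `.example.onion` hostnames are all made up for illustration, and an in-memory map stands in for the Redis visit counters.

```go
package main

import (
	"fmt"
	"mime"
	"strings"
)

// maxVisitsPerHost mirrors the 10k per-host cap described above.
const maxVisitsPerHost = 10000

// skippedHosts is a hypothetical skip list; the real hostnames are not given in the post.
var skippedHosts = map[string]bool{
	"facebook.example.onion": true,
	"reddit.example.onion":   true,
}

// crawlPolicy tracks visits per host. The map is an in-memory stand-in
// for the Redis state store (an assumption made to keep the sketch self-contained).
type crawlPolicy struct {
	visits map[string]int
}

func newCrawlPolicy() *crawlPolicy {
	return &crawlPolicy{visits: make(map[string]int)}
}

// allowVisit reports whether host may be fetched again and, if so, records the visit.
func (p *crawlPolicy) allowVisit(host string) bool {
	host = strings.ToLower(host)
	if skippedHosts[host] || p.visits[host] >= maxVisitsPerHost {
		return false
	}
	p.visits[host]++
	return true
}

// shouldParse reports whether a response body should be parsed for more links,
// based only on its Content-Type header (HTML and JSON are followed).
func shouldParse(contentType string) bool {
	mediaType, _, err := mime.ParseMediaType(contentType)
	if err != nil {
		return false
	}
	return mediaType == "text/html" || mediaType == "application/json"
}

func main() {
	policy := newCrawlPolicy()
	fmt.Println(policy.allowVisit("example.onion"))
	fmt.Println(shouldParse("text/html; charset=UTF-8"))
	fmt.Println(shouldParse("image/jpeg"))
}
```

With several workers, the counter increment would need to be atomic (for example a Redis `INCR`), which the plain map here does not provide.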
Table of Contents
- Stack & Tools
- Crawl Stats
- Security Headers
Stack & Tools
I used a few different tools to build this out:
- HAProxy to load balance between Tor SOCKS proxies so multiple could be run at the same time to saturate a network link
- Redis to store state information about visits
- Golang for the spidering
- Postgres for data storage
This was all run on a single dedicated server over a period of about 1 week. Multiple prototypes ran before that to flush out bugs.
Crawl Stats
|Total Scanned Pages||14,177,383|
|Total Visited (non-200+)||17,038,091|
Security Headers
|Content Security Policy (CSP)||0.15%|
|Cross-origin Resource Sharing (CORS)||0.07%|
|– Subresource Integrity (SRI)||0%|
|Public Key Pinning (HPKP)||0.01%|
|Strict Transport Security (HSTS)||0.11%|
Some of these headers are interesting when viewed in a Tor light. HSTS and HPKP, for example, can be used for supercookies and tracking (although Tor does protect against this across new identities) (source).
Services implementing CORS also help protect users by preventing cookie fingerprinting via scripts and other malicious fingerprinting methods.
We can fingerprint and figure out exposed software by taking a look at a few different signatures, like cookies and headers. There are other methods to fingerprint using the response body but due to server restrictions and time I couldn’t save every single page source, so the results based on headers/titles are below:
Source code hosting
|Gogs||Forked version has header||
I’m going to focus on build servers because I think they are the easiest front to breach. Not only has Jenkins had some serious RCEs in the past, it also helpfully identifies itself with headers and debug information, as seen below. People also tend to store sensitive information on build servers, such as SSH keys and cloud provider credentials.
X-Jenkins-Session: 8965d09b
X-Instance-Identity: MIIBIjANBgkqhkiG9w0BAQEFAA.....
Server: Jetty(9.2.z-SNAPSHOT)
X-Xss-Protection: 1
X-Jenkins: 2.60.1
X-Jenkins-Cli-Port: 46689
X-Content-Type-Options: nosniff nosniff
X-Frame-Options: sameorigin sameorigin
X-Hudson-Theme: default
X-Jenkins-Cli2-Port: 46689
Referrer-Policy: same-origin
Content-Type: text/html;charset=UTF-8
X-Hudson: 1.395
X-Hudson-Cli-Port: 46689
Set-Cookie: JSESSIONID.112b5e69=16uts5qfqz6j....Path=/;Secure;HttpOnly
We can get Jenkins version, CLI ports, and Jetty versions all from just visiting the host.
|Gocd||Cookie Path / Title||Generally sets a cookie path at
Unfortunately I was unable to find any exposed Gocd or Drone servers.
I was not able to find any running Bugzilla, Mantis or OTRS instances.
Popular Web Servers
Total with Server Header: 15,630
Total without header: 91,437
Top 10 (full list of 282 available for download)
nginx | 9619
Apache/2.4.6 (CentOS) OpenSSL/1.0.1e-fips PHP/5.6.30 | 2659
Apache | 1056
nginx/1.6.2 | 249
nginx/1.13.1 | 210
Apache/2.4.10 (Debian) | 161
Apache/2.4.18 (Ubuntu) | 100
Apache/2.2.22 (Debian) | 90
Apache/2.4.7 (Ubuntu) | 82
lighttpd/1.4.31 | 80
FobbaWeb/0.1 | 78
Just from the Server header we can gather a bunch of useful information:
- 2,659 servers are running a potentially vulnerable OpenSSL version (1.0.1e) [vulns] and vulnerable Apache version [vulns]
- Many servers are leaving the OS tag on, revealing a mix of operating systems. I think it’s also safe to assume that the same people who leave fingerprinting on are also using the OS package of these servers, which makes it easy to combine OS vulnerabilities and web server vulnerabilities into a single attack vector:
- Amazon Linux
- Red Hat
- Scientific Linux
- Some people are exposing application servers directly:
- Very old versions of IIS (5.0/6.0), Apache (1.3), and Nginx
- Nginx appears to dominate the server share on Tor. Taking just the top 2 entries into account, nginx is at least 3.5x as popular as Apache
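The observations above all come from splitting the Server value into its parts. A minimal Go sketch of that split follows; `serverInfo` and `parseServerHeader` are hypothetical names, and the parser assumes the common `product/version (comment) product/version` layout, which not every server follows.

```go
package main

import (
	"fmt"
	"strings"
)

// serverInfo is the result of splitting a Server header value.
type serverInfo struct {
	Products map[string]string // product name -> version ("" when no version is given)
	OS       string            // contents of the first parenthesised comment, if any
}

// parseServerHeader splits a value such as
// "Apache/2.4.6 (CentOS) OpenSSL/1.0.1e-fips PHP/5.6.30"
// into product/version pairs and the OS tag.
func parseServerHeader(value string) serverInfo {
	info := serverInfo{Products: make(map[string]string)}

	// Pull out the first "(...)" comment, which conventionally carries the OS tag.
	if open := strings.Index(value, "("); open >= 0 {
		if end := strings.Index(value[open:], ")"); end > 0 {
			info.OS = value[open+1 : open+end]
			value = value[:open] + " " + value[open+end+1:]
		}
	}

	// Remaining whitespace-separated tokens are "product" or "product/version".
	for _, token := range strings.Fields(value) {
		parts := strings.SplitN(token, "/", 2)
		version := ""
		if len(parts) == 2 {
			version = parts[1]
		}
		info.Products[parts[0]] = version
	}
	return info
}

func main() {
	info := parseServerHeader("Apache/2.4.6 (CentOS) OpenSSL/1.0.1e-fips PHP/5.6.30")
	fmt.Println("OS:", info.OS)
	fmt.Println("OpenSSL:", info.Products["OpenSSL"])
	fmt.Println("Apache:", info.Products["Apache"])
}
```

Matching the extracted versions against a vulnerability list is a separate step that this sketch does not attempt.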
This was a fun project to work on and I learned quite a bit about scaling up the Tor binary in order to scan the network faster. I’m hoping to make this process a bit less manual and start publishing these results regularly over at my security data website, https://hnypots.com
Have any suggestions for other software to look for? Leave a comment and let me know!