Security and Software of hosts on the Tor Network
Jul 23, 2017 - 5 minutes
The goal of this project was to start with a base directory (in this case The Hidden Wiki) and spider outward to discover all reachable Tor servers. Some restrictions were placed on the crawl after a few trial runs:
- Only HTML/JSON was parsed/spidered for more links to follow (no jpegs/xml, etc)
- A few websites were skipped, notably Facebook, Reddit, and a few blockchain websites, due to the amount of spidering time they would require
- Each host was limited to 10k visits so the spider would not keep crawling indefinitely and would finish in a reasonable time frame
- Non-200 OK status responses were skipped
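The restrictions above can be sketched as two small checks. This is a minimal sketch and not the crawler's actual code: `crawlPolicy`, `allowVisit`, and `shouldParse` are hypothetical names, and the in-memory visit counter is an assumption that stands in for whatever shared state a real crawl would use.

```go
package main

import (
	"mime"
	"net/http"
)

// crawlPolicy holds the crawl restrictions. The in-memory visit counter is an
// assumption made for this sketch, not the crawler's real state store.
type crawlPolicy struct {
	skipHosts        map[string]bool // hosts excluded up front
	maxVisitsPerHost int             // 10,000 in the crawl described above
	visits           map[string]int  // visits recorded so far, per host
}

// newCrawlPolicy builds a policy with a per-host cap and an optional skip list.
func newCrawlPolicy(maxVisits int, skip ...string) *crawlPolicy {
	p := &crawlPolicy{
		skipHosts:        make(map[string]bool),
		maxVisitsPerHost: maxVisits,
		visits:           make(map[string]int),
	}
	for _, h := range skip {
		p.skipHosts[h] = true
	}
	return p
}

// allowVisit reports whether host may be fetched again and, if so, records
// the visit against the per-host cap.
func (p *crawlPolicy) allowVisit(host string) bool {
	if p.skipHosts[host] || p.visits[host] >= p.maxVisitsPerHost {
		return false
	}
	p.visits[host]++
	return true
}

// shouldParse reports whether a response body should be scanned for links:
// only 200 OK responses carrying HTML or JSON qualify.
func shouldParse(status int, contentType string) bool {
	if status != http.StatusOK {
		return false
	}
	mediaType, _, err := mime.ParseMediaType(contentType)
	if err != nil {
		return false // missing or malformed Content-Type: do not parse
	}
	return mediaType == "text/html" || mediaType == "application/json"
}
```

A crawl loop would call `allowVisit` before fetching a URL and `shouldParse` on the response before extracting links from it.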
Table of Contents
- Stack & Tools
- Crawl Stats
- Security Headers
- Software
- Source Code Hosting
- Build Servers
- Popular Servers
- Summary
Stack & Tools
I used a few different tools to build this out:
- HAProxy to load balance between tor SOCKS proxies so multiple could be run at the same time to saturate a network link
- Redis to store state information about visits
- Golang for the spidering
- Postgres for data storage
This was all run on a single dedicated server over a period of about 1 week. Multiple prototypes ran before that to flush out bugs.
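To illustrate how a Go spider can sit behind a load-balanced pool of tor SOCKS proxies, here is a minimal sketch that builds an HTTP client pointed at a single SOCKS5 endpoint. The function name `newProxyClient` and any address passed to it are assumptions for illustration. It relies on the standard library's support for `socks5` proxy URLs in `http.Transport`. As far as I know the target hostname is handed to the proxy unresolved, which .onion addresses need, but that is worth verifying against your Go version.

```go
package main

import (
	"fmt"
	"net"
	"net/http"
	"net/url"
	"time"
)

// newProxyClient returns an HTTP client that sends every request through the
// SOCKS5 endpoint at proxyAddr ("host:port"), for example a load balancer
// listening in front of several tor processes.
func newProxyClient(proxyAddr string, timeout time.Duration) (*http.Client, error) {
	// Reject addresses without an explicit port early, with a clear error.
	if _, _, err := net.SplitHostPort(proxyAddr); err != nil {
		return nil, fmt.Errorf("invalid proxy address %q: %w", proxyAddr, err)
	}
	proxyURL := &url.URL{Scheme: "socks5", Host: proxyAddr}
	transport := &http.Transport{
		// Route all requests through the SOCKS5 proxy.
		Proxy: http.ProxyURL(proxyURL),
	}
	return &http.Client{Transport: transport, Timeout: timeout}, nil
}
```

Calling `newProxyClient("127.0.0.1:9000", 30*time.Second)` would target a balancer listening on a hypothetical local port 9000. A single tor process listens on port 9050 by default.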
Crawl Stats
Metric | Count |
---|---|
Total Hosts | 107,067 |
Total Scanned Pages | 14,177,383 |
Total Visited (non-200+) | 17,038,091 |
Security Headers
Technology | % using |
---|---|
Content Security Policy (CSP) | 0.15% |
Secure Cookie | 0.01% |
– httpOnly | 0% |
Cross-origin Resource Sharing (CORS) | 0.07% |
– Subresource Integrity (SRI) | 0% |
Public Key Pinning (HPKP) | 0.01% |
Strict Transport Security (HSTS) | 0.11% |
X-Content-Type-Options (XCTO) | 0.52% |
X-Frame-Options (XFO) | 0.58% |
X-XSS-Protection | 0% |
Some of these headers are interesting when viewed in a Tor context. HSTS and HPKP, for example, can be abused as supercookies for tracking (although Tor does protect against this across new identities) (source).
Services implementing CORS also help protect users by preventing cookie fingerprinting via scripts and other malicious fingerprinting methods.
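A check for most rows of the table above can be sketched from response headers alone. This is a minimal sketch under assumptions: the function name `securityFeatures` is hypothetical, treating `Access-Control-Allow-Origin` as the CORS signal is my assumption, and SRI is left out because it is an attribute in the HTML body, not a header.

```go
package main

import "net/http"

// securityFeatures reports which protections are visible in one set of
// response headers. Keys mirror the labels used in the table above.
func securityFeatures(h http.Header) map[string]bool {
	found := map[string]bool{
		"CSP":              h.Get("Content-Security-Policy") != "",
		"CORS":             h.Get("Access-Control-Allow-Origin") != "", // assumed signal
		"HPKP":             h.Get("Public-Key-Pins") != "",
		"HSTS":             h.Get("Strict-Transport-Security") != "",
		"XCTO":             h.Get("X-Content-Type-Options") != "",
		"XFO":              h.Get("X-Frame-Options") != "",
		"X-XSS-Protection": h.Get("X-XSS-Protection") != "",
	}
	// Cookie flags live inside Set-Cookie, so parse the cookies instead of
	// string-matching the raw header value.
	resp := http.Response{Header: h}
	for _, c := range resp.Cookies() {
		if c.Secure {
			found["Secure Cookie"] = true
		}
		if c.HttpOnly {
			found["httpOnly"] = true
		}
	}
	return found
}
```

Counting how many hosts set each key to true, divided by the total number of hosts, would give percentages comparable to the table.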
Software Stats
We can fingerprint exposed software by looking at a few different signatures, such as cookies and headers. There are other methods that fingerprint using the response body, but due to server restrictions and time I couldn’t save every page source, so the results below are based on headers and titles:
Source code hosting
Software | Type | Identifier |
---|---|---|
Gitea | Cookie | i_like_gitea [src] |
GitLab | Cookie | _gitlab_session [src] |
Gogs | Forked version has header | X-Clacks-Overhead: GNU Terry Pratchett from NotABug.org |
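The identifiers in this table can be turned into a small signature matcher. This is a minimal sketch with hypothetical names (`signature`, `identifySoftware`), not the scanner's real code. It trims a leading underscore from cookie names so that the GitLab session cookie matches whichever way it is spelled. The `X-Clacks-Overhead` header is also set by unrelated sites, so treat that match as a weak hint only.

```go
package main

import (
	"net/http"
	"strings"
)

// signature describes one way to recognise a piece of software: either a
// cookie name, or a header whose value contains a given substring.
type signature struct {
	software string
	cookie   string
	header   string
	contains string
}

// Signatures taken from the table above.
var sourceHostingSignatures = []signature{
	{software: "Gitea", cookie: "i_like_gitea"},
	{software: "GitLab", cookie: "gitlab_session"},
	{software: "Gogs (NotABug fork)", header: "X-Clacks-Overhead", contains: "GNU Terry Pratchett"},
}

// identifySoftware returns the software names whose signature matches the
// response headers, in the order the signatures are listed.
func identifySoftware(h http.Header) []string {
	cookies := make(map[string]bool)
	resp := http.Response{Header: h}
	for _, c := range resp.Cookies() {
		// Normalise "_gitlab_session" and "gitlab_session" to one key.
		cookies[strings.TrimPrefix(c.Name, "_")] = true
	}
	var matches []string
	for _, sig := range sourceHostingSignatures {
		byCookie := sig.cookie != "" && cookies[sig.cookie]
		byHeader := sig.header != "" && strings.Contains(h.Get(sig.header), sig.contains)
		if byCookie || byHeader {
			matches = append(matches, sig.software)
		}
	}
	return matches
}
```

Further signatures can be added to the slice without changing the matching logic.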
Build Servers
I’m going to focus on build servers because I think they are the easiest front to breach. Not only has Jenkins had some serious RCEs in the past, it also helpfully identifies itself with headers and debug information, as seen below. People also tend to store sensitive information on build servers, such as SSH keys and cloud provider credentials.
X-Jenkins-Session: 8965d09b
X-Instance-Identity: MIIBIjANBgkqhkiG9w0BAQEFAA.....
Server: Jetty(9.2.z-SNAPSHOT)
X-Xss-Protection: 1
X-Jenkins: 2.60.1
X-Jenkins-Cli-Port: 46689
X-Content-Type-Options: nosniff nosniff
X-Frame-Options: sameorigin sameorigin
X-Hudson-Theme: default
X-Jenkins-Cli2-Port: 46689
Referrer-Policy: same-origin
Content-Type: text/html;charset=UTF-8
X-Hudson: 1.395
X-Hudson-Cli-Port: 46689
Set-Cookie: JSESSIONID.112b5e69=16uts5qfqz6j....Path=/;Secure;HttpOnly
We can get the Jenkins version, CLI ports, and Jetty version just from visiting the host.
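Pulling those details out of a response takes only a few lines. This is a minimal sketch with hypothetical names (`jenkinsInfo`, `parseJenkinsHeaders`). It treats the presence of `X-Jenkins` as the marker for a Jenkins host, which is an assumption based on the header dump above.

```go
package main

import (
	"net/http"
	"strconv"
)

// jenkinsInfo collects what a Jenkins host reveals in its response headers.
type jenkinsInfo struct {
	Version string // value of X-Jenkins
	CLIPort int    // value of X-Jenkins-Cli-Port, 0 if absent or malformed
	Server  string // Server banner, e.g. the embedded Jetty version
}

// parseJenkinsHeaders returns the exposed details and true when the response
// carries an X-Jenkins header, or a zero value and false otherwise.
func parseJenkinsHeaders(h http.Header) (jenkinsInfo, bool) {
	version := h.Get("X-Jenkins")
	if version == "" {
		return jenkinsInfo{}, false
	}
	info := jenkinsInfo{Version: version, Server: h.Get("Server")}
	if port, err := strconv.Atoi(h.Get("X-Jenkins-Cli-Port")); err == nil {
		info.CLIPort = port
	}
	return info, true
}
```

For the headers shown above this would yield version 2.60.1, CLI port 46689, and the server banner Jetty(9.2.z-SNAPSHOT).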
Software | Type | Identifier |
---|---|---|
Jenkins | Headers | X-Jenkins- and X-Hudson- style headers |
GitLab | Cookie | _gitlab_session |
GoCD | Cookie Path / Title | Generally sets a cookie path at /go and uses - Go in <title> tags |
Drone | Title | Sets a drone title |
Unfortunately I was unable to find any exposed GoCD or Drone servers.
Software Tracking
Software | Type | Identifier |
---|---|---|
Trac | Cookie | trac_session |
Redmine | Cookie | _redmine_session |
I was not able to find any running Bugzilla, Mantis or OTRS instances.
Popular Web Servers
Total with Server Header: 15,630
Total without header: 91,437
Top 11 (full list of 282 available for download)
Server | Count |
---|---|
nginx | 9,619 |
Apache/2.4.6 (CentOS) OpenSSL/1.0.1e-fips PHP/5.6.30 | 2,659 |
Apache | 1,056 |
nginx/1.6.2 | 249 |
nginx/1.13.1 | 210 |
Apache/2.4.10 (Debian) | 161 |
Apache/2.4.18 (Ubuntu) | 100 |
Apache/2.2.22 (Debian) | 90 |
Apache/2.4.7 (Ubuntu) | 82 |
lighttpd/1.4.31 | 80 |
FobbaWeb/0.1 | 78 |
Just from the Server header we can gather a bunch of useful information:
- 2,659 servers are running a potentially vulnerable OpenSSL version (1.0.1e) [vulns] and vulnerable Apache version [vulns]
- Many servers leave the OS tag on, revealing a mix of operating systems. I think it’s also safe to assume that the same people who leave fingerprinting on are also using the OS package of these servers, making it easy to combine OS vulnerabilities and web server vulnerabilities into a single attack path:
  - CentOS
  - Debian
  - Ubuntu
  - Windows
  - Raspbian
  - Amazon Linux
  - Fedora
  - Red Hat
  - Trisquel
  - YellowDog
  - FreeBSD
  - Scientific Linux
  - Vine
- Some people are exposing application servers directly:
  - thin
  - node-static
  - gunicorn
  - Mojolicious
  - WSGI
  - Jetty
  - GlassFish
- Very old versions of IIS (5.0/6.0), Apache (1.3), and Nginx
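- nginx appears to dominate the server share on Tor: comparing only the top two entries, nginx (9,619) is about 3.6x as common as the most common Apache banner (2,659). Summing every nginx and Apache banner in the list above narrows the gap to roughly 2.4x (10,078 vs 4,148)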
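Comparing server share is easier once versioned banners are collapsed into families. This is a minimal sketch with hypothetical helper names (`serverFamily`, `tallyFamilies`). Cutting each banner at the first slash, space, or parenthesis is a simplifying assumption that works for the banners listed above but may not hold for every entry in the full list.

```go
package main

import "strings"

// serverFamily reduces a Server banner such as "Apache/2.4.10 (Debian)" to a
// lowercase family name such as "apache".
func serverFamily(banner string) string {
	banner = strings.TrimSpace(banner)
	if i := strings.IndexAny(banner, "/ ("); i >= 0 {
		banner = banner[:i]
	}
	return strings.ToLower(banner)
}

// tallyFamilies sums per-banner host counts into per-family totals.
func tallyFamilies(bannerCounts map[string]int) map[string]int {
	families := make(map[string]int)
	for banner, n := range bannerCounts {
		families[serverFamily(banner)] += n
	}
	return families
}
```

Passing the banner counts from the list above as a `map[string]int` gives one total per family, which can then be compared directly.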
Summary
This was a fun project to work on and I learned quite a bit about scaling up the tor binary in order to scan the network faster. I’m hoping to make this process a bit less manual and start publishing these results regularly over at my security data website, https://hnypots.com
Have any suggestions for other software to look for? Leave a comment and let me know!