{ Josh Rendek }

<3 Go & Kubernetes

The goal of this project was to start with a base directory (in this case The Hidden Wiki) and start spidering out to discover all reachable Tor servers. Some restrictions were placed on this after a few trial runs:

  • Only HTML/JSON was parsed/spidered for more links to follow (no JPEGs, XML, etc.)
  • A few websites were skipped, notably Facebook, Reddit, and a few blockchain websites, because of the amount of spidering time they would require
  • Each host was limited to 10k visits so the crawl would not spider indefinitely and could finish in a reasonable time frame
  • Responses with a non-200 status were skipped
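The restrictions above can be sketched as two small checks. This is a minimal sketch rather than the crawler's actual code: the skip-list entry is a made-up placeholder, the in-memory map stands in for the Redis counters, and the names `allowVisit` and `shouldParse` are hypothetical.

```go
package main

import (
	"fmt"
	"mime"
	"net/http"
	"strings"
)

// maxVisitsPerHost mirrors the 10k-per-host cap described above.
const maxVisitsPerHost = 10000

// skippedHosts is a hypothetical skip list; the real onion addresses
// are not given in the post.
var skippedHosts = map[string]bool{
	"facebook-example.onion": true,
}

// visitCounter is an in-memory stand-in for per-host visit counters.
type visitCounter map[string]int

// allowVisit reports whether another page on host may be fetched,
// and records the visit when it is allowed.
func (c visitCounter) allowVisit(host string) bool {
	host = strings.ToLower(host)
	if skippedHosts[host] || c[host] >= maxVisitsPerHost {
		return false
	}
	c[host]++
	return true
}

// shouldParse reports whether a response should be scanned for more links:
// only 200 OK responses with an HTML or JSON body qualify.
func shouldParse(status int, contentType string) bool {
	if status != http.StatusOK {
		return false
	}
	mediaType, _, err := mime.ParseMediaType(contentType)
	if err != nil {
		return false
	}
	return mediaType == "text/html" || mediaType == "application/json"
}

func main() {
	counts := visitCounter{}
	fmt.Println(counts.allowVisit("example.onion"))
	fmt.Println(counts.allowVisit("facebook-example.onion"))
	fmt.Println(shouldParse(200, "text/html; charset=UTF-8"))
	fmt.Println(shouldParse(200, "image/jpeg"))
	fmt.Println(shouldParse(404, "text/html"))
}
```

Parsing the media type instead of comparing the raw header keeps parameters such as `charset=UTF-8` from defeating the check.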

Stack & Tools

I used a few different tools to build this out:

  • HAProxy to load balance between Tor SOCKS proxies, so several could run at the same time to saturate the network link
  • Redis to store state information about visits
  • Golang for the spidering
  • Postgres for data storage

This all ran on a single dedicated server over about one week. Multiple prototypes ran before that to flush out bugs.
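For illustration, here is roughly how a Go client can be pointed at a single SOCKS endpoint, such as a load balancer sitting in front of several Tor proxies. It is a sketch under assumptions: the address `127.0.0.1:9100`, the timeout, and the helper names `newProxyClient` and `proxyFor` are all made up for this example and are not taken from the original crawler.

```go
package main

import (
	"errors"
	"fmt"
	"net/http"
	"net/url"
	"time"
)

// defaultTimeout is an assumed value; Tor circuits can be slow.
const defaultTimeout = 60 * time.Second

// newProxyClient returns an HTTP client that sends every request through
// the SOCKS5 endpoint at proxyAddr.
func newProxyClient(proxyAddr string, timeout time.Duration) (*http.Client, error) {
	proxyURL, err := url.Parse("socks5://" + proxyAddr)
	if err != nil {
		return nil, err
	}
	transport := &http.Transport{
		Proxy: http.ProxyURL(proxyURL),
		// Assumption: opening a fresh connection per request gives a TCP
		// load balancer the chance to pick a different backend each time.
		DisableKeepAlives: true,
	}
	return &http.Client{Transport: transport, Timeout: timeout}, nil
}

// proxyFor reports which proxy the client would use for rawURL,
// without sending any traffic.
func proxyFor(client *http.Client, rawURL string) (string, error) {
	transport, ok := client.Transport.(*http.Transport)
	if !ok || transport.Proxy == nil {
		return "", errors.New("client has no proxy configured")
	}
	req, err := http.NewRequest(http.MethodGet, rawURL, nil)
	if err != nil {
		return "", err
	}
	proxyURL, err := transport.Proxy(req)
	if err != nil || proxyURL == nil {
		return "", err
	}
	return proxyURL.String(), nil
}

func main() {
	// 127.0.0.1:9100 is a hypothetical load balancer frontend address.
	client, err := newProxyClient("127.0.0.1:9100", defaultTimeout)
	if err != nil {
		fmt.Println("building client:", err)
		return
	}
	proxy, err := proxyFor(client, "http://example.onion/")
	fmt.Println(proxy, err)
}
```

With a setup like this, the crawler only needs to know one address, and adding more Tor instances becomes a load balancer change rather than a code change.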

Crawl Stats

Metric | Count
Total Hosts | 107,067
Total Scanned Pages | 14,177,383
Total Visited (non-200+) | 17,038,091

Security Headers

Technology | % using
Content Security Policy (CSP) | 0.15%
Secure Cookie | 0.01%
– httpOnly | 0%
Cross-Origin Resource Sharing (CORS) | 0.07%
– Subresource Integrity (SRI) | 0%
Public Key Pinning (HPKP) | 0.01%
Strict Transport Security (HSTS) | 0.11%
X-Content-Type-Options (XCTO) | 0.52%
X-Frame-Options (XFO) | 0.58%
X-XSS-Protection | 0%

Some of these headers are interesting when viewed in the context of Tor. HSTS and HPKP, for example, can be used for supercookies and tracking (although Tor protects against this across new identities) (source).

Services implementing CORS also help protect users by preventing cookie fingerprinting via scripts and other malicious fingerprinting methods.
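Counting the header-based rows in the table comes down to checking each response for a handful of header names. The sketch below shows one way to do that in Go. The mapping from table labels to header names is my assumption (for example, treating `Access-Control-Allow-Origin` as the CORS signal), and cookie flags and SRI are left out because they are not plain response headers.

```go
package main

import (
	"fmt"
	"net/http"
)

// securityHeaders pairs a short label with the response header assumed
// to indicate that technology.
var securityHeaders = []struct{ label, header string }{
	{"CSP", "Content-Security-Policy"},
	{"CORS", "Access-Control-Allow-Origin"},
	{"HPKP", "Public-Key-Pins"},
	{"HSTS", "Strict-Transport-Security"},
	{"XCTO", "X-Content-Type-Options"},
	{"XFO", "X-Frame-Options"},
	{"X-XSS-Protection", "X-XSS-Protection"},
}

// presentSecurityHeaders returns the labels of all security headers that
// carry a non-empty value, in the order listed in securityHeaders.
func presentSecurityHeaders(h http.Header) []string {
	var found []string
	for _, sh := range securityHeaders {
		if h.Get(sh.header) != "" {
			found = append(found, sh.label)
		}
	}
	return found
}

func main() {
	h := http.Header{}
	h.Set("X-Frame-Options", "sameorigin")
	h.Set("X-Content-Type-Options", "nosniff")
	fmt.Println(presentSecurityHeaders(h))
}
```

`http.Header.Get` canonicalizes the name it is given, so the lookup works for headers stored through `Set` or parsed by `net/http` regardless of how the server capitalized them.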

Software Stats

We can fingerprint exposed software by looking at a few different signatures, such as cookies and headers. There are other ways to fingerprint using the response body, but because of server restrictions and time I couldn’t save every single page source, so the results below are based on headers and titles:

Source code hosting

Software | Type | Identifier
Gitea | Cookie | i_like_gitea [src]
GitLab | Cookie | gitlab_session [src]
Gogs | Header | Forked version from NotABug.org has header X-Clacks-Overhead: GNU Terry Pratchett
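A lookup like the table above is easy to express in Go. The following is a minimal sketch, assuming the cookie names exactly as listed; the function name `identifySoftware` is hypothetical, and the Clacks header on its own is a weak signal since other sites may set it too.

```go
package main

import (
	"fmt"
	"net/http"
)

// cookieSignatures maps cookie names from the table above to software.
var cookieSignatures = map[string]string{
	"i_like_gitea":   "Gitea",
	"gitlab_session": "GitLab",
}

// identifySoftware guesses the software behind a response from its
// Set-Cookie names and from the X-Clacks-Overhead header.
func identifySoftware(h http.Header) []string {
	var found []string
	// Wrapping the headers in a Response lets net/http parse Set-Cookie.
	resp := http.Response{Header: h}
	for _, c := range resp.Cookies() {
		if name, ok := cookieSignatures[c.Name]; ok {
			found = append(found, name)
		}
	}
	// Weak signal: treated here as a hint for the NotABug.org Gogs fork.
	if h.Get("X-Clacks-Overhead") != "" {
		found = append(found, "Gogs (NotABug.org fork)")
	}
	return found
}

func main() {
	h := http.Header{}
	h.Add("Set-Cookie", "i_like_gitea=abc123; Path=/; HttpOnly")
	fmt.Println(identifySoftware(h))
}
```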

Build Servers

I’m going to focus on build servers because I think they are the easiest front to breach. Not only has Jenkins had some serious RCEs in the past, it also helpfully identifies itself with headers and debug information, as seen below. People also generally store sensitive information on build servers, such as SSH keys and cloud provider credentials.

X-Jenkins-Session: 8965d09b
X-Instance-Identity: MIIBIjANBgkqhkiG9w0BAQEFAA.....
Server: Jetty(9.2.z-SNAPSHOT)
X-Xss-Protection: 1
X-Jenkins: 2.60.1
X-Jenkins-Cli-Port: 46689
X-Content-Type-Options: nosniff nosniff
X-Frame-Options: sameorigin sameorigin
X-Hudson-Theme: default
X-Jenkins-Cli2-Port: 46689
Referrer-Policy: same-origin
Content-Type: text/html;charset=UTF-8
X-Hudson: 1.395
X-Hudson-Cli-Port: 46689
Set-Cookie: JSESSIONID.112b5e69=16uts5qfqz6j....Path=/;Secure;HttpOnly

We can get the Jenkins version, CLI ports, and Jetty version just from visiting the host.
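Pulling those values out of a response takes only a few header lookups. Here is a small sketch using the header names from the dump above; the `jenkinsInfo` type and `detectJenkins` function are hypothetical names for illustration.

```go
package main

import (
	"fmt"
	"net/http"
)

// jenkinsInfo holds what a Jenkins instance reveals in its headers.
type jenkinsInfo struct {
	Version string // from X-Jenkins
	CLIPort string // from X-Jenkins-Cli-Port
	Server  string // from Server, e.g. the Jetty version
}

// detectJenkins reports whether the headers look like Jenkins and,
// if so, returns the version details they expose.
func detectJenkins(h http.Header) (jenkinsInfo, bool) {
	version := h.Get("X-Jenkins")
	if version == "" {
		return jenkinsInfo{}, false
	}
	return jenkinsInfo{
		Version: version,
		CLIPort: h.Get("X-Jenkins-Cli-Port"),
		Server:  h.Get("Server"),
	}, true
}

func main() {
	// Values copied from the example response above.
	h := http.Header{}
	h.Set("X-Jenkins", "2.60.1")
	h.Set("X-Jenkins-Cli-Port", "46689")
	h.Set("Server", "Jetty(9.2.z-SNAPSHOT)")

	info, ok := detectJenkins(h)
	fmt.Println(ok, info.Version, info.CLIPort, info.Server)
}
```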

Software | Type | Identifier
Jenkins | Headers | X-Jenkins- and X-Hudson- style headers
GitLab | Cookie | gitlab_session
GoCD | Cookie Path / Title | Generally sets a cookie path at /go and uses "- Go" in <title> tags
Drone | Title | Sets a drone title

Unfortunately I was unable to find any exposed GoCD or Drone servers.

Software Tracking

Software | Type | Identifier
Trac | Cookie | trac_session
Redmine | Cookie | redmine_session

I was not able to find any running Bugzilla, Mantis or OTRS instances.

Total with Server Header: 15,630

Total without header: 91,437

Top 11 (full list of 282 available for download)

nginx | 9619
Apache/2.4.6 (CentOS) OpenSSL/1.0.1e-fips PHP/5.6.30 | 2659
Apache | 1056
nginx/1.6.2 | 249
nginx/1.13.1 | 210
Apache/2.4.10 (Debian) | 161
Apache/2.4.18 (Ubuntu) | 100
Apache/2.2.22 (Debian) | 90
Apache/2.4.7 (Ubuntu) | 82
lighttpd/1.4.31 | 80
FobbaWeb/0.1 | 78
Full list available here

Just from the Server header we can gather a bunch of useful information:

  • 2,659 servers are running a potentially vulnerable OpenSSL version (1.0.1e) [vulns] and a vulnerable Apache version [vulns]
  • Many servers leave the OS tag on, revealing a mix of operating systems. I think it’s also safe to assume that the same people who leave fingerprinting on are using the OS package of these servers, which makes it easy to combine OS vulnerabilities and web server vulnerabilities into a single attack:
    • CentOS
    • Debian
    • Ubuntu
    • Windows
    • Raspbian
    • Amazon Linux
    • Fedora
    • Red Hat
    • Trisquel
    • YellowDog
    • FreeBSD
    • Scientific Linux
    • Vine
  • Some people are exposing application servers directly:
    • thin
    • node-static
    • gunicorn
    • Mojolicious
    • WSGI
    • Jetty
    • GlassFish
  • Very old versions of IIS (5.0/6.0), Apache (1.3), and Nginx are still in use
  • Nginx appears to dominate the server share on Tor: taking just the top 2 entries into account (9,619 vs. 2,659), nginx is at least 3.5x as popular as Apache
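The observations above all come from picking a Server header apart into product/version pairs and parenthesized tags. A rough sketch of that parsing in Go follows. It is not the code used for the crawl, the names `serverInfo` and `parseServerHeader` are hypothetical, and it only handles the simple space-separated shape seen in the list.

```go
package main

import (
	"fmt"
	"strings"
)

// serverInfo is the parsed form of a Server header value.
type serverInfo struct {
	Products map[string]string // product name -> version ("" when omitted)
	Comments []string          // parenthesized tags such as "CentOS"
}

// parseServerHeader splits a Server header into product/version pairs
// and parenthesized comments.
func parseServerHeader(value string) serverInfo {
	info := serverInfo{Products: map[string]string{}}
	rest := strings.TrimSpace(value)
	for rest != "" {
		if rest[0] == '(' {
			// A comment may contain spaces, so read up to the closing parenthesis.
			end := strings.IndexByte(rest, ')')
			if end < 0 {
				end = len(rest) // unterminated comment: take the remainder
			}
			info.Comments = append(info.Comments, strings.TrimSpace(rest[1:end]))
			if end < len(rest) {
				end++ // step past ')'
			}
			rest = strings.TrimSpace(rest[end:])
			continue
		}
		var token string
		token, rest, _ = strings.Cut(rest, " ")
		rest = strings.TrimSpace(rest)
		name, version, _ := strings.Cut(token, "/")
		info.Products[name] = version
	}
	return info
}

func main() {
	info := parseServerHeader("Apache/2.4.6 (CentOS) OpenSSL/1.0.1e-fips PHP/5.6.30")
	fmt.Println(info.Products["Apache"], info.Products["OpenSSL"], info.Comments)
}
```

Grouping the parsed results by product name, rather than by the full header string, would also give a fairer nginx versus Apache comparison than looking at the top two raw values alone.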

Summary

This was a fun project to work on, and I learned quite a bit about scaling up the Tor binary in order to scan the network faster. I’m hoping to make this process a bit less manual and start publishing these results regularly over at my security data website, https://hnypots.com

Have any suggestions for other software to look for? Leave a comment and let me know!
