Legal & ethical disclaimer. This article is for education and authorized security testing only. Open-source intelligence (OSINT) gathering is often passive, but querying third-party services, scraping, and enumerating targets can still violate terms of service, computer-misuse laws (e.g. the U.S. CFAA, the UK Computer Misuse Act), or your engagement scope. Only collect intelligence about assets you own or are explicitly contracted in writing to assess. Respect data-protection regulations (GDPR, etc.) when handling personal data.
Introduction / Overview
Open-source intelligence (OSINT) is the practice of collecting and correlating publicly available information to build a picture of a target before you ever send a packet at their production systems. In the kill chain it maps to Reconnaissance — MITRE ATT&CK tactic TA0043 — and it is the phase that most cheaply determines whether the rest of an engagement succeeds.
In this article you will learn how to run a structured passive recon workflow using five staple tools: theHarvester, recon-ng, Google dorks, Shodan, and Maltego. We will collect subdomains, e-mail addresses, exposed services, and credential leaks, then pivot the findings into a graph. Finally — with equal weight — we will cover how a blue team detects and shrinks this attack surface.
How it works / Background
OSINT divides cleanly into passive and semi-passive collection:
- Passive: you never touch the target's infrastructure. You query search engines, certificate-transparency logs (crt.sh), DNS aggregators, code repositories, breach databases, and registries (WHOIS/RDAP). The target sees nothing.
- Semi-passive: you touch the target's infrastructure lightly, for example by resolving a subdomain or pulling an HTTP banner. This generates traffic the target can log, but it looks like normal user activity.
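As a concrete instance of a purely passive lookup, the sketch below pulls name servers out of an RDAP domain record. It assumes the public rdap.org redirector is reachable and that the response follows the standard RDAP JSON layout (a `nameservers` array whose objects carry `ldhName`). The function names and the `SAMPLE_RDAP` record are illustrative, not taken from any tool in this article.

```python
import json
import urllib.request


def fetch_rdap(domain: str) -> dict:
    """Fetch the RDAP record for a domain (assumes rdap.org is reachable)."""
    url = f"https://rdap.org/domain/{domain}"
    with urllib.request.urlopen(url, timeout=15) as resp:
        return json.load(resp)


def name_servers(rdap_record: dict) -> list[str]:
    """Return the lower-cased, de-duplicated name servers in an RDAP domain object."""
    servers = {
        ns["ldhName"].lower().rstrip(".")
        for ns in rdap_record.get("nameservers", [])
        if "ldhName" in ns
    }
    return sorted(servers)


# Hand-written sample record so the parsing step can be tried offline.
SAMPLE_RDAP = {
    "objectClassName": "domain",
    "ldhName": "EXAMPLE.COM",
    "nameservers": [
        {"objectClassName": "nameserver", "ldhName": "A.IANA-SERVERS.NET"},
        {"objectClassName": "nameserver", "ldhName": "B.IANA-SERVERS.NET"},
        {"objectClassName": "nameserver", "ldhName": "a.iana-servers.net."},
    ],
}

if __name__ == "__main__":
    print(name_servers(SAMPLE_RDAP))
```

The query goes to the registry's RDAP service, not to the target, so it stays on the passive side of the split.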
The core data sources are remarkably consistent across tools:
| Source | What it yields |
|---|---|
| Certificate Transparency (crt.sh, Censys) | Subdomains, internal hostnames |
| Search engines (site:, filetype:) | Indexed files, login portals, errors |
| Shodan / Censys | Open ports, banners, products, CVEs |
| WHOIS / RDAP | Org, registrant, name servers |
| Breach corpora (HIBP) | Breached e-mail addresses and the breaches they appear in |
theHarvester and recon-ng are essentially collectors and correlators that wrap dozens of these APIs behind one interface.
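To show what such a collector does under the hood, here is a minimal sketch that turns crt.sh JSON output into a clean hostname list. It assumes the crt.sh `output=json` endpoint returns a list of objects with a newline-separated `name_value` field; the function names and sample records are hypothetical.

```python
import json
import urllib.parse
import urllib.request


def fetch_ct_records(domain: str) -> list[dict]:
    """Query crt.sh for certificates covering *.domain (assumes the JSON endpoint is up)."""
    query = urllib.parse.quote(f"%.{domain}")
    url = f"https://crt.sh/?q={query}&output=json"
    with urllib.request.urlopen(url, timeout=30) as resp:
        return json.load(resp)


def hostnames_from_ct(records: list[dict], domain: str) -> list[str]:
    """Extract unique in-scope hostnames, dropping wildcard prefixes and foreign names."""
    domain = domain.lower()
    suffix = "." + domain
    found = set()
    for record in records:
        for name in record.get("name_value", "").splitlines():
            name = name.strip().lower()
            if name.startswith("*."):
                name = name[2:]
            if name == domain or name.endswith(suffix):
                found.add(name)
    return sorted(found)


# Hand-written records in the assumed crt.sh shape, for offline testing.
SAMPLE_CT = [
    {"name_value": "www.example.com\n*.dev.example.com"},
    {"name_value": "EXAMPLE.com"},
    {"name_value": "mail.example.org"},
    {"name_value": "www.example.com"},
]

if __name__ == "__main__":
    print(hostnames_from_ct(SAMPLE_CT, "example.com"))
```

Filtering on the domain suffix matters because a single certificate often lists names from several unrelated domains.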
Prerequisites / Lab setup
Use Kali Linux (or any Debian-based distro). Most of these ship pre-installed on Kali; otherwise:
# theHarvester
sudo apt install theharvester # or: pipx install theHarvester
# recon-ng
sudo apt install recon-ng # or: pipx install recon-ng
# Shodan CLI
pipx install shodan
shodan init <YOUR_API_KEY> # from account.shodan.io
Register free API keys to dramatically improve results: Shodan, Hunter.io, VirusTotal, GitHub (a fine-grained read-only PAT), and SecurityTrails. In recon-ng these go into the keystore:
recon-ng
[recon-ng][default] > keys add shodan_api <KEY>
[recon-ng][default] > keys add hunter_io <KEY>
[recon-ng][default] > keys list
As a safe placeholder target throughout this guide we use example.com, a domain reserved for documentation. Substitute your authorized scope.
Attack walkthrough / PoC
1. theHarvester — fast surface sweep
theHarvester aggregates subdomains and e-mails from many engines in one shot. The -b flag selects backends (data sources):
# Enumerate subdomains and e-mails using crt.sh, DNS dumpster, and Bing
theHarvester -d example.com -l 500 -b crtsh,bing,dnsdumpster -f harvest_example
# Use all available sources
theHarvester -d example.com -b all
-l limits results per source, -f writes an HTML/JSON/XML report. The output gives you a seed list of hosts and addresses to feed into the next stages.
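One way to turn that report into a seed list is sketched below. It assumes the JSON report has top-level `hosts` and `emails` arrays and that host entries may carry a resolved address as `host:ip`. That layout varies between theHarvester releases, so check your own report before relying on it. The function name and `SAMPLE_REPORT` are hypothetical.

```python
import json


def load_seed_lists(report_text: str) -> tuple[list[str], list[str]]:
    """Return (hosts, emails) from a theHarvester-style JSON report, de-duplicated."""
    report = json.loads(report_text)
    hosts = set()
    for entry in report.get("hosts", []):
        # Assumed format: some entries look like "host:ip"; keep only the name.
        hosts.add(entry.split(":", 1)[0].strip().lower())
    emails = {address.strip().lower() for address in report.get("emails", [])}
    return sorted(hosts), sorted(emails)


# Hand-written sample in the assumed report shape.
SAMPLE_REPORT = json.dumps(
    {
        "hosts": ["www.example.com:93.184.216.34", "Mail.example.com", "www.example.com"],
        "emails": ["Admin@example.com", "admin@example.com"],
    }
)

if __name__ == "__main__":
    hosts, emails = load_seed_lists(SAMPLE_REPORT)
    print(hosts)
    print(emails)
```

Normalizing case and stripping the address suffix keeps the same host from appearing twice when several backends report it.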
2. recon-ng — modular, repeatable enumeration
recon-ng is a Metasploit-style framework: workspaces, a database, and modules. A typical subdomain-to-host pivot:
recon-ng
[recon-ng][default] > workspaces create example
[recon-ng][example] > db insert domains
domain (TEXT) > example.com
# Pull subdomains from certificate transparency
[recon-ng][example] > marketplace install recon/domains-hosts/certificate_transparency
[recon-ng][example] > modules load recon/domains-hosts/certificate_transparency
[recon-ng][example] > run
# Resolve discovered hosts to IPs
[recon-ng][example] > modules load recon/hosts-hosts/resolve
[recon-ng][example] > run
# Show what we gathered, then export
[recon-ng][example] > show hosts
[recon-ng][example] > modules load reporting/csv
[recon-ng][example] > run
Because everything lands in a SQLite database, recon-ng excels at chaining: hosts feed resolvers, resolvers feed port-scan reporting, and contacts feed breach-lookup modules.
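Because the workspace is plain SQLite, you can also query it outside the console. The sketch below groups hosts by resolved IP. It assumes a `hosts` table with `host` and `ip_address` columns, which matches recent recon-ng schemas but should be confirmed against your own workspace file. The in-memory demo table is a stand-in for the real database.

```python
import sqlite3
from collections import defaultdict


def hosts_by_ip(conn: sqlite3.Connection) -> dict[str, list[str]]:
    """Group resolved hostnames by IP address; unresolved hosts are skipped."""
    rows = conn.execute(
        "SELECT DISTINCT host, ip_address FROM hosts "
        "WHERE host IS NOT NULL AND ip_address IS NOT NULL "
        "ORDER BY ip_address, host"
    )
    grouped = defaultdict(list)
    for host, ip_address in rows:
        grouped[ip_address].append(host)
    return dict(grouped)


def demo_connection() -> sqlite3.Connection:
    """Build an in-memory table with the assumed columns, for offline testing."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE hosts (host TEXT, ip_address TEXT, module TEXT)")
    conn.executemany(
        "INSERT INTO hosts VALUES (?, ?, ?)",
        [
            ("www.example.com", "93.184.216.34", "certificate_transparency"),
            ("example.com", "93.184.216.34", "resolve"),
            ("dev.example.com", None, "certificate_transparency"),
            ("vpn.example.com", "203.0.113.10", "resolve"),
        ],
    )
    return conn


if __name__ == "__main__":
    for ip, names in hosts_by_ip(demo_connection()).items():
        print(ip, names)
```

Against a real workspace you would pass `sqlite3.connect(path)` instead, where `path` points at the workspace's database file (commonly `~/.recon-ng/workspaces/<name>/data.db`, though that location is an assumption about your install).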
3. Google dorks — surgical search-engine queries
Dorking uses advanced operators to surface content the target probably did not intend to expose:
site:example.com -www # subdomains indexed by Google
site:example.com filetype:pdf # exposed documents
site:example.com inurl:admin | inurl:login # login portals
site:example.com intitle:"index of" # open directory listings
site:example.com ext:sql | ext:env | ext:log # leaked configs and dumps
"example.com" site:pastebin.com # paste leaks
site:github.com "example.com" password # credentials in public repos
The Google Hacking Database (GHDB) at exploit-db.com curates thousands of vetted dorks. Pair these with GitHub code search (org:ExampleCorp AKIA for AWS keys, filename:.env).
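To keep dorks repeatable across engagements, a small generator can expand the patterns above for any in-scope domain. This is a sketch with hypothetical names (`build_dorks`, `search_url`). It only builds query strings and URLs for manual use and does not send any requests.

```python
import re
from urllib.parse import quote_plus

# Templates copied from the dork list above; {d} is replaced with the target domain.
DORK_TEMPLATES = [
    "site:{d} -www",
    "site:{d} filetype:pdf",
    "site:{d} inurl:admin | inurl:login",
    'site:{d} intitle:"index of"',
    "site:{d} ext:sql | ext:env | ext:log",
    '"{d}" site:pastebin.com',
    'site:github.com "{d}" password',
]

DOMAIN_PATTERN = re.compile(r"^[a-z0-9.-]+$")


def build_dorks(domain: str) -> list[str]:
    """Expand every template for one domain, rejecting anything that is not a bare hostname."""
    domain = domain.strip().lower()
    if not DOMAIN_PATTERN.match(domain):
        raise ValueError(f"not a plain domain name: {domain!r}")
    return [template.format(d=domain) for template in DORK_TEMPLATES]


def search_url(dork: str) -> str:
    """Return a Google search URL for a dork, URL-encoding the operators."""
    return "https://www.google.com/search?q=" + quote_plus(dork)


if __name__ == "__main__":
    for dork in build_dorks("example.com"):
        print(search_url(dork))
```

Validating the domain first keeps a stray operator in the scope file from silently widening the search.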
4. Shodan — the search engine for devices
Shodan indexes service banners across the IPv4 space, so you can find exposed assets without scanning them yourself:
# Everything Shodan knows about an org's IP
shodan host 93.184.216.34
# Search filters: exposed RDP on a netblock
shodan search 'net:203.0.113.0/24 port:3389'
# Find an org's Internet-facing assets
shodan search 'org:"Example Corp" http.title:"login"'
# Hosts vulnerable to a named CVE
shodan search 'vuln:CVE-2021-44228 org:"Example Corp"'
Shodan tags banners with detected CVEs (e.g. CVE-2021-44228 / Log4Shell), turning recon directly into a vulnerability shortlist. Most of these tags are inferred from version information, so treat them as candidates to verify rather than confirmed findings. Censys offers a comparable certificate- and host-centric dataset.
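Search results arrive as one record per service banner, so a little post-processing helps turn them into a per-host shortlist. The sketch below assumes each match is a dictionary with `ip_str`, `port`, and an optional `vulns` mapping keyed by CVE ID, as returned by the official `shodan` Python library. The `triage` helper and the sample data are hypothetical.

```python
def triage(matches: list[dict]) -> list[dict]:
    """Collapse per-banner matches into per-host records, most CVE-tagged hosts first."""
    hosts: dict[str, dict] = {}
    for match in matches:
        ip = match["ip_str"]
        entry = hosts.setdefault(ip, {"ip": ip, "ports": set(), "cves": set()})
        entry["ports"].add(match["port"])
        # "vulns" is absent on most banners; iterating a dict yields its CVE keys.
        entry["cves"].update(match.get("vulns") or {})
    ranked = sorted(hosts.values(), key=lambda h: (-len(h["cves"]), h["ip"]))
    return [
        {"ip": h["ip"], "ports": sorted(h["ports"]), "cves": sorted(h["cves"])}
        for h in ranked
    ]


# Hand-written matches in the assumed shape, for offline testing.
SAMPLE_MATCHES = [
    {"ip_str": "203.0.113.10", "port": 3389},
    {"ip_str": "203.0.113.20", "port": 8080, "vulns": {"CVE-2021-44228": {"verified": False}}},
    {"ip_str": "203.0.113.10", "port": 22},
]

if __name__ == "__main__":
    # With the official library installed, live data would come from:
    #   import shodan
    #   results = shodan.Shodan("YOUR_API_KEY").search('org:"Example Corp"')
    #   matches = results["matches"]
    for host in triage(SAMPLE_MATCHES):
        print(host)
```

Sorting by CVE count is only a crude first pass; severity and exposure still need a human look.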
5. Maltego — visual link analysis
Maltego turns flat lists into a graph. You drop an Entity (Domain example.com) onto the canvas, then run Transforms — small queries that expand one entity into related ones: domain to MX records, domain to subdomains (via crt.sh/PassiveTotal), e-mail to breach via Have I Been Pwned, person to social profiles. The result is a relationship map that exposes pivots a text list hides — for instance a shared name server linking two "unrelated" target companies.
The flow: collect names and emails, store them in recon-ng, resolve to IPs, enrich with Shodan service/CVE data, then correlate everything in Maltego to produce a prioritized attack surface.
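The correlation step can be approximated without a graph tool. The sketch below is plain Python, not Maltego's transform API. It takes hypothetical (entity, related entity) pairs gathered in earlier stages and reports any infrastructure shared by more than one source, which is the kind of pivot a shared name server represents.

```python
from collections import defaultdict


def shared_pivots(edges: list[tuple[str, str]]) -> dict[str, list[str]]:
    """Return related entities that are linked to two or more distinct source entities."""
    sources_by_node = defaultdict(set)
    for source, related in edges:
        sources_by_node[related].add(source)
    return {
        node: sorted(sources)
        for node, sources in sources_by_node.items()
        if len(sources) > 1
    }


# Illustrative edges: domain -> name server or mail host (all names are placeholders).
SAMPLE_EDGES = [
    ("example.com", "ns1.sharedhost.example"),
    ("example.org", "ns1.sharedhost.example"),
    ("example.com", "mail.example.com"),
    ("example.org", "mail.example.org"),
]

if __name__ == "__main__":
    for node, sources in shared_pivots(SAMPLE_EDGES).items():
        print(f"{node} is shared by {', '.join(sources)}")
```

A shared node is a lead, not proof of a relationship: large hosting providers and managed DNS services link many unrelated organizations.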
For deeper post-recon steps, see Subdomain Enumeration & Takeover and, once you have valid users, Password Spraying Against Microsoft 365.
Detection & Defense (Blue Team)
Passive OSINT against third-party services is largely invisible to the target, so defense is about reducing what is collectable and monitoring the indirect signals that remain.
Attack-surface reduction
- Audit Certificate Transparency yourself. Subscribe to CT-log monitoring (crt.sh feeds, Cert Spotter, Facebook CT monitor) so you discover new/forgotten subdomains the same way attackers do. Decommission stale hosts.
- Scan as the adversary does. Run periodic Shodan/Censys queries on your own ASN and IP ranges (org: / net: filters) and use Shodan Monitor alerts to flag newly exposed ports. Put management interfaces (RDP 3389, SSH 22, databases) behind a VPN or zero-trust proxy.
- Hunt your own leaks. Continuously scan GitHub and pastes for your domains and secret patterns using gitleaks, trufflehog, or GitHub secret scanning + push protection. Rotate any key that appears (e.g. an AKIA... AWS key) immediately.
- Minimize WHOIS/DNS exposure. Enable WHOIS privacy, and avoid descriptive internal hostnames (vpn-prod-finance.example.com) in public DNS or certificates.
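To illustrate what secret-pattern scanning looks for, here is a minimal sketch that flags strings shaped like AWS access key IDs. It assumes the common `AKIA` prefix followed by 16 uppercase letters or digits. Dedicated scanners such as gitleaks and trufflehog cover far more patterns and should be preferred in practice. The function name and sample text are hypothetical.

```python
import re

# Assumed shape of a long-term AWS access key ID: "AKIA" plus 16 uppercase alphanumerics.
AWS_KEY_ID = re.compile(r"\bAKIA[0-9A-Z]{16}\b")


def find_aws_key_ids(text: str) -> list[tuple[int, str]]:
    """Return (line number, matched key ID) for every candidate found in the text."""
    hits = []
    for lineno, line in enumerate(text.splitlines(), start=1):
        for match in AWS_KEY_ID.finditer(line):
            hits.append((lineno, match.group()))
    return hits


# AKIAIOSFODNN7EXAMPLE is the placeholder key ID used in AWS documentation.
SAMPLE_ENV = "DEBUG=true\nAWS_ACCESS_KEY_ID=AKIAIOSFODNN7EXAMPLE\nNOTE=AKIA-not-a-key\n"

if __name__ == "__main__":
    for lineno, key_id in find_aws_key_ids(SAMPLE_ENV):
        print(f"line {lineno}: {key_id}")
```

A pattern match only tells you something looks like a key, so the safe response is still to rotate it rather than test whether it works.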
Monitoring the semi-passive edge
- Detect dork-driven probing and scraping in web logs: bursts of 404s on /admin, /.git/, /.env, /backup.sql, or User-Agent strings from known tools. Return generic 404s and disable index of directory listings (Options -Indexes in Apache, autoindex off in nginx).
- Block sensitive indexing with X-Robots-Tag: noindex and proper auth. Never rely on robots.txt, which is itself a recon map.
- Credential-leak response. Feed Have I Been Pwned Domain Search into your IAM workflow; force resets and enforce phishing-resistant MFA so leaked passwords tied to addresses from theHarvester/recon-ng results are dead on arrival.
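The 404-burst signal can be prototyped in a few lines before it becomes a SIEM rule. The sketch below assumes access logs in the common/combined log format and uses an arbitrary threshold of three hits per client. The path list mirrors the examples above, and all names and sample lines are hypothetical.

```python
import re
from collections import Counter

# Assumed log layout: ip ident user [time] "METHOD path protocol" status size ...
LOG_LINE = re.compile(
    r'^(?P<ip>\S+) \S+ \S+ \[[^\]]+\] "(?P<method>\S+) (?P<path>\S+)[^"]*" (?P<status>\d{3}) '
)
SENSITIVE_PREFIXES = ("/admin", "/.git/", "/.env", "/backup.sql")


def probe_counts(lines: list[str], threshold: int = 3) -> dict[str, int]:
    """Count 404s on sensitive paths per client IP and keep clients at or above the threshold."""
    counts = Counter()
    for line in lines:
        match = LOG_LINE.match(line)
        if not match:
            continue  # skip lines that do not fit the assumed format
        if match["status"] == "404" and match["path"].startswith(SENSITIVE_PREFIXES):
            counts[match["ip"]] += 1
    return {ip: n for ip, n in counts.items() if n >= threshold}


SAMPLE_LOG = [
    '198.51.100.7 - - [10/Oct/2024:13:55:36 +0000] "GET /.env HTTP/1.1" 404 153 "-" "curl/8.0"',
    '198.51.100.7 - - [10/Oct/2024:13:55:37 +0000] "GET /.git/config HTTP/1.1" 404 153 "-" "curl/8.0"',
    '198.51.100.7 - - [10/Oct/2024:13:55:38 +0000] "GET /admin HTTP/1.1" 404 153 "-" "curl/8.0"',
    '192.0.2.5 - - [10/Oct/2024:13:56:01 +0000] "GET /index.html HTTP/1.1" 200 1024 "-" "Mozilla/5.0"',
    '192.0.2.5 - - [10/Oct/2024:13:56:02 +0000] "GET /admin HTTP/1.1" 404 153 "-" "Mozilla/5.0"',
]

if __name__ == "__main__":
    print(probe_counts(SAMPLE_LOG))
```

A production rule would also need a time window, since three misses over a month mean something very different from three in two seconds.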
Relevant ATT&CK coverage: these controls fall under mitigation M1056 (Pre-compromise) and target techniques in the Reconnaissance tactic (TA0043) such as T1593 (Search Open Websites/Domains), T1596 (Search Open Technical Databases, which covers Shodan/Censys, WHOIS, and CT), and T1589 (Gather Victim Identity Information).
Conclusion
OSINT is the highest-ROI phase of an engagement: theHarvester and recon-ng give breadth, Google dorks and Shodan give depth, and Maltego turns the noise into pivots. The same techniques are available to defenders — the team that scans its own CT logs, IP ranges, and code repositories first removes the very findings an attacker would have weaponized. Treat external attack-surface management as a continuous control, not a one-off pentest deliverable. For the next phase, continue to Active Recon & Network Scanning with Nmap.
References
- MITRE ATT&CK — Reconnaissance (TA0043): https://attack.mitre.org/tactics/TA0043/
- MITRE ATT&CK — T1596 Search Open Technical Databases: https://attack.mitre.org/techniques/T1596/
- theHarvester (laramies): https://github.com/laramies/theHarvester
- recon-ng (lanmaster53): https://github.com/lanmaster53/recon-ng
- Shodan documentation: https://help.shodan.io/
- Google Hacking Database (GHDB): https://www.exploit-db.com/google-hacking-database
- Maltego documentation: https://docs.maltego.com/
- HackTricks — External Recon Methodology: https://book.hacktricks.xyz/generic-methodologies-and-resources/external-recon-methodology
- Have I Been Pwned: https://haveibeenpwned.com/


