How quickly can hackers find exposed data online? Faster than you think …

September 9, 2020 by Jatin Jain


Data is an organization’s most valuable asset. It can include financial details, consumer information, nonpublic personal information (NPI), personally identifiable information (PII), employee details and more. Protecting employee and consumer data should be a top priority for any organization, because data exposed online can lead to a serious attack and untold damage.

A hacker can easily find data that is exposed online, and that can turn into a nightmare: an attacker targeting a specific organization commonly uses open-source intelligence (OSINT) tools to do the job.

In August of 2019, LinkedIn suffered a massive dump of data exposed online. More than 159 million records were found on Pastebin in an “email: password” format. The dump was offered for sale: for .012 BTC, a buyer could attempt to access any LinkedIn account in it. If a user has not changed that password and has reused it on other sites, an attacker can get into many of the places where that user works: iCloud, Gmail or Yahoo accounts with all their photos and email, plus Facebook and Instagram. All of them become vulnerable once an attacker has this database.

The Dark Web

One place a hacker can find an organization's exposed data is the Dark Web. The Dark Web is web content that exists on darknets: overlay networks that use the internet but require specific software, configurations or authorization to access. It is a small part of the Deep Web, which covers all content that search engines do not index. The best-known darknet, Tor, uses onion routing, and its sites are reachable only through the Tor network (typically with Tor Browser), not through a VPN or a standard browser.

The advantage of onion sites for hackers is that they are not indexed by Google and are not reachable with a standard browser. A hacker is well aware of onion routing and how to access the Dark Web. For example, the details of over 267 million Facebook accounts were found for sale on the Dark Web for a mere 500 Euros. The data consists of users' account details, such as names, user IDs and phone numbers. While these details don't include passwords or other credentials, a hacker can still use them to impersonate users or steal their identities, and to run very convincing spam and phishing campaigns.

The Dark Web is a relatively safe place to buy and sell such sensitive details because it gives its users strong anonymity, which makes buyers and sellers hard to trace.

Zoosk is a dating app that suffered a massive data breach recently. The hacking group ShinyHunters has offered for sale what it claims is the stolen account information of millions of online daters who have used the popular app. Nowadays, a hacker with a good understanding of the Dark Web could easily get their hands on this dump and access millions of stolen accounts, whereas a regular internet user may not even know what the Dark Web is.

OSINT stands for open-source intelligence. The OSINT Framework is a collection of information-gathering tools and resources, and its intention is to help people find free OSINT resources. Some of the sites it lists might require registration or offer more data for money, but you should be able to get at least a portion of the available information at no cost.

OSINT helps hackers search for data about a specific person or organization that is exposed on the internet. There are plenty of tools in the OSINT Framework for this task. With them, a hacker can pick up small but critical details about a person or organization, such as the location of an office, current job openings, the names of the employees who work there, their addresses, Social Security numbers and so forth.

In today's world, there is more data available on the internet than ever before. The major difference between a regular internet user and a hacker is that a hacker knows where to look for the most reliable information. Using OSINT techniques, a hacker can gather the following details about an organization:

  • Username
  • Email address
  • Domain name
  • IP address
  • Image/video/docs
  • Telephone number
  • Public records
  • Business records
  • Forum/blogs
  • Documentation

A hacker gathers all the data they can using OSINT, and there are plenty of things an attacker can get their hands on. Once the data gathering is complete, the hacker analyzes it for anything useful. For example, theHarvester is a popular tool for gathering the IP addresses, email addresses and subdomains of a website or organization from public sources such as search engines (e.g., Google).
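The kind of extraction such a tool performs can be approximated with a few regular expressions. The sketch below is not theHarvester's actual implementation; the function name, the sample text and the patterns are assumptions made for illustration, and it only processes text you have already collected (for example, pages from your own site).

```python
import re

def extract_emails_and_hosts(text: str, domain: str) -> tuple[set[str], set[str]]:
    """Pull email addresses and subdomains of `domain` out of free text.

    Simplified illustration only: real tools handle many more edge cases.
    """
    escaped = re.escape(domain)
    # Local part, "@", optional subdomain labels, then the target domain.
    email_re = re.compile(
        rf"[A-Za-z0-9._%+-]+@(?:[A-Za-z0-9-]+\.)*{escaped}\b", re.IGNORECASE
    )
    # At least one label in front of the target domain, e.g. "vpn.example.org".
    host_re = re.compile(rf"\b(?:[A-Za-z0-9-]+\.)+{escaped}\b", re.IGNORECASE)
    emails = {match.lower() for match in email_re.findall(text)}
    hosts = {match.lower() for match in host_re.findall(text)}
    return emails, hosts

# Made-up sample text standing in for search-engine results.
sample = (
    "Contact admin@example.org or jobs@hr.example.org. "
    "Docs: https://docs.example.org/setup and vpn.example.org."
)
emails, hosts = extract_emails_and_hosts(sample, "example.org")
print(sorted(emails))
print(sorted(hosts))
```

Running the same function over your own public pages is a quick way to see which addresses and hostnames an outsider could collect.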


Google hacking, also named Google dorking, is a hacker technique that uses Google search and other Google applications to find security holes in the configuration and computer code of websites. 

In Google dorking, the hacker takes advantage of Google’s advanced search operators to return only the type of data they asked for. This filters out unwanted results that are of little use to the hacker. For example, a hacker who wants to find the PDF documents publicly available on a website will run the following query:

  site:example.org filetype:pdf


In the above example, the hacker is searching for all the publicly accessible PDF files on example.org. An organization may not know which sensitive files it has unintentionally exposed. Suppose the search turns up a file called configuration.pdf that contains configuration details for software the organization uses in its day-to-day tasks. With some luck, the hacker may also discover default credentials in that file.

Hackers also use Google dorking to find login panels that are not linked from the main website, and may then launch a brute-force attack against them.
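A defender can run the same kind of query against their own domain to see what is exposed. The following is a minimal sketch that only composes query strings; `build_dork` and the audit list are hypothetical names, and `site:`, `filetype:`, `inurl:` and `intitle:` are standard Google search operators.

```python
from typing import Optional

def build_dork(site: str, filetype: Optional[str] = None,
               inurl: Optional[str] = None, intitle: Optional[str] = None) -> str:
    """Compose a Google query from a few common search operators."""
    parts = [f"site:{site}"]
    if filetype:
        parts.append(f"filetype:{filetype}")
    if inurl:
        parts.append(f"inurl:{inurl}")
    if intitle:
        parts.append(f'intitle:"{intitle}"')
    return " ".join(parts)

# Hypothetical self-audit list for a domain you own.
audit_queries = [
    build_dork("example.org", filetype="pdf"),
    build_dork("example.org", filetype="xls"),
    build_dork("example.org", inurl="admin"),
]
for query in audit_queries:
    print(query)
```

The first query in the list is the one shown above; the others illustrate how the same pattern extends to other file types and to URL paths.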

Shodan describes itself as the world's first search engine for internet-connected devices. Unlike Google and other search engines, which index only web content, Shodan indexes the services that devices expose to the internet. It can find almost anything that is connected to the internet, including webcams, traffic signals, smart TVs and more. Shodan lets you search by IP address or subnet and see the open ports and known vulnerabilities on those IPs.

Shodan gives you the power to discover devices that are exposed to a specific weakness. If a device allows anonymous FTP login, Shodan will list it in the search results for “FTP anonymous.” That makes it easy to discover a good number of servers and devices that accept anonymous FTP. Shodan also lets you filter the results by country, operating system and organization.

First, a hacker will search for “FTP anonymous.” From there, they can filter the results. The left sidebar shows a good amount of summary data:

  • Results map
  • Top services
  • Top organizations
  • Top operating systems
  • Top products

Then in the main section, we get complete details, including:

  • IP address
  • Hostname
  • ISP
  • The country
  • The banner

Shodan also provides advanced search filters, similar to Google dorking, that a hacker can use to narrow the results. Let’s suppose a hacker wants to know how many vulnerable devices are in the city of London. They will simply enter a query like the one below:

  FTP anonymous city:"london"

This returns all the indexed servers in London that accept anonymous FTP. With this result, the hacker will try to log into each FTP server and get their hands on all the data stored on it. From there, they start filtering out the data that seems most sensitive, and may even find cleartext credentials stored on these servers.
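The query and the result fields described above can be sketched in a few lines. This example makes no network calls: the sample record is invented, and the field names (`ip_str`, `hostnames`, `isp`, `location.country_name`, `data`) are assumed to match what Shodan returns for each match. `build_shodan_query` and `summarize_match` are hypothetical helpers, useful mainly for reviewing what Shodan shows about your own address space.

```python
from typing import Any

def build_shodan_query(terms: str, **filters: str) -> str:
    """Append filter:"value" pairs (e.g. city, country, org, os) to search terms."""
    parts = [terms] + [f'{name}:"{value}"' for name, value in filters.items()]
    return " ".join(parts)

def summarize_match(match: dict[str, Any]) -> dict[str, Any]:
    """Reduce one result record to the fields listed in the main results pane."""
    location = match.get("location") or {}
    hostnames = match.get("hostnames") or []
    banner_lines = (match.get("data") or "").strip().splitlines()
    return {
        "ip": match.get("ip_str"),
        "hostname": hostnames[0] if hostnames else None,
        "isp": match.get("isp"),
        "country": location.get("country_name"),
        "banner": banner_lines[0] if banner_lines else "",
    }

# Invented record using a documentation-only IP address.
sample_match = {
    "ip_str": "203.0.113.10",
    "port": 21,
    "hostnames": ["ftp.example.org"],
    "isp": "Example ISP",
    "location": {"country_name": "United Kingdom", "city": "London"},
    "data": "220 Example FTP server ready\r\n230 Anonymous access granted\r\n",
}

print(build_shodan_query("FTP anonymous", city="london"))
print(summarize_match(sample_match))
```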

Let’s suppose a company called XYZ has had a website for the last five years and updates it regularly. You know that they once posted a job listing that disclosed details of their operating system or software, but that page is no longer available.

So how would you get that page back and retrieve the sensitive details? Many regular internet users would think that is impossible without traveling back in time. But it is possible: you can visit pages that a company published in the past but that no longer appear on the current version of its website.

The answer is the Wayback Machine. Don’t think of it as a fantasy time machine that sends you back in time to browse the website. Instead, it periodically crawls websites and stores a snapshot of each page it visits, so it preserves many, though not necessarily all, of the changes made to a website over time.

The hacker would simply visit the Wayback Machine and enter the website address to browse the history of a website. This would provide a calendar of website history, where you can go to any past date and visit the snapshot of that day. This also lets you know how many snapshots were taken on a particular day with the time mentioned.
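The same lookup can be done programmatically through the Internet Archive's availability endpoint. The sketch below builds the request URL and parses a response body; the JSON shown is an invented sample in the shape that endpoint is assumed to return, and the function names are hypothetical. No request is actually sent.

```python
import json
from typing import Optional
from urllib.parse import urlencode

def availability_url(site: str, timestamp: str = "") -> str:
    """Build a Wayback availability query; timestamp is YYYYMMDD[hhmmss]."""
    params = {"url": site}
    if timestamp:
        params["timestamp"] = timestamp
    return "https://archive.org/wayback/available?" + urlencode(params)

def closest_snapshot(response_body: str) -> Optional[tuple[str, str]]:
    """Return (timestamp, snapshot_url) for the closest capture, if any."""
    data = json.loads(response_body)
    closest = data.get("archived_snapshots", {}).get("closest")
    if closest and closest.get("available"):
        return closest["timestamp"], closest["url"]
    return None

# Invented response for illustration; a real one comes from the URL above.
sample_body = json.dumps({
    "url": "xyz.com",
    "archived_snapshots": {
        "closest": {
            "status": "200",
            "available": True,
            "url": "http://web.archive.org/web/20180103101010/http://xyz.com/",
            "timestamp": "20180103101010",
        }
    },
})

print(availability_url("xyz.com", "20180101"))
print(closest_snapshot(sample_body))
```

Checking your own domain this way shows which retired pages are still publicly retrievable.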


Now, as a hacker, how would you chain all of these to get a fruitful result? Let’s suppose you want to target a company called XYZ, and all you have is the basic information found on its website, xyz.com. A hacker would simply fire up theHarvester to see if it can retrieve IP addresses or subdomains for the site.

After running the tool, you find that it discovered one email address: admin@xyz.com. A website called Have I Been Pwned lets you check whether an email address has appeared in a known data breach. From here, a hacker will look up the address to see if it has been compromised.

Another option is an OSINT tool called Maltego. Maltego automates the email lookup using OSINT techniques and also helps you discover where the leaked credentials of compromised accounts came from. If you are lucky, Maltego will show a result from Pastebin if the email was leaked there. You may then find not only the admin@xyz.com account but many more from the same organization. This is how a hacker chains all the discovered details together to get the maximum result and, eventually, succeed.
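Have I Been Pwned also offers an API for this lookup, which a security team can use to check its own addresses. The sketch below only constructs the request and does not send it. It assumes the version 3 `breachedaccount` endpoint, which expects an API key in the `hibp-api-key` header and a user agent; the key and the user-agent string are placeholders, and `build_hibp_request` is a hypothetical helper.

```python
from urllib.parse import quote
from urllib.request import Request

HIBP_BREACHED_ACCOUNT = "https://haveibeenpwned.com/api/v3/breachedaccount/"

def build_hibp_request(email: str, api_key: str) -> Request:
    """Prepare (but do not send) a breach lookup for one email address."""
    # URL-encode the address so "@" and other characters are safe in the path.
    url = HIBP_BREACHED_ACCOUNT + quote(email.strip(), safe="")
    headers = {
        "hibp-api-key": api_key,               # placeholder key
        "user-agent": "exposure-audit-sketch",  # placeholder client name
    }
    return Request(url, headers=headers)

request = build_hibp_request("admin@xyz.com", "YOUR-API-KEY")
print(request.get_method(), request.full_url)
```

Passing the prepared request to `urllib.request.urlopen` would perform the lookup; a 404 response is how the service is expected to signal that the address was not found in any breach.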

Let’s talk about the S3 bucket

2017 was one of the worst years for data breaches and compromises. It has been noted that 5,207 breaches took place, which is around 20% higher than in 2015.

Quite a few organizations were impacted, including Accenture, Dow Jones and Verizon. In many cases, organizations had misconfigured AWS S3 buckets containing highly sensitive data and left them publicly readable. There were more red flags: research suggests that the majority of these breaches were not preplanned by hackers and could have been avoided if dev, architecture and ops teams had not made mistakes in the way they handled AWS S3 buckets.

Data breaches often occur because a team changes the default permissions before working with these buckets. S3 buckets can be configured either manually or by script, and a misconfiguration can be an oversight by an admin or a script at the time permissions are granted. The same happens when a user changes a permission for temporary use and forgets to revert it. This is a blatant invitation to a hacker.
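One way to catch a permission that was opened up and never reverted is to scan bucket ACLs for grants to everyone. The sketch below works on a plain dictionary in the shape returned by an S3 "get bucket ACL" call; the sample ACL is invented and `public_grants` is a hypothetical helper. The two group URIs are the ones AWS uses for "all users" and "any authenticated AWS user." Bucket policies can also make a bucket public and are not covered here.

```python
PUBLIC_GROUP_URIS = {
    "http://acs.amazonaws.com/groups/global/AllUsers",
    "http://acs.amazonaws.com/groups/global/AuthenticatedUsers",
}

def public_grants(acl: dict) -> list[tuple[str, str]]:
    """List (group, permission) pairs that expose a bucket beyond its owner."""
    findings = []
    for grant in acl.get("Grants", []):
        grantee = grant.get("Grantee", {})
        if grantee.get("Type") == "Group" and grantee.get("URI") in PUBLIC_GROUP_URIS:
            group = grantee["URI"].rsplit("/", 1)[-1]
            findings.append((group, grant.get("Permission", "")))
    return findings

# Invented ACL: the owner has full control, and everyone can read.
sample_acl = {
    "Owner": {"ID": "owner-canonical-id"},
    "Grants": [
        {"Grantee": {"Type": "CanonicalUser", "ID": "owner-canonical-id"},
         "Permission": "FULL_CONTROL"},
        {"Grantee": {"Type": "Group",
                     "URI": "http://acs.amazonaws.com/groups/global/AllUsers"},
         "Permission": "READ"},
    ],
}

print(public_grants(sample_acl))
```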

Searching for publicly exposed buckets is easy for a hacker, as buckets have a predictable and publicly accessible URL. By default, the URL takes one of the following forms:

  • s3.amazonaws.com/[bucket_name]/
  • [bucket_name].s3.amazonaws.com/
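Because both forms follow directly from the bucket name, they are trivial to generate. The sketch below is an illustration for checking names your own organization uses: `s3_urls` is a hypothetical helper, the `https://` scheme is an assumption added here, and the name check is a simplified version of the S3 naming rules (3 to 63 characters; lowercase letters, digits, dots and hyphens; starting and ending with a letter or digit).

```python
import re

# Simplified naming check; the full S3 rules have additional restrictions.
BUCKET_NAME_RE = re.compile(r"^[a-z0-9][a-z0-9.-]{1,61}[a-z0-9]$")

def s3_urls(bucket_name: str) -> list[str]:
    """Return the path-style and virtual-hosted-style URLs for a bucket name."""
    if not BUCKET_NAME_RE.match(bucket_name):
        raise ValueError(f"not a plausible bucket name: {bucket_name!r}")
    return [
        f"https://s3.amazonaws.com/{bucket_name}/",
        f"https://{bucket_name}.s3.amazonaws.com/",
    ]

for url in s3_urls("xyz-backups"):
    print(url)
```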

A bucket name can sometimes be exposed in a page's HTML code as well. Plenty of tools are available to look for such buckets. A common approach is brute force: trying likely names in order to guess the name of a bucket.

The widely known tool Burp Suite can also be useful with the AWS Extender extension. Once you have a valid bucket name, exploitation is possible only if the bucket's permissions are misconfigured.

The AWS CLI can be useful for facilitating this exploitation process.


Hackers have a useful list of tools and resources available to them. They use them to enumerate as many details as possible about a person or organization by gathering all publicly exposed data, which a regular internet user would not find through ordinary browsing. The hacker knows exactly where to look for sensitive data using OSINT, Google dorking, the Dark Web and more.



Jatin Jain

With versatile experience in the information security domain, Jatin Jain has proven himself in information security audit, web application audit, vulnerability assessment, penetration testing/ethical hacking and corporate training. He has served various government and private organizations and provided them with security services. He has also been recognized by organizations such as Facebook and Apple for his security support, has been included in various worldwide-recognized halls of fame and has written articles for PenTest and Hackin9 magazines.