The art of searching for open source intelligence

May 11, 2016 by Irfan Shakeel

The Internet is a vast ocean carrying loads of information you might be interested in, but where and how do you find it? Search engines like Google make query-based searching possible, but is that enough? If you think so, think again: there is the World Wide Web, the deep web, and the dark web. Are you getting information from every corner of the Internet? That depends on how you search for it, which is why we call it the "art of searching."

The art of searching applies to many fields, but here it is used specifically for open source intelligence. The objective of this series is to discuss the open source intelligence (OSINT) concept, tools, methodology, and processes, and the art of searching is one part of it. In this series, we will look at the Internet from a different perspective and take a different approach.

Internet research experts are a myth

People claim to be Internet research experts who can find anything using their techniques, but nobody knows everything that is actually available out there. The first reason is that the Internet changes constantly, within fractions of a second; while you are reading this, significant changes have already occurred. The second reason is the size of the Internet: it is huge, and nobody can claim to have scanned every corner of it and to have produced fully accurate information.

You may well find the information you are looking for on the Internet, but because it keeps changing, its accuracy cannot be guaranteed. This makes open source intelligence a continuous process. Also, the World Wide Web is not the Internet; it is only a part of it. The Internet is a network of networks, an umbrella over all connected devices (computers, printers, routers, switches, servers, etc.). Think of Shodan: it does not search web pages, it searches devices. Think of a quick port scan against any technology infrastructure: you scan for devices, not web pages.

So the important points to conclude here:

  • Open source intelligence is a continuous process, and organizations should treat it as such.
  • The World Wide Web is not the entire Internet; you should search the whole Internet for information.
  • War dialing is not dead; it has evolved and changed shape (think of random port scanning).
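
The idea that you scan for devices rather than web pages can be illustrated with a minimal TCP connect check in Python. This is only a sketch: the function names `is_port_open` and `scan_ports` and the one-second timeout are illustrative choices, not taken from any particular tool, and a check like this should only be run against hosts you own or are authorized to test.

```python
import socket


def is_port_open(host: str, port: int, timeout: float = 1.0) -> bool:
    """Return True if a TCP connection to host:port succeeds."""
    try:
        # create_connection performs a full TCP handshake, so this is an
        # active probe: the target can see and log the connection.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and unreachable hosts.
        return False


def scan_ports(host: str, ports: list[int]) -> list[int]:
    """Return the subset of ports that accepted a connection."""
    return [p for p in ports if is_port_open(host, p)]
```

For example, `scan_ports("127.0.0.1", [22, 80, 443])` would list whichever of those ports currently accept connections on the local machine.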

New school open source intelligence

Intelligence gathering is not a new topic; people of every era have gathered intelligence using their own techniques. Those techniques have evolved, and we now have a 'new' way of searching for information.

The word 'open' refers to publicly available sources; it has nothing to do with open-source software. "Open source intelligence (OSINT) is the process of collecting the intelligence from publicly available sources, paid or free, print or electronic." The scope of OSINT is not limited to cyber security; it also covers business and corporate intelligence, military intelligence, and any other field where information matters.

Businesses hire information brokers and private investigators to gather information about their competitors. This is known as competitive intelligence, a corporate term for the process of gathering information about competitors. The process draws on the same kinds of sources:

  • Web-based communities: social media websites, forums/blogs, wikis, video and image sharing websites, news portals, and other user-generated media
  • Dark web
  • Newspapers, magazines, radio, television, and computer-based information
  • Government reports, press conferences, marketing surveys, speeches, press releases, official statements (Tweets, Facebook posts, etc.)
  • Academic research papers, theses, dissertations, and interviews

Military and security agencies use open source intelligence to counter terrorism and to gather information about their opponents. Content analysis of the Middle East's regional newspapers, for example, is largely effective for predicting the stability or instability of the region. It is an effective technique for gathering cultural and demographic intelligence from areas not covered by military intelligence operatives. Commercial imagery sources and digital maps provide military commanders with up-to-date information about airfields, roads, bridges, buildings, and government offices.

Offensive vs. defensive OSINT

As discussed, OSINT has a broad scope, but this series focuses primarily on cyber attacks. Offensive OSINT means studying a target before an attack takes place, while defensive OSINT means learning about the attacks that could be launched against a company. OSINT gives opportunities to both defender and attacker: you can learn a company's weaknesses and fix them, while at the same time those weaknesses could be exploited.

The OSINT process

In the first step, you need to identify the sources from which you can get the required data. There are many techniques for acquiring data, but source identification is the most important step because it determines the outcome of the whole activity. Every step of the process will be discussed throughout this series.

Harvesting is divided into two types:

  • Active harvesting – the target can learn about the harvesting
  • Passive harvesting – no connection is made to the target, so the target never knows about it

What information to look for

  • Technology infrastructure
    • IP
    • Hostname
    • Services
    • Networks
    • Software / hardware versions and OS information
    • Geo-location
    • Network diagram
  • Database
  • Documents, papers, presentations, spreadsheets and configuration files
  • Metadata
  • Email and employee search (name and other personal information)

The information above can lead to the following cyber attacks:

  • Phishing
  • Brute force (password)
  • Denial of service
  • Social engineering

A search engine seems like a rich source for finding particular information; in many cases, however, it is not. You cannot find classified information just by Googling, and a site owner can use robots.txt to ask crawlers not to crawl particular pages. A search engine can only return the pages it has indexed. Apart from search engines, make use of online libraries and private forums/blogs.
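
To make the robots.txt point concrete, here is a small sketch using Python's standard `urllib.robotparser` module. The rules and the `example.com` URLs are made up for illustration. A well-behaved crawler consults rules like these before fetching a page, which is one reason some pages never show up in search results; the file is only a request, though, and the pages themselves remain reachable by anyone who knows the address.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt content, for illustration only.
ROBOTS_TXT = """\
User-agent: *
Disallow: /admin/
Disallow: /orders/
"""


def allowed_for_crawler(robots_txt: str, url: str, agent: str = "*") -> bool:
    """Return True if the given rules permit `agent` to fetch `url`."""
    parser = RobotFileParser()
    # parse() accepts the file's lines, so no network access is needed.
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(agent, url)
```

With these rules, `allowed_for_crawler(ROBOTS_TXT, "https://example.com/admin/login")` is `False`, while the same call for `https://example.com/index.html` is `True`.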

Students study Boolean logic in digital electronics and related courses; the same logic applies to search engines. The operators are AND, OR, and NOT.
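
As a rough illustration of how these Boolean operators, together with the phrase quoting and exclusion syntax covered in this section, combine into a single query string, consider the following sketch. The helper names `quote_phrase` and `build_query` are hypothetical, and the use of parentheses to group an OR clause is an assumption that not every search engine honors.

```python
from typing import Sequence


def quote_phrase(term: str) -> str:
    """Wrap multi-word terms in quotation marks so they match as a phrase."""
    return f'"{term}"' if " " in term else term


def build_query(
    all_of: Sequence[str] = (),
    any_of: Sequence[str] = (),
    exclude: Sequence[str] = (),
) -> str:
    """Combine terms with AND (implicit), OR, and NOT (a leading minus)."""
    # AND is the default, so required terms are simply listed side by side.
    parts = [quote_phrase(t) for t in all_of]
    if any_of:
        alternatives = " OR ".join(quote_phrase(t) for t in any_of)
        # Group the OR clause only when it sits next to other terms.
        needs_group = len(any_of) > 1 and bool(all_of or exclude)
        parts.append(f"({alternatives})" if needs_group else alternatives)
    # A minus sign directly before a term excludes it.
    parts.extend(f"-{quote_phrase(t)}" for t in exclude)
    return " ".join(parts)
```

For instance, `build_query(all_of=["African Americans", "diet"], exclude=["kid", "girl"])` returns `"African Americans" diet -kid -girl`.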

Quotation marks " "

Quotation marks match the exact phrase in the exact word order. For example: "African Americans"

Minus sign -

Put a minus sign directly before a word or phrase to exclude it. For example: "African Americans" diet -kid -girl -"marriages"

AND

AND is the default operator: 'infosec training' and 'infosec AND training' return the same results, namely pages where every typed keyword occurs. Writing AND explicitly only matters when you combine it with other operators.

OR

OR allows more than one term. It does not require a specific order, but at least one of the terms must appear in the result. For example:

"African Americans" OR blacks

Meta search engines

Yippy clustering search engine

Yippy was formerly known as Clusty, and it is the best of the meta search tools available so far. It is unique because it employs its own clustering engine, software that organizes unstructured information into hierarchical folders. It offers clustered results for web, news, and certain specialty searches. By default, it searches the web using Live Search, Gigablast, Ask, Wikipedia, and the Open Directory, and it groups the results into logical categories.

It creates the clusters whether or not the query is spelled correctly.

It allows the user to look at the sources of the search results and the types of sites (e.g., .com, .gov). It also supports all the advanced search queries and operators discussed earlier. It automatically groups huge amounts of information logically, and at the same time it surfaces new areas of development within a subject. It also lets the user create a custom tab based on the search results they need; you can select news sources, directories, and particular domain extensions.

Keep in mind that no single search engine is the best: use more than one during your research, and use specialized search engines for specific cases.
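
Yippy's clustering engine is proprietary, and nothing here reproduces it. As a rough illustration of just one idea mentioned above, grouping results by type of site, the following sketch buckets a list of result URLs by domain extension. The function name and the sample URLs are hypothetical, and hosts given as IP addresses are not handled specially.

```python
from collections import defaultdict
from urllib.parse import urlparse


def group_by_extension(urls):
    """Group URLs by the last label of their hostname (e.g. 'com', 'gov')."""
    groups = defaultdict(list)
    for url in urls:
        host = urlparse(url).hostname or ""
        # rpartition returns the text after the final dot; hosts without
        # a dot (e.g. 'localhost') fall into the 'unknown' bucket.
        _, dot, ext = host.rpartition(".")
        groups[ext if dot else "unknown"].append(url)
    return dict(groups)
```

Calling it on a mixed result list yields a dictionary with one key per extension, which is the simplest possible form of the "folders" a clustering engine presents.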

Other Metasearch engines:

Geographically limited search engines:

Naver Korea

Biglobe Japanese Switzerland

Walla Israel Slovenia

Maktoob Arab

Rambler Russia

Yandex Russia, Turkey, Ukraine, Belarus, Kazakhstan

Ansearch Australia

Search documents and files

There are also numerous people search engines, but we will discuss them in detail during the analysis phase, where we will see how an attacker obtains valuable information to launch phishing and social engineering attacks.

Accessing the darknet for information is crucial nowadays. People share valuable information behind the Tor project, and during the open source intelligence process you need to dig through anything and everything. You can access the darknet using Tor, but you will also need a capable darknet search engine to find the relevant information there.

Advanced search operators – Google

Operator     Description                Web            Images         Groups   News
intitle      Search page title          Yes            Yes            Yes      Yes
allintitle   Search page title          Yes            Yes            Yes      Yes
inurl        Search URL                 Yes            Yes            No       Not really
allinurl     Search URL                 Yes            Yes            Yes      Not really
site         Search specific site       Yes            Yes            No       Not really
allintext    Search text of page only   Yes            Yes            Yes      Yes
filetype     Search file                Yes            Yes            No       Not really
insubject    Group subject search       Like intitle   Like intitle   Yes      Like intitle
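
The operators in the table can be chained in a single query. The sketch below shows one way to assemble such a query string in Python; `build_dork` and `KNOWN_OPERATORS` are hypothetical names introduced for illustration, and the set simply mirrors the operator names listed in the table rather than an official or complete list.

```python
# Operator names as listed in the table above (not an exhaustive list).
KNOWN_OPERATORS = {
    "intitle", "allintitle", "inurl", "allinurl",
    "site", "allintext", "filetype", "insubject",
}


def build_dork(*clauses: tuple[str, str]) -> str:
    """Build a query like 'site:example.com filetype:pdf' from (operator, value) pairs."""
    parts = []
    for operator, value in clauses:
        op = operator.lower()
        if op not in KNOWN_OPERATORS:
            raise ValueError(f"unknown operator: {operator}")
        # Quote values containing spaces so they stay attached to the operator.
        if " " in value:
            value = f'"{value}"'
        # The value follows the colon directly, with no space in between.
        parts.append(f"{op}:{value}")
    return " ".join(parts)
```

For example, `build_dork(("site", "example.com"), ("filetype", "pdf"))` returns `site:example.com filetype:pdf`, a query a defender could run against their own domain to see which documents are exposed.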

Search engines are so powerful that they sometimes expose information that should not be publicly available. Consider a basic operator query:

inurl:admin inurl:orders

This is dangerous for a company, and companies should take it seriously.

Companies' intranets and private networks are sometimes left open even though they should be protected; this is what we call a vulnerability. A quick Google search for "Welcome to Intranet" reveals many intranet addresses that could be exploited. There are hundreds of examples where hackers have used a search engine to find important information about a company. Another example shows how someone can get NOD32 antivirus usernames and passwords just by Googling intext:"eav" filetype:txt

This is certainly not the end of the art of searching, nor even the end of the operators. The Google Hacking Database provides lists of operators (dorks) for different purposes. In this article, we discussed OSINT from the search point of view, but there is much more to cover, including but not limited to metadata searching, people searching, technology infrastructure, and how these link to an attack. The next article in this series will focus on these topics.

Irfan Shakeel

Irfan Shakeel is the founder & CEO of An engineer, penetration tester and a security researcher. He specializes in network and VoIP penetration testing and digital forensics. He is the author of the book titled “Hacking from Scratch”. He loves to provide training and consultancy services, and works as an independent security researcher.