Defending yourself from Google hackers

May 31, 2012 by Hari Krishnan

Before looking at how we can defend ourselves from Google hackers, let's see what Google hacking is.

Google Hacking:

"Google hacking is a hacking technique that uses Google Search and other Google applications to find security holes in the configuration and computer code that websites use." – Wikipedia

Google is a very powerful web search engine and is capable of doing many things that are very useful to a hacker. Using simple Google dorks, people are able to compromise websites, and many web developers do not know how to protect themselves or their customers' data from such attacks. Using Google dorks, an attacker can extract various kinds of information, such as database configuration details, usernames, passwords, directory listings, error messages, etc. For example:

intitle:index.of.config

This query lists directories that can reveal information about a web server's configuration. Such directories are not meant to be public since, depending on the level of security, they may contain files with passwords as well as information on open ports and security permissions.

The major reason for such data leaks is an improper security policy for data published on the internet. There are a few methods by which we can protect our web server.

A public server is normally used for storing data that is meant to be accessed by the public, so if you are really concerned about keeping data private, the easiest and best way is to keep it off the public server altogether. Even when such documents are kept in isolated folders, it is often easy to gain access to them. Everyone knows the risk associated with directory listings, which can allow a user to see most of the files stored inside a directory and its subdirectories. Sometimes even the .htaccess file is listed; this file is actually used to protect the directory contents from unauthorized access, but a simple misconfiguration allows it to be listed and read as well. Many people have the habit of uploading important data to their servers to enable access from anywhere, and those files end up indexed by search engine crawlers.

One simple rule is that website administrators can create a robots.txt file that specifies particular locations that search engines should not explore and store in their cache. To protect yourself, use a robots.txt file to prevent such documents or folders from being indexed. For example:

User-agent: *

Disallow: /documents

Also, to block individual pages, or if you don't want a page to be indexed or cached by any search engine, you can use a meta tag such as <meta name="spider_name" content="noarchive">, where spider_name is the robot the directive applies to.
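
As a rough sketch (the page title here is just a placeholder), the tag is placed inside the page's <head> section:

<html>
<head>
<title>Internal report</title>
<!-- ask all robots not to index this page or keep a cached copy of it -->
<meta name="robots" content="noindex, noarchive">
</head>
<body> ... </body>
</html>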

Robots.txt Examples:

The following allows all robots to visit all files.


User-agent: *

Disallow:

This entry will keep all robots out of all directories.


User-agent: *

Disallow: /

We can also specify particular directories that we don't want crawled. The following example will keep all robots out of the /infosec/ directory, as well as any of its subdirectories.


User-agent: *

Disallow: /infosec/

By not including the trailing /, we can stop the spiders from crawling matching files as well, as shown below.
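
For example, following the same pattern as the entries above, the rule below blocks not only the /infosec/ directory but also any path that merely begins with /infosec, such as /infosec.html:

User-agent: *

Disallow: /infosec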

The following example will stop Google's robot (googlebot) from crawling anything on our site, but allow all other robots access to the entire site.


User-agent: googlebot

Disallow: /

The following meta tag will prevent all robots from indexing the page and from following any links on it.


<META NAME="ROBOTS" CONTENT="NOINDEX, NOFOLLOW">

We can also deny or allow certain spiders using this tag.

Example: <META NAME="GOOGLEBOT" CONTENT="NOINDEX, NOFOLLOW">

For more information, you can visit http://www.robotstxt.org/wc/exclusion.html#meta.

The Google dork for checking the .htaccess file is intitle:"index of" ".htaccess", which lists websites that expose the .htaccess file in their directory listings.
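
As an additional precaution, you can make sure the web server never serves .htaccess and related files at all. A minimal sketch for Apache 2.2 (this mirrors the directive shipped in many default configurations; on Apache 2.4 the two access lines are replaced by "Require all denied"):

<FilesMatch "^\.ht">
Order allow,deny
Deny from all
</FilesMatch>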

Directory listing should be disabled unless it is required. Directory listing also happens when the index file defined by the server configuration is missing. On Apache servers, we can disable directory listings by putting a minus sign before the word Indexes in the Options directive of the httpd.conf file, as sketched below.
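
A minimal sketch, assuming a typical Apache layout (the directory path is only an example; apply the change to the <Directory> block that matches your document root, or to a local .htaccess file):

<Directory "/var/www/html">
Options -Indexes
</Directory>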

Check your own site:

This section shows how you can test your own website, using both manual and automated testing, to get an idea of its potential security loopholes and close them. Many web developers don't have the hacking knowledge or related technical skills that a pen tester does, so here we will cover how to protect your website from Google hacking by looking at your own site from Google's perspective.

Starting with the manual method, the most common and simple Google dork is the site: keyword. The site: keyword can be used to narrow down your search results to a particular domain or server. For example, site:infosecinstitute.com lists all the pages Google has indexed from the domain infosecinstitute.com.
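
For instance, a few illustrative self-audit queries (example.com is just a placeholder for your own domain) that combine site: with the dorks discussed earlier:

site:example.com intitle:"index of"
site:example.com filetype:sql
site:example.com inurl:admin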

Now you can open each of the links listed and check whether the data shown is supposed to be public, but this becomes time-consuming if the query returns hundreds or thousands of links. In that scenario, we can go for some automated testing.

The tools we are going to look at are:

  • Gooscan
  • SiteDigger
  • Wikto

Gooscan:

Gooscan is a Linux-based tool that can be used for bulk Google searches. This tool violates the Google TOS (Terms of Service) since it doesn't use the Google API, and if you're using a tool that violates the Google TOS, there is a chance of getting your IP address blocked.

Gooscan options:

There is a list of options available in this tool for getting various results. Two required arguments have to be passed to perform a scan, and there are also other optional arguments.

The required arguments are:

  • -t target : This is used to specify the target to scan. A target can be a host name or an IP address.
  • -q query | -i query_file : This argument is used to send the query that fetches a particular search result. The -q option takes only a single argument, in other words a single Google dork. For example:
    • -q 'intitle:"index of" ".htaccess"'

The tool can also take multiple queries, which will be read from a query file, as sketched below.
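
For illustration, a query file (such as the small_dorks.gs file used later in this article) is simply a plain-text list of dorks, one per line; the queries below are only examples:

intitle:index.of.config
intitle:"index of" ".htaccess"
intitle:"index of" "parent directory"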

The optional arguments are:

  • -o output_file : If you want to create an HTML output file, you can use this argument. The output file will contain all the links that were fetched by the query used.
  • -p proxy:port : To use an HTTP proxy server.
  • -v : Verbose mode.
  • -s site : As mentioned before, this can be used to get results from a particular domain or target.

Using Gooscan:

Gooscan can be used in two ways: either by sending a single query or by sending multiple queries. A simple example would be:

gooscan -q "hack" -t www.google.com -s infosecinstitute.com

To create an output file:

gooscan -q "hack" -t www.google.com -o infosec.html

Performing a multiple-query search with Gooscan can cause problems. To avoid that, we can send small batches of queries rather than one huge file. To create a small data file, use the head command:

head -5 data_files/gdork.gs > data_files/small_dorks.gs

gooscan -t www.google.com -i data_files/small_dorks.gs -o multiplequeries.html

Once the output file has been created, click on the links which you find suspicious.

SiteDigger:

SiteDigger, written by Foundstone, is integrated with the Google Hacking Database and uses the Google API. SiteDigger lets you select a site to test and choose which Google hacking signatures to run against it, or you can simply select a category of dorks and run the queries, which will return the corresponding links. Select any query and click the URLs shown in the results.

Wikto:

Wikto is another tool that can be used for Google hacking. It is a complete web assessment tool, which means you can use it to test both the server and the applications running on it. To perform Google hacking, Wikto includes an applet named Googler. This applet searches for certain file types in the Google index, which are then imported and used as a backend. There is another Wikto applet called GoogleHacks, which imports the GHDB and automatically executes the queries from the GHDB against any particular site.

Google Hack Honeypot:

Google Hack Honeypot (GHH) is designed to provide reconnaissance against attackers who use search engines as a hacking tool. It implements the honeypot concept to provide additional security for your web presence. Its best feature is that it allows you to monitor any attempts by malicious attackers to compromise your security. GHH also has logging functionality that allows you to administer it and take action accordingly.

You can download this from http://ghh.sourceforge.net/

Installation details can be found at http://ghh.sourceforge.net/gettingstarted.php

Conclusion:

It is essential to follow secure coding practices and to implement security code reviews as part of this approach. For a better understanding, you can go through the OWASP guide to secure coding practices. There is also an option to request immediate removal of content from Google's index. This can be done by first creating either the META tags or the robots.txt file on the web server, and then submitting a request through a Google account to Google's automatic URL removal system.

Hari Krishnan

Hari Krishnan works as a security and bug researcher for a private firm, as well as InfoSec Institute. His interests largely encompass web application security issues. Hari is also an organizer for Defcon Chennai (http://www.defcontn.com).