
Google Hacking: The hidden face of Google

April 30, 2013, by Soufiane Tahiri

Google needs no introduction: it is quite possibly the most powerful search engine in use today, and some of us even use it to check whether our Internet connection works. Yet the power of that single search bar at the top of Google's page has become a source of concern for many people, and those who are not concerned should be, as we will see!

In addition to being one of the most powerful information databases, Google can be used to find much more than we should be able to find: sensitive files and web vulnerabilities, for instance. It can help identify operating systems and can even be used to find passwords, databases and the entire contents of mailboxes. Google understands a set of operators that let us target precisely what we are seeking; I'll try to detail the most important of them.


Note that Google queries are not case sensitive: there is no difference between lowercase, uppercase or a mix of both, so Security, SECURITY and SeCuriTY return exactly the same results. Logical operators are the exception to this rule.

Logical operators and symbols

Google understands three logical operators, AND, NOT and OR, and they must be written in uppercase: Google recognizes "OR" as the operator, but treats "Or", "oR" or "or" as ordinary search keywords.

  • The AND operator is used to include more than one keyword in a single search query. It can be replaced by a single space, even if the results differ slightly between the two, as you can see by comparing, for example, "reverse AND engineering AND tutorials" with "reverse engineering tutorials".
  • The NOT operator is extremely useful for eliminating some keywords from the results of a query. It is equivalent to the minus sign "-" placed directly before a keyword; to see what it does, try searching for "email service" and then for "email service -marketing" (note that there is no space between "-" and "marketing").
  • The OR operator is used to include in the results of a query pages containing one keyword or another (or both), and it is equivalent to "|": for example, "reverse OR engineering" means exactly the same to Google as "reverse|engineering" (try it, then try "reverse engineering" to see the difference).
  • In addition to these operators, Google recognizes some symbols such as ~, +, * and "":

    • Using the tilde "~": this little character includes in the results of a query the desired keyword, its synonyms and words similar to it. For example, if you search for "it security ~tools", the results will be broader than those of "it security tools", since Google will also consider terms such as "software" and show them among the returned results.
    • Using the plus sign "+": Google tends to ignore punctuation and drops little words like "we", "the", "to" and "of". Putting a plus sign before a word tells Google to include it in the search query, so, for instance, the results of "security is never complete" will differ from those of "security +is never complete".
    • Using quotation marks "" (exact phrase search): if you are sure that you have typed a word correctly but Google keeps suggesting spelling corrections, or if you want to search for a phrase, a quote or an error message, putting your query between quotation marks gives you more relevant results. For example, try searching for "Debugging DLLs" with and without quotes.
    • Using the asterisk "*", also called wildcard or joker: the wildcard helps a lot when you want to search for something with one or more missing words, and it is generally combined with exact phrase search. For example, if you want to find the title of the movie "Get the Gringo" but only remember "Get the", you can try "Get the * movie"; also try "the art of * hacking" book.
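To make these syntax rules concrete, here is a minimal Python sketch that assembles a query string from lists of words. The function name build_query and its parameters are hypothetical, introduced only for illustration; the sketch simply applies the rules described above (spaces for AND, an uppercase OR, a "-" with no following space, quotes for exact phrases) and does not contact Google.

```python
def build_query(all_words=(), any_words=(), exclude=(), phrases=()):
    """Assemble a Google query string from its parts (hypothetical helper).

    - all_words: keywords joined by spaces, i.e. an implicit AND
    - any_words: alternatives joined by an uppercase OR
    - exclude:   keywords prefixed with "-" and no space
    - phrases:   exact phrases wrapped in double quotes ("*" may be used inside)
    """
    parts = list(all_words)
    parts += [f'"{p}"' for p in phrases]
    if any_words:
        parts.append(" OR ".join(any_words))
    parts += [f"-{w}" for w in exclude]
    return " ".join(parts)


if __name__ == "__main__":
    print(build_query(all_words=["email", "service"], exclude=["marketing"]))
    # email service -marketing
    print(build_query(any_words=["reverse", "engineering"]))
    # reverse OR engineering
    print(build_query(phrases=["Get the * movie"]))
    # "Get the * movie"
```

Keeping OR uppercase in the helper matters because, as noted above, a lowercase "or" would be searched as an ordinary keyword.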

Now that we know a little more about how the Google search bar interprets what we type, let's look at some more interesting operators and keywords, especially from a security point of view!

Define:word

This query returns the definition of the given word, drawn from reference websites. For example: define:security

Filetype:file_extension

Using filetype, you can find files with a specific extension; in other words, you restrict your search to one file type. Note that there is no space between filetype: and the following word. For example, we can search for database backups using "backup filetype:sql".


Ext:file_extension

This operator plays more or less the same role as filetype, except that using ext to look for uncommon extensions (like dmp, ks, key ...) returns deeper and more accurate results.

Intitle:keyword(s)

This operator lets you search for a single word or a whole phrase in the title of web pages, and it is commonly used to find directory listings. For example: intitle:index of "Last modified"


You can also use allintitle:keyword1 keyword2 keyword3 ... to find results whose page titles contain all of these keywords.

Inurl:keyword

Like intitle and allintitle, inurl and allinurl can be used to find one or more keywords in web page URLs. This operator is widely used and can reveal a lot of sensitive information, as with the query inurl:cgi-bin/etc/

Intext:keyword / Allintext:keyword1 keyword2 keyword3 …

Intext and allintext search for keywords in the body of web pages or documents and can be very helpful for finding interesting things, for example: allintext:"Control Panel" "login"

Site:domain

The site operator restricts the results to a particular domain or website: Google filters the results, keeping only those from the domain you specify. You can use site:com, site:fr, site:gov … or limit your query to a specific website, as in "reverse engineering site:infosecinstitute.com".

Cache:www.site.com

Once a website has been indexed by Google, there is a good chance that it is kept in the Google cache, so we can retrieve old information even after the website has been updated, or in some cases even if the website is no longer available.

Info:www.site.com

This query returns links to pages containing information about the website or web page in question, for example info:infosecinstitute.com
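The operators above can be combined with site: to review what Google has indexed for a domain you are responsible for. The following Python sketch only builds query strings; the helper names (op_term, audit_queries), the placeholder domain example.com and the list of checks are assumptions chosen for illustration, not a complete audit.

```python
def op_term(name, value):
    """Return 'name:value' with no space after the colon, quoting multi-word values."""
    if " " in value:
        value = f'"{value}"'
    return f"{name}:{value}"


def audit_queries(domain):
    """Build query strings for reviewing what Google has indexed for your own domain."""
    site = op_term("site", domain)
    checks = [
        op_term("filetype", "sql"),      # database dumps
        op_term("filetype", "log"),      # log files
        op_term("ext", "bak"),           # backup copies
        op_term("intitle", "index of"),  # directory listings
        op_term("inurl", "config"),      # configuration paths
    ]
    return [f"{site} {check}" for check in checks]


if __name__ == "__main__":
    for query in audit_queries("example.com"):
        print(query)
```

Each printed line, such as site:example.com filetype:sql, can then be pasted into the search bar by hand.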

Google is not only good at finding stuff; it can even do math!

Until now there is nothing harmful, but we will see that by combining different operators and keywords, and by knowing exactly what we want to find, the results usually exceed our expectations, especially when we are looking for vulnerabilities or "private" data. This is conventionally called Google Hacking.

According to the Wikipedia definition, Google hacking involves using advanced operators in the Google search engine to locate specific strings of text within search results. Some of the more popular examples involve finding specific versions of vulnerable web applications: default installations of applications commonly include their running version in every page they serve, e.g., "Powered by XOOPS 2.2.3 Final", so a search query for that particular text locates all web pages that contain it.
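Version strings like the XOOPS footer above are exactly what such queries match, so it can be worth checking your own pages for them. A minimal Python sketch, assuming footers follow the "Powered by <Name> <version>" shape; the pattern and the function name find_version_banners are hypothetical, and product names containing spaces are not handled.

```python
import re

# Assumed banner shape: "Powered by <Name> <version>", e.g. "Powered by XOOPS 2.2.3 Final".
# The name must be a single word; the version must contain at least one dot.
BANNER = re.compile(r"Powered by\s+([A-Za-z][\w.-]*)\s+v?(\d+(?:\.\d+)+)", re.IGNORECASE)


def find_version_banners(html):
    """Return (product, version) pairs disclosed by 'Powered by' footers in a page."""
    return BANNER.findall(html)


if __name__ == "__main__":
    footer = "<div id='footer'>Powered by XOOPS 2.2.3 Final</div>"
    print(find_version_banners(footer))  # [('XOOPS', '2.2.3')]
```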

Finding usernames

We will use Google to find files containing usernames, which is useful, for example, for building dictionaries: allintext:username filetype:log. Here is part of one such file, which has more than 2,209 rows:

Error Retrieving RSS File:
username:picklepeople
user_id:7321
rss:http://a*******l.org/feed
XML Processing Error: 4Empty document
username:inferno
user_id:240
rss:http://r*****o.l******n.com/rss/
XML Processing Error: 9Invalid character
username:rishey
user_id:338
rss:http://feeds.feedburner.com/____dio.xml

Using the same query, I also found a log of SQL injection attacks:

2012-08-15 03:48:50 213.xxx.xx.229 cid http://www.h*****.at/index.php?option=com_yelp&controller=showdetail&task=showdetail&cid=-1+UNION+ALL+SELECT+1,2,3,concat(0x26,0x26,0x26,0x25,0x25,0x25,username,0x3a,password,0x25,0x25,0x25,0x26,0x26,0x26),5,6,7,8,9,10,11,12,13,14,15,16,17+FROM+jos_users--
2012-08-21 04:48:01 61.xxx.xxx.72 id http://www.h*****.at/index.php?option=com_recipes&Itemid=S@BUN&func=detail&id=-1/**/union/**/select/**/0,1,concat(username,0x3a,password),username,0x3a,5,6,7,8,9,10,11,12,0x3a,0x3a,0x3a,username,username,0x3a,0x3a,0x3a,21,0x3a/**/from/**/mos_users/*
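On the defensive side, a log like this one can be screened for the same pattern. The Python sketch below flags lines containing UNION [ALL] SELECT when the words are separated by spaces, "+", %20 or /**/ comments, the separators visible in the two entries above. The regular expression and the name suspicious_lines are assumptions for illustration; a real intrusion detection rule set covers far more variants.

```python
import re

# Separators seen in the log above in place of spaces: "+", /**/ comments,
# plus ordinary whitespace and URL-encoded spaces (%20).
SEP = r"(?:\s|\+|%20|/\*.*?\*/)+"
UNION_SELECT = re.compile(rf"\bunion{SEP}(?:all{SEP})?select\b", re.IGNORECASE)


def suspicious_lines(lines):
    """Yield the log lines that contain a UNION [ALL] SELECT sequence."""
    for line in lines:
        if UNION_SELECT.search(line):
            yield line


if __name__ == "__main__":
    sample = [
        "GET /index.php?option=com_content&id=42",
        "cid=-1+UNION+ALL+SELECT+1,2,3",
        "id=-1/**/union/**/select/**/0,1",
    ]
    for hit in suspicious_lines(sample):
        print(hit)
```

With this sample, only the second and third lines are printed.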

            
            

Collecting email addresses

With the query allintext:email OR mail +*gmail.com filetype:txt, I was really surprised: the first result (not to mention the very interesting host it was found on) was a text file containing 35,572 email addresses and passwords.


Finding sensitive files and directories

intitle:"index of" inurl:ftp (pub OR incoming)

intitle:"Index of" phpMyAdmin

intitle:index of inurl:config* intext:last modified

intitle:"index of" AND password OR passwd OR pwd intext:"last modified"

All these queries return interesting results; we just need to know what we want to find and how to tell Google to look for it. Here is an example of a result returned by one of them:

define("MYSQL_HOST", "mysql106.db.******.***.jp");
define("MYSQL_ID", "na***o-hoso");
define("MYSQL_PASS", "mJ******");
define("DBNAME", "na***o-hoso");

Finding error messages (e.g., finding websites vulnerable to SQL injection)

allintext:"Warning: mysql_connect(): Access denied for user: '*@*" "on line" -help -forum -tuto*

inurl:"id=" & intext:"Warning: mysql_num_rows()" -help -forum
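Such queries only work because the pages display their errors to visitors. As a rough self-check, the Python sketch below strips HTML tags from a response body and looks for PHP-style diagnostics ending in "on line N". The assumed message format (a severity label such as Warning: followed by the message and "on line N"), the sample page and the function name leaked_errors are illustrative assumptions, not an exhaustive detector.

```python
import re

TAG = re.compile(r"<[^>]+>")
# Assumed shape of a displayed PHP diagnostic: a severity label, the message,
# then "on line N" (as in the queries above). Matching stays within one line.
PHP_ERROR = re.compile(
    r"\b(?:Warning|Notice|Fatal error|Parse error):.*?\bon line\s+\d+",
    re.IGNORECASE,
)


def leaked_errors(body):
    """Return PHP-style error messages visible in a page body, with HTML tags stripped."""
    text = TAG.sub("", body)
    return PHP_ERROR.findall(text)


if __name__ == "__main__":
    # Hypothetical page fragment; the file path is made up for the example.
    page = ("<b>Warning</b>:  mysql_num_rows() expects parameter 1 to be resource, "
            "boolean given in <b>/var/www/item.php</b> on line <b>12</b><br />")
    for message in leaked_errors(page):
        print(message)
```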

We can find almost anything we want with Google if we are able to sharpen our queries enough. I enjoyed building queries from different combinations of keywords and operators; see some of the results below:

Full access to mailboxes


IPN Logs (Instant Payment Notification)

[07/30/2012 8:16 PM] - SUCCESS!
IPN POST Vars from Paypal:
mc_gross=30.16, protection_eligibility=Eligible, address_status=confirmed, payer_id=624*****REN, tax=0.00, address_street=11 Ta*****x Cr*****nt,
Napsbury Park, London Colney,, payment_date=12:16:49 Jul 30, 2012 PDT, payment_status=Completed, charset=windows-1252, address_zip=AL2 ***, first_name=f*****s, mc_fee=1.23, address_country_code=GB, address_name=f*****s d***n, notify_version=3.5, custom=, payer_status=unverified, business=aka******ns@gmail.com, address_country=United Kingdom, address_city=St Albans,, quantity=1, verify_sign=A5RHA3OA3pOT5X1MMHRoOSFAM28uAiONl5B7uyghy9xnGSAd9ccEWHE0, payer_email=f****s_m_d****n@hotmail.com, memo=11 Ta*****x Cr*****nt is my home and card address, but I'd like the goods to be delivered to work, hence the delivery address is for A*****s H**l, C*****m R**d, Berkhamsted, Herts, HP4 ***. Thanks, txn_id=65W*******6337, payment_type=instant, last_name=d***n, address_state=Hertfordshire, receiver_email=ak*******ns@gmail.com, payment_fee=, receiver_id=223*****GE, txn_type=web_accept, item_name=www.tg*****en.co.uk, mc_currency=GBP, item_number=284, residence_country=GB, handling_amount=0.00, transaction_subject=www.tl****en.co.uk, payment_gross=, shipping=9.77, ipn_track_id=adca*******6f56,

IPN Response from Paypal Server:
HTTP/1.1 200 OK
Date: Mon, 30 Jul 2012 19:16:58 GMT
Server: Apache
X-Frame-Options: SAMEORIGIN
Set-Cookie: cwrClyrK4LoCV1fydGbAxiNL6iG=hK2VxLRsSDcIYah2BmIWM47I715hlkzTGZn77XqmWH_hTHKBD4Dfb_YB7QJlb4i-XN1tcAlHsYZ7SJG0nwdzGZ9eCXsD8fnHSGUfuv2VDtDWp5doDsPpyYHv0QQK0YpgrIYVxG%7cEm0x-LnDlXeHNV0UQExcUhT9rGdmvXVCyQ4nJjpQbWY-aukw2RIxc_jHE0Le2yfB79mo2m%7cSbl_lt9TSLMGNvfjbyQmu4B3eh7tFun2OFsf-SGv2lectPoMVxcIrwMNF7QDvzNc8v_ON0%7c1343675818; domain=.paypal.com; path=/; HttpOnly
Set-Cookie: cookie_check=yes; expires=Thu, 28-Jul-2022 19:16:58 GMT; domain=.paypal.com; path=/; HttpOnly
Set-Cookie: navcmd=_notify-validate; domain=.paypal.com; path=/; HttpOnly
Set-Cookie: navlns=0.0; expires=Sun, 25-Jul-2032 19:16:58 GMT; domain=.paypal.com; path=/; HttpOnly
Vary: Accept-Encoding
Connection: close
Transfer-Encoding: chunked
Content-Type: text/html; charset=UTF-8

Another result exposed full information about a website's customers: their names, addresses, postal codes, cities, phone and mobile numbers, and email addresses.


You can see that things are getting more serious. As you probably guessed, nothing escapes Google's indexing spiders and crawlers!

One result was an Excel file containing the names, country codes, marks and bachelor courses of more than 8,014 students.


Other results included full database dumps from tens, if not hundreds, of websites, in some cases containing usernames and passwords in clear text.


I'm going to stop at this point; there is no need for more demonstrations. Google is certainly everyone's friend, including people with malicious intent, so before putting online a file, a directory or any other information that is not supposed to be public, remember to check the access permissions of your sensitive files and folders.

Placing an empty index.html file in a directory is a simple way to prevent directory listing. Also think about applying the correct CHMOD permissions to your sensitive directories, and limit or remove access to your uploaded backups.
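These checks can be partly automated. The following Python sketch walks a local copy of a web root and reports directories that contain no index file (which may be listed if the server has directory listing enabled), files with extensions commonly used for dumps and backups, and whether those files are world-readable. The extension list, the accepted index file names and the function name audit_web_root are assumptions to adapt to your own server.

```python
import os
import stat
from pathlib import Path

# Assumed lists; adapt them to your own server and application.
RISKY_SUFFIXES = {".sql", ".log", ".bak", ".dmp", ".key"}
INDEX_NAMES = {"index.html", "index.htm", "index.php"}


def audit_web_root(root):
    """Walk a web root and return (finding, path) tuples worth reviewing."""
    findings = []
    for dirpath, _dirnames, filenames in os.walk(root):
        # A directory without an index file may be listed if autoindexing is on.
        if not {name.lower() for name in filenames} & INDEX_NAMES:
            findings.append(("no-index", dirpath))
        for name in filenames:
            path = Path(dirpath) / name
            if path.suffix.lower() in RISKY_SUFFIXES:
                # S_IROTH set: any other local account (usually including the
                # web server's) can read the file.
                readable = path.stat().st_mode & stat.S_IROTH
                findings.append(("risky-world-readable" if readable else "risky", str(path)))
    return findings


if __name__ == "__main__":
    for finding, path in audit_web_root("."):
        print(finding, path)
```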

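A robots.txt file can also help protect the privacy of your data: by filling it in correctly, you can ask Google or any other search engine not to index your website, files or directories. Keep in mind that robots.txt is itself publicly readable and is honored only by well-behaved crawlers, so it complements access restrictions rather than replacing them.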

The following tips may help:

• To prevent Google from indexing your site:

  User-agent: Googlebot
  Disallow: /

• To prevent every search engine from indexing your site:

  User-agent: *
  Disallow: /

• To prevent Google from indexing a specific file type:

  User-agent: Googlebot
  Disallow: /*.sql$

• To prevent a directory and all its content from being indexed by Google:

  User-agent: Googlebot
  Disallow: /directoryName/

• To prevent a specific page from being indexed by Google:

  User-agent: Googlebot
  Disallow: /confidential.html
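The * and $ in Disallow: /*.sql$ are pattern extensions understood by Googlebot; Python's standard urllib.robotparser matches rule paths as plain prefixes, so it may not interpret that rule the same way. To test rules like the ones above, here is a minimal Python sketch of the wildcard matching. Only Disallow paths are handled (Allow lines and user-agent grouping are left out), and the function names are hypothetical.

```python
import re


def rule_to_regex(rule):
    """Translate a Disallow path with Googlebot-style wildcards into a regex.

    "*" matches any sequence of characters and a trailing "$" anchors the end
    of the path; otherwise the rule matches as a prefix of the path.
    """
    anchored = rule.endswith("$")
    if anchored:
        rule = rule[:-1]
    pattern = ".*".join(re.escape(piece) for piece in rule.split("*"))
    return re.compile(pattern + (r"\Z" if anchored else ""))


def is_disallowed(path, disallow_rules):
    """True if any non-empty Disallow rule matches the path from its start."""
    return any(rule_to_regex(rule).match(path) for rule in disallow_rules if rule)


if __name__ == "__main__":
    rules = ["/*.sql$", "/directoryName/", "/confidential.html"]
    for path in ["/backup/db.sql", "/db.sql.txt", "/directoryName/a.html", "/index.html"]:
        print(path, is_disallowed(path, rules))
```

Running it prints True for /backup/db.sql and /directoryName/a.html, and False for the other two paths: the $ stops /*.sql$ from matching /db.sql.txt.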

These rules can be combined with HTML meta tags, which you place between <head> and </head>:

<meta name="robots" content="noindex, nofollow">

You can also prevent Google from caching your website with this tag:

<meta name="Googlebot" content="noarchive">
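To verify which of these directives a page actually carries, a small parser built on Python's standard html.parser module is enough. A minimal sketch; lowercasing the name and content values is an assumption made for simpler comparison, and the class and function names are hypothetical.

```python
from html.parser import HTMLParser


class RobotsMetaParser(HTMLParser):
    """Collect directives from <meta name="robots"> and <meta name="googlebot"> tags."""

    def __init__(self):
        super().__init__()
        self.directives = {}

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attributes = dict(attrs)
        # Names and values are lowercased here for easier comparison (an assumption).
        name = (attributes.get("name") or "").lower()
        if name in ("robots", "googlebot"):
            content = attributes.get("content") or ""
            values = {v.strip().lower() for v in content.split(",") if v.strip()}
            self.directives.setdefault(name, set()).update(values)


def robots_directives(html):
    """Return a dict such as {"robots": {"noindex", "nofollow"}} for a page."""
    parser = RobotsMetaParser()
    parser.feed(html)
    parser.close()
    return parser.directives


if __name__ == "__main__":
    head = ('<head><meta name="robots" content="noindex, nofollow">'
            '<meta name="Googlebot" content="noarchive"></head>')
    print(robots_directives(head))
```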


This non-exhaustive list of solutions may help you protect yourself against search engines, and especially against Google, but be very careful when controlling how Googlebot (or any other search engine crawler) sees your website, or your pages may disappear completely from search results!

Soufiane Tahiri

Soufiane Tahiri is an InfoSec Institute contributor and computer security researcher specializing in reverse code engineering and software security. He is also the founder of www.itsecurity.ma and has practiced reversing for more than 8 years. Dynamic and very involved, Soufiane is open to any serious opportunity to join a workgroup.

Contact Soufiane in whatever way works for you:

Email: soufianetahiri@gmail.com
Twitter: https://twitter.com/i7s3curi7y
LinkedIn: http://ma.linkedin.com/in/soufianetahiri
Website: http://www.itsecurity.ma