Hacking WolframAlpha - The Anatomy

September 3, 2018 by Israel Torres

Preview

Sharing source code with peers is one thing; sharing secrets over a public medium is another. The all-seeing eye of Google has no mercy, and once the secret has been seen, indexed, and copied to clone sites, it is no longer a secret. Now combine the search power of Google with the computational power of WolframAlpha and the results are limitless! It's raining data from these saturated clouds, and you just need to hold out your hands for a taste.

Introduction

Wolfram|Alpha: Computational Knowledge Engine (WolframAlpha for short) is a fantastic (if not simply THE BEST) tool for calculating arbitrary information using various forms of syntax. Interestingly, most users aren't aware of how to use this engine to its full potential. By default, the interface to all these wonderful calculations is available on the web through any popular browser. For specific needs, there is also an API that users may use to execute queries and return results that can then be processed further programmatically. To use the API one must have a valid AppID... (read that again)... a VALID AppID. ;)

API

WolframAlpha requires users to sign up for a development account, by way of a Wolfram ID [1], to get an AppID for access to the WolframAlpha API [2]. With the free development account and its assigned AppID, the user can make up to 2,000 non-commercial API calls per month; if more are needed, pricing plans are available.

Figure 1

Logic would dictate that due to this limitation developers would keep their AppIDs secret (read as: hidden from the public eye), especially if their AppIDs were not of the free variety. However, in my research, I have found that this is certainly not the case (Figure 2). The question then becomes: why would someone go through a sign-up process to test something a few times, or even a few hundred thousand times, if they didn't have to? [Especially if Google brings it right to your desktop]

Figure 2

History Timeline

I first stumbled upon WolframAlpha's API while looking for ways to extract data to get better results from bash on my Mac OS X system. For a lot of my calculations I was using the commonly known 'bc - An arbitrary precision calculator language', but writing something with parameters takes quite a bit of work to account for return values with large numbers. I had been using WolframAlpha's web interface for quite some time and noticed they prevented scraping of this data by using javascript/css trickery. I had explored this avenue for a while, but it also took time to go through on a case-by-case basis for something I wanted to whip up quickly. In my research, I found mention of their API version 1 and quickly snapped up the documentation. In the documentation they gave sample information on how to access the API thusly:

http://api.wolframalpha.com/v1/query?appid=xxxxx&input=pi

Moving quickly, I used curl to mimic this procedure. It wasn't clear what AppID was, but I was quite surprised when the results came back as expected and I could easily parse out the data I needed. I was also REALLY surprised that the default documented appid=xxxxx literally accepted the value xxxxx (five letter Xs) as a valid AppID and proceeded with the query; other values such as 12345, VWXYZ, BEBAD, and 1ABCD were not accepted. The lesson here was to always RTFM (Figure 3) - the answers are there (literally) ;)
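
As a rough illustration of that curl step, here is a minimal sketch rather than the original script. The endpoint and the xxxxx AppID are the ones quoted from the documentation above; the helper names wa_query_url and wa_query are hypothetical, and the sketch assumes the input needs no URL-encoding.

```shell
#!/bin/sh
# Sketch: query the v1 endpoint quoted above with curl.
# wa_query_url and wa_query are hypothetical helper names.

WA_ENDPOINT='http://api.wolframalpha.com/v1/query'

# Print the request URL for an AppID ($1) and an input expression ($2).
# Assumption: the input needs no URL-encoding (true for simple input like "pi").
wa_query_url() {
    printf '%s?appid=%s&input=%s\n' "$WA_ENDPOINT" "$1" "$2"
}

# Fetch the raw response body; -s silences curl's progress meter.
wa_query() {
    curl -s "$(wa_query_url "$1" "$2")"
}

# Show the URL that would be requested (no network access needed for this).
wa_query_url xxxxx pi
```

Calling wa_query xxxxx pi would send the actual request; the response then has to be parsed for whatever value is needed.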

Figure 3

After running a few successful tests at 10:22 AM on September 24, 2010, I hopped on to Twitter and posted this message to the world "MuhAhaha Wolfram I have you now!" as I had struggled far longer than I had intended with getting fantastic results (while working on other projects including a public crypto challenge). By 10:54 AM the same day, I quickly pieced together a bash script and posted it on pastebin; then hopped onto Twitter again and posted this message with the link: "WolframAlpha bash API Scraper / used to get timespan for crypto challenge 20101008 -> http://pastebin.com/E64Fq2fV".

Figure 4

The script (Figure 5) itself is quite basic, as it just runs curl queries for a specific task to get the data into variables and then present them. Looking back, I just chuckle at the crudeness. After the initial post, I would just use the script to keep track of the timespan, with no real idea if anyone else used it or even cared. I was satisfied with the result and moved on to other things... or so I thought.

Figure 5

Around 18 days later (October 12, 2010), I had written queries for other things, but then noticed that my WolframAlpha queries were all failing. Unusual. So, I went on to debug - perhaps I had formed the query wrong, or the API had changed. I saw no changes in the documentation and the queries were written just fine. During debugging, I noticed that all my returns were now "1 Invalid appid" - they patched their own bug! Splendid. I verified it a few more times, and then updated my bash script on pastebin with a new version: https://pastebin.com/AbMC5DiW

Figure 6

And then tweeted specifically to WolframAlpha on Twitter: "well played @wolfram_alpha ... well played [ 1 Invalid appid ] ;)" - and that was that... or so I thought.

Figure 7

Jump forward 1 year 5 months and 8 days - March 20, 2012. I often reflect on things I've played with in the past and try to make them better with all of the things that I've learned with the passage of time. I was looking through my pastebins and noticed that my WolframAlpha script still lay unusable. I began to research if anything new had happened since my last playtime. I learned that API version 2 was announced January 20, 2011. This made me curious again, and I jumped back onto the updated documentation; a lot of it looked the same and backwards compatible with version 1. I tried the same test again, and indeed, it still failed as expected [ 1 Invalid appid ]. So, I hopped on Google and scouted around to see if anyone had found anything new using bash with WolframAlpha (why re-invent the wheel right?). Nothing of the sort came up, but I did notice when I started to search using this query (wolframalpha "&appid="):

http://www.google.com/search?q=wolframalpha+%22%26appid=%22

Figure 8

I had 165 results (easy enough to look over one at a time in a short amount of time - and so I did). I noticed immediately on the front page 3 AppIDs!

Figure 9

I went through the query results and found about 17 unique AppIDs!!!

Figure 10

I quickly noticed the AppID format (which I had never bothered to look up in the year and a half since), as follows:

AppID Format (XXXXXX-XXXXXXXXXX) 6 hex chars + hyphen + 10 hex chars.
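
A format like this can be checked before spending an API call on a candidate. The following is a minimal sketch, assuming uppercase hex digits only as described above; is_appid_format is a hypothetical helper name. It tests the shape of a string, not whether the AppID is actually valid.

```shell
#!/bin/sh
# is_appid_format is a hypothetical helper name.
# Returns 0 if $1 looks like XXXXXX-XXXXXXXXXX (6 + 10 hex characters).
# Assumption: only uppercase hex digits, per the format described above.
is_appid_format() {
    printf '%s\n' "$1" | grep -Eq '^[0-9A-F]{6}-[0-9A-F]{10}$'
}

# Try a well-formed candidate and two malformed ones.
for candidate in 'ABCDEF-0123456789' 'xxxxx' 'ABCDEF0123456789'; do
    if is_appid_format "$candidate"; then
        echo "$candidate: format ok"
    else
        echo "$candidate: bad format"
    fi
done
```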

AppID Validity Test

Naturally I was ecstatic at all the reaping of AppIDs, but I had no idea if these "keys" were actually valid or otherwise. The math quickly worked itself out in my head, and I looked back at my old scripts and created a new script (check-wa-appid.sh) to test the validity of these; and since I had a bunch of them I'd test them in batch! (Test all the things!)

Figure 11

I put all the AppIDs in a text file (appid-list.txt) and created an external loop to test each one using check-wa-appid.sh thusly:

for x in $(cat appid-list.txt); do ./check-wa-appid.sh $x; done

The resultant data would then have the following tab delimited log format:

Epoch-Timestamp {tab} AppID {tab} [VALID|INVALID|BLOCKED]
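
A log line in this format could be produced along the following lines. This is only a sketch and not the check-wa-appid.sh shown in Figure 11: classify_response and log_result are hypothetical names, the "Invalid appid" text comes from the error quoted earlier, and the marker for a blocked AppID is a placeholder because the article does not give the real message.

```shell
#!/bin/sh
# classify_response and log_result are hypothetical names; this is not the
# author's check-wa-appid.sh, which is only shown in Figure 11.

# Assumption: a blocked AppID's response contains this text. The article
# does not give the real message, so treat the value as a placeholder.
WA_BLOCKED_PATTERN=${WA_BLOCKED_PATTERN:-blocked}

# Map a raw API response body ($1) to VALID, INVALID or BLOCKED.
classify_response() {
    case "$1" in
        *"Invalid appid"*)        echo INVALID ;;
        *"$WA_BLOCKED_PATTERN"*)  echo BLOCKED ;;
        *)                        echo VALID ;;
    esac
}

# Print one log line: epoch timestamp, AppID ($1) and status, tab-separated.
# $2 is the raw response body to classify.
log_result() {
    printf '%s\t%s\t%s\n' "$(date +%s)" "$1" "$(classify_response "$2")"
}

# Example: log the error response quoted earlier for an all-X AppID.
log_result 'XXXXXX-XXXXXXXXXX' '1 Invalid appid'
```

Note that this sketch treats any unrecognized response as VALID, including an empty body from a failed request, so a real script would want to check curl's exit status first.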

Zipping through the validation of the initial 17 AppIDs went quickly (Figure 12); I even added one AppID made of all Xs as a sanity check, which I knew should always come up invalid.

Figure 12

Now, knowing which AppIDs were valid, they could be used until they became invalid or blocked (by the way, blocked is essentially invalid, but it has a different error code). In either case, the next one could be used in a more elaborate script. But then came the question: what happens when all the AppIDs are used up before the next reset of maximum queries?
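
One way such a "more elaborate script" might pick the next usable AppID is to read it back out of the validation log. The sketch below is an assumption-laden illustration, not the author's code: next_valid_appid is a hypothetical name, and it assumes the tab-delimited log format described above with newer entries appended after older ones.

```shell
#!/bin/sh
# next_valid_appid is a hypothetical helper name.
# Print the first AppID whose most recent entry in the log file ($1) is VALID.
# Assumes the tab-delimited log format: timestamp, AppID, status,
# with newer entries appended after older ones.
next_valid_appid() {
    awk -F'\t' '
        !($2 in status) { order[++n] = $2 }   # remember first-seen order
        { status[$2] = $3 }                    # later lines overwrite earlier
        END {
            for (i = 1; i <= n; i++)
                if (status[order[i]] == "VALID") { print order[i]; exit }
        }
    ' "$1"
}
```

A caller could then use APPID=$(next_valid_appid log-wa-appid.txt) and re-run the validation whenever a query starts failing, so that the log reflects the new status before the next pick.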

AppID Generation (Potential)

Knowing the format of the AppID, there really isn't anything stopping anyone from randomly creating a million of these, sorting them, filtering out duplicates, and shuffling them to appear random in order... then attempting to validate them, and finally adding any hits to the collection of valid AppIDs. Even getting 1 valid AppID out of a million generated ones is one more valid AppID than previously had (provided it is not already in the current collection).
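
The generate, de-duplicate, and shuffle steps described above might look like the following. This is a sketch under stated assumptions and not the scripts shown in the figures: gen_appid and gen_appid_list are hypothetical names, /dev/urandom is assumed as the randomness source, and the shuffle uses awk and sort so that it does not depend on a dedicated shuffle utility.

```shell
#!/bin/sh
# gen_appid and gen_appid_list are hypothetical names; the author's own
# generators appear only as figures.

# Print one random candidate in the XXXXXX-XXXXXXXXXX shape.
# Assumption: /dev/urandom as the randomness source (8 bytes = 16 hex digits).
gen_appid() {
    hex=$(od -An -N8 -tx1 /dev/urandom | tr -d '[:space:]' | tr 'a-f' 'A-F')
    # Insert the hyphen after the first 6 characters.
    printf '%s\n' "$hex" | sed 's/^....../&-/'
}

# Print $1 candidates, de-duplicated and then shuffled.
gen_appid_list() {
    i=0
    while [ "$i" -lt "$1" ]; do
        gen_appid
        i=$((i + 1))
    done | sort -u |
        awk 'BEGIN { srand() } { printf "%.9f\t%s\n", rand(), $0 }' |
        sort -n | cut -f2-
}

# Example: print five candidates.
gen_appid_list 5
```

Redirecting the output, for example gen_appid_list 1000 > appid-list.txt, gives a list that can be fed to the validation loop.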

At the time of this writing, it is not known what type of algorithm is used to generate the AppIDs, or whether they are just randomly generated and stored in a database lookup table that could eventually be stumbled upon using the described brute force. It is also possible that the first 6 characters derive, or are derived, from the last 10. Conceptually, think of it as a possible "username-password" pair in hex (signed/hashed): if the two don't correlate upon check, the AppID may be automatically discarded as invalid. With more valid AppIDs and more time, it may be possible to eventually write a proper keygen for such a case.

Generating a million AppIDs takes under an hour on a modern system, and validating them takes even longer (about 6 times longer). Interestingly, out of 1 million generated AppIDs only about 100K are unique; generating a true 1M unique AppIDs would take even longer! (See Figure 13 below)

Figure 13

For example, three tests run across multiple machines with two simultaneous sessions each give 6 sessions in total, for a potential 6 million generated AppIDs. However, the reality of RNG bites us nicely: the sessions yielded only 614,850 AppIDs, of which only 170,044 (~28%) are unique across all 6 sessions, which is only ~2.8% of the targeted 6 million. (tl;dr: generating 1M unique AppIDs is hard)

We'll run through a demonstration based off a test case involving the facets mentioned above.

Demonstration Requirements

Shell Scripts:

gen-wa-1M.sh - (Figure 14) - generate 1 million AppIDs, filter uniques, and reshuffle for validation.

Figure 14

gen-wa-appid.sh - (Figure 15) - generate single AppID for validation (used by gen-wa-1M.sh)

Figure 15

check-wa-appid.sh - validate potential AppID (log and interpret return values)

Artifacts:

appid-list.txt - (Figure 16) - contains AppIDs for validation

log-wa-appid.txt - contains AppID validation [VALID, INVALID, BLOCKED]

Figure 16

test-01-102694

Bruteforce "AppID harvesting" basically flows like so:

  1. generate AppID using gen-wa-appid.sh into appid-list.txt
  2. validate AppID using check-wa-appid.sh logging to log-wa-appid.txt
  3. lather, rinse, repeat.

In test-01-102694, gen-wa-appid.sh generated 102,694 unique AppIDs into appid-list.txt (1.8MB) in under an hour; the validation itself, using check-wa-appid.sh, took 5.73 hours and created the log file log-wa-appid.txt (3.8MB). For comparison, a list of 1M AppIDs is 18MB!

While the process is running, you can either stare at the screen as you run tail -f on the resultant log file (Figure 17), or search every once in a while for a VALID key (by filtering out the INVALID results) thusly:

AFILE='log-wa-appid.txt'; wc -l $AFILE; grep -v INVALID $AFILE
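
The one-liner filters with grep -v INVALID rather than grep VALID because the string VALID is also part of INVALID, so a plain match would return every line. As an alternative view of the same log, the sketch below tallies entries per status; summarize_log is a hypothetical helper name, and it assumes the status is the third tab-separated field as in the log format described earlier.

```shell
#!/bin/sh
# summarize_log is a hypothetical helper name.
# Count entries per status in the log file ($1), assuming the status is
# the third tab-separated field of each line.
summarize_log() {
    awk -F'\t' '
        { count[$3]++ }
        END {
            printf "VALID=%d INVALID=%d BLOCKED=%d\n",
                   count["VALID"], count["INVALID"], count["BLOCKED"]
        }
    ' "$1"
}
```

Running summarize_log log-wa-appid.txt from time to time shows progress at a glance without scrolling through the whole file.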

Figure 17

Unfortunately, during this specific test, no valid AppIDs were found. All that means is that there is a vast number of combinations still to go; and with these scripts, you can pretty much set it and forget it. Naturally, if WolframAlpha blocks or honeypots harvesting (based on vectors such as IP address), then harvested AppIDs may be coming up as invalid on purpose (or at least that's how I would roll it to further frustrate "the reaper").

Conclusion

Ideally, developers should import secrets such as AppIDs from outside the code that uses them, rather than defining them in that same code. For example, reference them through labels such as $api_key, MY_APPID, and key instead of hardcoding the values in source code and pasting them onto a public medium as "low hanging fruit". In short, keep your secrets secret.
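
In a shell script, one common way to do this is to read the AppID from an environment variable, so the script can be shared without the secret in it. The following is a minimal sketch; the variable name WA_APPID and the helper require_appid are hypothetical names chosen for illustration.

```shell
#!/bin/sh
# WA_APPID and require_appid are hypothetical names chosen for this sketch.

# Fail with a message on stderr if WA_APPID is unset or empty;
# otherwise print its value.
require_appid() {
    if [ -z "${WA_APPID:-}" ]; then
        echo 'WA_APPID is not set; export it outside the script' >&2
        return 1
    fi
    printf '%s\n' "$WA_APPID"
}
```

A script would then start with something like appid=$(require_appid) || exit 1, while the value itself is exported from a private shell profile or an untracked file that never gets pasted anywhere public.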

Notes

All source code created and tested on:

Mac OS X 10.7.3 11D50b

Darwin Kernel Version 11.3.0

GNU bash, version 3.2.48(1)-release

Resources

[1] https://developer.wolframalpha.com/portal/apisignup.html

[2] http://products.wolframalpha.com/api/

Israel Torres

Israel Torres is a security researcher for InfoSec Institute and resides in Irvine, California. He spends his free clock cycles writing/coding/hacking freelance, making and breaking ones and zeros, and staying in the digital shadows. He loves PKI/cryptography and prefers poking at Mac OS X. Professionally, he serves as a Systems Administrator in Higher Education.