Hacking

Facebook Hacker Cup 2013 Qualification Round: Beautiful strings

Dejan Lukan
February 1, 2013 by
Dejan Lukan

Here's my follow-on to the Facebook Hacker Cup. Yesterday, I told you about my take on the Balanced Smileys puzzle. Today, we'll discuss the easiest challenge, Beautiful Strings. The actual problem is described below:

[plain]When John was a little kid he didn't have much to do. There was no internet, no Facebook, and no programs. So he did the only thing he could... he evaluated the beauty of strings in a quest to discover the most beautiful string in the world.

What should you learn next?

What should you learn next?

From SOC Analyst to Secure Coder to Security Manager — our team of experts has 12 free training plans to help you hit your goals. Get your free copy now.

Given a string s, little Johnny defined the beauty of the string as the sum of the beauty of the letters in it.

The beauty of each letter is an integer between 1 and 26, inclusive, and no two letters have the same beauty. Johnny doesn't care about whether letters are uppercase or lowercase, so that doesn't affect the beauty of a letter. (Uppercase 'F' is exactly as beautiful as lowercase 'f', for example.)

You're a student writing a report on the youth of this famous hacker. You found the string that Johnny considered most beautiful. What is the maximum possible beauty of this string?

Input
The input file consists of a single integer m followed by m lines.

Output
Your output should consist of, for each test case, a line containing the string "Case #x: y" where x is the case number (with 1 being the first case in the input file, 2 being the second, etc...) and y is the maximum beauty for that test case.

Constraints

5 ≤ m ≤ 50

2 ≤ length of s ≤ 500

[/plain]

The example input is given here:

[plain]

5

ABbCcc

Good luck in the Facebook Hacker Cup this year!

Ignore punctuation, please :)

Sometimes test cases are hard to make up.

So I just go consult Professor Dalves

[/plain]

And the example output is presented below:

[plain]

Case #1: 152

Case #2: 754

Case #3: 491

Case #4: 729

Case #5: 646

[/plain]

We can see that we need to calculate the beauty of the strings, where we need to sum the beauty of all the letters together to get the beauty of the string. The first line in the input file contains the number of lines followed by that number, so we know how many test strings we need to solve. The constraints are that there are at least 5 strings and a maximum of 50 strings, so the input files are not that big. The other constraint is also that the length of each string is from 2 to 500 characters long, which we can use if we're writing in a program like C/C++ where such things matter a lot. Because otherwise, if we're not careful, that could result in a buffer overflow and in an exploitation of the program. Ok, so we know what the problem is: we need to read the first line of the file to know how many strings are following this line, read each string written in every line, calculate the beauty of each string and write it to the stdout.

Solving the Challenge

The first thing that we need to do is figure out the beauty of each letter. If we look at the "

ABbCcc" input string, we can see that it has a beauty of 152 points, but how? If the beauty of the letters is in normal order, the beauty of the string would be: 1+2+2+3+3+3, where the letter 'a' has a beauty of 1, letter 'b' has a beauty of 2 and letter 'c' has a beauty of 3. But that only gives us the beauty of 14 for the whole string, which is clearly incorrect. The next thing we could check is whether the letters are in reverse, so that would mean letter 'a' has a beauty 26, letter 'b' has a beauty 25 and letter 'c' has a beauty 24. The beauty of the whole string should then be: 1*26+2*25+3*24=148, which is still not correct, because the actual value should be 152. What's going on? Clearly the letters are not in any particular order and we need to figure out the beauty of each letter independently of how they appear in the alphabet. We can see that the 148 is pretty close to 152, so if we rearrange the beauty of the letters 'a', 'b', and 'c', we might get the right value. If the beauty of the letter 'c' is 26, the beauty of the letter 'b' is 25 and the beauty of the letter 'a' is 24, we get exactly 152: 1*24+2*25+3*26=152. So we've just figured out the beauty of the first three letters, which can't be calculated any other way.

But we still have to figure out the beauty of the rest of the alphabet. The best way to start is to analyze the next shortest string, because it has the lowest possible number of solutions that we need to try to get the beauty of other letters. That string is the "Ignore punctuation, please :)". We can immediately see that there are some characters inside the string that do not belong to the alphabet; those characters should be discarded, and we can assign the beauty of 0 to them if we really want to; the space character is also not in the alphabet, so we should discard it too.

We really don't want to be doing this by hand, because there's a lot of work and why would we, isn't it more fun to actually write a program to do this for us? The first thing we can do is write a simple program that gives each letter in the alphabet certain beauty and then opening the input.txt input file and reading each line by line calculating the beauty of each string in the line. A simple program written in Python can be seen in the output below:

[python]
#!/usr/bin/python

import sys

# the number of input arguments should be 1 and should be an existing file

if len(sys.argv) != 2:

print "You didn't supply the right number of input arguments."

exit(1)

try:

with open(sys.argv[1]) as f: pass

except IOError:

print "The file you specified doesn't exist."

exit(1)

letters = {

'a' : 24,

'b' : 25,

'c' : 26,

'd' : 1,

'e' : 2,

'f' : 3,

'g' : 4,

'h' : 5,

'i' : 6,

'j' : 7,

'k' : 8,

'l' : 9,

'm' : 10,

'n' : 11,

'o' : 12,

'p' : 13,

'q' : 14,

'r' : 15,

's' : 16,

't' : 17,

'u' : 18,

'v' : 19,

'w' : 20,

'x' : 21,

'y' : 22,

'z' : 23,

}

# read all the lines and discard the first

f = open(sys.argv[1])

lines = f.readlines()[1:]

# foreach line in file

for line in lines:

# lowecase each line

line = line.lower()

# the initial beauty of the string is 0
beauty = 0

# foreach character in line

for char in line:

if char in letters.keys():

beauty += letters[char]

print beauty

[/python]

The program contains comments, so a deep explanation of it is not really needed. First, we're checking if we're passing one input argument to the python script, which should be an existing file; if that is not the case we're terminating the program and displaying the error string. Then we're creating a dictionary of keys, where each key has a certain beauty. We already know the beauty of the letters a, b and c, which are inputted in the dictionary. But we don't actually know the beauty of other letters, which is why we're just assigning them the ascending beauty of 1 to 23 from the letter "d" onwards. If we save the program and run it now, we can see that the calculated beauty of the first string is correct, which should be the case, since we already verified it by hand. Note that we haven't yet defined the right output format, which isn't really needed at this point; we first need to figure out the beauty of each letter.

[plain]

# python temp.py input.txt

152

487

289

417

392

[/plain]

Then we can start analyzing the "Ignore punctuation, please :)" string, which has the following letters in it:

[plain]

a 2x

c 1x

e 3x

g 1x

i 2x

l 1x

n 3x

o 2x

p 2x

r 1x

s 1x

t 2x

u 2x

[/plain]

We already know the value of the letters "a" and "c", so we only need to figure out the value of other 11 letters. Then the actual formula is the following:

1*(c+g+l+r+s) + 2*(a+i+o+p+t+u) + 3*(e+n) = 491

But since we know the values of a and c, we can simplify the formula to the following:

1*(g+l+r+s) + 2*(i+o+p+t+u) + 3*(e+n) = 417

But that doesn't simplify the problem much does it? The best thing to do is write a program like the one shown below and manually adjust the beauty of the letters:

[python]

letters = {

'e' : 1,

'g' : 2,

'i' : 3,

'l' : 4,

'n' : 5,

'o' : 6,

'p' : 7,

'r' : 8,

's' : 9,

't' : 10,

'u' : 11,

}

s = "Ignore punctuation, please :)"
s = s.lower()

def calc_beauty(s):

beauty = 0

for char in s:

if char in letters.keys():

beauty += letters[char]

return beauty

print calc_beauty(s)
[/python]

If we run the program now, we would get the beauty 115, which is far from the value 417, so we need to adjust the values appropriately. Let's give the letters which are multiplied by 3 the biggest value that's left, the letters that are multiplied by 2 the second biggest value, etc. Soon after the testing, we can figure out that the numbers just don't add up. This is because if we use the biggest numbers for the letters that appear most often in the remaining letters, we still don't get a big enough number. What's going on? Maybe our logic is incorrect or the actual beauty of the letters depends based on the string in question and is not equal for the letters across whole input file. When we figure that out, it's actually pretty easy to understand what's going on: from all the letters in the string, the ones taking that have the most repetitive letters also have the biggest beauty.

The actual program that can be used to solve the challenge is as follows:

[python]
#!/usr/bin/python

import sys

import collections

import string

for line in open(sys.argv[1]).readlines()[1:]:

line = line.lower()

l = collections.Counter(line)

add = 26

beauty = 0

for key,value in l.most_common():

if key in string.letters[:26]:

beauty += value*add

add -= 1

print beauty

[/python]

If we run the program, we get exactly what we want:

[plain]

# python temp.py input.txt

152

754

491

729

646

[/plain]

The only thing that needs to be done is to print the results in the right order, but that's pretty easy, so we won't be describing it here.

Conclusion

FREE role-guided training plans

FREE role-guided training plans

Get 12 cybersecurity training plans — one for each of the most common roles requested by employers.

We've seen that the first challenge of the Facebook Hacker's Cup 2013 isn't very hard, we just need to take the time to actually figure out what the challenge is all about; writing the actual program to solve it is quite easy, it's the logic behind the challenge that's the hard part to solve. By the end of the day, we can say that it's like the time we were back at school solving various problems for different assignments, which were very interesting and fun, and this is the same. If you find this kind of stuff interesting, then I urge you to try to solve the problem by yourself or even optimize my solution.

Dejan Lukan
Dejan Lukan

Dejan Lukan is a security researcher for InfoSec Institute and penetration tester from Slovenia. He is very interested in finding new bugs in real world software products with source code analysis, fuzzing and reverse engineering. He also has a great passion for developing his own simple scripts for security related problems and learning about new hacking techniques. He knows a great deal about programming languages, as he can write in couple of dozen of them. His passion is also Antivirus bypassing techniques, malware research and operating systems, mainly Linux, Windows and BSD. He also has his own blog available here: http://www.proteansec.com/.