BinarySearch

Puzzle Solved! Politician Name Swap

Posted on Updated on

The Challenge

The challenge comes from listener Steve Daubenspeck of Fleetwood, Pa. Take the first names of two politicians in the news. Switch the first letters of their names and read the result backward to name something that each of these politicians is not.

Link: http://www.npr.org/2015/04/19/400614542/w-seeking-w-for-compound-word-dates

Screen Shot - I checked my lists of common first names to find any which I could combine to create a word in my dictionary file
Screen Shot showing the solution. – I checked my lists of common first names to find any which I could combine to create a word that exists in my list of all English words. Technically, my code didn’t solve the puzzle, it just generated a relatively short list (51 candidates) for me to check manually.

Techniques to Solve

My algorithm uses the following techniques:

  • Background Workers
  • Linq, Lambda Expressions and String Manipulation
  • Binary Search
  • File IO

Algorithm Overview

I don’t know of any list containing ‘politicians in the news’. I looked around on the web, and the closest thing I could find was a Wikipedia list of lists. It generally takes a quite while to chase down all those sub-lists, but I already had couple of nice lists of common first names (boy names and girl names). I elected to try checking all first names that might match the solution and then see how many names applied to politicians I recognized. Since only a small number (51) names could be manipulated to form an English word, it was relatively easy to look through those 51 and manually check if any matched-up with prominent politicians.

BackgroundWorker – Rationale

I used a BackgroundWorker because I wanted the grid to display work-in progress while it worked, instead of waiting until the end to see the all results. Since it takes over 2 minutes to solve, it’s a nice way to tell the users progress is being made. Important: in order to make this work, your grid should be bound to an ObservableCollection so that your grid will be aware of your updates.

When you use a BackgroundWorker, you hook-it-up to several methods which it will run at the appropriate time, including:

  • DoWork – your method where you do the work!
  • ProgressChanged – an event that is raised, where your other method can update the UI
  • RunWorkerCompleted – fired when the DoWork method completes

The point is that we will update the UI in our ProgressChanged method, which will be on a different thread than the DoWork method. This allows the UI to update almost immediately, instead of waiting for the main thread to finish. Here’s how I set-up my worker:

//Declare the background worker at the class-level
private BackgroundWorker _Worker;
//... Use it inside my button-click event:
_Worker = new BackgroundWorker();
_Worker.WorkerReportsProgress = true;
//Tell it the name of my method, 'Worker_DoWork'
_Worker.DoWork += Worker_DoWork;
//Tell the worker the method to run when it needs to report progress:
_Worker.ProgressChanged += Worker_ProgressChanged;
//Likewise, tell it the name of the method to run when complete:
_Worker.RunWorkerCompleted += Worker_RunWorkerCompleted;

_Worker.RunWorkerAsync();

Hopefully this is pretty clear, my methods are nothing special. If you’re confused, I invite you to download my code (link at the bottom of this post) and take a look, Now let’s talk about the Linq and Lambda expressions I used inside my method ‘Worker_DoWork’.

Linq and Lambda Expressions

I didn’t use Linq/Lambda a lot, but I did use them to work with my list of names, specifically

  1. to perform a ‘self-join’ on my list of names
  2. to reverse my string (which gives me a list of char)
  3. and the ‘Aggregate’ method to convert my list of char back to a string

Here’s what I’m talking about:

var candidates = from n1 in allNames
                 from n2 in allNames
                 where n1 != n2
                 select new SolutionCandidate {
                     Combined = (n2[0] + n1.Substring(1) + n1[0] + n2.Substring(1))
                                                                     .Reverse()
                                                                     .Aggregate("", (p, c) => p + c),
                     Name1 = n1,
                     Name2 = n2
                 };

What is this code doing? It takes every entry in allNames, in combination with a corresponding entry in another instance of allNames (in case you’re wondering, ‘allNames’ is just a list of first names). The names in the first instance are referred to using the variable name ‘n1’, and the names in the second instance are referred to using name ‘n2’. In other words, create every possible 2-name combination. This is the equivalent of a Cartesian Product, which you should normally avoid due to huge results sets. But in this case, we actually want a Cartesian Product. If you’re thinking algorithm analysis, this code runs in O(n2) time. (Link explains my notation: http://en.wikipedia.org/wiki/Big_O_notation)

For every name pair (and there are quite a few!), the code above creates a SolutionCandidate, which is a little class with three properties, namely, ‘Combined‘, ‘Name1‘ and ‘Name2‘.

Take a look at how I create the ‘Combined’ property:

  • n2[0]   – this means: take the 0th char from n2 (i.e., the name from the 2nd instance of allNames)
  • + n1.Substring(1)  this means take a substrig of n1, starting at index 1, up to the end. In other words, all of it except the first letter (at index 0), and concatenate it the previous term (the plus sign means concatenate)
  • + n1[0]  this means get the 0th char of the name from the 1st instance of allNames, concatenate it too
  • + n2.Substring(1)  this means get all of the 2nd name except the first letter, and concatenate

After I do that, I invoke the ‘Reverse’ method on the result. Reverse normally works with lists (actually, with IEnumerable objects), but a string is actually a list of letters (char) so it will revers a string too. The problem is that the result is a list of char.

Finally, I convert my list of char back to a string using the ‘Aggregate‘ method. If I performed Aggregate on an list of numbers, I could use it to add each entry to create a sum, for example (I can do anything I want with each entry). But for a list of char, the equivalent to addition is concatenation.

Binary Searching

Binary search is an efficient way to check if some candidate is in your list. In our case, I want to check my list of all words to see if I created a real word by manipulating two first names. Here’s how I do that, remember that I build my list ‘candidates’ above and it contains a bunch of potential solutions:

foreach (var candidate in candidates)
    if (allWords.BinarySearch(candidate.Combined) >= 0) {
        //pass the candidate to a method on another thread to add to the grid; 
        //because it is on another theread, it loads immediately instead of when the work is done
        _Worker.ReportProgress(0, candidate);
    }

When I invoke ‘BinarySearch’, it returns the index of the entry in the list. But if the candidate is not found, it returns a negative number. BinarySearch runs in O(Log(n)) time, which is good enough for this app, which really only runs once. (I could squeeze out a few seconds by using a Dictionary or other hash-based data structure, but the dictionary would look a bit odd because I only need the keys, not the payload. The few seconds I save in run time is not worth the confusion of using a dictionary in a weird way.)

File IO

OK, I used some simple file IO to load my list of names, and also my list of all words. The code is pretty straightforward, so I’ll just list it here without any exegesis.

List//allNames will hold boy and girl names
//I run this code inside 'Worker_DoWork'
ReadNameFile(ref allNames, BOY_NAME_FILE, allWords);
ReadNameFile(ref allNames, GIRL_NAME_FILE, allWords);

//Here's the method that loads a file and adds the name to the list
private void ReadNameFile(ref List allNames, string fName, List allWords) {
    string[] wordsArray = allWords.ToArray();
    using (StreamReader sr = File.OpenText(fName)) {
        while (sr.Peek() != -1) {
            //The name is columns 1-16, so trim trailing blanks
            string aName = sr.ReadLine().Substring(0, 15).TrimEnd();
            int p = allNames.BinarySearch(aName);
            if (p < 0)
                //By inverting the result form BinarySearch, we get the position
                //where the entry would have been. Result: we build a sorted list
                allNames.Insert(~p, aName);
        }
    }
}

Notes on Improving the Efficiency

We pretty much need to perform a self-join on the list of names, and that is by far the slowest part of the app. It would be faster if we could reduce the size of the name list. One way to do that is to reject any names that could never form a solution. For example, ‘Hillary’ could not be part of a solution because,

  • when we remove the first letter (H),
  • we are left with hillar,
  • which, when reversed, becomes ‘rallih’
  • Since there are no English words that start with, or end with, ‘rallih’, I know that ‘Hillary’ is never part of the solution

I briefly experimented with this method; there are some issues. First, the standard implementation of BinarySearch only finds exact matches, not partial hits. Same thing for hash-based data structures, and a linear search is way too slow. You can write your own BinarySearch that will work with partial hits, but that leads us to the second issue.

The second issue is that, in order to do an efficient search for words that end with a candidate string, you need to make a separate copy of the word list, but with all the words spelled backwards, and run your custom binary search on it, again modifying the search to find partial matches.

After starting down that path, I elected to call it a wrap. My app runs in 2 minutes, which is slow but still faster than re-writing my code, perhaps to save 30 seconds of run time. To any ambitious readers, if you can improve my algorithm, please leave a comment!

Summary

I build a list of common names (with some simple file IO) and matched it against itself to form all combinations of common names (using Linq/Lambda Expressions); for each pair, I manipulated the names by swapping first letters and concatenating them. I then checked that candidate against my master list of all words, using a BinarySearch to check if the candidate exists in that list..

Download your copy of my code here