NPR Puzzle for November 20, 2022

Posted on Updated on

Science Branch with Two Embedded Anagrams

Last week’s challenge came from Henri Picciotto, of Berkeley, Calif. He coedits the weekly “Out of Left Field” cryptic crossword. Name a branch of scientific study. Drop the last letter. Then rearrange the remaining letters to name two subjects of that study. What branch of science is it?

Link to the challenge

General Approach

I elected to use WPF for this one, for one reason, I was in a bit of a hurry, and I’m faster in WPF. Also, WPF is well suited for file handling, unlike web-based solutions, and the code needs to load a file containing English words, as well as a file containing fields of science. Finally, the code is a bit CPU intensive; Python would run slower, and building a nice UI in Python is harder, at least for me!

Here’s how my UI looks after clicking the ‘Solve’ button:

The screen shot above shows that 920 possible solutions were displayed. That doesn’t completely solve the puzzle, but note that it is very hard to determine whether an arbitrary word is part of a field of study. With this list, we at least can pick from a fairly short list of possible answers to the challenge. Think you can do better? You would need some really advanced AI to complete the final part of the puzzle, because there is not dictionary entry that you can link to which says, for example, “astronomy studies the objects ‘starts’ and ‘moons'”.

Techniques Used

Synopsis

We can get a list of 716 branches of study from Wikipedia. Using HtmlAgilityPack, we can extract the names from the raw HTML on that page. That is the list we will attempt to manipulate, so we need a list of English words, many of which are available on the web (including this one).

Using those two sources, proceed by first splitting each branch into two parts (after stripping the last letter). Note that each branch can be split at several points, so we use a loop to try every valid split point.

Having split the original, we anagram the two sub-words. To do so, we take advantage of a simple fact relating to anagrams:

Two words are anagrams if their sorted letters are the same

We know that the three words above are anagrams because, when we sort their letters, we get the same result, namely, “arst”.

To capitalize on this principle, we build a dictionary, using every word in English, whose keys are the sorted letters, having an associated value composed of the list of words that share those sorted letters. That way, we can easily and efficiently find the anagrams of any word. Here’s what a sample dictionary entry looks like:

Interpretation: the image above is a screen shot from Visual Studio’s Immediate Pane. It shows the value associated with one particular dictionary key, “arst”, which is a list of 5 words, all of which are anagrams of each other. Just to make it ultra clear, here is the dictionary entry for “mnoo”:

The dictionary will contain 75,818 such entries.

Using the dictionary, we can tell if a candidate has any anagrams by checking if its sorted letters exist in the key. If the key exists, we can get all the anagrams for the candidate by examining the value associated with that key. Remember, each entry in a dictionary is a key/value pair.

So, to recapitulate, we will build a dictionary from all English words, each entry has the sorted letters as the key, and the list of anagrams as the value. We loop through the science branches, splitting each into two candidates. If both candidates are in the dictionary (after sorting their letters), then we have a candidate solution.

Here’s the Code!

To save space, I will omit the part of my code which downloads the list of science branches. But have no fear, you can download all my code and examine it to satisfy your curiosity, using the link at the bottom.

Note that the code below is just an excerpt, representing the most important code. At the point where the excerpt begins, we already have the following variables:

  • branchList – a list of all 716 branches of science, a simple list of type string
  • wordDic – a dictionary constructed as described above, using sorted letters as the key to each entry

Obviously, if you want to see the code I haven’t discussed below, you can just download my entire project (link below); I believe the embedded comments in that code should satisfy your curiosity.

Note that each numbered comment in the listing below has a corresponding explanation after the code listing.

//Every candidate solution will go into this collection, which is bound to the display:
var answers = new ObservableCollection<Answer>();
grdResults.ItemsSource = answers;
//The syntax below allows the screen to remain responsive while the algorithm runs
await Task.Run(() => {
  int count = 0;
  foreach (var candidate in branchList) {
    //1) remove the last letter
    var trimmed = candidate.ToLower()[..^1];

    //2) This inner loop tries splits the science branch at all possible points
    //forming two parts, br1 and br2:
    for (var i = 1; i < trimmed.Length - 1; i++) {
      //3) Take the first part of the candidate and sort its letters
      var br1 = trimmed[..i].OrderBy(c => c).Aggregate("", (p, c) => p + c);
      //4) If the sorted letters are in the dictionary, work with 
      //the 2nd part of the branch:
      if (wordDic.ContainsKey(br1)) {
        //5) Grab the 2nd part of the branch and sort its letters
        var br2 = trimmed[(i)..].OrderBy(c => c).Aggregate("", (p, c) => p + c);
        //6) If br2 is in the dictionary, we have a potential solution:
        if (wordDic.ContainsKey(br2)) {
          //7) Each entry (the science branch name part) has a list associated with it 
	  //(the entry's value); display all combinations from those 2 lists 
          foreach (var word1 in wordDic[br1]) {
            foreach (var word2 in wordDic[br2]) {
              var addmMe = new Answer { Branch = candidate, Word1 = word1, Word2 = word2 };
              answers.Add(addmMe);
            }
          }
        }
      }
    }

    //Update the progress bar
    var pct = (int)(100 * count++ / (float)branchList.Count);
    Dispatcher.Invoke(() => prg.Value = pct);
  }
});

Numbered Comment Explanation

  1. var trimmed = candidate.ToLower()[..^1];
    • Each branch of science in the list is mixed case, so use “ToLower” to match our dictionary
    • [..^1] means take a substring starting at the beginning, all the way to 1 before the end
  2. for (var i = 1; i < trimmed.Length - 1; i++) {
    • The variable i controls the split point in the candidate; the first eligible position is 1, because we don’t want any zero-length strings. Similarly, the last split point is 1 before the end
    • For example, the word “astronomy” would be split as 7 different times, once for each pass through the loop, with the split points shown below, as well as the pertinent variables:
      • i = 1, a – stronom, br1 = a, br2 = oomnrst
      • i = 2, as – tronom, br1 = as, br2 = oomnrt
      • i = 3, ast – ronom, br1 = ast, br2 = oomnr
      • i = 4, astr – onom, br1 = arst, br2 = oomn
      • i = 5, astro – nom, br1 = aorst, br2 = omn
      • i = 6, astron – om, br1 = anorst, br2 = mo
      • i = 7, astrono – m, br1 = anoorst, br2 = m
  3. var br1 = trimmed[..i].OrderBy(c => c).Aggregate("", (p, c) => p + c);
    • Take the substring ending at position i. Then sort the letters (“OrderBy“). Note that the result is IEnumerable<char> – basically a list of characters. Now take the result of that operation and Aggregate it so we get a string instead of a list of char.
    • Interpret the aggregation as follows:
      • “” (Empty quotes) This is the seed, we start with an empty string and append everything to it
      • (p,c) => this introduces the two parameters used in the operation, they represent the previous (p) value and the current value c. Previous means the string we have built so far
      • p + c We simply append the current character to the previous value
    • The upshot is this code converts a list of char into a string
  4. if (wordDic.ContainsKey(br1)) {
    • Test if the sorted letters (br1) have an anagram by asking the dictionary if it has a corresponding key. This is like asking the dictionary if ‘arst’ has an entry in the key. Remember, ‘arst’ is an anagram of ‘star’.
  5. var br2 = trimmed[(i)..].OrderBy(c => c).Aggregate("", (p, c) => p + c);
    • Now take the second part of the science branch and do the same thing
    • The only thing that is different is how we take the last part of the original, namely,
      • trimmed[(i)..]
    • This means “take a substring starting at position i, up to the very end
  6. if (wordDic.ContainsKey(br2))
    • Like before, check if the sorted letters “br2” have any anagrams,
    • If so, we have a potential solution!

Remember, the list answers is bound to the display grid, so merely adding it causes it to be displayed. It is then up a human to decide whether the displayed anagrams are actually studied as part of the science branch.

Get the Code to Solve the Puzzle

You can download my complete code, including my list of 716 branches of science, here. Note that you will need a DropBox account to use that link, but that is free. Also, you will need a list of English words; you can get one here. You will need to modify your version of the code to reflect the path where you save this file.

Leave a comment