Fun With Linq – Solving the Sunday Puzzle

Posted on Updated on

The Challenge

This week’s challenge is a common two-word expression. The expression consists of 8 letters and uses all five vowels — A, E, I, O and U.
It has only three consonants, one of which is repeated. The first word in the expression has two letters and the second has six letters.

Link to the challenge

Techniques to Solve

  • File IO
  • Complex dictionaries
  • Linq operations, including
    • Where
    • Aggregate
    • Except
    • GroupBy

Comments on the Puzzle

That was kind of a tough puzzle! I found some phrase lists on the internet, but they weren’t very big and none of them contained the solution. Out of frustration, I elected to search word lists to construct word pairs that match Will’s description.  Even that didn’t work until I get a really big word list that had some French words.

Finally, after plugging in my biggest word list, and manually sifting through 4,403 candidates, I found the answer. While looking at the candidates starting with “au”, something in the back of my mind pulled-up the answer “au revoir” before I actually saw it on screen.

Some Near Misses

I got a kick out of these, which met the rules, but probably aren’t “common expressions”:

  • Go, Aussie!
  • No Auntie!
  • Oi, abuses
  • Up roarie

Au RevoirOverview of The Algorithm

  1. Open the word list, read every word
  2. Length == 2 ? add to a list named “twoLetterWords”
  3. Length == 6 ? Store in a dictionary
    1. The dictionary uses the word’s vowels as a key,
      • Such as key = “eio” for the word “revoir”
    2. For each key, keep a list of all the 6-letter words that contain those vowels
      1. So the data type for my dictionary is “Dictionary<string, List> sixLetterWordDic”
    3. So, under the dictionary key “eio”, I store a list of 1,227 words that use those three vowels, including “revoir”
  4. After building the dictionary and the 2-letter word list
    1. Iterate all the two-letter words
    2. Compute the vowels not used in the 2-letter word
    3. Use them as a key to the dictionary
    4. Loop through the word list stored under key composed of the remaining vowels
    5. Apply Will’s criteria “3 consonants, one of which is repeated”
      1. By using the GroupBy operator

Here’s the Code

private void btnSolveFromWordList_Click(object sender, RoutedEventArgs e) {
   List<string> twoLetterWords = new List<string>();

   //For speed, we look-up words using their vowels as a key
   //For example, 
   //  sixLetterWordDic["eio"] = { "belion", "beloid", .... "revoir" }
   //To wit, the key "eio" represents all the vowels EXCEPT "au"
   //  The value is a list of words containing those vowels, 1,227 total, including "revoir"
   Dictionary<string, List<string>> sixLetterWordDic = new Dictionary<string, List<string>>();

   List<string> vowels = new List<string> { "a", "e", "i", "o", "u" };
   //Open the word file and read every line, building the word list and the dictionary
   using (StreamReader sr = new StreamReader(File.OpenRead(WORD_File))) {
    while (sr.Peek() != -1) {
     string aWord = sr.ReadLine().ToLower();
     if (aWord.Length == 6) {
      //Get all the vowels in the word and concatenate into a string
      string vowelKey = vowels.Where(v => aWord.Contains(v))
            .Aggregate("", (p, c) => p + c);
      if (vowelKey.Length > 0) {
       //Add to the dictionary using the vowels as a key; hopefully it does not need sorting
       if (sixLetterWordDic.ContainsKey(vowelKey))
        sixLetterWordDic[vowelKey].Add(aWord);
       else
        sixLetterWordDic.Add(vowelKey, new List<string> { aWord });
      }
     } else if (aWord.Length == 2 && !twoLetterWords.Contains(aWord))
      twoLetterWords.Add(aWord);
    }
   }

   //Now that we have the word list and dictionary, loop through seeking
   //a solution:
   int word2Count = 0;
   foreach (string aWord in twoLetterWords) {
    //Get just the vowels; for examle, if the word is "of", then foundVowels will be { "o }
    List<string> foundVowels = vowels.Where(v => aWord.Contains(v)).ToList();

    if (foundVowels.Count > 0) {
     //get all the vowels NOT used in foundVowels
     string remaining = vowels.Except(foundVowels)
            .Aggregate("", (p, c) => p + c);
     if (sixLetterWordDic.ContainsKey(remaining)) {
      foreach (string partner in sixLetterWordDic[remaining]) {
       //We know there should be two distinct consonants, and one of them is repeated
       //so use GroupBy on the letters.
       var consonantGroups = (aWord + partner)
           .Where(l => "aeiou".IndexOf(l) < 0)
           .GroupBy(l => l);
       if (consonantGroups.Count() == 2 && 
        (consonantGroups.First().Count() == 2 || consonantGroups.Last().Count() == 2)) {
        txtAnswer.Text += aWord + " " + partner + "\n";
        matchCount++;
       }
      }
     }
    }
   }
}

What’s Up with the Dictionary?

My word list is huge – more than 775,000 entries. That results in 54,596 six letter words and 646 2-letter words. I can’t wait around while my code checks every 2-letter word against that huge list of 6-letter words. So I used a dictionary for speed.

The dictionary has a shorter list for every vowel combination. As I mentioned in my code comments, the key “eio” is linked to a list of 1,227 6-letter words. So when I encounter words like “au”, I don’t need to check all 56,596 6-letter words, I only need to check 1,227. So, for just that sample, it runs 46 times faster.

Explain the Aggregate Operation!

I used the Linq operation ‘Aggregate’ a couple of times, like here:

string vowelKey = vowels.Where(v => aWord.Contains(v)) .Aggregate(“”, (p, c) => p + c);

For example, if aWord = “revoir”, then vowelKey will be assigned “eoi”.

Here’s a step-by-step breakdown of what each clause does:

  • Vowels.Where(v => aWord.Contains(v))”  means:
    • Loop through my list of vowels
    • Build a list of each vowel (string) that meats the Where clause
    • Namely, if aWord contains that vowel, add it to that list
    • If you want to be picky, it is an IEnumerable<string>, not a “list”
    • Take the list we just built and pipe it into an aggregate operation
  • Aggregate(“”, (p, c) => p + c)” means:
    • “” – The empty quotes represent the seed.
      • We add everything else to the seed
      • We could have used something else as a seed, like a hyphen
        • In which case the result would have been “-eio”, not “eio”
    • (p, c) => p + c – perform this operation on every entry in the list
      • p represents the accumulated result so far
      • c is the current list entry, i.e. a vowel
      • In case you didn’t know, the “+”  operator concatenates two strings
        • So, on the first pass, p == “” and c == “e”
        • Second pass, p == “e” and c == “i”
        • Third pass, p == “ei” and c == “o”

Summary

Linq operators shorten the code quite a bit. The hashing capabilities of the dictionary make the solution run a lot faster.

Get the Code

Here’s a link to my drop box account to download the code and run it yourself (note that DropBox will require you to set-up a free account to get the code). I didn’t include my word list (too big), but you can download it here.

Leave a comment