Fun With Linq – Solving the Sunday Puzzle
The Challenge
This week’s challenge is a common two-word expression. The expression consists of 8 letters and uses all five vowels — A, E, I, O and U.
It has only three consonants, one of which is repeated. The first word in the expression has two letters and the second has six letters.
Techniques to Solve
- File IO
- Complex dictionaries
- Linq operations, including
- Where
- Aggregate
- Except
- GroupBy
Comments on the Puzzle
That was kind of a tough puzzle! I found some phrase lists on the internet, but they weren’t very big and none of them contained the solution. Out of frustration, I elected to search word lists to construct word pairs that match Will’s description. Even that didn’t work until I get a really big word list that had some French words.
Finally, after plugging in my biggest word list, and manually sifting through 4,403 candidates, I found the answer. While looking at the candidates starting with “au”, something in the back of my mind pulled-up the answer “au revoir” before I actually saw it on screen.
Some Near Misses
I got a kick out of these, which met the rules, but probably aren’t “common expressions”:
- Go, Aussie!
- No Auntie!
- Oi, abuses
- Up roarie
Overview of The Algorithm
- Open the word list, read every word
- Length == 2 ? add to a list named “twoLetterWords”
- Length == 6 ? Store in a dictionary
- The dictionary uses the word’s vowels as a key,
- Such as key = “eio” for the word “revoir”
- For each key, keep a list of all the 6-letter words that contain those vowels
- So the data type for my dictionary is “Dictionary<string, List> sixLetterWordDic”
- So, under the dictionary key “eio”, I store a list of 1,227 words that use those three vowels, including “revoir”
- The dictionary uses the word’s vowels as a key,
- After building the dictionary and the 2-letter word list
- Iterate all the two-letter words
- Compute the vowels not used in the 2-letter word
- Use them as a key to the dictionary
- Loop through the word list stored under key composed of the remaining vowels
- Apply Will’s criteria “3 consonants, one of which is repeated”
- By using the GroupBy operator
Here’s the Code
private void btnSolveFromWordList_Click(object sender, RoutedEventArgs e) { List<string> twoLetterWords = new List<string>(); //For speed, we look-up words using their vowels as a key //For example, // sixLetterWordDic["eio"] = { "belion", "beloid", .... "revoir" } //To wit, the key "eio" represents all the vowels EXCEPT "au" // The value is a list of words containing those vowels, 1,227 total, including "revoir" Dictionary<string, List<string>> sixLetterWordDic = new Dictionary<string, List<string>>(); List<string> vowels = new List<string> { "a", "e", "i", "o", "u" }; //Open the word file and read every line, building the word list and the dictionary using (StreamReader sr = new StreamReader(File.OpenRead(WORD_File))) { while (sr.Peek() != -1) { string aWord = sr.ReadLine().ToLower(); if (aWord.Length == 6) { //Get all the vowels in the word and concatenate into a string string vowelKey = vowels.Where(v => aWord.Contains(v)) .Aggregate("", (p, c) => p + c); if (vowelKey.Length > 0) { //Add to the dictionary using the vowels as a key; hopefully it does not need sorting if (sixLetterWordDic.ContainsKey(vowelKey)) sixLetterWordDic[vowelKey].Add(aWord); else sixLetterWordDic.Add(vowelKey, new List<string> { aWord }); } } else if (aWord.Length == 2 && !twoLetterWords.Contains(aWord)) twoLetterWords.Add(aWord); } } //Now that we have the word list and dictionary, loop through seeking //a solution: int word2Count = 0; foreach (string aWord in twoLetterWords) { //Get just the vowels; for examle, if the word is "of", then foundVowels will be { "o } List<string> foundVowels = vowels.Where(v => aWord.Contains(v)).ToList(); if (foundVowels.Count > 0) { //get all the vowels NOT used in foundVowels string remaining = vowels.Except(foundVowels) .Aggregate("", (p, c) => p + c); if (sixLetterWordDic.ContainsKey(remaining)) { foreach (string partner in sixLetterWordDic[remaining]) { //We know there should be two distinct consonants, and one of them is repeated //so use GroupBy on the letters. var consonantGroups = (aWord + partner) .Where(l => "aeiou".IndexOf(l) < 0) .GroupBy(l => l); if (consonantGroups.Count() == 2 && (consonantGroups.First().Count() == 2 || consonantGroups.Last().Count() == 2)) { txtAnswer.Text += aWord + " " + partner + "\n"; matchCount++; } } } } } }
What’s Up with the Dictionary?
My word list is huge – more than 775,000 entries. That results in 54,596 six letter words and 646 2-letter words. I can’t wait around while my code checks every 2-letter word against that huge list of 6-letter words. So I used a dictionary for speed.
The dictionary has a shorter list for every vowel combination. As I mentioned in my code comments, the key “eio” is linked to a list of 1,227 6-letter words. So when I encounter words like “au”, I don’t need to check all 56,596 6-letter words, I only need to check 1,227. So, for just that sample, it runs 46 times faster.
Explain the Aggregate Operation!
I used the Linq operation ‘Aggregate’ a couple of times, like here:
string vowelKey = vowels.Where(v => aWord.Contains(v)) .Aggregate(“”, (p, c) => p + c);
For example, if aWord = “revoir”, then vowelKey will be assigned “eoi”.
Here’s a step-by-step breakdown of what each clause does:
- “Vowels.Where(v => aWord.Contains(v))” means:
- Loop through my list of vowels
- Build a list of each vowel (string) that meats the Where clause
- Namely, if aWord contains that vowel, add it to that list
- If you want to be picky, it is an IEnumerable<string>, not a “list”
- Take the list we just built and pipe it into an aggregate operation
- “Aggregate(“”, (p, c) => p + c)” means:
- “” – The empty quotes represent the seed.
- We add everything else to the seed
- We could have used something else as a seed, like a hyphen
- In which case the result would have been “-eio”, not “eio”
- (p, c) => p + c – perform this operation on every entry in the list
- p represents the accumulated result so far
- c is the current list entry, i.e. a vowel
- In case you didn’t know, the “+” operator concatenates two strings
- So, on the first pass, p == “” and c == “e”
- Second pass, p == “e” and c == “i”
- Third pass, p == “ei” and c == “o”
- “” – The empty quotes represent the seed.
Summary
Linq operators shorten the code quite a bit. The hashing capabilities of the dictionary make the solution run a lot faster.
Get the Code
Here’s a link to my drop box account to download the code and run it yourself (note that DropBox will require you to set-up a free account to get the code). I didn’t include my word list (too big), but you can download it here.