Month: September 2016

Solving the NPR Sunday Puzzle for Sept 25, 2016

Posted on

The Challenge:

Take the words DOES, TOES and SHOES. They all end in the same three letters, but none of them rhyme. What words starting with F, S and G have the same property? The F and S words are four letters long, and the G word is five letters. They all end in the same three letters.

Link to the challenge on the NPR web site.

Solution:screenshot

As you can see, I actually found four candidate groups, but I’m pretty sure Will is looking for the last one, since it is the only group of common words. Also, some of my solutions might be considered rhymes, even though they don’t actually end with the exact same sounds.

Techniques of Interest:

  • New string formatting feature in Visual Studio 2015
  • Utilizing Cartesian Products with Linq
    • For non-equal joins (trickier than you might think!)
  • String and file processing in C#
  • Dictionaries and Tuples

Plan of Attack:

  1. Use a file of English words to identify words of the correct length, starting with F, S or G
  2. Use a file containing phonemes for English words to determine whether any words rhyme
    • A “phoneme” is a unit of sound in English, for example, ‘soul’ has these 3 phonemes:
      • S OW1 L
    • Where as ‘coal’ has phonemes ‘K OW1 L’
    • Because these two sets of phonemes end the same, the words rhyme
    • Naturally, we want words that don’t rhyme, but we can use the phonemes for that too

The Nitty-Gritty

Here’s how I build the first dictionary, which groups words that end with the same 3 letters.

When I find words of the correct length and starting letter, I add them to a dictionary, using their ending letters as a key, illustrated below:

sameendingdictionary

The code for this is rather straightforward:

public static Dictionary<string, List<string>> LoadWordsEndingSame(string fName) {
  Dictionary<string, List<string>> sameEndingDic 
	   = new Dictionary<string, List<string>>();
  using (StreamReader sr = File.OpenText(fName)) {
    while (sr.Peek() != -1) {
      string aLine = sr.ReadLine();
      char letr1 = aLine[0];
      if (((letr1 == 'f' || letr1 == 's') && aLine.Length == 4) ||
         (letr1 == 'g' && aLine.Length == 5)) {
        string suffix = aLine.Substring(aLine.Length - 3, 3);
        if (sameEndingDic.ContainsKey(suffix)) {
          sameEndingDic[suffix].Add(aLine);
        } else {
          sameEndingDic.Add(suffix, new List<string> { aLine });
        }
      }
    }
  }
  return sameEndingDic;
}

 

Processing the Phonemes File

My second dictionary uses words for the key, and stores the trailing phonemes as the value.

This is rather straightforward string processing; for full details, please download my code (to to the end of this post) and take a look. I have written a method named ‘BuildPhonemeDic’ which belongs to class ‘Utilities’. The upshot is that it builds a dictionary having

  • A word as the key
  • The ‘trailing phonemes’ as the value associated with that word
    • By ‘trailing’, I mean all the phonemes after the consonant

The screen shot below illustrates the dictionary structure:

Phoneme Dictionary.png

Using the Dictionaries

Now that I’ve build the dictionaries, I can use them to find the sweet, sweet solution to the puzzle!

  1. Examine all the entries in our dictionary ‘sameEndingDic’
  2. Look for an entry with at least one f, one g and one s; check the non-rhyme shortly
  3. Note that sometimes there multiple words qualify for each starting letter
    1. For example, my group fade’, ‘glade’, ‘grade’ and ‘sade’
    2. I.e., we have two g-words
  4. I use a Cartesian Product to build every combination of f, g and s words
  5. If I can find a triple that doesn’t have the same trailing phonemes, I’ve got a winner!

Here’s the code to accomplish that:

StringBuilder sb = new StringBuilder();
//iterate the dictionary we just built; remember, the key is the last 3 letters 
//and the value is a list of words ending with those letters
foreach (KeyValuePair<string, List<string>> kvp in sameEndingDic) {
  //First, make sure we have at least one word 
  //starting with each of f,g and s
  if (kvp.Value.Any(k => k.StartsWith("f")) &&
    kvp.Value.Any(k => k.StartsWith("g")) &&
    kvp.Value.Any(k => k.StartsWith("s"))) {

    //Now see if we can get a grouping that doesn't rhyme;
    //First we will build one list for each starting letter
    List<WordAndTrailingPhonemes> startsWithF = new List<WordAndTrailingPhonemes>();
    List<WordAndTrailingPhonemes> startsWithG = new List<WordAndTrailingPhonemes>();
    List<WordAndTrailingPhonemes> startsWithS = new List<WordAndTrailingPhonemes>();
    foreach (string aWord in kvp.Value) {
      if (wordPhonemeDic.ContainsKey(aWord)) {
        if (aWord.StartsWith("f"))
          startsWithF.Add(new WordAndTrailingPhonemes(aWord, wordPhonemeDic[aWord]));
        else if (aWord.StartsWith("g"))
          startsWithG.Add(new WordAndTrailingPhonemes(aWord, wordPhonemeDic[aWord]));
        else
          startsWithS.Add(new WordAndTrailingPhonemes(aWord, wordPhonemeDic[aWord]));
      }
    }

    if (startsWithF.Count > 0 && startsWithG.Count > 0 && startsWithS.Count > 0) {
      //Join the three lists on non-matches of the trailing phonemes
      //That will give us every combination of matches, which we need if there is more than
      //one word starting with each letter
      //This is Cartesian Product
      var matched = from f in startsWithF
              from g in startsWithG
              from s in startsWithS
              where (f.TrailingPhonemes != g.TrailingPhonemes &&
                 g.TrailingPhonemes != s.TrailingPhonemes &&
                 s.TrailingPhonemes != f.TrailingPhonemes)
              select new { fWord = f.TheWord, gWord = g.TheWord, sWord = s.TheWord };

      foreach (var candidate in matched) {
        sb.AppendLine($"{candidate.fWord}, {candidate.gWord}, {candidate.sWord}");
      }
    }
  }
}
txtResult.Text = sb.ToString();

I highlighted the comment starting the Cartesian Product. Note that you can’t use the ‘join’ keyword to accomplish this, because Linq only allows equijoins ☹.

Also, I highlighted the new string formatting syntax for Visual Studio 2015, namely, this code:

  • $”{candidate.fWord}, {candidate.gWord}, {candidate.sWord}”

This is the equivalent of:

  • String.Format(“{0}, {1}, {2}”, candidate.fWord, candidate.gWord, candidate.sWord);

As you can see, the new syntax is more concise and easier to figure-out.

Summary

Using a dictionary to group words by their ending 3 letters makes it possible to find candidates. However, there are over 1,900 such word groups, so we need to look-up each candidate’s trailing phonemes to see if they rhyme. We use a Cartesian Product in Linq, with the phoneme dictionary, to find groups that don’t rhyme.

Get the Code and Word Lists!

As usual, you can take a look at my code by downloading it from my Dropbox account here. You will need to download the phoneme word list from here, as well as a list of English words, and modify the code to reference your file (look for the ‘Carnegie Mellon List’).