NPR Puzzle for June 12, 2022
Famous American’s last name manipulated to become a European capital.
This week’s challenge comes from listener Theodore Regan, of Scituate, Mass. Take the last name of a famous 20th-century American. The 5th, 6th, 7th, and 1st, 2nd, and 3rd letters, in that order, name a European capital. Who is the person, and what capital is it?
Link to the challenge
Techniques Used
- String Manipulation
- Simple File IO
- Screen Scraping (to get the famous Americans lists)
- XPath (Only discussed in my comments in the download)
- HtmlAgilityPack (also only in the download)
Synopsis
This one should have been easy, but I struggled some. First, I had to get a good enough list of famous Americans. That was tricky because my list source was 56 separate pages on Wikipedia (one page per state or territory). The large number of pages was not a problem as much as the differing formats on each page. Then I accidentally assumed the person’s name should be six letters long, resulting in no output until I realized my mistake.
Even though I struggled a bit, I’m pretty sure that this puzzle is still easier to solve by coding than by traditional methods. Yea, team coders!
However, having struggled to build the famous Americans files (also the European capitals), the algorithm is straightforward: 1) read the capitals into a list, 2) read each person from the 56 files, 3) parse each input line to extract the last name, 4) manipulate the last name per Will’s instructions, and 5) search the capitals list to see if the manipulated name matches any capital.
The heart of the algorithm is encapsulated in the following four lines of code:
var candidate = lastName[4..7] + lastName[..3]; var ndx = capitalList.BinarySearch(candidate, StringComparer.OrdinalIgnoreCase); if (ndx >= 0) txtResults.Text += $"{fullName} - {capitalList[ndx]}\n";
Don’t worry if it doesn’t make sense yet, I’ll explain it below. 😊
Here’s the Code!
Comments below with a number correspond to a numbered bullet point in the explanation deeper down:
private void btnSolve_Click(object sender, RoutedEventArgs e) { //Load the capitals into a list via a method defined below: var capitalList = LoadCapitalList(); var folderName = @"C:\Users\Randy\Documents\Puzzles\Data\Americans"; //1) The folder holds 56 files, one for each state or territory var fList = Directory.GetFiles(folderName); //Loop through all 56 files and read each line: foreach(var fileName in fList) { using(var sr = File.OpenText(fileName)) { //2) Peek tells us when we hit the end of the file while(sr.Peek() != -1) { var aLine = sr.ReadLine() ?? ""; //3) Each line holds multiple fields separated by | characters //The first field is the person's name, so find the 1st | var p = aLine.IndexOf('|'); var fullName = ""; if (p >= 0) //4) Extract everything before the |, using range operators: fullName = aLine[..p]; else fullName = aLine; //Now that we have the full name, extract the last name var lastName = ""; //Look for the last space, the last name starts after it: p = fullName.LastIndexOf(' '); if (p >= 0) //5) lastName = fullName[(p + 1)..]; else //Some people only have one name, so there is no space lastName = fullName; if (lastName.Length >= 7) { //5) Swap the letters per Will's instructions: var candidate = lastName[4..7] + lastName[..3]; //6) Binary search is fast and allows mixed-case matches: var ndx = capitalList.BinarySearch(candidate, StringComparer.OrdinalIgnoreCase); //If ndx is non-negative, that means we have a solution! if (ndx >= 0) txtResults.Text += $"{fullName} - {capitalList[ndx]}\n"; } } } } }
Discussion
var fList = Directory.GetFiles(folderName);
- The method
GetFiles
returns an array of file names which we can loop through - The
GetFiles
method returns an array, which I iterate, so as to process every file in the folder
- The method
while(sr.Peek() != -1) {
- ‘sr’ is a
StreamReader
; thePeek
method tells us when we reach the end of the file by returning -1
- ‘sr’ is a
var p = aLine.IndexOf('|');
- My files containing famous people uses the pipe (‘|’) character to separate fields
- So that is why I search for a ‘|’, because the name ends where the ‘|’ starts.
fullName = aLine[..p];
- Here I am using the
range
operator to extract a sub-string from the variableaLine
. - The syntax above is equivalent to
aLine[0..p]
, meaning “start at position 0 and finish prior to p“. - However, we can omit the 0 because it is implied when omitted
- Here I am using the
var candidate = lastName[4..7] + lastName[..3];
- The
candidate
might be a European capital; per Will’s instructions, we put letters 4-7 at the start, concatenate letters 1-3 at the end. Again, I am using range operators
- The
var ndx = capitalList.BinarySearch(candidate, StringComparer.OrdinalIgnoreCase);
- The binary search method is a super fast way to search a list, also convenient
- Note that the candidate I just built looks like this: “berLin“. Note that ‘b’ is lower case and ‘L’ is upper case, because that’s how they appeared in the source, ‘Lindebergh‘.
- Also note that the entry in
capitalList
has different capitalization, ‘Berlin‘, and under normal string comparison, these two strings are not equal! - I handle this by specifying ‘
StringComparer.OrdinalIgnoreCase
‘ as a parameter to the search. This means the strings will be compared ignoring case. - Note that the result returned by the
BinarySearch
method is a negative number if nothing is found, otherwise it is the index of the item searched for. In my list of 65 capitals, Berlin is at index 7. Note that C# uses zero-based indexing, meaning that the first entry has index 0, which is confusing to many, including me 😒
Method ‘LoadCapitalList‘ Loads the Capitals File into a List
The method shown below was invoked in the code listing above:
private List<string> LoadCapitalList() { var fName = @"C:\Users\Randy\source\repos\GetFamousAmericans\GetFamousAmericans\EuropeanCapitals.txt"; var capitalList = new List<string>(); using (var sr = new StreamReader(File.OpenRead(fName))) while (sr.Peek() != -1) { //'Readline' returns null or a string, the null coalesce //operator ?? says 'convert any null to an empty string' var aLine = sr.ReadLine() ?? ""; //The file has two fields separated by '~'; the 1st field is the capital var p = aLine.IndexOf("~"); var capital = aLine[..p]; capitalList.Add(capital); } //Binary search only works on sorted lists, so sort it! capitalList.Sort(); return capitalList; }
Remarks – Loading the File
The code is pretty straightforward, using the same techniques as shown above to open the file and loop through it. Note that this file is different because its delimiter is a tilde, not a pipe.
Code to Get Famous Americans
In order to keep this post short and sweet, I omitted explaining how I did that. However, you can download my code and examine the comments in the file to learn more. Note that the code to get the famous Americans is considerably more complex than the code to solve the puzzle. To download the HTML pages and parse them, I used HtmlAgilityPack, which in turn relies on XPath. That allowed me to extract the parts of the page I wanted, ignoring all the text and formatting.
Get the Code!
Click here to download my code; note that to do so, you will need a free DropBox account. You will need to revise any variables containing file paths to match the path on your PC. Unless you have folders name ‘Randy’!