Lambda Expressions

Puzzle Solved! Politician Name Swap

Posted on Updated on

The Challenge

The challenge comes from listener Steve Daubenspeck of Fleetwood, Pa. Take the first names of two politicians in the news. Switch the first letters of their names and read the result backward to name something that each of these politicians is not.

Link: http://www.npr.org/2015/04/19/400614542/w-seeking-w-for-compound-word-dates

Screen Shot - I checked my lists of common first names to find any which I could combine to create a word in my dictionary file
Screen Shot showing the solution. – I checked my lists of common first names to find any which I could combine to create a word that exists in my list of all English words. Technically, my code didn’t solve the puzzle, it just generated a relatively short list (51 candidates) for me to check manually.

Techniques to Solve

My algorithm uses the following techniques:

  • Background Workers
  • Linq, Lambda Expressions and String Manipulation
  • Binary Search
  • File IO

Algorithm Overview

I don’t know of any list containing ‘politicians in the news’. I looked around on the web, and the closest thing I could find was a Wikipedia list of lists. It generally takes a quite while to chase down all those sub-lists, but I already had couple of nice lists of common first names (boy names and girl names). I elected to try checking all first names that might match the solution and then see how many names applied to politicians I recognized. Since only a small number (51) names could be manipulated to form an English word, it was relatively easy to look through those 51 and manually check if any matched-up with prominent politicians.

BackgroundWorker – Rationale

I used a BackgroundWorker because I wanted the grid to display work-in progress while it worked, instead of waiting until the end to see the all results. Since it takes over 2 minutes to solve, it’s a nice way to tell the users progress is being made. Important: in order to make this work, your grid should be bound to an ObservableCollection so that your grid will be aware of your updates.

When you use a BackgroundWorker, you hook-it-up to several methods which it will run at the appropriate time, including:

  • DoWork – your method where you do the work!
  • ProgressChanged – an event that is raised, where your other method can update the UI
  • RunWorkerCompleted – fired when the DoWork method completes

The point is that we will update the UI in our ProgressChanged method, which will be on a different thread than the DoWork method. This allows the UI to update almost immediately, instead of waiting for the main thread to finish. Here’s how I set-up my worker:

//Declare the background worker at the class-level
private BackgroundWorker _Worker;
//... Use it inside my button-click event:
_Worker = new BackgroundWorker();
_Worker.WorkerReportsProgress = true;
//Tell it the name of my method, 'Worker_DoWork'
_Worker.DoWork += Worker_DoWork;
//Tell the worker the method to run when it needs to report progress:
_Worker.ProgressChanged += Worker_ProgressChanged;
//Likewise, tell it the name of the method to run when complete:
_Worker.RunWorkerCompleted += Worker_RunWorkerCompleted;

_Worker.RunWorkerAsync();

Hopefully this is pretty clear, my methods are nothing special. If you’re confused, I invite you to download my code (link at the bottom of this post) and take a look, Now let’s talk about the Linq and Lambda expressions I used inside my method ‘Worker_DoWork’.

Linq and Lambda Expressions

I didn’t use Linq/Lambda a lot, but I did use them to work with my list of names, specifically

  1. to perform a ‘self-join’ on my list of names
  2. to reverse my string (which gives me a list of char)
  3. and the ‘Aggregate’ method to convert my list of char back to a string

Here’s what I’m talking about:

var candidates = from n1 in allNames
                 from n2 in allNames
                 where n1 != n2
                 select new SolutionCandidate {
                     Combined = (n2[0] + n1.Substring(1) + n1[0] + n2.Substring(1))
                                                                     .Reverse()
                                                                     .Aggregate("", (p, c) => p + c),
                     Name1 = n1,
                     Name2 = n2
                 };

What is this code doing? It takes every entry in allNames, in combination with a corresponding entry in another instance of allNames (in case you’re wondering, ‘allNames’ is just a list of first names). The names in the first instance are referred to using the variable name ‘n1’, and the names in the second instance are referred to using name ‘n2’. In other words, create every possible 2-name combination. This is the equivalent of a Cartesian Product, which you should normally avoid due to huge results sets. But in this case, we actually want a Cartesian Product. If you’re thinking algorithm analysis, this code runs in O(n2) time. (Link explains my notation: http://en.wikipedia.org/wiki/Big_O_notation)

For every name pair (and there are quite a few!), the code above creates a SolutionCandidate, which is a little class with three properties, namely, ‘Combined‘, ‘Name1‘ and ‘Name2‘.

Take a look at how I create the ‘Combined’ property:

  • n2[0]   – this means: take the 0th char from n2 (i.e., the name from the 2nd instance of allNames)
  • + n1.Substring(1)  this means take a substrig of n1, starting at index 1, up to the end. In other words, all of it except the first letter (at index 0), and concatenate it the previous term (the plus sign means concatenate)
  • + n1[0]  this means get the 0th char of the name from the 1st instance of allNames, concatenate it too
  • + n2.Substring(1)  this means get all of the 2nd name except the first letter, and concatenate

After I do that, I invoke the ‘Reverse’ method on the result. Reverse normally works with lists (actually, with IEnumerable objects), but a string is actually a list of letters (char) so it will revers a string too. The problem is that the result is a list of char.

Finally, I convert my list of char back to a string using the ‘Aggregate‘ method. If I performed Aggregate on an list of numbers, I could use it to add each entry to create a sum, for example (I can do anything I want with each entry). But for a list of char, the equivalent to addition is concatenation.

Binary Searching

Binary search is an efficient way to check if some candidate is in your list. In our case, I want to check my list of all words to see if I created a real word by manipulating two first names. Here’s how I do that, remember that I build my list ‘candidates’ above and it contains a bunch of potential solutions:

foreach (var candidate in candidates)
    if (allWords.BinarySearch(candidate.Combined) >= 0) {
        //pass the candidate to a method on another thread to add to the grid; 
        //because it is on another theread, it loads immediately instead of when the work is done
        _Worker.ReportProgress(0, candidate);
    }

When I invoke ‘BinarySearch’, it returns the index of the entry in the list. But if the candidate is not found, it returns a negative number. BinarySearch runs in O(Log(n)) time, which is good enough for this app, which really only runs once. (I could squeeze out a few seconds by using a Dictionary or other hash-based data structure, but the dictionary would look a bit odd because I only need the keys, not the payload. The few seconds I save in run time is not worth the confusion of using a dictionary in a weird way.)

File IO

OK, I used some simple file IO to load my list of names, and also my list of all words. The code is pretty straightforward, so I’ll just list it here without any exegesis.

List//allNames will hold boy and girl names
//I run this code inside 'Worker_DoWork'
ReadNameFile(ref allNames, BOY_NAME_FILE, allWords);
ReadNameFile(ref allNames, GIRL_NAME_FILE, allWords);

//Here's the method that loads a file and adds the name to the list
private void ReadNameFile(ref List allNames, string fName, List allWords) {
    string[] wordsArray = allWords.ToArray();
    using (StreamReader sr = File.OpenText(fName)) {
        while (sr.Peek() != -1) {
            //The name is columns 1-16, so trim trailing blanks
            string aName = sr.ReadLine().Substring(0, 15).TrimEnd();
            int p = allNames.BinarySearch(aName);
            if (p < 0)
                //By inverting the result form BinarySearch, we get the position
                //where the entry would have been. Result: we build a sorted list
                allNames.Insert(~p, aName);
        }
    }
}

Notes on Improving the Efficiency

We pretty much need to perform a self-join on the list of names, and that is by far the slowest part of the app. It would be faster if we could reduce the size of the name list. One way to do that is to reject any names that could never form a solution. For example, ‘Hillary’ could not be part of a solution because,

  • when we remove the first letter (H),
  • we are left with hillar,
  • which, when reversed, becomes ‘rallih’
  • Since there are no English words that start with, or end with, ‘rallih’, I know that ‘Hillary’ is never part of the solution

I briefly experimented with this method; there are some issues. First, the standard implementation of BinarySearch only finds exact matches, not partial hits. Same thing for hash-based data structures, and a linear search is way too slow. You can write your own BinarySearch that will work with partial hits, but that leads us to the second issue.

The second issue is that, in order to do an efficient search for words that end with a candidate string, you need to make a separate copy of the word list, but with all the words spelled backwards, and run your custom binary search on it, again modifying the search to find partial matches.

After starting down that path, I elected to call it a wrap. My app runs in 2 minutes, which is slow but still faster than re-writing my code, perhaps to save 30 seconds of run time. To any ambitious readers, if you can improve my algorithm, please leave a comment!

Summary

I build a list of common names (with some simple file IO) and matched it against itself to form all combinations of common names (using Linq/Lambda Expressions); for each pair, I manipulated the names by swapping first letters and concatenating them. I then checked that candidate against my master list of all words, using a BinarySearch to check if the candidate exists in that list..

Download your copy of my code here

Defensive Coding with Linq and Generic Methods

Posted on Updated on

Suppose you work with some off-shore programmers. Also suppose that those off-shore programmers write your BLL (Business Logic Layer) Code. Further, suppose those programmer constantly change your results without telling you. Finally, suppose they give you results in DataTable format, which is completely flexible and not strongly typed, thus making it hard to detect unannounced changes.

Given those assumptions, you’ve got a situation where they break your code all the time and  leave you to clean-up the mess. And that is the situation I will address in this post.

Not to slam off-shore programmers per se, but perhaps something about working in a different time zone, in a different culture, makes it more likely they will not understand your needs, even if those needs seem blindingly obvious to you! So, my attitude is, just do the best you can under the circumstances and hope the blame falls on the right shoulders. Defensive coding will help. Even if your teammates work in the cube next to you, that type of coding can help.

What You Will Learn

  1. How to use Linq (actually Lambda expressions) on DataTables to check for missing columns
  2. How to use Generic Methods to simplify your logic for extracting data out of a DataTable while checking for misnamed columns

Step One: Check if the Table Returned by the BLL Team has the Columns You Need

If the result set lacks columns we need, we need to log that problem. Also, we should try to work with the data we get and avoid completely crashing. Here is my short-and-sweet method to log missing columns and check how many desired data columns are present in the data we get from the BLL. Basically, I use the ‘Except’ method to compare one list (the expected columns list) against another list (the actual column list in the data table):

public int ColumnMatchCount(DataTable dtReturnTable,
                            List<string> expectedColumnNames,
                            string callingMethod)
{
    //Use the lambda expression 'Except' to determine which columns
    //are not present in the data table provided by BLL
    var unmatched = expectedColumnNames
        .Except(dtReturnTable.Columns.Cast<DataColumn>()
        .Select(c => c.ColumnName));
    foreach (string missingCol in unmatched) {
        //Log the problem column
        _Logger.Log(LogLevel.Error, callingMethod +
                    " - expected BLL to provide column '"
                    + missingCol
                    + "' in the ReturnTable, but it was not present.");
    }
    //Tell the caller how many columns match
    return expectedColumnNames.Count - unmatched.Count();
}

Key point: The data table has a columns collection we can compare using the Except method. Since DataTables are not quite the type of data that works with Linq, we use ‘Cast<DataColumn>() to make ti work. Because DataTables existed before Linq.

In my case, I will give my clients data if at all possible, i.e. if the BLL crew has given me at least one column that matches what they told me they would provide. I’ll provide some sample code shortly, but first, let’s look at my method to get the data out of the data table when the column I expect might not be there.

Step Two: Get Data from the DataTable  When a Column is Missing

If the BLL gave me the column I expected, no big deal (of course, I also want to handle possible null values). But, if the column I want is not even present, a default value should be used instead. The method below does that, and it works for any data type, whether string, int, DateTime, etc. Because it is written as a Generic Method, it can handle any data type you provide.

public T CopyValueFromTable<T>(DataRow dr, string columnName)
{
    T result = default(T);
    if (dr.Table.Columns.Contains(columnName) && ! dr.IsNull(columnName))
        result = (T)dr[columnName];
    return result;
}

The method takes a data row as input and examines that DataRow’s parent table.

    dr.Table

That table has a Columns collection, and we can check it to see if the column exists or not.

    if (dr.Table.Columns.Contains(columnName)

If the BLL crew remembered to give us the column we need, and it it is actually present and not null, then extract it.

dr[columnName]

If you understand Generic Methods, you will recognize that ‘T’ is the data type our caller wants us to use, so we convert the data column to type ‘T’. We cast it to type T using the parenthesis notation; remember that
‘T’ is a placeholder for any type, such as int, string, DateTime, etc.

    result = (T)dr[columnName];

The first line in my method ensures that an appropriate default value will be returned if no other choice is available, the ‘default’ method will give us an empty string, a zero, etc., depending on what type is represented by ‘T’:

    T result = default(T);

Putting it All Together

Screen shot shows my code handling unexpected missing column 'ServiceCenterID' and using a default value instead. The discrepancy was logged.
Screen shot shows my code handling unexpected missing column ‘ServiceCenterID’ and using a default value instead. The discrepancy was logged.
private void btnSimulateUnplanedChange_Click(object sender, RoutedEventArgs e) {
    BLL dataFetcher = new BLL();
    //Ask our simulated BLL to get some data, which will lack the column 'ServiceCenterID'
    DataTable ReturnTable = dataFetcher.UnannouncedChange_GetServiceCenters();
    List<string> expectedColumns = new List<string> { "Location", 
                                "LocationDescription", "ServiceCenter", "SCDescription" };
    List<InventoryResponse> response = new List<InventoryResponse>();


    //If at least one column name matckes, 
    if (ColumnMatchCount(ReturnTable, expectedColumns, "GetResults") > 1) {
        foreach (DataRow dr in ReturnTable.Rows) {
            InventoryResponse invResp = new InventoryResponse();
            invResp.ServiceCenterID = CopyValueFromTable<int>(dr, "ServiceCenterID");
            invResp.Location = CopyValueFromTable<string>(dr, "Location");
            invResp.SCDescription = CopyValueFromTable<string>(dr, "SCDescription");
            response.Add(invResp);
        }
        txtResults.Text = RjbUtilities.DebuggerUtil.ComplexObjectToString(response);
    } else {
        _Logger.Log(LogLevel.Error, "No column names match the expected values");
    }
}

Summary

Sometimes you can’t work with top-quality programmers and possibly they don’t understand the concept of their teammates depending on their code. If you can’t force them to grow-up and communicate, at least you can make sure your code logs the problem and does the best it can with what they give you.

I showed you one method ‘ColumnMatchCount’ that checks how many expected columns are present the results the BLL crew gives you. I showed you another method ‘CopyValueFromTable’ that will provide default values and avoid crashing if the BLL crew renamed a column without telling you.

If you write similar defensive code, you might be lucky and your managers will blame the appropriate people!

Download my sample code here.

Efficiently Protect Your Company from SQL Injection

Posted on Updated on

Most places I work don’t want to be bothered protecting themselves from hackers, an attitude I find hard to believe given that, almost every day, another company looses mega-bucks after failing to protect themselves.

What you will learn: How to protect yourself against SQL Injection with a few lines of really cool code that runs six times faster than a naive implementation.

Hacker

Why Don’t Most Folks Protect Themselves?

Most programmers think that hacker attacks are rare, and defending is work, so they’ll hope they get lucky. Sometimes I think they’re just burying their heads in the sand. The problem is, they aren’t thinking about how devastating one lucky hacker attack can be. A hacker attack can easily put a company out of business; just think what would happen if all your credit card info was stolen, or your database backups were destroyed!

Firewall
Firewalls are imperfect, just like every other human invention! Last week, Company X was attacked and their firewall did not save them!

The other mistake they make is assuming they are protected by technology X; usually they put all their trust in their firewall or https. Sorry, hackers have defeated those technologies plenty of times. Furthermore, most attacks from from disgruntled former employees or business associates. They already know your passwords, so they use one weak app to get through the fire wall, then exploit their loop hole to attack other apps. Bottom line: the best security is to never assume any technology is foolproof, always use defense-in-depth.

I’ll guarantee that the latest company who got hacked had that attitude; i.e. the guys who screwed-up assumed that they didn’t need to use precautions because technology X was enough.I hope they like sleeping in their cars! I know you don’t like sleeping in your car, so spend some time to prevent a catastrophe.

My Defense Task: Prevent SQL Injection

SQL Injection is when an attacker submits SQL in a string that you use to run as a query. Generally you are vulnerable when you build an ad hoc query and run it against your database. For example, if you append the contents of a textbox to your query, the attacker can turn part of your intended query into a comment and run his query after a truncated version of your query runs. Note: any string from the outside world is a potential threat, even XML from your corporate partners. They could have disgruntled employees, or they could be hacked, or someone could spoof their transmission.

Normally, you can protect yourself by using parameterized queries. My problem: our ERP controls access to our database, and I can’t force it to use parameterized queries. Solution: scrub all input to sanitize SQL key words. For example, if the user input contains the words ‘drop table foo’, you can inject a space, or other character, into either ‘drop’ or ‘table’. In other words, substitute ‘dr op table’ whenever you find ‘drop table’. That way, legitimate  text is still readable, but your database won’t run the query.

Screen shot showing SQL scrubbing
The top text has potentially dangerous SQL commands in it; the bottom shows the results of scrubbing SQL key words. Notice how the dangerous phrase ‘delete from foo’ has been replaced with ‘d elete from foo’. A human can still read the text, but your database will not execute this command.

Scrub Efficiently with Regular Expressions!

Your naive coworker will conduct six different search-and-replace operations. I assert that this is six times too slow. Instead, be smart, use regular expressions: then you can only make one pass through your text!

Naive Approach:

public string SlowScrub(string cleanMe) {
  string result = cleanMe.Replace("Drop Table", "D rop Table")
                         .Replace("Select ", "S elect ")
                         .Replace("Update ", "U pdate ")
                         .Replace("Delete ", "D elete ")
                         .Replace("Insert ", "I nsert")
                         .Replace("Truncate Table ", "T runcate Table ");
  return result;
}

There are three problems with the code above:

  1. It makes 6 passes through your string, that slows-down your app
  2. It is case sensitive (and the Replace method won’t allow you to change that problem)
  3. It won’t catch Truncate Table or Drop Table when split across two lines

Regular Expressions to the Rescue

Here’s how I used a regular expression to make 1 pass through my string. The regular expression is a pattern to replace in the string. I also used a lambda (anonymous function) that .NET executes on every match it finds – how cool is that!

public string ScrubSql(string cleanMe) {
  string pattern = @"(          #Start our capture group
            (?:drop\stable)     #Non-capturing group 1; the word 'drop', space, then 'table'
            |                   #or
            (?:select\s)        #Non-capturing group 2: select' followed by whitespace
            |                   #Or
            (?:update\s)        #Non-capturing group 3; update' followed by whitespace
            |                   #or
            (?:delete\s)        #Non-capturing group 4; 'delete' followed by whitespace
            |                   #or
            (?:insert\s)        #Non-capturing group; the word 'insert' followed by whitespace
            |
            (?:truncate\stable) #'truncate table'
          )";

  string result = Regex.Replace(cleanMe, pattern, (m) => {
                    //This anonymous function is run on every match
                    string danger = m.Groups[1].Value.ToLower();
                    return danger.StartsWith("dr") ? "d rop t able "
                         : danger.StartsWith("s") ? "s elect "
                         : danger.StartsWith("u") ? "u pdate "
                         : danger.StartsWith("de") ? "d elete "
                         : "i nsert";
		 },
		RegexOptions.IgnoreCase | RegexOptions.IgnorePatternWhitespace 
		| RegexOptions.Multiline | RegexOptions.Singleline);
  
  return result;
}

Notes:

  • I include comments in my regular expression; that helps make it easier to maintain. # is the comment char
  • This only works if I also specify ‘IgnorePatternWhitespace’ and ‘MultiLine’
  • My Search Pattern consists of six different choices, the same choices I showed you in my naive implementation.
  • The pipe character | means ‘or’ when used in a regular expression; it separates the for choices
  • Note that I supply a function (a ‘MatchEvaluator’) that .NET runs for every match it finds
  • My MatchEvaluator serves to identify which dangerous text it identified in the input string
  • And then it returns an appropriate sanitized string
  • Note that I also specified ‘IgnoreCase’
  • Also: the RegexOption ‘SingleLine’ is a bit confusing. It basically means “match your text as if it were a single line”, with the effect that, when your pattern seeks 2 words, it will find them even if on separate lines.

Bottom Line

You can easily plug-in my little function to your code. Every string you get from the outside world, use my funciton to scrub that string. The job you save may be your own!

Download the Code!

Download sample

NPR Dancing Puzzle, December 1st, 2013

Posted on Updated on

Challenge: Name a dance. Change one of the letters to a U. The resulting letters can be rearranged to name an event at which this dance is done. What is it?

Although similar to the last several NPR puzzles, this one was a bit tough! Not for coding reasons, but because I couldn’t find a list of social events. I had to go to online thesauri repeatedly until I found the correct event name.

 

Hula (dance) --> Luau
Screen shot shows the solution to the puzzle.

The heart of the solution is to

  1. Build a dictionary of events, again using the sorted event name as the key and the original event name as the value
  2. Iterate the dance list
  3. For each dance name, form a candidate by
    1. Iterating the letters
    2. Replacing each letter with u
    3. Sorting the resulting string to form a candidate
    4. And checking whether that candidate is a key in our event name dictionary

If it seems like I am going a bit fast, please take a look a this previous post, where I used the same anagram dictionary technique to find other anagrams .

I created two extension methods, ‘Left‘ and ‘Right‘, to make my sample a bit easier to read:

  • MyString.Left(i) will return all the characters left of position i
  • MyString.Right(i) will return all the character to the right of position i

For example,

  • “hula”.Left(1) returns “h”.
  • “hula”.Right(1) returns “la”

I’ll show the (very simple) code for these two extension methods shortly, in the mean time, here’s the heart of the solution. This code snippet assumes

  1. That you have a file containing dance names named “DANCE_FILE”
  2. Which you have cleaned-up to exclude non-alpha characters like blanks and hyphens
  3. And that you have an event name dictionary “eventDic” similar to what I described above
//Open the dance file and process each dance name (one per line)
using (StreamReader sr = File.OpenText(DANCE_FILE)) {
    while (sr.Peek() != -1) {
        string aDance = sr.ReadLine().Trim().ToLower();

        //Iterate the dance name chracters, injecting u at every 
        //position, then sorting the result, to form a candidate:
        for (int i = 0; i < danceName.Length -1; i++) {
            //Combine the left side + u + right side and sort the result:
            string candidate = (danceName.Left(i) + "u" + danceName.Right(i))
                                .Select(c => c)
                                .OrderBy(c => c)
                                .Aggregate("", (t, c) => t + c);

            //If the candidate's sorted letters match the sorted letters of
            //an event name, that means they are anagrams!
            if (eventDic.ContainsKey(candidate)) {
                txtAnswer.Text += string.Format("Dance name: '{0}'; " 
                                    + "Event name: '{1}'\n", 
                                    aDance, eventDic[candidate]);
            }
        }
    }
}

If you’re gagging on the funky syntax (“.Select”, “.Aggregate”, etc.),  be advised that I skimmed over this because I explained it in a previous post and didn’t want to bore repeat visitors. These are Lambda Expressions, and I explain their usage in depth in this post, except that there I used Lambda Expressions instead of Linq syntax.

I promised to show my extension class, so here it is:

public static class StringExtensions {
    public static string Left(this string str, int rightPos) {
        if (rightPos <= 0)
            return "";
        else
            return str.Substring(0, rightPos);
    }

    public static string Right(this string str, int rightPos) {
        if (rightPos < str.Length - 1)
            return str.Substring(rightPos + 1);
        else
            return "";

    }
}

There is nothing in here too difficult; the point is that .NET allows you to write your own extension methods which behave as if they were part of the .NET string class. A simple class like this can help make your code more readable. Note the class is static, and also note the use of ‘this string str‘ as a parameter. That’s all it takes to make an extension method!

I used one more interesting bit of coding while getting a list of dance names. If you’ve read my previous puzzle articles, you know that I use regular expressions to get lists from web pages, frequently from Wikipedia. In this case, I got my list of dances from here. If you go to that link, you will see that the main list is preceded by a bunch of other links, which happen to match the same pattern as the dance names! In order to deal with this, I modified my regular expression to exclude any hits whose ‘payload’ starts with ‘List’. To do so, I used the Regex feature ‘Negative Lookahead‘, which was fairly new to me.

Here’s my documented regex to capture dance names from that page (negative lookahead highlighted):


<li><a                  #Literal match
\s+                     #Whitespace, one or more
href=\"/wiki/           #Literal match
[^>]+                   #Anything except >, one or more
>                       #Literal match
(                       #Start of our group
(?!List)                #Exclude matches starting with List
[^<]+                   #Anything except <
)                       #end of group
</a>                    #Literal match

In case you are totally baffled,

  • The regex will take raw HTML as input
  • That is, the HTML from Wikipedia
  • Which looks like this:
    <li><a href="/wiki/Hula" title="Hula">Hula</a></li>
  • This regular expression contains a capture group (starts on line 6)
  • Basically, it looks for
    • “<li><a” (line 1)
    • followed by space (line 2)
    • followed by href=\”/wiki/ (line 3)
    • followed by anything except a ‘>’  (this is a “negated character class“) (line 4)
    • Followed by > (line 5)
    • The parenthesis on line 6 starts a capture group
    • The negative lookahead (line 7) rejects any text starting with ‘article’
    • On line 8 we specify the text we will capture, namely: anything except <
    • Line 9 we close the capture group with a close-parenthesis
    • And specify that it must be followed by </a> (line 10)

I have listed the entire code below (138 lines, it took way longer for me to explain it here!)


using System;
using System.Collections.Generic;
using System.Linq;
using System.Text;
using System.Windows;
using System.Text.RegularExpressions;
using System.Net;                           //For WebClient
using System.IO;

namespace Dec_1_2013 {
    /// <summary>
    /// Solves the puzzle by responding to each button click form the main page
    /// </summary>
    /// <remarks>
    /// Has code to fetch dance lists from two web pages and 
    /// then use those dance lists to /// solve the puzzle
    /// </remarks>
    public partial class MainWindow : Window {
        private const string DANCE_FILE = "DanceList.txt";
        public MainWindow() {
            InitializeComponent();
        }

        /// <summary>
        /// Fires when user clicks the button to get the dance list from the web
        /// </summary>
        /// <param name="sender">The button</param>
        /// <param name="e">Event args</param>
        /// <remarks>
        /// Goes the the URL in the textbox, gets the page content, extracts matches 
        /// with the regex, saves to file
        /// </remarks>
        private void btnGetDances_Click(object sender, RoutedEventArgs e) {
            string pattern = @"
                            <li><a			    #Literal match
                            \s+				    #Whitespace, one or more
                            href=\""/wiki/	    #Literal match
                            [^>]+			    #Anything except >, one or more
                            >				    #Literal match
                            (				    #Start of our group
                            (?!List)		    #Exclude matches starting with List
                            [^<]+			    #Anything except <
                            )				    #end of group
                            </a>			    #Literal match
                            ";
            Regex reDance = new Regex(pattern
                    , RegexOptions.IgnorePatternWhitespace | RegexOptions.Multiline);

            //Go to the web page, get the text, extract 
            //all the matches using the regex, save to file
            using (WebClient wc = new WebClient()) {
                using (StreamReader sr = new StreamReader(wc.OpenRead(txtDanceURL.Text))) {
                    string allText = sr.ReadToEnd();

                    //There exist a number of false matches after the last dance on 
                    //the page (marked by the text 'see also'), find the position now
                    int stopPos = allText.IndexOf("See also");
                    using (StreamWriter sw = File.CreateText(DANCE_FILE)) {
                        Match m = reDance.Match(allText);

                        //keep looping so long as we have more matches and have 
                        //not exceeded the last dance position
                        while (m.Success && m.Index < stopPos) {
                            sw.WriteLine(m.Groups[1].Value);
                            m = m.NextMatch();
                        }
                    }
                }                
            }
            MessageBox.Show("Done", "Dances Captured");
        }

        /// <summary>
        /// Fired when user clicks the button to solve the puzzle
        /// </summary>
        /// <param name="sender">The button</param>
        /// <param name="e">Event args</param>
        /// <remarks>
        /// Load the dance list from file into a dictionary
        ///     - Each dictionary key is the sorted dance name
        ///     - Each dictionary value is the UNsorted dance name, the original value
        /// 
        /// Loop through every social event in the text box
        ///      - Try replacing each letter in the event name with 'u'
        ///      - Sort the result and check for a match in the dictionary
        ///      - If we have a hit, display in the textbox
        /// </remarks>
        private void btnSolve_Click(object sender, RoutedEventArgs e) {
            txtAnswer.Clear();

            //Snag the events from the textbox, put them into an array after splitting on comma char
            string[] events = txtEventNames.Text.ToLower().Split(',');
            Dictionary<string, string> eventDic = new Dictionary<string, string>();
            foreach (string evt in events) {
                //Remove non-alpha letters, sort, recombine into a new string
                string sortedEvent = evt.Where(ev => ev>= 'a' && ev <= 'z')
                                     .OrderBy(ev => ev)
                                     .Aggregate("", (t, c) => t + c);

                //store in the dictionary
                if (eventDic.ContainsKey(sortedEvent))
                    eventDic[sortedEvent] += ", " + evt;
                else
                    eventDic.Add(sortedEvent, evt.Trim());
            }

            //Now open the dance file and process each dance name (one per line)
            using (StreamReader sr = File.OpenText(DANCE_FILE)) {
                while (sr.Peek() != -1) {
                    string aDance = sr.ReadLine().Trim().ToLower();
                    //Remove non-alpha characters
                    string danceName = (aDance.Where(d => d >= 'a' && d <= 'z'))
                                                .Aggregate("", (t, c) => t + c);

                    //Iterate the dance name chracters, injecting u at every 
                    //position, to form a candidate
                    for (int i = 0; i < danceName.Length -1; i++) {
                        //Combine the left side + u + right side and sort the result:
                        string candidate = (danceName.Left(i) + "u" + danceName.Right(i))
                                            .Select(c => c)
                                            .OrderBy(c => c)
                                            .Aggregate("", (t, c) => t + c);

                        //If the candidate's sorted letters match the sorted letters of
                        //an event name, that means they are anagrams!
                        if (eventDic.ContainsKey(candidate)) {
                            txtAnswer.Text += string.Format("Dance name: '{0}'; "
                                                + "Event name: '{1}'\n", 
                                                aDance, eventDic[candidate]);
                        }
                    }
                }
            }
            MessageBox.Show("Done", "Matching Complete", MessageBoxButton.OK, 
                         MessageBoxImage.Exclamation);
        }

    }
}

And here is the XAML:


<Window x:Class="Dec_1_2013.MainWindow"
        xmlns="http://schemas.microsoft.com/winfx/2006/xaml/presentation"
        xmlns:x="http://schemas.microsoft.com/winfx/2006/xaml"
        WindowStartupLocation="CenterScreen"
        Title="Dec 1st, 2013" Height="460" Width="555">
    <Grid>
        <Grid.RowDefinitions>
            <RowDefinition Height="auto" />
            <RowDefinition Height="auto" />
            <RowDefinition Height="auto" />
            <RowDefinition />
            <RowDefinition Height="auto" />
        </Grid.RowDefinitions>
        <Grid.ColumnDefinitions>
            <ColumnDefinition Width="auto" />
            <ColumnDefinition/>
            <ColumnDefinition  Width="auto" />
        </Grid.ColumnDefinitions>

        <Border Grid.ColumnSpan="3">
            <Border.Background>
                <LinearGradientBrush EndPoint="0.5,1" StartPoint="0.5,0">
                    <GradientStop Color="DarkRed" Offset="0" />
                    <GradientStop Color="Crimson" Offset="1" />
                </LinearGradientBrush>
            </Border.Background>
            <TextBlock HorizontalAlignment="Center" FontSize="36" 
                       Foreground="AntiqueWhite" 
                       Text="Dance Name to Event Name">
                <TextBlock.Effect>
                    <DropShadowEffect BlurRadius="4" Color="LightSalmon" 
                       Opacity="0.5" />
                </TextBlock.Effect>
            </TextBlock>
        </Border>

        <TextBlock Grid.Row="1" Grid.ColumnSpan="3" TextWrapping="Wrap" 
                   FontSize="14">
            <Bold>Next week's challenge: </Bold>
            Name a dance. Change one of the letters to a U. The 
            resulting letters can be rearranged to name an event at 
            which this dance is done. What is it?
        </TextBlock>

        <Label Grid.Row="2" Content="Dance _URL:" 
               Target="{Binding ElementName=txtDanceURL}" />
        <TextBox Grid.Row="2" Grid.Column="1" x:Name="txtDanceURL" 
                 Text="http://en.wikipedia.org/wiki/List_of_dances" />
        <Button Grid.Row="2" Grid.Column="2" x:Name="btnGetDances" 
                Content="Get _Dance List" Height="30"
                HorizontalAlignment="Right" Margin="3" Click="btnGetDances_Click" />

        <Label Grid.Row="3" Content="_Events:" 
               Target="{Binding ElementName=txtEventNames}" />
        <TextBox Grid.Row="3" Grid.Column="1" x:Name="txtEventNames" 
                 AcceptsReturn="True" TextWrapping="WrapWithOverflow"
                 VerticalAlignment="Stretch">
                social function, fundraiser, function, debutante ball, 
                photo opportunity, jubilation, funeral, groundbreaking, 
                graduation, sacramental union, communion, masquerade, ritual, 
                inauguration, presidential inauguration, papal inauguration,
                audience, changing of the guard, courtesy call, 
                investiture, yule, house party, circumstance,  nuptials, 
                induction, Maundy, bunfight, musical, burlesque, puppet play, 
                tournament, occurrence, masquerade ball, masque, masquerade party, 
                nauch, nautch, nautch dance, duet, pas de deux, pas de quatre, 
                ritual dance, ritual dancing, nude dancing, beaux arts ball,
                buffet, housewarming party, housewarming, costume party, 
                hurricane party, June Event, Open house, houseparty, 
                Symposium, Musical Occasion, Cultural festivals, Burning Man,
                Hula Festival, Saturnalia, Fat Tuesday, blowout, Ludi Saeculares,
                secular games, perambulation, church festival, luau, slumber party,
                reunion, banquet, musical soiree, soiree musicale
        </TextBox>
        <Button Grid.Row="4" Grid.Column="2" Content="_Solve" Height="30" 
                Margin="3" HorizontalAlignment="Right"
                VerticalAlignment="Top" Click="btnSolve_Click" x:Name="btnSolve"
                Width="{Binding ElementName=btnGetDances, Path=ActualWidth}" />

        <Label Grid.Row="4" Content="_Answer:" 
               Target="{Binding ElementName=txtAnswer}" />
        <TextBox Grid.Row="4" Grid.Column="1" x:Name="txtAnswer" />
    </Grid>
</Window>

NPR Puzzle Challenge – Solved in Code

Posted on Updated on

I enjoy listening to the NPR Sunday Puzzle (shouting-out the answers as I listen!), and it is even more fun to solve in code. Here’s my solution to the latest challenge, written in C# as a WPF application. You may enjoy it if you like code, want to learn new coding techniques, or just like puzzles

Laying-out the Puzzle Challenge

Next week’s challenge: This week’s challenge comes from listener Steve Baggish of Arlington, Mass. Think of a word meaning “quarrel” in which several of the letters appear more than once. Remove exactly two occurrences of every repeated letter, and the remaining letters can be rearranged to spell a new word meaning “quarrel.” What are the two words?

Link to the NPR challenge

General Approach

This puzzle is basically an anagram combined with string manipulation. The input is pretty easy, you can get synonyms for ‘quarrel’ from Google and a couple other places.

Checking for Anagrams in Code

There is an easy way to solve anagrams and a hard way. The hard way is probably the first way that comes to mind for most of us, namely

  1. Start with word 1, generate all possible combinations of the letters in the word
  2. Do the same thing for word 2
  3. Compare every combination from step 1 against step 2

Not only is that hard, but it is computationally expensive (generating word combinations is an O(n-squared) operation, not that we care for such a short-list).

The easy way is:

  1. Sort the letters in word 1
  2. Sort the letters in word 2
  3. Compare the result, if they are the same, word1 is an anagram of word2!

Solving Anagrams for Groups of Words

For this puzzle we need to determine if a any two of a number of words are anagrams. We can make it easy by modifying the approach above, instead of sorting each word just before comparing it to another word,

  1. Sort each word once
  2. Store the result in a dictionary
  3. Using the sorted letters as the entry’s key
  4. And the original word as the entry’s value 

Then, if I have an anagram candidate, I can look-it up in the dictionary, if the key is present, I know I have two matched-anagrams and can inspect the value to see what the other word is.

Sample Code Illustrates An Anagram Dictionary

//Declare a dictionary object that uses strings for keys and strings for values:
Dictionary<string, string> anagramDic = new Dictionary<string, string>();
 //add an entry into the dictionary using the sorted letters (act) as the key and the original word as the value
 anagramDic.Add("act", "cat");

 //we have another word that might be an anagram, 'tac'
 //we can check whether the word exists by sorting its letters to get 'act' and then checking whether it
 //already exists in the dictionary
 if (anagramDic.ContainsKey("act")) {
     //We can fetch the value using the key as an indexer
     string anagramPair = anagramDic["act"];
     Console.WriteLine(anagramPair + " is an anagram of tac");
 }

Side Note About Dictionaries

I love 🙂 dictionaries because they use hashing technology. That makes them ultra-efficient, and as you can see above, they are also quite easy to use.

One More Coding Technique

The code below uses a bit of Linq usage that is explained in the comments, but I will discuss it here for emphasis. The lines of code take a string variable named ‘aWord’ and sort the letters, storing the result into a new string called ‘sorted’:

string sorted = (from l in aWord
                 where l >= 'a' && l <= 'z'
                 orderby l
                 select l
                ).Aggregate("", (r, b) => r + b);
  • ‘aWord’ is a string such as ‘argument’, we select chars from it when we use the syntax “from l in aWord’
  • ‘l’ is an anonymous variable that has type char (.NET knows this so we do not need to explicitly declare it)
  • ‘orderby l’ has the effect of sorting the letters in aWord
  • Into an IEnumerable<char>, basically as list of chars
  • Since we don’t want a list, we use the ‘Aggregate‘ operator to concatenate the results into a string (because the + operator creates strings from chars)
    • The aggregate method used above takes two inputs:
      • the seed (in my casen an empty string “”)
      • And an anonymous function to perform the aggregation, namely “(r, b) => r + b)
        • The anonymous function takes two anonymous inputs, the current aggregated value (represented by r)
        • and the current value being included into the aggregate (represented by b)
        • The operation to be performed  is r + b, in other words, take the current aggregate value r and append the next value, b

Code to Solve the Puzzle

UI for my code to solve Will's puzzle
UI for my code to solve Will’s puzzle

I solved the puzzle in about 50 lines of code, then I added another 55 lines of documentation to it just to explain what is going on.

The full code is listed below, it uses a simple form with

  1. A TextBox ‘txtSynonyms’ containing synonyms to the word ‘quarrel’
  2. A button labeled ‘Solve’ whose click-event does all the work
  3. Another Textbox ‘txtAnswer’ to display the result it
using System;
using System.Collections.Generic;
using System.Linq;
using System.Windows;
using System.Windows.Controls;

namespace Nov17_2013 {
    /// <summary>Solves the puzzle using the synonyms in the textbox, 
	/// displaying the answer in the second textbox</summary>
    public partial class MainWindow : Window {
        public MainWindow() {
            InitializeComponent();
        }

        /// <summary>
        /// Fires when user clicks the button to solve the puzzle
        /// </summary>
        /// <param name="sender">The button</param>
        /// <param name="e">Event args</param>
        /// <remarks>
        /// This solution relies on a simple technique to determine if two words are anagrams:
        ///     - Sort the letters of both words
        ///     - If their results are the same, they obviously are anagrams
        ///  
        /// Actually, I take it a bit further by sorting all the candidate words and storing them
        /// in a dictionary. The dictionary allows us to store a 'payload' (value) for each key
        /// we create. I use the sorted letters as the key and the original word as 
		/// the value, like this:
        /// 
        ///     Key                 Value
        ///     ---                 -----
        ///     aegmnrtu            argument
        ///     adeeegimnrst        disagreement
        ///     fghit               fight
        ///     achls               clash
        ///     defu                feud
        ///     addegiimnnnrsstu    misunderstanding
        /// 
        /// After constructing the dictionary I identify repeated letters in all the candidates,
        /// remove two of each, and check to see if the resulting word exists in the dictionary.
        /// If it does, the dictionary value is the anagram we seek!
        /// </remarks>
        private void btnSolve_Click(object sender, RoutedEventArgs e) {
            txtAnswer.Clear();
            Dictionary<string, string> sortedLettersDic = new Dictionary<string, string>();

            //The textbox contains words separated by commas, when I split them 
	    //I get an array with one entry per word
            string[] synonyms = txtSynonyms.Text.Split(',');

            //Process each synonym and put it into the dictionary.
            foreach (string aWord in synonyms) {
                //The "where-clause" cleans-up the candidate, removing blanks and hyphens
                //"orderby" sorts the letters
                //"Aggregate" concatenates the sorted chars into a string
                string sorted = (from l in aWord
                                 where l >= 'a' && l <= 'z'
                                 orderby l
                                 select l
                                ).Aggregate("", (r, b) => r + b);

                if (!sortedLettersDic.ContainsKey(sorted))
                    sortedLettersDic.Add(sorted, aWord);
            }

            //Now examine every dictionary entry, first building a list of repeated letters,
            //then using it to construct a potential anagram by removing two 
	    //of each repeated letter from the original
            foreach (KeyValuePair<string, string> kvp in sortedLettersDic) {
                List<char> repeatedLetters = new List<char>();
                for (int ndx = 1; ndx < kvp.Key.Length; ndx++) {
                    //examine the letter immediately prior to current letter
                    if (kvp.Key[ndx-1] == kvp.Key[ndx] 
                        && !repeatedLetters.Contains(kvp.Key[ndx])) {
                               repeatedLetters.Add(kvp.Key[ndx]);
                    }
                }

                if (repeatedLetters.Count > 0) {
                    //It is easier to remove letters from a List than a string, 
					//to put the letters into one:
                    //Also it is more efficient too, since strings are imutable in .NET
                    List<char> ltrs = new List<char>(kvp.Key);
                    foreach (char c in repeatedLetters) {
                        //remove exactly two of each repeated letter
                        ltrs.Remove(c);
                        ltrs.Remove(c);
                    }
                    //construct a new string by aggregating the list; use empth string for seed,
                    //and the inputs to the lambda expression are the running total (r) 
		    //and the current value (c):
                    string candidate = ltrs.Aggregate("", (r, c) => r + c);

                    //for example, if our candidate originally was 'misunderstanding', we now
                    //will have manipulated it as follows:
                    //step 1, sort:                     addegiimnnnrsstu
                    //step2, remove 2 repeated letters: aegmnrtu
                    //next step -                       is 'aegmnrtu' a key in the dictionary?

                    //Finally! If our candidate is in the dictionary, it means we have 
                    //constructed anagram of on of the other synonyms and solved the puzzle
                    if (sortedLettersDic.ContainsKey(candidate)) {
                        txtAnswer.Text += sortedLettersDic[candidate] + " - " + kvp.Value + " ";
                    }
                }
            }
        }
    }
}


In case you’re curious, I have also saved the XAML markup below so you can run the app. You don’t need any new references; I built it with VS 2010 but it should work in VS 2008, 2012 and 2013.

<Window x:Class="Nov17_2013.MainWindow"
        xmlns="http://schemas.microsoft.com/winfx/2006/xaml/presentation"
        xmlns:x="http://schemas.microsoft.com/winfx/2006/xaml"
        WindowStartupLocation="CenterScreen"
        FocusManager.FocusedElement="{Binding ElementName=btnSolve}"
        Title="Puzzle Nov 17, 2013" Height="430" Width="525">
    <Grid>
        <Grid.RowDefinitions>
            <RowDefinition Height="auto" />
            <RowDefinition Height="auto" />
            <RowDefinition Height="auto" />
            <RowDefinition />
            <RowDefinition Height="auto" />
        </Grid.RowDefinitions>
        <Grid.ColumnDefinitions>
            <ColumnDefinition Width="auto" />
            <ColumnDefinition />
            <ColumnDefinition Width="auto" />
        </Grid.ColumnDefinitions>
        <Border Grid.ColumnSpan="3" >
            <Border.Background>
                <LinearGradientBrush EndPoint="0.5,1" StartPoint="0.5,0">
                    <GradientStop Color="DarkRed" Offset="0" />
                    <GradientStop Color="Crimson" Offset="1" />
                </LinearGradientBrush>
            </Border.Background>
            <TextBlock HorizontalAlignment="Center" FontSize="36" Text="Quarrel Manipulation Puzzle" Foreground="Ivory">
                <TextBlock.Effect>
                    <DropShadowEffect Color="MistyRose" BlurRadius="5" Opacity="0.35" />
                </TextBlock.Effect>
            </TextBlock>
        </Border>

        <!-- Row 1 -->
        <TextBlock Grid.Row="1" Grid.ColumnSpan="3" FontSize="14" TextWrapping="Wrap" Margin="3" >
             This week's challenge comes from listener Steve Baggish of Arlington, Mass. 
            Think of a word meaning "quarrel" in which several of the letters appear more than once. 
            Remove exactly two occurrences of every repeated letter, and the remaining letters can 
            be rearranged to spell a new word meaning "quarrel."
            <LineBreak />
            <LineBreak />
            What are the two words?
        </TextBlock>

        <!-- Row 2 -->
        <Label Grid.Row="2" Content="Puzzle _URL:" VerticalAlignment="Center" FontWeight="SemiBold" 
               Target="{Binding ElementName=tblPuzleUrl}" />
        <TextBlock Grid.Row="2" Grid.Column="1" Grid.ColumnSpan="2" x:Name="tblPuzleUrl" VerticalAlignment="Center" 
                   Text="http://www.npr.org/2013/11/17/245761408/more-fun-than-a-dead-rose" />

        <!-- Row 3 -->
        <Label Grid.Row="3" Content="Quarrel Synonyms:" FontWeight="SemiBold" Target="{Binding ElementName=txtSynonyms}" />
        <TextBox Grid.Row="3" Grid.Column="1" x:Name="txtSynonyms" TextWrapping="Wrap" >
                argument, disagreement, fight, clash, feud, contretemps, war of words, shouting match, informaltiff, 
            blowup, altercation, bickering, brawl, controversy, difference, difference of opinion, discord, dispute, 
            dissension, disturbance, shitstorm, falling-out, fracas, misunderstanding, row, ruckus, run-in, spat, 
            squabble, strife, struggle, tiff, tumult, vendetta, wrangle, affray, beef, breach, broil, combat, 
            commotion, complaint, contention, difficulty, disapproval, disputation, dissidence, dust, fisticuffs, 
            fray, fuss, hassle, objection, rhubarb, scrap, set-to, battle royal, brannigan
        </TextBox>
        <Button Grid.Row="3" Grid.Column="2" Content="_Solve" x:Name="btnSolve" VerticalAlignment="Top" 
                Height="30" Margin="3" Click="btnSolve_Click" />

        <!-- Row 4 -->
        <Label Grid.Row="4" Content="_Answer:" FontWeight="SemiBold" Target="{Binding ElementName=txtAnswer}" />
        <TextBox Grid.Row="4" Grid.Column="1" x:Name="txtAnswer" />
    </Grid>
</Window>