String Animation

Sunday Puzzle: Find a Word With a ‘T’ Pronounced as ‘ZH’

A tough one from listener Joe Becker, of Palo Alto, Calif. The “zh” sound can be spelled in many different ways in English — like the “s” in MEASURE; like the “g” in BEIGE; like the “z” in AZURE; like the “j” in MAHARAJAH; and like the “x” in LUXURY as some people pronounce it. The “zh” sound can also be spelled as a “t” in one instance. We know of only one common word this is true of, not counting its derivatives. What word is it?

Link to the challenge

Synopsis

Once again, a sweet, easy one 😁. Easy because I happen to have the magic key to solving it: a phoneme dictionary! Carnegie Mellon University has released the CMU Pronouncing Dictionary. Because I have that dictionary, I solved the puzzle with just 49 lines of code, and most of those lines were infrastructure.

Definition

phoneme fō′nēm″ noun

  1. The smallest phonetic unit in a language that is capable of conveying a distinction in meaning, as the m of mat and the b of bat in English.

Their Dictionary is a File that Looks Like This

The image above represents a snapshot of the dictionary file showing how each word has its own list of phonemes.

To solve the puzzle, we simply read this file and seek a word having a ‘ZH’ phoneme in the same position as a ‘T’ in the original word. Sounds simple? It basically is.

But there was a tricky part that I could only avoid because I got lucky. I’ll discuss it below!

While examining my code below, see if you can spot the assumption my code makes before I explain it.

I elected to use WPF (Windows Presentation Foundation, a .NET desktop UI framework) as the environment to solve the puzzle. Rationale: I need to process a file, so web solutions are out. WPF looks nice, and .NET has a powerful set of string manipulation functions that I already know.

Screen shot showing my solution

Techniques Used

Video Explaining the Code

Sometimes it’s more fun to watch a video than to read through code, so this time I made a video explaining it; here’s a link! It’s just 13 minutes long, and by watching, you may learn a couple of tricks you can use in the debugger.

Here’s the Code!

Since the algorithm is so simple, I will just list my code. Note that numbered comments in the code correspond to explanations listed below.

private void btnSolve_Click(object sender, RoutedEventArgs e) {
  Mouse.OverrideCursor = Cursors.Wait;
  txtResults.Clear();

  // 1) Open the file for reading
  using (var sr = File.OpenText(DICT_PATH)) {
    string aLine = "";
    // 2) Skip the top of the file, looking for a line
    // which starts with a letter, which represents the file payload
    while (sr.Peek() != -1) {
      aLine = sr.ReadLine() ?? "";
      if (aLine.Length > 0 && char.IsLetter(aLine[0]))
        break;
    }

    while(true) {
      // 3) Every word is separated from its phonemes by a double space
      var p = aLine.IndexOf("  ");
      if (p > 0) {
        // 4) Every phoneme is separated by a space, build a list of them
        var phonemes = aLine[(p + 2)..].Split(' ').ToList();
        
        // 5) Find the index (if any) of the phoneme represented by 'ZH'
        var zhPos = phonemes.IndexOf("ZH");
        if (zhPos >= 0) {
          // 6) Grab the word and strip off anything that looks like '(2)'
          var aWord = Regex.Replace(aLine[..p], @"\(\d\)", "");
          // 7) Check if the index of 'T' is the same as the index of 'ZH'
          if (zhPos < aWord.Length && aWord[zhPos] == 'T') {
            txtResults.Text += $"Word: {aWord}, Phonemes: {aLine[(p + 2)..]}\n";
          }
        }
      }

      if (sr.Peek() == -1)
        break;
      aLine = sr.ReadLine() ?? "";
    }
  }

  Mouse.OverrideCursor = null;
}

Comments Explained

  1. Open the file for reading. ‘DICT_PATH’ is a constant, defined above, holding the file path of the dictionary, such as “c:\cmudict.0.7a”. The variable sr is a StreamReader that can read a line of text.
  2. Skip the top of the file, looking for a line which starts with a letter, which represents the file payload. Note that the first 121 lines of the file are comments and explanations, not part of the dictionary per se. Those lines start with a space or punctuation mark. Note that
    • aLine[0] represents the first character of the line (data type char)
    • char.IsLetter is a built-in method that returns true for letters and false otherwise
  3. Every word is separated from its phonemes by a double space. For example, EQUATION  IH0 K W EY1 ZH AH0 N. It might be hard to see, but there are two spaces after ‘EQUATION’. The variable p represents the position of the first of those two spaces. Now we know where to find the end of the word and the beginning of the phoneme list.
  4. Every phoneme is separated by a space, build a list of them. There is a lot happening in this short line, let’s examine each carefully:
    • aLine[(p + 2)..] This code grabs a substring out of the original line: everything from the 2nd character after p to the end of the line. If the double space starts at position 10, then (p + 2), i.e. 12, is the starting position of everything after the double space.
    • .Split(' '). This built-in method builds an array out of the substring I gave it, splitting on the space character. Every time a space is encountered, we start a new entry in the array. If the input to split is “IH0 K W EY1 ZH AH0 N”, then the array will look like [“IH0”, “K”, “W”, “EY1”, “ZH”, “AH0”, “N”]. Meaning that the first entry is “IH0” and the last is “N”.
    • ToList(). This converts the array to a list. Why? Arrays and Lists are very similar, but they don’t have the same methods available to them. In particular, I want to call IndexOf as an instance method, which a List offers and an array does not (the static Array.IndexOf method would work too).
  5. Find the index (if any) of the phoneme represented by ‘ZH’. For our example array, [“IH0”, “K”, “W”, “EY1”, “ZH”, “AH0”, “N”], the index will be 4, meaning we can access that entry with the line of code phonemes[4]. If the list doesn’t have a ZH, then zhPos will be assigned -1.
  6. Grab the word and strip off anything that looks like ‘(2)’. Once again, there is a lot going on with this line:
    • aLine[..p]. Remember, p is the position where we found the double space. This code fragment grabs a substring starting at the beginning of the line, up to position p.
    • Regex.Replace. Regular expressions are a handy, but tricky, way to match patterns in strings. If you download and examine the CMU dictionary, you will see some lines that look like the following:
      • ERNESTS  ER1 N AH0 S T S
      • ERNESTS(2)  ER1 N AH0 S S
      • ERNESTS(3)  ER1 N AH0 S
    • Note the two entries “(2)” and “(3)”. We want to replace these with an empty string, i.e., delete them. Regex.Replace will do that if we give it a string to start with, a pattern to match, and a replacement string (in our case, an empty string).
    • @"\(\d\)" This is the pattern we will look for
      • @ in front of the string makes it a verbatim string, so the \ characters reach the regex as-is instead of being treated as C# escape sequences
      • \( represents an open parenthesis. Because ( is a special character, we need to preface it with \ so that .NET will look for that character in the string, instead of starting a capture group. Capture groups are an advanced topic which we don’t need to understand today. There are a few special characters which we need to preface with \ to avoid confusion; the \ is called an escape character.
      • \d This is a pattern element signifying “match any digit”, i.e., the characters 0-9
      • \) This represents the close parenthesis. As before, the backslash is used as an escape character.
    • The upshot is that, if aLine contains “ERNESTS(2) ER1 N AH0 S S”, then aWord will be assigned “ERNESTS”, because we replaced (2) with “”
  7. Check if the index of ‘T’ is the same as the index of ‘ZH’. Note that, within the string ‘EQUATION’, ‘T’ has index 4 (remember, indexes start at 0, not at 1). Similarly, ‘ZH’ also has index 4 within the list [“IH0”, “K”, “W”, “EY1”, “ZH”, “AH0”, “N”]. Since the index is 4 in both, we are pretty confident that T is pronounced ‘ZH’ in the word equation.
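
To see steps 3 through 7 act on a single line, here is a minimal sketch that can be run on its own as a C# top-level program. The helper name HasTAtZhIndex is mine, not part of the original code, and I use the static Array.IndexOf in place of the ToList() call, so treat it as an illustration rather than the exact code from the solution.

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical helper (not in the original code): true when the letter at
// the index of the first "ZH" phoneme is 'T'.
static bool HasTAtZhIndex(string line)
{
    // Step 3: the word ends where the double space begins.
    int p = line.IndexOf("  ", StringComparison.Ordinal);
    if (p <= 0) return false;

    // Step 4: everything after the double space, split into phonemes.
    string[] phonemes = line[(p + 2)..].Split(' ');

    // Step 5: index of the "ZH" phoneme, or -1 when there is none.
    int zhPos = Array.IndexOf(phonemes, "ZH");
    if (zhPos < 0) return false;

    // Step 6: the word itself, with any "(2)"-style suffix removed.
    string word = Regex.Replace(line[..p], @"\(\d\)", "");

    // Step 7: same index in the word must hold a 'T'.
    return zhPos < word.Length && word[zhPos] == 'T';
}

Console.WriteLine(HasTAtZhIndex("EQUATION  IH0 K W EY1 ZH AH0 N")); // True
Console.WriteLine(HasTAtZhIndex("ERNESTS(2)  ER1 N AH0 S S"));      // False
```

The second call returns False simply because that line has no ZH phoneme at all.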

The Tricky Part Explained!

You’ve now read my code. Have you managed to guess what I meant when I said I got lucky?

Answer: My code assumes that every letter in the original word is represented by exactly one phoneme. But that isn’t actually true. Sometimes a pair of letters is used to make a single sound, like the pairs “sh”, “th”, or “ch”. I don’t know how many letter pairs are used that way, but they make things complicated when you try to determine whether the letter ‘T’ is represented by the phoneme ‘ZH’ by checking that both are at position 4.

Since I found a good solution without taking this issue into account, and since dealing with it makes my blog entry a lot harder to understand, and since the whole thing is just for fun, I elected not to deal with the issue. Once again, I leave it as an exercise to the reader.
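
To make the assumption concrete, here is a small sketch that applies the same index comparison to another word from the puzzle statement. The helper LetterAtPhonemeIndex is hypothetical, and the phoneme list I use for MEASURE (M EH1 ZH ER0) is an assumption for illustration; check it against your copy of the dictionary.

```csharp
using System;

// Hypothetical helper: the letter sitting at the same index as the first
// occurrence of `phoneme`, or null when the phoneme or the index is missing.
static char? LetterAtPhonemeIndex(string word, string[] phonemes, string phoneme)
{
    int pos = Array.IndexOf(phonemes, phoneme);
    return pos >= 0 && pos < word.Length ? word[pos] : (char?)null;
}

string[] equation = "IH0 K W EY1 ZH AH0 N".Split(' ');
string[] measure = "M EH1 ZH ER0".Split(' '); // assumed pronunciation, illustration only

// The indexes happen to line up here, so the check lands on the 'T'.
Console.WriteLine(LetterAtPhonemeIndex("EQUATION", equation, "ZH")); // T

// Here "EA" spells one vowel sound, so the check lands on 'A', not on the 'S'.
Console.WriteLine(LetterAtPhonemeIndex("MEASURE", measure, "ZH"));   // A
```

In other words, had the puzzle asked for an ‘S’ pronounced ‘ZH’, the same index test would have skipped MEASURE. The luck was that the letters and phonemes in front of the ‘T’ happen to pair up one-to-one in the answer word.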

Get the Code

You can get a copy of my code from my Dropbox account at this link. Note that you will need to create a free Dropbox account to do so. Use the link above to get the CMU phoneme dictionary, and remember to revise my code to point to the file path on your machine.

Users Ignore Instructions? Animate them!

You can imitate movie makers and get users to read text by animating it. Somehow, the movement and action encourage users to read instructions. But let’s animate the instructions only the first time the app launches; thereafter, users can revisit the instructions when they need to. Benefit to you: your users actually read the instructions the first time, so they are less confused and like your app better.

Overview

The text doesn’t dance around, but it is displayed in sequence, as if being typed. I prefer to display a whole word at a time; you can also add a single letter at a time. In this sample, I will do the following actions in sequence:

  1. Animate my instruction string using a StringAnimationUsingKeyFrames
  2. Call attention to the instructions, one last time, by gently pulsing the background color
  3. Shrink the instructions
  4. Hide the instructions by closing the expander containing them

Screen Shot
4 Screen Shots Showing Each Animation Phase

The Animation Will only Run the First Time

The first time it runs, we will try hard to get the user to read the instructions. But the animation takes time, so let them use the expander if they need to read it again.

	if (AnimatedInstructions.Properties.Settings.Default.FirstTime) {
		AnimatedInstructions.Properties.Settings.Default.FirstTime = false;
		AnimatedInstructions.Properties.Settings.Default.Save();
		SetupInstructionAnimation();
	} else {
		expInstructions.Header = "Introduction/Instructions";
		expInstructions.IsExpanded = false;
	}

We will use the .NET properties functionality to record whether this is the first time the user has run the app. We update the user’s config file to write ‘false’ to an XML setting called ‘FirstTime’. The method ‘SetupInstructionAnimation’ does the animation, as I am sure you can tell by the name!

Why am I doing this animation in code? You’ll see shortly, but basically, animating the text in XAML requires creating a lot of KeyFrames; it is easier to generate them inside a loop.

The Method ‘SetupInstructionAnimation’ Builds Three Animations

private void SetupInstructionAnimation() {
	Storyboard sb = new Storyboard();
	string instructions = "These instructions tell you exactly how to run my new app. " +
		"You will get maximum benefit from the app if you read the instructions. " +
		"Users who don't read the instructions sometimes fail to capitalize on important " +
		"features. Read and avoid frustration! ";
	StringAnimationUsingKeyFrames stringAni;
	int stringAnilMs;
	BuildInstructionStringAnimation(instructions, out stringAni, out stringAnilMs);
	tbInstructions.BeginAnimation(TextBlock.TextProperty, stringAni);

	//now, a color animation for the background that starts after the string animation completes.
	ColorAnimation bgColorAni = BuildBackgroundColorAnimation(stringAnilMs);

	sb.Children.Add(bgColorAni);

	DoubleAnimation shrinkAni = BuildShrinkInstructionAnimation(stringAnilMs);
	sb.Children.Add(shrinkAni);
	sb.Completed += new EventHandler(Introductory_Animation_StoryBoard_Completed);

	tbInstructions.Loaded += delegate(object sender, RoutedEventArgs e) {
		sb.Begin(this);
	};
}

The first method we invoke is called ‘BuildInstructionStringAnimation’, and does exactly what you would think. The other methods generate the other animations.

The String Animation

We use a StringAnimationUsingKeyFrames to make it look like we are typing the words. To set it up, we specify a “Key Frame” for every step of the animation. In our case, we will create a Key Frame for each word. As I mentioned, you can also provide a frame for every letter, but that didn’t look as nice to me.

private static void BuildInstructionStringAnimation(string instructions, 
                    out StringAnimationUsingKeyFrames stringAni, out int ms) {
    stringAni = new StringAnimationUsingKeyFrames();
    ms = 0;
    KeyTime kyTime = new KeyTime();
    int interval = 150;
    int wordIndex = 0;
    wordIndex = instructions.IndexOf(' ');
    while (wordIndex > 0) {
        string aWord = instructions.Substring(0, wordIndex);
        kyTime = TimeSpan.FromMilliseconds(ms);
        if (aWord.EndsWith("?") || aWord.EndsWith(".")) {
            ms += 1250;
        } else {
            ms += interval;
        }
        stringAni.KeyFrames.Add(new DiscreteStringKeyFrame(aWord, kyTime));

        wordIndex = instructions.IndexOf(' ', wordIndex + 1);
    }
    stringAni.Duration = TimeSpan.FromMilliseconds(ms);
}

Note that each word is displayed 150 milliseconds after the previous one; the variable ‘ms’ keeps the running total. The exception is a word that ends a sentence (one ending in a period or question mark): after it, we wait 1-1/4 seconds (1250 milliseconds) so the user can read the whole sentence. To find word boundaries in the instruction string, we search for spaces using the ‘IndexOf’ method. Then we use the Substring method to grab all of the instructions up to the current space and build a KeyFrame with it.

Note that we return ms to the caller, because we will use it for the BeginTime of the next animation.
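
If you want to check the timing without launching the WPF app, the following sketch reproduces the same loop and records plain (text, time) pairs instead of DiscreteStringKeyFrame objects. BuildSchedule is a hypothetical stand-in of mine, and the short instruction string is made up for the example.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical stand-in for the key-frame loop: returns the text shown at
// each step together with its start time in milliseconds.
static List<(string Text, int Ms)> BuildSchedule(
    string instructions, int interval = 150, int sentencePause = 1250)
{
    var frames = new List<(string Text, int Ms)>();
    int ms = 0;
    int wordIndex = instructions.IndexOf(' ');
    while (wordIndex > 0)
    {
        // Everything up to the current space, i.e. all words shown so far.
        string shown = instructions.Substring(0, wordIndex);
        frames.Add((shown, ms));

        // The pause after this step delays the *next* frame.
        bool endsSentence = shown.EndsWith(".") || shown.EndsWith("?");
        ms += endsSentence ? sentencePause : interval;

        wordIndex = instructions.IndexOf(' ', wordIndex + 1);
    }
    return frames;
}

// The trailing space matters: the last word is only emitted when a space follows it.
foreach (var (text, startMs) in BuildSchedule("Read this. Then click Go. "))
    Console.WriteLine($"{startMs,5} ms: {text}");
```

For this sample string the frames start at 0, 150, 1400, 1550 and 1700 milliseconds. The jump from 150 to 1400 is the sentence pause after “this.”.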

The Remainder of the Code

The rest of the code is fairly standard if you have done any other animation.

private ColorAnimation BuildBackgroundColorAnimation(int ms) {
    ColorAnimation bgColorAni = new ColorAnimation();
    bgColorAni.BeginTime = TimeSpan.FromMilliseconds(ms);

    bgColorAni.From = Colors.White;
    bgColorAni.To = Colors.LightYellow;
    bgColorAni.Duration = TimeSpan.FromSeconds(1);
    bgColorAni.RepeatBehavior = new RepeatBehavior(2);

    Storyboard.SetTarget(bgColorAni, tbInstructions);
    Storyboard.SetTargetProperty(bgColorAni, new PropertyPath("Background.Color"));
    bgColorAni.AutoReverse = true;
    return bgColorAni;
}

private DoubleAnimation BuildShrinkInstructionAnimation(int stringAnilMs) {
    ScaleTransform scale = new ScaleTransform(1.0, 1.0);
    tbInstructions.RenderTransformOrigin = new Point(0, 0);
    tbInstructions.RenderTransform = scale;

    DoubleAnimation shrinkAni = new DoubleAnimation(1.0, 0.35, TimeSpan.FromMilliseconds(500), FillBehavior.Stop);
    shrinkAni.BeginTime = TimeSpan.FromMilliseconds(stringAnilMs + 4000);
    Storyboard.SetTargetProperty(shrinkAni, new PropertyPath("RenderTransform.ScaleY"));
    Storyboard.SetTarget(shrinkAni, tbInstructions);
    return shrinkAni;
}

void Introductory_Animation_StoryBoard_Completed(object sender, EventArgs e) {
    expInstructions.IsExpanded = false;
    expInstructions.Header = "Show Introduction/Instructions";
}
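
Where does the 4000 in the shrink animation’s BeginTime come from? My reading is that it matches the total length of the background pulse: a 1-second animation that auto-reverses and repeats twice. The sketch below just does that arithmetic; TotalDuration is a hypothetical helper, not a WPF API, and the value for stringAniMs is an example only.

```csharp
using System;

// Hypothetical helper: total run time of an animation that may auto-reverse
// and repeats a fixed number of times.
static TimeSpan TotalDuration(TimeSpan duration, bool autoReverse, double repeatCount)
{
    // With auto-reverse, one iteration plays forward and then backward.
    double oneIterationMs = duration.TotalMilliseconds * (autoReverse ? 2 : 1);
    return TimeSpan.FromMilliseconds(oneIterationMs * repeatCount);
}

int stringAniMs = 2950; // example value only; the real one comes back from the string animation
TimeSpan pulse = TotalDuration(TimeSpan.FromSeconds(1), autoReverse: true, repeatCount: 2);

Console.WriteLine($"Pulse starts at {stringAniMs} ms and lasts {pulse.TotalMilliseconds} ms");
Console.WriteLine($"Shrink starts at {stringAniMs + 4000} ms");
```

If you change the pulse’s Duration or RepeatBehavior, you would presumably want to adjust the 4000 offset to match, so the shrink does not begin while the background is still pulsing.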

And Finally, the XAML

<Expander Grid.Row="1" Grid.ColumnSpan="3" Name="expInstructions" IsExpanded="True" >
    <TextBlock FontSize="18" Name="tbInstructions" TextWrapping="Wrap" >
        <TextBlock.Background>
            <SolidColorBrush Color="White" />
        </TextBlock.Background>
        These instructions tell you exactly how to run my new app.
        You will get maximum benefit from the app if you read the instructions.
        Users who don't read the instructions sometimes fail to capitalize on important
        features. Read and avoid frustration!
    </TextBlock>
</Expander>

Summary

Easy-to-learn applications are cheaper to maintain! They will generate fewer support calls from confused users. You can help your users learn to run your app by animating the instructions. Your boss and users will like you better for it!