NR Puzzle Feb 25, 2022
This week’s challenge comes from listener Alan Hochbaum, of Duluth, Ga. Name a famous actor β first and last names. Remove the last letter of each name. You’ll be left with an animal and an adjective that describes that animal, respectively. Who is the actor?
Link to Puzzle at NPR
Another easy puzzle, and again I elected to solve it using Python. Skills and techniques include:
- File processing
- String manipulation
- Simple Regular Expressions
- Binary search
- Loops and basic logic
Comments
I’m pretty confident in my code, and I have a lists of 16,000 actors, 5,000 adjectives, and 2,000 animals. But, I don’t have all the actors, animals and adjectives! So there could be other solutions that I missed, even though my code works satisfactorily. “Redd Foxx” is not a well known actor these days. Sorry if either is your favorite actor, but I’ve never heard of my other two actors, Cody Longo and Sean Whalen. So, come Sunday, I would not be surprised if Will says some other actor is the solution. But, as you can see, given my data, these three are legitimate solutions.
Strategy
- Load the animal file into a list. Note my file is alphabetically sorted
- Load the adjective file into another list
- Open the actor file and process each entry
- Split the name into separate words
- Using a regular expression expressing to control the separator character
- Split the name into separate words
- Take the first word – call it the first name,
- Take the last word – call it the actor’s last name
- Avoid any suffix, such as ‘Jr.’, ‘Sr.’, etc.
- Use a binary search to quickly search the animal and adjective lists
- If we get a hit for first and last names, print a candidate solution
- Take the last word – call it the actor’s last name
Here’s The Code
Step 1: Load the two Files
import re
# Open and load the animals file; note it is presorted
fName = "C:\\Users\\Randy\\Documents\\Puzzles\\Data\\CommonAnimals.txt"
animals = []
# Note: file is already sorted
fHandle = open(fName, 'rt')
for aLine in fHandle:
animals.append(aLine.strip())
# Open and load the adjectives file, also presorted
adjectives = []
# This file is sorted too
fName = "C:\\Users\\Randy\\Documents\\Puzzles\\Data\\Adjectives.txt"
fHandle = open(fName, 'rt')
for aLine in fHandle:
adjectives.append(aLine.strip())
A couple notes:
- I use a double backslash \\ in the file path because a single backslash is used to designate special characters
- I invoke the ‘strip’ method primarily to remove the linefeed
Step 2: Define the Binary Search Method
def binary_search(list1, n):
low = 0
high = len(list1) - 1
mid = 0
while low <= high:
# Compute index for our next guess, the middle of high and low
mid = (high + low) >> 1
# Check if n is present at mid
if list1[mid] < n:
low = mid + 1
# If n is greater, compare to the right of mid
elif list1[mid] > n:
high = mid - 1
# If n is smaller, compared to the left of mid
else:
return mid
# element not found, return -1
return -1
Notes
- Binary search is extremely fast (it runs in
O (log(n))
time). Meaning that the search time is proportionate to the log of the list size. For example, if the list has a million entries, binary search would find an entry in an average 6 operations (because Log(1000000) = 6). Note that it works if the list is sorted. low, mid and high
are indexes to the list. ‘mid
‘ is the middle of the current search subsection.- We examine the element at position
mid
- If it matches our target ‘n’, we found it
- But if it is smaller, recompute high (which then causes mid to be recomputed)
- Alternatively, if it is higher, recompute low (which also causes mid to be recomputed next guess)
- Bit-shift for even more speed: the following is the faster version of division by 2:
mid = (high + low) >> 1
- It does this by shifting the bits to the right one position. I do this for a minor increase in speed; note that division is by far the slowest arithmetic operator
Step 3: Process the Actor File
# Use this list to make sure we get the correct surname
suffixes = ['Jr', 'Sr', 'II', 'III']
fName = "C:\\Users\\Randy\\Documents\\Puzzles\\Data\\Actors\\Actors.txt"
# The file has some non-standard letters, so open with utf-8
fHandle = open(fName, 'rt', encoding="utf8")
for aLine in fHandle:
# By splitting the name, we can get each name part as
# a separate entry in the list 'tokens':
tokens = re.split(r'[ ,.]', aLine.strip())
first = tokens[0][:-1]
last = ''
lastIndex = len(tokens) - 1
if tokens[lastIndex] in suffixes and lastIndex >= 2:
last = tokens[lastIndex -1][:-1]
elif lastIndex >= 1:
last = tokens[lastIndex][:-1]
# At this point, the first and last names have been stripped of
# their last characters. Some names might not work
# because they have length 1,
# such as the actor 'X Brands' or 'A.J. Trauth',
# so skip over empty first/last names
if last != '' and first != '':
# see if first name starts corresponds to an animal
aniIndex = binary_search(animals, first)
# -1 means no match; so now let's see
# if first name works with adjectives:
if aniIndex == -1:
adjIndex = binary_search(adjectives, first.lower())
if adjIndex != -1:
aniIndex = binary_search(animals, last)
if aniIndex != -1:
print(aLine.strip(), '-->', adjectives[adjIndex], animals[aniIndex].lower())
else:
# First name is an animal; try last name for adjectives
adjIndex = binary_search(adjectives, last.lower())
if adjIndex >= 0:
print(aLine.strip(), '-->', adjectives[adjIndex], animals[aniIndex].lower())
Comments
- I use a regular expression to split the actor name into a list of separate strings
tokens = re.split(r'[ ,.]', aLine.strip())
- For example, “William Collier, Jr.”
- Becomes [“William”, “Collier”, “Jr”], i.e. a list of 3 strings
- The separator characters (delimiters) are space, comma and period, specified here:
r'[ ,.]'
- The first name is at position 0 of the list, the first element
- The following line of code creates a variable ‘first’ by removing the last character from the first element:
first = tokens[0][:-1]
- Using string slicing
- Usually, the last string in my list is the actor’s surname, but sometimes they have a suffix like “Sr.”
- The following code tests if the last string is such a suffix:
if tokens[lastIndex] in suffixes and lastIndex >= 2:
- After assigning the first and last name, we use those variables to perform a binary search
- If
first
is an adjective and last is ananimal
, print a solution - Or, if
last
is an adjective andfirst
is an animal, print
Download the Code
You can download my code and data files here. Note that you will be required to create a free DropBox account to do so.
April 11, 2022 at 5:04 am
[…] to search a list. That is important when searching a big list. Since I discussed this code before (NPR Puzzle February 5, 2022), I will only list the code and not discuss the […]