819. Most Common Word
Easy
Problem:
Return the most frequently appearing word, excluding banned words. Do not distinguish between uppercase and lowercase, and also ignore punctuation (such as periods, commas, etc.).
Input: paragraph = "Bob hit a ball, the hit BALL flew far after it was hit.", banned = ["hit"]
Output: "ball"
Explanation: "hit" occurs 3 times, but it is a banned word. "ball" occurs twice (and no other word does), so it is the most frequent non-banned word in the paragraph. Note that words in the paragraph are not case sensitive, that punctuation is ignored (even if adjacent to words, such as "ball,"), and that "hit" isn't the answer even though it occurs more because it is banned.
https://leetcode.com/problems/most-common-word/
What to learn:
Modifying a list while iterating over it can lead to unpredictable results or bugs. In this case, it's possible that not all instances of a banned word are removed because the iteration skips some elements due to the modification of the list.
for ban in banned:
for word in result_array:
if ban.lower() == word:
result_array.remove(word)
In regular expression,
\w
means Word Character and^
means not. Therefore,re.sub(r'[^\w]', ' ', paragraph)
means to substitute all characters in theparagraph
that are not word characters with a space.The python
max()
function haskey=
parameter to define a function that returns a value for each element of the iterable.
ages = {
'Matt': 30,
'Katie': 29,
'Nik': 31,
'Jack': 43,
'Alison': 32,
'Kevin': 38
}
# Return the key with the highest value in a Python dictionary.
max_value = max(ages, key=ages.get)
most_common(n)
method ofcounter
class returns a list of top 'n' elements from most common to least common, as specified the parameter 'n'.
import collections
counterObject = collections.Counter()
# Tea types vs number of cups
counterObject["Green tea"] = 10
counterObject["Yellow tea"] = 8
counterObject["White tea"] = 10
counterObject["Oolong tea"] = 5
counterObject["Black tea"] = 15
Solutions:
Preprocess paragraph by using list comprehension and regular expression.
Return the dictionary key that corresponds with the highest value.
class Solution:
def mostCommonWord(self, paragraph: str, banned: List[str]) -> str:
new_paragraph = paragraph.lower()
new_array = []
result_array = []
count_array = []
split_str = ""
# Make split
for word in new_paragraph:
if word.isalpha():
split_str += word
else:
if split_str: # avoid appending empty strings
new_array.append(split_str)
split_str = ""
# Append the last word (if any) to new_array
if split_str:
new_array.append(split_str)
# No split
if(len(new_array) == 0):
new_array.append(new_paragraph)
# Make new array from paragraph
for word in new_array:
new_str = ""
for char in word:
if char.isalpha():
new_str += char
result_array.append(new_str)
# use a list comprehension to create a new list
# that only contains the words not present in the banned list
if len(banned) > 0:
result_array = [word for word in result_array if word not in banned]
print(result_array)
for word in result_array:
count = 0
for char in result_array:
if char == word:
count += 1
count_array.append(count)
# Return max
max_index = count_array.index(max(count_array))
return result_array[max_index]
Counter
import re
import collections
class Solution:
def mostCommonWord(self, paragraph: str, banned: List[str]) -> str:
words = [word for word in re.sub(r'[^\w]', ' ', paragraph).lower().split() if word not in banned]
counts = collections.Counter(words)
return counts.most_common(1)[0][0]
Defaultdict
import re
import collections
class Solution:
def mostCommonWord(self, paragraph: str, banned: List[str]) -> str:
words = [word for word in re.sub(r'[^\w]', ' ', paragraph).lower().split() if word not in banned]
counts = collections.defaultdict(int)
for word in words:
counts[word] +=1
return max(counts, key=counts.get)
Last updated