819. Most Common Word

Easy

Problem:

Return the most frequently appearing word, excluding banned words. Do not distinguish between uppercase and lowercase, and also ignore punctuation (such as periods, commas, etc.).

Input: paragraph = "Bob hit a ball, the hit BALL flew far after it was hit.", banned = ["hit"]
Output: "ball"
Explanation: "hit" occurs 3 times, but it is a banned word. "ball" occurs twice (and no other word does), so it is the most frequent non-banned word in the paragraph. Note that words in the paragraph are not case sensitive, that punctuation is ignored (even if adjacent to words, such as "ball,"), and that "hit" isn't the answer even though it occurs more because it is banned.

https://leetcode.com/problems/most-common-word/

What to learn:

Modifying a list while iterating over it can lead to unpredictable results or bugs. In this case, it's possible that not all instances of a banned word are removed because the iteration skips some elements due to the modification of the list.

for ban in banned:
    for word in result_array:
        if ban.lower() == word:
            result_array.remove(word)

In regular expression, \w means Word Character and ^ means not. Therefore, re.sub(r'[^\w]', ' ', paragraph) means to substitute all characters in the paragraph that are not word characters with a space.
The python max() function has key= parameter to define a function that returns a value for each element of the iterable.

ages = {
    'Matt': 30,
    'Katie': 29,
    'Nik': 31,
    'Jack': 43,
    'Alison': 32,
    'Kevin': 38
}

# Return the key with the highest value in a Python dictionary.
max_value = max(ages, key=ages.get)

Python Dictionary Methods

dict() constructs a dictionary.
- dict(key="value")
- dict({name: "value"})
- dict([(name, "value")])
len() returns the number of items a dictionary has.
get(key) returns the value of the specified key.
items() returns a list containing a tuple for each key value pair.
keys() returns a list containing the dictionary's keys.
values() returns a list of all the values in the dictionary
pop(key) removes the element with specified key. We use the del statement to remove an element from the dictionary as well: del dictonary_name["key"]
popitem() removes the last inserted key-value pair.
has_key(key)returns true if the dictionary contains the specified key. has_key() has been removed in Python 3. in operator is used to check whether a specified key is present or not in a Dictionary.
clear() removes all the items from the dictionary.

🙋🏻 How to get key from value in Dictonary

most_common(n) method of counter class returns a list of top 'n' elements from most common to least common, as specified the parameter 'n'.

import collections

counterObject = collections.Counter()

# Tea types vs number of cups
counterObject["Green tea"]  = 10
counterObject["Yellow tea"] = 8
counterObject["White tea"]  = 10
counterObject["Oolong tea"] = 5
counterObject["Black tea"]  = 15

Solutions:

Preprocess paragraph by using list comprehension and regular expression.
Return the dictionary key that corresponds with the highest value.

class Solution:
    def mostCommonWord(self, paragraph: str, banned: List[str]) -> str:
        new_paragraph = paragraph.lower()
        new_array = []
        result_array = []
        count_array = []
        split_str = ""
    
        # Make split
        for word in new_paragraph:
            if word.isalpha():
                split_str += word
            else:
                if split_str: # avoid appending empty strings
                    new_array.append(split_str)
                split_str = ""
    
        # Append the last word (if any) to new_array
        if split_str:
            new_array.append(split_str)
    
        # No split
        if(len(new_array) == 0):
            new_array.append(new_paragraph)
    
        # Make new array from paragraph
        for word in new_array:
            new_str = ""
            for char in word:
                if char.isalpha():
                    new_str += char
            result_array.append(new_str)
    
        # use a list comprehension to create a new list 
        # that only contains the words not present in the banned list
        if len(banned) > 0:
            result_array = [word for word in result_array if word not in banned]
    
        print(result_array)
        for word in result_array:
            count = 0
            for char in result_array:
                if char == word:
                    count += 1
            count_array.append(count)
    
        # Return max
        max_index = count_array.index(max(count_array))
        return result_array[max_index]

Counter

import re
import collections

class Solution:
    def mostCommonWord(self, paragraph: str, banned: List[str]) -> str:
        words = [word for word in re.sub(r'[^\w]', ' ', paragraph).lower().split() if word not in banned]
        
        counts = collections.Counter(words)
        return counts.most_common(1)[0][0]

Defaultdict

import re
import collections

class Solution:
    def mostCommonWord(self, paragraph: str, banned: List[str]) -> str:
        words = [word for word in re.sub(r'[^\w]', ' ', paragraph).lower().split() if word not in banned]
        
        counts = collections.defaultdict(int)
        for word in words:
            counts[word] +=1
        return max(counts, key=counts.get)

defaultdict(argument)

It provides a default value for the key that does not exists. Therefore, defaultdict never raises a KeyError.

from collections import defaultdict
  
# Defining the dict and passing 
# lambda as default_factory argument
d = defaultdict(lambda: "Not Present")
d["a"] = 1
d["b"] = 2

When the list class is passed as the argument, then a defaultdict is created with the values that are list.

from collections import defaultdict
  
# Defining a dict
d = defaultdict(list)
  
for i in range(5):
    d[i].append(i)

When the int class is passed as the argument, then a defaultdict is created with default value as zero.

from collections import defaultdict
   
# Defining the dict
d = defaultdict(int)
   
L = [1, 2, 3, 4, 2, 4, 1, 2]
   
# Iterate through the list
# for keeping the count
for i in L:
       
    # The default value is 0
    # so there is no need to 
    # enter the key first
    d[i] += 1

Reference: https://www.geeksforgeeks.org/defaultdict-in-python

Last updated 2 years ago