Subdomain Visit Count

Patrick Leaton
Problem Description

A website domain like "discuss.leetcode.com" consists of various subdomains. At the top level, we have "com", at the next level, we have "leetcode.com", and at the lowest level, "discuss.leetcode.com". When we visit a domain like "discuss.leetcode.com", we will also visit the parent domains "leetcode.com" and "com" implicitly.

Now, call a "count-paired domain" to be a count (representing the number of visits this domain received), followed by a space, followed by the address. An example of a count-paired domain might be "9001 discuss.leetcode.com".

We are given a list cpdomains of count-paired domains. We would like a list of count-paired domains, (in the same format as the input, and in any order), that explicitly counts the number of visits to each subdomain.

Example 1:
Input: 
["9001 discuss.leetcode.com"]
Output: 
["9001 discuss.leetcode.com", "9001 leetcode.com", "9001 com"]
Explanation: 
We only have one website domain: "discuss.leetcode.com". As discussed above, the subdomains "leetcode.com" and "com" will also be visited. So they will all be visited 9001 times.

Example 2:
Input: 
["900 google.mail.com", "50 yahoo.com", "1 intel.mail.com", "5 wiki.org"]
Output: 
["901 mail.com","50 yahoo.com","900 google.mail.com","5 wiki.org","5 org","1 intel.mail.com","951 com"]
Explanation: 
We will visit "google.mail.com" 900 times, "yahoo.com" 50 times, "intel.mail.com" once and "wiki.org" 5 times. For the subdomains, we will visit "mail.com" 900 + 1 = 901 times, "com" 900 + 50 + 1 = 951 times, and "org" 5 times.
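
The totals in this explanation can be double-checked with a few lines of Python. This is only an illustrative sketch: the helper name `total_visits` is hypothetical and not part of the problem. It sums the counts of every entry whose domain is the given subdomain or ends with "." followed by it.

```python
from typing import List


def total_visits(cpdomains: List[str], subdomain: str) -> int:
    """Hypothetical helper: total visits that reach `subdomain`."""
    total = 0
    for entry in cpdomains:
        count, domain = entry.split(" ")
        # A visit reaches `subdomain` if it is the domain itself or one of its parents.
        if domain == subdomain or domain.endswith("." + subdomain):
            total += int(count)
    return total


example = ["900 google.mail.com", "50 yahoo.com", "1 intel.mail.com", "5 wiki.org"]
print(total_visits(example, "mail.com"))  # 901
print(total_visits(example, "com"))       # 951
print(total_visits(example, "org"))       # 5
```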

Notes:

  • The length of cpdomains will not exceed 100.
  • The length of each domain name will not exceed 100.
  • Each address will have either 1 or 2 "." characters.
  • The input count in any count-paired domain will not exceed 10000.
  • The answer output can be returned in any order.


Description taken from https://leetcode.com/problems/subdomain-visit-count/.

Problem Solution

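# O(N) time, O(N) space, where N is the total length of the strings in cpdomains
from typing import List


class Solution:
    def subdomainVisits(self, cpdomains: List[str]) -> List[str]:
        visit = {}    # maps each subdomain to its total visit count
        output = []
        for count_domain in cpdomains:
            # "9001 discuss.leetcode.com" -> "9001", "discuss.leetcode.com"
            count, domain = count_domain.split(" ")
            # Relies on every address having 1 or 2 dots (see Notes)
            subs = domain.split(".")
            subs[0] = domain                  # full domain, e.g. "discuss.leetcode.com"
            index = domain.find(".")
            subs[1] = domain[index + 1:]      # everything after the first dot
            for sub in subs:
                if sub not in visit:
                    visit[sub] = int(count)
                else:
                    visit[sub] += int(count)
        for element in visit:
            visit_count = "{} {}".format(visit[element], str(element))
            output.append(visit_count)
        return output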
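
The solution above depends on the note that every address has one or two dots. As a point of comparison, here is a sketch of a variant that drops that assumption: instead of overwriting fixed list positions, it treats every character after a dot as the start of a parent domain, and it swaps the plain dict for `collections.Counter`. The function name `subdomain_visits_general` is hypothetical; this is an alternative approach, not the author's.

```python
from collections import Counter
from typing import List


def subdomain_visits_general(cpdomains: List[str]) -> List[str]:
    """Count visits for every parent domain, for any number of dots."""
    visit = Counter()
    for count_domain in cpdomains:
        count, domain = count_domain.split(" ")
        n = int(count)
        # The full domain is always visited.
        visit[domain] += n
        # Each character after a dot starts a shorter parent domain.
        for i, ch in enumerate(domain):
            if ch == ".":
                visit[domain[i + 1:]] += n
    return [f"{total} {sub}" for sub, total in visit.items()]


example = ["900 google.mail.com", "50 yahoo.com", "1 intel.mail.com", "5 wiki.org"]
print(sorted(subdomain_visits_general(example)))
print(sorted(subdomain_visits_general(["3 a.b.c.d"])))
```

Because the answer may be returned in any order, the usage lines sort the result before printing. The second call shows a domain with three dots, which the fixed-index version would not handle correctly.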

Problem Explanation


String manipulation problems that take a raw input and rearrange it into a clean output can usually be approached by breaking the input into small chunks and processing each chunk separately.

Here, we can split each domain into its subdomains and store them in a hashmap, adding the count from the raw input to each subdomain.

Once every subdomain and its total count are in the hashmap, we just need to turn each pair into a string in the same “count domain” format as the input and append it to the output, and then we're finished.


Let's start by initializing our visit dictionary, which will map each subdomain to its visit count, and the output list.

        visit = {}
        output = []     


Next, for each entry in cpdomains, let's split the string on the space to separate the count from the full domain.

        for count_domain in cpdomains:
            count, domain = count_domain.split(" ")  
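
Note that the count is still a string after the split, which is why the solution converts it with `int()` before adding it to the dictionary. For example:

```python
count_domain = "9001 discuss.leetcode.com"

# Split on the single space: left part is the count, right part is the domain.
count, domain = count_domain.split(" ")

print(repr(count))     # '9001'  (a str, not an int)
print(domain)          # discuss.leetcode.com
print(int(count) + 1)  # 9002
```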


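After that, we’ll split the domain string on the dots. This gives us the individual labels of the domain (for example “discuss”, “leetcode”, and “com”) rather than full subdomains, but the list already has exactly one slot per subdomain we need to count.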

            subs = domain.split(".")   
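
A quick check of what `split(".")` returns for a three-level and a two-level domain:

```python
# A three-level domain splits into three labels...
labels_three = "discuss.leetcode.com".split(".")
print(labels_three)  # ['discuss', 'leetcode', 'com']

# ...and a two-level domain splits into two.
labels_two = "yahoo.com".split(".")
print(labels_two)    # ['yahoo', 'com']
```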


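Since the question wants every level of the domain counted, in the format “x a.b.c, y b.c, z c”, we’ll need to overwrite the first and second elements of our subdomain list.

If the domain has two dots, the third element isn’t overwritten, so it stays as the top-level domain, such as “com” or “org”. If the domain has only one dot, the list has just two elements, and the slice after the first dot is already the top-level domain. This works because the notes guarantee every address has either 1 or 2 dots.

Let's make the entire domain the first element in our subdomain list, and make the second element the slice of the domain after the first dot.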

            subs[0] = domain   
            index = domain.find(".")
            subs[1] = domain[index +1:]
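
To see the overwrite trick in isolation, here is a sketch that wraps the three lines above in a hypothetical helper, `expand_subdomains`, and runs it on a two-dot and a one-dot domain. Like the solution, it assumes the domain contains 1 or 2 dots.

```python
from typing import List


def expand_subdomains(domain: str) -> List[str]:
    """Hypothetical helper mirroring the solution's overwrite trick.

    Assumes the domain has exactly 1 or 2 dots, as the problem's notes guarantee.
    """
    subs = domain.split(".")
    subs[0] = domain                # full domain
    index = domain.find(".")
    subs[1] = domain[index + 1:]    # slice after the first dot
    return subs                     # subs[2], if present, is left as the top-level label


print(expand_subdomains("discuss.leetcode.com"))  # ['discuss.leetcode.com', 'leetcode.com', 'com']
print(expand_subdomains("yahoo.com"))             # ['yahoo.com', 'com']
```

Outside the stated constraint the trick breaks down: for "a.b.c.d" the untouched middle slot keeps the bare label "c" instead of "c.d", and a domain with no dot at all would raise an `IndexError` on the `subs[1]` assignment.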


Now that we have the list of subdomains, we'll loop through it and apply the count to each subdomain.

            for sub in subs:                


If a subdomain isn't in the visit hashmap yet, we add it with the current count.

                if sub not in visit:
                    visit[sub] = int(count)


If it is already there, an earlier entry also visited this subdomain, so we increase its total by the current count.

                else:
                    visit[sub] += int(count)
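
The membership check can also be collapsed with `collections.defaultdict(int)`, which starts every missing key at 0. The sketch below is an alternative to the author's explicit if/else; to keep it short, the (count, subdomain list) pairs for Example 2 are written out by hand as the solution's `subs` lists would produce them.

```python
from collections import defaultdict

# (count, subdomains) pairs for Example 2, written out by hand for illustration.
entries = [
    (900, ["google.mail.com", "mail.com", "com"]),
    (50, ["yahoo.com", "com"]),
    (1, ["intel.mail.com", "mail.com", "com"]),
    (5, ["wiki.org", "org"]),
]

visit = defaultdict(int)  # missing keys start at 0
for count, subs in entries:
    for sub in subs:
        visit[sub] += count  # no membership check needed

print(visit["com"], visit["mail.com"], visit["org"])  # 951 901 5
```

With a plain dict, `visit[sub] = visit.get(sub, 0) + int(count)` achieves the same effect in one line.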


Next, we’ll loop through the keys of the visit dictionary and build a string of each subdomain and its visit count, in the same “count domain” format as the input. We append each of those strings to the output list.

        for element in visit:
            visit_count = "{} {}".format(visit[element], str(element))
            output.append(visit_count)
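
The same formatting step can be written with an f-string, which produces the same "count domain" string as `"{} {}".format(...)`. Python dictionaries keep insertion order (guaranteed since Python 3.7), so the output follows the order in which subdomains were first seen; the problem accepts any order anyway. A small sketch using the totals from Example 1:

```python
visit = {"discuss.leetcode.com": 9001, "leetcode.com": 9001, "com": 9001}

# Build one "count subdomain" string per dictionary entry.
output = [f"{count} {sub}" for sub, count in visit.items()]

print(output)  # ['9001 discuss.leetcode.com', '9001 leetcode.com', '9001 com']
```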


Once we have our list of subdomain visit counts, we return it.

        return output