Monday, May 14, 2018

Count Number of Words in the given input String

Task Description

Given an input string, you need to get the number of words or list of words delimited by prefefined list of delimiter characters.

Simple Solution (Incorrect Solution)

I had come across this problem for the first time when I had just joined by Undergraduate Program. Then I was just learning how to code. So when I was given this problem for the first time, the solution I came up with was to traverse the entire string and count the number of delimiters. It would give correct word count for a normal sentence where we would expect only one delimiter in between each word and no leading or trailing spaces in the string.

Correct Solution

Idea

The problem in the simple solution discussed above is that there could be contigous sequence of delimiters in between the words. So the idea is to increase the word count only when we encounter a Non-Delimiter character right after a Delimiter character.

Edge Case

The edge case in this solution is, if there are no delimiters in the beginning of the string, then we will be missing the first word because we are increasing word count only after we encounter a delimiter character. This can be easily solved by assuming that we have encountered delimiters in the beginning.

Psuedo-Code


  boolean delimiter = true // assume we  have already encountered delimiters in the beginning to consider first word if there are leading delimiters
  int count = 0
  for i = 0 to input.length - 1
    char current = input.charAt(i)
    if current == delimiter:
      delimiter = true
    else 
      if delimiter: // non delimiter found after delimiters, increase word count
        count++
      delimiter = false

  return count

      
Similar logic can applied to actually get the list of words in the input string.

Implementation

Following the complete implemenation for the task. The method count() returns the number of words in the string and getWords() returns the list of the words.

The implementation has also pre-defined a list of delimiters based on which the input string will be splitted into words. Delimiters can be added or removed.

Complexity

We are traversing the string only once. For each character, we check if it is delimiter or not by comparing with a HashSet - this operation is contant time ${\cal O}(1)$ operation. Then we just set a boolean value and increase count if needed. Both of these are also constant time operations. $\therefore$ the overall time complexity of this solution is ${\cal O}(n)$, n being the length of the input string. Space complexity of the solution is obviously ${\cal O}(1)$.






1 comment: