Monday, May 14, 2018

Count Number of Words in the given input String

Task Description

Given an input string, you need to get the number of words or list of words delimited by prefefined list of delimiter characters.

Simple Solution (Incorrect Solution)

I had come across this problem for the first time when I had just joined by Undergraduate Program. Then I was just learning how to code. So when I was given this problem for the first time, the solution I came up with was to traverse the entire string and count the number of delimiters. It would give correct word count for a normal sentence where we would expect only one delimiter in between each word and no leading or trailing spaces in the string.

Correct Solution

Idea

The problem in the simple solution discussed above is that there could be contigous sequence of delimiters in between the words. So the idea is to increase the word count only when we encounter a Non-Delimiter character right after a Delimiter character.

Edge Case

The edge case in this solution is, if there are no delimiters in the beginning of the string, then we will be missing the first word because we are increasing word count only after we encounter a delimiter character. This can be easily solved by assuming that we have encountered delimiters in the beginning.

Psuedo-Code


  boolean delimiter = true // assume we  have already encountered delimiters in the beginning to consider first word if there are leading delimiters
  int count = 0
  for i = 0 to input.length - 1
    char current = input.charAt(i)
    if current == delimiter:
      delimiter = true
    else 
      if delimiter: // non delimiter found after delimiters, increase word count
        count++
      delimiter = false

  return count

      
Similar logic can applied to actually get the list of words in the input string.

Implementation

Following the complete implemenation for the task. The method count() returns the number of words in the string and getWords() returns the list of the words.

The implementation has also pre-defined a list of delimiters based on which the input string will be splitted into words. Delimiters can be added or removed.
/* File: WordCounter.java
* Created: 5/14/2018
* Author: Sabbir Manandhar
*
* Copyright (c) 2018 Hogwart Inc.
*/
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;
/**
* Utility Class to capture the words in the given input string delimited
* by pre-defined delimiters
*
* @author Sabbir Manandhar
* @version 1.0
*/
public class WordCounter {
private static Set<Character> delimiters = new HashSet<>();
static {
delimiters.add(' ');
delimiters.add('\'');
delimiters.add('!');
delimiters.add('-');
delimiters.add('(');
delimiters.add(')');
delimiters.add('{');
delimiters.add('}');
delimiters.add('[');
delimiters.add(']');
delimiters.add('?');
delimiters.add('/');
delimiters.add('\\');
delimiters.add(',');
}
//---------------------------------------------------------------------------------------------
/**
* driver method
* @param args
*/
public static void main(String[] args) {
System.out.println(count("I am lord,voldemort")); // 4
System.out.println(getWords("I am lord,voldemort")); // 4
System.out.println(count("I am lord voldemort ")); // 4
System.out.println(getWords("I am lord voldemort ")); // 4
System.out.println(count(" I am lord voldemort")); // 4
System.out.println(getWords(" I am lord voldemort")); // 4
System.out.println(count(" I am lord voldemort ")); // 4
System.out.println(getWords(" I am lord voldemort ")); // 4
System.out.println(count(" Lord Voldemort! is my past, my present and my future!!!! ")); // 10
System.out.println(getWords(" Lord Voldemort! is my past, my present and my future!!!! ")); // 4
} // main
//---------------------------------------------------------------------------------------------
/**
* Counts the number of words in the given input string
* @param input input string
* @return number of words in the input
*/
public static int count(String input) {
boolean delimiter = true;
int count = 0;
for (int i = 0; i < input.length(); i++) {
if (delimiters.contains(input.charAt(i))) { // delimiter encountered
delimiter = true;
} else {
if (delimiter) { // first non delimiter encountered after sequence of delimiters, increase word count
count++;
}
delimiter = false;
}
}
return count;
} // count
//---------------------------------------------------------------------------------------------
/**
* Gets the list words in the input string
* @param input input string
* @return List of words in the input
*/
public static List<String> getWords(String input) {
List<String> result = new ArrayList<>();
boolean delimiter = true;
//int count = 0;
int wordStart = 0;
for (int i = 0; i < input.length(); i++) {
if (delimiters.contains(input.charAt(i))) { // delimiter encountered
if (!delimiter) { // first delimiter after sequence of non delimiters, capture word
result.add(input.substring(wordStart, i));
}
delimiter = true;
} else {
if (delimiter) {
wordStart = i;
//count++;
}
delimiter = false;
}
}
if (!delimiter) {
result.add(input.substring(wordStart));
}
return result;
} // getWords
//---------------------------------------------------------------------------------------------
} // WordCounter

Complexity

We are traversing the string only once. For each character, we check if it is delimiter or not by comparing with a HashSet - this operation is contant time O(1)O(1) operation. Then we just set a boolean value and increase count if needed. Both of these are also constant time operations. the overall time complexity of this solution is O(n), n being the length of the input string. Space complexity of the solution is obviously O(1).






1 comment: