Word break problem
--
Given a string s='abcde'
and a dictionary of valid words valid_words=['a', 'ab', 'cd', 'cde', 'e']
, return all possible ways to break down the string into only valid words (e.g. ['ab cd e', 'ab cde']
).
We can decompose this problem by first identifying which valid words are also prefixes of s
(e.g. 'a' and 'ab'), and then applying word_break()
recursively to the rest of the string (e.g. 'bcde' and 'cde'). In this example, the valid words 'a' and 'ab' are both prefixes of s
. The remaining substrings corresponding to these prefixes are 'bcde' and 'cde', respectively, so we need to recursively call both word_break('bcde')
and word_break('cde')
.
The left child node corresponds to the prefix ‘a’ and the right one corresponds to prefix ‘ab’. Next, we follow the same decomposition step and check to see which valid words are prefixes of either ‘bcde’ or ‘cde’. There are no valid words that ‘bcde’ starts with, but ‘cde’ starts with both ‘cd’ and ‘cde’. If we remove these from the start of ‘cde’, the remaining strings that we need to break down are ‘e’ and ‘’, respectively.
Great! Now we see that since ‘e’ is a valid word and the empty string ‘’ is our base case, we’re ready to return back up our function call tree. But how do we combine the results of word_break('e') = ['e']
and word_break('') = []
with the valid words that we broke off from the parent node (i.e. 'cd' and 'cde' respectively) earlier? In this case, we know that our output needs to be a list of space-separated strings consisting of the valid words that we identified, so we can simply join our output strings with spaces.
So now we have combine('cd', ['e']) = ['cd e']
and combine('cde',[]) = ['cde']
. Since we know that each word_break(...)
call needs to return a list of strings, we can simply concatenate the results of these two combine
calls to obtain `word_break('cde') = ['cd e', 'cde']. Now let's continue back up the function call stack:
Moving up the function call tree, we continue combining the prefix for each node with the results of the word_break(...)
calls in the child nodes (e.g. the prefix for 'cde' is 'ab', so we call combine('ab', word_break('cde'))
).
public List<String> wordBreak(String s, List<String> wordDict) {
List<String> outputs = new ArrayList<String>();
if( s.length()==0)
return new ArrayList<String>();
if(wordDict.contains(s)){
outputs.add(s);
}
for ( int i=1 ;i <s.length() ; i++){
String prefix = s.substring(0, i);
if(wordDict.contains(prefix)){
String suffix = s.substring(i);
List<String> suffix_breakdown = wordBreak(suffix, wordDict);
List<String> suffix_output = combine(prefix, suffix_breakdown);
for(String item : suffix_output){
outputs.add(item);
}
}
}
return outputs;
}
public List<String> combine(String prefix, List<String> list){
List<String> output = new ArrayList<String>();
for(String item : list){
output.add(prefix+" "+item);
}
return output;
}