For my first repost, we have a StringReplace function that needs optimizing. Independent of the questioner's implementation, let's consider the core algorithm.
StringReplace(OldText, NewText, BaseString)
for SubString of length OldText in BaseString
if (OldText == SubString) {Replace text}
This is how we want to replace. Find each instance of OldText in BaseString and replace. Given the nature of the question, our implementation will be written in C and not use any libraries (like regex, CRT, etc).
Working backwards, we'll need to return a final string of some indeterminate size. Then the BaseString, as it is modified, is stored in this final string, in the following form, where each base is a subset of the string:
When the problem is viewed this way, an implementation is suggested, recursive. The following code expands upon this approach (though avoiding some casting requirements, necessities of sizeof(char), and possibility of unicode support).
#define StringReplace(x, y, z) StringReplaceHelper(x, y, z, 0)
char* StringReplaceHelper(char* OldText,
char* NewText,
char* BaseString,
unsigned long space)
{
char* start = BaseString, ret = NULL;
unsigned long offset = 0;
while (*BaseString != '\0')
{
if (Match(OldText, BaseString))
{
offset = (BaseString - start);
// The next search will begin after
// the replacement text
ret = StringReplaceHelper(OldText, NewText,
BaseString + strlen(OldText),
space + offset + strlen(NewText));
break;
}
BaseString++;
}
// If the end of string, then this is the last piece
// Else copy in subset and replacement piece
if (*BaseString == '\0')
{
offset = (BaseString - start);
ret = (char*) malloc((space + offset));
memcpy(ret + space, start, offset));
}
else
{
memcpy(ret + space, start, offset);
// Don't actually do the following, the processor
// will have to walk NewText twice
memcpy(ret + offset, NewText, strlen(NewText));
}
return ret;
}
First, the routine Match() is basically a string compare, which can have its own interesting optimizations including partial skip-ahead. Second, a final version wouldn't use strlen, except perhaps at the start. The lengths can be passed between iterations. Finally, using memcpy and malloc aren't cheating as I've written my own implementations in the past, and therefore their omission in the present is just for the sake of brevity.
But this is just what I'd currently write for a string replace. You can follow the link to Stack Overflow and see the original request (and my suggestion for improvement). Or perhaps consider your own.
No comments:
Post a Comment