Tuesday, November 1, 2011

Searching Characters and Substrings in a String in ANSI C

ANSI C provides the string.h header, which declares the standard functions for string operations. If you're not that familiar with how strings work in ANSI C, you can check out my previous post here. The available functions for string searching are:
- Searching for a character: strchr, strrchr, strcspn, strpbrk
- Searching for a substring: strstr, strspn

Searching for a character in a string

Both strchr and strrchr return a pointer to (that is, the memory address of) the character you search for.

The difference between them is that strchr will return a pointer to the location of the first occurrence of the character, while strrchr will return a pointer to the location of the last occurrence of the character.

If the character does not exist in the string, NULL will be returned.

Here's an example on how to use them:
#include<stdio.h>
#include<string.h>

int main(void)
{
    char string[40] = "The bird is the word";
    char *firstE = NULL;
    char *lastE = NULL;
    int firstEIndex = 0;
    int lastEIndex = 0;

    firstE = strchr(string, 'e');
    lastE = strrchr(string, 'e');

    /*Check for NULL before using the pointers. strrchr finds the
     character exactly when strchr does, so one check is enough*/
    if(firstE==NULL)
    {
        puts("The character was not found");
        return 0;
    }
    puts("We found the character!");

    /*You can use these pointers to print the
     substring starting with the first/last occurrence of
     your character*/
    puts(firstE);
    puts(lastE);

    /*You can also easily determine the index of the searched
    character by applying simple pointer arithmetic*/
    firstEIndex = (int)(firstE - string);
    lastEIndex = (int)(lastE - string);
    printf("%d %d\n",firstEIndex, lastEIndex);

    return 0;
}
/*Output
We found the character!
e bird is the word
e word
2 14
 */
TIP: As you saw in the example above, you can easily determine the index of the character using pointer arithmetic.
TIP: The searched character is passed as an int, but is internally converted to a char. If you want to read why, check out this link.
PITFALL: The behavior is undefined for strings that are not null terminated.

An alternative to strchr is strcspn. Instead of a single character, this function takes a second string (a key string) that holds the characters to look for.

strcspn will return the index of the first occurrence of one of the characters from the key string. If none of the characters from the key string are found, strcspn will return the length of the searched string.

Here's an example on how to use this function:
#include<stdio.h>
#include<string.h>

int main(void)
{
    /*No explicit size, so the array also has room for the
     terminating null character*/
    char string[] = "The bird is the word";
    size_t size = strlen(string);

    size_t eFirstIndex = strcspn(string,"e");
    size_t eOrhFirstIndex = strcspn(string, "eh");
    size_t xyFoundIndex = strcspn(string, "xXyY");

    /*Verifies if e was found*/
    if(eFirstIndex!=size)
    {
        printf("e char index is %lu\n",(unsigned long)eFirstIndex);
    }
    else
    {
        printf("e char not found\n");
    }
    /*Verifies if e or h was found*/
    if(eOrhFirstIndex!=size)
    {
       printf("e or h char index is %lu\n",(unsigned long)eOrhFirstIndex);
    }
    else
    {
       printf("e or h chars not found in string\n");
    }

    /*Verifies if x,X,y or Y was found*/
    if(xyFoundIndex!=size)
    {
       printf("x, y, X or Y char index is %lu\n",(unsigned long)xyFoundIndex);
    }
    else
    {
       printf("No occurrences of x, y, X or Y\n");
    }

    return 0;
}
/*Output:
e char index is 2
e or h char index is 1
No occurrences of x, y, X or Y
 */
Another alternative is strpbrk. The difference is that strpbrk will return a pointer (just like strchr) instead of the index, and NULL if none of the characters from the key string are found.

Here's an example on how to use it:
#include<stdio.h>
#include<string.h>

int main(void)
{
    /*No explicit size, so the array also has room for the
     terminating null character*/
    char string[] = "The bird is the word";

    char* eFirstOccurrence = strpbrk(string,"e");
    char* eOrHFirstOccurrence = strpbrk(string, "eh");
    char* xOrYFirstOccurrence = strpbrk(string, "xXyY");

    /*Verifies if e was found*/
    if(eFirstOccurrence!=NULL)
    {
        printf("e char index is %d\n",(int)(eFirstOccurrence-string) );
    }
    else
    {
        printf("e char not found\n");
    }
    /*Verifies if e or h was found*/
    if(eOrHFirstOccurrence!=NULL)
    {
       printf("e or h char index is %d\n",(int)(eOrHFirstOccurrence-string));
    }
    else
    {
       printf("e or h chars not found in string\n");
    }

    /*Verifies if x,X,y or Y was found*/
    if(xOrYFirstOccurrence!=NULL)
    {
       printf("x, y, X or Y char index is %d\n",
              (int)(xOrYFirstOccurrence-string) );
    }
    else
    {
       printf("No occurrences of x, y, X or Y\n");
    }

    return 0;
}
/*Output
e char index is 2
e or h char index is 1
No occurrences of x, y, X or Y
 */
TIP: For strpbrk you can apply pointer arithmetic to determine the index, just like for strchr.
PITFALL: The result is unpredictable if any of the strings are not null terminated. This is true for both strpbrk and strcspn.

Searching for a substring in a string

The functions used for searching substrings are similar to the ones for searching characters. strstr is the equivalent of strchr for strings.

strstr will return a pointer to the first occurrence of the substring. If the substring is not found, NULL will be returned.

Here's an example on how to use it:
#include<stdio.h>
#include<string.h>

int main(void)
{
    /*No explicit size, so the array also has room for the
     terminating null character*/
    char string[] = "The bird is the word";

    char* birdOccurrence = strstr(string, "bird");
    char* angryBirdOccurrence = strstr(string, "angry bird");

    if(birdOccurrence!=NULL)
    {
        printf("The 'bird' substring was found starting at index %d\n",
               (int)(birdOccurrence-string));
    }
    else
    {
        puts("The 'bird' substring was not found");
    }
    if(angryBirdOccurrence!=NULL)
    {
        printf("The 'angry bird' substring was found starting at index %d\n",
               (int)(angryBirdOccurrence-string));
    }
    else
    {
        puts("The 'angry bird' substring was not found");
    }
    return 0;
}
/*Output
The 'bird' substring was found starting at index 4
The 'angry bird' substring was not found
 */
Another interesting function made available by string.h is strspn.

strspn is different from the others because it returns the length of the initial portion of the string that consists only of characters from the key string. If the first character of the string is not in the key string, it returns 0.

Here's an example:
#include<stdio.h>
#include<string.h>

int main(void)
{
    /*No explicit size, so the array also has room for the
     terminating null character*/
    char string[] = "the bird is the word";

    size_t initialPortionLength = strspn(string, "abcdefghijklmnopqrt ");

    if(initialPortionLength!=0)
    {
        printf("The first %lu characters exist in the key string\n",
                (unsigned long)initialPortionLength);
    }
    else
    {
        puts("The first character was not found in the key string");
    }
    return 0;
}
/*Output
The first 10 characters exist in the key string
 */

TIP: As with all char* returning functions from string.h, you can apply pointer arithmetic to strstr in order to determine the index.
PITFALL: The result is unpredictable if any of the strings are not null terminated. This is true for both strspn and strstr.

Question: What tips and tricks do you know related to string searching operations?

References:
http://www.cplusplus.com/reference/cstring/
