How To Count The Characters In A String Such As Letters, Numbers And More Using PHP

Before I show you how to count the characters in a string, let me briefly explain what a string is in PHP. A string is a series of letters, numbers, spaces, punctuation, and so on, contained inside either single (') or double (") quotation marks. You can also use the heredoc syntax (<<<EOT ... EOT), which acts like double quotes, or, if you are using PHP 5.3.0 or later, the nowdoc syntax (<<<'EOT' ... EOT), which acts like single quotes. The following are all valid strings in PHP.


<?php

echo 'Single quotes string';

echo "Double quotes string";

echo <<<EOT
This is a heredoc syntax string  
EOT;

echo <<<'EOT'
This is a nowdoc syntax string
EOT;

?>

The differences between these four types of strings are summarized below.

  1. Single quoted strings display the string's data almost exactly as written. Variables and most escape sequences, such as \t or \n, are not interpreted. Only two escape sequences are recognized: a single quote escaped with a backslash (\'), so $str = 'I\'ll be back'; is displayed as I'll be back, and a backslash escaped with another backslash (\\), so $str = 'C:\\windows'; is displayed as C:\windows.
  2. Double quoted strings also display the string's data mostly as written, but variables and the escape sequences \n, \r, \t, \v, \f, \\, \$ and \" are interpreted, as are the escape sequences for characters in octal notation \[0-7]{1,3} and in hexadecimal notation \x[0-9A-Fa-f]{1,2}. You can also put curly braces {} around a variable name to mark clearly where the variable name ends; this is known as complex (curly) syntax. For example, if you have a variable named $jobs, you can write echo "These {$jobs} suck!"; with the opening brace placed immediately before the dollar sign ($). The older form echo "These ${jobs} suck!"; is also accepted, but it is deprecated as of PHP 8.2, so the {$jobs} form is the safer choice.
  3. Heredoc string syntax works just like double quoted strings, except that it starts with the operator <<< (three less-than signs) immediately followed by an identifier. The identifier is normally a word in all caps; you can name it any way you like as long as it contains only alphanumeric characters and underscores and starts with a non-digit character or underscore. Immediately after the identifier you must start a new line. There should be nothing else on the same line after the opening identifier, not even a space, or the heredoc string won't work at all. The string's content follows on the new line. After the content you need a closing identifier on its own new line, with the same name as the opening identifier. Before PHP 7.3, the closing identifier must begin in the first column of its line, and the line must contain no other characters except possibly a semicolon (;) after the identifier, for example EOT;. This means that the identifier may not be indented, and there may not be any spaces or tabs before or after the semicolon, which is then followed by a new line. As of PHP 7.3 these rules are relaxed: the closing identifier may be indented and no longer has to be followed by a new line. You also don't need to escape quotes when using the heredoc syntax.
  4. As of PHP 5.3.0 there is another string delimiter you can use, called nowdoc. The nowdoc string syntax works just like single quoted strings: no parsing or variable interpolation is done, so variables are treated as plain text. The difference is that not even single quotes or backslashes have to be escaped. The nowdoc syntax is specified almost like the heredoc syntax. It starts with the operator <<< (three less-than signs), but the identifier that immediately follows must be enclosed in single quotes, for example <<<'EOT'. The identifier is normally a word in all caps; you can name it any way you like as long as it contains only alphanumeric characters and underscores and starts with a non-digit character or underscore. Immediately after the identifier you must start a new line. There should be nothing else on the same line after the opening identifier, not even a space, or the nowdoc string won't work at all. The string's content follows on the new line. After the content you need a closing identifier on its own new line, with the same name as the opening identifier (without the single quotes). Before PHP 7.3, the closing identifier must begin in the first column of its line, and the line must contain no other characters except possibly a semicolon (;) after the identifier, for example EOT;. This means that the identifier may not be indented, and there may not be any spaces or tabs before or after the semicolon, which is then followed by a new line. As of PHP 7.3 these rules are relaxed in the same way as for heredoc. You also don't need to escape quotes when using the nowdoc syntax.
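
As a rough illustration of the four differences listed above, the sketch below assigns a similar piece of text using each syntax. The variable $jobs and its value are made up for this example, and the closing identifiers are kept in the first column so the code also runs on versions of PHP older than 7.3.

```php
<?php

// Hypothetical variable used only for this illustration.
$jobs = 'chores';

// Single quotes: no interpolation, and \n stays as two literal characters.
$single = 'These $jobs\n';

// Double quotes: $jobs is interpolated and \n becomes one newline character.
$double = "These {$jobs}\n";

// Heredoc: behaves like double quotes.
$heredoc = <<<EOT
These {$jobs}
EOT;

// Nowdoc: behaves like single quotes.
$nowdoc = <<<'EOT'
These $jobs
EOT;

echo $single, PHP_EOL;   // These $jobs\n
echo $double;            // These chores (followed by the interpreted newline)
echo $heredoc, PHP_EOL;  // These chores
echo $nowdoc, PHP_EOL;   // These $jobs
```

Note that the line break just before a closing identifier is not part of the heredoc or nowdoc string itself.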

Now that you know what strings are in PHP and the four different ways to specify them, I can show you how to count the characters in a string using PHP's iconv_strlen() function, which is available as of PHP 5.

The iconv_strlen() function returns the character count of a string as an integer (a whole number). If the string cannot be counted, it returns false instead.

The iconv_strlen() function has two parameters, str and charset (the PHP 8 manual names them string and encoding), which I will explain in greater detail later in this tutorial. For now I will briefly explain str, since it is the required parameter. The iconv_strlen() function counts the number of characters in a given byte sequence, also known as a string, which you pass as the str parameter. In other words, the str parameter holds the string whose characters will be counted.

The example below shows how to call the iconv_strlen() function with its required str parameter. After that, I will explain the str and charset parameters in more detail.


<?php

$required_string = '<p>Characters to be counted.</p>';  
echo iconv_strlen($required_string);
 
?>

You may also pass the string directly as the iconv_strlen() function's str parameter, as in the example below.


<?php

echo iconv_strlen('<p>Characters to be counted.</p>');  
 
?>

Both examples above will produce the character count value of 32.

Now let me list and explain the iconv_strlen() function's str and charset parameters.

str

As I explained earlier in this tutorial, str is a required parameter. It holds the string whose characters the iconv_strlen() function will count and return as the character count.

The example below shows how to pass the required str parameter to the iconv_strlen() function.


<?php

$required_string = '<p>Characters to be counted.</p>';  
echo iconv_strlen($required_string);
 
?>

charset

You may also include the charset parameter for the iconv_strlen() function, which specifies the character encoding to use when counting the string's characters. If you leave out the charset parameter, the encoding set by iconv.internal_encoding is used. In versions of PHP before 5.6 this defaults to ISO-8859-1; as of PHP 5.6 the iconv.internal_encoding setting is deprecated and, when it is not set, the default_charset setting (UTF-8 by default) is used instead. The character encoding ISO-8859-1, also known as ISO Latin-1, Latin alphabet no. 1, or Latin-1, is intended for Western European languages such as English, Portuguese, Faroese, Spanish and German, to name a few.

In other words, the iconv_strlen() function takes the specified character set, also known as a character encoding, into account, so the number of characters counted may not equal the number of bytes in the string. For example, the Japanese character HIRAGANA LETTER KO こ takes up 3 bytes in UTF-8. If the string is evaluated as ISO-8859-1, where every byte is one character (the default in older versions of PHP when no charset parameter is given), iconv_strlen('こ') will output 3. But when you specify UTF-8 for the charset parameter, for instance iconv_strlen('こ', 'UTF-8'), the result will be 1, because in UTF-8 the character こ is a single character.
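
The difference between bytes and characters can be checked with a small sketch like the one below. It assumes the string is UTF-8 and writes the three bytes of こ as escape sequences, so the result does not depend on how the source file is saved. The byte count comes from strlen(), which always counts bytes regardless of encoding.

```php
<?php

// The three UTF-8 bytes of HIRAGANA LETTER KO, written as escape sequences
// inside a double quoted string.
$ko = "\xE3\x81\x93";

// strlen() always counts bytes.
$byte_count = strlen($ko);

// iconv_strlen() counts characters according to the given encoding.
$char_count = iconv_strlen($ko, 'UTF-8');

echo $byte_count, PHP_EOL; // 3
echo $char_count, PHP_EOL; // 1
```

Passing the charset parameter explicitly, as done here, avoids depending on the default encoding of the PHP version in use.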

Below is a list of some of the character encodings that you can use as the value of the iconv_strlen() function's charset parameter.

Character Encoding: Languages Covered
ISO-8859-1: Afrikaans, Albanian, Basque, Breton, Catalan, English (US and UK), Faroese, Galician, German, Icelandic, Irish (new orthography), Italian, Kurdish (The Kurdish Unified Alphabet), Latin (basic classical orthography), Leonese, Luxembourgish (basic classical orthography), Norwegian (Bokmål and Nynorsk), Occitan, Portuguese, Rhaeto-Romanic, Scottish Gaelic, Spanish, Swahili, Swedish and Walloon
ISO-8859-6: Arabic
ISO-8859-7: Greek
ISO-8859-8: Hebrew
ISO-8859-13: Baltic languages (Latvian, Latgalian, Lithuanian, Samogitian and New Curonian) and Polish
ISO-8859-14: Celtic languages (Breton, Cornish, Irish, Manx, Scottish Gaelic and Welsh)
ISO-2022-KR: Korean
KOI8-R: Cyrillic
EUC-JP: Japanese
EUC-CN: Simplified Chinese
UTF-8: Every character in the Unicode character set

Now let me show you how to code in the charset parameter for the iconv_strlen() function in the example below.


<?php

$required_string = '<p>Characters to be counted.</p>';  
echo iconv_strlen($required_string, 'UTF-8');
 
?>

It's important to remember that the iconv_strlen() function will fail if the string contains a character sequence that is illegal in the chosen character encoding. This is easy to trigger with byte escape sequences such as \xC3, which are only interpreted in double quoted strings and in the heredoc syntax. For example, if our character encoding is UTF-8 and our string contains the incomplete byte sequence \xC3 instead of the full byte sequence \xC3\xA9 for the Unicode character 'LATIN SMALL LETTER E WITH ACUTE', the iconv_strlen() function will fail when it evaluates the string. Keep in mind that the specification of your character encoding determines whether a byte sequence is bad or not.

The example below contains an invalid byte sequence.


<?php

$required_string = "\xC3";
echo iconv_strlen($required_string, 'UTF-8');  
 
?>

The above code will output a notice similar to the one below.

( ! ) Notice: iconv_strlen() [function.iconv-strlen]: Detected an incomplete multibyte character in input string in C:\wamp\www\site\example.php on line 4
Call Stack
# Time Memory Function Location
1 0.0005 668808 {main}( ) ..\example.php:0
2 0.0005 669112 iconv_strlen( ) ..\example.php:4

The above notice is displayed because the \xC3 escape sequence, which is only interpreted in double quoted strings and in the heredoc syntax, produces an incomplete byte sequence that the iconv_strlen() function cannot count.
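
Since a failed count produces a notice and a return value of false, a script may want to check the result before using it. The following is only a minimal sketch: count_characters_or_null() is a hypothetical helper name invented for this example, not part of PHP, and suppressing the notice with the @ operator is just one possible way to handle it.

```php
<?php

/**
 * Hypothetical helper: returns the character count, or null when the
 * string is not valid in the given encoding.
 */
function count_characters_or_null($string, $charset = 'UTF-8')
{
    // The @ operator suppresses the notice; failure is detected through
    // the false return value instead.
    $count = @iconv_strlen($string, $charset);

    return $count === false ? null : $count;
}

var_dump(count_characters_or_null("\xC3\xA9")); // int(1)
var_dump(count_characters_or_null("\xC3"));     // NULL
```

The strict comparison with false matters here, because an empty string has a valid character count of 0.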

The example below contains a valid byte sequence.


<?php

$required_string = "\xC3\xA9";
echo iconv_strlen($required_string, 'UTF-8');  
 
?>

The above code will output a value of 1.

Now let me show you another example of an invalid byte sequence.


<?php

$required_string = "<p>Characters to be counted \xC3.</p>";  
echo iconv_strlen($required_string, 'UTF-8');
 
?>

The above code will output a notice similar to the one below.

( ! ) Notice: iconv_strlen() [function.iconv-strlen]: Detected an illegal character in input string in C:\wamp\www\site\example.php on line 4
Call Stack
# Time Memory Function Location
1 0.0006 669200 {main}( ) ..\example.php:0
2 0.0006 669528 iconv_strlen( ) ..\example.php:4

The above notice is displayed because the \xC3 byte is followed by a period instead of the second byte it needs, which makes it an illegal character in UTF-8. As before, the \xC3 escape sequence is only interpreted in double quoted strings and in the heredoc syntax.

Here is another example of a valid byte sequence.


<?php

$required_string = "<p>Characters to be counted \xC3\xA9.</p>";  
echo iconv_strlen($required_string, 'UTF-8');
 
?>

The above code will output a value of 34.

Before I end this tutorial, let me mention two other functions similar to iconv_strlen() that you may want to check out: strlen() and mb_strlen(). Be aware that mb_strlen() does not fail on illegal character sequences the way iconv_strlen() does; it simply carries on without raising an error. I will explain more about these functions in future tutorials.
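
As a quick, hedged comparison of the three functions on valid input, the sketch below counts a short UTF-8 string with each of them. The sample string is made up for this example, and the mb_strlen() call is wrapped in a check because that function is only available when the mbstring extension is loaded.

```php
<?php

// The two UTF-8 bytes of "é", followed by a plain ASCII letter.
$string = "\xC3\xA9a";

echo strlen($string), PHP_EOL;                // 3 (bytes)
echo iconv_strlen($string, 'UTF-8'), PHP_EOL; // 2 (characters)

// mb_strlen() exists only when the mbstring extension is loaded.
if (function_exists('mb_strlen')) {
    echo mb_strlen($string, 'UTF-8'), PHP_EOL; // 2 (characters)
}
```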

If you need more information about the iconv_strlen() function then you should probably check out the PHP manual at http://www.php.net/manual/en/function.iconv-strlen.php
