Strings in Java

Conceptually, Java strings are sequences of Unicode characters. For example, the string "Java\u2122" consists of the five Unicode characters J, a, v, a, and ™. Java does not have a built-in string type. Instead, the standard Java library contains a predefined class called, naturally enough, String. Each quoted string is an instance of the String class; for example, both the empty string "" and the string "Hello" are String objects.
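As a minimal sketch of the point above, the following program declares a few string literals and prints their lengths. The class name StringLiterals and the variable names are chosen here for illustration only.

```java
class StringLiterals {
    public static void main(String[] args) {
        String e = "";                     // an empty string
        String greeting = "Hello";         // every quoted string is a String object
        String trademark = "Java\u2122";   // Unicode escape for the trademark sign

        System.out.println(e.length());          // prints 0
        System.out.println(greeting.length());   // prints 5
        System.out.println(trademark.length());  // prints 5
    }
}
```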

Substrings

You extract a substring from a larger string with the substring method of the String class. For example, if greeting holds the string "Hello", the call greeting.substring(0, 3) creates a string consisting of the characters "Hel".

The second parameter of substring is the first position that you do not want to copy. In our case, we want to copy positions 0, 1, and 2 (from position 0 to position 2 inclusive). As substring counts it, this means from position 0 inclusive to position 3 exclusive.

There is one advantage to the way substring works: Computing the length of the substring is easy. The string s.substring(a, b) always has length b - a. For example, the substring "Hel" has length 3 - 0 = 3.
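The half-open range can be checked with a short program. This is only a sketch; the class name SubstringDemo and the helper method firstThree are hypothetical names introduced for the example.

```java
class SubstringDemo {
    // Returns the characters at positions 0, 1, and 2 (position 3 is excluded).
    static String firstThree(String s) {
        return s.substring(0, 3);
    }

    public static void main(String[] args) {
        String greeting = "Hello";
        String s = firstThree(greeting);
        System.out.println(s);            // prints Hel
        System.out.println(s.length());   // prints 3, that is, 3 - 0

        int a = 1, b = 4;
        // The length of substring(a, b) is always b - a.
        System.out.println(greeting.substring(a, b).length() == b - a); // prints true
    }
}
```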

Concatenation

Java, like most programming languages, allows you to use the + sign to join (concatenate) two strings. For example, if the variable message is assigned the result of "Expletive" + "deleted", it holds the string "Expletivedeleted". (Note the lack of a space between the words: The + sign joins two strings in the order received, exactly as they are given.)

When you concatenate a string with a value that is not a string, the latter is converted to a string. (As you will see in Chapter 5, every Java object can be converted to a string.) For example, the expression "PG" + 13 yields the string "PG13".

This feature is commonly used in output statements. For example, a statement such as System.out.println("The answer is " + answer) is perfectly acceptable and will print what one would want (and with the correct spacing because of the space after the word is).
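The concatenation rules described above can be tried out with a sketch like the following. The variable names (expletive, pg13, age, answer) and the value 42 are assumptions made for the example.

```java
class ConcatDemo {
    // Joins a label and a number; the int is converted to a string automatically.
    static String rate(String label, int age) {
        return label + age;
    }

    public static void main(String[] args) {
        String expletive = "Expletive";
        String pg13 = "deleted";
        String message = expletive + pg13;   // no space is inserted between the parts
        System.out.println(message);         // prints Expletivedeleted

        System.out.println(rate("PG", 13));  // prints PG13

        int answer = 42;
        System.out.println("The answer is " + answer); // prints The answer is 42
    }
}
```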

Strings Are Immutable

The String class gives no methods that let you change a character in an existing string. If you want to turn greeting into "Help!", you cannot directly change the last positions of greeting into 'p' and '!'. If you are a C programmer, this will make you feel pretty helpless. How are you going to modify the string? In Java, it is quite easy: Concatenate the substring that you want to keep with the characters that you want to replace.

For example, the assignment greeting = greeting.substring(0, 3) + "p!" changes the current value of the greeting variable to "Help!".

Because you cannot change the individual characters in a Java string, the documentation refers to the objects of the String class as being immutable. Just as the number 3 is always 3, the string "Hello" will always contain the code unit sequence describing the characters H, e, l, l, o. You cannot change these values. You can, however, as you just saw, change the contents of the string variable greeting and make it refer to a different string, just as you can make a numeric variable currently holding the value 3 hold the value 4.

Isn’t that a lot less efficient? It would seem simpler to change the code units than to build up a whole new string from scratch. Well, yes and no. Indeed, it isn’t efficient to generate a new string that holds the concatenation of "Hel" and "p!". But immutable strings have one great advantage: the compiler can arrange that strings are shared.

To understand how this works, think of the various strings as sitting in a common pool. String variables then point to locations in the pool. If you copy a string variable, both the original and the copy share the same characters.

Overall, the designers of Java decided that the efficiency of sharing outweighs the inefficiency of string editing by extracting substrings and concatenating. Look at your own programs; we suspect that most of the time, you don’t change strings, you just compare them. (There is one common exception: assembling strings from individual characters or shorter strings that come from the keyboard or a file.)
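The difference between changing a string and changing a string variable can be seen in a small sketch. The class name ImmutableDemo, the helper toHelp, and the variable original are hypothetical names used only here.

```java
class ImmutableDemo {
    // Builds a new string from the first three characters plus "p!".
    static String toHelp(String greeting) {
        return greeting.substring(0, 3) + "p!";
    }

    public static void main(String[] args) {
        String greeting = "Hello";
        String original = greeting;      // both variables refer to the same string

        greeting = toHelp(greeting);     // greeting now refers to a new string

        System.out.println(greeting);    // prints Help!
        System.out.println(original);    // prints Hello, the old string is unchanged
    }
}
```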

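C++ NOTE: C programmers generally are bewildered when they see Java strings for the first time because they think of strings as arrays of characters, as in char greeting[] = "Hello";

That is the wrong analogy: A Java string is roughly analogous to a char* pointer, as in char* greeting = "Hello";

When you replace greeting with another string, the Java code does roughly what a C programmer would do by hand: allocate a new block of memory, copy the first three characters of greeting into it with strncpy, copy "p!" after them with a second strncpy call, and then set greeting to point to the new block.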

Sure, now greeting points to the string "Help!". And even the most hardened C programmer must admit that the Java syntax is more pleasant than a sequence of strncpy calls. But what if we make another assignment to greeting?

greeting = "Howdy";

Don’t we have a memory leak? After all, the original string was allocated on the heap. Fortunately, Java does automatic garbage collection. If a block of memory is no longer needed, it will eventually be recycled.

If you are a C++ programmer and use the string class defined by ANSI C++, you will be much more comfortable with the Java String type. C++ string objects also perform automatic allocation and deallocation of memory. The memory management is performed explicitly by constructors, assignment operators, and destructors. However, C++ strings are mutable—you can modify individual characters in a string.

Testing Strings for Equality

To test whether two strings are equal, use the equals method. The expression s.equals(t) returns true if the strings s and t are equal, false otherwise. Note that s and t can be string variables or string constants. For example, the expression "Hello".equals(greeting) is perfectly legal. To test whether two strings are identical except for the upper/lowercase letter distinction, use the equalsIgnoreCase method, as in "Hello".equalsIgnoreCase("hello").

Do not use the == operator to test whether two strings are equal! It only determines whether or not the strings are stored in the same location. Sure, if strings are in the same location, they must be equal. But it is entirely possible to store multiple copies of identical strings in different places.

If the virtual machine always arranged for equal strings to be shared, then you could use the == operator for testing equality. But only string constants are shared, not strings that are the result of operations like + or substring. Therefore, never use == to compare strings, lest you end up with a program with the worst kind of bug: an intermittent one that seems to occur randomly.
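The following sketch contrasts equals with ==. To make the outcome predictable, it creates a second copy of the string with new String(...), which always produces a distinct object; the class and variable names are assumptions made for the example.

```java
class EqualityDemo {
    public static void main(String[] args) {
        String greeting = "Hello";
        String copy = new String("Hello");   // same characters, different object

        System.out.println(greeting.equals(copy));               // prints true
        System.out.println(greeting == copy);                    // prints false
        System.out.println("Hello".equalsIgnoreCase("hello"));   // prints true

        // A substring is computed at run time, so it is usually not the
        // shared constant "Hel". Compare contents with equals instead of ==.
        String hel = greeting.substring(0, 3);
        System.out.println(hel.equals("Hel"));                   // prints true
    }
}
```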

C++ NOTE: If you are used to the C++ string class, you have to be particularly careful about equality testing. The C++ string class does overload the == operator to test for equality of the string contents. It is perhaps unfortunate that Java goes out of its way to give strings the same “look and feel” as numeric values but then makes strings behave like pointers for equality testing. The language designers could have redefined == for strings, just as they made a special arrangement for +. Oh well, every language has its share of inconsistencies.

C programmers never use == to compare strings but use strcmp instead. The Java method compareTo is the exact analog to strcmp. You can use a test such as greeting.compareTo("Hello") == 0, but it seems clearer to use equals instead.

Code Points and Code Units

Java strings are implemented as sequences of char values. Each char value is a code unit in the UTF-16 encoding. The most commonly used Unicode characters can be represented with a single code unit. The supplementary characters require a pair of code units.

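The length method yields the number of code units required for a given string in the UTF-16 encoding. For example, if greeting holds the string "Hello", then greeting.length() is 5.

To get the true length, that is, the number of code points, call greeting.codePointCount(0, greeting.length()).

The call s.charAt(n) returns the code unit at position n, where n is between 0 and s.length() - 1. For example, greeting.charAt(0) is 'H' and greeting.charAt(4) is 'o'.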
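The calls above can be compared side by side in a short sketch. The second string, consisting of the single supplementary character U+1D56B, is an assumption added here to show a case where the two lengths differ.

```java
class LengthDemo {
    // Number of code points, as opposed to the number of UTF-16 code units.
    static int codePoints(String s) {
        return s.codePointCount(0, s.length());
    }

    public static void main(String[] args) {
        String greeting = "Hello";
        System.out.println(greeting.length());     // prints 5
        System.out.println(codePoints(greeting));  // prints 5
        System.out.println(greeting.charAt(0));    // prints H
        System.out.println(greeting.charAt(4));    // prints o

        // One supplementary character (U+1D56B), written as a surrogate pair.
        String z = "\uD835\uDD6B";
        System.out.println(z.length());            // prints 2
        System.out.println(codePoints(z));         // prints 1
    }
}
```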

NOTE: Java counts the code units in strings in a peculiar fashion: the first code unit in a string has position 0. This convention originated in C, where there was a technical reason for counting positions starting at 0. That reason has long gone away and only the nuisance remains. However, so many programmers are used to this convention that the Java designers decided to keep it.

Why are we making a fuss about code units? Consider the sentence

𝕫 is the set of integers

The 𝕫 character (U+1D56B) is a supplementary character, so it requires two code units in the UTF-16 encoding. If the sentence is stored in a string variable named sentence, calling sentence.charAt(1) doesn’t return a space but the second code unit of 𝕫. To avoid this problem, you should not use the char type. It is too low-level.

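If your code traverses a string, and you want to look at each code point in turn, read the code point at position i with int cp = sentence.codePointAt(i), then advance i by 2 if Character.isSupplementaryCodePoint(cp) returns true and by 1 otherwise.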

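Moving backwards takes a little more care. The codePointAt method returns the complete supplementary character only when the index refers to its first code unit; at the second code unit, it returns just that low surrogate. To move backwards, decrement i, decrement it once more if sentence.charAt(i) is a low surrogate (Character.isLowSurrogate tells you), and then call sentence.codePointAt(i).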
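Both directions can be sketched as follows. The forward loop uses codePointAt as described; for the backward loop, this sketch uses the library method codePointBefore together with Character.charCount, which is one way to step back over a surrogate pair. The class and method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

class CodePointTraversal {
    // Collects the code points of s from left to right.
    static List<Integer> forward(String s) {
        List<Integer> result = new ArrayList<>();
        int i = 0;
        while (i < s.length()) {
            int cp = s.codePointAt(i);
            result.add(cp);
            // A supplementary code point occupies two code units.
            if (Character.isSupplementaryCodePoint(cp)) i += 2;
            else i++;
        }
        return result;
    }

    // Collects the code points of s from right to left.
    static List<Integer> backward(String s) {
        List<Integer> result = new ArrayList<>();
        int i = s.length();
        while (i > 0) {
            int cp = s.codePointBefore(i);   // code point ending just before i
            result.add(cp);
            i -= Character.charCount(cp);    // step back 1 or 2 code units
        }
        return result;
    }

    public static void main(String[] args) {
        // U+1D56B written as a surrogate pair, followed by ordinary characters.
        String sentence = "\uD835\uDD6B is the set of integers";
        System.out.println((int) sentence.charAt(1));          // a surrogate, not a space
        System.out.println(forward(sentence).get(0) == 0x1D56B);   // prints true
        System.out.println(forward(sentence).size() == backward(sentence).size()); // prints true
    }
}
```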

The String API

The String class in Java contains more than 50 methods. A surprisingly large number of them are sufficiently useful so that we can imagine using them frequently. The following API note summarizes the ones we found most useful.

NOTE: You will find these API notes throughout the book to help you understand the Java Application Programming Interface (API). Each API note starts with the name of a class such as java.lang. The class name is followed by the names, explanations, and parameter descriptions of one or more methods.

We typically do not list all methods of a particular class but instead select those that are most commonly used, and describe them in a concise form. For a full listing, consult the online documentation.

We also list the version number in which a particular class was introduced. If a method has been added later, it has a separate version number.

API java.lang.String

  • char charAt(int index)
    returns the code unit at the specified location. You probably don’t want to call this method unless you are interested in low-level code units.
  • int codePointAt(int index) 5.0
    returns the code point that starts at the specified location. If the index refers to the second code unit of a surrogate pair, only that code unit is returned.
  • int offsetByCodePoints(int startIndex, int cpCount) 5.0
    returns the index of the code point that is cpCount code points away from the code point at startIndex.
  • int compareTo(String other)
    returns a negative value if the string comes before other in dictionary order, a positive value if the string comes after other in dictionary order, or 0 if the strings are equal.
  • boolean endsWith(String suffix)
    returns true if the string ends with suffix.
  • boolean equals(Object other)
    returns true if the string equals other.
  • boolean equalsIgnoreCase(String other)
    returns true if the string equals other, except for upper/lowercase distinction.
  • int indexOf(String str)
  • int indexOf(String str, int fromIndex)
  • int indexOf(int cp)
  • int indexOf(int cp, int fromIndex)
    returns the start of the first substring equal to the string str or the code point cp, starting at index 0 or at fromIndex, or –1 if str does not occur in this string.
  • int lastIndexOf(String str)
  • int lastIndexOf(String str, int fromIndex)
  • int lastIndexOf(int cp)
  • int lastIndexOf(int cp, int fromIndex)
    returns the start of the last substring equal to the string str or the code point cp, starting at the end of the string or at fromIndex.
  • int length()
    returns the length of the string.
  • int codePointCount(int startIndex, int endIndex) 5.0
    returns the number of code points between startIndex and endIndex - 1. Unpaired surrogates are counted as code points.
  • String replace(CharSequence oldString, CharSequence newString)
    returns a new string that is obtained by replacing all substrings matching oldString in the string with the string newString. You can supply String or StringBuilder objects for the CharSequence parameters.
  • boolean startsWith(String prefix)
    returns true if the string begins with prefix.
  • String substring(int beginIndex)
  • String substring(int beginIndex, int endIndex)
    returns a new string consisting of all code units from beginIndex until the end of the string or until endIndex - 1.
  • String toLowerCase()
    returns a new string containing all characters in the original string, with uppercase characters converted to lowercase.
  • String toUpperCase()
    returns a new string containing all characters in the original string, with lowercase characters converted to uppercase.
  • String trim()
    returns a new string by eliminating all leading and trailing whitespace (characters with code U+0020 or below) in the original string.
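A few of the methods listed above are exercised in the following sketch. The sample text "  Hello, World  " and the class name StringApiTour are assumptions chosen for illustration.

```java
class StringApiTour {
    public static void main(String[] args) {
        String padded = "  Hello, World  ";
        String t = padded.trim();                    // leading and trailing spaces removed

        System.out.println(t);                       // prints Hello, World
        System.out.println(t.startsWith("Hello"));   // prints true
        System.out.println(t.endsWith("World"));     // prints true
        System.out.println(t.indexOf("o"));          // prints 4
        System.out.println(t.indexOf("o", 5));       // prints 8
        System.out.println(t.lastIndexOf("o"));      // prints 8
        System.out.println(t.indexOf("xyz"));        // prints -1
        System.out.println(t.replace("l", "L"));     // prints HeLLo, WorLd
        System.out.println(t.toUpperCase());         // prints HELLO, WORLD

        // compareTo follows dictionary order: negative means "comes before".
        System.out.println("apple".compareTo("banana") < 0);  // prints true
        System.out.println("apple".compareTo("apple"));       // prints 0
    }
}
```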

Reading the On-Line API Documentation

As you just saw, the String class has lots of methods. Furthermore, there are thousands of classes in the standard libraries, with many more methods. It is plainly impossible to remember all useful classes and methods. Therefore, it is essential that you become familiar with the on-line API documentation that lets you look up all classes and methods in the standard library. The API documentation is part of the JDK. It is in HTML format. Point your web browser to the docs/api/index.html subdirectory of your JDK installation.

The three panes of the API documentation


The screen is organized into three frames. A small frame on the top left shows all available packages. Below it, a larger frame lists all classes. Click on any class name, and the API documentation for the class is displayed in the large frame to the right. For example, to get more information on the methods of the String class, scroll the second frame until you see the String link, then click on it.

Class description for the String class


Then scroll the frame on the right until you reach a summary of all methods, sorted in alphabetical order.

Method summary of the String class


Click on any method name for a detailed description of that method (see the figure below). For example, if you click on the compareToIgnoreCase link, you get the description of the compareToIgnoreCase method.

Detailed description of a String method


TIP: Bookmark the docs/api/index.html page in your browser right now.

Building Strings

Occasionally, you need to build up strings from shorter strings, such as keystrokes or words from a file. It would be inefficient to use string concatenation for this purpose. Every time you concatenate strings, a new String object is constructed. This is time consuming and it wastes memory. Using the StringBuilder class avoids this problem.

Follow these steps if you need to build a string from many small pieces. First, construct an empty string builder with new StringBuilder().

Each time you need to add another part, call the append method.

When you are done building the string, call the toString method. You will get a String object with the character sequence contained in the builder.
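The three steps above can be put together in a sketch like this one. The helper method join and the sample words are hypothetical and serve only to illustrate the construct, append, toString sequence.

```java
class BuilderDemo {
    // Joins the words with single spaces using one StringBuilder.
    static String join(String[] words) {
        StringBuilder builder = new StringBuilder();   // step 1: empty builder
        for (String word : words) {
            if (builder.length() > 0) {
                builder.append(' ');                   // append a single char
            }
            builder.append(word);                      // step 2: append each part
        }
        return builder.toString();                     // step 3: get the finished String
    }

    public static void main(String[] args) {
        String[] words = { "to", "be", "or", "not" };
        System.out.println(join(words));   // prints to be or not
    }
}
```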

NOTE: The StringBuilder class was introduced in JDK 5.0. Its predecessor, StringBuffer, is slightly less efficient, but it allows multiple threads to add or remove characters. If all string editing happens in a single thread (which is usually the case), you should use StringBuilder instead. The APIs of both classes are identical.

The following API notes contain the most important methods for the StringBuilder class.

API java.lang.StringBuilder 5.0

  • StringBuilder()
    constructs an empty string builder.
  • int length()
    returns the number of code units of the builder or buffer.
  • StringBuilder append(String str)
    appends a string and returns this.
  • StringBuilder append(char c)
    appends a code unit and returns this.
  • StringBuilder appendCodePoint(int cp)
    appends a code point, converting it into one or two code units, and returns this.
  • void setCharAt(int i, char c)
    sets the ith code unit to c.
  • StringBuilder insert(int offset, String str)
    inserts a string at position offset and returns this.
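The methods in this API note can be chained, because the append and insert methods return the builder itself. The following is a small sketch; the class name, the method name build, and the sample text are assumptions made for the example.

```java
class BuilderMethodsDemo {
    static String build() {
        StringBuilder builder = new StringBuilder();
        builder.append("Hell").append('o');   // builder holds "Hello"
        builder.setCharAt(0, 'J');            // builder holds "Jello"
        builder.insert(0, "Say ");            // builder holds "Say Jello"
        builder.appendCodePoint(0x1D56B);     // supplementary character: two code units
        return builder.toString();
    }

    public static void main(String[] args) {
        String s = build();
        System.out.println(s.startsWith("Say Jello"));         // prints true
        System.out.println(s.length());                        // prints 11
        System.out.println(s.codePointCount(0, s.length()));   // prints 10
    }
}
```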



All rights reserved © 2018 Wisdom IT Services India Pvt. Ltd
