Java Language Basics


Table Of Contents
What are identifiers, and what are the restrictions on their declaration?
What is a literal?
What is an escape sequence?
What are Java's keywords?
What is a code block?
What is an expression?
What are Java's operators?
What data types does Java support? How do Java's data types differ from those of C/C++?

 




Java Syntax


Before you can effectively read or write programs in any language, you need to know about the language's syntax rules
and restrictions. A language's syntax defines the way programs are written in that language; more specifically, the syntax
of the language defines the language elements, the way these elements are used, and the way they are used together. The
following lists the typical language elements and shows how the language syntax is concerned with these elements:

Identifiers: How are variable names composed? What are the naming restrictions and conventions?
Literals: How are constant values written? What types of values can they represent?
Keywords: What are the language's predefined words? How are they used and how are they not used?
Statements: What is a statement and how is one written?
Code blocks: How are statements grouped together?
Comments: How can the programmer add comments and notes to the program?
Expressions: What is an expression and how is one written?
Operators: What are the operators used in the language? How are they used in expressions? Can a programmer
define his/her own operators?

Identifiers


An identifier is a name that uniquely identifies a variable, a method, or a class (we will discuss variables later in this
chapter; methods and classes are discussed in a later chapter). In most languages, there are restrictions on how identifiers
are composed. The following lists Java's restrictions on identifiers:

All identifiers must begin with a letter, an underscore ( _ ), or a dollar sign ($)
An identifier can include, but not begin with, digits
An identifier cannot include a white space (tab, space, linefeed, or carriage return)
Identifiers are case-sensitive
Java keywords cannot be used as identifiers
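
The character rules above can be checked in code. The following is a minimal sketch, assuming a hypothetical helper class named IdentifierCheck (not part of the original text); it relies on the standard methods Character.isJavaIdentifierStart and Character.isJavaIdentifierPart, and it deliberately does not test for keywords:

```java
// Hypothetical helper for illustration only.
class IdentifierCheck {

    // Returns true if name follows the character rules listed above.
    // Keywords are NOT rejected here; that would need a separate lookup.
    static boolean isValidIdentifier(String name) {
        if (name.isEmpty() || !Character.isJavaIdentifierStart(name.charAt(0))) {
            return false;
        }
        for (int i = 1; i < name.length(); i++) {
            if (!Character.isJavaIdentifierPart(name.charAt(i))) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        String[] candidates = {"age", "_count", "$value", "brainSize", "2fast", "brain size"};
        for (String candidate : candidates) {
            System.out.println(candidate + " -> " + isValidIdentifier(candidate));
        }
    }
}
```

The first four candidates are accepted; "2fast" fails because it begins with a digit, and "brain size" fails because it contains a space.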

Since some C library names begin with an underscore or a dollar sign, it is best to avoid beginning an identifier name with
these characters. Importing a C library into a program that uses an underscore or a dollar sign to start an identifier name
might cause name clashing and confusion.

In addition to these restrictions, certain conventions are used with identifiers to make them more readable. Although these
conventions do not affect the compiler in any way, it is considered a good programming practice to follow them. The
following table lists some of these conventions based on the type of identifier:


Type of Identifier   Convention                                                           Examples
------------------   ------------------------------------------------------------------   -------------------
Class name           The first letter of each word is capitalized                         Mammal, SeaMammal
Function name        The first letter of each word, except the first, is capitalized      getAge, setHeight
Variable name        The first letter of each word, except the first, is capitalized      age, brainSize
Constant names       Every letter is capitalized and underscores are used between words   MAX_HEIGHT, MAX_AGE




Literals


A literal, or constant, represents a value that never changes. Think of an identifier as something that represents a value,
whereas a literal is a value. For example, the number 35 is a literal; the identifier age represents a number which could
be 35. In Java, a literal can be a number (integer or floating-point), a Boolean, a character, or a string.


Integer Literals


Integer literals are written in three formats: decimal (base 10), hexadecimal (base 16), and octal (base 8). Decimal literals
are written as ordinary numbers, hexadecimal literals always begin with 0X or 0x, and octal literals begin with 0. For
example, the decimal number 10 is 0xA or 0XA in hexadecimal format, and 012 in octal format.

An integer literal can be stored in the data types byte, short, int, or long. By default, Java stores integer literals in the int
data type, which is restricted to 32-bits.

To store an integer literal in the long data type, which can store 64-bit values, add the character l or L to the end of the
literal. For example, the literal 9999L is stored as long. The following lines of code use integer literals:

int x = 12345; //12345 is a literal
int y = x * 4; //4 is a literal

In the first line, the literal 12345 is stored directly in the int variable x. In the second line, the literal 4 is used to compute a
value first, which in turn is stored in the int variable y.

Note that even though an integer literal represents a constant value, it can still be assigned to an integer variable. Think of
the variable as a storage unit that at any one time can represent a single literal value. This also applies to the other literal
types.
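
As a small sketch of the three integer formats and the L suffix, the class below (IntegerLiteralDemo is a name chosen for this illustration) writes the value 10 in each base and stores a value that is too large for int:

```java
// Illustrative class; the names are placeholders.
class IntegerLiteralDemo {
    static final int DECIMAL = 10;   // base 10
    static final int HEX = 0xA;      // base 16
    static final int OCTAL = 012;    // base 8: the leading 0 changes the value

    // 3000000000 does not fit in 32 bits, so the L suffix is required.
    static final long BIG = 3000000000L;

    public static void main(String[] args) {
        System.out.println(DECIMAL == HEX && HEX == OCTAL); // prints true
        System.out.println(BIG);                             // prints 3000000000
    }
}
```

Because a leading 0 marks an octal literal, padding a decimal number with zeros (for example, writing 012 when 12 is meant) silently changes its value.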


Floating-Point Literals


A floating-point literal is a number with a decimal point and/or exponent. A floating-point literal is written in either
standard or scientific notation. For example, 123.456 is in standard notation, while 1.23456e+2 is in scientific
notation.

Floating-point literals are stored in the 64-bit double type (the default type), or the 32-bit float type. To store a
floating-point literal in the float type, append the letter f or F to the end of the number.
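
The following sketch (FloatLiteralDemo is a placeholder name) shows the two notations and the effect of the f suffix. The two double literals denote the same value, while the float literal keeps fewer significant digits:

```java
// Illustrative class; the names are placeholders.
class FloatLiteralDemo {
    static final double STANDARD = 123.456;      // standard notation, stored as double
    static final double SCIENTIFIC = 1.23456e+2; // scientific notation, same value
    static final float SINGLE = 123.456f;        // f suffix: stored as a 32-bit float

    public static void main(String[] args) {
        System.out.println(STANDARD == SCIENTIFIC); // prints true
        System.out.println(SINGLE == STANDARD);     // prints false: float is less precise
    }
}
```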


Boolean Literals


A Boolean literal represents one of two possible states: true or false. Boolean literals are stored in the data type
boolean. Unlike C/C++, where the states of a Boolean value can be represented by 0 (false) and a nonzero value such as 1
(true), Java represents these states using only the keywords true and false.


Character Literals


A character literal represents a single Unicode character. Character literals are always surrounded by single quotes; for
example, 'A' and '9' are character literals. Java uses the char type to store single characters.

The Unicode character set used by the char type is a 16-bit set that supersedes the 7-bit ASCII set. A 16-bit value can
represent up to 65,536 characters, which is enough to include symbols and characters from many other languages. Check out
the Unicode home page at www.unicode.org for more information.



Escape Sequences


A special type of character literal is called an escape sequence. Like C/C++, Java uses escape sequences to represent
special control characters and characters that cannot be printed. An escape sequence is represented by a backslash (\)
followed by a character code. The following table summarizes these escape sequences:


Character            Escape Sequence
-----------------    ---------------
Backslash            \\
Backspace            \b
Carriage return      \r
Double quote         \"
Form feed            \f
Horizontal tab       \t
Newline              \n
Octal character      \DDD
Single quote         \'
Unicode character    \uHHHH



An octal character is represented by a backslash followed by up to three octal digits, and a Unicode character is
represented by \u followed by exactly four hexadecimal digits. For example, the character '9', whose code is the decimal
number 57, can be written as the octal escape \071 or as the Unicode escape \u0039.
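
The octal and Unicode forms can be compared directly against the ordinary character literal. This is a minimal sketch; the class name EscapeDemo and the sample path string are assumptions made for the illustration:

```java
// Illustrative class; the names are placeholders.
class EscapeDemo {
    static final char OCTAL_NINE = '\071';    // octal 071 = decimal 57
    static final char UNICODE_NINE = '\u0039'; // hexadecimal 0039 = decimal 57
    static final String PATH = "C:\\temp";    // each backslash must itself be escaped

    public static void main(String[] args) {
        System.out.println(OCTAL_NINE == '9' && UNICODE_NINE == '9'); // prints true
        System.out.println((int) '9');                                // prints 57
        System.out.println(PATH.length());                            // prints 7
    }
}
```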

To illustrate the use of escape sequences, the string in the following statement prints out the words Name and ID
separated by two tabs on one line, and prints out "Joe Smith" and "999", also separated by two tabs, on the
second line:

String escapeDemo = "Name\t\tID\n"
        + "\"Joe Smith\"\t\t\"999\"";

Note that this statement is intentionally written on two lines. Java has no continuation character, and a string literal
cannot span lines, so the string is split into two literals that are joined with the + operator.


String Literals


A string literal represents a sequence of characters. Strings in Java are always enclosed in double quotes. Java handles
strings differently than C/C++; the latter represents a string using an array of characters, while the former uses the
classes String and StringBuffer. So, of all the literal types we've discussed, only string literals are stored as objects
by default. Strings are covered in more detail in the section "Arrays and Strings."



Keywords


A keyword is a predefined identifier that has a special meaning to the Java compiler, and which cannot be redefined. The
following is a list of Java's keywords:


abstract     else          int          static
boolean      extends       interface    super
break        false         long         switch
byte         final         native       synchronized
byvalue*     finally       new          this
case         float         null         throw
cast*        for           operator*    throws
catch        future*       outer*       transient
char         generic*      package      true
class        goto*         private      try
const*       if            protected    var*
continue     implements    public       void
default      import        rest*        volatile
do           inner*        return       while
double       instanceof    short

* Reserved but not being used.


As you may have noticed, many of Java's keywords are borrowed from C/C++. Also, as in C/C++, keywords are
always written in lowercase. Generally speaking, Java's keywords can be categorized according to their function as
follows (examples are in parentheses):

Data declaration keywords (boolean, float, int)
Loop keywords (continue, while, for)
Conditional keywords (if, else, switch)
Exception keywords (try, throw, catch)
Structure keywords (class, extends, implements)
Modifier and access keywords (private, public, transient)
Miscellaneous keywords (true, null, super)

Statements


A statement represents a single command for the compiler. A statement is not the same thing as a line of code; in other
words, there is no one-to-one mapping between physical lines of code and statements. As we will see later in this chapter,
some statements, such as an if statement, can be composed of multiple lines of code.

So, if a statement can take up multiple physical lines, how does the compiler know where each statement ends and the
next begins? By using semicolons to terminate statements.

The Java compiler is not concerned with the length of each statement, as long as each simple statement ends with a
semicolon. For example, the following two statements are equivalent:

x = (y + z) / q; //statement 1
x =
(y + z
) / q; //statement 2

The second statement has whitespace characters embedded in it (whitespace characters are the space, horizontal tab,
form feed, and line terminators). Although the Java compiler ignores whitespace between the elements of a statement,
splitting a statement this way is bad practice since it makes the code difficult to read.

Note that string literals are an exception: a string literal cannot take up multiple lines. A long string must be split
into shorter literals that are joined with the + operator.



Code Blocks


A code block is a grouping of statements that behave as a unit. Java delimits code blocks with braces ({ and }).
Examples of code blocks are class definitions, loop statements, condition statements, exception statements, and function
bodies. In the following section of code, there are three code blocks: the function frmResolver(), the try block,
and the catch block.

public frmResolver() {
    try {
        jbInit();
    }
    catch (Exception e) {
        e.printStackTrace();
    }
}

The above code also illustrates the concept of nested blocks: the try and catch blocks are nested inside the main
frmResolver() block.


Comments


Comments are natural-language statements written by the programmer to make notes about the code. There are three
styles of comments in Java. The first one begins with /* and ends with */, and allows you to write comments that span
multiple lines. This style is the same as in the C language.

The following code demonstrates the use of this style:

x = y + z; /* This is a comment.*/
z = q / p; /*This comment
extends over two lines*/

When the Java compiler encounters /*, it ignores every character that follows it until it encounters */.

The second comment style is similar to the first one, only it begins with /** and ends with */. The difference is that this
style is used with the JDK tool javadoc to automatically generate documentation from the source code (Java
documentation is beyond the scope of this course).

The third comment style is borrowed from C++. It begins with // and can be written on just one line. Here's an example:

x = y + z; //This comment cannot extend over multiple lines

Nesting comments is valid only when comments of the third style are embedded in one of the other two styles. Nesting
comments of the first two styles is illegal.

Here is an invalid nested comment:

/*This is the beginning of the comment
/*
The comment ends here
*/
this is outside the comment and will generate a compiler
error
*/

As we mentioned earlier, the compiler ignores everything between /* and */; so when it encounters the first */ it treats
the comment as ended. The two lines of text that follow, and the final */, are therefore not contained in the comment.

The following is an example of a valid nested comment:

/*This is the beginning of the comment
//This is OK
//so is this
this is the end of the comment.
*/
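
The three comment styles can appear together in one compilable unit. The sketch below uses a placeholder class named CommentDemo and a hypothetical method sum; neither comes from the original text:

```java
/**
 * Documentation comment: the javadoc tool can turn this text into documentation.
 * CommentDemo is a placeholder name chosen for this sketch.
 */
class CommentDemo {

    /* Block comment: this style may
       span several lines. */
    static int sum(int y, int z) {
        int x = y + z; // end-of-line comment: stops at the end of this line
        /* A // marker inside a block comment is just text. */
        return x;
    }

    public static void main(String[] args) {
        System.out.println(sum(2, 3)); // prints 5
    }
}
```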



Expressions


An expression is a meaningful combination of identifiers, keywords, symbols, and operators that has a value of some sort.
Generally speaking, everything that can be used on the right side of an assignment sign is an expression.

Here are some examples of expressions:

s = "Hello World";
x = 123.4;
y = (x * 5) / 2;
value = getValue();
Mammal m = new Mammal();

From the previous examples, we can categorize expressions into the following:

Variable assignments: The first two expressions assign values to the variables s and x.
Operator expressions: The third expression is an example of this. Operator expressions use combinations of
variables, literals, method calls, operators, and symbols. We will examine this kind in the next section.
Method calls: The fourth expression is a call to the method getValue(), which returns a value that is assigned to
value.
Object allocation: The last expression allocates memory for the Mammal object m. Think of object allocation
expressions as special method call expressions. We will cover both types of expressions in more detail in the
"Object-Oriented Programming in Java" chapter.



Operators


Operators are special symbols that perform a particular function on operands. There are five general types of operators:
arithmetic operators, logical operators, comparison operators, assignment operators, and bitwise operators. Each of
these can be further categorized into unary and binary. Unary operators operate on a single operand, while binary
operators operate on two operands.

In the following sections, we will examine the different types of operators. In addition, we will discuss the operator
associativity and precedence. Precedence determines the priority of operators, while associativity determines the
operating order of operators of equal precedence used in a single statement.


Arithmetic Operators


Java provides a full set of operators for mathematical calculations. Java, unlike some languages, can perform mathematical
functions on both integer and floating-point values. You will probably find these operators familiar.

The following table lists the arithmetic operators:


Operator   Definition                 Precedence   Associativity
--------   ------------------------   ----------   -------------
++/--      Auto-increment/decrement   1            Right
+/-        Unary plus/minus           2            Right
*          Multiplication             4            Left
/          Division                   4            Left
%          Modulus                    4            Left
+/-        Addition/subtraction       5            Left


The modulus operator returns the remainder of dividing its first operand by its second. The auto-increment/decrement
operators are unary operators. They modify the value of their operand by adding 1 to it or subtracting 1 from it. When
used in expressions, the outcome of the auto-increment/decrement operation depends on whether the operator precedes
or follows the operand.

The following demonstrates this:

int y = 3, x, z;
x = ++y;
z = y--;

In the second statement, the y variable is incremented by 1, and then its new value (4) is assigned to x. In the third
statement, the auto-decrement operation takes place following the assignment of y's current value to z. In other words,
the current value of y (4) is assigned to z, then y is modified to be 3.

The following code illustrates how precedence and associativity affect operators:

int x = 1, y = 2, z = 3, i, j;
i = x + y * z; //same as i = x + (y * z)
j = ++x + -y; //same as j = (++x) + (-y)
i = x++ + -y; //same as i = x++ + (-y)
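
To see the actual values, the snippets above can be wrapped in a small program. This is a sketch only; the class name ArithmeticDemo and the helper incrementDecrement are assumptions, and the division and modulus lines are extra illustrations:

```java
// Illustrative class; the names are placeholders.
class ArithmeticDemo {

    // Runs the increment/decrement statements from the text and returns {x, y, z}.
    static int[] incrementDecrement() {
        int y = 3, x, z;
        x = ++y; // y becomes 4 first, then x is assigned 4
        z = y--; // z is assigned 4 first, then y becomes 3
        return new int[] {x, y, z};
    }

    public static void main(String[] args) {
        int[] r = incrementDecrement();
        System.out.println(r[0] + " " + r[1] + " " + r[2]); // prints 4 3 4

        int x = 1, y = 2, z = 3;
        System.out.println(x + y * z);   // prints 7: multiplication binds tighter
        System.out.println((x + y) * z); // prints 9: parentheses override precedence
        System.out.println(7 % 3);       // prints 1
        System.out.println(7 / 2);       // prints 3: integer division truncates
        System.out.println(7 / 2.0);     // prints 3.5: floating-point division
    }
}
```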

Logical Operators


Logical (or Boolean) operators allow the programmer to group Boolean expressions to determine certain conditions.
These operators perform the standard Boolean operations (AND, OR, NOT, and XOR).

The following table lists the logical operators:


Operator   Definition                       Precedence   Associativity
--------   ------------------------------   ----------   -------------
!          Unary logical complement (NOT)   2            Right
&          Evaluation AND                   9            Left
^          XOR                              10           Left
|          Evaluation OR                    11           Left
&&         Short-circuit AND                12           Left
||         Short-circuit OR                 13           Left


The evaluation operators always evaluate both operands. The short-circuit operators, on the other hand, always evaluate
the first operand, and if that determines the value of the whole expression, they don't evaluate the second operand. For
better understanding, consider the following code:

if ( !isHighPressure && (temperature1 > temperature2)) {
}
boolean1 = (x < y) || ( a > b);
boolean2 = (10 > 5) & (5 > 1);

The first statement evaluates !isHighPressure first. If it is false, the second operand
(temperature1 > temperature2) is not evaluated, since the first operand being false means the whole expression is false.
The second statement works the same way: if x is less than y, boolean1 is true and the second operand (a > b) is never
evaluated; only when x is not less than y does the value of (a > b) decide the result. In the third statement, however,
both operands are always evaluated before the assignment to boolean2 is made.

The XOR operator produces a true value only if the operands are of different values (true and false, or false and true).
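
The difference between the evaluation and short-circuit operators becomes visible when the second operand has a side effect. The sketch below assumes a hypothetical method tracked() that counts how often it is called; the class and method names are not from the original text:

```java
// Illustrative class; the names are placeholders.
class ShortCircuitDemo {
    static int calls = 0;

    // Records that it was evaluated, then returns its argument unchanged.
    static boolean tracked(boolean value) {
        calls++;
        return value;
    }

    public static void main(String[] args) {
        boolean no = false, yes = true;

        boolean a = no && tracked(true);      // right operand is skipped
        System.out.println(a + " " + calls);  // prints false 0

        boolean b = no & tracked(true);       // right operand is always evaluated
        System.out.println(b + " " + calls);  // prints false 1

        boolean c = yes || tracked(false);    // right operand is skipped
        System.out.println(c + " " + calls);  // prints true 1

        System.out.println(yes ^ no);         // prints true: operands differ
        System.out.println(yes ^ yes);        // prints false: operands are equal
    }
}
```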


Comparison Operators


Programmers need the ability to compare values. A comparison operator compares its two operands and produces a single
boolean result.

The following table lists the comparison operators:


Operator   Definition              Precedence   Associativity
--------   ---------------------   ----------   -------------
<          Less than               7            Left
>          Greater than            7            Left
<=         Less than or equal      7            Left
>=         Greater than or equal   7            Left
==         Equal                   8            Left
!=         Not equal               8            Left


The equality operator can be used to compare two object variables of the same type (objects are discussed in the next
chapter). In this case, the result of the comparison is true only if both variables refer to the same object. Here is a
demonstration:

Mammal m1 = new Mammal();
Mammal m2 = new Mammal();
boolean b1 = m1 == m2; //b1 is false
m1 = m2;
boolean b2 = m1 == m2; //b2 is true

The result of the first equality test is false because m1 and m2 refer to different objects (even though they are of the same
type). The second comparison is true because both variables now represent the same object.

Most of the time, however, the equals() method is used to compare objects. This method is defined in the Object class,
where it behaves like the == operator; a class must override it before objects of the class can be compared by their
contents rather than by reference.
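
The following sketch contrasts == with an overridden equals(). The Mammal class here is a stand-in written for this example: its fields (name and age) and its constructor are assumptions, not the definition used elsewhere in the course. The hashCode() method is overridden as well, since the two methods are expected to agree:

```java
import java.util.Objects;

// Illustrative driver class; the names are placeholders.
class EqualsDemo {
    public static void main(String[] args) {
        Mammal m1 = new Mammal("whale", 12);
        Mammal m2 = new Mammal("whale", 12);
        System.out.println(m1 == m2);      // prints false: two distinct objects
        System.out.println(m1.equals(m2)); // prints true: same contents
    }
}

// Hypothetical Mammal with two fields chosen for the example.
class Mammal {
    private final String name;
    private final int age;

    Mammal(String name, int age) {
        this.name = name;
        this.age = age;
    }

    @Override
    public boolean equals(Object other) {
        if (this == other) {
            return true;
        }
        if (!(other instanceof Mammal)) {
            return false;
        }
        Mammal that = (Mammal) other;
        return age == that.age && Objects.equals(name, that.name);
    }

    @Override
    public int hashCode() {
        return Objects.hash(name, age);
    }
}
```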


Assignment Operators


Java, like most languages, allows you to assign values to variables. The following table lists assignment operators:


Operator   Definition            Precedence   Associativity
--------   -------------------   ----------   -------------
=          Assignment            15           Right
+=         Add and assign        15           Right
-=         Subtract and assign   15           Right
*=         Multiply and assign   15           Right
/=         Divide and assign     15           Right
&=         AND with assignment   15           Right
|=         OR with assignment    15           Right
^=         XOR with assignment   15           Right


The first operator should be familiar by now. The rest of the assignment operators perform an operation first, and then
store the result of the operation in the operand on the left side of the expression. Here are some examples:

int y = 2;
y *= 2; //same as (y = y * 2)
boolean b1 = true, b2 = false;
b1 &= b2; //same as (b1 = b1 & b2)
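
A slightly longer sketch chains several of these operators so that each intermediate value can be followed. The class name CompoundAssignmentDemo and its two helper methods are placeholders invented for the illustration:

```java
// Illustrative class; the names are placeholders.
class CompoundAssignmentDemo {

    // Applies four arithmetic assignment operators in sequence.
    static int arithmeticChain() {
        int y = 2;
        y *= 2; // y is 4
        y += 3; // y is 7
        y -= 1; // y is 6
        y /= 4; // y is 1 (integer division)
        return y;
    }

    // Applies the three boolean assignment operators in sequence.
    static boolean booleanChain() {
        boolean b1 = true;
        b1 &= false; // b1 is false
        b1 |= true;  // b1 is true
        b1 ^= true;  // b1 is false
        return b1;
    }

    public static void main(String[] args) {
        System.out.println(arithmeticChain()); // prints 1
        System.out.println(booleanChain());    // prints false

        int a, b;
        a = b = 5; // right associativity: b = 5 is performed first
        System.out.println(a + b); // prints 10
    }
}
```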

Bitwise Operators


Bitwise operators are of two types: shift operators and Boolean operators. The shift operators are used to shift the binary
digits of an integer number to the right or the left. Consider the following example (the short integer type is used instead
of int for conciseness):

short i = 13; //i is 0000000000001101
i = (short) (i << 2); //i is 0000000000110100 (the cast is needed because i << 2 is an int)
i >>= 3; //i is 0000000000000110

In the second line, the bitwise left shift operator shifted all the bits of i two positions to the left. The bitwise right shift
operator then shifted the bits three positions to the right.

The shifting operation is different in Java than in C/C++, mainly in how it is used with signed integers. A signed integer
is one whose left-most bit is used to indicate the integer's sign (the bit is 1 if the integer is negative). In Java,
integers are always signed, whereas in C/C++ they are signed by default. In C/C++, the result of right-shifting a negative
integer has traditionally been left to the implementation, so the sign may or may not be preserved. In Java, however, the
>> operator always preserves the sign (use >>> to perform an unsigned shift, which fills the vacated bits with zeros).
This means that the sign bit is duplicated as the bits are shifted (right shifting the 8-bit value 10010011 by 1 gives
11001001).
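
The two right-shift operators can be compared on a negative value. This is a minimal sketch; the class name ShiftDemo and the helper bits8(), which prints the low 8 bits of a value, are assumptions introduced for the example:

```java
// Illustrative class; the names are placeholders.
class ShiftDemo {

    // Returns the low 8 bits of value as a zero-padded binary string.
    static String bits8(int value) {
        String s = Integer.toBinaryString(value & 0xFF);
        while (s.length() < 8) {
            s = "0" + s;
        }
        return s;
    }

    public static void main(String[] args) {
        byte b = (byte) 0x93;              // bit pattern 10010011, a negative byte
        System.out.println(b);             // prints -109
        System.out.println(bits8(b >> 1)); // prints 11001001: the sign bit is copied

        System.out.println(-8 >> 1);       // prints -4: sign preserved
        System.out.println(-8 >>> 1);      // prints 2147483644: zero shifted in

        System.out.println(13 << 2);       // prints 52
        System.out.println(52 >> 3);       // prints 6
    }
}
```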

The following is a complete list of Java's bitwise operators:


Operator   Definition                               Precedence   Associativity
--------   --------------------------------------   ----------   -------------
~          Bitwise complement                       2            Right
<<         Signed left shift                        6            Left
>>         Signed right shift                       6            Left
>>>        Zero-fill right shift (as if unsigned)   6            Left
&          Bitwise AND                              9            Left
^          Bitwise XOR                              10           Left
|          Bitwise OR                               11           Left
<<=        Left-shift with assignment               15           Right
>>=        Right-shift with assignment              15           Right
>>>=       Zero-fill right shift with assignment    15           Right

 


A Special Operator: the ?: Operator


We said earlier that there are two types of operators: unary and binary. That's not exactly true. There is also a ternary
operator that Java borrows from C, the ?: operator. Here's the general syntax for this operator:

expression1? expression2: expression3;

expression1 is first evaluated. If its value is true, expression2 is computed, otherwise expression3 is.
Here is a demonstration:

int x = 3, y = 4, max;
max = (x > y)? x: y; //x > y is false here, so this is the same as max = y;

In this code, max is assigned the value of x or y, based on whether x is greater than y.
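
For comparison, the sketch below places the ?: form next to an equivalent if/else. The class name TernaryDemo and both method names are placeholders:

```java
// Illustrative class; the names are placeholders.
class TernaryDemo {

    // Uses the ?: operator: the whole expression has a value.
    static int max(int x, int y) {
        return (x > y) ? x : y;
    }

    // The same logic written with an if/else statement.
    static int maxWithIf(int x, int y) {
        int max;
        if (x > y) {
            max = x;
        } else {
            max = y;
        }
        return max;
    }

    public static void main(String[] args) {
        System.out.println(max(3, 4));       // prints 4
        System.out.println(maxWithIf(3, 4)); // prints 4
    }
}
```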

Some people mislabel this operator as being a conditional statement. It is not a statement. The following invalid code
illustrates why it is not a statement:

(x > y)? max = x: max = y; //can't use it as if it's a statement



Java's Data Types


Data types are entities, which represent specific types of values that can be stored in memory, and are interpreted in a
specific way by the compiler. We already introduced data types in our discussion about literals in a previous section. We
mentioned that a literal is stored in a certain data type depending on the literal's value; the literal 9, for example, can be
stored in the int data type, and the literal 'c' can be stored in the char data type.

There are two categories of data types in Java: built-in and composite data types. Built-in (or primitive) data types can
be further categorized into three kinds: numeric, Boolean, and character data types. Built-in data types are understood by
the compiler and don't require special libraries. (A special library basically refers to any collection of code that is not part
of the actual language definition). Composite types are of two kinds: arrays and strings. Composite types usually require
the use of special libraries.

Before explaining the different Java data types, we need to discuss variables.


Variables


We defined a data type as something representing a specific value that can be stored in memory. So, how do you allocate
memory for that value, and how do you access it and assign values to it? To allocate a portion of memory for the storage
of data types, you must first declare a variable of that data type, then give the variable a name (identifier) that references
it. Here's the general syntax of a variable declaration:

datatype identifier [ = defaultValue ];

The declaration begins with the type of variable, followed by the variable's identifier, then followed by an optional default
value assignment. The following are examples of different types of variable declarations:

int p; //declares the variable p to store int data types
float x, y = 4.1f, z = 2.2f; //the f suffix is required; 4.1 alone is a double
boolean endOfFile = false;
char char1 = 'T';

Notice that the second line declared three variables at the same time. You can declare multiple variables of the same type
at once, as long as you separate the variable identifiers with commas.

A variable can be declared anywhere in a program, as long as its declaration precedes any reference to it. Java is a
strongly typed language: every variable has a declared type, and all variables must be declared before they are used.

If we attempt to reference the variable x, without declaring it first, we would get a compiler error:

int y = 4, z = 2;
x = y / z; //What is x? Is it a float, char, int, or what?

In the code example above, the second line generates an error because the compiler does not know the type of x;
moreover, it does not know where in memory it is stored.

To avoid any problems, such as referencing a variable that does not yet exist, it is best to declare all variables at the
beginning of the code blocks where they are used. That makes it easier for you to keep track of all your variables.

Now that we understand what variables are, we can go on to discuss data types.


Built-in Data Types


Numeric Data Types


The numeric data types are summarized in the following table:


Type     Size      Description (range of values)
------   -------   ------------------------------------------------------------------------------
byte     8 bits    very small signed integer (-128 to 127)
short    16 bits   short signed integer (-32768 to 32767)
int      32 bits   signed integer (-2147483648 to 2147483647)
long     64 bits   long signed integer (about -9.22e+18 to 9.22e+18)
float    32 bits   floating-point number (positive values from about 1.401e-45 to 3.402e+38)
double   64 bits   double precision floating-point (positive values from about 4.94e-324 to 1.79e+308)
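
The exact limits do not need to be memorized, because the standard wrapper classes expose them as constants. The sketch below prints them; the class name NumericRangeDemo and the helper range() are placeholders:

```java
// Illustrative class; the names are placeholders.
class NumericRangeDemo {

    // Formats a pair of limits as "min to max".
    static String range(Object min, Object max) {
        return min + " to " + max;
    }

    public static void main(String[] args) {
        System.out.println(range(Byte.MIN_VALUE, Byte.MAX_VALUE));       // prints -128 to 127
        System.out.println(range(Short.MIN_VALUE, Short.MAX_VALUE));     // prints -32768 to 32767
        System.out.println(range(Integer.MIN_VALUE, Integer.MAX_VALUE)); // prints -2147483648 to 2147483647
        System.out.println(range(Long.MIN_VALUE, Long.MAX_VALUE));
        // For float and double, MIN_VALUE is the smallest positive value.
        System.out.println(range(Float.MIN_VALUE, Float.MAX_VALUE));
        System.out.println(range(Double.MIN_VALUE, Double.MAX_VALUE));
    }
}
```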

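If a numeric variable that belongs to a class or an array is not initialized by the programmer, the Java VM will
automatically initialize it to 0. Variables declared locally inside a function are not initialized automatically; instead,
the Java compiler rejects any attempt to read such a variable before a value has been assigned to it. This is different
from C/C++, where uninitialized local variables contain indeterminate values and are not necessarily detected by the
compiler.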
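
The following sketch shows both behaviors. The class name DefaultValueDemo and its variable names are placeholders; the commented-out line is left in to show the statement that the compiler would reject:

```java
// Illustrative class; the names are placeholders.
class DefaultValueDemo {
    // Class-level variables receive default values when no initializer is given.
    static int count;
    static double ratio;
    static boolean flag;
    static char letter;

    public static void main(String[] args) {
        System.out.println(count);        // prints 0
        System.out.println(ratio);        // prints 0.0
        System.out.println(flag);         // prints false
        System.out.println((int) letter); // prints 0

        int local;
        // System.out.println(local); // would not compile: local has no value yet
        local = 5;
        System.out.println(local);        // prints 5
    }
}
```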


Boolean Data Types


A Boolean data type has two values: true and false. Unlike C/C++, where Boolean values can be treated numerically (0 for
false and nonzero for true), Java uses the built-in data type boolean, which cannot be converted to or from a number.
Uninitialized boolean variables that belong to a class or an array are automatically set to false. The following code
illustrates the use of a boolean variable:

int a = 1, b = 0;
boolean bool = a < b; //bool is false

Character Data Types


Java uses the data type char to store a single Unicode character. Java's char type, therefore, is 16 bits wide, whereas in
C/C++, it is (by default) 8 bits wide.
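
Because a char is stored as a 16-bit number, it can be converted to and from an integer. This is a small sketch; the class name CharDemo and the helper next() are placeholders:

```java
// Illustrative class; the names are placeholders.
class CharDemo {

    // Returns the character whose code is one greater than that of c.
    static char next(char c) {
        return (char) (c + 1);
    }

    public static void main(String[] args) {
        System.out.println(Character.SIZE);            // prints 16
        System.out.println((int) 'A');                 // prints 65
        System.out.println(next('A'));                 // prints B
        System.out.println((int) Character.MAX_VALUE); // prints 65535
    }
}
```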


Composite Data Types


Arrays


An array is a data structure, which can hold multiple elements of the same type. The array's element type can be anything:
a primitive type, a composite type, or a user-defined class. If you have used arrays in other languages, you will probably
find the way Java handles arrays interesting. Let's first see some examples of array declarations:

int studentID[];
char[] grades;
float coordinates[][];

There are two things to note about these array declarations:

The array size is not specified. In most other languages, the array's size must be included in its declaration.
The placement of the square brackets can follow the identifier, as in the first example, or follow the data type, as in
the second example.

Creating and Initializing Arrays


The previous array declarations did not actually allocate any memory for the arrays (they simply declared identifiers that
will eventually store actual arrays). For that reason, the sizes of the arrays were not specified.

To actually allocate memory for the array variables, you must use the new operator as follows:

int studentID[] = new int[20];
char[] grades = new char[20];
float[][] coordinates = new float[10][5];

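The first statement creates an array of 20 int elements, the second creates an array of 20 char elements, and the third
creates a two-dimensional 10 by 5 float array (10 rows, 5 columns). When an array is created, all its elements are set to
the default value of the element type: 0 for numeric types, the character '\u0000' for char, false for boolean, and null
for object types.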

The use of the new operator in Java is similar to using the malloc function in C and the new operator in C++.

To initialize an array, the values of the array elements are enumerated inside a set of curly braces. For multi-dimensional
arrays, nested curly braces are used.

The following statements illustrate this:

char[] grades = {'A', 'B', 'C', 'D', 'F'};
float[][] coordinates = {{0.0f, 0.1f}, {0.2f, 0.3f}}; //float literals need the f suffix

The first statement creates a char array called grades. It initializes the array's elements with the values 'A' through 'F'.
Notice that we did not have to use the new operator to create this array; by initializing the array, enough memory is
automatically allocated for the array to hold all the initialized values. Therefore, the first statement creates a char array of
5 elements.

The second statement creates a two-dimensional float array called coordinates, whose size is 2 by 2. The array's
first row is initialized to 0.0 and 0.1, and the second row to 0.2 and 0.3. Conceptually, coordinates is an array of
two array elements.


Accessing Array Elements


Array elements are accessed by subscripting (or indexing) the array variable. Indexing an array variable involves following
the array variable's name with the element's number (index) surrounded by square brackets. Arrays are always indexed
starting from 0. In the case of multi-dimensional arrays, you must use an index for each dimension to access an element.

Here are a couple of examples:

firstElement = grades[0]; //firstElement = 'A'
fifthElement = grades[4]; //fifthElement = 'F'
row2Col1 = coordinates[1][0]; //row2Col1 = 0.2

The following snippet of code demonstrates the use of arrays. It creates an array of 5 int elements called intArray,
then uses a for loop to store the integers 0 through 4 in the elements of the array:

int[] intArray = new int[5];
int index;
for (index = 0; index < 5; index++)
    intArray[index] = index;

We will discuss for loops in a later section. Basically this code uses the loop to increment the index variable from 0 to 4, and at every pass, it stores its
value in the element of intArray indexed by index.
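
The array operations from this section can be combined into one runnable sketch. The class name ArrayDemo and the helper fill() are placeholders; the loop uses the array's length field instead of a fixed size so that the method works for any size:

```java
import java.util.Arrays;

// Illustrative class; the names are placeholders.
class ArrayDemo {

    // Creates an array of the given size and stores each index in its own element.
    static int[] fill(int size) {
        int[] intArray = new int[size];
        for (int index = 0; index < intArray.length; index++) {
            intArray[index] = index;
        }
        return intArray;
    }

    public static void main(String[] args) {
        char[] grades = {'A', 'B', 'C', 'D', 'F'};
        float[][] coordinates = {{0.0f, 0.1f}, {0.2f, 0.3f}};

        System.out.println(grades.length);       // prints 5
        System.out.println(grades[4]);           // prints F
        System.out.println(coordinates[1][0]);   // prints 0.2

        int[] fresh = new int[3];
        System.out.println(fresh[0]);            // prints 0: default value for int

        System.out.println(Arrays.toString(fill(5))); // prints [0, 1, 2, 3, 4]
    }
}
```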


Strings


A string is a sequence of characters. Java uses the String data type to store strings. This data type is a member of the
java.lang package, which we will study in the "Java Class Libraries" chapter. That means that it is not a built-in type but
a class; because the compiler imports the java.lang package automatically, you can declare a variable of type String
without an import statement. We will learn more about packages in the "Object-Oriented Programming in Java" chapter.

A String variable, once initialized, cannot be changed. How can it be a variable and yet cannot be changed? Recall that a
variable is jus