Java Language Basics
Table Of Contents
What are identifiers, and what are the
restrictions on their declaration?
What is a literal?
What is an escape sequence?
What are Java's keywords?
What is a code block?
What is an expression?
What are Java's operators?
What data types does Java support? How do Java's data types
differ from those of C/C++?
Java Syntax
Before you can effectively read or write programs in any language, you need to know about
the language's syntax rules
and restrictions. A language's syntax defines the way programs are written in that
language; more specifically, the syntax
of the language defines the language elements, the way these elements are used, and the
way they are used together. The
following lists the typical language elements and shows how the language syntax is
concerned with these elements:
Identifiers: How are variable names composed? What are the naming restrictions and
conventions?
Literals: How are constant values written? How are they assigned to variables?
Keywords: What are the language's predefined words? How are they used and how are they not
used?
Statements: What is a statement and how is one written?
Code blocks: How are statements grouped together?
Comments: How can the programmer add comments and notes to the program?
Expressions: What is an expression and how is one written?
Operators: What are the operators used in the language? How are they used in expressions?
Can a programmer
define his/her own operators?
Identifiers
An identifier is a name that uniquely identifies a variable, a method, or a class (we will
discuss variables later in this
chapter; methods and classes are discussed in a later chapter). In most languages, there
are restrictions on how identifiers
are composed. The following lists Java's restrictions on identifiers:
All identifiers must begin with a letter, an underscore ( _ ), or a dollar sign ($)
An identifier can include, but not begin with numbers
An identifier cannot include a white space (tab, space, linefeed, or carriage return)
Identifiers are case-sensitive
Java keywords cannot be used as identifiers
Since some C library names begin with an underscore or a dollar sign, it is best to avoid
beginning an identifier name with
these characters. Importing a C library into a program that uses an underscore or a dollar
sign to start an identifier name
might cause name clashing and confusion.
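The composition rules listed above can be checked mechanically. The following is a minimal sketch, assuming a hypothetical helper class named IdentifierCheck (the class and method names are ours, not part of the chapter). It relies on the standard methods Character.isJavaIdentifierStart and Character.isJavaIdentifierPart, which follow the language's own definition and are therefore slightly broader than the simplified list above. It does not check the keyword rule.

```java
// Hypothetical helper; the names IdentifierCheck and isWellFormed are illustrative only.
final class IdentifierCheck {

    // Returns true if text satisfies the composition rules for identifiers.
    // The keyword rule is deliberately not checked here.
    static boolean isWellFormed(String text) {
        if (text == null || text.isEmpty()) {
            return false;
        }
        // The first character must be a letter, an underscore, or a dollar sign
        // (or another character the language accepts as an identifier start).
        if (!Character.isJavaIdentifierStart(text.charAt(0))) {
            return false;
        }
        // The remaining characters may also be digits; whitespace is rejected.
        for (int i = 1; i < text.length(); i++) {
            if (!Character.isJavaIdentifierPart(text.charAt(i))) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        System.out.println(isWellFormed("brainSize"));  // true
        System.out.println(isWellFormed("2fast"));      // false: begins with a digit
        System.out.println(isWellFormed("sea mammal")); // false: contains a space
    }
}
```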
In addition to these restrictions, certain conventions are used with identifiers to make
them more readable. Although these
conventions do not affect the compiler in any way, it is considered a good programming
practice to follow them. The
following table lists some of these conventions based on the type of identifier:
Type of Identifier   Convention                                     Examples
Class name           The first letter of each word is capitalized   Mammal, SeaMammal
Function name        The first letter of each word except the       getAge, setHeight
                     first is capitalized
Variable name        The first letter of each word except the       age, brainSize
                     first is capitalized
Constant name        Every letter is capitalized and underscores    MAX_HEIGHT, MAX_AGE
                     are used between words
Literals
A literal, or constant, represents a value that never changes. Think of an identifier as
something that represents a value,
whereas a literal is a value. For example, the number 35 is a literal; the identifier age
represents a number which could
be 35. In Java, a literal can be a number (integer or floating-point), a Boolean, a
character, or a string.
Integer Literals
Integer literals are written in three formats: decimal (base 10), hexadecimal (base 16),
and octal (base 8). Decimal literals
are written as ordinary numbers, hexadecimal literals always begin with 0X or 0x, and
octal literals begin with 0. For
example, the decimal number 10 is 0xA or 0XA in hexadecimal format, and 012 in octal
format.
An integer literal can be stored in the data types byte, short, int, or long. By default,
Java stores integer literals in the int
data type, which is restricted to 32-bits.
To store an integer literal in the long data type, which can store 64-bit values, add the
character l or L to the end of the
literal. For example, the literal 9999L is stored as long. The following lines of code use
integer literals:
int x = 12345; //12345 is a literal
int y = x * 4; //4 is a literal
In the first line, the literal 12345 is stored directly in the int variable x. In the
second line, the literal 4 is used to compute a
value first, which in turn is stored in the int variable y.
Note that even though an integer literal represents a constant value, it can still be
assigned to an integer variable. Think of
the variable as a storage unit that at any one time can represent a single literal value.
This also applies to the other literal
types.
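As a small illustration of the three integer formats and the L suffix, the sketch below declares the same value in decimal, hexadecimal, and octal. The class and constant names are made up for this example.

```java
// Illustrative only; the class and constant names are assumptions for this sketch.
final class IntegerLiteralDemo {
    static final int DECIMAL = 10;   // base 10
    static final int HEX = 0xA;      // base 16
    static final int OCTAL = 012;    // base 8: a leading 0 changes the meaning

    // Without the L suffix this literal would not compile,
    // because its value does not fit in the 32-bit int type.
    static final long BIG = 9999999999L;

    public static void main(String[] args) {
        System.out.println(DECIMAL == HEX && HEX == OCTAL); // true
        System.out.println(BIG);                            // 9999999999
    }
}
```

The octal form is easy to write by accident: 012 is ten, not twelve.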
Floating-Point Literals
A floating-point literal is a number with a decimal point and/or exponent. A
floating-point literal is written in either
standard or scientific notation. For example, 123.456 is in standard notation, while
1.23456e+2 is in scientific
notation.
Floating-point literals are stored in the 64-bit double type (the default type), or the
32-bit float type. To store a
floating-point literal in the float type, append the letter f or F to the end of the
number.
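The following sketch shows the two notations and the f suffix side by side. The class and constant names are illustrative assumptions.

```java
// Illustrative only; names are assumptions for this sketch.
final class FloatLiteralDemo {
    static final double STANDARD = 123.456;      // standard notation, stored as double
    static final double SCIENTIFIC = 1.23456e+2; // the same value in scientific notation
    static final float SINGLE = 123.456f;        // the f suffix selects the 32-bit float type

    public static void main(String[] args) {
        System.out.println(STANDARD == SCIENTIFIC); // true: both literals denote 123.456
        System.out.println(SINGLE);
    }
}
```

Writing float SINGLE = 123.456; without the suffix is rejected by the compiler, because the literal is a double and the conversion to float could lose precision.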
Boolean Literals
A Boolean literal represents two possible states: true or false. Boolean literals are
stored in the data type boolean. Unlike C, where these states are traditionally represented
by integers (0 for false, any nonzero value for true), Java represents them using
the keywords true and false.
Character Literals
A character literal represents a single Unicode character. Character literals are always
surrounded by single quotes; for
example, 'A' and '9' are character literals. Java uses the char type to store single
characters.
Java's char type is 16 bits wide, so it can hold 65,536 distinct values, compared with the
128 values of the 7-bit ASCII set. That is enough to include symbols and characters from most
written languages; Unicode characters outside this range are represented in Java by a pair
of char values. Check out the Unicode home page at www.unicode.org for more information.
Escape Sequences
A special type of character literal is called an escape sequence. Like C/C++, Java uses
escape sequences to represent
special control characters and characters that cannot be printed. An escape sequence is
represented by a backslash (\)
followed by a character code. The following table summarizes these escape sequences:
Character           Escape Sequence
Backslash           \\
Backspace           \b
Carriage return     \r
Double quote        \"
Form feed           \f
Horizontal tab      \t
Newline             \n
Octal character     \DDD
Single quote        \'
Unicode character   \uHHHH
An octal character is represented by a sequence of three octal digits, and a Unicode
character is represented by a
sequence of four hexadecimal digits. For example, the decimal number 57 is represented by
the octal code \071, and
the Unicode sequence \u0039.
To illustrate the use of escape sequences, the string in the following statement prints
out the words Name and ID
separated by two tabs on one line, and prints out "Joe Smith" and
"999", also separated by two tabs, on the
second line:
String escapeDemo = "Name\t\tID\n"
    + "\"Joe Smith\"\t\t\"999\"";
Note that this statement is intentionally written on two lines. Java has no continuation
character, and a string literal cannot span lines, so the string is split into two literals
that are joined with the + operator.
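As a quick check of the octal and Unicode escapes described above, the sketch below compares them with the plain character literal '9' and builds the two-line string by joining two literals with +. The class and constant names are assumptions for this example.

```java
// Illustrative only; the names EscapeDemo and TABLE are assumptions for this sketch.
final class EscapeDemo {
    // A string literal must start and end on one line, so the text is
    // written as two literals joined with the + operator.
    static final String TABLE = "Name\t\tID\n"
            + "\"Joe Smith\"\t\t\"999\"";

    public static void main(String[] args) {
        // The octal escape, the Unicode escape, and the plain literal
        // all denote the same character, whose numeric code is 57.
        System.out.println('\071' == '9'); // true
        System.out.println('\u0039' == '9'); // true
        System.out.println((int) '9');     // 57
        System.out.println(TABLE);
    }
}
```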
String Literals
A string literal represents a sequence of characters. Strings in Java are always enclosed
in double quotes. Java handles
strings differently than
C/C++; the latter represents a string using an array of characters, while the former uses
the classes String and
StringBuffer. So, of all the literal types we've discussed, only string literals are
stored as objects by default. Strings are
covered in more detail in the section "Arrays and Strings."
Keywords
A keyword is a predefined identifier that has a special meaning to the Java compiler, and
which cannot be redefined. The
following is a list of Java's keywords:
abstract    else         int          static
boolean     extends      interface    super
break       false        long         switch
byte        final        native       synchronized
byvalue*    finally      new          this
case        float        null         throw
cast*       for          operator*    throws
catch       future*      outer*       transient
char        generic*     package      true
class       goto*        private      try
const*      if           protected    var*
continue    implements   public       void
default     import       rest*        volatile
do          inner*       return       while
double      instanceof   short
* Reserved but not being used.
As you may have noticed, many of Java's keywords are borrowed from C/C++. Also, as in
C/C++, keywords are
always written in lowercase. Generally speaking, Java's keywords can be categorized
according to their function as
follows (examples are in parentheses):
Data declaration keywords (boolean, float, int)
Loop keywords (continue, while, for)
Conditional keywords (if, else, switch)
Exception keywords (try, throw, catch)
Structure keywords (class, extends, implements)
Modifier and access keywords (private, public, transient)
Miscellaneous keywords (true, null, super)
Statements
A statement represents a single command, or line of code, for the compiler. This doesn't
mean, however, that each line of
code is a statement; in other words, there is no one-to-one mapping between physical lines
of codes and statements. As
we will see later in this chapter, some statements, such as an if statement, can be
composed of multiple lines of code.
So, if a statement can take up multiple physical lines, how does the compiler know where
each statement ends and the
next begins? By using semicolons to terminate statements.
The Java compiler is not concerned with the length of each statement, as long as
each statement ends with a
semicolon. For example, the following two statements are equivalent:
x = (y + z) / q; //statement 1
x =
(y + z
) / q; //statement 2
The second statement has whitespace characters embedded in it (whitespace characters are
the space, the horizontal tab, the form feed, and the line terminators). Although the Java
compiler ignores whitespace between the tokens of a statement, splitting a statement this
way is obviously bad practice since it makes the code difficult to read.
Note that a string literal is the exception: it cannot span multiple lines, and Java has no
continuation character. A long string must be split into shorter literals that are joined
with the + operator.
Code Blocks
A code block is a grouping of statements that behave as a unit. Java delimits code blocks
with braces ({ and }).
Examples of code blocks are class definitions, loop statements, condition statements,
exception statements, and function
bodies. In the following section of code, there are three code blocks: the function
frmResolver(), the try block,
and the catch block.
public frmResolver() {
    try {
        jbInit();
    }
    catch (Exception e) {
        e.printStackTrace();
    }
}
The above code also illustrates the concept of nested blocks: the try and catch blocks are
nested inside the main
frmResolver() block.
Comments
Comments are natural-language statements written by the programmer to make notes about the
code. There are three
styles of comments in Java. The first one begins with /* and ends with */, and allows you
to write comments that span
multiple lines. This style is the same as in the C language.
The following code demonstrates the use of this style:
x = y + z; /* This is a comment.*/
z = q / p; /*This comment
extends over two lines*/
When the Java compiler encounters /*, it ignores every character that follows it until it
encounters */.
The second comment style is similar to the first one, only it begins with /** and ends
with */. The difference is that this
style is used with the JDK tool javadoc to automatically generate documentation from the
source code (Java
documentation is beyond the scope of this course).
The third comment style is borrowed from C++. It begins with // and can be written on just
one line. Here's an example:
x = y + z; //This comment cannot extend over multiple lines
Nesting comments is valid only when comments of the third style are embedded in one of the
other two styles. Nesting
comments of the first two styles is illegal.
Here is an invalid nested comment:
/*This is the beginning of the comment
/*
The comment ends here
*/
this is outside the comment and will generate a compiler
error
*/
As we mentioned earlier, the compiler ignores everything between /* and */; so when it
encounters the first */ it thinks
that the comment ended. The last line in the code is therefore not contained in the
comment.
The following is an example of a valid nested comment:
/*This is the beginning of the comment
//This is OK
//so is this
this is the end of the comment.
*/
Expressions
An expression is a meaningful combination of identifiers, keywords, symbols, and operators
that has a value of some sort.
Generally speaking, everything that can be used on the right side of an assignment sign is
an expression.
Here are some examples of expressions:
s = "Hello World";
x = 123.4;
y = (x * 5) / 2;
value = getValue();
Mammal m = new Mammal();
From the previous examples, we can categorize expressions into the following:
Variable assignments: The first two expressions assign values to the variables s and x.
Operator expressions: The third expression is an example of this. Operator expressions use
combinations of
variables, literals, method calls, operators, and symbols. We will examine this kind in
the next section.
Method calls: The fourth expression is a call to the method getValue(), which returns a
value that is assigned to
value.
Object allocation: The last expression allocates memory for the Mammal object m. Think of
object allocation
expressions as special method call expressions. We will cover both types of expressions in
more detail in the
"Object-Oriented Programming in Java" chapter.
Operators
Operators are special symbols that perform a particular function on operands. There are
five general types of operators:
arithmetic operators, logical operators, comparison operators, assignment operators, and
bitwise operators. Each of
these can be further categorized into unary and binary. Unary operators operate on a
single operand, while binary
operators operate on two operands.
In the following sections, we will examine the different types of operators. In addition,
we will discuss the operator
associativity and precedence. Precedence determines the priority of operators, while
associativity determines the
operating order of operators of equal precedence used in a single statement.
Arithmetic Operators
Java provides a full set of operators for mathematical calculations. Java, unlike some
languages, can perform mathematical
functions on both integer and floating-point values. You will probably find these
operators familiar.
The following table lists the arithmetic operators:
Operator   Definition                 Precedence   Associativity
++/--      Auto-increment/decrement   1            Right
+/-        Unary plus/minus           2            Right
*          Multiplication             4            Left
/          Division                   4            Left
%          Modulus                    4            Left
+/-        Addition/subtraction       5            Left
The modulus operator returns the remainder of dividing its first operand by its second.
The auto-increment/decrement
operators are unary operators. They modify the value of their operand by adding or
subtracting 1 to their value. When
used in expressions, the outcome of the auto-increment/decrement operation depends on
whether the operator precedes
or succeeds the operand.
The following demonstrates this:
int y = 3, x, z;
x = ++y;
z = y--;
In the second statement, the y variable is incremented by 1, and then its new value (4) is
assigned to x. In the third
statement, the auto-decrement operation takes place following the assignment of y's
current value to z. In other words,
the current value of y (4) is assigned to z, then y is modified to be 3.
The following code illustrates how precedence and associativity affect operators:
int x = 1, y = 2, z = 3, i, j;
i = x + y * z; //same as i = x + (y * z)
j = ++x + -y; //same as j = (++x) + (-y)
i = x++ + -y; //same as i = x++ + (-y)
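To make the results of these statements concrete, the sketch below replays them inside methods and returns the final values. The class and method names are illustrative, and the variable k stands in for the second assignment to i so that both results can be reported.

```java
// Illustrative only; the class and method names are assumptions for this sketch.
final class ArithmeticDemo {

    // Replays the increment/decrement statements and returns {x, z, y}.
    static int[] incrementDecrement() {
        int y = 3, x, z;
        x = ++y; // prefix: y becomes 4 first, then x receives 4
        z = y--; // postfix: z receives 4 first, then y becomes 3
        return new int[] {x, z, y};
    }

    // Replays the precedence statements and returns {i, j, k, x}.
    static int[] precedence() {
        int x = 1, y = 2, z = 3, i, j, k;
        i = x + y * z; // multiplication binds tighter: 1 + (2 * 3) = 7
        j = ++x + -y;  // x becomes 2, so (2) + (-2) = 0
        k = x++ + -y;  // x is read as 2, so (2) + (-2) = 0; x then becomes 3
        return new int[] {i, j, k, x};
    }

    // The modulus operator yields the remainder of dividing a by b.
    static int remainder(int a, int b) {
        return a % b;
    }

    public static void main(String[] args) {
        System.out.println(java.util.Arrays.toString(incrementDecrement())); // [4, 4, 3]
        System.out.println(java.util.Arrays.toString(precedence()));         // [7, 0, 0, 3]
        System.out.println(remainder(17, 5));                                // 2
    }
}
```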
Logical Operators
Logical (or Boolean) operators allow the programmer to group Boolean expressions to
determine certain conditions.
These operators perform the standard Boolean operations (AND, OR, NOT, and XOR).
The following table lists the logical operators:
Operator   Definition                       Precedence   Associativity
!          Unary logical complement (NOT)   2            Right
&          Evaluation AND                   9            Left
^          XOR                              10           Left
|          Evaluation OR                    11           Left
&&         Short-circuit AND                12           Left
||         Short-circuit OR                 13           Left
The evaluation operators always evaluate both operands. The short-circuit operators, on
the other hand, always evaluate
the first operand, and if that determines the value of the whole expression, they don't
evaluate the second operand. For
better understanding, consider the following code:
if ( !isHighPressure && (temperature1 > temperature2)) {
}
boolean1 = (x < y) || ( a > b);
boolean2 = (10 > 5) & (5 > 1);
The first statement evaluates !isHighPressure first. If it is false, the second operand
(temperature1 > temperature2) is not evaluated, since the first operand being false means
the whole expression is false.
The same reasoning applies to the second statement: if x is less than y, boolean1 is true
and the second operand is never evaluated; only when x is not less than y is (a > b)
evaluated to determine the result. In the third statement, however, both operands are
always evaluated before the assignment to boolean2 is made.
The XOR operator produces a true value only if the operands are of different values (true
and false, or false and true).
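The difference between the evaluation and short-circuit operators can be observed by counting how often the second operand is evaluated. The sketch below does this with a hypothetical probe method that records each call; all names are assumptions made for this example.

```java
// Illustrative only; ShortCircuitDemo, probe, and the other names are assumptions.
final class ShortCircuitDemo {
    private static int calls;

    // Records that it was evaluated, then returns the value it was given.
    static boolean probe(boolean value) {
        calls++;
        return value;
    }

    // Returns how many times the right-hand operand was evaluated when it is
    // combined with left by && (shortCircuit == true) or by & (false).
    static int rightOperandEvaluations(boolean left, boolean shortCircuit) {
        calls = 0;
        boolean ignored = shortCircuit ? (left && probe(true)) : (left & probe(true));
        return calls;
    }

    // XOR is true only when the operands differ.
    static boolean xor(boolean a, boolean b) {
        return a ^ b;
    }

    public static void main(String[] args) {
        System.out.println(rightOperandEvaluations(false, true));  // 0: && skips the probe
        System.out.println(rightOperandEvaluations(false, false)); // 1: & always evaluates it
        System.out.println(rightOperandEvaluations(true, true));   // 1: && needs the probe
        System.out.println(xor(true, false));                      // true
    }
}
```

This matters most when the second operand has a side effect or could fail, for example a method call on a reference that the first operand has just tested for null.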
Comparison Operators
Programmers need the ability to compare values. A comparison operator compares its two
operands and produces a single Boolean result, which can in turn be combined with other
Boolean values using the logical operators.
The following table lists the comparison operators:
Operator   Definition              Precedence   Associativity
<          Less than               7            Left
>          Greater than            7            Left
<=         Less than or equal      7            Left
>=         Greater than or equal   7            Left
==         Equal                   8            Left
!=         Not equal               8            Left
The equality operator can be used to compare two object variables of the same type
(objects are discussed in the next
chapter). In this case, the result of the comparison is true only if both variables refer
to the same object. Here is a
demonstration:
m1 = new Mammal();
m2 = new Mammal();
boolean b1 = m1 == m2; //b1 is false
m1 = m2;
boolean b2 = m1 == m2; //b2 is true
The result of the first equality test is false because m1 and m2 refer to different
objects (even though they are of the same
type). The second comparison is true because both variables now represent the same object.
Most of the time, however, the equals() method is used to compare objects. This
method is defined in the Object class, where it behaves like the == operator. A class
must therefore override it before objects of the class can be
compared for equality by their contents.
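A minimal sketch of such an override follows. The fields name and age are invented for illustration; the chapter does not say what a Mammal contains. The hashCode() method is overridden as well, since objects that are equal are expected to report the same hash code.

```java
// Illustrative only; the fields name and age are assumptions, not the chapter's Mammal.
final class Mammal {
    private final String name;
    private final int age;

    Mammal(String name, int age) {
        this.name = name;
        this.age = age;
    }

    // Two Mammal objects are considered equal when their contents match.
    @Override
    public boolean equals(Object other) {
        if (this == other) {
            return true;               // same object
        }
        if (!(other instanceof Mammal)) {
            return false;              // null or a different type
        }
        Mammal that = (Mammal) other;
        return age == that.age && name.equals(that.name);
    }

    // Equal objects must produce equal hash codes.
    @Override
    public int hashCode() {
        return 31 * name.hashCode() + age;
    }

    public static void main(String[] args) {
        Mammal m1 = new Mammal("dolphin", 7);
        Mammal m2 = new Mammal("dolphin", 7);
        System.out.println(m1 == m2);      // false: two distinct objects
        System.out.println(m1.equals(m2)); // true: same contents
    }
}
```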
Assignment Operators
Java, like all languages, allows you to assign values to variables. The following table
lists assignment operators:
Operator   Definition            Precedence   Associativity
=          Assignment            15           Right
+=         Add and assign        15           Right
-=         Subtract and assign   15           Right
*=         Multiply and assign   15           Right
/=         Divide and assign     15           Right
&=         AND with assignment   15           Right
|=         OR with assignment    15           Right
^=         XOR with assignment   15           Right
The first operator should be familiar by now. The rest of the assignment operators perform
an operation first, and then
store the result of the operation in the operand on the left side of the expression. Here
are some examples:
int y = 2;
y *= 2; //same as (y = y * 2)
boolean b1 = true, b2 = false;
b1 &= b2; //same as (b1 = b1 & b2)
Bitwise Operators
Bitwise operators are of two types: shift operators and Boolean operators. The shift
operators are used to shift the binary
digits of an integer number to the right or the left. Consider the following example (the
short integer type is used instead
of int for conciseness):
short i = 13; //i is 0000000000001101
i = (short) (i << 2); //i is 0000000000110100
i >>= 3; //i is 0000000000000110
In the second line, the bitwise left shift operator shifted all the bits of i two
positions to the left (the cast is needed because the result of i << 2 has type int).
The bitwise right shift operator then shifted the bits three positions to the right.
The shifting operation is different in Java than in C/C++, mainly in how it is used
with signed integers. A signed integer is
one whose left-most bit is used to indicate the integer's sign (the bit is 1 if the
integer is negative). In Java, integers are
always signed, whereas in C/C++ they are signed by default and can also be declared
unsigned. In C/C++, the result of right-shifting a negative signed integer is left to the
implementation, so the sign may or may not be preserved. In Java, however, the >> operator
always preserves the sign: the sign bit is duplicated as the bits are shifted (right
shifting 10010011 by 1 is 11001001). To fill the vacated bits with zeros instead, use
the >>> operator to perform an unsigned shift.
The following is a complete list of Java's bitwise operators:
Operator   Definition                               Precedence   Associativity
~          Bitwise complement                       2            Right
<<         Signed left shift                        6            Left
>>         Signed right shift                       6            Left
>>>        Zero-fill right shift (as if unsigned)   6            Left
&          Bitwise AND                              9            Left
^          Bitwise XOR                              10           Left
|          Bitwise OR                               11           Left
<<=        Left-shift with assignment               15           Right
>>=        Right-shift with assignment              15           Right
>>>=       Zero-fill right shift with assignment    15           Right
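The difference between the signed and zero-fill right shifts shows up only for negative values. The sketch below applies both to -8 and repeats the 8-bit example from the text using a byte; the class and method names are assumptions for this example.

```java
// Illustrative only; the class and method names are assumptions for this sketch.
final class ShiftDemo {

    // Signed right shift: the sign bit is copied into the vacated positions.
    static int signedRight(int value, int positions) {
        return value >> positions;
    }

    // Zero-fill right shift: the vacated positions are filled with zeros.
    static int zeroFillRight(int value, int positions) {
        return value >>> positions;
    }

    public static void main(String[] args) {
        System.out.println(signedRight(-8, 1));   // -4
        System.out.println(zeroFillRight(-8, 1)); // 2147483644

        byte b = (byte) 0x93;           // bit pattern 10010011, value -109
        byte shifted = (byte) (b >> 1); // bit pattern 11001001, value -55
        System.out.println(shifted);    // -55
    }
}
```

The casts to byte are needed because shift operations on byte and short operands are carried out on int values.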
A Special Operator: the ?: Operator
We said earlier that there are two types of operators: unary and binary. That's not
exactly true. There is also a ternary
operator that Java borrows from C, the ?: operator. Here's the general syntax for this
operator:
expression1? expression2: expression3;
expression1 is first evaluated. If its value is true, expression2 is computed, otherwise
expression3 is.
Here is a demonstration:
int x = 3, y = 4, max;
max = (x > y) ? x : y; //here this is the same as max = y, since x is not greater than y
In this code, max is assigned the value of x or y, based on whether x is greater than y.
Some people mislabel this operator as being a conditional statement. It is not a
statement. The following invalid code
illustrates why it is not a statement:
(x > y)? max = x: max = y; //can't use it as if it's a statement
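Because the ?: operator forms an expression, its result can be assigned or returned directly. The sketch below places it next to the equivalent if statement; the class and method names are assumptions for this example.

```java
// Illustrative only; the class and method names are assumptions for this sketch.
final class TernaryDemo {

    // The conditional operator produces a value, so it can be returned directly.
    static int max(int x, int y) {
        return (x > y) ? x : y;
    }

    // The same logic written as an if statement, which produces no value itself.
    static int maxWithIf(int x, int y) {
        int max;
        if (x > y) {
            max = x;
        } else {
            max = y;
        }
        return max;
    }

    public static void main(String[] args) {
        System.out.println(max(3, 4));       // 4
        System.out.println(maxWithIf(3, 4)); // 4
    }
}
```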
Java's Data Types
Data types are entities, which represent specific types of values that can be stored in
memory, and are interpreted in a
specific way by the compiler. We already introduced data types in our discussion about
literals in a previous section. We
mentioned that a literal is stored in a certain data type depending on the literal's
value; the literal 9, for example, can be
stored in the int data type, and the literal 'c' can be stored in the char data type.
There are two categories of data types in Java: built-in and composite data types.
Built-in (or primitive) data types can
be further categorized into three kinds: numeric, Boolean, and character data types.
Built-in data types are understood by
the compiler and don't require special libraries. (A special library basically refers to
any collection of code that is not part
of the actual language definition). Composite types are of two kinds: arrays and strings.
Composite types usually require
the use of special libraries.
Before explaining the different Java data types, we need to discuss variables.
Variables
We defined a data type as something representing a specific value that can be stored in
memory. So, how do you allocate
memory for that value, and how do you access it and assign values to it? To allocate a
portion of memory for the storage
of data types, you must first declare a variable of that data type, then give the variable
a name (identifier) that references
it. Here's the general syntax of a variable declaration:
datatype identifier [ = defaultValue ];
The declaration begins with the type of variable, followed by the variable's identifier,
then followed by an optional default
value assignment. The following are examples of different types of variable declarations:
int p; //declares the variable p to store int values
float x, y = 4.1f, z = 2.2f; //the f suffix is required: 4.1 alone is a double
boolean endOfFile = false;
char char1 = 'T';
Notice that the second line declared three variables at the same time. You can declare
multiple variables of the same type
at once, as long as you separate the variable identifiers with commas.
A variable can be declared anywhere in a program, as long as its declaration precedes any
reference to it. Java is a
strongly typed language: every variable must be declared with a type before it is
used.
If we attempt to reference the variable x, without declaring it first, we would get a
compiler error:
int y = 4, z = 2;
x = y / z; //What is x? Is it a float, char, int, or what?
In the code example above, the second line generates an error because the compiler does
not know the type of x;
moreover, it does not know where in memory it is stored.
To avoid any problems, such as referencing a variable that does not yet exist, it is best
to declare all variables at the
beginning of the code blocks where they are used. That makes it easier for you to keep
track of all your variables.
Now that we understand what variables are, we can go on to discuss data types.
Built-in Data Types
Numeric Data Types
The numeric data types are summarized in the following table:
Type     Size      Description (range of values)
byte     8 bits    very small signed integer (-128 to 127)
short    16 bits   short signed integer (-32768 to 32767)
int      32 bits   signed integer (-2.14e+9 to 2.14e+9)
long     64 bits   long signed integer (-9.22e+18 to 9.22e+18)
float    32 bits   floating-point number (smallest and largest positive values:
                   1.401e-45 to 3.402e+38)
double   64 bits   double precision floating-point (smallest and largest positive values:
                   4.94e-324 to 1.79e+308)
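If a numeric field (a variable declared as a member of a class) or an array element is not
initialized by the programmer, the Java VM will automatically initialize it to 0. Local
variables declared inside a method are not initialized automatically; instead, the Java
compiler reports an error if such a variable may be read before it has been assigned a
value. This is different than C/C++, where uninitialized local variables contain
indeterminate values and reading them is not necessarily detected by the compiler.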
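The exact limits in the table are available as constants on the standard wrapper classes (Byte, Short, Integer, Long, Float, Double). The sketch below prints them and also shows the default values that uninitialized fields receive. The class and field names are assumptions for this example.

```java
// Illustrative only; the class and field names are assumptions for this sketch.
final class NumericTypesDemo {
    // Fields are given default values; no initializer is written here on purpose.
    static int defaultInt;
    static double defaultDouble;
    static boolean defaultBoolean;

    public static void main(String[] args) {
        System.out.println("byte:   " + Byte.MIN_VALUE + " to " + Byte.MAX_VALUE);
        System.out.println("short:  " + Short.MIN_VALUE + " to " + Short.MAX_VALUE);
        System.out.println("int:    " + Integer.MIN_VALUE + " to " + Integer.MAX_VALUE);
        System.out.println("long:   " + Long.MIN_VALUE + " to " + Long.MAX_VALUE);
        // For the floating-point types, MIN_VALUE is the smallest positive value.
        System.out.println("float:  " + Float.MIN_VALUE + " to " + Float.MAX_VALUE);
        System.out.println("double: " + Double.MIN_VALUE + " to " + Double.MAX_VALUE);

        // Prints the defaults assigned to the three fields above.
        System.out.println(defaultInt + " " + defaultDouble + " " + defaultBoolean);

        // A local variable gets no default. Uncommenting the next two lines
        // produces a compile-time error, because local is read before assignment.
        // int local;
        // System.out.println(local);
    }
}
```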
Boolean Data Types
A Boolean data type has two values: true and false. Unlike C, which traditionally stores
Boolean values numerically (0 for false and a nonzero value for true), Java uses the
built-in data type boolean. Uninitialized boolean fields are automatically set to
false. The following code illustrates the use of a boolean variable:
int a = 1, b = 0;
boolean bool = a < b; //bool is false
Character Data Types
Java uses the data type char to store a single Unicode character. Java's char type,
therefore, is 16 bits wide, whereas in
C/C++, it is (by default) 8 bits wide.
Composite Data Types
Arrays
An array is a data structure, which can hold multiple elements of the same type. The
array's element type can be anything:
a primitive type, a composite type, or a user-defined class. If you have used arrays in
other languages, you will probably
find the way Java handles arrays interesting. Let's first see some examples of array
declarations:
int studentID[];
char[] grades;
float coordinates[][];
There are two things to note about these array declarations:
The array size is not specified. In most other languages, the array's size must be included
in its declaration.
The placement of the square brackets can follow the identifier, as in the first example,
or follow the data type, as in
the second example.
Creating and Initializing Arrays
The previous array declarations did not actually allocate any memory for the arrays (they
simply declared identifiers that
will eventually store actual arrays). For that reason, the sizes of the arrays were not
specified.
To actually allocate memory for the array variables, you must use the new operator as
follows:
int studentID[] = new int[20];
char[] grades = new char[20];
float[][] coordinates = new float[10][5];
The first statement creates an array of 20 int elements, the second creates an array of 20
char elements, and the third
creates a two-dimensional 10 by 5 float array (10 rows, 5 columns). When the array is
created, all its elements are set to the default value of the element type: 0 for numeric
types, the null character for char, false for boolean, and null for object types.
The use of the new operator in Java is similar to using the malloc function in C and the
new operator in C++.
To initialize an array, the values of the array elements are enumerated inside a set of
curly braces. For multi-dimensional
arrays, nested curly braces are used.
The following statements illustrate this:
char[] grades = {'A', 'B', 'C', 'D', 'F'};
float[][] coordinates = {{0.0f, 0.1f}, {0.2f, 0.3f}};
The first statement creates a char array called grades. It initializes the array's
elements with the values 'A' through 'F'.
Notice that we did not have to use the new operator to create this array; by initializing
the array, enough memory is
automatically allocated for the array to hold all the initialized values. Therefore, the
first statement creates a char array of
5 elements.
The second statement creates a two-dimensional float array called coordinates, whose size
is 2 by 2. The array's
first row is initialized to 0.0 and 0.1, and the second row to 0.2 and 0.3. Conceptually,
coordinates is an array of
two array elements.
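The sketch below puts the two ways of creating an array side by side: allocation with new, which leaves every element at its default value, and an initializer list, which sizes the array from the listed values. The class name is an assumption; the array names follow the text.

```java
// Illustrative only; the class name ArrayInitDemo is an assumption for this sketch.
final class ArrayInitDemo {
    // Allocated with new: 20 elements, each starting at the default value 0.
    static final int[] studentID = new int[20];

    // Created from initializer lists: the sizes follow from the values given.
    static final char[] grades = {'A', 'B', 'C', 'D', 'F'};
    static final float[][] coordinates = {{0.0f, 0.1f}, {0.2f, 0.3f}};

    public static void main(String[] args) {
        System.out.println(studentID.length + " elements, first is " + studentID[0]);
        System.out.println(grades.length);         // 5
        System.out.println(coordinates.length);    // 2: an array of two arrays
        System.out.println(coordinates[1].length); // 2: the second row has two elements
    }
}
```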
Accessing Array Elements
Array elements are accessed by subscripting (or indexing) the array variable. Indexing an
array variable involves following
the array variable's name with the element's number (index) surrounded by square brackets.
Arrays are always indexed
starting from 0. In the case of multi-dimensional arrays, you must use an index for each
dimension to access an element.
Here are a couple of examples:
firstElement = grades[0]; //firstElement = 'A'
fifthElement = grades[4]; //fifthElement = 'F'
row2Col1 = coordinates[1][0]; //row2Col1 = 0.2
The following snippet of code demonstrates the use of arrays. It creates an array of 5 int
elements called intArray,
then uses a for loop to store the integers 0 through 4 in the elements of the array:
int[] intArray = new int[5];
int index;
for (index = 0; index < 5; index++)
    intArray[index] = index;
We will discuss for loops in a later section. Basically this code uses the loop to
increment the index variable from 0 to 4, and at every pass, it stores its
value in the element of intArray indexed by index.
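A variation on this loop, sketched below, uses the array's length field in place of the hard-coded 5 so that the loop bound always matches the array size. The class and method names are assumptions for this example.

```java
// Illustrative only; the class and method names are assumptions for this sketch.
final class ArrayFillDemo {

    // Creates an array of the given size and stores each index in its own element.
    static int[] indexArray(int size) {
        int[] intArray = new int[size];
        // length is a field of every array; valid indexes run from 0 to length - 1.
        for (int index = 0; index < intArray.length; index++) {
            intArray[index] = index;
        }
        return intArray;
    }

    public static void main(String[] args) {
        System.out.println(java.util.Arrays.toString(indexArray(5))); // [0, 1, 2, 3, 4]
    }
}
```

Using an index outside the valid range, such as intArray[5] for a five-element array, compiles but fails at run time with an ArrayIndexOutOfBoundsException.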
Strings
A string is a sequence of characters. Java uses the String data type to store strings.
This data type is a member of the
java.lang package, which we will study in the "Java Class Libraries" chapter.
That means that it is not a built-in (primitive) type but a class. Because the java.lang
package is imported into every Java program automatically, you can declare a variable of
type String without writing an import statement. We will
learn more about packages in
the "Object-Oriented Programming in Java" chapter.
A String variable, once initialized, cannot be changed. How can it be a variable and yet
cannot be changed? Recall that a
variable is jus