Friday, 21 December 2012

Makefile

Makefiles by example


Compiling your source code files can be tedious, especially when you want to include several source files and have to type the compile command every time.
Well, I have news for you... Your days of command line compiling are (mostly) over, because YOU will learn how to write Makefiles.
A makefile is a specially formatted file that, together with the make utility, helps you automagically build and manage your projects.
For this session you will need the source files used in the examples below: main.cpp, hello.cpp and factorial.cpp.

I recommend creating a new directory and placing all the files in there. Note: I use g++ for compiling. You are free to change it to a compiler of your choice.

The make utility

If you run
make
the program looks for a file named makefile (or Makefile) in the current directory and executes it.
If you have several makefiles, you can pick the one to run with the -f option:
make -f MyMakefile
The make utility has several other switches. For more info, see man make.

Build Process

  1. Compiler takes the source files and outputs object files
  2. Linker takes the object files and creates an executable

Compiling by hand

The trivial way to compile the files and obtain an executable is to run the command:
g++ main.cpp hello.cpp factorial.cpp -o hello

The basic Makefile

The basic makefile is composed of:

target: dependencies
[tab] system command
This syntax applied to our example would look like:
all:
	g++ main.cpp hello.cpp factorial.cpp -o hello
Save this as Makefile-1 in the same directory as your files and run it with:
make -f Makefile-1
In this first example our target is called all. By convention, all is the first target in a makefile, and make runs the first target when you do not name one on the command line, which makes all the default here.
We also see that there are no dependencies for target all, so make simply executes the system command specified.
Finally, make compiles the program according to the command line we gave it.

Using dependencies

Sometimes it is useful to use different targets, because if you modify a single file in your project you don't have to recompile everything, only what you modified.
Here is an example:
all: hello

hello: main.o factorial.o hello.o
	g++ main.o factorial.o hello.o -o hello

main.o: main.cpp
	g++ -c main.cpp

factorial.o: factorial.cpp
	g++ -c factorial.cpp

hello.o: hello.cpp
	g++ -c hello.cpp

clean:
	rm -rf *.o hello

Now we see that the target all has only dependencies, but no system commands. For make to complete a target, it first has to satisfy all the dependencies of the target that was called (in this case all).
Each dependency is looked up among the available targets and, if a matching target is found, that target is built first. A target is rebuilt only when it is missing or older than one of its dependencies, which is why editing one source file recompiles only that file.
In this example we also see a target called clean. Such a target is useful when you want a fast way to get rid of all the object files and executables; run it with make clean.
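To make the "only recompile what changed" idea concrete, here is a rough sketch in C of the timestamp comparison that decides whether a target is out of date. The function name needs_rebuild and its parameters are made up for illustration; the real make utility does considerably more than this.

```c
#include <assert.h>
#include <stddef.h>
#include <time.h>

/*
 * Hypothetical helper (not part of make): returns 1 if a target should be
 * rebuilt, 0 otherwise. A target is stale when it does not exist yet or
 * when at least one dependency was modified after it.
 */
int needs_rebuild(int target_exists, time_t target_mtime,
                  const time_t *dep_mtimes, size_t dep_count)
{
    size_t i;

    assert(dep_mtimes != NULL || dep_count == 0);

    if (!target_exists)
        return 1;               /* e.g. main.o has never been built */

    for (i = 0; i < dep_count; i++) {
        if (dep_mtimes[i] > target_mtime)
            return 1;           /* e.g. main.cpp edited after main.o */
    }
    return 0;                   /* up to date: the command is skipped */
}
```

With this rule, touching factorial.cpp makes factorial.o stale, which in turn makes hello stale, while main.o and hello.o are left alone.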

Using variables and comments

You can also use variables when writing Makefiles. They come in handy when you want to change the compiler or the compiler options.
# I am a comment, and I want to say that the variable CC will be
# the compiler to use.
CC=g++
# Hey!, I am comment number 2. I want to say that CFLAGS will be the
# options I'll pass to the compiler.
CFLAGS=-c -Wall

all: hello

hello: main.o factorial.o hello.o
	$(CC) main.o factorial.o hello.o -o hello

main.o: main.cpp
	$(CC) $(CFLAGS) main.cpp

factorial.o: factorial.cpp
	$(CC) $(CFLAGS) factorial.cpp

hello.o: hello.cpp
	$(CC) $(CFLAGS) hello.cpp

clean:
	rm -rf *.o hello
As you can see, variables can be very useful. To use them, assign a value to a variable before you start writing your targets. After that, you can refer to it with the dereference operator $(VAR).

Where to go from here

With this brief introduction to Makefiles, you can create some very sophisticated mechanisms for compiling your projects. However, this is just the tip of the iceberg. I don't expect anyone to fully understand the example presented below without having consulted some Make documentation (which I had to do myself) or read pages 347 to 354 of your Unix book.
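CC=g++
CFLAGS=-c -Wall
LDFLAGS=
SOURCES=main.cpp hello.cpp factorial.cpp
# Replace the .cpp suffix of every source file with .o
OBJECTS=$(SOURCES:.cpp=.o)
EXECUTABLE=hello

all: $(SOURCES) $(EXECUTABLE)

# $@ expands to the name of the target being built
$(EXECUTABLE): $(OBJECTS)
	$(CC) $(LDFLAGS) $(OBJECTS) -o $@

# Suffix rule: how to build any .o from the matching .cpp ($< is the source)
.cpp.o:
	$(CC) $(CFLAGS) $< -o $@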


If you understand this last example, you can adapt it to your own projects by changing only 2 lines, no matter how many additional files you have!

Wednesday, 12 December 2012

basics linux stuff

1) What is Linux?
Linux is an operating system based on UNIX, and was first introduced by Linus Torvalds. It is based on the Linux Kernel, and can run on different hardware platforms manufactured by Intel, MIPS, HP, IBM, SPARC and Motorola. Another popular element in Linux is its mascot, a penguin figure named Tux.

2) What is the difference between UNIX and LINUX?
Unix originally began as a proprietary operating system from Bell Laboratories, which later spawned several commercial versions. Linux, on the other hand, is free, open source, and intended as a non-proprietary operating system for the masses.
3) What is BASH?
BASH is short for Bourne Again SHell. It was written by Brian Fox for the GNU Project as a free replacement for the original Bourne shell (written by Stephen Bourne and represented by /bin/sh). It combines the features of the original Bourne shell with additional functions that make it easier and more convenient to use. It has since been adopted as the default shell on most systems running Linux.
4) What is Linux Kernel?

The Linux Kernel is low-level system software whose main role is to manage hardware resources on behalf of the user. It also provides the interface through which user-level programs interact with the hardware.
5) What is LILO?
LILO is a boot loader for Linux. It is used mainly to load the Linux operating system into main memory so that it can begin its operations.
6) What is a swap space?
Swap space is disk space that Linux uses to temporarily hold parts of running programs. It is used when RAM does not have enough room for everything that is executing.
7) What is the advantage of open source?
Open source allows you to distribute your software, including its source code, freely to anyone who is interested. People can then add features and debug and correct errors in the source code. They can even make it run better and then redistribute the enhanced source code freely again. This eventually benefits everyone in the community.
8) What are the basic components of Linux?
Just like any other typical operating system, Linux has all of these components: kernel, shells and GUIs, system utilities, and application programs. What gives Linux an advantage over other operating systems is that every component comes with additional features and the source code for all of them can be downloaded for free.
9) Does it help for a Linux system to have multiple desktop environments installed?
In general, one desktop environment, like KDE or Gnome, is good enough to operate without issues. It’s all a matter of preference for the user, although the system allows switching from one environment to another. Some programs will work on one environment and not work on the other, so it could also be considered a factor in selecting which environment to use.
10) What is the basic difference between BASH and DOS?
The key differences between the BASH and DOS consoles lie in 3 areas:
- BASH commands are case sensitive while DOS commands are not;
- under BASH, the / character is the directory separator and \ acts as an escape character. Under DOS, / serves as a command argument delimiter and \ is the directory separator;
- DOS follows a convention in naming files, which is an 8-character file name followed by a dot and a 3-character extension. BASH follows no such convention.
11) What is the importance of the GNU project?
The GNU project launched the free software movement. Free software gives you several freedoms: to run a program for any purpose, to study and modify it to suit your needs, to redistribute copies to other people, and to improve the software and release your improvements to the public.
12) Describe the root account.
The root account is the system administrator account and gives you full control of the system. With it you can create and maintain user accounts and assign different permissions to each account. It is created by default every time you install Linux.
13) What is CLI?
CLI is short for Command Line Interface. This interface allows the user to type commands that instruct the computer to perform operations. The CLI offers greater flexibility. However, users who are accustomed to a GUI may find it difficult to remember the commands and the options that come with them.
14) What is GUI?
GUI, or Graphical User Interface, makes use of images and icons that users click and manipulate as a way of communicating with the computer. Instead of having to remember and type commands, the use of graphical elements makes it easier to interact with the system, as well as adding more attraction through images, icons and colors.
15) How do you open a command prompt when issuing a command?
To open the default shell (which is where the command prompt can be found), press Ctrl-Alt-F1. This will provide a command line interface (CLI) from which you can run commands as needed.
16) How can you find out how much memory Linux is using?
From a command shell, use the “concatenate” command: cat /proc/meminfo for memory usage information. You should see a line such as MemTotal: followed by a size in kB (older kernels print a line starting with Mem: instead). This is the total memory Linux thinks it has available to use.
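The same information can be read from a program. The following is a minimal sketch in C, assuming a kernel whose /proc/meminfo contains a MemTotal: line; the function names parse_meminfo_line and total_memory_kb are invented for this example.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical helper: if `line` starts with "<key>:", store the number
 * that follows in *value_kb and return 1; otherwise return 0.
 */
int parse_meminfo_line(const char *line, const char *key, long *value_kb)
{
    size_t key_len;

    assert(line != NULL && key != NULL && value_kb != NULL);

    key_len = strlen(key);
    if (strncmp(line, key, key_len) != 0 || line[key_len] != ':')
        return 0;

    /* sscanf skips the padding spaces before the number. */
    return sscanf(line + key_len + 1, "%ld", value_kb) == 1;
}

/* Reads /proc/meminfo and returns MemTotal in kB, or -1 on failure. */
long total_memory_kb(void)
{
    char line[256];
    long kb = -1;
    FILE *fp = fopen("/proc/meminfo", "r");

    if (fp == NULL)
        return -1;

    while (fgets(line, sizeof line, fp) != NULL) {
        if (parse_meminfo_line(line, "MemTotal", &kb))
            break;
    }
    fclose(fp);
    return kb;
}
```

Calling total_memory_kb() on a system without /proc/meminfo simply returns -1.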
17) What is typical size for a swap partition under a Linux system?
The preferred size for a swap partition is twice the amount of physical memory available on the system. If this is not possible, then the minimum size should be the same as the amount of memory installed.
18) What are symbolic links?
Symbolic links act similarly to shortcuts in Windows. Such links point to programs, files or directories, and give you quick access to them without having to type the entire pathname.
19) Does the Ctrl+Alt+Del key combination work on Linux?
Yes, it does. Just like Windows, you can use this key combination to perform a system restart. One difference is that you won’t be getting any confirmation message and therefore, reboot is immediate.
20) How do you refer to the parallel port where devices such as printers are connected?
Whereas under Windows you refer to the parallel port as the LPT port, under Linux you refer to it as /dev/lp . LPT1, LPT2 and LPT3 would therefore be referred to as /dev/lp0, /dev/lp1, or /dev/lp2 under Linux.
21) Are drives such as harddrive and floppy drives represented with drive letters?
No. In Linux, each drive and device has different designations. For example, floppy drives are referred to as /dev/fd0 and /dev/fd1. IDE/EIDE hard drives are referred to as /dev/hda, /dev/hdb, /dev/hdc, and so forth.
22) How do you change permissions under Linux?
Assuming you are the system administrator or the owner of a file or directory, you can grant permission using the chmod command. Use the + symbol to add a permission or the - symbol to remove one, along with any of the following letters: u (user), g (group), o (others), a (all), r (read), w (write) and x (execute). For example, the command chmod go+rw FILE1.TXT grants read and write access on the file FILE1.TXT to its group and to others.
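Behind the letters, each permission is a bit in the file's mode. As a rough sketch, the C function below models what a request such as go+rw does to those bits. The name add_permissions is hypothetical and the function only computes a new mode; it does not touch any file.

```c
#include <assert.h>
#include <stddef.h>

/* Traditional octal values of the permission bits within one class. */
enum { PERM_READ = 4, PERM_WRITE = 2, PERM_EXEC = 1 };

/*
 * Hypothetical helper modelling "chmod <who>+<perms>": returns `mode` with
 * the requested bits added. `who` uses u, g, o, a; `perms` uses r, w, x.
 * Unknown letters are ignored in this sketch.
 */
unsigned int add_permissions(unsigned int mode, const char *who,
                             const char *perms)
{
    unsigned int bits = 0;
    unsigned int result = mode;
    const char *p;

    assert(who != NULL && perms != NULL);

    for (p = perms; *p != '\0'; p++) {
        if (*p == 'r') bits |= PERM_READ;
        if (*p == 'w') bits |= PERM_WRITE;
        if (*p == 'x') bits |= PERM_EXEC;
    }

    for (p = who; *p != '\0'; p++) {
        if (*p == 'u' || *p == 'a') result |= bits << 6;  /* user   */
        if (*p == 'g' || *p == 'a') result |= bits << 3;  /* group  */
        if (*p == 'o' || *p == 'a') result |= bits;       /* others */
    }
    return result;
}
```

For a file that starts with mode 0600 (read and write for the owner only), add_permissions(0600, "go", "rw") yields 0666.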
23) In Linux, what names are assigned to the different serial ports?
Serial ports are identified as /dev/ttyS0 to /dev/ttyS7. These are the equivalent names of COM1 to COM8 in Windows.
24) How do you access partitions under Linux?
Linux assigns numbers at the end of the drive identifier. For example, if the first IDE hard drive had three primary partitions, they would be named/numbered, /dev/hda1, /dev/hda2 and /dev/hda3.
25) What are hard links?
Hard links point directly to the physical file on disk, not to the path name. This means that if you rename or move the original file, the link will not break, since the link is to the file itself, not to the path where the file is located.
26) What is the maximum length for a filename under Linux?
Any filename can have a maximum of 255 characters. This limit does not include the path name, so the entire pathname and filename could well exceed 255 characters.
27) What are filenames that are preceded by a dot?
In general, filenames that are preceded by a dot are hidden files. These files can be configuration files that hold important data or setup info. Setting these files as hidden makes them less likely to be accidentally deleted.
28) Explain virtual desktop.
This serves as an alternative to minimizing and maximizing different windows on the current desktop. Using virtual desktops, each desktop is a clean slate where you can open one or more programs. Rather than minimizing/restoring all those programs as needed, you can simply shuffle between virtual desktops with programs intact in each one.
29) How do you share a program across different virtual desktops under Linux?
To share a program across different virtual desktops, in the upper left-hand corner of a program window look for an icon that looks like a pushpin. Pressing this button will “pin” that application in place, making it appear in all virtual desktops, in the same position onscreen.
30) What does a nameless (empty) directory represent?
This empty directory name serves as the nameless base of the Linux file system. This serves as an attachment for all other directories, files, drives and devices.
31) What is the pwd command?
The pwd command is short for print working directory. Its counterpart in DOS is the cd command (typed without arguments), and it is used to display the current location in the directory tree.
32) What are daemons?
Daemons are services that provide several functions that may not be available under the base operating system. A daemon's main task is to listen for service requests and act on them. After a request has been served, the daemon waits for further requests.
33) How do you switch from one desktop environment to another, such as switching from KDE to Gnome?
Assuming you have these two environments installed, just log out from the graphical interface. Then at the Log in screen, type your login ID and password and choose which session type you wish to load. This choice will remain your default until you change it to something else.
34) What are the kinds of permissions under Linux?
There are 3 kinds of permissions under Linux:
- Read: users may read the files or list the directory
- Write: users may write to the file or add new files to the directory
- Execute: users may run the file or lookup a specific file within a directory
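These three permissions exist once each for the user, the group and others, which gives the nine characters shown by ls -l. The sketch below, written in C with a made-up function name format_permissions, shows one way to turn the low nine mode bits into that string. Special bits such as setuid are ignored here.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Hypothetical helper: writes the nine permission characters that
 * `ls -l` shows (e.g. "rw-r--r--") for the low nine bits of `mode`.
 * `out` must have room for 10 chars (9 plus the terminating nul).
 */
void format_permissions(unsigned int mode, char *out)
{
    static const char letters[] = "rwxrwxrwx";
    int i;

    assert(out != NULL);

    for (i = 0; i < 9; i++) {
        /* Bit 8 is user-read, bit 0 is others-execute. */
        unsigned int bit = 1u << (8 - i);
        out[i] = (mode & bit) ? letters[i] : '-';
    }
    out[9] = '\0';
}
```

A mode of 0644 comes out as rw-r--r-- and 0755 as rwxr-xr-x.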
35) How does case sensitivity affect the way you use commands?
When we talk about case sensitivity, commands are considered identical only if every character is encoded as is, including lowercase and uppercase letters. This means that CD, cd and Cd are three different commands. Entering a command using uppercase letters, where it should be in lowercase, will produce different outputs.
36) What are environmental variables?
Environmental variables are global settings that control the shell’s function as well as that of other Linux programs. Another common term for environmental variables is global shell variables.
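Programs read these settings through the standard C function getenv(). Here is a small sketch; the wrapper name env_or_default is an assumption made for this example and is not a library function.

```c
#include <assert.h>
#include <stdlib.h>

/*
 * Hypothetical helper: returns the value of environment variable `name`,
 * or `fallback` when the variable is not set.
 */
const char *env_or_default(const char *name, const char *fallback)
{
    const char *value;

    assert(name != NULL);

    value = getenv(name);   /* standard C; returns NULL if unset */
    return (value != NULL) ? value : fallback;
}
```

For instance, env_or_default("EDITOR", "vi") gives the user's chosen editor if the variable is set and "vi" otherwise.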
37) What are the different modes when using vi editor?
There are 3 modes under vi:
- Command mode – this is the mode where you start in
- Edit mode – this is the mode that allows you to do text editing
- Ex mode – this is the mode wherein you interact with vi with instructions to process a file
38) Is it possible to use shortcut for a long pathname?
Yes. A feature known as filename completion allows you to do this using the TAB key. For example, if you have a directory named /home/iceman/assignments, you would type: /ho[tab]/ice[tab]/assi[tab] . This, however, assumes that each prefix is unique, and that the shell you’re using supports this feature.
39) What is redirection?
Redirection is the process of sending data somewhere other than its default destination, for example sending the output of a command to a file instead of the screen. It can also be used to feed the output of one process as the input to another.
40) What is grep command?
grep is a search command that makes use of pattern-based searching. It takes options and parameters specified on the command line and applies the pattern when searching the required files or output.
41) What could possibly be the problem when a command that was issued gave a different result from the last time it was used?
One highly likely reason for getting different results from what seems to be the same command is case sensitivity. Since Linux is case sensitive, a command that was previously used might have been entered in a different case this time. For example, to list all files in the directory, you should type the command ls, and not LS. Typing LS would either result in an error message if no program by that exact name exists, or produce a different output if there is a program named LS that performs another function.
42) What are the contents in /usr/local?
It contains locally installed files. This directory matters in environments where files are stored on the network. Specifically, locally installed files go to /usr/local/bin, /usr/local/lib, etc. This directory is also used for software packages installed from source, or software not officially shipped with the distribution.
43) How do you terminate an ongoing process?
Every process in the system is identified by a unique process id, or pid. Use the kill command followed by the pid to terminate that process. To signal every process in the current process group at once, use kill 0.
44) How do you insert comments in the command line prompt?
Comments are created by typing the # symbol before the actual comment text. This tells the shell to completely ignore what follows. For example: “# This is just a comment that the shell will ignore.”
45) What is command grouping and how does it work?
You can use parentheses to group commands. For example, if you want to send the current date and time along with the contents of a file named OUTPUT to a second file named MYDATES, you can apply command grouping as follows: (date; cat OUTPUT) > MYDATES
46) How do you execute more than one command or program from a single command line entry?
You can combine several commands by separating them with a semicolon. For example, you can issue this series of commands in a single entry:
ls -l; cd ..; ls -a MYWORK
which is equivalent to 3 commands:
ls -l
cd ..
ls -a MYWORK
**Note that the commands are executed one after the other, in the order specified.
47) Write a command that will look for files with the extension “c” that contain the string “apple”.
Answer: find ./ -name "*.c" | xargs grep -i "apple"
48) Write a command that will display all .txt files, including their individual permissions.
Answer: ls -a -l *.txt
49) Write a command that will do the following:
- look for all files in the current directory and its subdirectories with the extension c,v
- strip the ,v from the result (you can use the sed command)
- use the result with a grep command to search for all occurrences of the word ORANGE in the files.

Answer: find ./ -name "*.c,v" | sed 's/,v//g' | xargs grep "ORANGE"
50) What, if anything, is wrong with each of the following commands?
a) ls -l-s
b) cat file1, file2
c) ls - s Factdir

Answers:
a) there should be a space between the 2 options: ls -l -s
b) do not use commas to separate arguments: cat file1 file2
c) there should be no space between the hyphen and the option letter: ls -s Factdir

Monday, 10 December 2012

strings-part-1

Strings as arrays, as pointers, and string.h


  1. Strings as arrays: In C, the abstract idea of a string is implemented with just an array of characters. For example, here is a string:
    char label[] = "Single";
    
    What this array looks like in memory is the following:
    ------------------------------
    | S | i | n | g | l | e | \0 |
    ------------------------------
    
    where the beginning of the array is at some location in computer memory, for example, location 1000.
    Note: Don't forget that one character is needed to store the nul character (\0), which indicates the end of the string.
    A character array can have more characters than the abstract string held in it, as below:
    char label[10] = "Single";
    
    giving an array that looks like:
    ------------------------------------------
    | S | i | n | g | l | e | \0 |   |   |   |
    ------------------------------------------
    
    (where 3 array elements are currently unused).
    Since these strings are really just arrays, we can access each character in the array using subscript notation, as in:
    printf("Third char is: %c\n", label[2]);
    
    which prints out the third character, n. A disadvantage of creating strings using the character array syntax is that you must say ahead of time how many characters the array may hold. For example, in the following array definitions, we state the number of characters (either implicitly or explicitly) to be allocated for the array.
    char label[] = "Single";  /* 7 characters */
    
    char label[10] = "Single";
    
    Thus, you must specify the maximum number of characters you will ever need to store in an array. This type of array allocation, where the size of the array is determined at compile-time, is called static allocation.
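To see the difference between the size of the array and the length of the string held in it, here is a small sketch. The helper name unused_elements is made up for this example; it relies on strlen() from string.h, which counts characters up to but not including the nul character.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical helper: given a character array of `capacity` elements
 * holding a nul-terminated string, returns how many elements are unused
 * (the nul character counts as used).
 */
size_t unused_elements(const char *buf, size_t capacity)
{
    size_t used;

    assert(buf != NULL);

    used = strlen(buf) + 1;     /* characters plus the nul */
    assert(used <= capacity);   /* the string must fit in its array */
    return capacity - used;
}
```

For char label[10] = "Single", calling unused_elements(label, sizeof label) gives 3, matching the picture above. Note that sizeof gives the array size only where the array itself is in scope.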
  2. Strings as pointers: Another way of accessing a contiguous chunk of memory, instead of with an array, is with a pointer.
    Since we are talking about strings, which are made up of characters, we'll be using pointers to characters, or rather, char *'s.
    However, pointers only hold an address; they cannot hold all the characters in a character array. This means that when we use a char * to keep track of a string, the character array containing the string must already exist (having been either statically- or dynamically-allocated).
    Below is how you might use a character pointer to keep track of a string.
    char label[] = "Single";
    char label2[10] = "Married";
    char *labelPtr;
    
    labelPtr = label;
    
    We would have something like the following in memory (e.g., supposing that the array label started at memory address 2000, etc.):
    label @2000
    ------------------------------
    | S | i | n | g | l | e | \0 |
    ------------------------------
    
    label2 @3000
    ------------------------------------------
    | M | a | r | r | i | e | d | \0 |   |   |
    ------------------------------------------
    
    labelPtr @4000
    --------
    | 2000 |
    --------
    

    Note: Since we assigned the pointer the address of an array of characters, the pointer must be a character pointer--the types must match. Also, to assign the address of an array to a pointer, we do not use the address-of (&) operator since the name of an array (like label) behaves like the address of that array in this context. That's also why you don't use an ampersand when you pass a string variable to scanf(), e.g.,
    int id;
    char name[30];
    
    scanf("%d%29s", &id, name);  /* 29 leaves room for the nul character */
    

    Now, we can use labelPtr just like the array name label. So, we could access the third character in the string with:
    printf("Third char is: %c\n", labelPtr[2]);
    
    It's important to remember that the only reason the pointer labelPtr allows us to access the label array is because we made labelPtr point to it. Suppose, we do the following:
    labelPtr = label2;
    
    Now, no longer does the pointer labelPtr refer to label, but now to label2 as follows:
    label2 @3000
    ------------------------------------------
    | M | a | r | r | i | e | d | \0 |   |   |
    ------------------------------------------
    
    labelPtr @4000
    --------
    | 3000 |
    --------
    
    So, now when we subscript using labelPtr, we are referring to characters in label2. The following:
    printf("Third char is: %c\n", labelPtr[2]);
    
    prints out r, the third character in the label2 array.
  3. Passing strings: Just as we can pass other kinds of arrays to functions, we can do so with strings.
    Below is the definition of a function that prints a label and a call to that function:
    void PrintLabel(char the_label[])
    {
        printf("Label: %s\n", the_label);
    }
    
    ...
    
    int main(void)
    {
      char label[] = "Single";
      ...
      PrintLabel(label);
      ...
    }
    
    Since label is a character array, and the function PrintLabel() expects a character array, the above makes sense.
    However, if we have a pointer to the character array label, as in:
    char *labelPtr = label;
    
    then we can also pass the pointer to the function, as in:
    PrintLabel(labelPtr);
    
    The results are the same. Why??
    Answer: When we declare an array as the parameter to a function, we really just get a pointer. Plus, arrays are never copied when passed to a function; what is passed is a pointer to the first element, so the function works on the caller's array.
    So, PrintLabel() could have been written in two ways:
    void PrintLabel(char the_label[])
    {
        printf("Label: %s\n", the_label);
    }
    
    OR
    
    void PrintLabel(char *the_label)
    {
        printf("Label: %s\n", the_label);
    }
    
    There is no difference because in both cases the parameter is really a pointer.
    Note: In C, there is a difference in the use of brackets ([]) when declaring a global, static or local array variable versus using this array notation for the parameter of a function. With a parameter to a function, you always get a pointer even if you use array notation. This is true for all types of arrays.

  4. Dynamically-allocated string: Since sometimes you do not know how big a string is until run-time, you may have to resort to dynamic allocation.
    The following is an example of dynamically-allocating space for a string at run-time:
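    #include <stdlib.h>  /* for malloc/free */
    
    ...
    
    void SomeFunc(int length)
    {
      char *str;
    
      /* Don't forget extra char for nul character. */
    
      str = (char *)malloc(sizeof(char) * (length+1));
      if (str == NULL)
        return;  /* allocation failed; nothing to use or free */
    
      ...
    
      free(str);  /* release the memory once the string is no longer needed */
    }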
    
    
    Basically, we've just asked malloc() (the allocation function) to give us back enough space for a string of the desired size. malloc() takes the number of bytes needed as its parameter. Above, we need the size of one character times the number of characters we want (don't forget the extra +1 for the nul character).
    We keep track of the dynamically-allocated array with a pointer and can use that pointer as we used pointers to statically-allocated arrays above (i.e., how we access individual characters, pass the string to a function, etc. are the same).
    Now, how do we get a string value into this newly-allocated array?
  5. string.h library: Recall that strings are stored as arrays (allocated either statically or dynamically). Furthermore, the only way to change the contents of an array in C is to make changes to each element in the array.
    In other words, we can't do the following:
    label = "new value";   /* No! */
    label = anotherLabel;  /* Wrong! */
    
    (where anotherLabel is a string variable).

    Aside: We could do that if label were a character pointer (instead of an array); however, what would be happening is the pointer would be taking on the address of a different string, which is not the same as changing the contents of an array.
    It would be annoying to have to do something like:
    char name[10];
    
    name[0] = 'R';
    name[1] = 'o';
    name[2] = 'b';
    name[3] = '\0';
    
    or to write loops all the time to do common string operations... Plus, we'd probably forget the nul character half the time.
    The C library string.h has several common functions for dealing with strings. The following four are the most useful ones that we'll discuss:
    • strlen(str) Returns the number of characters in the string, not including the nul character.
    • strcmp(str1, str2) This function takes two strings and compares them. If the strings are equal, it returns 0. If the first is greater than the 2nd, then it returns some value greater than 0. If the first is less than the 2nd, then it returns some value less than 0.
      You might use this function as in:
      #include <string.h>
      
      char str1[] = "garden";
      
      if (strcmp(str1, "apple") == 0)
        printf("Equal\n");
      else 
        printf("Not equal\n");
      
      OR
      
      if (strcmp(str1, "eden") > 0) 
        printf("'%s' comes after 'eden'\n", str1);
      
      The ordering for strings is lexical order based on the ASCII value of characters. Remember that the ASCII value of 'A' and 'a' (i.e., upper/lowercase) are not the same.
      An easy way to remember how to use strcmp() to compare 2 strings (let's say a and b) is to use the following mnemonics:
      Want...      Use...
      a == b       strcmp(a, b) == 0
      a < b        strcmp(a, b) < 0
      a >= b       strcmp(a, b) >= 0
      ...          ...
    • strcpy(dest, source) Copies the contents of source into dest, as in:
      #include <string.h>
      
      char str1[10] = "initvalue";
      
      strcpy(str1, "second");
      
      Now, the string str1 contains the following:
      -------------------------------------------
      | s | e | c | o | n | d | \0 | u | e | \0 |
      -------------------------------------------
      
      and the word "initvalue" has been overwritten. Note that it is the first nul character (\0) that determines the end of the string.
      When using strcpy(), make sure the destination is big enough to hold the new string.

      Aside: An easy way to remember that the destination comes first is that the order is the same as for assignment, e.g.:
      dest = source
      
      Also, strcpy() returns the destination string, but that return value is often ignored.

    • strcat(dest, source) Copies the contents of source onto the end of dest, as in:
      #include <string.h>
      
      char str2[10] = "first";
      
      strcat(str2, " one");
      
      Now, the string str2 contains the following:
      ------------------------------------------
      | f | i | r | s | t |   | o | n | e | \0 |
      ------------------------------------------
      
      When using strcat(), make sure the destination is big enough to hold the extra characters.

      Aside: Function strcat() also returns the destination string, but that return value is often ignored.
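
Putting dynamic allocation together with strlen(), strcpy() and strcat(), the following sketch builds a new string big enough to hold two others. The function name join_strings is our own invention for this example, not part of string.h.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/*
 * Hypothetical helper: returns a newly allocated string holding `first`
 * followed by `second`, or NULL if malloc() fails. The caller must free()
 * the result.
 */
char *join_strings(const char *first, const char *second)
{
    char *result;

    assert(first != NULL && second != NULL);

    /* Both lengths plus one extra char for the nul character. */
    result = malloc(strlen(first) + strlen(second) + 1);
    if (result == NULL)
        return NULL;

    strcpy(result, first);      /* result now holds a copy of first */
    strcat(result, second);     /* append second onto the end */
    return result;
}
```

Because the size is computed from the two lengths at run-time, the destination is always big enough, which is the condition strcpy() and strcat() require.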

Segmentation Fault

There are times when you write a piece of code, small or big, and when you execute it you get a very short and precise output: 'Segmentation fault'. In a small piece of code it's still easy to find the reason, but as the code grows it becomes very difficult to debug. In this article, I provide some example scenarios that demonstrate why a segmentation fault can occur.


Meaning of Segmentation Fault



Before jumping on to the actual scenarios, let's quickly discuss what a segmentation fault means.

A segmentation fault occurs mainly when our code tries to access a memory location that it is not supposed to access.

For example :
  1. Working on a dangling pointer.
  2. Writing past the allocated area on the heap.
  3. Operating on an array without boundary checks.
  4. Freeing the same memory twice.
  5. Working on the returned address of a local variable.
  6. Running out of memory (stack or heap).

Examples of Segmentation Fault in C



1) Working on a dangling pointer.

Well, before discussing this scenario, let's understand what a dangling pointer is. A pointer that holds the address of memory which has already been freed is known as a dangling pointer. You cannot tell whether a given pointer is dangling just by looking at its value. When a dangling pointer is used, a segmentation fault is often observed.

Now, let's look at some code to understand it:

Code:
 #include<stdio.h> 
 #include<stdlib.h> 
  
 int main(void) 
 { 
     char *p = malloc(3); 
  
     *p = 'a'; 
     *(p+1) = 'b'; 
     *(p+2) = 'c'; 
  
     free(p); 
  
     *p = 'a'; 
  
     return 0; 
 }
In the code above, we malloc'd 3 bytes on the heap and stored the address of the first byte in the pointer 'p'. Next we initialized these three bytes, freed the memory, and then tried to use it again. This is not permitted: once memory is freed, our program may no longer use it, and doing so is undefined behavior. If you run the code above, it may not give a segmentation fault immediately, because free() returns the block to the heap allocator, which usually keeps it in its own pool rather than handing it back to the kernel right away. If the allocator does return the pages to the kernel, the same access triggers a segmentation fault; if it reuses the block, the write silently corrupts other data.
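
One common defensive habit is to reset a pointer to NULL as soon as its block is freed, so that a later mistake is easier to spot. The sketch below assumes a hypothetical helper named release; it only protects the one variable passed to it, and any other copies of the pointer still dangle.

```c
#include <stdlib.h>

/* Hypothetical helper: free the block *pp points to and reset *pp to NULL,
   so a later accidental use or second release is easier to catch. */
void release(char **pp)
{
    if (pp != NULL) {
        free(*pp);      /* free(NULL) is defined to do nothing */
        *pp = NULL;     /* the caller's pointer no longer dangles */
    }
}
```

Calling release(&p) twice in a row is harmless, because the second call ends up passing NULL to free().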


Let's take another example:

Code:
 #include<stdio.h> 
 #include<stdlib.h> 
 #include<string.h> 
  
 int main(void) 
 { 
     char *p ; 
  
     strcat(p, "abc"); 
     printf("\n %s \n", p); 
  
     return 0; 
 }
In the code above, we never allocate any memory for the pointer 'p', so it holds a garbage address (strictly speaking this is an uninitialized, or wild, pointer rather than a dangling one, but the effect is the same). We pass this garbage address to strcat(), so when strcat() reads or writes through 'p', a segmentation fault is the likely result.

Yet another example could be:

Code:
 #include<stdio.h> 
 #include<stdlib.h> 
 #include<string.h> 
  
  
 void func(char ** argv) 
 { 
     char arr[2]; 
     strcpy(arr, argv[1]); 
  
     return; 
 } 
  
 int main(int argc, char *argv[]) 
 { 
     func(argv); 
     return 0; 
 }
In the above code, we access the command-line argument at index 1, argv[1], in the function func() without checking whether the user provided it. If the user did not provide it, argv[1] is a null pointer, and strcpy() reading through it will almost certainly cause a segmentation fault.

2) Writing past the allocated area on heap.

There are times when the program logic inadvertently writes past the allocated area on the heap. This may happen while performing operations in a loop, when array bounds are not checked, and so on. This type of situation can also result in a segmentation fault. For example, look at the following code:

Code:
 #include<stdio.h> 
 #include<stdlib.h> 
  
 int main(void) 
 { 
     char *p = malloc(3); 
  
     int i = 0; 
  
     for(i=0;i<0Xffffffff;i++) 
     { 
         p[i] = 'a'; 
     } 
  
     printf("\n %s \n", p); 
  
     return 0; 
 }
In the example above, we allocate only 3 bytes for the pointer 'p' but write far past them in a loop. The writes eventually reach memory that does not belong to our process, and the result is a segmentation fault.
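
For contrast, here is a minimal sketch of the same kind of loop with its bound tied to the allocated size. The helper name make_filled is an assumption made for this illustration only.

```c
#include <stdlib.h>

/* Hypothetical helper: return a newly allocated string of n copies of c,
   or NULL if the allocation fails. The caller frees the result. */
char *make_filled(size_t n, char c)
{
    char *p;
    size_t i;

    if (n == (size_t)-1)
        return NULL;            /* n + 1 would wrap around to 0 */

    p = malloc(n + 1);          /* one extra byte for the terminator */
    if (p == NULL)
        return NULL;

    for (i = 0; i < n; i++)     /* the bound is the allocated size */
        p[i] = c;
    p[n] = '\0';                /* makes the block safe to print with %s */
    return p;
}
```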

3) Operating on an array without boundary checks.

In this scenario, the logic is flawed in such a way that an array is written beyond its bounds. In some cases (or in the case of exploits), this buffer overflow overwrites the return address (i.e. the address to return to after executing the present function). Returning to a garbage address and executing whatever is stored there may very well cause a segmentation fault.

Let's look at the following code:

Code:
 #include<stdio.h> 
 #include<stdlib.h> 
 #include<string.h> 
  
  
 void func(char ** argv) 
 { 
     char arr[2]; 
     strcpy(arr, argv[1]); 
  
     return; 
 } 
  
 int main(int argc, char *argv[]) 
 { 
     func(argv); 
     return 0; 
 }
In the code above, we pass the command-line argument array to the function func(). Inside func(), we copy the command-line argument at index 1 into the array arr. The problem is the function we use to copy: strcpy() takes no account of the capacity of arr and will not detect or prevent a buffer overflow. The array arr holds only 2 bytes, so any argument longer than one character already overflows it. If we pass a very long string, we are likely to overwrite the return address kept in the stack frame of this function and cause a segmentation fault.

Here is the output of the code above(I tried to run it twice with different command line args) :

Code:
 ~/practice $ ./segfault abc 
  ~/practice $
Code:
~/practice $ ./segfault abcjflcnmscn,snlkewfdebddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddd 
 Segmentation fault
As can be seen, in the first attempt the program appeared to work, even though "abc" already overflows the 2-byte array. In the second attempt, the large command-line argument probably overwrote the return address stored on the stack of func(). When control returned to this overwritten address, the jump landed in memory that, unless you are very lucky, does not belong to our process, and the result was a segmentation fault.
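
Both problems with func() (the missing argument check and the unbounded copy) can be avoided by checking before copying. This is only a sketch under stated assumptions: copy_first_arg is a hypothetical helper, and the caller is assumed to pass the real size of the destination buffer.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: copy the argument at index 1 into out.
   Returns 0 on success, -1 if the argument is missing or does not fit
   (including its terminating '\0') in cap bytes. */
int copy_first_arg(int argc, char *argv[], char *out, size_t cap)
{
    size_t len;

    if (argc < 2 || argv[1] == NULL)
        return -1;              /* no argument was supplied */

    len = strlen(argv[1]);
    if (len >= cap)
        return -1;              /* too long for the destination */

    memcpy(out, argv[1], len + 1);  /* copies the '\0' as well */
    return 0;
}
```

A caller would then write something like copy_first_arg(argc, argv, arr, sizeof arr) and report an error when it returns -1, instead of crashing.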

4) Freeing a memory twice.

This one is specific to the function free(), but it is a very common cause of crashes. The specification of free() says that if it is called again on an already freed pointer, the behavior is undefined. In practice we often see a segmentation fault, or the allocator detects the double free and aborts the program.

Let's quickly see the code:

Code:
 #include<stdio.h> 
 #include<stdlib.h> 
 #include<string.h> 
  
 int main(int argc, char *argv[]) 
 { 
     char *p = malloc(8); 
  
  
     free(p); 
     free(p); 
     return 0; 
 }
As seen in the code above, we allocated memory once but freed it twice, back to back. This is wrong practice and should be avoided.

5) Working on Returned address of a local variable

In this scenario, the address of a local variable is returned to the calling function and used there. For example, consider the following code:

Code:
 #include<stdio.h> 
 #include<stdlib.h> 
 #include<string.h> 
  
  
 char* func() 
 { 
     char c = 'a'; 
     return &c; 
 } 
  
 int main(int argc, char *argv[]) 
 { 
     char *ptr = func(); 
     char arr[10]; 
     memset(arr,'0', sizeof(arr)); 
     arr[0] = *ptr; 
     return 0; 
 }
The code above does exactly that. The stack frame of func() is unwound when func() returns, so the address of 'c' no longer refers to a live object. Using any address from a function's stack frame after that function has returned is undefined behavior and may cause a segmentation fault.
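
Two usual ways around this are sketched below, with function names (get_char, new_char) invented for the illustration: return the value itself, or return heap storage that outlives the call.

```c
#include <stdlib.h>

/* Option 1: return the value itself; nothing points into a dead frame. */
char get_char(void)
{
    char c = 'a';
    return c;
}

/* Option 2: return heap storage, which outlives the function call.
   The caller must free() the result; NULL means allocation failed. */
char *new_char(void)
{
    char *p = malloc(sizeof *p);
    if (p != NULL)
        *p = 'a';
    return p;
}
```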

6) Running out of memory.

It may happen that we run out of memory. This can be stack memory or heap memory. Let's consider the following two examples to understand this:

Code:
 #include<stdio.h> 
 #include<stdlib.h> 
 #include<string.h> 
  
 int main() 
 { 
     main(); 
 }
If you execute the code above, main() calls itself recursively without end. Every call pushes a new stack frame, and none of these frames is ever unwound because we never stop calling main(). Eventually all of the process's stack memory is used up and we get a segmentation fault.

Consider an example of heap memory:
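
Recursion itself is fine as long as it stops. The sketch below is a made-up example (sum_to and its max_depth parameter are assumptions for illustration) showing a base case plus an explicit limit on how deep the calls may go.

```c
/* Hypothetical example: recursion with a base case and a depth limit.
   Returns the sum 1 + 2 + ... + n, or -1 if n exceeds max_depth. */
long sum_to(unsigned n, unsigned max_depth)
{
    if (n > max_depth)
        return -1;              /* refuse to recurse too deeply */
    if (n == 0)
        return 0;               /* base case: the recursion stops here */
    return (long)n + sum_to(n - 1, max_depth);
}
```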

Code:
 #include<stdio.h> 
 #include<stdlib.h> 
 #include<string.h> 
  
  
 int main() 
 { 
    unsigned int i = 0; 
  
    for(i=0; i< 0xFFFFFFFF;i++) 
    { 
        char *p = malloc(100);   /* never freed */ 
        p[0] = 'a';              /* unchecked: crashes once malloc() returns NULL */ 
    } 
  
    return 0; 
 }
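The code above never frees what it allocates, so the heap is eventually exhausted. At that point malloc() returns NULL, and a program that writes through the pointer without checking it gets a segmentation fault. (On some systems the kernel's out-of-memory handling may kill the process first.)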
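
The fix is to check every result of malloc() before using it. Here is a minimal sketch, assuming a hypothetical helper alloc_blocks that stops at the first failed allocation and frees everything again before returning.

```c
#include <stdlib.h>

/* Hypothetical helper: try to allocate up to max_blocks blocks of
   block_size bytes, stopping at the first failure. Every block is freed
   again before returning. Returns how many allocations succeeded. */
size_t alloc_blocks(size_t max_blocks, size_t block_size)
{
    char **blocks;
    size_t count = 0;
    size_t i;

    if (block_size == 0)
        return 0;               /* nothing useful to allocate */

    blocks = calloc(max_blocks, sizeof *blocks);
    if (blocks == NULL)
        return 0;

    for (i = 0; i < max_blocks; i++) {
        char *p = malloc(block_size);
        if (p == NULL)
            break;              /* out of memory: stop, do not dereference */
        p[0] = 'a';             /* safe: p was checked first */
        blocks[count++] = p;
    }

    for (i = 0; i < count; i++)
        free(blocks[i]);
    free(blocks);
    return count;
}
```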

Conclusion



To conclude, in this article we studied several ways in which a program can go wrong and end with a segmentation fault. It should be a good resource for anyone who is getting this error but cannot work out the reason.

Stay tuned for more!!!


Sunday, 9 December 2012


Process Memory Concepts

1.One of the most basic resources a process has available to it is memory. There are a lot of different ways systems organize memory, but in a typical one, each process has one linear virtual address space, with addresses running from zero to some huge maximum. It need not be contiguous; i.e., not all of these addresses actually can be used to store data.

2.The virtual memory is divided into pages (4 kilobytes is typical). Backing each page of virtual memory is a page of real memory (called a frame) or some secondary storage, usually disk space.

3.The disk space might be swap space or just some ordinary disk file. Actually, a page of all zeroes sometimes has nothing at all backing it; there’s just a flag saying it is all zeroes. The same frame of real memory or backing store can back multiple virtual pages belonging to multiple processes.

4.This is normally the case, for example, with virtual memory occupied by GNU C Library code. The same real memory frame containing the printf function backs a virtual memory page in each of the existing processes that has a printf call in its program.

5.In order for a program to access any part of a virtual page, the page must at that moment be backed by (“connected to”) a real frame.

-->But because there is usually a lot more virtual memory than real memory, the pages must move back and forth between real memory and backing store regularly, coming into real memory when a process needs to access them and then retreating to backing store when not needed anymore. This movement is called paging.

6.When a program attempts to access a page which is not at that moment backed by real memory, this is known as a page fault.

-->When a page fault occurs, the kernel suspends the process, places the page into a real page frame (this is called “paging in” or “faulting in”), then resumes the process so that from the process’ point of view, the page was in real memory all along.

7. In fact, to the process, all pages always seem to be in real memory. Except for one thing: the elapsed execution time of an instruction that would normally be a few nanoseconds is suddenly much, much, longer (because the kernel normally has to do I/O to complete the page-in). For programs sensitive to that, the page-locking functions (such as mlock) can control it.

-->Within each virtual address space, a process has to keep track of what is at which addresses, and that process is called memory allocation.

8.Allocation usually brings to mind meting out scarce resources, but in the case of virtual memory, that’s not a major goal, because there is generally much more of it than anyone needs. Memory allocation within a process is mainly just a matter of making sure that the same byte of memory isn’t used to store two different things.

9.Processes allocate memory in two major ways: by exec and programmatically. Actually, forking is a third way, but it’s not very interesting. Exec is the operation of creating a virtual address space for a process, loading its basic program into it, and executing the program.

-->It is done by the “exec” family of functions (e.g. execl). The operation takes a program file (an executable), allocates space to load all the data in the executable, loads it, and transfers control to it. That data is most notably the instructions of the program (the text), but also literals and constants in the program and even some variables: C variables with the static storage class.

-->Once that program begins to execute, it uses programmatic allocation to gain additional memory. In a C program with the GNU C Library, there are two kinds of programmatic allocation: automatic and dynamic.

10.Memory-mapped I/O is another form of dynamic virtual memory allocation. Mapping memory to a file means declaring that the contents of certain range of a process’ addresses shall be identical to the contents of a specified regular file.

-->The system makes the virtual memory initially contain the contents of the file, and if you modify the memory, the system writes the same modification to the file. Note that due to the magic of virtual memory and page faults, there is no reason for the system to do I/O to read the file, or allocate real memory for its contents, until the program accesses the virtual memory.

11.Just as it programmatically allocates memory, the program can programmatically deallocate (free) it. You can’t free the memory that was allocated by exec. When the program exits or execs, you might say that all its memory gets freed, but since in both cases the address space ceases to exist, the point is really moot.

12.A process’ virtual address space is divided into segments. A segment is a contiguous range of virtual addresses. Three important segments are:
• The text segment contains a program’s instructions and literals and static constants. It is allocated by exec and stays the same size for the life of the virtual address space.
• The data segment is working storage for the program. It can be preallocated and preloaded by exec and the process can extend or shrink it by calling the functions described in [Resizing the Data Segment]. Its lower end is fixed.
• The stack segment contains a program stack. It grows as the stack grows, but doesn’t shrink when the stack shrinks.
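
The page size mentioned earlier (4 kilobytes is typical) does not have to be guessed; on POSIX systems it can be queried at run time with sysconf. The sketch below is only an illustration: page_size and page_start are hypothetical helper names, and the rounding assumes the page size is a power of two.

```c
#include <stddef.h>
#include <stdint.h>
#include <unistd.h>

/* Return the system page size in bytes, or 0 if it cannot be determined. */
size_t page_size(void)
{
    long n = sysconf(_SC_PAGESIZE);
    return n > 0 ? (size_t)n : 0;
}

/* Round an address down to the start of the page that contains it.
   Assumes the page size is a power of two, which holds on common systems. */
uintptr_t page_start(const void *addr)
{
    uintptr_t size = (uintptr_t)page_size();
    if (size == 0)
        return (uintptr_t)addr;
    return (uintptr_t)addr & ~(size - 1);
}
```

Two addresses with the same page_start() value live in the same virtual page, so they are paged in and out together.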
------->Allocating Storage For Program Data
This section covers how ordinary programs manage storage for their data, including the famous malloc function and some fancier facilities specific to the GNU C Library and GNU Compiler.

-->Memory Allocation in C Programs
The C language supports two kinds of memory allocation through the variables in C programs:
• Static allocation is what happens when you declare a static or global variable. Each static or global variable defines one block of space, of a fixed size. The space is allocated once, when your program is started (part of the exec operation), and is never freed.
• Automatic allocation happens when you declare an automatic variable, such as a function argument or a local variable. The space for an automatic variable is allocated when the compound statement containing the declaration is entered, and is freed when that compound statement is exited.
In GNU C, the size of the automatic storage can be an expression that varies (C99 standardized this as variable-length arrays). In C89 implementations, it must be a constant.
A third important kind of memory allocation, dynamic allocation, is not supported by C variables but is available via GNU C Library functions.
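
The difference between the first two kinds is easy to see in a few lines. This is a made-up illustration (next_id and sum_pair are not from any library): the static variable survives between calls, the automatic one does not.

```c
/* Static allocation: calls exists for the whole run of the program and
   keeps its value between calls. */
int next_id(void)
{
    static int calls = 0;
    calls++;
    return calls;
}

/* Automatic allocation: total is created on entry and gone on return,
   so every call starts again from 0. */
int sum_pair(int a, int b)
{
    int total = 0;
    total += a;
    total += b;
    return total;
}
```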
-->Dynamic Memory Allocation
Dynamic memory allocation is a technique in which programs determine as they are running where to store some information. You need dynamic allocation when the amount of memory you need, or how long you continue to need it, depends on factors that are not known before the program runs.

-->For example, you may need a block to store a line read from an input file; since there is no limit to how long a line can be, you must allocate the memory dynamically and make it dynamically larger as you read more of the line.
-->Or you may need a block for each record or each definition in the input data; since you can’t know in advance how many there will be, you must allocate a new block for each record or definition as you read it.


--->When you use dynamic allocation, the allocation of a block of memory is an action that the program requests explicitly. You call a function or macro when you want to allocate space, and specify the size with an argument. If you want to free the space, you do so by calling another function or macro. You can do these things whenever you want, as often as you want.

--->Dynamic allocation is not supported by C variables; there is no storage class “dynamic”, and there can never be a C variable whose value is stored in dynamically allocated space.

--->The only way to get dynamically allocated memory is via a system call (which is generally via a GNU C Library function call), and the only way to refer to dynamically allocated space is through a pointer.

-->Because it is less convenient, and because the actual process of dynamic allocation requires more computation time, programmers generally use dynamic allocation only when neither static nor automatic allocation will serve.
For example, if you want to allocate dynamically some space to hold a struct foobar, you cannot declare a variable of type struct foobar whose contents are the dynamically allocated space.

-->But you can declare a variable of pointer type struct foobar * and assign it the address of the space. Then you can use the operators ‘*’ and ‘->’ on this pointer variable to refer to the contents of the space:
{
struct foobar *ptr
= (struct foobar *) malloc (sizeof (struct foobar));
ptr->name = x;
ptr->next = current_foobar;
current_foobar = ptr;
}
--->Unconstrained Allocation
The most general dynamic allocation facility is malloc. It allows you to allocate blocks of memory of any size at any time, make them bigger or smaller at any time, and free the blocks individually at any time (or never).
-->Basic Memory Allocation
To allocate a block of memory, call malloc. The prototype for this function is in ‘stdlib.h’.
void * malloc (size_t size)
This function returns a pointer to a newly allocated block size bytes long, or a null pointer if the block could not be allocated.
The contents of the block are undefined; you must initialize it yourself (or use calloc instead; see Allocating Cleared Space). Normally you would cast the value as a pointer to the kind of object that you want to store in the block. Here we show an example of doing so, and of initializing the space with zeros using the library function memset:
struct foo *ptr;
...
ptr = (struct foo *) malloc (sizeof (struct foo));
if (ptr == 0) abort ();
memset (ptr, 0, sizeof (struct foo));
You can store the result of malloc into any pointer variable without a cast, because ISO C automatically converts the type void * to another type of pointer when necessary. But the cast is necessary in contexts other than assignment operators or if you might want your code to run in traditional C.
Remember that when allocating space for a string, the argument to malloc must be one plus the length of the string. This is because a string is terminated with a null character that doesn’t count in the “length” of the string but does need space. For example:
char *ptr;
...
ptr = (char *) malloc (length + 1);

---> Examples of malloc
If no more space is available, malloc returns a null pointer. You should check the value of every call to malloc. It is useful to write a subroutine that calls malloc and reports an error if the value is a null pointer, returning only if the value is nonzero. This function is conventionally called xmalloc. Here it is:
void *
xmalloc (size_t size)
{
register void *value = malloc (size);
if (value == 0)
fatal ("virtual memory exhausted");
return value;
}
Here is a real example of using malloc (by way of xmalloc). The function savestring will copy a sequence of characters into a newly allocated null-terminated string:
char *
savestring (const char *ptr, size_t len)
{
register char *value = (char *) xmalloc (len + 1);
value[len] = '\0';
return (char *) memcpy (value, ptr, len);
}
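
The two listings above call fatal, which the text never defines. The sketch below puts them together into something that compiles, with fatal written as an assumed stand-in that prints the message and exits; the original routine may well behave differently.

```c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Assumed stand-in for the fatal() routine, which the text does not define:
   print the message and terminate the program. */
static void fatal(const char *message)
{
    fprintf(stderr, "%s\n", message);
    exit(EXIT_FAILURE);
}

/* Like malloc, but never returns a null pointer. */
void *xmalloc(size_t size)
{
    void *value = malloc(size);
    if (value == NULL)
        fatal("virtual memory exhausted");
    return value;
}

/* Copy the first len characters of ptr into a new null-terminated string. */
char *savestring(const char *ptr, size_t len)
{
    char *value = xmalloc(len + 1);
    value[len] = '\0';
    return memcpy(value, ptr, len);
}
```

For example, savestring("hello world", 5) returns a freshly allocated copy of "hello", which the caller releases with free.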


--->The block that malloc gives you is guaranteed to be aligned so that it can hold any type of data. On GNU systems, the address is always a multiple of eight, or a multiple of 16 on 64-bit systems.

-->Only rarely is any higher boundary (such as a page boundary) necessary; for those cases, use memalign, posix_memalign or valloc.

-->Note that the memory located after the end of the block is likely to be in use for something else; perhaps a block already allocated by another call to malloc.

--> If you attempt to treat the block as longer than you asked for it to be, you are liable to destroy the data that malloc uses to keep track of its blocks, or you may destroy the contents of another block. If you have already allocated a block and discover you want it to be bigger, use realloc.
--->Freeing Memory Allocated with malloc
When you no longer need a block that you got with malloc, use the function free to make the block available to be allocated again. The prototype for this function is in ‘stdlib.h’.
--->void free (void *ptr)
The free function deallocates the block of memory pointed at by ptr.
void cfree (void *ptr)
This function does the same thing as free. It’s provided for backward compatibility with SunOS; you should use free instead.

Freeing a block alters the contents of the block. Do not expect to find any data (such as a pointer to the next block in a chain of blocks) in the block after freeing it. Copy whatever you need out of the block before freeing it! Here is an example of the proper way to free all the blocks in a chain, and the strings that they point to:
struct chain
{
struct chain *next;
char *name;
};
void
free_chain (struct chain *chain)
{
while (chain != 0)
{
struct chain *next = chain->next;
free (chain->name);
free (chain);
chain = next;
}
}
Occasionally, free can actually return memory to the operating system and make the process smaller. Usually, all it can do is allow a later call to malloc to reuse the space. In the meantime, the space remains in your program as part of a free-list used internally by malloc.

---->There is no point in freeing blocks at the end of a program, because all of the program’s space is given back to the system when the process terminates.

-->Changing the Size of a Block
Often you do not know for certain how big a block you will ultimately need at the time you must begin to use the block. For example, the block might be a buffer that you use to hold a line being read from a file; no matter how long you make the buffer initially, you may encounter a line that is longer.
You can make the block longer by calling realloc. This function is declared in
‘stdlib.h’.
-->void * realloc (void *ptr, size_t newsize)
The realloc function changes the size of the block whose address is ptr to be newsize. Since the space after the end of the block may be in use, realloc may find it necessary to copy the block to a new address where more free space is available.

--->The value of realloc is the new address of the block. If the block needs to be moved, realloc copies the old contents. If you pass a null pointer for ptr, realloc behaves just like ‘malloc (newsize)’.

-->This can be convenient, but beware that older implementations (before ISO C) may not support this behavior, and will probably crash when realloc is passed a null pointer.

--->Like malloc, realloc may return a null pointer if no memory space is available to make the block bigger. When this happens, the original block is untouched; it has not been modified or relocated.

---> In most cases it makes no difference what happens to the original block when realloc fails, because the application program cannot continue when it is out of memory, and the only thing to do is to give a fatal error message. Often it is convenient to write and use a subroutine, conventionally called xrealloc, that takes care of the error message as xmalloc does for malloc:
void *
xrealloc (void *ptr, size_t size)
{
register void *value = realloc (ptr, size);
if (value == 0)
fatal ("Virtual memory exhausted");
return value;
}
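
When a program does want to carry on after realloc fails, the usual pattern is to store the result in a temporary pointer first, so the original block is not lost. Here is a minimal sketch of a growable buffer built that way; struct buffer and buffer_append are hypothetical names for this illustration, and the doubling strategy is just one reasonable choice.

```c
#include <stdlib.h>

/* Hypothetical growable byte buffer used only for this illustration. */
struct buffer {
    char *data;     /* NULL until the first append */
    size_t len;     /* bytes in use */
    size_t cap;     /* bytes allocated */
};

/* Append one byte, doubling the capacity when the buffer is full.
   Returns 0 on success, -1 if realloc fails (the buffer stays valid). */
int buffer_append(struct buffer *b, char c)
{
    if (b->len == b->cap) {
        size_t new_cap = b->cap ? b->cap * 2 : 16;
        char *tmp = realloc(b->data, new_cap);  /* realloc(NULL, n) acts like malloc(n) */
        if (tmp == NULL)
            return -1;          /* b->data is untouched and still owned by b */
        b->data = tmp;
        b->cap = new_cap;
    }
    b->data[b->len++] = c;
    return 0;
}
```

Writing b->data = realloc(b->data, new_cap) directly would overwrite the only pointer to the old block with NULL on failure, leaking it.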
You can also use realloc to make a block smaller. The reason you would do this is to avoid tying up a lot of memory space when only a little is needed.

-->In several allocation implementations, making a block smaller sometimes necessitates copying it, so it can fail if no other space is available.

-->If the new size you specify is the same as the old size, realloc is guaranteed to change nothing and return the same address that you gave.
-->Allocating Cleared Space
The function calloc allocates memory and clears it to zero. It is declared in ‘stdlib.h’.
void * calloc (size_t count, size_t eltsize)
This function allocates a block long enough to contain a vector of count elements, each of size eltsize. Its contents are cleared to zero before calloc returns.
You could define calloc as follows:
void *
calloc (size_t count, size_t eltsize)
{
size_t size = count * eltsize;
void *value = malloc (size);
if (value != 0)
memset (value, 0, size);
return value;
}
But in general, it is not guaranteed that calloc calls malloc internally. Therefore, if an application provides its own malloc/realloc/free outside the C library, it should always define calloc, too.
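
One thing the simple definition above does not handle is the multiplication count * eltsize wrapping around for very large arguments. The following variant is a sketch only; checked_calloc is an invented name, and it simply refuses the request when the product would not fit in size_t.

```c
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical variant of the definition above (the name is an assumption):
   it returns NULL when count * eltsize would overflow size_t. */
void *checked_calloc(size_t count, size_t eltsize)
{
    size_t size;
    void *value;

    if (eltsize != 0 && count > SIZE_MAX / eltsize)
        return NULL;            /* the product does not fit in size_t */

    size = count * eltsize;
    value = malloc(size);
    if (value != NULL)
        memset(value, 0, size);
    return value;
}
```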
-->Efficiency Considerations for malloc
As opposed to other versions, the malloc in the GNU C Library does not round up block sizes to powers of two, neither for large nor for small sizes. Neighboring chunks can be coalesced on a free no matter what their size is.

--->This makes the implementation suitable for all kinds of allocation patterns without generally incurring high memory waste through fragmentation. Very large blocks (much larger than a page) are allocated with mmap (anonymous or via /dev/zero) by this implementation.

--->This has the great advantage that these chunks are returned to the system immediately when they are freed. Therefore, it cannot happen that a large chunk becomes “locked” in between smaller ones and even after calling free wastes memory. The size threshold for mmap to be used can be adjusted with mallopt. The use of mmap can also be disabled completely.

---> Allocating Aligned Memory Blocks
The address of a block returned by malloc or realloc in GNU systems is always a multiple of eight (or sixteen on 64-bit systems). If you need a block whose address is a multiple of a higher power of two than that, use memalign, posix_memalign, or valloc. memalign is declared in ‘malloc.h’ and posix_memalign is declared in ‘stdlib.h’.

----->With the GNU C Library, you can use free to free the blocks that memalign, posix_memalign, and valloc return. That does not work in BSD, however; BSD does not provide any way to free such blocks.
void * memalign (size_t boundary, size_t size)

--->The memalign function allocates a block of size bytes whose address is a multiple of boundary. The boundary must be a power of two! The function memalign works by allocating a somewhat larger block, and then returning an address within the block that is on the specified boundary.
int posix_memalign (void **memptr, size_t alignment, size_t size)

---->The posix_memalign function is similar to the memalign function in that it returns a buffer of size bytes aligned to a multiple of alignment. But it adds one requirement to the parameter alignment: the value must be a power of two multiple of sizeof (void *).
----->If the function succeeds in allocating memory, a pointer to the allocated memory is returned in *memptr and the return value is zero. Otherwise the function returns an error value indicating the problem.
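
A short sketch of how posix_memalign is typically called follows. The wrapper name aligned_block is an assumption for this example, and the feature-test macro at the top is only there for compilers running in a strict standard mode.

```c
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 200112L   /* ask for posix_memalign on strict compilers */
#endif
#include <stdlib.h>

/* Hypothetical wrapper: return a block of size bytes whose address is a
   multiple of alignment, or NULL on failure. alignment must be a power of
   two and a multiple of sizeof (void *). Release the block with free(). */
void *aligned_block(size_t alignment, size_t size)
{
    void *ptr = NULL;

    if (posix_memalign(&ptr, alignment, size) != 0)
        return NULL;            /* the call reported an error; ptr is not used */
    return ptr;
}
```

For instance, aligned_block(64, 100) asks for 100 bytes starting on a 64-byte boundary, while an alignment of 3 is rejected because it is not a power of two.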
void * valloc (size_t size)
Using valloc is like using memalign and passing the page size as the value of the first argument. It is implemented like this:
void *
valloc (size_t size)
{
return memalign (getpagesize (), size);
}
--->See [How to get information about the memory subsystem?] for more information about the memory subsystem.
-->Malloc Tunable Parameters
You can adjust some parameters for dynamic memory allocation with the mallopt function. This function is the general SVID/XPG interface, defined in ‘malloc.h’.
int mallopt (int param, int value)

---->When calling mallopt, the param argument specifies the parameter to be set, and
value the new value to be set. Possible choices for param, as defined in ‘malloc.h’,
are:
M_TRIM_THRESHOLD
This is the minimum size (in bytes) of the top-most, releasable chunk
that will cause sbrk to be called with a negative argument in order to
return memory to the system.
M_TOP_PAD
This parameter determines the amount of extra memory to obtain from
the system when a call to sbrk is required. It also specifies the number of
bytes to retain when shrinking the heap by calling sbrk with a negative
argument. This provides the necessary hysteresis in heap size such that
excessive amounts of system calls can be avoided.
Summary of malloc-Related Functions
------>Here is a summary of the functions that work with malloc:
void *malloc (size_t size)
Allocate a block of size bytes.
void free (void *addr)
Free a block previously allocated by malloc.
void *realloc (void *addr, size_t size)
Make a block previously allocated by malloc larger or smaller, possibly by
copying it to a new location.
void *calloc (size_t count, size_t eltsize)
Allocate a block of count * eltsize bytes using malloc, and set its contents to
zero.
void *valloc (size_t size)
Allocate a block of size bytes, starting on a page boundary.
void *memalign (size_t boundary, size_t size)
Allocate a block of size bytes, starting on an address that is a multiple of boundary. [Allocating Aligned Memory Blocks]
int mcheck (void (*abortfn) (void))
Tell malloc to perform occasional consistency checks on dynamically allocated
memory, and to call abortfn when an inconsistency is found.
-->Virtual Memory Allocation And Paging

void *(*__malloc_hook) (size_t size, const void *caller)
A pointer to a function that malloc uses whenever it is called.
void *(*__realloc_hook) (void *ptr, size_t size, const void *caller)
A pointer to a function that realloc uses whenever it is called.
void (*__free_hook) (void *ptr, const void *caller)
A pointer to a function that free uses whenever it is called.
void (*__memalign_hook) (size_t size, size_t alignment, const void *caller)
A pointer to a function that memalign uses whenever it is called.
struct mallinfo mallinfo (void)




Thursday, 19 April 2012