How to build your own Container
Introduction
LXC is a well-known container virtualization tool on Linux. Internally, LXC relies on three main isolation mechanisms of the Linux kernel:
- Chroot
- Cgroups
- Namespaces
In this technical note, I focus on the third one: namespaces. As of version 3.12 of the Linux kernel, Linux supports six namespaces:
- UTS: hostname
- IPC: inter-process communication
- PID: "chroot" process tree
- NS: mount points, the first namespace to land in Linux
- NET: network access, including interfaces (in a future post)
- USER: map virtual, local user-ids to real local ones (in a future post)
The content of this note, including the code examples, comes mainly from the blog "Yet another enthusiast blog!". The main purpose of this note is to deepen my own understanding of virtualization.
Namespace: UTS
The clone system call
When we use the clone system call to create a new process from the parent process, we pass it a flags parameter. Let's look at the definition of the clone system call:
/* Prototype for the glibc wrapper function */
#define _GNU_SOURCE
#include <sched.h>

int clone(int (*fn)(void *), void *child_stack,
          int flags, void *arg, ...
          /* pid_t *ptid, void *newtls, pid_t *ctid */ );
First, a brief look at the important parameters. The first one, fn, is a pointer to the function that the new process runs, much like main in an ordinary C program. The second one, child_stack, is the stack memory used by the new process; because the stack grows downward on most architectures, it should point to the top of that memory. The third one, flags, is the important one for namespaces: each of the six namespace types is requested by setting its own flag here.
CLONE_NEWUTS
As described above, to use any kind of namespace with clone, we have to set the corresponding flag. For the UTS namespace, that flag is CLONE_NEWUTS. The description below is quoted from the Linux manual:
CLONE_NEWUTS (since Linux 2.6.19):
If CLONE_NEWUTS is set, then create the process in a new UTS namespace, whose identifiers are initialized by duplicating the identifiers from the UTS namespace of the calling process. If this flag is not set, then (as with fork(2)) the process is created in the same UTS namespace as the calling process. This flag is intended for the implementation of containers.
A UTS namespace is the set of identifiers returned by uname(2); among these, the domain name and the hostname can be modified by setdomainname(2) and sethostname(2), respectively. Changes made to the identifiers in a UTS namespace are visible to all other processes in the same namespace, but are not visible to processes in other UTS namespaces.
Only a privileged process (CAP_SYS_ADMIN) can employ CLONE_NEWUTS.
The Simple Scenario
How do we use it? When this flag is set and the new process calls sethostname, the new process gets a hostname that differs from the one seen by the parent process. The example code is below:
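#define _GNU_SOURCE
#include <sys/types.h>
#include <sys/wait.h>
#include <stdio.h>
#include <string.h>
#include <sched.h>
#include <signal.h>
#include <unistd.h>

#define STACK_SIZE (1024 * 1024)

/* Stack for the child; clone() is given a pointer to its top. */
static char child_stack[STACK_SIZE];

char* const child_args[] = {
    "/bin/bash",
    NULL
};

int child_main(void* arg)
{
    const char *hostname = "Alexander Namespace";

    printf(" - World !\n");
    /* Pass the full length of the name so that it is not truncated. */
    if (sethostname(hostname, strlen(hostname)) == -1)
        perror("sethostname");
    execv(child_args[0], child_args);
    /* Only reached if execv() fails. */
    printf("Ooops\n");
    return 1;
}

int main()
{
    printf(" - Hello ?\n");
    /* CLONE_NEWUTS requires CAP_SYS_ADMIN, so run this as root. */
    int child_pid = clone(child_main, child_stack + STACK_SIZE,
                          CLONE_NEWUTS | SIGCHLD, NULL);
    if (child_pid == -1) {
        perror("clone");
        return 1;
    }
    waitpid(child_pid, NULL, 0);
    return 0;
}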
As the code shows, child_main runs in the new process, so the bash it spawns lives in the new UTS namespace. When you type "exit", you are back in the original bash with the original namespace.