From: eLinux.org
This page describes optimizations to a large application and to the kernel, to shorten the time required to load and execute an application.
Two main techniques are described here: 1) using mmap() instead of read(), and 2) controlling page mapping characteristics.
Kernel bootup time has improved drastically through recent efforts, including CELF activities. As a next step, application startup time should be reduced in order to cut total system bootup time. The techniques described here apply to the many embedded systems that consist of a single large application program.
An application may load a large amount of data when it is first initialized. This can result in a long delay as the file data is read into memory. It is possible to avoid the initial cost of this read, by using mmap() instead of read().
Instead of loading all of the data into memory with the read() system call, the file can be mapped into memory with the mmap() system call. Once the data file is mapped, individual pages are demand loaded during execution, when the application first accesses them. Depending on the initial working set size of the data in the file, this can result in significant time savings. For example, if an application initially uses only 50% of the data in the file, then only 50% of the data is read into memory from persistent storage. There is extra overhead from the page faults incurred in loading the pages on demand. However, this page-fault overhead is offset by the reduction in the number of pages read (compared to the read() case).
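The difference between the two approaches can be sketched in a few lines. The following is only an illustration, written in Python for brevity (its `mmap` module wraps the same system call); the helper names and the throwaway data file are assumptions made for the example and are not part of the original measurements.

```python
import mmap
import os
import tempfile

PAGE = mmap.PAGESIZE


def load_with_read(path):
    """Read the whole file up front: every byte is copied into the process."""
    with open(path, "rb") as f:
        return f.read()


def first_bytes_with_mmap(path, nbytes):
    """Map the file and touch only the first nbytes.

    Pages are faulted in when they are accessed, so pages that are never
    touched do not have to be read by this process.
    """
    with open(path, "rb") as f:
        with mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ) as m:
            return m[:nbytes]


def make_demo_file(num_pages=16):
    """Create a throwaway data file (hypothetical stand-in for application data)."""
    fd, path = tempfile.mkstemp()
    with os.fdopen(fd, "wb") as f:
        for i in range(num_pages):
            f.write(bytes([i % 256]) * PAGE)
    return path


if __name__ == "__main__":
    path = make_demo_file()
    try:
        whole = load_with_read(path)                          # loads all pages
        half = first_bytes_with_mmap(path, len(whole) // 2)   # touches half of them
        print(len(whole), len(half))
    finally:
        os.remove(path)
```

In the read() case the full file is transferred before the application can continue; in the mmap() case only the accessed half is touched, which is the situation described in the 50% example above.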
To further improve this method, the kernel can be modified to reduce page copying and page faults.
When pages are demand loaded from a memory-mapped file, they are kept in memory as part of the kernel "file cache" and mapped into the requesting process's address space. If the page is accessed via a write operation, then the page in the file cache is copied to a newly allocated memory page. (This is referred to as "copy-on-write".) The copied page can then be freely modified by the process that maps it.
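The copy-on-write behaviour of an unmodified kernel can be observed from user space. The sketch below is an illustration only: it uses Python's `mmap.ACCESS_COPY`, which requests a private copy-on-write mapping, and the function name and temporary file are assumptions made for the example. It shows stock behaviour, not the modified kernel discussed on this page.

```python
import mmap
import os
import tempfile


def private_write_leaves_file_unchanged():
    """Write through a private mapping and compare memory with the file."""
    fd, path = tempfile.mkstemp()
    try:
        with os.fdopen(fd, "wb") as f:
            f.write(b"A" * mmap.PAGESIZE)
        with open(path, "rb") as f:
            # ACCESS_COPY asks for a private, copy-on-write mapping.
            with mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_COPY) as m:
                before = m[0:1]      # read access: data comes from the file's page
                m[0:1] = b"B"        # write access: the process gets its own copy
                in_memory = m[0:1]
        with open(path, "rb") as f:
            on_disk = f.read(1)      # the file itself is not modified
        return before, in_memory, on_disk
    finally:
        os.remove(path)


if __name__ == "__main__":
    print(private_write_leaves_file_unchanged())
```

The write is visible only through the mapping; the file contents stay as they were, because the process modified its private copy of the page.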
Suppose, however, that a file is mapped or accessed by only one process. Then, copying the page is redundant. In this case, we can convert the page in the file cache to a private page immediately. By utilizing this assumption (only one user for the page), the cost of the copy can be eliminated. This has the side benefit of reducing memory consumption as well.
In some cases, an individual page in the process address space is accessed first with a read operation, then with a write operation. This results in two page faults for the same page (one to load the page and move it "through" the file cache, and the other to get a local copy of the page). By eliminating the page copy and making the page private on the first access (whether read or write), the second page fault can be avoided.
The current implementation is experimental in the way it selects the files affected by this caching/virtual memory customization. It would be better to control this mechanism per file or per virtual memory area. The fcntl() and mmap() system calls are candidates for where this control could be introduced.
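The effect on fault and copy counts can be illustrated with a toy model. This is a simplified bookkeeping sketch, not kernel code: the function name, the page states, and the `takeover` flag are assumptions made for the example, and the model ignores everything except the first-touch and read-then-write cases described above.

```python
def count_faults(accesses, takeover):
    """Count page faults and page copies for a sequence of (page, op) accesses.

    op is "r" (read) or "w" (write).
    takeover=False models the stock behaviour: a first read maps the
    file-cache page, and a later write faults again and copies the page.
    takeover=True models making the page private on first access, so there
    is no second fault and no copy.
    """
    state = {}            # page -> "shared" (file-cache page) or "private"
    faults = 0
    copies = 0
    for page, op in accesses:
        current = state.get(page)
        if current is None:
            faults += 1   # the first touch always faults to load the page
            if takeover:
                state[page] = "private"
            elif op == "w":
                copies += 1
                state[page] = "private"
            else:
                state[page] = "shared"
        elif current == "shared" and op == "w":
            faults += 1   # second fault: copy-on-write
            copies += 1
            state[page] = "private"
    return faults, copies


if __name__ == "__main__":
    trace = [(0, "r"), (0, "w"), (1, "w"), (2, "r")]
    print(count_faults(trace, takeover=False))
    print(count_faults(trace, takeover=True))
```

For a page that is read and then written, the model counts two faults and one copy in the stock case and a single fault with no copy when the page is taken over on first access.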
None.
No patch is available at this time.
Hardware
SH3(7709) 133MHz
32MB RAM
64MB CF memory
Software Kernel
Target application
Methods explanation
Results
| No. | Method | Media | FS | Ave. | 1st | 2nd | 3rd | Diff. |
|---|---|---|---|---|---|---|---|---|
| 1 | read | CF | ext3 | 4.420 | 4.418 | 4.420 | 4.421 | - |
| 2 | mmap | CF | ext3 | 3.995 | 3.995 | 3.995 | 3.996 | -0.424 |
| 3 | takeover | CF | ext3 | 3.959 | 3.959 | 3.958 | 3.966 | -0.461 |
| 4 | takeover | CF | squash | 4.002 | 4.000 | 4.000 | 4.007 | -0.417 |
| 5 | takeover(total) | RD | squash | 4.588 | 4.579 | 4.590 | 4.595 | 0.168 |
|  | dd(CF -> RD) | RD | squash | 1.212 | 1.209 | 1.209 | 1.217 |  |
|  | mount | RD | squash | 0.041 | 0.040 | 0.041 | 0.041 |  |
|  | takeover | RD | squash | 3.336 | 3.330 | 3.340 | 3.337 |  |
If the device that stores the file system image is fast enough and the extra RAM usage is affordable, running from a RAM disk (method 5) might be a good choice to reduce bootup time.
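The arithmetic behind the table can be rechecked with a short script. This is only a convenience sketch: the dictionary keys are labels invented for the example, and the averages are copied from the table and scaled by 1000 so that the arithmetic stays exact.

```python
# Average times from the results table, scaled by 1000 (integer arithmetic).
AVG = {
    "read/CF/ext3": 4420,
    "mmap/CF/ext3": 3995,
    "takeover/CF/ext3": 3959,
    "takeover/CF/squash": 4002,
    "takeover(total)/RD/squash": 4588,
}

# Sub-steps of method 5 (copy image to RAM disk, mount it, run the application).
RD_STEPS = {"dd": 1212, "mount": 41, "takeover": 3336}

BASELINE = "read/CF/ext3"


def diff_vs_read(name):
    """Difference from the read() baseline, in thousandths."""
    return AVG[name] - AVG[BASELINE]


def percent_change(name):
    """Relative change against the read() baseline, in percent."""
    return 100.0 * diff_vs_read(name) / AVG[BASELINE]


if __name__ == "__main__":
    for name in AVG:
        print(f"{name:28s} {diff_vs_read(name) / 1000:+.3f} {percent_change(name):+.1f}%")
    print("sum of RD steps:", sum(RD_STEPS.values()) / 1000)
```

Recomputing from the rounded averages gives differences that can deviate from the table's Diff. column by 0.001 for some rows, presumably because the table was derived from unrounded measurements. The three RAM disk sub-steps sum to roughly the reported total for method 5, and the dd copy accounts for most of the gap between the RAM disk takeover time and that total.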
Status: measured
Architecture Support:
i386: unknown
ARM: unknown
PPC: unknown
MIPS: unknown
SH: works on SH3
Here is a list of things that could be worked on for this feature:
This project was demonstrated at the 2005 CELF Technical Conference. The picture of the poster is here: