1. What is the fundamental reason that the OS needs to enforce memory protection?

2. When a programmer writes a program for a computer having a 32-bit address space, what limitations on memory are imposed?

3. Many commercial computer designs have eventually failed because they did not provide a sufficient number of bits for the physical memory address.

(a) Why is this an important parameter?

(b) Why is it difficult to increase the number of address bits in an existing computer architecture?

4. What benefit would accrue from increasing the number of address bits beyond 32? Why might address spaces greater than 32 bits be required?

5. Assume a strict least-recently-used page replacement policy is implemented by hardware that writes the time of the last access of a page into the page table entry. In principle, how might the memory management system make use of this information? How might the determination of the oldest page be sped up?

6. The LFU page replacement policy has an additional weakness not mentioned in the text. It results from the fact that, while a task is being initialized, the pages containing the initialization code are used frequently. Consider the time following task initialization and suggest why the LFU policy has a weakness.


URL: https://www.sciencedirect.com/science/article/pii/B9780750650649500171

Processes and Operating Systems

Marilyn Wolf, in High-Performance Embedded Computing (Second Edition), 2014

4.4.1 Memory management in embedded operating systems

Memory management is more often associated with general-purpose than real-time operating systems, but as we have noted, RTOSs are often called upon to perform general-purpose tasks. An RTOS may provide memory management for several reasons:

• Memory mapping hardware can protect the memory spaces of the processes when outside programs are run on the embedded system.

• Memory management can allow a program to use a large virtual address space.

The next example describes the memory management structure of Windows CE.

Example 4.1

Memory Management in Windows CE

Windows CE [Bol07] is designed as a real-time, full-featured operating system for lightweight consumer devices. Windows desktop applications cannot run on Windows CE directly, but the operating system is designed to simplify porting Windows applications to Windows CE.

Windows CE supports virtual memory; the paging memory can be supplied by flash memory as well as more traditional devices such as disks. The operating system supports a flat 32-bit virtual address space. The bottom 2 GB of the address space is for user processes, while the top 2 GB is for the kernel. The kernel address space is statically mapped into the address space. The top 1 GB of the user’s space is reserved for system elements while the bottom 1 GB holds the user code, data, stack, and heap.
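
Spelled out as concrete ranges (our reading of the layout just described, not a table reproduced from [Bol07]), the 32-bit address space breaks down as follows:

0x00000000–0x3FFFFFFF  user code, data, stack, and heap (bottom 1 GB of user space)
0x40000000–0x7FFFFFFF  reserved for system elements (top 1 GB of user space)
0x80000000–0xFFFFFFFF  kernel, statically mapped (top 2 GB)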


URL: https://www.sciencedirect.com/science/article/pii/B9780124105119000046

System-Level Design and Hardware/Software Co-design

Marilyn Wolf, in High-Performance Embedded Computing (Second Edition), 2014

7.3.7 Memory systems

Memory accesses dominate many embedded applications. Several co-synthesis techniques have been developed to optimize the memory system.

Co-synthesizing cache sizes and code placement

Li and Wolf [Li99] developed a co-synthesis algorithm that determines the proper cache size to tune the performance of the memory system. Their target architecture was a bus-based multiprocessor. As described in Section 4.2.5, they used a simple model for processes in the instruction cache. A process occupied a contiguous range of addresses in the cache; this assumes that the program has a small kernel that is responsible for most of the execution cycles. They used a binary variable κi to represent the presence or absence of a process in the cache: κi = 1 if process i is in the cache and 0 otherwise.

Their algorithm uses simulation data to estimate the execution times of programs. Co-synthesis would be infeasible if a new simulation had to be run for every new cache configuration. However, when direct-mapped caches change in size by powers of two, the new placement of programs in the cache is easy to determine. The example in Figure 7.19 shows several processes originally placed in a 1 KB direct-mapped cache. If we double the cache size to 2 KB, then some overlaps disappear but new overlaps cannot be created. As a result, we can easily predict cache conflicts for larger caches based on the simulation results from a smaller cache.
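
The arithmetic behind this observation is worth spelling out (a sketch with an illustrative 32-byte line size, not taken from [Li99]). In a direct-mapped cache of size $C$ with line size $L$, an address $A$ maps to set $\lfloor A/L \rfloor \bmod (C/L)$. Doubling $C$ doubles the number of sets, adding one more address bit to the set index, so two addresses that mapped to different sets still differ in the enlarged index and no new conflict can appear. For example, with $L = 32$ bytes, addresses 0x0000 and 0x0400 both map to set 0 of a 1 KB cache (32 sets) and conflict, but in a 2 KB cache (64 sets) 0x0400 maps to set 32 and the conflict disappears.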


FIGURE 7.19. How code placement changes with cache size [Li99].

During co-synthesis, they compute the total system cost as

(EQ 7.17) $C(\mathrm{system}) = \sum_{i \in \mathrm{CPUs}} \left[ C(\mathrm{CPU}_i) + C(\mathrm{icache}_i) + C(\mathrm{dcache}_i) \right] + \sum_{j \in \mathrm{ASICs}} \left[ C(\mathrm{ASIC}_j) + C(\mathrm{dcache}_j) \right] + \sum_{k \in \mathrm{links}} C(\mathrm{commlink}_k)$

where C(x) is the cost of component x.

The authors use a hierarchical scheduling algorithm that builds the full hyperperiod schedule using tasks, then individually moves processes to refine the schedule. Co-synthesis first finds an allocation that results in a feasible schedule, then reduces the cost of the hardware. To reduce system cost, it tries to move processes from lightly loaded PEs to other processing elements; once all the processes have been removed from a PE, that processing element can be eliminated from the system. It also tries to reduce the cost of a PE by reducing its cache size. However, when processes are moved onto a PE, the size of that processing element’s cache may have to grow to maintain the schedule’s feasibility.

Because the execution time of a process is not constant, we must find a measure other than simple CPU time to guide allocation decisions. Dynamic urgency describes how likely a process is to reuse the cache state to reduce misses:

(EQ 7.18) $DU(\mathrm{task}_i, \mathrm{PE}_i) = SU(\mathrm{task}_i) - \max\left(\mathrm{ready}(\mathrm{task}_i) - \mathrm{available}(\mathrm{task}_i)\right) + \left[\mathrm{median}\ \mathrm{WCET}_{\mathrm{base}}(\mathrm{task}_i) - \mathrm{WCET}(\mathrm{task}_i, \mathrm{PE}_i)\right]$

In this formula, SU is the static urgency of a task, or the difference between the execution time and its deadline; the worst-case execution times are measured relative to the current cache configuration.

Memory management

Wuytack et al. [Wuy99] developed a methodology for the design of memory management for applications, such as networking, that require dynamic memory management. Their methodology refines the memory system design through the following steps.

1. The application is defined in terms of abstract data types (ADTs).

2. The ADTs are refined into concrete data structures. The proper data structures are chosen based on size, power consumption, and so on.

3. The virtual memory is divided among one or more virtual memory managers. Data structures can be grouped or separated based on usage; for example, data structures that are linked to each other may be put together.

4. The virtual memory segments are split into basic groups. The groups are organized to allow parallel access to data structures that require high memory performance.

5. Background memory accesses are ordered to optimize memory bandwidth; this step looks at scheduling conflicts.

6. The physical memory is allocated. Multiport memory can be used to improve memory bandwidth.


URL: https://www.sciencedirect.com/science/article/pii/B9780124105119000071

Embedded Systems Analysis

Ronald van der Knijff, in Handbook of Digital Forensics and Investigation, 2010

Memory Management

Memory management refers to all methods used to store code and data in memory, keep track of their usage, and reclaim memory space when possible. At a low level this means a mapping between the physical chips and a logical address space via a memory map. At a higher level virtual address spaces with a memory management unit (MMU) might be used to give application programs the impression of contiguous working memory. In reality the memory may be physically fragmented into parts stored on different physical chips or other storage media. If a forensic examiner wants to know which physical memory locations of a device have been copied, in-depth knowledge of memory management is necessary as discussed in the section on data collection later in this chapter.


URL: https://www.sciencedirect.com/science/article/pii/B9780123742674000082

Investigating Live Virtual Environments

Diane Barrett, Gregory Kipper, in Virtualization and Forensics, 2010

Memory Management

Memory management processes in virtual machines can affect the amount of information recoverable from the virtual machine. For example, VMware makes use of shadow page tables, while the XenSource approach does not, except on a temporary basis. XenSource, through kernel modifications, gives the guest OS limited direct access to physical memory page tables.

In shadow paging, a table is maintained to efficiently virtualize memory access between the virtual memory pages of the guest OS and the underlying physical machine pages. Shadow page tables shield guest OSes from their dependence on specific machine memory. This allows the hypervisor to optimize that memory.
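
Conceptually, the mapping chain looks like this (a sketch of the general technique, not either vendor's exact data structures):

guest virtual page --(guest OS page table)--> guest "physical" page --(hypervisor map)--> machine page
shadow page table, kept by the hypervisor: guest virtual page --> machine page, walked directly by the hardware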

Virtual machine memory virtualization is based on the same principle as the virtual memory used by a regular OS, where the OS keeps a per-process table mapping virtual page numbers to physical page numbers. In VMware's ESX Server, one of two techniques is used for dynamically increasing or reducing the amount of memory allocated to virtual machines: either a memory balloon driver is loaded into the guest OS, or paging is implemented from the VM to a server swap file. When a VMware virtual machine is powered up, a swap file is created in the same directory as the virtual machine configuration file. The memory balloon driver is part of the VMware Tools package; if it is not installed, ESX Server hosts use swapping to forcibly recover memory.


URL: https://www.sciencedirect.com/science/article/pii/B9781597495578000060

A deeper look into the system

Igor Ljubuncic, in Problem-Solving in High Performance Computing, 2015

Memory management

Memory management is one of the more complex parts of system troubleshooting, as it involves a certain degree of guesswork and estimation. Still, you can achieve good results by monitoring the tunables and adjusting them to match your scenario. This means being familiar with the meaning of available parameters and their associated values.

Dirty_background_bytes – Contains the threshold, as an amount of dirty memory in bytes, at which the kernel will start writing dirty pages to permanent storage. This is done by background kernel threads (known as pdflush). Why is this useful? For instance, you may see pdflush processes using a very high percentage of CPU, hogging resources. This could be an indicator of a wider problem. In some situations, you may have the freedom to change the dirty parameters and check whether the issue temporarily goes away.

Dirty_background_ratio – This parameter is the percentage of total available memory at which the kernel will start writing dirty data. On high-memory machines, this could translate into tens of gigabytes of pages.

Dirty_bytes – To complicate things a little, this tunable contains the amount of dirty memory at which a process generating disk writes will itself start writeback. It is mutually exclusive with dirty_ratio; one or the other will be zero (unused).

Dirty_expire_centisecs – This tunable defines, in hundredths of a second, the age at which dirty data becomes eligible for flushing. Pages that have been dirty in memory for longer than the specified interval will be written to disk.

Dirty_ratio – The percentage threshold at which the process generating disk writes will start writing out dirty data. Again, on systems with large memory, the percentage can translate into a significant amount.

Dirty_writeback_centisecs – This tunable specifies, in hundredths of a second, the interval at which the kernel flusher threads wake up to write dirty data to disk.

It becomes apparent that the disk writing policy is a nontrivial combination of the values set in these variables. However, being aware of their power can help, especially when troubleshooting performance or optimizing systems.

Drop_caches – When set, this tunable tells the kernel to begin dropping caches. It can accept a limited set of integers, namely, 1 (pagecache), 2 (slab objects), or 3 (both). The operation is nondestructive, and no data will be lost by running it. However, the purpose of this tunable may seem questionable. Why would anyone want to interfere with the normal way the kernel manages its memory?

Again, we go back to the question of performance troubleshooting and optimization. It may be useful to drop caches to time system operations by making sure no object is retrieved from memory, which is essentially a fast operation, but rather from the intended storage, like a network file system or local disk. Furthermore, if the host is exhibiting abnormal operation (possibly due to a bug) with very large caches, dropping them and observing the behavior may confirm the suspicion. However, do note that dropping caches can take a very long time, because it might essentially mean tens or hundreds of megabytes (or even gigabytes) worth of data being written to disk, causing a temporary I/O and CPU load.

Swappiness – This is another useful tunable, which defines how aggressively the kernel swaps memory pages to swap devices (if present). The values range from 0 to 100, with 100 being the most aggressive routine. The default value will vary between distributions and kernel versions. It is important in that it can affect interactive responsiveness and performance, and the number may have to be tweaked to match the hardware, including the size of physical memory, as well as the usage model.
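
All of these tunables live under /proc/sys/vm/ and can be set persistently via sysctl. A sketch of an /etc/sysctl.conf fragment follows; the values are illustrative examples only, not recommendations, and should be matched to your workload:

# start background flushing earlier, but let writers dirty more before blocking
vm.dirty_background_ratio = 5
vm.dirty_ratio = 15
# consider data dirty for flushing after 30 s; wake flusher threads every 5 s
vm.dirty_expire_centisecs = 3000
vm.dirty_writeback_centisecs = 500
# swap reluctantly
vm.swappiness = 10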


URL: https://www.sciencedirect.com/science/article/pii/B9780128010198000040

Tuning Query Performance

Lilian Hobbs, ... Pete Smith, in Oracle 10g Data Warehousing, 2005

10.5.1 Tuning PGA Memory

PGA memory is the most critical memory parameter for the resource-intensive queries found in a data warehouse. In a data warehouse, PGA memory is used by SQL operations such as sorts, hash joins, bitmap merges, and bulk loads. The amount of PGA memory used by each operation is called its work area. Because these operations are memory intensive, tuning the work areas is crucial to good query performance. If enough memory is not available, intermediate data may need to be written to temporary segments on disk, which can slow down performance significantly. Prior to Oracle 9i, in order to tune query performance, the DBA had to tune various initialization parameters, such as SORT_AREA_SIZE, HASH_AREA_SIZE, CREATE_BITMAP_AREA_SIZE, and BITMAP_MERGE_AREA_SIZE. However, this was a very difficult and time-consuming process because the ideal values for these parameters may vary from query to query and may depend on the load on the system. Oracle 9i introduced the automatic PGA memory management feature to ease this burden. Additionally, in Oracle Database 10g the PGA advisor can be used to determine the ideal memory size setting for the system.

Automatic PGA memory management automatically balances the work area sizes across SQL statements so as to make the best use of available memory. The DBA only needs to specify the total amount of memory available to the database instance by setting the initialization parameter PGA_AGGREGATE_TARGET. To enable automatic memory management, the initialization parameter WORKAREA_SIZE_POLICY must be set to AUTO (setting it to MANUAL reverts to manual memory management).

Hint:

The automatic PGA memory management is not available when you use the shared server–based operation in Oracle. For a data warehouse, the number of connections is usually not an issue; hence, it is recommended to use the dedicated server model.

Before we discuss how to tune PGA memory, let us learn how to monitor the PGA memory usage.

Monitoring PGA Memory Usage

The amount of PGA memory allocated and used by each Oracle server process can be seen in the V$PROCESS view, as follows. The PGA_USED_MEM column represents the memory currently in use, PGA_ALLOC_MEM column is total memory currently allocated by the process (some of it may be freed but not yet returned to the operating system), and PGA_MAX_MEM is the maximum ever allocated by that process. In a well-tuned system, every process should be allocating adequate memory but no more than necessary.

SELECT spid, pga_used_mem, pga_alloc_mem, pga_max_mem FROM v$process;


The memory used by the work areas of SQL statements is seen in the V$SQL_WORKAREA view. You can join to V$SQL to get the complete text of the SQL statements.
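
The book's sample output is not reproduced here; a join along the following lines (column names from the standard V$SQL_WORKAREA and V$SQL views) retrieves the work area statistics together with the statement text:

SELECT s.sql_text, w.operation_type, w.last_memory_used, w.optimal_executions, w.onepass_executions, w.multipasses_executions FROM v$sql_workarea w, v$sql s WHERE w.address = s.address AND w.hash_value = s.hash_value;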


While a query execution is in progress, you can monitor the work area usage in the V$SQL_WORKAREA_ACTIVE view.
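
A query along these lines (a sketch using columns from the standard V$SQL_WORKAREA_ACTIVE view; the book's output is not reproduced) shows each active work area, its current memory consumption, and whether it has spilled to temporary segments:

SELECT sid, operation_type, work_area_size, actual_mem_used, number_passes, tempseg_size FROM v$sql_workarea_active;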


Complex SQL operations need adequate work area memory; otherwise, the operation may spill over to temporary segments on disk. The optimal memory size is one that allows the operation to be performed entirely in memory. If memory is somewhat less than optimal, then one or more extra passes over the data may be required. A one-pass operation is the next best thing to optimal and will perform reasonably well; as work area sizes get larger, it may be the typical case. However, a multipass operation will severely degrade performance and should be avoided as much as possible.

Hint:

A well-tuned system should have a high percentage of optimal and one-pass executions and very few, if any, multipass executions.

The view V$SQL_WORKAREA_HISTOGRAM can be used to find the distribution of optimal, one-pass, and multipass query executions in your system. The view shows the number of optimal, one-pass, and multipass executions for different ranges of work area sizes. For example, in the query shown below, most executions use work area sizes under 2MB and are able to execute with the optimal amount of PGA memory. There is one query with work areas between 4 and 8MB, which needed a one-pass execution, and two queries with work areas between 8 and 16MB, which needed a multipass execution.
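
The original output is not reproduced in this excerpt; a histogram query of the following form (column names from the standard V$SQL_WORKAREA_HISTOGRAM view) produces that distribution:

SELECT low_optimal_size/1024 AS low_kb, high_optimal_size/1024 AS high_kb, optimal_executions, onepass_executions, multipasses_executions FROM v$sql_workarea_histogram WHERE total_executions > 0;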


Now that we understand how to monitor PGA memory usage and identify any multipass executions, let us see how we go about tuning it.

Tuning PGA_AGGREGATE_TARGET

As we mentioned earlier, available physical memory on a system running Oracle must be distributed among the Operating System, SGA, and PGA. For a data warehouse, a good rule of thumb is to set the PGA_AGGREGATE_TARGET initially to about 40 percent to 50 percent of available physical memory and then tune it based on execution of a real workload.
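
For instance, on a dedicated data warehouse server with 16 GB of physical memory, this rule of thumb suggests an initial PGA_AGGREGATE_TARGET of roughly 6.4 to 8 GB (40 to 50 percent of 16 GB), to be refined later using the advisor views discussed below.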

Oracle Database 10g will continuously monitor how much PGA memory is being used by the entire instance by collecting statistics during executions of queries. This information can be seen in the V$PGASTAT view.

Using these statistics, it estimates how the performance would vary if the PGA_AGGREGATE_TARGET were set to different values. This information is published in the V$PGA_TARGET_ADVICE view and can be used to tune the PGA_AGGREGATE_TARGET parameter.

Hint:

Set the initialization parameter STATISTICS_LEVEL to TYPICAL (default) or ALL; otherwise, the V$PGA_TARGET_ADVICE view is not available.

The following query shows the various statistics in V$PGASTAT:
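
SELECT name, value, unit FROM v$pgastat;

(The book's output is not reproduced here; the view is simply a list of name/value pairs with their units.)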


The first step is to look at the top two lines of this output: the aggregate PGA target parameter, which is the current setting of PGA_AGGREGATE_TARGET, and the aggregate PGA auto target, which Oracle has calculated as the total memory it can use for SQL work areas. The difference is the estimated memory needed for other processing, such as PL/SQL, and is not tuned by the automatic memory management feature. In our example, the PGA_AGGREGATE_TARGET is 15.7MB, and the total amount available for work areas is 4.2MB. Note that if you find the auto target is much smaller than the PGA_AGGREGATE_TARGET, as is the case in our example, this is one indication that there is not enough PGA memory for work areas and you may need to increase it.

To confirm whether your PGA_AGGREGATE_TARGET setting is too small, you should look at two quantities reported by V$PGASTAT: the cache hit percentage and the overallocation count. The cache hit percentage indicates the percentage of work areas that operated with an optimal allocation of memory. The overallocation count indicates how many times Oracle had to exceed the user-defined limit of PGA_AGGREGATE_TARGET because there was not enough memory available. In a well-tuned system, the overallocation count should be zero, meaning the available PGA memory was sufficient, and the cache hit percentage should be over 80 percent, meaning most queries execute with the optimal amount of memory. If you find that your cache hit percentage is too low or the overallocation count is nonzero, you have insufficient PGA memory. In our example, the cache hit ratio is 61 percent, which is low, and the overallocation count is 426, which is not good.

In these cases you should look at the V$PGA_TARGET_ADVICE view for advice. The view shows the projected values of the cache hit percentage and the overallocation count for various memory sizes. For each memory size, the FACTOR column shows which factor of the current memory setting it is. For example, the row where the FACTOR column is 1 is the current setting, in our example 15.7MB.
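
A query of the advice view along these lines (columns from the standard V$PGA_TARGET_ADVICE view; the book's output is not reproduced) lists, for each candidate target size, the scaling factor and the projected cache hit percentage and overallocation count:

SELECT round(pga_target_for_estimate/1024/1024) AS target_mb, pga_target_factor, estd_pga_cache_hit_percentage, estd_overalloc_count FROM v$pga_target_advice;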


When tuning PGA memory, you must ensure that PGA_AGGREGATE_TARGET is at least set to a value where the overallocation count is zero; otherwise, there is simply not enough memory for all the work areas. In this example, the minimum memory setting where the overallocation count goes to zero is around 28MB. Further, notice that as you increase the memory size, the cache hit percentage increases rapidly up to a point (88 percent at around 63MB in the advice output) and after that increases more slowly. This point is the optimal value of PGA memory; ideally, set your PGA memory at or close to it.

You can see a graphical representation of this view in Oracle Enterprise Manager. From the Advisor Central page (Figure 10.6), if you follow the Memory Advisor link and then click on the PGA link, you will see the screen shown in Figure 10.24, which shows the current PGA settings and usage.


Figure 10.24. PGA Memory Advisor

From this page, if you click on the Advice button, you will see a line graph as shown in Figure 10.25 with the memory size setting on the X-axis and the cache hit percentage on the Y-axis. You will find that the initial part of the line graph indicates the threshold below which you will see nonzero overallocation count. The optimal value of memory is where this line starts to taper off. The vertical line shows the current setting of the PGA_AGGREGATE_TARGET parameter and can be moved to choose a new setting for this parameter. Once you have chosen a new value, you simply press the OK button and the change will be made.


Figure 10.25. PGA Target Advice in Oracle Enterprise Manager

Hint:

At all times, you must ensure that there is adequate physical memory on your system to accommodate increases in the PGA memory for all users. If not, then you will need to decrease the SGA memory size, which may not always be desirable. Increasing PGA memory size without available physical memory means that there will be thrashing, which will only slow the system down.

In the next section, we will discuss how to set the value for the SGA memory.


URL: https://www.sciencedirect.com/science/article/pii/B9781555583224500126

ARM PROCESSOR FUNDAMENTALS

ANDREW N. SLOSS, ... CHRIS WRIGHT, in ARM System Developer's Guide, 2004

2.5.2 MEMORY MANAGEMENT

Embedded systems often use multiple memory devices. It is usually necessary to have a method to help organize these devices and protect the system from applications trying to make inappropriate accesses to hardware. This is achieved with the assistance of memory management hardware.

ARM cores have three different types of memory management hardware—no extensions providing no protection, a memory protection unit (MPU) providing limited protection, and a memory management unit (MMU) providing full protection:

• Nonprotected memory is fixed and provides very little flexibility. It is normally used for small, simple embedded systems that require no protection from rogue applications.

• MPUs employ a simple system that uses a limited number of memory regions. These regions are controlled with a set of special coprocessor registers, and each region is defined with specific access permissions. This type of memory management is used for systems that require memory protection but don't have a complex memory map. The MPU is explained in Chapter 13.

• MMUs are the most comprehensive memory management hardware available on the ARM. The MMU uses a set of translation tables to provide fine-grained control over memory. These tables are stored in main memory and provide a virtual-to-physical address map as well as access permissions. MMUs are designed for more sophisticated platform operating systems that support multitasking. The MMU is explained in Chapter 14.
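
As a rough sketch of one translation case (the classic ARM 1 MB "section" mapping; details differ across architecture versions and are covered in Chapter 14): the MMU uses the top 12 bits of the 32-bit virtual address, VA[31:20], to index a 4096-entry first-level translation table. The matching section descriptor supplies the physical section base in its top 12 bits, and the physical address is formed as

PA = (descriptor[31:20] << 20) | VA[19:0]

so a single descriptor maps an entire 1 MB region together with its access permissions, and 4096 such entries cover the full 4 GB space.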


URL: https://www.sciencedirect.com/science/article/pii/B9781558608740500034

Memory Management for Embedded Network Applications

Sven Wuytack, ... Chantal Ykman-Couvreur, in Readings in Hardware/Software Co-Design, 2002

IV GLOBAL DESIGN FLOW ISSUES

Fig. 4 gives an overview of the proposed memory management design flow, which is the result of detailed studies of the application and of solution strategies. Each of the steps is detailed in the following sections.

At the highest level, the application is specified in terms of ADTs. The ADT refinement step refines these ADTs into a combination of concrete data structures, such as linked lists or pointer arrays with one or more access keys. The virtual memory management step defines a number of virtual memory segments and their corresponding custom memory managers. Each virtual memory segment reserves an amount of memory to store all instances of one or more concrete data types. To this end, the virtual memory segment is divided into a number of slots, each capable of storing a single instance of the data type. The VMM step determines, via analysis or simulation of a number of scenarios, the number of slots required to store all instances of that data type.
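
As a worked example of the slot sizing (the numbers are illustrative, not from the paper): if scenario simulation shows that at most 1,024 instances of a 32-byte connection record are ever live simultaneously, the corresponding virtual memory segment reserves 1,024 slots of 32 bytes each, i.e., $1024 \times 32 = 32768$ bytes (32 KB).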

For dynamically allocated data types, the VMM step also determines a custom memory manager for the corresponding virtual memory segment, implementing the allocation and deallocation functions. The ADT and VMM refinement are combined in the dynamic memory management stage.

During the physical memory management stage, the virtual memory segments are assigned to a number of allocated memories. However, to increase the mapping freedom and the simultaneous accessibility of the data, the virtual memory segments are first split into so-called basic groups according to specific rules.

Next, the background memory accesses are ordered to optimize the required storage bandwidth. Finally, the memory allocation and assignment step allocates a number of memories and assigns the basic groups to them. This determines the size of the memories in terms of bit width and word depth, as well as the number and type (read, write, or read/write) of ports on each memory. The result is a heavily power- and/or area-optimized custom memory architecture for the given application.

What is memory protection? Why is it important in an OS?

Memory protection prevents a process from accessing memory that has not been allocated to it. Without it, a program could seize control of an excessive amount of memory, damaging other software that is currently running or causing loss of saved data.

How can the OS prevent a process from accessing the memory of another process?

Short answer: on x86 processors, the OS activates Protected Mode (32-bit) or Long Mode (64-bit), whose hardware address translation and privilege levels block such accesses. ARM and other processors implement similar concepts.

How does the OS handle memory management?

The OS decides how much memory to allocate to each process and which process gets memory at what time. It also tracks when memory is freed or deallocated and updates its records accordingly.