RS/6000 data management lecture under AIX
Components of an application :
Introduction
- This lecture introduces the theoretical and practical notions of data management on AIX systems (file systems, LVM) and presents the SSA cabinet subsystem.
- This lecture is designed for operators (blue slope) and administrators (red and black slopes).
1 Data
Internal disk
SSA disk, SSA cabinets, loops, cards
The internal disks
The internal disks (hdisks) are connected to the motherboard through a SCSI interface
They are visible only to the local machine
They use SCSI technology
On the production system they are automatically mounted
Notes:
Syspack scheme (how the internal disks using SCSI adapters are mirrored)
The SSA disks
The SSA disks are plugged into a cabinet linked individually to each H70 by an individual SSA card.
They are visible from both members of the cluster but active on one machine at a time.
The disks are numbered by card, loop and serial number.
A pdisk can be composed of multiple physical disks using the SSA hardware ‘RAID’ technology.
Organisation of a SSA cabinet
A D40 SSA cabinet can hold up to 16 disks
On the production system:
- 3 SSA cabinets
- 8 disks per cabinet
- 3 SSA adapter cards per machine
- 24 cables (6*4) linking the 3 cabinets to the 2 machines by 6 loops
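The production figures above can be cross-checked with a little arithmetic. This is only a sketch: it assumes that the "6*4" in the slide means 4 cables per loop, and all names are illustrative.

```python
# Figures taken from the slide; the constant names are illustrative only.
CABINETS = 3
DISKS_PER_CABINET = 8
LOOPS = 6
CABLES_PER_LOOP = 4  # assumption: "6*4" is read as 6 loops x 4 cables


def cabling_summary():
    """Derive totals from the per-cabinet and per-loop figures."""
    return {
        "disks": CABINETS * DISKS_PER_CABINET,
        "loops_per_cabinet": LOOPS // CABINETS,
        "cables": LOOPS * CABLES_PER_LOOP,
    }


print(cabling_summary())
```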
Notes:
Overview of the H70 19” rack, including the 2 H70 servers (H70H and H70B), the 3 D40 SSA cabinets and a diagram of the SSA loops.
Organisation of a SSA cabinet
Notes:
Figure of the internal wiring and the external cabling of an SSA D40 shared by the H70H and H70B servers
Bypass Card
Bypass cards manage the sharing of a disk between servers. They are part of the electronics of the D40 cabinet (ssaencl). There are 4 cards per SSA cabinet.
Those cards enforce the way a disk is shared by the servers (exclusive or concurrent access)
Those cards diagnose the opening of a loop during a failure.
Bypass Card: mode
The bypass card has two modes: bypass (auto) and forced inline
- In bypass mode, it behaves as a switch that closes automatically when it detects that one of the jumpers is offline. The purpose of this mode is to automatically “repair” the broken chain so that all the disks of the loop remain available.
- In forced inline mode, the bypass card behaves as a switch that always remains open.
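The two modes can be summarised as a small decision function. This is a minimal sketch of the behaviour described above, not a model of the real D40 electronics; the names `Mode` and `switch_closed` are hypothetical.

```python
from enum import Enum


class Mode(Enum):
    AUTO = "bypass (auto)"
    FORCED_INLINE = "forced inline"


def switch_closed(mode, jumper1_active, jumper2_active):
    """Return True when the card's internal switch is closed."""
    if mode is Mode.FORCED_INLINE:
        # Forced inline: the switch stays open whatever the card detects.
        return False
    # Auto: the switch closes as soon as one of the jumpers is offline.
    return not (jumper1_active and jumper2_active)


# Both jumpers active: the switch stays open in either mode.
print(switch_closed(Mode.AUTO, True, True))
# One jumper offline: only the auto-mode card closes its switch.
print(switch_closed(Mode.AUTO, True, False))
print(switch_closed(Mode.FORCED_INLINE, True, False))
```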
Bypass card: behaviour
In a normal situation (both members of the cluster are up), their behaviour has, in general, no impact.
But when one of the servers is down (power off/standby), their behaviour can cut off access to the disks.
Bypass cards are concerned only with the activity of the local loop.
SSA Loop Rules
Roughly:
- A loop always starts from the A1 plug and ends at the A2 plug, or starts from the B1 plug and comes back through the B2 plug.
- Never mix ‘A’ plugs (A1, A2) with ‘B’ plugs (B1, B2) in the same loop
- Never link two loops together (even in the case where bypass cards close a loop when one of the servers is down)
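The first two rules lend themselves to a simple check. The sketch below only looks at the two end plugs of a loop on one adapter; the function name and the returned messages are hypothetical, and the third rule (never linking two loops) is not covered.

```python
# The only plug pairs allowed by the rules above.
VALID_LOOPS = {("A1", "A2"), ("B1", "B2")}


def check_loop(start_plug, end_plug):
    """Check the start and end plugs of a loop against the rough rules."""
    if (start_plug, end_plug) in VALID_LOOPS:
        return "ok"
    if start_plug[0] != end_plug[0]:
        return "invalid: mixes A and B plugs"
    return "invalid: a loop must start at plug 1 and end at plug 2"


print(check_loop("A1", "A2"))
print(check_loop("A1", "B2"))
```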
Bypass Card: normal operation
Notes:
Detail of the internal physical wiring and the external SSA loop cabling. This detailed diagram shows the bypass cards
Production tuning
The J4/J5 and J12/J13 jumpers are set up in “auto” mode
The J1/J16 and J8/J9 jumpers are set up in “forced inline” mode
Bypass Card : H70H down
Notes:
Behaviour of the bypass cards when H70H (the upper H70) is down
- two bypass cards are unchanged (J4/J5 & J12/J13)
- the other bypass cards (J1/J16 & J8/J9) stay open because of their forced inline mode, even though they detect the absence of activity on the jumpers
H70H down: behaviour
When the H70H server is down, the bypass card managing J1/J16 detects the absence of electrical activity on both inputs but remains open, in accordance with its forced inline mode.
The J8/J9 card behaves the same way
H70H down: impact
Without forced inline mode, the red A loop would merge with the blue B loop. No disk would be accessible, because the same disk would be visible on two different loops, which is illegal.
Because the cards stay open, all the disks remain visible, but they can be accessed from only one end of the SSA loop.
Bypass Card: H70B down
Notes:
Figure illustrating how the bypass cards behave when H70B (the lower H70) is down.
Note that the behaviour is not symmetric
H70B down: behaviour
When H70B is down, the bypass card controlling jumpers J4/J5 detects the absence of activity and closes its internal switch, in accordance with its auto mode.
Bypass card J12/J13 does the same
H70B down: impact
By closing, the cards leave the remaining server full access to the disks.
H70H signals the opening of the loop, but this has no impact because it can still reach the disks through both ends of the loop.
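The asymmetry between the two failure cases follows directly from the production tuning. The sketch below replays both cases; the mapping of each server to the jumper pairs that lose activity is an assumption inferred from the two failure slides, and all names are illustrative.

```python
# Production tuning from the slides: jumper pair -> mode.
TUNING = {
    "J4/J5": "auto",
    "J12/J13": "auto",
    "J1/J16": "forced inline",
    "J8/J9": "forced inline",
}

# Assumption inferred from the failure slides: the jumper pairs that
# lose activity when a given server goes down.
AFFECTED = {
    "H70H": ["J1/J16", "J8/J9"],
    "H70B": ["J4/J5", "J12/J13"],
}


def closed_cards(server_down):
    """Return the jumper pairs whose card closes its internal switch."""
    return sorted(
        pair for pair in AFFECTED[server_down] if TUNING[pair] == "auto"
    )


# H70H down: the affected cards are forced inline, so none closes.
print("H70H down:", closed_cards("H70H"))
# H70B down: the affected cards are in auto mode, so both close.
print("H70B down:", closed_cards("H70B"))
```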
Volume Group
A VG is a collection of disks and an allocation map of PPs (Physical Partitions).
A VG is active on only one machine at a time
A VG allows one to pack all the data necessary for an application
A VG definition can be imported from the disk
Notes:
Figure of a Volume Group mirrored and shared between two servers.
Volume Group, properties
When one activates a volume group, all its LVs and FSs become visible (lsfs -c) and usable (“mountable”).
After importing a VG, you have to reset the ownership of the raw LV devices
Do not confuse:
- varyoffvg disables the VG locally
- exportvg erases the local definition (kept in the ODM database) of the VG, its LVs and its FSs. In some cases, a modification of the VG made on the other cluster member requires reimporting the VG (e.g. the addition of a new disk to the VG)
- exportvg doesn’t format the disk
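The difference between the two commands can be pictured with a toy model. This is only a sketch: the `Node` class, its `odm` dictionary and the disk layout are hypothetical stand-ins and do not reflect how the LVM stores anything. The model also assumes that a VG has to be varied off before it can be exported.

```python
class Node:
    """Toy cluster member: local VG definitions plus active VGs."""

    def __init__(self):
        self.odm = {}        # stand-in for the local ODM definitions
        self.active = set()  # VGs currently varied on

    def importvg(self, vg_name, disk):
        # The definition is read back from the disk itself.
        self.odm[vg_name] = dict(disk["vg_definition"])

    def varyonvg(self, vg_name):
        if vg_name not in self.odm:
            raise KeyError(f"{vg_name} is not defined locally")
        self.active.add(vg_name)

    def varyoffvg(self, vg_name):
        # Disables the VG locally; the local definition is kept.
        self.active.discard(vg_name)

    def exportvg(self, vg_name):
        # Assumption of this sketch: the VG must be varied off first.
        if vg_name in self.active:
            raise RuntimeError(f"vary off {vg_name} before exporting it")
        del self.odm[vg_name]  # only the local definition goes away


# Hypothetical disk content: the VG definition and some data.
disk = {"vg_definition": {"lvs": ["datalv"]}, "data": "payload"}
node = Node()
node.importvg("appvg", disk)
node.varyonvg("appvg")
node.varyoffvg("appvg")   # still defined locally
node.exportvg("appvg")    # definition erased, disk untouched
node.importvg("appvg", disk)
print(node.odm, disk["data"])
```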
Logical Volume
A Logical Volume is a portion of the VG disk space supporting a File System or raw data.
A LV can be mirrored freely on any hdisk of the VG (3 copies max)
The unit of a LV size is the Logical Partition
Notes:
Figures of LVs inside a volume group
Logical Volume, properties
The LVM (Logical Volume Manager) lets one freely manage, online, the allocation, extension and mirroring of any LV at PP granularity. It is a major and powerful tuning and security tool.
All LV (except sysdumps) on the production are mirrored (copies=2)
All LV must be inactive before varying off a VG.
A raw LV is not “shadowed” by the AIX VMM (Virtual Memory Manager), unlike file systems (jfs LVs).
Logical Volume, insight
Unix rights are effective only for raw logical volumes (owner and mode set on /dev/ryyylv).
The LV type can be raw, jfs, or jfslog
Unix rights set on /dev/yyylv are always disregarded
The major number of a LV matches that of the VG that owns it
A LV can be striped (with the granularity of the PP) using the LVM abstraction layer (not implemented on the production system)
JFS File Systems
A file system is composed of directories and files identified by inodes
A JFS FS is a particular formatting of an LV
A JFS FS is defined by an LV name and a mount point (directory)
Notes:
Figure showing how file systems overlay their mount points.
- hd2 FS is the mainroot of all filesystems
JFS File System, properties
A file system is called a Journalised File System because the VG keeps a log of the modifications on a dedicated LV (jfslog), which acts like a “redo log”.
The jfslog is replayed in case of a crash (super-block marked dirty) when the FS is mounted again.
A FS can be unmounted only if no process has a file open on it (e.g. fuser -xcu /a15jfs) or uses one of its subdirectories as its current directory (pwd) (fuser -fcu …)
The FAT characteristics can be set only at creation time (number of inodes versus maximum size of an inode)
JFS File System, insight
The mount point can be changed only offline
A FS can be extended online
When a FS is mounted, it hides the contents of the mounting point directory.
Shells are sensitive to the Unix rights of the FS only, whereas binaries also take into account the rights of the underlying mount point directory (see the “C” cwd function)
A small file is allocated in blocks smaller than what nbpi defines, by splitting it into fragments (the fragment size is a property of the FS).
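The trade-off between the number of inodes and the space used by small files can be illustrated with rough arithmetic. This is a simplified sketch: it assumes that nbpi is the number of bytes per inode (one inode per nbpi bytes of file system) and that a small file occupies a whole number of fragments. The function names are hypothetical, and real JFS allocation has more cases.

```python
import math


def inode_count(fs_size_bytes, nbpi):
    """Approximate number of inodes: one inode per nbpi bytes."""
    return fs_size_bytes // nbpi


def allocated_bytes(file_size, fragment_size):
    """Space used by a small file, rounded up to whole fragments."""
    return math.ceil(file_size / fragment_size) * fragment_size


one_gib = 1024 ** 3
print(inode_count(one_gib, 4096))   # many inodes, suited to small files
print(inode_count(one_gib, 16384))  # fewer inodes, suited to larger files
print(allocated_bytes(700, 512))    # two 512-byte fragments
print(allocated_bytes(700, 4096))   # one full 4096-byte block
```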
NFS File System
An NFS file system is the content of a shared remote directory mounted over a local mount point (anywhere).
The directory or one of its parents must be exported (exportfs) to the remote machine before that machine can mount it locally
Notes:
Figures of the relationship between NFS file systems on the NFS server and on the NFS client.
NFS File System, properties
An NFS file system can be mounted only if the server recognises the client by its primary name (the name that the server's hosts file associates with the client's primary IP address).
An NFS daemon (nfsd) acts on the server on behalf of the NFS client for FS operations and acquires the necessary locks on the server.
Shutting down the server without first unmounting on the clients can turn client processes into zombies stuck in kernel mode (kill -9 is ineffective). The only way to kill those zombies is to shut down the client.
NFS File System, insight
An NFS file system can be created temporarily (using mount -o ro ..) or permanently (using mknfsmnt)
A FS can be partially exported (by exporting only a subdirectory)
A directory can be exported only once, with a single mode (RO or RW). You cannot export a parent and one of its subdirectories at the same time. Only the first directory found in the export file will be exported.
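The "first entry wins" rule can be expressed as a filter over the entries of the export file. This sketch models only the rule as stated above and does not reproduce the actual exportfs logic; the function name, the `(directory, mode)` tuple format and the paths are hypothetical.

```python
def effective_exports(entries):
    """Keep the first entry of any duplicate or parent/subdirectory pair.

    entries: list of (directory, mode) tuples in export-file order.
    """
    kept = []
    for directory, mode in entries:
        path = directory.rstrip("/") or "/"
        prefix = path.rstrip("/") + "/"
        overlaps = any(
            path == k
            or path.startswith(k.rstrip("/") + "/")  # path is below k
            or k.startswith(prefix)                  # k is below path
            for k, _ in kept
        )
        if not overlaps:
            kept.append((path, mode))
    return kept


entries = [
    ("/data", "rw"),
    ("/data/sub", "ro"),  # subdirectory of an exported parent: ignored
    ("/data", "ro"),      # second export of the same directory: ignored
    ("/other", "ro"),
]
print(effective_exports(entries))
```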
Mountgroup
A mountgroup is a free grouping of unrelated file systems
A MG allows you to mount or unmount this group in one step
In practice, a MG is just a label freely attached to a FS.
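Since a mountgroup is only a label, grouping amounts to collecting the file systems that carry the same label. A minimal sketch with hypothetical mount points and labels:

```python
from collections import defaultdict


def group_by_label(filesystems):
    """Group mount points by mountgroup label; unlabelled FSs are skipped.

    filesystems: list of (mount_point, label) tuples, label may be None.
    """
    groups = defaultdict(list)
    for mount_point, label in filesystems:
        if label:
            groups[label].append(mount_point)
    return dict(groups)


# Hypothetical file systems and labels, for illustration only.
filesystems = [
    ("/app/data", "appmg"),
    ("/app/logs", "appmg"),
    ("/scratch", None),
    ("/backup", "savemg"),
]
print(group_by_label(filesystems))
```

Mounting or unmounting a group in one step then means applying the operation to every mount point listed under that label.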
Notes:
Lexicon
inode:
- file or directory identified by a unique number (inode) inside a file system
LP: Logical Partition
- The LP is the logical abstraction that processes handle when accessing a file system
PP: Physical Partition
- A LP is physically written on the VG as ‘x’ exact copies (PPs), for example 2. ‘x’ defines the mirroring of the logical volume (1, 2 or 3).
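The relation between LPs, PPs and mirroring reduces to one multiplication. A minimal sketch, assuming an LP has the same size as a PP; the function name, the sizes and the parameter names are illustrative.

```python
import math


def partitions_needed(lv_size_mb, pp_size_mb, copies=2):
    """Return (LP count, PP count) for an LV of the given size."""
    if copies not in (1, 2, 3):
        raise ValueError("a LV can have 1, 2 or 3 copies")
    lps = math.ceil(lv_size_mb / pp_size_mb)  # the LV size is counted in LPs
    return lps, lps * copies                  # each LP is written 'copies' times


# A 100 MB LV with 16 MB partitions, mirrored (copies=2).
print(partitions_needed(100, 16, copies=2))
```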
Resources
IBM: LVM, File System ....
- gg24484 Aix Storage Management.pdf
www.storage.ibm.com: SSA / D40
Last Update : $Date: Dec 03 2001 00:18:16 $