Background
##################
First of all, go look at the diagram (comms.gif), read this, then 
look at the diagram again.

Next, some terminology...  
Here I will use CRM to refer to the light blue section.  That is, 
the entire collection of processes/daemons/modules on a node that, 
as a whole, manage resources in the cluster.  CRMd refers to one 
of the dark blue boxes.  It is the "master subsystem" if you like.  
Its role is to co-ordinate the actions of all the other pieces of 
the puzzle, including those on other nodes.
 
Key points from the diagram:
- All communications with the CRM are done with Heartbeat messages 
  routed through the CRMd.  These messages contain a text 
  representation of an XML document, the schema of which is outlined 
  at the end of this document.
- All communications internal to the CRM (ie. between its subsystems) 
  is performed with IPC messages.  Again all messages are routed via 
  the CRMd and contain the same XML documents as Heartbeat messages.
- All admin clients (eventually) end up sending Heartbeat messages 
  and are thus subject to existing HA client security is available.
- The RPC layer allows the cluster to be controled from non-member 
  hosts (subject to RPC security which is available for free).
- The option of syncronous or asynchronous RPC calls will be provided. 
  This will probably be in the form of a flag sent as part of the 
  function call.

Advantages:
- The only source of "requests" is the CRMd which means it *never* has 
  to forward on "request" messages for any of it sub-systems.  This is 
  useful for the security of the system (see security.txt).
- Potentially, most CRM<-->CRM communications can be replaced with RPC 
  calls.
- We are able to re-use existing security mechanisms (IPC, HA, RPC, 
  unix_auth via RPC) to protect the system.

Message scenarios:
##################
There are really only 3 messaging scenarios in this system (exluding 
broadcast vs. point-to-point).  Again this is nice as it keeps down the 
number of "special cases".

1) Sub-system <--(IPC)--> CRM <--(IPC)--> Sub-system 
2) Sub-system <--(IPC)--> Local CRM <--(Heartbeat)--> Remote CRM <--(IPC)--> Sub-system 
3) Admin Client <--(Heartbeat Broadcast)--> Remote CRM <--(IPC)--> Sub-system 

Message examples:
##################
1.1) the DC telling the local LRM to start a resource
1.2) the LRM asking the CIB about a resource

2.1) the DC telling a remote LRM to start a resource
2.2) the DC asking (all) the CIB(s) to provide their view of the world

3.1) an admin request to add/remove/modify a resource
3.2) an admin request to force a failover of a resource or a
recomputation of the resource dependencies.

Message Notes:
##################
Messages may be sent to the CRM from local sub-systems via IPC or from
other HA clients via Heartbeat.  It is then the responsibility of the
CRM to unpack the message and pass it on to the correct sub-system.  If
the destination sub-system is the DC and the DC is not running on the
current node, the message is discarded without error.

Where the DC receives a message from another node, it will also keep
track of the sending host and the reference number so that it can direct
the replies appropriately.  The exception to this is where the message
is from the DC.  

Messages to the DC are *always* sent as broadcast messages and the DC
*must always* acknowledge the message with either the results of the
message or a "thankyou" message.  The reason for this is that the DC may
change or a DC election may be in progress.  The implication of this is
that the sender should always set a timer and resend dc_messages if they
have not been acknowledged.  The DC will be able to detect duplicates by
examining the destination sub-system and the reference number and we
will rely on HA to ensure the delivery of DC responses.

All messages are full crm_messages.  I toyed with only sending the *_request
or *_response piece of the message to and/or from the relevant sub-systems, 
but it just got messy.  This way, the routing role of the CRM is much easier.
And easier equals lower complexity, which means less bugs, which is good for
everyone.

Schema Notes:
###################

Key Attributes
===============

reference:	provides the ability to track which request a responce
		is in relation to and where the local CRM should send it.
*_filter:	allow the operation to be limited to a particular type,
		id and/or priority
timeout:	allows the receiver to know how long the sender is
		expecting the task to take so we can act and report back 
		accordingly.


Attribute values
=================
Where the list ends with |... , the complete list of possibilities will be 
fleshed out at a later date.

Message Schema:
###################

<!ELEMENT crm_message (options, data?)>
<!ATTLIST crm_message
          version          #CDATA                       '1'
	  message_type	   (none|request|response)	'none'
          sys_from         (none|crmd|cib|lrm|admin)	'none'
          sys_to           (none|crmd|cib|lrm|admin)	'none'
	  host_from	   #CDATA
	  host_to	   #CDATA
	  reference	   #CDATA
          timestamp        #CDATA                       '0'>


<!ELEMENT options>
<!ATTLIST options
	  operation		#CDATA
	  result?		(ok|failed|...)		'ok'
          verbose?		(true|false)		'false'
          timeout?		#CDATA			'0'
	  filter_priority?	#CDATA	<!-- might be useful later -->
          filter_type?		#CDATA
	  filter_id?		#CDATA>

<!-- data is one of ping_item, cib_fragment, lrm_status -->

<!ELEMENT ping_item>
<!ATTLIST ping_item
	  crm_subsystem    (none|crmd|dc|cib|lrm)		     'none'
	  ping_status	   (error|timeout|stopped|running|sick)	     'timeout'>

<!ELEMENT lrm_status (resource_info)*>
<!ATTLIST lrm_status>

<!ELEMENT resource_info>
<!ATTLIST resource_info 
	  res_id		#CDATA
          last_op		(noop|start|stop|restart)	     'noop'
          last_op_result        (fail|pass|unknown|...)		     'unknown'
	  status		#CDATA>


<!-- always describe which part of the cib are being returned -->
<!ELEMENT cib_fragment (cib, obj_failed?)>
<!ATTLIST cib_fragment
          cib_section  (none|all|nodes|resources|constraints|status) 'none'>

<!ELEMENT obj_failed (failed_update)*>
<!ATTLIST obj_failed>

<!ELEMENT failed_update>
<!ATTLIST failed_update 
	  id			#CDATA
          object_type		(none|node|resource|constraint|state) 'none'
          operation		(none|add|update|delete|replace)      'none'
	  reason?		(unknown|...)			      'unknown'>

