
MobilView Data Warehouse

and Active Dictionary Using XML Variant

Created by August for ExxonMobil

The client is a GIS (Geographic Information System). This Graphical User Interface displays live maps of the world. Its purpose is to obtain data from the various data sources and add each layer of data to the GIS map. For instance, you could plot the outline of a foreign country, display only the Wells that had certain characteristics, and add other objects of interest. You could then click those objects to get their properties and, if they were graphical in nature, display those graphics. A typical graphic is a seismic trace.

While this may sound somewhat specific to the technical oil industry, it is not. It is merely a collection of data sources, databases and repositories that hold somewhat related data and need to be consolidated into a unified view of all the systems. This is exactly what a Data Warehouse is. The graphical depiction and manipulation of objects simply provides a very intuitive user interface.

ExxonMobil had a number of different data sources, databases, proprietary data repositories and flat-file systems. Each needed its own application to access and display the data. MobilView became the Data Warehouse, because it could query all the different data sources and allow the user to manipulate the data as if it came from a single, unified data source.

One problem was that no two data sources or repositories were compatible with each other. Some had SQL access. Some had 'C' API access. Some had proprietary programs to access the data. Initially, the parts of the system had essentially nothing in common.

August was brought in to build intermediate layers of access (servers, brokers and agents) that make all the sources appear to be instances of the same repository.

XML Extended

August utilized an XML-variant to solve the problem. Its implementation is ahead of its time, because it can do things that XML still cannot. Our XML-variant can ask which data sources are defined, which are currently on-line, what types of business area data each provides, and in what format to query the data. Current XML will allow you to transfer data, but will not allow you to issue queries or talk to the repository directly. We created a general purpose, repository independent query language (similar to SQL) that could also access non-relational data stores. Each data store Access Agent is self-describing, so you can add new and completely unrelated data, and it will be immediately available for query and display by the Data Warehouse Client.

Current standard XML technology just presumes the XML document was somehow requested, and processes it once it arrives.   The ExxonMobil system embeds the XML-variant language and parsers in every phase of the application, ranging from the user interface (to display and query data) to the repository management daemons.
 

The Data Warehouse Client

The Client has a basic Graphical Canvas and uses a completely graphical approach to data presentation and navigation. Around the canvas are buttons to access the various Remote Data Sources. Once a Remote Data Source communicates the types of data it can provide, additional buttons are displayed to allow you to request those types of data.

Architecture Overview

The diagram below describes the basic architecture of the Data Warehouse.
[Diagram: Data Warehouse architecture, with parts labeled A through G, including the Brokers and the socket connections. Each lettered part is described in the sections that follow.]
 


 



 

Query Data Source Brokers

The client had no built-in knowledge of what data services were available, so redundant Data Source Brokers were created. The Brokers were servers in their own right; the client asked them which data sources were available and operational (reachable). A Broker then returned a list of the Data Source Servers that were on-line.
 


 

Data Source Brokers

Periodically each Broker would obtain the status of the various Sources and cache the information for access by multiple Clients.

These Brokers were on different machines for redundancy.  This made it easy for the Client to determine what data sources were available.  Every time a new Data Source Server was added, it automatically appeared as a button on the Client, because the Broker told the Client of its existence.


 

Monitor Availability of Data Servers

Because the various Remote Data Servers were on different machines, there were a number of potential failure modes. The repository could be down for maintenance, the computer could be turned off, the network could have a routing problem, etc. Usually a client program checks the availability of a data source itself, but this requires waiting out a long timeout period before it decides that no response is forthcoming. Since there were so many Remote Data Sources, all these timeouts would be non-productive for the user.

The Data Source Brokers periodically talked to each Remote Data Server to obtain its availability and status. This way, when a Client program inquired, the Broker could give the complete world view of the Remote Data Sources immediately. The users loved this.
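The caching idea described above can be sketched in a few lines of Python. This is only an illustration, not the original implementation: the `StatusBroker` class, the probe functions and the server names are all hypothetical, and a real Broker would call `refresh` on a timer and probe each server over the network.

```python
import threading
from typing import Callable, Dict


class StatusBroker:
    """Caches the availability of each data server so clients never wait on a timeout."""

    def __init__(self, probes: Dict[str, Callable[[], bool]]):
        # probes maps a (hypothetical) server name to a function that
        # returns True when that server is reachable.
        self._probes = probes
        self._status: Dict[str, bool] = {name: False for name in probes}
        self._lock = threading.Lock()

    def refresh(self) -> None:
        """Probe every server once; only the Broker pays for slow or failed probes."""
        for name, probe in self._probes.items():
            try:
                up = bool(probe())
            except Exception:
                up = False  # any probe error is treated as "unavailable"
            with self._lock:
                self._status[name] = up

    def snapshot(self) -> Dict[str, bool]:
        """Answer a client immediately from the cache, without contacting any server."""
        with self._lock:
            return dict(self._status)


def failing_probe() -> bool:
    # Stand-in for a server that does not answer.
    raise TimeoutError("no response")


broker = StatusBroker({"wells_db": lambda: True, "seismic_store": failing_probe})
broker.refresh()
print(broker.snapshot())  # {'wells_db': True, 'seismic_store': False}
```

The point of the design is that the cost of a timeout is paid once, in the background, by the Broker rather than by every Client on every inquiry.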
 


 

Database Access Agents

Each data source had a different way of accessing the data. Some were relational databases, like Oracle and Ingres. Here, SQL statements embedded in the server could access and return the data.

One Data Source came with a proprietary browser, which was useless for distributed access.  Fortunately, the product came with an API.  So, August created a server process that embedded API calls to access the data.

The schemas, models and file formats were completely different, so the Data Servers mapped source-specific attributes to generic attributes.
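As a rough illustration of such a mapping, the sketch below renames source-specific fields to generic ones. The column names, the generic attribute names and the `to_generic` helper are invented for this example; the actual schemas are not described here.

```python
from typing import Any, Dict

# Hypothetical per-source mappings: source-specific field name -> generic attribute name.
ORACLE_WELL_MAP = {"WELL_NM": "name", "LAT_DD": "latitude", "LON_DD": "longitude"}
FLATFILE_WELL_MAP = {"wellname": "name", "y": "latitude", "x": "longitude"}


def to_generic(record: Dict[str, Any], mapping: Dict[str, str]) -> Dict[str, Any]:
    """Rename mapped fields to their generic names; unmapped fields are dropped."""
    return {
        generic: record[specific]
        for specific, generic in mapping.items()
        if specific in record
    }


oracle_row = {"WELL_NM": "A-1", "LAT_DD": 29.7, "LON_DD": -95.4, "ROWID": 17}
flat_row = {"wellname": "B-2", "x": 3.1, "y": 57.0}

print(to_generic(oracle_row, ORACLE_WELL_MAP))
# {'name': 'A-1', 'latitude': 29.7, 'longitude': -95.4}
print(to_generic(flat_row, FLATFILE_WELL_MAP))
# {'name': 'B-2', 'latitude': 57.0, 'longitude': 3.1}
```

Once every server returns records in the generic form, the Client no longer needs to know which kind of repository a record came from.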

Each Remote Data Server was completely different from the others, because the underlying repositories, databases and file systems had nothing in common. It was the unifying XML-Like protocol that made all the Remote Data Servers appear to be identical in function and to use the same access language.

The beauty of this approach is that should the company find another data source, regardless of its content, vendor or access techniques, August could create a specialized Remote Data Server that knew how to access its proprietary data, but returned the model and metadata and provided access methods that were uniform within the entire system. This accounts for the wide acceptance and success of this Data Warehouse.
 


 

XML-Like Protocol

Communication between the Client and Data Source Servers was in a self-describing grammar that is very similar to XML. We could not utilize XML because major drafts had not yet been accepted as final specifications. The grammar was tailored to the needs of this specific system and customer. We also added certain XML-Like features (that are still in draft form) that allow you to ask each database what types of data it can provide, and what data and data types you can ask for.

The client could request an inventory of the business areas that a Remote Data Source served. Most of the servers could provide several different business areas of data, not just one.

Then, for each business area, the client requested the schema or model and a list of its components, complete with descriptions and metadata. This way the user at the Client saw in English what was available, but the program knew the underlying data types and access methods needed to obtain the data components.
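The actual grammar is not reproduced here, so the following Python sketch uses standard XML as a stand-in to show the general shape of such an exchange. Every element and attribute name (`source`, `area`, `attribute`, `query`) and both helper functions are assumptions made for illustration only.

```python
import xml.etree.ElementTree as ET

# Hypothetical reply a Remote Data Server might send when asked to describe itself.
DESCRIBE_REPLY = """
<source name="wells_db">
  <area name="wells" description="Drilled wells">
    <attribute name="name" type="string" description="Well name"/>
    <attribute name="depth" type="float" description="Total depth"/>
  </area>
  <area name="leases" description="Lease boundaries">
    <attribute name="outline" type="polygon" description="Lease outline"/>
  </area>
</source>
""".strip()


def parse_description(xml_text: str) -> dict:
    """Return {business area: {attribute name: data type}} from a describe reply."""
    root = ET.fromstring(xml_text)
    return {
        area.get("name"): {a.get("name"): a.get("type") for a in area.findall("attribute")}
        for area in root.findall("area")
    }


def build_query(area: str, attributes: list) -> str:
    """Build a (hypothetical) query message asking for some attributes of one area."""
    query = ET.Element("query", {"area": area})
    for attr in attributes:
        ET.SubElement(query, "attribute", {"name": attr})
    return ET.tostring(query, encoding="unicode")


catalog = parse_description(DESCRIBE_REPLY)
print(sorted(catalog))            # ['leases', 'wells']
print(catalog["wells"])           # {'name': 'string', 'depth': 'float'}
print(build_query("wells", ["name", "depth"]))
```

The descriptions would feed the labels the user sees, while the types tell the program how to handle each attribute when the data comes back.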


 

TCP/IP Socket Data Channels

Since the various Remote Data Servers were on different computers spread out over a wide geographic area, TCP/IP sockets were used for the communications. This allowed the servers to be anywhere in the world, yet appear just as available. From the Client's perspective, all the components of the system appeared to be running on the same machine.


 

Threads: Remote Database Query Agents

Each time the Client wanted to communicate with a Remote Data Source, it created an asynchronous thread. Each thread communicated with the Remote Data Server using the XML-Like protocol, asking what types of data it had. For each data type, the client then asked for a list of the attributes it could request, plus their data types.

So the remote data sources were completely self-describing, much like an Enterprise Java Bean.

This MetaMetadata was returned to the client. Then appropriate buttons were dynamically generated to allow the user to select from what was available from each of the Remote Data Servers.

Since the threads were non-blocking and asynchronous, the various access buttons on the User Client would be grayed out until data was available. This meant that a large number of different queries could overlap, and the client would handle the data as each reply was received.
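The overlapping-query behavior can be illustrated with Python's standard thread pool. This is a sketch under stated assumptions, not the original code: `query_source` simulates a network round trip with a sleep, and the source names and delays are made up.

```python
import time
from concurrent.futures import ThreadPoolExecutor, as_completed


def query_source(name: str, delay_s: float) -> dict:
    """Stand-in for a round trip to one Remote Data Server."""
    time.sleep(delay_s)  # simulated latency; a real agent would talk over a socket
    return {"source": name, "areas": [f"{name}_area"]}


def query_all(sources: dict) -> list:
    """Query every source in its own thread and handle replies as they arrive."""
    arrived = []
    with ThreadPoolExecutor(max_workers=len(sources)) as pool:
        futures = [pool.submit(query_source, name, delay) for name, delay in sources.items()]
        for future in as_completed(futures):
            reply = future.result()
            # This is where a client would enable the button for that source.
            arrived.append(reply["source"])
    return arrived


order = query_all({"slow_archive": 0.3, "fast_db": 0.05})
print(order)
```

Because replies are processed in completion order rather than submission order, a slow source does not hold up the results from a fast one.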

Objects on the Client Drawing Canvas

As each object was placed on the graphics canvas, it was made "live". You could click on one or more objects, or draw a region to select a number of them. You could then request various attributes from a Pick List.

What is most powerful is that, for a single Object, different attributes could have come from different servers! To the User, the Data Warehouse made it appear as if all the data came from the same source. In fact, different Remote Data Servers served different pieces of information about each object, and the Data Warehouse consolidated them into a single overall view.

But after all, that is what a Data Warehouse is supposed to do!
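The consolidation of attributes from several servers into one object view might look like the sketch below. It assumes that every server identifies an object by a shared identifier, which is not stated above; the identifiers, attribute names and the `consolidate` function are hypothetical.

```python
from collections import defaultdict
from typing import Any, Dict, Iterable, Tuple

# Each reply is (object id, attributes served by one Remote Data Server).
Reply = Tuple[str, Dict[str, Any]]


def consolidate(replies: Iterable[Reply]) -> Dict[str, Dict[str, Any]]:
    """Merge per-server attribute sets into one record per object id."""
    merged: Dict[str, Dict[str, Any]] = defaultdict(dict)
    for object_id, attributes in replies:
        merged[object_id].update(attributes)  # later replies win on a name clash
    return dict(merged)


replies = [
    ("well-17", {"name": "A-1", "latitude": 29.7}),  # from one server
    ("well-17", {"operator": "ExampleCo"}),          # from another server
    ("well-42", {"name": "B-2"}),
]
view = consolidate(replies)
print(view["well-17"])
# {'name': 'A-1', 'latitude': 29.7, 'operator': 'ExampleCo'}
```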