While this may sound specific to the technical oil industry, it is not. It is simply a collection of data sources, databases and repositories that hold related data and need to be consolidated into a unified view of all the systems. This is exactly what a Data Warehouse is. The graphical depiction and manipulation of objects makes for a very intuitive user interface.
ExxonMobil had a number of different data sources, databases, proprietary data repositories and flat file systems. Each needed its own application to access and display the data. MobilView became the Data Warehouse, because it could query all the different data sources and allow the user to manipulate the data as if it came from a single, unified data source.
The problem was that no two data sources or repositories were compatible with each other. Some had SQL access. Some had 'C' API access. Some had proprietary programs to access the data. Initially, the systems had essentially nothing in common.
August was brought in to add intermediate layers of access (servers, brokers and agents) that made all the sources appear to be instances of the same repository.
XML Extended
August used an XML-variant to solve the problem. Its implementation is ahead of its time, because it can do things that XML still cannot. Our XML-variant can ask which data sources are defined, which are currently on-line, what types of business area data each provides, and in what format to query the data. Current XML will allow you to transfer data, but not to issue queries or talk to the repository directly. We created a general-purpose, repository-independent query language (similar to SQL) that could also access non-relational data stores. Each data store Access Agent is self-describing, so you can add new and completely unrelated data and it will be immediately available for query and display by the Data Warehouse Client.
Current standard XML technology just presumes the XML document was somehow
requested, and processes it once it arrives. The ExxonMobil
system embeds the XML-variant language and parsers in every phase of the
application, ranging from the user interface (to display and query data)
to the repository management daemons.
The Data Warehouse Client
The Client has a basic Graphical Canvas and takes a completely graphical approach to data presentation and navigation. Around the canvas are buttons to access the various Remote Data Sources. Once a Remote Data Source communicates the types of data it can provide, additional buttons are displayed to allow you to request those types of data.
Architecture Overview
The diagram below describes the basic architecture of the Data Warehouse.
The Data Warehouse Client
The User Client Screen has the basic Graphical Canvas. Around it are buttons to access the various data sources.
Query Data Source Brokers
The client had no built-in knowledge of what data services were available, so redundant Data Source Brokers were created. The Brokers were servers in their own right: the client asked them which data sources were available and operational (reachable), and a Broker returned a list of the Data Source Servers that were on-line.
Data Source Brokers
Periodically each Broker would obtain the status of the various Sources and cache the information for access by multiple Clients.
These Brokers were on different machines for redundancy. This made it easy for the Client to determine what data sources were available. Every time a new Data Source Server was added, it automatically appeared as a button on the Client, because the Broker told the Client of its existence.
Monitor Availability of Data Servers
Because the various Remote Data Servers were on different machines, there were a number of potential failure modes. The repository could be down for maintenance, the computer could be turned off, the network could have a routing problem, etc. Usually a client program will check the availability of a data source itself, but this requires waiting out a long timeout before it decides that no response is forthcoming. Since there were so many Remote Data Sources, all these timeouts would have been unproductive for the user.
The Data Source Brokers periodically talked to each Remote Data Server to obtain its availability and status. This way, when a Client program inquired, the Broker could give the complete world view of the Remote Data Sources immediately. The users loved this.
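The polling-and-caching idea described above can be sketched in a few lines of Python. This is only an illustration, not the code used in the ExxonMobil system: the class name, the probe functions and the source names (`oracle_source`, `ingres_source`, `flat_file_source`) are all hypothetical.

```python
import time
from typing import Callable, Dict, List


class DataSourceBroker:
    """Hypothetical sketch: caches the availability of each Remote Data Server."""

    def __init__(self, probes: Dict[str, Callable[[], bool]]) -> None:
        # probes maps a server name to a function returning True when reachable.
        self._probes = probes
        self._status: Dict[str, bool] = {}
        self._checked_at: Dict[str, float] = {}

    def poll_once(self) -> None:
        """Run every probe. Any slow timeout is paid here, not by the Client."""
        for name, probe in self._probes.items():
            try:
                ok = bool(probe())
            except OSError:  # refused connection, timeout, routing failure, ...
                ok = False
            self._status[name] = ok
            self._checked_at[name] = time.time()

    def online_servers(self) -> List[str]:
        """Answer a Client immediately from the cached status."""
        return sorted(name for name, ok in self._status.items() if ok)


def _unreachable() -> bool:
    # Stand-in for a probe whose network connection fails.
    raise ConnectionRefusedError("server down")


broker = DataSourceBroker({
    "oracle_source": lambda: True,      # reachable
    "ingres_source": lambda: False,     # reports itself down for maintenance
    "flat_file_source": _unreachable,   # network failure
})
broker.poll_once()
print(broker.online_servers())  # ['oracle_source']
```

In a real deployment `poll_once` would run on a timer in the Broker process, so the Client only ever reads the cache.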
Database Access Agents
Each data source had a different way of accessing its data. Some were relational databases, like Oracle and Ingres. Here, SQL statements embedded in the server could access and return the data.
One Data Source came with a proprietary browser, which was useless for distributed access. Fortunately, the product came with an API. So, August created a server process that embedded API calls to access the data.
The schemas, models and file formats were completely different, so each Data Server mapped its source-specific attributes to generic attributes.
Each Remote Data Server was completely different from the others, because the underlying repositories, databases and file systems had nothing in common. It was a unifying XML-Like protocol that made all the Remote Data Servers appear identical in function and accessible through the same access language.
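As a rough illustration of such a mapping, the Python sketch below renames source-specific fields to generic attribute names. The field names, source names and the `to_generic` helper are invented for the example; the actual mappings used in the system are not documented here.

```python
from typing import Any, Dict

# Hypothetical per-source mappings: native field name -> generic attribute name.
ATTRIBUTE_MAPS: Dict[str, Dict[str, str]] = {
    "oracle_source": {"OBJ_NAME": "name", "LAT_DEG": "latitude", "LON_DEG": "longitude"},
    "flat_file_source": {"title": "name", "y": "latitude", "x": "longitude"},
}


def to_generic(source: str, record: Dict[str, Any]) -> Dict[str, Any]:
    """Rename the fields of one native record to generic attribute names.

    Fields without a mapping are dropped, so only generic attributes leave
    the Data Server.
    """
    mapping = ATTRIBUTE_MAPS[source]
    return {mapping[key]: value for key, value in record.items() if key in mapping}


# Two differently shaped records come out with the same generic keys.
print(to_generic("oracle_source", {"OBJ_NAME": "A-1", "LAT_DEG": 29.7, "INTERNAL_KEY": 9}))
print(to_generic("flat_file_source", {"title": "A-1", "y": 29.7}))
```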
The beauty of this approach is that should the company find another data source, regardless of its content, vendor or access techniques, August could create a specialized Remote Data Server that knew how to access its proprietary data, but returned the model and metadata and provided access methods that were uniform within the entire system.
This accounts for the wide acceptance and success of this Data Warehouse.
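One way to picture this uniformity is an adapter-style interface that every agent implements. The sketch below is an assumption-laden illustration, not August's code: the `AccessAgent` interface, both agent classes and the sample data are hypothetical, and SQLite stands in for a relational source such as Oracle or Ingres.

```python
import sqlite3
from abc import ABC, abstractmethod
from typing import Dict, List


class AccessAgent(ABC):
    """Hypothetical uniform interface that every Remote Data Server exposes."""

    @abstractmethod
    def describe(self) -> Dict[str, str]:
        """Return the generic attribute names and their data types."""

    @abstractmethod
    def query(self, attributes: List[str]) -> List[Dict[str, object]]:
        """Return rows keyed by generic attribute names."""


class SqlAgent(AccessAgent):
    """Relational source; an in-memory SQLite table stands in for Oracle/Ingres."""

    def __init__(self) -> None:
        self._db = sqlite3.connect(":memory:")
        self._db.execute("CREATE TABLE objects (obj_name TEXT, depth_m REAL)")
        self._db.execute("INSERT INTO objects VALUES ('A-1', 1200.0)")
        # Generic attribute -> native column. Only these names reach the SQL text.
        self._columns = {"name": "obj_name", "depth": "depth_m"}

    def describe(self) -> Dict[str, str]:
        return {"name": "string", "depth": "float"}

    def query(self, attributes: List[str]) -> List[Dict[str, object]]:
        cols = ", ".join(self._columns[a] for a in attributes)
        rows = self._db.execute(f"SELECT {cols} FROM objects").fetchall()
        return [dict(zip(attributes, row)) for row in rows]


class VendorApiAgent(AccessAgent):
    """Source reachable only through a vendor API (simulated here)."""

    def _vendor_fetch(self) -> List[tuple]:
        return [("B-7", "active")]  # stand-in for proprietary API calls

    def describe(self) -> Dict[str, str]:
        return {"name": "string", "status": "string"}

    def query(self, attributes: List[str]) -> List[Dict[str, object]]:
        positions = {"name": 0, "status": 1}
        return [{a: rec[positions[a]] for a in attributes} for rec in self._vendor_fetch()]


# The caller treats both agents identically.
for agent in (SqlAgent(), VendorApiAgent()):
    print(type(agent).__name__, agent.describe(), agent.query(["name"]))
```

Adding a new kind of source then means writing one more subclass; nothing on the calling side changes.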
XML-Like Protocol
Communication between the Client and Data Source Servers was in a self-describing grammar that is very similar to XML. We could not utilize XML itself because major drafts had not yet been accepted as final specifications. The grammar was tailored to the needs of this specific system and customer. We also added certain XML-Like features (that are still in draft form) that allow you to ask each database what types of data it can provide, and what data and data types you can ask for.
The client could request an inventory of the business areas that a Remote Data Source served. Most of the servers could provide several different business areas of data, not just one.
For each business area, the client then requested the schema or model and its components, complete with descriptions and metadata. This way the user at the Client saw in English what was available, while the program knew the underlying data types and access methods needed to obtain the data components.
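The actual grammar of the XML-Like protocol is not reproduced in this document, so the following Python sketch uses ordinary XML with invented element and attribute names (`inventory`, `business-area`, `attribute`) purely to illustrate the two-step discovery: first the business areas, then the attributes and data types of each.

```python
import xml.etree.ElementTree as ET
from typing import Dict

# Hypothetical reply from one Remote Data Server to an inventory request.
RESPONSE = """
<inventory source="oracle_source">
  <business-area name="area_one" description="First business area">
    <attribute name="name" type="string" description="Object name"/>
    <attribute name="depth" type="float" description="Depth in metres"/>
  </business-area>
  <business-area name="area_two" description="Second business area">
    <attribute name="status" type="string" description="Operating status"/>
  </business-area>
</inventory>
"""


def parse_inventory(text: str) -> Dict[str, Dict[str, str]]:
    """Map each business area to its {attribute name: data type} table."""
    root = ET.fromstring(text.strip())
    return {
        area.get("name"): {
            attr.get("name"): attr.get("type") for attr in area.findall("attribute")
        }
        for area in root.findall("business-area")
    }


inventory = parse_inventory(RESPONSE)
for area, attributes in inventory.items():
    # The descriptions would label buttons for the user; the types drive the program.
    print(area, attributes)
```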
TCP/IP Socket Data Channels
Since the various Remote Data Servers were on different computers spread out over a wide geographic area, TCP/IP sockets were used for the communications. This allowed the servers to be anywhere in the world, yet appear just as available. From the Client's perspective, all the components of the system appeared to be running on the same machine.
Threads: Remote Database Query Agents
Each time the Client wanted to communicate with a Remote Data Source, it created an asynchronous thread. Each thread communicated with the Remote Data Server using the XML-Like protocol, asking what types of data it had. For each data type, the client then asked for a list of the attributes it could request, plus their data types.
So the remote data sources were completely self-describing, much like an Enterprise Java Bean.
This MetaMetadata was returned to the client. Then appropriate buttons were dynamically generated to allow the user to select from what was available from each of the Remote Data Servers.
Since the threads were non-blocking and asynchronous, the various access buttons on the User Client would be grayed out until data was available. This meant that a large number of different queries could be overlapping at the same time, and the client would handle the data as each was received.
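The overlapping-query behaviour can be approximated with a thread pool, as in the Python sketch below. It is a simplified stand-in for the original threading code: `query_source`, the delays and the `enabled_buttons` dictionary are hypothetical, and `time.sleep` simulates network latency.

```python
import time
from concurrent.futures import ThreadPoolExecutor, as_completed
from typing import Dict, List, Tuple


def query_source(name: str, delay: float) -> Tuple[str, List[str]]:
    """Pretend to ask one Remote Data Server which attributes it offers."""
    time.sleep(delay)  # stand-in for network and repository latency
    return name, ["name", "depth"]


# Hypothetical sources with different response times (seconds).
SOURCES: Dict[str, float] = {
    "oracle_source": 0.05,
    "ingres_source": 0.01,
    "flat_file_source": 0.03,
}

# A button stays "grayed out" until its source appears in this dictionary.
enabled_buttons: Dict[str, List[str]] = {}

with ThreadPoolExecutor(max_workers=len(SOURCES)) as pool:
    futures = [pool.submit(query_source, name, delay) for name, delay in SOURCES.items()]
    # Handle each reply as soon as it arrives, in whatever order that is.
    for future in as_completed(futures):
        name, attributes = future.result()
        enabled_buttons[name] = attributes
        print(f"button enabled: {name}")
```

Because the queries overlap, the total wait is roughly that of the slowest source rather than the sum of all of them.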
Objects on the Client Drawing Canvas
As each object was placed on the graphics canvas, it was made "live". You could click on one or more objects, or draw a region to select a number of them. You could then request various attributes from a Pick List.
What is most powerful is that, for a single Object, different attributes could have come from different servers! The Data Warehouse made it appear to the User as if all the data came from the same source. In reality, different Remote Data Servers served different pieces of information about the objects, so those pieces had to be consolidated into one overall view.
But after all, that is what a Data Warehouse
is supposed to do!
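A minimal Python sketch of that consolidation step follows. It assumes every server reports a shared object identifier under an `id` key, which is an assumption made for the example; how the real system matched objects across servers is not described here.

```python
from collections import defaultdict
from typing import Any, Dict, List


def consolidate(responses: Dict[str, List[Dict[str, Any]]]) -> Dict[str, Dict[str, Any]]:
    """Merge the rows from every server into one record per object id."""
    merged: Dict[str, Dict[str, Any]] = defaultdict(dict)
    for rows in responses.values():
        for row in rows:
            object_id = row["id"]  # assumed shared identifier
            for key, value in row.items():
                if key != "id":
                    merged[object_id][key] = value
    return dict(merged)


# Hypothetical replies: one server knows names, another knows depths.
responses = {
    "oracle_source": [{"id": "obj-1", "name": "A-1"}, {"id": "obj-2", "name": "B-7"}],
    "flat_file_source": [{"id": "obj-1", "depth": 1200.0}],
}
view = consolidate(responses)
print(view["obj-1"])  # {'name': 'A-1', 'depth': 1200.0}
```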