L&A Analytics Technology
We use Hadoop for data infrastructure and analytics, which allows us to create analytical services and products for our clients. Hadoop’s core advantages over traditional database and analytical technologies are that it is:
- Highly scalable: Hadoop can process and distribute very large data sets across hundreds of inexpensive servers operating in parallel.
- Flexible: Hadoop enables us to easily access new data sources and tap into different types of data (both structured and unstructured) using analytics to generate value.
- Fast: Hadoop’s unique storage method is based on a distributed file system that ‘maps’ data wherever it is located on a cluster. The tools for data processing are often on the same servers where the data is located, resulting in much faster data processing. When dealing with large volumes of unstructured data, Hadoop’s data processing is highly efficient.
- Resilient to failure: A key advantage of using Hadoop is its high level of fault tolerance. When data is sent to an individual node, that data is also replicated to other nodes in the cluster, which means that in the event of failure, there are multiple copies available for use.
- Cost effective: Hadoop offers a cost effective storage solution for our clients’ growing data sets.
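The scalability, data-locality, and replication points above can be illustrated with a minimal sketch. This is a hypothetical simulation in plain Python, not part of the actual Hadoop codebase: it mimics how HDFS replicates each data block onto several nodes and how a map/reduce job runs the map step against local blocks before a reduce step combines the partial results. The block contents, node names, and replication factor are illustrative.

```python
# Hypothetical sketch: simulating Hadoop-style block replication and a
# split / map / reduce word count in plain Python.
from collections import Counter, defaultdict

REPLICATION_FACTOR = 3  # HDFS replicates each block to 3 nodes by default


def place_blocks(blocks, nodes):
    """Replicate each block onto REPLICATION_FACTOR distinct nodes,
    so the loss of any single node leaves copies elsewhere."""
    placement = defaultdict(list)
    for i, block in enumerate(blocks):
        for r in range(REPLICATION_FACTOR):
            node = nodes[(i + r) % len(nodes)]
            placement[node].append(block)
    return placement


def map_phase(block):
    """Map step: in Hadoop this runs on the node holding the block
    (data locality), avoiding network transfer of the raw data."""
    return Counter(block.split())


def reduce_phase(partials):
    """Reduce step: merge the per-block partial counts."""
    total = Counter()
    for p in partials:
        total.update(p)
    return total


blocks = ["asset online", "asset offline", "asset online online"]
nodes = ["node-a", "node-b", "node-c", "node-d"]

placement = place_blocks(blocks, nodes)
word_counts = reduce_phase(map_phase(b) for b in blocks)
print(word_counts["online"])  # 3
```

Because each map task only touches its own block, adding more blocks and more nodes scales the work out horizontally, which is the "commodity servers operating in parallel" property described above.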
Our Hadoop stack uses a variety of components, each offering different capabilities to configure analysis services to fit client needs.
L&A Hadoop data stack
L&A Hadoop data stack: Benefits
Five core capabilities of the iR3 data stack are that it is now:
1: Easier to add client data from a variety of sources: Previously we used OLAP on premise for analytics/warehousing, which made it difficult to add new customers due to on premise deployments and expensive Extract, Transform & Load (ETL) activities (e.g. data cleansing, schema design). Hadoop technology allows for multi-tenant cloud/on premise services, with streaming analytics offered to multiple clients simultaneously.
2: Scalable: The system is designed to spread resources across multiple “commodity” servers, which can be added as required without further changes to the software infrastructure. This shift to supporting scale-out as well as scale-up is important for organisations of all sizes.
3: Tooled for agile data modelling: Database technology chosen to minimise effort spent designing schemas (which change frequently). Previously, each schema change required new ETL activities and changes throughout the system to surface data. A schema-less, design-for-consumption data modelling approach means that we can be more agile. While we have created a standard data model that covers the major facets of asset tracking and analysis, we can also support the idiosyncrasies of new customers with minimal fuss.
4: Providing up-to-the-moment results: Older data warehousing approaches struggled to maintain up-to-the-minute results for a number of reasons. The technology came from a time when batch, nightly updates were the expectation and the norm. In short, because processing now happens in-memory and in a distributed fashion, expensive disk reads/writes are avoided and “continuous intelligence” is possible.
5: Enabling predictive real-time analytics: Once up-to-the-moment results are possible, our expectations about the types of intelligence we can expect are reset. Our algorithms are now geared towards continuous signal processing to allow for proactive action.
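The schema-less modelling and continuous-intelligence points above can be sketched together. This is a hypothetical illustration, not the actual iR3 implementation: events arrive as plain dictionaries, so a new client field needs no schema migration, and per-asset aggregates update incrementally as each event streams in rather than waiting for a nightly batch. The field names (`asset_id`, `temp_c`, `client_x_custom`) are invented for the example.

```python
# Hypothetical sketch: schema-less events with incremental (streaming)
# aggregation, in the spirit of "design-for-consumption" data modelling.
from collections import defaultdict


class StreamingStats:
    """Maintains running per-asset event counts with no fixed schema."""

    def __init__(self):
        self.event_counts = defaultdict(int)

    def ingest(self, event):
        # Only 'asset_id' is required; any extra fields a client sends
        # are simply carried along, with no migration or ETL rework.
        self.event_counts[event["asset_id"]] += 1


stats = StreamingStats()
events = [
    {"asset_id": "pump-1", "temp_c": 41.2},           # standard field
    {"asset_id": "pump-1", "client_x_custom": "ok"},  # client-specific field
    {"asset_id": "pump-2"},
]
for e in events:
    stats.ingest(e)  # aggregates are current after every event
print(stats.event_counts["pump-1"])  # 2
```

Because the aggregate is updated on every event rather than recomputed in a batch, downstream algorithms can act on the freshest signal, which is the precondition for the predictive, proactive analytics described in point 5.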
Analytics work packages: Predictive, Preventative, Scheduling
L&A have strong working relationships with members of the Complexity and Network Group at Imperial College London and the Director of the Centre for Complexity Sciences at Warwick University, with whom we collaborate on big data analytics queries and R&D projects.