Smart data, or intelligent data exploitation, lies at the core of the evolution of our profession.
The exponential growth of data and the multiplication of available information sources make it necessary to rely on artificial intelligence techniques to sift through vast volumes of unstructured and Big Data content, retaining only what is relevant to a given objective.

This involves:

  • improving data quality through automated processing;
  • verifying and cross-checking available information, particularly against open data sources;
  • discovering new relevant insights hidden within data;
  • creating predictive models to better anticipate outcomes or automatically enrich datasets.

Although these technologies are complex, they ultimately make it possible to provide simple tools for end users.
Users can improve their day-to-day efficiency, analyze situations, and broaden their perspective through interactive data exploration, relevant information retrieval, automated alerts, concise analytical summaries and visualizations, indicator updates, and predictions.

At OctopusMind, this analytical approach serves both business opportunity detection and economic environment analysis, notably through J360.
Augmented analytics relies on Machine Learning and Natural Language Processing (NLP) technologies to automate data preparation, insight discovery, and insight sharing. Its main benefit is the significant amount of time it saves analysts.

Analyses that traditionally require substantial time and resources can be greatly simplified and accelerated thanks to these technologies.

Deep Learning enables the construction of semantic models by projecting textual information combined with heterogeneous data into a shared semantic space.
Data and relationship extraction techniques allow access to millions of insights buried in text, making it possible to compute indicators, structure information, report findings, and generate predictions.
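
To give a concrete, simplified picture of what projecting texts into a shared semantic space means, here is a minimal sketch in Python. The sentence-transformers library, the model name, and the example texts are illustrative placeholders, not a description of our production models.

    # Minimal sketch: project short texts into a shared vector space and
    # compare them by cosine similarity. The library and model are placeholder
    # choices for illustration, not our production stack.
    import numpy as np
    from sentence_transformers import SentenceTransformer

    model = SentenceTransformer("all-MiniLM-L6-v2")  # assumed off-the-shelf model

    texts = [
        "Construction of a new wastewater treatment plant",
        "Extension of the municipal sewage network",
        "Road resurfacing and asphalt maintenance works",
    ]
    vectors = model.encode(texts)  # one vector per text, all in the same space

    def cosine(a, b):
        return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

    # Related topics land close together in the shared semantic space.
    print(cosine(vectors[0], vectors[1]))  # two sewage-related notices: higher
    print(cosine(vectors[0], vectors[2]))  # sewage vs. road works: lower

The same principle extends to heterogeneous data: once everything is represented as vectors in a common space, similarity, clustering, and retrieval all become geometric operations.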

Our Use of Artificial Intelligence

Let’s take a quick tour of the technologies inside our “black box”.

Our raw material is data. It is collected by automated agents (web scraping), downloaded from open data sources, queried through the semantic web or reference corpora, or obtained via participatory production (crowdsourcing).
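
As an illustration of the collection step only, a very small scraping agent might look like the sketch below. The URL and CSS selector are hypothetical placeholders and do not correspond to any real source we harvest.

    # Minimal scraping sketch. The URL and selector are hypothetical
    # placeholders, not real sources or markup.
    import requests
    from bs4 import BeautifulSoup

    URL = "https://example.org/public-notices"  # placeholder source

    response = requests.get(URL, timeout=10)
    response.raise_for_status()

    soup = BeautifulSoup(response.text, "html.parser")
    # Keep only the fields needed for later enrichment and deduplication.
    notices = [
        {"title": link.get_text(strip=True), "url": link.get("href")}
        for link in soup.select("a.notice-title")  # placeholder selector
    ]
    print(f"Collected {len(notices)} notices")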

We use Elasticsearch for search and analytics, along with our own intelligent data analysis tools based on Machine Learning and Natural Language Processing (NLP).
For those familiar with the field, here are some of the techniques we rely on (two of them are sketched just after this list):

  • principal component analysis (PCA)
  • automatic clustering
  • decision tree ensembles (random forests)
  • conditional random fields (CRF)
  • neural networks in various forms: multi-layer perceptrons (MLP), convolutional networks, autoencoders, recurrent networks…
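
To make two of the techniques listed above concrete, here is a small, self-contained sketch combining principal component analysis and automatic clustering with scikit-learn on toy TF-IDF vectors. The documents and parameters are invented for the example; this is not our actual pipeline.

    # Toy illustration of PCA + automatic clustering on TF-IDF text vectors.
    # Documents and parameters are invented for the example.
    from sklearn.feature_extraction.text import TfidfVectorizer
    from sklearn.decomposition import PCA
    from sklearn.cluster import KMeans

    documents = [
        "supply of office furniture and chairs",
        "office desks and storage cabinets",
        "road resurfacing and asphalt works",
        "maintenance of municipal roads and pavements",
    ]

    # Vectorize the texts, reduce the dimensionality, then cluster.
    tfidf = TfidfVectorizer().fit_transform(documents).toarray()
    reduced = PCA(n_components=2).fit_transform(tfidf)
    labels = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(reduced)

    print(labels)  # e.g. [0 0 1 1]: furniture documents vs. road-works documents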

This comprehensive and constantly evolving toolbox opens up numerous possibilities for both structured and unstructured datasets:

  • automatic data association (similarity, recommendation)
  • structured information extraction from text (locations, quantitative data, categorical attributes, noise reduction; see the sketch after this list)
  • automatic data categorization along multiple dimensions
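
As a sketch of the second item above (structured information extraction from text), named-entity extraction with an off-the-shelf spaCy pipeline could look like this; the pipeline name and the example sentence are assumptions, not a description of our production extractors.

    # Minimal sketch of extracting structured elements (places, quantities,
    # amounts) from free text with spaCy. Pipeline and text are illustrative.
    import spacy

    nlp = spacy.load("en_core_web_sm")  # assumed off-the-shelf pipeline

    text = "Supply of 120 laptops for the city of Nantes, budget of 250,000 euros."
    doc = nlp(text)

    # Entities come back typed (GPE for places, MONEY, CARDINAL, ...), ready to
    # be mapped onto structured fields of a record.
    for ent in doc.ents:
        print(ent.label_, "->", ent.text)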

All these tools, combined with our expertise, allow us to deliver a service that enhances our users’ competitiveness.