Our black box

Augmented data analysis

Smart data is at the core of how our expertise is evolving. The exponential growth of data and the multiplicity of available information sources require artificial intelligence technologies to sort through massive amounts of unstructured data and to reveal and produce only useful information.

The challenge is to:

  • improve data quality by automatic processing
  • verify and cross-check available information by comparing it with open data
  • discover new relevant information within the data
  • create predictive models to better anticipate or automatically enrich the data

These technologies give end users simple tools with which they can work more efficiently day to day, analyze a situation, or broaden their field of view: interactive data exploration, search for relevant information, automatic alerts, summary analyses and visualizations, real-time indicators, and predictions.

At OctopusMind, this analysis is used to detect business opportunities and to explore the business environment with J360. It can also serve communities and citizens, with CityZenMap. Augmented analytics is based on machine learning and natural language processing technologies. Its main advantage is that it saves data analysts a lot of time and makes them more efficient (see the summary of the Gartner report "Augmented Analytics Is the Future of Data and Analytics", published on July 27, 2017).

Analyses, which usually consume a lot of time and resources, can be greatly simplified and accelerated with this technology.

Our use of artificial intelligence

Here is a quick review of the technologies inside our "black box":

Our raw material is data. It is extracted by robots (web scraping), downloaded from open data sources, queried from the semantic web or language corpora, or obtained by crowdsourcing.

We use Elasticsearch to search and analyze our data. We have developed our own machine learning and natural language processing (NLP) tools. For specialists, here are some of our secrets:

  • Principal Component Analysis (PCA)
  • Data clustering
  • Random forests
  • Conditional random fields (CRF) and, especially, neural networks in many forms (multilayer perceptrons (MLP), convolutional neural networks (CNN), autoencoders, recurrent neural networks…)
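
To make two of the techniques listed above more concrete, here is a minimal sketch of PCA followed by data clustering. It is an illustration only, not OctopusMind's actual tooling: it assumes NumPy is available, uses synthetic data, and the function names (`pca`, `nearest_center`, `kmeans`) are hypothetical. K-means is used here as one possible clustering method; the page does not say which one is used in practice.

```python
import numpy as np


def pca(X, n_components):
    """Project the rows of X onto the top principal components (via SVD)."""
    X_centered = X - X.mean(axis=0)
    # Rows of Vt are the principal directions, ordered by decreasing variance.
    _, S, Vt = np.linalg.svd(X_centered, full_matrices=False)
    explained = (S ** 2) / (S ** 2).sum()
    return X_centered @ Vt[:n_components].T, explained[:n_components]


def nearest_center(X, centers):
    """Return, for each row of X, the index of the closest center."""
    distances = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2)
    return distances.argmin(axis=1)


def kmeans(X, k, n_iter=20, seed=0):
    """Plain k-means: alternate between assignment and center update."""
    rng = np.random.default_rng(seed)
    # Start from k distinct data points chosen at random.
    centers = X[rng.choice(len(X), size=k, replace=False)].copy()
    for _ in range(n_iter):
        labels = nearest_center(X, centers)
        for j in range(k):
            members = X[labels == j]
            if len(members) > 0:  # keep the old center if a cluster is empty
                centers[j] = members.mean(axis=0)
    return nearest_center(X, centers), centers


# Synthetic data (an assumption for the example): two groups of 20 points in 5D.
rng = np.random.default_rng(42)
X = np.vstack([
    rng.normal(0.0, 0.5, size=(20, 5)),
    rng.normal(5.0, 0.5, size=(20, 5)),
])

X_2d, explained = pca(X, n_components=2)  # reduce 5 dimensions to 2
labels, centers = kmeans(X_2d, k=2)       # group the reduced points
print(X_2d.shape, labels)
```

Reducing the dimension first is a common design choice: it removes noise and makes the clusters easier to visualize, at the cost of discarding the variance carried by the dropped components.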

This comprehensive, constantly evolving toolbox opens up many possibilities for a dataset, structured or not:

  • automatic data association (similarity, recommendation)
  • extraction of textual information in structured form (location, quantitative data, categorization, noise suppression)
  • automatic categorization of data, along multiple axes
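
As an illustration of the first item above (automatic data association by similarity), the following sketch ranks short texts by TF-IDF cosine similarity using only the Python standard library. The sample documents, the function names, and the choice of TF-IDF weighting are assumptions made for the example; the page does not describe how similarity is actually computed.

```python
import math
from collections import Counter


def tokenize(text):
    """Lowercase the text and keep purely alphabetic words."""
    return [w for w in text.lower().split() if w.isalpha()]


def tfidf_vectors(docs):
    """Return one sparse TF-IDF vector (a dict term -> weight) per document."""
    tokenized = [tokenize(d) for d in docs]
    n_docs = len(tokenized)
    # Document frequency: in how many documents each term appears.
    df = Counter(term for tokens in tokenized for term in set(tokens))
    vectors = []
    for tokens in tokenized:
        tf = Counter(tokens)
        vectors.append({
            # Smoothed IDF, so a term present in every document keeps a small weight.
            term: (count / len(tokens)) * (math.log((1 + n_docs) / (1 + df[term])) + 1)
            for term, count in tf.items()
        })
    return vectors


def cosine(u, v):
    """Cosine similarity between two sparse vectors."""
    dot = sum(weight * v.get(term, 0.0) for term, weight in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


def most_similar(query_index, vectors):
    """Index of the document closest to vectors[query_index], excluding itself."""
    scores = [
        (cosine(vectors[query_index], v), i)
        for i, v in enumerate(vectors)
        if i != query_index
    ]
    return max(scores)[1]


# Hypothetical sample texts, invented for this example.
docs = [
    "road resurfacing works for the municipal road network",
    "supply of office furniture for the town hall",
    "maintenance and resurfacing of the departmental road network",
    "catering services for school canteens",
]
vectors = tfidf_vectors(docs)
print(most_similar(0, vectors))  # document 2 shares the most terms with document 0
```

The same similarity scores can serve as a basis for recommendation: instead of keeping only the best match, keep the top few documents above a chosen threshold.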

All these tools, combined with the computing power of today's servers and our expertise, allow us to offer a service that increases the competitiveness of our users.

Our presentations

Text mining
EGC 2019 - By Oussama Ahmia

See the document
Watch the video

Scikit-learn for text mining
PyData 2016 - By Oussama Ahmia

Watch the video

Put ElasticSearch at your service
PyConFr 2019 - By Alexandre Garel

Discover some of the possibilities offered by Elasticsearch from the developer's point of view.

Watch the video

After Jupyter Notebook, here is JupyterLab
PyConFr 2019 - By Maxime Morinière

Presentation of the Jupyter notebook and its successor JupyterLab, new features and some use cases.

Watch the video