Data mining approaches for monitoring and decision support of indoor microalgae production

Microalgae are rich sources of protein, lipids, carbohydrates, and pigments. Over the last decades, they have emerged as a sustainable source of raw materials for the food, feed (aquaculture), nutraceutical, medicinal, and cosmetic industries. The high cost of industrial microalgae production is a pressing concern that hinders their adoption as a viable source of green fuels and of raw materials in other industries. The variable algae quality associated with open outdoor production can be addressed with a closed indoor production strategy and better control systems, yet the burden of production cost is more severe in an indoor production system. Production cost can be reduced by improving photosynthetic efficiency, optimizing the process, and automating it. Real-time multi-parameter optimization for maximal growth and efficient automation with process control techniques for reduced labor cost are interesting solutions to explore from an industrial engineering perspective.

A major focus of this research is to develop data mining approaches for monitoring and decision support of indoor algae production in the proprietary vertical flat panel photobioreactor system (ProviAPT) developed by Proviron, Belgium. At Proviron, fully automated microalgae cultivation is functional across lab-scale, pilot-scale, and production-scale units. All three units are equipped with Siemens PLCs and with Arduino and Raspberry Pi modules, integrated with a supervisory control and data acquisition system (‘OpenSCADA’). Despite high productivity and automation levels, there is room for improvement, especially in optimizing growth parameters and further increasing biomass productivity. The large volume of online sensor and actuator data archived by the SCADA system enables the application of data mining techniques and machine learning models for prediction and process control strategies. This demands the design and development of a centralized data warehouse. Based on unit-level data acquisition, the online data available from the SCADA system is currently collected at four distinct levels from the lab-scale and pilot-scale photobioreactors. Reactor unit level data (sensor output and actuator input), industrial log data, feed unit data (nutrient recipes, feed volumes), calculated (real-time and cumulative) growth data, offline measurement data, and metadata are cleaned, preprocessed, and stored. An ‘R Shiny’-based dashboard is developed on top of the data warehouse to enable statistical analysis, visualization, and monitoring.

The long-term objective of this research is to develop models that are simple, transferable, and capable of predicting optimal growth conditions with high accuracy from real-time monitoring data in an industrial microalgae production environment. The large data volume and advances in computational power enable the application of statistical, machine learning, and deep learning techniques. At every stage, collecting, integrating, and processing data in a uniform way is crucial to developing reliable predictive models. Still, applying machine learning techniques to bioprocess data is not a simple task. Hybrid deep learning models, or combinations of machine learning models with mechanistic models, are interesting strategies we plan to explore in this scenario.

In an industrial production environment, the quality of the data from sensors and actuators is an important factor to consider. Validating data quality through statistical outlier detection on industrial and bioprocess variables is therefore a significant task in the early stage of the research. Unsupervised multivariate outlier detection models can be used to identify bioprocess anomalies and to detect and flag abnormal behavior in industrial process variables. The insights from these models can in turn be used to identify and predict sensor and actuator failures and to plan maintenance schedules.
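To make the idea of unsupervised multivariate outlier detection concrete, the following is a minimal sketch based on the Mahalanobis distance, which is one common choice and not necessarily the method used in this project. The two sensor channels, the readings, and the 99% cutoff are all invented for illustration; the function names are hypothetical.

```python
import math
from typing import List, Sequence


def invert(matrix: Sequence[Sequence[float]]) -> List[List[float]]:
    """Invert a small square matrix by Gauss-Jordan elimination with pivoting."""
    n = len(matrix)
    aug = [list(row) + [1.0 if i == j else 0.0 for j in range(n)]
           for i, row in enumerate(matrix)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot][col]) < 1e-12:
            raise ValueError("covariance matrix is singular")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        scale = aug[col][col]
        aug[col] = [v / scale for v in aug[col]]
        for r in range(n):
            if r != col:
                factor = aug[r][col]
                aug[r] = [v - factor * p for v, p in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]


def mahalanobis_sq(rows: Sequence[Sequence[float]]) -> List[float]:
    """Squared Mahalanobis distance of every row to the sample mean."""
    n, d = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(d)]
    centered = [[r[j] - means[j] for j in range(d)] for r in rows]
    # Sample covariance matrix (divides by n - 1).
    cov = [[sum(c[a] * c[b] for c in centered) / (n - 1) for b in range(d)]
           for a in range(d)]
    inv = invert(cov)
    return [sum(c[a] * inv[a][b] * c[b] for a in range(d) for b in range(d))
            for c in centered]


def flag_outliers(rows: Sequence[Sequence[float]], threshold: float) -> List[int]:
    """Indices of rows whose squared distance exceeds the threshold."""
    return [i for i, d2 in enumerate(mahalanobis_sq(rows)) if d2 > threshold]


# Hypothetical readings from two correlated sensor channels (made-up values):
# channel B normally tracks 2 x channel A with small noise.
readings = [[float(i), 2.0 * i + (0.1 if i % 2 == 0 else -0.1)] for i in range(20)]
readings.append([10.0, 5.0])  # channel A looks normal, but B breaks the pattern

# Assumed cutoff: 99% quantile of a chi-square distribution with 2 degrees
# of freedom, which has the closed form -2 * ln(1 - 0.99). Valid for 2 channels only.
threshold = -2.0 * math.log(0.01)
flagged = flag_outliers(readings, threshold)
print(flagged)
```

In this toy data set only the last reading is flagged, although each of its two values lies within the normal range of its own channel; that is the case a per-variable check would miss. Because the mean and covariance are estimated from data that include the anomalies themselves, a robust estimator would likely be preferable on real process data.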

Upon optimizing the data integration procedure using data transformation and feature engineering techniques, we can expand the predictive models to include the various growth conditions and discover patterns between growth conditions and outcomes. The generated models should be evaluated for their capability to support an operator in defining improved growth conditions in an industrial production environment. The results should be validated based on two criteria:

  1. What growth conditions are expected to achieve the maximal yield?
  2. What combinations of growth conditions should be evaluated next to improve the model most efficiently?

The outcomes can be further used as feedback to improve the model.
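The two questions above can be read as two different selection rules over the same set of model predictions. The sketch below assumes a model that returns, for each candidate growth condition, a predicted yield and an uncertainty estimate; the text does not specify the model or the selection rule, so the `Candidate` fields, the numbers, and the use of plain uncertainty sampling for the second question are all illustrative assumptions.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass(frozen=True)
class Candidate:
    light: float            # hypothetical light setting (arbitrary units)
    temperature: float      # hypothetical temperature setting (arbitrary units)
    predicted_yield: float  # model's mean prediction (made-up value)
    uncertainty: float      # model's predictive spread (made-up value)


def best_expected(candidates: Sequence[Candidate]) -> Candidate:
    """Question 1: the condition with the highest predicted yield."""
    return max(candidates, key=lambda c: c.predicted_yield)


def most_informative(candidates: Sequence[Candidate]) -> Candidate:
    """Question 2 (uncertainty sampling): the condition the model knows least about."""
    return max(candidates, key=lambda c: c.uncertainty)


# Invented predictions for four candidate growth conditions.
candidates = [
    Candidate(light=100.0, temperature=20.0, predicted_yield=1.2, uncertainty=0.10),
    Candidate(light=200.0, temperature=25.0, predicted_yield=1.8, uncertainty=0.15),
    Candidate(light=300.0, temperature=25.0, predicted_yield=1.6, uncertainty=0.40),
    Candidate(light=300.0, temperature=30.0, predicted_yield=1.1, uncertainty=0.25),
]

run_for_yield = best_expected(candidates)
run_to_learn = most_informative(candidates)
print(run_for_yield)
print(run_to_learn)
```

With these numbers the two rules pick different conditions, which illustrates why the two criteria are stated separately: the run expected to yield most is not necessarily the run that would teach the model most. Measuring the chosen condition and refitting the model on the result closes the feedback loop.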

by Shyam Krishnan

Figure 1. Indoor microalgae production unit at Proviron, Hemiksem, Belgium.