This chronological list shows recent publications from research at Hochschule Weihenstephan-Triesdorf. It is maintained by the Centre for Research and Knowledge Transfer (Zentrum für Forschung und Wissenstransfer, ZFW).
Current methods for end-to-end constructive neural combinatorial optimization usually train a policy using behavior cloning from expert solutions or policy gradient methods from reinforcement learning. While behavior cloning is straightforward, it requires expensive expert solutions, and policy gradient methods are often computationally demanding and complex to fine-tune. In this work, we bridge the two and simplify the training process by sampling multiple solutions for random instances using the current model in each epoch and then selecting the best solution as an expert trajectory for supervised imitation learning. To achieve progressively improving solutions with minimal sampling, we introduce a method that combines round-wise Stochastic Beam Search with an update strategy derived from a provable policy improvement. This strategy refines the policy between rounds by utilizing the advantage of the sampled sequences with almost no computational overhead. We evaluate our approach on the Traveling Salesman Problem and the Capacitated Vehicle Routing Problem. The models trained with our method achieve comparable performance and generalization to those trained with expert data. Additionally, we apply our method to the Job Shop Scheduling Problem using a transformer-based architecture and outperform existing state-of-the-art methods by a wide margin.
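The sample-then-imitate loop described in this abstract can be illustrated with a toy sketch. Everything in it is an assumption made for illustration: the nearest-neighbour-biased sampler stands in for a neural policy, plain independent sampling replaces the round-wise Stochastic Beam Search, and the names `sample_tour`, `select_expert` and `to_training_pairs` are hypothetical. It only shows how the best of several sampled tours could be turned into supervised (partial tour, next city) targets.

```python
import math
import random


def tour_length(coords, tour):
    """Length of the closed tour that visits the cities in the given order."""
    return sum(
        math.dist(coords[tour[i]], coords[tour[(i + 1) % len(tour)]])
        for i in range(len(tour))
    )


def sample_tour(coords, rng, temperature=0.5):
    """Stand-in stochastic policy (assumption): prefer nearby cities via a softmax."""
    tour = [0]
    unvisited = set(range(1, len(coords)))
    while unvisited:
        current = tour[-1]
        candidates = sorted(unvisited)
        weights = [
            math.exp(-math.dist(coords[current], coords[c]) / temperature)
            for c in candidates
        ]
        nxt = rng.choices(candidates, weights=weights, k=1)[0]
        tour.append(nxt)
        unvisited.remove(nxt)
    return tour


def select_expert(coords, samples):
    """Pick the shortest sampled tour to serve as the pseudo-expert trajectory."""
    return min(samples, key=lambda t: tour_length(coords, t))


def to_training_pairs(tour):
    """Turn a tour into (partial tour, next city) targets for imitation learning."""
    return [(tuple(tour[:i]), tour[i]) for i in range(1, len(tour))]


if __name__ == "__main__":
    rng = random.Random(0)
    coords = [(rng.random(), rng.random()) for _ in range(8)]
    samples = [sample_tour(coords, rng) for _ in range(16)]
    expert = select_expert(coords, samples)
    print(expert, round(tour_length(coords, expert), 3))
```

In a real training loop, the pairs from `to_training_pairs` would feed a cross-entropy loss on the policy network, and the improved policy would generate the samples in the next epoch.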
Ashima Khanna,
Prof. Dr. Florian Haselbeck,
Prof. Dr. Dominik Grimm
Predicting Protein Thermostability through Deep Learning Leveraging 3D Structural Information (2024) Biological Materials Science - A workshop on biogenic, bioinspired, biomimetic and biohybrid materials for innovative optical, photonics and optoelectronics applications 2024.
In protein engineering, improving thermostability is essential for many industrial and pharmaceutical applications. However, the experimental process of identifying stabilizing mutations is time-consuming due to the enormous search space. With the increasing availability of protein structural and thermostability data, computational approaches using deep learning to identify thermostable candidates are gaining popularity. In this work, we present and benchmark a novel graph neural network, ProtGCN, that incorporates geometric and energetic details of proteins to predict changes in Gibbs free energy (ΔG), a key indicator of thermostability, upon single point mutations. Unlike conventional methods that rely on sequence or structural features, our model uses protein graphs with rich node features, carefully preprocessed from a comprehensive dataset of approximately 4149 mutated sequences across 117 protein families. In addition, ProtGCN is enhanced by incorporating embeddings from the Evolutionary Scale Modeling (ESM) protein language model into the protein graphs. This integration allows ProtGCN (ESM) to outperform comparison models, achieving competitive performance with XGBoost and a protein language model-based multi-layer perceptron on all evaluation metrics, and outperforming all models on further analyses. A strength of ProtGCN (ESM) is its ability to correctly identify and predict stabilizing and destabilizing mutations with extreme effects, which are typically underrepresented in thermostability datasets. These results suggest a promising direction for future computational protein engineering research.
Prof. Dr. Florian Haselbeck,
Maura John,
Yuqi Zhang,
Jonathan Pirnay,
Juan Pablo Fuenzalida-Werner,
Ruben Costa,
Prof. Dr. Dominik Grimm
Superior Protein Thermophilicity Prediction With Protein Language Model Embeddings (2024) Biological Materials Science - A workshop on biogenic, bioinspired, biomimetic and biohybrid materials for innovative optical, photonics and optoelectronics applications 2024.
Protein thermostability is an essential property for many biotechnological fields, such as enzyme engineering and protein-hybrid optoelectronics. In this context, machine learning-based in silico predictions have the potential to reduce costs and development time by identifying the most promising candidates for subsequent experiments. The development of such prediction models is enabled by ever-growing protein databases and information on protein stability at different temperatures. In this study, we leverage protein language model embeddings for thermophilicity prediction with ProLaTherm, a Protein Language model-based Thermophilicity predictor. We assess ProLaTherm against several feature-, sequence-, and literature-based comparison partners on a new benchmark dataset derived from a significant update of published data. ProLaTherm outperforms all comparison partners both in a nested cross-validation setup and on protein sequences from species not seen during training with respect to multiple evaluation metrics. In terms of the Matthews correlation coefficient, ProLaTherm surpasses the second-best competitor by 18.1% in the nested cross-validation setup. Using proteins from species that do not overlap with species from the training data, ProLaTherm outperforms all competitors by at least 9.7%. On these data, it misclassified only one non-thermophilic protein as thermophilic. Furthermore, it correctly identified 97.4% of all thermophilic proteins in our test set with an optimal growth temperature above 70°C.
Josef Eiglsperger,
Prof. Dr. Florian Haselbeck,
Viola Stiele,
Claudia Guadarrama Serrano,
Kelly Lim-Trinh,
Prof. Dr. Klaus Menrad,
Prof. Dr. Thomas Hannus,
Prof. Dr. Dominik Grimm
Accurately forecasting demand is a potential competitive advantage, especially when dealing with perishable products. The multi-billion-dollar horticultural industry is highly affected by perishability, but has received limited attention in forecasting research. In this paper, we analyze the applicability of general compared to dataset-specific predictors, as well as the influence of external information and online model update schemes. We employ a heterogeneous set of horticultural data, three classical forecasting approaches, and twelve machine learning-based forecasting approaches. Our results show a superiority of multivariate machine learning methods, in particular the ensemble learner XGBoost. These advantages highlight the importance of external factors, with the feature set containing statistical, calendrical, and weather-related features leading to the most robust performance. We further observe that a general model is unable to capture the heterogeneity of the data and is outperformed by dataset-specific predictors. Moreover, frequent model updates have a negligible impact on forecasting quality, allowing long-term forecasting without significant performance degradation.
Nikita Genze,
Wouter K Vahl,
Jennifer Groth,
Maximilian Wirth,
Michael Grieb,
Prof. Dr. Dominik Grimm
Sustainable weed management strategies are critical to feeding the world’s population while preserving ecosystems and biodiversity. Therefore, site-specific weed control strategies based on automation are needed to reduce the additional time and effort required for weeding. Machine vision-based methods appear to be a promising approach for weed detection, but require high quality data on the species in a specific agricultural area. Here we present a dataset, the Moving Fields Weed Dataset (MFWD), which captures the growth of 28 weed species commonly found in sorghum and maize fields in Germany. A total of 94,321 images were acquired in a fully automated, high-throughput phenotyping facility to track over 5,000 individual plants at high spatial and temporal resolution. A rich set of manually curated ground truth information is also provided, which can be used not only for plant species classification, object detection and instance segmentation tasks, but also for multiple object tracking.
Fabian Schäfer,
Manuel Walther,
Prof. Dr. Dominik Grimm,
Alexander Hübner
Assigning inpatients to hospital beds impacts patient satisfaction and the workload of nurses and doctors. The assignment is subject to unknown inpatient arrivals, in particular for emergency patients. Hospitals, therefore, need to deal with uncertainty about actual bed requirements and potential shortage situations as bed capacities are limited. This paper develops a model and solution approach for solving the patient bed-assignment problem that is based on a machine learning (ML) approach to forecasting emergency patients. First, it contributes by improving the anticipation of emergency patients using ML approaches, incorporating weather data, time and dates, important local and regional events, as well as current and historical occupancy levels. Drawing on real-life data from a large case hospital, we were able to improve forecasting accuracy for emergency inpatient arrivals. We achieved up to 17% better root mean square error (RMSE) when using ML methods compared to a baseline approach relying on averages for historical arrival rates. We further show that the ML methods outperform time series forecasts. Second, we develop a new hyper-heuristic for solving real-life problem instances based on the pilot method and a specialized greedy look-ahead (GLA) heuristic. When applying the hyper-heuristic to test sets, we were able to increase the objective function value by up to 5.3% in comparison to the benchmark approach in [40]. A benchmark against a Genetic Algorithm also shows the superiority of the hyper-heuristic. Third, the combination of ML for emergency patient admission forecasting with advanced optimization through the hyper-heuristic allowed us to obtain an improvement of up to 3.3% on a real-life problem.
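To make the RMSE comparison concrete, the following sketch computes RMSE and the relative improvement of a model over a baseline. It assumes that "17% better" means a 17% relative reduction in RMSE, which the abstract does not state explicitly. The function names are hypothetical, and the numbers in the usage lines are invented inputs, not values from the paper.

```python
import math


def rmse(actual, predicted):
    """Root mean square error between two equally long sequences."""
    if len(actual) != len(predicted) or not actual:
        raise ValueError("sequences must be non-empty and of equal length")
    squared = [(a - p) ** 2 for a, p in zip(actual, predicted)]
    return math.sqrt(sum(squared) / len(squared))


def relative_improvement(baseline_rmse, model_rmse):
    """Fractional RMSE reduction of the model relative to the baseline."""
    return (baseline_rmse - model_rmse) / baseline_rmse


if __name__ == "__main__":
    # Invented daily arrival counts, for illustration only.
    arrivals = [12, 15, 9, 20, 14]
    baseline = [sum(arrivals) / len(arrivals)] * len(arrivals)  # historical mean
    model = [11, 16, 10, 18, 14]  # hypothetical ML forecast
    print(relative_improvement(rmse(arrivals, baseline), rmse(arrivals, model)))
```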
Prof. Dr. Florian Haselbeck,
Maura John,
Yuqi Zhang,
Jonathan Pirnay,
Juan Pablo Fuenzalida-Werner,
Ruben Costa,
Prof. Dr. Dominik Grimm
Protein thermostability is important in many areas of biotechnology, including enzyme engineering and protein-hybrid optoelectronics. Ever-growing protein databases and information on stability at different temperatures allow the training of machine learning models to predict whether proteins are thermophilic. In silico predictions could reduce costs and accelerate the development process by guiding researchers to more promising candidates. Existing models for predicting protein thermophilicity rely mainly on features derived from physicochemical properties. Recently, modern protein language models that directly use sequence information have demonstrated superior performance in several tasks. In this study, we evaluate the usefulness of protein language model embeddings for thermophilicity prediction with ProLaTherm, a Protein Language model-based Thermophilicity predictor. ProLaTherm significantly outperforms all feature-, sequence- and literature-based comparison partners on multiple evaluation metrics. In terms of the Matthews correlation coefficient, ProLaTherm outperforms the second-best competitor by 18.1% in a nested cross-validation setup. Using proteins from species not overlapping with species from the training data, ProLaTherm outperforms all competitors by at least 9.7%. On these data, it misclassified only one nonthermophilic protein as thermophilic. Furthermore, it correctly identified 97.4% of all thermophilic proteins in our test set with an optimal growth temperature above 70°C.
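For readers unfamiliar with the Matthews correlation coefficient (MCC) used in this comparison, here is a minimal sketch of its standard definition for binary classification, computed from confusion-matrix counts. The function name and the convention of returning 0.0 when the denominator vanishes are choices made here. The counts in the usage line are invented and are not results from the paper.

```python
import math


def matthews_corrcoef(tp, tn, fp, fn):
    """MCC = (TP*TN - FP*FN) / sqrt((TP+FP)(TP+FN)(TN+FP)(TN+FN))."""
    denominator = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    if denominator == 0:
        # Convention (assumption): report 0.0 when a row or column is empty.
        return 0.0
    return (tp * tn - fp * fn) / denominator


if __name__ == "__main__":
    # Invented counts: thermophilic = positive class.
    print(matthews_corrcoef(tp=90, tn=85, fp=15, fn=10))
```

MCC ranges from -1 to 1 and uses all four cells of the confusion matrix, which makes it more informative than accuracy when the classes are imbalanced.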
Genome-wide association studies (GWAS) are a powerful tool to elucidate the genotype–phenotype map. Although GWAS are usually used to assess simple univariate associations between genetic markers and traits of interest, it is also possible to infer the underlying genetic architecture and to predict gene regulatory interactions. In this chapter, we describe the latest methods and tools to perform GWAS by calculating permutation-based significance thresholds. For this purpose, we first provide guidelines on univariate GWAS analyses that are extended in the second part of this chapter to more complex models that enable the inference of gene regulatory networks and how these networks vary.
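A permutation-based significance threshold can be sketched as follows. This is a generic max-statistic permutation scheme under simplifying assumptions, not the chapter's actual tooling: the association statistic is the absolute Pearson correlation between each marker and the phenotype, there is no correction for population structure, and all names (`permutation_threshold`, `pearson_r`) are hypothetical.

```python
import math
import random


def pearson_r(x, y):
    """Pearson correlation; returns 0.0 if either input has zero variance."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        return 0.0
    return cov / math.sqrt(var_x * var_y)


def permutation_threshold(genotypes, phenotype, n_permutations=200,
                          alpha=0.05, seed=0):
    """Empirical (1 - alpha) quantile of the maximum |r| over all markers,
    obtained by repeatedly shuffling the phenotype to break any association."""
    rng = random.Random(seed)
    shuffled = list(phenotype)
    max_stats = []
    for _ in range(n_permutations):
        rng.shuffle(shuffled)
        max_stats.append(
            max(abs(pearson_r(marker, shuffled)) for marker in genotypes)
        )
    max_stats.sort()
    index = min(len(max_stats) - 1,
                math.ceil((1 - alpha) * len(max_stats)) - 1)
    return max_stats[index]
```

A marker whose observed statistic exceeds the returned threshold would be called significant at family-wise level `alpha`. Taking the maximum over all markers in each permutation is what accounts for multiple testing.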
Publication pages maintained by
Gerhard Radlmayr
Officer for Knowledge Transfer and Research Communication