Initial development of web application to support excipient selection for immediate release tablets using message passing neural network

Research Article


Linh Nguyen Tranᵃ,*, Hang Phan Thi Thuᵃ, Linh Vu Ngoc Haiᵃ

ᵃ Hanoi University of Pharmacy, 13-15 Le Thanh Tong, Hanoi, Vietnam

Journal of Pharmaceutical Research and Drug Information, 2023, 14 (5): 23-31

ARTICLE INFO

Article history

Received 13 May 2023

Revised 15 Sept 2023

Accepted 24 Nov 2023

Keywords

Immediate release tablets

Artificial intelligence

Message passing neural network

Excipient selection

Web application

ABSTRACT

The formulation of immediate-release tablets is challenging because multiple factors, such as bioavailability and stability, must be balanced. Selecting appropriate excipients is critical to achieving these objectives. The recent success of Artificial Intelligence (AI), and of Message Passing Neural Networks (MPNNs) in particular, in predicting the physicochemical and biological properties of drug molecules suggests that these models could also be applied to excipient selection. This study aimed to develop a new approach to excipient selection using AI and an MPNN, and to create a user-friendly web application to support excipient selection for immediate-release tablets. A database of 13,278 immediate-release tablets was used to train, validate, and test the MPNN model, with the Simplified Molecular-Input Line-Entry System (SMILES) representation of each drug substance as input. Model performance was assessed by how well the model predicted the probability of an excipient being selected. A web application named FormAI was developed using the Streamlit web framework and integrated with the trained model. The MPNN model performed well, with an average Area Under the Curve (AUC) > 0.98 and R² > 0.99, indicating that it predicts the probability of selecting an excipient reasonably well. The FormAI application provides a user-friendly platform for excipient selection. These results demonstrate the potential of AI and MPNNs in drug formulation design, specifically in excipient selection for immediate-release tablets, and FormAI offers pharmaceutical scientists and formulators a practical tool.

* Corresponding author: Linh Nguyen Tran; e-mail address: linhnt@hup.edu.vn

https://doi.org/10.59882/1859-364X/134

Journal homepage: jprdi.vn/JP

Journal of Pharmaceutical Research and Drug Information

An official journal of Hanoi University of Pharmacy



Introduction

Excipient selection is one of the most important, and most difficult, steps in the research and development of drug products. Traditionally, excipients are selected by “trial and error”, which can be time-consuming and costly and does not always lead to the most effective formulation. Experimental design and optimization methods can be applied, but they require formulators to have knowledge and experience in drug design. To address this issue, researchers have explored Artificial Intelligence (AI) as a powerful tool in pharmaceutical research and development.

Among AI models, Graph Neural Networks (GNNs), and in particular Message Passing Neural Networks (MPNNs, a special type of GNN), are increasingly used in pharmaceutical research and development. An MPNN is a neural network that operates on graph-structured data (nodes and the edges linking them), such as the structure of a drug molecule, in which atoms act as nodes and chemical bonds act as edges. MPNNs have been shown to be highly effective in predicting properties of drug molecules such as solubility, oil-water distribution coefficient, biological effects, and toxicity [1], [2], [3]. However, to our knowledge, no studies have yet been published on the use of AI models for selecting excipients in drug formulation research.

The goal of this study was to develop an intelligent web application that uses artificial intelligence and message passing neural networks to help formulators shorten the research and development process for immediate-release tablets by suggesting suitable excipients for each drug substance. Using this application is expected to reduce research costs and time and to improve product quality. The results of the study can provide the pharmaceutical industry with a useful tool to enhance competitiveness and meet market demand.

Materials and Methods

Data collection

Data on tablet ingredients were collected from DailyMed, a public database operated by the US National Library of Medicine that provides the labeling information submitted to the US Food and Drug Administration (FDA) for marketed drug products. The Python web scraping library BeautifulSoup was used to collect ingredient information from DailyMed’s web pages. The collected data included the names of the drug products, the drug substances, the dosage form, the strength, and the corresponding excipients. These data were then stored in a CSV (comma-separated values) file for ease of retrieval and use in model training. The resulting dataset is also a useful resource for similar studies in the future.
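The storage step can be illustrated with a minimal sketch using Python's standard csv module. The column names, the semicolon used to join excipients into a single cell, and the placeholder record are assumptions made for illustration; the study does not describe its actual file layout.

```python
import csv
import io

# Hypothetical column names; the study's real CSV layout is not specified.
FIELDS = ["product_name", "drug_substance", "dosage_form", "strength", "excipients"]


def write_products(rows, handle):
    """Write one CSV row per drug product, joining excipients with ';'."""
    writer = csv.DictWriter(handle, fieldnames=FIELDS)
    writer.writeheader()
    for row in rows:
        record = dict(row)
        record["excipients"] = ";".join(row["excipients"])
        writer.writerow(record)


def read_products(handle):
    """Read the rows back, splitting the excipient cell into a list again."""
    rows = []
    for record in csv.DictReader(handle):
        cell = record["excipients"]
        record["excipients"] = cell.split(";") if cell else []
        rows.append(record)
    return rows


# Placeholder record, not taken from DailyMed.
rows = [
    {
        "product_name": "Example Product 10 mg Tablets",
        "drug_substance": "example substance",
        "dosage_form": "TABLET",
        "strength": "10 mg",
        "excipients": ["excipient A", "excipient B"],
    }
]

buffer = io.StringIO()  # stands in for a file opened with newline=""
write_products(rows, buffer)
buffer.seek(0)
loaded = read_products(buffer)
```

Keeping the excipients of a product in one cell keeps each product on a single row, which makes it straightforward to turn a row into one training example later.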

Data preparation

The drug substance and excipient names in the CSV file were standardized using Unique Ingredient Identifiers (UNIIs). The structural formula of each drug substance, in simplified molecular-input line-entry system (SMILES) format, was added to the database and transformed into input tensors for the MPNN model. This transformation made the molecular structures usable by the model for predicting excipient selection in immediate-release tablet formulation. Generating the input features directly from SMILES preserved the integrity and accuracy of the chemical structure data of the drug molecule used in MPNN model training.

Model development

The MPNN model had the following architecture [4]:

Input Layer: Received the attributes of each atom in the molecule and of each bond between atoms. This layer had three sub-layers:

- Atom_features (Input Layer): A 42-dimensional vector containing information about the common atoms in the drug molecule.

- Bond_features (Input Layer): A 7-dimensional vector containing information about the types of chemical bonds between atoms in the molecule.

- Pair_indices (Input Layer): A 2-dimensional vector identifying each pair of atoms linked by a chemical bond.

Message Passing Layer: Passed messages between atoms in the molecule. This layer had one sub-layer:

- Message_passing (Message Passing): Took the three input layers above and returned the feature vector of each atom after messages had been passed along the bonds.

Graph Pooling Layer: Pooled the atom feature vectors to reduce the output dimension. This layer had one sub-layer:

- Global_average_pooling1d: Took the atom feature vectors produced by message passing and returned a single global feature vector for the molecule.

Fully Connected Layer: Passed the global feature vector of the molecule through fully connected layers to predict appropriate excipients for immediate-release tablet formulation. This layer had two sub-layers:

- Dense_2 (Dense): Took the global feature vector of the molecule and returned a feature vector whose size was determined during model optimization.

- Dense_3 (Dense): Took the output of Dense_2 and returned the probability of an excipient being selected for immediate-release tablet formulation.

The activation function of the last layer was the sigmoid function, which yields the probability of an excipient being selected. The other layers used the ReLU function [5].
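The data flow through these layers can be sketched in plain Python. This is a deliberately simplified illustration, not the trained model: messages are unweighted sums of neighbour features, bond features are ignored, and the two-class atom encoding, the toy three-atom chain, and the dense-layer weights are all hypothetical.

```python
import math


def relu(x):
    return max(0.0, x)


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def message_passing(atom_features, pair_indices, steps=1):
    """Each step adds the states of bonded neighbours to every atom, then applies ReLU."""
    h = [list(v) for v in atom_features]
    for _ in range(steps):
        new_h = [list(v) for v in h]  # every atom keeps its own state
        for src, dst in pair_indices:
            # The message sent along a bond is the sender's current state.
            for k, value in enumerate(h[src]):
                new_h[dst][k] += value
        h = [[relu(x) for x in v] for v in new_h]
    return h


def global_average_pooling(atom_states):
    """Average the atom vectors into one molecule-level vector."""
    n = len(atom_states)
    width = len(atom_states[0])
    return [sum(v[k] for v in atom_states) / n for k in range(width)]


def dense_sigmoid(vector, weights, bias):
    """One output unit: weighted sum followed by a sigmoid."""
    return sigmoid(sum(w * x for w, x in zip(weights, vector)) + bias)


# Toy chain of three atoms (C-C-O), one-hot encoded over the classes [C, O].
atom_features = [[1.0, 0.0], [1.0, 0.0], [0.0, 1.0]]
# Each bond is listed in both directions so messages flow both ways.
pair_indices = [(0, 1), (1, 0), (1, 2), (2, 1)]

atom_states = message_passing(atom_features, pair_indices, steps=1)
molecule_vector = global_average_pooling(atom_states)
probability = dense_sigmoid(molecule_vector, weights=[0.5, -0.5], bias=0.0)
```

In a trained MPNN the messages would also depend on learned weights and on the bond features; the sketch only shows how atom-level vectors become one molecule-level vector and then a probability between 0 and 1.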

The loss function, which measures the difference between the actual values and the values predicted by the model, was BinaryCrossentropy [6]; lower values of this function indicate a better fit.
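For reference, binary cross-entropy can be written out in a few lines. The following is a minimal sketch of the standard formula, averaged over labels; the clipping constant eps is an assumption added to avoid taking the logarithm of zero, and the example labels and predictions are illustrative only.

```python
import math


def binary_crossentropy(y_true, y_pred, eps=1e-7):
    """Mean of -[y*log(p) + (1-y)*log(1-p)] over all labels."""
    total = 0.0
    for y, p in zip(y_true, y_pred):
        p = min(max(p, eps), 1.0 - eps)  # keep log() finite
        total += -(y * math.log(p) + (1 - y) * math.log(1.0 - p))
    return total / len(y_true)


labels = [1, 0, 1]  # 1 = excipient present in the product, 0 = absent
confident_and_right = binary_crossentropy(labels, [0.9, 0.1, 0.8])
mostly_wrong = binary_crossentropy(labels, [0.4, 0.6, 0.3])
```

Predictions close to the true labels give a small loss and predictions on the wrong side of 0.5 give a large one, which is why a lower value is better during training.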

Training and validating model

The model was written in Python 3.9, a widely used general-purpose language with simple, readable syntax that is applied in web development, software development, artificial intelligence, and data analysis, among other fields. Training and evaluation were performed on Google Colab using TensorFlow and Keras. Google Colab is a Google service that provides a free Jupyter Notebook environment for data analysis and for running Python code. TensorFlow is an open-source library developed by Google for large-scale data processing and for building machine learning and deep learning models. Keras is a high-level application programming interface (API) for TensorFlow that simplifies building and training deep learning models.


The database was randomly divided into a training set (train_dataset, 60%) for model training, a validation set (valid_dataset, 30%) for model validation, and a test set (test_dataset, 10%) for testing the model.
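A random 60/30/10 split of this kind can be sketched as follows. The function name, the fixed seed, and the use of integer placeholders for drug products are assumptions for illustration; the study does not state how the shuffle was seeded or implemented.

```python
import random


def split_dataset(records, train_frac=0.6, valid_frac=0.3, seed=0):
    """Shuffle the records and cut them into train / validation / test parts."""
    shuffled = list(records)
    random.Random(seed).shuffle(shuffled)  # seeded for reproducibility
    n_train = round(len(shuffled) * train_frac)
    n_valid = round(len(shuffled) * valid_frac)
    train = shuffled[:n_train]
    valid = shuffled[n_train:n_train + n_valid]
    test = shuffled[n_train + n_valid:]  # whatever remains (about 10%)
    return train, valid, test


products = list(range(13278))  # placeholders for the 13,278 drug products
train_dataset, valid_dataset, test_dataset = split_dataset(products)
print(len(train_dataset), len(valid_dataset), len(test_dataset))
# prints: 7967 3983 1328
```

With 13,278 products these fractions give 7,967, 3,983, and 1,328 products, which matches the set sizes reported in the Results section.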

The model was fitted to the training dataset. During training, callbacks were used to limit overfitting and improve model stability. The model was evaluated by the area under the curve (AUC), tracked over the training epochs, and by the R² value. The trained model was saved and then reloaded for retesting on the test dataset. The resulting model can suggest excipients for drug substances that are not in the original database.
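To make the two evaluation metrics concrete, the sketch below computes them in plain Python. It assumes that AUC refers to the area under the ROC curve and that R² is the usual coefficient of determination between observed and predicted values; the text does not specify either detail, and the example labels and scores are illustrative only.

```python
def roc_auc(y_true, y_score):
    """Probability that a random positive is scored above a random negative."""
    positives = [s for t, s in zip(y_true, y_score) if t == 1]
    negatives = [s for t, s in zip(y_true, y_score) if t == 0]
    if not positives or not negatives:
        raise ValueError("both classes must be present")
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5  # ties count as half
    return wins / (len(positives) * len(negatives))


def r_squared(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean = sum(y_true) / len(y_true)
    ss_tot = sum((y - mean) ** 2 for y in y_true)
    ss_res = sum((y - p) ** 2 for y, p in zip(y_true, y_pred))
    return 1.0 - ss_res / ss_tot


auc = roc_auc([1, 1, 0, 0, 1], [0.9, 0.7, 0.6, 0.2, 0.4])
r2 = r_squared([1, 0, 1, 0], [0.9, 0.1, 0.8, 0.2])
```

An AUC of 1.0 means every excipient that is actually present is ranked above every absent one, and 0.5 corresponds to random ranking.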

Web application deployment

The trained MPNN model was stored on GitHub (https://github.com) and integrated into a web application built on the Streamlit platform (https://streamlit.io) for deployment.

To assist researchers in selecting excipients, we introduce the Excipient Selection Scale, defined by the following formula:

Excipient Selection Scale = P × (1 - (1 - P) × (1 - Q))    (1)

Where:

P: The probability, predicted by the MPNN model, of finding the excipient under consideration in drug products containing the input drug substance.

Q: The proportion of drug products in the database that contain the excipient under consideration.

The Excipient Selection Scale takes values from 0 to 1; the larger the value, the more strongly the excipient should be considered for selection.
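Formula (1) can be worked through with a short sketch. The function names, the four-product toy database, and the value P = 0.8 are assumptions for illustration; in the application P would come from the trained MPNN model.

```python
def excipient_selection_scale(p, q):
    """Formula (1): P x (1 - (1 - P) x (1 - Q)), with P and Q in [0, 1]."""
    if not (0.0 <= p <= 1.0 and 0.0 <= q <= 1.0):
        raise ValueError("p and q must lie between 0 and 1")
    return p * (1.0 - (1.0 - p) * (1.0 - q))


def excipient_frequency(products, excipient):
    """Q: share of products in the database that contain the excipient."""
    return sum(excipient in product for product in products) / len(products)


# Toy database of four products, each given as a set of excipient names.
products = [
    {"excipient A", "excipient B"},
    {"excipient A"},
    {"excipient A", "excipient C"},
    {"excipient B"},
]

q_common = excipient_frequency(products, "excipient A")  # 3 of 4 products
q_rare = excipient_frequency(products, "excipient C")    # 1 of 4 products

scale_common = excipient_selection_scale(p=0.8, q=q_common)
scale_rare = excipient_selection_scale(p=0.8, q=q_rare)
```

For the same predicted probability P = 0.8, the common excipient scores 0.8 × (1 - 0.2 × 0.25) = 0.76 and the rare one 0.8 × (1 - 0.2 × 0.75) = 0.68, so the scale favours excipients that are both predicted for the drug substance and widely used across the database.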

The Streamlit framework was used to build a simple, easy-to-use interface. Users enter the name or SMILES of a drug substance, and the application returns a list of excipients along with their Excipient Selection Scale values. In addition, information about the molecular structure and the physical, chemical, and pharmacokinetic properties of the drug is predicted with the help of the SwissADME tool (http://www.swissadme.ch).

Results and discussion

Results

Data collection and preparation

After data collection and preparation, a substantial dataset of immediate-release tablet ingredients from DailyMed was obtained, comprising 13,278 drug products made from 622 different drug substances (including different derivatives of each drug substance). The dataset contained 322 excipients.

From the collected data, three separate datasets were created for training, validating, and testing the model. The training set consisted of 7,967 products, the validation set of 3,983 products, and the test set of 1,328 products. These datasets were considered large and diverse enough to support reliable training, validation, and testing.

MPNN model development

The MPNN Model function defined an

MPNN (Message Passing Neural Network)

model in TensorFlow. The input parameters

of this function included:

- Atom_dim: The size of the feature vector

for each atom in the molecule.


- Bond_dim: The size of the feature vector

for each bond between atoms in the molecule.

- Batch_size: The number of samples

(drug products) in each training batch.

- Message_units: The number of units in

the message passing layer.

- Message_steps: The number of message-passing steps performed.

- Num_attention_heads: The number of

attention heads used in the attention layer.

- Dense_units: The number of neuron units

in the fully connected layers.

The Adam optimization algorithm [7] was used to optimize the model’s parameters. The initial learning rate was set to 0.001 and was reduced by a factor of 10 every 20 epochs. Training stopped if the model’s error on the validation dataset did not decrease for 20 consecutive epochs. The number of neurons in each layer was tuned to achieve the best performance.
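The learning-rate schedule and stopping rule described here can be sketched in plain Python. The function names and the synthetic validation losses are assumptions for illustration; in Keras, rules of this kind are commonly expressed with the LearningRateScheduler and EarlyStopping callbacks, although the text does not say which callbacks were actually used.

```python
import math


def learning_rate(epoch, initial_lr=0.001, drop_factor=10.0, epochs_per_drop=20):
    """Step decay: divide the learning rate by drop_factor every epochs_per_drop epochs."""
    return initial_lr / (drop_factor ** (epoch // epochs_per_drop))


def stopping_epoch(val_losses, patience=20):
    """Return the epoch at which training stops, or None if it runs to the end."""
    best = float("inf")
    epochs_without_improvement = 0
    for epoch, loss in enumerate(val_losses):
        if loss < best:
            best = loss
            epochs_without_improvement = 0
        else:
            epochs_without_improvement += 1
            if epochs_without_improvement >= patience:
                return epoch
    return None


# Synthetic history: the validation loss improves until epoch 1, then stalls.
history = [1.0, 0.5] + [0.6] * 25
stopped_at = stopping_epoch(history, patience=20)
```

With these settings the learning rate is 0.001 for epochs 0 to 19, 0.0001 for epochs 20 to 39, and so on, and the synthetic run stops at epoch 21, twenty epochs after its last improvement.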

The architecture of the optimized MPNN

model is shown in Figure 1.

Figure 2 shows the Area Under the Curve (AUC), which reflects the accuracy of the model, over the number of


Figure 1. The structure of the MPNN model
