HPLC for Pharmaceutical Scientists 2007 (Part 10)


10 COMPUTER-ASSISTED HPLC AND KNOWLEDGE MANAGEMENT

Yuri Kazakevich, Michael McBrien, and Rosario LoBrutto

10.1 INTRODUCTION

In modern high-performance liquid chromatography (HPLC), computers in a broad sense are used in every instrumental module and at every stage of analysis. Computers control the flow rate, eluent composition, temperature, injection volume, and injection process. The detector output signal is converted from analog form into a digital representation to recognize the presence of peaks, and at a higher level of computer analysis a chromatogram is obtained. All these computer-based functions are performed in the background, and the chromatographer usually does not think about them.

The second level of computer utilization in HPLC is the extraction of valuable analytical and physicochemical information from the chromatogram. This includes standard analytical procedures of peak integration, calibration, and quantitation, as well as more complex correlation of retention dependencies with the variation of selected parameters.

At the third (and probably highest) level, a computer is used for the sophisticated analysis of many different experimental results stored in databases. This level is usually regarded as the knowledge management level and can serve quite a variety of goals:

• Selection of the starting conditions for method development by using information from similar separations
• Optimization of an existing method, to speed up the analysis, increase the ruggedness of the chromatographic method, and so on
• Review of a multitude of data from different experiments and their correlation with information from other physicochemical methods
• Cross-laboratory information exchange (early drug discovery, preformulation groups, drug metabolism and pharmacokinetic groups, drug substance and drug product groups)

In this chapter the third level of computer-assisted HPLC is discussed: the use of expert systems (such as DryLab [1], AutoChrom™ [2], and ChromSword® [3]) for effective method development.

Computer-assisted method development has received a great deal of attention from management within the pharmaceutical industry, mainly from the perspective of cost savings associated with faster and more efficient development. Adoption and incorporation of these tools in day-to-day workflows has been relatively limited, due in part to a reluctance of chromatographers to believe that computers can replace the intuition of the expert chromatographer. With the present state of the art, there is little question that computers can play a role in efficient method development. However, it must be accepted that computers are a supplement to, rather than a replacement for, the knowledge of the method development chromatographer.

Two main types of software tools exist that are directly applicable to the problem of chromatographic method development:

1. Optimization or experimental design software packages model the chromatographic response as a function of one or more method variables. These can also play a key role in managing the considerable information that results from rigorous method development exercises.
2. Structure-based prediction software predicts retention times or important physicochemical properties from chemical structures. Application databases store chromatographic methods for later retrieval and adaptation to new samples with similar structures and physicochemical parameters.

10.2 PREDICTION OF RETENTION AND SIMULATION OF PROFILES

In Chapters 2, 3, and 4, all aspects of analyte retention on the HPLC column are discussed. There are many mathematical functions describing retention dependencies versus various parameters (organic composition, temperature, pH, etc.), and most of these dependencies rely on empirical coefficients. Analyte retention is a function of many factors: analyte interactions with the stationary and mobile phases; analyte structure and chemical properties; structure and geometry of the column packing material; and many other parameters.
The theoretical functional description of the influence of the eluent composition, mobile-phase pH, salt concentration, and temperature, as well as the influence of the type of organic modifier and the type of salt added to the mobile phase, is discussed in detail in Chapters 2 and 4. Currently, eluent composition, column temperature, and eluent pH are the only continuous parameters used as arguments in functional optimization of HPLC retention. However, other parameters such as ionic strength, buffer concentration, and the concentration of salts and/or ion-pairing reagents can be taken into account, and mathematical functions for these can be constructed and employed. The simplest and most widely used forms of retention time prediction for analytical-scale HPLC are based on the empirical linear dependence of the logarithm of the retention factor on the eluent composition.

10.2.1 General Thermodynamic Basis

Association of the chromatographic retention factor with the equilibrium constant is the basis for all optimization or prediction algorithms. As was shown in Chapter 2, this association is only very approximate and should be used with caution. In short, an approximate mathematical description of the retention factor dependence on the eluent composition and temperature is written in the form

    k = \exp\!\left( \frac{\sum \Delta G_{\mathrm{an.frag.}}}{RT} - f\,\frac{\Delta G_{\mathrm{el.}}}{RT} \right)    (10-1)

where f is the molar fraction of the organic eluent modifier, ΔG_el. is the Gibbs free energy of the organic eluent modifier interaction with the stationary phase, R is the gas constant, T is the absolute temperature, and ΔG_an.frag. is the Gibbs free energy of the interactions of structural analyte fragments with the stationary phase.

Equation (10-1) is based on the assumption of simple additivity of all interactions and a competitive nature of analyte/eluent interactions with the stationary phase. The paradox is that these assumptions are usually acceptable only as a first approximation, and their application in HPLC sometimes allows the description and prediction of analyte retention versus the variation in eluent composition or temperature. For the most demanding separations, where discrimination of related components is necessary, the accuracy of such prediction is not acceptable. It is obvious from the exponential nature of equation (10-1) that any minor errors in the estimation of interaction energy, or simple underestimation of the mutual influence of molecular fragments (neglected in this model), will generate significant deviations from the predicted retention factors.
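This exponential sensitivity is easy to demonstrate numerically. The following is a minimal sketch (not from the original text) that evaluates equation (10-1) as reconstructed above and shows how an error in a single fragment energy propagates into the predicted retention factor; all numerical values are invented for illustration.

```python
import math

R = 8.314  # gas constant, J/(mol K)

def retention_factor(dG_fragments, dG_eluent, organic_fraction, temperature=298.15):
    """Retention factor from the additive model of equation (10-1).

    dG_fragments     -- Gibbs energies (J/mol) of analyte-fragment/stationary-phase interactions
    dG_eluent        -- Gibbs energy (J/mol) of the organic-modifier/stationary-phase interaction
    organic_fraction -- molar fraction of organic modifier in the eluent
    """
    exponent = sum(dG_fragments) / (R * temperature) - organic_fraction * dG_eluent / (R * temperature)
    return math.exp(exponent)

# Hypothetical fragment energies for a small analyte (invented numbers)
fragments = [1200.0, 800.0, 1500.0]
k_base = retention_factor(fragments, dG_eluent=2500.0, organic_fraction=0.4)

# Even a 5% error in one fragment energy shifts the predicted k
fragments_perturbed = [1200.0 * 1.05, 800.0, 1500.0]
k_perturbed = retention_factor(fragments_perturbed, dG_eluent=2500.0, organic_fraction=0.4)

print(f"k = {k_base:.3f}; k with a 5% fragment-energy error = {k_perturbed:.3f}")
```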
10.2.2 Structure–Retention Relationships

Many attempts to correlate the analyte structure with its HPLC behavior have been made in the past [4–6]. Quantitative structure–retention relationship (QSRR) theory was introduced as a theoretical approach for the prediction of HPLC retention, in combination with the adaptation by Abraham and co-workers of the linear solvation energy relationship (LSER) theory to chromatographic retention [7, 8].

The basis of all these theories is the assumption of the energetic additivity of the interactions of analyte structural fragments with the mobile phase and the stationary phase, and the assumption of a single-process, partitioning-type HPLC retention mechanism. These assumptions allow mathematical representation of the logarithm of the retention factor as a linear function of most continuous parameters (see Chapter 2). Unfortunately, the coefficients are mainly empirical, and a proper description of analyte retention behavior is usually obtained only if the coefficients are determined for structurally similar components on the same column and with the same mobile phase. To date, the shortcomings in the theoretical [22] and functional description of HPLC column properties make all these theories insufficient for practical application to HPLC method design and selection.

In the past, several theoretical models were proposed for the description of the reversed-phase retention process. Some theories, based on a detailed consideration of the analyte retention mechanism, give a realistic physicochemical description of the chromatographic system but are practically inapplicable for routine computer-assisted optimization or prediction because of their complexity [9, 10]. Others allow retention optimization and prediction within a narrow range of conditions and require extensive experimental data for the retention of model compounds at specified conditions [11].

Probably the most widely studied is the solvophobic theory [12], which is based on the assumption of a single partitioning retention mechanism and uses essentially equation (10-1) for the calculation of analyte retention. Carr and co-workers adapted the solvophobic theory [12, 13] and LSER theory [11, 14–17] to elucidate the retention of solutes on nonpolar stationary phases in reversed-phase HPLC systems.

The free energy of transfer of a molecule from the mobile phase to the stationary phase, ΔG, can be regarded as a linear combination of the free retention energies, ΔG_i, arising from various molecular subunits (solvatochromic parameters). Solvatochromic parameters for many analytes can be found in the literature [18–21]. The signs and magnitudes of the coefficients depict the direction and relative strength of the different kinds of solute/stationary-phase and solute/mobile-phase interactions contributing to retention in the investigated matrix [11–15]. The most influential factors governing RP-HPLC retention on alkyl- and phenyl-type bonded phases were determined to be hydrogen bonding and the solute molecular volume [12, 13, 20, 23].
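As a concrete illustration of the LSER idea of additive solute descriptors, the sketch below evaluates an Abraham-type equation of the form log k = c + eE + sS + aA + bB + vV, in which the lower-case coefficients characterize the phase system and the upper-case descriptors characterize the solute. The coefficients and descriptor values used here are invented placeholders, not fitted data from the chapter.

```python
def lser_log_k(system, solute):
    """Abraham-type LSER estimate: log k = c + e*E + s*S + a*A + b*B + v*V."""
    return (system["c"]
            + system["e"] * solute["E"]
            + system["s"] * solute["S"]
            + system["a"] * solute["A"]
            + system["b"] * solute["B"]
            + system["v"] * solute["V"])

# Invented system coefficients (would normally be fitted from a training set of solutes)
system = {"c": -0.40, "e": 0.10, "s": -0.55, "a": -0.30, "b": -1.80, "v": 2.10}

# Invented solute descriptors (E: excess molar refraction, S: dipolarity/polarizability,
# A/B: hydrogen-bond acidity/basicity, V: McGowan characteristic volume)
solute = {"E": 0.80, "S": 0.90, "A": 0.25, "B": 0.45, "V": 1.20}

print(f"predicted log k = {lser_log_k(system, solute):.2f}")
```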
Hydrogen bonding is measured as the effect of complexation between hydrogen-bond acceptor (HBA) solutes and hydrogen-bond donor (HBD) bulk phases [24]. The solute molecular volume comprises two terms: one measures the cohesiveness of the chromatographic phases (both the mobile and stationary phases), and the other is a dispersive term that measures the ability of the chromatographic phases to interact with solutes via dispersive forces.

10.3 OPTIMIZATION OF HPLC METHODS

10.3.1 Off-Line Optimization

The most common software tools used for chromatographic method development are optimization packages. All of these tools take advantage of the fact that the retention of a given compound will change in a predictable manner as a function of virtually any continuous chromatographic variable.

The classic example (and certainly the most common application) of computer-assisted chromatographic optimization is eluent composition, commonly called solvent strength optimization. The chromatographer performs at least two experiments, varying the gradient slope for gradient separations or the concentration of organic modifier for isocratic separations at a certain temperature. The system is then modeled for any gradient or concentration of organic modifier. A simplistic description of chromatographic zone migration through the column under gradient conditions is given in Chapter 2. At isocratic conditions the linear dependence of the logarithm of the retention factor on the eluent composition is used for optimization:

    \ln k = A\phi + B    (10-2)

where k is the retention factor of the compound, φ is the fraction of organic solvent in the mobile phase, and A and B are constants for a given compound, chromatographic column, and solvent system. Based on a few experiments, the constants in this expression can be extracted, and the retention of each compound can be predicted. This optimization approach can be used to model both retention times and selectivities, because the A and B terms are unique for a given analyte.

The typical output from method optimization software is a resolution map, as shown in Figure 10-1. The map shows the resolution of the critical pair (the two closest-eluting peaks) as a function of the parameter(s); the example shows resolution as a function of gradient time (slope of the gradient). The resolution map has several advantages as an experimental display tool: it forms a concise summary of the experiments performed, it allows the chromatographer to select areas of interest and communicate the expected result, and it facilitates the viewing of data that would allow for a more robust separation.

Figure 10-1. DryLab® software version 3.0 modeling the separation of a mixture of naphthalenes. Resolution of the critical pair (the two peaks that elute closest together) is shown as a function of gradient time. Experimental runs are shown as solid lines on the resolution map; the selected prediction is a dashed line.
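The two-run workflow behind equation (10-2) is straightforward to sketch. Assuming two isocratic experiments at different organic fractions, the snippet below fits A and B for each compound and predicts retention factors (and retention times, given a hypothetical column dead time) at a new composition; the retention data are invented for illustration.

```python
import math

def fit_ln_k(phi1, k1, phi2, k2):
    """Fit ln k = A*phi + B from two isocratic experiments."""
    A = (math.log(k2) - math.log(k1)) / (phi2 - phi1)
    B = math.log(k1) - A * phi1
    return A, B

def predict_k(A, B, phi):
    return math.exp(A * phi + B)

# Invented retention factors for two compounds at 30% and 50% organic
experiments = {"compound 1": ((0.30, 8.5), (0.50, 2.1)),
               "compound 2": ((0.30, 9.8), (0.50, 2.0))}

t0 = 1.2       # hypothetical column dead time, min
phi_new = 0.40  # composition at which retention is predicted

for name, ((p1, k1), (p2, k2)) in experiments.items():
    A, B = fit_ln_k(p1, k1, p2, k2)
    k = predict_k(A, B, phi_new)
    print(f"{name}: predicted k = {k:.2f}, tR = {t0 * (1 + k):.2f} min at phi = {phi_new}")
```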
Optimization of the eluent composition is commonly based on the linear relationship of ln k to φ, equation (10-2), and is generally applicable for ideal chromatographic systems with non-ionizable analytes in methanol/water mixtures. It is commonly assumed that:

• A single partitioning-like equilibrium process dominates the retention mechanism.
• Analyte ionization changes do not occur in the pertinent solvent range.
• Column property changes do not occur over the course of the experiment.

As with any optimization tool, the chromatographer should be wary of extrapolation beyond the scope of the training experiments. The behavior of certain parameters, like temperature and solvent strength, is fairly easily modeled. Other parameters, such as buffer concentration and pH, can be much more difficult to model. In these cases, interpolation between fairly closely spaced points (actual experiments that were performed) is most appropriate. Figure 10-2 shows a resolution map for a two-dimensional system in which solvent composition and trifluoroacetic acid concentration are simultaneously optimized. The chromatographer has collected systematic experiments at TFA concentrations of 5, 9, 13, and 17 mM and acetonitrile concentrations of 30, 50, and 70 v/v% for a series of small molecules on a Primesep 100 column.

Figure 10-2. ACD/LC Simulator 9.0 modeling the separation of a series of compounds as a function of solvent composition and TFA concentration (mM). Experiments are shown as white dots on the resolution map, with the predicted optimal method shown in yellow. See color plate.
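Building a two-dimensional resolution map of this kind amounts to interpolating the critical-pair resolution between the grid points that were actually run. The sketch below, which assumes NumPy and SciPy are available and uses invented resolution values at the TFA/acetonitrile grid mentioned above, reports the best interpolated condition; it illustrates the idea and is not the algorithm of any particular vendor.

```python
import numpy as np
from scipy.interpolate import RegularGridInterpolator

# Grid of conditions actually run (see text): TFA (mM) x acetonitrile (v/v %)
tfa_mM = np.array([5.0, 9.0, 13.0, 17.0])
acn_pct = np.array([30.0, 50.0, 70.0])

# Invented critical-pair resolution measured at each grid point (rows: TFA, cols: ACN)
rs_critical = np.array([[1.1, 1.6, 0.9],
                        [1.4, 2.0, 1.2],
                        [1.3, 1.8, 1.0],
                        [0.9, 1.5, 0.8]])

interp = RegularGridInterpolator((tfa_mM, acn_pct), rs_critical)

# Evaluate the model on a fine grid and locate the interpolated optimum
fine_tfa, fine_acn = np.meshgrid(np.linspace(5, 17, 61), np.linspace(30, 70, 81), indexing="ij")
points = np.column_stack([fine_tfa.ravel(), fine_acn.ravel()])
rs_map = interp(points).reshape(fine_tfa.shape)

best = np.unravel_index(np.argmax(rs_map), rs_map.shape)
print(f"best interpolated Rs = {rs_map[best]:.2f} at "
      f"{fine_tfa[best]:.1f} mM TFA, {fine_acn[best]:.0f}% ACN")
```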
Note that the type and concentration of the organic eluent can cause a pH shift of the aqueous portion of the mobile phase, as well as change the ionization state of the analyte in a particular hydro-organic mixture. Temperature can also change the ionization constants of analytes. Even when chromatographers are careful to keep buffer strengths constant while varying the organic solvent strength, the effective analyte pKa and mobile-phase pH change with solvent strength, which can alter the ionization state of compounds, the resultant mobile-phase pH, and/or the behavior of chromatographic columns [25]. Departures from linearity can be particularly striking in acetonitrile as opposed to methanol. For systems in which the greatest possible quality of method is required in terms of resolution, run time, and robustness, the predictions should be verified against experimental data and, where necessary, nonlinear models should be used to refine the predictions and locate the optimal conditions.

Computer-assisted optimization of parameters has not been universally accepted, primarily because of a lack of ease of use. All compounds must be tracked across all experiments, and all retention times must be entered into the system for each component. This is sometimes difficult because significant variations in retention and elution order can be observed for certain analytes.
With diode array detection, even if the different analytes have distinct diode array profiles, the analytes present at low concentration in the mixture may still be difficult to track. The use of MS detection can assist in tracking the peaks across the different experiments, under the assumption that they are not isomers of one another. Software vendors have begun to address much of this with the implementation of automated peak-tracking systems (see Section 10.3.4.2) and direct transfer of experimental information from chromatography data systems.

Advantages of this technique are the efficiency of method development, structured development profiles, and effective reporting of what was performed during the different method development iterations. In addition, it is possible to model the effect of parameter variation on the robustness of methods, in addition to general chromatographic figures of merit: apparent efficiency, tailing, resolution of critical pairs, system backpressure, and total run time.

10.3.2 On-Line Optimization

Recently there has been renewed interest in automated method development, in which the optimization software directly interfaces with the instrument in order to run or suggest new experiments based on the prior results that generated the initial resolution maps. In the late 1980s, a number of approaches to this problem were attempted, but none of these tools prevailed, due in part to the challenges of tracking peaks between experiments. The current second-generation tools offer more promise because of (a) a focus on secondary detection techniques for peak tracking and (b) better automation tools offered by instrument vendors.

The advantage of on-line automation is the time saved during chromatographic method development. The software can make decisions at any time of the day or night and can immediately communicate this information to the instrument after the completion of an experiment. There is also a more subtle benefit to linking the optimization software to the chromatography data system: method development "wizards" with drop-down menus and user-defined fields can simplify the process of configuring the instrument sequence and method prior to a method development session.

Disadvantages of on-line optimization lie primarily in the immaturity of this technology. If manual method development is based on experience and intuition, automated method development should in principle follow the logic of chromatographic theory, which unfortunately is not yet developed enough to provide a logical guide for automated optimization. Software and instrument vendors rely on statistical optimization with minimal use of available theoretical developments, and only at the level of a simple partitioning mechanism and energetic additivity. The capacity of software innovators to address detection limit, peak-tracking, and artificial intelligence issues remains in question at present, but the considerable commitment by instrument and software vendors points to the future value of these tools.
As spectroscopic peak-tracking algorithms mature, the effectiveness of these tools will grow considerably.

10.3.3 Method Screening

Some chromatographic parameters do not readily lend themselves to optimization. There have been efforts to quantify the selectivity of chromatographic columns [26, 27], but it is often difficult to achieve targeted values for each of the parameters involved without custom preparation of materials. Experimental mobile-phase pH values must typically be very close together in order to enable subsequent pH optimization. Column and pH choice are critical to the selectivity of a given system, so it is clear that their effects should not be ignored. One solution to this problem is to screen different columns and pH values prior to commencing any kind of optimization. The screening results are reviewed, and optimization systems at a particular pH are designed accordingly.

With the advent of column switchers and more reproducible alternative column materials, it is now quite feasible to screen multiple pH values (for example, high, medium, and low pH) using scouting gradients in order to choose the column and pH at which to perform further optimization experiments. This is a particularly tempting scenario when few or no chemical structures are available for the synthetic by-products or degradation products in the sample, or when samples are particularly complex. Recently there has been considerable development of systems for concomitant selection of the optimal pH and column type [28].

For complex samples, it can be time-consuming and challenging to review all the results of system screens objectively. In addition, online optimization precludes the direct involvement of the chromatographer. For this reason, it is desirable to use a numerical description of the potential effectiveness of a given set of conditions so that the on-line optimization software can trigger further separations on the chromatographic system.

Screening review tools cannot work solely on the venerable "resolution of the critical pair" approach; the results of an initial screen must be able to give nonzero scores even with co-elution of two components, when the resolution of the critical pair will, of course, be zero. Additionally, a suitability approach involving criteria related to run time is unwise at this stage, since run time can be fine-tuned through solvent strength or flow rate in the final optimization. Rather, at the screening stage the chromatographer should be focused on sufficient selectivity to form the basis of an eventual successful separation; fine-tuning can be performed later. There are a number of different measures of the desirability of an initial screen, including average resolution, resolution of the critical pair, selectivity of critical pairs, and so on. The chromatographer need not be intimately familiar with the nuances of every rating system available. The only key is to be certain that appropriate rating systems are used at appropriate times. Table 10-1 shows some common approaches to the rating of chromatographic column screens [29].

TABLE 10-1. Numerical Approaches to Ranking Separations

Approach: Minimum resolution (resolution of the critical pair)
  Basis: Resolution of the closest-eluting peaks (RCP)
  Application: Final model of separation

Approach: Method suitability
  Basis: Product or minimum of various criteria: run time, resolution of the critical pair, and resistance of the separation to small changes in conditions
  Application: Final model of separation (customizable)

Approach: Mean resolution
  Basis: Average resolution
  Application: Assessment of selectivity

Approach: Run time (RT) versus target (t) and maximum (M)
  Basis: N = 1 if RT < t; N = 0 if RT > M; otherwise N = 1 − (RT − t)/(M − t)
  Application: Evaluating suitability of solvent strength and column choice

Approach: Equidistance
  Basis: Deviation from equally spaced peaks, the ideal spacing being (RunTime − t0)/(n − 1)
  Application: Comparison of starting systems

Approach: Resolution score
  Basis: Average of the normalized resolutions between all peaks detected on a chromatogram, RsScore = (Σ Rs_n)/(N − 1)
  Application: Comparison of starting systems
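To make a few of these ranking criteria concrete, the sketch below (my own illustration, not vendor code) scores a screening run from a list of retention times and baseline peak widths, reporting the critical-pair resolution, the mean resolution, and the run-time criterion from Table 10-1; all peak data are invented.

```python
def pair_resolutions(retention_times, widths):
    """Resolution between adjacent peaks: Rs = 2*(t2 - t1) / (w1 + w2)."""
    order = sorted(range(len(retention_times)), key=lambda i: retention_times[i])
    t = [retention_times[i] for i in order]
    w = [widths[i] for i in order]
    return [2.0 * (t[i + 1] - t[i]) / (w[i] + w[i + 1]) for i in range(len(t) - 1)]

def run_time_score(run_time, target, maximum):
    """Run-time criterion from Table 10-1: 1 below target, 0 above maximum, linear in between."""
    if run_time < target:
        return 1.0
    if run_time > maximum:
        return 0.0
    return 1.0 - (run_time - target) / (maximum - target)

# Invented screening result: retention times (min) and baseline peak widths (min)
times = [2.1, 2.6, 4.0, 4.15, 6.8]
widths = [0.20, 0.22, 0.25, 0.25, 0.30]

rs = pair_resolutions(times, widths)
print(f"critical-pair Rs = {min(rs):.2f}")            # minimum-resolution approach
print(f"mean Rs          = {sum(rs) / len(rs):.2f}")  # mean-resolution approach
print(f"run-time score   = {run_time_score(run_time=max(times), target=6.0, maximum=10.0):.2f}")
```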
10.3.4 Method Optimization

All approaches to method optimization based on multiple experiments require that all components be detected and that they be tracked between runs. For complex samples, this is typically the most labor-intensive aspect of method development. For unattended method development, the instrument is required to monitor the change in retention of each component automatically. The historical limitations of this technology have been a key stumbling block in the widespread adoption of automated method development.

10.3.4.1 Peak Matching in Method Optimization. An initial solution to the problem of peak tracking across multiple experiments was the isolation of each impurity on a preparative or semipreparative scale, followed by injection of each component individually. The chromatographic world has essentially rejected this concept outright: very few chromatographers have the time or willingness to isolate standards for each component. Instead, the use of crude samples and mother liquors enriched with synthetic by-products is recommended for initial method development of a drug substance.
Another approach is to look at the molecule of interest, predict the most probable degradation product(s), and use forced-degradation samples for initial method development. For example, if a compound contains an ester functionality, then acidic stress conditions can be employed to discern the retention of the carboxylic acid degradation product and other resultant degradation products. As another example, if a compound contains a pyridine functionality, it might be subject to oxidation catalyzed under light stress conditions; therefore a forced-degradation solution in the presence of peroxide/light can be used to generate the resultant N-oxide degradation product.

10.3.4.2 LC/UV-Vis and LC/MS. Hyphenated detection in modern chromatography has led to a great deal of interest in automated and semiautomated peak tracking based on diode array and mass spectral data. While several algorithms have been published for the utilization of hyphenated data for peak tracking [30, 31] based on a spectral match angle approach [32], there are few commercially available tools. Multivariate "chemometric" approaches seem to have the most potential for future success. There are two main commercially available approaches to peak tracking using diode array data. In the Waters® AMDS system using DryLab, peaks are tracked with a library search technique, using match angles for extracted spectra. Essentially, after peak picking, spectra are extracted and searched against a library formed from the spectra of other chromatograms. ACD/AutoChrom uses the "mutual automated peak matching" [33], or UV-MAP, approach based on the extraction of pure variables from diode array data. The UV-MAP algorithm applies abstract factor analysis (AFA) followed by iterative key set factor analysis to the augmented data matrix in order to extract retention times for each of the selected experiments.

No commercial system for peak tracking based on mass spectrometry (MS) data has been published to date. Recently [34], a customized MS-based peak-tracking tool was reported using algorithms connecting to the Agilent ChemStation Plus chromatography data system. This algorithm uses a logic-based approach to the extraction of molecular weights from MS data. Components are assigned based on isotope ratio confirmation, adduct assignment, and elution characteristics. Retention time extraction was reported to be approximately 80% successful, with failures primarily attributed to insufficient ionization of components. A similar approach is used in the ACD/AutoChrom product, combining MS with diode array detection in order to address some of the issues with low signal in individual detectors.
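The spectral match angle mentioned above is essentially a normalized dot product between two spectra. The snippet below is a minimal illustration (not the Waters or ACD implementation) that computes the match angle between two UV spectra sampled on a common wavelength axis; identical spectral shapes give 0 degrees and orthogonal spectra give 90 degrees. The spectra are invented.

```python
import math

def match_angle(spectrum_a, spectrum_b):
    """Spectral match angle (degrees) between two spectra on the same wavelength grid."""
    dot = sum(a * b for a, b in zip(spectrum_a, spectrum_b))
    norm_a = math.sqrt(sum(a * a for a in spectrum_a))
    norm_b = math.sqrt(sum(b * b for b in spectrum_b))
    cosine = dot / (norm_a * norm_b)
    return math.degrees(math.acos(max(-1.0, min(1.0, cosine))))

# Invented UV spectra (absorbance at a shared set of wavelengths)
peak_run1 = [0.02, 0.10, 0.45, 0.80, 0.55, 0.20, 0.05]
peak_run2 = [0.03, 0.11, 0.44, 0.82, 0.53, 0.21, 0.04]   # same component, different run
other_component = [0.40, 0.70, 0.30, 0.10, 0.05, 0.02, 0.01]

print(f"same component:  {match_angle(peak_run1, peak_run2):.1f} degrees")
print(f"different peaks: {match_angle(peak_run1, other_component):.1f} degrees")
```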
Disadvantages. Neither the MS nor the ultraviolet (UV) detector provides a complete solution alone. UV spectra simply are not unique enough to differentiate between closely related compounds. Under the conditions typically used for liquid chromatography, compounds may fail to give sufficient ionization with MS detection. In addition, the modification of conditions that is inherent to method development causes spectral and ionization changes. All of these provide a tremendous challenge to software designers, but initial results appear promising.

Advantages. This is critical technology for enabling both automated and routine application of computer-assisted optimization. The manual effort required for traditional approaches to data interpretation in chromatographic method development is quite considerable.

10.3.4.3 Composite Samples and Data Management. A recent trend in pharmaceutical development has been the development of methods for the resolution and quantitation of related compounds based on a "proactive" strategy. During early drug development, a large number of different tests will be conducted on prospective drug candidates, including impurities analysis for stability-indicating methods. The development of methods for this purpose is problematic because final synthetic routes and formulations are not yet established, so the resultant impurity profiles will change as the synthesis is optimized and the final market image is defined. However, in order to avoid impeding the development process, it is important to have quantitative methods readily at hand and then modify them if needed as drug development continues.

Many groups have chosen to approach this problem by developing methods for all anticipated compounds, such that practically any sample configuration can be treated with the same method, or with only a slightly altered set of conditions. One of the more common approaches to method development in the drug substance and drug product groups in the pharmaceutical industry is to first generate forced-decomposition samples (using mild conditions, not more than 5–10% degradation) by treating the compounds with various stress conditions including, typically, UV light, heat, acid, base, and peroxide. These decomposed samples are injected separately, and then a method is designed to separate the components in the forced-degradation samples as if they were all present in the same sample. The development of methods for these "composite samples" is typically required to be exceedingly rigorous: columns, solvent systems, and pH values will be screened, and multidimensional optimization performed. The software tools that have been discussed in this chapter are invaluable for this kind of project. However, there is an additional challenge with this kind of method development: the amount of raw data generated can be particularly daunting.

Before choosing the conditions for optimization, a pH screen (at least five pH values) in either gradient or isocratic mode is generally performed to determine the most suitable pH ranges for the active pharmaceutical ingredient (at least one unit below or above the target analyte pKa in a particular hydro-organic system). This results in at least five experiments on one column using LC/DAD detection.
Once the acceptable pH ranges are determined (where the analyte is predominantly in its ionized state or in its neutral state), column screening can be performed if necessary. If we consider the second step in method development for the API sample to be the screening of six columns at two pH values with a shallow gradient slope, 12 initial screening methods will be generated, each with at least two hyphenated chromatographic traces in the form of LC/MS and LC/DAD data. This results in managing at least 24 hyphenated data traces. If a steep gradient slope were also investigated, the number of hyphenated traces would increase to 48. Then, in the third step, the two best columns at a particular pH with a particular gradient condition would be chosen for analysis of the API sample, the blank, and the five different stressed samples. Thus, for this wave, the chromatographer must review and manage 28 hyphenated data traces (14 LC/MS, 14 DAD). In all of the method development experiments described with this approach, the chromatographer would have to manage a total of 43 chromatograms.

For the massive amounts of data collected with complex samples, peak-matching tools as discussed in Section 10.3.4.2 obviously become invaluable. In addition, it is critical to manage the complex data in an efficient manner. If the user is reduced to cut-and-paste for peak tables, or to manual transfer from the raw data, any reexamination of the data can be very confusing. Typical chromatography data systems organize data by filenames or by sample/project. However, it is critical in this case to organize data according to the experimental method, since for each method there are multiple chromatographic traces, each contributing to the overall experimental result. Recently, software has been designed to manage analytical data in this manner; the original traces are sorted by chromatographic method, with a record of the sample/condition set for which each trace was collected.

In the project architecture, information is grouped according to experimental conditions, or "experiments." Multiple detector traces are arranged for each subsample, with subsamples organized by experiment. Experiments are grouped according to waves that are designed for optimization and/or screening objectives. Finally, one or more waves compose a method development project. Figure 10-3 shows the AutoChrom workspace window, illustrating the organization of chromatograms for individual subsamples in a forced degradation study, with a summary of the components in the composite chromatogram. Multiple detectors for each subsample have been "collapsed" in this view to enable all subsamples to be viewed at once. Figure 10-4 shows the overall data hierarchy.

The advantages of this kind of organization system are clear. Any issues with accuracy of transcription are alleviated. Since peak tables are automatically extracted from the data traces, there is no need for cut-and-paste functions. However, the destination path must be set prior to the transfer, and the proper integration thresholds must be configured. Data can be part of multiple optimization/screening waves at the same time.
Figure 10-3. ACD/AutoChrom 1.0 workspace window.

Figure 10-4. The data hierarchy.

In addition, there are considerable advantages with regard to speed. Since the peak tables are extracted directly from the hyphenated data and summarized in the project window, the user has access to all peak data without loading the full datasets; the raw spectral data are loaded on demand. The primary disadvantage of this approach lies in the setup of the system. If the approach is not combined with instrument control, then a process must be devised for efficient transfer of information to the data system.
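As a purely illustrative picture of the project/wave/experiment/subsample hierarchy described above, the sketch below models it with simple data classes; the class and field names are my own, not the AutoChrom data model.

```python
from dataclasses import dataclass, field
from typing import List

@dataclass
class DetectorTrace:
    detector: str                 # e.g. "DAD" or "MS"
    peak_table: List[float]       # retention times extracted from this trace

@dataclass
class Subsample:
    name: str                     # e.g. "acid-stressed API"
    traces: List[DetectorTrace] = field(default_factory=list)

@dataclass
class Experiment:
    conditions: str               # e.g. "column A, pH 2.5, shallow gradient"
    subsamples: List[Subsample] = field(default_factory=list)

@dataclass
class Wave:
    objective: str                # "screening" or "optimization"
    experiments: List[Experiment] = field(default_factory=list)

@dataclass
class MethodDevelopmentProject:
    compound: str
    waves: List[Wave] = field(default_factory=list)

    def trace_count(self) -> int:
        """Total number of hyphenated traces managed in the project."""
        return sum(len(s.traces)
                   for w in self.waves
                   for e in w.experiments
                   for s in e.subsamples)

# Minimal usage example with invented content
project = MethodDevelopmentProject(
    compound="API-1",
    waves=[Wave(objective="screening",
                experiments=[Experiment(conditions="column A, pH 2.5",
                                        subsamples=[Subsample("API",
                                                              [DetectorTrace("DAD", [2.1, 4.0]),
                                                               DetectorTrace("MS", [2.1, 4.0])])])])])
print(project.trace_count())  # -> 2
```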
10.4 STRUCTURE-BASED TOOLS

It is uncommon for the method development chromatographer to have absolutely no information about the chemical structures present in a given sample; typically, at least one or more compounds are known. Several software tools are intended to enable the chromatographer to leverage knowledge of these structures in order to enhance the method development process. These include knowledge management tools such as application databasing, prediction of physicochemical parameters, and structure-based retention time prediction.

10.4.1 Knowledge Management

The goal of building and deploying a chromatographic and spectral database is to turn disparate experiments into a global chromatographic knowledge base by archiving applications according to chemical structure. This could result in an organization-wide knowledge base in which all relevant information, experimental tests, and results are stored, searchable, and retrievable. It would allow for a more efficient workflow with a homogeneous repository for all relevant data, letting users process, evaluate, compare, and generate reports in one environment.

Success is no longer just about capturing better data; it is about the ability to share that knowledge to help improve the organization's productivity. With improvements in instrument and personnel productivity, today's laboratories are producing significant quantities of scientific data. How can pharmaceutical companies convert the results of this productivity into knowledge? Data need to be captured, processed, and interpreted for immediate use, as well as stored and managed to support future product development. The value of data increases when all researchers are able to access, share, and leverage each other's knowledge. Software and databases that can bridge all instruments, data sources, and information centers to meet these challenges head-on are therefore encouraged.

The motivation for saving methods, including chromatographic and spectral data, is that the information can be communicated to other groups working on the same or similar compounds in other divisional areas. Software that incorporates the tools for creating a chromatographic/spectral knowledge base would be needed to achieve this. The database design could include the chromatography and spectral acquisition details, and these data could be correlated with the structures of drug compounds and their associated impurities, degradation products, metabolites, and so on. If a good starting point can be defined, scientists can save time in their method development work. Programs that allow searching by structure or partial structure can be used to assist with the selection of starting points, and these data can be easily searched.
The method development work that a chromatographer plans to employ may already have been performed in early development or in another department within the organization (data can be shared across oceans). These data could be included in a separations/spectral knowledge base. Based on chemical structures, chromatographers can build on what was done in the past and/or use the previous conditions as excellent starting points for analysis. The main advantages include:

• Structure-based searches of an internal database
• Access to commercially available applications
• Linking chromatographic methods to the structures
• Linking spectral data (MS, NMR, 2D-NMR, IR, UV) to the structures
• Finding applications based on functionality
• Finding the information needed to duplicate an experiment
• Providing the information and means to evaluate and modify an experiment prior to attempting it
• Sharing information cross-functionally (DS, drug substance; DP, drug product; DMPK, drug metabolism and pharmacokinetics; EDD, early drug discovery)

However, as with any technology, a reality check needs to be performed, and it has to be determined whether implementing such a database will add value to the organization. An evaluation of the current workflows needs to be performed, and a critical gap analysis should be completed. The following questions should be analyzed in preparation for database implementation:

• Do the processing and interpretation of analytical data need to be accelerated? If so, in what ways?
• How do we share data now? How do we want to share data in the future?
• Does the retrieval of data need to be faster? If so, can we quantify how much faster it needs to be?
• Does the creation of reports need to be easier and faster?
• How do we currently share data across the globe, especially within multinational pharmaceutical companies that have research and development divisions worldwide (United States, Europe, Asia, etc.)?

Other pertinent questions that could arise during the paper evaluation process include:

• Is there global interest?
• What is the speed of data retrieval?
• Can the database be easily interfaced with the different analytical instrumentation available worldwide (Chromeleon, Empower, MassLynx, etc.)?
• What linkages to research databases are needed?
• Is this technology maturing to the point where it will have a major impact on our business?
• Is the software user-friendly?
• Can it be supported by IT? What platforms are available?
• Will analysts use it?

10.4.2 Applications Databases

One of the primary questions that has plagued method development chromatographers is, "Where do I start?" This question applies equally to any school of thought, whether the chromatographer uses no optimization tools, uses computer-assisted optimization, or even uses on-line optimization. In any of these cases, the chromatographer must choose a proper starting point. One approach to this problem is to use methods developed in the past as a knowledge base for determining a starting point. Stored methods are retrieved, and method development sessions can be designed based on past work performed in different line units of the organization (early drug discovery, preformulation group, DS and DP groups). A key point here is the need for chemical structures to assist with locating similar compounds. It is not likely that researchers will find their exact compounds of interest unless they have been studied before (unless there are comparator products and a USP monograph has been written, with the method entered into the chromatographic database), but substructure and structure similarity searches can find similar compounds that have been the focus of earlier development. These methods can be an excellent pointer to new opportunities.

Structure-based separation databases integrated with other analytical and pharmaceutical information provide a basis for a significant increase in development efficiency. If analytical chemists from the various areas of drug development (drug metabolism, preformulation, formulation, drug substance) enter their separations of the target compounds into the database and link the structures of the potential impurities, degradation products, and metabolites identified, this provides a plethora of information to groups developing methods in later phases of the drug development continuum. It is useful as an interactive tool for sharing information across groups or functions, avoiding replication of method development for difficult separations, and it can provide more suitable starting points for further developing or optimizing the needed separations in the different functional areas. The use and organization of this type of database is discussed below.

It is important to distinguish between chemical formulae and chemical structures. For databases with any type of diversity, the chemical formula cannot provide effective retrieval of compounds. Structure-based searches can take three different approaches:
• Structure
• Substructure
• Structure similarity [35]

Structure searches look for molecules that are identical in every way. Substructure searches can be used to target functionalities that the chromatographer deems instrumental to the separation at hand. Structure similarity searches are the primary tool in the application database: structures are ranked numerically according to similarity, with essentially all reactive groups taken into account. There are a number of different approaches to structure similarity, including Tanimoto, Dice, cosine, Hamming distance, and Euclidean distance [35]. All of these approaches rank structure similarity between 0 and 1 but will give different values; however, the overall ranking of structures tends to be very similar. To date, no structure similarity search algorithm has emerged as clearly superior for the purpose of modeling chromatographic behavior.

Application databases have been particularly popular in the world of chiral method development (Figure 10-5). While it has been observed that small changes in compounds can result in loss of effectiveness (separation selectivity) for a given method, the results of searches can be used to create targeted method screens that reduce the time and expense of development [36].

Figure 10-5. The 2005 version of the ChirBase LC chiral applications database contains over 100,000 entries.
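To illustrate how these similarity measures differ on the same pair of molecules, the sketch below computes Tanimoto, Dice, and cosine similarity from binary structural fingerprints represented as sets of "on" bits. The fingerprints are invented; a real application database would generate them from the stored structures, for example with a cheminformatics toolkit.

```python
import math

def tanimoto(a, b):
    """Tanimoto coefficient: |A ∩ B| / |A ∪ B|."""
    return len(a & b) / len(a | b)

def dice(a, b):
    """Dice coefficient: 2|A ∩ B| / (|A| + |B|)."""
    return 2 * len(a & b) / (len(a) + len(b))

def cosine(a, b):
    """Cosine similarity for binary fingerprints: |A ∩ B| / sqrt(|A| * |B|)."""
    return len(a & b) / math.sqrt(len(a) * len(b))

# Invented binary fingerprints: sets of "on" bit positions for a query and a database entry
query_fp = {3, 17, 42, 87, 120, 121, 200, 305}
database_fp = {3, 17, 42, 87, 121, 200, 310, 412, 500}

print(f"Tanimoto: {tanimoto(query_fp, database_fp):.2f}")
print(f"Dice:     {dice(query_fp, database_fp):.2f}")
print(f"Cosine:   {cosine(query_fp, database_fp):.2f}")
```

All three values fall between 0 and 1 yet differ for the same pair of fingerprints, which is the behavior noted in the text.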
Most commercially available applications databases have some capacity for updating with user applications. This is a key capability, because the most relevant structures are likely to be found within the organization rather than outside it. When updating applications, it is extremely useful to have compatibility with the original chromatography data system, such that methods are read directly from the original data file rather than input manually. However, if manual inputs are required, then form-based inputs with the most common variables should be used to maintain consistency of the information entered. Other fields can be searchable as well; for any searchable field to provide meaningful hits in the future, these fields must be populated. The ease of use of structure similarity searching means that chromatographers can mine these tools for organizational and/or published knowledge in this area in a few seconds. Additionally, any effort to accumulate a knowledge base should be accompanied by careful control of data consistency.

10.4.3 Structure-Based Prediction

10.4.3.1 Prediction of Physicochemical LogP, LogD, pKa. There are three main physicochemical terms of use to the experienced chromatographer: LogP, LogD, and pKa.

LogP (the octanol/water partition coefficient) is the classic measure of the hydrophobicity of an uncharged species. A number of LogP prediction systems are available, including PrologP, clogP, ACD/LogP DB, and others. These systems are consistent in that they estimate the hydrophobicity of the compound based on contributions from characterized fragments (Figure 10-6). The accuracy of these predictions is generally quite good, but the relevance of LogP to liquid chromatography is questionable because of the ionization of many compounds of interest.

Figure 10-6. Prediction of LogP of Viagra with Pallas version 3.1.
Figure 10-7. LogD curve for Viagra, ACD/LogD version 9.00 (note that two tautomeric forms were predicted and only one is shown).

However, LogP calculations can give a very fast estimation of the compound's general nature: is my compound hydrophilic, hydrophobic, or very hydrophobic?

LogD is the measure of the hydrophobicity of a species as it exists in solution. The distinction between LogD and LogP is based on the pKa of the compound; thus, while LogP is a single numerical value, LogD is a function of pH. LogD curves can be very useful in chromatographic method development, since they can assist with the design of robust separations. The flat areas of the LogD curve (Figure 10-7) represent pH ranges that should give stable retention times as a function of pH. However, this is only true for the neutral form of a basic compound and the neutral form of an acidic compound. For basic compounds (or basic functionalities), the lower the pH, the more the ionic equilibrium is shifted toward the protonated form of the analyte, which continually increases its concentration in the aqueous phase and decreases its content in the oil phase; therefore there is no plateau region at low pH. Conversely, for an acidic compound (or acidic functionalities), as the pH is increased the ionic equilibrium is shifted toward the ionized form of the analyte, continually increasing the acidic analyte's concentration in the aqueous phase and decreasing its content in the oil phase; a decrease in the LogD-versus-pH curve is observed at these higher pH values.

The single physicochemical parameter of most importance to the liquid chromatographer is the analyte pKa. The pKa values of the various ionizable functionalities of Viagra are shown in Figure 10-8. Ionization of an analyte
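The LogD behavior described above can be sketched numerically. Assuming the standard single-pKa relationships, LogD ≈ LogP − log10(1 + 10^(pH − pKa)) for a monoprotic acid and LogD ≈ LogP − log10(1 + 10^(pKa − pH)) for a monoprotic base, the following illustration (with invented LogP and pKa values, not data for any specific drug) prints the plateau and fall-off regions discussed above.

```python
import math

def log_d_acid(log_p, pka, ph):
    """LogD of a monoprotic acid: falls off above its pKa."""
    return log_p - math.log10(1 + 10 ** (ph - pka))

def log_d_base(log_p, pka, ph):
    """LogD of a monoprotic base: falls off below its pKa."""
    return log_p - math.log10(1 + 10 ** (pka - ph))

# Invented example compounds (values are illustrative, not measured)
acid = {"log_p": 2.5, "pka": 4.5}
base = {"log_p": 2.5, "pka": 9.0}

for ph in range(1, 13):
    print(f"pH {ph:2d}: acid LogD = {log_d_acid(acid['log_p'], acid['pka'], ph):5.2f}   "
          f"base LogD = {log_d_base(base['log_p'], base['pka'], ph):5.2f}")
```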