Terminal deoxynucleotidyl transferase (TdT) is a promising polymerase with several important biotechnological applications. Here, we applied a range of semi-rational and rational protein engineering strategies that yielded improved variants of Mus musculus TdT. These observations may help guide the engineering of other TdTs, as well as engineering campaigns for other enzymes.
Screening only a limited part of the library (in this case, ~6%) once again proved effective, consistent with previous studies [45]. Already at the first stage of saturation mutagenesis, we identified a double mutation that simultaneously improved both catalytic activity and protein expression level.
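A simple sampling model shows why screening a small fraction of a library can still succeed. The sketch below assumes that all library members are equally represented and that clones are picked independently; the library size and the numbers of improved variants are hypothetical and are not values from this study.

```python
def expected_coverage(library_size: int, clones_screened: int) -> float:
    """Expected fraction of distinct library members observed at least once.

    Assumes every member is equally represented and clones are drawn
    independently (sampling with replacement)."""
    return 1.0 - (1.0 - 1.0 / library_size) ** clones_screened


def p_at_least_one_hit(library_size: int, improved: int, clones_screened: int) -> float:
    """Probability that at least one of `improved` beneficial variants is screened,
    under the same equal-representation assumption."""
    return 1.0 - (1.0 - improved / library_size) ** clones_screened


if __name__ == "__main__":
    V = 1000               # hypothetical library size
    n = round(0.06 * V)    # screening ~6% of the library
    print(f"distinct members observed: {expected_coverage(V, n):.3f}")
    for k in (1, 10, 50):  # hypothetical numbers of improved variants
        print(f"k={k:>2}: P(at least one hit) = {p_at_least_one_hit(V, k, n):.2f}")
```

Under these assumptions, screening ~6% of a library observes only about 6% of its distinct members, yet the chance of finding at least one improvement becomes high once improved variants are not rare.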
Analysis of the structural model (Fig. 8) shows that the double substitution E230A/D231T substantially rearranges the local interaction network at the junction of two alpha-helices. The interaction between the original residues D231 and E233 is lost. In the model, the new residue T231 forms contacts with E233 and with the backbone of G227. Furthermore, this substitution appears to induce a local bend in the alpha-helix, which allows an additional hydrogen bond to form between T231 and S235.
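Contacts such as those proposed for T231 can be listed directly from model coordinates. The sketch below uses a distance-only criterion (N/O atoms of different residues within 3.5 Å), an assumption chosen for illustration; it ignores hydrogen positions and angles, so it over-reports compared with a full hydrogen-bond analysis. The residue labels and coordinates are invented.

```python
import math
from itertools import combinations

# (residue label, atom name, element, (x, y, z)); labels and coordinates are invented.
EXAMPLE_ATOMS = [
    ("A1", "OG1", "O", (0.0, 0.0, 0.0)),
    ("A1", "CB", "C", (1.0, 0.0, 0.0)),
    ("B2", "OE1", "O", (2.8, 0.0, 0.0)),
    ("C3", "O", "O", (0.0, 3.0, 0.0)),
    ("D4", "OG", "O", (0.0, 0.0, 5.0)),
]

POLAR_ELEMENTS = {"N", "O"}


def polar_contacts(atoms, cutoff=3.5):
    """Return (atom, atom, distance) for N/O pairs from different residues within `cutoff` Å.

    Distance-only criterion: no hydrogens or angles are checked."""
    contacts = []
    for (res_a, name_a, el_a, xyz_a), (res_b, name_b, el_b, xyz_b) in combinations(atoms, 2):
        if res_a == res_b or el_a not in POLAR_ELEMENTS or el_b not in POLAR_ELEMENTS:
            continue
        d = math.dist(xyz_a, xyz_b)
        if d <= cutoff:
            contacts.append((f"{res_a}:{name_a}", f"{res_b}:{name_b}", round(d, 2)))
    return contacts


if __name__ == "__main__":
    for a, b, d in polar_contacts(EXAMPLE_ATOMS):
        print(f"{a} -- {b}: {d} Å")
```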
Thus, this one double mutation replaces a single interaction with a whole network of new ones, which, according to Foldit calculations, substantially lowers the computed free energy of the structure (ΔΔG). Although these substitutions are located far from the active site, they likely stabilize the overall conformation of the protein and prevent its local unfolding. Modeling in PyMOL also suggests that this double mutation may bend the end of the alpha-helix; in this context, the E230A substitution is favorable because it removes steric hindrance to such a conformational change.
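The sign convention assumed when reading the computed ΔΔG values in this section can be written as follows, where ΔG_mut and ΔG_WT denote the computed energies of the mutant and wild-type models. This is an assumed convention for illustration: some tools and publications define ΔΔG with the opposite sign, so the convention of a specific program should be checked before comparing values.

```latex
\Delta\Delta G = \Delta G_{\mathrm{mut}} - \Delta G_{\mathrm{WT}},
\qquad
\Delta\Delta G < 0 \;\Longrightarrow\; \text{mutant model predicted to be more stable}
```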
The Q287 site is of particular interest. According to the X-ray structure (PDB: 4i27), residues Q287 and K290 interact in the wild-type protein. However, during the first two rounds of saturation mutagenesis, substitutions that disrupt this interaction were selected. This apparent paradox may be explained by structural relaxation in Foldit: in the optimized model, this interaction disappears. This is consistent with the high B-factor at this site, which indicates inherent flexibility and suggests that the interaction may be transient.
The mechanism behind the stabilizing effect of the Q287K substitution remains unclear, as it does not form any obvious new bonds. In contrast, the positive effect of the Q287P substitution found in the first round can be explained by reduced backbone flexibility due to the rigid proline ring. The proline variant was possibly not sampled during the second round of mutagenesis and was therefore not identified as the best mutant. Notably, Foldit's neural-network-based mutation prediction tool suggests proline as the most promising substitution at this position.
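Whether a given substitution such as Q287P is likely to be missed can be estimated from the codon scheme and the number of clones screened. The codon scheme used for saturation mutagenesis is not stated in this section; the sketch below assumes NNK codons (under which proline is encoded by 2 of 32 codons), equal codon representation, and hypothetical clone numbers.

```python
from collections import Counter
from itertools import product

BASES = "TCAG"
# Standard genetic code; codons enumerated with the first base varying slowest (TCAG order).
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
GENETIC_CODE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}

# IUPAC degenerate bases used by common saturation-mutagenesis schemes.
IUPAC = {"N": "TCAG", "K": "GT", "S": "CG"}


def codon_counts(scheme="NNK"):
    """Number of codons in a degenerate scheme encoding each amino acid ('*' = stop)."""
    pools = [IUPAC[symbol] for symbol in scheme]
    return Counter(GENETIC_CODE["".join(codon)] for codon in product(*pools))


def p_missed(amino_acid, clones_screened, scheme="NNK"):
    """Probability that none of the screened clones encodes `amino_acid`,
    assuming equal codon representation and independent picks."""
    counts = codon_counts(scheme)
    fraction = counts[amino_acid] / sum(counts.values())
    return (1.0 - fraction) ** clones_screened


if __name__ == "__main__":
    print("NNK codons for Pro:", codon_counts()["P"], "of", sum(codon_counts().values()))
    for n in (10, 30, 60):  # hypothetical numbers of clones screened per site
        print(f"{n} clones: P(Pro missed) = {p_missed('P', n):.2f}")
```

Under these assumptions, roughly half of all 10-clone samples would contain no proline codon at a given site.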
In the first round of mutagenesis at position S223, a substitution to arginine (S223R) was selected. This coincides with results previously obtained by [23] and supports the significance of this mutation. However, in the third round of iterative mutagenesis, carried out on a template that already contained other substitutions, arginine at this position was replaced by lysine (S223K).
The original residue S223 forms a polar interaction with E226 within the same alpha-helix. The S223R substitution likely either strengthens this interaction or enables a new salt bridge with E236 on the adjacent alpha-helix. The preference for lysine over arginine in the third round may reflect a synergistic effect with the first-round mutations (E230A/D231T). Modeling in PyMOL suggests that the combination of these substitutions forms an extensive network of ionic bonds connecting the two adjacent alpha-helices, which would contribute substantially to the overall stabilization of the structure.
Positions R182 and E183 also showed interesting features. According to the X-ray data, the original residue R182 forms interactions that connect different parts of the protein globule (Fig. S4), which at first glance makes it an undesirable target for mutagenesis. However, this residue has a high B-factor, indicating mobility. This is consistent with Foldit modeling: after structural relaxation, most of the initial interactions of R182 disappear, leaving only a contact with N178 within the same alpha-helix. Despite the apparent importance of the original residues, the double mutation R182S/E183R slightly increased thermostability and, more importantly, significantly increased the protein's expression level.
In Foldit, the M313R substitution appeared promising, as it was predicted to create a new intra-helical ionic bond with E317. Alternative modeling in PyMOL instead indicated an interaction of this arginine with the adjacent flexible Loop 2, and another modeling scenario suggested a salt-bridge network linking three residues: the introduced R313, E317, and R414 from Loop 2. Whatever the exact nature of the interaction, the mutation slightly increased thermostability, likely owing to the stabilizing effect of the new salt bridge. Similarly, for the C302E mutation, modeling predicted a new interaction with T340. A key feature of this substitution is its close proximity to the active site: T340 is part of the loop involved in coordinating the Mg²⁺ ion. Contrary to the expectation that changes in this critical region would compromise activity, the mutation slightly increased the enzyme's thermostability. The effect of this substitution on specificity toward different nucleotides requires a separate, detailed investigation.
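A three-residue network like the one proposed for R313, E317 and R414 can be checked systematically on any candidate model. The sketch below flags salt bridges with a simple geometric criterion (any carboxylate oxygen of Asp/Glu within 4.0 Å of a side-chain nitrogen of Lys/Arg/His, with His assumed to be charged) and groups bridged residues into networks. The residue IDs and coordinates are invented for illustration.

```python
import math
from collections import defaultdict

# Side-chain atoms treated as charged groups (His counted as positive: an assumption).
ACIDIC = {"ASP": ("OD1", "OD2"), "GLU": ("OE1", "OE2")}
BASIC = {"LYS": ("NZ",), "ARG": ("NE", "NH1", "NH2"), "HIS": ("ND1", "NE2")}

# Invented residues and coordinates: residue id -> (residue name, {atom: (x, y, z)}).
EXAMPLE = {
    "R1": ("ARG", {"NE": (10.0, 10.0, 10.0), "NH1": (0.0, 3.0, 0.0), "NH2": (10.0, 10.0, 12.0)}),
    "E2": ("GLU", {"OE1": (0.0, 0.0, 0.0), "OE2": (2.0, 0.0, 0.0)}),
    "R3": ("ARG", {"NE": (20.0, 20.0, 20.0), "NH1": (20.0, 20.0, 22.0), "NH2": (2.0, 0.0, 3.2)}),
    "K4": ("LYS", {"NZ": (50.0, 50.0, 50.0)}),
    "D5": ("ASP", {"OD1": (50.0, 50.0, 53.5), "OD2": (60.0, 60.0, 60.0)}),
}


def salt_bridges(residues, cutoff=4.0):
    """(acid id, base id) pairs with any carboxylate O within `cutoff` Å of a basic N."""
    pairs = []
    for acid_id, (acid_name, acid_atoms) in residues.items():
        if acid_name not in ACIDIC:
            continue
        for base_id, (base_name, base_atoms) in residues.items():
            if base_name not in BASIC:
                continue
            if any(math.dist(acid_atoms[o], base_atoms[n]) <= cutoff
                   for o in ACIDIC[acid_name] if o in acid_atoms
                   for n in BASIC[base_name] if n in base_atoms):
                pairs.append((acid_id, base_id))
    return pairs


def networks(pairs):
    """Group residues connected by salt bridges into networks (connected components)."""
    graph = defaultdict(set)
    for a, b in pairs:
        graph[a].add(b)
        graph[b].add(a)
    seen, groups = set(), []
    for start in list(graph):
        if start in seen:
            continue
        component, stack = set(), [start]
        while stack:
            node = stack.pop()
            if node not in component:
                component.add(node)
                stack.extend(graph[node] - component)
        seen |= component
        groups.append(sorted(component))
    return groups


if __name__ == "__main__":
    print(networks(salt_bridges(EXAMPLE)))
```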
The largest contribution to the enzyme's stabilization came from the double substitution S275E/K276E (Fig. 9), designed on the basis of predictions from the ProteinMPNN and Carbonara algorithms. These mutations were introduced into mutant A4, the best variant from the previous rounds of saturation mutagenesis, and increased the protein's melting temperature by 5.5°C. Notably, exactly the same substitutions were independently obtained in a large-scale directed evolution study by [25]. Structural analysis suggests that K276E helps to order the disordered region between two alpha-helices by forming new interactions with the backbones of R272 and T273, making this initially flexible region more rigid. Although distant from the catalytic center, these substitutions lie in a domain involved in binding single-stranded DNA. Elevated temperature likely causes local unfolding of this region, which interferes with substrate binding; stabilizing the region may prevent this. The contribution of S275E to overall stability is less clear: Foldit calculations indicate that it lowers the overall computed free energy, but its individual contribution to stabilization requires experimental verification.
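One simple way to combine two sequence-design models is to keep only substitutions on which both agree. The sketch below assumes each model's predictions have been exported as per-position residue probabilities in a plain dictionary; this format, the 0.3 probability threshold and all values are hypothetical and represent neither the actual output of ProteinMPNN or Carbonara nor the selection procedure used in this work.

```python
# Hypothetical per-position residue probabilities from two sequence-design models.
# Positions are 1-based; values are invented and do not reflect real model output.
WT_SEQ = "ASKLE"
MODEL_A = {2: {"S": 0.2, "E": 0.6, "D": 0.2}, 3: {"K": 0.5, "E": 0.4}, 4: {"L": 0.3, "I": 0.7}}
MODEL_B = {2: {"S": 0.3, "E": 0.5}, 3: {"E": 0.6, "K": 0.3}, 4: {"V": 0.6, "L": 0.4}}


def consensus_substitutions(wt_seq, probs_a, probs_b, min_prob=0.3):
    """Positions where both models' top residue is the same non-wild-type residue.

    Returns (mutation label, probability in model A, probability in model B)."""
    hits = []
    for pos in sorted(probs_a.keys() & probs_b.keys()):
        wt = wt_seq[pos - 1]
        top_a = max(probs_a[pos], key=probs_a[pos].get)
        top_b = max(probs_b[pos], key=probs_b[pos].get)
        p_a, p_b = probs_a[pos][top_a], probs_b[pos][top_b]
        # Chained comparison: both models agree, and the agreed residue differs from WT.
        if top_a == top_b != wt and min(p_a, p_b) >= min_prob:
            hits.append((f"{wt}{pos}{top_a}", p_a, p_b))
    return hits


if __name__ == "__main__":
    print(consensus_substitutions(WT_SEQ, MODEL_A, MODEL_B))
```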
Unfortunately, although this double substitution stabilized the enzyme without loss of activity, it substantially decreased its expression level. Whereas the A4 mutant obtained by saturation mutagenesis showed a 26-fold increase in expression over the wild-type protein, introducing this double mutation completely eliminated this advantage: the expression level of the final variant returned to that of the wild-type enzyme.
We therefore conclude that the substitutions with the greatest stabilizing effect were those that brought 3–4 amino acids into coordinated interactions at the junction of secondary structure elements. The distance of such substitutions from the active site was not a decisive factor. All three successful substitutions predicted by classical rational design (without neural networks) replaced a hydrophobic residue with a hydrophilic one, creating new polar bonds. Attempts to modify the existing system of ionic bonds by rational mutagenesis were essentially unsuccessful. Only the application of two neural-network algorithms revealed a non-obvious substitution that made the largest contribution to thermostability.
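The shift from hydrophobic to hydrophilic residues can be quantified with a hydropathy scale. The sketch below uses the Kyte-Doolittle scale as one common choice (an assumption, not a method from this study) and computes the change for substitutions named in this section.

```python
import re

# Kyte-Doolittle hydropathy index (positive = hydrophobic, negative = hydrophilic).
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5, "Q": -3.5, "E": -3.5,
    "G": -0.4, "H": -3.2, "I": 4.5, "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8,
    "P": -1.6, "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def hydropathy_change(mutation):
    """Change in Kyte-Doolittle hydropathy for a substitution written like 'M313R'."""
    match = re.fullmatch(r"([A-Z])(\d+)([A-Z])", mutation)
    if match is None:
        raise ValueError(f"unrecognized mutation: {mutation!r}")
    wild_type, _, mutant = match.groups()
    return KYTE_DOOLITTLE[mutant] - KYTE_DOOLITTLE[wild_type]


if __name__ == "__main__":
    for m in ("M313R", "C302E", "S275E", "K276E"):
        print(f"{m}: {hydropathy_change(m):+.1f}")
```

On this scale, M313R and C302E are large shifts toward hydrophilicity, whereas K276E is nearly neutral because it reverses charge rather than changing polarity, a change that hydropathy-based reasoning alone would not flag.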
The direct screening methodology we employed, based on the analysis of mutant activity in crude cell lysates via polyacrylamide gel electrophoresis (PAGE), has several inherent limitations.
The primary drawback is its low sensitivity, which prevents reliable detection of subtle improvements in thermostability. Consequently, mutations that slightly increase stability but concurrently decrease catalytic activity or expression level are likely to be falsely discarded: the resulting weaker product band on the gel would be misinterpreted as a negative result, and potentially valuable variants would be lost.
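The false-negative mechanism described above can be made explicit with a toy model in which band intensity is the product of expression, specific activity and the fraction of activity surviving heating. The model, its linearity assumption and the variant values are hypothetical. The heated/unheated ratio at the end is shown only as an illustration of how an unheated control, if run in parallel, could separate stability from expression and activity.

```python
from dataclasses import dataclass


@dataclass
class Variant:
    """Toy model of a lysate screen; all quantities are relative to the parent (1.0)."""
    name: str
    expression: float         # relative amount of soluble enzyme in the lysate
    specific_activity: float  # relative activity per amount of enzyme
    residual: float           # fraction of activity surviving the heat step

    def band_unheated(self) -> float:
        # Assumes band intensity is proportional to total activity (linear range).
        return self.expression * self.specific_activity

    def band_heated(self) -> float:
        return self.band_unheated() * self.residual


def rank_by_heated_band(variants):
    """Ranking that a screen based only on band intensity after heating would produce."""
    return sorted(variants, key=lambda v: v.band_heated(), reverse=True)


def normalized_residual(v: Variant) -> float:
    """Heated/unheated ratio: cancels expression and activity, leaving stability."""
    return v.band_heated() / v.band_unheated()


PARENT = Variant("parent", expression=1.0, specific_activity=1.0, residual=0.50)
STABILIZED = Variant("stabilized", expression=0.7, specific_activity=1.0, residual=0.60)

if __name__ == "__main__":
    print([v.name for v in rank_by_heated_band([PARENT, STABILIZED])])
    for v in (PARENT, STABILIZED):
        print(v.name, round(normalized_residual(v), 2))
```

In this toy case the more stable variant ranks below its parent on raw heated-band intensity, but above it once normalized.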
Work aimed at increasing the thermostability of TdT resulted in a comprehensive improvement of the enzyme's key properties. Along with thermostability, catalytic activity and, importantly, protein expression level increased significantly. For instance, the yield of the wild-type (WT) protein was 1.5 mg per liter of culture, whereas after the first round of mutagenesis it reached 6 mg/L, and the best mutant from the final round of saturation mutagenesis (A4) yielded 40 mg/L. Activity at 37°C also increased gradually, because the more active a mutant was, the brighter the band it produced on the gel and the more likely it was to be selected. Likewise, when residual activity was assessed after 10 minutes of heating, a mutant with higher activity and higher expression would give a more intense band regardless of any gain in thermostability.
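The yields quoted above correspond to the following fold changes relative to the wild type; 40/1.5 ≈ 26.7, consistent with the roughly 26-fold increase reported for A4. The snippet only restates the arithmetic on the reported numbers.

```python
# Yields quoted in the text, in mg of protein per liter of culture.
YIELDS_MG_PER_L = {"WT": 1.5, "after round 1": 6.0, "final round": 40.0}


def fold_changes(yields, reference="WT"):
    """Fold change of each yield relative to the reference variant."""
    base = yields[reference]
    return {name: value / base for name, value in yields.items()}


if __name__ == "__main__":
    for name, fold in fold_changes(YIELDS_MG_PER_L).items():
        print(f"{name}: {fold:.1f}-fold")
```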
Comparison of the mutagenesis strategies showed that targeting residues with high B-factors is effective, whereas random mutations outside these flexible regions were unsuccessful. Rational design attempts aimed at "stitching" the enzyme's domains together to prevent unfolding were also largely unsuccessful. This underscores that modern computational tools, particularly neural networks, are critically important for narrowing the search space and identifying non-obvious but effective mutations.
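Selecting high-B-factor residues, as was done for sites such as R182 and Q287, can be automated. The sketch below reads ATOM records from a PDB-format file (such as 4i27), averages B-factors per residue and ranks residues by z-score within the chain. The per-residue averaging and z-score ranking are illustrative choices rather than the selection procedure used in this work, and the example records are invented.

```python
from collections import defaultdict
from statistics import mean, pstdev


def residue_bfactors(pdb_lines, chain="A"):
    """Mean B-factor per residue for one chain, from fixed-column PDB ATOM records.

    Alternate locations and insertion codes are ignored for simplicity."""
    per_residue = defaultdict(list)
    for line in pdb_lines:
        if not line.startswith("ATOM") or line[21] != chain:
            continue
        key = (line[17:20].strip(), int(line[22:26]))  # (residue name, residue number)
        per_residue[key].append(float(line[60:66]))     # temperature factor, columns 61-66
    return {key: mean(values) for key, values in per_residue.items()}


def most_flexible(bfactors, top=10):
    """Residues ranked by B-factor z-score within the chain, highest first."""
    values = list(bfactors.values())
    mu, sd = mean(values), pstdev(values)
    z = {key: (b - mu) / sd if sd else 0.0 for key, b in bfactors.items()}
    return sorted(z.items(), key=lambda item: item[1], reverse=True)[:top]


def atom_line(serial, name, resname, chain, resseq, bfactor):
    """Build a minimal fixed-column ATOM record (coordinates zeroed) for testing."""
    return (f"ATOM  {serial:5d} {name:^4} {resname:>3} {chain}{resseq:4d}    "
            f"{0.0:8.3f}{0.0:8.3f}{0.0:8.3f}{1.0:6.2f}{bfactor:6.2f}")


# Invented records; with a real structure, pass the lines of the downloaded PDB file.
EXAMPLE_LINES = [
    atom_line(1, "CA", "LYS", "A", 10, 85.0),
    atom_line(2, "CB", "LYS", "A", 10, 95.0),
    atom_line(3, "CA", "GLN", "A", 20, 70.0),
    atom_line(4, "CA", "ALA", "A", 30, 20.0),
    atom_line(5, "CA", "GLY", "B", 5, 99.0),
]

if __name__ == "__main__":
    for (resname, resseq), z in most_flexible(residue_bfactors(EXAMPLE_LINES), top=2):
        print(f"{resname}{resseq}: z = {z:.2f}")
```

Because absolute B-factors depend on resolution and refinement, a within-chain z-score is used here so that rankings from different structures are easier to compare.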
Thus, the initial task of increasing only thermostability led to variants improved in all main parameters: thermostability, activity and, in the case of A4, expression level. The final variant is an excellent scaffold for further engineering, particularly for improving its activity toward 3'-modified nucleoside triphosphates used in enzymatic DNA synthesis technologies.
A comparison of our final variant, 275, with recently engineered TdTs highlights the distinctive strengths of our approach. The thermostability gain achieved for mutant 275 (ΔTₘ = +6.5°C) is substantial, although larger increases were reported by Niu et al. (ΔTₘ = +10.3°C for M7-8) and Forget et al. (ΔTₘ = +20°C for TdT-33).
However, our work highlights a parameter critical for the biotechnological viability of an enzyme: the recombinant expression level. Our intermediate variant, A4, showed a remarkable 26-fold increase in protein yield over the wild type. This dramatic improvement in manufacturability is a key advantage for any industrial application and was a central outcome of our engineering trajectory; this metric is not quantitatively reported in the studies cited above.
Furthermore, our stepwise evolution strategy (WT → A4 → 275) provides a clear, interpretable path of improvement that allows detailed structural analysis of the contribution of key mutations, as discussed above. This contrasts with approaches that introduce many mutations simultaneously, where the effects of individual changes can be difficult to disentangle. Thus, our final variant 275 is a well-balanced candidate, combining a significant increase in stability with the improved activity inherited from its A4 parent. This makes it a valuable and technologically accessible enzyme for enzymatic DNA synthesis and a robust scaffold for further engineering.