Participants
Thirteen Japanese undergraduate students enrolled in a psychology laboratory course volunteered for this study. Nine participants who completed all experimental sessions (seven women, two men; aged 18–19 years) were included in the analyses. Data from the remaining four participants were excluded for protocol deviations: either discontinuing and restarting an experimental session or missing more than two sets of data for unknown reasons. The study protocol was reviewed and approved by the institutional ethics committee, and all participants provided informed consent.
Stimuli
Illustrations. I used GPT-4o (OpenAI, 2024) to generate upper-body, smiling portraits of six anime-style characters (three women, three men; see Figure 1 for the female characters' illustrations).
Names. Six Japanese full names (family name + given name), each totaling four kanji characters, were generated using an online virtual-name generation service (https://namegen.jp/). No kanji character was repeated across names.
Scripts. Seven scenarios were prepared: (1) introduction, (2) lunchtime, (3) break at a part-time job, (4) club activity, (5) café, (6) seminar, and (7) between-class recess. For each scenario and each character, two alternative scripts were produced by editing an initial GPT-4o draft, yielding 84 scripts in total (7 scenarios × 6 characters × 2 scripts). In addition, six "common" scripts, in which all characters read the same line, were prepared for the additional training described under Procedure.
Voices. For each script, an MP3 audio file was generated with an online text-to-speech service. Distinct voice models were selected so that each character had a unique voice; model–character assignments were randomized and then held constant thereafter.
Each participant completed the MTS tasks described below using either the female set or the male set of AI-character stimuli. The mapping between scripts and voices was predetermined and fixed. Pairings of illustrations and names with the scripted voices were finalized after a pretest and individualized based on each participant’s responses.
Supplementary materials. For review purposes, the end of the submitted manuscript includes the illustrations and names of the Japanese AI characters, together with English translations of all dialogue scripts (the original Japanese scripts are not included). Upon acceptance, these materials will be made publicly available via the Open Science Framework (OSF).
Experimental tasks
Four MTS tasks were implemented using lab.js (Henninger et al., 2022) and delivered via a web server using JATOS (Lange et al., 2015):
Voice-to-Illustration (VtoI): a spoken voice served as the sample stimulus; three illustrations were the comparison stimuli.
Voice-to-Name (VtoN): a spoken voice served as the sample stimulus; three printed names were the comparison stimuli.
Illustration-to-Name (ItoN): an illustration served as the sample stimulus; three printed names were the comparison stimuli.
Name-to-Illustration (NtoI): a printed name served as the sample stimulus; three illustrations were the comparison stimuli.
On every trial, the sample and comparison stimuli were presented simultaneously. The visual sample (illustration or printed name) appeared at the top center of the screen. The three comparison stimuli were arranged horizontally at screen center in a randomized order. Participants made their selections by clicking one of the comparison stimuli with the mouse. A block consisted of three MTS trials.
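The trial-assembly logic common to all four tasks can be sketched as follows. This is an illustrative sketch rather than the study's actual lab.js code: the record fields (sample, comparison) are hypothetical, and one trial per character per block is an assumption the text implies but does not state.

```javascript
// Fisher–Yates shuffle, used here to randomize both the order of trials
// within a block and the left-to-right order of the three comparisons.
function shuffle(array) {
  const a = array.slice();
  for (let i = a.length - 1; i > 0; i -= 1) {
    const j = Math.floor(Math.random() * (i + 1));
    [a[i], a[j]] = [a[j], a[i]];
  }
  return a;
}

// Build one three-trial block for a given task. `stimuli` holds one
// hypothetical record per character, e.g., for VtoI:
//   { sample: 'voice_A_scene1.mp3', comparison: 'illustration_A.png' }
function makeBlock(stimuli) {
  return shuffle(stimuli).map(({ sample, comparison }) => ({
    sample,                                               // top center
    comparisons: shuffle(stimuli.map(s => s.comparison)), // horizontal array
    correct: comparison,
  }));
}
```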
During test phases, an additional option labeled “I don’t know” in Japanese was displayed below the comparison array to discourage random guessing. During training phases, correct responses produced a “◯” and incorrect responses produced an “✕” at the bottom center of the screen; no feedback was provided in test phases. To clearly distinguish test and training phases and mitigate changes in responding when feedback was withdrawn, the background color was honeydew (#f0fff0) in test phases and white (#ffffff) in training phases, and the current phase label (“Test” or “Practice” in Japanese) was shown at the top right of the screen (see Procedure below).
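The phase-dependent display settings just described amount to a small configuration, sketched below for illustration (not the actual implementation; the labels appeared in Japanese in the study).

```javascript
// Sketch of the phase-dependent settings (illustrative only).
function phaseSettings(phase /* 'test' | 'training' */) {
  const isTest = phase === 'test';
  return {
    background: isTest ? '#f0fff0' : '#ffffff', // honeydew vs. white
    label: isTest ? 'Test' : 'Practice',        // top right, in Japanese
    showDontKnow: isTest,                       // extra option below the array
    feedback: isTest
      ? null                                    // no feedback in test phases
      : ok => (ok ? '◯' : '✕'),                 // bottom center in training
  };
}
```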
Apparatus
The study was administered online. Participants used their own personal computers. Stimuli were presented and responses were recorded in the web browser, and audio was played through the participants’ headphones. Because data were collected on participants’ devices, hardware and software properties (e.g., CPU performance, operating system and browser, screen size/resolution) and network conditions could not be controlled.
Procedure
Before participation, students were informed of the approximate time required for the experimental session and asked to choose a time window during which they could work in a quiet room without interruption. After reading and signing an informed consent form, participants received the URL for the experiment.
1. Instruction. Participants read the following task instructions (presented in Japanese; translated here) and adjusted the audio volume as directed:

In this experiment, you will learn the voices, faces, and names of three AI characters. When "+" appears on the screen, click it. A line of dialogue, a name, or an illustration will be presented. Decide whose voice, face (illustration), or name it is, then click one of the three choices. If you are unsure, click "I don't know." Note that some trials may not include the "I don't know" option. There are two types of phases. In Test phases, the background is light green, and no feedback on correctness is shown. In Practice phases, the background is white; a "◯" appears for a correct response and an "✕" for an incorrect response. Use this feedback to help you learn. As you continue to respond correctly, the dialogue lines will change. Practice repeats until you maintain correct responses across several lines. When you are ready, click "OK."
2. Pretest. Participants then completed the four MTS tasks in a fixed order (VtoI → VtoN → ItoN → NtoI), with two blocks of each task using Scenario 7 scripts and no feedback. Based on pretest performance, I finalized the correct-response mapping for the MTS tasks separately for each participant: I evaluated all 216 one-to-one illustration–name–voice assignments and selected the candidate that minimized the number of matches between the participant's pretest responses and the candidate's ItoN and NtoI pairs, subject to a prespecified ceiling (≤ 3). Ties were resolved by applying the same rule to VtoI and VtoN (≤ 3), and any remaining ties by the smallest combined total. The full algorithm and commented JavaScript code are provided in the supplementary materials.
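The following condensed sketch illustrates the selection logic under assumed data structures (stimuli in each modality indexed 0–2; pretest responses as task/sample/choice records with "I don't know" trials omitted). It is not the exact supplementary code.

```javascript
// All six permutations of [0, 1, 2]; a candidate assignment is one
// permutation per modality, giving 6 × 6 × 6 = 216 candidates.
const PERMS = [
  [0, 1, 2], [0, 2, 1], [1, 0, 2], [1, 2, 0], [2, 0, 1], [2, 1, 0],
];

// A response record such as { task: 'ItoN', sample: 1, choice: 2 } means
// the participant matched illustration 1 to name 2. Count how many
// responses in the listed tasks agree with a candidate's correct pairs.
function countMatches(responses, tasks, pairOf) {
  return responses.filter(
    r => tasks.includes(r.task) && pairOf(r.task, r.sample) === r.choice,
  ).length;
}

function selectAssignment(responses) {
  const candidates = [];
  for (const illus of PERMS) {
    for (const names of PERMS) {
      for (const voices of PERMS) {
        // For this candidate, pairOf returns the comparison it would
        // score as correct for a given task and sample stimulus.
        const pairOf = (task, sample) => {
          const [from, to] = {
            ItoN: [illus, names], NtoI: [names, illus],
            VtoI: [voices, illus], VtoN: [voices, names],
          }[task];
          return to[from.indexOf(sample)];
        };
        const inMatches = countMatches(responses, ['ItoN', 'NtoI'], pairOf);
        const vMatches = countMatches(responses, ['VtoI', 'VtoN'], pairOf);
        if (inMatches <= 3 && vMatches <= 3) { // prespecified ceilings
          candidates.push({ illus, names, voices, inMatches, vMatches });
        }
      }
    }
  }
  // Minimize ItoN/NtoI matches, break ties by VtoI/VtoN matches, then by
  // the combined total (assumes at least one candidate meets the ceilings).
  candidates.sort((a, b) =>
    a.inMatches - b.inMatches ||
    a.vMatches - b.vMatches ||
    (a.inMatches + a.vMatches) - (b.inMatches + b.vMatches));
  return candidates[0];
}
```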
3. First training phase. Participants were divided such that half first trained on VtoI and the other half first trained on VtoN. Training used Scenarios 1–3. Blocks were repeated until the percentage of correct responses reached 100% within a block.
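The criterion logic reduces to a simple loop, sketched here for illustration (runBlock is a hypothetical function that administers one block with feedback and resolves with the number of correct responses out of three):

```javascript
// Repeat training blocks until one block is answered with 100% accuracy.
// `runBlock` is hypothetical and stands in for the lab.js block routine.
async function trainToCriterion(blocks) {
  for (let i = 0; ; i += 1) {
    const nCorrect = await runBlock(blocks[i % blocks.length]);
    if (nCorrect === 3) return; // 100% correct within a single block
  }
}
```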
4. Midtest. After reaching criterion, participants completed a midtest identical to the pretest (Scenario 7; two blocks; no feedback).
5. Second training phase. After the midtest, all participants were trained on the remaining (previously untrained) MTS task using Scenarios 4–6. Blocks were repeated until participants again achieved 100% correct responses within a block.
6. Posttest. After completing training on both MTS tasks, participants completed a posttest identical to the pretest (Scenario 7; two blocks; no feedback).
Upon finishing the posttest, participants emailed the experimenter. The experimenter reviewed the data; if accuracy was ≥ 80% across all MTS tasks, a thank-you email and an exit questionnaire were sent, asking whether headphones were used and whether any issues (e.g., network interruptions) occurred during the session. No problems were reported. If a participant did not meet the 80% criterion, they were invited to complete additional training with the common scripts and then a follow-up test. During this additional training, scripts 1–3 from the common set were used first, followed by scripts 4–6.
AI-character set (female vs. male characters), participant gender (female vs. male), and training order (VtoI→VtoN vs. VtoN→VtoI) yielded eight possible experimental conditions (2 × 2 × 2). Participants were randomly assigned to conditions, but cell sizes were not balanced because of the limited sample size and attrition.