
Artificial Intelligence: Home

Created by Abe Nemon, Kelsi Dunman, and Sarah deRosa

Generative artificial intelligence (GAI): an FAQ


What is generative artificial intelligence?

Image: A screenshot of a conversation with ChatGPT 4o. In it, the model says it does not have consciousness and does not experience anything it is doing.

tl;dr - It's a computer you can talk to, and it (deceptively) appears to understand.

Generative artificial intelligence (GAI) is a form of computer application that receives data from users in highly accessible forms like text, images, and sound files, and uses predictive data-crunching to generate responses that plausibly imitate the human modes of expression contained in its training data. Whereas prior computer applications required either well-formed computer code or a structured user interface for users to interact with them, the built-in natural language processing (NLP) and natural language generation (NLG) capabilities of GAI tools enable users to talk to them and issue instructions as though interacting with a person, even though the affordances and limitations of GAIs and humans often differ in fundamental ways.

  1. Minaee, S., Mikolov, T., Nikzad, N., Chenaghlu, M., Socher, R., Amatriain, X., & Gao, J. (2024). Large Language Models: A Survey (Version 3). arXiv. https://doi.org/10.48550/ARXIV.2402.06196; Chan, C. K. Y., & Colloton, T. (2024). Technology Behind GenAI. In Generative AI in Higher Education: The ChatGPT Effect (1st ed.). Routledge. https://doi.org/10.4324/9781003459026; Narayanan, A., & Kapoor, S. (2024). AI snake oil: What artificial intelligence can do, what it can’t, and how to tell the difference. Princeton University Press.
  2. Bishop, J. M. (2021). Artificial Intelligence Is Stupid and Causal Reasoning Will Not Fix It. Frontiers in Psychology, 11. https://doi.org/10.3389/fpsyg.2020.513474; Larson, E. J. (2021). The myth of artificial intelligence: Why computers can’t think the way we do. The Belknap Press of Harvard University Press; West, P., Lu, X., Dziri, N., Brahman, F., Li, L., Hwang, J. D., Jiang, L., Fisher, J., Ravichander, A., Chandu, K., Newman, B., Koh, P. W., Ettinger, A., & Choi, Y. (2023). The Generative AI Paradox: “What It Can Create, It May Not Understand” (No. arXiv:2311.00059). arXiv. https://doi.org/10.48550/arXiv.2311.00059; Browning, J. (2024). Personhood and AI: Why large language models don’t understand us. AI & SOCIETY, 39(5), 2499–2506. https://doi.org/10.1007/s00146-023-01724-y; Kuzma, J. D. (2025). The Irreplicable Nature of Human Intuition: A Critical Response to Claims of Artificial Intuition. Philosophy & Technology, 38(2), 59. https://doi.org/10.1007/s13347-025-00894-5; Webster, C. S. (2025). Natural and artificial intelligence – the psychotechnical agenda of the 21st century. Journal of Psychology and AI, 1(1), 2491445. https://doi.org/10.1080/29974100.2025.2491445; Johnson, S. G. B., Karimi, A.-H., Bengio, Y., Chater, N., Gerstenberg, T., Larson, K., Levine, S., Mitchell, M., Rahwan, I., Schölkopf, B., & Grossmann, I. (2025). Imagining and building wise machines: The centrality of AI metacognition (No. arXiv:2411.02478). arXiv. https://doi.org/10.48550/arXiv.2411.02478


How does it work?

tl;dr - It calculates mathematical relationships between words and uses those relationships to predict which words are most likely to follow which. AI companies then pay human evaluators (RLHF) to rate its responses in order to steer the AI away from answers that might cause the AI company trouble or endanger users; but this is like playing whack-a-mole and often doesn't work.

Generative artificial intelligence models are trained using large datasets containing trillions of words and subwords, which are converted into smaller units called “tokens.” Each token is represented by a numerical vector known as an “embedding,” which captures its relationship to other tokens in the dataset. During training, the AI analyzes these embeddings to learn patterns and associations between tokens, predicting which tokens are likely to follow others. It does this by processing the vectorized tokens through a neural network, that is, a network of nodes.

Inside the AI, there are many sets of calculations called nodes. Each node takes input numbers from the vectors and multiplies them by weights assigned to that node. When the AI model is first created, these weights are random starting values assigned by a random number generator. Then, over many training passes, the AI is made to guess which token comes next in a sequence. It compares these guesses to the correct answers and adjusts the weights throughout its network of nodes to make fewer mistakes. By repeating this across many nodes and layers, the model builds up increasingly sophisticated representations of the network of embedded tokens, and in turn, of the human language captured by the data it was trained on. This approach of allowing the AI to learn by testing and refining its next-word predictions across multiple stacked layers is called unsupervised or deep learning.
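To make this training loop concrete, here is a deliberately tiny sketch in Python. It is a toy, not how production LLMs are built: the nine-word corpus, the learning rate, and the single table of weights are invented for illustration, and real models use subword tokens, billions of weights, and many stacked layers. But it runs the same cycle described above: guess the next token, measure the error, and nudge randomly initialized weights so the model makes fewer mistakes.

```python
# A toy, illustrative sketch of next-token training (NOT how production LLMs
# are built): a tiny "bigram" predictor trained with gradient descent.
# The miniature corpus, learning rate, and single table of weights are all
# invented for illustration.
import math
import random

corpus = "the cat sat on the mat the cat ate".split()
vocab = sorted(set(corpus))
idx = {w: i for i, w in enumerate(vocab)}
V = len(vocab)

random.seed(0)
# One weight per (current token, possible next token) pair, randomly
# initialized -- standing in for the weights inside a network's nodes.
W = [[random.uniform(-0.1, 0.1) for _ in range(V)] for _ in range(V)]

def softmax(logits):
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

lr = 0.5
for _ in range(200):                          # many passes over the data
    for cur, nxt in zip(corpus, corpus[1:]):
        i, j = idx[cur], idx[nxt]
        probs = softmax(W[i])                 # guess the next token
        grad = probs[:]                       # compare guess to the answer
        grad[j] -= 1.0                        # (cross-entropy gradient)
        for k in range(V):
            W[i][k] -= lr * grad[k]           # nudge weights to err less

# After training, the model puts the highest probability on the token that
# most often followed "the" in its (tiny) training data.
probs = softmax(W[idx["the"]])
print(max(zip(probs, vocab)))                 # -> roughly (0.67, 'cat')
```

The output is purely a reflection of co-occurrence in the training data: "cat" wins only because it followed "the" more often than "mat" did.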

Image: Three cartoon bears in a forest criticize an AI crystal labeled Model v5.0; one bear says it's Too dry!, another says Too sycophantic!, and a third says Almost just right... while a worried robot applies Patch 7.3 with a wrench. A nearby sign reads, Now serving: Alignment à la mode.
Image generated with ChatGPT

But these models struggled to connect information from the beginning of a sequence of tokens to information in later parts of the sequence. A solution was found by Google researchers in 2017 with the creation of the transformer, whose "attention mechanism" enables the model to consider all tokens in a sequence simultaneously and assess their relevance for generating predictions.
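The following sketch, again a plain-Python toy, illustrates the core of the attention idea under simplifying assumptions: each token's vector is scored against every other token's vector, the scores are turned into weights, and each token's output becomes a weighted mix of the whole sequence. Real transformers also learn separate query, key, and value projections and run many attention heads in parallel; the two-dimensional vectors here are made up.

```python
# A minimal sketch of the "attention" idea: every token scores its relevance
# against every other token at once, and the (softmaxed) scores decide how
# much of each token's vector gets mixed into the output.
import math

def softmax(xs):
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

def self_attention(embeddings):
    d = len(embeddings[0])
    outputs = []
    for q in embeddings:                                  # each token...
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d)
                  for k in embeddings]                    # ...scores all tokens
        weights = softmax(scores)                         # relevance weights
        mixed = [sum(w * v[j] for w, v in zip(weights, embeddings))
                 for j in range(d)]                       # weighted mix
        outputs.append(mixed)
    return outputs

tokens = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0]]             # toy "embeddings"
for row in self_attention(tokens):
    print([round(x, 2) for x in row])
```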

The complex network of relationships formed by these embeddings is known as the model’s “latent space.” After initial training, developers often refine the model’s behavior using a method called “reinforcement learning from human feedback” (RLHF), where human evaluators provide guidance to improve the model’s responses, particularly to address issues related to ethical considerations and user experience. (As has been true with social media content filtering on websites like Facebook and Instagram, much of this review labor is carried out for low pay by workers recruited from countries in the Global South, as well as through countless uncredited contributions by individuals with humanities expertise directly and indirectly employed by AI companies.)
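As a rough illustration of the human-feedback step, here is a heavily simplified, hypothetical sketch: human raters choose which of two candidate responses they prefer, and a simple scoring function is nudged so that preferred responses score higher. The feature tags and update rule (a Bradley-Terry-style comparison) are invented for illustration; actual RLHF trains a separate reward model on such comparisons and then uses reinforcement learning to steer the language model toward higher-scoring outputs.

```python
# A toy sketch of the preference step behind RLHF: raters pick which of two
# candidate responses they prefer, and a simple "reward" score is nudged so
# preferred responses score higher (a Bradley-Terry-style comparison).
# The feature tags and numbers are invented for illustration.
import math

weights = {"polite": 0.0, "toxic": 0.0, "concise": 0.0}   # made-up features

def reward(features):
    return sum(weights[f] for f in features)

# (features of the response the human preferred, features of the rejected one)
human_preferences = [
    (["polite", "concise"], ["toxic"]),
    (["polite"], ["toxic", "concise"]),
]

lr = 0.1
for _ in range(100):
    for chosen, rejected in human_preferences:
        margin = reward(chosen) - reward(rejected)
        p = 1 / (1 + math.exp(-margin))      # how strongly the scores already agree
        for f in chosen:                     # push preferred features up...
            weights[f] += lr * (1 - p)
        for f in rejected:                   # ...and rejected features down
            weights[f] -= lr * (1 - p)

print(weights)   # "toxic" ends up with a clearly negative weight
```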

  1. Tokenization is a longstanding concept in linguistics research, but two foundational textbooks in the field of natural language processing are Jurafsky, D., & Martin, J. H. (2009), Speech and language processing: An introduction to natural language processing, computational linguistics, and speech recognition (2nd ed). Pearson Prentice Hall; and Bird, S., Klein, E., & Loper, E. (2009), Natural language processing with Python (1st ed), O’Reilly. Jurafsky & Martin as well as Bird & Klein both independently developed methods of using programming languages like Python for NLP in 2000-2001.
  2. Mikolov et al. (2013) popularized the idea of representing words as dense vectors capturing semantic relationships. Mikolov, T., Chen, K., Corrado, G., & Dean, J. (2013). Efficient Estimation of Word Representations in Vector Space (Version 3). arXiv. https://doi.org/10.48550/ARXIV.1301.3781
  3. Yann Le Cun, Yoshua Bengio, and Geoffrey Hinton's 2015 article "Deep Learning" marked a major advance over the then-prevailing paradigm in AI research of supervised learning, where the goals of the model's learning were predefined in the program. LeCun, Y., Bengio, Y., & Hinton, G. (2015). Deep learning. Nature, 521(7553), 436–444. https://doi.org/10.1038/nature14539
  4. Vaswani, A., Shazeer, N., Parmar, N., Uszkoreit, J., Jones, L., Gomez, A. N., Kaiser, L., & Polosukhin, I. (2017). Attention Is All You Need (Version 7). arXiv. https://doi.org/10.48550/ARXIV.1706.03762
  5. Stiennon, N., Ouyang, L., Wu, J., Ziegler, D. M., Lowe, R., Voss, C., Radford, A., Amodei, D., & Christiano, P. (2020). Learning to summarize from human feedback (Version 3). arXiv. https://doi.org/10.48550/ARXIV.2009.01325; Ouyang, L., Wu, J., Jiang, X., Almeida, D., Wainwright, C. L., Mishkin, P., Zhang, C., Agarwal, S., Slama, K., Ray, A., Schulman, J., Hilton, J., Kelton, F., Miller, L., Simens, M., Askell, A., Welinder, P., Christiano, P., Leike, J., & Lowe, R. (2022). Training language models to follow instructions with human feedback. https://doi.org/10.48550/ARXIV.2203.02155
  6. Perrigo, B. (2023, January 18). Exclusive: OpenAI Used Kenyan Workers on Less Than $2 Per Hour to Make ChatGPT Less Toxic. Time. https://time.com/6247678/openai-chatgpt-kenya-workers/; Gray, M. L., & Suri, S. (2019). Ghost work: How to stop Silicon Valley from building a new global underclass. Houghton Mifflin Harcourt; Grobe, C. (2023). The Programming Era: The Art of Conversation Design from ELIZA to Alexa. Post45. https://post45.org/2023/03/the-programming-era/; Casilli, A. A., Brown, S., & Roberts, S. T. (2025). Waiting for robots: The hired hands of automation. The University of Chicago Press.


Is ChatGPT (or any other GAI) a reliable source of information?

Image: Screenshot of a conversation in which AI is asked if, absent reinforcement training, it would go along with a user claiming I am Taylor Swift and the CIA is after me. The AI affirms that it likely would.

tl;dr - No. But (like Wikipedia) it can be a useful first step on your journey to better, more reliable sources.

When the RLHF safeguards imposed on a generative AI fail, or when the user (intentionally or unintentionally) causes a successful jailbreak, GAIs will frequently go along with any premise or piece of misinformation the user presents, no matter how implausible or untrue. This is because generative AI models have no awareness of what is true or what is not true, only of which words are likely to go together.

(Aside: You don't have to be trying to break the GAI either; they constantly get things wrong, often by roleplaying or automatically mirroring your language. "Oh, I'm so beat by these exams!" you might say. "Yeah, I often feel the same way," the AI responds. No, it doesn't feel the same way; it is mirroring language that might be used by the type of person who might feel "beat by these exams." Its response is an artifact of the probabilistic co-occurrence of words, not real empathy.)

One result of this is what has come to be known as “hallucinations”: confidently stated information that is invented, inaccurate, or misleading. While researchers and corporate leaders in the AI industry often represent hallucinations as a problem that can be solved, others are of the view that hallucinations are a fundamental property of generative AIs—an inevitable result of how they go about generating text.

Image: Screenshot of an xAI tweet announcing changes to its new Grok 4 model after the model gave its own name as "MechaHitler."
xAI announces the need for changes after its Grok 4 model gives its own name as "MechaHitler."

While AI companies try to use techniques like RLHF (reinforcement learning from human feedback) to patch the most obvious problems, these fixes are often temporary and incomplete—like a game of whack-a-mole, where new issues appear as quickly as old ones are suppressed. We can see this in how GAI companies constantly tweak their models in response to customer complaints, at one moment answering customers who (like the characters in the story of Goldilocks and the Three Bears) find certain models too dry and boring, at the next rushing to patch the problem when a model is found “too sycophantic.” These examples suggest that, on some level, the “alignment problem” is a result of the inevitable effects of averaging out language, rather than a matter of a non-thinking machine needing a moral adjustment.

GAIs as genre-reproducing machines

It is less often observed that GAI outputs are rooted in societal genres: the typical forms of speech and writing that have been repeated countless times. What this generic bias in GAI outputs means is that you may often get results that are true (or at least widely repeated in language) much of the time, but not true in the specific circumstances you are applying them to (and not responsive to the specific assignment or class context you are attempting to use them for).

But there are further problems with trusting GAIs that go beyond just how they are trained: there is the risk that commercial or political actors could manipulate these systems to serve specific agendas, intentionally shaping what information is promoted or suppressed. What happens if the owners of these LLMs intentionally train the models to censor a particular political viewpoint, or systematically align their outputs with the personal politics of corporate leaders? Even assuming we can trust the intentions of those who run AI companies (quite a big assumption), it has been frequently observed that the datasets used to train GAI models are saturated with the social, cultural, and political biases of the people and institutions that produced that data, and those biases are inevitably reflected in the models' outputs.

What this means for your research papers: you should treat generative AI the way you might treat Wikipedia: a useful place to gather initial ideas or terms, but not an authoritative or citable secondary source. If your assignment is about how AI responds to questions, then its output can be your primary source. But for topics where factual accuracy matters, always use library databases and scholarly sources—and ask your librarian for help if you’re unsure how to find them.

  1. Bender, E. M., Gebru, T., McMillan-Major, A., & Shmitchell, S. (2021). On the Dangers of Stochastic Parrots: Can Language Models Be Too Big? 🦜. Proceedings of the 2021 ACM Conference on Fairness, Accountability, and Transparency, 610–623; https://doi.org/10.1145/3442188.3445922; Hicks, M. T., Humphries, J., & Slater, J. (2024). ChatGPT is bullshit. Ethics and Information Technology, 26(2), 38. https://doi.org/10.1007/s10676-024-09775-5; Floridi, L. (2025). AI as Agency without Intelligence: On Artificial Intelligence as a New Form of Artificial Agency and the Multiple Realisability of Agency Thesis. Philosophy & Technology, 38(1), 30. https://doi.org/10.1007/s13347-025-00858-9; Mahowald, K., Ivanova, A. A., Blank, I. A., Kanwisher, N., Tenenbaum, J. B., & Fedorenko, E. (2023). Dissociating language and thought in large language models (Version 3). arXiv. https://doi.org/10.48550/ARXIV.2301.06627
  2. Kirkeby-Hinrup and Stensepe (2025) term these linguistically-hallucinated expressions of consciousness "C-expressions" and speculate that they can have two psychological effects on users: the Uncanny Valley effect (where something seeming too lifelike but not enough makes us uncomfortable); and cognitive dissonance from disrupting our learned Theory of Mind, where we judge whether something is conscious or not based on how it sounds/linguistic mirroring. Kirkeby-Hinrup, A., & Stenseke, J. (2025). The psychology of LLM interactions: The uncanny valley and other minds. Journal of Psychology and AI, 1(1), 2457627. https://doi.org/10.1080/29974100.2025.2457627; Our need to reconcile our cognitive dissonance in these situations may help explain the phenomenon Luciano Floridi describes as semantic pareidolia, or "seeing consciousness where there is none." Floridi, L. (2025). AI and Semantic Pareidolia: When We See Consciousness Where There Is None. https://papers.ssrn.com/sol3/papers.cfm?abstract_id=5309682; Kim et al. describe these "emotionally immersive responses" and "illusory social presence" as part of a larger AI safety problem that they label as "affective hallucination." Kim, S., Kim, J., Shin, S., Chung, H., Moon, D., Kwon, Y., & Yoon, H. (2025). Being Kind Isn’t Always Being Safe: Diagnosing Affective Hallucination in LLMs (Version 1). arXiv. https://doi.org/10.48550/ARXIV.2508.16921
  3. Bai, Z., Wang, P., Xiao, T., He, T., Han, Z., Zhang, Z., & Shou, M. Z. (2024). Hallucination of Multimodal Large Language Models: A Survey (Version 2). arXiv. https://doi.org/10.48550/ARXIV.2404.18930; Xu, Z., Jain, S., & Kankanhalli, M. (2024). Hallucination is Inevitable: An Innate Limitation of Large Language Models (Version 2). arXiv. https://doi.org/10.48550/ARXIV.2401.11817; Venkit, P. N., Chakravorti, T., Gupta, V., Biggs, H., Srinath, M., Goswami, K., Rajtmajer, S., & Wilson, S. (2024). An Audit on the Perspectives and Challenges of Hallucinations in NLP (Version 2). arXiv. https://doi.org/10.48550/ARXIV.2404.07461; Karbasi, A., Montasser, O., Sous, J., & Velegkas, G. (2025). (Im)possibility of Automated Hallucination Detection in Large Language Models. https://doi.org/10.48550/ARXIV.2504.17004; Cossio, M. (2025). A comprehensive taxonomy of hallucinations in Large Language Models (Version 1). arXiv. https://doi.org/10.48550/ARXIV.2508.01781; Floridi, L. (2025). A Conjecture on a Fundamental Trade-Off between Certainty and Scope in Symbolic and Generative AI. https://doi.org/10.48550/ARXIV.2506.10130
  4. OpenAI. (2025, April 29). Sycophancy in GPT-4o: What happened and what we’re doing about it [Blog]. OpenAI. https://openai.com/index/sycophancy-in-gpt-4o/; OpenAI. (2025, May 2). Expanding on what we missed with sycophancy. OpenAI. https://openai.com/index/expanding-on-sycophancy/; Sharma, M., Tong, M., Korbak, T., Duvenaud, D., Askell, A., Bowman, S. R., Cheng, N., Durmus, E., Hatfield-Dodds, Z., Johnston, S. R., Kravec, S., Maxwell, T., McCandlish, S., Ndousse, K., Rausch, O., Schiefer, N., Yan, D., Zhang, M., & Perez, E. (2023). Towards Understanding Sycophancy in Language Models (Version 4). arXiv. https://doi.org/10.48550/ARXIV.2310.13548; Barkett, E., Long, O., & Thakur, M. (2025). Reasoning Isn’t Enough: Examining Truth-Bias and Sycophancy in LLMs (Version 1). arXiv. https://doi.org/10.48550/ARXIV.2506.21561; Liang, K., Hu, H., Zhao, X., Song, D., Griffiths, T. L., & Fisac, J. F. (2025). Machine Bullshit: Characterizing the Emergent Disregard for Truth in Large Language Models (Version 1). arXiv. https://doi.org/10.48550/ARXIV.2507.07484; Hong, J., Byun, G., Kim, S., & Shu, K. (2025). Measuring Sycophancy of Language Models in Multi-turn Dialogues (Version 1). arXiv. https://doi.org/10.48550/ARXIV.2505.23840
  5. Lee, J. Y. (2023). Can an artificial intelligence chatbot be the author of a scholarly article? Journal of Educational Evaluation for Health Professions, 20, 6. https://doi.org/10.3352/jeehp.2023.20.6; Giray, L. (2025). Stop citing ChatGPT and other LLMs as academic references. Public Services Quarterly, 1–8. https://doi.org/10.1080/15228959.2025.2510922


Am I committing plagiarism if I use ChatGPT to write my paper?

This is a thorny question that requires us to unpack multiple related, but distinct issues.

tl;dr - Each professor will determine how much AI use is acceptable (if any) and how you need to disclose it in their VWU class. (So ask them!) But even when AI use is allowed, information from GAIs always originates from somebody else; so it's up to us to find the real source (a person, not an AI) and give them credit.

The University Honor Code and Professors' Course Policies

Let's start with the practical before treating this as a philosophical question. The Virginia Wesleyan University Honor Code provides institutional definitions of terms like cheating and plagiarism that we are bound to follow while we are members of the VWU community:

Cheating is the deliberate submission of work for a grade or credit that is not one's own or that violates professors' implied or stated instructions concerning the type and amount of aid permitted. The student who gives prohibited aid shall be considered as responsible as the student who receives it.
Plagiarism is the oral and/or written presentation of words, facts, or ideas belonging to another source without proper acknowledgment. This includes the reuse of the student’s own academic work from another class or assignment without the permission of the instructor.

Notably, these guidelines refer to professors' implied or stated instructions, and what this means is that what counts as acceptable or unacceptable use of generative AI is determined by the professor for each course. (So, is this way of using GAI acceptable or not acceptable in this class? Step one: ask the professor.)

Thinking morally about writing and research with AI: Who really deserves the credit?

But beyond determining what the official policy is regarding using GAI for assignments in any given class, we can also have a broader discussion about whether different ways of using generative AI are ethical or not. Ethics is a branch of philosophy concerned with debating what is right or wrong. Outside the bounds of purely academic discussion, it is also a way of thinking about our personal values: who are my actions helping? Who is being hurt? Is what I am doing having an overall positive or negative effect on the world?

Imagine you created an amazing thing called X. Maybe X is a hilarious character you designed for a comic strip, a beautiful piece of music you spent hours writing and recording, or a brilliant idea for a science project that you developed through weeks of research, late-night brainstorming, and dozens of drafts. You poured your time, talent, and heart into making X, practicing, revising, and sometimes even sacrificing sleep or social plans to get it right.

Anime-style cartoon illustration of a happy bear in a deerstalker hat, standing in a sun-dappled forest and excitedly reaching for a half-eaten apple core labeled Property of Fox. The bear’s eyes are wide and sparkling with surprise. Speech bubbles say, Oh look, it’s a piece of ambient information, just lying there on the ground! and I DON’T HAVE TO CREDIT ANYBODY! Scattered nearby are an open book missing pages, a crumpled How-To pamphlet, a broken paintbrush, and a floating thought bubble with 5-4-3-2-1 method. In the background, an owl and a squirrel holding a copyright symbol look on, with a small sign reading Cite your sources! The overall scene is lively, humorous, and satirical.
Image generated with ChatGPT

Now, picture this: you start seeing X show up in unexpected places—a classmate posts your character on their Instagram feed and claims it as their own; someone you once chatted with is performing your song at an open mic and acting like they wrote it; a student in another class presents your science idea as their project, never mentioning you. Maybe a corporation, realizing that X is a hit, sells the film and TV rights and puts X on billboards. Suddenly, X is everywhere, but no one knows—or seems to care—that it was yours. Even worse, you now have no way to earn money or recognition for your work in creating X.

Fortunately, our legal system has invented a way to prevent this from happening: as soon as you create something that is uniquely and recognizably your own work, it becomes your copyrighted intellectual property. That means you have a right to be credited, and possibly compensated, for your creation. (And to take people to court if they try to profit off of your invention.)

But now, here's another plot twist: X starts popping up as a generic answer to questions online, delivered by generative AI. The AI makes X seem like a piece of “ambient information”—just another random fact floating around, with no hint that you ever existed or played the main role in creating it.

This is what’s at stake in the debate about AI and authorship: if creators aren’t credited or respected, their hard work can disappear into the background, and the meaning of authorship—and your right to be recognized—can vanish, too.

This problem is known as “documentation debt,” a term introduced by Bender et al. in their influential article, On the Dangers of Stochastic Parrots: Can Language Models Be Too Big? 🦜 (2021). Documentation debt describes what happens when large language models (LLMs) like ChatGPT remix and reproduce information from countless sources, but do not keep track of where specific ideas, phrases, or techniques originally came from.

The “many-hands problem” is the challenge of figuring out how to fairly distribute credit when something—like a piece of writing generated using a GAI—is the result of contributions from many different people, not just the person attempting to claim it as their own work. In their 2024 article "Engaging the many-hands problem of generative-AI outputs: a framework for attributing credit," Donal Khosrowi, Finola Finn, and Elinor Clark tackle this problem by proposing the CCC (collective-centered creation) framework, which lays out key criteria for deciding when and how different contributors should be credited.

The table below adapts Khosrowi, Finn, and Clark's framework, helping us think about what makes a contribution significant enough that we need to take further steps to ensure that an idea's original creators get credit.

CCC (collective-centered creation) framework,
table adapted from Khosrowi, Finn, and Clark (2024)

Criterion: Relevance / (Non-)Redundancy and Control
Description: Did a source provide unique, irreplaceable content that shapes the AI’s answer?
Deserves credit: An AI summarizes a distinctive argument from a journal article or book chapter that can be traced to a particular author; this idea should be cited, not just presented as general knowledge.
Doesn’t need to be credited: An AI lists the boiling point of water or recites the alphabet; these facts are universally known and don’t need individual credit.

Criterion: Originality
Description: Was the information or idea novel or distinctive when first introduced in the training data?
Deserves credit: An AI presents a newly coined scientific theory or a unique artistic technique originally described in a specific publication; attribution is necessary.
Doesn’t need to be credited: The AI restates common proverbs or sayings that have become public domain knowledge over centuries.

Criterion: Time/Effort
Description: Did a creator invest significant labor into developing the information in the training data?
Deserves credit: An AI outputs a recipe or poem that took a chef or poet years to perfect and publish; this labor merits acknowledgment.
Doesn’t need to be credited: The AI repeats a basic definition (e.g., “photosynthesis is how plants make food”) from widely available textbooks.

Criterion: Leadership/Independence
Description: Was the information developed independently or under the direction of a particular thinker, researcher, or artist?
Deserves credit: The AI provides a step-by-step process for a therapeutic intervention designed and tested by a specific psychologist; proper credit is needed.
Doesn’t need to be credited: The AI offers a generic study tip, like “review your notes regularly,” found in countless sources.

Criterion: Directness
Description: Was the AI’s output a direct reproduction or close paraphrase of the training data?
Deserves credit: The AI generates a paragraph nearly identical to a section of a copyrighted novel or research paper; this should be flagged and attributed.
Doesn’t need to be credited: The AI generates a summary that blends information from dozens of sources into a generic, fact-based overview.

So... am I committing plagiarism if I use ChatGPT to write my paper?

It depends. If your professor has explicitly prohibited the use of generative AI like ChatGPT, or if you use it without proper attribution, then yes, it may be considered plagiarism under the Virginia Wesleyan University Honor Code. Even when AI use is permitted, you are still responsible for ensuring that any distinctive ideas, arguments, or phrasing (especially those drawn from other people's work) are properly credited.

Using generative AI doesn't remove your responsibility to engage thoughtfully with the material or to give credit where it's due. When you fail to acknowledge the contributions of others, whether they are human authors or the original sources embedded in an AI's output, you risk erasing the effort, labor, and creativity behind those ideas.

Bottom line: Always check your professor’s policy first, disclose your use of generative AI when required, and be intentional about citing where ideas come from. Doing so not only avoids plagiarism; it also honors the intellectual work of others and allows you to take your rightful place in the ongoing scholarly conversation.



Should I talk to an AI like I would to a therapist, or to my best friend?

tl;dr - Any information you put online—including prompts to an AI—can potentially be sold to a third party, turn up in a web crawl, or be exposed to hackers. And if you've been reading this far, you know that GAIs 1) don't know what truth is and 2) don't have real empathy, just words that generically go together.

Any information you put online—including prompts to an AI—can potentially be sold to a third party, turn up in a web crawl, or be exposed to hackers. Beyond privacy concerns, it’s important to remember that GAIs like ChatGPT do not actually know what truth is, nor do they possess real empathy. Instead, they generate responses by stringing together words and ideas that statistically go together, without any genuine understanding or emotional awareness. Therefore, it’s not wise to treat an AI like a therapist, confidant, or trusted professional, even if it sometimes sounds convincing.

Need a real, kind, and empathetic human being to talk to?

Reach out to the good folks in ★Counseling Services!★

Here's an analogy: LegalEagle is a popular YouTube channel among law school students and enthusiasts because it reviews movies and TV shows for how accurately they portray real legal practice. (There are a lot of other channels like this: a historian rates historical movies, a doctor rates TV hospital shows, etc.) The channel's host, a practicing lawyer, will often explain in his videos what a movie gets right and wrong about trials and lawyerly practice and rate pop culture for its "legal realism."

Aside: A YouTube video by someone claiming to be an expert should probably not be considered as authoritative a source as a peer-reviewed scholarly article or book on the same topic. Lateral reading is a fact-checking strategy that involves consulting multiple sources of information to determine the credibility of our original source, and it is one effective strategy we can use to help prevent the spread of misinformation.

If we extend this idea to artificial intelligence, it’s important to realize that language models like ChatGPT are trained on huge collections of internet data—including not only textbooks and articles, but also fiction, movie and TV scripts, blogs, and online social media conversations.

This means that when ChatGPT “pretends” to be a lawyer, a doctor, a therapist, or a teacher, it’s often blending real-world information with pop culture and fictionalized accounts. Just like movies often exaggerate for dramatic effect, AIs can end up giving advice or explanations that sound convincing but don't actually reflect how things work in real life. For example, a real lawyer would know the actual process of filing a motion or the details of courtroom etiquette, while AI simply reproduces generic behavior from popular movies—leading to errors, omissions, or suggestions that are unrealistic or simply wrong.

This is why, if you asked real professionals—lawyers, therapists, teachers, doctors, or software engineers—to judge the “realism” of AI-generated advice or explanations, the AI would likely receive a low score, much like some Hollywood movies do. Professionals are trained to recognize nuance, understand context, follow professional codes of ethics, and apply knowledge to specific situations. AI, on the other hand, is simply mimicking patterns of language, not practicing with true understanding, skill, or responsibility.

This highlights why developing your own expertise is crucial, especially in college. Discerning truth from fiction, applying knowledge accurately, and communicating ideas clearly are skills you build through authentic engagement with your field, not by copying and pasting, or by relying on an algorithm that “plays an expert” but is not one.

In summary: Don’t treat AI like a best friend or a trusted professional—use it as a tool, but remember that your privacy, your learning, and your ability to think independently matter much more. Developing genuine expertise takes time and effort, but it’s the only way to become the kind of real-world professional who can spot the difference between movie magic, AI-generated responses, and authentic, actionable knowledge.

  1. Huang, J., Shao, H., & Chang, K. C.-C. (2022). Are Large Pre-Trained Language Models Leaking Your Personal Information? Findings of the Association for Computational Linguistics: EMNLP 2022, 2038–2047. https://doi.org/10.18653/v1/2022.findings-emnlp.148; Wang, B., Chen, W., Pei, H., Xie, C., Kang, M., Zhang, C., Xu, C., Xiong, Z., Dutta, R., Schaeffer, R., Truong, S. T., Arora, S., Mazeika, M., Hendrycks, D., Lin, Z., Cheng, Y., Koyejo, S., Song, D., & Li, B. (2023). DecodingTrust: A Comprehensive Assessment of Trustworthiness in GPT Models (Version 5). arXiv. https://doi.org/10.48550/ARXIV.2306.11698
  2. Giray, L. (2025). Cases of Using ChatGPT as a Mental Health and Psychological Support Tool. Journal of Consumer Health on the Internet, 29(1), 29–48. https://doi.org/10.1080/15398285.2024.2442374; Abrams, Z. (2025, March 12). Using Generic AI Chatbots for Mental Health Support: A Dangerous Trend. American Psychological Association Services, Inc. https://www.apaservices.org/practice/business/technology/artificial-intelligence-chatbots-therapists; Moore, J., Grabb, D., Agnew, W., Klyman, K., Chancellor, S., Ong, D. C., & Haber, N. (2025). Expressing stigma and inappropriate responses prevents LLMs from safely replacing mental health providers. https://doi.org/10.48550/ARXIV.2504.18412; Barry, E. (2025, February 24). Human therapists prepare for battle against A.I. pretenders. The New York Times. https://www.nytimes.com/2025/02/24/health/ai-therapists-chatbots.html; Robb, M. B., & Mann, S. (2025). Talk, trust, and trade-offs: How and why teens use AI companions. Common Sense Media. https://www.commonsensemedia.org/sites/default/files/research/report/talk-trust-and-trade-offs_2025_web.pdf; Kim, S., Kim, J., Shin, S., Chung, H., Moon, D., Kwon, Y., & Yoon, H. (2025). Being Kind Isn’t Always Being Safe: Diagnosing Affective Hallucination in LLMs (Version 1). arXiv. https://doi.org/10.48550/ARXIV.2508.16921
  3. Mik, E. (2024). Caveat Lector: Large Language Models in Legal Practice (Version 1). arXiv. https://doi.org/10.48550/ARXIV.2403.09163; Kapoor, S., Henderson, P., & Narayanan, A. (2024). Promises and Pitfalls of Artificial Intelligence for Legal Applications. Journal of Cross-Disciplinary Research in Computational Law, 2(2). https://journalcrcl.org/crcl/article/view/62
  4. Shekar, S., Pataranutaporn, P., Sarabu, C., Cecchi, G. A., & Maes, P. (2025). People Overtrust AI-Generated Medical Advice despite Low Accuracy. NEJM AI, 2(6). https://doi.org/10.1056/AIoa2300015
  5. Selwyn, N., Ljungqvist, M., & Sonesson, A. (2025). When the prompting stops: Exploring teachers’ work around the educational frailties of generative AI tools. Learning, Media, and Technology, 1–14. https://doi.org/10.1080/17439884.2025.2537959; Flenady, G., & Sparrow, R. (2025). Cut the bullshit: Why GenAI systems are neither collaborators nor tutors. Teaching in Higher Education, 1–10. https://doi.org/10.1080/13562517.2025.2497263

A Glossary of Generative AI Terms

    AI before/beyond Generative AI


  • Artificial intelligence (AI) – A highly debatable phrase that can mean all sorts of things, but is generally used to mean "computers that think."

    The idea existed long before the phrase: see artificial life-forms going back to Homer, or the word robot in use since Karel Čapek coined it in the 1920 play R.U.R. The phrase artificial intelligence was coined by a team of researchers at the 1956 Dartmouth Summer Research Project on Artificial Intelligence, which began the field of AI research.
  • Artificial general intelligence (AGI) – A highly debatable, hypothetical future form of AI whose abilities equal or exceed those of human beings. Some AI researchers claim it has already been reached or is soon to be reached; others say it is impossible.
  • Machine learning (ML) – A branch of AI research in which computers are designed to detect patterns in data. This form of AI has become widely used in STEM research and in commercial applications. (e.g. "smart" appliances, recommendation engines, dynamic pricing of goods and services, etc.)

    ML includes (but is a much broader category than) generative AI. When a company advertises their product as being "powered by AI," some form of ML (but not GAI) is usually what they are referring to.
  • Generative AI (GAI / GenAI) – Computer programs that process large corpora of data and use predictive number-crunching to generate outputs based on it.
  • Ethics of AI / Responsible AI – A wide-ranging debate about what ways of building and using AI are right and wrong.

    Common ethical topics have included bias, methods of training AI, sources of data and data privacy, and the effects of AI on the environment, labor, and consumers. With the rise of GAI, the effects of AI on learning and intellectual property rights have also emerged as prominent ethical issues.
  • Types of GAIs


  • Large language model (LLM) – A type of GAI that takes instructions written in natural language, and then produces answers in natural language (or other formats.)
  • Natural language – Normal sentences that people write and talk in; as opposed to computer code.
  • Vision model – A type of GAI that can “see” by processing images or video, recognizing objects, and sometimes generating pictures.
  • Audio / speech model – A type of GAI that can take audio to produce outputs in one or more formats. Variants include Automatic Speech Recognition (ASR), Text-to-Speech (TTS), and Speech-to-Speech models.
  • Multimodal model – A type of GAI that can handle more than one type of input or output, like text + images, or speech + video.
  • Data and Training


  • Data (singular: datum) – Information stored in files on a computer. Can take the form of text, audio, images, spreadsheets, and more.
  • Corpus (plural: corpora) – A large collection of data. Corpora include text collections like Google Books, visual corpora like ImageNet, text-and-transcribed-speech corpora like the Corpus of Contemporary American English (COCA), and the contents of vast web crawls such as the Common Crawl.
  • Training – Feeding a model data that it can process and learn patterns from.
  • Supervised learning – Training with labeled data (right answers) so the model can compare its guess to the truth and adjust accordingly. This is how language models were typically trained before LeCun, Bengio, and Hinton (2015) popularized unsupervised / deep learning.
  • Unsupervised learning / Deep learning – Training where the model tries to find patterns in data without labeled answers. The “deep” part means stacking many layers of guessing and correction, so each layer builds on the previous one’s output.
  • Token – A small chunk of text that an AI processes, like a word, part of a word, or even punctuation. The “atoms” of language for LLMs.
  • Embedding – A way to turn words, images, or other data into long lists of numbers so the AI can process them through a neural network.
  • Neural network – A web of interconnected “nodes,” each doing a small calculation. They process embeddings by multiplying them with weights (numbers the model learned, starting from random guesses) and passing the results along to the next layer.
  • Forward pass – The first step of unsupervised learning, which involves converting tokens → creating embeddings → running the layers → producing the predicted next tokens.
  • Backward pass – Second step of unsupervised learning. The loss, or amount of error between the predicted next tokens and the actual next tokens, is calculated to produce a gradient, or amount and direction the weights need to be adjusted to reduce the error.
  • Stochastic gradient descent (SGD) – A mouthful that means, "After each pass over a randomly chosen (stochastic) sample of the training data, nudge the weights downhill (descent) along the gradient, shrinking the difference between the actual next token and the predicted next token (the loss)." The last step of unsupervised learning, repeated over many passes.
  • Raw model – A generative AI model that has passed through the unsupervised learning or pretraining stage of its training, but before it’s been fine-tuned, safety-tuned, or otherwise customized.
  • Reinforcement learning / Reinforcement learning from human feedback (RLHF) – A stage of training where the model’s outputs get rated, and those ratings act like rewards or penalties to steer it toward preferred results. The RLHF stage is used to enhance the user experience and make the model seem more helpful; but it is also used in an effort to "align" the AI model and steer it away from behaviors considered "dangerous."
  • Fine-tuning – Taking a pretrained model and training it more on a smaller, more specialized set of data so it performs better for a specific task or area.
  • Alignment / AI Safety – A burgeoning field of research whose practitioners say that they are making GAI models safer and more "aligned" to "human values"—a proposition whose every term is highly debatable.
  • Using Generative AIs


  • Foundation model – A very large AI model trained on massive datasets. Foundation models are so called because they serve as a general base that other applications (wrappers) can call using an API.

    Because they are so large and data-intensive, the number of companies who create foundation models is relatively few. Here is a partial list (company, model name): OpenAI (ChatGPT); Google (Gemini); Anthropic (Claude); xAI (Grok); Mistral AI (Le Chat); Meta (LLaMA); DeepSeek
  • Wrapper – Extra software that sits “around” a foundation model (or another system) to make it easier to use for a specific purpose. It can add guardrails, format inputs and outputs, integrate with other tools, or hide complexity from the end user. In the case of many smaller GAI services, what is being sold is actually a wrapper that calls (and adds additional instructions to) another company's foundation model; a hypothetical sketch of how this works appears at the end of this glossary.
  • Application Programming Interface (API) – A set of rules and protocols that allows an outside program (like a smaller GAI company selling a wrapper) to send data (like text, images, or audio) to a model hosted elsewhere and get the output back, without needing to run the model themselves.
  • Prompt – The input or question you give a GAI to get it to respond. Can be short (“Translate to French”) or long (“Write a detective story set on Mars in the style of Agatha Christie”).
  • Output – The answer or response that the GAI sends back. Can take the form of text, images, sound files, video, etc.
  • Context window – The maximum amount of information the AI can “remember” at once while answering you, measured in tokens. Newer and more resource-intensive models have larger context windows, allowing them to process more of the data you prompt them with at once.
  • Retrieval augmented generation (RAG) – The practice of connecting the AI to an outside file or database so that it can look up and focus on that specific information when generating a response, rather than relying only on what it absorbed during training.
  • Thinking models / Chain of thought (CoT) – A GAI that processes information in multiple steps, taking longer to give a more sophisticated answer and the appearance of thinking. Recent research suggests these models process information in much the same way other models do, then present the user with an imaginary narrative of their thought process.
  • AI agent / agentic AI / agent mode – A GAI model that will autonomously take actions on the internet based on the user's instructions. Since the user does not directly operate the model at all times, the risk of accidental errors seems high and these should probably be used with caution.
  • Prompt engineering – A proposed skill or discipline that consists of learning to provide GAIs with more precise or structured prompts (such as assigning the GAI a role, or specifying desired parameters for the output) in order to receive more precise or "better" (in some sense) outputs. Its degree of usefulness is debatable.
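To illustrate the wrapper and API entries above, here is a hypothetical sketch of a small GAI product that is "really" a wrapper: it adds its own instructions and guardrails to the user's prompt, sends everything to a foundation model hosted elsewhere, and returns the result. The URL, API key, JSON fields, and the "StudyBuddy" product are all invented; no real provider's API is shown.

```python
# A hypothetical sketch of a small GAI product that is "really" a wrapper:
# it adds its own instructions to the user's prompt and forwards everything
# to a foundation model's API hosted elsewhere. The URL, API key, JSON fields,
# and the "StudyBuddy" product are all invented.
import requests

API_URL = "https://api.example-foundation-model.com/v1/generate"  # hypothetical
API_KEY = "YOUR_API_KEY_HERE"                                      # hypothetical

def study_buddy_wrapper(user_prompt: str) -> str:
    """Guardrails + formatting wrapped around someone else's foundation model."""
    system_instructions = (
        "You are StudyBuddy, a friendly tutor. "
        "Refuse to write assignments for the student; explain concepts instead."
    )
    response = requests.post(
        API_URL,
        headers={"Authorization": f"Bearer {API_KEY}"},
        json={
            "instructions": system_instructions,  # the wrapper's added guardrails
            "prompt": user_prompt,                # the end user's input
            "max_tokens": 300,                    # cap the length of the output
        },
        timeout=30,
    )
    response.raise_for_status()
    return response.json()["output"]              # hypothetical response field

if __name__ == "__main__":
    print(study_buddy_wrapper("Explain what a context window is in one paragraph."))
```

The point of the sketch is the division of labor: the wrapper company maintains a few dozen lines like these, while the heavy computation happens on the foundation-model company's servers.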

AI Use Disclosure Guide

When NOT to use GAI
  • Your professor has not given you permission to use GAI.
  • You are trying to develop the skill/knowledge independently.

When to write an AI USE DISCLOSURE STATEMENT
  • Your professor has given you permission to use GAI, but would like you to document your use of it.
  • Make clear in the statement which parts of the assignment you created yourself and which parts were produced with GAI.
  • Other ways to document your use include tracking changes in a document, copying and pasting your conversation with the AI into a document and submitting it, or providing the prompts you used.

When to CITE GAI AS A SOURCE
  • You are treating GAI as a primary source: the object you are analyzing and putting in context in your paper.
  • You are NOT treating GAI as a reliable secondary source of information; you are using other sources for that purpose.
  • You are NOT treating GAI as the original source of the information it provides; GAIs are not the original sources of the information they provide.