MOSTI’s Large Language Models (LLMs) – Some Legal Concerns
- RDS Project

Personal Data Security
LLMs are trained on extremely large datasets of text, enabling them to learn intricate language patterns and nuances. Because these systems typically retain and reuse the data fed into them to refine future versions of the model, they raise concerns regarding personal data protection, which falls under the Personal Data Protection Act 2010 (PDPA). In order to develop a more efficient and accurate LLM, companies may unlawfully collect and process the personal data of millions of users without consent, in violation of the PDPA.
A recent example is LinkedIn, which was sued in January 2025 for allegedly sharing premium users' private messages with third parties to train AI models without obtaining user consent.1 The lawsuit claims that LinkedIn updated its privacy settings in August 2024 to automatically opt users into data sharing without proper notification, and later revised its privacy policy in September 2024 to cover the use of user data for AI model training.
Similarly, OpenAI and Microsoft are facing their second class action lawsuit in a San Francisco federal court for allegedly breaching multiple privacy laws in developing OpenAI's chatbot, ChatGPT.2
As the Ministry of Science, Technology & Innovation (MOSTI) plans to develop an LLM, concerns arise regarding the use of personal data to train the model. Since the Government has access to extensive datasets, there is a genuine risk that personal data could be used to tailor the LLM to Malaysian users. Additionally, the collaboration between MOSTI and local companies in developing the LLM raises further concerns about whether these companies would have access to sensitive personal data held by MOSTI.
The storage and use of personal information create potential risks of data breaches and unauthorised access. Section 6 of the PDPA requires data controllers to obtain consent before processing personal data, Section 9 obliges them to take practical steps to protect that data from loss, misuse and unauthorised access, and Section 10 provides that personal data may be retained only for as long as necessary to fulfil the purpose for which it was collected. If MOSTI fails to implement proper security measures, it could lead to serious issues regarding the misuse or leakage of personal data, ultimately violating the protections afforded under the PDPA.
Data Confidentiality and Integrity
Another key legal concern surrounding MOSTI's development of an LLM is the confidentiality of data inputted by users. While LLMs are designed to generate responses based on vast amounts of training data, they also rely on user-provided input to refine their outputs. Users often submit sensitive or confidential information to obtain more personalised and relevant responses. However, there is a risk that this data may be stored, processed or even repurposed for unintended uses beyond the original user query.
The model's ability to retain contextual knowledge means that sensitive information could be incorporated into future outputs, potentially exposing confidential or proprietary data. This raises significant concerns, particularly for industries dealing with confidential business communications, legal documents or financial transactions. If an LLM trained on such data inadvertently discloses or replicates proprietary information, it could lead to breaches of confidentiality agreements and legal liability. This concern is amplified because government agencies have access to extensive datasets, including individual records, company information and national records.
If such information is fed into the system without robust confidentiality protections, it could be exposed to exploitation. For example, data originally provided by one user to personalise responses could later be used to train the model for different purposes, such as improving the system's general output or targeting advertisements, without the user's explicit consent. This could lead to the inadvertent disclosure of confidential or personal information.
Concerns About Enforcement
While MOSTI has committed to adhering to the National Guidelines on Artificial Intelligence Governance and Ethics (AIGE) in developing the LLM, several legal concerns remain. One key challenge is that these guidelines lack the force of law; unlike legislated statutes, they carry no direct penalties for non-compliance. This raises concerns about enforcement, particularly in ensuring transparency and confidentiality in the development and deployment of the LLM.
Furthermore, Malaysia currently lacks an independent regulatory body with investigatory or enforcement powers to oversee LLMs effectively. Without a comprehensive legal framework to address these issues, reliance on ethical guidelines alone may be insufficient to safeguard against potential risks and ensure responsible LLM development.
1. https://www.reuters.com/legal/microsofts-linkedin-sued-disclosing-customer-information-train-ai-models-2025-01-22/
2. https://www.reuters.com/legal/litigation/openai-microsoft-hit-with-new-us-consumer-privacy-class-action-2023-09-06/
25 June 2025