Microsoft Office Tools Reportedly Collect Data for AI Training, Requiring Manual Opt-Out

November 26, 2024

Microsoft’s Office suite is a staple among productivity tools, with millions of users entering sensitive personal and company data into Excel and Word. According to @nixCraft, an author from Cyberciti.biz, Microsoft left its “Connected Experiences” feature enabled by default, reportedly using user-generated content to train the company’s AI models. As a result, data from Word and Excel files may be used in AI development unless users manually opt out. The default raises security concerns, especially for businesses and government workers who rely on Microsoft Office for proprietary work. If the report is accurate, documents such as articles, government data, and other confidential files could be included in AI training, creating ethical and legal challenges regarding consent and intellectual property.

Disabling the feature requires going to: File > Options > Trust Center > Trust Center Settings > Privacy Options > Privacy Settings > Optional Connected Experiences, and unchecking the box. The opt-out path is unnecessarily long, but the larger issue is the default itself. The European Union’s GDPR, which Microsoft complies with, generally requires that consent to data processing be opt-in rather than opt-out. If the setting works as reported, it would conflict with the GDPR and could prompt an investigation from the EU. Microsoft has yet to confirm whether user content is actively being used to train its AI models. However, its Services Agreement includes a clause granting the company a “worldwide and royalty-free intellectual property license” to use user-generated content for purposes such as improving Microsoft products. This kind of controversy is not new, as more companies leverage user data for AI development, often without explicit consent.

For current LLMs, training data is the key to distinguishing one model from its competitors. Quality data is the prize, and a model trained on a unique dataset like the one Microsoft has access to could substantially outperform the competition in tasks like writing and basic reasoning. With sensitive data that is not available to the public, Microsoft could extend its AI lead. However, LLMs are not immune to leaking parts of their training data, so a skilled professional could potentially extract some of it. For now, users who wish to protect their intellectual property are advised to review their settings carefully.