Filtering and Compliance Settings

Configuration of filters for compliance

[Last updated: September 10, 2023]
One of backplain's features is the ability to control and enforce compliant AI chats. In the main admin section, the Compliance section includes two tiles for administrators to configure filters and review any violations.

As of Version 0.1.10, the Compliance Settings section includes the Personal Information Filters section described below.

Compliance Settings

When you click on the Settings button in the Compliance tile on the main admin page, you'll notice three things.

These settings apply to the entire organization.
There is an All Filters switch for Open AI Moderation and for Personal Information Filters.
Both options can be expanded, and individual filters can be toggled on and off.

Offensive Content Moderation

Click the dropdown button next to the toggle switch to see the list of all the Open AI Offensive Content Moderation filters that are enabled as soon as you toggle this feature ON. These filters take advantage of Open AI's internal policies and alert you when someone's prompt violates them. Backplain checks each prompt against the filters before sending it to the AI service. If a violation is found, it returns a response with an orange warning stating:

This content may violate the organization's content policy. If you believe this to be in error, please submit your feedback.

In the above warning, content policy is a link to the list of filters that you saw when you clicked the dropdown button for the Open AI moderation. Also, submit your feedback is a link to the support page where the user can open a ticket.

The warning also asks the user to review their prompt before sending. All of this happens within backplain based on the filters you've selected, and the prompt never gets submitted to the AI service.
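The check-before-send flow described above can be sketched as follows. This is only an illustration under stated assumptions, not backplain's actual implementation: the names Filter, handle_prompt, and send_to_ai are hypothetical, and each filter is reduced to a simple yes/no function.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List

# Warning text as quoted in this article.
WARNING = (
    "This content may violate the organization's content policy. "
    "If you believe this to be in error, please submit your feedback."
)


@dataclass
class Filter:
    """Hypothetical stand-in for one toggleable compliance filter."""
    name: str
    enabled: bool
    matches: Callable[[str], bool]  # returns True when the prompt violates the filter


def handle_prompt(prompt: str, filters: List[Filter],
                  send_to_ai: Callable[[str], str]) -> Dict[str, object]:
    """Check the prompt against enabled filters before it leaves the service."""
    triggered = [f.name for f in filters if f.enabled and f.matches(prompt)]
    if triggered:
        # Violation: return the warning and never call the AI service.
        return {"sent": False, "warning": WARNING, "triggered": triggered}
    # No violation: forward the prompt (send_to_ai is a hypothetical callable).
    return {"sent": True, "response": send_to_ai(prompt)}
```

The point of the sketch is the ordering: the filter check runs first, and the call to the AI service is reached only when no enabled filter matches.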

Personal Information Filters

Much like the Offensive Content Moderation filters, the Personal Information Filters can be turned on or off individually. The main difference is that these filters apply to all the AI LLMs that backplain aggregates for you. They also block any prompt that triggers a filter from ever leaving the backplain service, and they use the same warning and alerting as above. These filters need to be turned on or off based on your users and the nature of the work they perform. The basic PII (Personally Identifiable Information) options, such as US Passport, US SSN, and IBAN Code, can each be turned on or off. Others, like URL or IP address, are here as well to give admins who need tighter control a way to log and restrict these types of requests.
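To give a feel for how individually toggled PII filters might behave, here is a minimal sketch. It assumes simple regular expressions, which is not necessarily how backplain detects PII. The patterns are deliberately simplistic, and the names PII_PATTERNS and triggered_filters are hypothetical.

```python
import re
from typing import List, Set

# Simplified illustrative patterns; real PII detection is usually stricter.
PII_PATTERNS = {
    "US SSN": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
    "IP Address": re.compile(r"\b(?:\d{1,3}\.){3}\d{1,3}\b"),
    "URL": re.compile(r"https?://\S+", re.IGNORECASE),
}


def triggered_filters(prompt: str, enabled: Set[str]) -> List[str]:
    """Return the names of enabled filters whose pattern appears in the prompt."""
    return [
        name
        for name, pattern in PII_PATTERNS.items()
        if name in enabled and pattern.search(prompt)
    ]
```

With only "US SSN" and "URL" enabled, a prompt containing 123-45-6789 would trigger the "US SSN" filter, while a prompt containing only an IP address would pass, because that filter is toggled off.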

As backplain continues to grow its list of AI LLMs, these filters will continue to protect your data and your organization's data.

Compliance Alerts

On the main Admin page, the Compliance button in the Compliance tile takes you to the Compliance Alerts page. Although user prompts that violate policy do not get sent to the AI service, they do get sent to the Compliance Alerts page.

Each alert lists the filter that was triggered, the date, the alert type of the policy violation, and the user who submitted the policy-violating prompt. This is a reporting page where you as an admin can view policy violations.

Under the Actions column, each policy violation has an ellipsis. Clicking the ellipsis brings up an action menu. The only current action is to view the message. This takes you to the policy-violating message, allowing you to determine whether the prompt merits further investigation, the user needs further training, or the filter needs to be modified. This is also a logged feature that can be used, should it be necessary, to collect data and inform the necessary teams to comply with your company's policies.

Under the Message column, the message snippet has a pop-out icon next to it that also takes you to the message. The other columns should be self-explanatory: User shows the username, Filter tells you which of the two filter types was triggered, and Alert Category shows which toggle or toggles were set off.

The Date column, although also self-explanatory, has a partner component above it. The Pick a date calendar option allows you to select a date range for the alerts. Click once to set the beginning of the range and a second time to set the end. This is useful for grabbing alerts when you know the timing of an incident.

The search feature helps you narrow the alerts down to a specific user, phrase, filter, alert category, etc. It searches through all aspects of the alerts to help you find what you are looking for.
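If you think of the alerts as records, the date range and search controls behave roughly like the sketch below. This is an assumption-laden illustration, not the page's actual logic: the Alert fields and the filter_alerts function are hypothetical names modeled on the columns described above.

```python
from dataclasses import dataclass
from datetime import date
from typing import List, Optional


@dataclass
class Alert:
    """Hypothetical record mirroring the columns on the Compliance Alerts page."""
    user: str
    filter_type: str
    alert_category: str
    message: str
    alert_date: date


def filter_alerts(alerts: List[Alert], start: Optional[date] = None,
                  end: Optional[date] = None, query: str = "") -> List[Alert]:
    """Keep alerts inside the inclusive date range that also match the search text."""
    needle = query.lower()
    results = []
    for alert in alerts:
        if start is not None and alert.alert_date < start:
            continue
        if end is not None and alert.alert_date > end:
            continue
        # Search across every text field, case-insensitively.
        haystack = " ".join(
            [alert.user, alert.filter_type, alert.alert_category, alert.message]
        ).lower()
        if needle and needle not in haystack:
            continue
        results.append(alert)
    return results
```

Combining both controls narrows the list quickly: pick the date range around a known incident, then search for the user or filter name involved.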

More to Come

We have exciting plans for the future! We are continuously working on developing custom filters, templates, and actions for policy violations. Our goal is to provide you with even more features. The ability to toggle Open AI moderation and Personal Information policies, prevent prompts that violate them, and alert on those violations is just the beginning of our journey, not the end.