Upgrade Word Censor to exclude URLs

https://www.phpbb.com/ideas/
Author:
mamba
Posted:
Fri Jul 13, 2018 7:01 am
Rating:
Status:
New
Ideas Bot
Registered User
Posts: 437
Joined: Sat Oct 13, 2012 10:06 am

Upgrade Word Censor to exclude URLs

Post by Ideas Bot » Fri Jul 13, 2018 7:01 am

Word censoring is currently being used (abused?) by some people as a URL modifier, but for many other forum owners and managers it's a huge pain to have the word censor parse your posts' HTML and interfere willy-nilly with URLs. It breaks links of all kinds (images, webpages), including those inside BBCode substitution HTML.

Example: you have a word censor that changes the word "crap" to "cr*p"

Then you discover the links to images that contain the string crap, like

Code: Select all

http://www.example.com/scrap.jpg
is now broken because it was altered to

Code: Select all

http://www.example.com/scr*p.jpg
Many people have reported this as a bug over the years, e.g.:
https://tracker.phpbb.com/browse/PHPBB3-9809 (Rejected by A_Jelly_Doughnut in 2010 with "Definitely working as intended. Censoring words in URLs is a common request")

Other reports:
https://tracker.phpbb.com/browse/PHPBB3-7709 (rejected as "expected behaviour")
https://tracker.phpbb.com/browse/PHPBB3-7385 (closed for no good reason)
https://tracker.phpbb.com/browse/PHPBB3-7195 (closed for no good reason)

Solution:

There should be an option on the word censoring panel in ACP to disable parsing of URLs.

AmigoJack
Registered User
Posts: 5348
Joined: Tue Jun 15, 2010 11:33 am
Location: グリーン ヒル ゾーン
Contact:

Re: Upgrade Word Censor to exclude URLs

Post by AmigoJack » Fri Jul 13, 2018 7:38 am

You mean when crap is censored on your board then I can circumvent it by just writing http://you.re/crap? Whereas rephrasing URLs is rather easy: https://en.wiktionary.org/wiki/scr%61p is the same as https://en.wiktionary.org/wiki/scrap? Ugh...
The worst thing about censorship is ███████████

mamba
Registered User
Posts: 426
Joined: Thu Jan 16, 2003 7:59 pm

Re: Upgrade Word Censor to exclude URLs

Post by mamba » Fri Jul 13, 2018 7:49 am

AmigoJack wrote:
Fri Jul 13, 2018 7:38 am
You mean when crap is censored on your board then I can circumvent it by just writing http://you.re/crap?
Oh please, how many people would think of doing that? You're talking about a tiny, tiny minority.
Whereas rephrasing URLs is rather easy: https://en.wiktionary.org/wiki/scr%61p is the same as https://en.wiktionary.org/wiki/scrap? Ugh...
Ugly workaround.
Using 3.2, PHP version 7, MySQL 5.5, Host: hostgator shared Linux, Style: Prosilver

AmigoJack
Registered User
Posts: 5348
Joined: Tue Jun 15, 2010 11:33 am
Location: グリーン ヒル ゾーン
Contact:

Re: Upgrade Word Censor to exclude URLs

Post by AmigoJack » Fri Jul 13, 2018 7:56 am

mamba wrote:
Fri Jul 13, 2018 7:49 am
Oh please, how many people would think of doing that?
Oh, that doesn't matter. One user alone can do it to such an extent that you'll return to the old censor pattern.

mamba wrote:
Fri Jul 13, 2018 7:49 am
Ugly workaround.
Double standards: your example was a picture file, and embedding a picture never displays the address. Do you even know you can do [url=address]text[/url] which, again, would not display the actual address?
The worst thing about censorship is ███████████

KevC
Support Team Member
Posts: 68384
Joined: Fri Jun 04, 2004 10:44 am
Location: Oxford, UK
Contact:

Re: Upgrade Word Censor to exclude URLs

Post by KevC » Fri Jul 13, 2018 8:07 am

Ideas Bot wrote:
Fri Jul 13, 2018 7:01 am
Example: you have a word censor that changes the word "crap" to "cr*p"

Then you discover the links to images that contain the string crap, like

Code: Select all

http://www.example.com/scrap.jpg
is now broken because it was altered to

Code: Select all

http://www.example.com/scr*p.jpg
That would not happen in that example. You would have to censor *crap or *crap*
-:|:- Support Request Template -:|:-
Cheap UK Hosting
"In the land of the blind the little green bloke with no pupils is king - init!"

AmigoJack
Registered User
Posts: 5348
Joined: Tue Jun 15, 2010 11:33 am
Location: グリーン ヒル ゾーン
Contact:

Re: Upgrade Word Censor to exclude URLs

Post by AmigoJack » Fri Jul 13, 2018 8:50 am

KevC wrote:
Fri Jul 13, 2018 8:07 am
That would not happen in that example.
Correct. And word boundaries are applied to URLs in the same way, so for a plain crap censor to match, the URL would have to be something like http://void.net/s.crap/er, whereas only *crap would censor http://void.net/scrap.jpg
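
The difference between a plain word and a wildcard word can be sketched with two hand-written patterns. This is only an illustration, assuming the censor wraps each word in the same lookbehind/lookahead pair that phpBB's get_censor_preg_expression() builds; the variable names ($word_char, $plain, $wildcard) are made up for the example.

```php
<?php
// Characters the censor treats as part of a word: digits, letters, underscore, hyphen.
$word_char = '[\p{Nd}\p{L}_-]';

// Plain "crap": must not be preceded or followed by a word character.
$plain = '#(?<!' . $word_char . ')(crap)(?!' . $word_char . ')#iu';

// "*crap": the leading asterisk becomes "any run of word characters".
$wildcard = '#(?<!' . $word_char . ')(' . $word_char . '*?crap)(?!' . $word_char . ')#iu';

$urls = array('http://void.net/scrap.jpg', 'http://void.net/s.crap/er');

foreach ($urls as $url)
{
	echo 'plain:    ', preg_replace($plain, 'cr*p', $url), "\n";
	echo 'wildcard: ', preg_replace($wildcard, 'cr*p', $url), "\n";
}
```

With these patterns the plain word leaves scrap.jpg alone, because the "s" in front of "crap" is a word character, while the wildcard version swallows the whole of "scrap". The second URL is rewritten by both, since "." and "/" are not word characters.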
The worst thing about censorship is ███████████

mamba
Registered User
Posts: 426
Joined: Thu Jan 16, 2003 7:59 pm

Re: Upgrade Word Censor to exclude URLs

Post by mamba » Fri Jul 13, 2018 8:59 am

Okay fine, here's another example. I have a word censor that converts lowercase "abc" to "ABC" (not the actual string, but you get the idea). It's an acronym. Now that actually does break links; I've spent a few hours today running jobs in phpMyAdmin fixing it.

It DOES happen. If the link contains for example "...images/abc.jpg" or "...example.com/abc.html" or "...images/abc-myimage.jpg" and many more variations, the link gets broken.

There are lots of word boundaries that would match: . and - are just two.

A word censor should not be parsing URLs! We need a switch to turn this behaviour off.
Using 3.2, PHP version 7, MySQL 5.5, Host: hostgator shared Linux, Style: Prosilver

david63
Jr. Extension Validator
Posts: 14893
Joined: Thu Dec 19, 2002 8:08 am
Location: Lancashire, UK
Name: David Wood
Contact:

Re: Upgrade Word Censor to exclude URLs

Post by david63 » Fri Jul 13, 2018 9:53 am

mamba wrote:
Fri Jul 13, 2018 8:59 am
We need a switch to turn this behaviour off.
On a word by word basis?
David
Remember: You only know what you know and - you don't know what you don't know!
My CDB Contributions | How to install an extension
I will not be accepting translations for any of my extensions in Github - please post any translations in the appropriate topic.
No support requests via PM or email as they will be ignored

mamba
Registered User
Posts: 426
Joined: Thu Jan 16, 2003 7:59 pm

Re: Upgrade Word Censor to exclude URLs

Post by mamba » Fri Jul 13, 2018 1:16 pm

david63 wrote:
Fri Jul 13, 2018 9:53 am
On a word by word basis?
That would entail a new column in the database per word

I'd be happy with a single switch that could be set, applying to all words simultaneously.
Using 3.2, PHP version 7, MySQL 5.5, Host: hostgator shared Linux, Style: Prosilver

Mick
Support Team Member
Posts: 20231
Joined: Fri Aug 29, 2008 9:49 am
Location: Cardiff

Re: Upgrade Word Censor to exclude URLs

Post by Mick » Fri Jul 13, 2018 1:25 pm

I don’t see that half a dozen or so mentions of this constitute “many people” out of the 400,000 members here or the tens of thousands of other phpBB users who have never joined this board. The tickets have mainly been closed as invalid.
"The more connected we get the more alone we become" - Kyle Broflovski

There are no ‘threads’ in phpBB, they are topics.

HiFiKabin
Community Team Member
Posts: 3267
Joined: Wed May 14, 2014 9:10 am
Location: Swearing at the PC, UK
Name: James
Contact:

Re: Upgrade Word Censor to exclude URLs

Post by HiFiKabin » Fri Jul 13, 2018 3:53 pm

Slightly OT I know but the word censor is (IMHO) useless in any case.

As has been pointed out *crap* will censor scrap. But what about cock? In some context it is offensive, in others perfectly fine (I won't post an example, you are all clever enough to work it out)

*balls*? I have a large collection of footballs

*tit*? I have bluetits in the garden.

Have a firm no swearing policy and enforce it. The more extensive the word censor, the more people will try to get around it

b@lls

t!t

mamba
Registered User
Posts: 426
Joined: Thu Jan 16, 2003 7:59 pm

Re: Upgrade Word Censor to exclude URLs

Post by mamba » Fri Jul 13, 2018 8:48 pm

The bottom line here is that most people use the Word Censor as a way of modifying or restricting text in posts. It is an unexpected "feature" that it also interferes with URLs. For many, it's an unwanted feature. I have about 100 word censors set up, and most are for readability purposes, for instance capitalizing place names ("berlin" becomes "Berlin"), fixing acronyms ("ocd" becomes "OCD"), fixing spelling ("Ive" becomes "I've", "im" becomes "I'm"), etc. This makes my forum a much more pleasant experience for users to read because they spend less time trying to interpret meaning. But it also breaks hyperlinks and images. I've had this issue arise several times over the years, and surely my experience is not uncommon.

At least can we discuss what code changes would be required to restrict the censor function to text only? Here is the relevant code:

functions.php

Code: Select all

/**
* Generate regexp for naughty words censoring
* Depends on whether installed PHP version supports unicode properties
*
* @param string	$word			word template to be replaced
*
* @return string $preg_expr		regex to use with word censor
*/
function get_censor_preg_expression($word)
{
	// Unescape the asterisk to simplify further conversions
	$word = str_replace('\*', '*', preg_quote($word, '#'));

	// Replace asterisk(s) inside the pattern, at the start and at the end of it with regexes
	$word = preg_replace(array('#(?<=[\p{Nd}\p{L}_])\*+(?=[\p{Nd}\p{L}_])#iu', '#^\*+#', '#\*+$#'), array('([\x20]*?|[\p{Nd}\p{L}_-]*?)', '[\p{Nd}\p{L}_-]*?', '[\p{Nd}\p{L}_-]*?'), $word);

	// Generate the final substitution
	$preg_expr = '#(?<![\p{Nd}\p{L}_-])(' . $word . ')(?![\p{Nd}\p{L}_-])#iu';

	return $preg_expr;
}
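
To see which URL fragments this expression would actually touch, the function can be run on its own. The sketch below copies it from the excerpt above (comments trimmed) so that it is self-contained; the sample strings and the "abc" to "ABC" replacement are just the illustrative values from earlier in this topic, not real board data.

```php
<?php
// Copied from the functions.php excerpt above so the example runs standalone.
function get_censor_preg_expression($word)
{
	// Unescape the asterisk to simplify further conversions
	$word = str_replace('\*', '*', preg_quote($word, '#'));

	// Replace asterisk(s) inside the pattern, at the start and at the end of it with regexes
	$word = preg_replace(array('#(?<=[\p{Nd}\p{L}_])\*+(?=[\p{Nd}\p{L}_])#iu', '#^\*+#', '#\*+$#'), array('([\x20]*?|[\p{Nd}\p{L}_-]*?)', '[\p{Nd}\p{L}_-]*?', '[\p{Nd}\p{L}_-]*?'), $word);

	return '#(?<![\p{Nd}\p{L}_-])(' . $word . ')(?![\p{Nd}\p{L}_-])#iu';
}

$expr = get_censor_preg_expression('abc');

$samples = array(
	'images/abc.jpg',
	'example.com/abc.html',
	'images/abc-myimage.jpg',
	'images/abc_small.jpg',
);

foreach ($samples as $sample)
{
	echo $sample, ' => ', preg_replace($expr, 'ABC', $sample), "\n";
}
```

As quoted, the lookarounds count the hyphen and the underscore as word characters, so with this expression only the first two samples are rewritten; "/" and "." do act as boundaries. Other phpBB versions may build the expression differently.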

functions_content.php:

Code: Select all

/**
* Censoring
*/
function censor_text($text)
{
	static $censors;

	// Nothing to do?
	if ($text === '')
	{
		return '';
	}

	// We moved the word censor checks in here because we call this function quite often - and then only need to do the check once
	if (!isset($censors) || !is_array($censors))
	{
		global $config, $user, $auth, $cache;

		// We check here if the user is having viewing censors disabled (and also allowed to do so).
		if (!$user->optionget('viewcensors') && $config['allow_nocensors'] && $auth->acl_get('u_chgcensors'))
		{
			$censors = array();
		}
		else
		{
			$censors = $cache->obtain_word_list();
		}
	}

	if (count($censors))
	{
		return preg_replace($censors['match'], $censors['replace'], $text);
	}

	return $text;
}
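
The final preg_replace() call is where URLs get caught: it runs every pattern over whatever string it is handed, markup included. A minimal sketch, assuming a word list in the same 'match'/'replace' shape that censor_text() reads; the two entries and the sample HTML are invented for the example.

```php
<?php
// Hypothetical word list: parallel arrays of regexes and replacement strings.
$censors = array(
	'match'		=> array(
		'#(?<![\p{Nd}\p{L}_-])(berlin)(?![\p{Nd}\p{L}_-])#iu',
		'#(?<![\p{Nd}\p{L}_-])(ocd)(?![\p{Nd}\p{L}_-])#iu',
	),
	'replace'	=> array('Berlin', 'OCD'),
);

// A string that mixes visible text with an image URL.
$html = 'My ocd notes from berlin: <img src="http://www.example.com/berlin.jpg" alt="">';

// Same call as in censor_text(): each pattern is applied to the whole string.
$out = preg_replace($censors['match'], $censors['replace'], $html);

echo $out, "\n";
```

Both the visible text and the src attribute come back changed (berlin.jpg becomes Berlin.jpg), which is enough to break the image on a host with case-sensitive paths.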

If the required modifications are too difficult to easily post here, the only choice I and other affected admins have is to disable the word censoring function/delete all word censoring.
Using 3.2, PHP version 7, MySQL 5.5, Host: hostgator shared Linux, Style: Prosilver

JoshyPHP
Code Contributor
Posts: 963
Joined: Mon Jul 11, 2011 12:28 am

Re: Upgrade Word Censor to exclude URLs

Post by JoshyPHP » Fri Jul 13, 2018 9:19 pm

The text inside of posts is censored somewhere else in the codebase, specifically this line. If you replace true with false it won't apply the censor inside of HTML attributes.

If you want this idea to ever be implemented your best bet is to make a Pull Request for it.
I wrote the thing that does BBCodes in 3.2.

mamba
Registered User
Posts: 426
Joined: Thu Jan 16, 2003 7:59 pm

Re: Upgrade Word Censor to exclude URLs

Post by mamba » Fri Jul 13, 2018 9:24 pm

Wow thanks Joshy! It looks almost too easy to modify

/phpbb/textformatter/s9e/renderer.php

Code: Select all

		$html = $this->renderer->render($xml);
		if (isset($this->censor) && $this->viewcensors)
		{
			$html = $this->censor->censorHtml($html, true);
		}
So it's as simple as changing the line to

Code: Select all

			$html = $this->censor->censorHtml($html, false);
So here I go again, modifying core code without using an extension....
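
For anyone wondering what "don't censor inside HTML attributes" amounts to, here is a rough standalone sketch of the idea. It is not the s9e implementation; censor_text_nodes_only() is a hypothetical helper that splits the HTML on tags and applies the patterns only to the text between them.

```php
<?php
/**
* Hypothetical helper: censor the text between tags, leave the tags
* (and therefore href/src attributes) untouched.
*/
function censor_text_nodes_only($html, array $match, array $replace)
{
	// With PREG_SPLIT_DELIM_CAPTURE the result alternates: text, tag, text, tag, ...
	$parts = preg_split('#(<[^>]*>)#', $html, -1, PREG_SPLIT_DELIM_CAPTURE);

	foreach ($parts as $i => $part)
	{
		// Even indexes are text; odd indexes are the captured tags.
		if ($i % 2 === 0)
		{
			$parts[$i] = preg_replace($match, $replace, $part);
		}
	}

	return implode('', $parts);
}

$match = array('#(?<![\p{Nd}\p{L}_-])(abc)(?![\p{Nd}\p{L}_-])#iu');
$replace = array('ABC');

$html = '<a href="http://www.example.com/abc.html">see abc here</a>';

echo censor_text_nodes_only($html, $match, $replace), "\n";
```

The link target stays intact while the visible text is still censored. The tag-splitting regex is deliberately naive (a literal ">" inside an attribute value would confuse it), and a URL that is displayed as the link text would still be rewritten on screen, though its href would keep working.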

A pull request would have to ask for the ability to switch between true and false in that line, right? Based on the reception to this topic, as well as the "invalid" tag appended to the previous bug reports, I don't think it's worth doing....
Using 3.2, PHP version 7, MySQL 5.5, Host: hostgator shared Linux, Style: Prosilver

JoshyPHP
Code Contributor
Posts: 963
Joined: Mon Jul 11, 2011 12:28 am

Re: Upgrade Word Censor to exclude URLs

Post by JoshyPHP » Fri Jul 13, 2018 10:24 pm

Bug reports were marked as invalid because it's not a bug per se. If you actually make a PR for it (e.g. a Yes/No setting in the admin panel under Post settings) I would expect it to be accepted.

It all depends on how useful you think that feature is and how much effort it deserves.
I wrote the thing that does BBCodes in 3.2.
