Upgrade Word Censor to exclude URLs

https://www.phpbb.com/ideas/
mamba
Registered User
Posts: 603
Joined: Thu Jan 16, 2003 7:59 pm
Location: Australia

Upgrade Word Censor to exclude URLs

Post by mamba »

Word censoring is currently being used (abused?) by some people as a URL modifier, but for many other forum owners and managers it's a huge pain to have the word censor parsing your posts' HTML and interfering willy-nilly with URLs. It breaks links of all kinds (images, web pages), including those inside BBCode substitution HTML.

Example: you have a word censor that changes the word "crap" to "cr*p"

Then you discover that links to images whose address contains the string crap, like

Code:

http://www.example.com/scrap.jpg

are now broken because the address was altered to

Code:

http://www.example.com/scr*p.jpg

Many people have reported this as a bug over the years, e.g.:
https://tracker.phpbb.com/browse/PHPBB3-9809 (Rejected by A_Jelly_Doughnut in 2010 with "Definitely working as intended. Censoring words in URLs is a common request")

Other reports:
https://tracker.phpbb.com/browse/PHPBB3-7709 (rejected as "expected behaviour")
https://tracker.phpbb.com/browse/PHPBB3-7385 (closed for no good reason)
https://tracker.phpbb.com/browse/PHPBB3-7195 (closed for no good reason)

Solution:

There should be an option on the word censoring panel in ACP to disable parsing of URLs.
Using latest version of PHPBB
AmigoJack
Registered User
Posts: 6108
Joined: Tue Jun 15, 2010 11:33 am
Location: グリーン ヒル ゾーン

Re: Upgrade Word Censor to exclude URLs

Post by AmigoJack »

You mean when crap is censored on your board then I can circumvent it by just writing http://you.re/crap? Whereas rephrasing URLs is rather easy: https://en.wiktionary.org/wiki/scr%61p is the same as https://en.wiktionary.org/wiki/scrap? Ugh...
  • "The problem is probably not my English but you do not want to understand correctly. ... We will not come anybody anyway, nevertheless, it's best to shit this." Affin, 2018-11-20
  • "But this shit is not here for you. You can follow with your. Maybe the question, instead, was for you, who know, so you shoved us how you are." axe70, 2020-10-10
  • "My reaction is not to everyone, especially to you." Raptiye, 2021-02-28
mamba
Registered User
Posts: 603
Joined: Thu Jan 16, 2003 7:59 pm
Location: Australia

Re: Upgrade Word Censor to exclude URLs

Post by mamba »

AmigoJack wrote: Fri Jul 13, 2018 7:38 am You mean when crap is censored on your board then I can circumvent it by just writing http://you.re/crap?
Oh please, how many people would think of doing that? You're talking about a tiny, tiny minority.
AmigoJack wrote: Fri Jul 13, 2018 7:38 am Whereas rephrasing URLs is rather easy: https://en.wiktionary.org/wiki/scr%61p is the same as https://en.wiktionary.org/wiki/scrap? Ugh...
Ugly workaround.
AmigoJack
Registered User
Posts: 6108
Joined: Tue Jun 15, 2010 11:33 am
Location: グリーン ヒル ゾーン

Re: Upgrade Word Censor to exclude URLs

Post by AmigoJack »

mamba wrote: Fri Jul 13, 2018 7:49 am Oh please, how many people would think of doing that?
Oh, that doesn't matter. One user alone can do it to such an extent that you'll return to the old censor pattern.

mamba wrote: Fri Jul 13, 2018 7:49 am Ugly workaround.
Double standards: your example was a picture file, and embedding a picture never displays the address. Do you even know you can do [url=address]text[/url] which, again, would not display the actual address?
KevC
Support Team Member
Posts: 72343
Joined: Fri Jun 04, 2004 10:44 am
Location: Oxford, UK

Re: Upgrade Word Censor to exclude URLs

Post by KevC »

Ideas Bot wrote: Fri Jul 13, 2018 7:01 am Example: you have a word censor that changes the word "crap" to "cr*p"

Then you discover the links to images that contain the string crap, like

Code:

http://www.example.com/scrap.jpg

is now broken because it was altered to

Code:

http://www.example.com/scr*p.jpg

That would not happen in that example. You would have to censor *crap or *crap*
-:|:- Support Request Template -:|:-
"Step up to red alert. Sir, are you absolutely sure? It does mean changing the bulb"
AmigoJack
Registered User
Posts: 6108
Joined: Tue Jun 15, 2010 11:33 am
Location: グリーン ヒル ゾーン

Re: Upgrade Word Censor to exclude URLs

Post by AmigoJack »

KevC wrote: Fri Jul 13, 2018 8:07 am That would not happen in that example.
Correct. Word boundaries apply inside URLs just as they do in text, so censoring crap only affects a URL such as http://void.net/s.crap/er, whereas it takes *crap to censor http://void.net/scrap.jpg
mamba
Registered User
Posts: 603
Joined: Thu Jan 16, 2003 7:59 pm
Location: Australia

Re: Upgrade Word Censor to exclude URLs

Post by mamba »

Okay fine, here's another example. I have a word censor that converts lowercase "abc" to "ABC" (not the actual string, but you get the idea). It's an acronym. Now that actually does break links; I've spent a few hours today running jobs in phpMyAdmin to fix it.

It DOES happen. If the link contains for example "...images/abc.jpg" or "...example.com/abc.html" or "...images/abc-myimage.jpg" and many more variations, the link gets broken.

There are lots of word boundaries that would match: . and - are just two.

A word censor should not be parsing URLs! We need a switch to turn this behaviour off.
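
For what it's worth, the boundary behaviour can be checked against the get_censor_preg_expression() function from functions.php that is quoted further down this thread. The sketch below copies that function and adds a hypothetical apply_censor() helper (not part of phpBB) to run single rules against a few of the URLs mentioned so far. It assumes that this expression is what the board actually uses.

```php
<?php
// Copied from the functions.php excerpt quoted later in this thread.
function get_censor_preg_expression($word)
{
	// Unescape the asterisk to simplify further conversions
	$word = str_replace('\*', '*', preg_quote($word, '#'));

	// Turn asterisks into "any run of word characters" sub-patterns
	$word = preg_replace(
		array('#(?<=[\p{Nd}\p{L}_])\*+(?=[\p{Nd}\p{L}_])#iu', '#^\*+#', '#\*+$#'),
		array('([\x20]*?|[\p{Nd}\p{L}_-]*?)', '[\p{Nd}\p{L}_-]*?', '[\p{Nd}\p{L}_-]*?'),
		$word
	);

	// Letters, digits, "_" and "-" are word characters; anything else is a boundary
	return '#(?<![\p{Nd}\p{L}_-])(' . $word . ')(?![\p{Nd}\p{L}_-])#iu';
}

// Hypothetical helper for this example only: apply one censor rule to a string.
function apply_censor($word, $replacement, $text)
{
	return preg_replace(get_censor_preg_expression($word), $replacement, $text);
}

// "/" and "." are boundaries, so the rule fires inside the URL.
echo apply_censor('abc', 'ABC', 'http://www.example.com/images/abc.jpg'), "\n";
// prints http://www.example.com/images/ABC.jpg

// "-" is in the character class, so this URL is left alone.
echo apply_censor('abc', 'ABC', 'http://www.example.com/images/abc-myimage.jpg'), "\n";
// prints http://www.example.com/images/abc-myimage.jpg

// Plain "crap" does not touch "scrap"; the wildcard form does.
echo apply_censor('crap', 'cr*p', 'http://www.example.com/scrap.jpg'), "\n";
// prints http://www.example.com/scrap.jpg
echo apply_censor('*crap', 'cr*p', 'http://www.example.com/scrap.jpg'), "\n";
// prints http://www.example.com/cr*p.jpg
```

So with this particular expression a dot or slash next to the word counts as a boundary, while a hyphen does not, and "abc-myimage.jpg" would be left alone. Other phpBB versions may use a different expression and behave differently.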
david63
Registered User
Posts: 20646
Joined: Thu Dec 19, 2002 8:08 am

Re: Upgrade Word Censor to exclude URLs

Post by david63 »

mamba wrote: Fri Jul 13, 2018 8:59 am We need a switch to turn this behaviour off.
On a word by word basis?
David
Remember: You only know what you know and - you don't know what you don't know!

I now no longer support any of my extensions but they will start to become available here
mamba
Registered User
Posts: 603
Joined: Thu Jan 16, 2003 7:59 pm
Location: Australia

Re: Upgrade Word Censor to exclude URLs

Post by mamba »

david63 wrote: Fri Jul 13, 2018 9:53 am On a word by word basis?
That would entail a new column in the database per word

I'd be happy with a single switch that could be set, applying to all words simultaneously.
Mick
Support Team Member
Posts: 26508
Joined: Fri Aug 29, 2008 9:49 am

Re: Upgrade Word Censor to exclude URLs

Post by Mick »

I don’t see that half a dozen or so mentions of this constitute “many people” out of the 400,000 members here or the tens of thousands of other phpBB users who have never joined this board. The tickets have mainly been closed as invalid.
  • "The more connected we get the more alone we become" - Kyle Broflovski©
  • "The good news is hell is just the product of a morbid human imagination.
    The bad news is, whatever humans can imagine, they can usually create.
    " - Harmony Cobel
HiFiKabin
Community Team Member
Posts: 6673
Joined: Wed May 14, 2014 9:10 am
Location: Swearing at the PC, UK
Name: James

Re: Upgrade Word Censor to exclude URLs

Post by HiFiKabin »

Slightly OT I know but the word censor is (IMHO) useless in any case.

As has been pointed out *crap* will censor scrap. But what about cock? In some context it is offensive, in others perfectly fine (I won't post an example, you are all clever enough to work it out)

*balls*? I have a large collection of footballs

*tit*? I have bluetits in the garden.

Have a firm no swearing policy and enforce it. The more extensive the word censor, the more people will try to get around it

b@lls

t!t
mamba
Registered User
Posts: 603
Joined: Thu Jan 16, 2003 7:59 pm
Location: Australia

Re: Upgrade Word Censor to exclude URLs

Post by mamba »

The bottom line here is that most people use the Word Censor as a way of modifying or restricting text in posts. It is an unexpected "feature" that it also interferes with URLs. For many, it's an unwanted feature. I have about 100 word censors set up, and most are for readability purposes, for instance capitalizing place names ("berlin" becomes "Berlin"), fixing acronyms ("ocd" becomes "OCD"), fixing spelling ("Ive" becomes "I've", "im" becomes "I'm"), etc. This makes my forum a much more pleasant experience for users to read because they spend less time trying to interpret meaning. But it also breaks hyperlinks and images. I've had this issue arise several times over the years, and surely my experience is not uncommon.

Can we at least discuss what code changes would be required to restrict the censor function to text only? Here is the relevant code:

functions.php

Code:

/**
* Generate regexp for naughty words censoring
* Depends on whether installed PHP version supports unicode properties
*
* @param string	$word			word template to be replaced
*
* @return string $preg_expr		regex to use with word censor
*/
function get_censor_preg_expression($word)
{
	// Unescape the asterisk to simplify further conversions
	$word = str_replace('\*', '*', preg_quote($word, '#'));

	// Replace asterisk(s) inside the pattern, at the start and at the end of it with regexes
	$word = preg_replace(array('#(?<=[\p{Nd}\p{L}_])\*+(?=[\p{Nd}\p{L}_])#iu', '#^\*+#', '#\*+$#'), array('([\x20]*?|[\p{Nd}\p{L}_-]*?)', '[\p{Nd}\p{L}_-]*?', '[\p{Nd}\p{L}_-]*?'), $word);

	// Generate the final substitution
	$preg_expr = '#(?<![\p{Nd}\p{L}_-])(' . $word . ')(?![\p{Nd}\p{L}_-])#iu';

	return $preg_expr;
}

functions_content.php:

Code:

/**
* Censoring
*/
function censor_text($text)
{
	static $censors;

	// Nothing to do?
	if ($text === '')
	{
		return '';
	}

	// We moved the word censor checks in here because we call this function quite often - and then only need to do the check once
	if (!isset($censors) || !is_array($censors))
	{
		global $config, $user, $auth, $cache;

		// We check here if the user is having viewing censors disabled (and also allowed to do so).
		if (!$user->optionget('viewcensors') && $config['allow_nocensors'] && $auth->acl_get('u_chgcensors'))
		{
			$censors = array();
		}
		else
		{
			$censors = $cache->obtain_word_list();
		}
	}

	if (count($censors))
	{
		return preg_replace($censors['match'], $censors['replace'], $text);
	}

	return $text;
}

If the required modifications are too difficult to easily post here, the only choice I and other affected admins have is to disable the word censoring function/delete all word censoring.
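
As a starting point for that discussion, here is a minimal sketch of one way to leave URLs alone when censoring a plain-text string. It is not phpBB code: censor_text_outside_urls() is a hypothetical helper, the URL pattern is a deliberately simple assumption (anything starting with http:// or https:// up to the next whitespace, quote or angle bracket), and the $match and $replace arrays only mimic the shape of what $cache->obtain_word_list() is used for in censor_text() above.

```php
<?php
/**
 * Hypothetical helper (not part of phpBB): apply censor patterns only to
 * the parts of $text that do not look like an http(s) URL.
 */
function censor_text_outside_urls($text, array $match, array $replace)
{
	// Split on URLs and keep them in the result:
	// even indexes hold ordinary text, odd indexes hold the URLs.
	$parts = preg_split('#(https?://[^\s<>"\']+)#i', $text, -1, PREG_SPLIT_DELIM_CAPTURE);

	foreach ($parts as $i => $part)
	{
		if ($i % 2 === 0)
		{
			// Ordinary text: censor as usual
			$parts[$i] = preg_replace($match, $replace, $part);
		}
	}

	return implode('', $parts);
}

// Example rule: the expression get_censor_preg_expression('abc') would produce.
$match = array('#(?<![\p{Nd}\p{L}_-])(abc)(?![\p{Nd}\p{L}_-])#iu');
$replace = array('ABC');

echo censor_text_outside_urls('see abc at http://www.example.com/images/abc.jpg', $match, $replace), "\n";
// prints: see ABC at http://www.example.com/images/abc.jpg
```

This only covers strings that pass through a function like censor_text(). Whether post bodies go through that path at all is a separate question.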
JoshyPHP
Code Contributor
Posts: 1288
Joined: Mon Jul 11, 2011 12:28 am

Re: Upgrade Word Censor to exclude URLs

Post by JoshyPHP »

The text inside of posts is censored somewhere else in the codebase, specifically this line. If you replace true with false it won't apply the censor inside of HTML attributes.

If you want this idea to ever be implemented your best bet is to make a Pull Request for it.
I wrote the library that handles markup in phpBB 3.2+.
mamba
Registered User
Posts: 603
Joined: Thu Jan 16, 2003 7:59 pm
Location: Australia

Re: Upgrade Word Censor to exclude URLs

Post by mamba »

Wow, thanks Joshy! It looks almost too easy to modify:

/phpbb/textformatter/s9e/renderer.php

Code:

		$html = $this->renderer->render($xml);
		if (isset($this->censor) && $this->viewcensors)
		{
			$html = $this->censor->censorHtml($html, true);
		}
So it's as simple as changing the line to

Code:

			$html = $this->censor->censorHtml($html, false);
So here I go again, modifying core code without using an extension....

A pull request would have to ask for the ability to switch between true and false in that line, right? Based on the reception to this topic, as well as the "invalid" tag appended to the previous bug reports, I don't think it's worth doing....
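
To make the effect of that boolean easier to picture, here is a rough sketch of the difference between censoring everything in the rendered HTML and censoring only the text between tags. It is not the library's implementation: censor_html_sketch() and its parameters are hypothetical, and the tag-splitting regex is a simplification that assumes well-formed HTML with no ">" inside attribute values.

```php
<?php
/**
 * Rough illustration only, not the s9e\TextFormatter code: censor rendered
 * HTML, optionally leaving everything inside tags (and therefore attribute
 * values such as href and src) untouched.
 */
function censor_html_sketch($html, $pattern, $replacement, $censor_attributes)
{
	if ($censor_attributes)
	{
		// Naive version: replace everywhere, including inside attributes
		return preg_replace($pattern, $replacement, $html);
	}

	// Even indexes hold the text between tags, odd indexes hold the tags
	$parts = preg_split('#(<[^>]*>)#', $html, -1, PREG_SPLIT_DELIM_CAPTURE);

	foreach ($parts as $i => $part)
	{
		if ($i % 2 === 0)
		{
			$parts[$i] = preg_replace($pattern, $replacement, $part);
		}
	}

	return implode('', $parts);
}

// The expression get_censor_preg_expression('abc') would produce.
$pattern = '#(?<![\p{Nd}\p{L}_-])(abc)(?![\p{Nd}\p{L}_-])#iu';
$html = '<a href="http://www.example.com/abc.html">http://www.example.com/abc.html</a>';

echo censor_html_sketch($html, $pattern, 'ABC', true), "\n";
// prints <a href="http://www.example.com/ABC.html">http://www.example.com/ABC.html</a>

echo censor_html_sketch($html, $pattern, 'ABC', false), "\n";
// prints <a href="http://www.example.com/abc.html">http://www.example.com/ABC.html</a>
```

Note that in this sketch the visible link text is still censored when the flag is false, because it is ordinary text between tags. The link itself keeps working since the href is untouched.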
JoshyPHP
Code Contributor
Posts: 1288
Joined: Mon Jul 11, 2011 12:28 am

Re: Upgrade Word Censor to exclude URLs

Post by JoshyPHP »

Bug reports were marked as invalid because it's not a bug per se. If you actually make a PR for it (e.g. a Yes/No setting in the admin panel under Post settings) I would expect it to be accepted.

It all depends on how useful you think that feature is and how much effort it deserves.