If you end up using any of this code or sharing it, please give us credit in the code comments.
We use this on our phpBB2 board, but I think it can be used the same way on a phpBB3 board as well.
Since our board is pretty complicated, some things are stripped out, but this should give anyone interested a good starting point. Of course, you will want to add whatever DB insert validation you think your setup needs.
One more note: even though search engines can generally be trusted, keep an eye on what they are doing on your site from the IP blocks they supply to you. There have been incidents of less-than-honest activity from some of the major players in the past, so consistent oversight of the activity on your board, from all visitors, is very important. If you do not have good visibility into your board's visitors and activity, you would be absolutely shocked by what you would find.
This mod maintains, in your database, a list of the valid IP blocks from Google, Apple, and Bing that you can allow access to your board through another mod such as functions_ip_track by aUsTiN-Inc.
You'll need to run the script (at an interval of your choosing) as a cron or Windows scheduled task using at least PHP 5.3. It might work on earlier versions, but we have no record of testing it on anything earlier than 5.3; we've stripped it back to be compatible with that version. Also check your php.ini to make sure the needed extensions (cURL and JSON) are enabled.
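For example, a crontab entry to run it hourly could look like the line below. The binary path, script path, and log path are placeholders; substitute wherever your PHP CLI and the script actually live.

```shell
# Run the SE bot IP range updater at the top of every hour (example paths - change to fit your setup)
0 * * * * /usr/bin/php /path/to/board/googlebotipaddresses.php >> /var/log/sebot_ip.log 2>&1
```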
First step: create the table that will hold the IP information in your phpBB database.
Code: Select all
--
-- Table structure for table `sebot_ip_ranges`
--
CREATE TABLE IF NOT EXISTS `sebot_ip_ranges` (
`id` int(11) NOT NULL AUTO_INCREMENT,
`ip_start` varchar(15) NOT NULL,
`ip_end` varchar(15) NOT NULL,
PRIMARY KEY (`id`),
UNIQUE KEY `unique_ip_range` (`ip_start`,`ip_end`)
) ;
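Once the table is populated, you can sanity-check it straight from SQL. This is just an illustrative query (the sample IP is hypothetical) and assumes MySQL/MariaDB: INET_ATON() converts the dotted-quad varchar columns to integers so the range comparison works correctly.

```sql
-- Does this sample IP fall inside any stored bot range?
SELECT id, ip_start, ip_end
FROM sebot_ip_ranges
WHERE INET_ATON('66.249.66.1') BETWEEN INET_ATON(ip_start) AND INET_ATON(ip_end);
```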
Code: Select all
<?php
/***************************************************************************
* googlebotipaddresses.php
* -------------------
* Author : JLA FORUMS - [email protected]
* Created : XXXX
* Last Updated : Saturday, Mar 08, 2025
*
* Version : 1.0.0 for approved shared release - JLA
*
***************************************************************************/
// This is set up for a phpBB2 board, but you can make the necessary changes for it to work on a phpBB3 board easily enough.
define('IN_PHPBB', true);
$phpbb_root_path = './'; // Change this to fit your board
include($phpbb_root_path . 'extension.inc');
include($phpbb_root_path . 'common.'.$phpEx);
global $db;
// URLs to fetch Googlebot, Special Crawlers, Bingbot, and Applebot IP ranges. These should be updated if any of the services change them (which can happen). These URLs are good as of March 2025.
$urls = [
'googlebot' => 'https://developers.google.com/static/search/apis/ipranges/googlebot.json', //These are for Googlebot
'special_crawlers' => 'https://developers.google.com/static/search/apis/ipranges/special-crawlers.json', // For some special google crawlers you might want to allow
'bingbot' => 'https://www.bing.com/toolbox/bingbot.json', // For Bing
'applebot' => 'https://search.developer.apple.com/applebot.json'//For applebot
];
// Important: check each service's documentation for what to add to robots.txt to disallow AI training or AI access to your site. For example, Apple has Applebot-Extended and documents what to add to robots.txt to disallow it.
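// As an illustration, per Apple's Applebot-Extended documentation, the robots.txt entry
// to opt out of AI training use looks like this:
//   User-agent: Applebot-Extended
//   Disallow: /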
// Array to store all IP ranges
$ip_ranges = array();
$url_errors = array();
// Fetch and process data from each URL
foreach ($urls as $name => $url) {
// Initialize cURL
$ch = curl_init();
curl_setopt($ch, CURLOPT_URL, $url);
curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
$response = curl_exec($ch);
$http_code = curl_getinfo($ch, CURLINFO_HTTP_CODE);
curl_close($ch);
// Check if the request was successful
if ($http_code != 200 || empty($response)) {
$url_errors[] = "Failed to fetch data from $name URL: HTTP code $http_code";
continue; // Skip this URL and proceed to the next one
}
// Decode the JSON response
$data = json_decode($response, true);
if (json_last_error() !== JSON_ERROR_NONE || !isset($data['prefixes'])) {
$url_errors[] = "Invalid JSON data from $name URL";
continue; // Skip this URL and proceed to the next one
}
// Extract IPv4 prefixes and convert them to start-end ranges
foreach ($data['prefixes'] as $prefix) {
if (isset($prefix['ipv4Prefix'])) {
list($network, $mask) = explode('/', $prefix['ipv4Prefix']);
$ip_start = long2ip(ip2long($network) & (-1 << (32 - $mask)));
$ip_end = long2ip(ip2long($network) + pow(2, (32 - $mask)) - 1);
$ip_ranges[] = array(
'ip_start' => $ip_start,
'ip_end' => $ip_end,
);
// Echo the processed IP range
echo "Processed IP range: $ip_start - $ip_end\n";
}
}
}
// If any URLs failed, echo the errors and stop further processing
if (!empty($url_errors)) {
foreach ($url_errors as $error) {
echo "$error\n";
}
echo "No changes were made to the database due to URL errors.\n";
echo "-->COMPLETED WITH ERRORS!\n\n";
exit;
}
// Fetch existing IP ranges from the database
$existing_ranges = array();
$sql = 'SELECT ip_start, ip_end FROM sebot_ip_ranges';
$result = $db->sql_query($sql);
while ($row = $db->sql_fetchrow($result)) {
$existing_ranges[] = $row;
}
$db->sql_freeresult($result);
// Compare fetched ranges with existing ranges
$ranges_to_insert = array();
$ranges_to_keep = array();
foreach ($ip_ranges as $range) {
if (!in_array($range, $existing_ranges)) {
$ranges_to_insert[] = $range; // Range is new or changed
} else {
$ranges_to_keep[] = $range; // Range is unchanged
}
}
// If there are new or changed ranges, update the database
if (!empty($ranges_to_insert)) {
// Clear only the old ranges that are not in the new data
foreach ($existing_ranges as $existing_range) {
if (!in_array($existing_range, $ip_ranges)) {
$sql = "DELETE FROM sebot_ip_ranges
WHERE ip_start = '" . $existing_range['ip_start'] . "'
AND ip_end = '" . $existing_range['ip_end'] . "'";
$db->sql_query($sql);
}
}
// Insert new or changed ranges
foreach ($ranges_to_insert as $range) {
$sql = "INSERT INTO sebot_ip_ranges (ip_start, ip_end)
VALUES ('" . $range['ip_start'] . "', '" . $range['ip_end'] . "')";
$db->sql_query($sql);
// Echo the IP range being written to the database
echo "Writing to database: " . $range['ip_start'] . " - " . $range['ip_end'] . "\n";
}
echo "IP ranges updated successfully.\n";
} else {
echo "No changes in IP ranges.\n";
}
echo "-->COMPLETED!\n\n";
sleep(3500); // Handy if you run the script hourly and watch the console window for problems but have no other error-notification setup. Comment it out if you do not need it.
?>
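To see what the prefix-to-range conversion in the script is doing, here is the same arithmetic run on a single sample prefix (the prefix is just an illustration, not taken from the live feeds):

```php
<?php
// Same CIDR-to-range math as the script, applied to one sample prefix.
$cidr = '66.249.64.0/27';
list($network, $mask) = explode('/', $cidr);

// Mask off the host bits to get the first address in the block...
$ip_start = long2ip(ip2long($network) & (-1 << (32 - $mask)));
// ...and add the block size (2^(32-mask)) minus one to get the last address.
$ip_end = long2ip(ip2long($network) + pow(2, 32 - $mask) - 1);

echo "$ip_start - $ip_end\n"; // 66.249.64.0 - 66.249.64.31
```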
We also put in an example of using WINCACHE that will cache the result in memory, so you only make a single DB query each hour for this info. Using WinCache or something similar to reduce DB queries gives HUGE performance improvements for phpBB.
Code: Select all
// EXAMPLE CODE for checking IPs for valid SE BOTS to allow access to your phpBB
//add Googlebot and others to wincache and if not there pull ranges from DB -- JLA
// WinCache key for storing/retrieving IP ranges
$gbwincache_key = 'SEBOT_IP_RANGES';
// Check if the IP ranges are cached in WinCache
$ip_ranges_cached = false;
$ip_ranges = wincache_ucache_get($gbwincache_key, $ip_ranges_cached);
if (!$ip_ranges_cached)
{
// If not cached, fetch IP ranges from the database
$sql = 'SELECT ip_start, ip_end FROM sebot_ip_ranges';
$result = $db->sql_query($sql);
$ip_ranges = array();
while ($row = $db->sql_fetchrow($result)) {
$ip_ranges[] = array(
'ip_start' => $row['ip_start'],
'ip_end' => $row['ip_end'],
);
}
$db->sql_freeresult($result);
// Store the IP ranges in WinCache for future use (cache for 1 hour)
wincache_ucache_set($gbwincache_key, $ip_ranges, 3600);
}
// Check SE bot IPs: see whether the visitor's IP is within any of the allowed ranges
// ($ip is assumed to already hold the address being checked)
$ip_allowed = false;
foreach ($ip_ranges as $range)
{
$ip_start = $range['ip_start'];
$ip_end = $range['ip_end'];
// Check if the IP is within the range (compare as unsigned so it also works on 32-bit PHP)
$ip_num = sprintf('%u', ip2long($ip));
if ($ip_num >= sprintf('%u', ip2long($ip_start)) && $ip_num <= sprintf('%u', ip2long($ip_end)))
{
$ip_allowed = true;
break;
}
}
if (!$ip_allowed)
{
// Here you can put code to deny the visitor and send them to an error page (or whatever else you want). Any IP that is not in the list of valid search engine bot ranges will execute the code you put here.
}
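If your board runs on Linux, WinCache is not available; APCu is a common stand-in with the same fetch/store-with-TTL shape. This is a minimal sketch of the same pattern, assuming the apcu extension is loaded (the cache key and table name just mirror the ones above):

```php
<?php
// Same caching pattern as above, but with APCu instead of WinCache.
$cache_key = 'SEBOT_IP_RANGES';
$success = false;
$ip_ranges = apcu_fetch($cache_key, $success);
if (!$success)
{
    // Not cached yet: load the ranges from the database, then cache for one hour.
    $ip_ranges = array();
    $sql = 'SELECT ip_start, ip_end FROM sebot_ip_ranges';
    $result = $db->sql_query($sql);
    while ($row = $db->sql_fetchrow($result)) {
        $ip_ranges[] = array(
            'ip_start' => $row['ip_start'],
            'ip_end' => $row['ip_end'],
        );
    }
    $db->sql_freeresult($result);
    apcu_store($cache_key, $ip_ranges, 3600);
}
```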
Let us know if you have any questions.