Tag Archives: Robotstxt

Google Says Very Few Robots.txt Files Are Over 500KB

Gary Illyes shared a nice little tidbit on LinkedIn about robots.txt files. He said that only a tiny number of robots.txt files are over 500 kilobytes. Most robots.txt files have only a few lines of text, so this makes sense, but it is still a nice tidbit of knowledge. Gary looked at over a […]
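For context, Google documents a hard limit here: it only processes the first 500 KiB of a robots.txt file. A minimal sketch for checking a file's size against that limit (the robots_txt_size helper and the example.com URL are illustrative, not from the post):

```python
import urllib.request

# Google's documented processing limit for robots.txt is 500 KiB.
GOOGLE_LIMIT_BYTES = 500 * 1024

def robots_txt_size(site: str) -> int:
    """Fetch a site's robots.txt and return its size in bytes."""
    with urllib.request.urlopen(f"{site}/robots.txt") as response:
        return len(response.read())

# Replace with a site that actually serves a robots.txt file.
size = robots_txt_size("https://www.example.com")
status = "within" if size <= GOOGLE_LIMIT_BYTES else "over"
print(f"robots.txt is {size} bytes, {status} Google's 500 KiB limit")
```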

Google To Work On Complementary Robots.txt Protocols

Google announced last night that it is looking to develop a complementary protocol to the 30-year-old robots.txt protocol. This is because of all the new generative AI technologies Google and other companies are releasing. This announcement comes shortly after the news around OpenAI accessing paywalled content for its ChatGPT service. But I know many […]

Google Will Ignore Robots.txt Directives If It Serves A 4xx Status Code

Here is another PSA from Gary Illyes of Google. In short, if you serve a 4xx status code with your robots.txt file, then Google will ignore the rules you have specified in that file. Why? Well, a 4xx status code means the document is not available, so Google won't check it because the server says it […]
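A minimal sketch of that behavior, assuming a simple fetcher (the fetch_robots_rules helper is hypothetical and not how Googlebot is actually implemented):

```python
import urllib.error
import urllib.request

def fetch_robots_rules(url: str) -> str:
    """Fetch robots.txt, mirroring Google's documented 4xx handling."""
    try:
        with urllib.request.urlopen(url) as response:
            return response.read().decode("utf-8")
    except urllib.error.HTTPError as e:
        if 400 <= e.code < 500:
            # Any 4xx means "no robots.txt": the rules in the file are
            # ignored and crawling is treated as unrestricted.
            return ""
        raise  # 5xx and network errors are a different story

rules = fetch_robots_rules("https://www.example.com/robots.txt")
```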

Google Says URL Case Also Matters For Robots.txt

We have known for a while that Google can treat the same URL differently depending on case. Therefore, domain.com/Apple vs. domain.com/apple can be viewed as different URLs by Google. But Google seems to be even stricter about this when it comes to the robots.txt file. Using domain.com/Apple vs. domain.com/apple, Google can tell if […]
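Python's standard-library parser happens to match robots.txt paths case-sensitively as well, which makes the distinction easy to demonstrate (the domain.com URLs are just the post's illustrative example):

```python
from urllib.robotparser import RobotFileParser

# Disallow: /Apple blocks /Apple but not /apple, because robots.txt
# path matching is case-sensitive.
parser = RobotFileParser()
parser.parse([
    "User-agent: *",
    "Disallow: /Apple",
])

print(parser.can_fetch("*", "https://domain.com/Apple"))  # False: blocked
print(parser.can_fetch("*", "https://domain.com/apple"))  # True: allowed
```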