User-agent: * Syntax not understood


An optional robots.txt file, placed in the root of the web application and freely accessible, tells crawling robots which parts of the web site not to index.

The 'Syntax not understood' error can be observed when you test the indexability of your site in Google Webmaster Tools > Crawl > Blocked URLs. If you 'View page source' and search for 'User-agent', you will most probably find a row like:

    &#65279;User-agent: *

The trouble is with that &#65279; entry. It is an invisible character entity in HTML, commonly referred to as the "ZERO WIDTH NO-BREAK SPACE" character, decimal code 65279, hexadecimal code FEFF. In this case it comes from the BOM (byte order mark): the optional 3 bytes at the beginning of a UTF-8 encoded file, hex EF BB BF. UTF-8 is the required encoding, but Google currently does not accept the optional BOM in what it expects to be a clean and simple robots.txt file, and parsing fails.
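To confirm that a file really starts with the BOM, you can inspect its first bytes. The following is a minimal Python sketch, not part of any Google tooling; the function name has_utf8_bom and the sample contents are assumptions made for illustration.

```python
import codecs


def has_utf8_bom(data: bytes) -> bool:
    """Return True if the byte string starts with the UTF-8 BOM (EF BB BF)."""
    return data.startswith(codecs.BOM_UTF8)


# Illustrative sample contents (assumed, not taken from a real site).
sample_with_bom = b"\xef\xbb\xbfUser-agent: *\nDisallow: /private/\n"
sample_clean = b"User-agent: *\nDisallow: /private/\n"

print(has_utf8_bom(sample_with_bom))  # True
print(has_utf8_bom(sample_clean))     # False

# Decoding with the plain "utf-8" codec keeps the BOM as the character
# U+FEFF, which is the decimal code 65279 seen in the page source.
print(ord(sample_with_bom.decode("utf-8")[0]))  # 65279
```

To check your own file, read it in binary mode, for example open("robots.txt", "rb").read(), and pass the result to the function. The path here is only an example.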

Problem resolution:

  1. Remove the BOM from your robots.txt file. Open the file in an editor such as Notepad++, choose the menu item Encoding > Encode in UTF-8 without BOM, and save the file. To be sure, close and reopen the editor and the file and check that the correct option is marked under the Encoding menu, or review the file content in hex.

  2. Upload the file. Optionally "fetch it as Google" and "send to index" from Webmaster Tools > Crawl > Fetch as Google. It is a good idea to fetch your home page, too.

  3. Wait one or two days and check whether the error has disappeared. Google needs some time to pick up the new version of your file (the time robots.txt was last downloaded is shown under Crawl > Blocked URLs). Until then, Webmaster Tools uses the latest downloaded version. You can retype the first line of your robots.txt in the text area, which removes the stray BOM remnant, and the error will not appear when you press Test. But if you reload the page, the error is there again.
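If you prefer a script to a text editor for step 1, the BOM can also be removed programmatically. This is a minimal Python sketch under the assumption that the file is a local copy you can overwrite; the function name strip_utf8_bom and the path in the usage note are hypothetical.

```python
import codecs
from pathlib import Path


def strip_utf8_bom(path: str) -> bool:
    """Rewrite the file without a leading UTF-8 BOM.

    Returns True if a BOM was found and removed, False if the file
    was already clean (in which case it is left untouched).
    """
    file_path = Path(path)
    data = file_path.read_bytes()
    if not data.startswith(codecs.BOM_UTF8):
        return False
    # Drop the three BOM bytes (EF BB BF) and keep the rest unchanged.
    file_path.write_bytes(data[len(codecs.BOM_UTF8):])
    return True
```

A call such as strip_utf8_bom("robots.txt") returns True the first time on a file saved with a BOM and False on every later call. The file is read and written in binary mode, so line endings and all other content stay as they were.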
