Need to edit htaccess while moving on WordPress

Question from WordPress Development on Stack Exchange

I am moving my site from another CMS to WordPress. I currently have this .htaccess

# BEGIN WordPress
<IfModule mod_rewrite.c>
RewriteEngine On
RewriteBase /atoponwp/
RewriteRule ^index\.php$ - [L]
RewriteCond %{REQUEST_FILENAME} !-f
RewriteCond %{REQUEST_FILENAME} !-d
RewriteRule . /atoponwp/index.php [L]
</IfModule>

# END WordPress

As I need to keep my old links in Google Search consistent with the new site, how can I modify current .htaccess in order to have the following rule accomplished?

Old links:

http://<mysite>/index.php?page=<pagename>

should be changed to:

http://<mysite>/<pagename>

Thanks for any help.

Fabio

My answer

This question seems simple, but if you use some of the basic features of WordPress (such as categories), then the solution can become very complex and difficult. My suggestion below assumes that you use a very simple setup.

First, I will assume that when you convert your content to WordPress, what you call [pagename] is identical (including capitalization) in your old CMS and in WP. In WP, the technical name for [pagename] is “slug”, but in some places it is called “postname” or “post name”. After you convert your content, making sure that each [pagename] from your old CMS becomes the slug in WP, you should configure your permalinks.

In WP, click Settings, Permalinks. Select the Post name option because that matches what you asked for in your question, “//[mysite]/[pagename]”.

Now, WP will properly recognize page requests in your new format “//[mysite]/[pagename]”. The last step is to configure your server to change the URLs. There are multiple ways to accomplish this, but since you specifically mentioned .htaccess, I will stick to that and I will use a simple method that should work for all of your posts.

# Make sure the server can execute the directives
<IfModule mod_rewrite.c>
# Turn on the group of directives you will need
RewriteEngine On
# Do the dirty work
RewriteRule ^index\.php\?page=(.*) /$1 [R=301,L]
# Close your conditional check
</IfModule>
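
Where the block goes in the file matters. As a rough sketch, assuming the .htaccess from the question, the redirect directives would sit above the # BEGIN WordPress marker. WordPress may regenerate everything between its BEGIN and END markers when you save your permalink settings, and its own rule ^index\.php$ - [L] stops rule processing for requests to index.php, so a redirect placed after it may never run. The placeholder comment stands in for whatever redirect directives you end up using.

```apache
# Custom redirects for the old CMS URLs, kept outside the WordPress markers
<IfModule mod_rewrite.c>
RewriteEngine On
# ... redirect directives for the old URLs go here ...
</IfModule>

# BEGIN WordPress
<IfModule mod_rewrite.c>
RewriteEngine On
RewriteBase /atoponwp/
RewriteRule ^index\.php$ - [L]
RewriteCond %{REQUEST_FILENAME} !-f
RewriteCond %{REQUEST_FILENAME} !-d
RewriteRule . /atoponwp/index.php [L]
</IfModule>
# END WordPress
```

If the old URLs lived at the site root while WordPress lives in /atoponwp/, the redirect would presumably belong in the .htaccess at the site root instead. That is a guess about your layout, so adjust as needed.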

Assuming you are on Apache, the following two pages will be useful:
https://httpd.apache.org/docs/current/mod/mod_rewrite.html
https://httpd.apache.org/docs/current/rewrite/intro.html

Let’s break down the directive that does the heavy lifting:

RewriteRule ^index\.php\?page=(.*) /$1 [R=301,L]

RewriteRule is the directive and has three parts. It means something like “RewriteRule {if URL looks like this} then {change it to this} and {some additional flags to do special things}.”

The first part is a regular expression: (forgive me if you already know RegEx)

^index\.php\?page=(.*)

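^ anchors the match to the beginning of the string being tested. That string is the requested path: it does not include the domain name and, inside an .htaccess file, it does not include the leading slash either, which is why the pattern starts with “index” rather than “/index”.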

Most of the other characters mean exactly what they look like, “i” means i and “a” means a, but there are some special characters in there.

The period “.” means match exactly one character: in other words, it is a wildcard. So, to your server, “index.php” means “index” followed by any possible character followed by “php”. We don’t want to confuse the server so we use the backslash \ to escape the period and tell the server that we mean period and only period, hence:

index\.php

Similarly, the question mark “?” has a special meaning, so we must escape it also. That should explain at least this much:

^index\.php\?page=

The next part of the expression is beautiful and powerful (and yes, I am a nerd). The parentheses capture a string so that you can later reuse it. What will it capture? As explained above, the period means match exactly one character. The asterisk * means repeat the previous item zero or more times: put them together and .* means match any number of characters, possibly none, until something stops you. In this case, the end of the string is what stops the server from matching characters.

The magic here is that .* is inside the parentheses (.*), so all of the characters that it matches will be “captured”: stored for later use. Where will we use them? Let’s move on to the second part of the directive:

/$1

It’s so simple that it almost seems wrong. This part of the directive tells the server how to reconstruct the URL. When you captured the characters using the parentheses, the server temporarily stored them in memory and because it was the first string of characters you captured, the server conveniently named the memory space $1. (A second pair of parentheses would be stored as $2, and so on.)

So, by writing /$1, you are telling the server to use a forward slash followed by the captured characters as the new URL path. It will be appended to the domain name, so the URL will look like //[mysite]/[pagename].

Finally, the flags. “R” means redirect, and the number after it is the HTTP status code that the accessing client reads. It is especially important for search engines because it tells them, “Hey, I know you were looking for this page at a specific URL, but that page is not here anymore, so I am redirecting you to a different URL.” But why are you redirecting? That is the point behind the redirection code. 301 means a permanent redirect, so the search engine will update its database, replacing the old URL with the new URL.

The L flag means “last”, as in this is the last RewriteRule I want the server to process on this particular URI request.
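
One caveat before you copy the rule: according to the Apache mod_rewrite documentation, the RewriteRule pattern is compared against the URL path only, so the “?page=...” part is never visible to it. If the rule above never fires on your server, a variant along these lines may help. Treat it as a sketch rather than a tested drop-in. It moves the capture into a RewriteCond on %{QUERY_STRING}, refers to the captured text as %1 (the RewriteCond counterpart of $1), and ends the target with a bare ? so the old query string is not carried over to the new URL.

```apache
<IfModule mod_rewrite.c>
RewriteEngine On
# Match only when the query string is exactly page=<pagename>,
# and capture <pagename> (one or more characters, none of them "&")
RewriteCond %{QUERY_STRING} ^page=([^&]+)$
# The pattern sees only the path, so match index.php on its own.
# %1 is the text captured by the RewriteCond above; the trailing ?
# drops the original query string from the redirect target.
RewriteRule ^index\.php$ /%1? [R=301,L]
</IfModule>
```

While you are testing, R=302 may be the safer choice, because browsers tend to cache permanent redirects. On Apache 2.4, the QSD flag is an alternative to the trailing ?.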

And that, my friend, is one of the simplest ways to accomplish your goal. If you try to do anything more complex, prepare to learn a lot and to cry a lot. My .htaccess file has 141 lines and it is still growing. There is a small dent on my desk where I have repeatedly pounded my head trying to get things working properly.

Good luck!
