You can use the Zendesk REST API to make backup copies of all the articles in your knowledge base. The backups can be useful in case you need to check or revert to a previous version of an article.
This tutorial covers the following tasks:
- What you need
- The plan
- Create the Python file
- Create folders for the backups
- Get all the articles in a language
- Paginate through the results
- Write the articles to files
- Create a backup log
- Code complete
- Restoring articles
You can back up a Help Center with only 34 lines of Python code. You can then restore any number of articles with a second, 27-line script.
What you need
You need a text editor and a command-line interface like the command prompt in Windows or the Terminal on the Mac. You'll also need Python 3 and a special library to make HTTP requests.
To set up your development environment:
-
If you don't already have Python 3, download and install it from http://www.python.org/download/. Python is a powerful but beginner-friendly scripting and programming language with a clear and readable syntax. Visit the Python website to learn more.
-
If you have Python 3.3 or earlier, download and install pip, a simple tool for installing and managing Python packages. See these instructions.
Note: If you have Python 3.4 or later, you already have pip. Skip ahead.
-
Use the following pip command in your command-line interface to download and install the Requests library, a library that makes HTTP requests in Python easy:
$ pip3 install requests
Note: The dollar sign ($) represents the command prompt. Don't enter it. If you have Python 3.3 or earlier, use pip instead of pip3 on the command line.
-
Finally, when copying the examples in this tutorial, make sure to indent lines exactly as shown. Indentation matters in Python.
If you're interested in taking a deeper dive into Python after finishing this tutorial, see the following free resources:
- Think Python by Allen B. Downey
- Dive into Python 3 by Mark Pilgrim
The plan
The goal is to back up all the articles in a specified language in your knowledge base. You want to be able to run the script as many times as you need to back up each language in your knowledge base at different times.
Here are the basic tasks the script must carry out to create the backups:
- Download the HTML of the articles from the knowledge base.
- Create an HTML file for each article in a folder on your hard drive.
- Create a backup log for easy reference later.
Backing up the images in the articles is outside the scope of this article. It might be covered in a future tutorial.
Create the Python file
-
Create a folder named backups where you want to download the backups.
-
In a text editor, create a file named make_backup.py and save it in your new backups folder.
-
In the editor, add the following lines to the file.
import requests

credentials = 'your_zendesk_email', 'your_zendesk_password'
zendesk = 'https://your_subdomain.zendesk.com'
language = 'some_locale'
You start by importing the requests library, a third-party Python library for making HTTP requests. You should have installed it earlier. See What you need.
The credentials variable specifies your Zendesk Support sign-in email and password. Before running the script, replace the placeholders your_zendesk_email and your_zendesk_password with actual values. Example:
credentials = 'jane_doe@example.com', '3w00tfawn56'
For security reasons, only enter your password when you're ready to run the script. Delete it when you're done.
The zendesk variable identifies your Zendesk Support instance. The language variable specifies the language of the articles you want to back up. Replace the placeholder values with your own. Example:
zendesk = 'https://obscura.zendesk.com' language = 'en-US'
See Language codes for supported languages for valid values for language.
Also, make sure to include 'https://' in your Zendesk Support url.
Create folders for the backups
In this section, you tell the script to automatically create a folder in your backups folder to store the backup. The folder will have the following structure to easily organize multiple backups in multiple languages:
/backups
/2015-01-24
/en-US
-
Import the native os and datetime libraries at the top of the script:
import os
import datetime
-
Add the following lines after the last line in the script:
date = datetime.date.today()
backup_path = os.path.join(str(date), language)
if not os.path.exists(backup_path):
    os.makedirs(backup_path)
The script gets today's date and uses it along with your language variable to build the new path. When the script runs, the backup_path might be something like 2015-01-24/en-US.
The script then checks to make sure the directory doesn't already exist (in case you ran the script earlier on the same day). If it doesn't, the script creates the directory.
Your script so far should look like this:
import os
import datetime
import requests
credentials = 'your_zendesk_email', 'your_zendesk_password'
zendesk = 'https://your_subdomain.zendesk.com'
language = 'some_locale'
date = datetime.date.today()
backup_path = os.path.join(str(date), language)
if not os.path.exists(backup_path):
    os.makedirs(backup_path)
You can test this code. Specify a locale for the language variable (the credentials don't matter at this point), use the command line to navigate to your backups folder, and run the script as follows:
$ python3 make_backup.py
A folder is created in the backups folder with the current date and the value of your language variable.
Get all the articles in a language
In this section, you send a request to the Help Center API to get all the articles in the language you specified. You'll use the following endpoint in the Articles API:
GET /api/v2/help_center/{locale}/articles.json
The endpoint is documented in this section of the API docs on developer.zendesk.com.
-
In the script, create the final endpoint url by adding the following statement after the last line in the script (don't use any line breaks):
endpoint = zendesk + '/api/v2/help_center/{locale}/articles.json'.format(locale=language.lower())
Before you can use the endpoint in a request, you need to prepend your Zendesk Support url to the string and specify a value for the {locale} placeholder. The statement builds the final url from the Zendesk Support url you specified, the endpoint path in the docs, and the article language you specified. The value of your language variable is inserted (or interpolated) at the {locale} placeholder in the string. Because some locales listed in the language codes article have uppercase letters while the API expects lowercase letters, the value of the language variable is converted to lowercase to be on the safe side.
Using the example in this tutorial, the final endpoint url would be as follows:
'https://obscura.zendesk.com/api/v2/help_center/en-us/articles.json'
-
Use the endpoint url to make the HTTP request and save the response from the API.
response = requests.get(endpoint, auth=credentials)
The statement uses the requests object's get() method with the endpoint variable to make a GET request to the API. The method includes an argument named auth that specifies your basic authentication credentials.
-
Check the request for errors and exit if any are found:
if response.status_code != 200:
    print('Failed to retrieve articles with error {}'.format(response.status_code))
    exit()
According to the API doc, the API returns a status code of 200 if the request is successful. In other words, if the status code is anything other than 200 (if response.status_code != 200), then something went wrong. The script prints an error message and exits.
-
If no errors are found, decode and assign the response to a variable (no indent):
data = response.json()
The Zendesk REST API returns data formatted as JSON. The json() method from the requests library decodes the data into a Python dictionary. A dictionary is simply a set of key/value pairs formatted almost identically to JSON. Example dictionary:
{'id': 35436, 'author_id': 88887, 'draft': True}
Consult the Zendesk API docs to figure out how the data dictionary is structured. For example, according to the articles API doc, the JSON returned by the API has the following structure:
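The JSON example from the API doc isn't reproduced here, but based on the description that follows, the decoded data dictionary resembles this sketch (the field values are made up for illustration; only the overall shape matters):

```python
# Illustrative sketch of the decoded response from the articles endpoint.
# The ids, titles, and bodies below are invented examples.
data = {
    "count": 2,
    "next_page": None,
    "articles": [  # a list of article dictionaries
        {"id": 35436, "title": "Getting started", "author_id": 88887,
         "body": "<p>Welcome!</p>", "draft": False},
        {"id": 35437, "title": "FAQ", "author_id": 88887,
         "body": "<p>Questions and answers.</p>", "draft": False},
    ],
}

# The ids can be listed the same way the tutorial does below
print([article["id"] for article in data["articles"]])
```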
You can deduce from the doc that the data dictionary consists of one key named articles. Its value is a list of articles, as indicated by the square brackets. Each item in the list is a dictionary of article properties, as indicated by the curly braces.
-
Use your new knowledge of the data structure to check the results so far:
for article in data['articles']:
    print(article['id'])
The snippet iterates through all the articles in your data dictionary and prints the id of each article. This is only temporary code for testing. You could print the article body with article['body'], but scanning that much HTML in your console could be a pain. We'll delete the print statement after we're done testing.
Your script so far should look as follows:
import os
import datetime
import requests
credentials = 'your_zendesk_email', 'your_zendesk_password'
zendesk = 'https://your_subdomain.zendesk.com'
language = 'some_locale'
date = datetime.date.today()
backup_path = os.path.join(str(date), language)
if not os.path.exists(backup_path):
    os.makedirs(backup_path)
endpoint = zendesk + '/api/v2/help_center/{locale}/articles.json'.format(locale=language.lower())
response = requests.get(endpoint, auth=credentials)
if response.status_code != 200:
print('Failed to retrieve articles with error {}'.format(response.status_code))
exit()
data = response.json()
for article in data['articles']:
    print(article['id'])
Replace all the placeholders with actual values and run the script again from the command line:
$ python3 make_backup.py
You should get a list of up to 30 article ids confirming that the articles were retrieved successfully. You won't see more than 30 articles even if you have more because the API limits the number to prevent bandwidth and memory issues. In the next section, you change the script to paginate through all the results.
Paginate through the results
In this section, you paginate through the article results to see all the articles. The JSON returned by the endpoint may only contain a maximum of 30 records, but it also contains a next_page property with the endpoint URL of the next page of results, if any. Example:
"next_page": "https://example.zendesk.com/api/v2/en-US/articles.json?page=2",
...
If there's no next page, the value is null:
"next_page": null,
...
Your code will check the next_page property. If it's not null, the script makes another request using the specified URL. If it's null, the script stops. To learn more, see Paginating through lists.
-
Insert the following line after the endpoint variable declaration:
endpoint = zendesk + '/api/v2/help_center/{locale}/articles.json'.format(locale=language.lower())
while endpoint:
-
Indent all the lines that follow the while statement:
while endpoint:
    response = requests.get(endpoint, auth=credentials)
    if response.status_code != 200:
        print('Failed to retrieve articles with error {}'.format(response.status_code))
        exit()
    data = response.json()
    for article in data['articles']:
        print(article['id'])
-
Add the following statement as the last line and indent it too:
endpoint = data['next_page']
This sets up a loop to paginate through the results. While the endpoint variable is true -- in other words, while it contains a url -- a request is made. After getting and displaying a page of results, the script assigns the value of the next_page property to the endpoint variable. If the value is still a url, the loop runs again. If the value is null, such as when the API returns the last page of results, the loop stops.
Your modified code should look as follows:
while endpoint:
    response = requests.get(endpoint, auth=credentials)
    if response.status_code != 200:
        print('Failed to retrieve articles with error {}'.format(response.status_code))
        exit()
    data = response.json()
    for article in data['articles']:
        print(article['id'])
    endpoint = data['next_page']
Run the script again from the command line:
$ python3 make_backup.py
You should get a list of all the articles in the language in your knowledge base.
The next step is to make copies of the articles on your computer.
Write the articles to files
In this section, you create HTML files of all the articles in your knowledge base.
The twist here is that the body attribute of an article only contains the HTML of the body, as its name suggests. The article's title isn't included. The title is specified by another attribute named title. You'll add the title to the article's HTML before writing the file.
-
Replace the following test line:
print(article['id'])
with the following lines:
if article['body'] is None:
    continue
title = '<h1>' + article['title'] + '</h1>'
filename = '{id}.html'.format(id=article['id'])
with open(os.path.join(backup_path, filename), mode='w', encoding='utf-8') as f:
    f.write(title + '\n' + article['body'])
print('{id} copied!'.format(id=article['id']))
Make sure to indent them at the same level as the print statement. The lines perform the following tasks:
- Skips any blank articles
- Creates an H1 tag with the article title
- Creates a file name based on the article ID to guarantee unique names
- Creates a file in the folder the script created earlier using the backup_path variable
- Combines the title, a line break, and the article body in one string
- Writes the string to the file
- Prints a message to the console so you can track the progress of the backup operation.
Your modified code should look as follows:
for article in data['articles']:
    if article['body'] is None:
        continue
    title = '<h1>' + article['title'] + '</h1>'
    filename = '{id}.html'.format(id=article['id'])
    with open(os.path.join(backup_path, filename), mode='w', encoding='utf-8') as f:
        f.write(title + '\n' + article['body'])
    print('{id} copied!'.format(id=article['id']))
If the article body is blank, the continue statement skips the rest of the steps in the for loop and moves to the next article in the list. The logic prevents the script from writing any empty drafts in your Help Center that might be acting as placeholders for future content. It also prevents the script from breaking when you try to concatenate a string with a Python NoneType in the snippet's next-to-last line (title + '\n' + article['body']).
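The TypeError that the continue guard avoids is easy to demonstrate in isolation:

```python
# Concatenating a str with None raises a TypeError -- this is what would
# happen at f.write(title + '\n' + article['body']) for an empty draft,
# where the API returns the body as null (decoded to Python's None).
title = '<h1>Placeholder draft</h1>'
body = None

try:
    html = title + '\n' + body
except TypeError as e:
    print('Concatenation failed:', e)
```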
Run the script again from the command line:
$ python3 make_backup.py
The script writes all the articles in your knowledge base to your language folder. Open a few files in a text editor to check the HTML.
Create a backup log
In this section, you create a backup log for easier reference later. The log will consist of a csv file with File, Title, and Author ID columns and a row for each article that's backed up.
-
Import the native csv library at the top of the script:
import csv
-
Create the following log variable just before the endpoint variable declaration:
log = []
endpoint = zendesk + '/api/v2/help_center/ ...
The variable declares an empty list. After writing each article to file, the script will update the list with information about the article.
-
Add the following log.append() statement immediately following and at the same indent level as the print statement:
print('{id} copied!'.format(id=article['id']))
log.append((filename, article['title'], article['author_id']))
After writing an article, the script appends a data item to the log list. The double parentheses are intentional. You're appending a Python tuple, a kind of list that uses parentheses. The csv library uses tuples to add rows to a spreadsheet. Each row consists of a filename, title, and author id.
-
Add the following lines at the bottom of the script. The first line should be flush to the margin (no indent and no wrap):
with open(os.path.join(backup_path, '_log.csv'), mode='wt', encoding='utf-8') as f:
    writer = csv.writer(f)
    writer.writerow( ('File', 'Title', 'Author ID') )
    for article in log:
        writer.writerow(article)
After writing all the articles, the script creates a file called _log.csv. The underscore ensures the file appears first in any file browser. The script adds a header row and then a row for each article in the log list.
Code complete
Your completed script should look as follows. A copy of the script is also attached to this article.
Use the command line to navigate to your backups folder and run the script:
$ python3 make_backup.py
The script makes a backup of your knowledge base in a language folder. It also creates a log file that you can use in a spreadsheet application.
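If you'd rather process the backup log programmatically than open it in a spreadsheet, the same csv library can read it back. A minimal sketch, simulating the contents of a _log.csv file written by the script above (the ids and titles are made up):

```python
import csv
import io

# Simulated contents of a _log.csv file produced by the backup script.
log_csv = io.StringIO(
    'File,Title,Author ID\r\n'
    '35436.html,Getting started,88887\r\n'
    '35437.html,FAQ,88887\r\n'
)

reader = csv.reader(log_csv)
header = next(reader)            # skip the header row
rows = [row for row in reader]   # each row is [filename, title, author id]

print(header)
print(rows)
```

With a real file, you'd replace the io.StringIO object with open(os.path.join(backup_path, '_log.csv'), encoding='utf-8').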
Restoring articles
You can restore any backed up article with a second script that reads the content of each file, parses it into an HTML tree to extract the title and body for Help Center, and uses the API to update the article in Help Center.
The script in this section updates existing articles; it doesn't create new ones. To create, it would need to be modified to use a different endpoint, as well as to specify a section and author for the article.
You'll need version 2.4.2 or greater of the requests library. To check your version, run $ pip show requests at the command line. To upgrade, run $ pip install requests --upgrade.
If you don't already have Beautiful Soup, you'll need to install it. Beautiful Soup is a Python library for parsing, navigating, searching, and modifying HTML trees. To install Beautiful Soup:
-
At the command line, enter:
$ pip install beautifulsoup4
The command downloads and installs the latest version of Beautiful Soup.
-
Install lxml, an HTML parser that works with Beautiful Soup:
$ pip install lxml
Beautiful Soup works with a number of parsers. The lxml parser is one of the fastest.
To restore selected articles:
-
Copy the following script into a new text file, name it restore_articles.py, and save it in your backups folder with your make_backup.py file.
-
Replace the placeholder values in the Settings section with your own:
-
credentials - Your Zendesk Support sign-in email and password. A security best practice is to enter these only before running the script, and then deleting them after. Example:
credentials = 'jtiller@example.com', 'pasSw0rd0325'
-
zendesk - Your Zendesk Support instance. Make sure to include 'https://'. Example:
zendesk = 'https://omniwear.zendesk.com'
-
backup_folder - A folder name created by the backup script. Example:
backup_folder = '2017-01-04'
-
language - A locale corresponding to a subfolder in your backup folder. Example:
language = 'en-us'
-
restore_list - An array of article ids. Example:
restore_list = [200459576, 201995096]
-
Use the command line to navigate to your backups folder and run the script:
$ python3 restore_articles.py
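The restore_articles.py script itself is provided as an attachment and isn't reproduced above. As a rough sketch of the approach it describes -- read each backup file, separate the title from the body, and send the result to the API -- here is one way it could look. Instead of an HTML parser, this sketch uses simple string handling, relying on the fact that make_backup.py writes the h1 title on the first line; treat the update endpoint and payload shape as assumptions to verify against the Articles API docs before use.

```python
# Sketch of a restore flow, assuming backups made by make_backup.py above.
import os


def parse_backup_file(text):
    """Split a backup file into (title, body).

    make_backup.py writes the title as an <h1> element on the first
    line, followed by the article body HTML.
    """
    first_line, _, body = text.partition('\n')
    title = first_line.replace('<h1>', '').replace('</h1>', '')
    return title, body


def restore_article(article_id, backup_folder, language, zendesk, credentials):
    # Imported here so the parsing helper above stays dependency-free
    import requests

    path = os.path.join(backup_folder, language, '{}.html'.format(article_id))
    with open(path, mode='r', encoding='utf-8') as f:
        title, body = parse_backup_file(f.read())

    # Update the existing translation of the article (assumed endpoint;
    # check the Articles API docs for the exact path and payload)
    endpoint = zendesk + '/api/v2/help_center/articles/{id}/translations/{locale}.json'.format(
        id=article_id, locale=language)
    payload = {'translation': {'title': title, 'body': body}}
    response = requests.put(endpoint, json=payload, auth=credentials)
    if response.status_code != 200:
        print('Failed to update {} with error {}'.format(article_id, response.status_code))
```

Looping over a restore_list of article ids and calling restore_article for each would then restore the selected articles.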
98 Comments
Merci, Charles. This is fantastic :)
Thanks, I'll give this one a go. I've been using a Ruby script for the last year or so, but this one looks like it is better.
Cheers
Thanks, this is really helpful.
Just in case, did anyone write a restore script ? This would be really nice to translate the knowledge base.
I'd been thinking about how to "back up" our production knowledgebase to our sandbox.
Once the backup is complete as per this method, could we restore to our sandbox?
Also, +1 to @Roland's question about a restore script.
This is awesome. Thank you so much
From what I can tell (on this page as well as on the API definition page for Knowledge Base), you cannot pull the "last updated by" value of the KB. Does anybody know a way to do this?
@Adam Goolie Gould
Hey Adam!
"Technically" you could apply your backup from your production knowledge base to your Sandbox. However, the benefit of doing so may not necessarily be worth the added steps. This is due to the Sandbox environment being completely separate from your production instance of Zendesk. What this means is that, you can more easily restore your backup to your production environment (if/when needed) from presumably the same files that you saved on the machine that you performed the initial backup.
So, yes you could do this, I just caution against doing so in effort of implementing some sort of "synced" redundancy as that is not the case with the Sandbox environment.
I hope this information helps!
Cheers,
Fred Thomas | Customer Advocate
@Roland, you'll find scripts and instructions on restoring html files back from localization here:
https://help.zendesk.com/hc/en-us/articles/229489108#post
It's part of a larger article on using the API to automate the first loc handoff.
This is awesome, but this is CRAZY that there is not a way to backup or restore content inside the app. Wrote about that here - https://support.zendesk.com/hc/en-us/community/posts/209398128-Need-ability-to-recover-or-back-up-Help-Center-content?preview_as_role=manager
Is there any way, via perhaps the API, to also get the images downloaded as well?
Hi Russur,
There's no image API, but once you've downloaded the articles on your system, a number of Python libraries and techniques can let you read the image URLs in the files and make requests to download them. I like BeautifulSoup for parsing HTML, and Requests to make HTTP requests. You can do a Google search for other options.
As for me, I'd write a script that opened each file and used BeautifulSoup to get the image urls:
Then I'd grab the src attribute in each img tag and use it to make a request for the image file from the server using the Requests library:
Note: I'm checking to make sure the first 4 characters start with http so it's a valid request url.
At this point, this image is in memory on my system. Next, I'd grab the filename from the src attribute and write it to file:
One thing to be careful about: Most web servers only allow browsers to download images. So I'd set a header so my request looked like it's coming from a browser:
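The code snippets for those steps aren't reproduced in this comment. A rough, combined sketch of the idea follows; it uses the standard library's html.parser instead of BeautifulSoup so the example is self-contained, and the download step with Requests is shown only in comments since it needs a live server:

```python
# Collect the src attribute of every img tag in an article's HTML,
# keeping only http(s) urls, as described in the steps above.
from html.parser import HTMLParser


class ImageSrcCollector(HTMLParser):
    """Collects the src attribute of every img tag in an HTML string."""

    def __init__(self):
        super().__init__()
        self.srcs = []

    def handle_starttag(self, tag, attrs):
        if tag == 'img':
            src = dict(attrs).get('src')
            if src and src[:4] == 'http':   # skip relative and data: urls
                self.srcs.append(src)


html = '<p>Intro</p><img src="https://example.com/img/diagram.png"><img src="/relative.png">'
parser = ImageSrcCollector()
parser.feed(html)
print(parser.srcs)

# For each collected src, a browser-like request could then fetch the image:
#     headers = {'User-Agent': 'Mozilla/5.0'}
#     response = requests.get(src, stream=True, headers=headers)
#     filename = src.split('/')[-1]
#     ...then write the response content to a file in the backup folder
```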
Hope this helps.
Also, is there any way to get the Section names, that contain the html files?
Thanks
That did great! Here is my hacked code that I added. (This goes above the line:
endpoint = data['next_page']
# begin included code to search and pull out images
# ('session' is a requests.Session() created earlier with a browser-like
# User-Agent header, as described in the previous comment)
tree = BeautifulSoup(article['body'], "html.parser")
images = tree.find_all('img')
for image in images:
src = image['src']
if src[:4] != 'http': continue
response = session.get(src, stream=True)
file_name = src.split('/')[-1]
image_dir = src.split('/')[-2]
file_name = str(article['id']) + '_' + image_dir + '_' + file_name
with open(os.path.join(backup_path, file_name), mode='wb') as f:
for chunk in response.iter_content():
f.write(chunk)
# End of included code
Also added the
from bs4 import BeautifulSoup
towards the top also.
This will work to get the graphic as well as the directory name that Zendesk created for the image. I will probably update this to get the Section ID Name, and maybe recreate the directory structure. Thanks!
Is there a way to get the Title of the Section, that the article is contained in? I was trying to get the Section API call to work, but having no luck?
Hi Russur,
You could sideload the sections with your articles in the API call (the ?include=sections part of the following example):
endpoint = zendesk + '/api/v2/help_center/{locale}/articles.json?include=sections'.format(locale=language.lower())
Then you can associate the section_id in each article record with a section record, which will contain the section title.
For more info on how sideloading works, see https://help.zendesk.com/hc/en-us/articles/229489048-Sideloading-related-records
For a tutorial that covers sideloading along the way and a technique to associate records, see Getting large data sets, especially the section on sideloading.
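The association step can be sketched like this (the article and section values below are made up for illustration, and sections expose their title in a name attribute):

```python
# With ?include=sections, the decoded response contains a top-level
# 'sections' list alongside 'articles'. Build a lookup table keyed by
# section id, then resolve each article's section.
data = {
    'articles': [
        {'id': 35436, 'title': 'Getting started', 'section_id': 200},
        {'id': 35437, 'title': 'FAQ', 'section_id': 201},
    ],
    'sections': [
        {'id': 200, 'name': 'Announcements'},
        {'id': 201, 'name': 'Troubleshooting'},
    ],
}

sections_by_id = {section['id']: section for section in data['sections']}

for article in data['articles']:
    section = sections_by_id[article['section_id']]
    print('{} -> {}'.format(article['title'], section['name']))
```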
Every time I try to run the make_backup.py python3 script, I keep getting the following:
Failed to retrieve articles with error 401
Hi David, a 401 points to a problem with the authentication credentials. Can you double-check the Zendesk email and password you entered on line 7 under "Code complete" above?
The other thing to check is to see if your Zendesk is configured to allow passwords for API requests. In the admin interface, click the Admin button (gear icon) in the lower-left, then Channels > API. At the bottom of the page there should be a checkbox to enable password access.
The enable password access checkbox is checked. My credentials are fine as I can get into Zendesk and manage articles. API is not liking something. We use gmail accounts for authentication into zendesk. Could this be the problem?
That could be the problem. Because you're authenticated with a Google password, your Zendesk profile might not have a Zendesk password. If you're an admin, you should be able to add one yourself. See Resetting user passwords. Use the Set option instead of the Reset one.
That did the trick. Everything seems to be working fine now. Thanks.
OK, I got all articles exported fine. The articles contains all of the images and video attachments on them. I am now playing with the idea having to restore a deleted article using curl. I am using the following syntax:
curl https://servienthelp.zendesk.com/api/v2/help_center/sections/{id}/articles.json \
-d '{"article": {"title": "How to take pictures in low light", "body": "Use a tripod", "locale": "en-us" }}' \
-v -u {email_address}:{password} -X POST -H "Content-Type: application/json"
I can see {id} is the section id to which the article belongs.
title is the actual title of the article being imported.
body - do not know what that is or how to obtain that information.
locale - language source
email_address - needed for authentication
password - needed for authentication.
I successfully created the article, but do not see the body of the article.
Hi David, I'm not sure I understand the question. The `body` attribute specifies the content of the article. It's probably going to be one long JSON string, so with curl it's probably easier to import it from a file in the curl statement. See Move JSON data to a file in our curl article.
When I run the make_backup.py program, I get a bunch of html files which are the articles. I open one of the html files and I see the content, including any attachments. How do I import these html file(s) when needed to do so? Is there a way via curl command that will allow it? I have all the information I need like title, section id the article belongs to, the actual html file to be imported, the article id, and its position. So there is a scripted way to get the articles out, but is there a scripted way to get them back in?
Ah, I see. You'll need a script to parse the content of each HTML file, convert it to JSON, and post the data to HC. cURL is probably not the most efficient tool for this if you have more than a handful of articles.
In Python, you can use the BeautifulSoup library to parse the content. One technique is described in Add the article translations, which is part of a larger tutorial on publishing localized articles on Help Center.
I would be using the curl command on a per document basis, unless you know of an easier way of doing this. After I run the make_backup.py python script, I have a log file containing a lot of information like title, section id the article belongs to, the html filename of the article, and its position. I also have all of the articles in html format. I open these html files and see the article, its images, and video attachments. Are you saying the article (html file) has to be converted to json in order to be uploaded to HC? The script above downloaded the articles perfectly. Is there a similar script, or curl command, to allow me to upload a document that has been deleted? I would have to locate the html file that represents the deleted article based on the title. Once the html file has been identified as the document to be uploaded, I would think curl or a python script would do the trick.
Hi David,
The backup script doesn't actually download the HTML files. It downloads JSON data, then decodes and writes the data to files (lines 27 to 33 in the "Code complete" sample above).
The API uses JSON as its data exchange format. The process of uploading articles to Help Center is the reverse of downloading them. Each article has to be converted to JSON and then sent to the API in that format. After the API receives the data, the JSON is decoded and published.
In the cURL example you gave above, the "body" element is JSON (as is the entire "-d" line).
For more on encoding and decoding JSON, see the article Working with JSON.
Hope this helps.
The code for backing up all of the Zendesk articles is great. Is Zendesk working on a python script to import the html files, convert them to json, and then publish them? I have found that you can open the html file in a text editor and copy all of the tags that make up the body (content). Create a new article in Zendesk, choose the section where the article is to be placed, give it the same title as before, click the <|> source code button, and paste what you copied from the html file earlier. The article shows up fine with images and any embedded videos. Easy. Just wanted to see if Zendesk was going to create a python script to import the backed up files generated from the Zendesk make_backup.py script. Thanks.
Thanks for this! I was able to backup the majority of articles.
Apart from the articles which are publicly available, we also have some sections which are only available for logged in users. These articles don't seem to be included in the backup. Is there any way to do this?
Hi Jasper,
The articles returned by the API depend on the user role of the person making the API request. The API returns only the articles that the requesting agent, end user, or anonymous user can normally view in Help Center when using the web UI. To back up a Help Center with restricted content, you should ideally have the user role of Help Center manager to get all the content.
Administrators are Help Center managers by default. You can add Help Center managers by giving agents Help Center manager privileges. See Understanding Help Center roles and setting permissions.
Let me know if that's not the problem.
Charles
Hi Charles,
I already have the role of Administrator, so that's probably not the problem.
Is there anything else I can do to solve this?
Please sign in to leave a comment.