Backing up your knowledge base with the Zendesk API


71 Comments

  • ccaux

    Hi, I am trying to do exactly the same thing as Howie (exactly the same use case).

    I have started a reference doc to map IDs for Category and Sections and have created those in my Sandbox. 

    Is there a script available to recreate all the articles? I understand that the newly created articles will have new IDs.

    Alternatively, is there a recommended process for major overhauls of the Help Centre (new IA, new design, etc.)?

     

    1
  • Charles Nadeau

    Thanks @Tami.


    Since you're creating articles, you'll want to use the Create Article endpoint:


    https://developer.zendesk.com/rest_api/docs/help_center/articles#create-article


    The translations endpoint in the restore script is generally used to update the content of an existing article in a specific language, or add a translation of an article.
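
    As a rough sketch of using the Create Article endpoint with the backup files, something like the following could work. It assumes each backup file starts with the `<h1>` title line written by the backup script, and that `session` is a `requests.Session` set up with your credentials as in that script. The function names `split_backup` and `create_article` are made up for illustration, and some Help Centers may require additional article properties beyond title, body, and locale.

```python
import re


def split_backup(html):
    """Split a backup file into (title, body).

    Assumes the layout written by the backup script: an <h1> title on the
    first line, followed by the article body.
    """
    first_line, _, body = html.partition('\n')
    match = re.fullmatch(r'<h1>(.*)</h1>', first_line.strip())
    if match is None:
        raise ValueError('first line is not an <h1> title')
    return match.group(1), body


def create_article(session, zendesk, section_id, locale, title, body):
    """POST a new article to a section and return the new article's id."""
    url = '{}/api/v2/help_center/sections/{}/articles.json'.format(zendesk, section_id)
    payload = {'article': {'title': title, 'body': body, 'locale': locale}}
    response = session.post(url, json=payload)
    if response.status_code != 201:
        raise RuntimeError('Failed to create article with error {}'.format(response.status_code))
    return response.json()['article']['id']
```

    The id returned is the new article's id in the target Help Center, so it will differ from the id in the backup file name.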


    The correct url to the localization article is https://help.zendesk.com/hc/en-us/articles/229489108#post. I updated it in the comments above.


     


     

    1
  • Howie Paul

    Hi, 

    Long time listener, first time Python utiliser!

    I've found this guide very handy and easy to follow, so thanks!

    Backup works lovely, but my purpose for this is so I can restore it to our sandbox.

    You've provided a restore script, but only for the scenario where the files already exist. Can you help a guy out if they in fact don't exist?

     

    Thanks

    Howie

    0
  • Charles Nadeau

    Hi Russ,


    It could be fairly straightforward. Take a look at this Gist for one approach to the problem:


    https://gist.github.com/chucknado/8473cfbad848425e075ee4c0bfd66af4


    First, you get all the articles in the section.


    Second, you set up a for loop to check each one. If comments are already disabled, skip it. If not, fire off another API request to update the `comments_disabled` property.
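
    The two steps above could be sketched roughly as follows. This is not the Gist's code, just a minimal illustration; the function name `disable_comments_in_section` is made up, and `session` is assumed to be a `requests.Session` carrying your credentials.

```python
def disable_comments_in_section(session, zendesk, section_id):
    """Disable comments on every article in a section that still allows them.

    Returns the ids of the articles that were updated.
    """
    updated = []
    endpoint = '{}/api/v2/help_center/sections/{}/articles.json'.format(zendesk, section_id)
    while endpoint:
        # Step 1: get a page of articles in the section
        response = session.get(endpoint)
        if response.status_code != 200:
            raise RuntimeError('Failed to retrieve articles with error {}'.format(response.status_code))
        data = response.json()

        # Step 2: check each article and update only the ones that need it
        for article in data['articles']:
            if article['comments_disabled']:
                continue  # comments already disabled, skip it
            url = '{}/api/v2/help_center/articles/{}.json'.format(zendesk, article['id'])
            update = session.put(url, json={'article': {'comments_disabled': True}})
            if update.status_code != 200:
                raise RuntimeError('Failed to update article {} with error {}'.format(
                    article['id'], update.status_code))
            updated.append(article['id'])

        endpoint = data['next_page']
    return updated
```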


     

    0
  • Joseph May

    Hi Howie-

    The resource would have to be re-created in sandbox, as IDs would not carry over from your primary acct to sandbox. That said, I generally advise against using production data in a sandboxed environment. May I please ask what your goals are here?

    0
  • Charles Nadeau

    You could use the Article Attachments API at https://developer.zendesk.com/rest_api/docs/help_center/article_attachments#create-article-attachment. 
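
    A minimal sketch of calling that endpoint with the Requests library might look like this. The helper name `upload_attachment` is made up for illustration, `session` is assumed to be a `requests.Session` with your credentials, and error handling is kept to a bare minimum.

```python
import os


def upload_attachment(session, zendesk, article_id, file_path, inline=True):
    """Upload a local file as an article attachment and return its content URL."""
    url = '{}/api/v2/help_center/articles/{}/attachments.json'.format(zendesk, article_id)
    with open(file_path, mode='rb') as f:
        # The endpoint expects a multipart form with a "file" part and an
        # "inline" field that says whether the file is shown in the article body
        response = session.post(
            url,
            data={'inline': 'true' if inline else 'false'},
            files={'file': (os.path.basename(file_path), f)},
        )
    if response.status_code != 201:
        raise RuntimeError('Failed to upload attachment with error {}'.format(response.status_code))
    return response.json()['article_attachment']['content_url']
```

    The returned URL is what an `img` tag in the article body would point to.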


    If you have a large knowledge base with lots of images, you might consider hosting the images on a separate server. You could then use a simple FTP client like Cyberduck to upload your images to the server. That's what the Docs team at Zendesk does to keep our sanity. We host thousands of images in several languages on an Amazon S3 file server from Amazon Web Services (I'm sure other options are available). Our articles link directly to them there.


    Charles


     

    0
  • Fred Thomas

    @Adam Goolie Gould

    Hey Adam!

    "Technically" you could apply your backup from your production knowledge base to your Sandbox. However, the benefit may not be worth the added steps, because the Sandbox environment is completely separate from your production instance of Zendesk. This means you can more easily restore your backup to your production environment (if/when needed), presumably from the same files that you saved on the machine where you performed the initial backup.

    So, yes, you could do this. I just caution against doing so in an effort to implement some sort of "synced" redundancy, as that is not how the Sandbox environment works.

    I hope this information helps!

    Cheers,

    Fred Thomas | Customer Advocate

    0
  • David Conway

    The enable password access checkbox is checked. My credentials are fine, as I can get into Zendesk and manage articles. The API is not liking something. We use Gmail accounts for authentication into Zendesk. Could this be the problem?

    0
  • Howie Paul

    Hi Joseph, thanks for your response.

     

    My goals for the use of the sandbox then:

    I have created a new theme for the Help Center which I do not want to make public, but I do want to send it round to gauge people's opinions. The best way I found to do this is by using the sandbox.

    Given the high level of customisation I employ, I would need my documents imported too, in order to give the full feel of the new theme.

    Hope that makes sense.

    Howie

    0
  • Charles Nadeau

    @Roland, @Adam, @David - 


    I added a section to the tutorial on restoring backed up articles. See Restoring articles above. Because it will necessarily overwrite Help Center content, please use it at your own risk.

    0
  • Russur

    Hi Charles,


     


    Your python script has helped me tremendously!! Thanks so much!


     


    Quick question, do you have some Python code that would allow me to go through a section, and edit Article attributes?


     


    It seems when Zendesk updated the Help Center, it turned on comments for all of our articles. I wanted the Python script to go through all articles in a section, check if disable comments is True, and if not, set it to True.


    Would you have some skeleton python like that?


     


    Thanks,


     


     


    Russ

    0
  • Adam Goolie Gould

    I'd been thinking about how to "back up" our production knowledgebase to our sandbox.

    Once the backup is complete as per this method, could we restore to our sandbox?

    Also, +1 to @Roland's question about a restore script.

    0
  • David Conway

    Every time I try to run the make_backup.py python3 script, I keep getting the following:


    Failed to retrieve articles with error 401

    0
  • Charles Nadeau

    Hi Mike,

    Just FYI, I redacted the screenshot link in your comment. It showed sign-in credentials.

    Thanks.

    0
  • Bryan Flynn

    Thanks Matt for stepping in and thanks Rachel for sharing what the fix was -- quite easy for these small things to sneak in and not obvious!

    0
  • Joseph May

    Hey Howie-

    Thanks for the explanation. The long and short of it is that all these resources will need to be recreated in the sandbox, generating their own unique IDs for Category, Section, Article, attachment, etc. Does this make sense? These values are not carried over from production.
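
    To illustrate what recreating resources and tracking the new IDs might look like, here is a rough sketch for sections only. The function name `recreate_sections` is made up, `session` is assumed to be a `requests.Session` authenticated against the sandbox, and `sections` is assumed to be a list of section records already fetched from production.

```python
def recreate_sections(session, sandbox, category_id, sections, locale):
    """Create each production section in a sandbox category.

    Returns a dict mapping each production section id to its new sandbox id.
    """
    id_map = {}
    url = '{}/api/v2/help_center/categories/{}/sections.json'.format(sandbox, category_id)
    for section in sections:
        payload = {'section': {'name': section['name'], 'locale': locale}}
        response = session.post(url, json=payload)
        if response.status_code != 201:
            raise RuntimeError('Failed to create section with error {}'.format(response.status_code))
        # The sandbox assigns its own id, so remember which old id it replaces
        id_map[section['id']] = response.json()['section']['id']
    return id_map
```

    The resulting map can then be used to pick the right sandbox section when recreating each article.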

    0
  • Charles Nadeau

    Thanks, Jasper. I ran your version of the script with my Help Center and it successfully backed up all restricted content.


    In your case, only the articles from a specific section aren't being picked up, correct? That suggests there's something different with the section.


    If you can see the restricted articles in Help Center in a browser, and you use the same credentials to run the script that you use to sign in to Help Center, then the problem is not the access restrictions. The script "sees" the same articles you see in Help Center.


    Can you check the language setting of the category that contains the restricted section? (Browse to the category and click Edit Category on the toolbar.) The language of both the category and the section should be Dutch (nl).


     

    0
  • Jasper

    Hi Charles,


    In the CSV file, I can see that 106 articles were backed up. So the limit should not be reached yet. 


    Therefore I'll post my code here. Thanks for your help!


    import os
    import datetime
    import csv

    import requests


    credentials = 'INSERT EMAIL HERE', 'INSERT PASSWORD HERE'
    session = requests.Session()
    session.auth = credentials

    zendesk = 'https://caresharingnetherlands.zendesk.com'
    language = 'nl'

    date = datetime.date.today()
    backup_path = os.path.join(str(date), language)
    if not os.path.exists(backup_path):
        os.makedirs(backup_path)

    log = []

    endpoint = zendesk + '/api/v2/help_center/{locale}/articles.json'.format(locale=language.lower())
    while endpoint:
        response = session.get(endpoint)
        if response.status_code != 200:
            print('Failed to retrieve articles with error {}'.format(response.status_code))
            exit()
        data = response.json()

        for article in data['articles']:
            title = '<h1>' + article['title'] + '</h1>'
            filename = '{id}.html'.format(id=article['id'])
            with open(os.path.join(backup_path, filename), mode='w', encoding='utf-8') as f:
                f.write(title + '\n' + article['body'])
            print('{title} copied!'.format(title=article['title']))

            log.append((filename, article['title'], article['author_id']))

        endpoint = data['next_page']

    with open(os.path.join(backup_path, '_log.csv'), mode='wt', encoding='utf-8') as f:
        writer = csv.writer(f)
        writer.writerow(('File', 'Title', 'Author ID'))
        for article in log:
            writer.writerow(article)

    0
  • Roland Pellegrin

    Thanks, this is really helpful.
    Just in case, did anyone write a restore script ? This would be really nice to translate the knowledge base.

    0
  • Nikolaus Thumma

    Hey Charles - I'm running into an auth issue when trying to connect through the python script.  Does this not work if SSO is enabled?

     

    Thank You

    0
  • Russur

    Is there a way to get the Title of the Section, that the article is contained in? I was trying to get the Section API call to work, but having no luck?


     

    0
  • Jasper

    Hi Charles,


    The articles all have the locale 'nl'.


    Nevertheless, I tried your solution to see if it would change anything. It didn't.


    This is an example of a URL of an article which is not included: /hc/nl/articles/208967385--Intern-Module-toevoegen-aan-praktijk 


     


     

    0
  • Charles Nadeau

    Hi Jasper,


    I'm not able to reproduce this on restricted content in my HC. The endpoint gets all articles regardless of access restriction or draft status. Maybe a pagination issue? You mention that the backup works partially for you. Does the number of files created total 30? That's the number of records per page returned by the API. Maybe the script is not fetching the next pages. Here are the two lines in the script that do that:


    while endpoint:
        ...
        endpoint = data['next_page']

    Another possibility is that you have a lot of articles and you're hitting the API rate limit. On the Essential plan, you can make up to 10 requests a minute, enough to back up 300 articles (10 page requests x 30 articles per page). On the Team plan, you can make 200 requests per minute, or enough for 6,000 articles.
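
    If the rate limit does turn out to be the issue, one way to cope is to pause and retry when the API answers with a 429 status. The sketch below assumes the response carries a `Retry-After` header giving the number of seconds to wait, and falls back to 60 seconds if it doesn't. The function name `get_all_articles` is made up, and `session` is assumed to be a `requests.Session` with your credentials.

```python
import time


def get_all_articles(session, endpoint, sleep=time.sleep):
    """Follow next_page links and return every article, pausing when rate limited."""
    articles = []
    while endpoint:
        response = session.get(endpoint)
        if response.status_code == 429:
            # Rate limited: wait as long as the server asks, then retry the same page
            sleep(int(response.headers.get('Retry-After', 60)))
            continue
        if response.status_code != 200:
            raise RuntimeError('Failed to retrieve articles with error {}'.format(response.status_code))
        data = response.json()
        articles.extend(data['articles'])
        endpoint = data['next_page']  # None on the last page, which ends the loop
    return articles
```

    The `sleep` parameter only exists so the wait can be swapped out when trying the function without a real delay.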


    If these don't sound like the culprits, can you post your code here (or on a file sharing site and share the link here) -- after stripping out your credentials? Thanks.

    0
  • Brent Schaus

    Merci, Charles. This is fantastic :)

    0
  • Russur

    Also, is there any way to get the Section names, that contain the html files?


     


    Thanks

    0
  • Dan Derks

    Hi Charles!

    Wonderful script, thank you -- we're all backed up.

    quick note: my machine has Python 2 and Python 3, so the pip commands defaulted to my Python 2 installation. perhaps it'd help others to explicitly use pip3?:

    $ pip3 install beautifulsoup4
    $ pip3 install lxml

    Re: Creating an Article with the Restore script, could you please walk a novice through which lines would need to be changed and what additional info is necessary?

    Thank you!

    0
  • Jasper

    Hi Charles,


    I already have the role of Administrator, so that's probably not the problem.


    Is there anything else I can do to solve this? 

    0
  • Charles Nadeau

    Hi Jasper,


    Is it possible the restricted articles are set to another locale like en-uk instead of en-us, or some other locale? If so, you could modify the script to remove the locale filter, from:


    endpoint = zendesk + '/api/v2/help_center/{locale}/articles.json'.format(locale=language.lower())

    to:


    endpoint = zendesk + '/api/v2/help_center/articles.json'

     

    0
  • Charles Nadeau

    Hi Russur,


    There's no image API, but once you've downloaded the articles on your system, a number of Python libraries and techniques can let you read the image URLs in the files and make requests to download them. I like BeautifulSoup for parsing HTML, and Requests to make HTTP requests. You can do a Google search for other options.


    As for me, I'd write a script that opened each file and used BeautifulSoup to get the image urls:


    tree = BeautifulSoup(html_source, 'lxml')
    images = tree.find_all('img')

    Then I'd grab the src attribute in each img tag and use it to make a request for the image file from the server using the Requests library:


    for image in images:
        src = image['src']
        if src[:4] != 'http': continue
        response = session.get(src, stream=True)

    Note: I'm checking that the src starts with http so it's a valid request URL.


    At this point, this image is in memory on my system. Next, I'd grab the filename from the src attribute and write it to file:


    file_name = src.split('/')[-1]
    with open(os.path.join(file_path, file_name), mode='wb') as f:
        for chunk in response.iter_content():
            f.write(chunk)
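
    One caveat with taking the last part of the src as the file name: if the URL carries a query string, it ends up in the name. A small helper along these lines could strip it. The name `image_file_name` is made up for illustration, and this is only one way to do it.

```python
import posixpath
from urllib.parse import unquote, urlparse


def image_file_name(src):
    """Return a file name for an image URL, ignoring any query string or fragment."""
    path = urlparse(src).path  # drops '?v=2', '#top', and the host
    return unquote(posixpath.basename(path))  # decode escapes such as %20
```

    It could stand in for `src.split('/')[-1]` when building the path passed to `open()`.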

    One thing to be careful about: Most web servers only allow browsers to download images. So I'd set a header so my request looked like it's coming from a browser:


    session = requests.Session()
    session.auth = ('your_email', 'your_pwd')
    session.headers = {'User-Agent': 'Mozilla/5.0 (Windows NT 5.1; rv:31.0) Gecko/20100101 Firefox/31.0'}

    Hope this helps.

    0
  • Jasper

    Thanks for this! I was able to backup the majority of articles.


    Apart from the articles which are publicly available, we also have some sections which are only available for logged in users. These articles don't seem to be included in the backup. Is there any way to do this? 

    0
