Backing up your knowledge base with the Zendesk API


88 Comments

  • ccaux

    Hi, I am trying to do exactly the same thing as Howie (exactly same use case). 

    I have started a reference doc to map IDs for Category and Sections and have created those in my Sandbox. 

    Is there a script available to recreate all the articles? I understand that the newly created articles will have new IDs. 

    Alternatively, is there a recommended process for major overhauls of the Help Centre (new IA, new design, etc.)? 

     

    1
  • Bryan Flynn

    Hey there @ccaux. As this article's code is basically open-source, perhaps someone else in the community has added this feature and will share. Other than that, the article was really for demo purposes and the code is not actively updated.

    As for customizing Help Center (HC), the support.zendesk.com/hc community has a lot more content and activity around HC customizations and ideas. There are also knowledge base articles there on the subject. The develop.zendesk.com site is more focused on APIs, the Apps framework, and Embeddables (widgets & Mobile SDKs). I know this is a delayed response but hope it helps all the same.

    0
  • Travis Abdelhamed

    When you create the log file using the provided code, the file will have a blank row between every article when you open it in Windows. To eliminate the blank rows, modify the following line accordingly (adding the newline parameter):

     

    with open(os.path.join(backup_path, '_log.csv'), mode='wt', newline='', encoding='utf-8') as f:
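
    As a self-contained illustration of the change above, here is a minimal sketch that writes a small log with newline=''. The function name write_log and the sample row are hypothetical and not part of the article's script.

```python
import csv
import os
import tempfile


def write_log(path, rows):
    """Write rows to a CSV file without the extra blank rows seen on Windows."""
    # csv.writer already ends each row with '\r\n'. Passing newline='' stops
    # the text layer from translating the '\n' again, which on Windows would
    # otherwise produce '\r\r\n' and show up as a blank row.
    with open(path, mode='wt', newline='', encoding='utf-8') as f:
        writer = csv.writer(f)
        writer.writerow(('File', 'Title', 'Author ID'))
        for row in rows:
            writer.writerow(row)


# Hypothetical sample data, for illustration only.
log_path = os.path.join(tempfile.mkdtemp(), '_log.csv')
write_log(log_path, [('1.html', 'Hello', 42)])
```

    The csv module documentation recommends opening files with newline='' for this reason, so the change should be harmless on macOS and Linux too.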

     

     

    0
  • Nikolaus Thumma

    Hey Charles - I'm running into an auth issue when trying to connect through the Python script. Does this not work if SSO is enabled?

     

    Thank You

    0
  • Charles Nadeau

    Hi Jasper,

    There are 3 authentication options for using the API:

    • basic auth credentials using a Zendesk username & password
    • a Zendesk API token
    • an OAuth token created using a Zendesk API client

    For details, see https://developer.zendesk.com/rest_api/docs/support/introduction#security-and-authentication.
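
    To make the three options concrete, here is a rough sketch of the Authorization header each one produces, using only the standard library. The function names are hypothetical. The email/token username form matches the API token convention used elsewhere in this thread, and the Bearer form is the usual way to send an OAuth access token.

```python
import base64


def basic_auth_header(username, secret):
    """Build an HTTP Basic Authorization header for username:secret."""
    raw = '{}:{}'.format(username, secret).encode('utf-8')
    return {'Authorization': 'Basic ' + base64.b64encode(raw).decode('ascii')}


def password_auth(email, password):
    # Option 1: plain Zendesk username and password.
    return basic_auth_header(email, password)


def api_token_auth(email, api_token):
    # Option 2: '/token' is appended to the email address and the API token
    # takes the place of the password.
    return basic_auth_header(email + '/token', api_token)


def oauth_auth(access_token):
    # Option 3: an OAuth access token is sent as a Bearer token.
    return {'Authorization': 'Bearer ' + access_token}
```

    With the requests library you normally don't build the Basic header yourself: passing a (username, password) tuple as the auth value does the base-64 encoding for you.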

     

     

    0
  • Dan Derks

    Hi Charles!

    Wonderful script, thank you -- we're all backed up.

    quick note: my machine has Python 2 and Python 3, so the pip commands defaulted to my Python 2 installation. perhaps it'd help others to explicitly use pip3?:

    $ pip3 install beautifulsoup4
    $ pip3 install lxml

    Re: Creating an Article with the Restore script, could you please walk a novice through which lines would need to be changed and what additional info is necessary?

    Thank you!

    0
  • Rachel M

    Hi!

    I'm stuck on the credentials bit. We use an SSO, so I went the API token route. However, I can't seem to reference it in a way that makes everyone happy. 

    I've tried:

    credentials = (base-64-encoded username/token:API_TOKEN_HERE)
    credentials = Authorization: "Basic (base-64-encoded username/token:API_TOKEN_HERE)"
    credentials = Authorization: 'Basic (base-64-encoded username/token:API_TOKEN_HERE)'
    session.auth = Authorization: "Basic (base-64-encoded username/token:API_TOKEN_HERE)"

    etc, and I keep receiving either a "SyntaxError: invalid syntax" or "TypeError: 'str' object is not callable" error. I think I'm missing something rather obvious. Any help would be appreciated!

     

    Thanks!

    Rachel

    0
  • Matt McLean

    Rachel,

    Try this:

    import requests

    credentials = 'your_zendesk_email/token', 'your_token'
    zendesk = 'https://your_subdomain.zendesk.com'
    language = 'some_locale'

     

    Hope this helps.

    0
  • Rachel M

    Hi Matt! 

    Ah – that worked! Thank you!

    So it ran. And it only output a .csv – which only contained the column headers, and no actual info. But now I'm closer. 

    Any thoughts? Thanks again!

     

    Rachel

    date = datetime.date.today()
    backup_path = os.path.join(str(date), language)
    if not os.path.exists(backup_path):
        os.makedirs(backup_path)

    log = []

    endpoint = zendesk + '/api/v2/help_center/us-en/articles.json'.format(locale=language.lower())
    while endpoint:
        response = session.get(endpoint)
        if response.status_code != 200:
            print('Failed to retrieve articles with error {}'.format(response.status_code))
            exit()
        data = response.json()

        for article in data['articles']:
            if article['body'] is None:
                continue
            title = '<h1>' + article['title'] + '</h1>'
            filename = '{id}.html'.format(id=article['id'])
            with open(os.path.join(backup_path, filename), mode='w', encoding='utf-8') as f:
                f.write(title + '\n' + article['body'])
            print('{id} copied!'.format(id=article['id']))

            log.append((filename, article['title'], article['author_id']))

        endpoint = data['next_page']

    with open(os.path.join(backup_path, '_log.csv'), mode='wt', encoding='utf-8') as f:
        writer = csv.writer(f)
        writer.writerow(('File', 'Title', 'Author ID'))
        for article in log:
            writer.writerow(article)
    0
  • Rachel M

    eep! it worked!

    figured out what was wrong. I had "us-en" as opposed to "en-us". so silly – but i'm so happy it's working!
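
    Part of why this kind of typo is easy to miss: when the locale is typed directly into the URL string, there is no {locale} placeholder left for .format(locale=...) to fill, so that call silently does nothing and the language setting is ignored. A small sketch of building the URL from the setting instead; the function name articles_endpoint is hypothetical:

```python
def articles_endpoint(zendesk, language):
    """Return the article list URL for a Help Center locale such as 'en-us'."""
    # The {locale} placeholder is what lets .format() substitute the setting.
    return '{base}/api/v2/help_center/{locale}/articles.json'.format(
        base=zendesk.rstrip('/'), locale=language.lower())


# Placeholder subdomain, for illustration only.
endpoint = articles_endpoint('https://your_subdomain.zendesk.com', 'en-US')
```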

     

    Rachel

    0
  • Bryan Flynn

    Thanks Matt for stepping in and thanks Rachel for sharing what the fix was -- quite easy for these small things to sneak in and not obvious!

    0
  • Nicolas

    Hi Charles,

     

    Thanks for the great script.

    I am facing an issue, though: the script doesn't retrieve all articles.

    In my case, I manage to get 59 articles back while the Guide Admin Portal mentions 111 published, 4 drafts and 34 archived.

    I have checked that:

    - the superuser credentials I use allow me full access on frontend, to all user-segmented sections;

    - all articles are written in the same language.

    Only articles visible to everyone have been retrieved, not the ones belonging to the user-segmented sections.

    Any idea about how to solve this?

    Cheers,

     

    Nicolas

    0
  • Russell Dunn

    Thank you for the script!

    Echoing a few earlier comments however, I'm also struggling to retrieve all articles. I get 155 articles exported, but have 173 published articles. I am an administrator.

    We do have some articles that are only visible to Agents/managers, as per Nicolas' comment above, would that be the problem? How then would I also retrieve these articles?

    0
  • Devan - Community Manager

    Hello Russell,

    So this usually has something to do with the HC view permissions of the person making the API request (as identified by the credentials used to authenticate the request). If the person making the API request can't view certain articles in the HC (as specified by the view permissions in Guide), then the API won't return those articles to the person.

    Hope this helps to clarify things a bit. If there is anything else we can assist with, please let us know. 

    0
  • Russell Dunn

    Well yes, that's what I thought, but when I said administrator, the correct terminology in Guide is Manager, which is what my account is. I'm passing in my credentials as well, not running anonymously.

    0
  • Devan - Community Manager

    Hello Russell,

    The only other thing that might apply in this instance is the API credentials being used. Are you using your own email/password credentials or an API token with your email? If using an API token, the request would be restricted to the permissions of the admin who **created** the API token initially, not the person making the request, regardless of whether the requester is a Guide Manager.

    If that's not the issue, then the next step would be to examine the 18 articles that are not returned to look for a common denominator.

    0
  • Alan Oehler

    I'm having a strange problem running my backup script. It worked fine about 18 months ago when I last used it. Now I am getting this error:

    Traceback (most recent call last):

      File "make_backup2.py", line 37, in <module>

        with open(os.path.join(backup_path, filename), mode='w', encoding='utf-8') as f:

    TypeError: 'encoding' is an invalid keyword argument for this function

    Anyone have an idea?

    0
  • Robert Renda

    Alan --

    I'm running the same code/same syntax:

                with open(os.path.join(backup_path, filename), mode='w', encoding='utf-8') as f:
                    f.write(title + '\n' + category['name'] + category['description'])
     
    It has to be something other than the line you posted. TypeErrors are typically caused by combining objects of the wrong type, or calling a function with the wrong type of object.
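
    One explanation that fits the traceback above, though it is not confirmed in this thread: in Python 2 the built-in open() does not accept an encoding keyword and raises this exact TypeError, so the script may be running under Python 2 rather than Python 3. A minimal sketch for checking the interpreter and writing the file in a way that works on both; the function name write_article and the sample file are hypothetical:

```python
import io
import os
import sys
import tempfile

# Report which interpreter is actually running the script.
print('Running under Python {}.{}'.format(*sys.version_info[:2]))


def write_article(path, text):
    """Write text as UTF-8 on both Python 2 and Python 3."""
    # io.open accepts encoding= on Python 2; on Python 3 it is the same
    # function as the built-in open().
    with io.open(path, mode='w', encoding='utf-8') as f:
        f.write(text)


# Hypothetical file name and content, for illustration only.
sample_path = os.path.join(tempfile.mkdtemp(), '123.html')
write_article(sample_path, u'<h1>Title</h1>\n<p>Body</p>')
```

    If the version printed is 2.x, running the original script with python3 instead would likely be the simpler fix.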
    0
  • Thomas Crowley

    Hi,

    Not sure whether this was covered by any of the earlier answers - if it was you can point me back to the relevant one - but I'm trying to export the site as a set of self-contained HTML files, with the images embedded in them.

    I've got the script working to download the articles, and using the BeautifulSoup script I've also managed to download the images. Is there a way to download the articles and their images as single files?

    Or, if not, is there a way of merging them?

    Thanks,

    Tom

    0
  • Bryan Flynn

    Hi Thomas,

    "Is there a way to download the articles and their images as single files?"

    While the product's REST APIs often provide access to various resources, there are sometimes cases where separate calls are needed to collect everything in one logical unit. So, unfortunately not.
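
    The merging can still be done locally once the articles and images are both downloaded. A rough sketch, assuming you have kept a mapping from each image URL to its downloaded bytes: replace every src value with a base-64 data URI so the HTML file is self-contained. The names to_data_uri and embed_images and the sample data are hypothetical.

```python
import base64


def to_data_uri(image_bytes, mime_type):
    """Encode raw image bytes as a data URI usable in an img src attribute."""
    encoded = base64.b64encode(image_bytes).decode('ascii')
    return 'data:{};base64,{}'.format(mime_type, encoded)


def embed_images(html, images):
    """Replace each image URL in html with an inline data URI.

    images maps the URL found in the src attribute to (mime_type, image_bytes).
    """
    for src, (mime_type, image_bytes) in images.items():
        html = html.replace(
            'src="{}"'.format(src),
            'src="{}"'.format(to_data_uri(image_bytes, mime_type)))
    return html


# Hypothetical article body and image bytes, for illustration only.
body = '<p><img src="https://example.com/a.png"></p>'
merged = embed_images(body, {'https://example.com/a.png': ('image/png', b'abc')})
```

    Plain string replacement only matches src attributes written with double quotes exactly as shown. Editing the img tags with BeautifulSoup, which the image script already uses, would be more robust.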

     

    0
  • Thomas Crowley

    If I wanted to scrape attachments other than images (like PDF, DOCX, etc.) how would I do that? Can I modify the image part of the script or would I need something quite different?

    0
  • Bryan Flynn

    Hi Thomas. If you go through a ticket's comments (using GET /api/v2/tickets/{ticket_id}/comments.json), you'll see the content_url property on a comment if there's an attachment. You can then use the URL value to retrieve the file.

    You can check out the general REST API attachment reference at: https://developer.zendesk.com/rest_api/docs/support/attachments

    There's no filtering of attachment types using these APIs, however, so that's something you would have to do.

    Does this help?

    0
  • Thomas Crowley

    It was more for pulling out all the attachments we've got in our Help Center. We've got a lot of pages that are just an intro with an attachment and I wanted to do an audit of all the attachments.
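
    For an audit like this, one possible starting point is to scan each backed-up article body for links that end in the file types of interest, much as the image script looks for img tags. A minimal sketch using only the standard library; the class name, function name, and sample HTML are hypothetical, and links that don't end in a file extension would be missed:

```python
from html.parser import HTMLParser


class AttachmentLinkParser(HTMLParser):
    """Collect href values of <a> tags that end with given extensions."""

    def __init__(self, extensions):
        super().__init__()
        self.extensions = tuple(ext.lower() for ext in extensions)
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag != 'a':
            return
        href = dict(attrs).get('href') or ''
        # Ignore any query string when checking the extension.
        if href.split('?', 1)[0].lower().endswith(self.extensions):
            self.links.append(href)


def find_attachment_links(html, extensions=('.pdf', '.docx')):
    parser = AttachmentLinkParser(extensions)
    parser.feed(html)
    return parser.links


# Hypothetical article body, for illustration only.
body = ('<p>Intro</p><a href="/hc/article_attachments/1/guide.pdf">Guide</a>'
        '<a href="https://example.com/page">Other</a>')
links = find_attachment_links(body)
```

    Running this over every saved article and writing the results to a CSV would give a simple list of which article links to which file.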

    0
  • Gonchik Tsymzhitov

    Thanks for the article, it saved me a lot of time :)

    0
  • Bethany S.

    Hi Charles, any guidance for how to tweak your script to place all the articles on *one* html or xml file? Thanks!

    0
  • Charles Nadeau

    Hi Bethany,

    To make sure you can more easily restore your articles later, your best bet would be to save the articles to a JSON file. So instead of writing each article to a file (lines 30 to 32 in the Gist in the article above), you'd append it to a list variable, then write the data in the variable to a single JSON file.

    You'd start by declaring articles = [] before the loop starts, then replacing lines 30 to 32 with:

    articles.append(article)

    After the loop, you could print the articles variable to a JSON file:

    file = Path('article_backup.json')
    with file.open(mode='w', encoding='utf-8') as f:
        json.dump(articles, f, sort_keys=True, indent=2)

    You'll need to import the json module and the Path class at the top of the file:

    import json
    from pathlib import Path
    0
  • Bethany S.

    Hi Charles, I am actually specifically looking to get the articles as one XML file not JSON. Can you help with that?

    0
  • Charles Nadeau

    The principle would be the same. Instead of serializing the "articles" variable to JSON, you'd just use a separate xml library to serialize it to XML, then save the XML as a file. I haven't tried it but you can do a search on "serializing Python data to xml".
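
    As an untested-against-Zendesk sketch of that idea, the standard library's xml.etree.ElementTree can turn a list of article dictionaries into a single XML tree. The element names (articles, article) and the function name articles_to_xml are assumptions for illustration. Each dictionary key becomes a child element, and non-string values such as lists are simply converted with str().

```python
import xml.etree.ElementTree as ET


def articles_to_xml(articles):
    """Build one XML tree with an <article> element per article dict."""
    root = ET.Element('articles')
    for article in articles:
        node = ET.SubElement(root, 'article')
        for key in sorted(article):
            child = ET.SubElement(node, key)
            value = article[key]
            # ElementTree escapes <, > and & in text when serializing.
            child.text = '' if value is None else str(value)
    return ET.ElementTree(root)


# Hypothetical article data, for illustration only.
articles = [{'id': 1, 'title': 'Hello', 'body': '<p>Hi & bye</p>'}]
tree = articles_to_xml(articles)
xml_text = ET.tostring(tree.getroot(), encoding='unicode')
```

    To save the result, tree.write('article_backup.xml', encoding='utf-8', xml_declaration=True) writes the whole tree to one file. Because the HTML body is stored as escaped text, it comes back unchanged when the XML is parsed again.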

    0
